browser_utils.requests_markdown_browser

RequestsMarkdownBrowser

class RequestsMarkdownBrowser(AbstractMarkdownBrowser)

(In preview) An extremely simple Markdown web browser powered by the Python requests library. This browser cannot run JavaScript, compute CSS, etc. It fetches the HTML document and converts it to Markdown. See AbstractMarkdownBrowser for more details.

__init__

def __init__(start_page: Optional[str] = None,
             viewport_size: Optional[int] = 1024 * 8,
             downloads_folder: Optional[Union[str, None]] = None,
             search_engine: Optional[Union[AbstractMarkdownSearch, None]] = None,
             markdown_converter: Optional[Union[MarkdownConverter, None]] = None,
             requests_session: Optional[Union[requests.Session, None]] = None,
             requests_get_kwargs: Optional[Union[Dict[str, Any], None]] = None)

Instantiate a new RequestsMarkdownBrowser.

Arguments:

  • start_page - The page on which the browser starts (default: "about:blank")
  • viewport_size - Approximately how many characters fit in the viewport. Viewport dimensions are adjusted dynamically to avoid cutting off words (default: 8192).
  • downloads_folder - Path to where downloads are saved. If None, downloads are disabled. (default: None)
  • search_engine - An instance of AbstractMarkdownSearch, which handles web searches performed by this browser (default: a new BingMarkdownSearch() with default parameters)
  • markdown_converter - An instance of MarkdownConverter used to convert HTML pages and downloads to Markdown (default: a new MarkdownConverter() with default parameters)
  • requests_session - The session from which to issue requests (default: a new requests.Session() instance with default parameters)
  • requests_get_kwargs - Extra parameters passed to every .get() call made to requests.
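
The viewport_size description above says the viewport is adjusted so that words are not cut off. The following is a minimal sketch of one way such pagination could work, not the library's confirmed implementation. The function name split_pages and the whitespace set are assumptions made for illustration.

```python
from typing import List, Tuple

# Characters treated as safe page boundaries (an assumption for this sketch).
WHITESPACE = (" ", "\t", "\r", "\n")


def split_pages(content: str, viewport_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets for each viewport page."""
    if not content:
        return [(0, 0)]
    pages: List[Tuple[int, int]] = []
    start = 0
    while start < len(content):
        end = min(start + viewport_size, len(content))
        # Grow the page until it ends on whitespace, so no word is split.
        while end < len(content) and content[end - 1] not in WHITESPACE:
            end += 1
        pages.append((start, end))
        start = end
    return pages


text = "alpha beta gamma delta"
pages = split_pages(text, viewport_size=8)
chunks = [text[s:e] for s, e in pages]
```

With a requested size of 8 characters, each page in this sketch grows past the requested size until it reaches a word boundary, which is why the documented size is only approximate.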

address

@property
def address() -> str

Return the address of the current page.

set_address

def set_address(uri_or_path: str) -> None

Sets the address of the current page. This will result in the page being fetched via the underlying requests session.

Arguments:

  • uri_or_path - The fully-qualified URI to fetch, or the path to fetch from the current location. If the URI protocol is search:, the remainder of the URI is interpreted as a search query, and a web search is performed. If the URI protocol is file://, the remainder of the URI is interpreted as a local absolute file path.
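
The three cases described for uri_or_path can be sketched as a small dispatcher. This is an illustrative sketch only: the function classify_address and its return shape are hypothetical and not part of the documented API, and the handling of relative paths via urljoin is an assumption about what "the path to fetch from the current location" means.

```python
from typing import Tuple
from urllib.parse import unquote, urljoin, urlparse


def classify_address(current_address: str, uri_or_path: str) -> Tuple[str, str]:
    """Return (action, value) describing how an address would be handled."""
    if uri_or_path.startswith("search:"):
        # The remainder of the URI is the search query.
        return ("search", uri_or_path[len("search:"):].strip())
    if uri_or_path.startswith("file://"):
        # The remainder of the URI is a local absolute file path.
        return ("file", unquote(urlparse(uri_or_path).path))
    if uri_or_path.startswith(("http://", "https://")):
        # Already fully qualified.
        return ("fetch", uri_or_path)
    # Otherwise, resolve the path against the current location (assumption).
    return ("fetch", urljoin(current_address, uri_or_path))


current = "https://example.com/docs/index.html"
search_case = classify_address(current, "search: markdown browser")
file_case = classify_address(current, "file:///tmp/notes.txt")
relative_case = classify_address(current, "guide.html")
```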

viewport

@property
def viewport() -> str

Return the content of the current viewport.

page_content

@property
def page_content() -> str

Return the full contents of the current page.

page_down

def page_down() -> None

Move the viewport down one page, if possible.

page_up

def page_up() -> None

Move the viewport up one page, if possible.
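
The "if possible" qualifier on page_down and page_up suggests that the viewport position is clamped at the first and last pages. A minimal sketch of that behavior follows; the ViewportCursor class is hypothetical and stands in for whatever state the browser keeps internally.

```python
class ViewportCursor:
    """Tracks the current viewport page and clamps movement to valid pages."""

    def __init__(self, page_count: int) -> None:
        # A document always has at least one (possibly empty) page.
        self.page_count = max(page_count, 1)
        self.current = 0

    def page_down(self) -> None:
        # Stay on the last page when already at the bottom.
        self.current = min(self.current + 1, self.page_count - 1)

    def page_up(self) -> None:
        # Stay on the first page when already at the top.
        self.current = max(self.current - 1, 0)


cursor = ViewportCursor(page_count=2)
cursor.page_down()
cursor.page_down()  # no further page, position is unchanged
position_at_bottom = cursor.current
cursor.page_up()
cursor.page_up()    # already at the top, position is unchanged
position_at_top = cursor.current
```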

find_on_page

def find_on_page(query: str) -> Union[str, None]

Searches for the query from the current viewport forward, looping back to the start if necessary.

find_next

def find_next() -> None

Scroll to the next viewport that matches the query.
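
The wrap-around search described for find_on_page and find_next can be sketched as a scan over viewport pages. This is a simplified illustration under stated assumptions: the helper find_from is hypothetical, matching is a plain case-insensitive substring test, and the real browser may match differently.

```python
from typing import List, Optional


def find_from(pages: List[str], query: str, start_page: int) -> Optional[int]:
    """Return the index of the first page containing query.

    Scans forward from start_page and wraps around to the beginning.
    Returns None when no page matches.
    """
    needle = query.lower()
    order = list(range(start_page, len(pages))) + list(range(0, start_page))
    for idx in order:
        if needle in pages[idx].lower():
            return idx
    return None


pages = ["intro text", "install steps", "usage notes", "Install FAQ"]

# Like find_on_page: start at the current viewport (page 2 here).
first = find_from(pages, "install", start_page=2)

# Like find_next: resume from the page after the last match, wrapping around.
following = find_from(pages, "install", start_page=(first + 1) % len(pages))
```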

visit_page

def visit_page(path_or_uri: str) -> str

Update the address, visit the page, and return the content of the viewport.

open_local_file

def open_local_file(local_path: str) -> str

Convert a local file path to a file:/// URI, update the address, visit the page, and return the contents of the viewport.
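
The path-to-URI conversion step can be sketched with the standard library's pathlib. The helper name to_file_uri is hypothetical, and whether the browser expands "~" or resolves relative paths in exactly this way is an assumption.

```python
from pathlib import Path


def to_file_uri(local_path: str) -> str:
    """Convert a local path (relative or absolute) to a file:/// URI."""
    # as_uri() requires an absolute path, so resolve it first.
    return Path(local_path).expanduser().resolve().as_uri()


uri = to_file_uri("my report.txt")
```

Path.as_uri() percent-encodes characters such as spaces, so the resulting URI is safe to pass on as an address.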