Browser Automation

GenAIScript provides a simplified API for driving a headless browser through Playwright. You can use it to interact with web pages, scrape data, and automate tasks.

const page = await host.browse(
"https://github.com/microsoft/genaiscript/blob/main/packages/sample/src/penguins.csv"
)
const table = page.locator('table[data-testid="csv-table"]')
const csv = parsers.HTMLToMarkdown(await table.innerHTML())
def("DATA", csv)
$`Analyze DATA.`

Installation

Playwright needs to install the browsers and their dependencies before execution. GenAIScript automatically tries to install them if it fails to load the browser, but you can also install them manually with the following command:

npx playwright install --with-deps chromium

If you see the following error message, you might have to install the dependencies manually.

╔═════════════════════════════════════════════════════════════════════════╗
║ Looks like Playwright Test or Playwright was just installed or updated. ║
║ Please run the following command to download new browsers: ║
║ ║
║ yarn playwright install ║
║ ║
║ <3 Playwright Team ║
╚═════════════════════════════════════════════════════════════════════════╝

host.browse

This function launches a new browser instance and optionally navigates to the page. The page is automatically closed when the script ends.

const page = await host.browse(url)

You can configure a number of options for the browser instance:

const page = await host.browse(url, { incognito: true })
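As a rough sketch of how host.browse might be used across several pages, the helper below opens each URL in its own incognito page and collects the visible body text. The name collectPageText is hypothetical, and host is passed in as a parameter (rather than read from the GenAIScript global) only so that the snippet stands alone. It assumes the returned page exposes Playwright's locator and innerText methods.

```javascript
// Hypothetical helper: visit each URL and return its visible text.
// `host` is expected to be the GenAIScript `host` object.
async function collectPageText(host, urls) {
    const results = []
    for (const url of urls) {
        // `incognito` is the option shown in the example above
        const page = await host.browse(url, { incognito: true })
        // Read the rendered text of the whole document body
        const text = await page.locator("body").innerText()
        results.push({ url, text })
    }
    return results
}
```

Inside a script this could be called as `const pages = await collectPageText(host, [url])`. Pages are opened one at a time, which is slower than opening them in parallel but keeps the number of live pages small.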

Locators

You can select elements on the page using the page.getBy... methods (such as getByRole or getByTestId) or the page.locator method.

// select by Aria roles
const button = page.getByRole("button")
// select by test-id
const table = page.getByTestId("csv-table")
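Locators can also be narrowed and read in bulk. The sketch below assumes the standard Playwright `name` option of getByRole and the allInnerTexts locator method. It lists the labels of every button on a page and looks one up by its accessible name. buttonLabels and findButton are hypothetical helper names.

```javascript
// Hypothetical helpers built on Playwright locators.

// Return the visible label of every element with the ARIA role "button".
async function buttonLabels(page) {
    return await page.getByRole("button").allInnerTexts()
}

// Locate a button by its accessible name (for example "Submit").
// The locator is lazy: nothing is queried until it is used.
function findButton(page, name) {
    return page.getByRole("button", { name })
}
```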

Element contents

You can access the innerHTML, innerText, input value, and textContent of an element. These accessors are asynchronous and must be awaited.

const table = page.getByTestId("csv-table")
const html = await table.innerHTML() // without the outer <table> tags!
const text = await table.innerText()
const value = await page.getByRole("textbox").inputValue()

You can use the HTML parsers to convert the HTML to Markdown, plain text, or JSON tables.

const md = await HTML.convertToMarkdown(html)
const text = await HTML.convertToText(html)
const tables = await HTML.convertTablesToJSON(html)
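Because innerHTML omits the element's own tags, table markup read this way may need its <table> wrapper restored before a table-oriented converter can recognize it. The following is a minimal sketch under that assumption. wrapTable is a hypothetical helper, and whether HTML.convertTablesToJSON actually requires the wrapper is not confirmed here.

```javascript
// Hypothetical helper: restore the <table> wrapper that innerHTML leaves out.
function wrapTable(innerHtml) {
    const trimmed = innerHtml.trim()
    // Leave the markup alone if it already starts with a <table> tag.
    if (/^<table[\s>]/i.test(trimmed)) return trimmed
    return `<table>${trimmed}</table>`
}

// Inside a GenAIScript script (not executed here):
// const inner = await page.getByTestId("csv-table").innerHTML()
// const tables = await HTML.convertTablesToJSON(wrapTable(inner))
```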

Screenshot

You can take a screenshot of the current page or of a locator and pass it to a vision-enabled LLM (like gpt-4o) using defImages.

const screenshot = await page.screenshot() // returns a node.js Buffer
defImages(screenshot)
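The same approach applies to a single element, since Playwright locators also expose a screenshot method. As a sketch, the hypothetical helper captureElement below captures only the element with a given test id, which keeps the image focused on the relevant part of the page.

```javascript
// Hypothetical helper: capture one element instead of the whole page.
async function captureElement(page, testId) {
    const element = page.getByTestId(testId)
    // Locator.screenshot() resolves to a Buffer, like page.screenshot()
    return await element.screenshot()
}

// Inside a GenAIScript script (not executed here):
// defImages(await captureElement(page, "csv-table"))
```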

Interacting with Elements
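Locators can also act on the elements they match. The sketch below assumes the standard Playwright fill, click, and first locator methods, and a hypothetical page containing a text box and a button named "Search". submitSearch is an illustrative name, not part of GenAIScript.

```javascript
// Hypothetical helper: type a query and press the search button.
async function submitSearch(page, query) {
    // Type the query into the first text box on the page
    await page.getByRole("textbox").first().fill(query)
    // Click the button whose accessible name matches "Search"
    await page.getByRole("button", { name: "Search" }).click()
}
```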

(Advanced) Native Playwright APIs

The page instance returned is a native Playwright Page object. You can import playwright and cast the instance back to the native Playwright type.

import { Page } from "playwright"
const page = await host.browse(url) as Page
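Once the page is treated as a native Playwright object, the rest of the Playwright Page API is available. As an illustration, the hypothetical helper describePage below combines page.title(), page.url(), and page.evaluate() to summarize a page. It is a sketch, not part of GenAIScript.

```javascript
// Hypothetical helper: summarize a page using native Playwright calls.
async function describePage(page) {
    return {
        title: await page.title(), // asynchronous in Playwright
        url: page.url(), // synchronous in Playwright
        // The callback runs in the browser and counts the links in the document
        linkCount: await page.evaluate(
            () => document.querySelectorAll("a").length
        ),
    }
}
```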