Skip to content

Browser Automation

GenAIScript provides a simplified API to interact with a headless browser using Playwright . This allows you to interact with web pages, scrape data, and automate tasks.

const page = await host.browse(
"https://github.com/microsoft/genaiscript/blob/main/packages/sample/src/penguins.csv"
)
const table = page.locator('table[data-testid="csv-table"]')
const csv = parsers.HTMLToMarkdown(await table.innerHTML())
def("DATA", csv)
$`Analyze DATA.`

Installation

Playwright needs to install the browsers and dependencies before execution. GenAIScript will automatically try to install them if it fails to load the browser. However, you can also do it manually using the following command:

Terminal window
npx playwright install --with-deps chromium

If you see this error message, you might have to install the dependencies manually.

╔═════════════════════════════════════════════════════════════════════════╗
║ Looks like Playwright Test or Playwright was just installed or updated. ║
║ Please run the following command to download new browsers: ║
║ ║
║ yarn playwright install ║
║ ║
║ <3 Playwright Team ║
╚═════════════════════════════════════════════════════════════════════════╝

host.browse

This function launches a new browser instance and optionally navigates to a page. The pages are automatically closed when the script ends.

const page = await host.browse(url)

`incognito“

Setting incognito: true will create a isolated non-persistent browser context. Non-persistent browser contexts don’t write any browsing data to disk.

const page = await host.browse(url, { incognito: true })

recordVideo

Playwright can record a video of each page in the browser session. You can enable it by passing the recordVideo option. Recording video also implies incognito mode as it requires creating a new browsing context.

const page = await host.browse(url, { recordVideo: true })

By default, the video size will be 800x600 but you can change it by passing the sizes as the recordVideo option.

const page = await host.browse(url, {
recordVideo: { width: 500, height: 500 },
})

The video will be saved in a temporary directory under .genaiscript/videos/<timestamp>/ once the page is closed. You need to close the page before accessing the video file.

await page.close()
const videoPath = await page.video().path()

The video file can be further processed using video tools.

connectOverCDP

You can provide an enpoint that uses the Chrome DevTools Protocol using the connectOverCDP.

const page = await host.browse(url, { connectOverCDP: "endpointurl" })

Locators

You can select elements on the page using the page.get... or page.locator method.

// select by Aria roles
const button = page.getByRole("button")
// select by test-id
const table = page.getByTestId("csv-table")

Element contents

You can access innerHTML, innerText, value and textContent of an element.

const table = page.getByTestId("csv-table")
const html = table.innerHTML() // without the outer <table> tags!
const text = table.innerText()
const value = page.getByRole("input").value()

You can use the parsers in HTML to convert the HTML to Markdown.

const md = await HTML.convertToMarkdown(html)
const text = await HTML.convertToText(html)
const tables = await HTML.convertTablesToJSON(html)

Screenshot

You can take a screenshot of the current page or a locator and use it with vision-enabled LLM (like gpt-4o) using defImages.

const screenshot = await page.screenshot() // returns a node.js Buffer
defImages(screenshot)

(Advanced) Native Playwright APIs

The page instance returned is a native Playwright Page object. You can import playwright and cast the instance back to the native Playwright object.

import { Page } from "playwright"
const page = await host.browse(url) as Page