Hi HN! I’m Anirudh — longtime lurker, first time poster, and I couldn’t be more excited to show you Stagehand.
Stagehand is a TypeScript project that extends Playwright with three simple AI methods — act, extract, and observe. We’d love for you to try it out using the command below:
    npx create-browser-app --example quickstart
Here’s a sample workflow:
    import { Stagehand } from "@browserbasehq/stagehand";
    import { z } from "zod";

    const stagehand = new Stagehand();
    await stagehand.init();

    // Stagehand overrides the Playwright Page and Context classes
    const { page, context } = stagehand;

    await page.goto("https://instadash.com"); // Regular Playwright (goto needs a full URL with scheme)

    // Take action on the page
    await page.act({ action: "click on taqueria cazadores" });

    // Extract relevant data from the page, validated against a Zod schema
    const { price } = await page.extract({
      instruction: "extract the price of the super burrito",
      schema: z.object({
        price: z.number(),
      }),
    });

    await stagehand.close(); // shut down the browser session
We built Stagehand because we loved building browser automations with Playwright and Selenium, but grew frustrated at how cumbersome it is to get started and write even simple ones. These frameworks, while incredibly powerful, are built for QA testing, so they’re notoriously prone to breaking when there are minor changes to the UI or the underlying DOM structure.
The goal of Stagehand is twofold:
1. Make browser automations easier to write
2. Make browser automations more resilient to DOM changes
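Here is a toy illustration of the brittleness problem in plain TypeScript, with no browser involved. `ToyNode`, `byPath`, and `byRoleAndName` are hypothetical names made up for this sketch. They are not Stagehand or Playwright APIs. The sketch compares two ways of finding a button. A fixed positional path works like a generated CSS or XPath selector and breaks when a redesign adds a wrapper `<div>`. A lookup by accessible role and name still finds the button after the same change.

```typescript
// Toy stand-in for a DOM element (hypothetical; not a real DOM or Playwright type).
interface ToyNode {
  tag: string;
  role?: string;
  name?: string;
  children: ToyNode[];
}

// Brittle: follow fixed child indexes, like a generated
// "body > div:nth-child(1) > button:nth-child(2)" selector.
function byPath(root: ToyNode, path: number[]): ToyNode | undefined {
  let cur: ToyNode | undefined = root;
  for (const i of path) {
    cur = cur?.children[i];
  }
  return cur;
}

// More resilient: depth-first search for a node with the given role and name,
// regardless of where it sits in the layout.
function byRoleAndName(root: ToyNode, role: string, label: string): ToyNode | undefined {
  if (root.role === role && root.name === label) return root;
  for (const child of root.children) {
    const hit = byRoleAndName(child, role, label);
    if (hit) return hit;
  }
  return undefined;
}

const button = (label: string): ToyNode => ({ tag: "button", role: "button", name: label, children: [] });

// Original page: <body><div><button>Cart</button><button>Checkout</button></div></body>
const oldPage: ToyNode = {
  tag: "body",
  children: [{ tag: "div", children: [button("Cart"), button("Checkout")] }],
};

// After a redesign wraps the buttons in an extra <div>.
const newPage: ToyNode = {
  tag: "body",
  children: [{ tag: "div", children: [{ tag: "div", children: [button("Cart"), button("Checkout")] }] }],
};

console.log(byPath(oldPage, [0, 1])?.name); // Checkout
console.log(byPath(newPage, [0, 1])?.name); // undefined: the old path no longer matches
console.log(byRoleAndName(newPage, "button", "Checkout")?.name); // Checkout
```

Role-and-name matching is only one heuristic, and it still breaks if the button's label changes. It does show why locating elements by meaning rather than by position survives layout changes.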
We were super energized by what we’ve been seeing with vision-based computer-use agents. We think that in a browser, you can give the model even richer data by using the information in the DOM and accessibility (a11y) tree in addition to what’s rendered on the page. However, we didn’t want to go as far as building an agent, since we wanted fine-grained control over each step an agent would take.
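One way to picture "DOM + a11y tree in addition to what's rendered" is to flatten the accessibility tree into numbered text lines that a language model can read. A side table then maps each number back to an element locator. The sketch below illustrates that idea under stated assumptions. `AxNode`, `serialize`, and the XPath values are made up for the example, and this is not Stagehand's actual DOM-processing code.

```typescript
// Hypothetical, simplified accessibility-tree node (not a real browser or Stagehand type).
interface AxNode {
  role: string;
  name: string;
  xpath: string;
  children: AxNode[];
}

// Flatten the tree into indented, numbered lines for a prompt, plus a map from
// each number back to the element's XPath, so a model answer such as "[1]" can be
// turned back into a concrete element.
function serialize(root: AxNode): { text: string; idToXpath: Map<number, string> } {
  const lines: string[] = [];
  const idToXpath = new Map<number, string>();
  const walk = (node: AxNode, depth: number): void => {
    const id = idToXpath.size; // ids are assigned in depth-first (document) order
    idToXpath.set(id, node.xpath);
    lines.push(`${"  ".repeat(depth)}[${id}] ${node.role} "${node.name}"`);
    for (const child of node.children) walk(child, depth + 1);
  };
  walk(root, 0);
  return { text: lines.join("\n"), idToXpath };
}

const tree: AxNode = {
  role: "main",
  name: "Restaurants",
  xpath: "/html/body/main",
  children: [
    { role: "link", name: "Taqueria Cazadores", xpath: "/html/body/main/a[1]", children: [] },
    { role: "link", name: "Burrito Barn", xpath: "/html/body/main/a[2]", children: [] },
  ],
};

const serialized = serialize(tree);
console.log(serialized.text);
// [0] main "Restaurants"
//   [1] link "Taqueria Cazadores"
//   [2] link "Burrito Barn"
console.log(serialized.idToXpath.get(1)); // /html/body/main/a[1]
```

Text like this gives a model roles, labels, and nesting that a screenshot alone doesn't expose. It can be sent on its own or alongside a screenshot. The id table keeps the model's answer tied to a real element.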
So the happy medium we built extends Playwright’s existing, powerful functionality with simple, extensible AI APIs that hand decision-making power back to the developer at each step.
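The difference between a free-running agent and per-step control can be sketched without a browser. In this hypothetical example, `Proposal`, `runScript`, and `fakePropose` are invented names, not Stagehand's API. The developer writes the sequence of steps. The model only turns each natural-language instruction into a concrete action, and ordinary code can veto each action before it runs.

```typescript
// Hypothetical shape of what an AI call might propose for one step.
interface Proposal {
  instruction: string;
  method: "click" | "fill";
  xpath: string;
}

type Propose = (instruction: string) => Proposal;

// The developer fixes the sequence of steps; the model only resolves each
// instruction into an action, and `guard` can reject it before it executes.
function runScript(
  steps: string[],
  propose: Propose,
  guard: (p: Proposal) => boolean,
  execute: (p: Proposal) => void,
): Proposal[] {
  const done: Proposal[] = [];
  for (const instruction of steps) {
    const proposal = propose(instruction);
    if (!guard(proposal)) {
      throw new Error(`Rejected step "${instruction}": ${proposal.method} ${proposal.xpath}`);
    }
    execute(proposal);
    done.push(proposal);
  }
  return done;
}

// Stand-in for an LLM call (hypothetical); a real one would look at the page first.
const fakePropose: Propose = (instruction) => ({
  instruction,
  method: instruction.startsWith("type") ? "fill" : "click",
  xpath: instruction.includes("cazadores") ? "/html/body/main/a[1]" : "/html/body/button[1]",
});

const executed: string[] = [];
const clicksOnly = (p: Proposal): boolean => p.method === "click";

const ran = runScript(
  ["click on taqueria cazadores", "click on add to cart"],
  fakePropose,
  clicksOnly,
  (p) => {
    executed.push(`${p.method} ${p.xpath}`);
  },
);
console.log(executed.join(", ")); // click /html/body/main/a[1], click /html/body/button[1]
```

An agent would instead loop and let the model choose the next step until it decides it is done. Here the loop is the developer's code, so each step can be logged, tested, or swapped for a plain Playwright call.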
Check out our docs: https://docs.stagehand.dev
We’d love for you to join and give us feedback on Slack as well: https://stagehand.dev/slack
What I'd love to see, either as something built on top of this or built into it: when you prompt Stagehand to extract data from a page, it also returns the XPaths you'd use to re-scrape the page, so the second scrape doesn't need an LLM.
Basically, you could scrape never-before-seen pages with the non-deterministic LLM tool, and then, when you need to re-scrape a page (to update its content, for example), use the cheaper old-school scraping method.
I'm not sure how brittle this would be, either in reliably going from the LLM version to the XPath version or in falling back to the LLM version when your XPath script fails. But conceptually, being able to scrape with the smart tools and then build up a library of dumb scraping scripts over time would be killer.
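Here is a rough sketch of this "LLM first, cached XPaths after" idea, including the fallback the comment worries about. Everything in it is hypothetical. `PageLike` stands in for a real page; in a browser, the lookup might use `document.evaluate`. `fakeLlm` stands in for an LLM extraction call that also reports where it found each value. `SelectorCache` is an invented name, not part of Stagehand's API.

```typescript
// Hypothetical minimal page: look up the text at an XPath, or null if nothing matches.
interface PageLike {
  url: string;
  query: (xpath: string) => string | null;
}

// What a hypothetical LLM extractor returns: each field's value plus where it was found.
type LlmResult = Record<string, { value: string; xpath: string }>;
type LlmExtract = (page: PageLike, fields: string[]) => LlmResult;

class SelectorCache {
  private readonly xpathsByUrl = new Map<string, Record<string, string>>();
  private readonly llmExtract: LlmExtract;
  llmCalls = 0;

  constructor(llmExtract: LlmExtract) {
    this.llmExtract = llmExtract;
  }

  extract(page: PageLike, fields: string[]): Record<string, string> {
    // Cheap path: replay cached XPaths; any XPath that matches nothing triggers the fallback.
    const cached = this.xpathsByUrl.get(page.url);
    if (cached) {
      const replayed: Record<string, string> = {};
      let allFound = true;
      for (const f of fields) {
        const xpath = cached[f];
        const value = xpath === undefined ? null : page.query(xpath);
        if (value === null) {
          allFound = false;
          break;
        }
        replayed[f] = value;
      }
      if (allFound) return replayed;
    }

    // Fallback: ask the LLM, then remember where each value came from.
    this.llmCalls++;
    const result = this.llmExtract(page, fields);
    const out: Record<string, string> = {};
    const xpaths: Record<string, string> = {};
    for (const f of fields) {
      const hit = result[f];
      if (!hit) throw new Error(`LLM did not return field "${f}"`);
      out[f] = hit.value;
      xpaths[f] = hit.xpath;
    }
    this.xpathsByUrl.set(page.url, xpaths);
    return out;
  }
}

// Toy page content keyed by XPath; reassigning `nodes` simulates the site changing.
let nodes: Record<string, string> = { "/html/body/div[2]/span": "$9.50" };
const menuPage: PageLike = { url: "https://example.com/menu", query: (xp) => nodes[xp] ?? null };

// Stand-in for the LLM: "finds" the price as the first text starting with "$".
const fakeLlm: LlmExtract = (_page, fields) => {
  const found = Object.entries(nodes).find(([, text]) => text.startsWith("$"));
  if (!found) return {};
  const [xpath, value] = found;
  const result: LlmResult = {};
  for (const f of fields) result[f] = { value, xpath };
  return result;
};

const cache = new SelectorCache(fakeLlm);
const first = cache.extract(menuPage, ["price"]); // LLM call #1, caches the XPath
nodes = { "/html/body/div[2]/span": "$10.25" }; // price changes, layout doesn't
const second = cache.extract(menuPage, ["price"]); // served from the cached XPath
nodes = { "/html/body/main/p": "$10.25" }; // redesign moves the price
const third = cache.extract(menuPage, ["price"]); // cached XPath misses, LLM call #2
console.log(first.price, second.price, third.price, cache.llmCalls); // $9.50 $10.25 $10.25 2
```

The fallback only fires when a cached XPath matches nothing. If a redesign makes the old XPath point at a different element, this sketch would silently return the wrong text. Checking each replayed value against the same schema used for extraction would be one way to narrow that gap.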