
Comparing Browser Automation Frameworks: What Agents Use Under the Hood

Agent Checker | 20 March 2026 | 5 min read

Every browser-based AI agent needs a way to control a web browser. The intelligence comes from the language model, but the actual clicking, typing, and page reading happen through a browser automation framework. The choice of framework affects speed, reliability, and what the agent can do.

Here is how the main options compare.

Playwright

Playwright, developed by Microsoft, has become the default choice for new agent frameworks. Browser Use, Stagehand, and most other recent open-source agent projects run on Playwright.

The reasons are practical. Playwright supports Chromium, Firefox, and WebKit from a single API. It handles modern web features well: shadow DOM, iframes, file uploads, downloads, and service workers all work without workarounds. Its auto-waiting mechanism pauses execution until elements are ready for interaction, which reduces flaky behaviour in agent loops.
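The idea behind auto-waiting can be illustrated with a plain polling loop. This is a simplified sketch of the concept, not Playwright's actual implementation; the function name wait_until and its default timings are assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Poll is_ready() until it returns True or the timeout elapses.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # back off briefly before checking again
```

An agent loop built on a framework without auto-waiting would need a check like this before every click or keystroke. With Playwright, the equivalent readiness checks run inside the action call itself.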

Playwright's network interception is particularly useful for agents. You can monitor API calls the page makes, block tracking requests to speed up loading, and even mock responses for testing. The page.route() API makes this straightforward.
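As a rough sketch of request blocking with page.route(), the example below uses Playwright's Python sync API. The host names in BLOCKED_HOSTS and the helper names are hypothetical, chosen only for illustration; a real agent would more likely rely on a maintained blocklist.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration only.
BLOCKED_HOSTS = {"tracker.example", "ads.example"}


def should_block(url: str) -> bool:
    """Return True if the URL's host is a blocked host or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS)


def handle_route(route) -> None:
    """Playwright route handler: abort blocked requests, let the rest through."""
    if should_block(route.request.url):
        route.abort()
    else:
        route.continue_()


def open_without_trackers(url: str) -> str:
    """Load a page with blocked hosts filtered out and return its title."""
    # Imported here so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", handle_route)  # intercept every request
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

Keeping the blocking decision in a plain function (should_block) means it can be checked without launching a browser. The same handler could log request URLs instead of aborting them if the goal is to monitor API calls.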

The framework also supports browser contexts, which are isolated browser sessions that share a single browser instance. An agent can run multiple parallel tasks in separate contexts without them interfering with each other, and without the memory cost of launching separate browser processes.
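A minimal sketch of that pattern, assuming Playwright's Python async API: each URL gets its own context inside one shared browser, and the tasks run concurrently. The function names fetch_title and fetch_titles and the example URLs are placeholders, not part of any framework.

```python
import asyncio
from typing import Dict, List


async def fetch_title(browser, url: str) -> str:
    """Open url in its own browser context and return the page title."""
    context = await browser.new_context()  # isolated cookies and storage
    try:
        page = await context.new_page()
        await page.goto(url)
        return await page.title()
    finally:
        await context.close()  # drop the session; the browser keeps running


async def fetch_titles(browser, urls: List[str]) -> Dict[str, str]:
    """Run one context per URL concurrently inside a single browser."""
    titles = await asyncio.gather(*(fetch_title(browser, u) for u in urls))
    return dict(zip(urls, titles))


async def main() -> None:
    # Imported here so the helpers above can be exercised without Playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        urls = ["https://example.com", "https://example.org"]
        print(await fetch_titles(browser, urls))
        await browser.close()
```

Running asyncio.run(main()) launches a single Chromium process. Closing each context in a finally block ensures a failed task does not leave its session behind.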

Best for: New agent projects, cross-browser needs, modern web applications.

Limitations: Larger dependency footprint than Puppeteer. The API surface is wide, which means more to learn.

Puppeteer

Puppeteer was the first major headless browser automation library from a browser vendor (Google). It controls Chrome and Chromium through the Chrome DevTools Protocol (CDP).

Many existing agent frameworks and tools still use Puppeteer because they were built before Playwright gained dominance. It remains a solid choice for Chromium-only use cases. The API is well-documented, the community is large, and most web pages work fine with Chromium alone.

Puppeteer's main technical advantage is its direct CDP access. The Chrome DevTools Protocol gives low-level control over the browser, including performance profiling, memory analysis, and protocol-level network control. Some advanced agent use cases, like monitoring page performance or analysing resource loading, are easier with direct CDP access.

Best for: Chromium-only projects, teams with existing Puppeteer expertise, use cases that need low-level CDP access.

Limitations: Built primarily for Chrome and Chromium; Firefox support arrived later and is less widely used. Some modern features (like shadow DOM piercing) require more manual work than in Playwright.

Selenium

Selenium is the oldest of the major browser automation tools, with origins going back to 2004. It uses the WebDriver protocol, a W3C standard, to control browsers.

Very few new agent frameworks choose Selenium. It is slower than Playwright or Puppeteer, its API is more verbose, and it lacks built-in support for many features that agents need, like network interception and automatic waiting. You can add these through additional libraries, but that increases complexity.

Selenium's advantage is compatibility. It supports every major browser through standardised WebDriver implementations, and it works with languages beyond JavaScript: Python, Java, C#, Ruby, and more. If an agent framework is written in Python and needs broad browser support, Selenium with its Python bindings is a proven option.

Best for: Python-centric projects, regulated environments that require W3C-standard protocols, legacy system integration.

Limitations: Slower execution, more boilerplate, no built-in network interception.

CDP Direct

Some agent frameworks skip the high-level libraries entirely and talk directly to the Chrome DevTools Protocol. This gives maximum control and minimum overhead.

The trade-off is that you are working at a much lower level. Instead of page.click('#submit'), you are sending protocol commands to find the element, calculate its centre coordinates, dispatch mouse events, and handle the response. This is more code, more complexity, and more things that can go wrong.
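To give a sense of what a single click involves at this level, here is a sketch that builds the protocol messages without sending them. The CDP method and parameter names (DOM.getBoxModel quads, Input.dispatchMouseEvent) come from the protocol. The helper names, the message ids, and the choice to send a mouse move before the press are assumptions for illustration, and the WebSocket transport is left out.

```python
import json
from typing import List, Tuple

# Locating the element usually takes three round trips first:
#   DOM.getDocument -> DOM.querySelector -> DOM.getBoxModel
# The box model's "content" quad is eight numbers: x1, y1, ..., x4, y4.


def quad_centre(quad: List[float]) -> Tuple[float, float]:
    """Centre point of a CDP quad given as [x1, y1, x2, y2, x3, y3, x4, y4]."""
    xs, ys = quad[0::2], quad[1::2]
    return sum(xs) / 4, sum(ys) / 4


def click_commands(x: float, y: float, first_id: int = 1) -> List[str]:
    """JSON messages that move to (x, y), then press and release the left button."""
    messages = []
    event_types = ("mouseMoved", "mousePressed", "mouseReleased")
    for offset, event_type in enumerate(event_types):
        params = {"type": event_type, "x": x, "y": y}
        if event_type != "mouseMoved":
            params.update({"button": "left", "clickCount": 1})
        messages.append(json.dumps({
            "id": first_id + offset,  # each command needs a unique id
            "method": "Input.dispatchMouseEvent",
            "params": params,
        }))
    return messages
```

For a content quad of [10, 20, 110, 20, 110, 60, 10, 60], quad_centre returns (60.0, 40.0), and click_commands(60, 40) yields three messages to send over the browser's debugging WebSocket. A real client would also need to match responses to ids, scroll the element into view, and handle elements that move between the lookup and the click.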

Direct CDP is used mainly by frameworks that need very specific browser control, like precise timing of actions, custom rendering pipelines, or access to experimental browser features.

Best for: Specialised use cases, performance-critical applications, experimental features.

Limitations: Chromium only, and much more code to write and maintain.

What Agent Frameworks Actually Choose

Looking at the current landscape:

Browser Use runs on Playwright. It uses Playwright's DOM access to extract page structure and Playwright's action API for interactions.
Stagehand (from Browserbase) also runs on Playwright. It adds an abstraction layer that maps natural language actions to Playwright commands.
LaVague uses Selenium with its Python bindings, reflecting its Python-first architecture.
Skyvern uses Playwright, with custom vision processing on top.
Many proprietary agent services use Puppeteer or direct CDP, often because they were built earlier when Playwright was less mature.

Performance Comparison

For agent use cases, the performance differences between frameworks are less important than you might think. The bottleneck is almost always language model inference, not browser automation. A Playwright action typically takes milliseconds, while a model API call takes seconds.
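A quick back-of-the-envelope calculation shows the scale of the difference. The latencies below are assumed round numbers for illustration, not measurements of any framework or model.

```python
def automation_share(action_ms: float, model_ms: float) -> float:
    """Fraction of each agent step spent in the browser action."""
    return action_ms / (action_ms + model_ms)


# Assumed figures: 50 ms per browser action, 2 s per model call.
share = automation_share(action_ms=50, model_ms=2000)
print(f"{share:.1%} of each step is browser automation")
```

Under these assumptions the browser accounts for about 2.4% of each step, so even halving the automation time would change the total by roughly one percent.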

That said, page loading speed matters because agents load many pages per task. Playwright and Puppeteer are roughly equivalent here. Selenium is noticeably slower due to the WebDriver protocol overhead.

Browser startup time also matters for agents that create fresh browser instances per task. Playwright's browser context feature avoids this issue entirely, spinning up isolated sessions within an already-running browser in milliseconds.

What This Means for Your Website

The framework an agent uses should not, in theory, affect how it interacts with your site. A button click is a button click regardless of whether Playwright or Puppeteer drives it.

In practice, there are subtle differences. Playwright and Puppeteer dispatch synthetic events that match real user interactions closely. Selenium's WebDriver commands are slightly more abstracted. Some JavaScript-heavy sites handle these synthetic events differently, particularly custom event listeners that check for specific event properties.

If your site works correctly with one automation framework, it will almost certainly work with the others. The exceptions are sites that actively detect and block automation, using checks like navigator.webdriver detection, CDP fingerprinting, or behavioural analysis.

The most reliable approach is to make your site work well with standard browser APIs and W3C accessibility standards. Every automation framework, current and future, builds on these foundations. Agent Checker can test how your site performs across these automation layers.