Agent-Browser: Rethinking Web Automation for AI

I recently came across Vercel’s agent-browser project, and it’s solving a problem I didn’t fully realize existed until now. If you’ve ever tried to build AI-powered browser automation, you know the pain of dealing with CSS selectors that break every time someone changes the UI.

The Old Way Doesn’t Work Great

Here’s what traditional browser automation looks like:

await page.locator("div.container > button.submit-btn:nth-child(3)").click();

This works, but it’s fragile. Change the layout? Your automation breaks. Add a new button? Breaks again. AI models also struggle to generate these selectors reliably because they’re tied to the DOM structure, not what things actually do.

A Different Approach

Agent-browser flips this around. Instead of CSS selectors, it uses accessibility trees to create simple references:

agent-browser open example.com
agent-browser snapshot
# Returns: @e1 (Sign In button), @e2 (Email input), @e3 (Password input)

agent-browser click @e1
agent-browser fill @e2 "user@example.com"

That’s it. You get @e1, @e2, @e3 references that map to actual UI elements based on their purpose, not their position in the HTML.

Why This Actually Matters

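The accessibility tree already exists in every browser; it’s how screen readers work. Agent-browser just leverages that to give you stable, semantic references. So when you say “click @e1,” it knows that’s the Sign In button, regardless of whether it’s the 3rd or 5th element in the DOM. One caveat: refs come from the most recent snapshot, so after navigating or after the page changes, take a fresh snapshot before using them again.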

This means AI agents can interact with websites more like humans do: by understanding what things are, not where they sit in the code.

Built for Speed and Compatibility

The tool is written in Rust, so it’s fast. But it also falls back to Node.js if the Rust binary isn’t available, which means it works pretty much everywhere without fuss.

npm install -g agent-browser
agent-browser install

That’s the setup. Simple.

Practical Features

Multiple Sessions

You can run multiple browser sessions at once, each completely isolated:

agent-browser --session user1 open app.com
agent-browser --session user2 open app.com

Each session has its own cookies, storage, and state. Great for testing different user scenarios simultaneously.
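Because each session is isolated, you can also drive several of them from one script. The bash sketch below opens the same URL in multiple sessions in parallel and fails if any one of them fails. It assumes that separate sessions can be opened concurrently, which follows from the isolation described above but is worth checking on your setup; open_sessions and the AB variable are hypothetical names, not part of agent-browser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open one URL in several isolated sessions at once.
# AB defaults to the agent-browser CLI; it only exists so the command can be swapped out.
AB=${AB:-agent-browser}

# open_sessions URL NAME...
# Starts one "open" per session name in the background, then waits for all
# of them and returns non-zero if any single open failed.
open_sessions() {
  local url=$1
  shift
  local pids=() name pid status=0
  for name in "$@"; do
    "$AB" --session "$name" open "$url" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For the example above, that would be open_sessions app.com user1 user2. Waiting on each PID individually, rather than a bare wait, is what lets the function report a failure in any one session.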

Persistent Profiles

Want to stay logged in between runs? Use profiles:

agent-browser --profile gmail open gmail.com

Finding Elements Semantically

Beyond the @e references, you can also find elements by what they actually say or do:

# Find by button name
agent-browser find role button --name "Submit" click

# Find by label
agent-browser find label "Email Address" fill "test@example.com"

# Find by visible text
agent-browser find text "Sign Up" click

This is way more intuitive than trying to construct complex selectors.

How It Works

When you run agent-browser snapshot, it:

1. Grabs the accessibility tree from the browser
2. Filters out non-interactive elements (if you use --interactive-only)
3. Assigns each element a simple @e reference
4. Returns structured data that’s easy for AI models to parse

The output looks roughly like this:

Interactive Elements:
@e1: button "Sign In"
@e2: textbox "Email Address" (required)
@e3: textbox "Password" (password, required)
@e4: link "Forgot password?"

Clean, semantic, and stable.
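When you script against agent-browser yourself instead of letting an AI read the snapshot, you can look a ref up by role and name rather than hard-coding @e1. The function below is a hypothetical helper (ref_for is not part of agent-browser) and assumes the exact @eN: role "Name" line layout shown above; the real output format may differ between versions, so adjust the pattern to match what your snapshot actually prints. It prints nothing if no element matches.

```shell
#!/bin/sh
# ref_for ROLE NAME (hypothetical helper, not part of agent-browser)
# Reads snapshot text on stdin and prints the first ref whose role and quoted
# name match exactly, e.g. "@e1" for: ref_for button "Sign In".
# Assumes the '@eN: role "Name" ...' line layout shown in the article.
ref_for() {
  awk -v role="$1" -v name="$2" '
    # Only consider lines like: @e2: textbox "Email Address" (required)
    $1 ~ /^@e[0-9]+:$/ && $2 == role {
      # Pull out the text between the first pair of double quotes
      if (match($0, /"[^"]*"/)) {
        label = substr($0, RSTART + 1, RLENGTH - 2)
        if (label == name) {
          ref = $1
          sub(/:$/, "", ref)   # drop the trailing colon: "@e2:" -> "@e2"
          print ref
          exit
        }
      }
    }
  '
}
```

Assuming snapshot writes to standard output, usage would look like ref=$(agent-browser snapshot --interactive-only | ref_for button "Sign In") followed by agent-browser click "$ref".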

Cloud Browser Support

If you’re running this in serverless environments or CI/CD, you can connect to cloud browser providers:

# Browserbase
agent-browser --provider browserbase open example.com

# Or set via environment variable
export BROWSERBASE_API_KEY=xxx
agent-browser open example.com

No need to install Chrome or manage browser binaries in your containers.

Real Use Cases

Here’s where this actually shines:

Testing: Let an AI agent explore your app and run test scenarios based on natural language instructions.

Data Extraction: Pull structured data from websites without brittle scraping scripts.

Form Automation: Fill out forms by describing fields, not hunting for IDs.

Monitoring: Check if certain elements exist or have changed, using semantic queries instead of fragile selectors.

When to Use This vs Traditional Tools

Agent-browser is great when you want AI-friendly automation that’s resilient to UI changes. But if you need pixel-perfect visual testing or deep programmatic control, Playwright or Puppeteer might still be better choices.

The sweet spot is when you’re building AI agents that need to interact with websites autonomously without constant maintenance.

Getting Started

Here’s a basic flow:

# Install
npm install -g agent-browser
agent-browser install

# Navigate
agent-browser open "https://github.com/login"

# Get elements
agent-browser snapshot --interactive-only

# Interact (refs are illustrative; use the ones your snapshot reports)
agent-browser fill @e1 "myusername"
agent-browser fill @e2 "mypassword"
agent-browser click @e3

# Capture result
agent-browser screenshot result.png

# Clean up
agent-browser close

That’s a complete automation script in a few commands.
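If you keep this flow as a script, it helps to make sure the browser is closed even when a step fails part-way. The bash sketch below wraps the same commands, assuming the flags behave as written above; run_login_flow and the AB override variable are hypothetical names, and the @e refs are placeholders for whatever your snapshot reports.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Getting Started steps. AB defaults to the
# agent-browser CLI and only exists so the command can be swapped out.
AB=${AB:-agent-browser}

# run_login_flow USER PASSWORD
# The body is a subshell ( ... ), so the EXIT trap fires when the flow ends
# and "close" runs whether the steps succeeded or not. Chaining with &&
# stops at the first failing step, even when the caller uses if or ||.
run_login_flow() (
  trap '"$AB" close' EXIT
  "$AB" open "https://github.com/login" &&
    "$AB" snapshot --interactive-only &&
    "$AB" fill @e1 "$1" &&      # illustrative ref: username field
    "$AB" fill @e2 "$2" &&      # illustrative ref: password field
    "$AB" click @e3 &&          # illustrative ref: sign-in button
    "$AB" screenshot result.png
)
```

The function returns the status of the first failing step (or 0), so a caller can still react to failures while cleanup happens automatically.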

Using with Claude Code

Agent-browser becomes even more powerful when combined with Claude Code. Instead of manually writing commands, you can describe what you want to do and let Claude figure out the automation.

In Terminal

# In your project directory
claude

# Then give natural language instructions:
"Use agent-browser to open github.com and take a snapshot"
"Navigate to example.com and fill out the contact form"
"Open twitter.com, find the login button, and take a screenshot"

Claude Code will execute the agent-browser commands for you and show you the results.

In VS Code

1. Open the VS Code integrated terminal (Ctrl+` by default)
2. Run claude
3. Describe your automation task

Example prompts that work well:

“Use agent-browser to test the login flow on my local app”
“Open an agent-browser session, navigate to the pricing page, and capture all button elements”
“Automate filling out the signup form with test data”
“Check if the ‘Submit’ button exists on the contact page”

Why This Combination Works

Claude Code understands the context of your instructions and:

- Chooses the right agent-browser commands
- Handles session management automatically
- Interprets the snapshot output
- Can make decisions based on what elements are found
- Chains multiple commands into workflows

For example, you can say:

"Open github.com, search for 'agent-browser',
find the first repository link, and click it"

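Claude Code will break this down into the appropriate agent-browser commands, reading the real refs from each snapshot (the numbers here are only illustrative):

agent-browser open "https://github.com"
agent-browser snapshot --interactive-only
agent-browser fill @e4 "agent-browser"   # the search input from the first snapshot
agent-browser click @e5                   # the search button
agent-browser snapshot                    # fresh snapshot: the page has changed
agent-browser click @e12                  # the first repository link in the results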

This makes browser automation feel conversational rather than procedural.

Final Thoughts

Agent-browser isn’t trying to replace existing automation tools. It’s solving a specific problem: making browser automation work better with AI by using semantic, stable references instead of fragile selectors.

If you’re building AI agents that need to interact with websites, this approach feels like a natural fit. The fact that it’s fast, works everywhere, and integrates with cloud browsers is a nice bonus.

Worth exploring if you’re in this space.

Check it out: github.com/vercel-labs/agent-browser