Agent-Browser: Rethinking Web Automation for AI
Vercel's agent-browser takes a fresh approach to browser automation: it uses accessibility trees instead of fragile CSS selectors, which makes it much easier for AI agents to interact with websites.
I recently came across Vercel’s agent-browser project, and it’s solving a problem I didn’t fully realize existed until now. If you’ve ever tried to build AI-powered browser automation, you know the pain of dealing with CSS selectors that break every time someone changes the UI.
The Old Way Doesn’t Work Great
Here’s what selector-based automation looks like (an illustrative pseudo-command, not actual Playwright CLI syntax):
playwright click "div.container > button.submit-btn:nth-child(3)"
This works, but it’s fragile. Change the layout? Your automation breaks. Add a new button? Breaks again. AI models also struggle to generate these selectors reliably because they’re tied to the DOM structure, not what things actually do.
A Different Approach
Agent-browser flips this around. Instead of CSS selectors, it uses accessibility trees to create simple references:
agent-browser open example.com
agent-browser snapshot
# Returns: @e1 (Sign In button), @e2 (Email input), @e3 (Password input)
agent-browser click @e1
agent-browser fill @e2 "user@example.com"
That’s it. You get @e1, @e2, @e3 references that map to actual UI elements, identified by their role and name rather than their position in the HTML.
Why This Actually Matters
The accessibility tree already exists in every browser; it’s how screen readers work. Agent-browser leverages it to give you semantic references. So when you say “click @e1,” it resolves to the Sign In button from the snapshot, regardless of whether it’s the 3rd or 5th element in the DOM. Refs come from a snapshot, so take a fresh one after the page changes.
This means AI agents can interact with websites more like humans do: by understanding what things are, not where they sit in the code.
Built for Speed and Compatibility
The CLI is written in Rust for speed, and it falls back to Node.js if the Rust binary isn’t available, so it runs on most platforms without extra setup.
npm install -g agent-browser
agent-browser install
That’s the setup. Simple.
Practical Features
Multiple Sessions
You can run multiple browser sessions at once, each completely isolated:
agent-browser --session user1 open app.com
agent-browser --session user2 open app.com
Each session has its own cookies, storage, and state. Great for testing different user scenarios simultaneously.
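If you want to script this for more than a couple of users, a small loop does the job. This is a minimal sketch that assumes only the --session flag and open command shown above. The AB variable and the open_for_users function are my own names, not part of agent-browser; AB exists so you can swap in echo and preview the commands without launching a browser.

```shell
#!/bin/sh
# AB is a hypothetical override hook: set AB=echo to print arguments
# instead of running the real CLI.
AB="${AB:-agent-browser}"

# Open the same URL once per user, each in its own named session.
# Usage: open_for_users URL USER...
open_for_users() {
  url="$1"
  shift
  for user in "$@"; do
    "$AB" --session "$user" open "$url"
  done
}
```

Calling open_for_users app.com user1 user2 runs the two commands from the example above, one after the other.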
Persistent Profiles
Want to stay logged in between runs? Use profiles:
agent-browser --profile gmail open gmail.com
Login once, and the profile remembers your session for next time.
Finding Elements Semantically
Beyond the @e references, you can also find elements by what they actually say or do:
# Find by button name
agent-browser find role button --name "Submit" click
# Find by label
agent-browser find label "Email Address" fill "test@example.com"
# Find by visible text
agent-browser find text "Sign Up" click
This is far more intuitive than constructing complex selectors.
How It Works
When you run agent-browser snapshot, it:
- Grabs the accessibility tree from the browser
- Filters out non-interactive elements (if you use --interactive-only)
- Assigns each element a simple @e reference
- Returns structured data that’s easy for AI models to parse
The output looks like this:
Interactive Elements:
@e1: button "Sign In"
@e2: textbox "Email Address" (required)
@e3: textbox "Password" (password, required)
@e4: link "Forgot password?"
Clean, semantic, and stable.
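To make the filter-and-number step concrete, here is a toy model of it. This is not agent-browser's actual implementation, just a sketch of the idea: the input format (one node per line, role first), the list of roles treated as interactive, and the assign_refs name are all assumptions of mine.

```shell
#!/bin/sh
# Toy model only: keep nodes whose role looks interactive and number them
# @e1, @e2, ... in document order.
# Input (stdin): one accessibility node per line, as: role "name" [attrs]
assign_refs() {
  awk '
    $1 == "button" || $1 == "textbox" || $1 == "link" {
      n++
      role = $1
      sub(/^[^ ]+ /, "")   # drop the role, keep the quoted name and attributes
      printf "@e%d: %s %s\n", n, role, $0
    }
  '
}

printf '%s\n' \
  'heading "Welcome back"' \
  'button "Sign In"' \
  'textbox "Email Address" (required)' \
  'paragraph "Terms apply"' \
  'link "Forgot password?"' | assign_refs
# prints:
# @e1: button "Sign In"
# @e2: textbox "Email Address" (required)
# @e3: link "Forgot password?"
```

The heading and paragraph are dropped, and the remaining elements get refs in the order they appear.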
Cloud Browser Support
If you’re running this in serverless environments or CI/CD, you can connect to cloud browser providers:
# Browserbase
agent-browser --provider browserbase open example.com
# Or set via environment variable
export BROWSERBASE_API_KEY=xxx
agent-browser open example.com
No need to install Chrome or manage browser binaries in your containers.
Real Use Cases
Here’s where this actually shines:
Testing: Let an AI agent explore your app and run test scenarios based on natural language instructions.
Data Extraction: Pull structured data from websites without brittle scraping scripts.
Form Automation: Fill out forms by describing fields, not hunting for IDs.
Monitoring: Check if certain elements exist or have changed, using semantic queries instead of fragile selectors.
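For the monitoring case, one simple approach is to save snapshot output to files and compare them. The sketch below assumes the snapshot text looks like the "Interactive Elements" listing shown earlier and that you have redirected it into two files yourself; snapshot_diff is a hypothetical helper of mine, not an agent-browser command. It strips the @e numbers before comparing, since numbering can shift when an element is added.

```shell
#!/bin/sh
# Report elements that appear in only one of two saved snapshots.
# Usage: snapshot_diff OLD_FILE NEW_FILE
snapshot_diff() {
  old_file="$1"
  new_file="$2"
  tmp_old="$(mktemp)"
  tmp_new="$(mktemp)"
  # Keep only lines that start with a ref, minus the ref itself, sorted for comm.
  sed -n 's/^@e[0-9][0-9]*: //p' "$old_file" | sort > "$tmp_old"
  sed -n 's/^@e[0-9][0-9]*: //p' "$new_file" | sort > "$tmp_new"
  comm -23 "$tmp_old" "$tmp_new" | sed 's/^/removed: /'
  comm -13 "$tmp_old" "$tmp_new" | sed 's/^/added: /'
  rm -f "$tmp_old" "$tmp_new"
}
```

No output means the two snapshots list the same elements, which makes this easy to wire into a scheduled check.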
When to Use This vs Traditional Tools
Agent-browser is great when you want AI-friendly automation that’s resilient to UI changes. But if you need pixel-perfect visual testing or deep programmatic control, Playwright or Puppeteer might still be better choices.
The sweet spot is when you’re building AI agents that need to interact with websites autonomously without constant maintenance.
Getting Started
Here’s a basic flow:
# Install
npm install -g agent-browser
agent-browser install
# Navigate
agent-browser open "https://github.com/login"
# Get elements
agent-browser snapshot --interactive-only
# Interact
agent-browser fill @e1 "myusername"
agent-browser fill @e2 "mypassword"
agent-browser click @e3
# Capture result
agent-browser screenshot result.png
# Clean up
agent-browser close
That’s a complete automation script in a few commands.
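If you turn that flow into a script, it is worth making sure the browser gets closed even when a step fails. Here is one way to sketch it, using only the commands from the flow above. The login_flow function and the AB override variable are names I made up for the example, and the @e1 to @e3 refs are taken from the walkthrough; in a real run you would read them from the snapshot output.

```shell
#!/bin/sh
# AB is a hypothetical override hook: set AB=echo to preview the commands.
AB="${AB:-agent-browser}"

# Run the login flow, stop at the first failing step, and always close.
# Usage: login_flow USERNAME PASSWORD
login_flow() {
  status=0
  "$AB" open "https://github.com/login" &&
    "$AB" snapshot --interactive-only &&
    "$AB" fill @e1 "$1" &&
    "$AB" fill @e2 "$2" &&
    "$AB" click @e3 &&
    "$AB" screenshot result.png ||
    status=$?
  # Clean up regardless of how far the chain got.
  "$AB" close
  return "$status"
}
```

The function returns the exit status of the first step that failed, so a calling script or CI job can react to it.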
Using with Claude Code
Agent-browser becomes even more powerful when combined with Claude Code. Instead of manually writing commands, you can describe what you want to do and let Claude figure out the automation.
In Terminal
# In your project directory
claude
# Then give natural language instructions:
"Use agent-browser to open github.com and take a snapshot"
"Navigate to example.com and fill out the contact form"
"Open twitter.com, find the login button, and take a screenshot"
Claude Code will execute the agent-browser commands for you and show you the results.
In VS Code
- Open the VS Code integrated terminal (Ctrl+`, also on macOS)
- Run claude
- Describe your automation task
Example prompts that work well:
- “Use agent-browser to test the login flow on my local app”
- “Open agent-browser session, navigate to the pricing page, and capture all button elements”
- “Automate filling out the signup form with test data”
- “Check if the ‘Submit’ button exists on contact page”
Why This Combination Works
Claude Code understands the context of your instructions and:
- Chooses the right agent-browser commands
- Handles session management automatically
- Interprets the snapshot output
- Can make decisions based on what elements are found
- Chains multiple commands into workflows
For example, you can say:
"Open github.com, search for 'agent-browser',
find the first repository link, and click it"
Claude Code will break this down into agent-browser commands along these lines (the bracketed names are placeholders for the real @e refs returned by each snapshot):
agent-browser open "https://github.com"
agent-browser snapshot --interactive-only
agent-browser fill @e[search-input] "agent-browser"
agent-browser click @e[search-button]
agent-browser snapshot
agent-browser click @e[first-repo-link]
This makes browser automation feel conversational rather than procedural.
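The step an agent performs between snapshot and click is essentially a lookup: find the line that matches the description and take its ref. A rough sketch of that lookup, assuming snapshot text in the "@e1: button "Sign In"" format shown earlier, might look like this. The ref_for function is hypothetical, not an agent-browser command.

```shell
#!/bin/sh
# Print the ref of the first element with the given role and accessible name.
# Reads snapshot text on stdin; exits non-zero if nothing matches.
# Usage: ref_for ROLE NAME
ref_for() {
  awk -v role="$1" -v name="$2" '
    {
      pos = index($0, ": ")
      if (pos == 0) next                 # skip lines without a ref prefix
      ref = substr($0, 1, pos - 1)
      rest = substr($0, pos + 2)
      want = role " \"" name "\""
      if (index(rest, want) == 1) { print ref; found = 1; exit }
    }
    END { exit found ? 0 : 1 }
  '
}

snapshot='Interactive Elements:
@e1: button "Sign In"
@e2: textbox "Email Address" (required)
@e3: textbox "Password" (password, required)
@e4: link "Forgot password?"'

printf '%s\n' "$snapshot" | ref_for textbox "Email Address"
# prints: @e2
```

You could then pass the result to a command from earlier, for example agent-browser fill "$ref" "user@example.com".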
Final Thoughts
Agent-browser isn’t trying to replace existing automation tools. It’s solving a specific problem: making browser automation work better with AI by using semantic, stable references instead of fragile selectors.
If you’re building AI agents that need to interact with websites, this approach feels like a natural fit. The fact that it’s fast, runs on most platforms, and integrates with cloud browsers is a nice bonus.
Worth exploring if you’re in this space.
Check it out: github.com/vercel-labs/agent-browser
Elyor Djalalov
Senior software engineer. I write about web development, software architecture, and engineering.