
Browser-use: Step-by-Step AI Browser Automation Guide

Browser-use is a Python library that lets an AI agent control a real browser – navigating dynamic pages, submitting forms, and extracting structured data without brittle selectors. Unlike traditional headless browser setups wired to rigid rules, it reasons with what it sees and adapts. By the end of this guide, you'll have a working agent scraping product data, interacting with web apps, and handling failure scenarios.

What are AI agents and how does Browser-use fit in?

A traditional automation script follows fixed steps: click this selector, wait for that element, extract this text. When the page changes, the script breaks. An AI agent works differently – it receives a description of the current page state, reasons about what it sees, and decides which action to take next. 

Browser-use is an AI agent library that uses Playwright as its browser control layer. In practice, Browser-use feeds the current page state to an LLM, which decides what to do next, then executes that decision directly in a real browser: click, type, scroll, extract. Playwright handles the browser mechanics under the hood, so the agent is always the one driving. 

This workflow makes Browser-use reliable on JavaScript-heavy pages with cookie banners, login flows, or layouts that shift between visits. Where a CSS-selector scraper would break, the agent reads what's on the page and adapts.

On the model side, Browser-use works with the most widely used LLMs out of the box. OpenAI GPT-4o is the most common choice, but Anthropic Claude, Google Gemini, and any LangChain-compatible model work too, so you're not tied to a single provider. For teams looking to go further, the AI agent orchestration guide covers chaining Browser-use into larger automated workflows.

Prerequisites and project setup

Before writing any agent logic, you need a working environment with the right Python version, dependencies, and project structure in place.

Environment setup

Browser-use requires Python 3.11 or higher. If you're on an older version, upgrade before continuing. Once you're on 3.11+, create a virtual environment to keep your dependencies isolated:

python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

If you prefer Poetry for dependency management, it works equally well here. Then, install the core dependencies:

pip install browser-use playwright langchain-openai python-dotenv

After installation, fetch the Chromium browser binary that Playwright uses:

playwright install chromium

LLM API key

Browser-use needs an LLM to drive the agent's decisions. The examples in this guide use OpenAI GPT-4o. Head to platform.openai.com, generate an API key, and store it in a .env file at the root of your project.

OPENAI_API_KEY=your_key_here

If you'd rather use Anthropic Claude or Google Gemini, swap in the relevant LangChain provider class when initializing the model – no other code changes are needed.

Boilerplate script structure

Every Browser-use agent follows the same minimal pattern. Here's the starting point:

import asyncio
from dotenv import load_dotenv
from browser_use import Agent
from langchain_openai import ChatOpenAI

load_dotenv()

async def main():
    agent = Agent(
        task="Your task here",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run()

asyncio.run(main())

The Agent class takes a task string and an LLM instance, while asyncio.run() handles the async execution. That's the entire skeleton – everything else builds on top of it.

Browser customization

Browser-use exposes a BrowserConfig class for controlling how the browser behaves. The most useful options during development are pointing the agent at a local Chrome installation, toggling headless mode, and setting the viewport:

from browser_use import Agent, Browser, BrowserConfig

browser = Browser(
    config=BrowserConfig(
        executable_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
        headless=False,  # Set to True in production
    )
)

Using a local Chrome instance is worth doing early on – it'll already have saved sessions and cookies, which makes testing against login-gated pages much smoother. Run headless in production once your tasks are reliable.

Suggested project structure

Keeping configuration and task logic separate from the start saves time later.

project/
├── main.py # Entry point and agent execution
├── prompts.py # Task strings, kept separate from logic
├── config.py # Browser and agent configuration
└── .env # API keys -- never commit this file

If you hit setup issues along the way, the Python errors and exceptions guide covers the most common failure patterns, and the guide to running Python in the terminal is a useful reference if you're newer to the command line.

Setting up and using the Browser-use web UI

Browser-use ships with an official web UI – a Gradio-based interface that lets you run agent tasks without writing any Python. It's useful for rapid prompt prototyping and accessible to non-developers who want to test automation tasks without touching code.

Installation

Start by cloning the web UI repository:

git clone https://github.com/browser-use/web-ui.git
cd web-ui

Install the requirements:

pip install -r requirements.txt

Then copy the example environment file and add your API key:

cp .env.example .env

Open .env and add your OPENAI_API_KEY. If Playwright throws a "browser binary not found" error at this stage, run playwright install chromium to resolve it.

Configuration options

The UI exposes several settings directly in the interface.

  • Maximum steps – the number of actions the agent will attempt before stopping. Start high during testing, tighten once your prompt is reliable.
  • LLM model and temperature – select your model and adjust how deterministic the agent's decisions are. Lower temperature produces more consistent behavior.
  • Browser visibility – toggle between headed and headless mode directly from the UI.
  • Session recording – the agent saves a GIF of every session to agent_history.gif in the project root. This is the fastest way to diagnose unexpected behavior.

Running a task

Start the server with the following command:

python webui.py --ip 127.0.0.1 --port 7788

Navigate to http://127.0.0.1:7788 in your browser to open the interface.

Open the Run Agent tab, enter your prompt, and click Run Agent.

To see it in action, try a simple public example: ask the agent to find the current price of a specific product on books.toscrape.com and return a formatted summary. It requires no login, produces a clear output, and gives you an immediate sense of how the agent interprets and executes a natural language instruction.

Once the task completes, review the output in the UI and open agent_history.gif to watch a replay of every action the agent took.

Running a task through the UI quickly reveals how much prompt quality matters. Let’s see how to write prompts that produce reliable, consistent output.

Writing effective prompts for AI browser agents

Prompt quality has a bigger impact on Browser-use reliability than most developers expect. A traditional script is explicit by design – every step is hard-coded. A Browser-use agent infers what you mean from natural language, which means a vague instruction produces inconsistent behavior. The agent's decisions are only as good as the information it's given.

Structural principles

How you structure a prompt determines how reliably the agent executes, because vague instructions get interpreted while precise ones get followed. The following principles apply whether you're scraping a single page or chaining a multi-step workflow.

  • Start with a named objective. Before listing steps, state what the task is meant to accomplish. This gives the agent a goal to orient toward if it gets confused mid-task.
  • Break complex tasks into numbered steps. Each step should correspond to 1 discrete browser action: navigate, find an element, click, extract, verify. Combining multiple actions into a single step increases the chance of the agent skipping one.
  • Add verification checkpoints. After critical actions, tell the agent to confirm the result before continuing. "Verify that at least 1 product listing is visible before extracting data" prevents the agent from running extraction logic against an empty page.
  • Define failure behavior explicitly. Left without instructions, the agent will improvise when something goes wrong. Specify what it should do instead: log an error and stop, retry up to N times, or skip the step and continue.
  • Be precise about element references. "The search box" is ambiguous. "The search input in the top navigation bar with a magnifying glass icon" is not. The more specific your reference, the less room there is for misinterpretation.

The principles above are easier to apply with a consistent structure to build from. Not every task needs every field, but starting with all of them and removing what's unnecessary is faster than debugging a prompt that was missing something from the start.

The template below covers the 5 components a reliable Browser-use prompt needs – objective, context, steps, error handling, and output format.

Objective: [What this task is meant to accomplish]
Context: [Any relevant information the agent needs before starting]
Steps:
1. [Navigate to URL]
2. [Find element -- describe it precisely]
3. [Perform action]
4. [Verify result]
5. [Extract data / perform next action]
Error handling:
- If [expected failure], [specific instruction]
- If the page fails to load, retry up to 3 times before stopping
Output format: [Describe exactly how the result should be structured]
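If you reuse this template across tasks, a small helper that assembles the prompt from its components keeps the structure consistent between runs. The function below is a hypothetical convenience sketch, not part of the Browser-use API:

```python
def build_prompt(objective, steps, error_handling, context=None, output_format=None):
    """Assemble a task prompt from the template's five components.

    `steps` and `error_handling` are lists of strings; `context` and
    `output_format` are optional free-text fields.
    """
    lines = [f"Objective: {objective}"]
    if context:
        lines.append(f"Context: {context}")
    lines.append("Steps:")
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append("Error handling:")
    lines += [f"- {rule}" for rule in error_handling]
    if output_format:
        lines.append(f"Output format: {output_format}")
    return "\n".join(lines)

prompt = build_prompt(
    objective="Extract pricing data for the first 10 books on books.toscrape.com.",
    steps=[
        "Navigate to http://books.toscrape.com.",
        "Verify that at least 1 book listing is visible before proceeding.",
    ],
    error_handling=["If the page fails to load, retry up to 3 times before stopping."],
    output_format="A JSON array of objects.",
)
print(prompt)
```

Keeping prompts as data like this also makes them easy to store in a separate prompts.py module, as suggested in the project structure above.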

Before and after

Let’s look at a practical example.

Vague prompt:

Go to books.toscrape.com and get the book prices.

This gives the agent no guidance on how many books to extract, what fields to include, how to handle pagination, or what to do if the page structure changes. The output will vary between runs.

Revised prompt:

Objective: Extract pricing data for the first 10 books listed on books.toscrape.com.
Steps:
1. Navigate to http://books.toscrape.com.
2. Verify that at least 1 book listing is visible before proceeding.
3. For each of the first 10 books, extract: title, price, star rating, and availability status.
4. If a field is missing for a book, record it as null rather than skipping the book.
5. Return the data as a JSON array.
Error handling:
- If the page fails to load, retry up to 3 times before stopping.
- If fewer than 10 books are visible, extract all available books and note the count in the output.

The difference isn't just length. The objective gives the agent something to recover toward if it loses its place. The numbered steps eliminate ambiguity about sequencing. The null instruction means a missing field produces clean data rather than a crashed run. And the error handling clauses mean the agent stops predictably when something goes wrong, rather than improvising in ways you can't anticipate. Every addition removes a decision the agent would otherwise make on its own – and those improvised decisions are where inconsistent output comes from.

For readers interested in LLM-assisted scraping more broadly, the ChatGPT web scraping guide covers complementary approaches, and the LangChain end-to-end workflows article goes deeper on building LangChain-based automation pipelines.

Practical use cases: step-by-step examples

With the prompt principles in place, the examples below apply them to 3 distinct tasks – product data extraction, multi-step web app interaction, and stock monitoring. Each includes a full prompt and the code to run it.

Scraping product listings and saving to JSON

The target here is books.toscrape.com, a public sandbox e-commerce site built for scraping practice. The task is to extract title, price, star rating, and availability for the first 10 results, then write the output to a local JSON file.

import asyncio
import json
from dotenv import load_dotenv
from browser_use import Agent
from langchain_openai import ChatOpenAI

load_dotenv()

TASK = """
Objective: Extract data for the first 10 books on books.toscrape.com.
Steps:
1. Navigate to http://books.toscrape.com.
2. Verify that at least 1 book listing is visible before proceeding.
3. For each of the first 10 books, extract: title, price, star rating, and availability status.
4. If a field is missing for a book, record it as null rather than skipping the book.
5. Return the data as a JSON array.
Error handling:
- If the page fails to load, retry up to 3 times before stopping.
- If fewer than 10 books are visible, extract all available books and note the count in the output.
""".strip()

async def main():
    agent = Agent(
        task=TASK,
        llm=ChatOpenAI(model="gpt-4o"),
    )
    history = await agent.run()
    # final_result() returns the agent's last extracted content as a string --
    # per the prompt, a JSON array of book records
    output = history.final_result() or "[]"
    with open("books.json", "w") as f:
        f.write(json.dumps(json.loads(output), indent=2))

asyncio.run(main())

The resulting JSON is an array of objects – one per book – with title, price, star rating, and availability fields, with any missing field recorded as null.
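Before passing the agent's output downstream, it's worth validating the parsed records – a malformed run then fails loudly instead of corrupting your dataset. The helper below is a hypothetical sketch: the exact JSON keys depend on how the agent names them, so the snake_case field names here are an assumption unless your prompt pins them down explicitly.

```python
import json

# Assumed field names -- mirror these in your prompt's output format
# to guarantee the agent uses them.
REQUIRED_FIELDS = {"title", "price", "star_rating", "availability"}

def validate_books(raw: str) -> list[dict]:
    """Parse the agent's output and check every record has the expected keys.

    Fields recorded as null (None) are allowed, per the prompt; records
    missing a key entirely are rejected.
    """
    books = json.loads(raw)
    if not isinstance(books, list):
        raise ValueError("expected a JSON array of book records")
    for i, book in enumerate(books):
        missing = REQUIRED_FIELDS - book.keys()
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    return books

sample = '[{"title": "A Light in the Attic", "price": "£51.77", "star_rating": 3, "availability": "In stock"}]'
books = validate_books(sample)
print(len(books))  # 1
```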

For more on handling and storing scraped output, the guide to saving scraped data covers the main storage patterns in depth. 

Single-site extraction is a good starting point, but Browser-use's real advantage shows up when a task spans multiple sites and requires the agent to carry state between them.

Summarizing content and creating a task card

Task: Visit a public GitHub issues list, summarize the 3 most recent open issues, then log into a project management tool and create a card with the summary.

This is where Browser-use's advantage over traditional scraping shows most clearly. A CSS-selector script would require separate, site-specific logic for each site – and would break independently whenever either site updates its layout. A Browser-use agent handles both in a single task description.

TASK = """
Objective: Summarize the 3 most recent open issues from a GitHub repo and create a task card.
Steps:
1. Navigate to https://github.com/browser-use/browser-use/issues.
2. Verify that issue listings are visible.
3. Extract the title and URL of the 3 most recently opened issues.
4. Write a brief summary for each (1-2 sentences).
5. Navigate to [your project management tool URL].
6. Log in using the credentials from environment variables TOOL_USERNAME and TOOL_PASSWORD.
7. Create a new card titled "GitHub Issues Summary - [today's date]" with the 3 summaries as the card body.
8. Confirm the card was created successfully before stopping.
Error handling:
- If login fails, stop execution and log the error.
- If fewer than 3 issues are visible, summarize all available issues.
"""

Note: Store credentials in your .env file and load them with python-dotenv. Never hardcode login details in the task string or script.
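A small fail-fast helper makes missing credentials obvious before the agent starts a run, instead of surfacing as a confusing mid-task login failure. This is a hypothetical convenience function; the TOOL_USERNAME and TOOL_PASSWORD names match the task above.

```python
import os

def require_env(name: str) -> str:
    """Fetch a required environment variable, failing fast with a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set -- add it to your .env file")
    return value

# In your script (after calling load_dotenv()):
#     username = require_env("TOOL_USERNAME")
#     password = require_env("TOOL_PASSWORD")
```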

The previous 2 use cases run once and return a result. The monitoring pattern below is designed to run repeatedly, checking for a condition and acting on it when it's met.

Stock monitoring and alerting

Task: Check a product page for stock status. If the item is in stock, write a formatted alert to a local file.

TASK = """
Objective: Check stock status for a product and trigger an alert if it's available.
Steps:
1. Navigate to http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html.
2. Locate the availability status on the product page.
3. If the status reads "In stock", write the following to a file named alert.txt:
ALERT: [Product title] is in stock. Price: [price]. Checked at: [current time].
4. If the status reads anything other than "In stock", write: "Out of stock. No action taken."
Error handling:
- If the page fails to load, retry up to 3 times before stopping.
"""

Run this on a schedule using cron or a task scheduler to build a basic restock monitor. The same pattern applies to price tracking – adjust the extracted field and alert condition accordingly. For sites with aggressive bot detection, routing the agent through a residential proxy is the most reliable way to keep sessions running cleanly – more on that in the next section.

For combining Browser-use with traditional extraction logic, the AI web scraping with Python guide covers the overlap in detail.

Managing errors and agent failures

The use cases above assume clean runs. In practice, pages change, networks drop, and agents occasionally misread what's in front of them. Building in error handling from the start is what separates a demo from something you can actually run on a schedule.

Browser-use agents fail for predictable reasons – the LLM misinterprets an element reference, the page structure shifts between the perception step and the action step, or a network timeout cuts the session short. In other cases, the agent simply runs out of its configured maximum steps before finishing the task. Knowing which failure mode you're dealing with makes debugging significantly faster.

Prompt-level error handling

Always include explicit failure instructions in your prompt. Left without them, the agent will improvise – and the improvised behavior is rarely what you want.

  • Tell the agent what to do when an element isn't found: "if the element is not found, log an error and stop execution."
  • Add retry instructions for expected transient failures: "if the page fails to load, retry up to 3 times before stopping."
  • Verify critical actions before moving on: "confirm that the form was submitted successfully before proceeding to the next step."

Code-level error handling

Wrap agent.run() in a try/except block so unexpected exceptions don't crash your process silently.

async def main():
    agent = Agent(
        task=TASK,
        llm=ChatOpenAI(model="gpt-4o"),
    )
    try:
        result = await agent.run(max_steps=25)
        print(result)
    except Exception as e:
        print(f"Agent failed: {e}")

The max_steps parameter caps how many actions the agent can take before stopping. Set it generously during development, then tighten it once the task is reliable. Without it, a confused agent can run indefinitely.
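Transient failures – a dropped connection, a timeout – are often worth retrying at the code level too, not just in the prompt. The wrapper below is a generic sketch: `run` is any zero-argument coroutine function (for example, a lambda around agent.run()), and the names are illustrative rather than part of the Browser-use API.

```python
import asyncio

async def run_with_retries(run, attempts: int = 3, backoff_seconds: float = 2.0):
    """Retry an async runner with exponential backoff between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return await run()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts -- surface the original error
            await asyncio.sleep(backoff_seconds * 2 ** (attempt - 1))

# Demo: a runner that fails twice before succeeding
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient failure")
    return "done"

result = asyncio.run(run_with_retries(flaky, backoff_seconds=0.01))
print(result)  # done
```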

To prevent hung sessions, set a timeout at the browser context level using BrowserContextConfig.

from browser_use import Browser, BrowserConfig
from browser_use.browser.context import BrowserContextConfig

browser = Browser(
    config=BrowserConfig(
        headless=True,
    )
)

context_config = BrowserContextConfig(
    wait_for_network_idle_page_load_time=30.0,
)

agent = Agent(
    task=TASK,
    llm=ChatOpenAI(model="gpt-4o"),
    browser=browser,
    browser_context=context_config,
)

Browser-use records the agent's full action history internally. Logging it after a failure gives you a step-by-step trace of what the agent did before things went wrong:

result = await agent.run()
print(agent.history)

Practical debugging tips

Run new tasks in headed mode first. Watching the agent navigate in a visible browser window makes it immediately obvious when it clicks the wrong element or gets stuck on a banner. It's the fastest debugging tool available.

Start with a high max_steps value – 50 or more – during development. Once the task completes reliably, reduce it to something tighter. Test prompts incrementally too: verify each step works before adding the next one.

The Python errors and exceptions guide covers exception handling patterns in more depth, and the Python requests retry guide is a useful background for the retry logic pattern.

Using proxies with Browser-use for scalable automation

Solid error handling keeps your agent running cleanly on a single site. Proxies are what keep it running cleanly at scale – across many sessions, against sites that actively look for bot traffic, or when you need results from specific geographic locations.

Running Browser-use locally against a single target site works well for one-off tasks. At scale – multiple sessions, high-frequency monitoring, or sites with active bot detection – running from a single IP will eventually trigger rate limits or blocks. Residential proxies solve this by routing each session through a different IP associated with a real consumer device, which is far less likely to be flagged than a datacenter IP.

Geo-targeted proxies add another dimension: they let you pull region-specific prices, availability, and content that wouldn't be visible from a single fixed location. For retail, travel, and SERP monitoring use cases, this is often essential.

Configuring Decodo residential proxies in Browser-use

Residential proxies work because each IP is tied to a real consumer device – the kind of traffic that looks identical to an ordinary user visiting a site. Datacenter IPs, by contrast, are easy to fingerprint and block in bulk. For Browser-use specifically, where sessions are already designed to mimic human behavior, pairing the agent with Decodo residential proxies closes the last gap (the IP layer) that bot detection systems look for.

Decodo's network covers 195+ countries with ethically-sourced IPs, which matters when your use case requires geo-specific data. Retail prices, travel availability, and SERP results all vary by location – a single fixed IP only ever shows you one version of that data. With Decodo, you target the exact location you need and get the results that a real user in that region would see.

Here's how easy it is to set up Decodo residential proxies:

  1. Register or log in to the Decodo dashboard.
  2. Navigate to find residential proxies and choose a subscription that suits your needs or start with a 3-day free trial.
  3. Go to Proxy setup.
  4. Select a location or choose Random.
  5. Set the rotating session type and choose a protocol (HTTP(S) or SOCKS5).
  6. Choose the authentication type.
  7. Copy the generated endpoint and credentials or download them.

Get residential proxies for scraping

Unlock superior scraping performance with a free 3-day trial of Decodo's residential proxy network.

The configuration passes proxy settings through a BrowserProfile and BrowserSession. Store credentials in .env and load them at runtime:

# .env
PROXY_HOST=gate.decodo.com
PROXY_PORT=10001
PROXY_USERNAME=your_username
PROXY_PASSWORD=your_password

Load these variables into Python and pass them into the ProxySettings, which will then be attached to your browser profile:

# proxytest.py
import os
from dotenv import load_dotenv
from browser_use import BrowserProfile
from browser_use.browser import BrowserSession
from browser_use.browser.profile import ProxySettings

load_dotenv()

proxy = ProxySettings(
    server=f"http://{os.getenv('PROXY_HOST')}:{os.getenv('PROXY_PORT')}",
    username=os.getenv("PROXY_USERNAME"),
    password=os.getenv("PROXY_PASSWORD"),
)

profile = BrowserProfile(
    headless=True,
    proxy=proxy,
)

browser_session = BrowserSession(browser_profile=profile)

Rotating proxies for multi-session tasks

For tasks that run across multiple regions or require a fresh IP per session, Decodo's rotating residential proxies handle IP rotation automatically. Each new session gets a different IP without any manual pool management on your end – which matters at scale, where manually cycling IPs becomes a maintenance burden faster than most teams expect.

Create a new BrowserSession per run with the proxy settings you need:

import asyncio
from browser_use import Agent, BrowserProfile
from browser_use.browser import BrowserSession
from browser_use.browser.profile import ProxySettings
from browser_use.llm import ChatOpenAI

async def run_with_proxy(proxy_config: ProxySettings, task: str):
    profile = BrowserProfile(headless=True, proxy=proxy_config)
    browser_session = BrowserSession(browser_profile=profile)
    agent = Agent(
        task=task,
        llm=ChatOpenAI(model="gpt-4o"),
        browser_session=browser_session,
    )
    return await agent.run(max_steps=30)

Add a randomized delay between sessions to avoid predictable timing patterns. Even behind residential IPs, machine-speed intervals are a detectable signal:

import random
import asyncio

# Run this inside your async workflow, between agent sessions:
await asyncio.sleep(random.uniform(2, 6))
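Putting the two together, a sequential driver can run each task in its own session with a randomized pause in between. This is a generic orchestration sketch: `session_fn` stands in for a per-session runner like run_with_proxy above, and any awaitable that takes a task string works (a stub is used here for demonstration).

```python
import asyncio
import random

async def run_sessions(session_fn, tasks, min_delay=2.0, max_delay=6.0):
    """Run tasks sequentially, pausing a random interval between sessions."""
    results = []
    for i, task in enumerate(tasks):
        results.append(await session_fn(task))
        if i < len(tasks) - 1:  # no delay needed after the last task
            await asyncio.sleep(random.uniform(min_delay, max_delay))
    return results

async def fake_session(task):
    # Stub standing in for a real proxied agent run
    return f"ran: {task}"

results = asyncio.run(
    run_sessions(fake_session, ["task A", "task B"], min_delay=0.01, max_delay=0.02)
)
print(results)  # ['ran: task A', 'ran: task B']
```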

Final thoughts

Browser-use is genuinely useful, but it's worth being clear-eyed about where it sits right now. The library is maturing fast and the fundamentals are solid – the LLM-to-Playwright architecture is sound, the prompt-driven workflow is flexible, and the proxy integration is straightforward. For tasks involving dynamic pages, multi-step flows, and layout variability, it outperforms anything built on fixed selectors.

The current constraint is reliability at volume. A well-prompted agent handles most runs cleanly, but edge cases still surface – an unexpected modal, a layout A/B test, a page load that takes slightly too long. The debugging tools help, and max_steps keeps things from spiraling, but production deployments still need human review loops for anything where a missed extraction or a failed interaction has real consequences.

That constraint will shrink as the models improve. The agents that work well today with careful prompting will eventually work well with less. The teams positioned to benefit are the ones building the infrastructure now – prompt libraries, error handling patterns, proxy rotation logic – rather than waiting for the technology to be perfect before touching it.

Access residential proxies now

Try residential proxies free for 3 days – full access, zero restrictions.

About the author

Vilius Sakutis

Head of Partnerships

Vilius leads performance marketing initiatives with expertise rooted in affiliates and SaaS marketing strategies. Armed with a Master's in International Marketing and Management, he combines academic insight with hands-on experience to drive measurable results in digital marketing campaigns.


Connect with Vilius via LinkedIn

All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

What is Browser-use?

Browser-use is a Python library that connects an LLM to a real Playwright-controlled browser, enabling AI agents to navigate pages, interact with forms, and extract structured data. Unlike rule-based automation scripts, Browser-use agents adapt to layout changes and handle dynamic content without hard-coded selectors.

Which LLMs does Browser-use support?

OpenAI GPT-4o is the most commonly used model, but Browser-use works with any LangChain-compatible LLM – including Anthropic Claude and Google Gemini. Switching models requires only changing the LangChain model class in your script.

How do I avoid getting blocked when using Browser-use?

Routing Browser-use sessions through residential proxies is the most reliable approach. Residential IPs are associated with real consumer devices and are far less likely to trigger bot detection than datacenter IPs. Decodo's rotating residential proxies handle IP rotation automatically, so each new session gets a fresh IP without manual management.


© 2018-2026 decodo.com (formerly smartproxy.com). All Rights Reserved