
Web Scraping with Camoufox: A Developer's Complete Guide

If you're scraping with Playwright or Selenium, you've hit this. Your script works on unprotected sites, but Cloudflare, PerimeterX (HUMAN Security), and DataDome detect the headless browser and block it within seconds. Stealth plugins help, but each browser update breaks the patches. Camoufox takes a different approach – it modifies Firefox at the binary level to spoof browser fingerprints, making automated sessions look like real user traffic. This guide covers Camoufox setup in Python, residential proxy integration, real-world test results against protected targets, and when browser-level tools aren't enough.

TL;DR

A Camoufox proxy setup combines binary-level Firefox fingerprint spoofing with residential IP rotation to bypass bot detection.

  • Install with pip install -U 'camoufox[geoip]' and run camoufox fetch to download the browser binary
  • Pass a residential proxy and set geoip=True for automatic timezone, locale, and language matching
  • Use the sync API for single-target scripts, the async API for concurrent multi-page scraping
  • Camoufox does not solve CAPTCHAs or alter network-level TLS fingerprints. Heavy protection may still require a full Web Scraping API.

Quick start: Connect Camoufox to a proxy

If Camoufox is already installed, here's the proxy connection:

import os
from dotenv import load_dotenv
from camoufox.sync_api import Camoufox

load_dotenv()

proxy = {
    "server": f"http://{os.getenv('DECODO_HOST')}:{os.getenv('DECODO_PORT')}",
    "username": f"user-{os.getenv('DECODO_USERNAME')}-country-us",
    "password": os.getenv("DECODO_PASSWORD"),
}

with Camoufox(headless=True, proxy=proxy, geoip=True) as browser:
    page = browser.new_page()
    page.goto("https://ip.decodo.com/json", timeout=30000)
    print(page.text_content("body"))

Expected output (IP and city vary per session, but the country matches the US target):

{
  "browser": {"name": "Firefox", "version": "135.0"},
  "platform": {"name": "Macintosh", "os": "Intel Mac OS X 10.15", "type": "desktop"},
  "engine": {"name": "Gecko", "version": "20100101"},
  "isp": {"asn": 7922, "isp": "Comcast Cable", "organization": "Comcast Cable"},
  "city": {"name": "Chicago", "state": "Illinois", "time_zone": "America/Chicago"},
  "proxy": {"ip": "203.0.113.42"},
  "country": {"name": "United States", "code": "US", "continent": "North America"}
}

The snippet loads credentials from a .env file, launches Camoufox with GeoIP-aware locale matching, and prints the exit IP details. The full setup, credential configuration, and proxy modes (rotating, sticky, country-targeted) are covered in the sections below.

The diagram below shows how the pieces connect. With standard Playwright, raw automation traffic goes directly to the target and gets blocked. With Camoufox and a residential proxy, the traffic passes through a spoofed fingerprint layer and a residential IP before reaching the target.

What is Camoufox?

Camoufox is a customized Firefox build designed for anti-detection browser automation. Why does Camoufox build on Firefox instead of headless Chromium? Bot detection systems target headless Chromium more than any other browser engine: it exposes navigator.webdriver, produces consistent Canvas outputs, and leaks Chrome DevTools Protocol (CDP) artifacts. Stealth plugins for Playwright and Puppeteer mask some of these signals, but each browser update can break the patches. For details on how protection systems detect automated browsers, see the guide on anti-scraping techniques and how to outsmart them.

Camoufox takes a different approach: it modifies Firefox at the binary level. The patches exist below JavaScript execution, so page scripts can't detect them. Firefox also sees less automated traffic, so fewer protection vendors target Firefox automation patterns. The core browser patches are open-source, but the fingerprint spoofing layer is intentionally closed-source to prevent protection vendors from reverse-engineering countermeasures.

Camoufox fingerprint spoofing and anti-detection patches

Camoufox includes these capabilities:

  1. Fingerprint spoofing. Camoufox spoofs navigator properties, screen dimensions, WebGL renderer strings, Canvas noise, audio context, font enumeration, and geolocation. All attributes are configurable per session. Each launch uses BrowserForge, an open-source library, to generate a fingerprint that resembles real browser profiles. All pages opened within the same Camoufox() session share that fingerprint. To get a different fingerprint, launch a new browser instance.
  2. Anti-detection patches. Camoufox fixes known automation leaks at the binary level. navigator.webdriver returns false, and Camoufox removes headless Firefox detection flags. The Playwright page agent runs in an isolated execution context separate from the main page world.
  3. GeoIP-aware configuration. With a proxy and the geoip option enabled, Camoufox detects the proxy's geographic location and auto-configures timezone, locale, and language to match. GeoIP matching prevents mismatches like a proxy exiting in Germany while the browser's Intl.DateTimeFormat still returns America/New_York.
  4. Virtual display mode. On Linux servers with headless="virtual", Camoufox runs in headed mode inside an X virtual framebuffer (Xvfb) rather than in true headless mode. Headful rendering produces a more realistic browser environment, passing detection checks where headless mode fails.
  5. Font anti-fingerprinting. Camoufox spoofs the available system font list to match the declared OS, preventing font-based device identification.
  6. Persistent context support. With persistent_context=True and a user_data_dir path, session state (cookies, localStorage) persists across runs without re-login.
  7. Playwright-compatible API. The Camoufox Playwright integration exposes the standard Playwright API on Firefox. Migrating existing Playwright code requires few changes. The main difference is browser initialization.
  8. Remote server mode. Camoufox can expose a Playwright-compatible API server, enabling control from any language with Playwright support, not only Python. Run the browser on a dedicated server and connect from separate scraping workers.

What Camoufox doesn't handle

Camoufox handles browser fingerprints, but protection systems check more than the browser.

  1. IP reputation. A spoofed browser fingerprint isn't enough if the target site flags the source IP. For heavily protected targets, residential proxies are the least likely to get blocked. They route through IPs assigned to Internet service provider (ISP) subscribers.
  2. Behavioral analysis. Request timing patterns, mouse movement, and scroll behavior go beyond browser attributes. The Camoufox humanize parameter adds cursor movement, but full behavioral evasion requires additional randomized delays and interaction patterns in your scraping code.
  3. TLS and network-level fingerprinting. Protection systems like Cloudflare check TLS fingerprints (JA3/JA4 hashes). HTTP/2 frame ordering and TCP stack characteristics are additional network-level signals that fingerprint the client. Camoufox doesn't modify any of these network-level behaviors, so sites that inspect them can still detect automated traffic.
  4. CAPTCHA solving. Camoufox has no built-in CAPTCHA solver. The guide on how to bypass Google CAPTCHA covers CAPTCHA avoidance strategies and mentions third-party solver services. For a broader overview of bot detection methods, see the guide on navigating anti-bot systems.
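The behavioral point above (randomized delays) is something you add in your own scraping code rather than in Camoufox. A minimal sketch, assuming nothing beyond the standard library; the helper name and default timings are my choices:

```python
import random
import time


def human_delay(base: float = 2.0, jitter: float = 3.0) -> float:
    """Sleep for base + U(0, jitter) seconds and return the delay used.

    Fixed inter-request timing is a behavioral signal; adding uniform
    jitter breaks the pattern.
    """
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```

Call human_delay() between page.goto() calls so requests don't arrive on a fixed cadence; combine it with the humanize parameter for cursor movement.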

Set up Camoufox with Python and Decodo proxies

This guide uses Python 3.9+. All code examples use Camoufox 0.4.11 on PyPI (the last stable release, from January 2025) and browser engine v135.0.1-beta.24. Active development continues under the CloverLabsAI organization.

Create a virtual environment and install dependencies

Set up an isolated environment to avoid package conflicts:

python -m venv camoufox-env
source camoufox-env/bin/activate # macOS/Linux
# camoufox-env\Scripts\activate # Windows

Install Camoufox with the GeoIP extra for proxy-based locale matching:

pip install -U 'camoufox[geoip]' python-dotenv

Download the Camoufox browser binary. The Camoufox binary replaces the standard Playwright browser install, so you don't need a separate playwright install command:

camoufox fetch

If you're on Linux, install the required system libraries:

# Debian/Ubuntu
sudo apt install -y libgtk-3-0 libx11-xcb1 libasound2
# Arch Linux
sudo pacman -S gtk3 libx11 libxcb cairo libasound alsa-lib

On macOS and Windows, the Camoufox binary bundles these dependencies. With the browser installed, set up a project directory to organize your scraping scripts.

Project structure

Create a project directory and keep your proxy credentials separate from scraping logic:

camoufox-scraper/
├── .env # Proxy credentials
└── scraper.py # Camoufox browser logic and data extraction

Store proxy credentials in environment variables

Sign up for a Decodo residential proxy plan – new accounts get a free trial with up to 2,000 requests (at time of writing), enough to run every example in this guide. The residential proxy quick start guide walks through authentication setup, endpoint configuration, and usage tracking. After registration, open the dashboard, select Residential in the left sidebar, and copy the Authentication username and Password from the Proxy setup tab.

Create a .env file in your project root with these credentials:

DECODO_USERNAME=your_username
DECODO_PASSWORD=your_password
DECODO_HOST=gate.decodo.com
DECODO_PORT=7000

Load these in your Python code with python-dotenv instead of hardcoding credentials:

import os
from dotenv import load_dotenv

load_dotenv()

proxy = {
    "server": f"http://{os.getenv('DECODO_HOST')}:{os.getenv('DECODO_PORT')}",
    "username": os.getenv("DECODO_USERNAME"),
    "password": os.getenv("DECODO_PASSWORD"),
}


Scrape with Camoufox in sync and async mode

Camoufox offers 2 API modes: synchronous (blocking) and asynchronous (non-blocking).

Synchronous API: Single-target scraping

The sync API blocks until each operation completes. Use it for single-target scripts, small batch jobs, or when you're prototyping a scraper before scaling it.

Import from camoufox.sync_api and use the Camoufox context manager:

from camoufox.sync_api import Camoufox

with Camoufox(headless=True, os="windows") as browser:
    page = browser.new_page()
    page.goto("https://example.com", timeout=30000)
    user_agent = page.evaluate("navigator.userAgent")
    print(f"User-Agent: {user_agent}")

Expected output:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:135.0) Gecko/20100101 Firefox/135.0

The os="windows" parameter generates a Windows fingerprint. User agent, navigator properties, screen dimensions, and font list all reflect a real Windows machine.

Camoufox vs. standard Playwright: Detection signals

To see how Camoufox differs from standard Playwright, check the properties that bot detection systems inspect:

from camoufox.sync_api import Camoufox

with Camoufox(headless=True, os="windows") as browser:
    page = browser.new_page()
    page.goto("https://example.com", timeout=30000)
    checks = page.evaluate("""() => ({
        webdriver: navigator.webdriver,
        languages: navigator.languages.join(', '),
        platform: navigator.platform,
        hardwareConcurrency: navigator.hardwareConcurrency,
        deviceMemory: navigator.deviceMemory || 'undefined',
        chrome: typeof window.chrome !== 'undefined',
    })""")
    for key, value in checks.items():
        print(f"{key}: {value}")

Expected output (hardwareConcurrency varies per session – BrowserForge generates a random value each launch):

webdriver: False
languages: en-US, en
platform: Win32
hardwareConcurrency: 16
deviceMemory: undefined
chrome: False

With standard Playwright Firefox, navigator.webdriver returns True and automation-related properties keep their default values. webdriver: False matters because it's one of the first properties that bot detection scripts check. chrome: False confirms the browser is Firefox (not Chromium), and deviceMemory: undefined is correct for Firefox, which doesn't implement that API.

Key initialization options:

Core options:

  • headless – True for headless, False for visible browser, "virtual" for headful rendering inside a virtual display (Linux only)
  • os – "windows", "macos", or "linux" (or a list to randomly pick from)
  • proxy – Playwright-format proxy dict with server, username, and password
  • geoip – True to auto-detect proxy location and configure locale/timezone (requires a proxy – without one, Camoufox skips GeoIP detection and uses system defaults)

Advanced options:

  • persistent_context – True to persist cookies and localStorage across runs (requires user_data_dir)
  • block_images – True to skip image loading for faster page loads and lower proxy bandwidth (a typical page load uses approximately 3 MB with images, under 500 KB without)
  • config – dictionary of custom navigator property overrides (for example, custom navigator.platform or hardwareConcurrency values)
  • main_world_eval – True to enable running individual page.evaluate() calls in the main page world (prefix each call with mw: to activate)
  • humanize – True or a float (max seconds) to add human-like cursor movement

Pass all options as keyword arguments to the Camoufox() constructor.
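The options above can be collected into a single dictionary and unpacked into the constructor. A minimal sketch of assembling launch options; the profile path is a hypothetical example, and the exact values are my choices:

```python
# Pass as: Camoufox(**launch_kwargs)
launch_kwargs = {
    "headless": True,
    "os": ["windows", "macos"],      # randomly pick a fingerprint OS per launch
    "block_images": True,             # cut proxy bandwidth (avoid on Cloudflare-protected targets)
    "humanize": 1.5,                  # cap human-like cursor movement at 1.5 seconds
    "persistent_context": True,
    "user_data_dir": "./profiles/run-01",
}

print(sorted(launch_kwargs))
```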

Scrape JavaScript-rendered content

This sync script extracts data from a JavaScript-rendered page:

from camoufox.sync_api import Camoufox

with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/js/", timeout=30000)
    page.wait_for_selector(".quote", timeout=10000)
    quotes = page.query_selector_all(".quote")
    for quote in quotes:
        text = quote.query_selector(".text").text_content()
        author = quote.query_selector(".author").text_content()
        tags = [tag.text_content() for tag in quote.query_selector_all(".tag")]
        print(f"{text[:80]}... - {author} (tags: {', '.join(tags)})")

Expected output:

"The world as we have created it is a process of our thinking. It cannot be chan... - Albert Einstein (tags: change, deep-thoughts, thinking, world)
"It is our choices, Harry, that show what we truly are, far more than our abilit... - J.K. Rowling (tags: abilities, choices)
"There are only two ways to live your life. One is as though nothing is a miracl... - Albert Einstein (tags: inspirational, life, live, miracle, miracles)

The wait_for_selector call is critical here. Without it, query_selector_all runs before JavaScript has rendered the content and returns an empty list. The guide on scraping dynamic websites covers wait strategies and single-page application (SPA) extraction patterns for JavaScript-rendered pages.

Asynchronous API: Concurrent multi-page scraping

The async API is more efficient when scraping multiple pages concurrently or integrating Camoufox into an async pipeline. To add proxy support, pass proxy and geoip to AsyncCamoufox() the same way as the sync examples. Import from camoufox.async_api and use async with:

import asyncio
from camoufox.async_api import AsyncCamoufox

async def main():
    sem = asyncio.Semaphore(2)
    async with AsyncCamoufox(headless=True) as browser:
        urls = [
            "https://quotes.toscrape.com/js/page/1/",
            "https://quotes.toscrape.com/js/page/2/",
            "https://quotes.toscrape.com/js/page/3/",
        ]

        async def scrape_page(url):
            async with sem:
                page = await browser.new_page()
                try:
                    await page.goto(url, timeout=30000)
                    await page.wait_for_selector(".quote", timeout=10000)
                    quotes = await page.query_selector_all(".quote")
                    results = []
                    for q in quotes:
                        text_el = await q.query_selector(".text")
                        author_el = await q.query_selector(".author")
                        text = await text_el.text_content()
                        author = await author_el.text_content()
                        results.append({"text": text[:60], "author": author})
                    return results
                except Exception as e:
                    print(f"Error on {url}: {e}")
                    return []
                finally:
                    await page.close()

        all_results = await asyncio.gather(
            *[scrape_page(url) for url in urls]
        )
        total = sum(len(r) for r in all_results)
        print(f"Scraped {total} quotes across {len(urls)} pages")
        for i, results in enumerate(all_results, 1):
            print(f"  Page {i}: {len(results)} quotes")

asyncio.run(main())

Expected output:

Scraped 30 quotes across 3 pages
Page 1: 10 quotes
Page 2: 10 quotes
Page 3: 10 quotes

The asyncio.Semaphore(2) limits concurrent page loads to 2. Each Camoufox instance runs a full Firefox process, so too many simultaneous pages exhaust memory and cause timeouts. Start with a semaphore value of 2, then increase based on your machine's resources. Each Camoufox() browser launch starts at approximately 200 MB of RAM and grows with page complexity; additional pages within the same browser use less.

Async only helps when you're waiting on page loads. For CPU-bound parsing work, async won't help – the event loop still runs on a single thread.
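One way to keep the event loop responsive during heavy parsing is to push the CPU-bound work onto a worker thread. A hedged sketch using only the standard library; parse_html is a stand-in for your real parsing function:

```python
import asyncio


def parse_html(html: str) -> int:
    """CPU-bound stand-in: count quote blocks in raw HTML."""
    return html.count('class="quote"')


async def main():
    html = '<div class="quote">...</div>' * 3
    # asyncio.to_thread runs the parser in a thread pool, so the event
    # loop stays free to drive concurrent page loads in the meantime.
    count = await asyncio.to_thread(parse_html, html)
    print(count)  # 3


asyncio.run(main())
```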

When to use sync vs. async

Match the API mode to your scraping pattern:

  • Prototyping a new scraper – Sync
  • Scraping a single page or a URL list under 10 – Sync
  • Scraping 10+ pages from the same domain – Async with semaphore
  • Integrating into an existing async application – Async
  • Login flow followed by authenticated scraping – Sync for login, async for data collection

Camoufox proxy configuration with Decodo residential IPs

Combining Camoufox with residential proxies addresses both the browser fingerprint and the IP layer.

Why proxy type matters for Camoufox

Anti-bot systems check the IP address independently of the browser fingerprint. Protected sites can still flag a perfectly configured Camoufox session if the IP belongs to a known cloud provider (AWS, GCP, and Azure). The same happens when the IP geography contradicts the browser's declared locale.

Residential proxies are the least likely to get flagged on sites that actively fingerprint visitors.

Proxy bandwidth and cost

Residential proxies bill by bandwidth. At roughly 3 MB per page with images, 1,000 pages use approximately 3 GB. Check the pricing page for current rates and plan options.

On sites that don't use Cloudflare, use block_images=True to reduce per-request cost (Cloudflare can detect image blocking as a bot signal). Monitor your bandwidth usage in the Decodo proxy dashboard.
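The bandwidth arithmetic can be wrapped in a quick estimator. A minimal sketch using the approximations above (~3 MB per page with images, ~0.5 MB without); the function name is my choice:

```python
def estimated_bandwidth_mb(pages: int, block_images: bool = False) -> float:
    """Rough proxy bandwidth for a scraping run, in megabytes."""
    per_page_mb = 0.5 if block_images else 3.0
    return pages * per_page_mb


print(estimated_bandwidth_mb(1000))                     # 3000.0 (~3 GB)
print(estimated_bandwidth_mb(1000, block_images=True))  # 500.0 (~0.5 GB)
```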

Pass proxy credentials to Camoufox

The Camoufox proxy parameter uses the standard Playwright proxy format: a dictionary with server, username, and password:

import os
import json
from dotenv import load_dotenv
from camoufox.sync_api import Camoufox

load_dotenv()

proxy = {
    "server": f"http://{os.getenv('DECODO_HOST')}:{os.getenv('DECODO_PORT')}",
    "username": f"user-{os.getenv('DECODO_USERNAME')}-country-us",
    "password": os.getenv("DECODO_PASSWORD"),
}

with Camoufox(headless=True, proxy=proxy, geoip=True) as browser:
    page = browser.new_page()
    page.goto("https://ip.decodo.com/json", timeout=30000)
    data = json.loads(page.text_content("body"))
    print(f"IP: {data['proxy']['ip']}")
    print(f"Country: {data['country']['name']} ({data['country']['code']})")
    print(f"City: {data['city']['name']}")
    print(f"ISP: {data['isp']['isp']}")

Expected output (IP and city vary per session, but the country matches the country-us target):

IP: 47.13.101.97
Country: United States (US)
City: McKenzie
ISP: Spectrum

Camoufox sends a request through the proxy to detect the exit IP and its geographic location. It then sets the browser's timezone, locale, and language to match. Without geoip=True, a browser's Intl.DateTimeFormat timezone might show Asia/Tokyo while the IP geolocates to the US, which triggers detection.

Verify fingerprint-to-IP consistency

After launching with geoip=True, verify that the browser's internal settings match the proxy's location:

import os
import json
from dotenv import load_dotenv
from camoufox.sync_api import Camoufox

load_dotenv()

proxy = {
    "server": f"http://{os.getenv('DECODO_HOST')}:{os.getenv('DECODO_PORT')}",
    "username": f"user-{os.getenv('DECODO_USERNAME')}-country-us",
    "password": os.getenv("DECODO_PASSWORD"),
}

with Camoufox(headless=True, proxy=proxy, geoip=True) as browser:
    page = browser.new_page()
    page.goto("https://ip.decodo.com/json", timeout=30000)
    ip_data = json.loads(page.text_content("body"))
    browser_tz = page.evaluate("Intl.DateTimeFormat().resolvedOptions().timeZone")
    browser_lang = page.evaluate("navigator.language")
    print(f"Proxy country: {ip_data['country']['name']}")
    print(f"Proxy timezone: {ip_data['city']['time_zone']}")
    print(f"Browser timezone: {browser_tz}")
    print(f"Browser language: {browser_lang}")

Expected output:

Proxy country: United States
Proxy timezone: America/Chicago
Browser timezone: America/New_York
Browser language: en-US

The proxy timezone and browser timezone may differ slightly – the GeoIP database and the geo-resolution service sometimes map the same residential IP to different cities within the same country. Timezone, language, and locale stay internally consistent with a US locale, even if the exact timezone doesn't match the geo-label.

Sticky sessions vs. rotating IPs

The gateway supports 3 proxy modes. The following snippets show only the proxy dictionary format. They aren't standalone scripts. Replace YOUR_USERNAME and YOUR_PASSWORD with the values from your .env file, then pass the dictionary to Camoufox(proxy=proxy) as shown in the full examples above.

Rotating sessions assign a new IP for each connection. Use rotating sessions for independent page requests where each URL is a separate, stateless task (search result pages, product listings, category pages):

# Rotating: each browser launch gets a new IP
proxy = {
    "server": "http://gate.decodo.com:7000",
    "username": "YOUR_USERNAME",
    "password": "YOUR_PASSWORD",
}

Sticky sessions maintain the same IP for a configurable duration. Use sticky sessions for multi-step flows where the target site tracks session continuity (login sequences, checkout flows, paginated results that use server-side cursors):

# Sticky: same IP for the session duration
proxy = {
    "server": "http://gate.decodo.com:10001",
    "username": "user-YOUR_USERNAME-session-abc123",
    "password": "YOUR_PASSWORD",
}

Change the port number and append a session identifier to the username to enable sticky sessions. Replace abc123 with any unique string (for example, a UUID or timestamp).

Country-targeted sessions route traffic through a residential IP in a specific country. Use country targeting when the target site serves region-specific content or when IP-to-content geographic consistency matters:

# US-targeted: exit through a US residential IP
proxy = {
    "server": "http://gate.decodo.com:7000",
    "username": "user-YOUR_USERNAME-country-us",
    "password": "YOUR_PASSWORD",
}

The country-us parameter in the username tells the gateway to assign a US residential IP. Replace us with any 2-letter country code (for example, country-gb for the UK, country-de for Germany). The residential proxy pool covers 195+ locations with targeting at continent, country, state, city, ZIP code, and ASN/ISP levels.

With geoip=True, the browser's timezone and locale align with the exit IP. Combine sticky sessions and country targeting (for example, user-YOUR_USERNAME-session-abc123-country-us). The guide on rotating proxies covers how IP rotation works, proxy types, and use cases for session management.
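The combined sticky-plus-country username can be generated programmatically, so each run gets a fresh session ID. A minimal sketch; the helper name and 8-character session ID length are my choices, and the username format follows the gateway examples above:

```python
import uuid


def sticky_username(base_user: str, country: str = "") -> str:
    """Build a gateway username with a unique session ID, optionally country-targeted."""
    name = f"user-{base_user}-session-{uuid.uuid4().hex[:8]}"
    if country:
        name += f"-country-{country}"
    return name


print(sticky_username("YOUR_USERNAME", "us"))
# e.g. user-YOUR_USERNAME-session-3f9a1c2e-country-us
```

Drop the result into the proxy dictionary's username field; generate a new session ID whenever you want the gateway to assign a different sticky IP.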

Important: Wrong proxy credentials produce a connection error, not a silent fallthrough. But if the proxy server becomes unreachable (network issue, server downtime), Camoufox falls back to a direct connection – the default Firefox behavior. The fallback exposes your real IP. Before scraping real targets, verify the proxy is active by checking the exit IP.
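That pre-flight check can be a small function that parses the ip.decodo.com/json response before any real target is hit. A sketch; the field names follow the sample response earlier in this guide, and the function name is my choice:

```python
import json


def exit_ip_matches(ip_json: str, expected_country: str = "US") -> bool:
    """Validate the proxy exit country before scraping real targets.

    A mismatch (or missing fields) suggests the proxy fell back to a
    direct connection and is exposing the real IP.
    """
    data = json.loads(ip_json)
    return data.get("country", {}).get("code") == expected_country


# In a live session: exit_ip_matches(page.text_content("body"))
sample = '{"proxy": {"ip": "203.0.113.42"}, "country": {"code": "US"}}'
print(exit_ip_matches(sample))  # True
```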

Camoufox proxy results against real-world anti-bot protection

The following examples test this setup against real targets with bot detection, including both successes and failures.

Scrape localized hotel pricing from Booking.com

Travel sites change prices, currency, and language based on the visitor's detected region. The browser's timezone, locale, and language must match the proxy's exit location for the site to serve consistent results.

This scraper extracts hotel pricing from Booking.com. Update the checkin and checkout query parameters to future dates before running:

import os
import json
from dotenv import load_dotenv
from camoufox.sync_api import Camoufox

load_dotenv()

proxy = {
    "server": f"http://{os.getenv('DECODO_HOST')}:{os.getenv('DECODO_PORT')}",
    "username": f"user-{os.getenv('DECODO_USERNAME')}-country-us",
    "password": os.getenv("DECODO_PASSWORD"),
}

with Camoufox(headless=True, proxy=proxy, geoip=True) as browser:
    page = browser.new_page()
    page.goto(
        "https://www.booking.com/searchresults.html"
        "?ss=New+York&checkin=2027-01-15&checkout=2027-01-17",
        timeout=45000,
    )
    page.wait_for_selector('[data-testid="property-card"]', timeout=15000)
    hotel_cards = page.query_selector_all('[data-testid="property-card"]')
    hotels = []
    for card in hotel_cards[:5]:
        name_el = card.query_selector('[data-testid="title"]')
        price_el = card.query_selector(
            '[data-testid="price-and-discounted-price"]'
        )
        if name_el:
            hotel = {"name": name_el.text_content().strip()}
            if price_el:
                hotel["price"] = price_el.text_content().strip()
            hotels.append(hotel)
    browser_tz = page.evaluate(
        "Intl.DateTimeFormat().resolvedOptions().timeZone"
    )
    print(f"Browser timezone: {browser_tz}")
    print(f"Hotels extracted: {len(hotels)}")
    print(json.dumps(hotels, indent=2))

Expected output (hotel names, prices, and currency vary by proxy exit location):

Browser timezone: America/New_York
Hotels extracted: 5
[
  {
    "name": "Holiday Inn Manhattan 6th Ave - Chelsea by IHG",
    "price": "US$123"
  },
  {
    "name": "World Center Hotel",
    "price": "US$152"
  },
  {
    "name": "Hyatt Place New York Chelsea",
    "price": "US$199"
  },
  {
    "name": "Washington Square Hotel",
    "price": "US$390"
  },
  {
    "name": "Hotel Boutique at Grand Central",
    "price": "US$194"
  }
]

The country-us parameter in the proxy username targets a US residential IP. With geoip=True, Camoufox aligns the browser's timezone and locale to the US exit location, so Booking.com serves USD pricing. To get pricing in a different currency, change the country code – for example, country-in for Indian Rupees or country-gb for British Pounds.

Scrape property listings from Zillow

Zillow uses PerimeterX (HUMAN Security) for bot detection, checking both browser fingerprints and IP reputation. Zillow is a US-only site. Target a US residential IP with country-us in the proxy username to match the IP location to the search location:

import os
import json
import time
import random
from dotenv import load_dotenv
from camoufox.sync_api import Camoufox

load_dotenv()

proxy = {
    "server": f"http://{os.getenv('DECODO_HOST')}:{os.getenv('DECODO_PORT')}",
    "username": f"user-{os.getenv('DECODO_USERNAME')}-country-us",
    "password": os.getenv("DECODO_PASSWORD"),
}

# Use headless="virtual" on Linux servers for better detection evasion
with Camoufox(headless=True, proxy=proxy, geoip=True, humanize=True) as browser:
    page = browser.new_page()
    # Randomized delay before the request to avoid fixed timing patterns
    time.sleep(random.uniform(2, 5))
    page.goto("https://www.zillow.com/new-york-ny/", timeout=45000)
    # Check for a block before extracting data
    if page.query_selector(
        '.cf-turnstile, #px-captcha, form[action*="validateCaptcha"], [class*="captcha"]'
    ):
        print("Blocked: CAPTCHA or challenge page detected")
    else:
        page.wait_for_selector("article", timeout=20000)
        page.wait_for_timeout(3000)
        cards = page.query_selector_all("article")
        listings = []
        for card in cards[:5]:
            price_el = card.query_selector(
                '[data-test="property-card-price"]'
            )
            addr_el = card.query_selector("address")
            if price_el and addr_el:
                listings.append({
                    "price": price_el.text_content().strip(),
                    "address": addr_el.text_content().strip(),
                })
        print(f"Listings extracted: {len(listings)}")
        print(json.dumps(listings, indent=2))

Expected output:

Listings extracted: 3
[
  {
    "price": "$318,888",
    "address": "50 Fort Pl APT A6a, Staten Island, NY 10301"
  },
  {
    "price": "$200,000",
    "address": "78 E 127th St #19A, New York, NY 10035"
  },
  {
    "price": "$329,000",
    "address": "21-28 35th #2A, Astoria, NY 11105"
  }
]

This scraper differs from the Booking.com example in 2 ways. The time.sleep(random.uniform(2, 5)) call inserts a randomized delay before the request – fixed timing is a behavioral signal that protection systems flag. The CAPTCHA/challenge check after page.goto detects a block before attempting data extraction, so the script fails explicitly instead of returning empty results.

PerimeterX on Zillow checks for headless browser signals, automation frameworks, and datacenter IPs. Camoufox bypasses the browser checks, and the US-targeted residential proxy provides a US residential IP with a matching timezone via geoip=True. The wait_for_timeout(3000) allows JavaScript-rendered listing cards to fully load after the initial page structure appears.

Results on Zillow are inconsistent. Some sessions extract data without issues. Others trigger a PerimeterX "Press & Hold" interactive challenge. The challenge appears either as an overlay on partially loaded content or as a full block page with zero data.

The detection goes beyond browser fingerprints: PerimeterX also checks TLS fingerprints (JA3/JA4), behavioral signals, and session trust (cookie chain from prior visits). Camoufox can't modify any of these. If your target consistently triggers the challenge, the Web Scraping API section below covers infrastructure-level approaches that handle these challenges server-side.

Sites with dynamic CSS class names (common in React and Next.js apps) need stable selectors – use data-testid attributes, ARIA roles, or structural position instead.
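One defensive pattern here is a fallback chain that tries the most stable selector first. A sketch; the helper is illustrative (card stands for any Playwright element handle), and the selector list mixes test attributes from the examples above with a hypothetical structural fallback:

```python
def first_match(card, selectors):
    """Try selectors from most to least stable; return the first element found."""
    for sel in selectors:
        el = card.query_selector(sel)
        if el:
            return el
    return None


PRICE_SELECTORS = [
    '[data-testid="price-and-discounted-price"]',  # stable test ID
    '[data-test="property-card-price"]',           # alternate test attribute
    "article span:first-of-type",                  # structural fallback
]

# Usage: price_el = first_match(card, PRICE_SELECTORS)
```

When the most specific selector breaks after a site redesign, the scraper degrades to the structural fallback instead of silently returning nothing.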

Handle pagination and lazy-loaded content

Before testing against protected targets, production scrapers need 2 patterns: paging through results and scrolling to load lazy content before extraction.

Pagination keeps the same browser instance open and navigates to the next page within the session. Reusing the browser instance avoids launching a new browser per page and preserves cookies and session state:

from camoufox.sync_api import Camoufox

with Camoufox(headless=True) as browser:
    page = browser.new_page()
    all_quotes = []
    for page_num in range(1, 4):
        page.goto(
            f"https://quotes.toscrape.com/js/page/{page_num}/",
            timeout=30000,
        )
        page.wait_for_selector(".quote", timeout=10000)
        quotes = page.query_selector_all(".quote")
        for q in quotes:
            text_el = q.query_selector(".text")
            all_quotes.append(text_el.text_content()[:60])
        print(f"Page {page_num}: {len(quotes)} quotes")
    print(f"Total: {len(all_quotes)} quotes across 3 pages")

Expected output:

Page 1: 10 quotes
Page 2: 10 quotes
Page 3: 10 quotes
Total: 30 quotes across 3 pages

Lazy-loaded content requires scrolling the page to trigger rendering before extracting data. Many sites load content only when it enters the viewport:

# Scroll to bottom to trigger lazy-loaded elements
page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
page.wait_for_timeout(2000)

Combine scrolling with wait_for_selector to confirm the content has rendered before extracting it.
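For pages that keep appending content as you scroll, a loop that stops once the page height stabilizes is more robust than a single scroll. A minimal sketch; the function name, pause, and round limit are my choices:

```python
def scroll_until_stable(page, pause_ms=1500, max_rounds=10):
    """Scroll to the bottom repeatedly until the page height stops growing."""
    last_height = 0
    for _ in range(max_rounds):
        height = page.evaluate("document.body.scrollHeight")
        if height == last_height:
            break  # no new content rendered since the last scroll
        last_height = height
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(pause_ms)
    return last_height
```

Follow the loop with a wait_for_selector on the last expected item before extracting, so partially rendered batches don't get scraped.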

Where Camoufox + proxies get blocked

This setup doesn't work on every site. In testing, Amazon blocked the Camoufox + residential proxy setup.

Amazon served a CAPTCHA validation page. The Amazon bot detection combines device fingerprinting with behavioral scoring and account-level signals. Sites using Cloudflare Bot Management with managed challenges (Turnstile, proof-of-work) also block this setup – Cloudflare checks JA3/JA4 TLS fingerprints, which Camoufox doesn't modify.

In your code, a block typically causes a TimeoutError on wait_for_selector. The expected content elements never render because the page shows a challenge or CAPTCHA instead. To detect this programmatically, check for common block indicators before assuming the scrape succeeded: .cf-turnstile (Cloudflare Turnstile), #px-captcha (PerimeterX), and a form with an action containing "validateCaptcha" (Amazon).
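Those indicators can be folded into a single check that runs right after page.goto. A sketch; the selector list mirrors the indicators named in this section, and the helper name is my choice:

```python
BLOCK_SELECTORS = [
    ".cf-turnstile",                    # Cloudflare Turnstile
    "#px-captcha",                      # PerimeterX
    'form[action*="validateCaptcha"]',  # Amazon CAPTCHA form
]


def is_blocked(page) -> bool:
    """Return True if any known challenge element is present on the page."""
    return any(page.query_selector(sel) for sel in BLOCK_SELECTORS)
```

Running this before wait_for_selector turns a confusing TimeoutError into an explicit "blocked" result you can retry with a different IP.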

For targets that use CAPTCHA gates or Cloudflare-managed challenges, browser-level tools alone aren't sufficient. The Web Scraping API section below covers one option that handles these challenges server-side.

Handle login sessions with Camoufox

The workflow has two phases: log in once and save the session cookies to a file, then restore them in subsequent runs.

Save and reuse login cookies

Phase 1 logs in and saves session cookies to a JSON file. This example uses headless=False, which opens a visible browser window – run it on a machine with a desktop environment, not a headless server:

from camoufox.sync_api import Camoufox
import json

COOKIE_FILE = "./cookies.json"

with Camoufox(headless=False) as browser:
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/login", timeout=30000)
    page.fill("#username", "admin")
    page.fill("#password", "admin")
    page.click('input[type="submit"]')
    page.wait_for_url("**/", timeout=10000)
    cookies = page.context.cookies()
    with open(COOKIE_FILE, "w") as f:
        json.dump(cookies, f)
    print(f"Login successful - {len(cookies)} cookies saved")

Expected output:

Login successful - 1 cookies saved

If the target site requires manual CAPTCHA or 2FA solving, run with headless=False to interact with the browser window.

Phase 2 restores the cookies and scrapes as an authenticated user:

from camoufox.sync_api import Camoufox
import json

COOKIE_FILE = "./cookies.json"

with Camoufox(headless=True) as browser:
    page = browser.new_page()
    with open(COOKIE_FILE) as f:
        cookies = json.load(f)
    page.context.add_cookies(cookies)
    page.goto("https://quotes.toscrape.com/", timeout=30000)
    logout_link = page.query_selector('a[href="/logout"]')
    if logout_link:
        print("Session is active - scraping as authenticated user")
    else:
        print("Session expired - re-run Phase 1 to log in again")

Expected output:

Session is active - scraping as authenticated user

Build a session validation check into your scraper. Before starting data collection, verify that a login-only element (logout link, profile avatar, dashboard nav) exists on the page. If it doesn't, trigger the login flow again.
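A minimal sketch of that check – the default marker matches the quotes.toscrape.com example above, and `login_fn` is a placeholder for your own Phase 1 login flow:

```python
def ensure_session(page, login_fn, marker='a[href="/logout"]'):
    """Verify the session before scraping; fall back to a fresh login.

    `marker` is a CSS selector for a login-only element (logout link,
    profile avatar, dashboard nav). `login_fn(page)` re-runs the login flow.
    """
    if page.query_selector(marker) is None:
        login_fn(page)  # session expired - log in again
        if page.query_selector(marker) is None:
            raise RuntimeError("Login flow did not produce an active session")
```

Calling `ensure_session(page, do_login)` at the top of each run keeps an expired cookie file from silently producing logged-out pages.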

Camoufox vs. Playwright, SeleniumBase, Selenium, and Puppeteer

Each tool makes a different trade-off between anti-detection depth and ecosystem support:

| Tool | Anti-detection level | Language | Key trade-off |
| --- | --- | --- | --- |
| Camoufox | Binary-level Firefox patches | Python | Binary-level patches, Firefox only |
| Playwright + stealth plugin | JavaScript patches on Chromium | Python, JS | Wider browser support, JS-level patches |
| SeleniumBase UC mode | Driver-disconnect on Chromium | Python | Handles Turnstile and reCAPTCHA, Chromium only |
| Selenium | No built-in evasion | Python, Java, JS | Largest ecosystem, most detectable |
| Puppeteer | JavaScript patches on Chromium | Node.js only | No Python support |

Camoufox vs. Playwright with stealth plugins (playwright-stealth). Stealth plugins inject JavaScript patches into a Chromium browser. They mask some signals (navigator.webdriver, headless detection) but leave others intact (the Chromium headless fingerprint, CDP detection artifacts). Camoufox patches at the binary level, which addresses signals that JavaScript patches can't reach. But Camoufox is Firefox-only, while stealth plugins give you broader Chromium site compatibility.

Camoufox vs. SeleniumBase UC mode. SeleniumBase's Undetected Chrome (UC) mode disconnects chromedriver before loading a protected page so the page's challenge checks find no automation artifacts, then reconnects. UC mode includes built-in methods for handling Cloudflare Turnstile and reCAPTCHA challenges that Camoufox can't solve. The trade-off: UC mode works only on Chromium browsers, is detectable in headless mode (it requires a virtual display on Linux), and the disconnect-reconnect pattern adds latency to each page load. Both tools still need residential proxies for IP reputation – browser-level evasion alone isn't enough on heavily protected targets.

Firefox vs. Chromium compatibility. Some sites use Chromium-specific APIs or render differently on Firefox. For those targets, Playwright with a stealth plugin or SeleniumBase UC mode on Chromium may be the only viable browser automation path.

If the site doesn't have bot detection, standard Playwright works. Use Camoufox when the target actively fingerprints browsers. If you're migrating from Selenium, the guide on web scraping with Selenium in Python covers setup and detection limitations. For teams evaluating Puppeteer, Puppeteer CAPTCHA bypass walks through that approach and its limitations.

Troubleshoot Camoufox proxy issues

These are the most common problems and their solutions:

Proxy, binary, and session errors

Proxy authentication errors. If Camoufox throws a connection error or the target page returns a 407 status, check the credentials in your .env file. Verify the username and password match the values in the dashboard, and confirm that DECODO_HOST is gate.decodo.com and DECODO_PORT is 7000. A common mistake is copying the username with a trailing space.

Proxy failures raise playwright._impl._errors.Error with a connection refused or timeout message – wrap page.goto() in a try/except to handle proxy issues separately. The guide on proxy error codes explains common HTTP proxy error codes, their causes, and troubleshooting strategies.
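One way to separate them – a sketch that classifies by error message, since proxy failure modes surface as different strings and the exact wording varies by platform:

```python
def classify_navigation_error(exc):
    """Coarsely classify a failed page.goto() by its error message."""
    msg = str(exc).lower()
    if "proxy" in msg or "connection refused" in msg:
        return "proxy_error"  # bad credentials, dead exit node, or wrong port
    if "timeout" in msg:
        return "timeout"  # slow page, or a challenge page that never renders content
    return "unknown"

# Usage inside a Camoufox session:
# try:
#     page.goto(url, timeout=30000)
# except Exception as exc:
#     kind = classify_navigation_error(exc)
#     ...  # retry with a new proxy on "proxy_error", re-queue on "timeout"
```

The categories let a retry loop treat a dead proxy (rotate and retry immediately) differently from a timeout (back off, or check for a block page).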

Browser binary not found after install. Confirm camoufox fetch completed without errors. On slow connections, partial downloads can leave a corrupted binary. Run camoufox remove followed by camoufox fetch to force a fresh download.

GeoIP module not resolving correctly. Verify you've installed the geoip extra (pip install -U 'camoufox[geoip]'). Camoufox resolves GeoIP data at launch time and sets the browser's locale and timezone once – it doesn't change them during the session.

Pages loading but returning empty content. Some sites require JavaScript execution in the main world rather than the isolated execution context. Set main_world_eval=True when initializing Camoufox, and prefix individual page.evaluate() calls with mw: (for example, page.evaluate("mw:document.title")). Only the prefixed calls run in the main world. Main world execution increases detectability, so use it only when isolated execution fails for a specific target.

Memory growth during long runs. Each open page holds memory until you close it. Close pages explicitly with page.close() (or await page.close() in async mode) as soon as you've extracted the data. Don't rely on garbage collection to clean up browser contexts.

Persistent context cookies not surviving restarts. Firefox profiles don't save session-scoped cookies (those without an explicit Expires or Max-Age header) to disk by default. Extract them with context.cookies(), save to a file, and re-inject with context.add_cookies() on the next run.

Scale limitations

Resource intensity. Camoufox launches a full Firefox browser per session. Plan for roughly 200 MB of memory per active instance. At 10 concurrent instances, plan for 2+ GB of RAM and multiple CPU cores.

Concurrency ceiling. Running more than 10 simultaneous Camoufox instances on one machine is impractical for most setups. For high-volume scraping, use horizontal scaling (multiple machines or containers) rather than increasing concurrency on a single host. Wrap each scrape in a retry loop with proxy rotation and exponential backoff between retries.
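A sketch of that retry loop – `scrape_fn` receives the attempt index so the caller can select a different proxy on each try; the delay values are illustrative:

```python
import random
import time

def scrape_with_retries(scrape_fn, max_attempts=3, base_delay=2.0):
    """Retry scrape_fn with exponential backoff plus jitter between attempts."""
    for attempt in range(max_attempts):
        try:
            return scrape_fn(attempt)  # attempt index -> rotate to a new proxy
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # Exponential backoff: 2s, 4s, 8s... plus up to 1s of jitter
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

The jitter spreads retries out so that parallel workers don't all hammer the target at the same instant after a shared failure.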

Maintenance overhead. The Camoufox fingerprint database and Firefox patches need to stay current. Monitor the Camoufox GitHub releases or the CloverLabsAI repo for updates. Test your scrapers after updating.

No built-in CAPTCHA solving. Targets that serve interactive challenges (Cloudflare Turnstile, reCAPTCHA, hCaptcha) require a separate CAPTCHA-solving service. Camoufox can render the CAPTCHA page, but solving it programmatically needs an external integration.

Handle blocked targets with the Decodo Web Scraping API

For higher-volume scraping or targets with aggressive protection, the Web Scraping API handles the infrastructure.

Send a URL to the API endpoint and receive rendered HTML or structured data. Pre-built scraping templates handle targets like Amazon and Google where browser-level tools get blocked.

The Amazon CAPTCHA gate blocked the Camoufox + proxy setup in the real-world tests. The same product returns structured data through the API:

# pip install requests (if not already installed)
import os
import requests
from dotenv import load_dotenv

load_dotenv()

# Add DECODO_API_TOKEN to your .env file
payload = {
    "target": "amazon_product",
    "query": "B09G9FPHY6",
    "parse": True,
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": f"Basic {os.getenv('DECODO_API_TOKEN')}",
}
response = requests.post(
    "https://scraper-api.decodo.com/v2/scrape",
    json=payload,
    headers=headers,
)
data = response.json()
product = data["results"][0]["content"]["results"]
print(f"Title: {product['title']}")
print(f"Price: {product['price']} {product['currency']}")
print(f"Rating: {product['rating']} ({product['reviews_count']} reviews)")

Expected output (product details vary by availability and region):

Title: Apple iPad (9th Generation): with A13 Bionic chip, 10.2-inch Retina Display, 64GB, Wi-Fi, 12MP front/8MP Back Camera, Touch ID, All-Day Battery Life - Space Gray
Price: 329 USD
Rating: 4.8 (75643 reviews)

The target parameter selects a pre-built scraping template that handles the Amazon anti-bot challenges. With parse enabled, the response contains structured JSON fields instead of raw HTML – no selectors to maintain. The Web Scraping API quick start guide covers authentication setup and making your first API request, with links to detailed parameter and target documentation.

The API has a free plan with 2,000 requests per month – enough to test the examples above and explore other pre-built templates before committing to a paid tier.

Consider switching to a scraping API when any of these conditions apply:

  • Block rates increase despite clean fingerprints. The protection system has moved to network-layer or behavioral signals that browser patching alone can't handle.
  • Targets serve interactive challenges on every visit. CAPTCHAs, Cloudflare Turnstile, and proof-of-work gates block fully automated runs.

The difference between residential and datacenter proxies affects block rates on protected targets. For targets that need more than proxies but less than the full API, the Site Unblocker works as a drop-in proxy endpoint. Point Camoufox at https://unblock.decodo.com:60000 with the Site Unblocker credentials from the dashboard (separate from your residential proxy credentials) – the endpoint handles CAPTCHA solving, JavaScript rendering, and fingerprinting automatically.
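As a sketch, the only change from the residential setup is the proxy dict – the env var names below (UNBLOCKER_USERNAME, UNBLOCKER_PASSWORD) are illustrative placeholders for the Site Unblocker credentials from the dashboard:

```python
import os

def unblocker_proxy():
    """Build a Camoufox/Playwright proxy dict for the Site Unblocker endpoint.

    Env var names are illustrative; use the Site Unblocker credentials,
    not the residential proxy ones.
    """
    return {
        "server": "https://unblock.decodo.com:60000",
        "username": os.getenv("UNBLOCKER_USERNAME"),
        "password": os.getenv("UNBLOCKER_PASSWORD"),
    }

# Usage: with Camoufox(headless=True, proxy=unblocker_proxy()) as browser: ...
```

Because it's a standard proxy dict, swapping between the residential gateway and the unblocker endpoint requires no other changes to the scraping code.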

Bottom line

Pick a target, run the examples against it, and adjust from there. Every site has different protection, so what works on Booking.com might need tweaking for your use case. If blocks keep happening despite clean fingerprints and residential proxies, the Web Scraping API is the next step up.


About the author

Justinas Tamasevicius

Director of Engineering

Justinas Tamaševičius is Director of Engineering with over two decades of expertise in software development. What started as a self-taught passion during his school years has evolved into a distinguished career spanning backend engineering, system architecture, and infrastructure development.


Connect with Justinas via LinkedIn.

All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

Is Camoufox free to use?

Camoufox is free to use. The browser builds on Firefox (Mozilla Public License 2.0), with a closed-source fingerprint spoofing layer. The Python library on PyPI uses the MIT license and costs nothing to download. Residential proxy services are separate paid subscriptions that provide trusted IPs for the network layer.

Can Camoufox run on a cloud server without a display?

Set headless=True for any server without a display. On Linux, headless="virtual" runs the browser headfully inside an invisible Xvfb display instead. Virtual display mode produces more realistic rendering output and passes detection checks where headless mode fails, but uses more CPU.

What is the difference between headless mode and virtual display mode?

Headless mode (headless=True) skips rendering entirely – the browser creates no display surface. Virtual display mode (headless="virtual") launches a full Xvfb display on Linux and renders every frame. The DOM and Canvas output is identical to a visible browser. The trade-off is CPU: virtual display uses more processing power because the browser performs real rendering.

Does Camoufox guarantee a scraper won't be blocked?

No tool guarantees zero blocks. Camoufox handles browser fingerprint evasion only. Protection systems also check IP reputation, behavioral signals, TLS fingerprints (JA3/JA4 hashes), and CAPTCHAs. Camoufox is one layer in a multi-layer setup – pair it with residential proxies and randomized request delays.

How many concurrent Camoufox sessions can I run?

Each Camoufox instance starts at approximately 200 MB of RAM and grows during page loads. On a machine with 8 GB of available memory, plan for 5-10 simultaneous sessions with memory reserved for the OS. Use asyncio.Semaphore to cap concurrency in async scrapers, and close pages immediately after extracting data.
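The semaphore pattern is short – a sketch where `fetch` is any coroutine you supply that opens a page and extracts data:

```python
import asyncio

async def scrape_all(urls, fetch, max_concurrency=5):
    """Run fetch(url) for every URL with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:  # blocks while max_concurrency fetches are running
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```

`asyncio.gather` preserves input order, so results line up with the URL list even when fetches finish out of order.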

Can Camoufox be used with any proxy provider?

Camoufox accepts any HTTP or SOCKS5 proxy through the standard Playwright proxy parameter. Pass a dictionary with server, username, and password keys. The geoip=True parameter works with any proxy provider – it detects the exit IP's location and auto-configures the browser's locale to match.

Do I need residential proxies with Camoufox?

Camoufox only handles browser fingerprints. Without a proxy, your real IP is visible to the target site. Rotating proxies distribute requests across multiple IPs, reducing the chance of triggering rate limits on any single address. Residential proxies provide IPs assigned to ISP subscribers, which are the least likely to get flagged.

Is Camoufox compatible with all websites?

Most websites render identically on Firefox and Chromium. A small number of sites use Chromium-specific APIs or behave differently on Firefox. For those targets, Playwright with a stealth plugin on Chromium is the recommended alternative. Test your specific targets on Firefox before committing to a Camoufox-based pipeline.
