NEW

Puppeteer vs. Selenium: Which Tool Should You Use for Web Scraping?

Puppeteer and Selenium are the 2 most-used browser automation tools for scraping JavaScript-heavy pages. Comparing puppeteer vs selenium for web scraping isn't just about speed. Browser support, language support, and anti-bot handling all play a role. This guide covers what each tool does well, what it doesn't, and how to pick the right one.

Mykolas Juodis

Last updated: May 07, 2026

12 min read

TL;DR

Puppeteer is Google's Node.js library that drives Chrome directly. It's faster and lighter, with Firefox support since v23.
Selenium is a cross-language, cross-browser framework that supports Java, Python, C#, Ruby, Kotlin, and JavaScript.
For Chrome-only scraping in a Node.js stack, Puppeteer wins on speed and setup.
For polyglot teams, cross-browser scraping, or QA-plus-scraping work, Selenium is the better fit.
Both struggle against serious anti-bot systems like Cloudflare, Akamai, or DataDome – that's where a managed Web Scraping API earns its keep.

What is Puppeteer?

Puppeteer is a Node.js library from Google's Chrome team that drives Chrome (and, since v23, Firefox) through the Chrome DevTools Protocol (CDP) or WebDriver BiDi. Put simply – it lets your script act like a real user clicking through pages, except it happens in milliseconds and nobody's touching the mouse.

By default, Puppeteer runs in headless mode (no visible UI), which is what you want for scraping. If you've never worked with one before, our primer on what is a headless browser covers the basics.

What Puppeteer does well:

Navigating pages, filling forms, clicking buttons, waiting for elements to load.
Capturing screenshots and PDFs of rendered pages.
Intercepting network requests, which is handy for blocking ads or faking API responses.
Running JavaScript inside the page, so you can pull data straight out of live app state.

Where it falls short:

It's Node.js only. Python developers are out of luck here (more on that later).
Built-in support is Chrome-first. Firefox works via WebDriver BiDi from v23, but Safari and Edge are not on the table.
No built-in grid for running hundreds of scrapers in parallel across machines. You'll roll your own or bring something like Docker Swarm or Kubernetes.

The Puppeteer ecosystem also has puppeteer-extra, a plugin system that adds things like stealth mode (to hide automation fingerprints) and ad blocking. For scraping work where sites sniff for bots, puppeteer-extra-plugin-stealth is basically a must-have.

What is Selenium?

Selenium has been around since 2004, making it older than most junior engineers reading this post. It was built for cross-browser QA testing first, and scraping adoption came later.

Selenium has 3 components, and it helps to know what each one does:

WebDriver – the core API that talks to browsers. This is the part you use for scraping.
Selenium IDE – a record-and-replay tool that runs as a browser extension. Great for non-developers. Not really a scraping tool.
Selenium Grid – lets you run tests (or scrapers) in parallel across many machines. Useful once you outgrow a single box.

Selenium’s biggest selling point is its broad language and browser support. It works with Java, Python, JavaScript, C#, Ruby, and Kotlin. It can also automate Chrome, Firefox, Safari, and Edge (plus Internet Explorer, if you're unlucky enough to still need it), something no other automation tool matches.

The honest trade-offs for scraping:

You need a separate driver binary for each browser (chromedriver, geckodriver, etc.), and the driver version has to match your browser version. Selenium Manager (shipped with v4.6 and up) auto-downloads the right driver, which takes most of the pain out of this.
Commands travel through the W3C WebDriver protocol over HTTP. That's more portable than Puppeteer's direct CDP connection, but also slower per command. We'll get into the numbers later.
Setup feels heavier on day 1. Most of that complexity fades once you've done it twice.

If Selenium seems like the right fit for your project, our tutorial on scraping the web with Selenium and Python is a practical place to start.

Key differences between Puppeteer and Selenium

Here's the comparison at a glance, followed by explanations of the points that matter most for scraping work.

Dimension

Puppeteer

Selenium

Maintained by

Google Chrome team

Selenium Project (open source)

First released

2017

2004

Language support

Node.js only

Java, Python, JS, C#, Ruby, Kotlin

Browser support

Chrome, Chromium (Firefox experimental)

Chrome, Firefox, Safari, Edge

Protocol

CDP (direct)

W3C WebDriver (HTTP)

Driver setup

Chromium bundled, no separate driver

Requires driver binary (auto-managed v4.6+)

Speed on Chrome

Faster

Slower due to HTTP overhead

Distributed execution

DIY

Selenium Grid built in

Anti-bot tooling

puppeteer-extra-plugin-stealth

nodriver (community-maintained)

Best suited for

Chrome scraping, Node.js projects

Cross-browser, polyglot teams, QA + scraping

Protocol matters

Puppeteer talks to Chrome through CDP—a direct connection, like speaking a native language. Selenium sends commands over HTTP using the WebDriver protocol, which then communicates with the browser—more like speaking through a translator. The translator works everywhere, but each interaction adds a bit of overhead, which makes Puppeteer faster than Selenium.

Selenium's biggest downside – driver management

Install a new version of Chrome and, without intervention, your chromedriver breaks overnight. Selenium Manager, shipped with v4.6, now handles driver downloads automatically. Puppeteer avoids the problem entirely by bundling a compatible Chromium version with every release.

Anti-bot tooling isn't equal

Puppeteer has puppeteer-extra-plugin-stealth, which patches the automation fingerprints that systems like Cloudflare check for. Selenium has no official equivalent. The community workaround was undetected-chromedriver, which has since been replaced by nodriver from the same developer.

This comparison focuses on Puppeteer and Selenium. Playwright has emerged as a strong third option, but covering it properly would need its own article. For that comparison, see Playwright vs. Selenium in 2026.

Browser and language support

These 2 dimensions drive most tool decisions in practice. Let's look at them properly.

Browser support

Puppeteer is primarily a Chrome and Chromium tool. Firefox support has been added through WebDriver BiDi. As of mid-2026, it remains experimental and is not recommended for production scraping. Safari and Edge are not supported.

Selenium supports Chrome, Firefox, Safari, and Edge in production. Each uses its own driver binary. This matters for scraping more than it might seem. Some websites serve different content to different browsers. That could be for feature detection, analytics targeting, or other reasons. If you need the Firefox or Safari view of a page, Selenium is the only option between these 2 tools.

Language support

Puppeteer is Node.js only. Python, Java, Ruby, and other languages have no supported path to Puppeteer. There was a community port called pyppeteer. It's been unmaintained for several years. It's not a reliable option.

Selenium supports Java, Python, JavaScript, C#, Ruby, and Kotlin. For teams that already work in Python – common in data and scraping contexts – this is a clear advantage. Python's Selenium bindings are mature and well-documented. For teams with multiple languages across different functions, Selenium is the only tool that covers the whole group.

If your team writes Python and adding Node.js just for scraping isn't an option, Puppeteer is off the table. The choice narrows to Selenium or Playwright.

Setup and implementation

This section walks through installing both tools, writing a working scraper, and adding proxy support. The scraping target is books.toscrape.com – a site built specifically for scraping practice. Both examples are in Node.js so the comparison stays direct.

Puppeteer setup

A single command installs Puppeteer and downloads a compatible Chromium version:

npm install puppeteer

If you're managing your own Chrome binary, puppeteer-core skips the Chromium download. Point it at your binary via the executablePath option on launch.

Here's a working script that pulls book titles and prices:

import puppeteer from 'puppeteer';

async function main() {
  let browser;
  const launchOptions = {
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  };

  try {
    browser = await puppeteer.launch(launchOptions);
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    if (msg.includes('Could not find Chrome')) {
      browser = await puppeteer.launch({ ...launchOptions, channel: 'chrome' });
    } else {
      throw err;
    }
  }

  try {
    const page = await browser.newPage();
    page.setDefaultTimeout(30_000);
    page.setDefaultNavigationTimeout(30_000);

    await page.goto('https://books.toscrape.com/', {
      waitUntil: 'networkidle2',
      timeout: 30_000,
    });

    const books = await page.evaluate(() => {
      const items = document.querySelectorAll('article.product_pod');
      return Array.from(items).map((item) => {
        const titleEl = item.querySelector('h3 a');
        const priceEl = item.querySelector('.price_color');
        return {
          title: titleEl?.getAttribute('title') ?? titleEl?.textContent?.trim() ?? null,
          price: priceEl?.textContent?.trim() ?? null,
        };
      });
    });

    console.log(`Found ${books.length} books`);
    console.log(books.slice(0, 5));
  } finally {
    await browser.close();
  }
}

main().catch((err) => {
  const msg = err instanceof Error ? err.message : String(err);
  if (msg.includes('Could not find Chrome')) {
    console.error(
      [
        'Puppeteer could not find a Chrome/Chromium binary.',
        'Fix by installing one of these:',
        '- Install bundled Chrome: npx puppeteer browsers install chrome',
        '- Or install Google Chrome and keep the code using channel: "chrome"',
      ].join('\n'),
    );
  } else if (msg.includes('Failed to launch the browser process')) {
    console.error(
      [
        'Puppeteer failed to launch the browser process.',
        'Common fixes:',
        '- Ensure Chrome/Chromium is installed (or run: npx puppeteer browsers install chrome)',
        '- If running in CI/Docker: keep --no-sandbox/--disable-setuid-sandbox args',
        '- Check the troubleshooting guide: https://pptr.dev/troubleshooting',
        '',
        'Original error:',
        msg,
      ].join('\n'),
    );
  } else {
    console.error(err);
  }
  process.exitCode = 1;
});

import puppeteer from 'puppeteer';

async function main() {
  let browser;
  const launchOptions = {
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  };

  try {
    browser = await puppeteer.launch(launchOptions);
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    if (msg.includes('Could not find Chrome')) {
      browser = await puppeteer.launch({ ...launchOptions, channel: 'chrome' });
    } else {
      throw err;
    }
  }

  try {
    const page = await browser.newPage();
    page.setDefaultTimeout(30_000);
    page.setDefaultNavigationTimeout(30_000);

    await page.goto('https://books.toscrape.com/', {
      waitUntil: 'networkidle2',
      timeout: 30_000,
    });

    const books = await page.evaluate(() => {
      const items = document.querySelectorAll('article.product_pod');
      return Array.from(items).map((item) => {
        const titleEl = item.querySelector('h3 a');
        const priceEl = item.querySelector('.price_color');
        return {
          title: titleEl?.getAttribute('title') ?? titleEl?.textContent?.trim() ?? null,
          price: priceEl?.textContent?.trim() ?? null,
        };
      });
    });

    console.log(`Found ${books.length} books`);
    console.log(books.slice(0, 5));
  } finally {
    await browser.close();
  }
}

main().catch((err) => {
  const msg = err instanceof Error ? err.message : String(err);
  if (msg.includes('Could not find Chrome')) {
    console.error(
      [
        'Puppeteer could not find a Chrome/Chromium binary.',
        'Fix by installing one of these:',
        '- Install bundled Chrome: npx puppeteer browsers install chrome',
        '- Or install Google Chrome and keep the code using channel: "chrome"',
      ].join('\n'),
    );
  } else if (msg.includes('Failed to launch the browser process')) {
    console.error(
      [
        'Puppeteer failed to launch the browser process.',
        'Common fixes:',
        '- Ensure Chrome/Chromium is installed (or run: npx puppeteer browsers install chrome)',
        '- If running in CI/Docker: keep --no-sandbox/--disable-setuid-sandbox args',
        '- Check the troubleshooting guide: https://pptr.dev/troubleshooting',
        '',
        'Original error:',
        msg,
      ].join('\n'),
    );
  } else {
    console.error(err);
  }
  process.exitCode = 1;
});

What's worth noting in the script:

page.evaluate() runs JavaScript inside the browser's own context. You have full DOM access, exactly as if you were typing into the browser console.
waitUntil: 'networkidle2' waits until the network is mostly quiet before extracting. This works well for pages that finish loading quickly.
The --no-sandbox and --disable-setuid-sandbox flags keep the script portable. They're needed in Docker and most CI environments.
The fallback to channel: 'chrome' covers cases where Puppeteer's bundled Chromium isn't installed. Useful in slim CI images or after a clean install.
The error handler at the bottom catches the 2 most common launch failures and prints a fix. That saves a lot of debugging time on the first run.

Run either script with node your_file_name.js and you'll see output like this:

Found 20 books
[
  { title: 'A Light in the Attic', price: '£51.77' },
  { title: 'Tipping the Velvet', price: '£53.74' },
  { title: 'Soumission', price: '£50.10' },
  { title: 'Sharp Objects', price: '£47.82' },
  { title: 'Sapiens: A Brief History of Humankind', price: '£54.23' }
]

Selenium setup

Install the WebDriver bindings. Selenium Manager handles the driver download automatically from v4.6 onwards:

npm install selenium-webdriver

Here's the equivalent scraper:

import { Builder, By, until } from 'selenium-webdriver';
import chrome from 'selenium-webdriver/chrome.js';

async function main() {
  let driver;

  try {
    const options = new chrome.Options()
      .addArguments('--headless=new', '--disable-gpu')
      .addArguments('--no-sandbox', '--disable-setuid-sandbox');

    // In some sandboxed environments, Node cannot read network interfaces.
    // Force loopback hostname to avoid Selenium's interface probing.
    //
    // Also, if this environment blocks Selenium Manager from downloading a driver,
    // you can set CHROMEDRIVER_PATH to a local chromedriver binary to avoid network.
    const chromedriverPath = process.env.CHROMEDRIVER_PATH || undefined;
    const service = new chrome.ServiceBuilder(chromedriverPath).setHostname('127.0.0.1');

    driver = await new Builder()
      .forBrowser('chrome')
      .setChromeOptions(options)
      .setChromeService(service)
      .build();

    await driver.manage().setTimeouts({
      implicit: 0,
      pageLoad: 30_000,
      script: 30_000,
    });

    await driver.get('https://books.toscrape.com/');
    await driver.wait(until.elementLocated(By.css('article.product_pod')), 30_000);

    const items = await driver.findElements(By.css('article.product_pod'));
    const books = [];

    for (const item of items) {
      const titleEl = await item.findElement(By.css('h3 a'));
      const priceEl = await item.findElement(By.css('.price_color'));
      const title = await titleEl.getAttribute('title');
      const price = (await priceEl.getText())?.trim() ?? null;
      books.push({ title, price });
    }

    console.log(`Found ${books.length} books`);
    console.log(books.slice(0, 5));
  } finally {
    if (driver) {
      await driver.quit();
    }
  }
}

main().catch((err) => {
  const msg = err instanceof Error ? err.message : String(err);
  if (msg.includes('unknown error: cannot find Chrome binary') || msg.includes('cannot find Chrome binary')) {
    console.error(
      [
        'Selenium could not find a Chrome/Chromium binary.',
        'Fix by installing Google Chrome, or ensure Chromium is installed and on PATH.',
        '',
        'Original error:',
        msg,
      ].join('\n'),
    );
  } else if (msg.includes('chromedriver') || msg.toLowerCase().includes('driver')) {
    console.error(
      [
        'Selenium failed to start a Chrome driver session.',
        'With selenium-webdriver v4+, Selenium Manager usually auto-downloads a driver.',
        'If this still fails, try updating Chrome, or reinstall selenium-webdriver, or ensure network access.',
        '',
        'Original error:',
        msg,
      ].join('\n'),
    );
  } else {
    console.error(err);
  }
  process.exitCode = 1;
});

import { Builder, By, until } from 'selenium-webdriver';
import chrome from 'selenium-webdriver/chrome.js';

async function main() {
  let driver;

  try {
    const options = new chrome.Options()
      .addArguments('--headless=new', '--disable-gpu')
      .addArguments('--no-sandbox', '--disable-setuid-sandbox');

    // In some sandboxed environments, Node cannot read network interfaces.
    // Force loopback hostname to avoid Selenium's interface probing.
    //
    // Also, if this environment blocks Selenium Manager from downloading a driver,
    // you can set CHROMEDRIVER_PATH to a local chromedriver binary to avoid network.
    const chromedriverPath = process.env.CHROMEDRIVER_PATH || undefined;
    const service = new chrome.ServiceBuilder(chromedriverPath).setHostname('127.0.0.1');

    driver = await new Builder()
      .forBrowser('chrome')
      .setChromeOptions(options)
      .setChromeService(service)
      .build();

    await driver.manage().setTimeouts({
      implicit: 0,
      pageLoad: 30_000,
      script: 30_000,
    });

    await driver.get('https://books.toscrape.com/');
    await driver.wait(until.elementLocated(By.css('article.product_pod')), 30_000);

    const items = await driver.findElements(By.css('article.product_pod'));
    const books = [];

    for (const item of items) {
      const titleEl = await item.findElement(By.css('h3 a'));
      const priceEl = await item.findElement(By.css('.price_color'));
      const title = await titleEl.getAttribute('title');
      const price = (await priceEl.getText())?.trim() ?? null;
      books.push({ title, price });
    }

    console.log(`Found ${books.length} books`);
    console.log(books.slice(0, 5));
  } finally {
    if (driver) {
      await driver.quit();
    }
  }
}

main().catch((err) => {
  const msg = err instanceof Error ? err.message : String(err);
  if (msg.includes('unknown error: cannot find Chrome binary') || msg.includes('cannot find Chrome binary')) {
    console.error(
      [
        'Selenium could not find a Chrome/Chromium binary.',
        'Fix by installing Google Chrome, or ensure Chromium is installed and on PATH.',
        '',
        'Original error:',
        msg,
      ].join('\n'),
    );
  } else if (msg.includes('chromedriver') || msg.toLowerCase().includes('driver')) {
    console.error(
      [
        'Selenium failed to start a Chrome driver session.',
        'With selenium-webdriver v4+, Selenium Manager usually auto-downloads a driver.',
        'If this still fails, try updating Chrome, or reinstall selenium-webdriver, or ensure network access.',
        '',
        'Original error:',
        msg,
      ].join('\n'),
    );
  } else {
    console.error(err);
  }
  process.exitCode = 1;
});

Run the script with node your_file_name.js and you'll see output like this:

Found 20 books
[
  { title: 'A Light in the Attic', price: '£51.77' },
  { title: 'Tipping the Velvet', price: '£53.74' },
  { title: 'Soumission', price: '£50.10' },
  { title: 'Sharp Objects', price: '£47.82' },
  { title: 'Sapiens: A Brief History of Humankind', price: '£54.23' }
]

Here's how the Selenium version differs from the Puppeteer version:

Each element lookup is a separate await call. Selenium's API is more verbose than Puppeteer's for this kind of extraction.
The try/finally block is important. Selenium browser sessions don't clean themselves up. If driver.quit() doesn't run, the browser process stays open.
Waiting is explicit. Puppeteer used waitUntil: 'networkidle2' on the goto call. Selenium uses until.elementLocated after navigation, which blocks until a target element appears in the DOM.
The driver layer is configurable in ways Puppeteer doesn't expose. ServiceBuilder lets you point at a local chromedriver binary via CHROMEDRIVER_PATH or pin the hostname to 127.0.0.1. Both help in sandboxed CI environments. One covers the case where Selenium Manager can't reach the network. The other covers the case where Node can't read network interfaces.
Timeouts are set via setTimeouts({ implicit, pageLoad, script }). Puppeteer used setDefaultTimeout at the page level. Same idea, different surface area.

Adding a proxy

For any real-world scraping, you'll want proxy support. Both tools accept a proxy at launch.

Puppeteer with a proxy:

const browser = await puppeteer.launch({
  headless: true,
  args: ['--proxy-server=gate.decodo.com:7000'],
});

const page = await browser.newPage();
await page.authenticate({
  username: 'YOUR_PROXY_USERNAME',
  password: 'YOUR_PROXY_PASSWORD',
});

Selenium with a proxy:

const options = new chrome.Options()
  .addArguments('--headless=new')
  .addArguments('--proxy-server=gate.decodo.com:7000');

const driver = await new Builder()
  .forBrowser('chrome')
  .setChromeOptions(options)
  .build();

Both tools support HTTP and SOCKS5 proxies. For Selenium, the W3C protocol doesn't handle proxy authentication directly. The cleanest approach is using the proxy vendor's SDK or a browser extension.

If the target site blocks datacenter IPs, Decodo residential proxies are a practical fit. They also work well when you need requests from a specific country. Our Selenium proxy guide covers the Selenium-specific setup in more detail.

Scraping pages that load content through JavaScript? Our guide on how to web scrape dynamic content covers the timing and wait strategies worth knowing.

Get residential proxies for scraping

Claim your 3-day free trial of residential proxies and explore full features with unrestricted access.

Start now

Performance and resource usage: Selenium vs. Puppeteer

Speed comparisons between these tools require some context. "Puppeteer is faster" is true – but only in specific ways.

Per-action speed

Puppeteer uses CDP, a binary protocol with minimal overhead per command. Each click, type, or navigation crosses 1 boundary – from your script to the browser.

Selenium adds an intermediate step. Commands go from your script to a driver process over HTTP. Then from the driver to the browser. That's 2 extra hops. In typical benchmarks, this difference runs between 50 and 200 milliseconds per command.

For a script that performs 10 actions, this is not noticeable. For a scraper running 10,000 actions, it adds up to several minutes.

Memory

Memory is where expectations sometimes don't match reality. Every headless browser session is a full Chromium process. That's true whether it was launched through Puppeteer or Selenium. The memory cost is around 300 to 500 MB per instance regardless of which tool you used.

The difference is in how sessions are managed. Puppeteer can open multiple pages within a single browser instance at low cost. Selenium typically uses 1 browser per session, which means more processes at the same level of concurrency. This gap is not relevant at 10 concurrent scrapers, but it becomes significant at 500.

Startup time

Puppeteer launches a bundled Chromium binary with no driver handshake. Cold start runs around 500 to 800 ms.

Selenium has to start a driver process first. Then wait for it to initialize. Then open a WebDriver session with the browser. That adds roughly 200 to 500 ms on top of the browser startup itself.

Concurrency

Puppeteer's strength at concurrency is running many pages within 1 browser instance using async/await. This is straightforward in Node.js and keeps memory use low.

Selenium's answer to concurrency is Selenium Grid, which distributes execution across multiple machines. If you need hundreds of parallel scrapers on a cluster, Grid gives you that without custom infrastructure work. With Puppeteer, you'd build that layer yourself.

For guidance on high-volume pipelines, see our post on web scraping at scale.

Common use cases

Both tools cover most scraping scenarios. The differences appear at the edges.

Where Puppeteer works best

Scraping single-page applications that rely on Chrome-specific APIs or behaviour.
Generating PDFs or screenshots of rendered pages as part of a pipeline.
Projects where the stack is already Node.js and adding another runtime isn't warranted.
Scenarios that need precise browser control – request interception, DevTools access, performance traces.

Where Selenium works best

Teams working in Python, Java, or another language that Puppeteer doesn't support.
Scraping targets that behave differently in Firefox or Safari.
Projects that combine scraping with QA testing in the same codebase.
Distributed scraping runs where Selenium Grid handles the parallel execution.
Prototyping workflows using Selenium IDE before writing code.

Where both tools reach their limits

Heavily bot-protected sites – Cloudflare, Akamai, DataDome – fingerprint browser automation at multiple levels. They check the TLS handshake, the JavaScript runtime, and behavioral patterns. Both Puppeteer and Selenium get blocked on these targets without additional tooling.

Puppeteer can be extended with puppeteer-extra-plugin-stealth, and Camoufox is a Firefox-based option built specifically for anti-detection scraping. These help, but they're in an ongoing arms race with detection systems that update frequently.

For targets that block reliably, a managed solution is the practical step. Decodo's Web Scraping API and Site Unblocker handle fingerprinting, CAPTCHA solving, and IP rotation on the server side. They return fully rendered HTML without requiring you to manage a browser. Our post on anti-scraping techniques and how to outsmart them explains why bot protection is as effective as it is.

Try Web Scraping API for free

Activate your free plan with 1K requests and scrape structured public data at scale.

Start for free

Integration with third-party tools and services

Scrapers don't live in isolation. Here's how each tool fits into a broader data pipeline:

CI/CD – Both tools run headlessly in Docker containers. They work with GitHub Actions, Jenkins, and CircleCI. Puppeteer needs the --no-sandbox flag in Linux containers due to a sandboxing conflict. Selenium connects to remote WebDriver endpoints, making it straightforward to point a CI runner at a Selenium Grid.

Cloud execution – Puppeteer runs on AWS Lambda using chromium-lambda layers, and on Google Cloud Functions. This makes it a good fit for event-driven or serverless scraping. Selenium is typically deployed on persistent VMs or containers with a WebDriver server – not a natural fit for serverless.

Proxy rotation – Both tools accept a proxy URL at launch. For rotating sessions, use a sticky-session endpoint. It handles IP rotation transparently. That's cleaner than swapping proxy addresses in code between requests.

Testing frameworks – Selenium has deep integrations with TestNG, JUnit, and pytest. These integrations are the reason many enterprises use Selenium in the first place. Puppeteer works with Jest and Mocha, but it isn't a testing framework itself.

Data pipelines – Neither tool stores data. Both return raw HTML or extracted structures. From there, you'd pass that into a parser. Cheerio or jsdom in Node.js. Beautiful Soup or Scrapy in Python. Then into a database or warehouse. Our tutorial on JavaScript web scraping covers the Node.js pipeline in detail.

Running browsers in containers? Our headless Chrome glossary entry explains what's happening at the OS level.

Community, ecosystem, and documentation

The community behind a tool has real consequences for how well-supported your project will be over time. Here's where things stand in mid-2026.

Downloads and release activity

Puppeteer (npm) receives around 8.7M weekly downloads. The current release is v24.40, maintained by Google's Chrome team with a consistent weekly release cadence.

Selenium's JavaScript package (selenium-webdriver) gets around 2.1M weekly npm downloads – but that number understates Selenium's actual reach. Its Python package pulls roughly 50M downloads a month on PyPI. The Java and .NET bindings add significant volume on top. When you look across all ecosystems, Selenium's install base is substantially larger than the npm figure suggests.

Documentation quality

Puppeteer's API documentation is generated directly from the source code and is co-maintained with the Chrome DevTools team. It stays accurate with each release.

Selenium's documentation has historically been fragmented across language bindings. The main site has improved considerably since v4, and the core documentation is genuinely useful now. That said, Python-specific examples are often easier to find through community resources than through the official site.

Community and support

Selenium's community is the larger of the 2 by a significant margin. Two decades of QA adoption across enterprises has built up a lot. More Stack Overflow answers. More tutorials in every supported language. A deeper pool of third-party integrations. When you encounter an unusual problem, someone has likely run into it and written it up.

Puppeteer has a smaller but more technically focused community. Google's backing means browser-level issues are addressed quickly, and the issue tracker is well-maintained. Third-party tutorials, however, are thinner – particularly in languages other than JavaScript.

The Playwright factor

Both Puppeteer and Selenium have lost some developer mindshare to Playwright over the past few years. Playwright launched with an API that Puppeteer users found familiar. It added cross-browser support and built-in auto-wait from the start. For a full breakdown, see Playwright vs. Selenium in 2026. Want a wider view of available options? Our top 10 best web scraping tools post covers the full range.

Market trends and popularity

These figures reflect the state of things as of mid-2026.

npm weekly downloads – Puppeteer is around 8.7M, selenium-webdriver is around 2.1M, and Playwright is around 35M. The npm comparison flatters Playwright because it's a JavaScript-first project. Selenium's Python and Java usage doesn't appear in these numbers at all.

Google Trends – Selenium has held higher overall search volume for years. A significant portion of that reflects QA testing interest, not scraping. Puppeteer search interest peaked around 2020 to 2021 and has since held steady. Playwright's trend line has climbed consistently.

Stack Overflow – Selenium has significantly more tagged questions than Puppeteer, reflecting its longer history and broader adoption. Puppeteer's question volume has grown steadily in recent years, which typically indicates active new adoption rather than just legacy use.

Job market – Selenium remains the standard in QA automation job listings. Puppeteer and Playwright appear more frequently in data engineering, scraping, and developer tooling roles.

The overall picture – Selenium is still the default in enterprise QA, thanks to its language range and framework integrations. Puppeteer holds a clear niche for Chrome-specific Node.js scraping. Playwright is the fastest-growing of the 3. None of these tools is in real decline. Each serves a distinct enough set of needs that all 3 remain valid choices in 2026.

Choosing the right tool

Choose Puppeteer when:

Your project is already in Node.js and adding another runtime isn't justified.
You need direct CDP access for low-level browser control.
The target site only needs to be scraped in Chrome.
You want to extend with puppeteer-extra plugins for anti-detection work.

Choose Selenium when:

Your team works in Python, Java, or another language Puppeteer doesn't support.
You need Firefox or Safari rendering to access the right content.
Distributed parallel scraping via Selenium Grid is a requirement.
Scraping sits alongside a QA workflow that already uses Selenium.

Choose Playwright when:

You need cross-browser support and want a modern async API.
Built-in auto-wait and multiple isolated browser contexts are useful for your setup.
You want the most actively developed of Puppeteer, Selenium, and Playwright. See Playwright vs. Selenium in 2026 for a full comparison.

Consider a managed API when:

The target site actively fingerprints browser automation.
You need geo-targeted IPs without managing a proxy pool yourself.

You'd rather send a URL and receive structured data than manage a browser fleet. Decodo's Web Scraping API handles JavaScript rendering, CAPTCHA solving, and IP rotation in one request.

Skip the boilerplate

Decodo's Web Scraping API handles proxies, CAPTCHAs, and anti-bot detection so your code stays short and your requests actually land.

Start now

Final thoughts

Puppeteer is the faster, simpler option for Chrome-only Node.js scraping. Selenium offers broader browser and language support at the cost of more setup complexity. The right choice depends on your team's language, the browsers you need, and how much anti-bot handling matters.

When browser automation becomes the main source of friction, a managed solution is worth considering. Blocked sessions, proxy management, and fingerprint updates all add up. At that point, continuing to tune browser settings is rarely the most productive approach.

About the author

Mykolas Juodis

Head of Marketing

Mykolas is a seasoned digital marketing professional with over a decade of experience, currently leading Marketing department in the web data gathering industry. His extensive background in digital marketing, combined with his deep understanding of proxies and web scraping technologies, allows him to bridge the gap between technical solutions and practical business applications.

Connect with Mykolas via LinkedIn.

All information on Decodo Blog is provided on an as is basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may belinked therein.

In this article

Industry-leading residential proxies

Access 115M+ residential IPs with fast response times and high success rates.

Start free trial

Frequently asked questions

Which is better for web scraping, Puppeteer or Selenium?

It depends on your stack. Puppeteer is faster and simpler if you're already in Node.js and only need to scrape Chrome-rendered pages. Selenium is the better fit for teams working in Python or another non-JS language. It's also the right pick if you need Firefox or Safari rendering. For Cloudflare-protected sites, neither tool wins outright. Both need stealth plugins or a managed API to bypass detection reliably.

Is Puppeteer faster than Selenium?

Yes, for Chrome-specific work. Puppeteer communicates with Chrome directly through CDP, while Selenium routes every command through the WebDriver protocol over HTTP. The gap is typically 50 to 200 ms per action. At high action frequency or across many concurrent sessions, this accumulates noticeably. For most scraping workloads at typical scale, however, it's rarely the main bottleneck.

Can Puppeteer scrape sites that Selenium cannot?

Not categorically. Both tools render JavaScript and interact with the DOM in similar ways. Puppeteer does expose Chrome-specific APIs that Selenium doesn't surface, like DevTools coverage and performance timeline. These aren't often scraping-critical, though. For anti-bot work, puppeteer-extra-plugin-stealth gives Puppeteer a practical advantage. That said, serious bot protection systems block both tools reliably.

Do I need proxies with Puppeteer or Selenium for web scraping?

For scraping at meaningful volume or on sites that rate-limit by IP address, yes. Both tools send requests from a single IP that can be blocked. Rotating residential proxies hide the scraper's origin and enable geo-targeted requests when needed. Both Puppeteer and Selenium support proxy configuration at launch, including authenticated credentials.

Is Selenium still worth learning in 2026?

Yes – particularly for Python developers and QA engineers. Selenium's cross-browser support, mature Python bindings, and deep integrations with test frameworks keep it the standard in enterprise QA. For pure scraping work, Playwright is worth evaluating as an alternative. For teams that need Firefox or Safari rendering and work in Python, Selenium remains the most straightforward option.

Can I use Puppeteer with Python?

Not directly. Puppeteer is a Node.js library with no official Python support. Python developers who want CDP-based browser automation have 2 practical options. Playwright, which has official Python bindings. Or Selenium with ChromeDriver. Pyppeteer was a community Python port that is no longer maintained and is not recommended.

DATA COLLECTION

PYTHON

Playwright vs. Selenium in 2026: Which Browser Automation Tool Should You Choose?

As websites become more dynamic and better at detecting automated traffic, choosing the right automation tool has become more challenging. At the same time, performance, reliability, and anti-detection capabilities matter more than ever. Two tools dominate the space: Selenium, a mature and widely adopted standard, and Playwright, a newer framework built for modern web apps. This guide compares them through practical use cases like web scraping and dynamic content extraction to help you decide which fits your needs best.

Dominykas Niaura

Last updated: Mar 27, 2026

10 min read

DATA COLLECTION

Playwright Web Scraping: A Practical Tutorial

Web scraping can feel like directing a play without a script – unpredictable and chaotic. That’s where Playwright steps in: a powerful, headless browser automation tool that makes scraping modern, dynamic websites smoother than ever. In this practical tutorial, you’ll learn how to use Playwright to reliably extract data from any web page.

Zilvinas Tamulis

Last updated: Jan 13, 2025

8 min read

DATA COLLECTION

PYTHON

Scraping the Web with Selenium and Python: A Step-By-Step Tutorial

Modern websites rely heavily on JavaScript and anti-bot measures, making data extraction a challenge. Basic tools fail with dynamic content loaded after the initial page, but Selenium with Python can automate browsers to execute JavaScript and interact with pages like a user. In this tutorial, you'll learn to build scrapers that collect clean, structured data from even the most complex websites.

Dominykas Niaura

Last updated: Jul 30, 2025

10 min read

Puppeteer vs. Selenium: Which Tool Should You Use for Web Scraping?

TL;DR

What is Puppeteer?

What is Selenium?

Key differences between Puppeteer and Selenium

Protocol matters

Selenium's biggest downside – driver management

Anti-bot tooling isn't equal

Browser and language support

Browser support

Language support

Setup and implementation

Puppeteer setup

Selenium setup

Adding a proxy

Performance and resource usage: Selenium vs. Puppeteer

Per-action speed

Memory

Startup time

Concurrency

Common use cases

Where Puppeteer works best

Where Selenium works best

Where both tools reach their limits

Integration with third-party tools and services

Community, ecosystem, and documentation

Downloads and release activity

Documentation quality

Community and support

The Playwright factor

Market trends and popularity

Choosing the right tool

Final thoughts

Frequently asked questions

Which is better for web scraping, Puppeteer or Selenium?

Is Puppeteer faster than Selenium?

Can Puppeteer scrape sites that Selenium cannot?

Do I need proxies with Puppeteer or Selenium for web scraping?

Is Selenium still worth learning in 2026?

Can I use Puppeteer with Python?

Related articles