
Scraping the Web with Selenium and Python: A Step-By-Step Tutorial

Modern websites rely heavily on JavaScript and anti-bot measures, making data extraction a challenge. Basic tools fail with dynamic content loaded after the initial page, but Selenium with Python can automate browsers to execute JavaScript and interact with pages like a user. In this tutorial, you'll learn to build scrapers that collect clean, structured data from even the most complex websites.

Dominykas Niaura

Jul 30, 2025

10 min read

Set up your environment

Let’s set up a clean, isolated Python environment for web scraping with Selenium. You'll install the necessary tools, create a virtual environment, and verify everything works properly.

Prerequisites

Make sure you have the following installed:

  • Python 3.9 or later. Download it from the official website and follow the instructions for your OS.
  • A modern browser. Ensure you have at least one of these browsers installed: Chrome, Firefox, or Edge. Modern Selenium automatically manages drivers, but you need the actual browser software installed.

Create a virtual environment

Creating a virtual environment ensures that your project dependencies don’t conflict with those of other Python projects. Use Python’s built-in venv module:

python -m venv .venv

This creates a .venv folder with an isolated Python environment.

Activate the virtual environment using the appropriate command for your OS:

# macOS/Linux
source .venv/bin/activate
# Windows (Command Prompt)
.venv\Scripts\activate.bat
# Windows (PowerShell)
.venv\Scripts\Activate.ps1

Once activated, your terminal shows (.venv) – you’re now working inside the virtual environment.

Install dependencies

Install Selenium using pip:

pip install selenium

Run your first Selenium script

Now that your environment is ready, let’s build your first scraper using Selenium. This simple script will open a browser, load a webpage, and extract the full HTML from scrapingcourse.com/ecommerce/.

Create a file named main.py in your project directory and add the following code:

from selenium import webdriver
# Initialize Chrome browser
driver = webdriver.Chrome()
# Navigate to the site
driver.get("https://www.scrapingcourse.com/ecommerce/")
# Extract the page HTML
print(driver.page_source)
# Clean up
driver.quit()

Open your terminal and run: python main.py

You’ll see a Chrome window launch, navigate to the test page, and then close.

The HTML will print directly to your terminal. Example output:

<html lang="en-US"><head>
<!-- ... -->
<title>Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com</title>
<link rel="pingback" href="https://www.scrapingcourse.com/ecommerce/xmlrpc.php">
<!-- ... -->
</html>

If you’re seeing similar output, your scraper is up and running!

Visible vs headless mode

By default, Selenium opens a visible browser window. However, in production or automation workflows, it’s common to run in headless mode, which launches the browser in the background and reduces resource usage.

Here’s how to enable it:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# Configure headless mode
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
print(driver.page_source)
driver.quit()

Locate elements with Selenium

Selenium offers multiple strategies for locating elements on web pages. Choosing the right one ensures your scraper stays reliable across site changes and content variations.

Core location methods

There are two essential methods:

Method 1: find_element()

Returns the first matching element:

product = driver.find_element(By.CLASS_NAME, "product")

This is helpful when you're targeting a unique element. If no match is found, it raises a NoSuchElementException – so wrap it in a try-except block when needed.

Method 2: find_elements()

Returns all matching elements in a list:

products = driver.find_elements(By.CLASS_NAME, "product")

This returns an empty list ([]) if no match is found. Use it when targeting multiple elements or when you want to prevent errors from missing matches.


Comparison example

Consider this HTML:

<div class="item">Gaming Laptop</div>
<div class="item">Wireless Mouse</div>
<div class="item">Mechanical Keyboard</div>

Here's the Python code:

# Single element
first_item = driver.find_element(By.CLASS_NAME, "item")
print(first_item.text) # Gaming Laptop
# Multiple elements
all_items = driver.find_elements(By.CLASS_NAME, "item")
print(f"Found {len(all_items)} items") # Found 3 items

Tip: Even when targeting one element, use find_elements() and check the list length to avoid crashes on missing elements.


Element locator strategies

By.ID – unique identifier targeting

<div id="search-container">
  <input type="text" placeholder="Search products...">
</div>

search_box = driver.find_element(By.ID, "search-container")

Best for: precise, stable targeting – IDs are unique and unlikely to change.

By.CLASS_NAME – style-based targeting

<div class="product-card">Gaming Laptop</div>
<div class="product-card">Wireless Mouse</div>
<div class="product-card">Mechanical Keyboard</div>

first_product = driver.find_element(By.CLASS_NAME, "product-card")
all_products = driver.find_elements(By.CLASS_NAME, "product-card")

Best for: reusable patterns like product listings or buttons.

By.TAG_NAME – tag-type targeting

<h1>Electronics Store</h1>
<p>Free shipping on orders over $50</p>
<p>30-day return policy</p>
<a href="/laptops">Laptops</a>
<a href="/accessories">Accessories</a>

heading = driver.find_element(By.TAG_NAME, "h1")
paragraphs = driver.find_elements(By.TAG_NAME, "p")
links = driver.find_elements(By.TAG_NAME, "a")

Best for: gathering bulk content (e.g., all links or paragraphs).

By.CSS_SELECTOR – attribute and structure targeting

<div class="product featured">
  <h2 class="product-title">Gaming Laptop</h2>
  <span class="price" data-currency="USD">$1299</span>
  <button class="btn primary">Add to Cart</button>
</div>

featured_product = driver.find_element(By.CSS_SELECTOR, ".product.featured")
buy_button = driver.find_element(By.CSS_SELECTOR, ".btn.primary")
price = driver.find_element(By.CSS_SELECTOR, "[data-currency='USD']")
title = driver.find_element(By.CSS_SELECTOR, ".product .product-title")

Best for: advanced patterns, nested selectors, and attribute-based filtering.

By.XPATH – structure-based navigation

<table>
  <tr>
    <td>Product</td>
    <td>Price</td>
    <td>Availability</td>
  </tr>
  <tr>
    <td>Gaming Laptop</td>
    <td>$1299</td>
    <td>In Stock</td>
  </tr>
</table>

laptop_row = driver.find_element(By.XPATH, "//td[text()='Gaming Laptop']")
price_cell = driver.find_element(By.XPATH, "//td[text()='Gaming Laptop']/../td[2]")
stock_status = driver.find_element(By.XPATH, "//td[contains(text(), 'Stock')]")

Best for: deeply nested or text-dependent queries.

Read our XPath vs. CSS selector guide for a deeper comparison.

By.LINK_TEXT – exact text matching

<a href="/home">Home</a>
<a href="/products">All Products</a>

home_link = driver.find_element(By.LINK_TEXT, "Home")
products_link = driver.find_element(By.LINK_TEXT, "All Products")

Note: Case-sensitive and requires an exact match of the link’s visible text.

By.PARTIAL_LINK_TEXT – flexible text matching

<a href="/cart">Add to Shopping Cart</a>
<a href="/wishlist">Add to Wishlist</a>
<a href="/compare">to Comparison</a>

cart_link = driver.find_element(By.PARTIAL_LINK_TEXT, "Shopping")
wishlist_link = driver.find_element(By.PARTIAL_LINK_TEXT, "Wishlist")
add_links = driver.find_elements(By.PARTIAL_LINK_TEXT, "Add to")

Best for: dynamic or partially known link text.


Inspect elements using DevTools

To locate the right selector:

  1. Right-click the element
  2. Click Inspect
  3. Review the HTML structure
  4. Choose the most stable and specific selector available

For example, on a demo eCommerce site, products appear inside <li> tags with a "product" class.

Learn more about how to inspect elements on any website.

Always test selectors against different page states or slight layout changes. Prefer strategies that tolerate minor DOM shifts for reliable web scraping.
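One way to tolerate such shifts is to try several locators in order of preference. Here's a minimal sketch of a hypothetical fallback helper (the name and the locator pairs are illustrative, not part of Selenium):

```python
def find_with_fallbacks(driver, locators):
    # Try each (By, selector) pair in order; return the first non-empty match
    for by, selector in locators:
        elements = driver.find_elements(by, selector)
        if elements:
            return elements
    return []

# Usage sketch (assumes an existing driver):
# products = find_with_fallbacks(driver, [
#     (By.CSS_SELECTOR, ".product-card"),
#     (By.CSS_SELECTOR, ".product"),
#     (By.TAG_NAME, "li"),
# ])
```

Because `find_elements()` returns an empty list rather than raising, the helper can cheaply probe each strategy until one matches.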

Extract product data from the page

Let’s walk through how to extract product information from a demo eCommerce site using Selenium. You’ll learn how to scrape both single and multiple products using flexible CSS selectors.

Single product walkthrough

Each product is wrapped in an <li> tag with the class "product". This consistent structure makes it easy to extract details using a shared selector.

Start by importing the By class:

from selenium.webdriver.common.by import By

Now, extract data from the first product container:

first_product = driver.find_element(By.CSS_SELECTOR, ".product")
name = first_product.find_element(By.CSS_SELECTOR, ".product-name").text
price = first_product.find_element(By.CSS_SELECTOR, ".price").text
url = first_product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
image = first_product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
print(f"Name: {name}")
print(f"Price: {price}")
print(f"URL: {url}")
print(f"Image: {image}")

Here’s the full script, including browser setup:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
first_product = driver.find_element(By.CSS_SELECTOR, ".product")
name = first_product.find_element(By.CSS_SELECTOR, ".product-name").text
price = first_product.find_element(By.CSS_SELECTOR, ".price").text
url = first_product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
image = first_product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
print(f"Name: {name}")
print(f"Price: {price}")
print(f"URL: {url}")
print(f"Image: {image}")
driver.quit()

Sample output:

Name: Abominable Hoodie
Price: $69.00
URL: https://www.scrapingcourse.com/ecommerce/product/abominable-hoodie/
Image: https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh09-blue_main.jpg

Loop through all products

Extracting one product is useful for testing selectors. But in real-world scraping, you'll typically loop through all items on the page.

All product cards use the class "product", which we can target using find_elements():

products = driver.find_elements(By.CSS_SELECTOR, ".product")

Then loop through and collect the data:

extracted_data = []
for product in products:
    name = product.find_element(By.CSS_SELECTOR, ".product-name").text
    price = product.find_element(By.CSS_SELECTOR, ".price").text
    url = product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
    image = product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
    extracted_data.append({
        "name": name,
        "price": price,
        "url": url,
        "image": image
    })

Here’s the complete script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
products = driver.find_elements(By.CSS_SELECTOR, ".product")
extracted_data = []
for product in products:
    name = product.find_element(By.CSS_SELECTOR, ".product-name").text
    price = product.find_element(By.CSS_SELECTOR, ".price").text
    url = product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
    image = product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
    extracted_data.append({
        "name": name,
        "price": price,
        "url": url,
        "image": image
    })
print(extracted_data)
driver.quit()

Your output will look like this:

[{'name': 'Abominable Hoodie',
'price': '$69.00',
'url': 'https://www.scrapingcourse.com/ecommerce/product/abominable-hoodie/',
'image': 'https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh09-blue_main.jpg'},
{'name': 'Adrienne Trek Jacket',
'price': '$57.00',
'url': 'https://www.scrapingcourse.com/ecommerce/product/adrienne-trek-jacket/',
'image': 'https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wj08-gray_main.jpg'},
# ...
]

Great! You’ve now built a fully functional product scraper using Selenium.

Handle dynamic content

Modern websites often load content asynchronously using JavaScript. Elements may appear after the page visually loads, and Selenium needs explicit handling to wait for these elements to be available.

Instead of relying on time.sleep() (which slows things down unnecessarily), use smarter wait strategies.

Use explicit waits with WebDriverWait

If a page injects elements dynamically, find_element() might fail because the content hasn't finished rendering yet. WebDriverWait helps by pausing execution until specific conditions are met.

Start by importing the required modules:

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

Wait for all products to appear:

wait = WebDriverWait(driver, 10)
try:
    products = wait.until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".product"))
    )
    print(f"Found {len(products)} products")
except TimeoutException:
    print("Products didn't load within 10 seconds")

Here is how WebDriverWait works:

  • Polling frequency – checks every 500ms by default
  • Timeout – raises a TimeoutException if the condition isn't met in time
  • Non-blocking – continues immediately once the condition is satisfied
  • Exception handling – can ignore specific exceptions during polling

Constructor breakdown:

WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None)

Here:

  • driver – your active browser session
  • timeout – max duration (in seconds) to wait
  • poll_frequency – check interval (default: 0.5s)
  • ignored_exceptions – list of exceptions to suppress (optional)

Here are the common expected_conditions:

  • presence_of_element_located
  • presence_of_all_elements_located
  • element_to_be_clickable
  • visibility_of_element_located
  • invisibility_of_element_located

Wait for full page load with JavaScript

Selenium’s driver.get() typically waits for the initial HTML, but may not wait for dynamically injected content. Use document.readyState to ensure the full page – scripts, styles, and all – has loaded.

driver.get("https://www.scrapingcourse.com/ecommerce/")
wait = WebDriverWait(driver, 10)
wait.until(
    lambda driver: driver.execute_script("return document.readyState") == "complete"
)
# Now it's safe to extract dynamic elements
products = driver.find_elements(By.CSS_SELECTOR, ".product")

What does document.readyState mean? This JavaScript property reflects the loading state of the page:

  • "loading" – the document is still parsing.
  • "interactive" – DOM is parsed, but other resources (like images) may still be loading.
  • "complete" – the entire page and sub-resources are fully loaded.

Use this when scraping:

  • Single-page apps (SPAs)
  • AJAX-heavy websites
  • Pages where the content lags after visual load

You can check out our 4-minute video on scraping dynamic websites.

Handle pagination and infinite scroll

Many modern websites use dynamic UI patterns like infinite scrolling or paginated listings to display large datasets. Scraping these pages requires more than just locating elements – it involves simulating user behavior and handling asynchronous content loads.

Scrape infinite scroll pages

Infinite scroll is common on product feeds, social media, and news aggregators. Content is appended to the DOM via JavaScript as users scroll, meaning Selenium must mimic that behavior to collect all items.

Here’s how to automate infinite scrolling:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome()
driver.get("https://www.scrapingcourse.com/infinite-scrolling")
print("Starting infinite scroll...")
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # Wait for new content
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        time.sleep(1)
        final_height = driver.execute_script("return document.body.scrollHeight")
        if final_height == last_height:
            print("No more content to load.")
            break
    last_height = new_height
    print(f"Scrolled to: {new_height}")
print("Scroll completed.")
products = driver.find_elements(By.CSS_SELECTOR, ".product-item")
print(f"Found {len(products)} products after scrolling")
driver.quit()

Why this works

  • Tracks document.body.scrollHeight to detect dynamic content injection.
  • Scrolls repeatedly until no height change is detected.
  • Uses small delays to allow content to render.
  • Stops only after confirming no new content appears.

Scrape paginated listings

Classic pagination splits content across numbered pages or a "Next" icon or button.

To scrape all data, your script needs to:

  1. Extract items on the current page.
  2. Click the navigation arrow.
  3. Wait for the new content to load.
  4. Repeat until pagination ends.

Here’s a resilient solution:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.scrapingcourse.com/ecommerce/")
total_products = 0
page_number = 1
while True:
    print(f"Processing page {page_number}...")
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product")))
    products = driver.find_elements(By.CSS_SELECTOR, ".product")
    page_product_count = len(products)
    total_products += page_product_count
    print(f"Found {page_product_count} products on page {page_number}")
    next_buttons = driver.find_elements(By.CSS_SELECTOR, ".next.page-numbers")
    if next_buttons:
        next_button = next_buttons[0]
        driver.execute_script("arguments[0].scrollIntoView();", next_button)
        reference = products[0] if products else None
        driver.execute_script("arguments[0].click();", next_button)
        if reference:
            wait.until(EC.staleness_of(reference))  # Wait for page change
        page_number += 1
    else:
        print("No more pages.")
        break
print(f"Total products scraped across {page_number} pages: {total_products}")
driver.quit()

Read our guide on pagination in web scraping to go deeper.

Capture screenshots for debugging

Visual debugging is one of the most effective ways to troubleshoot scraping issues, especially when running in headless mode, where you can’t see what the browser is rendering.

Screenshots help you quickly identify:

  • Page rendering issues
  • Missing or delayed content
  • Element visibility problems
  • JavaScript execution errors

They’re also useful for documenting production failures or layout changes.


Capture full-page screenshots

Use save_screenshot() to capture the full visible portion of the browser:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/product/bruno-compete-hoodie/")
time.sleep(2)
driver.save_screenshot("page_screenshot.png")
print("Screenshot saved: page_screenshot.png")
driver.quit()

Here’s the screenshot output:

Best for: confirming page load, checking layout, or inspecting element visibility.

Capture screenshots during key scraping steps

For complex flows, such as scrolling, pagination, or multi-step scraping, capture screenshots at different stages to track behavior and identify failures.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from datetime import datetime
import os
import time
os.makedirs("screenshots", exist_ok=True)
options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
# Screenshot after initial page load
filename = f"screenshots/01_initial_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
driver.save_screenshot(filename)
print(f"Saved: {filename}")
# Wait for products to load
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product")))
# Screenshot after content loads
filename = f"screenshots/02_loaded_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
driver.save_screenshot(filename)
# Scroll and capture
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(1)
filename = f"screenshots/03_scrolled_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
driver.save_screenshot(filename)
# Navigate into a product
driver.find_element(By.CSS_SELECTOR, ".product a").click()
time.sleep(2)
filename = f"screenshots/04_product_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
driver.save_screenshot(filename)
driver.quit()

All screenshots are saved in the screenshots/ folder with timestamps for easy tracking.
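The step-number-plus-timestamp naming used above can be factored into a small helper (screenshot_name is a hypothetical name introduced here for illustration):

```python
from datetime import datetime

def screenshot_name(step, label, folder="screenshots"):
    """Build a sortable path like screenshots/01_initial_20250730_120000.png."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{folder}/{step:02d}_{label}_{stamp}.png"

# Usage sketch: driver.save_screenshot(screenshot_name(1, "initial"))
```

The zero-padded step prefix keeps files sorted in capture order even when timestamps collide within the same second.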

Capture specific elements

Sometimes, you only want to capture a single UI component, like a product card or a pagination bar.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
product = driver.find_element(By.CSS_SELECTOR, ".product")
product.screenshot("single_product.png")
pagination = driver.find_element(By.CSS_SELECTOR, ".woocommerce-pagination")
pagination.screenshot("pagination.png")
driver.quit()

Best for: debugging layout issues on key elements or documenting UI changes.

Here’s the screenshot output:

Screenshot method reference

Here are several ways to capture screenshots in Selenium:

# Save to file
driver.save_screenshot("page.png")
driver.get_screenshot_as_file("page.png") # Equivalent
# Get binary data
png_data = driver.get_screenshot_as_png()
with open("page.png", "wb") as f:
    f.write(png_data)
# Base64 for inline use (e.g., logging or HTML embedding)
base64_data = driver.get_screenshot_as_base64()

Avoid blocks with proxies

Most websites today implement anti-bot defenses – from rate limiting and IP fingerprinting to geofencing and browser challenges. If your scraper keeps failing or returns inconsistent results, you likely need proxies.

Why proxies matter for web scraping

Proxies act as intermediaries between your scraper and the target site. They offer four essential benefits:

  • IP rotation. Avoid blocks by distributing requests across multiple IPs.
  • Authentic traffic. Residential proxies mimic real users on real devices.
  • Geo-targeting. Route traffic through specific countries, cities, or ZIPs.
  • Scalability. Unlock multi-region scraping without tripping rate limits.

Unlike datacenter IPs, residential proxies are tied to real consumer devices and ISP-assigned IPs, making them harder to detect.

Our case study shows how proxy-based scraping doubled reliability and revenue for a data team scraping at scale.

Use proxies in Selenium with Selenium Wire

The base Selenium package doesn’t support authenticated proxies out of the box. Selenium Wire adds:

  • Full proxy support (auth included)
  • Request/response inspection
  • Advanced networking control

Install it with compatible packages:

pip install selenium-wire packaging setuptools "blinker<1.8"

Configure a proxy like this:

from seleniumwire import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
proxy_options = {
    "proxy": {
        "http": "http://PROXY_USERNAME:[email protected]:7000",
        "https": "http://PROXY_USERNAME:[email protected]:7000"
    }
}
driver = webdriver.Chrome(
    seleniumwire_options=proxy_options,
    options=options
)
# Check IP using Decodo's IP echo
driver.get("https://ip.decodo.com/json")
print(driver.find_element(By.TAG_NAME, "pre").text)
driver.quit()

Use residential proxies on protected sites

Sites like Amazon, Walmart, and other major eCommerce platforms are notoriously hard to scrape without rotating, real-user IPs.

Scraping a product page without protection:

from selenium import webdriver
import time
driver = webdriver.Chrome()
driver.get("https://www.walmart.com/ip/1970766503")
time.sleep(3)
driver.quit()

This can trigger CAPTCHAs, empty or broken HTML, 403 Forbidden, or 429 Too Many Requests errors. Learn more in our guide to anti-bot systems and how to bypass them effectively.

Example CAPTCHA from Walmart when running the above code:

Scrape protected pages with proxies

Re-run the same scraper using Decodo's residential proxy network:

from seleniumwire import webdriver
proxy_options = {
    "proxy": {
        "http": "http://PROXY_USERNAME:[email protected]:7000",
        "https": "http://PROXY_USERNAME:[email protected]:7000"
    }
}
driver = webdriver.Chrome(seleniumwire_options=proxy_options)
driver.get("https://www.walmart.com/ip/1970766503")
driver.quit()

You’ll now access the real page content like below:

Scale with Decodo’s proxy network

Decodo’s residential proxies give you:

  • 115M+ ethically-sourced IPs
  • 195+ countries and regions
  • Targeting by country, city, ZIP, or ASN
  • Rotating or sticky sessions
  • <0.6s response time
  • Unlimited concurrency

You can get started in minutes with our quick start guide or test things with a free trial.

Performance optimization

Most scraping tasks don’t need to load images, videos, ads, or notifications. Blocking these unnecessary assets can dramatically improve speed and reduce bandwidth, especially on media-heavy sites like retail platforms or travel listings.

Why block images and assets? The average mobile web page is nearly 2 MB, and images account for roughly half of that. If your scraper only needs text or metadata, there's no reason to download them.

Primary target: images

Images are the largest payload on most product or blog pages. You can disable image loading using Chrome’s prefs setting:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
opts = Options()
opts.add_experimental_option("prefs", {
    "profile.managed_default_content_settings.images": 2
})
driver = webdriver.Chrome(options=opts)
driver.get("https://www.scrapingcourse.com/ecommerce/product/bruno-compete-hoodie/")
time.sleep(5)
driver.quit()

As a result, you'll skip all image downloads, often cutting the transferred page size by 1.5–3×.

You can optionally block notifications:

{
    "profile.managed_default_content_settings.notifications": 2,
}

Use eager page-load strategy

By default, Selenium waits for every resource (including fonts, ads, and iframes) before it continues. That’s often unnecessary.

Use eager mode to return control once DOMContentLoaded fires:

opts = Options()
opts.page_load_strategy = "eager"

This approach is ideal for pure data scraping that doesn't involve layout rendering, CSS checks, or visual capture. In real-world tests on large eCommerce pages, you can observe 20–50% faster load times.

When not to block assets

Blocking images and styling is a powerful optimization, but it’s not always appropriate. Skip this technique if:

  • You’re capturing screenshots or testing the visual layout.
  • Your target data includes images, videos, or UI states.
  • You’re doing UX simulations or responsive layout checks.
  • You’re actively debugging layout or JS rendering issues.

Export data to CSV

After extracting product data, the next step is to save it in a structured format. CSV is the go-to choice for loading into spreadsheets, databases, or data pipelines.

Write CSV with csv.DictWriter

Use Python’s built-in csv module to write a list of dictionaries into a clean, structured file:

import csv
with open("products.csv", "w", newline="", encoding="utf-8") as csvfile:
    fieldnames = ["name", "price", "url", "image"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for product in all_products:
        writer.writerow(product)

If your data is already structured as a list of dictionaries, use writer.writerows(all_products) for a faster bulk write.

import csv
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 10)
driver.get("https://www.scrapingcourse.com/ecommerce/")

all_products = []
while True:
    print("Scraping page...")
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product")))
    products = driver.find_elements(By.CSS_SELECTOR, ".product")
    for product in products:
        name = product.find_element(By.CSS_SELECTOR, ".product-name").text
        price = product.find_element(By.CSS_SELECTOR, ".price").text
        url = product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
        image = product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
        all_products.append({
            "name": name,
            "price": price,
            "url": url,
            "image": image
        })
    next_buttons = driver.find_elements(By.CSS_SELECTOR, ".next.page-numbers")
    if not next_buttons:
        break
    reference = products[0] if products else None
    driver.execute_script("arguments[0].click();", next_buttons[0])
    if reference:
        wait.until(EC.staleness_of(reference))  # Wait for the old page to unload

# Save to CSV
with open("products.csv", "w", newline="", encoding="utf-8") as csvfile:
    fieldnames = ["name", "price", "url", "image"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(all_products)

print(f"Total products scraped: {len(all_products)}")
driver.quit()

The script above scrapes every product across the paginated pages, then writes all results to products.csv in one go.

Best practices and next steps

By now, you have a working Selenium scraper that can handle dynamic content. To wrap up, here are some best practices and suggestions for expanding your web scraping skills:

  • Respect website policies. Always check a site's robots.txt and terms of service. Ethical scraping means not overloading servers with excessive requests and respecting usage policies.
  • Avoid detection. Use rotating proxies, randomize your User-Agent string, and introduce small, human-like delays between interactions. Selenium’s default speed is fast and robotic – slowing it down mimics organic behavior and reduces the risk of being blocked.
  • Handle CAPTCHAs. Many sites use CAPTCHAs to deter scraping. Solving these challenges often requires third-party services or AI/ML solutions – consider if the data justifies the cost. The ultimate solution is comprehensive APIs that handle anti-bot measures automatically.
  • Scale carefully. Scraping thousands of pages can be resource-heavy. Selenium isn’t always the most efficient choice at scale. For larger tasks, consider headless browsers or distributed scraping frameworks like Playwright, Puppeteer (for Node.js), or even Scrapy for pure crawling workflows.
  • Consider APIs and specialized services. Sometimes the best approach is to use an official API rather than scraping. Many websites offer APIs that are safer and more reliable than scraping. Additionally, companies provide scraping API services for various use cases, including social media, search engines, and eCommerce, that handle complexity and anti-bot measures for you.
  • Learn and iterate. Web scraping combines technical skill with problem-solving. Each website presents unique challenges. Stay current with new tools and techniques – for example, libraries like SeleniumBase offer higher-level wrappers around Selenium, which might suit some projects better.

Wrapping up

We hope this tutorial helped you better understand how to target and extract data from JavaScript-heavy websites using Selenium with Python. Mastering this skill is essential for collecting data from pages that rely on client-side rendering.

Don’t forget – pairing Selenium with reliable residential proxies or a Web Scraping API ensures your data extraction remains smooth, stable, and undetected. Whether you’re getting started or refining a mature pipeline, this setup gives you the flexibility and power to scale any web data project.

About the author

Dominykas Niaura

Technical Copywriter

Dominykas brings a unique blend of philosophical insight and technical expertise to his writing. Starting his career as a film critic and music industry copywriter, he's now an expert in making complex proxy and web scraping concepts accessible to everyone.


Connect with Dominykas via LinkedIn

All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

A Complete Guide to Web Data Parsing Using Beautiful Soup in Python

Beautiful Soup is a widely used Python library that plays a vital role in data extraction. It offers powerful tools for parsing HTML and XML documents, making it possible to extract valuable data from web pages effortlessly. This library simplifies the often complex process of dealing with the unstructured content found on the internet, allowing you to transform raw web data into a structured and usable format.

HTML document parsing plays a pivotal role in the world of information. The HTML data can be used further for data integration, analysis, and automation, covering everything from business intelligence to research and beyond. The web is a massive place full of valuable information; therefore, in this guide, we’ll employ various tools and scripts to explore the vast seas and teach them to bring back all the data.

Zilvinas Tamulis

Nov 16, 2023

14 min read

Scraping Amazon Product Data Using Python: Step-by-Step Guide

This comprehensive guide will teach you how to scrape Amazon product data using Python. Whether you’re an eCommerce professional, researcher, or developer, you’ll learn to create a solution to extract valuable insights from Amazon’s marketplace. By following this guide, you’ll acquire practical knowledge on setting up your scraping environment, overcoming common challenges, and efficiently collecting the needed data.

Zilvinas Tamulis

Mar 27, 2025

15 min read

Beautiful Soup Web Scraping: How to Parse Scraped HTML with Python

Web scraping with Python is a powerful technique for extracting valuable data from the web, enabling automation, analysis, and integration across various domains. Using libraries like Beautiful Soup and Requests, developers can efficiently parse HTML and XML documents, transforming unstructured web data into structured formats for further use. This guide explores essential tools and techniques to navigate the vast web and extract meaningful insights effortlessly.

Zilvinas Tamulis

Mar 25, 2025

14 min read

🐍 Python Web Scraping: In-Depth Guide 2025

Welcome to 2025, the year of the snake – and what better way to celebrate than by mastering Python, the ultimate "snake" in the tech world! If you’re new to web scraping, don’t worry – this guide starts from the basics, guiding you step-by-step on collecting data from websites. Whether you’re curious about automating simple tasks or diving into more significant projects, Python makes it easy and fun to start. Let’s slither into the world of web scraping and see how powerful this tool can be!

Zilvinas Tamulis

Feb 28, 2025

15 min read

Frequently asked questions

What is web scraping?

Web scraping is a method to gather public data from websites. With a dedicated API (Application Programming Interface), you can automatically fetch web pages to retrieve the entire HTML code or specific data points.

At Decodo, we offer Web Scraping API for social media platforms, search engine result pages, online marketplaces, and various other websites.

But if you’re already set with a web scraping tool for your project, don’t forget to equip it with residential proxies for ultimate success.
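
For a sense of what "fetching a web page to retrieve the entire HTML code" looks like without any extra tooling, here’s a minimal standard-library sketch. A real project would more likely use the requests library or a scraping API:

```python
import urllib.request  # standard library, so the sketch needs no extra installs

def fetch_html(url, timeout=10):
    """Fetch a page and return its full HTML as a string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Note that this retrieves only the initial HTML – anything rendered later by JavaScript won’t appear, which is exactly the gap Selenium fills.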

What is Selenium web scraping and how does it work?

Selenium web scraping uses browser automation to extract data from dynamic websites. Unlike traditional tools that use HTTP requests, Selenium launches a real browser and mimics user behavior. It can execute JavaScript, interact with forms and buttons, handle authentication flows, and render single-page applications, making it ideal for modern JavaScript-heavy websites.

Is Selenium good for web scraping compared to other tools?

Selenium is powerful for JavaScript-heavy sites but has trade-offs. It's slower than HTTP-based tools since it loads full browser instances, uses more resources, and requires a complex setup. However, for sites that rely heavily on JavaScript for content rendering, Selenium is often the only viable option for reliable data extraction.

How do I avoid detection and blocking when using Selenium for web scraping?

To avoid detection, rotate user-agent strings, use residential proxies, and add human-like delays between interactions. Respect rate limits, manage sessions with proper cookies/headers, and use headless mode cautiously since many sites detect it. For maximum protection, consider switching to managed web scraping APIs that handle anti-bot challenges automatically.

Can Selenium handle JavaScript-rendered or dynamic content?

Yes, this is Selenium's main advantage. Since it controls a real browser, it executes all JavaScript just like your browser would. Content loaded via AJAX or built by frameworks like React, Vue, or Angular will be fully rendered before you access it, making Selenium invaluable for modern web applications.

What are the use cases of web scraping?

Some of the most common web scraping use cases include competitor analysis, market research, trend analysis, lead generation, pricing strategies, content and news monitoring, data analysis, and real estate market analysis.

What is parsing?

Data parsing is turning raw, hard-to-read data into a well-structured format. One example of parsing would be turning HTML into JSON, CSV, a chart, or a table. Read more about parsing and its use cases in our blog.
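
As a tiny illustration of parsing HTML into JSON, here’s a standard-library sketch; the HTML fragment is made up, and a real project would typically reach for Beautiful Soup instead:

```python
import json
from html.parser import HTMLParser

class ListItemParser(HTMLParser):
    """Collect the text of every <li> element into a list."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False

    def handle_data(self, data):
        if self.in_li and data.strip():
            self.items.append(data.strip())

# A made-up fragment of raw HTML...
fragment = "<ul><li>Shirt - $25</li><li>Hat - $10</li></ul>"
parser = ListItemParser()
parser.feed(fragment)
# ...parsed into structured JSON:
print(json.dumps({"products": parser.items}))
# → {"products": ["Shirt - $25", "Hat - $10"]}
```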

© 2018-2025 decodo.com. All Rights Reserved