
Scraping the Web with Selenium and Python: A Step-By-Step Tutorial

Modern websites rely heavily on JavaScript and anti-bot measures, making data extraction a challenge. Basic tools fail with dynamic content loaded after the initial page, but Selenium with Python can automate browsers to execute JavaScript and interact with pages like a user. In this tutorial, you'll learn to build scrapers that collect clean, structured data from even the most complex websites.

Dominykas Niaura

Jul 30, 2025

10 min read

Set up your environment

Let’s set up a clean, isolated Python environment for web scraping with Selenium. You'll install the necessary tools, create a virtual environment, and verify everything works properly.

Prerequisites

Make sure you have the following installed:

  • Python 3.9 or later. Download it from the official website and follow the instructions for your OS.
  • A modern browser. Ensure you have at least one of these browsers installed: Chrome, Firefox, or Edge. Modern Selenium automatically manages drivers, but you need the actual browser software installed.

Create a virtual environment

Creating a virtual environment ensures that your project dependencies don’t conflict with those of other Python projects. Use Python’s built-in venv module:

python -m venv .venv

This creates a .venv folder with an isolated Python environment.

Activate the virtual environment using the appropriate command for your OS:

# macOS/Linux
source .venv/bin/activate
# Windows (Command Prompt)
.venv\Scripts\activate.bat
# Windows (PowerShell)
.venv\Scripts\Activate.ps1

Once activated, your terminal shows (.venv) – you’re now working inside the virtual environment.

Install dependencies

Install Selenium using pip:

pip install selenium

Run your first Selenium script

Now that your environment is ready, let’s build your first scraper using Selenium. This simple script will open a browser, load a webpage, and extract the full HTML from scrapingcourse.com/ecommerce/.

Create a file named main.py in your project directory and add the following code:

from selenium import webdriver
# Initialize Chrome browser
driver = webdriver.Chrome()
# Navigate to the site
driver.get("https://www.scrapingcourse.com/ecommerce/")
# Extract the page HTML
print(driver.page_source)
# Clean up
driver.quit()

Open your terminal and run: python main.py

You’ll see a Chrome window launch, navigate to the test page, and then close.

The HTML will print directly to your terminal. Example output:

<html lang="en-US"><head>
<!-- ... -->
<title>Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com</title>
<link rel="pingback" href="https://www.scrapingcourse.com/ecommerce/xmlrpc.php">
<!-- ... -->
</html>

If you’re seeing similar output, your scraper is up and running!

Visible vs headless mode

By default, Selenium opens a visible browser window. However, in production or automation workflows, it’s common to run in headless mode, which launches the browser in the background and reduces resource usage.

Here’s how to enable it:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# Configure headless mode
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
print(driver.page_source)
driver.quit()

Locate elements with Selenium

Selenium offers multiple strategies for locating elements on web pages. Choosing the right one ensures your scraper stays reliable across site changes and content variations.

Core location methods

There are two essential methods:

Method 1: find_element()

Returns the first matching element:

product = driver.find_element(By.CLASS_NAME, "product")

This is helpful when you're targeting a unique element. If no match is found, it raises a NoSuchElementException – so wrap it in a try-except block when needed.

Method 2: find_elements()

Returns all matching elements in a list:

products = driver.find_elements(By.CLASS_NAME, "product")

This returns an empty list ([]) if no match is found. Use it when targeting multiple elements or when you want to prevent errors from missing matches.


Comparison example

Consider this HTML:

<div class="item">Gaming Laptop</div>
<div class="item">Wireless Mouse</div>
<div class="item">Mechanical Keyboard</div>

Here's the Python code:

# Single element
first_item = driver.find_element(By.CLASS_NAME, "item")
print(first_item.text) # Gaming Laptop
# Multiple elements
all_items = driver.find_elements(By.CLASS_NAME, "item")
print(f"Found {len(all_items)} items") # Found 3 items

Tip: Even when targeting one element, use find_elements() and check the list length to avoid crashes on missing elements.


Element locator strategies

By.ID – unique identifier targeting

<div id="search-container">
  <input type="text" placeholder="Search products...">
</div>

search_box = driver.find_element(By.ID, "search-container")

Best for: precise, stable targeting – IDs are unique and unlikely to change.

By.CLASS_NAME – style-based targeting

<div class="product-card">Gaming Laptop</div>
<div class="product-card">Wireless Mouse</div>
<div class="product-card">Mechanical Keyboard</div>

first_product = driver.find_element(By.CLASS_NAME, "product-card")
all_products = driver.find_elements(By.CLASS_NAME, "product-card")

Best for: reusable patterns like product listings or buttons.

By.TAG_NAME – tag-type targeting

<h1>Electronics Store</h1>
<p>Free shipping on orders over $50</p>
<p>30-day return policy</p>
<a href="/laptops">Laptops</a>
<a href="/accessories">Accessories</a>

heading = driver.find_element(By.TAG_NAME, "h1")
paragraphs = driver.find_elements(By.TAG_NAME, "p")
links = driver.find_elements(By.TAG_NAME, "a")

Best for: gathering bulk content (e.g., all links or paragraphs).

By.CSS_SELECTOR – attribute and structure targeting

<div class="product featured">
  <h2 class="product-title">Gaming Laptop</h2>
  <span class="price" data-currency="USD">$1299</span>
  <button class="btn primary">Add to Cart</button>
</div>

featured_product = driver.find_element(By.CSS_SELECTOR, ".product.featured")
buy_button = driver.find_element(By.CSS_SELECTOR, ".btn.primary")
price = driver.find_element(By.CSS_SELECTOR, "[data-currency='USD']")
title = driver.find_element(By.CSS_SELECTOR, ".product .product-title")

Best for: advanced patterns, nested selectors, and attribute-based filtering.

By.XPATH – structure-based navigation

<table>
  <tr>
    <td>Product</td>
    <td>Price</td>
    <td>Availability</td>
  </tr>
  <tr>
    <td>Gaming Laptop</td>
    <td>$1299</td>
    <td>In Stock</td>
  </tr>
</table>

laptop_row = driver.find_element(By.XPATH, "//td[text()='Gaming Laptop']")
price_cell = driver.find_element(By.XPATH, "//td[text()='Gaming Laptop']/../td[2]")
stock_status = driver.find_element(By.XPATH, "//td[contains(text(), 'Stock')]")

Best for: deeply nested or text-dependent queries.

Read our XPath vs. CSS selector guide for a deeper comparison.

By.LINK_TEXT – exact text matching

<a href="/home">Home</a>
<a href="/products">All Products</a>

home_link = driver.find_element(By.LINK_TEXT, "Home")
products_link = driver.find_element(By.LINK_TEXT, "All Products")

Note: Case-sensitive and requires an exact match of the link’s visible text.

By.PARTIAL_LINK_TEXT – flexible text matching

<a href="/cart">Add to Shopping Cart</a>
<a href="/wishlist">Add to Wishlist</a>
<a href="/compare">to Comparison</a>

cart_link = driver.find_element(By.PARTIAL_LINK_TEXT, "Shopping")
wishlist_link = driver.find_element(By.PARTIAL_LINK_TEXT, "Wishlist")
add_links = driver.find_elements(By.PARTIAL_LINK_TEXT, "Add to")

Best for: dynamic or partially known link text.


Inspect elements using DevTools

To locate the right selector:

  1. Right-click the element
  2. Click Inspect
  3. Review the HTML structure
  4. Choose the most stable and specific selector available

For example, on a demo eCommerce site, products appear inside <li> tags with a "product" class.

Learn more about how to inspect elements on any website.

Always test selectors against different page states or slight layout changes. Prefer strategies that tolerate minor DOM shifts for reliable web scraping.
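One way to tolerate such shifts is to try several locators in order of preference. Here's a minimal sketch of a hypothetical fallback helper (the name and the locator pairs are illustrative, not part of Selenium):

```python
def find_with_fallbacks(driver, locators):
    # Try each (By, selector) pair in order; return the first non-empty match
    for by, selector in locators:
        elements = driver.find_elements(by, selector)
        if elements:
            return elements
    return []

# Usage sketch (assumes an existing driver):
# products = find_with_fallbacks(driver, [
#     (By.CSS_SELECTOR, ".product-card"),
#     (By.CSS_SELECTOR, ".product"),
#     (By.TAG_NAME, "li"),
# ])
```

Because `find_elements()` returns an empty list rather than raising, the helper can cheaply probe each strategy until one matches.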

Extract product data from the page

Let’s walk through how to extract product information from a demo eCommerce site using Selenium. You’ll learn how to scrape both single and multiple products using flexible CSS selectors.

Single product walkthrough

Each product is wrapped in an <li> tag with the class "product". This consistent structure makes it easy to extract details using a shared selector.

Start by importing the By class:

from selenium.webdriver.common.by import By

Now, extract data from the first product container:

first_product = driver.find_element(By.CSS_SELECTOR, ".product")
name = first_product.find_element(By.CSS_SELECTOR, ".product-name").text
price = first_product.find_element(By.CSS_SELECTOR, ".price").text
url = first_product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
image = first_product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
print(f"Name: {name}")
print(f"Price: {price}")
print(f"URL: {url}")
print(f"Image: {image}")

Here’s the full script, including browser setup:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
first_product = driver.find_element(By.CSS_SELECTOR, ".product")
name = first_product.find_element(By.CSS_SELECTOR, ".product-name").text
price = first_product.find_element(By.CSS_SELECTOR, ".price").text
url = first_product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
image = first_product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
print(f"Name: {name}")
print(f"Price: {price}")
print(f"URL: {url}")
print(f"Image: {image}")
driver.quit()

Sample output:

Name: Abominable Hoodie
Price: $69.00
URL: https://www.scrapingcourse.com/ecommerce/product/abominable-hoodie/
Image: https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh09-blue_main.jpg

Loop through all products

Extracting one product is useful for testing selectors. But in real-world scraping, you'll typically loop through all items on the page.

All product cards use the class "product", which we can target using find_elements():

products = driver.find_elements(By.CSS_SELECTOR, ".product")

Then loop through and collect the data:

extracted_data = []
for product in products:
    name = product.find_element(By.CSS_SELECTOR, ".product-name").text
    price = product.find_element(By.CSS_SELECTOR, ".price").text
    url = product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
    image = product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
    extracted_data.append({
        "name": name,
        "price": price,
        "url": url,
        "image": image
    })

Here’s the complete script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
products = driver.find_elements(By.CSS_SELECTOR, ".product")
extracted_data = []
for product in products:
    name = product.find_element(By.CSS_SELECTOR, ".product-name").text
    price = product.find_element(By.CSS_SELECTOR, ".price").text
    url = product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
    image = product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
    extracted_data.append({
        "name": name,
        "price": price,
        "url": url,
        "image": image
    })
print(extracted_data)
driver.quit()

Your output will look like this:

[{'name': 'Abominable Hoodie',
'price': '$69.00',
'url': 'https://www.scrapingcourse.com/ecommerce/product/abominable-hoodie/',
'image': 'https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh09-blue_main.jpg'},
{'name': 'Adrienne Trek Jacket',
'price': '$57.00',
'url': 'https://www.scrapingcourse.com/ecommerce/product/adrienne-trek-jacket/',
'image': 'https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wj08-gray_main.jpg'},
# ...
]

Great! You’ve now built a fully functional product scraper using Selenium.

Handle dynamic content

Modern websites often load content asynchronously using JavaScript. Elements may appear after the page visually loads, and Selenium needs explicit handling to wait for these elements to be available.

Instead of relying on time.sleep() (which slows things down unnecessarily), use smarter wait strategies.

Use explicit waits with WebDriverWait

If a page injects elements dynamically, find_element() might fail because the content hasn't finished rendering yet. WebDriverWait helps by pausing execution until specific conditions are met.

Start by importing the required modules:

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

Wait for all products to appear:

wait = WebDriverWait(driver, 10)
try:
    products = wait.until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".product"))
    )
    print(f"Found {len(products)} products")
except TimeoutException:
    print("Products didn't load within 10 seconds")

Here is how WebDriverWait works:

  • Polling frequency – checks every 500ms by default
  • Timeout – raises a TimeoutException if the condition isn't met in time
  • Non-blocking – continues immediately once the condition is satisfied
  • Exception handling – can ignore specific exceptions during polling

Constructor breakdown:

WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None)

Here:

  • driver – your active browser session
  • timeout – max duration (in seconds) to wait
  • poll_frequency – check interval (default: 0.5s)
  • ignored_exceptions – list of exceptions to suppress (optional)

Here are the common expected_conditions:

  • presence_of_element_located
  • presence_of_all_elements_located
  • element_to_be_clickable
  • visibility_of_element_located
  • invisibility_of_element_located

Wait for full page load with JavaScript

Selenium’s driver.get() typically waits for the initial HTML, but may not wait for dynamically injected content. Use document.readyState to ensure the full page – scripts, styles, and all – has loaded.

driver.get("https://www.scrapingcourse.com/ecommerce/")
wait = WebDriverWait(driver, 10)
wait.until(
    lambda driver: driver.execute_script("return document.readyState") == "complete"
)
# Now it's safe to extract dynamic elements
products = driver.find_elements(By.CSS_SELECTOR, ".product")

What does document.readyState mean? This JavaScript property reflects the loading state of the page:

  • "loading" – the document is still parsing.
  • "interactive" – DOM is parsed, but other resources (like images) may still be loading.
  • "complete" – the entire page and sub-resources are fully loaded.

Use this when scraping:

  • Single-page apps (SPAs)
  • AJAX-heavy websites
  • Pages where the content lags after visual load

You can check out our 4-minute video on scraping dynamic websites.

Handle pagination and infinite scroll

Many modern websites use dynamic UI patterns like infinite scrolling or paginated listings to display large datasets. Scraping these pages requires more than just locating elements – it involves simulating user behavior and handling asynchronous content loads.

Scrape infinite scroll pages

Infinite scroll is common on product feeds, social media, and news aggregators. Content is appended to the DOM via JavaScript as users scroll, meaning Selenium must mimic that behavior to collect all items.

Here’s how to automate infinite scrolling:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome()
driver.get("https://www.scrapingcourse.com/infinite-scrolling")
print("Starting infinite scroll...")
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # Wait for new content
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        time.sleep(1)
        final_height = driver.execute_script("return document.body.scrollHeight")
        if final_height == last_height:
            print("No more content to load.")
            break
    last_height = new_height
    print(f"Scrolled to: {new_height}")
print("Scroll completed.")
products = driver.find_elements(By.CSS_SELECTOR, ".product-item")
print(f"Found {len(products)} products after scrolling")
driver.quit()

Why this works

  • Tracks document.body.scrollHeight to detect dynamic content injection.
  • Scrolls repeatedly until no height change is detected.
  • Uses small delays to allow content to render.
  • Stops only after confirming no new content appears.

Scrape paginated listings

Classic pagination splits content across numbered pages or a "Next" icon or button.

To scrape all data, your script needs to:

  1. Extract items on the current page.
  2. Click the navigation arrow.
  3. Wait for the new content to load.
  4. Repeat until pagination ends.

Here’s a resilient solution:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.scrapingcourse.com/ecommerce/")
total_products = 0
page_number = 1
while True:
    print(f"Processing page {page_number}...")
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product")))
    products = driver.find_elements(By.CSS_SELECTOR, ".product")
    page_product_count = len(products)
    total_products += page_product_count
    print(f"Found {page_product_count} products on page {page_number}")
    next_buttons = driver.find_elements(By.CSS_SELECTOR, ".next.page-numbers")
    if next_buttons:
        next_button = next_buttons[0]
        driver.execute_script("arguments[0].scrollIntoView();", next_button)
        reference = products[0] if products else None
        driver.execute_script("arguments[0].click();", next_button)
        if reference:
            wait.until(EC.staleness_of(reference))  # Wait for page change
        page_number += 1
    else:
        print("No more pages.")
        break
print(f"Total products scraped across {page_number} pages: {total_products}")
driver.quit()

Read our guide on pagination in web scraping to go deeper.

Capture screenshots for debugging

Visual debugging is one of the most effective ways to troubleshoot scraping issues, especially when running in headless mode, where you can’t see what the browser is rendering.

Screenshots help you quickly identify:

  • Page rendering issues
  • Missing or delayed content
  • Element visibility problems
  • JavaScript execution errors

They’re also useful for documenting production failures or layout changes.


Capture full-page screenshots

Use save_screenshot() to capture the full visible portion of the browser:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/product/bruno-compete-hoodie/")
time.sleep(2)
driver.save_screenshot("page_screenshot.png")
print("Screenshot saved: page_screenshot.png")
driver.quit()

Here’s the screenshot output:

Best for: confirming page load, checking layout, or inspecting element visibility.

Capture screenshots during key scraping steps

For complex flows, such as scrolling, pagination, or multi-step scraping, capture screenshots at different stages to track behavior and identify failures.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from datetime import datetime
import os
import time
os.makedirs("screenshots", exist_ok=True)
options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
# Screenshot after initial page load
filename = f"screenshots/01_initial_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
driver.save_screenshot(filename)
print(f"Saved: {filename}")
# Wait for products to load
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product")))
# Screenshot after content loads
filename = f"screenshots/02_loaded_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
driver.save_screenshot(filename)
# Scroll and capture
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(1)
filename = f"screenshots/03_scrolled_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
driver.save_screenshot(filename)
# Navigate into a product
driver.find_element(By.CSS_SELECTOR, ".product a").click()
time.sleep(2)
filename = f"screenshots/04_product_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
driver.save_screenshot(filename)
driver.quit()

All screenshots are saved in the screenshots/ folder with timestamps for easy tracking.
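The step-number-plus-timestamp naming used above can be factored into a small helper (screenshot_name is a hypothetical name introduced here for illustration):

```python
from datetime import datetime

def screenshot_name(step, label, folder="screenshots"):
    """Build a sortable path like screenshots/01_initial_20250730_120000.png."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{folder}/{step:02d}_{label}_{stamp}.png"

# Usage sketch: driver.save_screenshot(screenshot_name(1, "initial"))
```

The zero-padded step prefix keeps files sorted in capture order even when timestamps collide within the same second.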

Capture specific elements

Sometimes, you only want to capture a single UI component, like a product card or a pagination bar.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
product = driver.find_element(By.CSS_SELECTOR, ".product")
product.screenshot("single_product.png")
pagination = driver.find_element(By.CSS_SELECTOR, ".woocommerce-pagination")
pagination.screenshot("pagination.png")
driver.quit()

Best for: debugging layout issues on key elements or documenting UI changes.

Here’s the screenshot output:

Screenshot method reference

Here are several ways to capture screenshots in Selenium:

# Save to file
driver.save_screenshot("page.png")
driver.get_screenshot_as_file("page.png") # Equivalent
# Get binary data
png_data = driver.get_screenshot_as_png()
with open("page.png", "wb") as f:
    f.write(png_data)
# Base64 for inline use (e.g., logging or HTML embedding)
base64_data = driver.get_screenshot_as_base64()

Avoid blocks with proxies

Most websites today implement anti-bot defenses – from rate limiting and IP fingerprinting to geofencing and browser challenges. If your scraper keeps failing or returns inconsistent results, you likely need proxies.

Why proxies matter for web scraping

Proxies act as intermediaries between your scraper and the target site. They offer four essential benefits:

  • IP rotation. Avoid blocks by distributing requests across multiple IPs.
  • Authentic traffic. Residential proxies mimic real users on real devices.
  • Geo-targeting. Route traffic through specific countries, cities, or ZIPs.
  • Scalability. Unlock multi-region scraping without tripping rate limits.

Unlike datacenter IPs, residential proxies are tied to real consumer devices and ISP-assigned IPs, making them harder to detect.

Our case study shows how proxy-based scraping doubled reliability and revenue for a data team scraping at scale.

Use proxies in Selenium with Selenium Wire

The base Selenium package doesn’t support authenticated proxies out of the box. Selenium Wire adds:

  • Full proxy support (auth included)
  • Request/response inspection
  • Advanced networking control

Install it with compatible packages:

pip install selenium-wire packaging setuptools "blinker<1.8"

Configure a proxy like this:

from seleniumwire import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
proxy_options = {
    "proxy": {
        "http": "http://PROXY_USERNAME:[email protected]:7000",
        "https": "http://PROXY_USERNAME:[email protected]:7000"
    }
}
driver = webdriver.Chrome(
    seleniumwire_options=proxy_options,
    options=options
)
# Check IP using Decodo's IP echo
driver.get("https://ip.decodo.com/json")
print(driver.find_element(By.TAG_NAME, "pre").text)
driver.quit()

Use residential proxies on protected sites

Sites like Amazon, Walmart, and other major eCommerce platforms are notoriously hard to scrape without rotating, real-user IPs.

Scraping a product page without protection:

from selenium import webdriver
import time
driver = webdriver.Chrome()
driver.get("https://www.walmart.com/ip/1970766503")
time.sleep(3)
driver.quit()

This can trigger CAPTCHAs, empty or broken HTML, 403 Forbidden, or 429 Too Many Requests errors. Learn more in our guide to anti-bot systems and how to bypass them effectively.

Example CAPTCHA from Walmart when running the above code:

Scrape protected pages with proxies

Re-run the same scraper using Decodo's residential proxy network:

from seleniumwire import webdriver
proxy_options = {
    "proxy": {
        "http": "http://PROXY_USERNAME:[email protected]:7000",
        "https": "http://PROXY_USERNAME:[email protected]:7000"
    }
}
driver = webdriver.Chrome(seleniumwire_options=proxy_options)
driver.get("https://www.walmart.com/ip/1970766503")
driver.quit()

You’ll now access the real page content like below:

Scale with Decodo’s proxy network

Decodo’s residential proxies give you:

  • 115M+ ethically-sourced IPs
  • 195+ countries and regions
  • Targeting by country, city, ZIP, or ASN
  • Rotating or sticky sessions
  • <0.6s response time
  • Unlimited concurrency

You can get started in minutes with our quick start guide or test things with a free trial.

Performance optimization

Most scraping tasks don’t need to load images, videos, ads, or notifications. Blocking these unnecessary assets can dramatically improve speed and reduce bandwidth, especially on media-heavy sites like retail platforms or travel listings.

Why block images and assets? The average mobile web page is nearly 2 MB, and images account for roughly half of that. If your scraper only needs text or metadata, there's no reason to download them.

Primary target: images

Images are the largest payload on most product or blog pages. You can disable image loading using Chrome’s prefs setting:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
opts = Options()
opts.add_experimental_option("prefs", {
    "profile.managed_default_content_settings.images": 2
})
driver = webdriver.Chrome(options=opts)
driver.get("https://www.scrapingcourse.com/ecommerce/product/bruno-compete-hoodie/")
time.sleep(5)
driver.quit()

As a result, you'll skip all image downloads, often cutting the transferred page size by 1.5–3×.

You can optionally block notifications:

{
    "profile.managed_default_content_settings.notifications": 2,
}

Use eager page-load strategy

By default, Selenium waits for every resource (including fonts, ads, and iframes) before it continues. That’s often unnecessary.

Use eager mode to return control once DOMContentLoaded fires:

opts = Options()
opts.page_load_strategy = "eager"

This approach is ideal for pure data scraping that doesn't involve layout rendering, CSS checks, or visual capture. In real-world tests on large eCommerce pages, you can observe 20–50% faster load times.

When not to block assets

Blocking images and styling is a powerful optimization, but it’s not always appropriate. Skip this technique if:

  • You’re capturing screenshots or testing the visual layout.
  • Your target data includes images, videos, or UI states.
  • You’re doing UX simulations or responsive layout checks.
  • You’re actively debugging layout or JS rendering issues.

Export data to CSV

After extracting product data, the next step is to save it in a structured format. CSV is the go-to choice for loading into spreadsheets, databases, or data pipelines.

Write CSV with csv.DictWriter

Use Python’s built-in csv module to write a list of dictionaries into a clean, structured file:

import csv
with open("products.csv", "w", newline="", encoding="utf-8") as csvfile:
    fieldnames = ["name", "price", "url", "image"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for product in all_products:
        writer.writerow(product)

If your data is already structured as a list of dictionaries, use writer.writerows(all_products) for a faster bulk write.

import csv
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 10)
driver.get("https://www.scrapingcourse.com/ecommerce/")

all_products = []
while True:
    print("Scraping page...")
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product")))
    products = driver.find_elements(By.CSS_SELECTOR, ".product")
    for product in products:
        name = product.find_element(By.CSS_SELECTOR, ".product-name").text
        price = product.find_element(By.CSS_SELECTOR, ".price").text
        url = product.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
        image = product.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
        all_products.append({
            "name": name,
            "price": price,
            "url": url,
            "image": image
        })
    next_buttons = driver.find_elements(By.CSS_SELECTOR, ".next.page-numbers")
    if not next_buttons:
        break
    reference = products[0] if products else None
    driver.execute_script("arguments[0].click();", next_buttons[0])
    if reference:
        wait.until(EC.staleness_of(reference))  # Wait for the old page to unload

# Save to CSV
with open("products.csv", "w", newline="", encoding="utf-8") as csvfile:
    fieldnames = ["name", "price", "url", "image"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(all_products)

print(f"Total products scraped: {len(all_products)}")
driver.quit()

The script above scrapes every product across the paginated pages, then writes all results to products.csv in one go.

Best practices and next steps

By now, you have a working Selenium scraper that can handle dynamic content. To wrap up, here are some best practices and suggestions for expanding your web scraping skills:

  • Respect website policies. Always check a site's robots.txt and terms of service. Ethical scraping means not overloading servers with excessive requests and respecting usage policies.
  • Avoid detection. Use rotating proxies, randomize your User-Agent string, and introduce small, human-like delays between interactions. Selenium’s default speed is fast and robotic – slowing it down mimics organic behavior and reduces the risk of being blocked.
  • Handle CAPTCHAs. Many sites use CAPTCHAs to deter scraping. Solving these challenges often requires third-party services or AI/ML solutions – consider if the data justifies the cost. The ultimate solution is comprehensive APIs that handle anti-bot measures automatically.
  • Scale carefully. Scraping thousands of pages can be resource-heavy. Selenium isn’t always the most efficient choice at scale. For larger tasks, consider headless browsers or distributed scraping frameworks like Playwright, Puppeteer (for Node.js), or even Scrapy for pure crawling workflows.
  • Consider APIs and specialized services. Sometimes the best approach is to use an official API rather than scraping. Many websites offer APIs that are safer and more reliable than scraping. Additionally, companies provide scraping API services for various use cases, including social media, search engines, and eCommerce, that handle complexity and anti-bot measures for you.
  • Learn and iterate. Web scraping combines technical skill with problem-solving. Each website presents unique challenges. Stay current with new tools and techniques – for example, libraries like SeleniumBase offer higher-level wrappers around Selenium, which might suit some projects better.

Wrapping up

We hope this tutorial helped you better understand how to target and extract data from JavaScript-heavy websites using Selenium with Python. Mastering this skill is essential for collecting data from pages that rely on client-side rendering.

Don’t forget – pairing Selenium with reliable residential proxies or a Web Scraping API ensures your data extraction remains smooth, stable, and undetected. Whether you’re getting started or refining a mature pipeline, this setup gives you the flexibility and power to scale any web data project.

About the author

Dominykas Niaura

Technical Copywriter

Dominykas brings a unique blend of philosophical insight and technical expertise to his writing. Starting his career as a film critic and music industry copywriter, he's now an expert in making complex proxy and web scraping concepts accessible to everyone.


Connect with Dominykas via LinkedIn

All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

A Complete Guide to Web Data Parsing Using Beautiful Soup in Python

Beautiful Soup is a widely used Python library that plays a vital role in data extraction. It offers powerful tools for parsing HTML and XML documents, making it possible to extract valuable data from web pages effortlessly. This library simplifies the often complex process of dealing with the unstructured content found on the internet, allowing you to transform raw web data into a structured and usable format.

HTML document parsing plays a pivotal role in the world of information. The HTML data can be used further for data integration, analysis, and automation, covering everything from business intelligence to research and beyond. The web is a massive place full of valuable information; therefore, in this guide, we’ll employ various tools and scripts to explore the vast seas and teach them to bring back all the data.

Zilvinas Tamulis

Nov 16, 2023

14 min read

Scraping Amazon Product Data Using Python: Step-by-Step Guide

This comprehensive guide will teach you how to scrape Amazon product data using Python. Whether you’re an eCommerce professional, researcher, or developer, you’ll learn to create a solution to extract valuable insights from Amazon’s marketplace. By following this guide, you’ll acquire practical knowledge on setting up your scraping environment, overcoming common challenges, and efficiently collecting the needed data.

Zilvinas Tamulis

Mar 27, 2025

15 min read

Beautiful Soup Web Scraping: How to Parse Scraped HTML with Python

Web scraping with Python is a powerful technique for extracting valuable data from the web, enabling automation, analysis, and integration across various domains. Using libraries like Beautiful Soup and Requests, developers can efficiently parse HTML and XML documents, transforming unstructured web data into structured formats for further use. This guide explores essential tools and techniques to navigate the vast web and extract meaningful insights effortlessly.

Zilvinas Tamulis

Mar 25, 2025

14 min read

🐍 Python Web Scraping: In-Depth Guide 2025

Welcome to 2025, the year of the snake – and what better way to celebrate than by mastering Python, the ultimate "snake" in the tech world! If you’re new to web scraping, don’t worry – this guide starts from the basics, guiding you step-by-step on collecting data from websites. Whether you’re curious about automating simple tasks or diving into more significant projects, Python makes it easy and fun to start. Let’s slither into the world of web scraping and see how powerful this tool can be!

Zilvinas Tamulis

Feb 28, 2025

15 min read

Frequently asked questions

What is web scraping?

Web scraping is a method to gather public data from websites. With a dedicated API (Application Programming Interface), you can automatically fetch web pages to retrieve the entire HTML code or specific data points.

At Decodo, we offer Web Scraping API for social media platforms, search engine result pages, online marketplaces, and various other websites.

But if you’re already set with a web scraping tool for your project, don’t forget to equip it with residential proxies for ultimate success.
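
For a sense of what "fetching a web page to retrieve the entire HTML code" looks like without any extra tooling, here’s a minimal standard-library sketch. A real project would more likely use the requests library or a scraping API:

```python
import urllib.request  # standard library, so the sketch needs no extra installs

def fetch_html(url, timeout=10):
    """Fetch a page and return its full HTML as a string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Note that this retrieves only the initial HTML – anything rendered later by JavaScript won’t appear, which is exactly the gap Selenium fills.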

What is Selenium web scraping and how does it work?

Selenium web scraping uses browser automation to extract data from dynamic websites. Unlike traditional tools that use HTTP requests, Selenium launches a real browser and mimics user behavior. It can execute JavaScript, interact with forms and buttons, handle authentication flows, and render single-page applications, making it ideal for modern JavaScript-heavy websites.

Is Selenium good for web scraping compared to other tools?

Selenium is powerful for JavaScript-heavy sites but has trade-offs. It's slower than HTTP-based tools since it loads full browser instances, uses more resources, and requires a complex setup. However, for sites that rely heavily on JavaScript for content rendering, Selenium is often the only viable option for reliable data extraction.

How do I avoid detection and blocking when using Selenium for web scraping?

To avoid detection, rotate user-agent strings, use residential proxies, and add human-like delays between interactions. Respect rate limits, manage sessions with proper cookies/headers, and use headless mode cautiously since many sites detect it. For maximum protection, consider switching to managed web scraping APIs that handle anti-bot challenges automatically.

Can Selenium handle JavaScript-rendered or dynamic content?

Yes, this is Selenium's main advantage. Since it controls a real browser, it executes all JavaScript just like your browser would. Content loaded via AJAX or built by frameworks like React, Vue, or Angular will be fully rendered before you access it, making Selenium invaluable for modern web applications.

What are the use cases of web scraping?

Some of the most common web scraping use cases include competitor analysis, market research, trend analysis, lead generation, pricing strategies, content and news monitoring, data analysis, and real estate market analysis.

What is parsing?

Data parsing is turning raw, hard-to-read data into a well-structured format. One example of parsing would be turning HTML into JSON, CSV, a chart, or a table. Read more about parsing and its use cases in our blog.
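
As a tiny illustration of parsing HTML into JSON, here’s a standard-library sketch; the HTML fragment is made up, and a real project would typically reach for Beautiful Soup instead:

```python
import json
from html.parser import HTMLParser

class ListItemParser(HTMLParser):
    """Collect the text of every <li> element into a list."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False

    def handle_data(self, data):
        if self.in_li and data.strip():
            self.items.append(data.strip())

# A made-up fragment of raw HTML...
fragment = "<ul><li>Shirt - $25</li><li>Hat - $10</li></ul>"
parser = ListItemParser()
parser.feed(fragment)
# ...parsed into structured JSON:
print(json.dumps({"products": parser.items}))
# → {"products": ["Shirt - $25", "Hat - $10"]}
```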

© 2018-2025 decodo.com. All Rights Reserved