
How to Scrape Images From Any Website With Python

If you need a bunch of images and the thought of saving them one by one already feels tedious, you're not alone. This can be especially draining when you're preparing a dataset for a machine learning project. The good news is that web scraping makes the whole process faster and far more manageable by letting you collect large quantities of images in just a few steps. In this blog post, we'll walk you through a straightforward way to grab images from a static website. We'll use Python, a few handy libraries, and proxies to keep things running smoothly.

Dominykas Niaura

Nov 20, 2025

10 min read


How Python image scraping works

Before scraping images, it helps to understand what's actually happening under the hood. In most cases, the workflow looks like this:

  • Access the target page using an HTTP request.
  • Parse the HTML to extract image URLs.
  • Download the images to your machine for later use.

This flow is straightforward when a site serves the same HTML to every visitor. Things get trickier when parts of the page are generated by JavaScript, which brings us to an important distinction.

Static vs. dynamic websites

A static website delivers fixed HTML. What you see is exactly what's stored on the server, and everyone else sees the same thing. This makes static sites ideal for scraping as the HTML already contains all the image URLs you need.

A dynamic website generates content on the fly. The server or client-side JavaScript tailors the page to each visitor, often based on factors such as account data, browsing history, location, or real-time information like weather or stock updates. Dynamic websites require a different approach, as these pages may not expose image URLs in the initial HTML, which means you'll need tools that can fully render the page before you extract anything.

Both types are common, and knowing the difference early will help you choose the right approach for your scraper.

Determining whether a website is static or dynamic

A quick way to spot a dynamic site is when it greets you with personalised touches – anything from "Welcome back" to reminders about items you viewed earlier. That kind of tailored behaviour signals that the page is being generated on the fly rather than served as fixed HTML.

More generally, you can look at a few simple indicators. Static sites tend to deliver the same unchanging content to every visitor, usually stored as straightforward HTML files. Dynamic sites mix in elements like user logins, personalised recommendations, search-driven results, or forms that update based on what you enter. You might also notice differences in how URLs behave: static pages usually keep the same address, while dynamic pages often generate new query parameters or change the URL as you interact with them.

It also helps to consider the nature of the content itself. Pages that update frequently (such as weather services, news feeds, or stock information) are almost always dynamic, since the data is pulled from a database or API each time. In contrast, static sites only change when someone updates them manually.

Choosing the right tools for Python image scraping

Python gives you several ways to fetch and process images. Each tool has its own strengths, so picking the right one depends on how the website behaves. Let's review a few popular libraries and when to use them.

Requests + Beautiful Soup

This pairing is usually the most efficient choice for scraping static websites. Requests fetches the page's HTML, and Beautiful Soup makes it easy to navigate that HTML and pull out image URLs directly. The process is quick, lightweight, and ideal when the content you need is already present in the initial source code without any JavaScript manipulation.
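To make that concrete, here's a minimal sketch (the URL is a placeholder) that fetches a page and prints every image source it finds:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com")
soup = BeautifulSoup(response.text, "html.parser")
for img in soup.find_all("img"):
    src = img.get("src")
    if src:
        print(src)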

urllib

If you want to keep dependencies to a minimum, urllib can handle both page requests and file downloads using only Python's standard library. It's not as streamlined as Requests and Beautiful Soup, but it gets the job done when you need a simple way to access a page and save images without bringing in additional packages.
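For illustration, here's a standard-library-only sketch that collects <img> sources with html.parser and downloads them with urlretrieve (the URL is a placeholder):

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

class ImgParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

page_url = "https://example.com"  # placeholder target
html = urlopen(page_url).read().decode("utf-8", errors="ignore")
parser = ImgParser()
parser.feed(html)

for src in parser.sources:
    full_url = urljoin(page_url, src)  # resolve relative paths
    urlretrieve(full_url, full_url.split("/")[-1] or "image")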

Selenium

Dynamic or JavaScript-driven websites often load images only after scripts run or user actions take place. In these cases, Selenium is the most reliable solution. It automates a real browser environment, allowing the page to fully render before you extract image URLs. This makes it suitable for more complex scraping tasks where requests alone won't reveal the necessary content.
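As a quick sketch (assuming Chrome and Selenium 4+ are installed; Selenium Manager fetches the driver automatically, and the URL is a placeholder), listing rendered image sources looks like this:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder target
driver.implicitly_wait(10)  # give scripts time to inject images
for img in driver.find_elements(By.TAG_NAME, "img"):
    src = img.get_attribute("src")
    if src:
        print(src)
driver.quit()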

Playwright

Playwright is a more modern alternative for browser automation and is particularly handy for dynamic, JavaScript-heavy pages. Like Selenium, it controls a real browser, but it offers faster execution, built-in support for headless browsing, and a cleaner API for tasks such as waiting for network activity, handling multiple pages, and working with authenticated proxies. If you're starting from scratch or care about reliability and developer experience, Playwright is often the better choice for scraping images from dynamic sites.
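Here's a tiny sketch of that workflow (sync API; the URL is a placeholder) that waits for network activity to settle before reading image sources:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com", wait_until="networkidle")
    for img in page.query_selector_all("img"):
        print(img.get_attribute("src"))
    browser.close()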

Pillow

Pillow isn't a scraping tool, but it's useful once your images are downloaded. You can use it to resize, convert formats, inspect dimensions, or make other adjustments before storing or processing the files further. It's entirely optional, yet helpful if your workflow involves preparing images for datasets, machine learning models, or further analysis.
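For example, a short sketch (assuming pip install pillow and a local file named sample.jpg) that inspects and normalizes a downloaded image:

from PIL import Image

with Image.open("sample.jpg") as img:
    print(img.format, img.size, img.mode)
    img.thumbnail((512, 512))  # downscale in place, keeping aspect ratio
    img.convert("RGB").save("sample_512.png")  # convert and save as PNG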

What you need for a simple Python image scraper

Let's build two scripts using Python: one for scraping images from static websites and another for collecting them from dynamic pages. But before running the scripts, you'll need a few essentials to get started. Here's what to have ready:

  • Python 3.7 or higher. Make sure Python is installed on your system. You can download it from the official Python website. To verify installation, open your terminal and run:
python --version
  • A text editor or IDE. You'll need somewhere to write and run your code. Visual Studio Code, PyCharm, or even a simple text editor paired with your system's terminal works fine.
  • Requests. This library sends HTTP requests to fetch web pages. It's lightweight and perfect for grabbing HTML from static sites.
  • Beautiful Soup. Once you have the HTML, Beautiful Soup parses it and lets you extract specific elements like image tags. It's the go-to tool for navigating HTML structure.
  • Playwright. For dynamic websites that load content with JavaScript, Playwright automates a real browser session. It renders pages fully before you scrape them, ensuring you capture images that wouldn't show up in raw HTML.

Install these three libraries with a single command:

pip install requests beautifulsoup4 playwright

After installing Playwright, you'll also need to download the browser binaries it uses:

playwright install
  • Proxies. Both scripts use residential proxies to mask your IP address and avoid getting blocked when scraping multiple pages. Without proxies, websites may detect your automated activity and limit access. Residential or rotating proxies work best for this.

Decodo offers residential proxies with a 99.86% success rate, average response times under 0.6 seconds, and a 3-day free trial. Here's how to get started:

  1. Create an account on the Decodo dashboard.
  2. On the left panel, select Residential proxies.
  3. Choose a subscription, Pay As You Go plan, or claim a 3-day free trial.
  4. In the Proxy setup tab, configure your location and session preferences.
  5. Copy your proxy credentials for integration into your scraping script.

Get residential proxies

Claim your 3-day free trial of residential proxies and explore full features with unrestricted access.

How to scrape images from static websites

We'll begin with a basic scraper designed for pages that openly display their images without any scripting involved. This approach uses lightweight tools like Requests to fetch the page and Beautiful Soup to parse it, making it a clean, beginner-friendly starting point. Since nothing needs to load dynamically, it's a straightforward way to learn how image extraction works.

Inspecting the target website

Static sites are straightforward because everything is already baked into the page's HTML. To confirm this, you can open the site in your browser, right-click an image, and inspect its <img> tag using the developer tools. Check if it includes attributes like src or srcset, and whether the displayed image uses a full URL or a relative path. With static pages, what you see in DevTools is exactly what you'll scrape.

Extracting image URLs

The static script sends a simple GET request to the website, then feeds its HTML into a parser that searches specifically for <img> tags. Each tag contains attributes that point to where the image file is hosted. The script loops over each of these tags and collects their URLs. Nothing has to load dynamically, so there's no need to render the page or trigger JavaScript – the HTML source already tells us everything.

Downloading the images

After collecting the URLs, the script moves through them one by one and downloads each image to your machine. It builds the file name from the URL, makes another HTTP request to fetch the image data, and stores it in a local folder. If the site uses relative paths, the script combines them with the main website address to ensure every image link becomes valid. The end result is a folder full of images taken directly from the static page.

Handling common issues

Even static pages can throw curveballs. Some servers reject automated requests unless a User-Agent header is sent, so adding a short browser-like identifier can help avoid 4xx errors. If the block is based on your IP address rather than your headers, residential proxies are the better fix. Other things to keep in mind include broken image tags, duplicate image names, and missing URLs that need to be skipped gracefully. These checks are light and usually enough to scrape successfully as long as the site isn't actively blocking bots.
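A minimal example of sending a browser-like User-Agent with Requests (the header string and URL are placeholders):

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
response = requests.get("https://example.com", headers=headers, timeout=15)
print(response.status_code)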

The full image scraper code for static websites

Save the following code as a .py file and run it in your terminal or preferred IDE. Add your proxy details and paste in the target URL you want to scrape, then start the script. It will scan the site, extract every image source it finds, and print those URLs directly in your terminal. From there, it automatically downloads each file and saves it to a local folder, giving you a clean collection of images ready to use.

from bs4 import BeautifulSoup
from urllib.parse import urljoin
import os
import requests

page_url = "https://help.decodo.com/docs/how-do-i-use-proxies"
proxy = "http://YOUR_PROXY_USERNAME:[email protected]:7000"
proxies = {"http": proxy, "https": proxy}
output_dir = "images"

os.makedirs(output_dir, exist_ok=True)
response = requests.get(page_url, proxies=proxies)
soup = BeautifulSoup(response.text, "html.parser")

for img in soup.find_all("img"):
    src = img.get("src")
    if not src:
        continue
    # Resolve relative paths against the page URL
    img_url = urljoin(page_url, src)
    print(img_url)
    name = img_url.split("/")[-1] or "image"
    img_response = requests.get(img_url, proxies=proxies)
    with open(os.path.join(output_dir, name), "wb") as file:
        file.write(img_response.content)

How to scrape images from dynamic websites

For sites that rely on JavaScript, scraping requires a bit more muscle. Images often appear only after the page has fully rendered, so a standard HTTP request won't be enough. Here we'll use the browser automation library Playwright to load the content, interact when needed, and then extract the image sources just as a user would see them.

Inspecting the target website

Dynamic pages often disguise their images behind JavaScript. Opening DevTools allows you to see that <img> tags only appear after the page loads fully, or that the source field doesn't contain a conventional URL. It's common to see base64-encoded blobs in the src attribute, meaning the image is embedded inside the page instead of being hosted separately. Inspecting the page reveals whether the script must wait for elements to appear or decode the image manually.

Extracting image URLs

Unlike static sites, you can't simply download the HTML and parse it. Instead, the dynamic script launches a browser session in the background, visits the page, waits for it to render, and scrolls so that lazy-loaded content appears. Once the page is ready, it queries all <img> elements, just as a user would see them. Some images expose normal URLs, while others embed base64 strings directly in the document, so the script separates the two and keeps only the network-hosted URLs for downloading.

Downloading images

For network-hosted images, the script makes an HTTP request and saves each file to a local folder. Embedded base64 images require a different approach: instead of making a network request, you decode the base64 data directly and write it to disk as a binary file. The full script below skips data: URIs for simplicity, but the sketch that follows shows how you could handle them.
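A hedged helper for that case: it takes a data: URI string (for example, one read from an img element's src attribute) and writes the decoded bytes to disk:

import base64

def save_data_uri(data_uri, path_stem):
    # Split "data:image/png;base64,<payload>" into header and payload
    header, encoded = data_uri.split(",", 1)
    extension = header.split("/")[1].split(";")[0]  # e.g. "png"
    with open(f"{path_stem}.{extension}", "wb") as f:
        f.write(base64.b64decode(encoded))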

Handling common issues

Dynamic sites come with their own challenges. Because they rely on JavaScript, the script may need to wait longer for images to load, retry rendering, or pause for network requests to finish. Some pages use anti-bot logic, so scraping through a proxy or using realistic browser headers helps blend in with normal traffic. Missing or broken image sources and repeated filenames are also common, so the script checks for duplicates and skips unusable entries. With these safeguards, even heavily scripted pages become scrape-friendly.

The full image scraper code for dynamic websites

Save the script as a .py file, add your proxy details and target URL, and run it from your terminal or IDE. This script opens the page in a headless browser, waits for the content to render, and then collects every image source it finds. During execution, the script reports its progress – first showing the page it's loading, then how many images were detected, and finally printing each downloaded filename as it's saved. When the job finishes, it confirms how many files were successfully stored and where they can be found on your computer.

from playwright.sync_api import sync_playwright
import requests
from urllib.parse import urljoin
import os

target_url = "https://www.bbc.co.uk/news/topics/c2vdnvdg6xxt"
output_dir = "images"
proxy = "http://YOUR_PROXY_USERNAME:[email protected]:7000"

os.makedirs(output_dir, exist_ok=True)

# Split the proxy string into the pieces Playwright expects
credentials, server = proxy.split("//")[1].split("@")
username, password = credentials.split(":", 1)

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={"server": server, "username": username, "password": password},
    )
    page = browser.new_page()
    print(f"Loading: {target_url}")
    page.goto(target_url, wait_until="domcontentloaded", timeout=60000)
    page.wait_for_timeout(5000)

    # Scroll in steps so lazy-loaded images appear
    for _ in range(5):
        page.evaluate("window.scrollBy(0, window.innerHeight)")
        page.wait_for_timeout(1500)

    image_urls = []
    for img in page.query_selector_all("img"):
        src = img.get_attribute("src") or img.get_attribute("data-src")
        if src and not src.startswith("data:"):
            full_url = urljoin(target_url, src)
            if full_url.startswith("http"):
                image_urls.append(full_url)
    browser.close()

print(f"Found {len(image_urls)} images\n")

downloaded = 0
for index, img_url in enumerate(image_urls):
    try:
        name = img_url.split("/")[-1].split("?")[0] or f"image_{index}.jpg"
        img_response = requests.get(img_url, proxies={"http": proxy, "https": proxy}, timeout=15)
        # Skip tiny responses such as tracking pixels or error pages
        if len(img_response.content) > 1000:
            with open(os.path.join(output_dir, name), "wb") as file:
                file.write(img_response.content)
            downloaded += 1
            print(f"Downloaded: {name}")
    except (requests.RequestException, OSError):
        continue

print(f"\nCompleted: {downloaded} images saved to file://{os.path.abspath(output_dir)}")

Advanced tips and best practices

Once you have a basic scraper working, you can make it more powerful and easier to maintain by adding a few extra features and safeguards.

Saving your scraped data to a CSV file or database helps you keep track of what you have collected and reuse the data later. Instead of only downloading images, you can store each image URL along with fields such as the page it came from, a timestamp, and any metadata you captured. For simple projects, a CSV file is often enough. For larger workflows or integrations with other tools, pushing records into a database makes filtering and querying much easier.
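As a minimal sketch (the column names are illustrative, not a fixed schema):

import csv
from datetime import datetime, timezone

rows = [{
    "page": "https://example.com",
    "image_url": "https://example.com/a.jpg",
    "file": "a.jpg",
    "scraped_at": datetime.now(timezone.utc).isoformat(),
}]
with open("images.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["page", "image_url", "file", "scraped_at"])
    writer.writeheader()
    writer.writerows(rows)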

Scraping image metadata can be just as valuable as the images themselves. While you are already looping through <img> elements, you can also read attributes like alt, title, dimensions, or custom data attributes. If the page includes captions, photographer names, tags, or category labels near the image, these can be collected too by inspecting neighbouring elements in the HTML. This is especially useful when you are building datasets for machine learning or want better search and filtering later on.
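For instance, a Beautiful Soup sketch that collects attributes alongside a caption (the figure/figcaption lookup assumes that markup pattern; adjust it to your target site, and the URL is a placeholder):

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://example.com").text, "html.parser")
for img in soup.find_all("img"):
    record = {
        "src": img.get("src"),
        "alt": img.get("alt", ""),
        "title": img.get("title", ""),
    }
    figure = img.find_parent("figure")
    caption = figure.find("figcaption") if figure else None
    if caption:
        record["caption"] = caption.get_text(strip=True)
    print(record)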

Many modern sites rely on infinite scroll or lazy loading, which means new images only appear after you scroll or interact with the page. Browser automation tools let you simulate these actions by scrolling in steps, clicking Load more buttons, or waiting for new elements to appear. In practice, the script performs the same actions a user would, then repeats the extraction logic on the newly loaded content.
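A hedged Playwright helper along those lines: pass in the page object from the dynamic script above, and note that the button selector is a placeholder to adjust for your site:

def click_load_more(page, max_clicks=10):
    for _ in range(max_clicks):
        button = page.query_selector("button:has-text('Load more')")
        if not button:
            break  # no button left, stop clicking
        button.click()
        page.wait_for_timeout(1500)  # let new images render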

To be a responsible scraper, it is important to add rate limiting and avoid overwhelming the website. Introducing small delays between requests, limiting the number of pages you fetch in a single run, and reusing existing connections reduces load on the server and makes your scraper less likely to be blocked. Using realistic headers and not hammering endpoints with rapid-fire requests goes a long way.
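For example, reusing one Session and adding a small randomized delay between downloads (the URL list is a placeholder):

import random
import time
import requests

session = requests.Session()  # reuses connections across requests
image_urls = ["https://example.com/a.jpg"]  # placeholder list
for img_url in image_urls:
    response = session.get(img_url, timeout=15)
    with open(img_url.split("/")[-1], "wb") as f:
        f.write(response.content)
    time.sleep(random.uniform(1, 3))  # polite, randomized pause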

Finally, solid error handling and logging turn a fragile script into a reliable tool. Network timeouts, missing attributes, redirects, and unexpected HTML changes are all common. Wrapping your requests and parsing logic in try-except blocks, logging failed URLs, and printing clear messages when something goes wrong will help you debug quickly. Over time, these logs become a useful record of how your scraper behaves and which parts of the site are more prone to issues.
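A small defensive-download sketch with standard-library logging:

import logging
import requests

logging.basicConfig(filename="scraper.log", level=logging.INFO)

def download(url, path):
    try:
        response = requests.get(url, timeout=15)
        response.raise_for_status()  # surface 4xx/5xx as exceptions
        with open(path, "wb") as f:
            f.write(response.content)
        return True
    except requests.RequestException as exc:
        logging.warning("Failed %s: %s", url, exc)
        return False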

Sample projects and use cases

Image scraping is useful far beyond simple downloading. Once you start gathering images at scale, the same techniques can power a wide range of practical projects.

One common application is building datasets for machine learning. Many computer vision models rely on large, well-organized image collections that include not just files, but also metadata such as labels, descriptions, or categories. By scraping thousands of visuals from curated sites and pairing them with tags or alt text, you can quickly assemble training data for tasks like object recognition, style detection, or recommendation systems.

In marketing research, image scraping helps teams track trends, competitor branding, visual ad themes, or product design developments. Collecting images from landing pages, eCommerce listings, or social campaigns makes it easier to analyze how brands present themselves, what styles get reused across industries, or how visual messaging changes over time. This data can be turned into insights for brand strategy, creative direction, or product positioning.

Content aggregation is another practical use case. Publishers and community platforms can pull visuals from multiple sources and curate them into galleries, feeds, newsletters, or inspiration boards. When combined with metadata such as titles, authors, or categories, it becomes possible to create searchable archives or automatically update content streams. In these contexts, scraped images become more than raw files – they form structured collections that support new products, discovery tools, and editorial work.

On a final note

Scraping images with Python can be as simple or as advanced as the website demands. Static pages let you fetch visuals quickly with tools like Requests and Beautiful Soup, while dynamic, JavaScript-driven platforms call for browser automation with Playwright. Once you learn to inspect pages, extract sources, and save the files, you become capable of gathering visual data efficiently and adapting your approach to virtually any site.

From here, the real value lies in how you use the images you've collected. You might build datasets for machine learning, analyze visuals for marketing or research, or organize them into searchable archives. As your needs grow, you can store metadata in databases, automate repeat scraping, or scale with proxies and cloud tools – turning a simple script into a powerful, reusable workflow.


Scrape smarter with proxies

Boost your scraper with Decodo’s residential proxies and capture images with outstanding success.

About the author

Dominykas Niaura

Technical Copywriter

Dominykas brings a unique blend of philosophical insight and technical expertise to his writing. Starting his career as a film critic and music industry copywriter, he's now an expert in making complex proxy and web scraping concepts accessible to everyone.


Connect with Dominykas via LinkedIn

All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

What are the best Python libraries for scraping images from websites?

For most static sites, the usual starting point is Requests for HTTP requests and Beautiful Soup for parsing HTML and extracting image tags. urllib can also download files using only the standard library, though it is a bit less convenient. For JavaScript-heavy pages, browser automation tools such as Playwright or Selenium are more suitable because they can render the page before scraping. If you need to process images after downloading them, libraries like Pillow are helpful for resizing, format conversion, or inspection.

How do I scrape images from websites that use JavaScript to load content?

For JavaScript-driven sites, you generally need to simulate a real browser rather than just fetching raw HTML. Tools like Playwright or Selenium load the page, run its scripts, and render all dynamic elements, including images. Once the page is fully loaded, you can select <img> elements, read their src or srcset attributes, and download the images. Sometimes you also need to scroll, click Load more buttons, or wait for specific elements to appear.

Why do I get a 403 error when trying to download images, and how can I fix it?

A 403 error usually means the server is refusing your request, often because it detects non-browser traffic or missing headers. You can often fix this by setting a realistic User-Agent header, reusing cookies or session headers from a normal browser visit, or respecting rate limits. In some cases, you may also need to use HTTPS correctly, handle redirects, or ensure your IP isn't blocked. If the site has strict anti-bot protection, more advanced techniques, such as employing proxies, may be needed.

Is it possible to scrape image metadata (like alt text, captions, or author information)?

Yes, you can often scrape metadata while you are already looping through image elements. Attributes such as alt, title, and srcset can be read directly from the <img> tag. Captions, photographer names, or tags may be stored in nearby HTML elements, so you can navigate the DOM around the image to capture that context. Storing this metadata alongside the image URL makes your dataset more useful for search, analysis, or machine learning.

What should I do if the website structure changes and my scraper breaks?

When a site changes its layout or HTML structure, selectors that used to work may suddenly fail. The first step is to reopen the page in DevTools, inspect the new markup, and update your selectors accordingly. Building your scraper with clear functions, logging, and minimal hardcoding makes these adjustments easier. For frequently changing sites, consider writing more flexible selectors or adding tests that alert you when parsing starts failing.

Can I use proxies to avoid getting blocked while scraping images?

Yes, proxies are commonly used for a more reliable scraping experience. They help distribute requests across different IP addresses and reduce the chance of being blocked. Make sure to get proxies from a reliable provider. But even then, it's still important to respect rate limits, use realistic headers, and avoid aggressive request patterns.

Is it legal to scrape images for personal or commercial use?

The legality of web scraping depends on several factors, including the website’s terms of service, copyright rules in your jurisdiction, and how you plan to use the images. In many cases, scraping may be technically possible but still restricted by terms or copyright, especially for commercial reuse or redistribution. It's best to review the site’s terms, check any licensing information provided with the images, and seek legal advice if you plan to use scraped content commercially.

