Back to blog

Python Cloudscraper: Bypass Cloudflare Protection, Configure Proxies, and Handle Common Errors

Share article:

Most Python scrapers that use Requests stop working as soon as a site is protected by Cloudflare. You might see a 403 error, get stuck in a redirect loop, or land on a "Just a moment..." page that never loads. Cloudscraper solves this problem without needing a headless browser. It builds on Requests, handles Cloudflare's JavaScript challenges, and gives you a working session. This guide explains how to set up Cloudscraper, configure proxies, choose an interpreter, handle CAPTCHAs, parse data, fix common errors, and understand the library's limitations. If you're new to Python scraping, start with the Python web scraping guide first.

Using Python Cloudscraper

TL;DR

  • Cloudscraper is built on top of the Requests library and uses almost the same API. Just swap requests.get() for scraper.get(), and you are nearly done
  • Interpreter choice matters more than most settings: Node.js is the most reliable option; js2py (the default) increasingly fails on newer Cloudflare challenge formats
  • Cloudflare blocks data center IPs before Cloudscraper can even attempt to solve the challenge. Using residential proxies removes the strongest signal that you are a bot
  • Cloudscraper cannot handle Cloudflare Turnstile, Bot Fight Mode, or sites that use client-side rendering like SPAs. Be aware of these limits before you start building

What is the Cloudscraper Python library and how does it work?

Cloudscraper is a Python library that wraps Requests and adds Cloudflare challenge-solving logic. Call cloudscraper.create_scraper(), and you get back a Requests object.Session compatible object. Every method you already use, like scraper.get()scraper.post()scraper.put() works the same way. If the target site isn't behind Cloudflare, the library behaves like a plain Requests session with no overhead.

When it does hit a Cloudflare-protected site, here's what happens under the hood:

  • Cloudscraper detects the "Just a moment..." challenge page in the response
  • It parses the obfuscated JavaScript embedded in that page
  • It executes the math using a configurable interpreter, js2py by default, or Node.js if installed
  • It submits the computed answer back to Cloudflare and stores the resulting cf_clearance cookie for all subsequent requests

That cookie is what keeps the session alive. As long as the scraper instance persists, Cloudscraper reuses it across requests without re-solving the challenge.

The package ecosystem in 2026

The original Cloudscraper package (VeNoMouS/cloudscraper) is still on PyPI, but maintenance has slowed since 2023. It works for older IUAM (I'm Under Attack Mode) challenges, but newer Cloudflare formats increasingly return _403_s.

If the original package stops working, there are several community forks you can try. Each one works a bit differently:

  • cloudscraper25. A direct fork of the original. Stays close to the same API while adding support for newer challenge formats and additional interpreters. The lowest-friction upgrade path
  • Hybrid-engine forks (such as ai-cloudscraper). Fall back to browser automation tools like Playwright when pure JavaScript challenge solving fails. These projects often aim to preserve a Cloudscraper-like workflow while extending it with browser-based capabilities.
  • TLS-impersonation forks. Replace Python's default networking stack with libraries such as curl_cffi or TLS-Chameleon to mimic a real browser's TLS fingerprint during the handshake. This helps reduce detection before any Cloudflare challenge is presented.

This is important. The original Cloudscraper does not support TLS fingerprinting, so Cloudflare can still detect Python's urllib3 stack regardless of the headers you use. Matching headers is no longer enough because modern anti-bot systems also evaluate TLS and HTTP/2 fingerprints before any JavaScript challenge runs.

Forks that use curl_cffi can mimic a real browser's TLS handshake, which hides one of the main signs you are using a bot before any challenge starts.

Neither the original Cloudscraper nor any of its forks can handle Cloudflare Turnstile or Bot Fight Mode. These are hard limits, not something you can fix with configuration.

When Cloudscraper is the right tool

  • Sites using mid-level Cloudflare IUAM challenges on server-rendered HTML
  • Scrapers where spinning up a headless browser adds unnecessary overhead
  • If you have a scraper built with Requests that now returns 403 errors, Cloudscraper is a quick fix. You do not need to rewrite your code

For a deeper look at the Requests library that Cloudscraper extends, see how to master Python Requests. For a broader view of Cloudflare bypass methods beyond this library, see how to use a Cloudflare scraper for data extraction.

Setting up a Python environment for Cloudscraper

Set up your environment before you start writing any scraping code. Using a clean project structure and a virtual environment helps avoid dependency issues and makes it easier to switch Cloudscraper versions if needed.

Setting up the project

Create a dedicated project directory before writing any code:

mkdir cloudscraper-project
cd cloudscraper-project

Create the main script file and a folder for saved data:

touch scraper.py
mkdir output

Your project should now look like this:

cloudscraper-project/
├── output/
└── scraper.py

All code in this guide goes in scraper.py unless otherwise stated.

Prerequisites

  • Python 3.10 or higher. Cloudscraper supports older versions, but several dependencies have dropped Python 3.7 and 3.8 support.
  • Node.js installed and on PATH; optional, but strongly recommended. It’s the most reliable JavaScript interpreter option available to Cloudscraper.

Creating a virtual environment

Create and activate a virtual environment before installing anything:

# create the environment
python -m venv .venv
# activate on macOS/Linux
source .venv/bin/activate
# activate on Windows
.\.venv\Scripts\activate

Your project directory now looks like this:

cloudscraper-project/
├── .venv/
├── output/
└── scraper.py

If you already use uv for Python project management, you can run uv venv and uv pip install cloudscraper instead. Both methods work. The key is to keep the installation isolated.

Installing Cloudscraper

With the virtual environment active, install the package:

pip install cloudscraper

Pin a specific version for reproducibility in production:

pip install "cloudscraper==1.2.71"

If the default package returns _403_s against your target in 2026, switch to a community fork:

# direct fork -- closest to the original API
pip install cloudscraper25
# hybrid engine fork -- adds curl_cffi TLS spoofing and Playwright fallback
pip install ai-cloudscraper

Install the companion libraries needed for the parsing examples later in this guide:

pip install beautifulsoup4 lxml

Your final project structure before writing any scraping code:

cloudscraper-project/
├── .venv/
├── output/
└── scraper.py

Verifying the install

Run this before pointing Cloudscraper at a real target. It confirms the package imported correctly and that outbound requests are working:

import cloudscraper
scraper = cloudscraper.create_scraper()
response = scraper.get("https://httpbin.dev/ip")
print(response.status_code)
print(response.json())

Expected output:

200
{'origin': '203.0.113.42'}

If you get a 200 and a valid IP address, the setup is working. If the import fails, check that you’re running the script with the same Python interpreter the virtual environment uses. For a full breakdown of Python HTTP client options, see best Python HTTP clients for web scraping.

Basic usage: your first Cloudscraper request

There's no single way to use Cloudscraper, but the main steps are always the same. You create a scraper instance, send a request, and check the response. Everything else builds on these basics.

The core pattern

cloudscraper.create_scraper() returns an object that behaves like a requests.Session. The API is identical:

  • scraper.get(url) fetch a page
  • scraper.post(url, data=...) submit a form or API payload
  • scraper.put(url, data=...) update a resource

Response attributes work the same way too: response.status_coderesponse.textresponse.json(), and response.cookies are all available.

Your first request

Add this to scraper.py:

import cloudscraper
scraper = cloudscraper.create_scraper()
response = scraper.get("https://quotes.toscrape.com")
print(response.status_code)
print(response.text[:500])

Run it:

python scraper.py

Expected output:

200
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Quotes to Scrape</title>
...

Cloudscraper connected successfully and returned the page. If you see a "Just a moment..." page instead, the challenge wasn't solved. Check your interpreter setup in the next section.

Persisting the session

A single scraper instance reuses cookies, including the cf_clearance cookie, across all requests. This matters when crawling multiple pages of the same site:

import cloudscraper
scraper = cloudscraper.create_scraper()
# the challenge is solved once on the first request
page_1 = scraper.get("https://quotes.toscrape.com/page/1/")
page_2 = scraper.get("https://quotes.toscrape.com/page/2/")
page_3 = scraper.get("https://quotes.toscrape.com/page/3/")
print(page_1.status_code, page_2.status_code, page_3.status_code)

Don't call create_scraper() inside a loop. Doing this throws away the cf_clearance cookie each time and makes Cloudscraper solve the challenge for every request. This is slower and more likely to get you blocked.

# wrong -- creates a new session on every request
for url in urls:
scraper = cloudscraper.create_scraper()
response = scraper.get(url)
# correct -- one session, reused across all requests
scraper = cloudscraper.create_scraper()
for url in urls:
response = scraper.get(url)

Configuring browser profiles, headers, and cookies

Cloudflare checks request signatures very closely. Your browser profile, headers, and cookies must all match. Even one mismatch can get you blocked, no matter how well you set up everything else.

The browser= argument

The browser= argument tells Cloudscraper which User-Agent and matching accept headers to generate.

Add this to scraper.py:

import cloudscraper
# chrome on windows -- desktop profile
scraper = cloudscraper.create_scraper(
browser={
"browser": "chrome",
"platform": "windows",
"desktop": True
}
)
response = scraper.get("https://quotes.toscrape.com")
print(response.status_code)

Run it:

python scraper.py

Expected output:

200

Mobile profiles are often more permissive on aggressive Cloudflare setups:

# chrome on android -- mobile profile
scraper = cloudscraper.create_scraper(
browser={
"browser": "chrome",
"platform": "android",
"desktop": False
}
)

Make sure your profile is consistent. For example, sending an iOS User-Agent with Windows Sec-CH-UA hints will quickly get you blocked. Cloudscraper fills in the right headers for you when you set the profile correctly, so don't override them manually.

Custom headers

Add or override headers for every subsequent request using scraper.headers.update(). Add this to scraper.py:

import cloudscraper
scraper = cloudscraper.create_scraper(
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
scraper.headers.update({
"Accept-Language": "en-US,en;q=0.9",
"Referer": "https://www.google.com",
})
response = scraper.get("https://quotes.toscrape.com")
print(response.status_code)

Run it:

python scraper.py

Expected output:

200

For per-request headers, useful when crawling product pages that need a specific referrer – pass them directly to the request method:

response = scraper.get(
"https://quotes.toscrape.com/page/2/",
headers={"Referer": "https://quotes.toscrape.com/page/1/"}
)

Don't manually override AcceptAccept-Encoding, or any Sec-CH-UA-* headers. The browser profile sets these automatically, and they must match. Changing them will break the fingerprint.

Cookies

Session cookies persist automatically across requests on the same scraper instance. To set cookies manually:

scraper.cookies.set("name", "value", domain=".example.com")

To check whether Cloudscraper successfully solved the challenge, inspect the cf_clearance cookie after the first request. Add this to scraper.py:

import cloudscraper
scraper = cloudscraper.create_scraper()
response = scraper.get("https://quotes.toscrape.com")
clearance = response.cookies.get("cf_clearance")
if clearance:
print(f"Challenge solved. cf_clearance: {clearance}")
else:
print("No clearance cookie -- challenge may not have been solved")

Run it:

python scraper.py

Expected output:

No clearance cookie -- challenge may not have been solved

quotes.toscrape.com isn’t behind Cloudflare, so no cf_clearance cookie is issued. On a real Cloudflare-protected target, a successful solution returns the cookie and prints the first line instead.

If you don't see a cf_clearance cookie on a protected site, the issue is usually with the interpreter, the proxy IP, or a challenge format that your Cloudscraper version doesn't support.

To reuse a solved session across runs, serialize the cookies to disk.

Add this to scraper.py:

import json
import cloudscraper
import requests
scraper = cloudscraper.create_scraper()
response = scraper.get("https://quotes.toscrape.com")
# save cookies after a successful session
cookies = requests.utils.dict_from_cookiejar(scraper.cookies)
with open("cookies.json", "w") as f:
json.dump(cookies, f)
# reload on the next run
with open("cookies.json", "r") as f:
cookies = json.load(f)
scraper.cookies.update(cookies)
print("Cookies saved and reloaded successfully")

Run it:

python scraper.py

Expected output:

Cookies saved and reloaded successfully

For readers who need to go beyond header-level masking into full browser fingerprint spoofing, see how to bypass CreepJS and spoof browser fingerprinting.

Using proxies with Cloudscraper

Setting up proxies in Cloudscraper works the same way as in Requests. However, choosing the right type of proxy is even more important when dealing with Cloudflare. The wrong proxy can get you blocked before Cloudscraper even tries to solve the challenge.

Basic proxy configuration

Pass a proxies= dict to any request method. Add this to scraper.py:

import cloudscraper
scraper = cloudscraper.create_scraper()
proxies = {
"http": "http://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
"https": "http://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
}
response = scraper.get("https://quotes.toscrape.com", proxies=proxies)
print(response.status_code)

Run it:

python scraper.py

Expected output:

200

To set proxies on the scraper instance for all subsequent requests:

import cloudscraper
scraper = cloudscraper.create_scraper()
scraper.proxies.update({
"http": "http://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
"https": "http://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
})
response = scraper.get("https://quotes.toscrape.com")
print(response.status_code)

Why proxy choice matters more on Cloudflare targets

Not all proxies carry the same risk signal. Cloudflare scores the ASN (Autonomous System Number) of the incoming IP before running any JavaScript challenge:

  • Datacenter IPs from AWS, GCP, or DigitalOcean have an ASN that Cloudflare flags as suspicious by default. Even a perfectly set up Cloudscraper session will hit challenges right away from these addresses.
  • Residential IPs come from real ISP connections and look like normal user traffic at the ASN level. They don't bypass Cloudflare by themselves, but they remove one of the strongest signals before the challenge.
  • Rotating residential pools reduce the number of requests per IP, helping avoid rate-limit challenges. Each new request comes from a different IP address, so no single address gets flagged for excessive activity.

Before hitting a real target, confirm the proxy is routing correctly. Add this to scraper.py:

import cloudscraper
scraper = cloudscraper.create_scraper()
scraper.proxies.update({
"http": "http://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
"https": "http://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
})
# verify outbound IP before hitting the real target
response = scraper.get("https://httpbin.dev/ip")
print(response.json())

Run it:

python scraper.py

Expected output:

{'origin': '185.220.101.42'}

The IP in the response should match a residential address, not a datacenter range. If it matches your local machine’s IP, the proxy isn’t routing correctly.

Pitfalls

Two proxy configuration mistakes cause most connection errors:

  • HTTPS proxy URLs. Even when you are proxying HTTPS traffic, the proxy should start with http://, not https://. Using https:// in the proxy URL causes SSL handshake errors that may look like Cloudscraper bugs.
  • SOCKS5 proxies require an extra dependency. Cloudscraper inherits this from Requests:
pip install "requests[socks]"

Then configure the proxy with the correct scheme:

proxies = {
"http": "socks5://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
"https": "socks5://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
}

For more background on why residential proxies are the right choice for Cloudflare targets, see our blog post on what residential proxies are. To get started with Decodo’s rotating residential pool directly, see Decodo residential proxies.

Enhance your scraper with proxies

Claim your 3-day free trial of residential proxies and access 115M+ ethically-sourced IPs, advanced geo-targeting options, a 99.86% success rate, and more.

Handling JavaScript challenges and interpreter options

Cloudflare’s IUAM (I’m Under Attack Mode) challenges work by returning a small JavaScript snippet that the client must execute and submit. Cloudscraper runs this JS using one of several interpreters. The interpreter is one of the most important reliability factors when solving JavaScript challenges.

The interpreter= argument

Pass interpreter= to create_scraper() to set which engine executes the challenge JavaScript. Add this to scraper.py:

import cloudscraper
# node.js -- most reliable, requires node on PATH
scraper = cloudscraper.create_scraper(interpreter="nodejs")
response = scraper.get("https://quotes.toscrape.com")
print(response.status_code)

Run it:

python scraper.py

Expected output:

200

The 3 interpreter options available in 2026:

Interpreter

External dependencies

Reliability

Use case

nodejs

Node.js on PATH, verify with node --version

High

Production setups; handles modern complex JS obfuscation

js2py

None (pure Python)

Low

Quick local scripts; frequently fails on newer challenge formats

native

None

Very low

Last resort fallback; extremely limited challenge engine scope

You can use V8 and ChakraCore bindings through PyV8 and ChakraCore, but installing them is usually not worth the trouble in 2026. Use Node.js if you can.

The delay= argument

Cloudflare expects clients to wait a few seconds before submitting the challenge answer. Submitting instantly is itself a bot signal. Add this to scraper.py:

import cloudscraper
# wait 7 seconds before submitting the challenge answer
scraper = cloudscraper.create_scraper(
interpreter="nodejs",
delay=7
)
response = scraper.get("https://quotes.toscrape.com")
print(response.status_code)

Run it:

python scraper.py

Expected output:

200

A delay of 5 to 10 seconds is usually safe. Don't add a manual time.sleep() on top, because Cloudscraper already waits for you. Adding both just makes the delay longer without any benefit.

Combining interpreter and browser profile

For the most reliable configuration, set both together. Add this to scraper.py:

import cloudscraper
scraper = cloudscraper.create_scraper(
interpreter="nodejs",
delay=7,
browser={
"browser": "chrome",
"platform": "windows",
"desktop": True
}
)
response = scraper.get("https://quotes.toscrape.com")
print(response.status_code)

Run it:

python scraper.py

Expected output:

200

Diagnosing challenge failures

Two error patterns cover most interpreter failures:

  • "Could not collect any of the parameters needed for Cloudflare IUAM JS challenge". The challenge format changed and the installed Cloudscraper version doesn’t recognize it. Upgrade to the latest version or switch to cloudscraper25.
  • Silent 403 with HTML containing cf-chl but no cf_clearance cookie set. The JS executed, but the answer was rejected. This is usually a TLS fingerprint mismatch or a proxy IP problem, not the interpreter.

Check which issue you’re dealing with before changing the interpreter. Switching interpreters won’t fix a datacenter IP problem.

Handling CAPTCHAs with third-party solvers

When Cloudflare switches from a JavaScript challenge to a visible CAPTCHA, Cloudscraper cannot solve it by itself. It can send the CAPTCHA to a third-party solving service using a built-in integration, but you should know the costs and limits before using this option.

Supported solver services

Cloudscraper has built-in support for the following providers: 2captcha, Anti-Captcha, CapSolver, CapMonster Cloud, DeathByCaptcha, and 9kw. The integration is handled through the captcha= argument.

Configuration

Add this to scraper.py:

import cloudscraper
import os
scraper = cloudscraper.create_scraper(
interpreter="nodejs",
captcha={
"provider": "2captcha",
"api_key": os.environ["CAPTCHA_API_KEY"]
}
)
response = scraper.get("https://quotes.toscrape.com")
print(response.status_code)

Run it:

python scraper.py

Expected output:

200

Never put API keys directly in your script. Sign up on your chosen provider's website, copy the API key from your dashboard, and load it from an environment variable as shown above. For example, use 2captcha.com for 2captcha, anti-captcha.com for Anti-Captcha, or capsolver.com for CapSolver. To set the variable before running:

# macOS/Linux
export CAPTCHA_API_KEY="your_api_key_here"
# Windows
set CAPTCHA_API_KEY="your_api_key_here"

To debug without spending solver credits, use the return_response provider. It returns the unsolved challenge page to the caller so you can inspect what Cloudscraper is seeing:

import cloudscraper
scraper = cloudscraper.create_scraper(
captcha={"provider": "return_response"}
)
response = scraper.get("https://quotes.toscrape.com")
print(response.text[:500])

Run it:

python scraper.py

Cost reality

Solver services charge per CAPTCHA solved. As of early 2026, reCAPTCHA solving generally costs less than a few dollars per 1,000 challenges, though pricing varies by provider, CAPTCHA type, and market conditions.

If you face a lot of CAPTCHAs, it's usually better to avoid triggering them than to solve them after the fact. Using residential IPs and a consistent browser fingerprint helps reduce how often Cloudflare shows a CAPTCHA. Only solve CAPTCHAs you cannot avoid.

What Cloudscraper can’t solve, even with a solver

Three challenge types are outside Cloudscraper reach regardless of which provider you use:

  • Cloudflare Turnstile. The newer invisible CAPTCHA replacement. Not supported by the original Cloudscraper package and generally outside the scope of Cloudscraper-style challenge solvers.
  • hCaptcha Enterprise with PAT tokens. Solver services haven’t modeled the token validation layer.
  • Site-specific CAPTCHA variants. Custom implementations that solver services don’t cover.

If your target uses any of these challenge types, Cloudscraper won't work. Check the limitations section for alternatives.

For a broader look at CAPTCHA bypass strategies, see the ultimate guide to how to bypass CAPTCHAs.

Parsing HTML with Beautiful Soup and saving data

Cloudscraper only downloads HTML; it doesn't parse it. To parse the data, use Beautiful Soup and lxml. This section shows the full process: fetch with Cloudscraper, parse with Beautiful Soup, and save the results to JSON.

Why lxml over html.parser

Beautiful Soup can use different parsers. lxml is faster than the built-in html.parser and is better at handling broken HTML. Install it if you haven't already:

pip install beautifulsoup4 lxml

Fetching and parsing a page

The demo target is quotes.toscrape.com, a public sandbox built for scraping practice. The goal is to extract the quote text, author name, and tags from each .quote block on the page.

Add this to scraper.py:

import cloudscraper
from bs4 import BeautifulSoup
scraper = cloudscraper.create_scraper(
interpreter="nodejs",
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
response = scraper.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "lxml")
quotes = []
for block in soup.select(".quote"):
quote = {
"text": block.find(class_="text").get_text(strip=True),
"author": block.find(class_="author").get_text(strip=True),
"tags": [tag.get_text(strip=True) for tag in block.select(".tag")],
}
quotes.append(quote)
for quote in quotes:
print(quote)

Run it:

python scraper.py

Expected output:

{'text': '"The world as we have created it is a process of our thinking."', 'author': 'Albert Einstein', 'tags': ['change', 'deep-thoughts', 'thinking', 'world']}
{'text': '"It is our choices, Harry, that show what we truly are, far more than our abilities."', 'author': 'J.K. Rowling', 'tags': ['abilities', 'choices']}
...

Crawling paginated pages

quotes.toscrape.com paginates results across /page/1//page/2/, and so on until a 404. The same scraper instance keeps its cookies across all page requests, so the challenge is solved only once.

Before parsing, each response is checked to make sure it is real content. A 200 status code doesn't always mean you got the right page. A "Just a moment…" challenge page can also return 200. The cf-chl check helps catch this before Beautiful Soup tries to parse. In the loop, .find().get_text(strip=True) is used carefully for each element. If something is missing, the try/except block catches the error, logs it, and continues without stopping the whole crawl.

The cf-chl check does not trigger on quotes.toscrape.com because it's not protected by Cloudflare. It's included as a safety measure for real Cloudflare-protected sites, where a challenge page can also return a 200 status. Without this check, your parser might collect bad data without warning.

Add this to scraper.py:

import cloudscraper
from bs4 import BeautifulSoup
scraper = cloudscraper.create_scraper(
interpreter="nodejs",
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
quotes = []
page = 1
while True:
response = scraper.get(f"https://quotes.toscrape.com/page/{page}/")
# validate the page is real content, not a challenge that slipped through
# quotes.toscrape.com is not behind Cloudflare, so this never triggers here
# swap in a real Cloudflare-protected target and this check becomes critical
if "cf-chl" in response.text:
print(f"Challenge page detected on page {page}. Check your proxy or interpreter.")
break
soup = BeautifulSoup(response.text, "lxml")
quote_blocks = soup.select(".quote")
# stop when no quote blocks are found -- quotes.toscrape.com returns
# a 200 with a "No quotes found" message on the last page, not a 404
if not quote_blocks:
print(f"Reached end of results at page {page}.")
break
for block in quote_blocks:
try:
# use get_text(strip=True) defensively -- a 200 doesn't guarantee the expected DOM
quotes.append({
"text": block.find(class_="text").get_text(strip=True),
"author": block.find(class_="author").get_text(strip=True),
"tags": [tag.get_text(strip=True) for tag in block.select(".tag")],
})
except AttributeError as e:
print(f"Parse error on page {page}: {e} -- skipping block")
continue
print(f"Page {page} scraped -- {len(quotes)} quotes collected so far.")
page += 1

Run it:

python scraper.py

Expected output:

Page 1 scraped -- 10 quotes collected so far.
Page 2 scraped -- 20 quotes collected so far.
...
Reached end of results at page 11.

Saving data to JSON

Once all pages are scraped, write the results to a file in the output folder. Add this to scraper.py:

import json
import cloudscraper
from bs4 import BeautifulSoup
scraper = cloudscraper.create_scraper(
interpreter="nodejs",
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
quotes = []
page = 1
while True:
response = scraper.get(f"https://quotes.toscrape.com/page/{page}/")
# validate the page is real content, not a challenge that slipped through
# quotes.toscrape.com is not behind Cloudflare, so this never triggers here
# swap in a real Cloudflare-protected target and this check becomes critical
if "cf-chl" in response.text:
print(f"Challenge page detected on page {page}. Check your proxy or interpreter.")
break
soup = BeautifulSoup(response.text, "lxml")
quote_blocks = soup.select(".quote")
# stop when no quote blocks are found -- quotes.toscrape.com returns
# a 200 with a "No quotes found" message on the last page, not a 404
if not quote_blocks:
print(f"Reached end of results at page {page}.")
break
for block in quote_blocks:
try:
# use get_text(strip=True) defensively -- a 200 doesn't guarantee the expected DOM
quotes.append({
"text": block.find(class_="text").get_text(strip=True),
"author": block.find(class_="author").get_text(strip=True),
"tags": [tag.get_text(strip=True) for tag in block.select(".tag")],
})
except AttributeError as e:
print(f"Parse error on page {page}: {e} -- skipping block")
continue
page += 1
# save to output/quotes.json
with open("output/quotes.json", "w", encoding="utf-8") as f:
json.dump(quotes, f, ensure_ascii=False, indent=2)
print(f"Saved {len(quotes)} quotes to output/quotes.json.")

Run it:

python scraper.py

Expected output:

Saved 100 quotes to output/quotes.json.

Open output/quotes.json to find the full dataset. Each entry follows this structure:

{
"text": "\"The world as we have created it is a process of our thinking.\"",
"author": "Albert Einstein",
"tags": ["change", "deep-thoughts", "thinking", "world"]
}

For a fuller walkthrough of Beautiful Soup, see how to parse scraped HTML with Python. For readers who want to use lxml directly without Beautiful Soup, see the lxml tutorial: parsing HTML and XML documents.

Common Cloudscraper errors and how to fix them

This section covers the errors you’ll actually hit when using Cloudscraper. Each one follows the same pattern: error, likely cause, fix.

ModuleNotFoundError: No module named "cloudscraper"

Cause: the package is installed in a different Python environment than the one running the script.

Fix: check which Python interpreter is active:

which python
which pip

If they point to different locations, the environment is mismatched. Reactivate the virtual environment and reinstall:

source .venv/bin/activate
pip install cloudscraper

403 Forbidden on every request

This is the most common error and has 2 distinct causes:

Cause 1: TLS fingerprint mismatch. Cloudflare reads the TLS handshake (JA3/JA4 fingerprint) before inspecting any HTTP headers. Python’s urllib3 stack is identifiable no matter what User-Agent or browser profile you set in Cloudscraper. No configuration change can fix this at the Cloudscraper level.

Cause 2: datacenter IP. Even a perfect Cloudscraper setup fails from an AWS, GCP, or DigitalOcean ASN. Cloudflare scores the IP before running any challenge logic.

Fix: route requests through residential proxies. If 403s persist after switching to residential IPs, the target is likely scoring TLS fingerprints; switch to curl_cffi for impersonated TLS handshakes or use a managed unblocker.

"Could not collect any of the parameters needed for Cloudflare IUAM JS challenge"

Cause: the installed Cloudscraper version doesn’t recognize the current challenge format. Cloudflare updates its challenge pages regularly.

Fix: upgrade to the latest version first:

pip install --upgrade cloudscraper

If the error persists, switch to a maintained fork:

pip install cloudscraper25

RecursionError: maximum recursion depth exceeded

Cause: Cloudscraper gets caught in a redirect loop on a misconfigured target site.

Fix: disable automatic redirects and handle them manually:

import cloudscraper
scraper = cloudscraper.create_scraper()
response = scraper.get("https://quotes.toscrape.com", allow_redirects=False)
print(response.status_code)

As a temporary workaround while debugging:

import sys
sys.setrecursionlimit(2000)

Don’t leave setrecursionlimit in production code. It masks the underlying problem instead of fixing it.

Empty response body, status 200

Cause: the page returned was a Cloudflare challenge HTML page, not the real content, but Cloudscraper marked it as solved. This happens when the challenge format has changed, and the installed version no longer parses it correctly.

Fix: add a content check after every request:

import cloudscraper
scraper = cloudscraper.create_scraper()
response = scraper.get("https://quotes.toscrape.com")
if "cf-chl" in response.text:
print("Challenge page returned -- check proxy IP or upgrade cloudscraper")
else:
print(response.text[:500])

If the check triggers consistently, rotate to a different proxy IP or upgrade to cloudscraper25.

SSL: CERTIFICATE_VERIFY_FAILED

Cause: three possible sources: an outdated certifi bundle, a MITM proxy in the network path, or a proxy URL using https:// instead of http://.

Fix: start with the simplest option first:

pip install --upgrade certifi

Then check the proxy URL scheme it must start with http:// even when proxying HTTPS traffic:

proxies = {
# correct
"https": "http://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
# wrong -- causes SSL handshake errors
"https": "https://YOUR_PROXY_USERNAME:YOUR_PROXY_PASSWORD@gate.decodo.com:7000",
}

If the error persists in a debugging context only, verify=False disables certificate verification temporarily:

response = scraper.get("https://quotes.toscrape.com", verify=False)

Never use verify=False in production. It turns off certificate validation completely and exposes your requests to interception.

For retry patterns that compose cleanly with Cloudscraper’s session API, see Retry failed Python requests in 2026. For a deeper look at SSL errors specifically, see How to fix SSLError in Python Requests: causes and solutions.

Limitations of Cloudscraper and when to use alternatives

Cloudscraper is a good starting tool, but it has strict limits. Knowing these limits saves you time that might be wasted trying to fix problems that cannot be solved by changing the configuration.

Hard limits

These are not rare issues; they are built-in limitations of the tool:

  • No TLS fingerprint spoofing. Python’s default TLS stack is identifiable as urllib3 at the handshake level, no matter what headers or browser profile you use. Cloudflare scores TLS fingerprints heavily on modern deployments.
  • No JavaScript rendering. Client-side React and Vue apps return empty containers, and Cloudscraper cannot execute them.
  • No Cloudflare Turnstile support. The newer invisible challenge replacement cannot be solved by Cloudscraper or any third-party solver integration.
  • No support for Bot Fight Mode or Cloudflare Enterprise tiers. These work at a level Cloudscraper was never designed to handle.

When to switch and what to use instead

Situation

Alternative

Site uses TLS fingerprinting

curl_cffi — impersonates a real browser’s TLS handshake, still on top of a Requests-style API

Site is a client-rendered SPA

undetected-chromedriver, nodriver, or Playwright with stealth plugins

Site uses Turnstile or aggressive bot management

Decodo Site Unblocker or full browser automation through residential proxies

Site triggers rate-limit challenges only

Keep Cloudscraper, add rotating residential proxies to spread requests across IPs

How to read the table

TLS fingerprinting and Turnstile are the two main limits you will face on tough Cloudflare sites in 2026. If a site worked with plain Requests and some custom headers a few years ago, Cloudscraper is still a good place to start. If the site now uses TLS scoring, Turnstile, or Bot Fight Mode, use Cloudscraper to test, but plan to switch to a more advanced tool.

The open-source path for TLS fingerprinting is curl_cffi. It impersonates Chrome’s TLS handshake at the socket level and exposes a Requests-compatible API, so the migration from Cloudscraper is minimal. For sites that require full browser automation, nodriver is the current recommended option over undetected-chromedriver.

For a broader look at the anti-bot landscape and how to choose between tools, see navigating anti-bot systems: pro tips for 2026. For a full breakdown of what nodriver is and how it works, see nodriver explained: how undetected Chromedriver’s successor actually works.

Final thoughts

Cloudscraper is best used as a quick upgrade for Requests on sites with mid-level Cloudflare IUAM protection. If your Requests-based scraper suddenly gets 403 errors, Cloudscraper can fix it without you needing to rewrite your code.

Three things matter most for reliability: choosing the right proxy, keeping your browser profile consistent, and picking the right interpreter. Focus on these before trying retries or solver integrations.

There are real limits to what Cloudscraper can do. TLS fingerprinting, Turnstile, and Bot Fight Mode are all beyond its reach, and no settings can change that. When you run into these issues, use curl_cffi for TLS, nodriver for full browser automation, or Decodo Site Unblocker for a managed solution. If you want to keep using Cloudscraper and handle IP rotation yourself, Decodo rotating residential proxies can help.

Use Cloudscraper for the tasks it handles well, and be ready to switch tools when you reach its limits.

Get residential proxy IPs

Claim your free trial of residential proxies and explore full features with unrestricted access.

Share article:

About the author

Mykolas Juodis

Head of Marketing

Mykolas is a seasoned digital marketing professional with over a decade of experience, currently leading Marketing department in the web data gathering industry. His extensive background in digital marketing, combined with his deep understanding of proxies and web scraping technologies, allows him to bridge the gap between technical solutions and practical business applications.

Connect with Mykolas via LinkedIn.

All information on Decodo Blog is provided on an as is basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may belinked therein.

Frequently asked questions

What is the difference between Requests and Cloudscraper in Python?

Requests is a standard HTTP client. It sends requests and returns responses with no challenge-solving logic. Cloudscraper wraps Requests and adds Cloudflare IUAM challenge detection on top. If the target is not behind Cloudflare, both behave the same. If it is, plain Requests returns a 403 or a challenge page, while Cloudscraper tries to solve it and return the real content.

Can Cloudscraper bypass Cloudflare Turnstile?

No. Turnstile is Cloudflare’s newer invisible challenge replacement and is outside what Cloudscraper can handle, even with a third-party solver integration. If your target uses Turnstile, switch to a managed unblocker like Decodo Site Unblocker or use full browser automation with residential proxies.

Why am I getting a 403 error even with Cloudscraper installed?

Two causes cover most cases. The first is a datacenter IP. Cloudflare scores the ASN of the incoming request before running any challenge logic, and datacenter ranges from AWS, GCP, or DigitalOcean are flagged right away. The second is TLS fingerprinting. Cloudflare identifies Python’s urllib3 stack at the handshake level, no matter what headers or browser profile you use. Fix the IP problem first with residential proxies. If 403s persist, the target is scoring TLS fingerprints, so switch to curl_cffi.

Should I use cloudscraper or cloudscraper25 in 2026?

Start with the original Cloudscraper package. If it returns consistent 403s or throws "Could not collect any of the parameters needed for Cloudflare IUAM JS challenge", switch to cloudscraper25; it’s more actively maintained and supports newer challenge formats. If you need TLS fingerprint spoofing or a Playwright fallback, ai-cloudscraper goes further with a hybrid engine approach.

Dashboard UI showing response JSON with "status_code":200 and "url":"https://example.com" on dark gradient background

How to Use a Cloudflare Scraper for Data Extraction

Cloudflare protects over 20% of all websites, and its anti-bot system can shut your scraper down in seconds. A Cloudflare scraper is any tool or script that gets past those defenses to pull data from protected sites. This guide breaks down how Cloudflare spots bots, why most scrapers fail, and how to scrape with Decodo's Web Scraping API.

Python logo rising from a bowl beside a spoon in neon magenta on dark background

Beautiful Soup Web Scraping: How to Parse Scraped HTML with Python

Web scraping with Python is a powerful technique for extracting valuable data from the web, enabling automation, analysis, and integration across various domains. Using libraries like Beautiful Soup and Requests, developers can efficiently parse HTML and XML documents, transforming unstructured web data into structured formats for further use. This guide explores essential tools and techniques to navigate the vast web and extract meaningful insights effortlessly.

Python logo centered, stylized in magenta inside a dashed circular ring on dark background

Mastering Python Requests: A Comprehensive Guide to Using Proxies

When using Python's Requests library, proxies can help with tasks like web scraping, interacting with APIs, or accessing geo-restricted content. Proxies route HTTP requests through different IP addresses, helping you avoid IP bans, maintain anonymity, and bypass restrictions. This guide covers how to set up and use proxies with the Requests library. Let’s get started!

© 2018-2026 decodo.com (formerly smartproxy.com). All Rights Reserved