403 Forbidden Error and How to Avoid It

A 403 Forbidden error is a rejection signal you'll often hit when sending automated scraping requests. It means the server understood your request and decided not to serve it. In scraping workflows, the usual reason is that your request looks like a bot.

The fix depends on the signal that triggered the block. A missing User-Agent needs a different fix than a blocklisted proxy IP. A TLS fingerprint mismatch needs a different fix than a broken cookie session.

TL;DR

In web scraping, a 403 usually means your request looked automated, not that the page disappeared.
Start with diagnosis: headers, cookies, timing, TLS fingerprint, IP reputation, and location can all trigger 403.
Don't treat proxies as a universal fix. They help when the IP is the problem and hurt when they break session continuity.
Fix the cheapest layer first: headers, cookies, and request pacing. Escalate to better proxies or browser-like clients only when the symptom points there.

What is a 403 Forbidden error?

403 Forbidden means the server understood the request but refused access to the resource. That makes it different from similar HTTP status codes:

| Status code | Meaning |
| --- | --- |
| 401 Unauthorized | Authentication is required or failed. |
| 403 Forbidden | The request was understood, but access was refused. |
| 404 Not Found | The resource doesn't exist at that URL, or the server hides whether it exists. |
| 429 Too Many Requests | The client sent too many requests in a given period. |

Some protected sites return 403 instead of 429 when they detect scraping. That hides the rate-limit threshold. If every excessive request returned 429, the limit would be easier to reverse-engineer. 403 is less helpful on purpose.

In practice, 403 has 2 main causes:

Access restriction. The page may require a logged-in session, a specific account role, or traffic from an allowed country.
Bot detection. Public pages can reject automated traffic based on IP reputation, HTTP headers, TLS fingerprint, cookie state, or behavior patterns.

Diagnosing your 403: three symptom patterns

Don't try fixing a 403 by randomly stacking headers, delays, and proxies. Instead, start by looking at when the failure happens. The pattern usually tells you which layer to inspect first.

| Symptom | Likely cause | First fix to try |
| --- | --- | --- |
| 403 on the first request | Missing headers, a bot-like User-Agent, or poor IP reputation | Send browser-like headers and test with a cleaner IP |
| 403 after several successful responses | Rate limiting, broken session continuity, or missing cookies | Slow down, persist cookies, and reuse the same session |
| Page loads in a browser but returns 403 in a script | Browser fingerprint mismatch, missing JavaScript execution, or incomplete headers | Try curl_cffi impersonation or a headless browser |

In a first-request failure, the server hasn't seen your behavior yet, so it's judging the request profile: IP address, User-Agent, and headers.

Mid-session failures are more behavioral. If 5 pages work and the 6th fails, check timing, cookies, and whether your scraper changed IPs halfway through the same session.

Browser-only failures usually mean your script says "Chrome" but doesn't behave like Chrome. A real browser sends a broader request profile than a basic HTTP client. That includes headers such as Sec-Fetch-Site, Sec-Fetch-Mode, and Accept-Language. It also includes a TLS handshake profile that some bot systems compare against the browser you claim to be.
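
The three patterns above can be turned into a small triage helper. This is only a sketch: the function name, its arguments, and the returned strings are made up for illustration, and it assumes you have logged the status codes of one scraping session in request order.

```python
from typing import Optional, Sequence


def classify_403_pattern(
    statuses: Sequence[int],
    works_in_browser: Optional[bool] = None,
) -> str:
    """Map an ordered list of status codes to one of the three symptom patterns.

    `works_in_browser` is something you check by hand: does the same URL
    load in a normal browser on the same network?
    """
    if 403 not in statuses:
        return "no 403 observed"

    first_403 = list(statuses).index(403)
    if first_403 == 0:
        # The server has not seen any behavior yet, so it judged the request profile.
        if works_in_browser:
            return "browser-only failure: check TLS fingerprint, JavaScript, and headers"
        return "first-request failure: check headers, User-Agent, and IP reputation"

    # Earlier requests succeeded, so the block is more likely behavioral.
    return "mid-session failure: check pacing, cookies, and IP continuity"
```

For example, `classify_403_pattern([200, 200, 200, 200, 200, 403])` points at the mid-session checks, while `classify_403_pattern([403], works_in_browser=True)` points at the browser-only checks.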

Before changing code, inspect what your client actually sends. An echo endpoint such as httpbin.org/headers shows your request headers:

import requests

response = requests.get("https://httpbin.org/headers")
print(response.text)

The client you choose matters too. Python clients such as Requests, httpx, and aiohttp, as well as browser-like clients, send different default headers and have different TLS behavior.
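
Once you have the echoed headers, it helps to compare them against a browser-like baseline instead of eyeballing them. The sketch below assumes the headers arrive as a plain dict (for httpbin.org/headers that would be `response.json()["headers"]`). The baseline list and both function names are illustrative choices, not a standard.

```python
from typing import Iterable, List, Mapping

# Headers a real browser navigation typically sends; adjust to match your own browser.
BROWSER_BASELINE = (
    "User-Agent",
    "Accept",
    "Accept-Language",
    "Accept-Encoding",
    "Sec-Fetch-Dest",
    "Sec-Fetch-Mode",
    "Sec-Fetch-Site",
)

# Default User-Agent prefixes of common non-browser clients.
DEFAULT_CLIENT_PREFIXES = ("python-requests", "python-httpx", "python/", "curl/")


def missing_browser_headers(
    sent: Mapping[str, str],
    baseline: Iterable[str] = BROWSER_BASELINE,
) -> List[str]:
    """Return baseline header names that are absent from `sent` (case-insensitive)."""
    sent_names = {name.lower() for name in sent}
    return [name for name in baseline if name.lower() not in sent_names]


def has_default_client_user_agent(sent: Mapping[str, str]) -> bool:
    """True if the User-Agent is empty or looks like an HTTP library default."""
    user_agent = next(
        (value for name, value in sent.items() if name.lower() == "user-agent"), ""
    )
    return not user_agent or user_agent.lower().startswith(DEFAULT_CLIENT_PREFIXES)
```

A non-empty result from `missing_browser_headers` on a first-request 403 is a cheap signal to fix headers before touching proxies.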

Why proxies sometimes cause 403 errors instead of fixing them

A proxy doesn't make a bad request look human. It changes the network path, IP identity, and sometimes location. That helps only when the 403 comes from IP reputation or geo access.

Datacenter IP reputation

Many datacenter ranges are easy to identify because they belong to hosting providers. If a site rejects known cloud or hosting CIDR blocks, better headers won't save the request.

If IP reputation is the issue, residential proxies or ISP proxies are usually a better fit than datacenter proxies. Residential IPs are assigned by ISPs, so they're less likely to fail checks that reject obvious datacenter networks. ISP proxies can help when the target also expects a stable IP across the same session.

Proxy headers

A normal request to the target might look like this:

GET /product/123 HTTP/1.1
Host: quotes.toscrape.com
User-Agent: Mozilla/5.0 ...
Accept: text/html,application/xhtml+xml

A misconfigured proxy path can leak headers that identify the request as proxied:

GET /product/123 HTTP/1.1
Host: quotes.toscrape.com
User-Agent: Mozilla/5.0 ...
X-Forwarded-For: 203.0.113.10
Via: 1.1 proxy
Proxy-Authorization: Basic ...

X-Forwarded-For, Via, and Proxy-Authorization shouldn't reach the target server in a normal scraping request. If the target sees them, the proxy path is exposing more than it should.
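
One way to check for this is to send a request through the proxy to an echo endpoint and scan what comes back. The helper below is a minimal sketch that works on an already-parsed header dict. The function name is hypothetical, and it assumes the echo endpoint reports the headers it received without adding or stripping any of its own.

```python
from typing import Dict, Mapping

# The three headers named above; extend the tuple if your proxy adds others.
PROXY_REVEALING_HEADERS = ("X-Forwarded-For", "Via", "Proxy-Authorization")


def leaked_proxy_headers(echoed: Mapping[str, str]) -> Dict[str, str]:
    """Return the subset of echoed headers that reveal a proxy hop."""
    revealing = {name.lower() for name in PROXY_REVEALING_HEADERS}
    return {
        name: value
        for name, value in echoed.items()
        if name.lower() in revealing
    }
```

An empty dict means none of the three headers reached the echo endpoint. Anything else is worth raising with the proxy provider or fixing in your proxy configuration.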

Rotating proxies

Rotating proxies help distribute requests, but rotating every request can fragment sessions. The site sees one cookie jar jumping across many IPs, while real browsing sessions usually keep the same network identity for at least a short period.
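
A common middle ground is to rotate per session rather than per request. The class below is a sketch of that idea, not a provider API: `StickyProxyPool`, its parameters, and the placeholder proxy URLs in the usage note are all assumptions. It pins each logical session to one proxy and moves it to the next proxy only after a fixed number of requests.

```python
import itertools
from typing import Dict, Sequence


class StickyProxyPool:
    """Keep each session on one proxy for `requests_per_proxy` requests."""

    def __init__(self, proxy_urls: Sequence[str], requests_per_proxy: int = 20):
        if not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._cycle = itertools.cycle(proxy_urls)
        self._limit = requests_per_proxy
        self._assigned: Dict[str, str] = {}
        self._used: Dict[str, int] = {}

    def proxy_for(self, session_id: str) -> str:
        """Return the proxy this session should use for its next request."""
        used = self._used.get(session_id, 0)
        if session_id not in self._assigned or used >= self._limit:
            # New session, or the current proxy has served its quota: rotate.
            self._assigned[session_id] = next(self._cycle)
            used = 0
        self._used[session_id] = used + 1
        return self._assigned[session_id]
```

With `pool = StickyProxyPool(["http://proxy-1.example:8000", "http://proxy-2.example:8000"])`, repeated calls to `pool.proxy_for("cart-session")` return the same URL until the quota is used up. When a session does move to a new proxy, start it with a fresh cookie jar as well, so one set of cookies never appears from two IPs.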

Geo-mismatch

If the site only serves a page in certain countries, a proxy in the wrong location can trigger 403.

How to fix 403 Forbidden errors: an escalation checklist

Treat 403 fixes as an escalation path. Start with the lowest-cost layer: headers, cookies, and request pacing. Move to proxies, TLS impersonation, or browsers only when the symptom points there.

Fix 1: Set a realistic User-Agent and complete headers

The Python Requests library doesn't send a full browser profile by default. Many sites will accept a basic HTTP client, but more protected sites often won't. A fuller header set looks like this:

import requests

url = "https://quotes.toscrape.com"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    # "br" assumes the brotli (or brotlicffi) package is installed so the
    # response body can be decoded; drop it otherwise.
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
}

response = requests.get(url, headers=headers, timeout=30)
print(response.status_code)

Use this set of headers as a baseline, then mirror what your own browser sends to the same target.

Fix 2: Add request delays and avoid patterns

Use randomized delays and back off after failures. For many sites, a 2-5 second minimum delay is a reasonable starting point, then adjust based on the target's response behavior:

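import random
import time
import requests

# Example pages; replace with your own target URLs.
urls = [
    "https://quotes.toscrape.com/page/1/",
    "https://quotes.toscrape.com/page/2/",
]

# `headers` is the dict defined in Fix 1.
for url in urls:
    response = requests.get(url, headers=headers, timeout=30)
    print(url, response.status_code)
    # Randomized pause so requests don't arrive at a fixed interval.
    time.sleep(random.uniform(2, 5))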
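
The loop above randomizes delays but doesn't back off after failures. A minimal sketch of the backoff half, using exponential growth with jitter, could look like this. The function name, the doubling factor, and the 120-second cap are assumptions to tune per target, not recommendations from any site.

```python
import random
from typing import Optional


def next_delay(
    consecutive_failures: int,
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    cap: float = 120.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a randomized delay in seconds that doubles with each failure.

    With zero failures this is the plain 2-5 second window. Each consecutive
    failure doubles both ends of the window, and `cap` bounds the result.
    """
    rng = rng or random
    factor = 2 ** consecutive_failures
    low = min(min_delay * factor, cap)
    high = min(max_delay * factor, cap)
    return rng.uniform(low, high)
```

In the scraping loop, you would count consecutive 403 or 429 responses, reset the count to zero on success, and call `time.sleep(next_delay(count))` between requests.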

Fix 3: Use a session object and persist cookies

Bare requests.get() calls don't preserve cookies across requests unless you manage them yourself. A requests.Session() keeps cookies and connection state together:

import requests

session = requests.Session()
session.headers.update(headers)

home = session.get("https://books.toscrape.com/", timeout=30)
book = session.get("https://books.toscrape.com/catalogue/tipping-the-velvet_999/", timeout=30)

print(book.status_code)

This matters when a site issues a session cookie on the first page and expects it on the next one.

Fix 4: Route through residential proxies

If the same request works from your browser but fails from a server or datacenter proxy, IP reputation is a likely cause. In that case, switch to a residential or ISP proxy rather than changing headers again.

Here's the basic shape in Python Requests:

import requests

proxy_url = "http://USERNAME:PASSWORD@gate.decodo.com:10001"

proxies = {
    "http": proxy_url,
    "https": proxy_url,
}

response = requests.get(
    "https://quotes.toscrape.com",
    headers=headers,
    proxies=proxies,
    timeout=30,
)

print(response.status_code)
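
One practical detail when building the proxy URL: credentials that contain characters such as `@`, `:`, or `/` need to be percent-encoded, or the URL will be parsed incorrectly. The helper below is a small sketch using only the standard library. The function name and the example host are placeholders, not part of any provider's API.

```python
from typing import Dict
from urllib.parse import quote


def build_proxies(
    username: str,
    password: str,
    host: str,
    port: int,
    scheme: str = "http",
) -> Dict[str, str]:
    """Build a Requests-style proxies dict with percent-encoded credentials."""
    # safe="" also encodes "/" so it can't be mistaken for a path separator.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    proxy_url = f"{scheme}://{user}:{secret}@{host}:{port}"
    # Route both plain and TLS traffic through the same proxy endpoint.
    return {"http": proxy_url, "https": proxy_url}
```

The returned dict can be passed as `proxies=` to `requests.get()` or assigned to `session.proxies` so that cookies and the proxy stay together in one session. Reading the username and password from environment variables keeps them out of source control.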

Fix 5: Match TLS fingerprints with curl_cffi

If a page loads in a normal browser but returns 403 from your script, even with good headers, the problem may be TLS fingerprinting. The target sees a User-Agent that claims to be Chrome alongside a TLS handshake that Chrome wouldn't send. curl_cffi can impersonate browser TLS behavior:

from curl_cffi import requests

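response = requests.get(
    "https://quotes.toscrape.com",
    # Headers passed here are sent alongside the impersonated profile, so
    # keep the User-Agent consistent with the browser you impersonate.
    headers=headers,
    impersonate="chrome",
    timeout=30,
)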

print(response.status_code)
print(response.text[:500])

Fix 6: Escalate to a headless browser

Use a headless browser when the site needs JavaScript execution, browser storage, dynamic tokens, or a more complete browsing context. Playwright is usually the practical next step.
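
As a rough sketch of that step, the function below loads a page in headless Chromium with Playwright's sync API and returns the status code and rendered HTML. It assumes Playwright and its Chromium build are installed (`pip install playwright`, then `playwright install chromium`). The function name, the locale, and the timeout are illustrative choices.

```python
from typing import Optional, Tuple


def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> Tuple[Optional[int], str]:
    """Load `url` in headless Chromium and return (status_code, html)."""
    # Imported here so the module still loads when Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            # A context holds cookies and storage, like a fresh browser profile.
            context = browser.new_context(locale="en-US")
            page = context.new_page()
            response = page.goto(
                url, wait_until="domcontentloaded", timeout=timeout_ms
            )
            # goto() can return None (for example on same-page navigation).
            status = response.status if response is not None else None
            return status, page.content()
        finally:
            browser.close()
```

Calling `fetch_rendered_html("https://quotes.toscrape.com/js/")` should return the page after its JavaScript has run. A browser is slower and heavier than an HTTP client, so keep it for targets where the earlier fixes didn't clear the 403.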

Choose the right proxy type for 403 errors

The right proxy type depends on what triggered the 403. Don't start with the most expensive setup. Start with the simplest proxy that clears the actual check.

| Proxy type | 403 risk | Best fit | Trade-off |
| --- | --- | --- | --- |
| Datacenter proxies | Highest on protected targets | Fast access to low-protection sites | Easy to block based on IP reputation |
| Residential proxies | Lower | Targets where IP reputation matters | Rotation still requires cookie and session continuity |
| ISP proxies | Lower for session-sensitive targets | Stable sessions on targets that monitor IP continuity | Less rotation flexibility |
| Rotating proxies | Depends on the rotation strategy | Distributing requests across many IPs | Over-rotation can break sessions |
| Site Unblocker | Lowest manual tuning required | Targets that need managed proxy, browser, and anti-bot handling | Less low-level control |

Use proxies for IP reputation, location, and continuity problems. Don't use them to paper over broken headers, missing cookies, predictable timing, or TLS mismatch. A clean IP won't solve broken cookies. Perfect headers won't fix a banned subnet.