How to Scrape Amazon Reviews

Amazon is the go-to destination for online shoppers – and with that comes a treasure trove of customer reviews. These reviews provide invaluable insights for businesses looking to understand consumer preferences, researchers tracking market trends, and shoppers making well-informed decisions. In this guide, we’ll explore the types of data you can extract from Amazon reviews, outline various scraping methods, and show you how to efficiently scrape reviews using Python and our powerful residential proxies.

Dominykas Niaura

Aug 04, 2025

10 min read

What are Amazon reviews?

Amazon reviews are user-generated feedback provided by customers who have purchased and used products available on Amazon's platform. These reviews play a crucial role in the eCommerce ecosystem by offering insights into product quality, functionality, and user satisfaction. They help potential buyers make informed decisions and enable sellers to understand customer experiences and areas for improvement.

Each Amazon review typically consists of several key data points:

  • Review ID. A unique identifier assigned to each review.
  • Title. A brief headline summarizing the review, often including the star rating and a concise opinion.
  • Author. The username or display name of the customer who wrote the review.
  • Rating. The star rating given by the reviewer on a scale from 1 to 5.
  • Content. The main body of the review where the customer shares their detailed thoughts and experiences.
  • Timestamp. The date the review was posted and the location it was posted from.
  • Profile ID. A unique identifier associated with the reviewer's Amazon profile.
  • Verified purchase status. Indicates whether the reviewer purchased the product through Amazon, adding credibility to the review.
  • Helpful count. The number of other users who found the review helpful.
  • Product attributes. Specific details about the product variant being reviewed, such as color, size, or style.

These components enable customer sentiment analysis, competitor monitoring, product performance tracking, and deeper insight into consumer behavior. By examining these data points, businesses can identify trends, address issues, and enhance their products or services to better meet customer needs.
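To make the data points above concrete, here's what a single extracted review might look like as a flat Python record. This is an illustrative sketch only – the field names and values are examples, not Amazon's internal schema:

```python
# Illustrative structure of one extracted review record.
# Field names and values are examples, not Amazon's internal schema.
sample_review = {
    "review_id": "R1ABCDEFGHIJK",          # unique review identifier
    "title": "Great value",                 # brief headline
    "author": "Jane D.",                    # reviewer display name
    "rating": 5,                            # star rating, 1-5 scale
    "content": "Works exactly as described and arrived quickly.",
    "timestamp": "Reviewed in the United States on June 1, 2025",
    "profile_id": "AHXYZ123",               # reviewer profile identifier
    "verified_purchase": True,              # purchased through Amazon
    "helpful_count": 12,                    # users who found it helpful
    "product_attributes": {"Color": "Black", "Size": "Medium"},
}

# Quick sanity check that the record covers the key data points
required = {"review_id", "title", "author", "rating", "content"}
assert required.issubset(sample_review)
print("Record contains all key fields")
```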

Scraping Amazon customer reviews: best methods

Amazon reviews scraping typically involves using a service, automated tools, or software to programmatically extract customer review data from Amazon's product pages. Here are the best ways to get Amazon review data:

1. Buying datasets from third-party services

One option for obtaining Amazon review data is to use third-party services that offer pre-collected datasets. There are several companies that specialize in aggregating large volumes of data from various sources, including eCommerce platforms like Amazon.

By purchasing these datasets, you can access extensive review information without the need to build your own scraper or manage the complexities of data extraction.

These services often provide the data in structured, ready-to-use formats, which can save you significant time and resources. Reputable providers also collect this data ethically and in compliance with all relevant laws and Amazon's terms of service.

2. Using web scraping tools

If you're looking for a plug-and-play solution and don't want to write any code, third-party web scraping tools might still do the trick. However, most of them now struggle with Amazon reviews due to updated anti-bot measures. Many previously reliable APIs have lost access or become unstable.

Unless the tool supports custom proxy integration and active maintenance, it's safer to stick with your own solution.

3. Building a custom solution

For most use cases, a custom-built scraper is the most flexible and future-proof option. It gives you full control over what data you collect, how often, and in what format. Whether you're targeting specific products, running sentiment analysis, or integrating review data into a broader pipeline, a tailored scraper setup lets you build around your exact needs.

The best approach is to pair a lightweight Python scraper (e.g., using Requests, httpx, or Selenium) with proxies to maintain stable access, bypass anti-bot systems, and stay in control of your data extraction process. This setup is flexible, scriptable, and easy to adapt as Amazon’s page structure or anti-bot logic evolves.

Why proxies are necessary for stable scraping

When scraping Amazon reviews, proxies are extremely helpful. Amazon has robust anti-bot mechanisms that can quickly block your IP address if it detects unusual behavior, such as sending too many requests within a short time. By routing your traffic through different IP addresses, proxies help you distribute requests and avoid hitting rate limits or triggering CAPTCHAs.

For the best results, it's recommended to use residential proxies. Residential IPs are associated with real internet service providers, making your traffic appear more like a typical user browsing from home. Ideally, use a rotating proxy service that automatically assigns a new IP address with each request, providing maximum coverage and minimizing the chance of bans.
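As a minimal sketch of what this looks like in practice, here's how a Requests session can be pointed at a rotating residential gateway. The gate.decodo.com:7000 endpoint matches the setup used later in this guide; the credentials are placeholders:

```python
import requests

def build_proxies(username, password, endpoint="gate.decodo.com:7000"):
    """Build a Requests-style proxy mapping for a rotating gateway.

    With a rotating endpoint, each new connection is typically
    assigned a fresh residential IP by the provider.
    """
    proxy_url = f"http://{username}:{password}@{endpoint}"
    return {"http": proxy_url, "https": proxy_url}

proxies = build_proxies("YOUR_PROXY_USERNAME", "YOUR_PROXY_PASSWORD")

# Example request through the proxy (uncomment with real credentials):
# response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=20)
# print(response.text)  # the reported IP should change between rotations
```

Because both the http and https keys point at the same gateway, all traffic from the session is routed through the rotating pool.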

At Decodo, we offer residential proxies with a high success rate (99.86%), a rapid response time (<0.6s), and extensive geo-targeting options (195+ worldwide locations). Here's how easy it is to get a plan and your proxy credentials:

  1. Head over to the Decodo dashboard and create an account.
  2. On the left panel, click Residential.
  3. Choose a subscription, Pay As You Go plan, or opt for a 3-day free trial.
  4. In the Proxy setup tab, select the location, session type, and protocol according to your needs.
  5. Copy your proxy address, port, username, and password for later use. Alternatively, you can click the download icon in the lower right corner of the table to download the proxy endpoints (10 by default).

Get residential proxy IPs

Claim your 3-day free trial of residential proxies and explore full features with unrestricted access.

Easy way to scrape Amazon product reviews

If you want to collect Amazon product reviews reliably and on your own terms, the best method right now is to use a custom Python script with residential proxies. This gives you full control, keeps your data fresh, and avoids relying on unstable third-party APIs (which often lose access anyway). Let’s walk through a simple setup.

1. Install prerequisites

Before running the Amazon review scraper, make sure you have Python 3.8+ installed. Then, you'll need to install a few Python libraries. The script uses both built-in Python modules and some external packages:

  • time – for delays between requests
  • random – for randomizing delays
  • json – for saving review data
  • logging – for error messages
  • re – for regex pattern matching
  • Requests – for making HTTP requests to Amazon
  • lxml – for parsing HTML and using XPath selectors

Use this command to install the required external libraries:

pip install requests lxml

2. Set up proxy credentials

The script requires proxy credentials to avoid getting blocked by Amazon. Once you have your Decodo proxy credentials, you'll need to update several variables in the usage section at the bottom of the script:

  • Replace 'YOUR_PROXY_USERNAME' with your proxy username
  • Replace 'YOUR_PROXY_PASSWORD' with your proxy password
  • In the line proxy = f"http://{username}:{password}@gate.decodo.com:7000", you can replace gate.decodo.com:7000 with the specific proxy address and port you configured in Decodo's dashboard

3. Configure target product

You'll need to specify which Amazon product you want to scrape reviews from by updating the product URL. Change 'https://www.amazon.com/dp/B07ZF8T63K' to whatever product you want to analyze – just copy the URL from any Amazon product page.
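Any standard product URL works, but if you want to normalize messy URLs (with search paths or tracking parameters) first, the product's ASIN can be pulled out with a small helper. This is a sketch; the /dp/ and /gp/product/ patterns cover the common Amazon URL formats:

```python
import re

def extract_asin(url):
    """Extract the 10-character ASIN from a typical Amazon product URL."""
    match = re.search(r"/(?:dp|gp/product)/([A-Z0-9]{10})", url)
    return match.group(1) if match else None

def canonical_url(url, domain="www.amazon.com"):
    """Rebuild a clean product URL from the ASIN, dropping extra path parts."""
    asin = extract_asin(url)
    return f"https://{domain}/dp/{asin}" if asin else url

print(extract_asin("https://www.amazon.com/dp/B07ZF8T63K?ref=xyz"))  # B07ZF8T63K
print(canonical_url("https://www.amazon.com/Some-Name/dp/B07ZF8T63K/ref=sr_1_1"))
```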

This script focuses on scraping the top reviews that are visible directly on a product’s main page – not the full list of reviews. Amazon has made it harder to access all reviews without logging in, and many detailed review pages now load dynamically using JavaScript. By targeting the top visible reviews, we avoid those complications and still get valuable, recent feedback without dealing with authentication or headless browsers.

4. Run the script

Copy and save the Amazon reviews scraping script below as a Python file (e.g., amazon_scraper.py), then run it after updating your proxy credentials and target product URL.

import requests
from lxml import html
import time
import random
import json
import logging
import re

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


class AmazonReviewScraper:
    def __init__(self, username, password, base_delay=5):
        self.session = requests.Session()
        self.base_delay = base_delay

        # Comprehensive headers to avoid detection
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'DNT': '1',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
            'Sec-Fetch-User': '?1',
            'Cache-Control': 'max-age=0',
        })

        # Configure proxy
        proxy = f"http://{username}:{password}@gate.decodo.com:7000"
        self.session.proxies = {'http': proxy, 'https': proxy}

    def scrape_all_reviews(self, product_url):
        print(f"Scraping: {product_url}")
        return self.scrape_page(product_url)

    def scrape_page(self, url):
        try:
            # Randomized delay between requests to mimic human browsing
            time.sleep(random.uniform(self.base_delay, self.base_delay + 5))
            response = self.session.get(url, timeout=20)
            response.raise_for_status()
            tree = html.fromstring(response.content)

            # Check for CAPTCHA
            if tree.xpath('//form[@action="/errors/validateCaptcha"]'):
                logger.error("Amazon CAPTCHA detected!")
                return []

            # Get review containers
            containers = tree.xpath('//*[starts-with(@id, "customer_review")]')
            if not containers:
                return []

            # Extract and deduplicate reviews
            reviews = []
            seen = set()
            for container in containers:
                review = self.extract_review(container)
                if review and self.is_valid(review):
                    key = review['review'][:100] if review['review'] != 'N/A' else f"{review['username']}_{review['title']}"
                    if key not in seen:
                        seen.add(key)
                        reviews.append(review)
            return reviews
        except Exception as e:
            logger.error(f"Scraping failed: {e}")
            return []

    def extract_review(self, container):
        try:
            review = {}

            # Username
            review['username'] = self.get_text(container, [
                './div[1]/div/div[2]/span/text()',
                './/span[@class="a-profile-name"]/text()',
            ])

            # Rating
            rating_text = self.get_text(container, [
                './/i[@data-hook="review-star-rating"]//span[@class="a-icon-alt"]/text()',
                './/span[@class="a-icon-alt" and contains(text(), "out of")]/text()',
            ])
            rating_match = re.search(r'^(\d+\.?\d*)', rating_text) if rating_text != 'N/A' else None
            review['rating'] = rating_match.group(1) if rating_match else 'N/A'

            # Title - skip any text that is actually the star rating
            title = 'N/A'
            for xpath in [
                './/a[@data-hook="review-title"]//span/text()',
                './/span[@data-hook="review-title"]//span/text()',
                './div[2]/h5/span[2]/span/text()',
            ]:
                try:
                    results = container.xpath(xpath)
                    for result in results:
                        text = result.strip()
                        if (text and 'out of' not in text.lower() and 'stars' not in text.lower()
                                and len(text) > 3 and not text.replace('.', '').isdigit()):
                            title = text
                            break
                    if title != 'N/A':
                        break
                except Exception:
                    continue
            review['title'] = title

            # Review body
            text_parts = []
            for xpath in [
                './div[4]/span/div/div[1]/span/text()',
                './/span[@data-hook="review-body"]//span/text()',
                './/div[@data-hook="review-collapsed"]//span/text()',
            ]:
                try:
                    nodes = container.xpath(xpath)
                    if nodes:
                        text_parts = [n.strip() for n in nodes if n.strip()]
                        break
                except Exception:
                    continue
            review['review'] = ' '.join(text_parts) if text_parts else 'N/A'

            # Location and date
            review['location_date'] = self.get_text(container, [
                './/span[@data-hook="review-date"]/text()',
                './/span[contains(text(), "Reviewed in")]/text()',
            ])

            # Verified purchase
            verified = self.get_text(container, [
                './/span[@data-hook="avp-badge-linkless"]/text()',
                './/span[contains(text(), "Verified Purchase")]/text()',
            ])
            review['verified_purchase'] = verified if verified != 'N/A' else 'Not verified'

            # Helpful votes
            helpful = self.get_text(container, [
                './/span[@data-hook="helpful-vote-statement"]/text()',
                './/span[contains(text(), "people found this helpful")]/text()',
            ])
            review['helpful_votes'] = helpful if helpful != 'N/A' else '0 people found this helpful'

            return review
        except Exception as e:
            logger.error(f"Extract failed: {e}")
            return None

    def get_text(self, container, xpaths):
        """Extract text using the first successful XPath"""
        for xpath in xpaths:
            try:
                result = container.xpath(xpath)
                if result:
                    text = result[0].strip()
                    if text:
                        return text
            except Exception:
                continue
        return 'N/A'

    def is_valid(self, review):
        """Check if a review has enough valid data"""
        has_name = review['username'] != 'N/A'
        has_text = review['review'] != 'N/A' and len(review['review']) > 10
        has_rating = review['rating'] != 'N/A' and review['rating'].replace('.', '').isdigit()
        return (has_name or has_text) and has_rating

    def save_to_json(self, reviews, filename, scraped_url):
        """Save reviews to a JSON file with URL metadata"""
        if reviews:
            data = {
                "scraped_url": scraped_url,
                "total_reviews": len(reviews),
                "reviews": reviews
            }
            with open(filename, 'w', encoding='utf-8') as f:
                json.dump(data, f, indent=2, ensure_ascii=False)


# Usage
if __name__ == "__main__":
    product_url = 'https://www.amazon.com/dp/B07ZF8T63K'  # Replace with your target product URL
    scraper = AmazonReviewScraper('YOUR_PROXY_USERNAME', 'YOUR_PROXY_PASSWORD', base_delay=8)  # Replace with your proxy credentials

    reviews = scraper.scrape_all_reviews(product_url)
    if reviews:
        scraper.save_to_json(reviews, 'amazon_reviews.json', product_url)
        print(f"{len(reviews)} reviews saved to amazon_reviews.json")
        print("\nPreview:")
        for i, review in enumerate(reviews[:3]):
            print(f"\n--- Review {i+1} ---")
            print(f"Username: {review['username']}")
            print(f"Rating: {review['rating']}")
            print(f"Title: {review['title']}")
            print(f"Location and Date: {review['location_date']}")
            print(f"Verified: {review['verified_purchase']}")
            print(f"Review: {review['review'][:100]}...")
    else:
        print("No reviews were scraped.")

This scraper is built with a Python class that collects top reviews from Amazon product pages. It uses realistic headers, timed delays, and residential proxies to blend in like a real user and avoid getting blocked.

The script focuses on the top reviews shown directly on the product page and pulls key info (username, rating, title, text, date, location, verified status, and helpful vote count) using XPath selectors you can adjust if Amazon’s layout changes.
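If Amazon does change its markup, you can verify an adjusted selector against a saved page snippet before rerunning the full scraper. Here's a minimal sketch – the HTML below is a simplified stand-in for Amazon's real review structure:

```python
from lxml import html

# Simplified stand-in for a review container on an Amazon product page
snippet = """
<div id="customer_review-R1ABC">
  <span class="a-profile-name">Jane D.</span>
  <i data-hook="review-star-rating"><span class="a-icon-alt">5.0 out of 5 stars</span></i>
  <span data-hook="review-body"><span>Great value for the price.</span></span>
</div>
"""

tree = html.fromstring(snippet)
container = tree.xpath('//*[starts-with(@id, "customer_review")]')[0]

# Try the same selectors the scraper uses
name = container.xpath('.//span[@class="a-profile-name"]/text()')[0]
rating = container.xpath('.//i[@data-hook="review-star-rating"]//span[@class="a-icon-alt"]/text()')[0]
body = container.xpath('.//span[@data-hook="review-body"]//span/text()')[0]

print(name, "|", rating, "|", body)
```

Saving a real response (response.content) to a local file and iterating on selectors this way is much faster than hitting Amazon on every debugging run.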

It also filters out duplicates, detects CAPTCHAs, and saves everything to a JSON file – plus shows a quick preview in your terminal.
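Once the JSON file exists, the saved structure makes downstream analysis straightforward. For example, here's a sketch that computes the average star rating, assuming the amazon_reviews.json format produced by the script above:

```python
import json

def average_rating(reviews):
    """Average star rating over reviews with a parseable numeric rating."""
    ratings = [
        float(r["rating"]) for r in reviews
        if r.get("rating", "N/A") != "N/A"
    ]
    return round(sum(ratings) / len(ratings), 2) if ratings else None

# Works on the structure the scraper saves:
data = {
    "scraped_url": "https://www.amazon.com/dp/B07ZF8T63K",
    "total_reviews": 2,
    "reviews": [
        {"rating": "5.0", "review": "Excellent."},
        {"rating": "4.0", "review": "Pretty good."},
    ],
}
print(average_rating(data["reviews"]))  # 4.5

# With the real file:
# with open("amazon_reviews.json", encoding="utf-8") as f:
#     data = json.load(f)
# print(average_rating(data["reviews"]))
```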

Bottom line

With a simple Python script and residential proxies, you can reliably scrape Amazon product reviews in just a few minutes. This approach gives you full control over the data, avoids common scraping issues, and works well at scale. It's a solid option for collecting customer feedback, tracking sentiment, or analyzing product performance.

Get residential proxies for Amazon

Claim your 3-day free trial of residential proxies to collect Amazon reviews with full feature access.

About the author

Dominykas Niaura

Technical Copywriter

Dominykas brings a unique blend of philosophical insight and technical expertise to his writing. Starting his career as a film critic and music industry copywriter, he's now an expert in making complex proxy and web scraping concepts accessible to everyone.


Connect with Dominykas via LinkedIn

All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

Can we scrape user review data from Amazon?

Yes, it's possible to collect user review data from Amazon using various methods, such as web scraping APIs or Python scripts. It’s important to do so responsibly and ensure that your data collection methods comply with applicable laws.

How to legally scrape Amazon reviews?

Ensure that you are accessing only publicly available data, avoid excessive requests that could strain the website’s servers, and use the data responsibly while adhering to copyright and data protection laws. Consulting legal counsel is advisable to ensure full compliance with relevant regulations for your specific use case.

How to scrape reviews from Amazon?

The most reliable way to scrape Amazon reviews is by using a custom Python script with residential proxies. This setup gives you control over what data you collect and helps you avoid blocks or CAPTCHAs. You can extract reviews directly from product pages without relying on third-party APIs.

What are the advantages of Amazon review scraping solutions?

Residential proxies combined with a Python scraper offer a reliable way to collect Amazon review data without building complex infrastructure. This setup gives you structured, ready-to-use insights while helping you avoid blocks and stay aligned with platform policies.
