
Scrape Walmart Data: A Complete How-To Guide & Best Methods

Walmart’s digital marketplace is a vast platform, featuring over 420 million products and nearly 500 million monthly visits. That volume of web data is a valuable source for eCommerce teams, data analysts, and investment firms seeking pricing intelligence, inventory trends, and competitive insights. But scraping it isn’t easy – Walmart uses a complex, multi-layer anti-bot system that stops most common scraping tools. In this guide, you’ll learn the proven methods that work in 2025.

Vaidotas Juknys

Jul 03, 2025

9 min read

Understanding Walmart data: what can you scrape?

Let’s look at the types of information you can scrape from Walmart’s site and why they matter:

| Data type | Fields | Use cases |
| --- | --- | --- |
| Product details | Name, brand, SKU, UPC, specifications, category path | Competitive intelligence, catalog enrichment |
| Pricing info | Current price, previous price, unit price, promo labels (rollback, clearance, best seller) | Dynamic pricing, margin analysis |
| Stock status | In stock / out of stock flag, low-stock alert, store-level availability | Supply-chain insights, promotion timing |
| Customer reviews | Review text, star rating, verified-purchase badge, images, review count over time | Sentiment analysis, product R&D |
| Media assets | High-resolution images, 360° views, spec videos (when available) | Content creation, visual merchandising |
| Seller details | Seller name & ID, seller rating, on-time shipping rate, return policy, and fulfillment type | Vendor assessment, partnership opportunities |

Challenges in scraping Walmart

Walmart uses multi-layered detection systems to flag non-human traffic. As of 2025, they leverage Akamai Bot Manager and PerimeterX for advanced bot protection. Their detection methods include:

  • TLS fingerprint analysis, IP reputation checks, geolocation patterns, request frequency, and missing browser headers.
  • JavaScript execution monitoring, device and browser fingerprinting, and other behavioral signals that feed a bot score.
  • A progressive CAPTCHA flow – from simple “press and hold” checks to complex visual puzzles.
  • Rate limiting that escalates from throttling to permanent IP bans.
  • A Next.js architecture (SSR, client hydration, lazy loading, obfuscated CSS classes) that further complicates scraping.

When your scraper gets detected, Walmart serves a “Robot or human?” challenge page instead of the content you requested.

Key point: Walmart invests heavily in blocking bots – any vanilla HTTP-client script hits that challenge within seconds. The following sections cover proven techniques to bypass it.
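To see the block for yourself, send a request with a plain HTTP client. This is a minimal sketch – it assumes the challenge page contains the “Robot or human?” text quoted later in this guide, so adjust the check to whatever your blocked response actually contains:

import requests  # plain HTTP client, no TLS impersonation

resp = requests.get(
    "https://www.walmart.com/search?q=office+chair",
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=30,
)

if resp.status_code != 200 or "Robot or human?" in resp.text:
    print("Blocked: Walmart served its bot challenge instead of search results")
else:
    print("Got the real page (unlikely with a vanilla client)")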

Building a Walmart data scraper from scratch

The Python scraping landscape has evolved thanks to curl-cffi, the Python bindings for curl-impersonate. It mimics real browser TLS fingerprints, so your requests look indistinguishable from genuine browser traffic.

The strategy – finding the hidden JSON

Modern web apps like Walmart’s, built on frameworks such as Next.js, fetch data through internal APIs on the server and embed the result in a <script> tag to speed up initial rendering.

The goal is to locate and parse this data:

  1. Navigate to any Walmart search or product page.
  2. Open Developer Tools with a right-click and then Inspect.
  3. In the Elements tab, search for "__NEXT_DATA__".
  4. Look for: <script id="__NEXT_DATA__">…</script>.

So, rather than relying on brittle CSS selectors, we’ll target the structured JSON embedded in the page’s HTML – a far more robust approach.
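If you save a Walmart page’s HTML from your browser, a few lines of Python (using the libraries installed in the next step) are enough to confirm the payload is there. This is a quick exploratory sketch – "page.html" is a placeholder for whatever file you saved:

import json
from bs4 import BeautifulSoup

# "page.html" is a locally saved copy of any Walmart search or product page
with open("page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

script_tag = soup.find("script", {"id": "__NEXT_DATA__"})
data = json.loads(script_tag.string)

# Inspect the top-level structure before writing a full parser
print(list(data.keys()))
print(list(data["props"]["pageProps"].keys()))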

Step #1 – setup and installation

Create and activate a Python virtual environment:

python -m venv walmart-scraper  
source walmart-scraper/bin/activate   # macOS / Linux  
# OR (Windows)  
walmart-scraper\Scripts\activate.bat  # Windows CMD  
walmart-scraper\Scripts\Activate.ps1  # Windows PowerShell  

Install required libraries:

pip install curl-cffi beautifulsoup4

We use beautifulsoup4 to parse HTML and extract the <script> tag (a critical step in isolating the __NEXT_DATA__ payload). For detailed guidance on HTML parsing with Beautiful Soup, head to our complete web scraping guide with BeautifulSoup.
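Before moving on, you can run a quick sanity check to confirm that curl-cffi is installed and can impersonate Chrome. The snippet below uses httpbin.org purely as a neutral test target:

from curl_cffi import requests

# A 200 response with browser-like headers echoed back means impersonation works
resp = requests.get("https://httpbin.org/headers", impersonate="chrome", timeout=30)
print(resp.status_code)
print(resp.json())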

Step #2 – scraping Walmart search results

Let’s scrape Walmart’s search results for “office chair”.

This script will query Walmart for a search term, paginate through the results, and extract product data from the __NEXT_DATA__ object on each page.

import time
import json
from typing import Dict, List, Any, Optional
from bs4 import BeautifulSoup, Tag
from curl_cffi import requests
class WalmartScraper:
    """Web scraper for extracting product data from Walmart.com search results."""
    def __init__(self):
        self.session = requests.Session()
    def build_url(self, query: str, **kwargs) -> str:
        params = [f"q={query}"]
        for key, value in kwargs.items():
            if value:
                params.append(f"{key}={value}")
        return f"https://www.walmart.com/search?{'&'.join(params)}"
    def extract_products_from_json(
        self, json_data: Dict[str, Any]
    ) -> List[Dict[str, Any]]:
        try:
            # Navigate Walmart's nested JSON structure to find product data
            item_stacks = json_data["props"]["pageProps"]["initialData"][
                "searchResult"
            ]["itemStacks"]
            products = []
            for stack in item_stacks:
                products.extend(stack.get("items", []))
            return products
        except (KeyError, TypeError):
            return []
    def scrape_search_page(
        self, query: str, page: int = 1, **kwargs
    ) -> List[Dict[str, Any]]:
        url = self.build_url(
            query,
            page=page if page > 1 else None,
            affinityOverride="default" if page > 1 else None,
            **kwargs,
        )
        try:
            # Use curl_cffi to bypass anti-bot measures
            response = self.session.get(
                url, impersonate="chrome", timeout=30  # type: ignore
            )
            response.raise_for_status()
            soup = BeautifulSoup(response.content, "html.parser")
            script_tag = soup.find("script", {"id": "__NEXT_DATA__"})
            if script_tag and isinstance(script_tag, Tag) and script_tag.string:
                json_data = json.loads(script_tag.string)
                return self.extract_products_from_json(json_data)
            return []
        except Exception as e:
            print(f"Error scraping page {page}: {str(e)}")
            return []
    def extract_products(
        self, query: str, max_products: Optional[int] = None, **kwargs
    ) -> int:
        print(f"Starting scraper for query: '{query}'")
        if max_products:
            print(f"Target: {max_products} products")
        page = 1
        all_products: List[Dict[str, Any]] = []
        filename = f"walmart_{query.replace(' ', '_')}.json"
        while True:
            if max_products and len(all_products) >= max_products:
                print(f"Reached target of {max_products} products")
                break
            print(f"Scraping page {page}...", end=" ")
            products = self.scrape_search_page(query, page, **kwargs)
            if not products:
                print("No more products found")
                break
            all_products.extend(products)
            if max_products:
                all_products = all_products[:max_products]
            print(f"Found {len(products)} products (Total: {len(all_products)})")
            page += 1
            time.sleep(3)  # Rate limiting
        print(f"Saving {len(all_products)} products to '{filename}'...")
        with open(filename, "w") as f:
            json.dump(all_products, f, indent=2)
        print(f"Successfully saved {len(all_products)} products!")
        return len(all_products)
def main():
    scraper = WalmartScraper()
    scraper.extract_products(
        "office chair",
        max_products=200,
        sort="best_seller",
        min_price=120,
        max_price=500,
    )
if __name__ == "__main__":
    main()

What this does:

  • Uses curl-cffi with impersonate="chrome" to mimic Chrome’s TLS fingerprint
  • Finds the __NEXT_DATA__ script tag and parses its JSON
  • Extracts all product items from the nested JSON
  • Loops through pages until no more items are found (or until max_products is reached)
  • Saves results to a JSON file

Heads up – the sort parameter accepts best_match, price_low, price_high, and best_seller. You can also refine results with other filters (customer ratings, brand, fulfillment speed, etc.).
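For example, you can reuse the same class with a different query, sort order, and price range:

scraper = WalmartScraper()
scraper.extract_products(
    "standing desk",
    max_products=100,
    sort="price_low",   # best_match, price_low, price_high, or best_seller
    min_price=100,
    max_price=400,
)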

When you run the scraper, it loops through each page and outputs a walmart_office_chair.json file packed with clean product data.

Each item in the JSON file represents a product object with over 100 fields scraped directly from Walmart’s search results. The data is well-structured and ready for downstream analysis.

Here’s an example of what a product object might look like (trimmed for simplicity):

{
  "name": "(3 pack) Mainstays Ergonomic Mesh Back Office Chair with Flip Up Arms for Adults, Black Fabric, 275lb",
  "id": "0UG8YSKSXQ0K",
  "url": "https://www.walmart.com/ip/15345264075",
  "description": "Fabric upholstery with a breathable mesh back. Pronounced lumbar support. Ergonomically positioned lift-up armrests.",
  "image": "https://i5.walmartimages.com/seo/3-pack-Mainstays-Ergonomic-Mesh-Back-Office-Chair-with-Flip-Up-Arms-for-Adults-Black-Fabric-275lb_acb6d904-befa-405b-897a-a8e481895559.91414d84cdc05aa8690e1aa6f82050b4.jpeg",
  "price": 177,
  "rating": 4.5,
  "reviewCount": 6100,
  "availability": "In stock",
  "seller": "Walmart.com",
  "category": "Desk Chairs"
}

Heads up – the actual JSON includes many more fields and deeper nesting than the sample above.
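Once the file is saved, post-processing is straightforward. The sketch below assumes the flat field names shown in the trimmed sample above; in the real file some of them may be nested differently, so every lookup is guarded with .get():

import json

with open("walmart_office_chair.json", encoding="utf-8") as f:
    products = json.load(f)

# Keep only the fields needed for a quick price/rating overview
rows = [
    {
        "name": p.get("name"),
        "price": p.get("price"),
        "rating": p.get("rating"),
        "reviews": p.get("reviewCount"),
        "url": p.get("url"),
    }
    for p in products
]

print(f"{len(rows)} products loaded")
print(rows[:3])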

Step #3 – scraping a Walmart product page

Scraping a single product page uses the same method – we target the <script id="__NEXT_DATA__"> tag and parse its JSON payload. However, the JSON path for accessing product details differs from the listing pages.

Here’s the complete code:

import json
from typing import Dict, Any, Optional
from bs4 import BeautifulSoup, Tag
from curl_cffi import requests
class WalmartProductScraper:
    def __init__(self):
        self.session = requests.Session()
    def extract_json_data(self, html_content: str) -> Optional[Dict[str, Any]]:
        soup = BeautifulSoup(html_content, "html.parser")
        # Walmart stores product data in Next.js script tag
        script_tag = soup.find("script", {"id": "__NEXT_DATA__"})
        if script_tag and isinstance(script_tag, Tag) and script_tag.string:
            try:
                return json.loads(script_tag.string)
            except json.JSONDecodeError:
                return None
        return None
    def scrape_product(self, product_url: str) -> Optional[Dict[str, Any]]:
        try:
            # Use curl_cffi to bypass anti-bot measures
            response = self.session.get(
                product_url, impersonate="chrome", timeout=30  # type: ignore
            )
            response.raise_for_status()
            json_data = self.extract_json_data(response.text)
            if json_data:
                filename = "walmart_product.json"
                print(f"Saving product data to '{filename}'...")
                with open(filename, "w", encoding="utf-8") as f:
                    json.dump(json_data, f, indent=2)
                print(f"Successfully saved product data!")
                return json_data
            return None
        except Exception as e:
            print(f"Error scraping product: {str(e)}")
            return None
def main():
    scraper = WalmartProductScraper()
    product_url = "https://www.walmart.com/ip/Mainstays-Ergonomic-Mesh-Back-Task-Office-Chair-with-Flip-up-Arms-Black-Fabric-275-lb/2205851521"
    scraper.scrape_product(product_url)
if __name__ == "__main__":
    main()

When you run this script, a walmart_product.json file appears in your directory, containing the complete product data model (usually under props.pageProps.initialData.data.product). Explore this JSON to find exactly the fields you need.
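As a starting point, here’s a small sketch that walks down that path and prints a few fields. The path matches the one mentioned above, but the field names inside the product object (name, brand, priceInfo) are assumptions – open the JSON file to confirm them:

import json

with open("walmart_product.json", encoding="utf-8") as f:
    data = json.load(f)

# Walk down to the product object; .get() keeps this safe if the layout changes
product = (
    data.get("props", {})
    .get("pageProps", {})
    .get("initialData", {})
    .get("data", {})
    .get("product", {})
)

print(product.get("name"))
print(product.get("brand"))
print(product.get("priceInfo", {}).get("currentPrice"))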

Step #4 – scraping Walmart product reviews

Walmart hosts reviews on pages like https://www.walmart.com/reviews/product/{PRODUCT_ID}?entryPoint=viewAllReviewsTop. Like other pages, the review data is embedded in the <script id="__NEXT_DATA__"> JSON object.

Here’s the complete code:

import time
import json
from typing import Dict, List, Any, Optional
from bs4 import BeautifulSoup, Tag
from curl_cffi import requests
class WalmartReviewsScraper:
    """Web scraper for extracting product reviews from Walmart.com review pages."""
    def __init__(self):
        self.session = requests.Session()
    def build_url(self, product_id: str, page: int = 1) -> str:
        """Build the reviews URL for a specific product and page."""
        params = ["entryPoint=viewAllReviewsTop"]
        if page > 1:
            params.append(f"page={page}")
        return (
            f"https://www.walmart.com/reviews/product/{product_id}?{'&'.join(params)}"
        )
    def extract_reviews_from_json(
        self, json_data: Dict[str, Any]
    ) -> List[Dict[str, Any]]:
        try:
            # Navigate Walmart's nested JSON structure to find review data
            customer_reviews = json_data["props"]["pageProps"]["initialData"]["data"][
                "reviews"
            ]["customerReviews"]
            return customer_reviews
        except (KeyError, TypeError):
            return []
    def scrape_page(self, product_id: str, page: int = 1) -> List[Dict[str, Any]]:
        url = self.build_url(product_id, page)
        try:
            # Use curl_cffi to bypass anti-bot measures
            response = self.session.get(
                url, impersonate="chrome", timeout=30  # type: ignore
            )
            response.raise_for_status()
            soup = BeautifulSoup(response.content, "html.parser")
            script_tag = soup.find("script", {"id": "__NEXT_DATA__"})
            if script_tag and isinstance(script_tag, Tag) and script_tag.string:
                json_data = json.loads(script_tag.string)
                return self.extract_reviews_from_json(json_data)
            return []
        except Exception as e:
            print(f"Error scraping page {page}: {str(e)}")
            return []
    def extract_reviews(
        self, product_id: str, max_reviews: Optional[int] = None
    ) -> int:
        print(f"Starting scraper for product: '{product_id}'")
        if max_reviews:
            print(f"Target: {max_reviews} reviews")
        page = 1
        all_reviews: List[Dict[str, Any]] = []
        filename = f"walmart_reviews_{product_id}.json"
        while True:
            if max_reviews and len(all_reviews) >= max_reviews:
                print(f"Reached target of {max_reviews} reviews")
                break
            print(f"Scraping page {page}...", end=" ")
            reviews = self.scrape_page(product_id, page)
            if not reviews:
                print("No more reviews found")
                break
            all_reviews.extend(reviews)
            if max_reviews:
                all_reviews = all_reviews[:max_reviews]
            print(f"Found {len(reviews)} reviews (Total: {len(all_reviews)})")
            page += 1
            time.sleep(3)  # Rate limiting
        print(f"Saving {len(all_reviews)} reviews to '{filename}'...")
        with open(filename, "w") as f:
            json.dump(all_reviews, f, indent=2)
        print(f"Successfully saved {len(all_reviews)} reviews!")
        return len(all_reviews)
def main():
    scraper = WalmartReviewsScraper()
    scraper.extract_reviews("2205851521", max_reviews=200)
if __name__ == "__main__":
    main()

Run the scraper with your product ID and, optionally, the maximum number of reviews. It loops through each review page until it has collected as many reviews as you need, then saves everything to walmart_reviews_{PRODUCT_ID}.json.
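With the reviews saved, a simple star-rating breakdown is a good first analysis step. The sketch below assumes each review object has a numeric rating field – check the JSON before relying on it:

import json
from collections import Counter

with open("walmart_reviews_2205851521.json", encoding="utf-8") as f:
    reviews = json.load(f)

# Count how many reviews fall into each star rating
ratings = [r.get("rating") for r in reviews if isinstance(r.get("rating"), (int, float))]
distribution = Counter(ratings)

for stars in sorted(distribution, reverse=True):
    print(f"{stars} stars: {distribution[stars]} reviews")
if ratings:
    print(f"Average rating: {sum(ratings) / len(ratings):.2f}")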

Scaling up: how to avoid getting blocked

The basic script works fine for a few product pages. But once you send hundreds or thousands of requests from a single IP, Walmart will block you. Their bot protection system detects repeat traffic and responds with a challenge page:

Robot or human? Activate and hold the button to confirm that you're human.

To scrape at scale reliably, you’ll need rotating residential proxies. Residential IPs are assigned by real ISPs, making your traffic appear as if it’s coming from actual users, bypassing most bot detection systems.

Integrating Decodo residential proxies into your Python script is straightforward. First, head to the quick start guide to learn how to obtain your credentials. Decodo offers a free 3-day trial to help you test the setup at no cost.

Create an .env file in the same directory as your script to securely store your proxy credentials. Install the helper library using:

pip install python-dotenv

Now, add your credentials to .env:

PROXY_URL="http://USERNAME:PASSWORD@PROXY_HOSTNAME:10000"

Modify your scraper to use the proxy:

import os
from dotenv import load_dotenv
from curl_cffi import requests
# Load environment variables
load_dotenv()
class WalmartProductScraper:
    def __init__(self):
        proxy_url = os.getenv("PROXY_URL")
        self.proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None
        self.session = requests.Session()
    def scrape_product(self, product_url):
        try:
            response = self.session.get(
                product_url,
                impersonate="chrome",
                timeout=30,
                proxies=self.proxies  # Apply proxy settings
            )
            # Continue scraping logic here...

Decodo gives you access to 115M+ ethically-sourced residential proxy IPs across 195+ worldwide locations. With proxy rotation, each request is routed through a different real-user IP, allowing you to scrape at scale without triggering blocks.
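Because each request can leave through a different IP, a simple retry loop is usually enough to recover from the occasional block. This sketch builds on the proxy setup above; the block-detection check (non-200 status or the “Robot or human?” text) is an assumption you should adjust to what you actually observe:

import os
import time
from typing import Optional
from curl_cffi import requests

proxy_url = os.getenv("PROXY_URL")
proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None

def fetch_with_retries(url: str, max_attempts: int = 3) -> Optional[str]:
    for attempt in range(1, max_attempts + 1):
        response = requests.get(url, impersonate="chrome", timeout=30, proxies=proxies)
        if response.status_code == 200 and "Robot or human?" not in response.text:
            return response.text
        # With a rotating pool, the next attempt goes out through a different IP
        print(f"Attempt {attempt} blocked, retrying...")
        time.sleep(2 * attempt)  # simple backoff between attempts
    return None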

The enterprise solution – web scraping API

Maintaining proxies and scrapers is a heavy engineering lift. Layout changes, evolving anti-bot measures, and parser breaks mean constant upkeep. For businesses that need reliable web data without the maintenance overhead, an all-in-one Web Scraping API is the most efficient solution.

A Web Scraping API abstracts away complexity – you make a single API call with a target URL, and the service handles proxy rotation, CAPTCHA solving, browser-fingerprint impersonation, and parsing, returning clean JSON.

Whether you’re scraping search pages, product listings, or any other site, Decodo’s Web Scraping API makes it fast, simple, and scalable, with zero manual work:

  • Pay only for successful requests. You’re billed only for calls that return data.
  • Flexible output options. Choose HTML, structured JSON, or parsed CSV.
  • Real-time and on-demand results. Get data immediately or schedule tasks for later.
  • Built-in anti-bot bypass. Browser fingerprinting, CAPTCHA evasion, and IP spoofing.
  • Easy integration. Quick-start guides and code examples help you get up and running quickly.
  • Ethically-sourced proxy pool. Leverage 125M+ residential, datacenter, mobile, and static residential (ISP) IPs across the world for geo-targeting and high success rates.
  • Free trial. Test the API for 7 days with 1K requests at no cost.

Getting started

To set up the Walmart Scraping API:

  1. Create an account or sign in to your Decodo dashboard.
  2. Under Scraping APIs, select a plan – either Core or Advanced.
  3. Choose a subscription and select Start with free trial.
  4. Now, you can choose your target website. In this case, select Walmart – specifically the Walmart Search or Walmart Product endpoint.

Walmart Search Scraper

Here’s what the Decodo dashboard looks like when configuring the Web Scraping API for Walmart Search:

Simply paste the Walmart search URL into the input box on the Decodo dashboard.

You can optionally configure several API parameters, such as:

  • JavaScript rendering
  • Custom headers
  • Geolocation
  • Store zip code or store ID, and more.

To target a specific Walmart store:

  • Use a ZIP code (e.g., 99950) to define the store location.
  • Or specify a store ID to get localized search results.

Then click Send Request. Within seconds, you’ll receive the scraped Walmart search results – clean HTML or, with parsing enabled, structured JSON.

Your API response will appear in the Response tab, and you can export it in CSV or JSON format. Here's an example response (trimmed for brevity):

{
  "price": {
    "price": 89.99,
    "currency": "USD",
    "price_min": 89.99,
    "price_strikethrough": 199.99
  },
  "rating": {
    "count": 56,
    "rating": 4.3
  },
  "seller": {
    "id": "2A40BB024BE44621A0A64F0E4FF22566",
    "name": "CoolHut"
  },
  "general": {
    "pos": 1,
    "url": "/ip/CoolHut-Ergonomic-Mesh-Office-Chair...",
    "image": "https://i5.walmartimages.com/seo/CoolHut-Ergonomic...",
    "title": "Ergonomic Mesh Office Chair, High Back Adjustable...",
    "sponsored": true,
    "product_id": "3740918901",
    "out_of_stock": false,
    "section_title": "Results for \"office chairs\""
  },
  "fulfillment": {
    "pickup": false,
    "delivery": false,
    "shipping": true,
    "free_shipping": true
  }
}

If you’re a developer, the dashboard can auto-generate code in cURL, Python, and Node.js. You can copy it with one click and plug it into your application instantly.
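For orientation, a generated Python snippet typically boils down to a single authenticated POST. Everything below – the endpoint, credentials, and parameter names – is a placeholder for illustration, so copy the exact code from your dashboard instead:

import requests

response = requests.post(
    "https://scraper-api.decodo.com/v2/scrape",  # placeholder endpoint - use the one your dashboard shows
    auth=("API_USERNAME", "API_PASSWORD"),       # placeholder credentials
    json={
        "target": "walmart_search",              # placeholder target name
        "query": "office chair",
        "parse": True,                           # request structured JSON instead of raw HTML
    },
    timeout=60,
)
print(response.json())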

Walmart product scraper

Just like the Search endpoint, you can switch the target to Walmart Product in the dashboard. The process is similar – paste the product URL, configure parameters if needed, and click Send Request. You'll receive structured JSON with detailed product data, which you can export just as before.

Ethical considerations

Scraping Walmart at scale is feasible, but only when done responsibly. Use the checklist below to ensure compliance with Walmart’s policies.

  • Limit server load – throttle requests, respect robots.txt, and distribute traffic across regions with a geographically rotated proxy pool.
  • Collect only public product data – prices, SKU IDs, and star ratings are non-copyrightable, but strip any personal info (names, emails, addresses) that may appear in reviews; see the redaction sketch after this list.
  • Simulate normal shopper traffic by rotating authentic User-Agent strings, keeping TLS and device fingerprints consistent, and terminating the session after N consecutive CAPTCHA challenges.
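Here’s a minimal sketch of that redaction step – the regular expressions are deliberately simple examples for emails and phone-like numbers, not an exhaustive PII filter:

import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Remove email addresses and phone-number-like strings from review text."""
    text = EMAIL_RE.sub("[email removed]", text)
    text = PHONE_RE.sub("[phone removed]", text)
    return text

print(redact_pii("Great chair! Reach me at jane.doe@example.com or 555-123-4567."))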

Decodo’s Walmart scraper API includes built-in safeguards by default – adaptive rate limiting, advanced IP rotation, browser fingerprinting, and a controlled CAPTCHA handler, helping you to focus on insights instead of compliance overhead.

Conclusion

Scraping Walmart at scale is tough – their anti-bot stack flags most basic bots. You have two realistic options. The DIY approach gives you full control, but it requires building and maintaining your own scraper and integrating a robust residential proxy network to avoid blocks and detection.

A more efficient alternative is to use a managed solution, like Decodo's Web Scraping API. It’s a faster, more reliable way to access structured Walmart data at scale. A single call returns parsed JSON or CSV while Decodo rotates proxies, solves CAPTCHAs, and adapts to layout changes.

For serious commercial projects, the API approach offers far better ROI by saving hundreds of hours in development and ongoing upkeep.

Collect Walmart data at scale

Start your 7-day free trial of Web Scraping API and gather pricing information without CAPTCHAs or IP bans.

About the author

Vaidotas Juknys

Head of Commerce

Vaidotas Juknys is a seasoned commercial leader with over a decade of experience spanning technology, telecommunications, and management consulting. Currently working as Head of Commerce, Vaidotas brings valuable insights from his diverse background in the technology industry and analytical expertise.


Connect with Vaidotas via LinkedIn.

All information on Decodo Blog is provided on an “as is” basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

Can I scrape data from Walmart?

Yes, you can scrape publicly available data from Walmart’s website. Since Walmart doesn’t offer a public product API, web scraping is a widely used solution. Just make sure to follow best practices such as rate limiting, avoiding excessive traffic, and never collecting personal or sensitive information.

Can data scraping be detected?

Yes, sites with advanced anti-bot systems monitor request patterns, IP reputation, user-agent strings, and TLS fingerprints to spot non-human traffic. If your scraper sends too many requests too fast or lacks genuine browser signals, you’ll trigger blocks, CAPTCHAs, or IP bans.

How do I get Walmart data?

You have two options. First, build your scraper in Python or Node.js and hide behind rotating residential proxies to stay undetected. Second – and usually simpler – use the Web Scraping API, which handles proxy rotation, CAPTCHA solving, and data parsing for you, so you receive clean product, price, availability, review, and seller data with minimal effort.
