How to Scrape Google Reviews: A Step-by-Step Guide (2025)
Whether you're hunting for the best tacos in town or avoiding a haircut horror story on vacation, Google reviews have become the go-to guidebook for public opinions. With millions relying on it to gauge everything from coffee quality to customer service, scraping this goldmine of insights can unlock serious business intel – if you know how.

Zilvinas Tamulis
May 12, 2025
16 min read

What is Google review scraping?
Web scraping is the process of automatically extracting data from websites – think of it as automated copy-paste at scale. Scraping Google reviews, in particular, involves pulling review data (such as ratings and comments) from business listings and then working with it to extract valuable information and insights.
Google reviews offer a real-time pulse on how customers feel about a business, making them an excellent source for developers building sentiment analysis tools, tracking brand perception, or doing market research. From common complaints to highlighting what people love most, the data can reveal exactly what's driving customer behavior.
Methods to scrape Google reviews
There's more than one way to get your hands on Google review data, and you'll learn about each in this section. Some will be clean and straightforward, while others may require some tinkering to work. We'll introduce the 4 primary methods: using the Google Places API (the sanctioned route), manual scraping (for the masochists), scraping APIs (to skip the pain and get the gain), and automated scraping tools (for when you'd rather let code do all the heavy lifting).
Google Places API (official method)
The Google Places API is the cleanest, most reliable way to access Google review data. You can query a business by name and address to obtain its place_id, then use that to retrieve details such as name, rating, and user reviews in a structured JSON format, which is ideal for fast and clean integrations.
The catch? You only get a maximum of 5 reviews per place, and usage is subject to quotas and billing depending on your request volume. The reviews also come pre-sorted by Google's "most relevant" order, so the five you receive are whichever ones Google deems most relevant – there's no way to page through the rest.
Use this method if you need an official, stable source of limited review data, especially for smaller-scale projects, dashboards, or apps where compliance and data quality are more important than depth.
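To see the shape of the workflow, here's a minimal sketch using the Find Place and Place Details endpoints – the API key is a placeholder, and you'll need a Google Cloud project with the Places API enabled:

```python
# A minimal sketch of the official route – YOUR_API_KEY is a placeholder.
import requests

API_KEY = "YOUR_API_KEY"

# Step 1: resolve a business name and address to a place_id
find_url = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"
find_params = {
    "input": "Curry House CoCo Ichibanya, 39 James St, London",
    "inputtype": "textquery",
    "fields": "place_id",
    "key": API_KEY,
}
candidates = requests.get(find_url, params=find_params).json()["candidates"]
place_id = candidates[0]["place_id"]  # assumes at least one match was found

# Step 2: fetch details, including up to 5 reviews
details_url = "https://maps.googleapis.com/maps/api/place/details/json"
details_params = {
    "place_id": place_id,
    "fields": "name,rating,reviews",
    "key": API_KEY,
}
result = requests.get(details_url, params=details_params).json()["result"]

print(result["name"], result.get("rating"))
for review in result.get("reviews", []):
    print(f'{review["author_name"]} ({review["rating"]}/5): {review["text"][:100]}')
```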
Manual scraping
Manual scraping involves visiting a Google Maps business page, opening the reviews section, and copying the necessary data. You can do it either by hand or with the help of browser tools like Chrome's DevTools or simple scripts. It’s slow, tedious, and not scalable, but it can get the job done in a pinch.
This approach is best for small-scale or one-time needs, such as gathering reviews for a single location or testing a concept before scaling it up. Use it when automation is overkill, and you just need a few examples to work with.
Scraping APIs
Scraping APIs simplify the process of extracting data by providing a ready-made solution to send requests, parse HTML, and bypass blocks like CAPTCHAs. They handle the heavy lifting so you don't have to deal with manual scraping or come up with solutions to bypass restrictions.
For example, Decodo's Web Scraping API offers a Google Maps Scraper that targets place names and ratings, making it easy to gather place data without the hassle of dealing with blocks or complex setups.
Scraping APIs are a great solution when you need reliable, scalable data extraction with minimal setup. Use them when you need to collect large amounts of data without needing to build your scraper or deal with technical roadblocks.
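To give a feel for the workflow, here's a purely illustrative sketch – the endpoint, payload, and response shape are hypothetical stand-ins, not Decodo's actual API; check your provider's documentation for the real parameters:

```python
# Hypothetical example – the endpoint and payload below are illustrative
# stand-ins, not a real provider's API. Consult your provider's docs.
import requests

response = requests.post(
    "https://scraper-api.example.com/v1/tasks",  # hypothetical endpoint
    auth=("YOUR_USERNAME", "YOUR_PASSWORD"),
    json={
        "target": "google_maps",
        "query": "Curry House CoCo Ichibanya London",
        "parse": True,  # ask for structured JSON instead of raw HTML
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())  # parsed place data: name, rating, reviews, etc.
```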
Automated scraping using Python
Automated scraping with Python lets you build custom scripts that extract Google review data at scale. Using libraries like Selenium or Playwright, you can simulate browsing, interact with web pages, render dynamic content, and work around obstacles like CAPTCHAs.
Writing your custom scripts is the best method when you need to gather large volumes of reviews, especially from different businesses across many locations. If you're looking for flexibility, scalability, and the ability to customize your scraping process, automated scraping with Python is the ideal solution for serious review collection.
While this method offers complete freedom in how you collect data, it does require some effort to set up correctly. But don’t worry, this article will guide you step-by-step, starting from scratch and building up to a fully functional scraping solution.
In summary, here's a comparison of the mentioned methods to help you choose the best one:
| Method | Scalability | Costs | Difficulty | Data control | Best for |
|--------|-------------|-------|------------|--------------|----------|
| Google Places API | Low to medium (limited by quotas) | Free for small usage, paid for larger request volumes | Medium (API usage knowledge) | Limited (5 reviews max) | Small-scale projects, structured data needs |
| Manual scraping | Low (requires manual effort) | Free | Low (just copy-paste) | None (unstructured, raw data) | One-off tasks, small amounts of data |
| Scraping APIs | High (handles large volumes) | Paid (based on usage) | Low (easy integration) | Limited to API capabilities | Quick, easy extraction for large datasets |
| Custom Python solution | Very high (fully customizable) | Free (with small costs for proxies or paid tools) | High (requires coding) | Full (custom data collection) | Large-scale, customizable scraping projects |
Tools and technologies for scraping Google reviews with Python
To create your scraping tool, you'll need to have the following prerequisites:
- Python. Download the latest version of Python for your device, as all code will be written in this programming language.
- Playwright. You'll need this automation library to run headless browsers, simulate user behavior, and render dynamic content.
- Beautiful Soup. A Python library that extracts and parses data from HTML and XML documents.
- Proxies. Scraping large amounts of data from Google reviews can quickly lead to blocks and limitations, requiring a reliable proxy provider to rotate IP addresses and stay undetected.
- IDE. An integrated development environment, such as Visual Studio Code, helps with writing code easily, running terminal commands, and offers a wide variety of tools to aid debugging.
- A web browser. If you're reading this, congratulations, you already have one! On a serious note, a good browser like Google Chrome, with easy-to-use DevTools, is helpful in identifying HTML elements and understanding page structure.
- A cup of coffee. This might take some time, so make sure to stay awake and hydrated.
Setting up the environment
Begin by creating an environment for your project. Ensure you have Python installed, then follow these steps:
1. Create a project directory. Make a new folder in an easy-to-access location where you'll store all your project files. Optionally, create a virtual environment inside it.
2. Install the required libraries. Run this command in your terminal to get the Beautiful Soup and Playwright libraries:
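```bash
pip install playwright beautifulsoup4
```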
3. Get the necessary browsers. Install the browser binaries (Chromium, Firefox, and WebKit) that Playwright uses to automate browsers – they're required for automation tasks but aren't included with the initial library installation:
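```bash
playwright install
```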
4. Grab some proxies. You'll need to implement them in your script, so make sure you have your credentials and endpoint information ready. You can easily get them from the Decodo dashboard.
5. Run a test project file. Create a small test file to verify that all installed tools are working as intended. Here's a helpful script to check Playwright, Beautiful Soup, and proxies:
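A minimal sketch follows – the proxy endpoint, credentials, and IP checker URL are placeholders or assumptions; swap in the values from your own dashboard:

```python
# test_setup.py – a quick check that Playwright, Beautiful Soup, and your
# proxies all work together. The proxy endpoint, credentials, and IP checker
# URL below are placeholders/assumptions – replace them with your own values.
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

headless = False  # set to True once you no longer need the visual preview

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=headless,
        proxy={
            "server": "http://gate.decodo.com:7000",  # placeholder endpoint
            "username": "YOUR_USERNAME",
            "password": "YOUR_PASSWORD",
        },
    )
    page = browser.new_page()
    page.goto("https://ip.decodo.com")  # an IP checker page (URL is an assumption)
    page.wait_for_load_state("networkidle")

    # Parse the rendered HTML and pull out the IP and country values
    soup = BeautifulSoup(page.content(), "html.parser")
    for element in soup.find_all(class_="item-value"):
        print(element.get_text(strip=True))

    browser.close()
```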
Important note: The headless variable is set to False for a visual preview of what's happening with the browser. It's great for debugging and seeing if your script works correctly, but it may use more data and resources.
Run the test script with this terminal command:
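```bash
# assuming you saved the test script as test_setup.py
python test_setup.py
```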
The script will run for a few seconds. Playwright launches a browser window (visible, since headless is set to False) and navigates to the IP checker web page, while Decodo proxies mask your connection to appear from a different location. Finally, Beautiful Soup parses the HTML to find elements with the item-value class name that contain the IP address and country information. If you see an IP and location different from yours printed in the terminal, you've set up and installed everything correctly!
Step-by-step guide to scraping Google reviews
Once you've finished setting up your environment and ensured that everything is working correctly, it's time to build the Google reviews scraper.
Identify the target URL
The first challenge you'll encounter is that there isn't a single direct public URL where you can simply enter a business name and retrieve Google reviews in a nice, scrapeable format – Google intentionally makes it difficult to prevent scraping. However, here are two main workarounds you can use:
Search URL hack
One of the main places to find reviews is on Google Maps. However, the URL of each business is long and complex, making it impossible to predict unless you have a list of prepared URLs already. Here’s what a typical URL looks like:
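For illustration, a Maps place URL generally looks something like this (the coordinates and the opaque data segment are unique to each listing, and this example is abridged):
https://www.google.com/maps/place/Curry+House+CoCo+Ichibanya/@51.5154,-0.1466,17z/data=!3m1!4b1!4m6!3m5!1s0x48761b2a…!8m2!3d51.5154!4d-0.1466!16s%2Fg%2F11abc…?entry=ttu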
That’s a complete mess that’s impossible to untangle, no matter how hard you try. Luckily, there's a URL you can use that only requires the location name and address:
https://www.google.com/maps/search/?api=1&query=Business+Name+Address
This will redirect you to the relevant business page in Google Maps if there's a clear match. The query doesn't have to be very specific, as it will try to find something as close as possible. For example, here's what the query would look like for Curry House CoCo Ichibanya on 39 James St, London:
https://www.google.com/maps/search/?api=1&query=coco+curry+39+james
It will redirect you to the Google Maps page for the business, which includes relevant information such as opening hours, contact details, menus, and, most importantly, reviews.
User browsing emulation
While the method above might be the easiest, it's not foolproof. You may want to create something more dynamic, for example, finding all the businesses within an area and scraping reviews for each one.
For that, you'll need to browse Google Maps as a real user would. Don't worry, you won't need to do it manually, as tools like Playwright can emulate this behavior for you. You'll need to write a script that navigates to Google Maps' main page, enters a query in the search bar, clicks on each business result, locates the reviews section, and scrapes the data.
Navigate to the main page
Let's start with the simplest step: going to a specific URL using Playwright. Since you'll be launching a completely fresh browser instance, Google will likely prompt you to accept or deny cookies, which will prevent you from accessing the page you're looking for. Using proxies will usually circumvent this problem, but there's always a chance you'll have to deal with the prompt.
Here's a simple Playwright script that navigates to https://google.com/maps, checks for the cookie prompt, accepts cookies, and returns the raw HTML data from the target URL:
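Below is a minimal sketch – the consent-button selector and proxy details are assumptions; verify the selector in DevTools, as Google changes it over time:

```python
# A minimal sketch – selectors and proxy details are assumptions.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        proxy={
            "server": "http://gate.decodo.com:7000",  # placeholder endpoint
            "username": "YOUR_USERNAME",
            "password": "YOUR_PASSWORD",
        },
    )
    page = browser.new_page()
    page.goto("https://google.com/maps", wait_until="domcontentloaded")

    # A fresh browser session often lands on Google's cookie consent screen.
    # Click "Accept all" if it appears; otherwise, carry on.
    try:
        page.get_by_role("button", name="Accept all").click(timeout=5000)
        page.wait_for_load_state("domcontentloaded")
    except Exception:
        pass  # no consent prompt – we're already on the Maps page

    print(page.content())  # the raw HTML of the target page
    browser.close()
```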
The script is fairly simple, but the main challenge comes from finding the Accept all button on the page. You'll have to find the correct class name and use a locator to find the button element to click. Chrome's DevTools has a handy Recorder, which you can use to record the action of clicking a button, save the recording, export a code example, and find the correct selectors. The example already has a selector, but the page structure may change over time, so double-check to ensure the code works.
Get a list of locations
The next step involves using the search bar to find the places from which you want to scrape reviews. Before you begin, there are a few factors to consider – the location from which you're making requests and how the query affects the results.
When talking about location, it's not about your physical location, but rather where the proxy you're using to make requests is located. For example, if you have a France-based proxy, searching for "Starbucks" will show cafes near the proxy's location, such as those in Paris or elsewhere in France.
Luckily, you can control your proxy location directly through the Decodo dashboard by setting the country, city, state, or even ZIP code (available in the US only). You'll then be provided with a generated endpoint specific to that location. That way, you can get results tailored to particular countries, cities, or even areas.
The search query can also heavily influence the results. Even if you select a specific proxy location, if your query includes the name of another country or city, you'll see results for that area. For example, using UK-based proxies and searching for "Starbucks Poland" will provide results from Poland.
With this information in mind, let's try a search. Similar to accepting cookies, you'll need to find the search bar using locators and click on it to input text. You'll then need to provide a query and click the Enter key to search. Finally, loop through the first 5 results to get the business name and address. Let's see the code:
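Here's a sketch building on the previous snippet. The feed and card class names (hfpxzc, Nv2PK, W4Efsd) are assumptions drawn from the current Maps layout – double-check them in DevTools before running:

```python
# A sketch – class names and proxy details are assumptions; verify in DevTools.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        proxy={
            "server": "http://gate.decodo.com:7000",  # placeholder endpoint
            "username": "YOUR_USERNAME",
            "password": "YOUR_PASSWORD",
        },
    )
    page = browser.new_page()
    page.goto("https://google.com/maps", wait_until="domcontentloaded")
    try:
        page.get_by_role("button", name="Accept all").click(timeout=5000)
    except Exception:
        pass  # no cookie prompt appeared

    # The search bar is identified by its "searchboxinput" id
    page.wait_for_selector("#searchboxinput")
    page.fill("#searchboxinput", "Starbucks London")
    page.keyboard.press("Enter")

    # Results load into a scrollable feed; grab the first 5 result cards
    page.wait_for_selector('div[role="feed"] div.Nv2PK')
    cards = page.locator('div[role="feed"] div.Nv2PK')
    for i in range(min(5, cards.count())):
        card = cards.nth(i)
        name = card.locator("a.hfpxzc").get_attribute("aria-label") or "Unknown"
        # The address sits in one of the card's detail lines; this heuristic
        # takes the text after the "·" separator Maps uses
        details = card.locator("div.W4Efsd").all_inner_texts()
        address = next(
            (t.split("·")[-1].strip() for t in details if "·" in t), "n/a"
        )
        print(f"{name} – {address}")

    browser.close()
```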
Here's a quick breakdown of what the script does:
- Navigates to https://google.com/maps, checks for a cookie prompt, and accepts it if one appears;
- Waits for the page to load fully and finds the search bar with the searchboxinput id;
- Enters "Starbucks London" in the search bar;
- Simulates clicking the Enter key to perform the search;
- Prints the first 5 results in a "Business name – address" format.
Scrape the reviews
It's time to get what you came here for – the reviews. If you've been following the tutorial up to this point, you'll probably have a reasonable idea of how to do it. You'll have to use the same methods as before – waiting for pages to load, clicking, finding relevant elements, and so on.
For this example, let's click the first link in the search results, navigate to the Reviews section, and retrieve the overall rating and number of reviews. Then, get the first 20 reviews. The code portion from the previous section that prints the first 5 results is removed.
There's one additional piece of functionality here: scrolling. Only a small number of reviews are loaded initially, meaning you'll have to scroll through the list to load the next batch, and so on. Luckily, Playwright can scroll within a defined element, allowing you to load as many reviews as you need. There are also a few tweaks to ensure stability, which will be explained in just a bit:
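Here's a sketch of the full flow. Every class name, id, and aria-label below (searchboxinput, hfpxzc, F7nice, jftiEf, d4r55, w8nwRe, wiI7pd, kvMYJc, m6QErb/DxyBCb) is an assumption based on the Maps layout at the time of writing, and the proxy endpoint is a placeholder – verify each one before relying on it:

```python
# A sketch of the complete scraper – all selectors are assumptions.
import re
import hashlib
from playwright.sync_api import sync_playwright

TARGET_REVIEWS = 20

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        proxy={
            "server": "http://gate.decodo.com:7000",  # placeholder endpoint
            "username": "YOUR_USERNAME",
            "password": "YOUR_PASSWORD",
        },
    )
    # A realistic viewport, locale, and headers help the reviews pane render
    # consistently once a result is clicked
    context = browser.new_context(
        viewport={"width": 1440, "height": 900},
        locale="en-GB",
        extra_http_headers={"Accept-Language": "en-GB,en;q=0.9"},
    )
    page = context.new_page()
    try:
        # hl=en forces an English interface, keeping the Reviews tab findable
        page.goto("https://google.com/maps?hl=en", wait_until="domcontentloaded")
        try:
            page.get_by_role("button", name="Accept all").click(timeout=5000)
        except Exception:
            pass

        page.wait_for_selector("#searchboxinput")
        page.fill("#searchboxinput", "Starbucks London")
        page.keyboard.press("Enter")

        # Click the first search result to open its details panel
        page.wait_for_selector('div[role="feed"] a.hfpxzc')
        page.locator('div[role="feed"] a.hfpxzc').first.click()
        page.wait_for_selector("h1")
        title = page.locator("h1").first.inner_text()

        # High-level stats: overall rating and total review count
        summary = page.locator("div.F7nice").first.inner_text()  # e.g. "4.3\n(1,234)"
        rating = summary.split("\n")[0]
        match = re.search(r"\(([\d,.]+)\)", summary)
        total_reviews = match.group(1).replace(",", "") if match else "?"

        # Open the Reviews tab – located via its aria-label, which stays
        # in English thanks to the hl=en parameter
        page.locator('button[aria-label*="Reviews"]').first.click()
        page.wait_for_selector("div.jftiEf")  # individual review blocks

        reviews, seen = [], set()
        while len(reviews) < TARGET_REVIEWS:
            before = len(reviews)
            blocks = page.locator("div.jftiEf")
            for i in range(blocks.count()):
                block = blocks.nth(i)
                try:
                    author = block.locator(".d4r55").first.inner_text(timeout=2000)
                    # Expand truncated reviews via the "More" button
                    more = block.locator("button.w8nwRe")
                    if more.count():
                        more.first.click()
                    text_el = block.locator("span.wiI7pd")
                    text = text_el.first.inner_text(timeout=2000) if text_el.count() else ""
                    # Deduplicate: prefer the review's own id, else hash author + text
                    rid = block.get_attribute("data-review-id")
                    key = rid or hashlib.sha256((author + text).encode()).hexdigest()
                    if key in seen:
                        continue
                    seen.add(key)
                    stars_el = block.locator("span.kvMYJc")
                    stars = stars_el.first.get_attribute("aria-label") if stars_el.count() else ""
                    stars_match = re.search(r"\d+", stars or "")
                    reviews.append({
                        "author": author,
                        "rating": stars_match.group() if stars_match else "?",
                        "text": text,
                    })
                except Exception:
                    continue  # one flaky review shouldn't crash the run
            if len(reviews) == before:
                break  # no fresh reviews appeared – we've reached the end
            # Scroll the reviews container to lazy-load the next batch
            page.evaluate(
                """() => { const el = document.querySelector('div.m6QErb.DxyBCb');
                           if (el) el.scrollTop = el.scrollHeight; }"""
            )
            page.wait_for_timeout(1500)

        reviews = reviews[:TARGET_REVIEWS]
        print(f"{title}: rated {rating} from {total_reviews} reviews")
        print(f"\nCollected {len(reviews)} reviews for {title}:")
        for r in reviews:
            print(f'{r["author"]} ({r["rating"]}/5): {r["text"][:120]}')
    finally:
        browser.close()
```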
There are quite a few changes here – before it starts to look too much like a "draw the rest of the ******* owl" meme, here's a breakdown of what's changed in the code:
- Imports. Added re and hashlib to the mix – re handles all the regex magic for pulling out clean numbers (like ratings), while hashlib.sha256 helps create unique fingerprints for each review, so we don’t accidentally grab duplicates.
- Browser context. Extra parameters were added to the browser context: viewport sets the window width and height, which is needed to properly display the reviews page and have reviews load once a result is clicked. A scroll function could be used here instead, but this method has proven more reliable. Locale and extra HTTP headers were also added to make requests look more realistic and consistent.
- Page link. A host language (?hl=en) parameter was also added to the link, so that regardless of the proxy you connect to, the interface result will always be in English – this is important for finding the Reviews tab, as it's not always in the same location and language.
- Search & navigation. The search interaction is extended to click on the first search result, which navigates us to the tab with reviews.
- Review section access. Once on the location page, find and click the Reviews section. It's found within the aria-label, as it's the most consistent way of seeing which button opens reviews. That's why the locale was set to English earlier, as the results for different languages will change this aria-label.
- Review summary extraction. Before diving into scraping individual reviews, pull out some high-level statistics, such as the star rating and total number of reviews, which are perfect for setting context or building summary dashboards later.
- Review loop. This is where the magic of scraping happens. The script navigates through the reviews section, extracts every block, and expands long reviews using the More button. It retrieves the author, rating, and full text of the review.
- Deduplication logic. When scrolling, some reviews may reappear and repeat themselves. To prevent this, the script deduplicates each entry: if a review exposes its own ID, that's used as the fingerprint; if not, a SHA-256 hash of the author's name and review text is generated instead.
- Scroll handling. Reviews load dynamically as you scroll. The script scrolls the review container to fetch more entries until we reach our target count or run out of fresh ones.
- Output. Instead of just printing business information, it now prints detailed review entries, including author, rating, and comment.
- Error handling. More robust try-except blocks make the script resilient – one flaky review won't crash the whole run, which is key when scraping at scale.
- Browser close. Wrapped up in a finally block to guarantee a clean shutdown, because even scrapers need good hygiene.
Storing and analyzing the data
To finalize your script and export the results to a CSV file, you can add a CSV-writing section just before printing the reviews. Here's what to do:
1. Add the csv library import at the beginning of your script:
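```python
import csv
```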
2. Add the following snippet right before the final print(f"\nCollected {len(reviews)} reviews for {title}:") line:
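Here's a sketch of that snippet – it assumes the title, rating, total_reviews, and reviews variables from the script above, and the filename is an arbitrary choice:

```python
# Export the collected data – the filename is arbitrary
with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    # Business summary first, then a header row for the individual reviews
    writer.writerow([title, rating, total_reviews])
    writer.writerow(["Author", "Rating", "Review"])
    for r in reviews:
        writer.writerow([r["author"], r["rating"], r["text"]])
```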
The snippet creates a new file to export to, writes the first row with the business name, rating, and total review count, adds a header row, and then writes all the collected reviews in a structured format.
To analyze the information further, you can use libraries like pandas to extract meaningful data or utilize AI tools that allow you to upload the result file and provide a smart summary of what the reviews are saying.
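For instance, here's a minimal pandas sketch, assuming the CSV layout from the snippet above (one summary row, then the review header):

```python
# Assumes the reviews.csv layout written above: a one-line business summary,
# then the "Author,Rating,Review" header, then one row per review.
import pandas as pd

df = pd.read_csv("reviews.csv", skiprows=1)  # skip the summary row
df["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")
print(df["Rating"].value_counts().sort_index())  # rating distribution
print("Average rating:", round(df["Rating"].mean(), 2))
```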
Troubleshooting common issues
The script is built with several try-except blocks to ensure that it doesn't break in the process. However, as the Google Maps page can often throw some curveballs that are hard to dodge, here are a few things to keep in mind:
- Proxy stability. Use premium, rotating residential proxies from reputable providers to reduce connection drops and avoid bans.
- Loading times. Add dynamic wait logic or increase fixed timeout values to accommodate slower page loads, especially during high traffic.
- Locators changing. Target stable attributes like aria-label, button text, or heading tags instead of relying on class names that often change.
- No scrolling. Ensure you select the correct scrollable container and apply scrollTop = scrollHeight to trigger the lazy-loaded reviews.
- Page structure. Test with different types of businesses and search queries to handle layout variations, such as missing review sections or alternative result formats.
Conclusion
In this guide, you explored how to scrape Google Maps reviews using Playwright with support for proxies, dynamic scrolling, and smart selectors. We tackled common challenges like changing page structures, flaky locators, and proxy instability, offering practical solutions to keep your scraper resilient. Whatever your goals for Google review scraping, this setup gets you reliable review data with far fewer headaches.
About the author

Zilvinas Tamulis
Technical Copywriter
A technical writer with over 4 years of experience, Žilvinas blends his studies in Multimedia & Computer Design with practical expertise in creating user manuals, guides, and technical documentation. His work includes developing web projects used by hundreds daily, drawing from hands-on experience with JavaScript, PHP, and Python.
Connect with Žilvinas via LinkedIn
All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.