How to Scrape Google Flights: Extract Prices, Airlines, and Schedules with Python

Google Flights is a rich source of crucial flight information, such as prices, airlines, times, stops, durations, and emissions, but scraping this information has never been easy. The flight search engine hides valuable data behind JavaScript-heavy pages and anti-bot protections. This guide explains how to scrape Google Flights using Python by building a browser-based scraper powered by Playwright.

TL;DR

  • Google Flights is a valuable source for flight information, but is difficult to scrape due to its JavaScript-heavy pages and anti-bot protections
  • Use Playwright to render JavaScript-heavy pages
  • Use Pydantic to structure and validate data
  • Use rotating residential proxies to avoid blocks
  • Organize your project into multiple files for maintainability
  • Export scraped results to JSON for analysis

Prerequisites and environment setup

Before you scrape Google Flights, you need to prepare a Python environment and install the required libraries. This step makes your web scraper run in a controlled environment and prevents dependency conflicts with other Python web scraping projects.

Because Google aggressively monitors automated traffic, scraping flight information without the proper infrastructure will likely result in blocked requests. To avoid this, we'll configure residential proxies that rotate IP addresses and simulate real users accessing the platform.

Preparing a Python environment

You need a suitable Python version before you start scraping.

Most modern scraping libraries rely on recent Python versions for performance improvements and compatibility with asynchronous frameworks. Outdated versions can cause dependency errors or missing features.

For this project, we'll use Python 3.9 or later, which supports asynchronous programming and modern packaging tools.

But first:

Check your Python version:

On Windows: open the Command Prompt and type:

python --version or py --version

On macOS and Linux: open the Terminal and type:

python --version or python3 --version

If your Python version is older than 3.9, install a newer release from the downloads page on the official Python website before continuing.
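If you prefer to verify the version from inside Python, a minimal sketch like the one below can fail fast before any dependencies are installed. The helper name meets_minimum is illustrative, and the (3, 9) threshold simply mirrors the version requirement stated above.

```python
import sys

MIN_VERSION = (3, 9)  # mirrors the Python 3.9+ requirement stated above


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True when the major.minor version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not meets_minimum():
        raise SystemExit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required")
    print("Python version OK")
```

Running this check at the top of main.py is one way to give a clear error instead of an obscure import failure.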

Create a virtual environment:

Next, create a virtual environment and activate it to isolate dependencies for individual projects.

This prevents library conflicts between projects and ensures reproducibility when sharing code with other developers. It's also considered a best practice in Python development.

Use:

python -m venv flights_scraper_env

Then activate the virtual environment by running these commands:

On Mac/Linux:

source flights_scraper_env/bin/activate

On Windows:

flights_scraper_env\Scripts\activate

Install required dependencies:

To scrape Google Flights effectively, you need tools that can render dynamic web pages and structure extracted data. 

We'll use several Python libraries, including:

  • Playwright – browser automation for dynamic pages
  • Pydantic – define and validate structured data models
  • Python-dotenv – securely manage environment variables like proxy credentials

These libraries work together to create a reliable scraping pipeline. Playwright loads JavaScript-rendered pages, Pydantic validates the scraped data, and dotenv ensures sensitive information remains outside the source code.

Install the three libraries by running the following command:

pip install playwright pydantic python-dotenv

Next, install the Chromium browser binaries required by Playwright:

playwright install chromium

This command downloads the Chromium browser build that Playwright uses to automate Google Flights pages.

Setting up proxies

Google's anti-bot systems monitor traffic patterns, IP addresses, and request frequency. If multiple requests originate from the same IP address in a short period, Google may block the connection or display CAPTCHA challenges. 

You'll need to use proxies to navigate around all this.

The two common types of proxies you can use for this project are rotating datacenter proxies and rotating residential proxies.

While datacenter proxies are high-speed and cost-effective, they send requests through server/cloud IP addresses owned by hosting providers (not real consumer ISPs). Google systems can also identify these proxies as automated infrastructure and block them or display CAPTCHA challenges when you use them for large-scale scraping.

Rotating residential proxies address this. They route requests through IP addresses assigned by consumer ISPs, which reduces IP bans, CAPTCHAs, and blocked requests. They also support geo-targeting: you can route requests through IPs in specific countries, which is critical for capturing region-specific flight pricing (more on this in a later section).

Setting up Decodo residential proxies

Residential proxies allow you to simulate requests from real users around the world. This is particularly useful when scraping flight prices as they vary depending on the user's geographic location.

Decodo provides residential proxies for web scraping and large-scale data collection. These proxies are exposed through endpoints: when you connect to an endpoint, your traffic is routed through a random IP address. This gives you high anonymity, helps you avoid rate limits, and enables geo-targeted searches across more than 195 locations.

Here is a request example of Decodo residential proxies in Python:

import requests

url = 'https://ip.decodo.com/json'
username = 'username'
password = 'password'
proxy = f"http://{username}:{password}@gate.decodo.com:7000"
result = requests.get(url, proxies={
    'http': proxy,
    'https': proxy
})
print(result.text)

Follow these simple steps to set up rotating residential proxies with Decodo:

  • Go to the Residential → Proxy setup page and log in to the dashboard
  • Select "Residential" proxies and choose a plan
  • Go to the parameter selection section below your authentication methods
  • Set the location of your proxy
  • Choose your preferred Session type (Sticky or Rotating)
  • Select your preferred Protocol format: HTTP(S) or SOCKS5

From there, use the generated username and password for authentication to integrate with browsers or scrapers.

You can also watch this video guide to learn how to set up and use Decodo residential proxies.

Storing proxy credentials securely

Hardcoding credentials in source code creates a security risk and makes your project harder to maintain. Instead, store sensitive information in environment variables to keep it separate from the application code.

Create a .env file in your project root directory (alongside your Python files) to store your credentials locally and keep them excluded from version control systems like Git.

PROXY_HOST=gate.decodo.com
PROXY_PORT=10000
PROXY_USER=username
PROXY_PASSWORD=password

Then, at the top of proxy_manager.py (which you will create in the next section), load the variables like this:

# proxy_manager.py
from dotenv import load_dotenv
import os
load_dotenv()
proxy_host = os.getenv("PROXY_HOST")
proxy_port = os.getenv("PROXY_PORT")
proxy_user = os.getenv("PROXY_USER")
proxy_password = os.getenv("PROXY_PASSWORD")

Choose a project structure

Organize your project into multiple files to improve readability and simplify maintenance.

Use a clean project structure to separate models, scraping logic, proxy handling, and execution scripts. 

This modular design is especially useful when scaling the scraper or adding additional features later.

flights-scraper/
├── .env # Proxy credentials (never commit this)
├── models.py # Pydantic data models
├── proxy_manager.py # Proxy configuration logic
├── scraper.py # Core scraping logic
└── main.py # Entry point and orchestration

Here is the purpose of each file:

  • models.py: Defines data models for search parameters and results.
  • scraper.py: Contains the Playwright scraping logic.
  • proxy_manager.py: Handles proxy rotation and configuration.
  • main.py: Runs the scraper and saves results.

Defining data models for flight information

When scraping flight data, raw dictionaries can quickly become messy.

Data models define the structure of both the input parameters and the scraped results, enforce consistent data formats, and simplify validation. This approach reduces errors when processing large datasets.

We'll use Pydantic, a popular Python library designed for data validation and JSON serialization. With this data model, the scraped data matches the expected schema and automatically converts values into appropriate types.

Configuration model

The configuration model stores the parameters you use to generate flight searches. These parameters include the origin city/airport, destination city/airport, departure date, return date (optional), trip type (one-way vs. round-trip), target currency, and the list of proxy countries to scrape from.

The model also makes the scraper easier to reuse. Instead of editing code every time you change routes or dates, you simply update the configuration object.

Here is an example configuration model using Pydantic:

from pydantic import BaseModel
from datetime import date
from enum import Enum
from typing import Optional, List

Using Python Enums for fixed values like trip type:

class TripType(str, Enum):
    ONE_WAY = "one_way"
    ROUND_TRIP = "round_trip"

Add the following code to models.py:

# models.py
from datetime import date
from typing import Optional, List
from enum import Enum
from pydantic import BaseModel, field_validator, model_validator


class TripType(str, Enum):
    ONE_WAY = "one_way"
    ROUND_TRIP = "round_trip"


class FlightSearchConfig(BaseModel):
    origin: str
    destination: str
    departure_date: date
    return_date: Optional[date] = None
    trip_type: TripType
    currency: str = "USD"
    proxy_countries: List[str]

    @field_validator("departure_date")
    @classmethod
    def departure_must_be_future(cls, value: date):
        if value <= date.today():
            raise ValueError("Departure date must be in the future")
        return value

    @field_validator("proxy_countries")
    @classmethod
    def proxy_list_not_empty(cls, value):
        if not value:
            raise ValueError("proxy_countries cannot be empty")
        return value

    @model_validator(mode="after")
    def validate_trip_logic(self):
        if self.trip_type == TripType.ROUND_TRIP and not self.return_date:
            raise ValueError("Round trip requires a return date")
        if self.trip_type == TripType.ONE_WAY and self.return_date:
            raise ValueError("One-way trip cannot have a return date")
        if self.return_date and self.return_date <= self.departure_date:
            raise ValueError("Return date must be after departure date")
        return self

We'll explain the fields for the above code briefly so you can understand this better:

Field | Type | Meaning
--- | --- | ---
origin | str | Departure airport/city code (e.g., "NBO" for Nairobi)
destination | str | Arrival airport/city code (e.g., "DXB")
departure_date | date | Date of the outbound flight
return_date | Optional[date] | Return flight date (only used for round trips). Default is None.
trip_type | TripType | Enum describing the trip type (e.g., ONE_WAY, ROUND_TRIP)
currency | str | Currency used for flight prices. Defaults to "USD".
proxy_countries | List[str] | Countries used for proxy requests to search prices from different regions (e.g., ["US", "DE", "IN"]).

The validators make sure:

  • Departure dates are in the future
  • Return date follows departure date
  • Currency format remains consistent (handled by the currency validator in the complete models.py below)

Flight result model

Now, let's define the structure for scraped data.

The flight result model defines how extracted data will be stored. Each scraped flight becomes a structured object containing fields such as:

  • Airline name
  • Departure time
  • Arrival time
  • Flight duration
  • Number of stops
  • Layover details
  • Price (as a string with currency symbol)
  • Emissions data
  • Proxy country used
  • Scrape timestamp

As shown below:

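from datetime import datetime, timezone
from typing import Optional
from pydantic import BaseModel, Field


class FlightResult(BaseModel):
    airline: str
    departure_time: str
    arrival_time: str
    duration: str
    stops: int
    layover_details: Optional[str] = None
    price: str  # e.g., "$312"
    emissions: Optional[str] = None  # e.g., "176 kg CO2e"
    proxy_country: str
    # default_factory runs per instance, so each result gets its own timestamp
    scraped_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))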

You will also need to add a method to serialize scraped flight results to JSON-friendly dictionaries. This is important for interoperability, data storage, and efficient data exchange across different systems and programming languages.

    # Inside the FlightResult class
    def to_dict(self) -> dict:
        return self.model_dump(mode="json")

Here is the complete code for models.py, which defines all the data models for flight information:

# models.py
from pydantic import BaseModel, Field, field_validator, model_validator
from typing import Optional, List
from datetime import date, datetime, timezone
from enum import Enum


class TripType(str, Enum):
    ONE_WAY = "one_way"
    ROUND_TRIP = "round_trip"


class SearchConfig(BaseModel):
    origin: str
    destination: str
    departure_date: date
    return_date: Optional[date] = None
    trip_type: TripType = TripType.ONE_WAY
    currency: str = "USD"
    proxy_countries: List[str] = Field(default_factory=lambda: ["US"])

    @field_validator("origin", "destination")
    @classmethod
    def validate_airport(cls, v: str) -> str:
        v = v.upper()
        if len(v) != 3 or not v.isalpha():
            raise ValueError("Airport code must be a 3-letter IATA code.")
        return v

    @field_validator("departure_date")
    @classmethod
    def departure_must_be_future(cls, v: date) -> date:
        if v < date.today():
            raise ValueError("Departure date must be in the future.")
        return v

    @field_validator("currency")
    @classmethod
    def currency_uppercase(cls, v: str) -> str:
        v = v.upper()
        if len(v) != 3 or not v.isalpha():
            raise ValueError("Currency must be a valid 3-letter ISO code.")
        return v

    @model_validator(mode="after")
    def validate_trip(self):
        if self.return_date and self.return_date <= self.departure_date:
            raise ValueError("Return date must be after departure date.")
        if self.trip_type == TripType.ROUND_TRIP and not self.return_date:
            raise ValueError("Return date required for round trip.")
        return self


class FlightResult(BaseModel):
    airline: str
    departure_time: datetime
    arrival_time: datetime
    duration_minutes: int
    stops: int
    layover_details: Optional[str] = None
    price: float
    emissions: Optional[str] = None
    proxy_country: str
    scraped_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

    def to_dict(self) -> dict:
        return self.model_dump(mode="json")

Why Pydantic is better than raw dictionaries

Using raw dictionaries for scraped data may seem convenient initially. However, dictionaries don't enforce data types or validate values, which can lead to inconsistent datasets. Over time, this can cause significant issues when analyzing or exporting the data.

Here is why Pydantic is the better choice:

  • Automatic type validation catches malformed data early
  • Clear schema makes the codebase self-documenting
  • Easy serialization to JSON with .model_dump()

Building dynamic search URLs for Google Flights

Before you build the scraper, you need a reliable way to construct Google Flights search URLs. Google Flights encodes search parameters directly in the URL. If you get the URL right, the page loads with results already populated without requiring any form submissions.

Note: Add the code in this section to scraper.py.

Anatomy of a Google Flights URL

A typical Google Flights URL looks like this:

https://www.google.com/travel/flights?tfs=...

There are two ways to search Google Flights through the URL:

  • Natural-language query approach: Uses ?q=, for example, https://www.google.com/travel/flights?q=Flights%20from%20New%20York%20to%20London. This structure is simpler, works directly in a browser, and requires minimal encoding. However, it offers less control over advanced parameters, and results can be less consistent than with the structured format. It suits small scraping projects.
  • Structured parameter approach: Uses ?tfs=, for example, https://www.google.com/travel/flights?tfs=... This is the encoded format Google uses internally. It encodes origin, destination, dates, and trip type into a compact string. This is more stable for automated scraping and offers a more precise search configuration. However, it's also hard to generate programmatically and requires reverse engineering Google's encoding format.

We'll use the natural-language query approach because it is simple to generate and adequate at a moderate scale.

URL encoding considerations

City names or routes may contain spaces or special characters. You need to URL-encode these before you use them in the query string.

For example, using the natural-language query approach: 

Flights from New York to London

Becomes:

https://www.google.com/travel/flights?q=Flights%20from%20New%20York%20to%20London

Python provides built-in urllib utilities to handle this automatically.

For example:

from urllib.parse import quote
query = "Flights from NBO to DXB on 2025-06-10"
encoded_query = quote(query)
url = f"https://www.google.com/travel/flights?q={encoded_query}"
print(url)

The Python code above will generate a Google Flights search URL for the query “Flights from NBO to DXB on 2025-06-10”.

Building one-way vs. round-trip Google Flights URLs

The query format slightly changes depending on the trip type. One-way example:

from urllib.parse import quote

def build_one_way_url(origin, destination, departure_date):
    query = f"Flights from {origin} to {destination} on {departure_date}"
    return f"https://www.google.com/travel/flights?q={quote(query)}"

This function builds a Google Flights search URL for a one-way trip.

Round-trip example:

from urllib.parse import quote

def build_round_trip_url(origin, destination, departure_date, return_date):
    query = (
        f"Round trip flights from {origin} to {destination} "
        f"departing {departure_date} returning {return_date}"
    )
    return f"https://www.google.com/travel/flights?q={quote(query)}"

This function builds a Google Flights search URL for a round-trip.

Setting currency with the curr parameter

Google Flights will show prices in the default currency for the user's region, and this can make it harder to compare results scraped from different geographic locations.

To have the scraper extract flight prices in a consistent currency, specify the currency with the curr parameter.

For example, this URL will show flights that use the USD currency:

https://www.google.com/travel/flights?q=Flights%20from%20NBO%20to%20DXB&curr=USD
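The one-way and round-trip builders and the curr parameter can be combined into a single helper. The following is a minimal sketch, not a definitive implementation: the function name build_search_url and its signature are assumptions, and the query wording simply reuses the phrasing from the examples above.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://www.google.com/travel/flights"


def build_search_url(origin: str, destination: str, departure_date: str,
                     return_date: Optional[str] = None,
                     currency: str = "USD") -> str:
    """Build a natural-language Google Flights URL with a fixed currency."""
    if return_date:
        query = (f"Round trip flights from {origin} to {destination} "
                 f"departing {departure_date} returning {return_date}")
    else:
        query = f"Flights from {origin} to {destination} on {departure_date}"
    # quote_via=quote encodes spaces as %20, matching the URLs shown above
    params = urlencode({"q": query, "curr": currency}, quote_via=quote)
    return f"{BASE_URL}?{params}"


print(build_search_url("NBO", "DXB", "2025-06-10"))
```

Passing a return date switches the query to the round-trip wording, so one function can serve both trip types.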

Here are practical tips before you run automated scraping projects with your dynamic search URLs for Google Flights:

  • Always test generated URLs manually in a browser before relying on them in an automated scrape. Paste the URL into Chrome and verify that the results page loads correctly.
  • Google Flights occasionally changes its URL structure. Log the URLs your scraper generates alongside the results. If results suddenly drop to zero, you can inspect recent URLs to catch breaking changes early.
  • Use URL-encoding for city names with spaces or special characters. The quote() function handles this automatically, but double-check names by testing them manually.
  • For round-trip searches, include both departure and return dates in the query string.
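The tip about logging generated URLs alongside results could be sketched with the standard logging module. The logger name and the log_scrape helper below are hypothetical; the idea is only that a zero-result scrape is logged as a warning together with the URL that produced it.

```python
import logging

logger = logging.getLogger("flights_scraper")  # hypothetical logger name


def log_scrape(url: str, result_count: int) -> int:
    """Log the URL with its result count and return the log level used."""
    # Zero results often signal a broken URL format or selector
    level = logging.WARNING if result_count == 0 else logging.INFO
    logger.log(level, "results=%d url=%s", result_count, url)
    return level


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    demo_url = "https://www.google.com/travel/flights?q=Flights%20from%20NBO%20to%20DXB&curr=USD"
    log_scrape(demo_url, 12)
    log_scrape(demo_url, 0)
```

Filtering the log for warnings then gives a quick list of URLs to re-test manually in a browser.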

Building the proxy manager

The proxy manager handles how Decodo residential proxy credentials are formatted and injected into each browser session. It uses the geo-targeted residential proxy endpoints to route each request through a specific country.

Create proxy_manager.py:

# proxy_manager.py
import os
from dotenv import load_dotenv

load_dotenv()


class ProxyManager:
    """
    Manages Decodo residential proxy configuration.
    Formats proxy credentials for use with Playwright.
    """

    def __init__(self):
        self.host = os.getenv("PROXY_HOST", "gate.decodo.com")
        self.port = int(os.getenv("PROXY_PORT", "10000"))
        self.user = os.getenv("PROXY_USER")
        self.password = os.getenv("PROXY_PASSWORD")
        if not self.user or not self.password:
            raise ValueError("Proxy credentials not found. Check your .env file.")

    def get_proxy(self, country_code: str) -> dict:
        """
        Returns a Playwright-compatible proxy config for a given country.
        Decodo geo-targeting uses country-specific subdomains.
        """
        # Country-specific endpoint format used in this guide; confirm the
        # exact host and port for each country in your Decodo dashboard
        host = f"{country_code.lower()}.{self.host}"
        return {
            "server": f"http://{host}:{self.port}",
            "username": self.user,
            "password": self.password
        }

This proxy manager formats the Decodo endpoint for geo-targeted routing so that each country gets its own browser session. This is to enable the scraper to fetch flight prices from the correct regional perspective.

Initializing the scraper and launching Playwright

Your Google Flights search URLs are ready to go, so let's build the core scraper.

We'll use Playwright because it can scrape websites with dynamic content and simulate real user browsing behavior. We'll also configure it with proxy settings from your proxy manager.

Scraper class structure

Create scraper.py and define a main GoogleFlightsScraper class that organizes the different responsibilities: URL generation, browser management, data extraction, and results aggregation.

Keeping these as separate methods makes the code easier to debug, test, and extend.

Here is the scraper class structure in Python that accepts a proxy manager and search config:

# scraper.py
class GoogleFlightsScraper:
    def __init__(self, config, proxy_manager):
        self.config = config
        self.proxy_manager = proxy_manager

    async def build_search_url(self):
        pass

    async def launch_browser(self):
        pass

    async def extract_flights(self, page):
        pass

    async def run(self):
        pass

Launching Playwright

Playwright supports asynchronous execution, which improves performance when scraping JavaScript-heavy pages like Google Flights.

Start by initializing the Playwright runtime. We'll use async_playwright() for asynchronous execution, which allows multiple scraping tasks to run concurrently if needed.

Add the following logic to the launch_browser method in scraper.py:

from playwright.async_api import async_playwright

async with async_playwright() as p:
    browser = await p.chromium.launch(headless=True)

Next, configure the headless Chromium browser with proxy settings passed with Decodo credentials to reduce automation detection.

browser = await p.chromium.launch(
    headless=True,
    proxy={
        "server": f"http://{proxy_host}:{proxy_port}",
        "username": proxy_user,
        "password": proxy_password
    },
    args=[
        "--disable-blink-features=AutomationControlled",
        "--no-sandbox"
    ]
)

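The complete launch code will look like this. Place it inside an async method such as launch_browser, where proxy is the dictionary returned by ProxyManager.get_proxy():

# scraper.py
from playwright.async_api import async_playwright

async with async_playwright() as p:
    browser = await p.chromium.launch(
        headless=True,
        proxy={
            "server": proxy["server"],
            "username": proxy["username"],
            "password": proxy["password"]
        },
        args=[
            "--disable-blink-features=AutomationControlled",
            "--no-sandbox"
        ]
    )
    # Create the context and page inside this block;
    # Playwright shuts down when the block exits.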

This way, requests go through residential IPs, and authentication is handled at the browser level, which makes sessions look more like real users to Google.

Applying stealth and anti-detection measures

Even with proxies in place, headless browsers have distinctive characteristics that Google's anti-bot systems can detect. Configure your browser environment to make your sessions look more like real users.

Set a realistic user agent and viewport dimensions to match common screen sizes:

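context = await browser.new_context(
    # Full Chrome-style user agent; keep the version reasonably current
    user_agent=(
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    viewport={"width": 1366, "height": 768}
)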

You can also block unnecessary resources such as images, fonts, and stylesheets to make pages load faster and speed up scraping.

await context.route(
    "**/*",
    lambda route: (
        route.abort()
        if route.request.resource_type in ["image", "font", "stylesheet"]
        else route.continue_()
    )
)

Automated scripts that run too quickly can trigger anti-bot systems. You need to add randomized delays between actions to mimic human browsing patterns. 

Example of a randomized delay function:

import random
import asyncio


async def random_sleep():
    delay = random.uniform(3, 6)
    print(f"Sleeping for {delay:.2f} seconds...")
    await asyncio.sleep(delay)
    print("Done sleeping!")


# Run the async function
asyncio.run(random_sleep())

This function simulates human browsing patterns.

You have configured the browser. Now, open a new page and navigate to the generated search URL.

page = await context.new_page()
await page.goto(
    search_url,
    wait_until="domcontentloaded",
    timeout=60000
)

Then wait for the flight results to appear.

await page.wait_for_selector("div[jsname]")

This makes the scraper wait until the page has rendered before data extraction begins.

If the selector never appears, handle the timeout gracefully:

from playwright.async_api import TimeoutError as PlaywrightTimeoutError

try:
    await page.wait_for_selector("div[jsname]", timeout=15000)
except PlaywrightTimeoutError:
    print("No flight results found or page failed to load.")

Extracting flight data from the page

With the browser running and the Google Flights results page loaded, the next step is to parse the page and pull out the structured flight data you need.

Google Flights dynamically generates its HTML, so you need to choose the right selector.

Identify flight card elements

Google Flights renders its results as a list of flight cards. Each card contains the core details for a single flight option. The specific CSS selectors Google uses can change over time, but the general DOM structure has remained relatively consistent.

Start by inspecting the page using browser developer tools.

Note that Google uses obfuscated, short, and randomized CSS class names (e.g., pIav2d, YMlIz, sSHqwe) generated at build time to significantly reduce file sizes, prevent CSS naming conflicts in large applications, and hinder web scraping. 

You can select all result cards with:

flight_cards = await page.query_selector_all("div.pIav2d")

These class names change frequently because they are hashed from the component structure or regenerated during continuous deployment, which breaks scrapers that depend on static selectors.

You need to build selectors that rely on structural position and ARIA attributes where possible, and verify selectors regularly.

Extracting individual data points

Each flight card contains multiple nested elements. Let's see how to extract each data point with Playwright.

Add this helper function inside scraper.py:

Price

price_element = await card.query_selector(".YMlIz")
if price_element:
    price = await price_element.inner_text()
else:
    price = None

Departure and arrival times

time_elements = await card.query_selector_all("span")
if len(time_elements) >= 2:
    departure_time = (await time_elements[0].inner_text()).strip()
    arrival_time = (await time_elements[1].inner_text()).strip()
else:
    departure_time = arrival_time = None  # Or handle error

Flight duration

duration_element = await card.query_selector(".gvkrdb")
duration = (await duration_element.inner_text()).strip() if duration_element else None

Number of stops

The stop information typically appears as text:

Nonstop

1 stop

2 stops

To extract the number of stops:

stops_element = await card.query_selector(".EfT7Ae")
stops_text = (await stops_element.inner_text()) if stops_element else ""
if "Nonstop" in stops_text:
    stops = 0
elif stops_text:
    stops = int(stops_text.split()[0])
else:
    stops = None
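The inline conversion raises a ValueError when the label is worded differently from the three forms listed above. A more defensive variant might move the parsing into a small pure function that is easy to test without a browser. The helper name parse_stops is illustrative, and the label formats it handles are only the ones shown in this section.

```python
import re
from typing import Optional


def parse_stops(text: str) -> Optional[int]:
    """Convert labels such as 'Nonstop' or '2 stops' to an integer count."""
    cleaned = text.strip().lower()
    if cleaned.startswith("nonstop"):
        return 0
    # Matches '1 stop', '2 stops', and so on at the start of the label
    match = re.match(r"(\d+)\s+stops?", cleaned)
    return int(match.group(1)) if match else None


for label in ["Nonstop", "1 stop", "2 stops", ""]:
    print(repr(label), "->", parse_stops(label))
```

Returning None for unrecognized text keeps the scraper running and leaves the decision about incomplete rows to a later step.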

Airline name

airline_element = await card.query_selector(".sSHqwe")
airline = (await airline_element.inner_text()).strip() if airline_element else None

Emissions data

Some flights display estimated emissions.

Extract it using:

emissions_element = await card.query_selector(".V1iAHe")
emissions = None
if emissions_element:
    emissions = await emissions_element.inner_text()

Error handling

Google Flights pages can vary depending on the route. Wrap each extraction step in try/except blocks to handle missing elements gracefully and avoid crashing the scraper.

Use:

try:
    price = await price_element.inner_text()
except Exception:
    price = None

Returning None for missing fields will keep the dataset usable.

For efficiency, limit extraction to the first 10–15 results, which usually contain the most relevant flights.

A complete async Playwright function that extracts all the flight data points from a Google Flights card element will look like this:

# scraper.py
async def extract_flight_data(card):
    """Extract individual flight data points from a Google Flights card."""
    data = {
        "price": None,
        "departure_time": None,
        "arrival_time": None,
        "duration": None,
        "stops": None,
        "airline": None,
        "emissions": None
    }
    # Price
    try:
        price_element = await card.query_selector(".YMlIz")
        if price_element:
            data["price"] = (await price_element.inner_text()).strip()
    except Exception:
        pass
    # Departure and arrival times
    try:
        time_elements = await card.query_selector_all("span")
        if len(time_elements) >= 2:
            data["departure_time"] = (await time_elements[0].inner_text()).strip()
            data["arrival_time"] = (await time_elements[1].inner_text()).strip()
    except Exception:
        pass
    # Flight duration
    try:
        duration_element = await card.query_selector(".gvkrdb")
        if duration_element:
            data["duration"] = (await duration_element.inner_text()).strip()
    except Exception:
        pass
    # Number of stops
    try:
        stops_element = await card.query_selector(".EfT7Ae")
        if stops_element:
            stops_text = await stops_element.inner_text()
            data["stops"] = 0 if "Nonstop" in stops_text else int(stops_text.split()[0])
    except Exception:
        pass
    # Airline name
    try:
        airline_element = await card.query_selector(".sSHqwe")
        if airline_element:
            data["airline"] = (await airline_element.inner_text()).strip()
    except Exception:
        pass
    # Emissions (optional)
    try:
        emissions_element = await card.query_selector(".V1iAHe")
        if emissions_element:
            data["emissions"] = (await emissions_element.inner_text()).strip()
    except Exception:
        pass
    return data
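The extraction function returns raw strings, while the complete FlightResult model in models.py expects a float price and an integer duration_minutes. One way to bridge that gap is a pair of small parsing helpers, sketched below. The names parse_price and parse_duration_minutes are hypothetical, and the sketch assumes English-language labels such as "$312" and "7 hr 35 min" with commas as thousands separators; other locales would need different patterns.

```python
import re
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[float]:
    """Extract a numeric price from text such as '$312' or 'US$1,204'."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    # Assumes commas are thousands separators
    return float(match.group().replace(",", "")) if match else None


def parse_duration_minutes(text: Optional[str]) -> Optional[int]:
    """Convert text such as '7 hr 35 min' to total minutes."""
    if not text:
        return None
    hours = re.search(r"(\d+)\s*hr", text)
    minutes = re.search(r"(\d+)\s*min", text)
    if not hours and not minutes:
        return None
    total = int(hours.group(1)) * 60 if hours else 0
    return total + (int(minutes.group(1)) if minutes else 0)


print(parse_price("$312"), parse_duration_minutes("7 hr 35 min"))
```

Both helpers return None for text they cannot parse, so a missing or unexpected label does not crash the run.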

Managing multi-region searches and combining results

Flight prices often vary by region. You need to use proxy-based scraping to scrape the same route from multiple regions, so you discover pricing differences and hidden deals.

Why scrape from multiple regions

Airlines and travel platforms frequently adjust fares based on geographic demand.

Here are reasons for scraping flight prices from different regions:

  • You will discover region-specific promotions
  • You will understand currency and fare class differences across regions
  • You will uncover flight deals available only in specific markets

Let's see how to run multi-region Google Flights scraping.

Launching separate browser sessions per country

You must launch a new browser session per country through Decodo's geo-targeted residential proxies. Here's why:

  • Each browser session inherits the proxy's location
  • Google uses IP geolocation to determine prices
  • Reusing sessions reduces accuracy and increases detection risk

Implementation

Your configuration already includes a list of proxy countries.

Example:

proxy_countries = ["US", "GB", "JP"]

For each country:

  • Request a proxy endpoint
  • Launch a new browser session
  • Run the scraper
  • Store results

Example loop:

all_results = []
for country in config.proxy_countries:
    proxy = proxy_manager.get_proxy(country)
    scraper = GoogleFlightsScraper(config, proxy)
    results = await scraper.run()
    all_results.extend(results)

Add delays between regions:

await asyncio.sleep(random.uniform(5, 10))

Deduplication strategy

When scraping multiple regions, you may collect duplicate flights.

Define a unique flight identifier from the airline, departure time, and arrival time, and use it for deduplication.

Use:

flight_id = f"{airline}_{departure_time}_{arrival_time}"

For example, this snippet:

airline = "KQ"
departure_time = "10:30"
arrival_time = "12:45"
flight_id = f"{airline}_{departure_time}_{arrival_time}"
print(flight_id)

Will produce the following results:

KQ_10:30_12:45

You can then remove duplicates while preserving price differences. Alternatively, you can keep all records and compare regional pricing later.

Combining results

After scraping all regions:

sorted_results = sorted(all_results, key=lambda x: x.price)

You can also calculate summary statistics:

  • Lowest price
  • Average price
  • Price range by airline

You will use these insights to identify the best deals across markets.

A complete async Python function for scraping flight data across multiple regions looks like this:

# main.py
import asyncio
import random
from collections import defaultdict

from scraper import GoogleFlightsScraper


async def run_multi_region_search(config, proxy_manager):
    """
    Runs Google Flights scraping from multiple regions and combines results.
    """
    all_results = []

    # --- Multi-region scraping ---
    for country in config.proxy_countries:
        print(f"Starting scrape for region: {country}")
        proxy = proxy_manager.get_proxy(country)
        scraper = GoogleFlightsScraper(config, proxy)
        try:
            results = await scraper.run()
            # Attach region info to each result
            for r in results:
                r.region = country
            all_results.extend(results)
        except Exception as e:
            print(f"Error scraping region {country}: {e}")
        # Delay between regions to avoid rate limiting
        await asyncio.sleep(random.uniform(5, 10))

    print(f"Total raw results collected: {len(all_results)}")

    # --- Deduplication ---
    unique_flights = {}
    regional_prices = defaultdict(list)
    for flight in all_results:
        flight_id = f"{flight.airline}_{flight.departure_time}_{flight.arrival_time}"
        # Track regional prices
        regional_prices[flight_id].append({
            "region": flight.region,
            "price": flight.price
        })
        # Keep the lowest price instance
        if flight_id not in unique_flights:
            unique_flights[flight_id] = flight
        else:
            if flight.price < unique_flights[flight_id].price:
                unique_flights[flight_id] = flight

    deduped_results = list(unique_flights.values())
    print(f"Unique flights after deduplication: {len(deduped_results)}")

    # --- Sorting results by price ---
    sorted_results = sorted(deduped_results, key=lambda x: x.price)

    # --- Summary statistics ---
    prices = [flight.price for flight in deduped_results]
    summary = {
        "total_regions": len(config.proxy_countries),
        "total_raw_results": len(all_results),
        "unique_flights": len(deduped_results),
        "lowest_price": min(prices) if prices else None,
        "average_price": sum(prices) / len(prices) if prices else None,
        "highest_price": max(prices) if prices else None,
    }

    # --- Airline price ranges ---
    airline_prices = defaultdict(list)
    for flight in deduped_results:
        airline_prices[flight.airline].append(flight.price)
    airline_summary = {
        airline: {
            "min": min(p),
            "avg": sum(p) / len(p),
            "max": max(p)
        }
        for airline, p in airline_prices.items()
    }

    return {
        "flights": sorted_results,
        "summary": summary,
        "airline_price_stats": airline_summary,
        "regional_prices": dict(regional_prices)
    }

Saving scraped data to JSON and displaying results

You now have your scraped data. The next step is to store it somewhere useful. Your export format will vary depending on your use case.

Exporting to JSON

We used Pydantic models, so exporting data to JSON is straightforward because Pydantic provides built-in serialization methods.

Add the code below to main.py:

import json
from datetime import datetime

filename = f"flights_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
data = [flight.to_dict() for flight in results]
with open(filename, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=4, ensure_ascii=False)

Use timestamped filenames to track different scraping runs. You can also include metadata:

output = {
    "search_config": config.model_dump(),
    "scraped_at": datetime.utcnow().isoformat(),
    "total_results": len(data),
    "flights": data
}

Console output

We'll now print a formatted summary table showing key fields such as airline, times, stops, price, and region, followed by the total flight count and the lowest and highest prices found. If no flights are found, the function informs the user and returns early.

def display_results(results):
    # Inform the user and stop early when there is nothing to show
    if not results:
        print("No flights found.")
        return
    # Single quotes inside the f-strings keep this valid on Python versions before 3.12
    print("\n" + "=" * 70)
    print(f"{'Airline':<20} {'Departs':<10} {'Arrives':<10} {'Stops':<8} {'Price':<10} {'Region'}")
    print("=" * 70)
    for f in results:
        # "or 'N/A'" also covers keys that exist but hold None
        print(
            f"{f.get('airline') or 'N/A':<20} "
            f"{f.get('departure_time') or 'N/A':<10} "
            f"{f.get('arrival_time') or 'N/A':<10} "
            f"{str(f.get('stops', 'N/A')):<8} "
            f"{str(f.get('price') or 'N/A'):<10} "
            f"{f.get('proxy_country') or 'N/A'}"
        )
    # Only records with a usable price feed the summary
    prices = [f["price"] for f in results if f.get("price")]
    if prices:
        print(f"\nTotal flights: {len(results)}")
        print(f"Lowest price: {min(prices)}")
        print(f"Highest price: {max(prices)}")

Extending to other formats

JSON is flexible, but depending on your workflow, you can also send scraped data to other destinations:

  • CSV export for spreadsheet analysis
  • Database storage (PostgreSQL, SQLite) for historical fare tracking
  • Alerting systems — email or Slack notification when a price drops below a threshold
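As an illustration of the first option, a CSV export can be sketched with Python's built-in csv module. This minimal example assumes each flight is already a plain dict, for instance the output of to_dict(). The function name export_to_csv and the column list are assumptions, so adjust them to match your own model fields.

```python
import csv


def export_to_csv(flights, filename):
    """Write a list of flight dicts to a CSV file and return the row count."""
    # Assumed column names; change these to match your Flight model
    fieldnames = ["airline", "departure_time", "arrival_time", "stops", "price", "region"]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops keys that are not listed in fieldnames;
        # missing keys are written as empty cells
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(flights)
    return len(flights)
```

Calling export_to_csv(data, "flights.csv") after the JSON export would produce a file that opens directly in a spreadsheet.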

Putting it all together using main.py

The main.py file ties all modules together. It initializes the configuration, builds the proxy manager, runs the multi-region scraper, and saves the output:

# main.py
import asyncio
import json
from datetime import date, datetime

from models import SearchConfig, TripType
from proxy_manager import ProxyManager
from scraper import GoogleFlightsScraper


async def main():
    # 1. Define search parameters
    config = SearchConfig(
        origin="NBO",
        destination="DXB",
        departure_date=date(2025, 8, 15),
        trip_type=TripType.ONE_WAY,
        currency="USD",
        proxy_countries=["US", "GB", "JP"]
    )

    # 2. Initialize the proxy manager
    proxy_manager = ProxyManager()

    # 3. Run multi-region scraping
    all_results = []
    for country in config.proxy_countries:
        print(f"Scraping from region: {country}")
        proxy = proxy_manager.get_proxy(country)
        scraper = GoogleFlightsScraper(config, proxy)
        try:
            results = await scraper.run()
            all_results.extend(results)
        except Exception as e:
            print(f"Failed for region {country}: {e}")

    # 4. Save and display results
    if all_results:
        filename = f"flights_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(
                {
                    "metadata": {
                        "scraped_at": datetime.utcnow().isoformat(),
                        "total_results": len(all_results)
                    },
                    # Model objects are not JSON serializable, so convert them
                    # with to_dict() as in the export step above
                    "flights": [flight.to_dict() for flight in all_results]
                },
                f, indent=4, ensure_ascii=False
            )
        print(f"Saved {len(all_results)} flights to {filename}")
    else:
        print("No flights found across all regions.")


if __name__ == "__main__":
    asyncio.run(main())

Best practices and common pitfalls when scraping Google Flights

Scraping Google Flights can be challenging due to dynamic page structures and aggressive anti-bot systems. Here are the most important challenges and how to address them.

Anti-bot and detection avoidance

Google's anti-bot systems are among the most aggressive on the web. Without proper proxy rotation, expect blocks before you collect any useful data.

Here is how to scrape Google without getting blocked:

  • Use rotating residential proxies instead of datacenter proxies
  • Randomize request timing, user agents, and viewport sizes between sessions
  • Avoid repetitive patterns. Do not scrape the same route hundreds of times in quick succession
  • For extremely challenging targets, consider Decodo Site Unblocker, which handles JavaScript rendering, CAPTCHAs, and fingerprinting automatically
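The randomization advice above can be sketched as a small helper that picks a user agent and viewport for each session. The user agent strings and viewport sizes below are illustrative placeholders, and the helper name random_context_options is hypothetical. The returned keys match the user_agent and viewport arguments of Playwright's browser.new_context().

```python
import random

# Illustrative values only; maintain your own up-to-date pool
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]
VIEWPORTS = [(1366, 768), (1536, 864), (1920, 1080)]


def random_context_options():
    """Return keyword arguments for a new browser context with randomized traits."""
    width, height = random.choice(VIEWPORTS)
    return {
        "user_agent": random.choice(USER_AGENTS),
        "viewport": {"width": width, "height": height},
    }
```

Inside the scraper, this could be applied with context = await browser.new_context(**random_context_options()), so every session presents a slightly different profile.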

Google Flights dynamic DOM

Google Flights frequently updates its HTML structure. CSS class names are obfuscated and can change between deployments. To build resilient scrapers:

  • Avoid relying solely on CSS class names. Use structural position and ARIA attributes where possible
  • Implement monitoring to detect when selectors break (e.g., scrape returns zero results unexpectedly)
  • Keep selectors in a separate configuration file, so updates are easy to track without touching core logic
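One way to apply the last two points is sketched below: selectors live in a JSON file, and a simple check flags a run of empty scrapes. The file layout, function names, and the three-run window are all assumptions for illustration, not part of the scraper built earlier.

```python
import json
from pathlib import Path


def load_selectors(path):
    """Read selector strings from a JSON file so they can change without code edits."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def selectors_look_broken(recent_result_counts, window=3):
    """Return True when the last `window` scrapes all came back empty."""
    recent = recent_result_counts[-window:]
    return len(recent) == window and all(count == 0 for count in recent)
```

For example, selectors_look_broken([12, 0, 0, 0]) returns True, which is a reasonable trigger for an alert or a manual selector review.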

Rate limiting and polite scraping

Sending too many requests too quickly will get you blocked. Here is how to avoid this:

  • Wait at least 3-8 seconds between requests to Google
  • Limit concurrent sessions to one or two per proxy pool
  • Scrape during off-peak hours when possible
  • Implement retry logic with exponential backoff for failed requests
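Retry logic with exponential backoff could look roughly like this. It is a minimal sketch: the helper name retry_with_backoff, the default attempt count, and the delay values are assumptions, and the sleep parameter exists only so the wait can be swapped out in tests.

```python
import asyncio
import random


async def retry_with_backoff(operation, max_attempts=4, base_delay=2.0,
                             max_delay=60.0, sleep=asyncio.sleep):
    """Await `operation()` and retry on failure, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error
            # 2s, 4s, 8s, ... capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 25% jitter so retries do not follow a fixed rhythm
            delay += random.uniform(0, delay * 0.25)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            await sleep(delay)
```

In the region loop, results = await retry_with_backoff(scraper.run) would replace the direct call to scraper.run().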

Data quality

When scraping data from multiple regions, you will encounter duplicate and malformed records:

  • Validate extracted prices. Check for currency symbols and reasonable ranges
  • Handle flights with missing data points gracefully. Return None rather than crashing
  • Deduplicate results using the airline + departure time + arrival time key
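Price validation can be sketched as a small parser that returns None for anything it cannot trust. This example assumes prices arrive as text with comma thousands separators and a dot for decimals, such as "$1,234". The function name parse_price and the default range bounds are arbitrary assumptions to tune for your routes and currency.

```python
import re


def parse_price(raw, min_price=10, max_price=20000):
    """Extract a numeric price from scraped text, or return None if it looks wrong."""
    if not raw:
        return None
    # First run of digits, allowing comma separators and an optional decimal part
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if not match:
        return None
    value = float(match.group().replace(",", ""))
    # Reject values outside a plausible fare range
    if not (min_price <= value <= max_price):
        return None
    return value
```

Records whose price comes back as None can then be logged and skipped instead of crashing the run.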

Maintenance

Google Flights structure can change at any time. Plan for ongoing maintenance:

  • Update selectors regularly. Check at least monthly or whenever scrape volume drops unexpectedly
  • Version-control your selectors separately, so updates are easy to track
  • Consider using a web scraping API as a more stable long-term solution that handles DOM changes automatically

Final thoughts

You now have a complete Python scraper for Google Flights, capable of extracting prices, airlines, departure times, durations, and stop information across multiple geographic regions.

The most important piece of infrastructure when scraping flight prices is the proxy layer. Without residential proxies, Google Flights will block automated requests long before you collect any useful data. 

Decodo's residential proxies give you real consumer IPs across 195+ locations, letting you not only bypass detection but also capture region-specific pricing differences that would otherwise be invisible.

Flight prices change constantly, and geographic variation in pricing is real and significant. Whether you're building a price comparison tool, monitoring fare trends for a specific route, or running competitive research, a well-built Google Flights scraper gives you access to data that's genuinely difficult to get any other way.

Get Google Flights data effortlessly

Let Decodo's Web Scraping API handle JavaScript rendering, CAPTCHAs, and fingerprint detection while you focus on your next holiday trip.

About the author

Kipras Kalzanauskas

Senior Account Manager

Kipras is a strategic account expert with a strong background in sales, IT support, and data-driven solutions. Born and raised in Vilnius, he studied history at Vilnius University before spending time in the Lithuanian Military. For the past 3.5 years, he has been a key player at Decodo, working with Fortune 500 companies in eCommerce and Market Intelligence.



All information on Decodo Blog is provided on an as is basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

Does Google offer an official API for flight data?

No. Google does not provide a public API for accessing flight data. To scrape flight prices, schedules, and airline details, you will need a web scraper or third-party travel APIs provided by airline data aggregators.

What are the best ways to collect Google Flights data?

The most reliable method is using a headless browser scraper such as Playwright or Puppeteer combined with rotating residential proxies. This setup allows scripts to render JavaScript-heavy pages, bypass anti-bot systems, and extract structured data like prices, schedules, and airline information.

Is it legal to scrape data from Google Flights?

Web scraping legality depends on how you collect and use the data. Scraping publicly available information is generally allowed in many jurisdictions, but you should always review the website's terms of service and ensure your scraping practices follow applicable laws and ethical guidelines.

How reliable is the flight information shown on Google Flights?

Google Flights aggregates data from airlines and travel partners, making it generally accurate for comparing routes and prices. However, fares can change rapidly due to airline pricing algorithms, seat availability, and regional factors, so scraped data should be treated as time-sensitive information.


© 2018-2026 decodo.com (formerly smartproxy.com). All Rights Reserved