
How to Scrape Google Finance

Google Finance is one of the most comprehensive financial data platforms, offering real-time stock prices, market analytics, and company insights. Scraping Google Finance provides access to valuable data streams that can transform your analysis capabilities. In this guide, we'll walk through building a robust Google Finance scraper using Python, handling anti-bot measures, and implementing best practices for reliable data extraction.

Dominykas Niaura

Jun 25, 2025

10 min read

Why scrape Google Finance

Google Finance contains a wealth of financial information that's continuously updated throughout trading hours. By automating data collection from this platform, you can unlock insights that would be time-consuming or impossible to gather manually.

Market research and analysis

Scraping Google Finance allows you to track stock performance, analyze market trends, and gather comparative data across multiple securities. This data can power investment research, help identify emerging opportunities, or support academic studies on market behavior.

Portfolio management and tracking

Automated data collection enables real-time portfolio monitoring, performance tracking, and alert systems. You can build custom dashboards that aggregate data from multiple holdings and provide insights that aren't available through standard brokerage interfaces.

Financial application development

Developers can integrate Google Finance data into custom applications, trading algorithms, or financial tools. This includes everything from simple price trackers to sophisticated analytical platforms that require fresh, accurate market data.

Competitive intelligence

For businesses in the financial sector, monitoring competitor stock performance, analyst ratings, and market sentiment provides valuable competitive insights that can inform strategic decisions.

What data you can scrape from Google Finance

Google Finance pages contain rich financial information across multiple categories, making it a comprehensive source for market data extraction. Here are some of the most valuable data points you can scrape from this platform:

Real-time stock data

Current stock prices, percentage changes, trading volume, and market capitalization provide the foundation for most financial analysis. You can also extract 52-week high/low ranges, previous close prices, and intraday trading patterns.

Company fundamentals

Corporate information including company names, ticker symbols, primary exchanges, and basic metrics like P/E (price-to-earnings) ratios. Some listings also include employee counts, headquarters locations, and founding dates.

Financial statements and metrics

Revenue figures, operating expenses, net income, earnings per share, and EBITDA (Earnings Before Interest, Taxes, Depreciation and Amortization) data from recent financial reports. You can also access profit margins, tax rates, and other key performance indicators.

Market sentiment indicators

Price movement trends, analyst ratings where available, and related news articles that can provide context for stock performance and market sentiment.

Tools and libraries for Google Finance scraping

Now that you know the power behind Google Finance data, let’s talk about how to gather it with custom scraping code. Building an effective Google Finance scraper requires the right combination of Python libraries and supporting tools.

Core Python libraries

The Requests library handles HTTP communication, while Beautiful Soup parses HTML content and extracts specific data elements. For faster and more accurate HTML parsing, it's recommended to use the lxml parser with Beautiful Soup. Together, these tools form the foundation of most web scraping projects and are sufficient for Google Finance’s primarily static content.
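
As a quick illustration, here's a minimal sketch of that combination using the lxml parser. The quote URL format and the price selector (div.YMlKec.fxKbKc) are the same ones we target later in this guide, and without proxies a request like this may occasionally be blocked:

import requests
from bs4 import BeautifulSoup

url = "https://www.google.com/finance/quote/AAPL:NASDAQ?hl=en"
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
response.raise_for_status()

# Parse the returned HTML with the faster lxml parser
soup = BeautifulSoup(response.text, "lxml")
price = soup.select_one("div.YMlKec.fxKbKc")
print(price.get_text(strip=True) if price else "Price element not found")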

Data processing tools

Pandas provides powerful data manipulation capabilities for organizing and analyzing scraped results. The csv module enables easy export to spreadsheet formats, while the json module can handle structured data storage.
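
For instance, once your scraper returns dictionaries of results, a few lines of pandas (or the built-in csv and json modules) can organize and export them. The tickers and values below are made up purely for illustration:

import json
import pandas as pd

# Hypothetical scraped results – one dictionary per company
results = [
    {"ticker": "AAPL", "price": "196.45", "market_cap": "2.95T"},
    {"ticker": "MSFT", "price": "478.87", "market_cap": "3.56T"},
]

df = pd.DataFrame(results)                     # organize for analysis
df.to_csv("scraped_quotes.csv", index=False)   # spreadsheet-friendly export

with open("scraped_quotes.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)            # structured storage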

Advanced browser automation

For complex scenarios requiring JavaScript execution, Selenium or Playwright can simulate full browser environments. However, Google Finance's data is largely accessible through standard HTTP requests, making these tools optional for most use cases.
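
If you do run into JavaScript-rendered elements, a headless browser can render the page before parsing. Here's a rough sketch using Playwright's sync API (install it with pip install playwright, then playwright install chromium); treat it as a starting point rather than a drop-in replacement for the requests-based scraper below:

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

url = "https://www.google.com/finance/quote/AAPL:NASDAQ?hl=en"

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto(url, wait_until="networkidle")  # wait for JS-driven content to settle
    html = page.content()
    browser.close()

soup = BeautifulSoup(html, "lxml")
price = soup.select_one("div.YMlKec.fxKbKc")
print(price.get_text(strip=True) if price else "Price element not found")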

Proxy and session management

Reliable proxy services are essential for sustained scraping operations. Google Finance implements rate limiting and bot detection, making IP rotation and proper session management crucial for consistent data collection.

Setting up your environment

Before building your scraper, ensure you have the necessary tools and credentials configured properly.

Install required packages

Start by installing the essential Python libraries for web scraping and data handling in your terminal:

pip install requests beautifulsoup4 lxml pandas

Configure proxy access

For reliable scraping, you'll need access to quality proxies. At Decodo, we offer residential proxies with a 99.86% success rate and response times under 0.6 seconds (the best in the market). Here's how to get started:

  1. Create an account on the Decodo dashboard.
  2. Navigate to Residential proxies and select a plan.
  3. In the Proxy setup tab, configure your location and session preferences.
  4. Copy your credentials for integration into your scraping script – you can verify them with the quick test snippet below.
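
Before wiring the credentials into the scraper, it's worth confirming they work with a quick test request through the gateway. The endpoint and port below match the ones used later in this guide, and httpbin.org/ip is just a convenient IP-echo service:

import requests

proxies = {
    "http": "http://YOUR_USERNAME:YOUR_PASSWORD@gate.decodo.com:7000",
    "https": "http://YOUR_USERNAME:YOUR_PASSWORD@gate.decodo.com:7000",
}

# Each request through the residential gateway should exit from a different IP
response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=30)
print(response.json())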

Get residential proxy IPs

Claim your 3-day free trial of residential proxies and explore full features with unrestricted access.

Prepare your development environment

Set up a Python development environment using your preferred IDE or text editor. Having browser developer tools available will help you inspect Google Finance pages and identify the correct elements to target.

Step-by-step Google Finance scraping tutorial

Let's build a comprehensive scraper that can extract detailed financial data from Google Finance pages. We’ll break this down into components, explaining each step and why it's necessary.

1. Import libraries and build the scraper class

The first block sets up all necessary imports and creates a scraper class. These libraries handle HTTP requests, HTML parsing, CSV export, and more. When initializing the GoogleFinanceScraper, you’ll need to input your proxy credentials (YOUR_USERNAME, YOUR_PASSWORD). This enables access through Decodo’s proxy gateway.

The session is configured with browser-like headers to reduce the chance of detection, and proxy setup is built-in for both HTTP and HTTPS requests.

import requests
from bs4 import BeautifulSoup
import csv
import re
from datetime import datetime
import time
import random


class GoogleFinanceScraper:
    def __init__(self, proxy_username="YOUR_USERNAME", proxy_password="YOUR_PASSWORD"):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'
        })
        self.session.proxies.update({
            'http': f'http://{proxy_username}:{proxy_password}@gate.decodo.com:7000',
            'https': f'http://{proxy_username}:{proxy_password}@gate.decodo.com:7000'
        })

2. Page fetching with error handling

Next, the get_page_content() method loads the target URL and includes error handling to raise alerts if the page fails to load. It also mimics human behavior by adding a small randomized delay between requests. This helps avoid getting flagged by Google’s anti-bot systems.

def get_page_content(self, url):
    """Fetch page content"""
    print(f"Fetching data from: {url}")
    response = self.session.get(url, timeout=30)
    response.raise_for_status()
    time.sleep(random.uniform(1, 3))
    return response.text

3. Data cleaning and standardization

The clean_number() method helps standardize various formats of numbers, dates, and symbols you might encounter on a Google Finance page. It ensures values like percentages, financial multipliers (M, B, T), and currencies are handled correctly for further processing or export.

def clean_number(self, text):
    """Clean and format numbers"""
    if not text or text == "N/A" or text == "-":
        return "N/A"
    text = text.strip()
    # Handle percentages
    if '%' in text:
        match = re.search(r'([+-]?\d+\.?\d*)%', text)
        return match.group(1) + '%' if match else text
    # Handle dates
    if re.search(r'[A-Za-z]{3}\s+\d{1,2},?\s+\d{4}', text):
        return text
    # Handle ranges and addresses
    if any(x in text for x in [' - ', ' to ', 'street', 'avenue', 'city']):
        return text
    # Keep original format for financial numbers with multipliers (B, M, K, T)
    if re.search(r'\d+\.?\d*[BMKT]', text.upper()):
        return text
    # Handle currency symbols for price data
    currency = re.search(r'[\$€£¥₹]', text)
    if currency:
        return text  # Keep original format for prices
    # Handle regular numbers with commas (for employee counts, etc.)
    number_match = re.search(r'([+-]?\d{1,3}(?:,\d{3})*(?:\.\d+)?)', text)
    if number_match:
        number_str = number_match.group(1).replace(',', '')
        try:
            if '.' in number_str:
                number = float(number_str)
                return f"{number:,.2f}"
            else:
                number = int(number_str)
                return f"{number:,}"
        except:
            pass
    return text

4. CSS selector and contextual extraction

Two helper methods (extract_by_selector and find_p6k_value_by_context) offer flexibility when pulling data from the HTML:

  • Use the CSS selector method when you know the structure.
  • Use the context-based method when the page layout is dynamic – it looks for nearby keywords to locate the right value.

This dual approach is helpful because Google Finance’s layout isn’t always consistent.

def extract_by_selector(self, soup, selector):
    """Extract text using CSS selector"""
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element else "N/A"


def find_p6k_value_by_context(self, soup, keywords):
    """Find P6K39c value by searching for nearby keywords"""
    for keyword in keywords:
        labels = soup.find_all(string=re.compile(keyword, re.IGNORECASE))
        for label in labels:
            parent = label.parent
            while parent and parent.name != 'body':
                value_element = parent.find('div', class_='P6K39c')
                if value_element:
                    return value_element.get_text(strip=True)
                parent = parent.parent
    return "N/A"

5. Financial data extraction

extract_financial_data() zeroes in on core financial figures like revenue, net income, and EPS. It scans the table rows using keyword matching to locate the right cell, and then formats each value using the cleaner from earlier. This method handles different financial currencies and units as they appear on the site.

def extract_financial_data(self, soup):
    """Extract financial data from QXDnM elements"""
    # Get financial currency from the correct selector
    financial_currency = self.extract_by_selector(soup, 'th.yNnsfe.PFjsMe')
    # Define financial fields and their keywords
    financial_fields = {
        'revenue': ['Revenue', 'Total revenue'],
        'operating_expense': ['Operating expense', 'Operating expenses'],
        'net_income': ['Net income', 'Net earnings'],
        'net_profit_margin': ['Net profit margin', 'Profit margin'],
        'earnings_per_share': ['Earnings per share', 'EPS'],
        'ebitda': ['EBITDA'],
        'effective_tax_rate': ['Effective tax rate', 'Tax rate']
    }
    financial_data = {'financial_currency': financial_currency}
    # Extract using table row context
    for field, keywords in financial_fields.items():
        value = "N/A"
        for keyword in keywords:
            rows = soup.find_all('tr')
            for row in rows:
                if keyword.lower() in row.get_text().lower():
                    qxdnm_cell = row.find('td', class_='QXDnM')
                    if qxdnm_cell:
                        value = qxdnm_cell.get_text(strip=True)
                        break
            if value != "N/A":
                break
        financial_data[field] = self.clean_number(value)
    return financial_data

6. Main scraping function

The scrape_google_finance() method is where everything comes together. It:

  • Fetches and parses the HTML
  • Grabs the company name, price, market metrics, and other essentials
  • Extracts info like CEO, HQ, and website
  • Uses a fallback approach if some data isn’t found initially
  • Merges in financial statement data

This is the method you’ll call when scraping a specific company. Be sure to replace the placeholder URL with your desired target.

def scrape_google_finance(self, url):
    """Main scraping function"""
    try:
        html_content = self.get_page_content(url)
        soup = BeautifulSoup(html_content, 'html.parser')
        data = {
            'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'url': url
        }
        # Extract basic company and price data
        data['company_title'] = self.extract_by_selector(soup, 'div[role="heading"][aria-level="1"].zzDege')
        data['current_price'] = self.extract_by_selector(soup, 'div.YMlKec.fxKbKc')
        # Extract change percentage (handle SVG)
        change_element = soup.select_one('div.JwB6zf')
        if change_element:
            change_text = change_element.get_text(strip=True)
            data['change_percentage'] = re.sub(r'[^\d.%+-]', '', change_text)
        else:
            data['change_percentage'] = "N/A"
        # Extract P6K39c data using positional and contextual methods
        p6k_elements = soup.find_all('div', class_='P6K39c')
        # Try contextual extraction first
        p6k_fields = {
            'previous_close': ['Previous close'],
            'year_range': ['52 week range', 'Year range'],
            'market_cap': ['Market cap'],
            'avg_volume': ['Avg volume'],
            'pe_ratio': ['P/E ratio'],
            'primary_exchange': ['Primary exchange'],
            'employees': ['Employees']
        }
        for field, keywords in p6k_fields.items():
            data[field] = self.clean_number(self.find_p6k_value_by_context(soup, keywords))
        # Fallback to positional extraction if needed
        if data.get('previous_close') == "N/A" and len(p6k_elements) >= 6:
            field_order = ['previous_close', 'year_range', 'market_cap',
                           'avg_volume', 'pe_ratio', 'primary_exchange']
            for i, field in enumerate(field_order):
                if i < len(p6k_elements):
                    data[field] = self.clean_number(p6k_elements[i].get_text(strip=True))
        # Extract founded date from P6K39c elements
        data['founded'] = "N/A"
        for element in p6k_elements:
            text = element.get_text(strip=True)
            if re.search(r'[A-Za-z]{3}\s+\d{1,2},?\s+\d{4}', text):
                data['founded'] = text
                break
        # Extract CEO, headquarters, and website from links
        links = soup.find_all('a', class_='tBHE4e')
        data['ceo'] = "N/A"
        data['headquarters'] = "N/A"
        data['website'] = "N/A"
        potential_websites = []
        for link in links:
            href = link.get('href', '')
            text = link.get_text(strip=True)
            rel = link.get('rel', [])
            # CEO (search links with person names)
            if data['ceo'] == "N/A" and 'search?q=' in href and len(text.split()) == 2:
                data['ceo'] = text
            # Headquarters (maps links)
            elif data['headquarters'] == "N/A" and 'maps/place/' in href:
                data['headquarters'] = link.get_text(separator=' ', strip=True)
            # Collect potential website links
            elif (href.startswith('http') and
                  'google.com' not in href and
                  'maps' not in href and
                  'wikipedia' not in href.lower() and
                  'noopener' in rel and
                  'noreferrer' in rel):
                potential_websites.append((href, text))
        # Choose the best website from potential candidates
        if potential_websites:
            # Prefer company domain websites (containing company name or common patterns)
            company_keywords = ['amazon', 'about', 'corp', 'company', '.com', 'www']
            for href, text in potential_websites:
                if any(keyword in href.lower() or keyword in text.lower() for keyword in company_keywords):
                    data['website'] = href
                    break
            # If no company-specific website found, take the first one
            if data['website'] == "N/A":
                data['website'] = potential_websites[0][0]
        # Extract financial data
        financial_data = self.extract_financial_data(soup)
        data.update(financial_data)
        return data
    except Exception as e:
        print(f"Error scraping data: {e}")
        return None

7. Data display and export functions

Once data is collected, the print_data() method organizes it into logical sections for clarity. It’s a useful way to preview what was extracted.

The save_to_csv() method outputs the data to a CSV file. You can either save one company per file or extend it for batch mode.

def print_data(self, data):
    """Print formatted results"""
    print("\n" + "="*70)
    print("GOOGLE FINANCE DATA EXTRACTION RESULTS")
    print("="*70)
    sections = {
        'COMPANY INFORMATION': ['company_title', 'ceo', 'founded', 'headquarters', 'website', 'employees', 'primary_exchange'],
        'STOCK PRICE DATA': ['current_price', 'change_percentage', 'previous_close', 'year_range'],
        'MARKET STATISTICS': ['market_cap', 'avg_volume', 'pe_ratio'],
        f"FINANCIAL DATA {data.get('financial_currency', '')}".strip(): ['revenue', 'operating_expense', 'net_income', 'net_profit_margin', 'earnings_per_share', 'ebitda', 'effective_tax_rate'],
        'EXTRACTION METADATA': ['timestamp', 'url']
    }
    for section_name, fields in sections.items():
        print(f"\n{section_name}")
        print("-" * 40)
        for field in fields:
            if field in data:
                formatted_key = field.replace('_', ' ').title()
                print(f"{formatted_key:<20}: {data[field]}")
    print("="*70)


def save_to_csv(self, data, filename):
    """Save data to CSV"""
    try:
        with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=list(data.keys()))
            writer.writeheader()
            writer.writerow(data)
        print(f"Data saved to {filename}")
    except Exception as e:
        print(f"Error saving to CSV: {e}")

8. Main execution function

In the main() function, you can either scrape a single company or loop through a list of company URLs. Rate limiting is included for batch mode to help avoid getting blocked.

The example here scrapes Apple’s Google Finance page. Swap in any other stock page by updating the URL. For multiple companies, just uncomment the list and loop sections.

def main():
    """Main execution function"""
    # Single URL scraping
    url = "https://www.google.com/finance/quote/AAPL:NASDAQ?hl=en"
    # For multiple companies, replace the single URL with a list:
    # urls = [
    #     "https://www.google.com/finance/quote/AMZN:NASDAQ?hl=en",
    #     "https://www.google.com/finance/quote/AAPL:NASDAQ?hl=en",
    #     "https://www.google.com/finance/quote/GOOGL:NASDAQ?hl=en",
    #     "https://www.google.com/finance/quote/MSFT:NASDAQ?hl=en"
    # ]
    scraper = GoogleFinanceScraper()
    print("Starting Google Finance scraping...")
    try:
        # Single company scraping
        data = scraper.scrape_google_finance(url)
        if data:
            scraper.print_data(data)
            filename = f"apple_finance_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
            scraper.save_to_csv(data, filename)
        else:
            print("Failed to extract data")
        # For multiple companies, use this loop instead:
        # all_data = []
        # for i, url in enumerate(urls):
        #     print(f"\nScraping company {i+1}/{len(urls)}")
        #     data = scraper.scrape_google_finance(url)
        #     if data:
        #         all_data.append(data)
        #         scraper.print_data(data)
        #     time.sleep(random.uniform(3, 6))  # Delay between requests
        #
        # # Save all data to one CSV
        # if all_data:
        #     filename = f"multiple_companies_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
        #     with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        #         writer = csv.DictWriter(csvfile, fieldnames=list(all_data[0].keys()))
        #         writer.writeheader()
        #         writer.writerows(all_data)
        #     print(f"All company data saved to {filename}")
    except Exception as e:
        print(f"Script execution failed: {e}")


if __name__ == "__main__":
    main()

The complete Google Finance scraping code

Here's the full implementation that brings together all the components we've discussed:

import requests
from bs4 import BeautifulSoup
import csv
import re
from datetime import datetime
import time
import random


class GoogleFinanceScraper:
    def __init__(self, proxy_username="YOUR_USERNAME", proxy_password="YOUR_PASSWORD"):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'
        })
        self.session.proxies.update({
            'http': f'http://{proxy_username}:{proxy_password}@gate.decodo.com:7000',
            'https': f'http://{proxy_username}:{proxy_password}@gate.decodo.com:7000'
        })

    def get_page_content(self, url):
        """Fetch page content"""
        print(f"Fetching data from: {url}")
        response = self.session.get(url, timeout=30)
        response.raise_for_status()
        time.sleep(random.uniform(1, 3))
        return response.text

    def clean_number(self, text):
        """Clean and format numbers"""
        if not text or text == "N/A" or text == "-":
            return "N/A"
        text = text.strip()
        # Handle percentages
        if '%' in text:
            match = re.search(r'([+-]?\d+\.?\d*)%', text)
            return match.group(1) + '%' if match else text
        # Handle dates
        if re.search(r'[A-Za-z]{3}\s+\d{1,2},?\s+\d{4}', text):
            return text
        # Handle ranges and addresses
        if any(x in text for x in [' - ', ' to ', 'street', 'avenue', 'city']):
            return text
        # Keep original format for financial numbers with multipliers (B, M, K, T)
        if re.search(r'\d+\.?\d*[BMKT]', text.upper()):
            return text
        # Handle currency symbols for price data
        currency = re.search(r'[\$€£¥₹]', text)
        if currency:
            return text  # Keep original format for prices
        # Handle regular numbers with commas (for employee counts, etc.)
        number_match = re.search(r'([+-]?\d{1,3}(?:,\d{3})*(?:\.\d+)?)', text)
        if number_match:
            number_str = number_match.group(1).replace(',', '')
            try:
                if '.' in number_str:
                    number = float(number_str)
                    return f"{number:,.2f}"
                else:
                    number = int(number_str)
                    return f"{number:,}"
            except:
                pass
        return text

    def extract_by_selector(self, soup, selector):
        """Extract text using CSS selector"""
        element = soup.select_one(selector)
        return element.get_text(strip=True) if element else "N/A"

    def find_p6k_value_by_context(self, soup, keywords):
        """Find P6K39c value by searching for nearby keywords"""
        for keyword in keywords:
            labels = soup.find_all(string=re.compile(keyword, re.IGNORECASE))
            for label in labels:
                parent = label.parent
                while parent and parent.name != 'body':
                    value_element = parent.find('div', class_='P6K39c')
                    if value_element:
                        return value_element.get_text(strip=True)
                    parent = parent.parent
        return "N/A"

    def extract_financial_data(self, soup):
        """Extract financial data from QXDnM elements"""
        # Get financial currency from the correct selector
        financial_currency = self.extract_by_selector(soup, 'th.yNnsfe.PFjsMe')
        # Define financial fields and their keywords
        financial_fields = {
            'revenue': ['Revenue', 'Total revenue'],
            'operating_expense': ['Operating expense', 'Operating expenses'],
            'net_income': ['Net income', 'Net earnings'],
            'net_profit_margin': ['Net profit margin', 'Profit margin'],
            'earnings_per_share': ['Earnings per share', 'EPS'],
            'ebitda': ['EBITDA'],
            'effective_tax_rate': ['Effective tax rate', 'Tax rate']
        }
        financial_data = {'financial_currency': financial_currency}
        # Extract using table row context
        for field, keywords in financial_fields.items():
            value = "N/A"
            for keyword in keywords:
                rows = soup.find_all('tr')
                for row in rows:
                    if keyword.lower() in row.get_text().lower():
                        qxdnm_cell = row.find('td', class_='QXDnM')
                        if qxdnm_cell:
                            value = qxdnm_cell.get_text(strip=True)
                            break
                if value != "N/A":
                    break
            financial_data[field] = self.clean_number(value)
        return financial_data

    def scrape_google_finance(self, url):
        """Main scraping function"""
        try:
            html_content = self.get_page_content(url)
            soup = BeautifulSoup(html_content, 'html.parser')
            data = {
                'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                'url': url
            }
            # Extract basic company and price data
            data['company_title'] = self.extract_by_selector(soup, 'div[role="heading"][aria-level="1"].zzDege')
            data['current_price'] = self.extract_by_selector(soup, 'div.YMlKec.fxKbKc')
            # Extract change percentage (handle SVG)
            change_element = soup.select_one('div.JwB6zf')
            if change_element:
                change_text = change_element.get_text(strip=True)
                data['change_percentage'] = re.sub(r'[^\d.%+-]', '', change_text)
            else:
                data['change_percentage'] = "N/A"
            # Extract P6K39c data using positional and contextual methods
            p6k_elements = soup.find_all('div', class_='P6K39c')
            # Try contextual extraction first
            p6k_fields = {
                'previous_close': ['Previous close'],
                'year_range': ['52 week range', 'Year range'],
                'market_cap': ['Market cap'],
                'avg_volume': ['Avg volume'],
                'pe_ratio': ['P/E ratio'],
                'primary_exchange': ['Primary exchange'],
                'employees': ['Employees']
            }
            for field, keywords in p6k_fields.items():
                data[field] = self.clean_number(self.find_p6k_value_by_context(soup, keywords))
            # Fallback to positional extraction if needed
            if data.get('previous_close') == "N/A" and len(p6k_elements) >= 6:
                field_order = ['previous_close', 'year_range', 'market_cap',
                               'avg_volume', 'pe_ratio', 'primary_exchange']
                for i, field in enumerate(field_order):
                    if i < len(p6k_elements):
                        data[field] = self.clean_number(p6k_elements[i].get_text(strip=True))
            # Extract founded date from P6K39c elements
            data['founded'] = "N/A"
            for element in p6k_elements:
                text = element.get_text(strip=True)
                if re.search(r'[A-Za-z]{3}\s+\d{1,2},?\s+\d{4}', text):
                    data['founded'] = text
                    break
            # Extract CEO, headquarters, and website from links
            links = soup.find_all('a', class_='tBHE4e')
            data['ceo'] = "N/A"
            data['headquarters'] = "N/A"
            data['website'] = "N/A"
            potential_websites = []
            for link in links:
                href = link.get('href', '')
                text = link.get_text(strip=True)
                rel = link.get('rel', [])
                # CEO (search links with person names)
                if data['ceo'] == "N/A" and 'search?q=' in href and len(text.split()) == 2:
                    data['ceo'] = text
                # Headquarters (maps links)
                elif data['headquarters'] == "N/A" and 'maps/place/' in href:
                    data['headquarters'] = link.get_text(separator=' ', strip=True)
                # Collect potential website links
                elif (href.startswith('http') and
                      'google.com' not in href and
                      'maps' not in href and
                      'wikipedia' not in href.lower() and
                      'noopener' in rel and
                      'noreferrer' in rel):
                    potential_websites.append((href, text))
            # Choose the best website from potential candidates
            if potential_websites:
                # Prefer company domain websites (containing company name or common patterns)
                company_keywords = ['amazon', 'about', 'corp', 'company', '.com', 'www']
                for href, text in potential_websites:
                    if any(keyword in href.lower() or keyword in text.lower() for keyword in company_keywords):
                        data['website'] = href
                        break
                # If no company-specific website found, take the first one
                if data['website'] == "N/A":
                    data['website'] = potential_websites[0][0]
            # Extract financial data
            financial_data = self.extract_financial_data(soup)
            data.update(financial_data)
            return data
        except Exception as e:
            print(f"Error scraping data: {e}")
            return None

    def print_data(self, data):
        """Print formatted results"""
        print("\n" + "="*70)
        print("GOOGLE FINANCE DATA EXTRACTION RESULTS")
        print("="*70)
        sections = {
            'COMPANY INFORMATION': ['company_title', 'ceo', 'founded', 'headquarters', 'website', 'employees', 'primary_exchange'],
            'STOCK PRICE DATA': ['current_price', 'change_percentage', 'previous_close', 'year_range'],
            'MARKET STATISTICS': ['market_cap', 'avg_volume', 'pe_ratio'],
            f"FINANCIAL DATA {data.get('financial_currency', '')}".strip(): ['revenue', 'operating_expense', 'net_income', 'net_profit_margin', 'earnings_per_share', 'ebitda', 'effective_tax_rate'],
            'EXTRACTION METADATA': ['timestamp', 'url']
        }
        for section_name, fields in sections.items():
            print(f"\n{section_name}")
            print("-" * 40)
            for field in fields:
                if field in data:
                    formatted_key = field.replace('_', ' ').title()
                    print(f"{formatted_key:<20}: {data[field]}")
        print("="*70)

    def save_to_csv(self, data, filename):
        """Save data to CSV"""
        try:
            with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
                writer = csv.DictWriter(csvfile, fieldnames=list(data.keys()))
                writer.writeheader()
                writer.writerow(data)
            print(f"Data saved to {filename}")
        except Exception as e:
            print(f"Error saving to CSV: {e}")


def main():
    """Main execution function"""
    # Single URL scraping
    url = "https://www.google.com/finance/quote/AAPL:NASDAQ?hl=en"
    # For multiple companies, replace the single URL with a list:
    # urls = [
    #     "https://www.google.com/finance/quote/AMZN:NASDAQ?hl=en",
    #     "https://www.google.com/finance/quote/AAPL:NASDAQ?hl=en",
    #     "https://www.google.com/finance/quote/GOOGL:NASDAQ?hl=en",
    #     "https://www.google.com/finance/quote/MSFT:NASDAQ?hl=en"
    # ]
    scraper = GoogleFinanceScraper()
    print("Starting Google Finance scraping...")
    try:
        # Single company scraping
        data = scraper.scrape_google_finance(url)
        if data:
            scraper.print_data(data)
            filename = f"apple_finance_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
            scraper.save_to_csv(data, filename)
        else:
            print("Failed to extract data")
        # For multiple companies, use this loop instead:
        # all_data = []
        # for i, url in enumerate(urls):
        #     print(f"\nScraping company {i+1}/{len(urls)}")
        #     data = scraper.scrape_google_finance(url)
        #     if data:
        #         all_data.append(data)
        #         scraper.print_data(data)
        #     time.sleep(random.uniform(3, 6))  # Delay between requests
        #
        # # Save all data to one CSV
        # if all_data:
        #     filename = f"multiple_companies_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
        #     with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        #         writer = csv.DictWriter(csvfile, fieldnames=list(all_data[0].keys()))
        #         writer.writeheader()
        #         writer.writerows(all_data)
        #     print(f"All company data saved to {filename}")
    except Exception as e:
        print(f"Script execution failed: {e}")


if __name__ == "__main__":
    main()

Running the script prints the extracted data to your terminal, grouped into the sections defined in print_data(), and saves the same fields to a timestamped CSV file.

Handling challenges and anti-bot measures

Google Finance implements several protection mechanisms that scrapers must navigate carefully.

Rate limiting and request throttling

Google monitors request patterns and can temporarily block IPs that send too many requests in a short period. The scraper includes randomized delays between requests; to respect these limits further, you can add exponential backoff for failed attempts.
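
A minimal sketch of such a backoff helper might look like this (fetch_with_backoff is a hypothetical name, not part of the scraper class above; you could call it with the scraper's session, e.g. fetch_with_backoff(scraper.session, url)):

import random
import time

import requests

def fetch_with_backoff(session, url, max_retries=4):
    """Retry a request with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            response = session.get(url, timeout=30)
            response.raise_for_status()
            return response.text
        except requests.RequestException as e:
            wait = (2 ** attempt) + random.uniform(0, 1)  # 1s, 2s, 4s, 8s plus jitter
            print(f"Attempt {attempt + 1} failed ({e}), retrying in {wait:.1f}s")
            time.sleep(wait)
    raise RuntimeError(f"All {max_retries} attempts failed for {url}")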

Dynamic content loading

While most Google Finance data loads with the initial HTML, some elements may require JavaScript execution. For basic financial data, the static HTML approach works reliably, but more complex scenarios might benefit from browser automation tools.

IP-based blocking

Sustained scraping from a single IP address increases the likelihood of being blocked. Rotating through multiple proxy IPs helps distribute requests and maintain access over longer periods.

User agent and header detection

Google can detect non-browser requests through missing or inconsistent headers. The scraper includes comprehensive browser headers and uses common user agent strings to appear more like legitimate browser traffic.

Advanced scraping techniques

For more sophisticated data collection needs, several advanced techniques can improve scraper performance and reliability.

Session management and cookie handling

Maintaining consistent sessions across requests can improve success rates and reduce the likelihood of triggering anti-bot measures. The requests.Session object automatically handles cookies and connection pooling.

Multi-threaded data collection

When scraping large numbers of stocks, parallel processing can significantly reduce total execution time. However, this must be balanced against rate limiting requirements to avoid overwhelming the target server.
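
One way to parallelize, assuming the GoogleFinanceScraper class built earlier in this guide, is a small thread pool with a conservative worker count so the combined request rate stays reasonable:

from concurrent.futures import ThreadPoolExecutor, as_completed

# Assumes GoogleFinanceScraper from this guide is defined or importable
urls = [
    "https://www.google.com/finance/quote/AAPL:NASDAQ?hl=en",
    "https://www.google.com/finance/quote/MSFT:NASDAQ?hl=en",
    "https://www.google.com/finance/quote/GOOGL:NASDAQ?hl=en",
]

scraper = GoogleFinanceScraper()
results = []

# Keep the pool small – parallelism multiplies your request rate.
# For heavier workloads, consider one scraper (and session) per thread.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(scraper.scrape_google_finance, url): url for url in urls}
    for future in as_completed(futures):
        data = future.result()
        if data:
            results.append(data)

print(f"Collected data for {len(results)} companies")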

Error recovery and data validation

Robust scrapers include comprehensive error handling and data validation to ensure reliable operation even when encountering unexpected page structures or network issues.
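
As a simple example, a validation pass (validate_record below is a hypothetical helper, not part of the scraper class) can flag records whose critical fields came back as "N/A" so you can retry or discard them:

REQUIRED_FIELDS = ["company_title", "current_price", "market_cap"]

def validate_record(data):
    """Return the required fields that are missing or came back as 'N/A'."""
    return [field for field in REQUIRED_FIELDS if data.get(field) in (None, "", "N/A")]

# Example with a partially failed extraction
record = {"company_title": "Apple Inc", "current_price": "N/A", "market_cap": "N/A"}
missing = validate_record(record)
if missing:
    print(f"Incomplete record, missing: {missing}")  # retry the URL or discard the row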

Proxy rotation strategies

Advanced proxy management includes automatic rotation, health checking, and failover mechanisms to maintain consistent access even when individual proxy IPs become blocked.
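
A rough sketch of such a strategy, assuming a hypothetical pool of proxy endpoints and using httpbin.org/ip as a health-check target, might look like this:

import itertools
import requests

# Hypothetical pool of proxy endpoints (e.g. separate sticky-session ports)
PROXY_POOL = [
    "http://YOUR_USERNAME:YOUR_PASSWORD@gate.decodo.com:7000",
    # ...additional endpoints or sticky-session ports
]

def healthy_proxies(pool):
    """Keep only proxies that currently answer a quick test request."""
    alive = []
    for proxy in pool:
        try:
            requests.get("https://httpbin.org/ip",
                         proxies={"http": proxy, "https": proxy}, timeout=10)
            alive.append(proxy)
        except requests.RequestException:
            pass  # drop dead or blocked endpoints
    return alive

# Round-robin rotation (assumes at least one proxy in the pool is healthy)
proxy_cycle = itertools.cycle(healthy_proxies(PROXY_POOL))
proxy = next(proxy_cycle)
response = requests.get("https://www.google.com/finance/quote/AAPL:NASDAQ?hl=en",
                        proxies={"http": proxy, "https": proxy}, timeout=30)
print(response.status_code)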

Best practices for sustainable scraping

Following established best practices ensures your scraper operates reliably and respectfully over time.

Respect rate limits and implement delays

Always include appropriate delays between requests and avoid overwhelming target servers. Random delays help make request patterns appear more natural and reduce the likelihood of detection.

Monitor and adapt to page changes

Google occasionally updates its page structure, which can break scrapers that rely on specific CSS selectors or HTML patterns. Regular monitoring and testing help identify when updates are needed.

Handle errors gracefully

Network failures, rate limiting, and page structure changes are inevitable when scraping at scale. Building robust error handling and retry logic ensures your scraper can recover from temporary issues.

Store and rotate proxy credentials securely

Protect proxy credentials and API keys by storing them securely and rotating them regularly. This prevents unauthorized access and ensures continued service availability.

Alternatives to scraping Google Finance

Google no longer provides an official public API for Google Finance – the original one was deprecated years ago. While scraping remains a flexible way to access data, it's not always the most convenient or scalable option. Depending on your needs, third-party APIs might offer a more streamlined and reliable solution.

Official financial APIs

Services like Alpha Vantage, IEX Cloud, and Yahoo Finance offer structured APIs with reliable data access, though they often include usage limits and fees for comprehensive access.

Financial data providers

Professional services like Bloomberg Terminal, Refinitiv, and FactSet provide institutional-grade financial data with extensive historical records and real-time updates, though at significantly higher costs.

Broker APIs

Many online brokers offer APIs that provide account-specific data and limited market information, suitable for personal portfolio management but not broader market analysis.

To sum up

Scraping Google Finance can be a great way to unlock valuable insights for financial research, portfolio tracking, or just staying on top of the markets. With this Python-based scraper, you now have a flexible tool that can pull together over 20 key data points.

As long as you follow a few best practices (like rotating proxies, respecting rate limits, and handling errors gracefully), you’ll be well on your way to building scrapers that are both reliable and long-lasting. From automated reporting to custom dashboards, this setup offers a reliable starting point for working with financial data.

Access residential proxies now

Try residential proxies free for 3 days – full access, zero restrictions.

About the author

Dominykas Niaura

Technical Copywriter

Dominykas brings a unique blend of philosophical insight and technical expertise to his writing. Starting his career as a film critic and music industry copywriter, he's now an expert in making complex proxy and web scraping concepts accessible to everyone.


Connect with Dominykas via LinkedIn

All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

Can you scrape data from Google Finance?

Yes, you can! The HTML and network calls that power Google Finance can be fetched with ordinary HTTP requests or browser-automation tools. Scraping publicly available financial data is generally permissible, but you should review Google’s terms of service and ensure your usage complies with applicable laws.

Does Google have a web scraper?

Google itself doesn’t provide a dedicated scraper for Google Finance. The closest first-party solution is the Google Finance function in Google Sheets (=GOOGLEFINANCE()), which pulls delayed quotes and limited fundamental data directly into a spreadsheet. For anything more sophisticated, you’ll need to roll your own code or use a third-party data provider.

How often can I scrape data without getting blocked?

Success depends on factors like request volume, proxy quality, and scraping patterns. Generally, implementing delays of 2-5 seconds between requests and using rotating proxies provides good reliability for moderate-volume scraping.

What tools are best for beginners?

Python’s Requests and Beautiful Soup combo is the classic entry point: easy to learn, well-documented, and perfect for small-scale experiments. For dynamic pages, browser-driven solutions such as Playwright or Selenium let you interact with the site just as a browser would, albeit with more overhead. Finally, tools like pandas’ read_html() function or the pyquery library can save you from reinventing table parsing once you have the raw HTML. Choose the lightest tool that meets your needs.

How can I manage IP blocks while scraping?

Google actively tries to prevent automated access to Finance pages, so it's important to rotate IP addresses using proxies. You can also throttle your request rate, use randomized headers, and add delays between requests to mimic human behavior. These steps help reduce the risk of IP bans and maintain uninterrupted scraping sessions.
