
How to Scrape Nasdaq Data: A Complete Guide Using Python and Alternatives

Nasdaq offers a wealth of stock prices, news, and market reports. Manually collecting this data is a Sisyphean task, since new information appears constantly. Savvy investors, analysts, and traders turn to web scraping instead, automating data gathering to power more intelligent analysis and trading strategies. This guide walks you through building a Nasdaq scraper with Python, browser automation, APIs, and proxies to extract both real-time and historical market data.

Zilvinas Tamulis

Nov 21, 2025

14 min read

Understanding Nasdaq's data structure

Nasdaq spreads its stock data across multiple pages and sections. Each stock page delivers real-time quotes, company overviews, financials, and related news, while the screener pages let you sift through thousands of stocks using filters like market cap, sector, and performance.

For every ticker, Nasdaq provides a rich set of data points, such as current price, trading volume, 52-week highs and lows, P/E ratio, dividend yield, and upcoming earnings dates. Historical data is also available, including price charts, trade volumes, and corporate actions like splits and dividends.

Unfortunately for web scrapers, most of Nasdaq's data doesn't show up right away – it's loaded dynamically through JavaScript. The initial HTML is more of a skeleton, while the real content – prices, charts, tables – arrives later through background API calls. That means traditional HTML parsing won't cut it. To get the whole picture, you'll either need to render dynamic content in a headless browser or tap into Nasdaq's internal API endpoints directly. The ideal method depends on what kind of data you're chasing.

Tools and technologies for scraping Nasdaq

Choosing the right tools can mean the difference between a scraper that runs smoothly and one that crashes on its first attempt.

Python is the most popular choice for web scraping thanks to its mature libraries, clean syntax, and strong data-handling capabilities. Its vast community also makes troubleshooting easy – most issues have already been solved somewhere online.

Other languages can get the job done too:

  • JavaScript (Node.js). Great for scraping JavaScript-heavy websites and works seamlessly with browser automation tools.
  • Ruby. Equipped with solid libraries like Nokogiri and Mechanize for lightweight extraction tasks.
  • Go. Ideal for high-performance, large-scale scraping where speed and efficiency matter.

For scraping Nasdaq specifically, the essential Python libraries include:

  • Requests for sending HTTP requests.
  • Beautiful Soup for parsing HTML.
  • Selenium or Playwright for browser automation.
  • Pandas for organizing and exporting the data.

Playwright is the top choice for Nasdaq scraping. It's faster than Selenium, better at handling modern web technologies, and includes built-in waiting mechanisms for dynamic content. Its clean API and consistent performance across environments make it ideal for production use.

For enterprise-level operations, Decodo's Web Scraping API is an essential part of the scraping setup. It handles browser automation, JavaScript rendering, and data extraction through simple API requests – you send a URL and receive clean, structured data in return. Behind the scenes, the API routes requests through rotating proxies, so your scraper appears as many different users and sidesteps IP blocks, CAPTCHAs, and rate limits, even when pulling thousands of records daily.
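As a rough illustration, a call to such an API might look like the sketch below. The endpoint, parameter names, and authentication scheme here are placeholders for illustration, not Decodo's actual interface – check the official documentation for the real request format.

import requests

# Hypothetical endpoint, token, and parameters – check the provider's docs for the real ones
API_ENDPOINT = "https://scraping-api.example.com/v2/scrape"
API_TOKEN = "your-api-token"

payload = {
    "url": "https://www.nasdaq.com/market-activity/stocks/nvda",
    "render_js": True,  # ask the service to render JavaScript before returning the page
}

response = requests.post(
    API_ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=60,
)
response.raise_for_status()
print(response.text[:500])  # preview the structured or rendered content returned by the API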

Skip the scraper setup, get straight to analysis

Decodo's Web Scraping API handles Nasdaq scraping for you so that you can focus entirely on the data.

Step-by-step guide: How to scrape Nasdaq data

Step 1: Choose your target data

Nasdaq has a wealth of data to choose from, so it's up to you to decide what's most relevant and what you're looking for. Here are some of the most popular categories to target:

  1. Individual stock pages. Great for live quotes, key stats, and performance insights that are perfect for tracking price movements or building real-time dashboards.
  2. Historical data pages. Ideal for time-series analysis and backtesting trading strategies, since they provide price data across different time ranges.
  3. News and press releases. Crucial for understanding market sentiment and tracking events that influence stock volatility and investor behavior.

If your target isn't on the list, don't worry, as the process for most of these pages is relatively similar. Now, let's explore how to extract information from them.

Step 2: Analyze the target page

Before scraping any page, it's important to take a peek behind the scenes to see how it works. Visit any of Nasdaq's pages, right-click and select Inspect Element or Inspect, or press F12. This brings up your browser's developer tools, where you can examine the underlying HTML, track network activity, and monitor performance.

Step 3: Select your scraping method

From this point, there are two routes you can choose:

  • Intercepting network requests. This approach skips the tedious work of locating HTML elements or updating selectors when the site changes. Instead, you capture the backend requests directly, giving you structured JSON data that's clean, reliable, and ready to use.
  • Browser automation. A more hands-on approach where you control a real browser to navigate and extract data. It takes a bit more setup but offers greater flexibility for handling live content or accessing elements that the API doesn't provide.

For scraping Nasdaq, it's generally best to rely on intercepted network requests as your primary data source. This approach is clean and straightforward, while browser automation is better saved for cases where the API responses don't give you what you need. Whichever route you pick, both methods can extract the same information accurately.

Step 4: Implement the scraper

In this section, you'll see how to collect the three types of data listed above using both the API interception and browser automation methods. To run these scripts, make sure you have Python and Playwright installed on your computer.

Individual stock pages

To collect individual stock data using the API:

  1. Navigate to the URL with the corresponding ticker:

https://www.nasdaq.com/market-activity/stocks/{ticker}

  2. Open your browser's developer tools.
  3. Switch to the Network tab.
  4. Filter requests by Fetch/XHR.
  5. Find the "info?assetclass=stocks" request (there may be a few of them, depending on how long you had the page open).
  6. Check the Response section to find the underlying JSON data. It contains a variety of helpful information, such as the last sale price, net change, bid and ask prices, trading volume, and more.
  7. Create and run a Python script with the following code:

from playwright.sync_api import sync_playwright

URL = "https://www.nasdaq.com/market-activity/stocks/nvda"

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    # Track last price
    state = {"last_price": None}

    def handle_response(response):
        if "info?assetclass=stocks" in response.url:
            try:
                data = response.json()
                ask_price = data.get("data", {}).get("primaryData", {}).get("askPrice")

                # Skip if no price
                if ask_price is None:
                    return

                # First observed price
                if state["last_price"] is None:
                    print(f"[INIT] Ask price: {ask_price}")
                    state["last_price"] = ask_price
                    return

                previous = state["last_price"]
                if ask_price > previous:
                    print(f"[RISE] Ask price: {ask_price}")
                elif ask_price < previous:
                    print(f"[DROP] Ask price: {ask_price}")
                else:
                    print(f"[UNCHANGED] Ask price: {ask_price}")

                state["last_price"] = ask_price
            except:
                pass

    page.on("response", handle_response)
    page.goto(URL)
    page.wait_for_timeout(60000)

You'll see real-time data printed in your terminal showing the changes in the ask price. The script watches the requests for 1 minute, but you can increase the timeout if needed. If you want to test the code with a different stock or data point, simply replace "nvda" in the URL and swap the "askPrice" key for another field available in the JSON.
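For instance, inside handle_response you could read the last sale price instead of the ask price – the lastSalePrice field below sits in the same primaryData object that the n8n example later in this guide relies on:

# Inside handle_response, after data = response.json(), pull a different field from the same payload
last_sale = data.get("data", {}).get("primaryData", {}).get("lastSalePrice")
if last_sale is not None:
    print(f"Last sale price: {last_sale}")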

To perform the same job without using the API:

  1. Navigate to the target page.
  2. Open your browser's developer tools.
  3. Locate the ask price HTML element ("header-info-ask-info").
  4. Write and run the following code:
from playwright.sync_api import sync_playwright
import time

URL = "https://www.nasdaq.com/market-activity/stocks/nvda"

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    state = {"last_price": None}

    def get_ask_price():
        span = page.query_selector("span.header-info-ask-info")
        if not span:
            return None
        text = span.inner_text().strip()
        # Format is "$191.97 X 26" → extract the numeric price
        try:
            price_str = text.split(" ")[0].replace("$", "")
            return float(price_str)
        except Exception:
            return None

    page.goto(URL)

    # Watch indefinitely
    while True:
        ask = get_ask_price()
        if ask is None:
            time.sleep(1)
            continue

        if state["last_price"] is None:
            print(f"[INIT] Ask price: {ask}")
            state["last_price"] = ask
        else:
            previous = state["last_price"]
            if ask > previous:
                print(f"[RISE] Ask price: {ask}")
            elif ask < previous:
                print(f"[DROP] Ask price: {ask}")
            else:
                print(f"[UNCHANGED] Ask price: {ask}")
            state["last_price"] = ask

        # Poll every second
        time.sleep(1)

Since there's no way to watch for changes like with the API, the script uses the time library to check the page every second. This may lead to redundant data, such as the price staying the same and repeating multiple times without any actual update. The script also runs indefinitely, so make sure to stop it in your terminal by pressing CTRL + C, or add logic that ends the script once enough data has been collected (see the sketch below).
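For example, here's a minimal sketch of such a stop condition, assuming you only want a fixed number of readings – it swaps the while True loop in the script above for a counter:

# Collect a fixed number of readings, then exit instead of looping forever
MAX_READINGS = 60  # hypothetical limit – tune to your needs
readings = 0

while readings < MAX_READINGS:
    ask = get_ask_price()
    if ask is not None:
        print(f"Ask price: {ask}")
        readings += 1
    # Poll every second, as before
    time.sleep(1)

print("Collected enough data, stopping.")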

To test this script with another stock, you can replace the ticker in the URL as before. However, if you want to target another data point within the page, you'll have to find where it lies within the HTML manually.

Historical data pages

There are several ways to get historical data, ranging from a simple download button to intricate scraping through pagination.

If you want a stock's historical quotes or the NASDAQ Composite Index (COMP) Historical Data, you don't need to do any scraping at all. Simply navigate to the page, set the desired timeline, and click Download historical data above the table. Scraping, on the other hand, can help if you want to get data for several stocks quickly:

from playwright.sync_api import sync_playwright
import os
import shutil

# User settings
TICKERS = ["AAPL", "MSFT", "NVDA"]
TIMELINE = "m1"  # m1, m6, ytd, y1, y5, y10
BASE_URL = "https://www.nasdaq.com/market-activity/stocks/{ticker}/historical?page=1&rows_per_page=10&timeline={timeline}"
OUTPUT_DIR = "Nasdaq_data"

os.makedirs(OUTPUT_DIR, exist_ok=True)

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(accept_downloads=True)
    page = context.new_page()

    for ticker in TICKERS:
        url = BASE_URL.format(ticker=ticker, timeline=TIMELINE)
        print(f"Fetching {ticker} from {url}")
        page.goto(url)

        # Handle cookie consent if it appears
        try:
            cookie_button = page.wait_for_selector("#onetrust-accept-btn-handler", timeout=5000)
            cookie_button.click()
            print("Accepted cookies")
        except:
            pass

        # Wait for the download button
        page.wait_for_selector("button.historical-download", timeout=15000)
        download_button = page.query_selector("button.historical-download")

        if download_button:
            download_button.scroll_into_view_if_needed()

            # Correct download handling
            with page.expect_download() as download_info:
                download_button.click()
            download = download_info.value

            # Save file
            file_path = download.path()
            if file_path:
                new_filename = f"{ticker}_historical.csv"
                final_path = os.path.join(OUTPUT_DIR, new_filename)
                shutil.move(file_path, final_path)
                print(f"Saved {final_path}")
            else:
                print(f"No download returned for {ticker}")
        else:
            print(f"Download button not found for {ticker}")

    browser.close()

The script above allows you to enter a list of tickers to scrape and the timeline to set for them. It then navigates to every stock page, clicks the download button, and saves the data into one neat folder, properly labeling each CSV file with the ticker symbol.

But what if the page doesn't offer a handy download button? If Nasdaq ever removes it, or you're scraping a stock's historical NOCP (Nasdaq Official Closing Price), you'll need to collect the data yourself.

You can use the same method as before and intercept network requests to get all the data at once. This time, target the "historical-nocp?timeframe=y[x]" request:

from playwright.sync_api import sync_playwright
import csv

URL = "https://www.nasdaq.com/market-activity/stocks/nvda/historical-nocp?page=1&rows_per_page=100&timeline=y1"
OUTPUT_CSV = "nvda_historical.csv"

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    # This will hold the data
    historical_data = []

    # Intercept responses
    def handle_response(response):
        if "historical-nocp" in response.url and "timeframe=y1" in response.url:
            try:
                data = response.json()
                # Extract the table
                table = data.get("data", {}).get("nocp", {}).get("nocpTable", [])
                for row in table:
                    trade_date = row.get("date")
                    price = row.get("price")
                    if trade_date and price:
                        historical_data.append({"Trade Date": trade_date, "Nasdaq Closing Price": price})
            except Exception as e:
                print("Failed to parse response:", e)

    page.on("response", handle_response)

    # Navigate to trigger the request
    page.goto(URL)

    # Wait some time to ensure the request completes
    page.wait_for_timeout(5000)  # 5 seconds; adjust if needed

    # Write to CSV
    if historical_data:
        with open(OUTPUT_CSV, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["Trade Date", "Nasdaq Closing Price"])
            writer.writeheader()
            for row in historical_data:
                writer.writerow(row)
        print(f"Saved {len(historical_data)} rows to {OUTPUT_CSV}")
    else:
        print("No data captured.")

    browser.close()

The script captures all of the data from the JSON response and exports it to a CSV file. Make sure the timeline parameter in the URL matches the amount of data you need. You don't have to handle any pagination or dig through HTML elements, as all the data arrives in a single response for you to capture and save.

Manual scraping is also possible if none of the above methods fit your needs:

from playwright.sync_api import sync_playwright
import csv
import time

TICKER = "nvda"
TIMELINE = "y1"
ROWS_PER_PAGE = 100
OUTPUT_CSV = f"{TICKER}_historical_all_pages.csv"
BASE_URL = f"https://www.nasdaq.com/market-activity/stocks/{TICKER}/historical-nocp?page=1&rows_per_page={ROWS_PER_PAGE}&timeline={TIMELINE}"

historical_data = []

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto(BASE_URL)

    # Accept cookies if popup appears
    try:
        cookie_button = page.wait_for_selector("#onetrust-accept-btn-handler", timeout=5000)
        cookie_button.click()
        print("Accepted cookies")
    except:
        pass

    while True:
        page.wait_for_selector("div.table-row", timeout=10000)

        # Scrape table rows
        rows = page.query_selector_all("div.table-row")
        for row in rows:
            cells = row.query_selector_all("div.table-cell")
            if len(cells) >= 2:
                trade_date = cells[0].inner_text().strip()
                price = cells[1].inner_text().strip().replace("$", "")
                historical_data.append({"Trade Date": trade_date, "Nasdaq Closing Price": price})

        # Check the next button
        next_button = page.query_selector("button.pagination__next")
        if not next_button:
            print("Next button not found, ending.")
            break

        # Scroll into view
        next_button.scroll_into_view_if_needed()

        disabled_attr = next_button.get_attribute("disabled")
        if disabled_attr == "true":
            print("Reached last page.")
            break

        # Click the next button to go to the next page
        next_button.click()
        print("Clicked next page")
        time.sleep(2)  # small wait to allow table to reload

    # Write CSV
    with open(OUTPUT_CSV, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Trade Date", "Nasdaq Closing Price"])
        writer.writeheader()
        for row in historical_data:
            writer.writerow(row)
    print(f"Saved {len(historical_data)} rows to {OUTPUT_CSV}")

    browser.close()

The script navigates to the target page, extracts data from the table, and then clicks the "Next" button to move through subsequent pages, repeating this process until the "Next" button is disabled, indicating there are no more pages.

News and press releases

Stock pages also feature related news articles at the bottom of the page. Scraping these can provide the latest news and valuable insights from industry experts.

So far, the API has proven to be the most effective option for scraping Nasdaq. News articles, however, can only be scraped manually, as no request returns clean JSON data for them.

Here's how you can do it:

from playwright.sync_api import sync_playwright
import csv
import time

URL = "https://www.nasdaq.com/market-activity/stocks/nvda"
OUTPUT_CSV = "nvda_articles_full.csv"

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto(URL)

    # Accept cookies
    try:
        cookie_btn = page.wait_for_selector("#onetrust-accept-btn-handler", timeout=5000)
        cookie_btn.scroll_into_view_if_needed()
        cookie_btn.click()
    except:
        pass

    # Scroll until the section heading is visible
    section_selector = (
        "h3.jupiter22-c-section-heading-title."
        "jupiter22-c-section-heading-title__size-xs."
        "jupiter22-c-section-heading__headline"
    )
    for _ in range(20):
        if page.query_selector(section_selector):
            page.locator(section_selector).scroll_into_view_if_needed()
            break
        page.evaluate("window.scrollBy(0, 500);")
        time.sleep(1)

    # Keep scrolling until all content loads
    last_height = 0
    while True:
        page.evaluate("window.scrollBy(0, 1200);")
        time.sleep(2)
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    # Extract items
    page.wait_for_selector("span.jupiter22-c-article-list__item_title", timeout=10000)
    titles = page.query_selector_all("span.jupiter22-c-article-list__item_title")
    dates = page.query_selector_all("span.jupiter22-c-article-list__item_timeline")
    links = page.query_selector_all("a.jupiter22-c-article-list__item_title_wrapper")

    data = []
    for t, d, l in zip(titles, dates, links):
        relative = l.get_attribute("href")
        full_link = "https://www.nasdaq.com" + relative
        data.append(
            {
                "Title": t.inner_text().strip(),
                "Date": d.inner_text().strip(),
                "Link": full_link,
            }
        )

    # Save CSV
    with open(OUTPUT_CSV, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Title", "Date", "Link"])
        writer.writeheader()
        writer.writerows(data)

    print(f"Saved {len(data)} articles to {OUTPUT_CSV}")
    browser.close()

The script finds the "Latest News" heading, waits for the content to load, and scrapes the article titles, post dates, and direct URLs, then stores them in a CSV file for further reading or AI analysis.

Step 5: Using proxies to avoid blocks

Proxies help you slip past the usual roadblocks that large sites throw at high-frequency scrapers. Nasdaq isn't the strictest gatekeeper on the web, yet it still applies traffic filtering that can slow or completely halt repeated scraping. A proxy layer helps you keep your code running smoothly without setting off alarms.

Implementing Decodo proxies into your Playwright script is simple:

  1. Create an account on the Decodo dashboard.
  2. On the left panel, select Residential proxies.
  3. Choose a subscription, Pay As You Go plan, or claim a 3-day free trial.
  4. In the Proxy setup tab, configure your location and session preferences.
  5. Copy your proxy credentials for integration into your scraping script.
  6. Add a PROXY_URL = "https://username:password@gate.decodo.com:10001" variable after the library imports, replacing the credentials and endpoint with the ones shown in your dashboard.
  7. Add a proxy parameter to the browser launch parameters.

Example:

from playwright.sync_api import sync_playwright

# Substitute your own credentials and endpoint from the Decodo dashboard
PROXY_URL = "https://username:password@gate.decodo.com:10001"

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={
            "server": PROXY_URL
        }
    )

You can apply this small change to any script in the article. It instantly boosts your scraper's reliability, reduces the chance of getting flagged, and keeps the origin of your traffic hidden.

Automated scraping with n8n

If coding isn't your strong suit, n8n is a handy tool to automate workflows that would typically require complex code. Its intuitive workflow builder lets you link nodes, with each node performing a specific task to collect and process data. Here's an example workflow that gets a stock's information:

  1. Prepare a Google Sheets document. List a few tickers and the information you want to extract.

  2. Create a new n8n workflow. Use either the cloud or a locally hosted version of n8n. On the homepage, click Create Workflow.
  3. Add a Schedule Trigger node. Set the Trigger Rules to define how often the scraper should run.
  4. Add a Google Sheets node. Select the Get row(s) in sheet action and connect it to the prepared spreadsheet.
  5. Add a Decodo node. Select the Scrape using Universal target action. Set the URL to https://api.nasdaq.com/api/quote/{{ $json.Ticker }}/info?assetclass=stocks. It will loop through the tickers by appending them to the URL and fetch the data.
  6. Add a Code node. Select the Code in JavaScript action. You'll need this to parse the JSON data into a more readable format. Modify the JSON fields to match the columns in the Google Sheet:

const results = [];

// Loop over all input items
for (const item of $input.all()) {
  const resultsArray = item.json.results || []; // get the results array
  for (const entry of resultsArray) {
    const contentString = entry.content; // stringified JSON
    const parsedContent = JSON.parse(contentString);
    const data = parsedContent.data;
    results.push({
      json: {
        "Ticker": data.symbol,
        "Sale price": data.primaryData.lastSalePrice,
        "Net change": data.primaryData.netChange,
        "Percentage change": data.primaryData.percentageChange,
        "Bid price": data.primaryData.bidPrice,
        "Ask price": data.primaryData.askPrice,
      }
    });
  }
}
return results;

  7. Add another Google Sheets node. This time, choose the Update row in sheet action. Set the Mapping Column Mode to Map Automatically and the Column to match on to Ticker.
  8. Activate the workflow. Save the workflow and toggle it to Active. It will now trigger at the set interval (every minute in this example) and update the Google Sheet with the most recent stock data. You can check whether it runs at your n8n instance's /home/executions page.

You can download the JSON file to get started right away.

Best practices for scraping Nasdaq data

Make the most out of scraping Nasdaq with these valuable tips:

  • Use rate limiting and polite scraping to avoid detection. Don't hammer the site with nonstop requests; space them out using delays or concurrency limits (see the sketch after this list). This keeps your scraper under the radar and prevents temporary IP bans.
  • Respect robots.txt and the website's terms of service. Nasdaq's robots.txt specifies which areas are crawlable and which are off-limits. Always check it before scraping to avoid hitting restricted paths like API endpoints or real-time quote pages.
  • Store accurate timestamps for all data points. Nasdaq data updates constantly, where every tick matters. Tag each data point with a precise UTC timestamp so you can analyze historical trends and synchronize with other market feeds.
  • Regularly validate scraped data against trusted sources. Cross-check your scraped results with Nasdaq's Data Link, or another verified data provider. This ensures your scraper isn't missing fields or collecting stale values after a layout change.
  • Monitor for changes in page structure or API endpoints. Nasdaq often refreshes its frontend, renames classes, or relocates data attributes. Build automatic tests or schema checks so you're alerted the moment your selectors break.
  • Use rotating proxies and random user agents for reliability. Nasdaq enforces strict rate limits and can block repeated IPs or header patterns. Rotate proxies, shuffle user agents, and use session persistence when needed to maintain a consistent flow of requests.
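As a rough sketch of the rate-limiting tip above, here's what polite polling could look like with randomized delays. It assumes you're calling the same quote endpoint used in the n8n example; adjust the URLs, headers, and delay range to your own setup:

import random
import time

import requests

# Quote endpoints to poll politely (same URL pattern as the n8n workflow above)
urls = [
    "https://api.nasdaq.com/api/quote/AAPL/info?assetclass=stocks",
    "https://api.nasdaq.com/api/quote/MSFT/info?assetclass=stocks",
]

for url in urls:
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    print(url, response.status_code)
    # Sleep a random 2-5 seconds between requests instead of hammering the server
    time.sleep(random.uniform(2, 5))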

Troubleshooting common issues

Even well-built Nasdaq scrapers can stumble from time to time. Here are a few of the usual suspects that might disrupt your work:

  • Missing or incomplete data. Nasdaq pages often load dynamically, so ensure you wait for JavaScript-rendered elements before scraping. If data fields suddenly disappear, inspect the DOM for updated class names or lazy-loaded sections. Also keep in mind a non-code detail: US markets are closed during standard exchange holidays and overnight hours, so data updates pause until regular trading resumes in the next session.
  • CAPTCHA challenges or IP blocks. Frequent requests or repetitive patterns can trigger Nasdaq's bot protection. Rotate proxies, add randomized delays, and simulate realistic mouse or scroll behavior to reduce the odds of hitting a CAPTCHA. If that sounds like too much work, use a reliable scraping API that can bypass CAPTCHAs and change your IP the moment it's blocked.
  • Changes in page structure. Nasdaq periodically updates its HTML layout and data containers. Keep your selectors modular by storing them in one config file, so you can quickly adjust them when the structure shifts. Make sure they're flexible and not overly reliant on frequently changing elements like IDs, class names, or very specific XPaths.
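For that last point, here's a minimal sketch of keeping selectors in one place – the entries below are the selectors already used in this guide, gathered into a single module you can import from every script:

# nasdaq_selectors.py – one file to update when Nasdaq's layout changes
SELECTORS = {
    "cookie_accept": "#onetrust-accept-btn-handler",
    "ask_price": "span.header-info-ask-info",
    "history_download": "button.historical-download",
    "history_row": "div.table-row",
    "history_cell": "div.table-cell",
    "next_page": "button.pagination__next",
}

# Usage in a scraper:
#   from nasdaq_selectors import SELECTORS
#   page.query_selector(SELECTORS["ask_price"])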

Nasdaq data on a bigger scale

Once you move beyond small-scale experiments, it's worth considering enterprise-grade data solutions. Providers like Nasdaq Data Link or other APIs offer reliable, structured feeds with guaranteed uptime – ideal when you need consistency and speed over DIY scraping. They also handle updates and compliance, freeing you from the maintenance treadmill.

Running your scraper in the cloud also makes scaling and automation much easier. Platforms like AWS Lambda, Google Cloud Run, or Azure Functions let you schedule scraping, handle retries, and store results without managing servers. Combined with containerization tools like Docker, your Nasdaq scraper can scale horizontally and run 24/7 without manual intervention.

Once you've gathered enough clean data, the next step is making it useful. Integrate it into business dashboards or internal research tools to visualize trends for real-time decision-making. With a well-thought-out setup, your scraped Nasdaq data becomes more than numbers – it turns into practical input for analysis, growth, and strategy.

Alternative tools and data sources

If you want broader market insights beyond Nasdaq, platforms like Yahoo Finance, TradingView, or MarketWatch can provide trend data, sector movements, and stock performance analytics. These services track price action, trading volume, and market sentiment, giving you additional angles to spot opportunities or shifts in the market.

For programmatic access, several APIs deliver stock and market trend data. Alpha Vantage, IEX Cloud, and Finnhub provide historical prices, intraday updates, and technical indicators that can be integrated into dashboards, models, or automated strategies alongside your Nasdaq data.
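For example, here's a rough sketch of pulling daily prices from Alpha Vantage. The query parameters follow their documented daily time series endpoint, but the API key is a placeholder and the response keys may change – verify against the current docs:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder – get a free key from Alpha Vantage

params = {
    "function": "TIME_SERIES_DAILY",
    "symbol": "NVDA",
    "apikey": API_KEY,
}

response = requests.get("https://www.alphavantage.co/query", params=params, timeout=30)
data = response.json()

# The daily series is keyed by date; print the latest few closing prices
series = data.get("Time Series (Daily)", {})
for date, values in list(series.items())[:5]:
    print(date, values.get("4. close"))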

Final thoughts

Scraping Nasdaq opens the door to valuable market insights, but you need solid tools, innovative techniques, and dependable proxies to do it well. Whether you're writing your own scraper or pulling data from an API, aim for accuracy, scalability, and low visibility. With a clean, structured workflow, your Nasdaq data turns into practical intelligence that can fuel better analysis, dashboards, and trading decisions.


Stop getting blocked, start getting data

Scrape Nasdaq with Decodo's residential proxies that keep you under the radar and your data flowing uninterrupted.

About the author

Zilvinas Tamulis

Technical Copywriter

A technical writer with over 4 years of experience, Žilvinas blends his studies in Multimedia & Computer Design with practical expertise in creating user manuals, guides, and technical documentation. His work includes developing web projects used by hundreds daily, drawing from hands-on experience with JavaScript, PHP, and Python.


Connect with Žilvinas via LinkedIn

All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.


Frequently asked questions

Can I use scraped data for trading?

Using scraped financial data for trading decisions requires you to consider whether the data is fit for your purpose. While scraped data can support a trading strategy, it may not match the accuracy or timeliness of official feeds. For high-frequency or real-time trading, licensed APIs generally offer more reliable data.

How often should I scrape Nasdaq data?

The right scraping frequency depends entirely on your goals and specific use cases. For real-time monitoring, scrape every few seconds using proxies to avoid detection. For trend analysis or portfolio tracking, hourly or daily scraping is often sufficient. Always implement rate limiting when you scrape Nasdaq to prevent overwhelming the server.

What are the best tools for scraping Nasdaq?

Python remains a practical choice for scraping stock market data due to its simplicity. To scrape Nasdaq, Python tools like Requests, Beautiful Soup, and Pandas work perfectly for basic extraction. Use Selenium or Playwright to load pages fully and allow JavaScript-rendered data to populate. Consider Decodo's Web Scraping API with automatic proxy rotation for reliable, large-scale scraping.

What are the risks of scraping financial data?

Nasdaq and similar sites often implement anti-bot protections that can block your IP address. Most stock market trackers delay data by at least 15 minutes or throttle during volatility. Overloading a website's server can lead to many issues, so enforce proper rate limits. Use rotating residential proxies and enterprise solutions to minimize risks when you scrape Nasdaq.
