
How to Scrape ZoomInfo: A Complete Step-by-Step Guide

ZoomInfo is a goldmine for B2B teams – over 100M company profiles and 260M contacts, all in one place. But getting that data isn’t easy. With strict defenses like CAPTCHAs, browser fingerprinting, and aggressive IP bans, most scrapers fail after just a few requests. That’s where this guide comes in. We’ll show you how to bypass ZoomInfo’s countermeasures in 2025 and extract clean, actionable data at scale.

Justinas Tamasevicius

Jun 10, 2025

10 min read

What data can you extract from ZoomInfo?

ZoomInfo delivers rich business intelligence across several key categories:

  • Company intelligence (firmographics). Company name, headquarters location, website, SIC/NAICS codes, revenue, employee count, and parent-subsidiary structure.
  • Contact information. Professional profiles, job titles, departments, seniority, verified emails, direct phone numbers, and LinkedIn URLs.
  • Technology & operations (technographics). Insights into the company’s tech stack, cloud providers, and even org charts that map team structures and reporting lines.
  • Business insights. Real-time updates, intent signals, executive moves, funding rounds, and data confidence scores to help you filter for quality.

ZoomInfo data architecture

Before writing any scraping logic, it’s important to understand how ZoomInfo structures its pages. Let’s take Anthropic’s company profile as an example:

Most developers jump straight into parsing HTML when scraping, but ZoomInfo hides its best data elsewhere. Instead of cluttered DOM elements, it embeds a clean JSON object right inside a <script> tag.

Open DevTools, go to Network, filter by "Doc", and click the first entry to view the HTML. If you need a quick refresher on finding hidden data in your browser, check out How to Inspect Element.

This JSON blob holds far more than what’s visible on the screen – from org charts and funding history to detailed contact info and company intent signals. Scraping this structured data directly is not only faster, it’s more reliable than chasing fragile DOM selectors.
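To make this concrete, here's a minimal sketch of pulling that JSON out of the page HTML using the <script id="ng-state"> tag covered later in this guide – note that a bare request like this will usually get blocked fast; we'll harden it with proxies, headers, and retries below:

import json
import requests
from bs4 import BeautifulSoup

# Minimal illustration - without the proxies and realistic headers covered below,
# a plain request like this will usually be blocked after a few tries.
url = "https://www.zoominfo.com/c/anthropic-pbc/546195556"
html = requests.get(url, timeout=15).text

soup = BeautifulSoup(html, "html.parser")
script_tag = soup.find("script", {"id": "ng-state", "type": "application/json"})
if script_tag:
    page_data = json.loads(script_tag.string).get("pageData")
    print(page_data["name"], page_data["numberOfEmployees"])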


Challenges of scraping ZoomInfo

Scraping ZoomInfo requires a clear understanding of how aggressive their anti-bot defenses are. Some sites are intentionally built to be hard to scrape – ZoomInfo is a prime example. You can learn more about these tactics in our article on navigating anti-bot systems.

Here are the key challenges you’ll face:

  • Aggressive IP bans. ZoomInfo monitors request frequency closely. Too many requests in a short period can trigger a 429 Too Many Requests response, followed by a 403 Forbidden error and a temporary or permanent IP ban. For more on handling these, see our proxy error codes guide and, if you do get banned, follow the steps in how to fix an IP ban.
  • CAPTCHAs and behavioral traps. After a few requests from the same IP, ZoomInfo will present a CAPTCHA (e.g., a "Press & Hold" slider puzzle) designed to block automated scripts.
  • Advanced browser fingerprinting. ZoomInfo analyzes headers, JavaScript execution, Canvas/WebGL fingerprints, and other signals to distinguish real users from bots. A basic Requests script or a vanilla headless browser will usually be flagged almost immediately. To dive deeper into fingerprinting techniques, read What Is Browser Fingerprinting.

These defenses mean a "simple scraper" will fail outright. To succeed, you’ll need to upgrade your toolkit.
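For example, here's a minimal sketch of backing off when ZoomInfo returns 429 or 403 responses – the full scrapers later in this guide use the tenacity library for the same purpose:

import time
import random
from typing import Optional
import requests

def fetch_with_backoff(url: str, max_retries: int = 5) -> Optional[requests.Response]:
    """Retry on 429/403 responses with exponential backoff and jitter."""
    for attempt in range(max_retries):
        response = requests.get(url, timeout=15)
        if response.status_code == 200:
            return response
        if response.status_code in (429, 403):
            # Back off exponentially, ideally switching to a fresh proxy IP here
            delay = 2 ** attempt + random.uniform(0, 3)
            print(f"Got {response.status_code}, retrying in {delay:.1f}s...")
            time.sleep(delay)
        else:
            response.raise_for_status()
    return None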


Handling anti-bot protection

Now that you know what you’re up against – ZoomInfo relies on an anti-bot platform, PerimeterX, to detect bots – let’s walk through practical methods to bypass its defenses. To scrape ZoomInfo successfully, your scraper must behave like a real user: mimic browser behavior, rotate IP addresses, and handle CAPTCHAs and fingerprinting.


Stealth (fortified) headless browsers

You’ll need a stealth browser – essentially a custom headless setup that hides automation signals and mimics real-user behavior. Common open-source options include undetected-chromedriver and SeleniumBase (Python), or puppeteer-extra with its stealth plugin (Node.js).

These tools patch navigator.webdriver, spoof Canvas/WebGL fingerprints, and eliminate the most obvious headless clues.
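As a quick illustration, here's a minimal sketch using undetected-chromedriver, one popular Python option – treat it as a starting point rather than a guaranteed bypass, and check the package's documentation for current options:

# pip install undetected-chromedriver
import undetected_chromedriver as uc

options = uc.ChromeOptions()
options.add_argument("--headless=new")  # headless mode is still flagged more often than a visible window

driver = uc.Chrome(options=options)
try:
    driver.get("https://www.zoominfo.com/c/anthropic-pbc/546195556")
    html = driver.page_source  # hand this off to the JSON-extraction logic shown later
finally:
    driver.quit()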

Note: Open-source stealth plugins are powerful, but they lag behind ever-evolving systems like PerimeterX. Also, running headless browsers at scale consumes significant CPU, RAM, and bandwidth.


CAPTCHA-solving services

When ZoomInfo presents a CAPTCHA, you’ll need a solver. The most popular options include 2Captcha and Anti-Captcha. They use human solvers or advanced AI models to bypass challenges automatically. Integration is straightforward, but these services add latency and increase your cost per request.
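For reference, here's what integration typically looks like with 2Captcha's official Python client. This shows a generic reCAPTCHA task – ZoomInfo's "Press & Hold" (PerimeterX) challenge requires a different, provider-specific task type, so check your solver's documentation for the exact method:

# pip install 2captcha-python
from twocaptcha import TwoCaptcha

solver = TwoCaptcha("YOUR_2CAPTCHA_API_KEY")  # placeholder API key

# Generic example: solve a reCAPTCHA v2 given its sitekey and page URL
result = solver.recaptcha(
    sitekey="TARGET_SITEKEY",  # placeholder - read the real sitekey from the page
    url="https://example.com/protected-page",
)
print(result["code"])  # token to submit back with your request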


Rotating residential proxies

The most critical step: don’t send all requests from a single IP. ZoomInfo actively monitors IP behavior, and repeated access from one address is a quick path to a 403 Forbidden response.

Why residential proxies? Residential IPs route traffic through real consumer devices, making them far harder for ZoomInfo’s bot detection systems to flag compared to datacenter IPs.

Rotation is key. Use a proxy pool that assigns a fresh IP on each request to stay under ZoomInfo’s radar.

Our residential proxy pool provides access to over 115 million ethically sourced IPs across 195+ locations, complete with automatic rotation and geo-targeting capabilities. You’ll see consistently high success rates – even when targeting heavily protected sites like ZoomInfo.
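As a quick sanity check, here's a minimal sketch that sends a few requests through a rotating gateway (the same Decodo endpoint used in the code below) and prints the exit IP each time – api.ipify.org is just a public IP-echo service used for illustration:

import requests

# Placeholder credentials - replace with your proxy provider's username and password
proxy_url = "http://PROXY_USERNAME:PROXY_PASSWORD@gate.decodo.com:7000"
proxies = {"http": proxy_url, "https": proxy_url}

# With rotation enabled, each request should exit from a different residential IP
for _ in range(3):
    print(requests.get("https://api.ipify.org", proxies=proxies, timeout=15).text)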

Step-by-step ZoomInfo scraping implementation

Now, let’s build the ZoomInfo scraper step by step. We’ll route all requests through a rotating residential proxy pool to avoid ZoomInfo’s anti-bot defenses.


1. Environment setup

First, create a virtual environment and install the necessary packages:

# Create a virtual environment
python -m venv zoominfo-scraper
# Activate the environment
# Windows (CMD)
zoominfo-scraper\Scripts\activate
# Windows (PowerShell)
.\zoominfo-scraper\Scripts\Activate.ps1
# macOS/Linux
source zoominfo-scraper/bin/activate
# Install dependencies
pip install requests beautifulsoup4 urllib3

Here’s what each component does:

  1. Requests – Fetches the webpage HTML
  2. BeautifulSoup – Parses the HTML to extract data
  3. urllib3 – Handles proxy-related security warnings

👉 For a deeper dive, explore our hands-on guides to mastering Python Requests and web scraping with BeautifulSoup.


2. Basic company profile scraper

Let’s build a simple scraper that:

  • Fetches the HTML. Makes a request to the ZoomInfo URL.
  • Extracts the JSON data. Finds a hidden <script id="ng-state"> tag containing all profile data.
  • Saves the output. Dumps the JSON to page_data.json.

Here's a scraper for individual company pages:

import json
from typing import Optional, Any
from urllib3.exceptions import InsecureRequestWarning
import requests
from bs4 import BeautifulSoup
import urllib3

# Disable insecure request warnings for proxies
urllib3.disable_warnings(InsecureRequestWarning)


class ZoomInfoScraper:
    """A web scraper for extracting company data from ZoomInfo profiles."""

    def __init__(self, url: str) -> None:
        """Initialize the scraper with target URL and default headers."""
        self.url = url
        self.headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Referer": url,
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
        }
        self.proxies = self._setup_proxies()

    def _setup_proxies(self) -> Optional[dict[str, str]]:
        """Configure proxies using credentials from your proxy provider."""
        username = "PROXY_USERNAME"
        password = "PROXY_PASSWORD"
        proxy_host = "gate.decodo.com:7000"
        if not username or not password:
            print("Proxy credentials not found. Running without proxies.")
            return None
        proxy_url = f"http://{username}:{password}@{proxy_host}"
        return {"http": proxy_url, "https": proxy_url}

    def fetch_html(self) -> str:
        """Fetch HTML content from the target URL."""
        try:
            response = requests.get(
                self.url,
                headers=self.headers,
                proxies=self.proxies,
                verify=False,
                timeout=15,
            )
            response.raise_for_status()
            return response.text
        except requests.RequestException as e:
            raise Exception(f"Request failed: {e}")

    def extract_page_data(self, html_content: str) -> Optional[dict[str, Any]]:
        """Extract JSON data from the page's script tag."""
        try:
            soup = BeautifulSoup(html_content, "html.parser")
            script_tag = soup.find(
                "script", {"id": "ng-state", "type": "application/json"}
            )
            if not script_tag:
                raise ValueError("Data script tag not found")
            return json.loads(script_tag.string).get("pageData")
        except (json.JSONDecodeError, AttributeError) as e:
            raise ValueError(f"Data extraction failed: {e}")

    def run(self) -> Optional[dict[str, Any]]:
        """Execute the scraping workflow."""
        print(f"Scraping {self.url.split('/')[-1]}...")
        try:
            html = self.fetch_html()
            if page_data := self.extract_page_data(html):
                with open("page_data.json", "w") as f:
                    json.dump(page_data, f, indent=2)
                print("Success - Data saved to page_data.json")
                return page_data
            print("No page data found")
        except Exception as e:
            print(f"Error: {e}")
        return None


if __name__ == "__main__":
    # Example usage (replace with your target URL)
    target_url = "https://www.zoominfo.com/c/anthropic-pbc/546195556"
    scraper = ZoomInfoScraper(url=target_url)
    scraper.run()

Running this script will create a page_data.json file. You’ll see something like this:

{
  "companyId": "546195556",
  "name": "Anthropic",
  "url": "https://www.anthropic.com",
  "numberOfEmployees": "1035",
  "address": {
    "street": "548 Market St Pmb 90375",
    "city": "San Francisco",
    "state": "California",
    "country": "United States",
    "zip": "94104"
  },
  "fundings": {
    "data": [
      {
        "date": "May 19, 2025",
        "amountValue": "2500000",
        "round": "Debt",
        "investorsLabel": ["RBC", "Citi", "MUFG", "GOLDMAN SACHS GROUP"]
      }
    ],
    "totals": {
      "totalAmount": "$17.4B",
      "lastFundingAmount": "$2.5B",
      "numOfRounds": 12
    }
  },
  "competitors": [
    {
      "id": "414124033",
      "name": "OpenAI",
      "numberOfEmployees": 3200,
      "revenue": "2000000"
    }
  ]
}

That JSON comes with everything you need for:

  • Market analysis
  • Lead generation
  • Competitive research
  • CRM enrichment
  • Custom dashboards and reports

Scaling ZoomInfo data collection

Now that we have a scraper that works for individual company pages, let’s scale up and extract data from thousands of companies.


Method 1 – Scraping search results with pagination

ZoomInfo’s company search is a great starting point. You can apply filters like industry and location to narrow down results. For example, software companies in Germany:

To collect company profiles at scale, we’ll need to handle pagination. ZoomInfo paginates its search results using a "?pageNum=" query parameter. However, you’ll only get access to the first 5 pages before hitting a login wall.


Here’s our approach:

  1. Loop through pages 1 to 5.
  2. On each page, extract all company profile URLs using BeautifulSoup. The scraper uses the selector a.company-name.link[href] to find profile links and converts relative URLs into absolute ones using urljoin.
  3. Send each URL to the ZoomInfoScraper to extract and save JSON data from the HTML.
  4. Throttle requests and rotate headers to stay under ZoomInfo’s radar.

First, install the extra required dependencies:

pip install tenacity fake-useragent

  • tenacity – Adds retry logic with exponential backoff to handle occasional request failures. Learn more about retrying failed requests.
  • fake-useragent – Generates random User-Agent strings to simulate real browser behavior and avoid detection.

Here’s the full Python code:

import os
import time
import json
import random
import logging
from urllib.parse import urljoin, urlparse
from typing import Optional, Dict, Any, List, Set
import requests
from bs4 import BeautifulSoup
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)
from fake_useragent import UserAgent
import urllib3

# Disable SSL warnings (necessary for some proxy configurations)
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Constants
MAX_PAGES: int = 5
SEARCH_BASE_URL: str = (
    "https://www.zoominfo.com/companies-search/"
    "location-germany-industry-software"
)
OUTPUT_DIR: str = "zoominfo_companies"
BASE_DELAY: float = random.uniform(3, 8)
MIN_DELAY_FACTOR: float = 0.8
MAX_DELAY_FACTOR: float = 1.2

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger: logging.Logger = logging.getLogger(__name__)

# Initialize user agent rotator
ua: UserAgent = UserAgent()


class WebScraper:
    """Base class with retry logic, rotating User-Agent, and proxy support."""

    def __init__(self) -> None:
        """Initialize the scraper with session and proxy configuration."""
        self.session: requests.Session = requests.Session()
        self.proxies: Optional[Dict[str, str]] = self._setup_proxies()

    def _setup_proxies(self) -> Optional[Dict[str, str]]:
        """Configure proxy credentials from your proxy provider."""
        proxy_user = "PROXY_USERNAME"
        proxy_pass = "PROXY_PASSWORD"
        proxy_host = "gate.decodo.com:7000"
        if not all([proxy_user, proxy_pass]):
            logger.warning("Proxy credentials not found. Running without proxies.")
            return None
        proxy_url: str = f"http://{proxy_user}:{proxy_pass}@{proxy_host}"
        return {
            "http": proxy_url,
            "https": proxy_url,
        }

    @staticmethod
    def _safe_slug(text: str) -> str:
        """Create a filesystem-safe slug from text."""
        if not text:
            return str(int(time.time()))
        ascii_only: str = "".join(
            ch for ch in text if ord(ch) < 128
        ).strip()
        return ascii_only.replace(" ", "_") or str(int(time.time()))

    @retry(
        stop=stop_after_attempt(5),
        wait=wait_exponential(multiplier=1, min=4, max=60),
        retry=retry_if_exception_type(requests.exceptions.RequestException),
    )
    def _request(self, url: str, method: str = "GET", **kwargs) -> requests.Response:
        """Send HTTP request with rotating User-Agent, headers, and proxy support."""
        headers: Dict[str, str] = {
            "User-Agent": ua.random,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.5",
            "Referer": url,
        }
        try:
            response: requests.Response = self.session.request(
                method,
                url,
                headers=headers,
                proxies=self.proxies,
                verify=False,  # Disabled SSL verification for proxy compatibility
                timeout=30,
                **kwargs,
            )
            response.raise_for_status()
            return response
        except requests.RequestException as e:
            logger.error(f"Request failed: {e}")
            raise


class ZoomInfoScraper(WebScraper):
    """Scraper for extracting company data from ZoomInfo pages."""

    def __init__(self, output_dir: str = OUTPUT_DIR) -> None:
        """Initialize the scraper with output directory."""
        super().__init__()
        os.makedirs(output_dir, exist_ok=True)
        self.output_dir: str = output_dir

    @staticmethod
    def _extract_page_data(html_content: str) -> Optional[Dict[str, Any]]:
        """Extract JSON data from the ng-state script tag."""
        soup: BeautifulSoup = BeautifulSoup(html_content, "html.parser")
        script_tag = soup.find("script", {"id": "ng-state", "type": "application/json"})
        if not script_tag:
            return None
        try:
            return json.loads(script_tag.string).get("pageData")
        except json.JSONDecodeError as e:
            logger.error(f"Failed to parse JSON: {e}")
            return None

    def scrape_company(self, url: str) -> Optional[Dict[str, Any]]:
        """Scrape a single company page and save data to a JSON file."""
        segments: List[str] = [seg for seg in urlparse(url).path.split("/") if seg]
        if len(segments) < 2:
            logger.warning(f"Invalid URL format: {url}")
            return None
        company_slug: str = segments[1]
        safe_slug: str = self._safe_slug(company_slug)
        output_path: str = os.path.join(self.output_dir, f"{safe_slug}.json")
        if os.path.exists(output_path):
            logger.debug(f"Skipping existing file: {output_path}")
            return None
        try:
            resp: requests.Response = self._request(url)
            page_data: Optional[Dict[str, Any]] = self._extract_page_data(resp.text)
            if not page_data:
                logger.warning(f"No JSON data found for {url}")
                return None
            with open(output_path, "w", encoding="utf-8") as f:
                json.dump(page_data, f, indent=2, ensure_ascii=False)
            logger.info(f"Saved: {output_path}")
            return page_data
        except Exception as e:
            logger.error(f"Failed to process {url}: {e}")
            return None


class SearchScraper(WebScraper):
    """Scraper for extracting company URLs from ZoomInfo search results."""

    COMPANY_LINK_SELECTOR: str = "a.company-name.link[href]"

    def scrape_search_page(self, page_num: int) -> List[str]:
        """Scrape a search results page and return company URLs."""
        page_url: str = f"{SEARCH_BASE_URL}?pageNum={page_num}"
        logger.info(f"Processing search page {page_num}: {page_url}")
        try:
            resp: requests.Response = self._request(page_url)
            soup: BeautifulSoup = BeautifulSoup(resp.text, "html.parser")
            anchors = soup.select(self.COMPANY_LINK_SELECTOR)
            return list({
                urljoin("https://www.zoominfo.com", a["href"])
                for a in anchors if a.get("href")
            })
        except Exception as e:
            logger.error(f"Failed to fetch search page {page_num}: {e}")
            return []


def main() -> None:
    """Main execution function for the scraper."""
    scraper: ZoomInfoScraper = ZoomInfoScraper()
    search_scraper: SearchScraper = SearchScraper()
    existing_slugs: Set[str] = {
        f.split(".")[0]
        for f in os.listdir(OUTPUT_DIR)
        if f.endswith(".json")
    }
    logger.info(f"Found {len(existing_slugs)} existing company files")
    for page_num in range(1, MAX_PAGES + 1):
        company_urls: List[str] = search_scraper.scrape_search_page(page_num)
        for url in company_urls:
            segments: List[str] = [seg for seg in urlparse(url).path.split("/") if seg]
            if len(segments) < 2:
                continue
            slug: str = segments[1]
            safe_slug: str = scraper._safe_slug(slug)
            if safe_slug not in existing_slugs:
                scraper.scrape_company(url)
                time.sleep(BASE_DELAY * random.uniform(MIN_DELAY_FACTOR, MAX_DELAY_FACTOR))
        time.sleep(BASE_DELAY * 2)  # Longer pause between pages


if __name__ == "__main__":
    main()

Method 2 – Recursive crawling via competitor links

One powerful way to scale ZoomInfo scraping is by using competitor relationships to find more companies. Each company profile includes a competitors section, which you can use to dynamically expand your dataset.

Here’s a sample from the competitors array in the extracted JSON:

[
  {
    "id": "546195556",
    "name": "Anthropic",
    "fullName": "Anthropic PBC",
    "url": "anthropic.com",
    "revenue": "217404",
    "numberOfEmployees": 1035,
    "isPublic": "Private",
    ...
  }
]

Using these fields, you can construct full company profile URLs like:

https://www.zoominfo.com/c/anthropic-pbc/546195556

Here’s how to turn a few seed companies into a large-scale dataset:

  • Start with a seed company (e.g., Anthropic).
  • Scrape its profile and extract all competitor info.
  • For each competitor, build their URL and repeat the process.
  • Continue crawling until you hit a max page limit or depth.

This strategy is called recursive crawling. For a deeper dive on how crawling differs from traditional scraping, read Web Crawling vs. Web Scraping.


Here’s a visual overview:

Seed Company → Competitors → Competitors of Competitors → ...
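Here's a minimal sketch of that loop, reusing the ZoomInfoScraper class from Method 1 (assume it lives in the same file). The slug-building logic is an assumption based on the profile URLs shown above (lowercased fullName with hyphens), so verify it against real competitor entries before running at scale:

import time
import random
from collections import deque
from typing import Dict, Any

def build_profile_url(competitor: Dict[str, Any]) -> str:
    """Build a profile URL from a competitor entry.
    Assumes the slug is the hyphenated, lowercased fullName - verify this first."""
    slug = competitor.get("fullName", competitor["name"]).lower().replace(".", "").replace(" ", "-")
    return f"https://www.zoominfo.com/c/{slug}/{competitor['id']}"

def recursive_crawl(seed_url: str, max_companies: int = 100, max_depth: int = 3) -> None:
    """Breadth-first crawl of ZoomInfo profiles via competitor links."""
    scraper = ZoomInfoScraper()  # the Method 1 class defined above
    queue = deque([(seed_url, 0)])
    visited = set()
    while queue and len(visited) < max_companies:
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        page_data = scraper.scrape_company(url)
        if not page_data:
            continue
        for competitor in page_data.get("competitors", []):
            if competitor.get("id"):
                queue.append((build_profile_url(competitor), depth + 1))
        time.sleep(random.uniform(3, 8))  # throttle between profiles

recursive_crawl("https://www.zoominfo.com/c/anthropic-pbc/546195556")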

Bonus – people data is also hidden in the HTML

Company pages aren’t the only target. ZoomInfo’s People Search results also embed rich, structured data in the same <script id="ng-state"> JSON and can be extracted using the same logic.

This includes job titles, tenure, verified contact information, social links, organizational chart relationships, and more.


The easier alternative – using scraper APIs

Building your own scraper is powerful, but maintaining it at scale is a different story. You’ll need to juggle proxies, CAPTCHAs, fingerprinting, and breakage from even minor page-structure changes.

For reliable, low-maintenance ZoomInfo scraping, a Web Scraping API is the smarter choice.

Decodo’s Web Scraping API handles everything for you – proxy rotation, CAPTCHA-bypassing, JavaScript rendering, and fingerprint evasion – so you can focus on data, not infrastructure.

With Decodo, you send one POST request with your target URL, and the platform delivers the raw HTML or structured data. It’s that simple.


Key features:

  • Automatic proxy rotation. 125M+ IPs to dodge bans and rotate locations.
  • CAPTCHA-bypassing. No need to integrate third-party solvers.
  • JavaScript rendering. Pages are rendered in a headless browser, so you get the final DOM.
  • Pay-per-success. Only pay for successful requests.
  • No infrastructure to manage. No proxies, browser farms, retries, or CAPTCHAs to deal with.
  • Geo-targeting. Choose any country, state, or even city for precise results.

Every new user can claim a 7-day free trial, so you can test ZoomInfo scraping before committing.


Getting started with Decodo

Setting up takes just minutes:

  • Create an account at dashboard.decodo.com.
  • Select a plan under the Scraping APIs section – Core or Advanced.
  • Start your trial – all plans come with a 7-day free trial.
  • In the Scraper tab, select Web Scraper as the target.
  • Paste your ZoomInfo company URLs.
  • (Optional) Configure API parameters like JS rendering, headers, or geolocation.
  • Hit Send Request, and you’ll get the HTML in seconds.

Here’s what the Decodo dashboard looks like when using the Web Scraping API:

View your raw HTML response in the Response tab. It’s that easy!


If you prefer coding, here’s how to use the API:

import requests

url = "https://scraper-api.decodo.com/v2/scrape"
payload = {"url": "https://www.zoominfo.com/c/anthropic-pbc/546195556"}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "Basic DECODO_AUTH_TOKEN",
}

response = requests.post(url, json=payload, headers=headers)

with open("zoominfo_data.html", "wb") as file:
    file.write(response.content)

What’s happening:

  • Define the scraping endpoint.
  • Add the target URL to the payload.
  • Set your headers with your API token.
  • Send the request and save the HTML response.

Don’t forget to replace DECODO_AUTH_TOKEN with your actual token from the Decodo dashboard.


Optional – AI Parser

If you'd rather skip HTML parsing and selector implementation, try Decodo's AI Parser – extract structured data from any website without writing a single line of code.

How to use the AI parser:

Step 1 – Input the source URL

https://www.zoominfo.com/c/anthropic-pbc/546195556

Step 2 – Enter your AI prompt

Extract structured company info from this ZoomInfo page. Include: name, website, LinkedIn URL, industry, company type, employee size, revenue range, full address, phone number, key people (name, title, LinkedIn), technologies used, and competitors.

Step 3 – Receive parsed data

{
  "name": "Anthropic",
  "website": "www.anthropic.com",
  "industry": "Engineering Software",
  "company_type": "Private",
  "employee_size": "1,035 Employees",
  "revenue_range": "$217.4 Million",
  "full_address": "548 Market St PMB 90375, San Francisco, CA 94104, USA",
  "phone_number": "(415) 555-1234",
  "linkedin_url": "http://www.linkedin.com/company/anthropicresearch",
  "key_people": [
    {
      "name": "Krishna Rao",
      "title": "Chief Financial Officer",
      "linkedin": "/p/Krishna-Rao/1456880811"
    },
    {
      "name": "Genna Jones",
      "title": "Controller",
      "linkedin": "/p/Genna-Jones/1995882139"
    }
  ],
  "technologies_used": [
    "PHP",
    "Google Cloud DNS",
    "Insider",
    "Lynda Business"
  ],
  "competitors": [
    "OpenAI",
    "SandStar",
    "Pinecone Systems",
    "Skymind",
    "MarketingKr"
  ]
}

That's it – no coding required. Just copy, paste, and get structured results in seconds.

For advanced use cases and code samples, see the web scraping API documentation. To learn how teams are cutting costs and scaling faster, watch our web scraping efficiency Webinar.


Conclusion

Scraping ZoomInfo can unlock powerful B2B insights, but it requires careful handling of its anti-bot protections. You’ll need rotating proxies, headless browsers, and smart handling of CAPTCHAs and rate limits just to keep your scraper alive.

Skip the maintenance headache with Decodo’s Web Scraping API. It auto-rotates proxies, solves CAPTCHAs, renders JavaScript, and retries failed requests – so you send a single request and get data back. Try it free for 7 days and see for yourself.

Start your free trial of Web Scraping API

Access structured data from ZoomInfo and other platforms with our full-stack tool, complete with ready-made scraping templates.

About the author

Justinas Tamasevicius

Head of Engineering

Justinas Tamaševičius is Head of Engineering with over two decades of expertise in software development. What started as a self-taught passion during his school years has evolved into a distinguished career spanning backend engineering, system architecture, and infrastructure development.


Connect with Justinas via LinkedIn.

All information on the Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on the Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

Can I legally scrape ZoomInfo?

ZoomInfo’s public-facing data is technically accessible, but its terms of service likely prohibit automated scraping. Large-scale scraping may also raise legal concerns, such as GDPR violations for personal data. That said, scraping ZoomInfo at slow, respectful rates generally qualifies as ethical scraping.

How do I get my data off ZoomInfo?

If you have a ZoomInfo subscription, the platform provides export tools, but only for limited data. For large-scale needs, you’d need to scrape using the methods covered above, as ZoomInfo doesn’t offer a public bulk export option.

How can I extract leads from ZoomInfo?

Use the search or directory pages on ZoomInfo to find target companies or sectors, then scrape each company’s profile for employee names, titles, and contact info fields.

What if my scraper gets blocked or banned?

Switch to a fresh proxy immediately. Rotate through many IP addresses so no single IP makes too many requests. Throttle your scraping speed. If blocks persist, use a Web Scraper API or a managed scraping service.

How do I handle CAPTCHAs and rate limits?

For CAPTCHAs, use an automated solver service or rotate IPs on each challenge. For rate limiting (429 errors), add random delays between requests and distribute them across many proxies. If managing this manually is too complex, a paid scraping API is often the best solution.
