Date: 2024-07-02

What data can you extract?

Indeed is a popular job search platform that operates in over 60 countries, with 615M+ job-seeker profiles, 3.3M+ employers, and approximately 27 hires per minute. Its country-specific sites list roles of every job type, which makes its listings a widely used source for labor-market analysis.

Standard Indeed job scraping yields the essentials:

Job titles, company data, locations

Posting timestamps, job URLs/IDs

Descriptions, benefits, and salary ranges where disclosed

Job type (full-time, part-time, or contract)

Why it matters – data engineers use this to build real-time job intelligence pipelines. Analysts track hiring velocity across tech stacks and geographies. Founders monitor competitor hiring patterns to spot market opportunities.
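To make the field list above concrete, here is a minimal sketch of a record type that could hold one scraped posting. The class name and every field name are illustrative assumptions, not Indeed's own keys; optional fields default to None because salary, benefits, and job type are only present where the employer discloses them.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class JobPosting:
    """One scraped job posting (hypothetical schema, names are assumptions)."""
    job_id: str
    title: str
    company: str
    location: str
    url: str
    posted_at: Optional[str] = None      # posting timestamp as scraped
    description: Optional[str] = None
    benefits: List[str] = field(default_factory=list)
    salary_min: Optional[float] = None   # only where a range is disclosed
    salary_max: Optional[float] = None
    job_type: Optional[str] = None       # full-time, part-time, or contract


# Example with placeholder values only
posting = JobPosting(
    job_id="example-id",
    title="Data Analyst",
    company="Example Corp",
    location="Chicago, IL",
    url="https://www.indeed.com/",
)
row = asdict(posting)  # plain dict, ready for CSV/JSON export
```

Keeping undisclosed fields as None (rather than empty strings or zeros) makes it easier to tell "not disclosed" apart from a real value in later analysis.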

Now that you know what data you can collect, let's understand how Indeed's website is structured and how that affects our approach to scraping.

Understanding Indeed's data architecture

Indeed organizes job information in a consistent structure that allows efficient extraction once you understand the moving parts.

How Indeed search works

Indeed constructs search URLs with stable parameters that you can modify programmatically. A basic search looks like:

https://www.indeed.com/jobs?q=data+analyst&l=Chicago%2C+IL

Key parameters you’ll use:

q – query keywords (for example, data analyst)

l – location (for example, Chicago, IL; use remote for remote roles)

start – pagination offset in increments of 10 (0, 10, 20, …)

sort=date – newest results first

fromage – posting age filter (for example, 1 = last 24 hours)

radius – distance from the location center (for example, 100 = within 100 miles)

Regional domains include:

USA – www.indeed.com

Canada – ca.indeed.com

UK – uk.indeed.com

Australia – au.indeed.com

Others vary by country.
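The parameters and regional domains above can be combined into a small URL builder. The following is a sketch using only Python's standard library; the function name, its argument names, and the short country keys (us, ca, uk, au) are assumptions made for illustration, and only the four domains listed above are included.

```python
from urllib.parse import urlencode

# Regional hosts taken from the list above; other countries are left out
# on purpose because their hostnames vary.
REGIONAL_HOSTS = {
    "us": "www.indeed.com",
    "ca": "ca.indeed.com",
    "uk": "uk.indeed.com",
    "au": "au.indeed.com",
}


def build_search_url(query, location, country="us", page=0,
                     newest_first=False, max_age_days=None, radius=None):
    """Build a search URL for one results page (hypothetical helper)."""
    params = {"q": query, "l": location}
    if page:
        params["start"] = page * 10   # offset advances in steps of 10
    if newest_first:
        params["sort"] = "date"
    if max_age_days is not None:
        params["fromage"] = max_age_days
    if radius is not None:
        params["radius"] = radius
    # urlencode escapes spaces as "+" and commas as "%2C"
    return f"https://{REGIONAL_HOSTS[country]}/jobs?{urlencode(params)}"


# First three result pages for the example search
urls = [build_search_url("data analyst", "Chicago, IL", page=p) for p in range(3)]
```

With the default arguments, the first URL matches the basic search shown earlier; later pages only add the start offset. An unknown country key raises KeyError, so an unlisted domain is never guessed silently.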

A reliable shortcut – embedded JSON beats brittle HTML

Scraping Indeed is challenging due to its dynamic, JavaScript-rendered content and complex HTML structure, which can be difficult to navigate reliably. Targeting embedded JSON data offers a more stable and efficient alternative to parsing the DOM. Rather than maintaining many CSS selectors, parse the structured payload that Indeed injects into the page. The most useful data appears under: