Advanced techniques for scraping Craigslist

Once you're comfortable running the basic scripts, you can extend them to handle larger data sets and produce more structured, scalable outputs.

Handling filters

Craigslist supports extensive built-in filters, such as price ranges, keywords, listing types, or date. You can adjust these filters directly on the Craigslist search page and then simply copy the resulting URL into your scraper.

Alternatively, you can append parameters manually, such as "&min_price=500&max_price=2000", to limit listings to a specific price range. A filtered housing URL in New York might look like this:

https://newyork.craigslist.org/search/hhh?max_price=2000&min_price=500
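If you generate URLs for many cities or price bands, building them in code avoids hand-typed mistakes. Below is a minimal sketch that uses only the standard library. `build_search_url` is a hypothetical helper name, and it assumes the city subdomain and category code (such as `hhh`) follow the pattern in the URL above. Any filter names beyond `min_price` and `max_price` must be ones Craigslist actually recognizes, so copy them from a URL generated on the site.

```python
from urllib.parse import urlencode


def build_search_url(city: str, category: str, **filters) -> str:
    """Return a Craigslist search URL for a city subdomain and category code (hypothetical helper)."""
    base = f"https://{city}.craigslist.org/search/{category}"
    # Skip filters passed as None so optional values can be forwarded unconditionally
    params = {name: value for name, value in filters.items() if value is not None}
    return f"{base}?{urlencode(params)}" if params else base


url = build_search_url("newyork", "hhh", min_price=500, max_price=2000)
print(url)  # https://newyork.craigslist.org/search/hhh?min_price=500&max_price=2000
```

Parameter order doesn't matter to the query string, so this produces the same search as the hand-written URL even though `min_price` comes first.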

Exporting data to different formats

While the examples in this guide save results to CSV, you can easily adapt the export logic to other destinations. The following options are especially useful when you need to aggregate listings over time or compare data across multiple regions:

Excel. Use Python's pandas or openpyxl library to write .xlsx files for cleaner formatting and easier sharing.

Databases. Insert scraped data into SQLite, PostgreSQL, or MongoDB to store and query large volumes efficiently.

APIs or dashboards. For automated systems, send the data directly to internal APIs or visualization tools for live analysis.
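For the database route, SQLite is the easiest starting point because it ships with Python. This is a minimal sketch, assuming each scraped listing is a dict with `url`, `title`, `price`, and `location` keys. The table layout and the sample rows are hypothetical. Using the listing URL as the primary key lets repeated runs add new listings without duplicating old ones, which suits aggregating data over time.

```python
from __future__ import annotations

import sqlite3


def save_listings(conn: sqlite3.Connection, listings: list[dict]) -> None:
    """Create the listings table if needed, then insert rows keyed by listing URL."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings (
               url        TEXT PRIMARY KEY,
               title      TEXT,
               price      INTEGER,
               location   TEXT,
               scraped_at TEXT DEFAULT CURRENT_TIMESTAMP  -- when the listing was first seen
           )"""
    )
    # INSERT OR IGNORE skips URLs already stored, so repeated runs don't duplicate listings
    conn.executemany(
        "INSERT OR IGNORE INTO listings (url, title, price, location) "
        "VALUES (:url, :title, :price, :location)",
        listings,
    )
    conn.commit()


# Hypothetical sample rows standing in for scraper output
rows = [
    {"url": "https://example.org/listing/1", "title": "Sunny 1br", "price": 1800, "location": "Astoria"},
    {"url": "https://example.org/listing/2", "title": "Cozy studio", "price": 1500, "location": "Harlem"},
]

conn = sqlite3.connect(":memory:")  # pass a file path such as "listings.db" to keep the data
save_listings(conn, rows)
save_listings(conn, rows[:1])  # re-scraping listing 1 is ignored
count = conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
cheapest = conn.execute("SELECT title, price FROM listings ORDER BY price LIMIT 1").fetchone()
```

For the Excel option, `pandas.DataFrame(rows).to_excel("listings.xlsx", index=False)` writes the same rows in a single call, provided openpyxl is installed.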

Using extraction rules and templates

Craigslist pages share similar structures across categories, but minor variations can break hardcoded selectors. Defining extraction templates helps maintain flexibility. You can store selector sets for each data field (title, price, date, etc.) in a separate configuration file or dictionary and load them dynamically depending on the category.
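A sketch of this idea follows, assuming templates are stored as JSON with a `default` entry plus per-category overrides. The CSS selectors are placeholders rather than Craigslist's real class names, so inspect the live page and replace them. The `jjj` override (assumed here to be a jobs category) shows how a category can drop a field. `extract_fields` assumes `card` is a Playwright `Locator` pointing at a single listing and uses the sync API's `locator`, `first`, `count`, and `inner_text`.

```python
from __future__ import annotations

import json

# Hypothetical template file contents. The selectors are placeholders, not
# Craigslist's real class names: inspect the live page and replace them.
TEMPLATES_JSON = """
{
  "default": {"title": ".title", "price": ".price", "date": "time", "location": ".location"},
  "hhh": {"bedrooms": ".housing"},
  "jjj": {"price": null, "compensation": ".compensation"}
}
"""


def load_template(category: str, raw: str = TEMPLATES_JSON) -> dict[str, str]:
    """Overlay a category's selectors on the defaults; a null selector drops that field."""
    templates = json.loads(raw)
    merged = {**templates["default"], **templates.get(category, {})}
    return {field: sel for field, sel in merged.items() if sel is not None}


def extract_fields(card, template: dict[str, str]) -> dict[str, str | None]:
    """Read every templated field from one listing card (a Playwright Locator)."""
    record = {}
    for field, selector in template.items():
        node = card.locator(selector)
        # A missing element yields None instead of raising, so one layout change doesn't stop the run
        record[field] = node.first.inner_text().strip() if node.count() else None
    return record
```

Categories without an override fall back to the defaults, so adding a new category means adding only the selectors that differ.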

For highly structured output, you can also introduce extraction rules: logic that standardizes inconsistent values, for example converting "2br" (two bedrooms) into a numeric bedroom count or stripping currency symbols from prices. This improves data quality and makes analysis or integration with other systems much easier.
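Here is one way to express such rules as small functions keyed by field name. This is a minimal sketch, assuming raw values arrive as strings like `"$1,250"` and `"2br - 850ft2"`. Those sample formats are assumptions, so check them against what your scraper actually returns.

```python
from __future__ import annotations

import re


def parse_price(raw: str | None) -> int | None:
    """Drop currency symbols and thousands separators: "$1,250" -> 1250."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw or "")
    return int(float(match.group().replace(",", ""))) if match else None


def parse_bedrooms(raw: str | None) -> int | None:
    """Turn housing text such as "2br - 850ft2" into a bedroom count (2)."""
    match = re.search(r"(\d+)\s*br\b", raw or "", re.IGNORECASE)
    return int(match.group(1)) if match else None


# Field name -> normalizing rule; fields without a rule pass through unchanged
RULES = {"price": parse_price, "bedrooms": parse_bedrooms}


def apply_rules(record: dict) -> dict:
    """Return a copy of the record with every ruled field normalized."""
    return {field: RULES.get(field, lambda v: v)(value) for field, value in record.items()}
```

Values that don't match a pattern become `None` rather than raising, which keeps a single odd listing from aborting a whole batch.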

How to avoid getting blocked

Craigslist uses various anti-bot measures to protect its platform, so responsible scraping practices are essential for stable and sustainable data collection. Here are some key steps to help keep your scrapers running smoothly.

Proxy rotation and request throttling

Avoid sending too many requests from the same IP address. Use a rotating proxy pool to distribute traffic across different IPs, reducing the risk of bans. Combine this with request throttling (short, random delays between page loads) to mimic normal user behavior and stay under Craigslist's rate limits.
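The sketch below pairs each URL with the next proxy in a round-robin pool and waits a random interval between requests. The proxy addresses are placeholders for whatever your provider gives you, and the 2 to 5 second default range is an assumption, not a documented Craigslist limit.

```python
from __future__ import annotations

import itertools
import random
import time
from collections.abc import Callable, Iterable, Iterator

# Placeholder endpoints: substitute the gateway addresses from your proxy provider
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
    "http://proxy3.example.com:8000",
]


def throttled_schedule(
    urls: Iterable[str],
    proxies: list[str],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[tuple[str, str]]:
    """Yield (url, proxy) pairs, rotating proxies round-robin and pausing randomly between requests."""
    proxy_cycle = itertools.cycle(proxies)
    for index, url in enumerate(urls):
        if index:  # no pause before the first request
            sleep(random.uniform(min_delay, max_delay))
        yield url, next(proxy_cycle)
```

In a Playwright script, each proxy can be passed to `browser.new_context(proxy={"server": proxy})` so that pages opened in that context route through it. The `sleep` parameter exists only so the delays can be tested without actually waiting.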

Rotating user-agents

Craigslist often checks browser headers to detect automation. Rotate user-agent strings (the identifiers that tell websites which browser and device you're using) for each session or request. Libraries like fake-useragent or custom user-agent lists help make your requests appear more organic.
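A custom list needs no extra dependency. This is a minimal sketch that picks a different user-agent for each new session. The strings are examples in the usual browser format and will go stale as browsers update, so refresh them periodically.

```python
from __future__ import annotations

import random

# Example strings only: real browser versions change often, so keep this list current
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def next_user_agent(previous: str | None = None) -> str:
    """Pick a random user-agent, avoiding the one the previous session used."""
    candidates = [ua for ua in USER_AGENTS if ua != previous] or USER_AGENTS
    return random.choice(candidates)
```

With Playwright, pass the result to `browser.new_context(user_agent=...)` so every page in that context sends it. If you use fake-useragent instead, `UserAgent().random` plays the same role as `next_user_agent`.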

Avoid scraping sensitive or prohibited data

Only collect public, non-personal information. Avoid scraping user emails, phone numbers, or any data that violates Craigslist's terms of use. Focus on general listing attributes like titles, prices, and locations – fields that are publicly visible and safe to process.
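You can also enforce this in code, so that stray personal details never reach storage. The rough sketch below assumes records are dicts, and the allowlisted field names are illustrative. The two regexes only catch plainly written email addresses and US-style 10-digit phone numbers that posters sometimes put in titles. Obfuscated forms will slip through.

```python
import re

# Fields considered safe to keep; anything else the scraper picks up is discarded
ALLOWED_FIELDS = {"url", "title", "price", "location", "date"}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


def sanitize(record: dict) -> dict:
    """Keep only allowlisted fields and mask email addresses or US-style phone numbers in text."""
    clean = {}
    for field, value in record.items():
        if field not in ALLOWED_FIELDS:
            continue
        if isinstance(value, str):
            value = PHONE_RE.sub("[redacted]", EMAIL_RE.sub("[redacted]", value))
        clean[field] = value
    return clean
```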

General etiquette

Respectful scraping frequency matters. Keep your request rate low enough that it doesn't affect Craigslist's servers; a delay of a few seconds between page loads is usually enough to avoid triggering rate limits. Responsible data collection also helps your scripts run reliably over time.
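One way to keep the load low automatically is to slow down whenever the server pushes back. This sketch assumes a hypothetical `fetch` callable that returns `(status_code, body)`. Treating HTTP 429 and 503 as the "slow down" signals is an assumption, since this guide doesn't document how Craigslist responds to heavy traffic.

```python
from __future__ import annotations

import random
import time
from typing import Callable

RETRYABLE = {429, 503}  # "Too Many Requests" and "Service Unavailable"


def polite_get(
    fetch: Callable[[str], tuple[int, str]],
    url: str,
    base_delay: float = 3.0,
    max_retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Fetch a URL, backing off exponentially while the server signals it is overloaded."""
    attempt = 0
    status, body = fetch(url)
    while status in RETRYABLE and attempt < max_retries:
        # With the default base_delay: wait 3s, 6s, 12s, ... plus up to 1s of jitter
        sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
        attempt += 1
        status, body = fetch(url)
    return status, body
```

With Playwright, `fetch` could wrap `page.goto(url)`, whose returned `Response` exposes the HTTP code as `.status`.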

A simpler alternative: use Web Scraping API

If you'd rather skip proxy management and anti-bot challenges altogether, you can use Web Scraping API. It handles IP rotation, CAPTCHA bypassing, and JavaScript rendering automatically while offering advanced geo-targeting options and a 100% success rate.

The API includes 100+ ready-made templates for popular websites. Craigslist isn't one of them yet, so you'll need to use the Web (universal) target, which returns the HTML of any page. You can then parse this output using the same logic as in your Playwright scripts.

If you prefer a more readable format right out of the box, enable the Markdown option – Craigslist pages work especially well with it, since they contain little unnecessary markup and present data in a clear, text-oriented structure.