Welcome to Decodo Blog!

Build knowledge on our solutions and streamline your workflows with step-by-step guides and expert tips.


Golang Colly: How To Build a Web Scraper in Go

Golang Colly is a fast, callback-driven scraping framework for the Go programming language. It wraps HTTP requests, HTML parsing, rate limiting, and concurrency in a clean API, so you can pull structured data from a website with very little code. This tutorial walks you through building a working Colly scraper from an empty project all the way to proxy rotation.

Kipras Kalzanauskas

Last updated: Jun 10, 2026

15 min read

DATA COLLECTION

Vibe Scraping or Vibe Coding for Data Collection

Vibe scraping is the practice of building scrapers by describing goals in natural language to an LLM rather than hand-writing selectors, a concept derived from Andrej Karpathy's 'vibe coding.' This allows developers to turn prompts into working extractors as LLMs now efficiently parse DOMs, infer schemas, and write code. While it enables rapid prototyping, it introduces new failure modes like hallucinated selectors; scaling these scripts for production still requires real proxies and rendering infrastructure.

Dominykas Niaura

Last updated: Jun 09, 2026

18 min read

DATA COLLECTION

Puppeteer Download File: A Complete Guide for Node.js Developers

Puppeteer makes browser automation feel easy until you need to save a file to disk. Triggering a download in headless mode isn't the same as clicking a button in a real browser, and the default behavior in headless Chrome won't help you. This guide covers the full Puppeteer download file workflow: configuring CDP correctly, picking the right method for your scenario, detecting when a file truly finished, and scaling to batch jobs without leaking memory or corrupting your queue.

Justinas Tamasevicius

Last updated: Jun 08, 2026

25 min read

DATA COLLECTION

Price Scraping: How To Build a Scraper, Test It, and Scale With Confidence

Price data drives competitor monitoring in eCommerce, MAP policy enforcement, and deal alerts, but collecting it manually doesn't scale. Price scraping solves this by automatically collecting product pricing data from eCommerce websites. This guide shows you how to build a Python scraper with Playwright that gathers real prices, handles anti-bot measures, and outputs structured JSON data.

Lukas Mikelionis

Last updated: Jun 08, 2026

20 min read

UNBLOCK

Node Unblocker: A Comprehensive Guide

Node Unblocker is an open-source web proxy built on Node.js that allows users to bypass internet censorship, evade network filters, and access restricted content. Whether you are dealing with strict corporate firewalls, educational network restrictions, or geo-blocked websites, Node Unblocker acts as a seamless intermediary to securely route your web traffic.

Zilvinas Tamulis

Last updated: Jun 08, 2026

8 min read

DATA COLLECTION

C++ Web Scraping: A Practical Guide for Performance-Critical Projects

C++ web scraping is the process of sending HTTP requests from a C++ program, retrieving HTML or other structured responses, and parsing the data using libraries such as libcurl, CPR, libxml2, or pugixml. It's most useful in scraping workloads where CPU efficiency, memory control, predictable latency, or direct integration with an existing C++ system matter more than quick setup. That makes it a practical option for performance-critical pipelines, but a heavier one to build and maintain. The real question isn't whether C++ can scrape the web. It's whether that extra control is worth the extra engineering work.

Lukas Mikelionis

Last updated: Jun 04, 2026

14 min read

How To Set Up PewDiePie's Odysseus AI Workspace

Odysseus is a free, open-source, self-hosted AI workspace from Felix Kjellberg, aka PewDiePie. Yes, the guy who spent a decade telling 100 million people to smash subscribe now wants you to smash docker compose up. It bundles chat, autonomous agents, deep research, and email into one interface that runs on your hardware, not someone else's cloud. It launched in late May 2026 and hit 30,000+ GitHub stars in just 3 days, signaling real demand for AI you own, not rent. Here's what it does, how to use it, and where proxies fit in.

Zilvinas Tamulis

Last updated: Jun 04, 2026

16 min read

DATA COLLECTION

Puppeteer Form Submit: A Practical Guide to Reliable Form Automation

Submitting forms with Puppeteer goes beyond clicking a button in the browser or typing text into an input box. Puppeteer is a Node.js library that controls a headless or headful Chromium instance through the DevTools Protocol. That means you can use it to locate form fields, fill inputs with the necessary values, trigger the form submission, and confirm whether the submission actually worked. Whether you’re web scraping, testing your own product, or automating anything in a browser with JavaScript, you’ll inevitably run into form fields and submit buttons that you need to get past programmatically with Puppeteer.

Justinas Tamasevicius

Last updated: Jun 04, 2026

25 min read

DATA COLLECTION

PYTHON

MechanicalSoup Python: A Complete Guide to Scraping, Forms, and Proxies

When you need to scrape 50 pages of search results behind a login wall, raw Requests + Beautiful Soup force you to track cookies and assemble form payloads by hand, while Selenium launches a full browser for pages that don't even use JavaScript. MechanicalSoup sits between those extremes. It wraps Requests and Beautiful Soup into a stateful browser that handles web scraping sessions, forms, and navigation automatically. This guide covers everything from installation to proxy-powered production scrapers.

Justinas Tamasevicius

Last updated: Jun 03, 2026

16 min read

NEWS

PYTHON

How To Build a News Crawler in Python: Step-by-Step Guide

A news crawler is a tool that automatically pulls content from news websites. It helps with tracking competitors, feeding LLM pipelines, or watching topic coverage across publishers. This guide walks you through building a configurable, proxy-integrated Python news crawler that targets multiple news sources, handles proxy rotation, and saves structured results on a schedule.

Kipras Kalzanauskas

Last updated: Jun 03, 2026

12 min read

INSIGHTS

How To Switch From Google to DuckDuckGo

Want to switch from Google to DuckDuckGo? You've got company. DuckDuckGo says traffic to its No AI search page tripled after Google’s I/O 2026 conference on May 19. Daily visits sit around 84% above their old average. This guide explains the move and shows you how to switch.

Benediktas Kazlauskas

Last updated: Jun 02, 2026

3 min read

DATA COLLECTION

UNBLOCK

How to Bypass Cloudflare: Complete Guide to Anti-Bot Evasion

Cloudflare is a massive global cloud network that sits firmly between your scraper and the data you need, blocking all requests that fail its multi-layered detection system. It powers nearly 21% of all websites globally, or roughly 1 in 5 sites, so knowing how to bypass it is essential for serious scrapers. This practical walkthrough covers detection methods, tools like Puppeteer and Playwright, and both DIY approaches and managed solutions, including proxy strategies and web scraping APIs.

Vilius Sakutis

Last updated: Jun 02, 2026

10 min read

DATA COLLECTION

Puppeteer Infinite Scroll: A Practical Scraping Guide

If you run curl on an infinite scroll page and grep the output for the content, you'll get zero matches. The items aren't in the initial HTML; the page loads them with fetch requests that fire when a scroll event is triggered. This guide covers diagnosing the target, three scroll strategies, verification, speed, anti-detection methods, and a real-world walkthrough.

Justinas Tamasevicius

Last updated: May 29, 2026

25 min read

PARSING

PYTHON

Python Extract Text From HTML: A Step-by-Step Guide With Code Examples

Extracting text from HTML in Python is one of the most common tasks in web scraping, NLP pipelines, search indexing, and data preparation. The goal is to keep the visible content from a webpage while removing all the HTML markup, scripts, and styles that surround it. This guide walks you through the popular Python libraries for HTML text extraction and a full step-by-step workflow to go from raw HTML to clean, production-ready text.

Lukas Mikelionis

Last updated: May 28, 2026

14 min read

DATA COLLECTION

PYTHON

How To Build a Rank Tracker: Manual Checks, Python Automation, and Modern SERP Tracking

On a recent run, wired.com ranked no. 1 on US desktop and no. 2 on UK desktop for "best laptop 2026". Same query, same hour, different country. That gap is what a single-number rank tracker misses, especially now that modern SERPs add AI Overview citations, featured snippets, and "People also ask" blocks that most older tracking tools ignore. This post walks through building a tracker that captures all of it, starting with a manual baseline for ground truth, then moving to a Python implementation against a SERP API, and finally setting up a scaling path for more keywords and locations.

Kipras Kalzanauskas

Last updated: May 28, 2026

20 min read

BUSINESS AUTOMATION

API

Open WebUI Tools: How To Give Your Local LLM Real-Time Internet Access With a Scraping API

Local LLMs are powerful, but their knowledge ends at the training cutoff. Without internet access, a model running on your own hardware can’t check current prices, read recent news, or retrieve updated documentation. Open WebUI’s Tools system solves this by letting models call custom Python functions during a conversation. In this tutorial, you’ll connect the Decodo Web Scraping API to a custom Open WebUI tool, so your model can fetch live web content on demand.

Justinas Tamasevicius

Last updated: May 28, 2026

12 min read

DATA COLLECTION

API vs. Web Scraping: How to Choose the Right Data Collection Method

Web data extraction typically follows two main paths: requesting through an API or directly scraping target pages. If you're building distributed data pipelines, your choice can impact scalability, reliability, and overall cost. In this guide, we'll explore what each path entails, provide a detailed comparison between them, and explain when to use APIs, web scraping, or both.

Justinas Tamasevicius

Last updated: May 27, 2026

12 min read

DATA COLLECTION

Web Scraping in Dify: A No-Code Guide

Dify is an open-source platform for building LLM apps and AI workflows visually. It gives teams a drag-and-drop canvas for chaining LLMs, tools, and APIs into complete AI workflows. Modern AI apps need fresh, structured web data, but most teams don't want to write and maintain Python scrapers. In this article, you'll learn how to build a no-code Dify workflow and switch to a managed Web Scraping API when basic plugins aren't enough.

Lukas Mikelionis

Last updated: May 27, 2026

12 min read