Open WebUI tools: how to give your local LLM real-time internet access with a scraping API

Local LLMs are powerful, but their knowledge ends at the training cutoff. Without internet access, a model running on your own hardware can’t check current prices, read recent news, or retrieve updated documentation. Open WebUI’s Tools system solves this by letting models call custom Python functions during a conversation. In this tutorial, you’ll connect the Decodo Web Scraping API to a custom Open WebUI tool, so your model can fetch live web content on demand.

Justinas Tamasevicius

Last updated: May 28, 2026

12 min read

Rounded square icon showing neon circuit lines and nodes on a dark textured background with a rainbow light wave

TL;DR

Local LLMs have no knowledge of events past their training cutoff, so no live prices, news, or updated docs. Unlike cloud-hosted models from OpenAI or Anthropic, they don't ship with built-in web access
Open WebUI's Tools system lets you attach custom Python functions to a model, enabling real-time web retrieval during conversations
We’ll build a fetch_page tool that sends any URL to the Decodo Web Scraping API and returns clean, model-readable text
The stack is Ollama + Open WebUI v0.3.x + Docker + Decodo Web Scraping API + basic Python
This tutorial will take ~30 minutes, assuming no prior setup
We’ll cover setting up Ollama and Open WebUI, writing and installing the tool, configuring credentials via Valves, testing in chat, and troubleshooting common errors (auth failures, empty content, tool not firing)
After building, you’ll learn how to extend the tool with search-then-fetch, structured data extraction, or multi-URL summarization – or connect it to n8n and LangChain for larger automated workflows

Why real-time web access changes what a local LLM can do

Because an LLM learns from a fixed training dataset collected up to a certain date, it has no built-in awareness of new information published after that point. To overcome this limitation, the model needs access to live web data.

With web access, the same model can check the current price of an API subscription, read the latest version of a library's documentation, summarize a news article published yesterday, or fact-check a claim against a live source. This reduces reliance on static training data and helps mitigate hallucinations when dealing with fast-changing information.

This capability is especially important for local models. Cloud-hosted AI, such as those from OpenAI and Anthropic, often include built-in retrieval and browsing layers. Local models, however, typically don't ship with web access by default, so developers must implement it themselves.

Adding web access to local LLMs unlocks a whole host of possibilities. For instance, by integrating your local model with a web scraping API, you can automatically collect recent product reviews from a competitor’s Amazon listings, analyze recurring customer complaints, identify missing features users keep requesting, and generate actionable recommendations for improving your own product or marketing strategy.

A related concept worth understanding here is Retrieval-Augmented Generation (RAG). Both RAG systems and live web-access via Tools extend an LLM beyond its static training data by accessing external information during inference.

Traditional RAG systems retrieve information from a pre-indexed and embedded document corpus, making them ideal for private knowledge bases or stable datasets. The tool we’ll build later takes a more dynamic approach. Instead of querying a vector database, it retrieves fresh information directly from live web pages through our Web Scraping API during the conversation itself.

The 2 approaches are complementary. In advanced AI agent systems, developers often use RAG for long-term or private knowledge retrieval and live web access for real-time information such as news, pricing, product reviews, or rapidly changing documentation.

Prerequisites and system overview

Before we start building the tool, you’ll need the following prerequisites.

A local model runtime. We recommend downloading Ollama because it’s straightforward to install on Windows, macOS, and Linux, and Open WebUI connects to it automatically. LM Studio works too, as well as any runtime that exposes an Ollama-compatible API endpoint will work.

Open WebUI v0.3.x or later. The Tools system was introduced in v0.3.x. The "Docker run…" installation command pulls its latest version, which is always Tools-compatible.

A Decodo Web Scraping API account. Our Web Scraping API handles JavaScript rendering, anti-bot bypass, and proxy rotation server-side. Python’s requests.get() call may fail on sites that block scrapers, require JavaScript to load content, or implement rate limits. Decodo manages these challenges and returns clean, structured content to your tool.

Basic Python knowledge. The tool is a Python class with one method. If you can read a function definition and understand what a POST request does, you're good to go.

System overview. Here's a visual representation of how the pieces connect at runtime:

User prompt
    ↓
Open WebUI + Ollama model
    ↓  (model decides to call the tool)
fetch_page(url) -- Python tool
    ↓  (POST request with credentials)
Decodo Web Scraping API
    ↓  (returns clean HTML/text)
Model incorporates result → reply

Practical tip: Get Ollama and Open WebUI connected and working before adding the tool. Send a test message to the model with no tools enabled. A working baseline makes it much easier to isolate issues later.

Open WebUI's built-in web search vs. custom tools

Open WebUI includes a built-in web search feature under Settings → Web Search. You can connect it to providers like SearXNG, Brave Search, or Google Programmable Search Engine in a few clicks. For many general-purpose queries, this is often enough and works well out of the box.

However, it has limitations. Built-in search is designed around query → search results → snippets. It retrieves and injects summarized search results into the model context, rather than allowing direct, fine-grained access to arbitrary URLs. As a result, it offers limited control over what content is fetched, how it is parsed, and how it is structured before being passed to the model.

Custom tools operate at a lower level. A tool is essentially a Python function that the model can call during a conversation. This allows you to fetch specific URLs, process raw responses, extract structured data, apply custom filtering logic, and return precisely formatted outputs tailored to your use case. In other words, you control the retrieval pipeline end-to-end.

Here’s a simplified comparison:

Feature

Built-in Web Search

Custom Tools

Setup effort

Minimal, configure in UI

Moderate, write and install a Python tool

Retrieval scope

Search query-based

Direct URL-level access

Content control

Limited (search snippets/results)

Full control over fetching and parsing

Page rendering

Depends on the search provider

Can support headless rendering via external services

Anti-bot handling

Handled by search provider (if at all)

Can be implemented via services like Decodo or similar tools

Output format

Fixed by the search provider

Fully customizable (any structured or unstructured format)

Practical tip: Use the built-in search for quick, general queries where snippets are sufficient. Go with custom tools when you need deterministic access to specific pages, structured extraction, or a more reliable scraping pipeline that can handle complex or protected websites.

How the Open WebUI Tools system works

Open WebUI Tools follow a structured Python convention that determines how functions are exposed to the model. Understanding this structure makes implementing custom tools much easier.

Tools are defined as Python classes. A typical tool file contains a class named Tools, and each method inside that class is exposed as a callable function for the model. Open WebUI scans this class at load time to register available tool functions.

Docstrings and type hints define the tool schema. Open WebUI parses each method’s docstring and annotations to generate the function schema shown to the model. The model does not execute or interpret the docstring directly. Instead, it relies on this generated schema when deciding whether and how to call a tool. Clearly describing each function’s purpose and expected arguments will improve the model's reliability.

Streaming via the event emitter. Tool methods can optionally accept an __event_emitter__ parameter. When provided, Open WebUI injects a callback that allows the tool to stream status updates back to the UI during execution. This is what enables intermediate messages like "Fetching…" or progress indicators while the tool runs.

Tool calls are model-driven. There are no hardcoded triggers. The model receives the available function schemas at the start of a session and autonomously decides when a tool is relevant based on the user’s prompt and the schema description. This makes schema quality, especially docstrings, critical for consistent behavior.

Valves handle configuration. Sensitive or environment-specific values such as API keys, endpoints, timeouts, and limits are stored in a Valves inner class (typically defined as a Pydantic model). These values are configured through the Open WebUI interface rather than hardcoded in the tool logic.

Practical tip: Treat Valves like runtime configuration rather than application logic. Anything that may change across environments – credentials, base URLs, rate limits, or parsing limits – should live in a Valve, while the tool method itself should focus purely on execution logic.

Building and installing the web scraping tool

This section walks you through the complete implementation process, from setting up your local environment, writing the tool code, and uploading it to Open WebUI, to setting credentials via Valves and enabling it for your local model.

Setting up your environment

Step 1: Install Ollama

Download the installer for your OS from ollama.com/download and run it. On macOS, you can also install via Homebrew:

brew install ollama

Once installed, start Ollama:

ollama serve &

Verify it's running:

ollama --version

Pull a model that supports function calling:

ollama pull llama3.1

Confirm it works:

ollama run llama3.1 "Say hello"

Step 2: Install Docker Desktop

Download Docker Desktop. On Windows, enable the WSL 2 backend when prompted and restart your machine after installation. On macOS, simply start Docker Desktop after installing.

Verify that it’s installed:

docker --version

Step 3: Run Open WebUI

Run this single Docker command in PowerShell, Command Prompt, or any of the command tools you’re using:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Windows (PowerShell) note: Replace the backslashes (\) with backticks (`) for line continuation, or paste the command as a single line.

Wait about 30 seconds, then open http://localhost:3000 in your browser. Click Get started and create an admin account when prompted.

Get started with Open WebUI, form showing Name, Email (@gmail.com), Password and 'Create Admin Account' on black background

Step 4: Connect Ollama

In Open WebUI, go to Admin Panel → Settings → Connections. You might need to set the Ollama API URL if it hasn't done that automatically. In Open WebUI, go to:

http://host.docker.internal:11434

Click the verify icon next to the URL. It should show a successful connection.

If you see a "network problem" error on Windows, Ollama may be bound to localhost only and unreachable from inside the Docker container. Fix it by restarting Ollama with all interfaces enabled:

# Stop the current process
pkill ollama

# Restart with all interfaces enabled
OLLAMA_HOST=0.0.0.0 ollama serve &

On PowerShell:

$env:OLLAMA_HOST="0.0.0.0"; ollama serve

Click the verify icon again. The connection should go green.

Writing the tool

The tool accepts a URL, sends it to Web Scraping API, and returns the page content as a string.

"""
Open WebUI Tool: Web Scraper via Decodo API
Allows the model to fetch live web content from any URL.
"""

import requests
from pydantic import BaseModel, Field


class Tools:
    class Valves(BaseModel):
        DECODO_USERNAME: str = Field(
            default="",
            description="Your Decodo account username.",
        )
        DECODO_PASSWORD: str = Field(
            default="",
            description="Your Decodo account password.",
        )
        REQUEST_TIMEOUT: int = Field(
            default=40,
            description="Seconds to wait for the API response before timing out.",
        )
        MAX_CHARS: int = Field(
            default=8000,
            description="Maximum characters to return. Prevents context window overflow.",
        )

    def __init__(self):
        self.valves = self.Valves()

    async def fetch_page(
        self,
        url: str,
        __event_emitter__=None,
    ) -> str:
        """
        Fetches the text content of a web page at the given URL.

        Use this tool whenever the user asks you to:
        - Read, summarize, or analyze a specific URL
        - Check the current content of a web page
        - Look up live documentation, prices, or news at a known address

        Do NOT call this tool for general knowledge questions that don't require
        fetching a specific URL.

        Args:
            url: The full URL to fetch (must start with http:// or https://)

        Returns:
            The plain text content of the page, or an error message.
        """

        if __event_emitter__:
            await __event_emitter__(
                {
                    "type": "status",
                    "data": {"description": f"Fetching {url}...", "done": False},
                }
            )

        if not self.valves.DECODO_USERNAME or not self.valves.DECODO_PASSWORD:
            return "Error: Decodo credentials not set. Go to the tool's Valves settings and add your username and password."

        if not url.startswith(("http://", "https://")):
            return f"Error: Invalid URL '{url}'. Must start with http:// or https://"

        try:
            response = requests.post(
                "https://scraper-api.decodo.com/v2/scrape",
                auth=(self.valves.DECODO_USERNAME, self.valves.DECODO_PASSWORD),
                json={
                    "url": url,
                    "target": "universal",
                    "headless": "html",
                },
                timeout=self.valves.REQUEST_TIMEOUT,
            )

            response.raise_for_status()
            data = response.json()

            results = data.get("results", [])
            if not results:
                return f"The page at {url} returned no results."

            content = results[0].get("content", "")

            if not content:
                return f"The page at {url} returned no text content."

            if len(content) > self.valves.MAX_CHARS:
                content = (
                    content[: self.valves.MAX_CHARS]
                    + f"\n\n[Content truncated at {self.valves.MAX_CHARS} characters]"
                )

            if __event_emitter__:
                await __event_emitter__(
                    {
                        "type": "status",
                        "data": {
                            "description": "Page fetched successfully.",
                            "done": True,
                        },
                    }
                )

            return content

        except requests.exceptions.Timeout:
            return f"Error: Request to {url} timed out after {self.valves.REQUEST_TIMEOUT} seconds."
        except requests.exceptions.HTTPError as e:
            return f"Error: Decodo API returned HTTP {e.response.status_code}. Check your credentials and account credits."
        except requests.exceptions.RequestException as e:
            return f"Error fetching {url}: {str(e)}"
        except Exception as e:
            return f"Unexpected error: {str(e)}"

A few things worth noting about the code implementation above:

The Valves class holds DECODO_USERNAME, DECODO_PASSWORD, REQUEST_TIMEOUT, and MAX_CHARS. None of these are hardcoded.
The fetch_page method uses a POST request, which our API requires it. Authentication is HTTP Basic, using the username and password from the API Playground in your Decodo dashboard.
The target: "universal" parameter tells the API to treat the URL as a generic web page. The headless: "html" parameter enables JavaScript rendering, which is necessary for pages that load content dynamically.
The response JSON returns a results array. The page content is at results[0]["content"].
Content is truncated at MAX_CHARS before being returned. This prevents context window overflow on long pages.
The __event_emitter__ calls stream status messages ("Fetching...", "Page fetched successfully.") to the chat UI while the request is in flight.

Installing the tool in Open WebUI

Follow these steps to install the tool:

In Open WebUI, click Workspace in the left sidebar, then Tools.
Click + New Tool to create a new tool.
Give it the name "Web Scraper" and a short description.
Paste the full tool code provided above into the editor.
Click Save.

Web Scraper — web_scraper Just a tiny web scraper; shows code 'Open WebUI Tool: Web Scraper via Decodo API' in dark UI

Open WebUI validates the Python syntax on save. If you see a syntax error badge, check the code for missing brackets or indentation issues before continuing.

Get Web Scraping API

Plug an API with 125M+ IPs worldwide into your LLM tool stack so it could access live data with at a 99.99% success rates.

Start for free

Setting credentials via Valves

After saving, a Valves (gear) icon appears next to the tool. Click it and fill in:

DECODO_USERNAME – your Decodo account username
DECODO_PASSWORD – your Decodo account password
REQUEST_TIMEOUT – leave it at 40 (seconds)
MAX_CHARS – leave at 8000 (characters)

Valves modal showing Decodo Username U0... and Decodo Password PW_170... in a dark settings panel with Save button

Click Save. Credentials are stored in Open WebUI's internal database, not in the tool source code.

Enabling the tool for a chat session

Follow these steps to test the tool:

Open a New Chat and select your model from the dropdown in the top-left corner of the page.
Click the icon next to the + button (labeled "Integrations" on hover) at the bottom-left of the chat input.
Toggle Web Scraper on.

The tool is now active for that session.

Using the web scraping tool in chat

With the tool enabled, the model can fetch live web content whenever it decides the prompt warrants it. Using any of the prompts below will trigger the tool.

Summarize the article at https://en.wikipedia.org/wiki/Web_scraping

What does the documentation at https://docs.python.org/3/library/requests.html say about timeouts?

Check the current pricing at /scraping/web

When you send one of these prompts, the model reads the function schema, determines that fetch_page is relevant, generates a tool call with the URL as the argument, waits for the returned content, and then responds based on the tool output.

Llama3.1:latest chat showing "Summarize the content at https://en.wikipedia.org/wiki/Web_scraping" and assistant fetching Wikipedia text in dark UI

Keep in mind that very long pages will be truncated at MAX_CHARS (8,000 by default). The model only sees what the tool returns, so if a page is much longer than that, the response will reflect only the first portion.

For heavily protected targets, such as major eCommerce platforms, paywalled news sites, or pages with aggressive bot detection, standard web scraping approaches may not work. We recommend using Site Unblocker as an escalation layer for sites that block conventional scraping requests.

It handles advanced anti-bot challenges through techniques such as browser fingerprint emulation, TLS and header normalization, proxy rotation, and full browser rendering. The integration pattern remains the same as end-to-end AI workflows built with LangChain and the Web Scraping API: same API architecture, different retrieval layer.

Troubleshooting and best practices

Below are some common issues you may encounter while building a tool like this, along with practical fixes and best practices to follow throughout the development process.

Issue #1: The tool doesn't fire

The model may not support function calling, or the docstring isn't clear enough. To fix this:

Check that you're using an instruction-tuned model (e.g., Ollama)
Try a more explicit prompt: "Use the fetch_page tool to get [URL]."
If it still doesn't fire, re-read the docstring and make the "when to call this" language more direct

Issue #2: HTTP 401 error

Either the credentials are wrong, or the Valve configuration wasn't saved correctly.

Double-check the username and password in the Valves panel
Verify API access with the following command (replace the placeholders with your actual credentials):

curl -X POST "https://scraper-api.decodo.com/v2/scrape" \
  -u "YOUR_DECODO_USERNAME:YOUR_DECODO_PASSWORD" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "target": "universal"}'

A successful response returns a JSON object with a results array.

Issue #3: HTTP 402 or 403 error

Your Decodo account has no remaining credits, or your subscription doesn't cover the requested target. Check your dashboard.

Issue #4: Empty content returned or response is truncated

If the tool returns little or no content, the target page may rely heavily on JavaScript rendering. Our tool already sends "headless": "html" to enable browser-based rendering, which resolves this issue for many modern websites.

If content is still empty on specific targets, the site may be protected by advanced anti-bot systems that block standard web scraping requests. In those cases, switch to our Site Unblocker for more advanced browser emulation and request handling.

Truncated responses are normal for very large pages. Increase MAX_CHARS in the Valves panel to allow more content through. For models with a 32K+ context window, values of 20K characters or higher are generally safe, depending on how much additional conversation context the model also needs to process.

Issue #5: The tool saved with a syntax error

Open WebUI shows a badge on the tool if the Python is invalid. Copy the code into a local Python file and run python web_scraper_tool.py to identify and resolve the error before re-uploading.

Best practices

Always verify that the tool works and returns content by testing it against a known-good URL, such as a simple Wikipedia article or Hacker News thread, before moving on to heavily protected or complex targets.
Keep REQUEST_TIMEOUT between 10 and 40 seconds. Long-running requests can stall or freeze the chat session.
Return extracted content as plain text rather than raw HTML. Models process clean text far more reliably, while HTML tags add unnecessary noise. The Decodo API’s content field already strips most markup, but it’s still worth checking the output if the model appears confused or produces inconsistent responses.
Keep tools single-purpose. A tool that only fetches URLs is easier to debug, monitor, and scale than one that tries to search, fetch, parse, and filter content all at once.

Customizing and extending your scraping tools

We intentionally made the fetch_page tool minimal for demonstration purposes. Follow the patterns below to extend it and build new tools that serve more specific use cases. These are structural starting points, not complete implementations.

1. Search-then-fetch

Add a second method to the Tools class that accepts a search query, sends it to the our Web Scraping API, extracts the top result URLs from the response, and then calls the equivalent of fetch_page on the first one. This gives your model the ability to search the web by topic rather than just fetching content from a given URL.

async def search_and_fetch(self, query: str, __event_emitter__=None) -> str:
    """
    Searches the web for the given query and returns the content of the top result.
    Use this when the user asks about a topic but doesn't provide a specific URL.

    Args:
        query: The search query string.

    Returns:
        The text content of the top search result page.
    """
    # Step 1: POST query to Decodo SERP API endpoint
    # Step 2: Parse results[0]["url"] from response
    # Step 3: Call fetch_page logic with that URL
    # Step 4: Return content

2. Structured data extraction

Instead of returning raw page content, parse the content string for specific fields and return a JSON string. This approach is useful when the model needs to compare data across multiple pages – for example, extracting price, product name, and availability from several eCommerce URLs.

import json
import re

# After getting content from the API:
price_match = re.search(r'\$[\d,]+\.?\d*', content)
price = price_match.group(0) if price_match else "not found"
return json.dumps({"url": url, "price": price})

3. Multi-URL summarization

Add a method that accepts multiple URLs, fetches each page in sequence, and returns a combined summary. This is useful for comparative research workflows, such as analyzing and summarizing multiple competitor pages in a single request.

async def fetch_multiple(self, urls: list[str], __event_emitter__=None) -> str:
    """
    Fetches multiple URLs and returns their combined content.
    Use when the user asks to compare or summarize several pages at once.

    Args:
        urls: A list of URLs to fetch.
    """
    # Fetch each URL, concatenate results with URL headers as separators

4. Domain filtering

Add an ALLOWED_DOMAINS list in Valve as a comma-separated string of approved domains. Before making any API request, validate that the requested URL’s domain matches one in the allowlist. And if it doesn’t, return an error string and abort the request. This is useful in shared or enterprise Open WebUI deployments where unrestricted URL fetching could pose a security risk.

ALLOWED_DOMAINS: str = Field(
    default="",
    description="Comma-separated list of allowed domains. Leave empty to allow all.",
)

# In fetch_page, before the API call:
if self.valves.ALLOWED_DOMAINS:
    allowed = [d.strip() for d in self.valves.ALLOWED_DOMAINS.split(",")]
    from urllib.parse import urlparse
    domain = urlparse(url).netloc
    if not any(domain.endswith(a) for a in allowed):
        return f"Error: {domain} is not in the allowed domains list."

Connecting to broader AI tooling

Open WebUI Tools work well as standalone extensions, but the same web access pattern composes into larger AI workflows.

If you need the model to fetch data on a schedule or route between multiple tools automatically, integrate n8n with Decodo to build real-time, automated data pipelines. If you're building agents with Python rather than Open WebUI, LangChain supports direct integration with our Web Scraping API using the same fetch-and-return pattern.

Furthermore, if you’re using Claude Desktop or other MCP-compatible clients, the Decodo MCP Server provides an alternative integration path that doesn't require Open WebUI. See the step-by-step MCP setup guide and top 10 MCPs for AI workflows in 2026 to understand where MCP fits in the broader ecosystem.

Final thoughts

Open WebUI’s Tools system provides a clean, model-agnostic way to extend local LLMs with live web data, without relying on a cloud provider, opaque retrieval layers, or prompt-based workarounds.

At its core is a Python function that sends a URL to the Decodo Web Scraping API and returns clean, usable text. That same pattern scales from a single-user local setup to a multi-user Open WebUI deployment without any structural changes.

From here, the next step is adding a search-then-fetch tool so the model can discover pages from queries instead of relying on predefined URLs. You can also explore n8n for scheduled web retrieval and monitor the MCP ecosystem if cross-client portability matters.

Web Scraping API for all your AI use cases

Integrate 125M+ IPs to feed live web data to your AI chatbot. AI-ready outputs and 99.99% success rates with auto-retry logic.

Get free plan

About the author

Justinas Tamasevicius

Director of Engineering

Justinas Tamaševičius is Director of Engineering with over two decades of expertise in software development. What started as a self-taught passion during his school years has evolved into a distinguished career spanning backend engineering, system architecture, and infrastructure development.

Connect with Justinas via LinkedIn.

All information on Decodo Blog is provided on an as is basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may belinked therein.

In this article

Get Web Scraping API

Power your AI models with live web data through 125M+ IPs around the globe.

Try it now

Frequently asked questions

How do I create my own tracker?

The basic pattern is to fetch the page, store the last-seen value in a Valve, local file, or external database, then compare the latest response on each subsequent call. If changes are detected, return a diff or summary describing what changed.

What is the best rank tracker?

The best rank tracker really depends on how much it meets your needs. For tracking rankings specifically, a purpose-built rank tracker API is the more appropriate solution. A custom Open WebUI tool can retrieve a page’s title, meta description, or content snapshot via the Web Scraping API. However, it’s not a replacement for a dedicated rank tracking system.

Can I do SEO by myself?

Yes, and a local LLM with web access can help more than you'd expect. With a custom-built tool, you can point it at competitor pages, have it summarize content, check whether documentation is up-to-date, or analyze live SERP results directly. None of this requires a paid SEO platform. It won’t replace a full SEO suite, but for lightweight research and content auditing, it’s a practical and cost-free alternative.

How do I create a rank system?

A basic ranking system in Open WebUI Tools can be built around a simple pattern: a tool method that fetches a target URL, extracts structured data (such as SERP position, content snapshot, and publish date), and returns it as a JSON string for logging to a file or external system.

For full-scale rank tracking with historical trends and alerts, a dedicated rank tracking platform is a better foundation. The Open WebUI tool layer then becomes a lightweight complement for ad hoc checks and on-demand validation.

Instruction cards: buildSearchInstructions, buildAnalysisInstructions, buildReportFormat on dark dotted gradient background

DATA COLLECTION

End-to-End AI Workflows with LangChain and Web Scraping API

AI has evolved from programs that just follow rules to systems that can learn and make decisions. Businesses that understand this shift can leverage AI to tackle complex challenges, moving beyond simple task automation. In this guide, we'll walk you through how to connect modern AI tools with live web data to create an automated system that achieves a specific goal. This will give you a solid foundation for building even more sophisticated autonomous applications.

Vytautas Savickas

Last updated: Oct 22, 2025

11 min read

BIG DATA

DATA COLLECTION

How to Set Up MCP Server: Step-by-Step Guide

Over the past year, the Model Context Protocol (MCP) has gone from a niche idea to a go-to standard for integrating LLM agents with real-world tools and data. This setup lets agents deliver smarter, context-aware responses and handle complex workflows on their own. In this guide, you'll learn how to set up the Decodo MCP server with tools like Cursor, VS Code, and Claude Desktop and supercharge your web scraping operations.

Mykolas Juodis

Last updated: Aug 04, 2025

7 min read

BIG DATA

Top 10 MCPs for AI Workflows in 2026

MCP has shifted from niche adoption to widespread use, with major platforms like OpenAI, Microsoft, and Google supporting it natively. Public directories now feature thousands of MCP servers from community developers and vendors, covering everything from developer tools to business solutions.

In this guide, you'll learn what MCP is and why it matters for real-world AI agents, which 10 MCP servers are currently most useful, and how to safely choose and combine MCPs for your setup.

Mykolas Juodis

Last updated: Jan 13, 2026

9 min read

Open WebUI tools: how to give your local LLM real-time internet access with a scraping API

TL;DR

Why real-time web access changes what a local LLM can do

Prerequisites and system overview

Open WebUI's built-in web search vs. custom tools

How the Open WebUI Tools system works

Building and installing the web scraping tool

Setting up your environment

Step 1: Install Ollama

Step 2: Install Docker Desktop

Step 3: Run Open WebUI

Step 4: Connect Ollama

Writing the tool

Installing the tool in Open WebUI

Setting credentials via Valves

Enabling the tool for a chat session

Using the web scraping tool in chat

Troubleshooting and best practices

Issue #1: The tool doesn't fire

Issue #2: HTTP 401 error

Issue #3: HTTP 402 or 403 error

Issue #4: Empty content returned or response is truncated

Issue #5: The tool saved with a syntax error

Best practices

Customizing and extending your scraping tools

1. Search-then-fetch

2. Structured data extraction

3. Multi-URL summarization

4. Domain filtering

Connecting to broader AI tooling

Final thoughts

Frequently asked questions

How do I create my own tracker?

What is the best rank tracker?

Can I do SEO by myself?

How do I create a rank system?

Related articles