
The End of “Free” Public Data? How AI Is Challenging The Industry

Cloudflare customers now send more than one billion HTTP 402 "Payment Required" responses on an average day. That figure, mentioned in passing in a recent Cloudflare blog post, signals a real shift. A 20-year convention around free public data is being repriced through pay-per-crawl and AI bot management. And businesses that depend on fresh public data now need to figure out how to react and how to do it fast.


The deal that's quietly ending

"Free" public data was never built into how the web works. It was a side effect of three things that lined up for over two decades, and all three are now under stress.

The first was cheap bandwidth. The second was a give-and-take with search engines, where crawlers scanned your content and sent real users back in return. The third was a quiet assumption that bots were a small slice of total traffic. That assumption no longer holds: according to HUMAN Security's report, bots now account for around 51 to 52% of global web traffic. AI-driven traffic alone grew 187% throughout 2025.

The give-and-take is the part that has broken hardest. Akamai's analysis and other industry data suggest AI chatbot referrals send roughly 96% less traffic than traditional search. The trade is no longer balanced. Sites carry the bandwidth and infrastructure cost of being crawled without the human visits that used to make it worth it.

What Cloudflare’s pay-per-crawl actually does

Most of the coverage misses what Cloudflare actually shipped, so a brief clarification is due.

Pay-per-crawl is a private-beta feature inside Cloudflare’s AI Crawl Control. Publishers can set a flat per-request price, and an AI crawler either presents payment intent and gets a successful response, or receives an HTTP 402 with pricing attached and walks away. Cloudflare acts as the merchant of record. Crawler identity is verified through Web Bot Auth, which closes the obvious loophole of spoofing a user agent to access paid content for free.
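The crawler's side of that exchange can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not Cloudflare's client logic: the header name `crawler-price`, the `USD 0.01` value format, and the function names are all hypothetical placeholders, so check the current Cloudflare documentation before relying on any of them.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional

# Assumption: the 402 response quotes its price in a header shaped like
# "USD 0.01". Both the header name and the format are illustrative.
PRICE_HEADER = "crawler-price"


def parse_price(headers: Mapping[str, str]) -> Optional[Decimal]:
    """Return the quoted per-request price in USD, or None if missing or malformed."""
    raw = headers.get(PRICE_HEADER)
    if raw is None:
        return None
    parts = raw.split()
    if len(parts) != 2 or parts[0].upper() != "USD":
        return None
    try:
        value = Decimal(parts[1])
    except InvalidOperation:
        return None
    # Reject NaN, infinities, and negative prices.
    return value if value.is_finite() and value >= 0 else None


def next_step(status: int, headers: Mapping[str, str], max_price: Decimal) -> str:
    """Decide what the crawler does next: 'use', 'pay', or 'walk_away'."""
    if 200 <= status < 300:
        return "use"  # content was served
    if status == 402:
        price = parse_price(headers)
        if price is not None and price <= max_price:
            return "pay"  # retry with payment intent at the quoted price
    return "walk_away"
```

Using `Decimal` rather than floats keeps per-request prices exact when they are later summed into a budget.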

Stack Overflow is among the major brands that adopted pay-per-crawl early. Sky News, Quora, Raptive, and Webflow have publicly endorsed the permission-based framing. And there’s no doubt the list of paying participants on the AI side will continue to grow throughout 2026.

Despite what headlines suggest, there are three things that pay-per-crawl doesn’t do:

  • It doesn’t affect humans, browsers, or normal site visitors.
  • It doesn’t replace direct licensing deals, which large publishers will continue to negotiate separately for premium content.
  • It doesn’t yet cover the long tail of sites that aren’t on Cloudflare or haven’t opted in.

The honest read is that pay-per-crawl is currently small in volume and large in signal. It normalizes a separation that almost no one was making explicitly two years ago, between human access and machine access to the same content. That separation will spread well beyond Cloudflare’s customer base and well beyond AI training crawlers.

Agent traffic is breaking the old bot vs. human distinction

The playbook for blocking bots is starting to block paying customers, too. Most site owners haven't caught up to why.

While crawlers read the web, agents act directly in it. Tools like ChatGPT Atlas, Perplexity Comet, and Claude for Chrome click links, fill out forms, log in, and complete checkouts. HUMAN Security’s data shows agent traffic grew 7,851% year over year. Of that activity, 77% landed on product and search pages, around 9% on account pages, 5% on login pages, and 2% on checkout pages.

Fast clicks and automatic form filling used to be a clear sign of an attack. Now it's just as likely to be a paying customer's AI agent doing the work for them. Site owners can no longer block every kind of automation by default. A real chunk of revenue this year is already arriving through agents acting on behalf of real people. Bot management has to start understanding who is behind the traffic and what they’re trying to do, not just flag suspicious patterns.

Anyone collecting public data has been pushed into one of two camps over the past 18 months. Slow, properly identified collection now looks legitimate, while anonymous, high-volume scraping looks worse than it ever has. The middle ground that most data work used to live in is shrinking from both directions.

Bot management is becoming a pricing layer

Bot management used to sit on the cost side with a single function – filter out unwanted traffic. Now it operates more like a commercial control system.

Allow, charge, or block, paired with a customizable HTTP 402 response, is now a configurable option in Cloudflare’s AI Crawl Control. Akamai, HUMAN Security, DataDome, and Imperva are all building similar capabilities. The point where a site defines its machine-access policy is shifting from static terms-of-service pages to programmatic, per-bot decisions with explicit pricing.
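On the site side, a per-bot allow, charge, or block decision can be sketched as a lookup table plus one function. The bot names, prices, response header names, and the choice to answer unverified or unknown bots with 403 are all assumptions made for illustration; this is not how any vendor's product is configured.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Action(Enum):
    ALLOW = "allow"
    CHARGE = "charge"
    BLOCK = "block"


@dataclass(frozen=True)
class BotPolicy:
    action: Action
    price_usd: Optional[str] = None  # only meaningful when action is CHARGE


# Hypothetical bot identities and prices, for illustration only.
POLICIES: Dict[str, BotPolicy] = {
    "search-crawler": BotPolicy(Action.ALLOW),
    "ai-training-crawler": BotPolicy(Action.CHARGE, price_usd="0.01"),
    "bulk-scraper": BotPolicy(Action.BLOCK),
}
DEFAULT_POLICY = BotPolicy(Action.BLOCK)  # assumption: unknown bots are blocked


def respond(bot_id: str, verified: bool, has_payment_intent: bool) -> Tuple[int, Dict[str, str]]:
    """Map a bot identity to an HTTP status code and response headers."""
    if not verified:
        return 403, {}  # assumption: unverifiable identity is refused outright
    policy = POLICIES.get(bot_id, DEFAULT_POLICY)
    if policy.action is Action.ALLOW:
        return 200, {}
    if policy.action is Action.CHARGE:
        price = policy.price_usd or "0"
        if has_payment_intent:
            return 200, {"x-charged-usd": price}  # hypothetical header name
        return 402, {"x-price-usd": price}  # hypothetical header name
    return 403, {}
```

Keeping the policy as data rather than branching logic means the pricing decision can change without touching the request path.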

For businesses that rely on data, this expands the negotiation surface. The key question is no longer “does this site allow scraping?” Instead, it breaks into several variables – which bots are allowed, which incur charges, which are blocked, what identity requirements apply, and what the per-request cost is. That makes five separate procurement variables that were not formalized when you planned your company's spend a year ago.

What real-time data businesses should do in 2026

The center of gravity for organizations that rely on real-time data has moved into procurement. Web access is becoming a managed input with prices, identities, and contracts. The vocabulary already exists in finance teams that handle cloud, electricity, and bandwidth budgets. After years in the data infrastructure business, we see that the teams thriving in this transition share a single feature – procurement discipline arrives before the technical work.

The crawl-to-referral ratios explain why publishers welcomed pay-per-crawl. According to Cloudflare's own data, OpenAI's crawler hit publisher sites roughly 1.7K times per referral it sent back. Anthropic's ratio was around 73K-to-one, and Google's was 14-to-one. Stack Overflow, Condé Nast, TIME, and the Associated Press all aligned with Cloudflare's pricing infrastructure within months once those numbers became public.
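To make those ratios easier to compare, the short sketch below converts each one into referrals returned per million crawler requests. The inputs are the rounded figures quoted above, so the results are approximate; the function name is our own.

```python
# Crawler requests per referral, using the rounded figures quoted in the text.
RATIOS = {"OpenAI": 1_700, "Anthropic": 73_000, "Google": 14}


def referrals_per_million_crawls(crawls_per_referral: int) -> float:
    """Invert a crawl-to-referral ratio into referrals per 1,000,000 crawls."""
    if crawls_per_referral <= 0:
        raise ValueError("ratio must be positive")
    return 1_000_000 / crawls_per_referral


for name, ratio in RATIOS.items():
    print(f"{name}: {referrals_per_million_crawls(ratio):,.1f} referrals per million crawls")
```

On these inputs, a million crawler requests return roughly 588 visits at the OpenAI ratio, about 14 at the Anthropic ratio, and about 71,429 at the Google ratio.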

Four operational habits sit underneath all of that:

  • Track cost per usable update. Retries, blocks, and downstream processing belong in the unit-economics number; without them, comparing scrape costs against paid access becomes meaningless.
  • Separate discovery from refresh. The fields that actually move, prices and availability, need high-frequency collection. Stable attributes can be licensed or refreshed weekly.
  • Choose cost-efficient solutions. If the data is needed from websites that have dynamic rendering, tools like Decodo’s Web Scraping API enable JavaScript rendering to capture content that loads after the initial page request. Users can also pick between premium and regular proxies depending on the target's anti-bot defenses, set custom headers, choose specific geo-locations, and adjust session settings to match the scraping job. This flexibility means teams only pay for the features a particular project actually requires, rather than overspending on capabilities they won't use.
  • Centralize data access behind a shared retrieval layer. As public web data tightens, scraping becomes less about one-off scripts and more about controlled access pipelines, backed by proxies, automated unblocking, and standardized request logic. MCP servers and other AI integrations help streamline web data extraction and retrieval, and they improve reliability by closing the gap between fragmented web sources and centralized AI workflows.
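The first two habits can be sketched in a few lines. This is a minimal illustration: the field names, the sample numbers in the usage note, and the one-hour interval for volatile fields are assumptions, while the weekly interval for stable attributes follows the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionRun:
    requests_sent: int        # every request, including retries and blocked attempts
    cost_per_request: float   # proxy, API, or per-crawl fee in USD
    processing_cost: float    # downstream parsing and cleaning in USD
    usable_updates: int       # records that changed and passed validation


def cost_per_usable_update(run: CollectionRun) -> float:
    """Total spend divided by the updates that were actually usable."""
    if run.usable_updates <= 0:
        raise ValueError("no usable updates; cost per update is undefined")
    total = run.requests_sent * run.cost_per_request + run.processing_cost
    return total / run.usable_updates


# Fields that move often, per the text. The hourly cadence is an assumption.
VOLATILE_FIELDS = {"price", "availability"}


def refresh_interval_hours(field: str) -> int:
    """Hourly for volatile fields, weekly for stable attributes."""
    return 1 if field in VOLATILE_FIELDS else 24 * 7
```

For example, `CollectionRun(10_000, 0.001, 5.0, 2_000)` costs 15 USD in total, which works out to 0.0075 USD per usable update. The same figure computed for a paid-access source gives a like-for-like comparison.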


Bottom line

The web is putting public data on terms. Pricing, identity verification, and access rules are replacing the old assumption that anyone could grab anything anonymously. Plumbing and procurement work done this year determines how well businesses operate inside the new reality.

Most of the companies losing ground here share a single mistake. They treated "free" as if it meant "permanent." A healthier posture looks more like how a serious finance team handles cloud spend, electricity, and bandwidth. Machine access becomes something to budget, govern, and negotiate, the same as any other operational input.

The question mark in the title is doing real work. Public data is maturing into a market. Once that lands on procurement reviews and vendor scorecards inside your organization, the planning conversation gets a lot more useful.

About the author

Vaidotas Juknys

CEO

Vaidotas Juknys is a commercial leader with 10+ years across technology, telecommunications, and management consulting. His analytical mindset and drive to make public data more accessible have shaped a career that culminated in his role as Decodo's CEO.

In between business strategies, Vaidotas is a self-confessed sci-fi fan with a shelf full of novels and comic books to prove it.


All information on Decodo Blog is provided on an as is basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

What is Cloudflare’s pay-per-crawl?

It is a private-beta feature inside Cloudflare’s AI Crawl Control that lets publishers charge AI crawlers a flat per-request price for accessing their content. When a crawler hits a paywalled URL, it either presents payment intent and receives a normal response or receives an HTTP 402 “Payment Required” response. Cloudflare verifies bot identity through Web Bot Auth and acts as the merchant of record.

Will pay-per-crawl end web scraping?

No, but it will narrow the gray market. Most sites are not on Cloudflare or have not opted in. Pay-per-crawl is currently small in volume. Its real impact is structural. It normalizes the idea that machine access has a price separate from human access. That idea will continue to spread across CDN providers and bot-management vendors throughout 2026 and beyond.

How should businesses prepare for AI agent traffic?

Treat agent traffic as a customer channel rather than a threat to block. A growing share of product discovery, account management, and even checkout activity will be initiated by agents acting on behalf of real users. That means bot management has to become intent-aware, not only signature-aware. Exposing structured, authenticated, rate-limited endpoints that agents can use cleanly is both a security measure and a growth channel.
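One common way to rate-limit an endpoint per authenticated agent is a token bucket. The sketch below is illustrative only: the class names, the capacity and rate values in the usage note, and the idea of keying buckets by an agent ID string are assumptions, and a production service would typically keep this state in a shared store rather than in process memory.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AgentLimiter:
    """Keep one bucket per authenticated agent identity."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, agent_id: str) -> bool:
        if agent_id not in self.buckets:
            self.buckets[agent_id] = TokenBucket(self.capacity, self.rate, self.clock)
        return self.buckets[agent_id].allow()
```

With `AgentLimiter(capacity=2, rate=1.0)`, each agent can make two requests immediately and then one more per second, and one agent exhausting its bucket does not affect another. The injectable `clock` exists so the behavior can be tested without sleeping.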

Is real-time public data still viable as a business input in 2026?

Yes, with discipline. The companies that succeed will be the ones that classify their data sources, sign their bots, budget for machine access, and build event-driven pipelines that fetch what changes rather than scraping everything indiscriminately. The era of treating public data as both free and frictionless is closing. The era of treating it as a managed, priced procurement input is beginning.


© 2018-2026 decodo.com (formerly smartproxy.com). All Rights Reserved