The End of “Free” Public Data? How AI Is Challenging The Industry
Cloudflare customers now send more than one billion HTTP 402 "Payment Required" responses on an average day. That figure, mentioned in passing in a recent Cloudflare blog post, signals a real shift: a 20-year convention of free public data is being repriced through pay-per-crawl and AI bot management. Businesses that depend on fresh public data now need to work out how to respond, and quickly.
Vaidotas Juknys
Last updated: May 19, 2026
3 min read

The deal that's quietly ending
"Free" public data was never built into how the web works. It was a side effect of three things that lined up for over two decades, and all three are now under stress.
The first was cheap bandwidth. The second was a give-and-take with search engines: crawlers scanned your content and sent real users back in return. The third was a quiet assumption that bots were a small slice of total traffic. That assumption no longer holds. According to a HUMAN Security report, bots now account for around 51 to 52% of global web traffic, and AI-driven traffic alone grew 187% during 2025.
The give-and-take is the part that has broken down most. Akamai's analysis and other industry data suggest that AI chatbots send roughly 96% less referral traffic than traditional search. The trade is no longer balanced: sites carry the bandwidth and infrastructure cost of being crawled without the human visits that used to make it worthwhile.
What Cloudflare’s pay-per-crawl actually does
Most of the coverage misses what Cloudflare actually shipped, so a brief clarification is due.
Pay-per-crawl is a private-beta feature inside Cloudflare’s AI Crawl Control. Publishers set a flat per-request price. An AI crawler that presents payment intent gets a successful response; one that doesn't receives an HTTP 402 with the price attached and can either retry with payment or walk away. Cloudflare acts as the merchant of record. Crawler identity is verified through Web Bot Auth, which closes the obvious loophole of spoofing a user agent to access paid content for free.
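From the crawler's side, a 402 comes down to one decision: is the quoted price worth paying? The sketch below is a minimal, hypothetical illustration of that decision, not Cloudflare's client code. It assumes the `crawler-price` and `crawler-exact-price` header names and the `USD 0.01` value format from Cloudflare's launch description of pay-per-crawl, which may have changed since; `decide`, `Decision`, and the `max_price` budget are invented names. It also leaves out the Web Bot Auth request signing a real crawler would need.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional

# ASSUMED header names, modeled on Cloudflare's launch description of
# pay-per-crawl. Check current documentation before relying on them.
PRICE_HEADER = "crawler-price"              # quoted by the site on a 402
EXACT_PRICE_HEADER = "crawler-exact-price"  # echoed by the crawler to accept it


@dataclass
class Decision:
    action: str                          # "use", "retry_with_payment", or "skip"
    retry_headers: Optional[dict] = None


def parse_price(value: str) -> Decimal:
    """Parse a value like 'USD 0.01' (assumed format) into a Decimal amount."""
    currency, amount = value.split()
    if currency != "USD":
        raise ValueError(f"unsupported currency: {currency}")
    return Decimal(amount)


def decide(status: int, headers: Mapping[str, str], max_price: Decimal) -> Decision:
    """Decide what a crawler does with one response from a pay-per-crawl site."""
    if status == 200:
        return Decision("use")
    if status == 402:
        # HTTP header names are case-insensitive, so normalize before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        quoted = lowered.get(PRICE_HEADER)
        if quoted is not None and parse_price(quoted) <= max_price:
            # Accept the quoted price by sending it back on the retry.
            return Decision("retry_with_payment", {EXACT_PRICE_HEADER: quoted})
    # Any other status, a missing quote, or a price over budget: walk away.
    return Decision("skip")


print(decide(402, {"Crawler-Price": "USD 0.01"}, max_price=Decimal("0.02")))
```

Keeping the budget check in one pure function makes the "pay or walk away" policy easy to audit separately from the HTTP client that sends the retry.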
Stack Overflow is among the major brands that adopted this payment gateway early. Sky News, Quora, Raptive, and Webflow have publicly endorsed the permission-based framing. The list of paying participants on the AI side is likely to keep growing through 2026.
Despite what the headlines suggest, there are three things pay-per-crawl doesn’t do:
- It doesn’t affect humans, browsers, or normal site visitors.
- It doesn’t replace direct licensing deals, which large publishers will continue to negotiate separately for premium content.
- It doesn’t yet cover the long tail of sites that aren’t on Cloudflare or haven’t opted in.
The honest read is that pay-per-crawl is currently small in volume and large in signal. It normalizes a separation that almost no one was making explicitly two years ago, between human access and machine access to the same content. That separation will spread well beyond Cloudflare’s customer base and well beyond AI training crawlers.
Agent traffic is breaking the old bot vs. human distinction
The playbook for blocking bots is starting to block paying customers, too. Most site owners haven't caught up to why.
While crawlers read the web, agents act directly in it. Tools like ChatGPT Atlas, Perplexity Comet, and Claude for Chrome click links, fill out forms, log in, and complete checkouts. HUMAN Security’s data shows agent traffic grew 7,851% year over year. Of that activity, 77% landed on product and search pages, around 9% on account pages, 5% on login pages, and 2% on checkout pages.
Fast clicks and automatic form filling used to be a clear sign of an attack. Now they are just as likely to come from a paying customer's AI agent doing the work for them. Site owners can no longer block every kind of automation by default, because a meaningful share of this year's revenue is already arriving through agents acting on behalf of real people. Bot management has to understand who is behind the traffic and what they’re trying to do, not just flag suspicious patterns.
Anyone collecting public data has been pushed into one of two camps over the past 18 months. Slow, properly identified collection now looks legitimate, while anonymous, high-volume scraping looks worse than it ever has. The middle ground that most data work used to live in is shrinking from both directions.
Bot management is becoming a pricing layer
Bot management used to sit on the cost side with a single function: filtering out unwanted traffic. Now it operates more like a commercial control system.
Cloudflare’s AI Crawl Control now offers allow, charge, or block as configurable options, paired with a customizable HTTP 402 response. Akamai, HUMAN Security, DataDome, and Imperva are all building similar capabilities. The place where a site defines its machine-access policy is shifting from static terms-of-service pages to programmatic, per-bot decisions with explicit pricing.
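On the site side, a programmatic, per-bot decision with explicit pricing boils down to a lookup from a verified bot identity to an action. The sketch below is a hypothetical illustration, not the configuration format of Cloudflare or any other vendor: the bot names, the price, the `verified` flag (a stand-in for a real identity check such as Web Bot Auth), and the reuse of a `crawler-price` header are all assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    CHARGE = "charge"
    BLOCK = "block"


@dataclass(frozen=True)
class Rule:
    action: Action
    price_usd: Optional[Decimal] = None  # only used for CHARGE
    require_verified: bool = True        # refuse unproven identity claims


# HYPOTHETICAL policy table: bot names and prices are invented.
POLICY = {
    "search-bot": Rule(Action.ALLOW),
    "ai-training-bot": Rule(Action.CHARGE, Decimal("0.01")),
}
DEFAULT_RULE = Rule(Action.BLOCK)        # anything not listed is blocked


def respond(bot_name: str, verified: bool, paid: bool) -> tuple[int, dict]:
    """Return (HTTP status, extra headers) for one request from a declared bot."""
    rule = POLICY.get(bot_name, DEFAULT_RULE)
    if rule.require_verified and not verified:
        return 403, {}  # identity claim not proven: treat it as spoofed
    if rule.action is Action.ALLOW:
        return 200, {}
    if rule.action is Action.CHARGE:
        if paid:
            return 200, {}
        # Quote the price so the crawler can decide whether to pay.
        return 402, {"crawler-price": f"USD {rule.price_usd}"}
    return 403, {}


print(respond("ai-training-bot", verified=True, paid=False))
```

Checking identity before consulting the table is what stops a spoofed user agent from inheriting another bot's access or price.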
For businesses that rely on data, this expands the negotiation surface. The key question is no longer “does this site allow scraping?” It breaks into several variables: which bots are allowed, which incur charges, which are blocked, what identity requirements apply, and what each request costs. That is five separate procurement variables, none of which were formalized when you planned your company’s spend a year ago.
What real-time data businesses should do in 2026
The center of gravity for organizations that rely on real-time data has moved into procurement. Web access is becoming a managed input with prices, identities, and contracts. The vocabulary already exists in finance teams that manage cloud, electricity, and bandwidth budgets. After years in the data infrastructure business, we see one trait shared by the teams thriving in this transition: procurement discipline comes before the technical work.
The crawl-to-referral ratios explain why publishers welcomed Cloudflare's pay-per-crawl. According to Cloudflare's own data, OpenAI's crawler hit publisher sites roughly 1,700 times per referral it sent back. Anthropic's ratio was around 73,000 to one, and Google's about 14 to one. Stack Overflow, Condé Nast, TIME, and the Associated Press all aligned with Cloudflare's pricing infrastructure within months of those numbers becoming public.
Four operational habits sit underneath all of this:
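Turning the ratios into visits makes the imbalance easier to see. This is plain arithmetic on the approximate figures cited above; the one-million-crawl volume is an arbitrary illustration, and `referrals_per` is a hypothetical helper.

```python
# Approximate crawl-to-referral ratios cited in the text (Cloudflare data).
CRAWLS_PER_REFERRAL = {"OpenAI": 1_700, "Anthropic": 73_000, "Google": 14}


def referrals_per(crawls: int, crawls_per_referral: int) -> float:
    """Human visits sent back for a given number of crawled pages."""
    return crawls / crawls_per_referral


# One million crawled pages is an arbitrary volume chosen for illustration.
for name, ratio in CRAWLS_PER_REFERRAL.items():
    visits = referrals_per(1_000_000, ratio)
    print(f"{name}: about {visits:,.0f} visits per million crawls")
```

Put that way, a million crawls from Google's crawler return roughly 71,000 visits, while the same volume from Anthropic's returns about 14.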
- Track cost per usable update. Retries, blocks, and downstream processing belong in the unit-economics number; otherwise, comparing scraping costs against paid access is meaningless.
- Separate discovery from refresh. The fields that actually move, such as prices and availability, need high-frequency collection. Stable attributes can be licensed or refreshed weekly.
- Choose cost-efficient solutions. If you need data from websites that render content dynamically, tools like Decodo’s Web Scraping API offer JavaScript rendering to capture content that loads after the initial page request. Users can also choose between premium and regular proxies depending on the target's anti-bot defenses, set custom headers, pick specific geo-locations, and adjust session settings to match the scraping job. This flexibility means teams pay only for the features a particular project requires, rather than overspending on capabilities they won't use.
- Centralize data access behind a shared retrieval layer. As public web data tightens, scraping becomes less about one-off scripts and more about controlled access pipelines backed by proxies, automated unblocking, and standardized request logic. MCP servers and other AI integrations help streamline web data extraction and retrieval, improving reliability by connecting fragmented web sources to centralized AI workflows.
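The first habit, cost per usable update, is simple arithmetic once the counters exist. The sketch below is a minimal illustration, assuming a job can count every request it sent (retries and blocked attempts included), its per-request price, its downstream processing cost, and the records that actually changed and passed validation. `CollectionRun` and `cost_per_usable_update` are hypothetical names, and the figures are invented to show how a cheaper per-request price can still lose once retries are counted.

```python
from dataclasses import dataclass


@dataclass
class CollectionRun:
    """Hypothetical counters for one collection job over a billing period."""
    requests_sent: int        # all requests, including retries and blocked attempts
    price_per_request: float  # proxy/API fee or pay-per-crawl price, in USD
    processing_cost: float    # parsing, validation, and storage, in USD
    usable_updates: int       # records that changed and passed validation


def cost_per_usable_update(run: CollectionRun) -> float:
    """Total spend divided by the updates that were actually worth having."""
    if run.usable_updates == 0:
        return float("inf")  # money spent with nothing usable to show for it
    total = run.requests_sent * run.price_per_request + run.processing_cost
    return total / run.usable_updates


# Invented numbers: scraping looks cheaper per request, but retries and
# cleanup push its cost per usable update above the paid route.
scraped = CollectionRun(requests_sent=120_000, price_per_request=0.001,
                        processing_cost=30.0, usable_updates=50_000)
paid = CollectionRun(requests_sent=50_000, price_per_request=0.002,
                     processing_cost=10.0, usable_updates=50_000)

for label, run in (("scraped", scraped), ("paid", paid)):
    print(f"{label}: ${cost_per_usable_update(run):.4f} per usable update")
```

With these made-up inputs, the scraped route works out to $0.0030 per usable update against $0.0022 for paid access, even though its per-request price is half as high.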
Bottom line
The web is putting public data on commercial terms. Pricing, identity verification, and access rules are replacing the old assumption that anyone could grab anything anonymously. The plumbing and procurement work done this year will determine how well businesses operate in the new reality.
Most of the companies losing ground here share a single mistake. They treated "free" as if it meant "permanent." A healthier posture looks more like how a serious finance team handles cloud spend, electricity, and bandwidth. Machine access becomes something to budget, govern, and negotiate, the same as any other operational input.
The question mark in the title is deliberate: public data isn't disappearing, it's maturing into a market. Once that shift shows up in procurement reviews and vendor scorecards inside your organization, the planning conversation gets a lot more useful.
About the author

Vaidotas Juknys
CEO
Vaidotas Juknys is a commercial leader with 10+ years across technology, telecommunications, and management consulting. His analytical mindset and drive to make public data more accessible have shaped a career that culminated in his role as Decodo's CEO.
In between business strategies, Vaidotas is a self-confessed sci-fi fan with a shelf full of novels and comic books to prove it.
Connect with Vaidotas via LinkedIn.
All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.


