Cloudflare builds an AI to lead AI scraper bots into a horrible maze of junk content

Cloudflare has created a bot-busting AI to make life hell for AI crawlers.

The network-taming company built the tool after noticing that AI crawler bots now account for almost one percent of all web requests it sees. Those bots are probably gathering data to train AI models.

Website operators can in theory block AI crawlers by various means, such as a robots.txt file or web server settings that disallow visits from bots. Some even use CAPTCHAs to test whether visitors to a site are human, or adopt software designed to stymie bots.

In reality, crawler operators ignore the instructions in robots.txt files, or work around CAPTCHAs and web server settings. The result is a lot of unwanted crawler traffic consuming resources, and information fed into training data without its creators' permission: a contentious practice currently being tested in court amid allegations of copyright infringement.

Cloudflare's response is to let crawler bots in and use generative AI to create junk content for them to devour in what the company has termed an "AI Labyrinth".

"When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them," explained Cloudflare's Reid Tatoris, Harsh Saxena, and Luis Miglietti. Cloudflare uses its own Workers serverless platform to generate the content.

The trio wrote that the content is "real looking" but "not actually the content of the site we are protecting, so the crawler wastes time and resources." The content is also "real and related to scientific facts" because Cloudflare doesn't want to inadvertently create misinformation.

The AI slop is also designed not to mess with sites' reputations or search engine optimization efforts.

It is, however, designed to act as a deterrent to crawler operators, by keeping their bots busy and thereby increasing the cost of operating content scrapers.

Cloudflare thinks this stuff is also a useful tool to detect bot activity.

"No real human would go four links deep into a maze of AI-generated nonsense," Cloudflare's trio wrote. "Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots, which we add to our list of known bad actors."

This sort of thing usually creates an arms race, and Cloudflare is already thinking about what it will take to stay ahead.

"In the future, we'll continue to work to make these links harder to spot and make them fit seamlessly into the existing structure of the website they're embedded in," its authors wrote.

Cloudflare customers can enable the AI Labyrinth in their management consoles. ®
