Perplexity Accused Of Crawling Sites Illegally With Hidden Bots

Image by Marco Verch, from Unsplash

Cloudflare has accused AI answer engine Perplexity of using stealth techniques to crawl websites against their wishes, raising concerns about data privacy and online trust.

In a rush? Here are the quick facts:

  • Cloudflare de-listed Perplexity as a verified bot.
  • Tests showed Perplexity accessed private, restricted websites.
  • Undeclared bots mimic Chrome and rotate IPs to avoid detection.

In a detailed report, Cloudflare says Perplexity is “modifying their user agent and changing their source ASNs to hide their crawling activity,” even when sites explicitly blocked it via ‘robots.txt’ and firewall rules.

Cloudflare considers this behavior a violation of web standards and has removed Perplexity from its verified bot list as a result.

To test Perplexity’s methods, Cloudflare set up private websites whose rules prohibited crawling. The company found that Perplexity still provided detailed information about those pages despite the restrictions.

“This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their crawlers,” Cloudflare said.

The investigation showed that when Perplexity’s declared bots were blocked, undeclared crawlers took over, using a fake browser identity that mimicked Google Chrome to bypass protections. These stealth crawlers made 3–6 million requests per day, rotating through unlisted IP addresses and disguising their source.

In contrast, Cloudflare praised OpenAI for following good web behavior. When tested under the same conditions, “ChatGPT-User fetched the robots file and stopped crawling when it was disallowed.”
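As a rough illustration of the check a well-behaved crawler performs, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots file before fetching a page. The robots file contents, the `example.com` URL, and the `may_fetch` helper are hypothetical and are not taken from Cloudflare's tests; only the `ChatGPT-User` name comes from the report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (an assumption,
# not the rules Cloudflare used in its tests).
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that declares itself honestly is told to stay out.
declared = may_fetch(ROBOTS_TXT, "ChatGPT-User", "https://example.com/page")

# The same rules do not match a generic browser-style identity.
disguised = may_fetch(ROBOTS_TXT, "Mozilla/5.0", "https://example.com/page")

print(declared, disguised)
```

Because the rules are matched against the name a crawler declares, the file only restrains bots that identify themselves truthfully, which is why a crawler posing as an ordinary browser slips past this kind of check.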

Cloudflare says it has updated its protection systems to detect and block Perplexity’s hidden crawlers. It is also urging bot operators to be more transparent and follow ethical web practices.

“There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences,” Cloudflare stated.

Ars Technica notes that Cloudflare isn’t alone in calling out Perplexity’s tactics. Reddit CEO Steve Huffman described blocking Perplexity, Microsoft, and Anthropic as “a real pain,” saying the companies treated all online content as fair game.

Recently, the BBC also threatened legal action, accusing Perplexity of scraping BBC content without permission to train Perplexity’s default AI model.

Ars Technica also notes that Forbes and Wired have accused Perplexity of plagiarism. Wired reported that the company bypassed robots.txt restrictions, used suspicious IP addresses, and concealed its bot to evade blocking measures.

With AI companies increasingly seeking training data, the fight over who controls online content is heating up. Cloudflare’s move highlights the growing pushback from publishers and platforms seeking to protect their digital boundaries.
