Perplexity accused of scraping websites that explicitly blocked AI scraping
What Is This About?
Overview
AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don’t want to be scraped, according to internet infrastructure provider Cloudflare. On Monday, Cloudflare published research saying it observed the AI startup ignoring blocks and hiding its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an attempt to circumvent the website’s preferences,” Cloudflare’s researchers wrote.
Why This Matters
AI products like those offered by Perplexity rely on gobbling up large amounts of data from the internet, and AI startups have long scraped text, images, and videos from the internet, often without permission, to make their products work. In recent times, websites have tried to fight back by using the web-standard robots.txt file, which tells search engines and AI companies which pages can be indexed and which can't. Those efforts have seen mixed results so far.
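As a rough illustration of how a robots.txt file expresses these preferences, the sketch below uses Python's standard-library `urllib.robotparser` to check whether a given crawler may fetch a page. The robots.txt content, the bot name `ExampleAIBot`, and the `example.com` URL are hypothetical and are not taken from Cloudflare's research.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named AI crawler is disallowed everywhere on the site.
print(is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/article"))
# Any other crawler falls through to the wildcard rule.
print(is_allowed(ROBOTS_TXT, "SomeSearchBot", "https://example.com/article"))
```

Note that robots.txt is advisory. The file only states the site's preferences, and it is up to each crawler to read and honor them.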
Key Insights
Perplexity appears to be willingly circumventing these blocks by changing its bots’ “user agent,” a string that identifies a website visitor by device and software version, and by changing its autonomous system numbers (ASNs), which identify large networks on the internet, according to Cloudflare.
“This activity was observed across tens of thousands of domains and millions of requests per day,” the researchers wrote.
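To see why changing the user agent and the originating network matters, consider a minimal sketch of a naive server-side block that relies only on those two signals. Everything here is an assumption made for illustration: the bot token, the IP ranges (reserved documentation ranges standing in for an ASN, since mapping an IP address to an ASN needs external routing data), and the function name. It is not Cloudflare's detection method.

```python
import ipaddress

# Hypothetical values for illustration only.
BLOCKED_AGENT_TOKENS = ("exampleaibot",)
BLOCKED_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)


def is_blocked(user_agent: str, client_ip: str) -> bool:
    """Return True if the request matches a declared-bot or network rule."""
    agent = user_agent.lower()
    if any(token in agent for token in BLOCKED_AGENT_TOKENS):
        return True
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"

# A crawler that declares itself is caught by the user-agent rule.
print(is_blocked("ExampleAIBot/1.0", "203.0.113.7"))
# A generic browser user agent from the known network is caught by the network rule.
print(is_blocked(BROWSER_UA, "192.0.2.10"))
# A generic user agent from a different network matches neither rule.
print(is_blocked(BROWSER_UA, "198.51.100.23"))
```

The third request passes both checks, which is the kind of gap the behavior described above would exploit. Both signals are either self-declared or changeable by the client, so a block built on them alone depends on the crawler identifying itself honestly.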
Industry Impact
This development is expected to influence the technology industry, highlighting ongoing changes in innovation, competition, and adoption.
Final Thoughts
As the technology landscape continues to evolve, stories like this demonstrate why staying informed is increasingly important.
Why This Matters Right Now
This issue is becoming increasingly important as cost, risk, and long-term impact are drawing attention from businesses and users alike.
Real-World Impact
In real-world scenarios, this development could influence decision-making, technology adoption, and competitive positioning.
Risks and Limitations
Despite its potential, there are concerns related to scalability, security, regulatory challenges, and hidden costs.
Final Thoughts
Understanding this topic early can help readers make informed decisions and prepare for what comes next.