Perplexity Accused of Scraping Websites That Explicitly Blocked AI Scraping

What Is This About?

Overview

AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don’t want to be scraped, according to internet infrastructure provider Cloudflare. On Monday, Cloudflare published research saying it had observed the AI startup ignoring blocks and hiding its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an attempt to circumvent the website’s preferences,” Cloudflare’s researchers wrote.

Why This Matters

AI products like those offered by Perplexity rely on gobbling up large amounts of data from the internet, and AI startups have long scraped text, images, and videos from the internet, often without permission, to make their products work. In recent times, websites have tried to fight back by using the web standard robots.txt file, which tells search engines and AI companies which pages can be indexed and which can't. Those efforts have seen mixed results so far.
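To make the robots.txt mechanism concrete, here is a minimal sketch of how a well-behaved crawler might check a site's rules before fetching a page, using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the URL are hypothetical and chosen only for illustration; they are not taken from Cloudflare's research.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler by name, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", url))   # False: explicitly disallowed
    print(is_allowed(ROBOTS_TXT, "SomeSearchBot", url))  # True: falls under the * rule
```

Note that robots.txt is advisory: nothing in the file enforces the rule, so it only works when the crawler performs a check like this and honors the answer.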

Key Insights

Perplexity appears to be willingly circumventing these blocks by changing its bots’ “user agent,” a signal that identifies a website visitor by their device and version type, and by changing their autonomous system numbers (ASNs), which identify large networks on the internet, according to Cloudflare.

“This activity was observed across tens of thousands of domains and millions of requests per day,” Cloudflare’s researchers wrote.
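To illustrate why changing the user agent and the ASN matters, here is a minimal sketch of a naive server-side block that matches requests against a declared bot name and a network number. Everything in it is an assumption for illustration: the `Request` type, the `ExampleAIBot` name, and the ASNs (64496 and 64497 are numbers reserved for documentation). It does not represent how Cloudflare or any particular site enforces blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical, simplified view of an incoming request."""
    user_agent: str
    asn: int  # autonomous system number of the network the request came from


# Hypothetical block rules: one declared crawler name and one network.
BLOCKED_UA_TOKENS = frozenset({"exampleaibot"})
BLOCKED_ASNS = frozenset({64496})


def is_blocked(req: Request) -> bool:
    """Block if the user agent names a blocked bot or the ASN is blocked."""
    ua = req.user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS) or req.asn in BLOCKED_ASNS


BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

declared = Request("Mozilla/5.0 (compatible; ExampleAIBot/1.0)", asn=64496)
new_ua_same_network = Request(BROWSER_UA, asn=64496)
new_ua_new_network = Request(BROWSER_UA, asn=64497)

if __name__ == "__main__":
    print(is_blocked(declared))             # True: declared bot name matches
    print(is_blocked(new_ua_same_network))  # True: still caught by the ASN rule
    print(is_blocked(new_ua_new_network))   # False: both signals changed
```

In this sketch, swapping only the user agent is not enough to get past the filter, but changing both the user agent and the originating network defeats both rules, which is the pattern Cloudflare describes.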

Industry Impact

This development is expected to influence the technology industry, highlighting ongoing changes in innovation, competition, and adoption.

Final Thoughts

As the technology landscape continues to evolve, stories like this demonstrate why staying informed is increasingly important.

Why This Matters Right Now

This issue is becoming increasingly important as cost, risk, and long-term impact are drawing attention from businesses and users alike.

Real-World Impact

In real-world scenarios, this development could influence decision-making, technology adoption, and competitive positioning.

Risks and Limitations

Despite its potential, there are concerns related to scalability, security, regulatory challenges, and hidden costs.

Final Thoughts

Understanding this topic early can help readers make informed decisions and prepare for what comes next.

