The news: Reddit is suing Perplexity and data-scraping companies Oxylabs UAB, AWMProxy, and SerpApi, highlighting the battle over user-generated content (UGC) in the race to build the top genAI models.
The pressure to collect quality human content is creating an “industrial-scale ‘data laundering’ economy,” per Reddit chief legal officer Ben Lee. He added that Reddit is a prime target due to its massive, dynamic collection of UGC.
Perplexity said its “approach remains principled and responsible,” per The New York Times, and that the company won’t tolerate threats against openness and the public interest.
Why it matters: Cases like this could redefine how AI firms access and value online content, including original UGC and brand-owned material. If publishers crack down on unauthorized access to user content, the supply of freely available data for training and refining AI models could shrink.
Marketers may need to rethink data-sourcing strategies by leaning more on first-party data and establishing direct content partnerships with publishers and platforms.
Zooming out: Beyond lawsuits, publishers have limited tools to protect their content from data scraping. A robots.txt file can tell bots which parts of a website they may and may not crawl, but it is not legally binding, so compliance is voluntary.
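To illustrate how robots.txt works in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the example.com URLs are hypothetical and are not taken from Reddit's actual robots.txt. The sketch shows only how a well-behaved crawler would read the rules; nothing in the file stops a scraper that chooses to ignore them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (illustrative assumption, not a real site's file):
# one named AI crawler is blocked everywhere, all other bots are blocked
# only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch() queries."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named AI bot is asked to stay away from every path.
ai_bot_thread = is_allowed(parser, "ExampleAIBot", "https://example.com/forum/thread-1")

# Any other bot falls back to the wildcard rules.
other_bot_thread = is_allowed(parser, "OtherBot", "https://example.com/forum/thread-1")
other_bot_private = is_allowed(parser, "OtherBot", "https://example.com/private/data")

print(ai_bot_thread, other_bot_thread, other_bot_private)
```

The check happens entirely on the crawler's side: the publisher only publishes the file, and it is up to each bot to call something like `is_allowed` before fetching a page.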
What marketers should do: To navigate an increasingly complex landscape for information sourcing, marketers should:
This content is part of EMARKETER’s subscription Briefings, where we pair daily updates with data and analysis from forecasts and research reports.