The news: Reddit is suing Perplexity and data-scraping companies Oxylabs UAB, AWMProxy, and SerpApi, highlighting the battle over user-generated content (UGC) in the race to build the top genAI models.

Reddit states in court documents that the three data-scraping companies illegally pulled content from its website by way of Google Search results, without permission, in order to sell it, per Bloomberg.
The lawsuit also states that Perplexity has been buying that data from at least one of the companies.

The pressure to collect quality human content is creating an “industrial-scale ‘data laundering’ economy,” per Reddit chief legal officer Ben Lee. He added that Reddit is a prime target due to its massive, dynamic collection of UGC.

Perplexity said its “approach remains principled and responsible,” per The New York Times, and that the company won’t tolerate threats against openness and the public interest.

Why it matters: Cases like this could redefine how AI firms access and value online content, including original UGC and brand-owned material. A publisher crackdown on unauthorized access to user content could shrink the supply of freely available data for training and refining AI models.

Marketers may need to rethink data-sourcing strategies by leaning more on first-party data and establishing direct content partnerships with publishers and platforms.

Zooming out: Beyond lawsuits, publishers have limited tools to protect their content from data scraping. A robots.txt file can be added to a website to tell bots which pages they can and cannot scrape, but it is not legally binding and compliance is voluntary.
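To make the robots.txt point concrete, here is a minimal sketch of how a well-behaved bot might check a site's rules before fetching a page, using Python's standard `urllib.robotparser` module. The bot name `ExampleScraperBot`, the `example.com` URLs, and the rules themselves are hypothetical and are not drawn from Reddit's actual robots.txt or from the lawsuit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the bot name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Report whether the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named bot is barred from the whole site; other bots only from /private/.
print(is_allowed(parser, "ExampleScraperBot", "https://example.com/r/topic"))
print(is_allowed(parser, "OtherBot", "https://example.com/r/topic"))
print(is_allowed(parser, "OtherBot", "https://example.com/private/page"))
```

The check only reports what the file asks for. Nothing in it stops a scraper that skips the check or ignores the answer, which is why publishers are turning to the courts.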

What marketers should do: To navigate an increasingly complex landscape for information sourcing, marketers should:

Diversify reliance on genAI tools, such as ChatGPT, Gemini, and Anthropic's Claude, to prevent operational slowdowns if some models lag in capability due to restricted training data.
Explore AI partners that offer legal indemnification clauses to ensure that a brand isn’t at legal risk if a provider errs by scraping copyrighted information.

This content is part of EMARKETER’s subscription Briefings, where we pair daily updates with data and analysis from forecasts and research reports. Our Briefings prepare you to start your day informed, to provide critical insights in an important meeting, and to understand the context of what’s happening in your industry.


Reddit sues over scraped content as AI firms chase UGC at scale
