Perplexity lawsuit sends warning to product managers

Doubtless you’ve seen the headlines that News Corp is suing Jeff Bezos-backed AI startup Perplexity for copyright infringement, accusing Perplexity of scraping content without permission, copying on a “massive scale”, and repurposing News Corp content without authorisation. As the lawsuit says: “This suit is brought by news publishers who seek redress for Perplexity’s brazen scheme to compete for readers while simultaneously freeriding on the valuable content the publishers produce.”

This isn’t Perplexity’s first rodeo. Earlier this year Forbes accused the company of stealing content (you can read about it in Why Perplexity’s cynical theft represents everything that could go wrong with AI).

An investigation into the company by Wired, Perplexity is a bullshit machine, published at about the same time, pulled no punches. Wired provided the chatbot with headlines from its website and with prompts on subjects Wired has reported on. The investigation found “the chatbot at times closely paraphrasing Wired stories, and at times summarising stories inaccurately and with minimal attribution. In one case, the text it generated falsely claimed that Wired had reported that a specific police officer in California had committed a crime”. The article concludes that “the magic trick that’s made Perplexity worth 10 figures, in other words, appears to be that it’s both doing what it says it isn’t and not doing what it says it is”.

AI content detector Copyleaks recently released analysis examining Perplexity’s handling of paywalled content. It shows that while Perplexity denies reading protected materials, it appears to paraphrase or plagiarise substantial portions of the paywalled articles tested. Copyleaks found that one Perplexity summary paraphrased 48% of a Forbes article, while another contained 7% plagiarism and 28% paraphrasing. As Copyleaks comments: “This inconsistency in responses from Perplexity highlights the need to explore appropriate use of AI-generated copy and content, and how AI platforms interact with protected content.”
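Copyleaks hasn’t published its methodology, and detecting paraphrase is far harder than detecting verbatim copying. Purely to make the idea concrete, here is a minimal Python sketch (not Copyleaks’ method) that estimates verbatim overlap as the share of a summary’s word 5-grams that appear in the source text; the strings are invented stand-ins for an article and an AI summary.

```python
# A crude proxy for verbatim copying: the fraction of the summary's
# word 5-grams that appear, unchanged, in the source article. Real
# plagiarism detectors also catch paraphrase; this sketch does not.
def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(summary: str, source: str, n: int = 5) -> float:
    """Fraction of the summary's n-grams found verbatim in the source."""
    summary_grams = ngrams(summary, n)
    if not summary_grams:
        return 0.0
    return len(summary_grams & ngrams(source, n)) / len(summary_grams)

# Invented stand-ins for a paywalled article and an AI-generated summary.
source = "the regulator announced sweeping new rules for online platforms today"
summary = "sweeping new rules for online platforms were announced today"
print(f"verbatim overlap: {verbatim_overlap(summary, source):.0%}")  # 40%
```

Even this toy measure makes the underlying point: it’s possible to quantify how much of a “summary” is simply the source text, rearranged.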

Some fundamental issues lie at the heart of these analyses: content ownership and copyright, brand reputation, and ethics. News Corp’s legal action should give any product person pause, especially those working with generative AI.

News publishers are struggling. Their advertising revenues are a fraction of what they were before the world moved online, and news that relies on human judgement and oversight is expensive to produce. Shutting down plagiarists and preserving their intellectual property is of the utmost importance to such publishers.

Unlike search engines that direct users to external sites, Perplexity uses an AI model to provide answers within its own platform, and so deprives the external site of web traffic and subsequent ad revenue. The News Corp lawsuit says that Perplexity positions itself as a reliable source of information, often reproducing content from established providers and therefore competing directly with them for readership.

The legality of content scraping is a complex issue, with different approaches in different jurisdictions. This Medium article from last year is as good an introduction to the topic as any. 

While many websites prohibit content scraping, the training of AI models depends heavily on it: huge and diverse amounts of data are needed to achieve high-quality language understanding and generation. Content scraping presents issues of its own. In theory, scraped content should come only from publicly accessible sites, but even this introduces the risk of biased or inaccurate data entering the training set. We’ve all seen this in action: this report from CIO, 12 famous AI disasters, recounts some of the better-known impacts of biased and poor-quality data.
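As an aside for the technically curious: the conventional (though voluntary) signal a site uses to refuse crawlers is its robots.txt file. Here is a minimal Python sketch, using only the standard library, of the check a compliant scraper performs before fetching a page; the user agent and URL are hypothetical.

```python
# A minimal robots.txt compliance check using Python's standard library.
from urllib import robotparser
from urllib.parse import urlparse

def may_scrape(page_url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True only if the site's robots.txt permits this user agent."""
    parts = urlparse(page_url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    parser = robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # fetch and parse the site's robots.txt
    return parser.can_fetch(user_agent, page_url)

url = "https://example.com/articles/some-story"  # hypothetical URL
print("allowed" if may_scrape(url) else "disallowed by robots.txt")
```

The catch, of course, is that robots.txt is a convention rather than a legal safe harbour: nothing technical stops a crawler from ignoring it, which is why disputes like this one end up in court.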

Copyright law typically plays a central role in governing content scraping, as scraping can often involve reproducing copyrighted material. In the US, for example, the doctrine of “fair use” might offer some room to manoeuvre, because it allows use of copyrighted material for research and education. Does the training of AI models qualify as fair use? It’s a question that has still not been settled. 

Copyright laws under the EU’s Digital Single Market Directive are stricter. EU law allows “text and data mining” for research purposes, but commercial organisations may still require licensing to scrape or use copyrighted material.

There’s no doubt that brand reputation and consumer trust are damaged by inaccurate and fabricated content. The News Corp lawsuit highlights Perplexity’s AI “hallucinations”: false or misleading information that Perplexity attributes to News Corp publications. The lawsuit says this damages News Corp’s reputation for accuracy and dilutes its trademarks, because users will believe the content is authentic news from its publications.

None of these issues will be new to the Mind the Product community: ethical and responsible development with AI is a topic we’ve discussed widely on our blog and at our ProductTanks and conferences. This post from Pranav Khare, Ethical AI in SaaS: A product manager’s guide to responsible innovation, explores the ethical complexities of integrating AI into SaaS products. He examines the complicated business of consent when historical data is put to unforeseen uses, and suggests strategies for product managers to ensure legal and regulatory compliance and to safeguard data privacy and security. He also advises on approaches to combating bias in AI and emphasises the wider impact on society that products using AI can have.

As Pranav says: “Our aim should not be mere compliance or the pursuit of quick successes at the expense of ethical considerations. Instead, we should apply our knowledge and empathy to build AI solutions that are not only technically groundbreaking but also ethically sound and socially responsible.”

Similarly, this post, Building products with trust and safety in the age of AI, by Anuja More, Product Lead at WhatsApp, looks at the importance of trust and safety and at the challenges product managers face in building a platform with integrity. She says: “With the advent of Gen AI, constant investment and improvement of trust and safety driven through its platform is a critical factor for the success of an organisation. Users expect a safe and secure environment, free from fraud, abuse, and harmful content which in turn drives trust and potentially a repeat customer in your future.”

Tensions between traditional content providers and AI firms are understandably high, but this legal action from News Corp is one small part of a much wider discourse about industry standards and legislation for AI. The lawsuit highlights the legal and operational risks for product managers whose products rely on AI models trained on third-party content, and it serves as a stark reminder of the need for clearer content-usage standards and compliance measures, particularly around paywalled content.

Until these standards become clear, any responsible and ethical product person who integrates third-party information into their products must work with colleagues to create monetisation models that respect publisher revenue streams, to prioritise accuracy, and to validate outputs. As Microsoft CEO Satya Nadella said during his annual fireside chat at this year’s World Economic Forum: “I don't think the world will put up anymore with any of us coming up with something where we haven't thought through safety, equity and trust – these are big issues for the world.”
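What might “validate outputs” look like in practice? As one illustrative example, here is a minimal Python sketch of a single guardrail: before publishing an AI-generated answer, confirm that every passage it presents in quotation marks actually appears in the source it cites. The function and the strings are hypothetical; a real pipeline would add fuzzy matching, attribution checks, and human review.

```python
# One small output-validation guardrail: flag quoted passages in an
# AI-generated answer that do not appear in the cited source text.
import re

def unsupported_quotes(answer: str, source_text: str) -> list[str]:
    """Return quoted passages in the answer that are absent from the source."""
    quotes = re.findall(r'"([^"]+)"', answer)
    return [q for q in quotes if q.lower() not in source_text.lower()]

source_text = "The court filing accuses the company of copying on a massive scale."
answer = 'The filing says the company is "copying on a massive scale" and "admitting liability".'
for quote in unsupported_quotes(answer, source_text):
    print(f"Blocked: quote not found in source -> {quote}")
```

A check like this won’t catch every hallucination, but it embodies the principle: don’t let your product assert, in someone else’s name, something you can’t trace back to a source.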
