Comedian Sarah Silverman is suing OpenAI and Meta, claiming their chatbots were trained on her copyrighted work without permission.
Along with authors Richard Kadrey and Christopher Golden, Silverman is accusing the tech firms of unauthorized use of her 2010 book The Bedwetter. It was used, claims the lawsuit, to train OpenAI’s ChatGPT and Meta’s rival AI chatbot LLaMA, along with hundreds of thousands of other books.
“Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by Plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation. Books in particular are recognized within the AI community as valuable training data,” write the trio’s lawyers, Joseph Saveri and Matthew Butterick.
“A team of researchers from MIT and Cornell recently studied the value of various kinds of textual material for machine learning. Books were placed in the top tier of training data that had ‘the strongest positive effects on downstream performance’.”
The lawsuit also makes reference to so-called ‘shadow libraries’, which provide content in bulk through torrent services, listing Bibliotik, Library Genesis and Z-Library.
“These flagrantly illegal shadow libraries have long been of interest to the AI-training community,” the suit states. “For that reason, these shadow libraries are also flagrantly illegal.”
The three point out that, when prompted, ChatGPT will summarize their books—noting, ironically, that these summaries don’t include the books’ copyright information.
Along with copyright violation, the cases allege negligence, unjust enrichment and unfair competition, and seek unspecified statutory damages and restitution of profits.
The case reinforces growing concern about the training data used by generative AIs. Image supplier Getty Images is already suing Stability AI for reportedly using its images to train the art-generating AI Stable Diffusion.
Meanwhile, a class-action lawsuit is currently underway against Microsoft, GitHub and OpenAI, alleging that they broke copyright law by using source code lifted from GitHub to train the Copilot code-generating AI system.
And last month, writers’ advocacy group The Authors Guild published an open letter accusing six tech firms of exploiting writers’ work to train AI systems without consent, credit, or compensation.
“You’re spending billions of dollars to develop AI technology. It is only fair that you compensate us for using our writings, without which AI would be banal and extremely limited,” the group wrote.
“As a result of embedding our writings in your systems, generative AI threatens to damage our profession by flooding the market with mediocre, machine-written books, stories, and journalism based on our work.”
The outcome is likely to rest on the concept of ‘fair use’, which in the U.S. allows copyrighted material to be used in certain circumstances, such as when the original work is ‘transformed’.
We’ve contacted OpenAI and Meta for comment, and will update with any response.