OpenAI is being sued again over the use of copyrighted works to train AI. Two prominent authors have filed a copyright lawsuit against the company whose models power ChatGPT and Bing Chat, alleging that OpenAI used their work as training data. This appears to be the first lawsuit filed over the use of text (as opposed to images or code) as training data.
In a lawsuit filed in the U.S. District Court for the Northern District of California, plaintiffs Paul Tremblay and Mona Awad allege that OpenAI and its affiliates violated copyright law, the Digital Millennium Copyright Act (DMCA), and California and common-law restrictions on unfair competition.
The authors are represented by the Joseph Saveri Law Firm and Matthew Butterick, the same team behind the recent lawsuits against Stability AI (the maker of Stable Diffusion) and GitHub. The complaint alleges that Tremblay’s novel The Cabin at the End of the World and two of Awad’s novels, 13 Ways of Looking at a Fat Girl and Bunny, were used as training data for GPT-3.5 and GPT-4. Although OpenAI has not disclosed that these novels are in its training data (which is kept secret), the plaintiffs conclude that they must be: ChatGPT was able to provide detailed plot summaries and answer questions about the books, which would require access to their texts.
“Because the OpenAI language models cannot function without the expressive information extracted from and stored in Plaintiffs’ (and others’) works, the OpenAI language models are themselves infringing derivative works, made without Plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act,” the complaint said.
All three books contain copyright management information (CMI) such as ISBNs and copyright registration numbers. The Digital Millennium Copyright Act (DMCA) makes it illegal to remove or alter CMI, and since ChatGPT’s responses do not contain this information, the plaintiffs allege that OpenAI violated the DMCA in addition to infringing copyright.
Although there are currently only two plaintiffs in the lawsuit, the attorneys are seeking class-action status, which could allow other authors whose copyrighted work has been used by OpenAI to be awarded compensation as well. The lawyers are seeking damages, attorneys’ fees and an injunction forcing OpenAI to change its software and its business practices regarding copyrighted works. The attorneys’ website, LLM Litigation, details the plaintiffs’ position and the reasons for filing the lawsuit. “We have filed a class action lawsuit against OpenAI, alleging that ChatGPT and the underlying GPT-3.5 and GPT-4 large language models remix the copyrighted works of thousands of book authors, and many others, without consent, compensation or credit,” the lawyers say.
They also criticize the concept of generative AI, stating: “Generative AI is just human intelligence repackaged and sold as a new product. This is not a new kind of intelligence. It’s just a new way of using another person’s intelligence without permission or compensation.” They note that while OpenAI says it does not know exactly which books were used to train the AI, this does not matter because “OpenAI knows that it has used many books and that it did not get permission from their authors.”
This isn’t the first time OpenAI has faced similar allegations. However, the new lawsuit appears to be the first to address the use of text data, and it could set a precedent for future lawsuits alleging copyright infringement by AI systems.