Two US authors have sued OpenAI in San Francisco federal court, claiming in a proposed class action that the company misused their works to “train” its popular generative artificial intelligence system, ChatGPT.
Massachusetts-based writers Paul Tremblay and Mona Awad said ChatGPT mined data copied from thousands of books without permission, infringing the authors’ copyrights.
Matthew Butterick, an attorney for the authors, declined to comment. Representatives for OpenAI, a private company backed by Microsoft, did not immediately respond to a request for comment.
Several legal challenges have been filed over material used to train cutting-edge AI systems. Plaintiffs include source-code owners against OpenAI and Microsoft’s GitHub, and visual artists against Stability AI, Midjourney and DeviantArt.
The defendants in those lawsuits have argued that their systems make fair use of copyrighted work.
ChatGPT responds to users’ text prompts in a conversational way. It became the fastest-growing consumer application in history earlier this year, reaching 100 million active users in January, only two months after its launch.
ChatGPT and other generative AI systems create content using large amounts of data scraped from the internet. Tremblay and Awad’s lawsuit said books are a “key ingredient” because they offer the “best examples of high-quality long-form writing”.
The complaint estimated that OpenAI’s training data incorporated over 300,000 books, including from illegal “shadow libraries” that offer copyrighted books without permission.
Awad is known for novels including 13 Ways of Looking at a Fat Girl and Bunny. Tremblay’s novels include The Cabin at the End of the World, which was adapted into the M Night Shyamalan film Knock at the Cabin, released in February.
Tremblay and Awad said ChatGPT could generate “very accurate” summaries of their books, indicating that they appeared in its database. The lawsuit seeks an unspecified amount of money damages on behalf of a nationwide class of copyright owners whose works OpenAI allegedly misused.
Blake Brittain, (c) 2023 Reuters