TechCentral

    New claims in Meta fight over copyrighted books used in AI

    Lawyers had warned Meta about the legal perils of using pirated books to train its AI models, according to a new filing.
    By Katie Paul, 13 December 2023

    Meta Platforms’ lawyers had warned it about the legal perils of using thousands of pirated books to train its AI models, but the company did it anyway, according to a new filing in a copyright infringement lawsuit initially brought earlier this year.

    The new filing late on Monday night consolidates two lawsuits brought against the Facebook and Instagram owner by comedian Sarah Silverman, Pulitzer Prize winner Michael Chabon and other prominent authors, who allege that Meta has used their works without permission to train its artificial intelligence language model, Llama.

    A California judge last month dismissed part of the Silverman lawsuit and indicated that he would give the authors permission to amend their claims. Meta did not immediately respond to a request for comment on the allegations.

    The new complaint includes chat logs of a Meta-affiliated researcher discussing procurement of the dataset in a Discord server, a potentially significant piece of evidence indicating that Meta was aware that its use of the books may not be protected by US copyright law.

    In the chat logs quoted in the complaint, researcher Tim Dettmers describes his back-and-forth with Meta’s legal department over whether use of the book files as training data would be “legally okay”.

    “At Facebook, there are a lot of people interested in working with The Pile, including myself, but in its current form, we are unable to use it for legal reasons,” Dettmers wrote in 2021, referring to a dataset Meta has acknowledged using to train its first version of Llama, according to the complaint.

    ‘Active copyrights’

    The month prior, Dettmers wrote that Meta’s lawyers had told him “the data cannot be used or models cannot be published if they are trained on that data”, the complaint said.

    While Dettmers does not describe the lawyers’ concerns, his counterparts in the chat identify “books with active copyrights” as the biggest likely source of worry. They say training on the data should “fall under fair use”, a US legal doctrine that protects certain unlicensed uses of copyrighted works.

    Dettmers, a doctoral student at the University of Washington, said he was not immediately able to comment on the claims.

    Tech companies have been facing a slew of lawsuits this year from content creators who accuse them of ripping off copyright-protected works to build generative AI models that have created a global sensation and spurred a frenzy of investment.

    If successful, those cases could dampen the generative AI craze, as they could raise the cost of building the data-hungry models by compelling AI companies to compensate artists, authors and other content creators for the use of their works.

    At the same time, new provisional rules in Europe regulating artificial intelligence could force companies to disclose the data they use to train their models, potentially exposing them to more legal risk.

    Meta released a first version of its Llama large language model in February and published a list of datasets used for training, including “the Books3 section of The Pile”. The person who assembled that dataset has said elsewhere that it contains 196 640 books, according to the complaint.

    The company did not disclose training data for its latest version of the model, Llama 2, which it made available for commercial use this northern hemisphere summer.

    Llama 2 is free to use for companies with fewer than 700 million monthly active users. Its release was seen in the tech sector as a potential game-changer in the market for generative AI software, threatening to upend the dominance of players like OpenAI and Google that charge for use of their models. (c) 2023 NewsCentral Media
