
Anthropic scandal: millions of looted books used to train AI and bypass copyrights

Jun 01, 2026 10:39

The "Project Panama" affair reveals a simple truth: behind artificial intelligence, there are also physical books, cut up and transformed into data.

Before becoming a file, a book is still an object. It has a spine, pages, glue, weight and dust. In the case of Anthropic, the company that developed Claude, this very concrete dimension found itself at the heart of an industrial process: books bought on the second-hand market, cut up, scanned and transformed into digital text. What was left of the volumes was then recycled.

Internally, the project was known as "Project Panama". The documents from the copyright lawsuit make clear what the operation was about: gathering a massive quantity of physical books to train artificial intelligence models. Books were chosen because they were considered far better linguistic material than the noise of the web. The idea was fewer sentences gleaned at random online and more texts that had been written, edited and published.

From books to data

The most striking aspect was the method. Volumes arrived from second-hand dealers and were prepared for destructive scanning: cut along the spine, then run through high-speed professional scanners. Once scanned, they never became books again. All that remained was data on one side and paper for recycling on the other.

Exact quantities are not entirely clear, but we're talking about hundreds of thousands, if not millions, of volumes. The project was designed to digitize between 500,000 and 2 million books in around six months. This was nothing like a small archiving operation. It was a real industry, with suppliers, warehouses, cutting machines, scanners, costs and logistics.

This is where the case becomes interesting, even beyond the legal debate. Artificial intelligence is often presented as something light, distant, almost immaterial: the cloud, the algorithm, a streamlined interface. Here, on the contrary, the cloud smells of paper. It involves boxes, industrial blades, severed pages, and books bought only to be dismantled.

The crux of copyright

In the American proceedings, Judge William Alsup distinguished between two aspects. The use of books purchased legally and then scanned to train Claude was deemed compatible with fair use, the American doctrine that allows, in certain cases, the use of protected works without authorization. The situation was different for pirated books: the case documents revealed that Anthropic had downloaded and stored millions of texts from illegal archives, and this part was not covered by fair use and was treated as a separate matter.

The switch to physical second-hand books therefore also appears to have been a prudent legal decision. Buying a paper copy gave the company more solid ground than downloading works from pirate libraries. In the United States, anyone who buys a physical copy can resell it, lend it out or destroy it. The problem arises when that object is turned into a digital copy and fed into systems capable of generating new texts.

Anthropic subsequently agreed to a $1.5 billion settlement to end the class action brought by the authors, without admitting liability. The agreement covers the pirated works and provides for approximately $3,000 per book involved. As of May 2026, however, final approval was still under review: Judge Araceli Martinez-Olguin had requested further details on legal fees and on payments to the lead plaintiffs.

AI doesn't appear out of nowhere

The Anthropic case concerns Claude, but it applies to the entire industry. Large generative models need texts, images, code, articles, manuals, novels and essays. They need human work that has already been done. Sometimes this work is authorized and paid for. At other times, it is collected en masse, folded into opaque datasets, and contested only when a lawsuit breaks out.

"Project Panama" makes this dependency visible. Making the machine write better required books written by people. Making a chatbot sound more natural drew on the work of authors, editors, translators, proofreaders, publishers, libraries and readers. Today, the digital promise still rests on a very physical material.

The question also concerns Europe, where the relationship between copyright, data mining and artificial intelligence remains unsettled. Companies talk about innovation, transformation and progress. Content creators are calling for authorization, remuneration and traceability. Between the two stand the courts, recent regulations and a very concrete question: what is human labor worth when it becomes the fuel of AI?
