Anthropic trained Claude on millions of books—including just a few torrented volumes—but it’s all ‘quintessentially transformative’ fair use.

From the courtroom mess that will satisfy no one…

1 Copies of purchased print books, as scanned into internal central library = fair use (transformative: “scann[ing] the books into digital form – discarding the paper originals” and “retain[ing] them instead in a digital format, easing storage and searchability.” pp. 4, 17-18)
2 Copies of pirated books, as downloaded into internal central library = not fair use (it’s theft, and nothing was transformed)
3 Copies of central library books used to train AI = fair use (“quintessentially transformative,” p. 13 + contortions like “The copies used to train specific LLMs did not and will not displace demand for copies of Authors’ works, or not in the way that counts under the Copyright Act,” p. 28).

  • Some, but probably not all, works in the internal central library were used to train. Some of those were pirated works, and some of those were purchased works (see p. 5, discussing training off each type).
  • There is a #4 category of copies. (“And, as for any copies made from central library copies but not used for training, this order does not grant summary judgment for Anthropic. On this record in this posture, the central library copies were retained even when no longer serving as sources for training copies,” p. 31). Those sentences are not about the same copies: the first concerns “copies made from central library copies”; the second concerns the “central library copies” themselves. Suddenly it seems to matter whether a library copy served as a source of a training copy. Does that mean that #2 copies, had they existed solely as intermediate copies for #3, would be fair use too?
  • The opinion notes that Anthropic pivoted to buying books when it became “‘not so gung ho about’ training on pirated books ‘for legal reasons’” (p. 3); states that the digital library included all of six of plaintiffs’ works in pirated and scanned-print form (notes 1-2); and quotes several Anthropic leaders as stating the goal as retention “forever” for “general purpose” (p. 31). The facts seem to establish post-training retention and availability for non-training purposes.

3 Statutory damages for all 7m+ (pp. 2-3) pirated works = $5.25bn (minimum, at the $750-per-work statutory floor)
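
The $5.25bn floor is plain multiplication. Here is a minimal sketch of the arithmetic, assuming the per-work amounts in 17 U.S.C. § 504(c) ($750 minimum, $30,000 ordinary maximum, $150,000 willful maximum) and assuming, purely for illustration, that every one of the 7m pirated works would qualify for one statutory award. The opinion decides no such thing, and the function and constant names are hypothetical.

```python
# Per-work statutory damages under 17 U.S.C. § 504(c), in US dollars.
MIN_PER_WORK = 750
MAX_PER_WORK = 30_000
MAX_WILLFUL_PER_WORK = 150_000


def statutory_damages_range(works: int) -> dict:
    """Return total exposure at each per-work amount.

    Assumption (illustrative only): every counted work is eligible
    for exactly one statutory award.
    """
    if works < 0:
        raise ValueError("works must be non-negative")
    return {
        "minimum": works * MIN_PER_WORK,
        "ordinary_maximum": works * MAX_PER_WORK,
        "willful_maximum": works * MAX_WILLFUL_PER_WORK,
    }


if __name__ == "__main__":
    # 7m is the floor used in the post ("7m+"), not an exact count.
    for label, total in statutory_damages_range(7_000_000).items():
        print(f"{label}: ${total / 1e9:,.2f}bn")
```

With 7,000,000 works, the "minimum" entry reproduces the $5.25bn figure above; the other two entries show how quickly the number grows if a factfinder awards more than the floor.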

Factors: (1) Purpose & Character of Use; (2) Nature of the Work; (3) Amount & Substantiality of Portion Used; (4) Effect on Market or Value of the Work

Real distinction
  • Purpose & Character of Use: Character only: theft vs. transformation. (Same purpose: maybe training, maybe not.)
  • Nature of the Work: None (goal was obtaining the expressive content of every work).
  • Amount & Substantiality of Portion Used: None (all of it).
  • Effect on Market or Value of the Work: None (until this lawsuit, few knew these nonpublic copies existed).

Court’s distinction
  • Purpose & Character of Use: Destroying physical copy to create digital copy is transformative. Stealing digital copy is not transformative.
  • Nature of the Work: None.
  • Amount & Substantiality of Portion Used: None (sidestepped with past-tense rehash of factor 1: “already enjoyed entitlement” vs. “lacked any entitlement”).
  • Effect on Market or Value of the Work: None (court discusses the infringement itself, not its effect, when it says one “displaced purchases” and the other “displaced demand”).

Print books digitized (language that is also true of pirated books in gray)
  • Purpose & Character of Use: ✅ The “format change itself added no new copies, eased storage and enabled searchability, and was not done for purposes trenching upon the copyright owner’s interests — it was transformative.”
  • Nature of the Work (analyzed together with pirated books): “The second factor points against fair use for all copies alike.”
  • Amount & Substantiality of Portion Used: ✅ “Anthropic already enjoyed entitlement to keep the copies in its library. The purpose of the copying was to keep them in its library but with more favorable storage and searchability properties. Copying the entire work was exactly what this purpose required. There was no surplus copying. The source copy was destroyed. The third…factor favors fair use for the purchased library copies[.]”
  • Effect on Market or Value of the Work: ✅ “this order assumes Anthropic’s format change from print to digital displaced purchases of new digital copies that Anthropic would have made directly from Authors … But for reasons stated under the first factor, such losses did not relate to something the Copyright Act reserves for Authors to exploit. It was a format change.”

Pirated books (language that is also true of digitized print books in gray)
  • Purpose & Character of Use: ❌ “the initial copies were pirated to create a central, general-purpose library, as a substitute for paid copies to do the same thing…To retain copies should they prove useful for one thing or another, was its own use — and not a transformative one.”
  • Nature of the Work (analyzed together with digitized print books): “The second factor points against fair use for all copies alike.”
  • Amount & Substantiality of Portion Used: ❌ “For the pirated library copies, however, Anthropic lacked any entitlement to hold copies of the books at all. Its purpose, it says, was to train LLMs. But its objective conduct was to seek ‘all the books in the world’ and then retain them even after deciding it would not make further copies from them for training — indicating there were other further uses. Against the purpose of acquiring all the books one could on the chance some might prove useful for training LLMs and maybe other stuff too, almost any unauthorized copying would have been too much. Anthropic copied millions of books in toto, Authors’ works among them. The third factor points against fair use for the pirated library copies.”
  • Effect on Market or Value of the Work: ❌ “The copies … obtained from pirated sources plainly displaced demand for Authors’ books — copy for copy. Not every person who merely intends to make a fair use of a work is thereby entitled to a full copy in the meantime, nor even to steal a copy so that achieving this fair use is especially simple or cost-effective. Here, the copies employed in training LLMs were one thing, but the copies acquired to assemble a convenient, general-purpose library of works for various uses for which the company might have of them, if any, was a different use altogether.”

From: Moore, Schuyler smoore@greenbergglusker.com