Wargamer has identified 15 pirated Warhammer books in a dataset that was used to train AI systems, including Meta’s LLaMA and Bloomberg’s BloombergGPT, as well as non-Warhammer books by 34 authors who have also written for Games Workshop’s Black Library publishing imprint.
The books include titles from across the Warhammer 40k, Age of Sigmar, and even Warhammer: The Old World settings, with work by current and former Warhammer book authors.
On Monday, The Atlantic launched an online search tool that allows anyone to check whether their favorite author’s work appears in “Books3”, a massive repository of digital texts used to train several AI tools. Meta and Bloomberg have cited Books3 as part of the training data used to develop their AIs.
Wargamer used The Atlantic’s tool to search Books3 for Black Library authors. Here’s what we found:
Author | Book |
Dan Abnett | Legion |
David Annandale | Yarrick |
Andy Chambers | Bellathonis and the Shadow King; Incubus |
John French | Ahriman: Exile |
Laurie Goulding | Mark of Calth |
Andy Hoare | Commissar |
William King | Sword of Caledor |
Steve Lyons | Down Among the Dead Men |
Robbie MacNiven | Red Tithe |
Sandy Mitchell | The Greater Good |
Josh Reynolds | Gotrek & Felix: Road of Skulls; Soul Wars |
Gav Thorpe | Ravenwing |
C. L. Werner | Blood for the Blood God |
In addition, we found non-Black Library works from 34 authors who have also written for Games Workshop.
Authors including Sarah Silverman, Michael Chabon, and Paul Tremblay are currently suing Meta on the grounds that unauthorized use of their texts constitutes copyright infringement.
Black Library authors are employed on a “work for hire” basis, surrendering copyright in their work to Games Workshop. Wargamer has contacted Games Workshop to ask for its stance on the use of its copyrighted works to train AI and whether it is taking any legal action. We will update this article if we receive comment.
The Atlantic’s search tool comes from the work of journalist Alex Reisner. Reisner published an article in The Atlantic on August 19 explaining how he had obtained the training dataset Books3, which was used to train Meta’s LLaMA AI, “the initial model of BloombergGPT”, and the open-source AI tool GPT-J. Books3 contains “roughly 190,000 entries”, each one a large text.
Reisner wrote custom programmes to extract usable information from the huge data repository. He uncovered the ISBNs – unique identifying codes – for 170,000 books. With just 15 books present, Black Library is underrepresented: Reisner says that “more than 30,000 titles are from Penguin Random House and its imprints, 14,000 from HarperCollins, 7,000 from Macmillan, 1,800 from Oxford University Press, and 600 from Verso”.
So-called AI tools have become contentious in the gaming industry. So far, discussions have focused mostly on AI-generated art rather than text; there was a furore when a recent DnD book was discovered to contain AI-generated illustrations. We recommend you check out our interview with RPG creators about the “Pandora’s box” of AI art for industry insights into the growing phenomenon.