Pirated Warhammer books used to train Meta and Bloomberg AI

Wargamer found 15 pirated Black Library books among tens of thousands in a dataset used to train Bloomberg, Meta, and open-source AI tools.

Cover art from Eisenhorn, a Warhammer book published by Black Library: two seriously dangerous-looking men, one in the face-hiding carapace armor of an Arbites enforcer, the other in a long green trenchcoat.

Wargamer has identified 15 pirated Warhammer books in a dataset that was used to train AI systems, including Meta’s LLaMA and Bloomberg’s BloombergGPT, as well as non-Warhammer books by 34 authors who have also written for Games Workshop’s Black Library publishing imprint.

The books include titles from across the Warhammer 40k, Age of Sigmar, and even Warhammer: The Old World settings, with work by current and former Warhammer book authors.

On Monday, The Atlantic launched an online search tool that allows anyone to check whether their favorite author’s work appears in “Books3”, a massive repository of digital texts used to train several AI tools. Meta and Bloomberg have cited Books3 as being part of the training data used to develop their AIs.

Wargamer used the Atlantic’s tool to search Books3 for Black Library authors. Here’s what we found:

| Author | Book |
| --- | --- |
| Dan Abnett | Legion |
| David Annandale | Yarrick |
| Andy Chambers | Bellathonis and the Shadow King |
| John French | Ahriman: Exile |
| Laurie Goulding | Mark of Calth |
| Andy Hoare | Commissar |
| William King | Sword of Caledor |
| Steve Lyons | Down Among the Dead Men |
| Robbie MacNiven | Red Tithe |
| Sandy Mitchell | The Greater Good |
| Josh Reynolds | Gotrek & Felix: Road of Skulls |
| Josh Reynolds | Soul Wars |
| Gav Thorpe | Ravenwing |
| C. L. Werner | Blood for the Blood God |

In addition, we found non-Black Library works from 34 authors who have also written for Games Workshop.

Authors including Sarah Silverman, Michael Chabon, and Paul Tremblay are currently pursuing lawsuits against Meta, on the grounds that unauthorized use of their texts constitutes copyright infringement.

Black Library authors are employed on a “work for hire” basis, surrendering copyright in their work to Games Workshop. Wargamer has contacted Games Workshop to ask for its stance on the use of its copyrighted works to train AI, and whether it is taking any legal action. We will update this article if we receive comment.

The Atlantic’s search tool comes from the work of journalist Alex Reisner. Reisner published an article in The Atlantic on August 19 explaining how he had obtained the training dataset Books3, which was used to train Meta’s LLaMA AI, “the initial model of BloombergGPT”, and the open-source AI tool GPT-J. Books3 contains “roughly 190,000 entries”, each one a large text.

Reisner wrote custom programmes to extract usable information from the huge data repository. He uncovered the ISBNs – unique identifying codes – for 170,000 books. With just 15 books present, Black Library is underrepresented: Reisner says that “more than 30,000 titles are from Penguin Random House and its imprints, 14,000 from HarperCollins, 7,000 from Macmillan, 1,800 from Oxford University Press, and 600 from Verso”.
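Reisner's article does not publish his programmes, so the following is only a rough Python sketch of how ISBNs could in principle be pulled out of raw book text, not his actual method. The pattern, the function names `is_valid_isbn13` and `find_isbn13s`, and the choice to handle only 13-digit ISBNs are all our own assumptions for illustration.

```python
import re

# Assumed pattern (ours, not Reisner's): a 978/979 prefix followed by ten
# more digits, each optionally preceded by a single hyphen or space.
ISBN13_PATTERN = re.compile(r"(?<!\d)97[89](?:[- ]?\d){10}(?!\d)")


def is_valid_isbn13(digits: str) -> bool:
    """Check the standard ISBN-13 checksum (weights alternate 1 and 3)."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def find_isbn13s(text: str) -> list:
    """Return the distinct, checksum-valid ISBN-13s in text, in order found."""
    found = []
    for match in ISBN13_PATTERN.finditer(text):
        digits = re.sub(r"[- ]", "", match.group())
        if is_valid_isbn13(digits) and digits not in found:
            found.append(digits)
    return found
```

Calling `find_isbn13s` on the text of a copyright page would return any ISBN-13 printed there, which could then be looked up in a book database to identify the publisher. Older titles that carry only a 10-digit ISBN would need separate handling.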

So-called AI tools have become contentious in the gaming industry. So far, discussions have focused mostly on AI-generated art rather than text; there was a furore when a recent DnD book was discovered to contain AI-generated illustrations. We recommend you check out our interview with RPG creators about the “Pandora’s box” of AI art for industry insights into the growing phenomenon.