Adobe faces a proposed class-action lawsuit accusing it of using unauthorised copies of books to train its SlimLM language models. The suit, filed by Oregon author Elizabeth Lyon, claims her non-fiction guides appeared in the training data alongside thousands of other books.
SlimLM is a family of lightweight models, around 400 million parameters in size, built for document tasks on mobile devices. Adobe has said the models were pre-trained on SlimPajama-627B, a 627-billion-token dataset released by Cerebras in June 2023.
Lyon’s complaint, first reported by Reuters, alleges that this dataset was itself built from pirated material.
“The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3),” the lawsuit says. “Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members.”
Books3, a collection of about 191,000 books drawn from piracy sites, has already prompted lawsuits against Apple, Salesforce, and others over their use of RedPajama. In September, Anthropic agreed to pay $1.5 billion to settle a similar claim. Adobe, which has promoted its Firefly AI tools as ethically trained since 2023, has not commented.
Experts warn that AI developers face growing pressure to train their models on licensed data.