
In 2025, three federal courts finally confronted a question that had hovered over artificial intelligence for years: can machines legally learn from copyrighted works? Each opinion — Thomson Reuters v. Ross Intelligence, Bartz v. Anthropic, and Kadrey v. Meta Platforms — applied the four-factor fair-use test under 17 U.S.C. §107 to large-scale model training. Together, they form the first real framework for evaluating how copyright interacts with machine learning.
The results point toward a single principle: when AI training reproduces a copyrighted work’s market function, it fails fair use. When the training is analytical — using works as data rather than expression — it passes. The following cases mark where courts are now drawing that line.
In Thomson Reuters v. Ross Intelligence (D. Del. 2025), Thomson Reuters, Westlaw’s owner, accused Ross of using its headnotes and key-number system to train a competing legal-research AI. Ross’s contractor, LegalEase, created more than 25,000 “Bulk Memos” built from Westlaw content. Those memos became Ross’s training corpus, allowing its AI to return results that mimicked Westlaw’s summaries and topic hierarchy.
Judge Stephanos Bibas found that Ross’s conduct went beyond learning. The company “built its competing product using Bulk Memos, which in turn were built from Westlaw headnotes.” That functional overlap was decisive.
The first and fourth factors — purpose and market effect — went against Ross, and Judge Bibas deemed them decisive; the second and third offered Ross only partial support. The result was straightforward: using a rival’s proprietary database to build a substitute product is infringement, not innovation. Ross now anchors the limit of permissible AI training.
Only months later, Judge William Alsup reached the opposite conclusion in Bartz v. Anthropic PBC (N.D. Cal. 2025). Authors alleged that Anthropic’s Claude models copied their books from “shadow libraries” during training. Anthropic responded that the training process extracted statistical patterns about language and style; it did not store or output the books’ expressive content.
Three of four factors — purpose, amount, and market effect — favored Anthropic. The second factor mattered little. Judge Alsup granted summary judgment for the company on the training claims, holding that using copyrighted works as input for analytical learning qualified as fair use, though he allowed claims over Anthropic’s retention of pirated library copies to proceed. Bartz thus established the first federal recognition of large-scale model training as transformative use.
Two days after Bartz, Judge Vince Chhabria issued a companion opinion in Kadrey v. Meta Platforms Inc. (N.D. Cal. 2025). Authors including Richard Kadrey and Sarah Silverman alleged that Meta trained its Llama 2 and Llama 3 models on pirated copies of their novels. Meta conceded that full works were ingested but argued that training was purely analytical: the models extracted linguistic patterns and generated new text, not copies.
Three factors supported fair use; one was neutral. Kadrey reinforced Bartz: analytical ingestion of copyrighted works for machine learning is transformative and permissible when there is no market harm. Judge Chhabria stressed, however, that his ruling rested on the plaintiffs’ failure to prove market harm, not on a blanket approval of AI training. With two consistent rulings from the Northern District of California, the courts signaled growing consensus on the legality of data-driven training.
Although each court applied the same four statutory factors, the outcomes diverged along a simple axis: Ross involved substitution; Bartz and Kadrey involved learning. From those decisions, several guiding principles now shape the fair-use landscape for AI.
Courts now focus on what the copying does, not how it looks. When a system ingests expressive works to compute statistical relationships rather than to reproduce text, the purpose is transformative. Ross failed this test because its AI served the same research function as Westlaw. Bartz and Kadrey passed because their models used books as linguistic data, not as market substitutes.
Following Google v. Oracle and Sony v. Connectix, courts accept complete copying when it is technologically necessary and non-expressive, so long as the end user never receives the protected content. Both Bartz v. Anthropic and Kadrey v. Meta accordingly treated wholesale ingestion as acceptable intermediate copying.
Market Harm Requires Evidence
Each decision reiterated that the fourth factor dominates. Speculative licensing markets or generalized fears of dilution are insufficient. Without empirical proof of substitution, fair use prevails. Ross offered concrete competition; Bartz and Kadrey did not.
The creative nature of novels or editorial summaries remains relevant but no longer decisive. In analytical or functional contexts, courts treat creativity as a weak factor that yields to transformation and market analysis.
The trilogy collectively draws a pragmatic boundary. AI training that competes with a copyrighted product will fail fair use. Training that learns from works to generate new, non-substitutive outputs will likely succeed.
This distinction aligns with Warhol and Google v. Oracle: transformation turns on purpose, not on medium. A change from text to algorithm matters only if it changes the function. When copying repurposes expression into data for computation, that function diverges enough to justify fair use.
For developers, these cases highlight the need to document how training data is sourced and used. Maintaining clear records that separate analytical use from expressive reproduction will strengthen future defenses. For rights holders, the message is to focus on demonstrable market harm rather than speculative licensing theories.
The 2025 decisions are the first chapter, not the last word; appeals and new filings will soon test their reasoning. For now, these cases provide the working rule of thumb for 2026 and beyond.
The takeaway here is that transformation protects learning; substitution invites liability. Courts are adapting the flexible fair-use doctrine to modern technology without rewriting the statute. As AI becomes embedded in creative and analytical work, the emerging doctrine rewards transparency and proof over speculation. The law is beginning to balance innovation and authorship — by asking what the machine does with the works it reads.
Read more on IPWatchdog.com | Patents & Patent Law

