Artists Rage Against the Machine
Introduction
Perhaps the only thing that rivals the meteoric rise of generative artificial intelligence (AI) is the widespread blowback against its use in creating artwork and the number of lawsuits filed against AI companies.
Artists, authors, celebrities, music publishers, programmers and others in the creative community have sued leading AI companies, including OpenAI, Midjourney, and Stability AI (developer of Stable Diffusion), over various issues.
One recurring complaint is that AI developers have scraped copyrighted works en masse to train their AI algorithms and, in doing so, allegedly infringed the owners’ rights.
On 30 October 2023, US District Judge William H. Orrick of the Northern District of California issued a decision in Andersen et al. v. Stability AI Ltd. et al., No. 23-cv-00201-WHO (N.D. Cal. Oct. 30, 2023), one of the very first copyright infringement cases filed against AI companies.
In the decision, Judge Orrick dismissed[1] all but one of the Plaintiffs’ claims.
The decision has been widely reported as a victory for AI companies, but the situation is a bit more nuanced.
It is not an outright victory for AI companies, but it does highlight several difficulties creators and artists face in using copyright laws to stop their intellectual property from being used for AI training.
How generative AI works
In general, AI works by having specialized algorithms process huge amounts of data, analyse the data for patterns and correlations, and then use these to predict future outcomes.
It is not a new technique, and it is already widely used in predictive text tools such as smartphone texting apps and search engines.
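The pattern-learning idea behind predictive text can be shown with a toy sketch. The following is an illustrative example only, not how any commercial system is built: it learns from a tiny hypothetical corpus which word most often follows each word, then "predicts" the next word accordingly.

```python
from collections import Counter, defaultdict

# Tiny corpus standing in for the huge datasets real systems train on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Learn the pattern: count which word follows each word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Predict the follower seen most often during training.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # prints "cat": seen twice after "the", vs once each for "mat" and "fish"
```

Real predictive text and large language models use vastly larger corpora and far more sophisticated statistical models, but the underlying principle is the same: learn correlations from data, then use them to predict what comes next.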
However, recent advances in AI development now mean that even untrained users can generate sophisticated text or photo-realistic images with just a few simple word prompts.
When it comes to image generation, AI adopts a technique called “diffusion”.
Images are first put through AI algorithms which progressively add visual noise to them, a process called “noising,” until the images look to humans like a collection of random dots.
At each step the algorithms record, or learn, how the addition of noise changes an image; the algorithms are then trained to reverse the process and generate images from noise, known as “denoising.”
The diffusion technique has been further refined and developed, with researchers adding additional functions like storing noise images, interposing or mixing different noises to generate new images, and shaping the denoising process with information called “conditioning,” like text prompts.
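The noising and denoising steps described above can be sketched in a few lines. This is a deliberately simplified illustration, not Stable Diffusion’s actual implementation: the noise schedule is an assumed simple choice, and where a real model would train a neural network to predict the added noise, the sketch reuses the true noise to show that the arithmetic inverts.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 8x8 "image" standing in for a training image.
image = rng.random((8, 8))

def noise_schedule(num_steps):
    # Noise variance added per step (a simple linear schedule, assumed for illustration).
    betas = np.linspace(1e-4, 0.2, num_steps)
    return np.cumprod(1.0 - betas)  # cumulative signal retained after each step

def noising(x0, alpha_bar_t, eps):
    # Forward "noising": blend the original image with Gaussian noise.
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def denoise(xt, alpha_bar_t, predicted_eps):
    # "Denoising": given a prediction of the noise, invert the blend.
    return (xt - np.sqrt(1.0 - alpha_bar_t) * predicted_eps) / np.sqrt(alpha_bar_t)

alpha_bars = noise_schedule(50)
eps = rng.standard_normal(image.shape)

# After the final step the result is dominated by noise (random dots).
noisy = noising(image, alpha_bars[-1], eps)

# In training, a network learns to predict eps; here we cheat and use the true eps.
recovered = denoise(noisy, alpha_bars[-1], eps)
print(np.allclose(recovered, image))  # prints True: perfect noise prediction recovers the image
```

The legal significance is in the training step the sketch glosses over: to learn to predict the noise, the real algorithm must repeatedly process copies of the training images themselves.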
To achieve this, AI algorithms need to process and learn from a huge number of images. Artists allege AI companies have, unlawfully and without consent or compensation, scraped their work, most of which is protected under current copyright laws, from the Internet to train their AI algorithms.
Background of the Andersen case
The Plaintiffs are three US-based artists and illustrators.
The Defendants are Stability AI, developer of Stable Diffusion AI, the engine driving many AI text-to-image services currently commercially available; MidJourney, a popular text-to-image AI generation service provider using Stable Diffusion technology; and DeviantArt, a large online artist community and artwork repository, which has launched its own Stable Diffusion-based AI tool named DreamUp.
The Plaintiffs allege, amongst other claims, that the Defendants have infringed the copyright in their artwork by using it without authorization to train their AI algorithms.
They also claim all images generated by such AI tools are infringing derivative works[2].
The Defendants in turn all filed motions to dismiss all claims, leading to the present court decision.
The Decision
While Judge Orrick dismissed all but one of the Plaintiffs’ claims, the Decision is not in fact the resounding victory for the Defendants it may seem, because:
- the Plaintiffs’ claim of direct infringement against Stability, arguably the most important claim of all, is allowed to proceed;
- in relation to the dismissed claims, the Plaintiffs are allowed to amend them to remove any defects identified by the judge’s decision.
All of which means the proceedings can still move forward.
However, the Judge identified several issues and made certain comments which suggest the Plaintiffs will have an uphill battle ahead.
Key Takeaways
1) The Plaintiffs’ direct infringement claim against Stability is allowed to proceed
This claim concerns Stability’s creation and storage of copies of copyright images scraped from the Large-Scale Artificial Intelligence Open Network dataset “LAION”, which the Plaintiffs allege is funded by Stability itself, to train the Stable Diffusion AI algorithm.
The Plaintiffs allege Stability has made copies and stored “billions of copyrighted images without permission to create Stable Diffusion”.
In essence they argue creating copies of copyright works to train AI is in itself a direct infringement of Plaintiffs’ copyright, regardless of what images have been generated using the AI algorithm.
Stability itself acknowledges the question of whether the act of copying amounts to copyright infringement cannot be resolved in a motion to dismiss (i.e. it should be resolved at the main trial).
This claim goes to the core of the Plaintiffs’ complaint, and is in fact the main complaint most creators/artists/copyright owners have against AI companies, making it, arguably, the most important claim of all.
As the use of training materials is critical to current generative AI technology, whether usage can be restricted is a relatively new and untested area in copyright law.
If the Plaintiffs succeed, the difficulties of claims based on AI-generated output may be sidestepped. The entire methodology of how AI companies obtain training material may have to be revamped[3].
This is certainly a battlefield to watch.
2) Third parties’ use of pre-existing AI infrastructure is unlikely to be considered direct infringement
The Plaintiffs made similar direct infringement claims against MidJourney and DeviantArt, alleging their respective AI tools both rely on the Stable Diffusion engine, which the Plaintiffs claim “contains compressed copies of the (Training Images)”.
Both claims were dismissed by Judge Orrick, who held the Plaintiffs failed to allege specific facts showing MidJourney and DeviantArt themselves played any affirmative role in copying copyrighted works to create the Training Images.
The Judge also commented that MidJourney and DeviantArt merely allow their customers access to Stable Diffusion, which is unlikely to be enough to support a direct infringement claim[4].
3) Difficulty for copyright owners to know and identify what works have been infringed
Throughout the Decision, the Defendants repeatedly argue the Plaintiffs have not been able to specifically identify which of their works have been allegedly copied to produce Training Images.
Andersen, one of the Plaintiffs, claims her name could be found on the site “haveibeentrained.com”[5], which supports her belief her work has been included in the LAION dataset and has thus been used in the training of Stable Diffusion.
The Defendants contend this allegation is not sufficient.
In the Decision, the Judge held that the “haveibeentrained.com” site does give Andersen a reasonable belief her work has been scraped and used in training and he allowed the case to proceed.
The Judge remarked the Defendants could challenge Andersen’s assertion in discovery.
While Andersen’s inability to identify which works have been infringed is not fatal to her claim in the Decision, it does draw attention to a specific difficulty copyright owners face in copyright infringement cases.
LAION is one of the best-known training datasets, and third parties have developed tools to discover what it contains; beyond it, how AI companies obtain their training data is largely unknown and unregulated.
It is widely assumed that many AI companies and AI training service providers scrape millions of images, including copyrighted ones, from the Internet without seeking consent.
While these claims have not been refuted by AI companies, there is arguably no direct evidence that would meet the standards required in a court of law.
Currently AI companies have no obligation to disclose what information has been used in training AI algorithms.
Other than cases where it is clear from the output that certain images must have been used in training (like the Getty Images[6] case, where Getty Images’ own watermarks were reproduced in AI-generated output), it is very difficult for copyright owners to know which of their works have been used in AI training.
It may not be possible to prove their work has been used without going through an arduous discovery process.
This presents an even higher hurdle in the US, where copyright registration of the specific works is required to start infringement litigation, something that may not be commercially feasible for artists who would have to register each piece.
4) AI-generated output is unlikely to be considered an infringing derivative work
Perhaps the hardest hurdle to overcome is that properly trained AI models will draw from hundreds, if not thousands, of Training Images to produce output, which means their output is unlikely to closely resemble any original Training Image.
Copyright owners would have a hard time arguing that any output generated by AI is an infringing derivative work.
This is acknowledged by the Plaintiffs themselves[7], while the Judge observed he is “not convinced that copyright claims based on a derivative theory can survive absent ‘substantial similarity’ type allegations.”
5) AI-processed data is unlikely to be considered an infringing derivative work either
The Plaintiffs argue the Defendants store “compressed copies”[8] of copyright works in the Stable Diffusion engine, describing the noising process as “an alternate way of storing a copy of those images… in an even more efficient and compressed manner”[9].
This runs into the problem that Training Images that have gone through the noising process resemble collections of random dots and look nothing like the originals. It is doubtful whether such a process can be described as “compression”.
The Judge said “Stable Diffusion contains only algorithms and instructions that can be applied to the creation of images that include only a few elements of a copyrighted Training Image” (emphasis added)[10], which casts heavy doubt as to whether such arguments can be salvaged at all.
6) Is AI training fair use?
As the Decision only deals with the Defendants’ motions to dismiss, the Judge did not discuss possible defences the Defendants could raise.
However, it is very likely that the “fair use” defence will feature prominently in the main trial, as it is the position of many AI companies, including Stability, that the use of copyrighted materials for AI training amounts to fair use and is thus not infringement.
It is often argued that AI machine learning is no different from how human artists learn and take reference from existing works to create their own work.
Since abstract notions like “art styles” are not protected by copyright, AI-generated output, so long as it is not substantially similar, cannot be seen as infringing rights even if AI “borrows” from existing copyrighted work.
Fair use is a predominantly US legal doctrine that allows for certain uses of copyright material without the consent of the copyright owners[11].
Some of the central considerations in deciding fair use are the “amount and substantiality of the portion taken” from the copyrighted work and whether the subsequent use is “transformative”.
Whether AI training amounts to fair use has yet to be tested in the courts.
Judge Orrick, however, briefly touched on the issue of “transformative use” in his decision, observing that “Output Images are not likely to be substantially similar to plaintiffs’ works captured as Training Images, and therefore may be the result of substantial transformation…”[12].
While the Judge left the issue to be resolved at a later date, it again highlights the fact that AI-generated output is likely substantially different from the Training Images, which weighs in favour of AI companies.
Whether other factors, like the fact AI-generated output is arguably directly competitive with, and could replace, the original copyrighted works, would tip the scale back in favour of artists, is yet to be seen.
Closing Comments
The court’s decision is likely to have a profound impact on all AI learning cases going forward in the US.
Since the Plaintiffs’ most important claim remains completely intact, the bottom line is that the fight is not over and AI companies are not out of the woods yet.
It does, however, give some early insights into how US Courts will treat such cases and what judges will focus on. These early indications highlight the difficulties copyright owners face.
The author believes the current copyright law regime, with its focus on “similarity”, “transformative effect” and “substantial taking”, is ill-equipped to deal with AI-related issues.
Generative AI is not comparable with previous inventions like cameras or computer-aided design, in the sense that AI is not a mere passive tool for humans to manifest creative intent.
In fact, machine learning algorithms have developed so rapidly even their creators cannot fully comprehend how they work, the so-called “black box problem”.
AI has “creative capability” that exceeds what traditional copyright laws are designed to handle.
Perhaps it is time for governments to step in and update copyright laws to handle these new challenges.
[1] Judge Orrick has however granted the Plaintiffs leave (permission) to cure certain defects in the pleadings that he has identified.
[2] Generally, the right to create derivative works, i.e. works based on existing works, is an exclusive right held by the copyright owner of the existing works.
[3] It should however be noted even if Stability’s mode of use of making copies of copyright images is found to be infringing, there are likely technical solutions for AI companies to sidestep this issue.
[4] The Judge comments on lines 3-7 of p. 10 of the Decision that “It is unclear, for example, if Stable Diffusion contains only algorithms and instructions that can be applied to the creation of images that include only a few elements of a copyrighted Training Image, whether DeviantArt or Midjourney can be liable for direct infringement by offering their clients use of the Stable Diffusion “library” through their own apps and websites.”
[5] A website developed by third parties that allows artists to see if their works are included in the LAION database.
[6] Getty Images (US), Inc. v. Stability AI, Inc. No. 1:23-cv-00135.
[7] Lines 2-3, p. 11 of the Decision.
[8] The Judge has requested the Plaintiffs to clarify what they mean by “compressed copies” in their amended filings.
[9] Lines 10-13, p. 9 of the Decision.
[10] Lines 4-5, p. 10 of the Decision.
[11] Many jurisdictions operate on more restrictive “fair dealing” principles instead of the open-ended “fair use” doctrine used in the US as a possible defence to copyright infringement claims. Under fair dealing principles, only certain categories of use (e.g. education, archiving, news reporting) determined by the respective legislatures are considered non-infringing. Currently no major “fair dealing” jurisdiction adopts machine learning as a “fair dealing” category.
[12] Lines 14-18, p. 22 of the Decision.