AI is evolving quickly and has the potential to change many areas of our lives. However, with this power comes the need to look closely at the legal and ethical issues that affect the authors of existing works used to train large language models (LLMs). The central issue is whether the use of copyrighted materials (for example, books) by developers of AI models such as ChatGPT constitutes ‘fair use’ within the meaning of Section 52 of the Indian Copyright Act, 1957.
Section 52 lays down limitations on copyright owners’ rights and the acts that do not constitute infringement, which are often referred to as fair dealing or fair use.
This article discusses how two recent decisions by the US District Court for the Northern District of California on the use of copyrighted material to train generative AI models will impact ANI Media v OpenAI Opco (ANI v OpenAI), which is currently being heard by the High Court of Delhi.
In late 2024, Asian News International (ANI) sued OpenAI for using its copyrighted content for training ChatGPT, OpenAI’s LLM, without its authorisation.
The court framed four issues in the case, with a key question, discussed below, being whether the defendant’s use of the plaintiff’s copyrighted data qualifies as fair use in terms of Section 52 of the Copyright Act, 1957.
California District Court rulings on the use of copyrighted materials to train AI
Even in the US, the term ‘fair use’ is not defined in the US Copyright Act of 1976, and its meaning is open to interpretation on a case-by-case basis. Section 107 of the act sets out four factors for determining fair use:
Purpose and character of the use;
Nature of the copyrighted work;
Amount and substantiality of the portion used; and
Effect of the use on the potential market.
Recently, the US District Court for the Northern District of California passed two major judgments on the use of copyrighted material to train generative AI models.
In Andrea Bartz, Charles Graeber and Kirk Wallace Johnson v Anthropic, it was alleged that Anthropic downloaded, without authorisation, pirated copies of copyrighted books from platforms such as Library Genesis and Books3 to train its Claude family of LLMs. The court found that Anthropic’s use of these texts to train its LLMs constituted fair use under US copyright law because it was highly transformative: each work was first copied from the central library for the training set, then cleaned of repeating text, and then converted into a tokenised copy. Claude retained only compressed copies of the works on which it was trained, and its output was not traceable to the authors’ works.
In Richard Kadrey, et al v Meta Platforms, a group of authors sued Meta Platforms for using copies of their books, sourced from shadow libraries without permission, to train its Llama family of LLMs. The court held that Meta’s use of the 13 authors’ works to train Llama was highly transformative and that the authors were unable to demonstrate market dilution of their works. However, the court also pointed out that the ruling pertained only to the rights of the 13 authors suing Meta and not to anyone else whose works are used to train Meta’s models. Nor can the ruling be read as holding that Meta’s use of copyrighted works to train its LLMs is lawful: the outcome turned on the plaintiffs’ failure to provide sufficient evidence to support their claim.
Applicability of US court observations under Indian copyright law
As noted above, Section 52 of the Indian Copyright Act, 1957 provides for various infringement exceptions and permits the use of copyrighted work or databases in certain scenarios. Whether these exceptions extend to the use of data for training AI has not yet been decided. Therefore, the decision in ANI v OpenAI will have far-reaching implications for the development of AI technology in India.
Guidance from Indian court cases
In Super Cassettes Industries v Chintamani Rao, the High Court of Delhi held that “Section 52 carefully and exhaustively enlists various actions which would not constitute infringement of copyright in different classes of works and the limits on such use”. The court thus construed Section 52 to be exhaustive in nature, restricting the potential for judicial expansion independent of legislative intervention. It also stated that “Parliament deliberately and consciously chose the class of works in relation to which it permitted the exploitation of the copyright for specific purposes only”.
On a strict reading of the above, expanding the scope of fair dealing is primarily the legislature’s responsibility. Seen through that lens, it seems that an amendment of the statute is required to bring copyright law up to date with technological advancements.
On transformative use, the Division Bench of the High Court of Delhi, in Syndicate of the Press of the University of Cambridge v B D Bhandari, held that the use of a work to make a guidebook served a substantially different purpose from that for which the plaintiff’s original work was made. The court recognised this purpose to be a transformative one, which did not impinge upon the expressive purpose for which the plaintiff had an exclusive reproduction right. Therefore, it seems that transformative use could be read as an exception to copyright infringement.
Legislative view on AI
Interestingly, a response to a question put in Parliament to the union minister of state for commerce and industry clarified that “the Copyright Act, 1957 obligates the user of Generative AI to obtain permission to use their works for commercial purposes if such use is not covered under the fair dealing exceptions provided under Section 52”. Furthermore, the minister stated that the “current legal framework under the […] Copyright Act is well-equipped to protect Artificial Intelligence generated works and related innovations. Presently, there is no proposal to create any separate right to amend the law in the context of AI-generated content”. This indicates that the legislature wants the courts to interpret the existing provisions of Section 52 of the Copyright Act, 1957 to determine whether the use of copyrighted works to train generative AI models is fair use.
While the ministry is confident that the current copyright laws are well equipped to protect AI-generated works, the narrow interpretation of fair dealing under Section 52 indicates that the act lacks specific provisions for AI-generated content. It also raises the bar in ANI v OpenAI for showing that the use of the data does not constitute copyright infringement. Additionally, the jurisprudence on the transformative use defence is not well developed in the context of technological developments.
Final thoughts on copyright law and AI-developed works
The use of copyrighted works in training AI is a complex issue that involves balancing the rights of copyright owners with the advancement of AI technology and its societal benefits. The concept of fair use in the context of AI is still developing, and court decisions in generative AI cases will have significant policy implications. Policymakers may need to step in to balance these interests.
The outcomes of these legal challenges, together with any reluctance by courts to impose liability for fear of hindering a transformative industry, will shape the future of AI and its interaction with copyright law.