Imagine teaching a machine to paint like the celebrated Indian artist Raja Ravi Varma, write like the award-winning novelist Arundhati Roy, or compose music in a style reminiscent of the Oscar-winning musician A R Rahman. To achieve that, the machine must absorb thousands of hours of music, millions of pages of text, and vast collections of visual art. This is the central reality of generative AI. It also brings us to one of the most consequential legal questions of our time: can AI models lawfully train on copyrighted material without the creator’s permission?
For several years, this issue appeared to be a remote controversy unfolding primarily before courts in the US and Europe. That is no longer so. With the Delhi High Court having reserved judgment in ANI Media Pvt. Ltd. v OpenAI, India’s first significant AI-copyright dispute, the question has now come squarely before Indian courts. The court’s decision is likely to shape the early contours of India’s approach to AI regulation and copyright policy.
The global playbook: how other jurisdictions are responding
To understand why ANI v OpenAI matters so much, it helps to begin with the approaches emerging elsewhere in the world:
The US ‘fair use’ argument – technology companies in the US rely heavily on the flexible doctrine of fair use. Their position is that AI training is transformative, comparable to a person reading books to learn how to write. Creators, unsurprisingly, characterise the same process as large-scale unauthorised copying.
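The EU and UK ‘opt-out’ model – European frameworks have leant towards text and data mining exemptions. Under the EU approach, AI developers may mine lawfully accessible material unless rights holders expressly reserve their rights, for instance through machine-readable opt-outs. The UK currently limits its text and data mining exception to non-commercial research but has consulted on adopting a similar opt-out model.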
Japan’s ‘AI-friendly’ model – Japan has taken perhaps the most generous view of AI training. Its copyright law broadly allows works to be used for information analysis, including text and data mining, so long as the AI developer is not trying to enjoy or reproduce the creative expression in the work. In simple terms, Japan treats AI training more like machine reading than human copying, while still leaving room to object where the use unfairly harms rights holders or results in substitutional reproduction.
India’s ‘evolving’ model – India has yet to settle its position. Unlike the US, India does not have a broad, open-ended fair use doctrine; it relies instead on the narrower concept of fair dealing under Section 52 of the Copyright Act, 1957, which covers only enumerated purposes such as private or personal use (including research), criticism or review, and the reporting of current events. Mass scraping of copyrighted works for commercial AI training therefore does not comfortably fit within any existing exception. The result is legal uncertainty: AI developers need clarity to innovate, while creators and publishers want assurance that their work will not be absorbed into training datasets without consent or compensation.
The dispute before the Delhi High Court
In ANI v OpenAI, the news agency ANI alleged that OpenAI unlawfully scraped and used its journalistic archive to train large language models. OpenAI’s defence follows a familiar pattern. It argues that it relied only on publicly available material, that its systems process facts rather than protected expression, and that ANI’s domain has since been blocked from future scraping.
The hearings, however, exposed how difficult it is to apply 20th-century copyright concepts to 21st-century AI systems. Three issues stand out in particular:
Transient storage – Indian copyright law, in Section 52(1)(b) of the Copyright Act, exempts the transient or incidental storage of a work made purely in the technical process of electronic transmission. OpenAI argues that its processing falls within this safe zone. Publishers and intervenors, including the Digital News Publishers Association, counter that storing vast volumes of content during pre-training is neither transient nor incidental and instead amounts to large-scale infringement.
Facts versus expression – OpenAI maintains that its models merely learn statistical relationships between words, much as humans learn by reading. ANI, by contrast, has pointed to instances of apparent memorisation in which copyrighted news content was reproduced almost verbatim, potentially bypassing paywalls and reducing traffic to the original publisher.
Reputational harm – ANI has also pointed to hallucinated outputs: fabricated content falsely attributed to it. This argument broadens the dispute beyond copyright and suggests that flawed or unauthorised data inputs may also damage a publisher’s reputation, credibility, and brand value.
The policy turn: could compulsory licensing be the answer?
The ANI litigation has also highlighted a deeper institutional problem. Even the court-appointed amici curiae have expressed sharply different views on whether machine learning amounts to copyright infringement. That uncertainty has pushed policymakers to consider structural solutions rather than leaving the entire question to case-by-case adjudication. Against this backdrop, the Department for Promotion of Industry and Internal Trade has reportedly floated an alternative: a compulsory licensing framework for AI training data.
The idea is to avoid two extremes: unrestricted scraping on the one hand, and impractical one-to-one negotiations with every rights holder on the other. A statutory framework could create a regulated middle path. Under such a model, AI developers would receive a legal right to use proprietary datasets for training without prior permission but would be required to pay regulated royalties to the relevant rights holders in return.
The core tension: innovation versus creative autonomy
This proposal has divided stakeholders into two broad camps:
The innovation-first view – supporters of broader access argue that without affordable access to local and culturally relevant datasets, Indian startups will struggle to build truly domestic AI systems. If licensing costs are too high, only large multinational players may remain competitive.
The creator-rights view – publishers, authors, musicians, and other rights holders respond that compulsory licensing undermines a fundamental aspect of ownership: the right to refuse. For many creators, being compelled to license their works to technologies that may eventually compete with them is not a compromise but a forced surrender.
Moving forward: designing India’s hybrid path
India’s copyright framework was designed for a world in which humans used computers as tools, not for one in which generative systems ingest and recombine enormous bodies of creative material. That mismatch is now impossible to ignore. India’s long-term solution is unlikely to lie in simply importing foreign models. A blunt compulsory licensing regime could dilute creator control, while an overly rigid prohibition could choke domestic innovation before it matures.
A more balanced framework, in the author’s view, may combine mandatory transparency obligations, including clear disclosure of training datasets, with a functioning market for voluntary commercial licensing. Such an approach would better align innovation incentives with the legitimate interests of creators.
The Delhi High Court’s forthcoming decision in ANI v OpenAI may prove to be the first major turning point. It will indicate whether India is prepared to leave the matter largely to market forces and litigation, or whether legislative intervention will be needed to redraw the boundary between creators’ rights and technological development.