Innovation and enterprise blog

The British Library Business & IP Centre can help you start, run and grow your business

29 May 2024

Scarlett Johansson Vs. OpenAI: The blurred IP lines between human and machine

What data is being used to feed the ‘learning machine’?

Another AI-generated storm has occurred with the recent news that a chatbot voice created by OpenAI sounds eerily similar to Hollywood star Scarlett Johansson. The story takes a more unusual turn with Johansson’s claim that the same company asked her to be a chatbot voice, an offer she says she refused a number of times.

It seems the boundaries between human and machine are now completely blurred. The inexorable rise in the power and utility of generative Artificial Intelligence (AI) is producing case after case of controversy, ethical debates and inevitable concerns about where it’ll all end.  

Generative AI is artificial intelligence that is capable of creating text, videos, images or other data with similar characteristics to the ‘training data’ used to build it. The question as to whether these works are original creations in and of themselves is hotly contested.

However, Scarlett Johansson’s case is not the first time that generative AI has fallen afoul of celebrities and artists. Tom Hanks warned of a fake ad using his image to promote a dental plan he did not endorse. Musicians Drake and The Weeknd have had their voices used to create an entirely new song that was not of their creation. AI software was trained on their vocals, and that was the key to the generative work being created. Called ‘Heart on My Sleeve’, it was subsequently dropped from streaming services after protests from the artists’ music labels.

The question now with Johansson’s example is whether it’s all just an unfortunate coincidence that OpenAI’s voice, ‘Sky’, ended up sounding so similar to Johansson’s. And how was that voice created? Who owns the final product?

In times like these, who else can you call, but an Intellectual Property lawyer? 

Where is the intellectual property?

If Johansson decides to pursue a case of infringement against OpenAI, she can call upon laws in most US states known as ‘publicity laws’. These work in a similar way to other IP laws, in that the individual is ‘the product’. This means that they have the right to control the commercial use of their name, likeness, image or identity.

In the case of Johansson, it’s the use of her voice.  

It’s no surprise to discover that the state of California, being the home of Hollywood, has such a law. (We don’t have an equivalent in the UK, but we do have other laws that can be stitched together to do the same thing.)

In fact, US legislators are actively looking at strengthening federal laws to provide further clarity on individuals and usage rights. Recent debate around the proposed NO FAKES Act addresses these issues directly.

Meanwhile, in the UK there seems to be a growing consensus on reasserting the interpretation of ‘data mining’ in existing UK copyright law to mean the use of data (for machine learning) only for non-commercial purposes. This is significant as it precludes the commercial use of data mining by AI companies.

It’s all in the prompts


Two sides to every case

The creators of generative AI have some arguments in their defence too. For an AI platform to generate an image, voice or text, word commands called prompts need to be used. And there is growing recognition of the power and skill involved in using the best possible prompts to create the most desired output. Think of a very long and focused search engine query.

In fact, these particular prompts can be so integral to the final generated product that they can be considered a trade secret and even be protected by copyright. These are two existing IP rights used all over the world.

So it’s advisable that creators (be they companies or individuals) record the prompts (but keep them secret) in order to prove the creative process and also potentially as a defence in case the output does inadvertently infringe someone else’s copyright (or publicity right for that matter).  

Moreover, companies like OpenAI are investing significantly in their platforms by feeding them with all the data they need. They also have an IP interest because there is an inherent novelty and commercial value in creating (and licensing) the platforms themselves. The platforms, too, are protected by intellectual property rights, and copyright is the predominant IP protection for software.

But, as ever, there’s a flipside, and that is the question of what data is being used to feed the ‘learning machine’. Is the data public domain information? Or is it under copyright? If it’s the latter, there is a real risk of a generative platform creating an infringing work. As they say across the pond: ‘garbage in, garbage out’.

Human v Machine: who’s the creator?

The big question remains: who owns the intellectual property in an AI creation? If I use a generative AI platform, can I claim ownership of the final product?

The first thing is to always check the licence agreement of the platform you’re using, especially if there’s a clause where they keep a record of the prompts used, or if the generated image can be reused.  

In the United States, the question of whether an AI-created product can itself be subject to copyright has been partly addressed by the recent case of a graphic comic titled Zarya of the Dawn. The United States Copyright Office ruled that ‘works created with substantial AI input are not eligible for copyright protection in the United States.’ Interestingly, it did recognise that the prompts were a work of human authorship and therefore fell under copyright, as did the text and arrangement of images, but not the resultant images themselves.

In the UK, the question is complicated further by an interpretation of what Section 178 of the Copyright, Designs and Patents Act 1988 (CDPA) could mean in relation to AI today. Under the Act, copyright cannot vest in machines or non-human actors, but the author of a computer-generated work is taken to be the person ‘by whom the arrangements necessary for the creation of the work are undertaken’ (Section 9(3)). It takes a sharp legal mind and a good case to define how that could be interpreted! An interesting broader summary of existing UK copyright law and AI can be read here.

It’s complicated... but also clear 

As the world begins to adapt to the massive disruption that AI will create, it’s safe to say that some boundaries have been drawn and clear sides have been taken.

  1. If you’re creating any original work of any kind, you have rights over that work. Nothing has changed. 
  2. If you’re using AI-generated work, your ownership of that work is open to question and may at times be challenged, depending on the way national laws are interpreted and on the terms and conditions of the platform creating it. It is best to seek legal advice for your particular context.
  3. If you’re an AI developer, you are at risk if the data you’re training your machine learning model on is potentially copyrighted. Legal advice on taking protective measures against the risk of infringement, or on seeking permission to use that content, is a necessity.

Further useful guides on all of these points can be found on some law firms’ websites, such as here.

Regardless of who wins the IP wars between humans and machines, a human is still a human and their voice will always belong to them. In our age of digital disruption, Scarlett Johansson may well be helping us all find our voice, and keep it too.

Written by Jeremy O’Hare, Research and Business Development Manager at the BIPC.