The unsettling truth about AI copyright is that nobody knows what happens next.
- Generative AI models are trained on copyright-protected data.
- Whether an AI model's output can be copyrighted will likely depend on the degree of human involvement.
- Training generative AI on copyright-protected data is probably lawful.
It has been an excellent year for generative AI. Startups are raising hundreds of millions of dollars to compete with incumbents like Microsoft, Adobe, and GitHub, and the technology is even leaving its mark on culture, with text-to-image AI models spawning countless memes. But raise the topic of generative AI in any industry discussion and the same question soon surfaces: is any of this actually legal?
The issue stems from how generative AI systems are trained. Like most machine learning software, they work by identifying and reproducing patterns in data. But because these programmes generate code, text, music, and artwork, the data they are trained on is itself human-made, scraped from the web, and in most cases copyright-protected in one way or another.
This wasn't a big problem for AI researchers in the dimly remembered 2010s, when the most advanced models could produce only fingernail-sized images of faces and posed no direct threat to anyone. But in 2022, when a lone amateur can use software like Stable Diffusion to copy an artist's style in a matter of hours, and when companies are selling AI-generated prints and social media filters that are overt ripoffs of living designers, questions of legality and ethics have become far more urgent.
Consider the case of Disney illustrator Hollie Mengert, whose work was copied as part of an AI experiment by a mechanical engineering student in Canada. The student downloaded 32 of Mengert's pieces and spent several hours training a machine learning model that could mimic her style.
Andres Guadamuz, a professor at the University of Sussex in the UK who specialises in AI and intellectual property law, says that while many questions remain unanswered, a few fundamental ones can be addressed. First, can the output of a generative AI model be copyrighted, and if so, who owns it? Second, if you own the copyright to data used to train an AI model, does that give you any legal claim over the model or the content it produces?
Once these questions are answered, a larger one emerges: how do you deal with the fallout of this technology? What kinds of legal restrictions could, or should, be placed on data collection? And can those building these systems coexist with those whose data is needed to build them?
Even if the training of generative AI models is ultimately found to be covered by fair use, that will hardly resolve the industry's problems. The finding won't necessarily extend to other generative AI domains, such as code and music, and it won't appease artists angry that their work has been used to train commercial models. The most obvious remedy is to license the data and pay its creators, though some believe doing so would ruin the industry.