For some time, training advanced artificial intelligence models on copyrighted content has been a controversial topic. Many companies are facing complaints from writers and media organizations. Meta has now admitted that it used copyrighted books from a collection called “Books3”; however, the company is unwilling to compensate the authors.
According to a new report, a group of authors has filed a lawsuit against Meta, alleging illegal use of copyrighted material in the development of the Llama 1 and Llama 2 large language models. In its response to the plaintiffs, who include author and comedian Sarah Silverman, Richard Kadrey, and other owners of copyrighted works, the company said that it trained its AI models using copyrighted books.
Meta's acknowledgment and its fair use defense
Meta has admitted to using the Books3 collection to train the Llama 1 and Llama 2 large language models. Books3 is a well-known collection containing the plain text of more than 195,000 books, totaling nearly 37 GB. This archive was created by an artificial intelligence researcher in 2020 as a way to provide a better data source for improving machine learning algorithms.
Meta has now acknowledged that it used parts of the Books3 dataset. Its argument is that using copyrighted works to train AI models does not require the authors' consent or compensation. The company denies the plaintiffs' claims of copyright infringement, saying that any use of copyrighted works in Books3 should be considered “fair use.” Under this argument, companies could use such works to train artificial intelligence without permission from the rights holders.
OpenAI has also openly stated, following a complaint by the New York Times, that it is impossible to train artificial intelligence models without using copyrighted materials.
RCO NEWS