
Section: News
The artificial intelligence company OpenAI faces allegations that it used works from O'Reilly Media, a prominent US technology publisher, to train its GPT-4o model without authorization. The claim comes from a recent study by the AI Disclosures Project, whose contributors include O'Reilly's founder and CEO, Timothy O'Reilly.
According to the research, OpenAI reportedly relied on at least 34 O'Reilly titles during the training of GPT-4o. The study further examined two other models, GPT-3.5 Turbo and GPT-4o mini, but found less conclusive evidence regarding potential copyright infringements associated with these particular models.
In their analysis, the researchers posed a variety of multiple-choice questions to the OpenAI models. One of the answer options contained a direct quote from one of the 34 O'Reilly books, while the other choices were paraphrased versions. The study encompassed nearly 14,000 excerpts from these works. If the AI model correctly identified the verbatim quote, it was interpreted as an indication that the model had been trained using copyrighted material from the O'Reilly collection.
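The test described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function names, the number of paraphrases per question, and the `choose` callback standing in for a model are all assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One test item: (verbatim quote, paraphrased alternatives). Hypothetical structure.
Item = Tuple[str, Sequence[str]]


def build_question(verbatim: str, paraphrases: Sequence[str],
                   rng: random.Random) -> Tuple[List[str], int]:
    """Shuffle the verbatim quote among its paraphrases.

    Returns the shuffled options and the index of the verbatim quote.
    """
    options = [verbatim, *paraphrases]
    rng.shuffle(options)
    return options, options.index(verbatim)


def guess_rate(items: Sequence[Item],
               choose: Callable[[List[str]], int],
               seed: int = 0) -> float:
    """Fraction of questions where `choose` picks the verbatim quote."""
    rng = random.Random(seed)
    hits = 0
    for verbatim, paraphrases in items:
        options, answer = build_question(verbatim, paraphrases, rng)
        if choose(options) == answer:
            hits += 1
    return hits / len(items)


# Made-up items; `oracle` stands in for a model that has memorized every quote.
items = [
    ("The quick brown fox jumps.", ["A fast brown fox leaps.", "A brown fox hops quickly."]),
    ("Measure twice, cut once.", ["Check your work before acting.", "Measure carefully, then cut."]),
]
memorized = {verbatim for verbatim, _ in items}


def oracle(options: List[str]) -> int:
    return next(i for i, text in enumerate(options) if text in memorized)


print(guess_rate(items, oracle))
```

A model that has never seen the text should land near the chance rate of one divided by the number of options. A rate well above that is what the study read as a sign of training exposure.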
The researchers calculated an AUROC score (area under the receiver operating characteristic curve), a statistical measure of how well the test separates the cases being compared: 50 percent corresponds to chance, and 100 percent to perfect separation. The score for GPT-4o reached 82 percent, which the researchers read as strong evidence that the copyrighted content was used in training. They also speculated that OpenAI might have obtained the books from the shadow library Library Genesis, which reportedly includes all 34 titles in question.
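For readers unfamiliar with the metric, the following Python sketch shows one standard way to compute an AUROC from two groups of scores. The article does not say which groups the study compared, so the split into "possibly seen" and "unseen" excerpts, the function name, and the sample numbers are all assumptions made for illustration.

```python
from typing import Sequence


def auroc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. 0.5 means the two groups are
    indistinguishable; 1.0 means perfect separation.
    """
    if not positives or not negatives:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up quote-identification rates, not data from the study.
possibly_seen = [0.9, 0.7, 0.6, 0.4]  # excerpts the model may have trained on (assumed)
unseen = [0.5, 0.3, 0.3, 0.2]         # excerpts it could not have seen (assumed)

print(auroc(possibly_seen, unseen))  # 0.9375
```

The pairwise comparison is quadratic in the number of scores, which is fine for a small illustration. Library routines compute the same quantity from ranks.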
The study also indicated that non-public data has become more important in training OpenAI models over time. The AUROC score for GPT-3.5 Turbo, based on a dataset from 2021, was 54 percent for non-public excerpts, while GPT-4o mini, released in 2024, scored 56 percent. Both values are close to the 50 percent that guessing would yield, so they offer little evidence that these models were trained on O'Reilly's non-public works.
The authors of the study point to a broader, systemic problem with the use of copyrighted material in training language models. They call for greater transparency and a formal licensing framework for content used in training. Without appropriate compensation, they warn, the supply of content suitable for training AI models could shrink significantly. The New York Times has also filed a lawsuit against OpenAI, alleging copyright violations in the training of its AI systems.