OpenAI Faces Scrutiny Over Alleged Use of O'Reilly Books for GPT-4o Training

Thu 3rd Apr, 2025

The artificial intelligence company OpenAI is facing scrutiny for allegedly using works from O'Reilly Media, a prominent US technology publisher, to train its GPT-4o model without authorization. The claim comes from a recent study by the AI Disclosures Project, whose contributors include O'Reilly's founder and CEO, Timothy O'Reilly.

According to the research, OpenAI reportedly relied on at least 34 O'Reilly titles during the training of GPT-4o. The study also examined two other models, GPT-3.5 Turbo and GPT-4o mini, but found less conclusive evidence that those models had been trained on the copyrighted books.

In their analysis, the researchers posed a variety of multiple-choice questions to the OpenAI models. One of the answer options contained a direct quote from one of the 34 O'Reilly books, while the other choices were paraphrased versions. The study encompassed nearly 14,000 excerpts from these works. If the AI model correctly identified the verbatim quote, it was interpreted as an indication that the model had been trained using copyrighted material from the O'Reilly collection.
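The quiz described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's code: the function names, the number of paraphrases per question, and the `pick` callable (a stand-in for querying a model) are all hypothetical.

```python
import random
from typing import Callable, Sequence


def build_question(verbatim: str, paraphrases: Sequence[str],
                   rng: random.Random) -> tuple[list[str], int]:
    """Mix one verbatim excerpt with its paraphrases.

    Returns the shuffled options and the index of the verbatim excerpt.
    Assumes the paraphrases all differ from the verbatim text.
    """
    options = [verbatim, *paraphrases]
    rng.shuffle(options)
    return options, options.index(verbatim)


def guess_rate(items: Sequence[tuple[str, Sequence[str]]],
               pick: Callable[[list[str]], int],
               seed: int = 0) -> float:
    """Fraction of questions in which `pick` selects the verbatim option."""
    rng = random.Random(seed)
    hits = 0
    for verbatim, paraphrases in items:
        options, answer = build_question(verbatim, paraphrases, rng)
        if pick(options) == answer:
            hits += 1
    return hits / len(items)


def memorizing_picker(seen: set[str]) -> Callable[[list[str]], int]:
    """Hypothetical stand-in for a model that recognizes text it has 'seen'."""
    def pick(options: list[str]) -> int:
        for i, text in enumerate(options):
            if text in seen:
                return i
        return 0  # nothing recognized: fall back to the first option
    return pick
```

A picker that recognizes every verbatim excerpt reaches a guess rate of 1.0, while one that recognizes nothing does no better than its fallback choice allows. The gap between those two behaviors is what the test tries to measure.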

The researchers calculated an AUROC score (area under the receiver operating characteristic curve), a statistical measure of how reliably the test separates text a model has likely seen from text it has not; a score of 50 percent corresponds to chance. The score for GPT-4o reached 82 percent, suggesting a substantial probability that the copyrighted content was used in the training process. The researchers also speculated that OpenAI might have accessed the books through the shadow library Library Genesis, which reportedly includes all 34 titles in question.
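For readers unfamiliar with AUROC, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen "positive" item scores higher than a randomly chosen "negative" one, with ties counting half. How the study formed its two groups is not described here, so the grouping and the example scores are assumptions made purely for illustration.

```python
from typing import Sequence


def auroc(positive_scores: Sequence[float],
          negative_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive item has the higher score; tied pairs count as one half.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Illustrative, made-up guess rates (not data from the study):
# "suspected" = excerpts assumed to be in the training data,
# "control"   = excerpts assumed not to be.
suspected = [0.6, 0.4]
control = [0.5, 0.3]
print(auroc(suspected, control))  # 0.75
```

A value of 0.5 means the two groups are indistinguishable, and 1.0 means every suspected excerpt outscores every control excerpt.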

The study also indicated that the significance of non-public data in training OpenAI models has increased over time. The AUROC score for GPT-3.5 Turbo, which is based on a dataset from 2021, was 54 percent for non-public excerpts, while GPT-4o mini, released in 2024, achieved a score of 56 percent. Both values are close to the 50 percent expected by chance, so they offer little evidence that these models were trained on O'Reilly's works.

The authors of the study highlight a broader, systemic issue regarding the use of copyrighted materials in training language models. They advocate greater transparency and a formal licensing framework for the content used in such training. Without appropriate compensation, they warn, the availability of content suitable for training AI models could diminish significantly. The New York Times has also filed a lawsuit against OpenAI, alleging copyright violations related to the training of its AI systems.

Article collated/edited/curated, or written in-house, by The Munich Eye.

