OpenAI's new AI models hallucinate more

OpenAI's recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up; in fact, they hallucinate more than several of OpenAI's older models.

Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, affecting even today's best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn't seem to be the case for o3 and o4-mini.

According to OpenAI's internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company's previous reasoning models (o1, o1-mini, and o3-mini) as well as its traditional "non-reasoning" models, such as GPT-4o.

Perhaps more concerning, the ChatGPT maker doesn't really know why it's happening.

In its technical report for o3 and o4-mini, OpenAI writes that "more research is needed" to understand why hallucinations are getting worse as it scales up reasoning models. o3 and o4-mini perform better in some areas, including coding and math tasks. But because they "make more claims overall," they are often led to make "more accurate claims as well as more inaccurate/hallucinated claims," per the report.

OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company's in-house benchmark for measuring the accuracy of a model's knowledge about people. That's roughly double the hallucination rate of OpenAI's previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. o4-mini did even worse on PersonQA, hallucinating 48% of the time.

Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro "outside of ChatGPT," then copied the numbers into its answer. While o3 has access to some tools, it can't do that.

“Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines,” said Neil Chowdhury, a Transluce researcher and former OpenAI employee, in an email to TechCrunch.

Sarah Schwettmann, co-founder of Transluce, added that o3's hallucination rate could make it less useful than it otherwise would be.

Kian Katanforoosh, a Stanford adjunct professor and CEO of the upskilling startup Workera, told TechCrunch that his team is already testing o3 in their coding workflows, and that they've found it to be a step above the competition. However, Katanforoosh says that o3 tends to hallucinate broken website links. The model will supply a link that, when clicked, doesn't work.

Hallucinations may help models arrive at interesting ideas and be creative in their "thinking," but they also make some models a tough sell for businesses in markets where accuracy is paramount. For example, a law firm likely wouldn't be pleased with a model that inserts lots of factual errors into client contracts.

One promising approach to boosting models' accuracy is giving them web search capabilities. OpenAI's GPT-4o with web search achieves 90% accuracy on SimpleQA. Potentially, search could improve reasoning models' hallucination rates as well, at least in cases where users are willing to expose their prompts to a third-party search provider.

If scaling up reasoning models does indeed continue to worsen hallucinations, it will make the hunt for a solution all the more urgent.

“Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability,” said OpenAI spokesperson Niko Felix in an email to TechCrunch.

In the last year, the broader AI industry has pivoted to focus on reasoning models after techniques for improving traditional AI models began showing diminishing returns. Reasoning improves model performance on a variety of tasks without requiring massive amounts of compute and data during training. Yet it seems reasoning may also lead to more hallucinations, presenting a challenge.
