OpenAI's new GPT-4.1 models focus on coding


On Monday, OpenAI launched a new family of models called GPT-4.1. Yes, “4.1,” as if the company's nomenclature wasn't confusing enough already.

There's GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all of which OpenAI says “excel” at coding and instruction following. Available through OpenAI's API but not ChatGPT, the multimodal models have a 1-million-token context window, meaning they can take in roughly 750,000 words in one go (longer than “War and Peace”).
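As a rough illustration of that ratio, 1 million tokens for about 750,000 words works out to roughly 0.75 words per token, or 4 tokens for every 3 words. The sketch below uses only that rule of thumb to guess whether a text of a given word count would fit in the window. The function names are hypothetical, and real token counts depend on the tokenizer and the language of the text, so treat the result as an estimate only.

```python
# Rule of thumb taken from the figures above:
# 1,000,000 tokens ~ 750,000 words, i.e. 4 tokens per 3 words.
CONTEXT_WINDOW_TOKENS = 1_000_000


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a word count, rounding up."""
    # Integer ceiling division of (word_count * 4) by 3.
    return -(-word_count * 4 // 3)


def fits_in_context(word_count: int) -> bool:
    """Return True if the estimate fits in the 1-million-token window."""
    return estimate_tokens(word_count) <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    for words in (100_000, 750_000, 900_000):
        print(words, estimate_tokens(words), fits_in_context(words))
```

An actual request also has to leave room for the model's output, so the usable input budget is smaller than this estimate suggests.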

GPT-4.1 arrives as OpenAI rivals such as Google and Anthropic ratchet up efforts to build sophisticated programming models. Google's recently released Gemini 2.5 Pro, which also has a 1-million-token context window, ranks highly on popular coding benchmarks. So do Anthropic's Claude 3.7 Sonnet and Chinese AI startup DeepSeek's upgraded V3.

It's the goal of many tech giants, including OpenAI, to train AI coding models capable of performing complex software engineering tasks. OpenAI's grand ambition is to create a “software engineer,” as CFO Sarah Friar put it during a tech summit in London last month. The company asserts that its future models will be able to program entire apps end-to-end, handling aspects such as quality assurance, bug testing, and documentation writing.

GPT-4.1 is a step in this direction.

“We've optimized GPT-4.1 for real-world use based on direct feedback to improve in areas that developers care most about: frontend coding, making fewer extraneous edits, following formats reliably, adhering to response structure and ordering, consistent tool usage, and more,” an OpenAI spokesperson told TechCrunch via email. “These improvements enable developers to build agents that are considerably better at real-world software engineering tasks.”

OpenAI says that the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding benchmarks, including SWE-bench. GPT-4.1 mini and nano are said to be more efficient and faster at the cost of some accuracy, with OpenAI saying GPT-4.1 nano is its fastest and cheapest model ever.

GPT-4.1 costs $2 per million input tokens and $8 per million output tokens. GPT-4.1 mini is $0.40 per million input tokens and $1.60 per million output tokens, and GPT-4.1 nano is $0.10 per million input tokens and $0.40 per million output tokens.
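To make those rates concrete, here is a minimal sketch of how the per-request cost could be estimated from the prices quoted above. The dictionary keys and the `estimate_cost` helper are illustrative labels, not official API identifiers, and the sketch ignores any discounts or other billing details OpenAI may apply.

```python
# USD per 1 million tokens, as quoted in the article.
# The model keys are illustrative labels, not guaranteed API model names.
PRICES_PER_MILLION = {
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 100,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${estimate_cost(model, 100_000, 2_000):.4f}")
```

At these rates, output tokens cost four times as much as input tokens for every model in the family, so long generations weigh on the bill more heavily than long prompts of the same size.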

According to OpenAI's internal testing, GPT-4.1, which can generate more tokens at once than GPT-4o (32,768 versus 16,384), scored between 52% and 54.6% on SWE-bench Verified, a human-validated subset of SWE-bench. (OpenAI noted in a blog post that some solutions to SWE-bench Verified problems couldn't run on its infrastructure, hence the range of scores.) Those figures are slightly below the scores reported by Google and Anthropic for Gemini 2.5 Pro (63.8%) and Claude 3.7 Sonnet (62.3%), respectively, on the same benchmark.

In a separate evaluation, OpenAI probed GPT-4.1 using Video-MME, which is designed to measure a model's ability to “understand” content in videos. GPT-4.1 reached a chart-topping 72% accuracy in the “long, no subtitles” video category, OpenAI says.

While GPT-4.1 scores reasonably well on benchmarks and has a more recent “knowledge cutoff,” giving it a better frame of reference for current events (up to June 2024), it's important to keep in mind that even some of the best models today struggle with tasks that wouldn't trip up experts. For example, many studies have shown that code-generating models often fail to fix, and even introduce, security vulnerabilities and bugs.

OpenAI also acknowledges that GPT-4.1 becomes less reliable (i.e., likelier to make mistakes) the more input tokens it has to deal with. On one of the company's own tests, OpenAI-MRCR, the model's accuracy decreased from around 84% with 8,000 tokens to 50% with 1 million tokens. GPT-4.1 also tended to be more “literal” than GPT-4o, the company says, sometimes requiring more specific and explicit prompts.
