Meta's vanilla Maverick AI model ranks below rivals on a popular chat benchmark

Editor Team, April 11, 2025

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

It turns out that it is not very competitive.

The unmodified Maverick, "Llama-4-Maverick-17B-128E-Instruct", was ranked below models including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 Pro as of Friday. Many of these models are months old.

The release version of Llama 4 was added to LMArena after it was discovered that they cheated, but you probably didn't see it because you have to scroll down to 32nd place, which is where it ranks pic.twitter.com/a0bxkdx4LX

– P: ɡSN (@pigeon__s) April 11, 2025

Why the poor performance? Meta's experimental Maverick, Llama-4-Maverick-03-26-Experimental, was "optimized for conversationality," the company explained in a chart published last Saturday. Those optimizations evidently played well on LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we've written before, for various reasons, LM Arena has never been the most reliable measure of an AI model's performance. Still, tailoring a model to a benchmark, besides being misleading, makes it difficult for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with "all types of custom variants."

"'Llama-4-Maverick-03-26-Experimental' is a chat-optimized version we experimented with that also performs well on LMArena," the spokesperson said. "We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We're excited to see what they will build and look forward to their ongoing feedback."
