Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark


Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

It turns out the vanilla model is not very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a table published last Saturday. Those optimizations evidently played well on LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we have written before, for various reasons, LM Arena has never been the most reliable measure of an AI model’s performance. Still, tailoring a model to a benchmark, besides being misleading, makes it hard for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat-optimized version we experimented with that also performs well on LMArena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”


