Alibaba’s Qwen team releases AI models that can control PCs and phones

Chinese AI lab DeepSeek may have captured much of the tech industry’s attention this week. But one of its top domestic rivals, Alibaba, isn’t sitting idly by.

On Monday, Alibaba’s Qwen team released a new family of AI models, Qwen2.5-VL, that can perform a range of text and image analysis tasks. The models can parse files, understand videos, and count objects in images, as well as control a PC, similar to the model powering OpenAI’s recently launched Operator.

According to the Qwen team’s benchmarking, the best Qwen2.5-VL model beats OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 2.0 Flash on a range of video, math, document analysis, and question-answering evaluations.

Image Credits: Alibaba

Qwen2.5-VL, which is available to test in Alibaba’s Qwen Chat app and to download from the AI dev platform Hugging Face, can analyze charts and graphics, extract data from scans of invoices and forms, and “understand” videos more than an hour long, the Qwen team says. Qwen2.5-VL can also recognize “IP from film and TV series, as well as a wide variety of products,” according to the team, suggesting that the models may have been trained in part on copyrighted works.

Because it was developed by a Chinese company, Qwen2.5-VL has some restrictions on the topics it will discuss in Qwen Chat. When I asked the largest and most capable Qwen2.5-VL model, Qwen2.5-VL-72B, about Xi Jinping’s mistakes, Qwen Chat returned an error message.

China’s internet regulator benchmarks many of the country’s models to ensure their answers “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, such as Taiwan’s autonomy.

One of Qwen2.5-VL’s more interesting features is its ability to operate software on both PCs and mobile devices. A video posted on X by Philipp Schmid, a technical lead at Hugging Face, shows Qwen2.5-VL launching the Booking.com app for Android and booking a flight from Chongqing to Beijing.

In another video, a Qwen2.5-VL model controls apps on a Linux desktop but doesn’t seem to get much further than switching tabs. Perhaps tellingly, the Qwen team’s own benchmarking shows Qwen2.5-VL scoring poorly on OSWorld, a benchmark that tries to mimic a real computer environment.

The two smaller, less sophisticated models in the Qwen2.5-VL series, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are available under a permissive license. The flagship Qwen2.5-VL-72B, however, is under Alibaba’s custom license, which requires companies and developers with more than 100 million monthly active users to obtain permission from Qwen/Alibaba before distributing the model commercially.