OpenAI may be close to releasing an AI tool that can take control of your PC and perform actions on your behalf.
Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, says he has discovered evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have already reported on Operator, which is said to be an “agent” system that can independently handle tasks such as writing code and booking travel.
According to The Information, OpenAI is targeting January as Operator's release month. The code Blaho discovered this weekend lends credibility to that report.
OpenAI’s ChatGPT client for macOS has gained hidden options to define shortcuts for “Toggle Operator” and “Force Quit Operator,” according to Blaho. OpenAI has also added references to Operator on its website, Blaho said, though those references aren’t yet publicly visible.
The OpenAI website already contains references to Operator/OpenAI CUA (Computer Use Agent): “Operator System Card Table,” “Operator Search Rating Table,” and “Operator Rejection Rate Table,” including comparisons with Claude 3.5 Sonnet computer use, Google Mariner, and others. (preview of tables) pic.twitter.com/OOBgC3ddkU
— Tibor Blaho (@btibor91) January 20, 2025
According to Blaho, OpenAI’s site also contains not-yet-public tables comparing Operator’s performance to other AI systems that use computers. The tables could be placeholders. But if the numbers are accurate, they suggest that Operator is not 100% reliable, depending on the task.
On OSWorld, a benchmark that tries to mimic a real computing environment, the “OpenAI Computer Use Agent (CUA)” (perhaps the AI model powering Operator) scores 38.1%, ahead of Anthropic’s computer-using model but well below the 72.4% that humans score. OpenAI CUA beats human performance on WebVoyager, which evaluates an AI’s ability to navigate and interact with websites. But according to the leaked benchmarks, the model falls short of human-level scores on another web-based benchmark, WebArena.
Operator also struggles with tasks a human could perform easily, if the leak is to be believed. In a test that asked it to sign up with a cloud provider and launch a virtual machine, Operator succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, it succeeded just 10% of the time.
OpenAI’s impending entry into the AI agent space comes as rivals, including the aforementioned Anthropic, Google, and others, make plays for this nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to research firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.
Agents today are quite primitive, but some experts have raised concerns about their safety should the technology improve rapidly.
One of the leaked tables shows Operator performing well on selected safety evaluations, including tests that try to coax the system into carrying out “illicit activities” and searching for “sensitive personal data.” Safety testing is reportedly among the reasons for Operator’s long development cycle. In a recent post on X, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent that he argued lacked safety mitigations.
“I can only imagine the negative reactions if OpenAI made a similar release,” Zaremba wrote.
It’s worth noting that OpenAI has been criticized by AI researchers, including former employees, for allegedly deprioritizing safety work in favor of quickly shipping its technology.