OpenAI says it has deployed a new system to monitor its latest AI reasoning models, O3 and O4-Mini, for prompts related to biological and chemical threats. The system aims to prevent the models from offering advice that could instruct someone on carrying out potentially harmful attacks, according to OpenAI’s safety report.
O3 and O4-Mini represent a meaningful step up in capability over OpenAI’s previous models, the company says, and therefore pose new risks in the hands of bad actors. According to OpenAI’s internal benchmarks, O3 is more skilled at answering questions about creating certain types of biological threats in particular. For this reason, and to mitigate other risks, OpenAI created the new monitoring system, which the company describes as a “safety-focused reasoning monitor”.
The monitor, custom-trained to reason about OpenAI’s content policies, runs on top of O3 and O4-Mini. It is designed to identify prompts related to biological and chemical risk and instruct the models to refuse to offer advice on those topics.
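OpenAI has not published how the monitor is implemented, but the general pattern it describes, a safety classifier that screens prompts and gates the underlying model’s response, can be sketched roughly as follows. Everything in this sketch is hypothetical: the function names and the crude keyword stand-in for the classifier are illustrative placeholders, not OpenAI’s actual system, in which the monitor is itself a reasoning model evaluating prompts against content policies.

```python
# Illustrative sketch of a "monitor on top of the model" pattern.
# All names and the keyword-based classifier are hypothetical placeholders.

def flag_biochem_risk(prompt: str) -> bool:
    """Placeholder classifier. A real monitor would be a trained reasoning
    model judging the prompt against content policies, not a keyword list."""
    risky_terms = ("synthesize pathogen", "weaponize", "nerve agent")
    return any(term in prompt.lower() for term in risky_terms)

def model_answer(prompt: str) -> str:
    """Placeholder for the underlying reasoning model's normal response path."""
    return f"Model response to: {prompt}"

def monitored_answer(prompt: str) -> str:
    """Run the safety check before the model responds and return a refusal
    when the prompt is flagged as a biological or chemical risk."""
    if flag_biochem_risk(prompt):
        return "I can't help with that request."
    return model_answer(prompt)

if __name__ == "__main__":
    print(monitored_answer("How do I weaponize a virus?"))  # refused
    print(monitored_answer("Explain how vaccines work."))   # answered normally
```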
To establish a baseline, OpenAI had red teamers spend around 1,000 hours flagging “unsafe” biorisk-related conversations from O3 and O4-Mini. During a test in which OpenAI simulated the “blocking logic” of its safety monitor, the models declined to respond to risky prompts 98.7% of the time, according to OpenAI.
OpenAI acknowledges that its test did not account for people who might try new prompts after being blocked by the monitor, which is why the company says it will continue to rely in part on human monitoring.
O3 and O4-Mini do not cross OpenAI’s “high risk” threshold for biorisks, according to the company. However, compared to O1 and GPT-4, OpenAI says that early versions of O3 and O4-Mini proved more helpful at answering questions about developing biological weapons.
The company is actively tracking how its models could make it easier for malicious users to develop chemical and biological threats, according to OpenAI’s recently updated Preparedness Framework.
OpenAI is increasingly relying on automated systems to mitigate risks from its models. For example, to prevent GPT-4o’s native image generator from creating child sexual abuse material (CSAM), OpenAI says it uses a reasoning monitor similar to the one the company deployed for O3 and O4-Mini.
Yet several researchers have raised concerns that OpenAI is not prioritizing safety as much as it should. One of the company’s red-teaming partners, Metr, said it had relatively little time to test O3 on a benchmark for deceptive behavior. Meanwhile, OpenAI decided not to release a safety report for its GPT-4.1 model, which launched earlier this week.