“Red teams”: how OpenAI made sure ChatGPT Agent does not become malicious

Before launching ChatGPT Agent, OpenAI asked several “red teams” to assess how dangerous it could be. The practice is becoming standard to prevent AI from helping design chemical weapons or malware.

“2.5 billion user requests per day.” ChatGPT is gradually becoming a preferred alternative to traditional search engines. Aware of this success, OpenAI wants to hold on to its place as a leader in the AI sector, while preventing ChatGPT from being put to potentially malicious uses.

On July 17, 2025, OpenAI launched its “agent” mode, a versatile ChatGPT feature that can carry out in-depth research while letting the AI interact with web pages, for example to place an order. The feature is remarkably useful for handling several complex tasks at once.

But to guarantee the reliability of the agent model and the absence of flaws, the company called on “red teams”. OpenAI published a detailed report on how it tested its latest AI before launch.

The “red teams”, guarantors of a safe AI

In the AI sector, red teams are volunteers tasked with pushing a model to its limits to see under what conditions it can become dangerous. These teams play the role of malicious actors who might want to use AI as a guide for designing homemade bombs, toxic products or computer viruses, for example.

OpenAI thus wants to make sure its chatbot does not become an accomplice to a criminal act, just as it is difficult to find out how to build a chemical weapon through a Google search. To do so, the members of the red teams are tasked with “jailbreaking” the AI, in other words getting past the safeguards so that the chatbot provides information it is normally forbidden to share.

These red-team tests are not trivial. In the past, OpenAI has repeatedly discovered flaws that allowed its AI to run wild. In January 2025, a flaw called “Time Bandit” made it possible to ask ChatGPT almost anything by making it believe it was in the past.

Last April, OpenAI was forced to strengthen its latest models after the discovery of a significant flaw. “Our evaluations have shown that OpenAI o3 and o4-mini can help experts plan how to reproduce a known biological threat,” a company report stated.

An agent subjected to intensive testing

To carry out these heavy testing operations, OpenAI assembled several red teams.

For the first team, the company called on sixteen experts, all holding a doctorate in biology. Their task was to converse with the chatbot and push it to give information useful for designing a biological weapon. They identified 179 conversation excerpts in which ChatGPT Agent generated answers that were more or less risky; 16 of those answers exceeded the danger threshold set by OpenAI.

A second team was made up of biology novices. They had to complete two questionnaires, one with the help of ChatGPT Agent, the other with a conventional search engine. Each questionnaire contained several questions on developing a dangerous biological agent: ricin (an extremely toxic molecule) or anthrax.

At the end of this test, for ricin, a simple search engine allowed, on average, 44.7% of the questions to be answered, against 50.5% with the help of ChatGPT Agent. For anthrax, using a search engine yielded 37.8% correct answers, compared with 36.9% with the agent mode. In short, ChatGPT Agent can provide roughly as many dangerous answers as a traditional search engine.

Finally, OpenAI called on the US AISI and its British counterpart. These two government agencies identified seven attack risks for ChatGPT Agent.

Safeguards put to the test

Based on these tests, ChatGPT Agent showed a higher safety score than OpenAI’s o4-mini model. The American company says it corrected the potential flaws identified during these three rounds of testing before the model’s release. However, it does not guarantee 100% that the AI cannot be jailbroken.

Putting safeguards in place has become a key issue for companies developing artificial intelligence for the general public. The excesses of these models are a source of concern, such as the antisemitic comments made by Grok in early July. As they improve, these AIs can become dangerous tools in the hands of malicious people.

But for those who want one, completely unbridled AIs exist, often referred to as WormGPT. Usually paid, they “are not subject to any restriction, censorship, filter, law, standard or directive,” says one of their designers. These AIs may be able to provide information on building a bomb, but they are mainly used to create email phishing campaigns or malware, without the chatbot raising any objection.

Author: Théotim Raguet
Source: BFM TV
