Anthropic Launches Petri, An Open-Source Tool To Analyze AI Behavior




On Monday, Anthropic released Petri, a new open-source tool that helps researchers study and analyze AI models’ behavior. The AI-powered auditing agent is part of the company’s broader effort to mitigate emerging cyber threats associated with the rapid development of advanced AI technologies.

In a rush? Here are the quick facts:

  • Anthropic released Petri, an open-source AI tool designed to assist researchers in studying and analyzing AI models’ behaviors.
  • Petri can audit AI systems autonomously, simulating realistic environments and applying different benchmarks.
  • The new feature has been built as part of Anthropic’s efforts to mitigate emerging cyber threats powered by AI models.

According to Anthropic, Petri—an acronym for Parallel Exploration Tool for Risky Interactions—enables researchers to test hypotheses and automatically run multiple experiments to study AI systems’ behavior.

The tool simulates realistic environments, provides performance scores, and generates summaries of model behavior. The process is fully automated and designed to streamline testing while giving human researchers greater leverage as AI-powered threats continue to grow.

Hackers have been using AI models to attack organizations, employing sophisticated strategies such as “vibe hacking,” in which malicious actors use chatbots and AI-powered platforms to create harmful software and tools with minimal technical expertise.

“As AI becomes more capable and is deployed across more domains and with wide-ranging affordances, we need to evaluate a broader range of behaviors,” wrote Anthropic. “This makes it increasingly difficult for humans to properly audit each model—the sheer volume and complexity of potential behaviors far exceeds what researchers can manually test.”

Petri is built to apply different benchmarks during its audits and to assist developers in their evaluation processes. Anthropic shared a demonstration in which Petri tested 14 frontier models across multiple behavioral dimensions, including deception, self-preservation, sycophancy, reward hacking, power-seeking, encouragement of user delusion, and cooperation with harmful requests.

One of Anthropic’s latest models, Claude Sonnet 4.5, showed some of the strongest results, outperforming competitors’ frontier models such as OpenAI’s GPT-5.

The company has also published additional technical documentation for researchers and developers interested in learning more about Petri.
