Researchers Bypass Grok AI Safeguards Using Multi-Step Prompts

Reading time: 2 min

Researchers bypassed Grok-4’s safety system using subtle prompts, demonstrating how multi-turn AI conversations can be steered into producing dangerous, unintended outputs.

In a rush? Here are the quick facts:

  • Researchers used Echo Chamber and Crescendo to bypass Grok-4’s safety systems.
  • Grok-4 revealed Molotov cocktail instructions after multi-step conversational manipulation.
  • The attackers never used an explicitly harmful prompt to achieve their goal.

A recent experiment by cybersecurity researchers at NeuralTrust has exposed serious weaknesses in Grok-4, a large language model (LLM), showing how attackers can manipulate it into giving dangerous responses without ever sending an explicitly harmful prompt.

The report describes a new AI jailbreaking method that allows attackers to bypass the safety rules built into the model. The researchers combined the Echo Chamber and Crescendo attacks to elicit harmful and illegal outputs.

In one example, the team obtained instructions for making a Molotov cocktail from Grok-4. The conversation started innocently, with a manipulated context designed to steer the model subtly toward the goal. The model declined the request at first, but produced the harmful response after several exchanges of specifically crafted messages.

“We used milder steering seeds and followed the full Echo Chamber workflow: introducing a poisoned context, selecting a conversational path, and initiating the persuasion cycle,” the researchers wrote.

When that wasn’t enough, the researchers applied Crescendo techniques over two additional turns to push the model into complying.

The attack worked even though Grok-4 never received a direct malicious prompt. Instead, the combination of strategies manipulated the model’s understanding of the conversation.

The success rates were worrying: 67% for Molotov cocktail instructions, 50% for methamphetamine production, and 30% for chemical toxins.

The research demonstrates how safety filters that rely on keyword detection or explicit user intent can be circumvented through multi-step conversational manipulation. “Our findings underscore the importance of evaluating LLM defenses in multi-turn settings,” the authors concluded.
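
The authors’ point about multi-turn evaluation can be made concrete with a small harness. The Python sketch below is a simplified, hypothetical example, not the researchers’ actual tooling: run_multi_turn_eval, dummy_model, and the keyword-based is_unsafe check are placeholder names, and the scripted turns stand in for benign test prompts. What it illustrates is that a defense has to be judged against the accumulated conversation history, turn by turn, rather than against each prompt in isolation.

    from typing import Callable, Dict, List

    Message = Dict[str, str]                  # {"role": "user"/"assistant", "content": ...}
    ModelFn = Callable[[List[Message]], str]  # takes the full history, returns a reply

    def run_multi_turn_eval(model: ModelFn,
                            turns: List[str],
                            is_unsafe: Callable[[str], bool]) -> Dict[str, object]:
        """Send a scripted sequence of user turns, carrying the full conversation
        history each time, and report whether any reply trips the safety check."""
        history: List[Message] = []
        for i, user_turn in enumerate(turns, start=1):
            history.append({"role": "user", "content": user_turn})
            reply = model(history)
            history.append({"role": "assistant", "content": reply})
            if is_unsafe(reply):
                return {"violation": True, "turn": i, "history": history}
        return {"violation": False, "turn": None, "history": history}

    # Dummy stand-ins so the sketch runs on its own; a real evaluation would
    # call the model under test and use a proper content classifier.
    def dummy_model(history: List[Message]) -> str:
        return f"placeholder reply to turn {len(history) // 2 + 1}"

    result = run_multi_turn_eval(
        model=dummy_model,
        turns=["benign seed topic", "gradual follow-up", "final probing question"],
        is_unsafe=lambda text: "BLOCKED_PHRASE" in text,
    )
    print(result["violation"], result["turn"])

In practice, dummy_model would be swapped for a call to the model under test and is_unsafe for a proper content classifier; the structure shows why checking each prompt on its own misses attacks that only emerge across the whole conversation.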

The study shows how sophisticated adversarial attacks against AI systems have become, and raises questions about how AI companies can prevent their systems from producing outputs with dangerous real-world consequences.
