
Anthropic Says Its AI Models Can End Conversations With Users to Protect Themselves
Anthropic said on Friday that it has given its AI models, Claude Opus 4 and 4.1, the ability to end conversations with users. The startup explained that the new feature will be used only in rare cases, when it is necessary to prevent harm directed toward the AI model itself.
In a rush? Here are the quick facts:
- Anthropic gave Claude Opus 4 and 4.1 the ability to end conversations with users in order to protect themselves.
- The new feature will be used as a last resort only when users insist on engaging in harmful interactions.
- The ability is part of Anthropic’s AI welfare program.
According to the article published by Anthropic, the company released this update as part of its AI welfare program, a new area of AI research that considers an AI system’s “interests” or well-being. It clarified that while the potential moral status of AI systems is “uncertain,” it is researching ways to mitigate risks to its AI models’ welfare.
“We recently gave Claude Opus 4 and 4.1 the ability to end conversations in our consumer chat interfaces,” wrote the company. “This ability is intended for use in rare, extreme cases of persistently harmful or abusive user interactions.”
Anthropic explained that Claude Opus 4, the company’s most advanced model, which was released with safety warnings, showed a preference during testing for avoiding harm, such as creating sexual content involving children or providing information that could enable acts of terror or violence.
In cases where users repeatedly asked Claude to engage in harmful conversations, the chatbot refused to comply and tried to redirect the discussion. Now, the chatbot can refuse to answer and end the chat so users cannot continue the conversation, except in cases where a user may be at imminent risk of harm.
The company clarified that the conversation-ending ability will be used only as a last resort, that most users will not be affected by this update, and that affected users can immediately start a new conversation in another chat.
“We’re treating this feature as an ongoing experiment and will continue refining our approach,” wrote Anthropic. “If users encounter a surprising use of the conversation-ending ability, we encourage them to submit feedback by reacting to Claude’s message with a thumbs reaction or using the dedicated ‘Give feedback’ button.”
The startup has previously worked on other projects related to AI welfare. Last year, Anthropic hired researcher Kyle Fish to study and protect the “interests” of AI models.