
One-Third of AI Search Answers Contain Unsupported Claims, Study Finds
A new study finds that AI tools designed to answer questions and perform online research are struggling to live up to their promises.
In a rush? Here are the quick facts:
- GPT-4.5 gave unsupported claims in 47% of responses.
- Perplexity’s deep research agent had 97.5% of its claims unsupported.
- Tools often present one-sided or overconfident answers on debate questions.
Researchers reported that about one-third of answers given by generative AI search engines and deep research agents contained unsupported claims, and many were presented in a biased or one-sided way.
The study, led by Pranav Narayanan Venkit at Salesforce AI Research, tested systems including OpenAI’s GPT-4.5 and GPT-5, Perplexity, You.com, Microsoft’s Bing Chat, and Google Gemini. Across 303 queries, answers were judged on eight criteria, including whether claims were backed up by sources.
The results were troubling. GPT-4.5 produced unsupported claims in 47 per cent of answers. Bing Chat had unsupported statements in 23 per cent of cases, while You.com and Perplexity reached about 31 per cent.
Perplexity’s deep research agent performed the worst, with 97.5 per cent of its claims unsupported. “We were definitely surprised to see that,” Narayanan Venkit told New Scientist.
The researchers explain that generative search engines (GSEs) and deep research agents (DRs) are supposed to gather information, cite reliable sources, and provide long-form answers. In practice, however, they often fail to do so.
The evaluation framework, called DeepTRACE, showed that these systems frequently give “one-sided and overconfident responses on debate queries and include large fractions of statements unsupported by their own listed sources,” as noted by the researchers.
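The paper’s headline numbers boil down to a simple ratio: unsupported claims divided by total claims extracted from an answer. As a rough illustration only (this is not DeepTRACE’s actual code; the `Claim` structure and the support labels are invented for the example), the metric can be sketched like this:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    supported: bool  # whether any of the answer's own cited sources backs this claim

def unsupported_rate(claims: list[Claim]) -> float:
    """Fraction of extracted claims not backed by the answer's listed sources."""
    if not claims:
        return 0.0
    return sum(not c.supported for c in claims) / len(claims)

# Illustrative example: 1 of 3 claims lacks support, giving a rate of about
# one-third, mirroring the study's headline figure.
claims = [
    Claim("Solar capacity grew 20% in 2023.", supported=True),
    Claim("All experts agree the trend will continue.", supported=False),
    Claim("The report cites IEA data.", supported=True),
]
print(f"Unsupported-claim rate: {unsupported_rate(claims):.0%}")  # 33%
```

In the actual study, the hard part is the labeling step this sketch takes for granted: deciding, claim by claim, whether a cited source genuinely supports the statement made.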
Critics warn this undermines user trust. New Scientist reports that Felix Simon at the University of Oxford said: “There have been frequent complaints from users and various studies showing that despite major improvements, AI systems can produce one-sided or misleading answers.”
“As such, this paper provides some interesting evidence on this problem which will hopefully help spur further improvements on this front,” he added.
Others questioned the methods, but agreed that reliability and transparency remain serious concerns. As the researchers concluded, “current public systems fall short of their promise to deliver trustworthy, source-grounded synthesis.”