Many have experienced situations where an LLM confidently delivered completely inaccurate information while presenting it as correct. Despite rapid progress in the GenAI space, hallucinations remain a fundamental challenge in daily interactions with AI.

This is not just a subjective impression. The numbers confirm the problem, and they suggest that fixing it is harder than it looks.

OpenAI’s newest o3 model hallucinated on 33% of factual questions, and healthcare studies report 19% hallucination rates even with domain-specific fine-tuning. AllAboutAI estimates that these errors cost businesses $67.4 billion in 2024.

In its latest paper, OpenAI argues that this is not a bug, but a tradeoff baked into the way models are trained and evaluated.

𝐓𝐡𝐞 𝐦𝐚𝐭𝐡𝐞𝐦𝐚𝐭𝐢𝐜𝐚𝐥 𝐟𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐡𝐚𝐥𝐥𝐮𝐜𝐢𝐧𝐚𝐭𝐢𝐨𝐧𝐬

Hallucinations are not mysterious glitches. They emerge from the fundamental statistical properties of how LLMs learn. The research connects language generation to a simpler problem called “Is-It-Valid” (IIV) binary classification, proving that generative error rates are at least twice the misclassification rates in equivalent classification tasks.
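
Informally, the reduction says that a model which cannot reliably classify statements as valid or invalid cannot reliably avoid generating invalid ones either. A rough rendering of the bound’s shape (the paper’s full statement carries additional additive terms, e.g. for calibration):

```latex
% Rough shape of the generation-to-classification reduction.
% err_gen : rate of invalid (hallucinated) generations
% err_IIV : misclassification rate on the "Is-It-Valid" task
\[
  \mathrm{err}_{\text{gen}} \;\gtrsim\; 2 \cdot \mathrm{err}_{\text{IIV}}
\]
```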

Two key factors

𝐒𝐢𝐧𝐠𝐥𝐞𝐭𝐨𝐧 𝐫𝐚𝐭𝐞 𝐝𝐢𝐬𝐜𝐨𝐯𝐞𝐫𝐲. Hallucination rates track the share of “singleton facts”: information that appears only once in the training data. If 20% of celebrity birthdays occur exactly once, expect roughly a 20% hallucination rate on those birthdays. This echoes the Good-Turing missing-mass estimator from the 1950s.
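
As a back-of-the-envelope illustration, the singleton rate is easy to measure on a corpus of extracted facts. The sketch below is not from the paper; it assumes facts have already been reduced to hashable (entity, attribute) pairs and simply counts how many distinct facts occur exactly once:

```python
from collections import Counter

def singleton_rate(facts):
    """Fraction of distinct facts that appear exactly once in the corpus.

    In the Good-Turing missing-mass spirit, this fraction is a rough
    lower bound on how often a model must guess (and may hallucinate)
    when queried about facts of this type.
    """
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Toy, hypothetical data: birthday facts extracted from a training corpus.
facts = [
    ("Ada Lovelace", "1815-12-10"),
    ("Ada Lovelace", "1815-12-10"),  # seen twice -> not a singleton
    ("Alan Turing", "1912-06-23"),   # seen once  -> singleton
    ("Grace Hopper", "1906-12-09"),  # seen once  -> singleton
]
print(f"Singleton rate: {singleton_rate(facts):.0%}")  # ~67% on this toy set
```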

𝐓𝐡𝐞 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐬𝐲𝐬𝐭𝐞𝐦 𝐩𝐫𝐨𝐛𝐥𝐞𝐦. Current benchmarks penalize uncertainty. More than 90% of tests use binary grading that awards zero points for “I don’t know.” Models that express doubt score lower than those confidently guessing wrong. This incentivizes fabricated answers over honest uncertainty, creating what researchers call an “epidemic of penalizing uncertainty.”
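
To make the incentive concrete, here is a toy calculation (mine, not the paper’s): under binary grading, any guess with a non-zero chance of being right has a higher expected score than abstaining, which always scores zero.

```python
def expected_binary_score(p_correct: float) -> float:
    """Expected score under binary grading: 1 point if right, 0 if wrong or abstaining."""
    return p_correct * 1 + (1 - p_correct) * 0

# Even a wild 10%-confident guess beats "I don't know" (which scores 0).
print(expected_binary_score(0.10))  # 0.1 > 0.0 -> the benchmark rewards bluffing
```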

Minimizing hallucinations

The paper suggests confidence thresholds such as:

  • “Only answer if you’re >75% confident.”
  • “Wrong answers cost 3x more than saying ‘I don’t know.’”
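
These two prompts are really one rule seen from two sides: if a wrong answer costs k times as much as abstaining (which scores 0), answering only pays off when confidence exceeds k / (k + 1), and k = 3 gives exactly the 75% threshold. A minimal sketch of that decision rule, with illustrative names and penalty values that are not from the paper:

```python
def should_answer(p_correct: float, wrong_penalty: float = 3.0) -> bool:
    """Answer only if the expected score of answering beats abstaining (score 0).

    Expected score of answering = p_correct * 1 - (1 - p_correct) * wrong_penalty,
    which is positive exactly when p_correct > wrong_penalty / (wrong_penalty + 1).
    """
    return p_correct - (1.0 - p_correct) * wrong_penalty > 0.0

print(should_answer(0.80))  # True:  0.80 > 3/4, expected score +0.2
print(should_answer(0.70))  # False: 0.70 < 3/4, better to say "I don't know"
```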

𝗦𝘂𝗺𝗺𝗮𝗿𝘆

Hallucinations cannot be eliminated, but understanding their statistical roots allows us to build better safeguards. Enterprises must combine fact-checking, monitoring, and evaluation frameworks to manage reliability effectively.

Microsoft researchers also introduced a promising approach: mitigating hallucinations through multi-model systems (see https://lnkd.in/g5HYbegF).

The future is not hallucination-free AI. It is AI that knows when to stay silent.

For further context, see this related post on the absence of information, and why knowing what you do not know can be just as valuable: https://lnkd.in/ggCx9B_H

Credit: Jan Moser

#AI #MachineLearning #LLM #ArtificialIntelligence #DigitalTransformation #EnterpriseAI

#ResponsibleAI