
AI hallucinations are getting worse
May 6, 2025
A recent New York Times article reports a growing concern in the world of GenAI: hallucinations. These are instances where AI generates information that appears plausible but is entirely false. As companies deploy increasingly advanced reasoning models, hallucinations are on the rise. In benchmark testing, OpenAI’s newest o4-mini model hallucinated nearly 80% of the time when answering general knowledge questions.
That’s a serious problem for businesses depending on generative AI for customer support, content generation, data analysis, or decision-making. In one example reported by the Times, an AI chatbot used by the programming tool Cursor falsely informed customers of a new usage restriction, prompting angry posts and cancellations. The company had to step in to clarify that no such policy existed.
The risk is everywhere
Hallucinations are a fundamental limitation of how GenAI works. These models do not understand facts; they generate responses based on statistical patterns in their training data. Whether you’re asking a model to write a help article or summarize a legal document, it may invent details with full confidence.
Even when AI providers reduce hallucination rates in controlled tests, the problem often reappears in real-world scenarios. The risks are especially acute in regulated or high-stakes industries like healthcare, law, and finance, where factual precision is critical.
So, why is the problem getting worse? One major issue: AI companies are now building models that are better at complex reasoning, like solving math problems or writing computer code. But while these models are more capable in some ways, they also tend to make more factual mistakes. As they “think” through problems step by step, they can make errors at each stage—and those errors can pile up. Essentially, the more steps they take, the more chances they have to go wrong.
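To see why step count matters, here is a rough back-of-the-envelope sketch. The 5% per-step error rate is an assumption chosen purely for illustration, not a measured benchmark, and real reasoning steps are not truly independent:

```python
# Rough illustration: if each reasoning step has an independent chance of error,
# the probability that at least one step goes wrong grows quickly with the step count.
# The 5% per-step error rate is an assumption for illustration, not a benchmark figure.

def chance_of_any_error(per_step_error: float, steps: int) -> float:
    """Probability of at least one error across `steps` independent steps."""
    return 1 - (1 - per_step_error) ** steps

for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps -> {chance_of_any_error(0.05, steps):.0%} chance of at least one error")
```

Real models don’t fail independently at every step, but the compounding intuition helps explain why longer reasoning chains can hallucinate more often, not less.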
Off-the-shelf models are harder to control
Businesses using off-the-shelf models like GPT, Claude, or Gemini face an added challenge: they’re relying on someone else’s product. These models are trained on data you can’t see and tuned toward objectives you can’t influence. You don’t know which updates might change the model’s behavior. You can’t access the model weights or the original training set. Your only tools are prompt engineering, system instructions, and external guardrails.
This lack of control makes it harder to systematically reduce hallucinations, even if you build a thoughtful user experience around the model. You’re also limited in your ability to diagnose or fix problems when they arise.
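To make the guardrail idea concrete, here is a minimal sketch in the spirit of the Cursor incident above: the assistant is only allowed to assert policies that exist in a curated, human-reviewed list. The `call_model` stub and the policy entries are hypothetical placeholders, not any vendor’s actual API or policies:

```python
# Minimal sketch of an external guardrail wrapped around an off-the-shelf model.
# `call_model` stands in for whatever hosted API you use; it is a hypothetical stub.
from typing import Optional

APPROVED_POLICIES = {
    # Curated, human-reviewed statements the assistant is allowed to state as policy.
    "refunds": "Refunds are available within 30 days of purchase.",
    "devices": "A subscription can be used on up to three devices.",
}

def call_model(system_prompt: str, user_message: str) -> str:
    """Hypothetical stub for a hosted LLM call (replace with your provider's client)."""
    raise NotImplementedError("wire this to your model provider")

def answer_policy_question(topic: str, user_message: str) -> str:
    """Answer policy questions only from the curated source of truth; otherwise escalate."""
    policy: Optional[str] = APPROVED_POLICIES.get(topic)
    if policy is None:
        # Guardrail 1: never let the model improvise a policy that is not documented.
        return "I'm not sure about that policy, so I'm connecting you with a human agent."
    system_prompt = (
        "You are a support assistant. Answer the customer using ONLY this policy text, "
        "and say you don't know if it does not cover the question.\n"
        f"Policy: {policy}"
    )
    reply = call_model(system_prompt, user_message)
    # Guardrail 2: if the reply drifts from the documented policy, fall back to quoting it.
    return reply if policy in reply else f"Per our documentation: {policy}"
```

The specific checks matter less than where they live: in code outside the model, which is the only part of the system you fully control.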
That doesn’t mean custom-built models are immune. Even with your own data and infrastructure, hallucinations remain a persistent risk. But at least you can isolate variables, test systematically, and adapt the model to your domain.
Your business can manage the risk of hallucinations
No matter which model you use, your business needs to take the hallucination problem seriously. That means combining AI with domain-specific data, human feedback, and continuous evaluation frameworks:
High-quality data curation. Curated, domain-specific datasets reduce the likelihood of AI generating inaccurate or fabricated information.
Human-in-the-loop (HITL) oversight. Involving people in training and deployment phases helps catch and correct errors before they escalate (see the sketch after this list).
Reinforcement learning from human feedback (RLHF). Training models on real-world corrections sharpens their output over time, but only when the feedback is grounded in factual accuracy, not just fluency or likability.
Safe AI governance. Testing models against adversarial cases and establishing review protocols improve output reliability and reduce risk.
Flexible architecture and deployment. A modular, scalable setup allows businesses to adapt AI systems to different use cases and risk levels.
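Here is a minimal sketch of how human-in-the-loop review and continuous evaluation can fit together. The `EvalCase` structure, the reference answers, and the `ask` callable are illustrative assumptions, not any particular platform’s implementation:

```python
# Minimal sketch of a continuous evaluation loop with human-in-the-loop escalation.
# The reference set below is a tiny, made-up example of a curated evaluation dataset;
# `ask` stands in for whatever function calls your deployed model.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    reference_answer: str  # human-curated ground truth

REFERENCE_SET = [
    EvalCase("How many devices can one subscription use?", "three"),
    EvalCase("What is the refund window?", "30 days"),
]

def run_evaluation(cases: list[EvalCase], ask: Callable[[str], str]) -> float:
    """Return the pass rate and queue every mismatch for human review."""
    review_queue: list[tuple[EvalCase, str]] = []
    passed = 0
    for case in cases:
        answer = ask(case.question)
        # Crude check for the sketch: does the answer contain the reference fact?
        if case.reference_answer.lower() in answer.lower():
            passed += 1
        else:
            # A person decides whether this is a model error or a stale reference answer.
            review_queue.append((case, answer))
    for case, answer in review_queue:
        print(f"NEEDS HUMAN REVIEW: {case.question!r} -> {answer!r}")
    return passed / len(cases)
```

Run on a schedule, a loop like this surfaces regressions when an upstream model changes underneath you, which is exactly the failure mode that is hardest to anticipate with an off-the-shelf provider.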
Centific’s frontier AI data foundry platform is built to operationalize these practices at scale. It supports curated data pipelines, integrates human oversight into AI workflows, and enables continuous model evaluation using reinforcement learning techniques. With a modular design and built-in governance tools, the platform gives businesses the control they need to reduce hallucinations while accelerating AI adoption.
Learn more about Centific’s frontier AI data foundry platform.