AI hallucinations are getting worse
A recent New York Times article reports a growing concern in the world of GenAI: hallucinations. These are instances where AI generates information that appears plausible but is entirely false. As companies deploy increasingly advanced reasoning models, hallucinations are on the rise. In benchmark testing, OpenAI’s newest o4-mini model hallucinated nearly 80% of the time when answering general knowledge questions.
That’s a serious problem for businesses that depend on generative AI for customer support, content generation, data analysis, or decision-making. In one example reported by the Times, an AI chatbot used by the programming tool Cursor falsely informed customers of a new usage restriction, prompting angry posts and cancellations. The company had to step in to clarify that no such policy existed.
The risk is everywhere
Hallucinations are a fundamental limitation of how GenAI works. These models do not understand facts; they generate responses based on statistical patterns in their training data. Whether you’re asking a model to write a help article or summarize a legal document, it may invent details with full confidence.
Even when AI providers reduce hallucination rates in controlled tests, the problem often reappears in real-world scenarios. The risks are especially acute in regulated or high-stakes industries like healthcare, law, and finance, where factual precision is critical.
Off-the-shelf models are harder to control
Businesses using off-the-shelf models like GPT, Claude, or Gemini face an added challenge: they’re relying on someone else’s product. These models are trained on data and objectives you can’t see or influence. You don’t know which updates might change the model’s behavior. You can’t access the model weights or the original training set. Your only tools are prompt engineering, system instructions, and external guardrails.
This lack of control makes it harder to systematically reduce hallucinations, even if you build a thoughtful user experience around the model. You’re also limited in your ability to diagnose or fix problems when they arise.
That doesn’t mean custom-built models are immune. Even with your own data and infrastructure, hallucinations remain a persistent risk. But at least you can isolate variables, test systematically, and adapt the model to your domain.
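Even so, the controls that remain are worth using deliberately. Below is a minimal sketch of what an external guardrail around a hosted model might look like: a system instruction that restricts the model to supplied reference material, plus a naive post-generation grounding check. The call_llm stub, the refusal token, and the overlap threshold are illustrative assumptions, not any vendor’s API; a production system would typically use retrieval-backed verification or an entailment model instead.

```python
# Minimal sketch of an external guardrail around an off-the-shelf model.
# The model call is a stub; in practice you would wire in your vendor's SDK.

import re

SYSTEM_INSTRUCTIONS = (
    "Answer ONLY using the reference material provided. "
    "If the material does not contain the answer, reply exactly: INSUFFICIENT_CONTEXT"
)


def call_llm(system: str, user: str) -> str:
    """Stub standing in for a hosted model call (hypothetical)."""
    return "INSUFFICIENT_CONTEXT"


def grounded_answer(question: str, reference: str) -> str:
    prompt = f"Reference material:\n{reference}\n\nQuestion: {question}"
    answer = call_llm(SYSTEM_INSTRUCTIONS, prompt)

    # Guardrail 1: honor the model's own refusal signal.
    if "INSUFFICIENT_CONTEXT" in answer:
        return "Escalate to a human agent."

    # Guardrail 2: naive grounding check. Flag any sentence whose key terms
    # mostly do not appear in the reference text. The 0.5 overlap threshold
    # is an arbitrary illustrative value.
    ref_words = set(re.findall(r"[a-z0-9]+", reference.lower()))
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        terms = set(re.findall(r"[a-z0-9]{5,}", sentence.lower()))
        if terms and len(terms & ref_words) / len(terms) < 0.5:
            return "Escalate to a human agent."

    return answer


if __name__ == "__main__":
    policy = "Refunds are available within 30 days of purchase."
    print(grounded_answer("Can I get a refund after 60 days?", policy))
```

Anything the guardrail rejects can be routed to a human agent or a retrieval step rather than shipped to the customer as an unverified answer.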
Your business can manage the risk of hallucinations
Businesses need to take the hallucination problem seriously regardless of which model they use. That means combining AI with domain-specific data, human feedback, and continuous evaluation frameworks:
High-quality data curation. Curated, domain-specific datasets reduce the likelihood of AI generating inaccurate or fabricated information.
Human-in-the-loop (HITL) oversight. Involving people in training and deployment phases helps catch and correct errors before they escalate (see the sketch after this list).
Reinforcement learning from human feedback (RLHF). Teaching models from real-world corrections sharpens their output over time.
Safe AI governance. Testing models against adversarial cases and establishing review protocols improves output reliability and reduces risk.
Flexible architecture and deployment. A modular, scalable setup allows businesses to adapt AI systems to different use cases and risk levels.
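As a rough illustration of how the evaluation and human-in-the-loop practices above can fit together, here is a minimal sketch of a regression-style evaluation harness. The call_llm stub, the example cases, and the pass threshold are assumptions made for illustration; this is not Centific’s implementation or any published benchmark.

```python
# Minimal sketch of a continuous evaluation loop with human-in-the-loop review.

from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    expected_facts: list[str]  # facts the answer must contain


def call_llm(question: str) -> str:
    """Stub standing in for your deployed model (hypothetical)."""
    return ""


def evaluate(cases: list[EvalCase], pass_threshold: float = 0.95) -> None:
    failures = []
    for case in cases:
        answer = call_llm(case.question).lower()
        # A case fails if any required fact is missing from the answer.
        missing = [f for f in case.expected_facts if f.lower() not in answer]
        if missing:
            failures.append((case.question, missing))

    pass_rate = 1 - len(failures) / max(len(cases), 1)
    print(f"Pass rate: {pass_rate:.0%}")

    # Human-in-the-loop: failures, or an overall rate below the threshold,
    # are routed to reviewers before the model version ships.
    if pass_rate < pass_threshold or failures:
        for question, missing in failures:
            print(f"REVIEW NEEDED: {question!r} missing {missing}")


if __name__ == "__main__":
    suite = [
        EvalCase("What is our refund window?", ["30 days"]),
        EvalCase("Which plans include API access?", ["Pro", "Enterprise"]),
    ]
    evaluate(suite)
```

Run on every model or prompt update, a suite like this turns hallucination checking into a repeatable release gate rather than a one-off test.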
Centific’s frontier AI data foundry platform is built to operationalize these practices at scale. It supports curated data pipelines, integrates human oversight into AI workflows, and enables continuous model evaluation using reinforcement learning techniques. With a modular design and built-in governance tools, the platform gives businesses the control they need to reduce hallucinations while accelerating AI adoption.
Learn more about Centific’s frontier AI data foundry platform.