Why Do LLMs Hallucinate?

20 May 2024

GPT-4o is out, but the problem of AI hallucination is still alive and kicking.

Hallucination in LLMs is a pretty succinct way of saying “My AI assistant is spitting out incorrect or nonsensical information”. Think of wrong dates, historical inaccuracies, or responses that have nothing to do with your prompt. Things like that.

LLM hallucinations come in two main flavors.

First, we have intrinsic hallucinations. In this case, the model will straight-up contradict the original text or any external knowledge base it's supposed to draw from. Say you ask for a summary of an article about climate change, and jammed in there is a point or two about how global warming is a hoax perpetrated by both clowns and God. While entertaining, that likely wasn’t in the article you were reading.

Then we have extrinsic hallucinations, where the model introduces brand-new, unverifiable information: claims that weren't present in the source material at all, but are written as fact, e.g. fake statistics.

Hallucination crops up across all sorts of generative tasks, from image generation to dialogue, and even in quantitative tasks.

AI hallucination is in the design

If you understand how LLMs work at a base level, you already know the key thing: LLMs are, at bottom, probability machines.

Large Language Models group information into sets. For example, green, red, blue go in the “colour” set. Blue may also go in the “emotion” set along with the words sad, down, and happy.

The model draws on millions of such learned associations to guess the next most probable word in its response to a query.

So, if the prompt “Happy __!” is given, the LLM will identify this as a celebratory message, and will pick out the most likely phrase from its identified set of “celebratory phrases”:

→ Happy birthday! (most likely)

→ Happy graduation! (less likely, still probable)

→ Happy divorce! (least likely, especially if not accounting for sarcasm/wit)
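As a toy sketch of the idea (the probabilities below are invented for illustration, not taken from any real model), greedy next-word selection looks something like this:

```python
# Hypothetical next-word probabilities for the prompt "Happy __!".
# A real LLM computes a distribution like this over its entire
# vocabulary; here we keep just three made-up candidates.
candidates = {
    "birthday": 0.78,    # most likely
    "graduation": 0.17,  # less likely, still probable
    "divorce": 0.05,     # least likely
}

def most_likely_word(probs):
    """Greedy decoding: pick the single highest-probability candidate."""
    return max(probs, key=probs.get)

print(f"Happy {most_likely_word(candidates)}!")  # prints "Happy birthday!"
```

Note that nothing in this process checks whether the chosen word is *true*; it only checks whether it is *probable*. That gap is exactly where hallucination lives.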

With deep learning, LLMs achieve a high level of fluency in both understanding and generating human language. But fluency is not the same as factual accuracy.

As Andrej Karpathy points out, general-purpose LLMs are designed to hallucinate and will try to answer any question you throw at them. This hallucination ability, a side effect of their creativity, is what separates LLMs from basic search engines.

When people say they want ChatGPT/Claude/Gemini to hallucinate less, they're not talking about situations where creativity is the goal, e.g. script writing, art generation etc. They're referring to domains where accuracy and reliability must be the rule, like law and medicine, or even general work assistants that deal with real-life, factual situations.

What triggers hallucination in LLMs

A few key things that can trigger hallucinations:

  1. The LLM's training data was incomplete or contradictory
  2. Overfitting, so the model regurgitates memorised patterns instead of generalising to new inputs
  3. Vague or underspecified prompts that force the LLM to take wild guesses

Identifying and training out hallucination

Having human experts rigorously probe the model, aka "red teaming", is one of the most widely used methods for catching hallucinations in AI.

At the product level, tactics like allowing user edits, structured prompts/outputs, and incorporating user feedback can also curb hallucinations.

Some other common solutions:

  1. Adjusting model parameters like temperature, frequency/presence penalties and top-p sampling to balance creativity and accuracy.

  2. Providing domain-specific data to facilitate adaptation and augment the LLM's knowledge for that area.
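To make the first point concrete, here is a minimal sketch (toy scores and the standard formulas; the function names are mine, not any particular library's API) of how temperature and top-p sampling reshape the model's output distribution:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw model scores (logits) into probabilities.
    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more creative, more hallucination-prone)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(tokens, probs, p=0.9):
    """Nucleus (top-p) sampling: keep only the smallest set of tokens
    whose cumulative probability reaches p, then renormalise."""
    ranked = sorted(zip(tokens, probs), key=lambda pair: -pair[1])
    kept, cumulative = [], 0.0
    for tok, pr in ranked:
        kept.append((tok, pr))
        cumulative += pr
        if cumulative >= p:
            break
    total = sum(pr for _, pr in kept)
    return [(tok, pr / total) for tok, pr in kept]
```

Dialing temperature down and p down trims the long tail of unlikely (and often wrong) continuations, trading creativity for reliability; dialing them up does the opposite.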

The goal is to reduce hallucinations by controlling inputs, fine-tuning, incorporating feedback, and expanding the model's context and knowledge.

A novel solution: Memory Tuning

What really caught my interest and triggered this dive into LLMs and hallucination was reading about Lamini AI's new "memory tuning" approach. The idea here is to train the LLM extensively on a specific domain's data so it can accurately recall and match that information, drastically reducing hallucinations. I can see why this would work for domain-specific work assistants, which is where people need less “creative” AI.

LLMs aren't designed for strict accuracy; their hallucinations are a byproduct of their creative strength. We can leverage this "what-if" ability for tasks like creative writing, but it needs a leash in factual domains. So yes, while hallucinations are kind of baked into how LLMs work, there are ways to mitigate them for specialized applications.