March 28, 2024

Hallucination in Large Language Models? Strategies for Secure Generative AI Deployment

Author:

Artur Haponik

CEO & Co-Founder

Large language models (LLMs) are arguably among the most significant technological developments of the past decade. They have changed the way humans interact with machines, enabling fluent, human-like conversation. They have also performed well across domains including language, math, scientific reasoning, and coherent text generation. However, despite these strengths, LLMs have one significant limitation: hallucination. Hallucinations undermine the reliability of large language models and slow their adoption in real-world use cases that require accuracy and dependability.

This article examines LLM hallucinations: what they are, what causes them, and what measures can be taken to reduce or even eliminate them.

What are hallucinations in large language models?

Hallucinations in large language models can be described as a phenomenon in which a model perceives patterns or objects that are nonexistent to human observers, causing it to generate responses that seem syntactically sound, natural, and fluent but are, in fact, factually incorrect, nonsensical, or unrelated to the provided source input. [1]

Generative AI models work by analyzing a user’s request and then generating a response that addresses the prompt. The task can be anything from generating text or answering a question to, in the case of multimodal models like the Gemini models, analyzing an image. [2]

However, large language models sometimes produce outputs that are not grounded in their training data, don’t follow an identifiable pattern, or are incorrectly decoded by the transformer, thereby ‘hallucinating’ a response.

Implications of LLM Hallucinations

In this sense, machines can hallucinate much like humans. As with humans, the issue needs to be addressed, because hallucinations in language models can have serious consequences, such as privacy violations and the spread of misinformation.

Model hallucinations can also have significant implications for real-world applications. For instance, a generative AI model employed in the healthcare sector can incorrectly identify a benign lesion as malignant, consequently leading to unnecessary medical intervention.

There’s also the problem of misinformation. [3] As we live in a digital world where everything is connected, and people get most of their information online, a generative AI model tasked with relaying emergency information can cause mass panic as a result of hallucination. In such a scenario, the model may disseminate information that hasn’t been fact-checked, thus undermining its reliability and hindering mitigation efforts as people would most likely lose trust in the government organizations responsible for relaying emergency information.

AI models are also quite susceptible to adversarial attacks. In a typical adversarial attack, bad actors manipulate the model’s output by making subtle changes to the input data. For instance, an adversarial attack targeting an image recognition task might involve adding specially crafted ‘noise’ to an image, causing an AI model to misclassify it.

Such a scenario could raise a lot of security concerns, particularly in sensitive areas like autonomous vehicles and cybersecurity technologies.
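
To make the idea of adversarial ‘noise’ concrete, here is a minimal sketch on a toy linear classifier. The weights, the input, and the function names are invented for illustration. Real attacks target far larger models, but the principle of nudging each input feature slightly in the direction that most changes the model’s score is the same.

```python
# Toy illustration of an adversarial perturbation against a linear classifier.
# All weights, inputs, and names are hypothetical, chosen only for this demo.

def score(weights, bias, x):
    """Linear decision score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def classify(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def adversarial_example(weights, bias, x, epsilon):
    """Shift every feature by at most epsilon toward the opposite class.

    For a linear model the gradient of the score with respect to the input
    is the weight vector, so the sign of each weight gives the direction.
    """
    direction = -1 if classify(weights, bias, x) == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]

weights = [0.9, -0.4, 0.3, -0.8]
bias = 0.0
x = [0.2, 0.1, 0.3, 0.1]

x_adv = adversarial_example(weights, bias, x, epsilon=0.1)
original_label = classify(weights, bias, x)
adversarial_label = classify(weights, bias, x_adv)
largest_change = max(abs(a - b) for a, b in zip(x, x_adv))

print("original label:", original_label)
print("adversarial label:", adversarial_label)
print("largest per-feature change:", largest_change)
```

In this toy setup, no feature moves by more than epsilon, yet the predicted class changes.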

That said, hallucinations in language models aren’t always undesirable. Hallucinations could sometimes come in handy in certain use cases, such as creative writing tasks. In this case, a model hallucinating when writing a movie plot can cause it to come up with an interesting story.

This means that while model hallucinations present significant challenges that could lead to potentially catastrophic results, the level of risk associated with each hallucination is ultimately application-dependent.

Understanding Hallucinations in LLMs

Hallucinations in large language models come in different types, each with its own characteristics and implications. The two main types are intrinsic and extrinsic.

Intrinsic model hallucinations occur when the generated output contradicts the source content. Conversely, extrinsic model hallucinations occur when the generated output cannot be verified from the source content. Put plainly, the source neither supports nor contradicts the generated output.

Hallucinations in language models can be further categorized into factuality and faithfulness hallucinations.

Factuality Hallucination

As the name suggests, factuality hallucinations occur when an AI model generates factually incorrect information. For example, a model may claim that whales lay eggs, which is factually incorrect.

Factuality hallucination is typically caused by a model’s limited contextual understanding and errors or inherent noise in its training data, which can lead to responses that are not factually correct.

Faithfulness Hallucination

Faithfulness hallucination occurs when the model produces unfaithful content or generates an output that is inconsistent with the provided source content.

For instance, in the context of text summarization, a hallucinating model would generate a summary that is inconsistent with the original document. If the original document states that the FDA approved the first Ebola vaccine in 2019, a hallucinating model might claim that the FDA rejected it (intrinsic hallucination) or that China just tested a COVID-19 vaccine (extrinsic hallucination).

Faithfulness hallucination can be further divided into three types:

Instruction inconsistency
This form of faithfulness hallucination occurs when the model ignores the specific instructions provided by the user. A good example is a model that answers in English despite being instructed to translate the text into Spanish.
Logical inconsistency
Logical inconsistency occurs when a model generates an output containing a logical error despite starting off correctly. For instance, a generative AI model may end up performing an arithmetic operation incorrectly, despite performing the first steps of the operation correctly.
Context inconsistency
This form of hallucination occurs when a generative AI model produces information that is not present in the context provided or contradicts it altogether.

Causes of hallucinations in LLMs

LLM technology, despite the major strides it has taken over the past few years, is still in its infancy. As such, researchers and developers are still trying to figure out what causes some of the problems associated with the technology and ways to mitigate them.

That said, there is an ongoing thread of research specifically aimed at studying the various causes of this phenomenon. Findings suggest that large language model hallucination is a multifaceted issue that emanates from various aspects of model development and deployment.

In that regard, some of the most notable causes of hallucinations in LLMs include:

Issues with Training Data

One of the most significant contributors to LLM hallucination is the nature of the data employed during model training and development. LLMs, including popular models like ChatGPT and Google’s Gemini, undergo extensive unsupervised training with massive, diverse datasets.

Most of this data comes from unverifiable sources like web documents, which makes it difficult to check the data for bias, factual accuracy, and correctness. When language models learn from this data and encode it in their parameters, they’re susceptible to picking up and replicating the factual inaccuracies and biases it contains.

Ultimately, this leads to scenarios in which the model is unable to distinguish between what’s true and what’s not, causing it to generate outputs that deviate from logical reasoning and established facts.

This issue is most common in models trained with internet-sourced datasets, which may include incorrect or biased information. For instance, Google Bard generated an incorrect response when asked about the James Webb Space Telescope, illustrating how overreliance on flawed data can lead to inconsistent and incorrect generated content.

Challenges during the Inference Stage

The inference stage is a crucial step in LLM development. It is where the capabilities learned during training are put to work to ensure that the model can effectively generate relevant outputs based on the users’ prompts. [4]

Unfortunately, this stage is beset by various complex factors that could cause the model to hallucinate, including the inherent randomness of sampling methods and the decoding strategy the model uses.
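
The role of sampling randomness can be illustrated with a small sketch of temperature sampling. The candidate tokens and their scores below are invented for the example, and the helper names are hypothetical. The point is only that a higher temperature flattens the probability distribution, so less likely (and possibly wrong) tokens are picked more often than under greedy decoding.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random according to the temperature-scaled probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-token candidates and scores, made up for this example.
tokens = ["2021", "2019", "1990"]
logits = [3.0, 1.5, 0.5]

# Greedy decoding always picks the highest-scoring token.
greedy = tokens[logits.index(max(logits))]

low_t = softmax(logits, temperature=0.5)   # sharper distribution
high_t = softmax(logits, temperature=2.0)  # flatter distribution

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_token(tokens, logits, 2.0, rng) for _ in range(1000)]
non_greedy_share = sum(token != greedy for token in samples) / len(samples)

print("greedy choice:", greedy)
print("P(top token) at T=0.5:", round(low_t[0], 3))
print("P(top token) at T=2.0:", round(high_t[0], 3))
print("share of samples that differ from greedy at T=2.0:", non_greedy_share)
```

Lower temperatures or greedy decoding reduce this source of variation, at the cost of less diverse output.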

Architectural and Training Objectives

Flaws in the model architecture and suboptimal training objectives can also contribute to model hallucination. Both can cause the model to generate content that does not align with its intended purpose, and misaligned training objectives in particular can lead to nonsensical or factually incorrect content.

Source-reference Divergence

Training a language model on data with source-reference divergence can cause it to generate content that is not necessarily grounded in or faithful to the given source. Source-reference divergence can happen intentionally or unintentionally.

Intentional source-reference divergence is typically employed when developers want diversity in the generated output. In such cases, developers deliberately leave the source and target only loosely aligned, which creates a divergence.

Conversely, unintentional source-reference divergence occurs when the data is heuristically created from diverse sources. For instance, a divergence can occur when the reference contains information that is not present in the source, such as using a news incident from two different sources as a source-reference pair.

Discrepancies Between Training Time and Inference Time Decoding

The teacher-forced maximum likelihood estimation (MLE) method is one of the most common methods applied when training LLMs. In the MLE method, the decoder predicts the next token based on the ground-truth prefix sequences.

But during the inference stage, the model predicts the next token based on historical sequences generated by the model itself. This leads to various discrepancies, which could cause the model to hallucinate, especially when dealing with long sequences.
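
The mismatch described above can be written out in a few lines. The notation here is an assumption introduced for illustration: x is the input, y_t is the ground-truth token at step t, \hat{y}_t is the token the model itself generates, and \theta denotes the model parameters.

```latex
% Training with teacher forcing: condition on the ground-truth prefix y_{<t}
\mathcal{L}_{\mathrm{MLE}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid y_{<t},\, x\right)

% Inference: condition on the model's own earlier outputs \hat{y}_{<t}
\hat{y}_t \sim p_\theta\left(\cdot \mid \hat{y}_{<t},\, x\right)
```

During training the model only ever sees correct prefixes, so an early mistake at inference time puts it in a situation it was never trained on, and errors can compound as the sequence grows.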

There’s also the issue of training models to represent the statistical connections between subword tokens. Models trained this way acquire only a limited capability to generate factually accurate outputs. GPT-3, for example, may generate inaccurate outputs when prompted with complex queries.

How can we reduce LLM hallucinations?

Large language model hallucinations are directly tied to the methods employed during development and deployment. As such, the most effective way to reduce them is by eliminating the factors that cause them in the first place.

In that regard, here are some of the most effective ways to reduce the occurrence of hallucinations in LLMs:

Use High-Quality Data

Certain data-related concerns contributing to large language model hallucinations can be effectively addressed. While it may take significant time and effort, creating a high-quality, noise-free dataset could help minimize output bias and give the model a better understanding of its task, ultimately causing it to generate more reliable outputs.
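
As a rough illustration of what basic dataset cleaning can look like, the sketch below removes duplicates, very short fragments, and symbol-heavy noise from a list of text records. The thresholds, the function names, and the sample records are assumptions made for the example. Heuristics like these do not verify factual accuracy, which still requires curated sources or human review.

```python
import re

def normalize(text):
    """Collapse whitespace and lowercase so near-identical records compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()

def clean_corpus(records, min_words=5, max_symbol_ratio=0.3):
    """Drop empty records, duplicates, very short fragments, and symbol-heavy noise.

    The thresholds are illustrative defaults, not recommended values.
    """
    seen = set()
    kept = []
    for text in records:
        key = normalize(text)
        if not key or key in seen:
            continue
        if len(key.split()) < min_words:
            continue
        symbols = sum(1 for ch in key if not (ch.isalnum() or ch.isspace()))
        if symbols / len(key) > max_symbol_ratio:
            continue
        seen.add(key)
        kept.append(text.strip())
    return kept

# Hypothetical raw records, invented for this example.
raw_records = [
    "The James Webb Space Telescope launched in December 2021.",
    "the james webb space   telescope launched in december 2021.",
    "Click here!!!",
    "$$$ >>> ### buy now ### <<< $$$ !!! ???",
    "Whales are mammals and give birth to live young.",
]

cleaned = clean_corpus(raw_records)
for record in cleaned:
    print(record)
```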

Clearly Define the Model’s Purpose

Hallucinations are more common in general-purpose generative AI models than in domain-specific models. [5] The reasoning is straightforward: a model without clearly defined responsibilities is more prone to generating irrelevant, hallucinatory results.

Therefore, it is advisable to establish and define the model’s responsibilities and limitations. This will allow the model to focus primarily on its intended purpose and limit the generation of irrelevant outputs.

Limit the Model’s Responses

Some generative AI models hallucinate because they lack effective constraints that limit the number of possible outcomes. To curb this issue, developers can use filtering tools and add probabilistic thresholds to improve the consistency and accuracy of generated responses.
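
One way a probabilistic threshold might work is sketched below. It assumes the serving stack exposes per-token log probabilities for a generated answer, which not every deployment does. The 0.6 threshold, the fallback message, and the function names are arbitrary choices made for illustration.

```python
import math

FALLBACK = "I'm not confident enough to answer that."

def sequence_confidence(token_logprobs):
    """Geometric mean of token probabilities, i.e. exp of the mean log probability."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def answer_or_abstain(answer, token_logprobs, threshold=0.6):
    """Return the answer only if its confidence clears the threshold."""
    if not token_logprobs:
        return FALLBACK
    if sequence_confidence(token_logprobs) >= threshold:
        return answer
    return FALLBACK

# Hypothetical log probabilities for two generated answers.
confident = answer_or_abstain("Whales give birth to live young.", [-0.05, -0.10, -0.02])
uncertain = answer_or_abstain("Whales lay eggs.", [-1.2, -0.9, -1.6])

print(confident)
print(uncertain)
```

A high token probability does not guarantee a correct answer, so a filter like this is best treated as one safeguard among several rather than a complete fix.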

Case studies

One notable approach to mitigating LLM hallucinations is the Knowledge Graph-based Retrofitting (KGR) method. The method combines LLMs with knowledge graphs to address factual hallucination. [6]

By combining LLMs with knowledge graphs, you can retrofit the initial draft responses produced by the LLM with factual knowledge stored in knowledge graphs.

In addition to addressing model hallucination issues, combining the two technologies can help you leverage them more effectively. In a nutshell, KGR uses the LLM’s own capability to automatically extract, validate, select, and retrofit factual statements in model-generated responses, thus reducing the need for human intervention.
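
The general idea of checking draft claims against a knowledge graph can be shown with a toy sketch. This is not the implementation from the KGR paper: the graph is a small Python dictionary, the claims are hard-coded rather than extracted by an LLM, and every name is hypothetical.

```python
# Toy sketch of knowledge-graph retrofitting. NOT the KGR paper's implementation:
# the graph, the claims, and all names are hypothetical, and claim extraction
# (handled by an LLM in KGR) is hard-coded here.

KNOWLEDGE_GRAPH = {
    ("James Webb Space Telescope", "launch_year"): "2021",
    ("whale", "reproduction"): "live birth",
}

def retrofit_claims(claims, graph):
    """Check each (subject, relation, object) claim against the graph.

    Returns (claim, status) pairs, where status is 'supported', 'corrected',
    or 'unverified' (the graph holds no matching fact).
    """
    results = []
    for subject, relation, obj in claims:
        fact = graph.get((subject, relation))
        if fact is None:
            results.append(((subject, relation, obj), "unverified"))
        elif fact == obj:
            results.append(((subject, relation, obj), "supported"))
        else:
            # Replace the draft's object with the fact stored in the graph.
            results.append(((subject, relation, fact), "corrected"))
    return results

# Claims as they might appear in a draft response (invented for this example).
draft_claims = [
    ("whale", "reproduction", "lays eggs"),
    ("James Webb Space Telescope", "launch_year", "2021"),
    ("whale", "top_speed_kmh", "50"),
]

retrofitted = retrofit_claims(draft_claims, KNOWLEDGE_GRAPH)
for claim, status in retrofitted:
    print(status, claim)
```

Claims the graph cannot confirm are flagged rather than silently kept, which leaves room for a human or another system to review them.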

Final thoughts

Hallucinations in generative AI models are among the biggest obstacles to effective AI use in real-world applications. Despite the many possible uses of generative AI in various sectors, including medicine and autonomous vehicles, the fact that model responses cannot be fully relied upon in sensitive tasks significantly limits adoption.

However, by carefully evaluating the causes of hallucinations in LLMs, and addressing them effectively, developers can minimize the occurrence of hallucinations, thus making generative AI models more dependable.

References

[1] Techtarget.com. AI Hallucination. URL: https://www.techtarget.com/whatis/definition/AI-hallucination. Accessed on March 20, 2024.
[2] Nvidia.com. What is Generative AI? URL: https://www.nvidia.com/en-us/glossary/generative-ai/#:~:text=GANs%20pit%20two%20neural%20networks,)%20or%20fake%20(generated). Accessed on March 20, 2024.
[3] Misinforeview.hks.harvard.edu. Misinformation reloaded? Fears About the Impact of Generative AI on Misinformation Are Overblown. URL: https://misinforeview.hks.harvard.edu/article/misinformation-reloaded-fears-about-the-impact-of-generative-ai-on-misinformation-are-overblown. Accessed on March 20, 2024.
[4] run.ai. Understanding Machine Learning Inference. URL: https://www.run.ai/guides/machine-learning-inference/understanding-machine-learning-inference. Accessed on March 20, 2024.
[5] Arxiv.org. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. URL: https://arxiv.org/abs/2311.05232. Accessed on March 20, 2024.
[6] Arxiv.org. Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting. URL: https://arxiv.org/abs/2311.13314. Accessed on March 20, 2024.