
What Are Generative AI Hallucinations?

By Matthew Edgar · Last Updated: April 30, 2024

One of the biggest problems with generative AI responses is hallucinations. What are hallucinations? Why do they happen? More importantly, as generative AI starts to dominate search results, how do we factor hallucinations into our future SEO strategies?

What Are Hallucinations?

A hallucination occurs when a generative AI tool returns a response that is inaccurate, misleading, or just plain strange. Hallucinated responses can contain factual inaccuracies, fabricated information, irrelevant statements, and illogical reasoning.

Master of Code has categorized five major types of hallucinations:

  1. Sentence contradiction: one sentence in the generated response contradicts another sentence in the same response (see the detection sketch after this list).
  2. Prompt contradiction: the response contradicts or ignores the prompt entered by the user.
  3. Factual contradiction: the response contains fictitious information.
  4. Nonsensical output: the response is not logically coherent and is incomprehensible.
  5. Irrelevant or random output: the response does not contain information relevant to the prompt.
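To make the first type concrete, contradictions between sentences can be surfaced automatically with an off-the-shelf natural language inference (NLI) model. This is a minimal sketch, assuming the sentence-transformers package and a public NLI cross-encoder; the sentences are invented examples, not output from any specific tool.

```python
from itertools import combinations

from sentence_transformers import CrossEncoder

# A public NLI cross-encoder; it scores a sentence pair as
# contradiction / entailment / neutral (label order per the model card).
model = CrossEncoder("cross-encoder/nli-deberta-v3-base")
LABELS = ["contradiction", "entailment", "neutral"]

# Hypothetical AI response containing a sentence contradiction.
response_sentences = [
    "The Eiffel Tower was completed in 1889.",
    "Construction of the Eiffel Tower finished in 1925.",
]

# Score each pair of sentences and flag likely contradictions.
for premise, hypothesis in combinations(response_sentences, 2):
    scores = model.predict([(premise, hypothesis)])[0]
    if LABELS[scores.argmax()] == "contradiction":
        print(f"Possible contradiction:\n  {premise}\n  {hypothesis}")
```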

A response can also contain a combination of hallucination types. A response containing any hallucination type worsens the user experience and gives the user reasons to doubt the value of the conversational AI tool. The article from Master of Code also discusses the ethical concerns raised by hallucinations, including concerns about privacy, cybersecurity, and toxic or discriminatory content.

Hallucinations have a variety of causes, but two types are more relevant for understanding how generative AI compares with traditional search and websites: underlying data quality issues and issues with prompts users enter.

Data Quality Issues

Generative AI generates a response by using a large language model (LLM). The LLM models human language and is trained on a large volume of existing data and content to learn about the structure of language. OpenAI, as an example, has trained its GPT language model on a variety of sources, including books, academic papers, and a crawl of the web (via Common Crawl).
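As a toy illustration of what "learning the structure of language" means, the sketch below builds a bigram model: it counts which word follows which in a tiny corpus, then generates text by repeatedly picking a likely next word. A real LLM is vastly more sophisticated, but the core idea of predicting plausible next tokens from patterns in training text is the same. The corpus here is invented for the example.

```python
import random
from collections import Counter, defaultdict

# A tiny, invented training corpus.
corpus = (
    "search engines index websites. search engines rank websites. "
    "generative ai generates responses. generative ai predicts words."
).split()

# Count how often each word follows each other word (bigram counts).
next_words = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_words[current][following] += 1

# Generate text by repeatedly sampling a likely next word.
random.seed(0)
word, output = "search", ["search"]
for _ in range(5):
    candidates = next_words.get(word)
    if not candidates:
        break
    word = random.choices(list(candidates), weights=candidates.values())[0]
    output.append(word)
print(" ".join(output))
```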

The large volume of training material introduces a problem: the training data may suffer from quality issues. The output generated by an LLM is only as good as the training data—as the saying goes, “garbage in, garbage out”. If the training material suffers from problems, those issues will be present in the generated output. There are different types of data-quality issues, including:

  • Insufficient: Insufficiency refers to both the quality and quantity of the material used to train the AI tool. AI tools need a large volume of data to detect meaningful patterns that can be used to generate a response. However, the data must also cover all the topics the AI tool may discuss in a generated response; an AI tool may not have been trained on enough data to accurately respond to a prompt about a niche topic.
  • Outdated: One of the challenges with ChatGPT is that its training data has a cutoff date (at least at present). Any responses will be based on older information, and some older information may not be relevant when a user is seeking current answers.
  • Overfitting: Overfitting means the AI tool is too well trained on the input data and is unable to apply what it has learned to new situations. The AI tool can become fixated on certain data points or specific examples (a small demonstration follows this list).
  • Factual errors: If the input data contains factual errors or other mistakes, then the AI tool will incorporate those errors in the responses.
  • Bias: If the input data is biased, then the AI tool will also reflect that bias. Overcorrecting for bias can lead to hallucinations as well.
  • Noisy data: If the input data is corrupted or distorted, it is meaningless. Anything the LLM learns from this data will be inaccurate or cause the AI tool to draw illogical conclusions.
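Overfitting is easier to see with a small numeric example than with language. The sketch below is a minimal illustration, not how LLMs are actually trained: it fits two polynomials to the same noisy samples. The high-degree fit reproduces the training points almost perfectly but tends to swing wildly at points it never saw, while the low-degree fit generalizes better.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Nine noisy samples of an underlying sine curve.
x_train = np.linspace(0, 1, 9)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, x_train.size)

# Degree-8 polynomial: enough coefficients to memorize every sample, noise included.
overfit = np.polyfit(x_train, y_train, deg=8)
# Degree-3 polynomial: forced to capture the general trend instead.
modest = np.polyfit(x_train, y_train, deg=3)

# Evaluate both at a point that was not in the training data.
x_new = 0.95
print("true value:      %6.3f" % np.sin(2 * np.pi * x_new))
print("degree-8 model:  %6.3f" % np.polyval(overfit, x_new))
print("degree-3 model:  %6.3f" % np.polyval(modest, x_new))
```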

How Traditional Search Addresses Data Quality Issues

Traditional search engines face similar problems with data quality and put a lot of effort into addressing these problems. After Google fetches content from websites, it uses complex algorithms to evaluate the content it has retrieved from the crawl. As part of this evaluation, Google addresses many of the same data-quality issues as generative AI systems. For example, Google’s algorithms may detect that a page’s content is outdated, contains manipulative statements, or suffers from other content-quality issues. Only websites that pass this evaluation are included in Google’s index and eligible to appear in search results.

However, there is a key difference in how data quality issues affect traditional search versus generative AI search. A generative AI system uses its training data to learn how to generate text in response to a prompt; each response is novel and uniquely generated for that user. The system might make mistakes when generating this content because of problems with how it was trained.

In contrast, traditional search engines are not attempting to generate a novel response to the search query and are not attempting to learn how to generate text. Traditional search engines list the best websites they know about that are related to a user’s search query. The evaluation process is about weeding out poor-quality websites that should not appear in that list of results.

Traditional search has another advantage that helps it avoid hallucinations: search engines have better input data. Google's search engine is designed to send traffic to websites, and by doing so, Google has incentivized companies to improve their websites so they can get more traffic from Google. This is the entire goal of SEO: optimizing a website for search engines means improving its content to make it more relevant and fixing existing problems so that Google can evaluate the website favorably. In exchange for this optimization work, the website is likely to appear in more search results. This oversimplifies SEO, and there are of course bad actors who manipulate the process. However, the point is that optimization work in SEO is really a process of improving Google's input data.

Currently, generative AI tools are largely not designed to send traffic to websites (and in fact, may never send traffic to websites). So, companies have little incentive to improve their content in a way that improves how generative AI systems are trained. If anything, some companies are actively preventing generative AI systems from using their content during training. As of April 2024, a third of the top 1,000 websites block ChatGPT from accessing their content.
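You can check whether a given site blocks OpenAI's training crawler by reading its robots.txt file and testing the GPTBot user-agent (the token OpenAI documents for that crawler). A minimal sketch using only the Python standard library; example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

def blocks_gptbot(domain: str) -> bool:
    """Return True if the domain's robots.txt disallows GPTBot from the homepage."""
    parser = RobotFileParser()
    parser.set_url(f"https://{domain}/robots.txt")
    parser.read()  # fetches and parses the live robots.txt
    return not parser.can_fetch("GPTBot", f"https://{domain}/")

# Placeholder domain; substitute the sites you want to audit.
print(blocks_gptbot("example.com"))
```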

Prompt-Related Issues

Hallucinations can also occur when the AI tool does not understand the prompt provided by the user. Whether or not it fully understands the prompt, the AI tool is programmed to generate some type of response, so even a poorly constructed prompt will still get an answer.

For example, the prompt may be too open-ended with a vaguely defined request. When the prompt is vague, the AI tool needs to make assumptions and guess about what type of response the user wants. Sometimes, those assumptions or guesses will result in hallucinations.

The user’s prompt may also be poorly articulated, confusing the AI tool; unsure how to respond, it generates a response full of nonsensical or contradictory information instead.

The prompt may also include irrelevant information that confuses the AI tool, causing logical inconsistencies or factual errors. This tends to happen with overly long and involved prompts, where too much information is given to the AI tool.
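The difference between a vague prompt and a well-scoped one is easy to see by sending both to the same model. A minimal sketch, assuming the openai Python package (v1+) and an OPENAI_API_KEY environment variable; the prompts and model name are illustrative, not recommendations.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Vague: the model must guess the subject, audience, and format.
print(ask("Tell me about analytics."))

# Scoped: the subject, audience, and format are pinned down,
# leaving far less room for the model to guess (and hallucinate).
print(ask(
    "In three bullet points, explain what a web analytics audit is "
    "for a small-business owner with no technical background."
))
```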

Of course, some users will input prompts deliberately trying to generate hallucinations. These are referred to as jailbreak prompts, and users who enter them are looking for vulnerabilities or flaws. Even the most sophisticated tools have vulnerabilities and are subject to jailbreaking.

How Traditional Search Addresses Query-Related Issues

Google faces many of these same challenges with search queries. A search query may be overly broad and not clearly indicate the user’s intentions. Google makes assumptions and guesses about what type of search result would be the most relevant. There are times when these assumptions and guesses are inaccurate. Search results, though, list multiple websites. So, while some of those websites may not be relevant to the user’s query, some of the other websites might be more relevant.

This also highlights a fundamental difference in generative AI search versus traditional search engines in the context of information seeking. Generative AI is responding to a prompt to help the user avoid doing extra work; people are conversing with AI tools with the goal of obtaining the information they want while avoiding going to a website. In contrast, users go to traditional search engines knowing they will have to do additional work to sort through the websites listed in the search results; they are prepared to see irrelevant websites as they try to find the desired information.

Hallucinations caused by prompt-related issues may be reduced in the future. Work is being done to improve language models to ensure that tools can still derive meaning and relevance from vague prompts. At the same time, the entire world is learning how to use generative AI tools, including learning how to construct better prompts. AI tools are also introducing ways of helping people construct better prompts, including offering more relevant prompt suggestions, with the goal of helping people avoid hallucinations.

How Common Are Hallucinations?

Misinformation is not much of a problem in traditional search results. Google has done a fairly good job of surfacing only authoritative websites with more accurate information. Mistakes can be made, but because search results list multiple websites, people can usually visit enough of the listed websites to find correct information.

With generative AI responses taking over search results, mistakes might become more common. Google’s PaLM 2 Chat LLM, one of the LLMs powering Google’s SGE, has a hallucination rate of 10% as of April 2024, according to Hugging Face’s leaderboard. ChatGPT, by comparison, has a hallucination rate of only 2.5-3.5% across its different models.

| Model | Company | Hallucination Rate |
| --- | --- | --- |
| GPT 4 Turbo | OpenAI | 2.50% |
| GPT 4 | OpenAI | 3.00% |
| GPT 3.5 Turbo | OpenAI | 3.50% |
| Google Palm 2 | Google | 8.60% |
| Google Palm 2 Chat | Google | 10.00% |

Data from Hugging Face as of April 2024
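Even low single-digit rates add up across many interactions. As a rough back-of-the-envelope calculation (assuming, unrealistically, that responses are independent), a 3% hallucination rate means a user who runs 20 prompts has nearly a coin-flip chance of seeing at least one hallucination:

```python
rate = 0.03      # per-response hallucination rate (GPT-4-class, per the table)
prompts = 20     # number of prompts in a session

# Probability of at least one hallucinated response.
p_at_least_one = 1 - (1 - rate) ** prompts
print(f"{p_at_least_one:.0%}")  # roughly 46%
```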

I do not doubt that Google will improve its hallucination rate. However, the point is that the hallucination rate will never be 0% because of how generative AI works. Hallucinations will occur in at least some generative AI responses. So, what do we do about this?

How to Prepare for Hallucinations in SGE

Action Item #1: Check Related Generative AI Responses for Hallucinations

To begin, you need to review generative AI responses related to your business, the products or services your company sells, and your industry more generally. Do those responses contain any hallucinations? If so, how significant are those hallucinations?

If there are significant hallucinations or other errors in the generated response, you need to understand where the generative AI system learned that misinformation. This may require digging through older content on the web. Of course, it might not be a specific source that caused the misinformation; the generative AI system may instead have fabricated information based on assumptions it made because no training data was available. In that case, you need to create the necessary training data for the AI system to use when generating future responses.
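One way to run this review at scale is to script it: send a batch of prompts about your business to a model and save the responses to a file for manual fact-checking. A minimal sketch, again assuming the openai package and an API key; the prompts are hypothetical placeholders.

```python
import csv

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompts; replace with questions your customers actually ask.
prompts = [
    "What services does Elementive offer?",
    "Who founded Elementive and when?",
    "How does Elementive approach technical SEO audits?",
]

with open("responses_to_fact_check.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt", "response"])
    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # illustrative model choice
            messages=[{"role": "user", "content": prompt}],
        )
        # Each row gets reviewed by a human for hallucinations.
        writer.writerow([prompt, response.choices[0].message.content])
```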

Action Item #2: Provide Accurate Training Data

Generative AI systems are trained, at least in part, on website content. You need to make sure your website provides accurate information about your company, your products or services, your staff, and more. As much as you can, you also want to make sure other websites are accurately representing your company as well. This may require more extensive PR and brand-building efforts to ensure correct information is presented about your company.

There is an interesting problem to consider, though. As mentioned above, many companies are choosing to block ChatGPT and other generative AI tools from accessing their websites. If generative AI tools cannot access your website, they cannot be trained on any of the information it contains. People will likely ask ChatGPT for information about your company. Do you want the generative AI system to use your website as a source when generating a response? If so, you need to allow ChatGPT to access at least some parts of your website.
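If you decide to allow partial access, this is controlled in robots.txt. Under the current robots exclusion standard (RFC 9309), the most specific matching rule wins, so a sketch like the following (the paths are hypothetical) would let GPTBot read your about and product pages while keeping it out of everything else:

```
User-agent: GPTBot
Allow: /about/
Allow: /products/
Disallow: /
```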

Action Item #3: Prepare Responses for Users and Customers

We know people already turn to Google to learn about our companies. The review articles, forums, and other websites discussing our business might provide inaccurate information. However, generative AI takes this to a new level. With generative AI responses, your customers may get even more inaccurate or fabricated information about your products or services. This can lead to a distorted understanding of your company.

You need to be prepared to respond to these inaccuracies and fabrications. Have materials ready that provide accurate information to your users and customers. Let them know about common hallucinations they will see if asking generative AI systems questions about your products or services. Also, make sure your customer service and sales staff are ready to respond. You may also want to adjust your website content to proactively address any false information that your users or customers might encounter when conversing with generative AI.

Need Help?

If you want help preparing, contact me today. At Elementive, we help clients prepare for the future of search results powered by generative AI.
