If you have asked ChatGPT or Perplexity about your business and come up empty, you are not alone. The obvious question is: why? The guesses that follow usually involve training data, whether the AI has seen your website, or whether you need to get into some model's knowledge base. Almost all of those instincts point at the wrong problem. The actual answer depends on which AI tool you are asking, and once you understand the split, the path forward becomes much clearer.
Two systems that often get confused
The first distinction worth making is between AI crawlers and AI search tools. They are related but not the same thing.
AI crawlers are bots: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, Google-Extended. They visit your website to collect content for training datasets or to power real-time retrieval. Your robots.txt governs whether they are allowed in. Blocking them is one way a site can hurt its AI visibility, and it is something Stackra checks directly.
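As a concrete illustration, here is what a robots.txt that explicitly admits the major AI crawlers might look like. The policy and paths are hypothetical; each block names one bot's user-agent string:

```text
# Allow AI crawlers site-wide (hypothetical policy)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# All other bots follow the default rules
User-agent: *
Disallow: /admin/
```

A single `Disallow: /` under any of these user-agents would do the opposite and shut that crawler out entirely.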
AI search tools are what you are actually talking to: ChatGPT, Claude, Perplexity, Gemini, Microsoft Copilot. These are the tools people type their questions into. A crawler having visited your site is a prerequisite for an AI search tool mentioning you, not a guarantee that it will. Allowing bots in is necessary but nowhere near sufficient.
Inside AI search tools: the split that changes everything
The more important distinction is between two entirely different operating modes inside these tools.
- Static knowledge (training data): The model learned from a snapshot of the web taken before a hard cutoff date. It has no awareness of anything that happened after that date. When you use ChatGPT without web search enabled, or Claude with no browsing turned on, you are talking to a static knowledge base. It either knows about your business from training or it does not.
- Real-time web search (live retrieval): The tool searches the web at query time and synthesizes an answer from current results. Perplexity does this on every single query, by design. Microsoft Copilot uses Bing's live index. ChatGPT with web search enabled uses Bing. Gemini uses Google's live index. For these tools, the answer is generated from whatever is ranking right now.
Training cutoffs: what they actually are
Every static knowledge model has a date beyond which it knows nothing. Here are the current cutoffs for the major models as of early 2026:
- Claude Opus 4.6 and Sonnet 4.6 (Anthropic): August 2025
- GPT-5.4 and GPT-5.2 (OpenAI): August 2025
- GPT-4.1, o3, o4-mini (OpenAI): June 2024
- GPT-4o, GPT-4o mini, o1 (OpenAI): October 2023
- Gemini 3.1 Flash-Lite (Google): January 2025
- Gemini 2.0 Flash (Google): mid-2024
- GPT-4, GPT-3.5 (older OpenAI): September 2021
Perplexity is not on this list because Perplexity always searches the web. Its base model has a training cutoff, but every answer draws primarily from live retrieval — not from frozen knowledge.
The more important thing to understand about these cutoffs is not the dates themselves. It is what makes it into training data in the first place. Training datasets skew heavily toward high-authority sources: major publications, Wikipedia, Reddit, GitHub, large industry sites. An individual small business website — even a well-built one — is statistically unlikely to appear in training data regardless of the cutoff date. The cutoff tells you when the data collection stopped. It says nothing about whether your site met the threshold to be included.
What this means for your business
Here is the part that most articles on this topic get wrong. Chasing training data inclusion is almost never the right move for an SMB. These are the things that do not move the needle:
- Training cutoff dates: the cutoff is not why you are missing. Inclusion thresholds are why you are missing, and those are determined by third-party authority and link coverage, not site optimization.
- Getting into the model: there is no submission process, no opt-in, and no direct path for a small business to guarantee training data inclusion in any major model.
- AI crawler access by itself: allowing GPTBot and ClaudeBot is necessary but does not cause a tool to mention you. It just removes a potential blocker.
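You can verify the "potential blocker" part yourself by reading your robots.txt the way a crawler would. Python's standard-library `urllib.robotparser` applies the same matching rules; this sketch parses a hypothetical robots.txt string rather than fetching a live site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot, allows ClaudeBot.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for bot in ("GPTBot", "ClaudeBot"):
    allowed = parser.can_fetch(bot, "https://example.com/services")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
# prints:
# GPTBot: blocked
# ClaudeBot: allowed
```

To check a live site, swap the string parsing for `parser.set_url("https://yourdomain.com/robots.txt")` followed by `parser.read()`. Either way, remember what the result means: an "allowed" here only removes a blocker, it does not make any tool mention you.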
What actually determines whether a real-time AI tool mentions your business comes down to three things:
- You exist in the index. You need to be crawlable, indexed, and ranking at least moderately for the queries where you want to appear. If Google or Bing cannot find you, Gemini and ChatGPT Search cannot either. This is SEO — not optional, not advanced.
- You are understandable as an entity. AI tools do not just retrieve pages. They identify who the page is about. Your business name, what you do, and where you operate need to be explicit in your structured data and metadata — not buried in a paragraph or implied by your domain name.
- You are extractable. This is the piece that traditional SEO misses. AI tools generate answers by pulling specific passages from pages. If your content is not written as direct answers — clear headings, specific summaries, structured how-tos — the model may retrieve your page and still not quote it. It needs content it can lift intact and attribute cleanly.
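To make the entity point concrete: a minimal schema.org `LocalBusiness` JSON-LD block states explicitly who the page is about, so a retrieval system does not have to infer it from prose. Every name and detail below is an invented placeholder:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Harbor Lane Plumbing",
  "description": "Residential plumbing repair and water heater installation.",
  "url": "https://example.com",
  "telephone": "+1-555-0134",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Portland",
    "addressRegion": "OR",
    "addressCountry": "US"
  },
  "areaServed": "Portland metro area"
}
</script>
```

The extractability point pairs with this: on the page itself, a heading phrased as the question ("How much does water heater installation cost in Portland?") followed by a first sentence that answers it directly gives the model a passage it can lift intact and attribute cleanly.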
The actual reason AI is not mentioning your business
For Perplexity, ChatGPT Search, Copilot, and Gemini — the tools most people are actually using for local and product research — the answer comes from live web results. Your business is not in those answers because it is not ranking for the relevant query, or because the content on your page is not extractable enough to quote, or because your entity signals are ambiguous enough that the AI cannot confidently attribute the answer to you.
If AI is not mentioning your business, it is almost never because of training data. It is because your site was not selected during real-time retrieval.
These are solvable problems. Training data inclusion is not, at least not directly. The gap between "AI doesn't mention me" and "AI mentions me regularly" is almost always on the retrieval side: SEO foundations, entity clarity, and extractable content. That is the work worth doing.