If you have asked ChatGPT or Perplexity about your business and come up empty, you are not alone. The natural question is why, and the usual guesses involve training data, whether the AI has seen your website, or whether you need to get into some model's knowledge base. Almost all of those instincts point at the wrong problem. The actual answer depends on which AI tool you are asking, and once you understand the split, the path forward becomes much clearer.
Two systems that often get confused
The first distinction worth making is between AI crawlers and AI search tools. They are related but not the same thing.
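AI crawlers are bots such as GPTBot (OpenAI), ClaudeBot (Anthropic), and PerplexityBot. They visit your website to collect content for training datasets or to power real-time retrieval. Your robots.txt governs whether they are allowed in. Google-Extended belongs in the same robots.txt conversation, although it is a control token rather than a separate bot: it tells Google whether content its existing crawlers fetch may be used for its AI models. Blocking these agents is one way a site can hurt its AI visibility, and it is something Stackra checks directly.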
AI search tools are what you are actually talking to: ChatGPT, Claude, Perplexity, Gemini, Microsoft Copilot. These are the tools people put their questions to. A crawler visit is a prerequisite for being mentioned by an AI search tool, not a guarantee of it. Allowing bots in is necessary but nowhere near sufficient.
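The robots.txt check described above can be sketched with Python's standard-library parser. This is a minimal illustration, not how any particular audit tool works: the sample robots.txt, the example.com URL, and the function name are hypothetical, and real crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article.
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def crawler_access(robots_txt: str, page_url: str) -> dict[str, bool]:
    """Return {token: allowed} for each AI crawler token, given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, page_url) for token in AI_CRAWLER_TOKENS}


# Hypothetical robots.txt: blocks GPTBot everywhere, allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    report = crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/services")
    for token, allowed in report.items():
        print(f"{token}: {'allowed' if allowed else 'BLOCKED'}")
```

To check a live site, fetch its /robots.txt and pass the text to crawler_access. A clean result only confirms that one potential blocker is absent; it says nothing about whether any tool will mention you.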
Inside AI search tools: the split that changes everything
The more important distinction is between two entirely different operating modes inside these tools.
- Static knowledge (training data): The model learned from a snapshot of the web taken before a hard cutoff date. It has no awareness of anything that happened after that date. When you use ChatGPT without web search enabled, or Claude with no browsing turned on, you are talking to a static knowledge base. It either knows about your business from training or it does not.
- Real-time web search (live retrieval): The tool searches the web at query time and synthesizes an answer from current results. Perplexity does this on every query, by design. ChatGPT Search uses live web retrieval, though OpenAI has not publicly described the exact indexing layer in detail. Microsoft Copilot draws from Bing's live index. Gemini draws from Google's. The mechanics differ across tools, but in all cases the answer is shaped by what is findable and rankable right now.
Training cutoffs: what they actually are
Every static knowledge model has a date beyond which it knows nothing. Here are the current cutoffs for the major models as of early 2026:
- Claude Opus 4.6 and Sonnet 4.6 (Anthropic): August 2025
- GPT-5.4 and GPT-5.2 (OpenAI): August 2025
- GPT-4.1, o3, o4-mini (OpenAI): June 2024
- GPT-4o, GPT-4o mini, o1 (OpenAI): October 2023
- Perplexity Sonar (base model): June 2024
- Gemini 3.1 Flash-Lite (Google): January 2025
- Gemini 2.0 Flash (Google): mid-2024
- GPT-4, GPT-3.5 (older OpenAI): September 2021
The more important thing to understand about these cutoffs is not the dates themselves. It is what makes it into training data in the first place. Based on what is publicly known about how large training datasets are assembled, they tend to weight heavily toward high-authority sources: major publications, Wikipedia, Reddit, GitHub, large industry sites. The exact selection criteria are proprietary, but the pattern is consistent enough that most practitioners treat it as a working assumption: an individual small business website, even a well-built one, is unlikely to carry much weight in training data regardless of the cutoff. The cutoff tells you when collection stopped. It says little about whether your site was meaningfully included.
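One way to read the cutoff list is as a filter that can only rule things out. The sketch below encodes that asymmetry, using a few of the cutoffs listed above as (year, month) pairs. The function name and the status wording are illustrative assumptions, and treating content published in the cutoff month itself as "possible" is an arbitrary choice.

```python
# Cutoffs as (year, month), copied from the list above.
# Only entries with an exact month are included.
TRAINING_CUTOFFS = {
    "Claude Opus 4.6": (2025, 8),
    "GPT-5.4": (2025, 8),
    "GPT-4.1": (2024, 6),
    "GPT-4o": (2023, 10),
    "Perplexity Sonar (base model)": (2024, 6),
    "GPT-4": (2021, 9),
}


def static_knowledge_status(published: tuple[int, int], cutoff: tuple[int, int]) -> str:
    """Classify what a static model could know about content published in a given month."""
    if published > cutoff:
        # Content newer than the cutoff cannot be in the training snapshot.
        return "impossible"
    # Older content is only a candidate; inclusion depends on source authority.
    return "possible, not guaranteed"


def cutoff_report(published: tuple[int, int]) -> dict[str, str]:
    """Apply the check to every model in TRAINING_CUTOFFS."""
    return {
        model: static_knowledge_status(published, cutoff)
        for model, cutoff in TRAINING_CUTOFFS.items()
    }


if __name__ == "__main__":
    # Hypothetical business site that went live in January 2024.
    for model, status in cutoff_report((2024, 1)).items():
        print(f"{model}: {status}")
```

Note that the function never returns "known". A publication date before the cutoff makes inclusion possible, nothing more.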
What this means for your business
Here is the part that most articles on this topic get wrong. Chasing training data inclusion is almost never the right move for a small or medium-sized business (SMB). These are the things that do not move the needle:
- Training cutoff dates: the cutoff is not why you are missing. Inclusion thresholds are why you are missing, and those are determined by third-party authority and link coverage, not site optimization.
- Getting into the model: there is no submission process, no opt-in, and no direct path for a small business to guarantee training data inclusion in any major model.
- AI crawler access by itself: allowing GPTBot and ClaudeBot matters, and blocking them can create a real problem. But access alone does not cause a tool to mention you. It removes one potential blocker. The content and signals on those pages still determine what happens next.
What actually determines whether a real-time AI tool mentions your business comes down to three things:
- You exist in the index. You need to be crawlable, indexed, and ranking at least moderately for the queries where you want to appear. If Google or Bing cannot find you, Gemini and Copilot cannot either, and other live-retrieval tools such as ChatGPT Search are unlikely to. This is SEO, and it is neither optional nor advanced.
- You are understandable as an entity. AI tools do not just retrieve pages. They identify who the page is about. Your business name, what you do, and where you operate need to be explicit in your structured data and metadata — not buried in a paragraph or implied by your domain name.
- You are extractable. This is the piece that traditional SEO misses. AI tools generate answers by pulling specific passages from pages. If your content is not written as direct answers — clear headings, specific summaries, structured how-tos — the model may retrieve your page and still not quote it. It needs content it can lift intact and attribute cleanly.
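To make the entity point concrete, here is a minimal sketch of stating the business name, what it does, and where it operates as schema.org JSON-LD. The business details are hypothetical placeholders, the helper function is an illustration rather than a required approach, and the property set is a small subset of what schema.org's LocalBusiness type allows.

```python
import json


def local_business_jsonld(name, description, url, street, city, region,
                          postal_code, country, area_served):
    """Build a <script> tag containing schema.org LocalBusiness JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,                # who the page is about
        "description": description,  # what the business does
        "url": url,
        "address": {                 # where it operates
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
            "addressCountry": country,
        },
        "areaServed": area_served,
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


if __name__ == "__main__":
    # Hypothetical business used purely for illustration.
    print(local_business_jsonld(
        name="Example Plumbing Co.",
        description="Residential plumbing repair and installation.",
        url="https://example.com",
        street="123 Main St",
        city="Springfield",
        region="IL",
        postal_code="62701",
        country="US",
        area_served="Springfield, IL",
    ))
```

The resulting tag goes in the page's HTML, typically in the head. It should repeat facts that are also visible on the page instead of replacing them.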
The actual reason AI is not mentioning your business
For Perplexity, ChatGPT Search, Copilot, and Gemini, the tools most people actually use for local and product research, answers are drawn from live web results. The mechanics differ between tools, but the requirement is the same: your business needs to be findable, rankable, and clearly expressed. So for most small businesses, the main reason AI tools do not mention them is not a training data cutoff. It is almost always one of three things: the site is not visible enough in live retrieval systems, the business is not recognizable as a distinct entity, or the page content is not structured in a way the model can confidently extract and cite.
These are solvable problems. Training data inclusion is not, at least not directly. The gap between "AI doesn't mention me" and "AI mentions me regularly" is almost always on the retrieval side: SEO foundations, entity clarity, and extractable content. That is the work worth doing.