When ChatGPT, Claude, or Perplexity wants to answer a question by browsing the web, the first thing each tool does is request your site's robots.txt file. That file tells crawlers what they are allowed to read. If your robots.txt blocks the major AI crawlers, your site becomes ineligible to be fetched and cited by those tools when they answer real-time queries, and (separately) it is excluded from the training data that future versions of those models are built on. The exact effect varies per crawler, but the cumulative result for a small business is meaningfully reduced visibility in AI search. We have audited dozens of small business sites in the past year, and a significant share have this block in place without the owner ever knowing it was enabled. Here is how to check, and how to fix it if you find it.

How to check your robots.txt in 60 seconds

Open a new browser tab and type your domain followed by /robots.txt. For example: https://yoursite.com/robots.txt. You should see a small plain-text file. If you get a 404 or a blank page, that is fine: crawlers treat a missing or empty robots.txt as permission to crawl everything, so nothing is blocked. If you see content, scan it for lines that start with User-agent followed by any of these names: GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, Google-Extended, PerplexityBot, CCBot, Bytespider, FacebookBot, Applebot-Extended. If any of those user-agents is followed by a Disallow: / line, that crawler is blocked.
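For reference, a robots.txt that blocks two of these crawlers contains entries like the following (a made-up example; real files often mix blocked crawlers with ordinary rules):

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /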

What each crawler does

GPTBot is OpenAI's web crawler, used primarily to gather training data. ChatGPT-User is the user-initiated agent that fetches a page when someone in a chat asks ChatGPT to look at a specific URL. ClaudeBot and anthropic-ai are Anthropic's crawler names for Claude. Google-Extended is not a separate crawler but an opt-out token: it controls whether pages Google already crawls may be used to train its Gemini models. PerplexityBot is Perplexity's crawler. CCBot is Common Crawl, which feeds many of these systems indirectly. Blocking any one of these closes a specific channel.
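If you would rather script the check than eyeball the file, Python's standard library includes a robots.txt parser. Here is a minimal sketch; the domain is a placeholder and the names simply mirror the list in the check section above:

    from urllib.robotparser import RobotFileParser

    # The AI crawler names discussed above.
    AI_CRAWLERS = [
        "GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
        "Google-Extended", "PerplexityBot", "CCBot", "Bytespider",
        "FacebookBot", "Applebot-Extended",
    ]

    SITE = "https://yoursite.com"  # placeholder: swap in your own domain

    parser = RobotFileParser(SITE + "/robots.txt")
    parser.read()  # fetches and parses the live file; a 404 is treated as allow-all

    for bot in AI_CRAWLERS:
        verdict = "allowed" if parser.can_fetch(bot, SITE + "/") else "BLOCKED"
        print(f"{bot}: {verdict}")

One caveat: some CDNs refuse Python's default user agent, and a refused robots.txt fetch makes the parser report everything as blocked. If every line prints BLOCKED, verify in a browser before concluding anything.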

Why your site might be blocking these without you knowing

This happens in three common ways. First, Cloudflare. In 2024, Cloudflare added a one-click Block AI Scrapers and Crawlers toggle to its dashboard. It is easy to enable by accident, and because it blocks requests at Cloudflare's edge, your own robots.txt can look clean while AI crawlers are still turned away. Second, your CMS or host. Some WordPress security plugins and some managed hosts (WP Engine and Kinsta among them) added default AI blocking after the public debate over AI scraping in 2023 and 2024. Third, your developer or agency may have added these blocks during a privacy or content-protection sweep without flagging the search trade-off.

How to fix it: WordPress

If you are on WordPress and the block is plugin-driven, look in your SEO plugin (Rank Math, Yoast, or All in One SEO) for a robots.txt editor. Remove the lines blocking AI crawlers and save. If the block comes from a security plugin (Wordfence, iThemes Security), check that plugin's crawler settings. If you can't find the source, the simplest fix is to override robots.txt with your own version; most SEO plugins allow this directly from their dashboard.
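If you go the override route, a reasonable file to paste into the plugin's editor is the sketch below. It mirrors WordPress's own default (which blocks only the admin area) and adds a sitemap line; the sitemap URL is a placeholder for your own:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

    Sitemap: https://yoursite.com/sitemap.xml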

How to fix it: Cloudflare

Log into your Cloudflare dashboard. Select your site. In the left sidebar, find Security, then Bots. Look for AI Scrapers and Crawlers. If it is set to Block, change it to Allow or to Managed Challenge if you want some friction without an outright block. Cloudflare also exposes a per-crawler control called AI Audit; use that if you want to allow some AI crawlers but block others.

How to fix it: Shopify, Wix, Squarespace

On Shopify, robots.txt was historically locked but is now editable through the robots.txt.liquid template (Online Store > Themes > Edit code). Squarespace's robots.txt is auto-generated and not directly editable; Wix auto-generates it too, though it now offers a robots.txt editor in its SEO tools. Both Wix and Squarespace allow AI crawlers by default, so the most likely cause of blocking on these platforms is a Cloudflare or third-party CDN sitting in front of your site. Check there first.

What good looks like after the fix

After you remove the blocks, fetch your robots.txt again to confirm. The minimum viable version for AI visibility simply has no Disallow lines for the major AI crawlers. The cleanest version explicitly allows them: User-agent: GPTBot followed by Allow: /, and so on. This is functionally the same as no entry at all, but it serves as visible documentation that you have made an intentional choice, and many sites that want to be cited in AI answers now publish a robots.txt that explicitly lists the AI crawlers they allow.
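A sketch of that explicit version, assuming you want to allow everything; extend the same pattern to the other crawlers named earlier, and keep the catch-all group at the end for your ordinary rules:

    User-agent: GPTBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: *
    Disallow: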

How long until AI tools start seeing my site again?

AI crawlers typically re-fetch robots.txt within 24 to 48 hours. After that, your pages become eligible to be fetched and used in AI answers when relevant queries come in. Citation follows a lag similar to ordinary search indexing: you will not see immediate inclusion, but within a few weeks of being unblocked, your business should start appearing in answers to the queries you would expect to be cited for.

What this fix does not do

Allowing AI crawlers makes your site eligible to be cited. It does not guarantee you will be. The factors that determine actual citation are largely the same as traditional SEO factors plus structured data (schema markup), clear topic focus, and authoritative content depth. For local businesses specifically, your Google Business Profile drives more local AI and search traffic than any single change to your website. We covered the broader landscape in our 16 SEO truths post.

Frequently asked questions

Common questions about robots.txt and AI crawlers.

Should small businesses block AI crawlers?

For most small businesses, no. Blocking AI crawlers makes your business invisible to a fast-growing share of search traffic without giving you anything meaningful in return. The privacy and copyright reasons cited for blocking apply more to media and publishing companies whose business model is selling content. A typical local service business or SMB has the opposite problem: not enough discoverability, not too much.

Will allowing AI crawlers slow down my site?

Not in any meaningful sense. The major AI crawlers generally respect rate limits and crawl politely, and the total bandwidth they consume on a typical small business site is a tiny fraction of normal traffic. If you do see suspicious load, the right response is per-crawler rate limiting, not a blanket block.

Is GPTBot the same as ChatGPT?

Not exactly. GPTBot is OpenAI's general-purpose web crawler, used for training data and offline indexing. ChatGPT-User is the user-initiated crawler used when someone in a chat asks ChatGPT to fetch a specific page. Blocking GPTBot but allowing ChatGPT-User means you opt out of training but still allow real-time browsing. Some sites use this configuration deliberately.
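In robots.txt terms, that deliberate split looks like this:

    # Opt out of training, keep real-time browsing
    User-agent: GPTBot
    Disallow: /

    User-agent: ChatGPT-User
    Allow: /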

What about Common Crawl (CCBot)?

Common Crawl is a non-profit web crawler whose data feeds many AI training datasets indirectly. Blocking CCBot does not block AI tools at query time, but it removes your site from the training corpora that future LLM versions are built on. For most small businesses, this is a meaningful long-term visibility loss with little upside.

Where is robots.txt actually stored on my server?

Physically, robots.txt is just a text file served from the root of your domain. On WordPress and most modern CMS platforms, it is generated dynamically and may not exist as an actual file: the platform builds the response when the URL is requested. On a static or custom-built site, it is a plain file in whatever directory gets deployed as the web root (often a public/ or static/ folder in the source tree), but it is always served at /robots.txt. To edit it, you typically use your CMS interface or your host's file manager rather than touching the file directly.