Running a GEO audit is not the same as running a schema validator. Schema validity matters, but which schema types you have, which AI crawlers you allow, whether your site can be attributed to a real entity, and whether your crawlability foundations are in place are distinct questions. Stackra checks all four groups of GEO signals independently and reports them as structured evidence alongside your overall AI visibility score.
Group 1: AI bot access
The most foundational GEO signal is whether major AI crawlers are permitted by your robots.txt. Stackra reads and parses your robots.txt and checks access for each crawler individually, not just as a pass or fail on the file as a whole:
- GPTBot: OpenAI's primary web crawler, used for model training and ChatGPT Search.
- OAI-SearchBot: OpenAI's real-time search crawler for ChatGPT's live web browsing. It has its own user-agent string and is governed separately from GPTBot.
- ClaudeBot: Anthropic's crawler for Claude.
- Google-Extended: Google's opt-out signal for AI training and Gemini products. Blocking it removes your site from AI Overviews and Gemini while leaving standard Googlebot unaffected.
- PerplexityBot: Perplexity's crawler. Stackra records it for evidence but flags it as informational only, because Cloudflare has documented that PerplexityBot does not reliably honor robots.txt disallow rules. Your robots.txt stance on PerplexityBot therefore does not reliably control whether Perplexity crawls your site.
- OpenAI: GPTBot documentation ↗. Official OpenAI documentation for GPTBot, including how to allow or disallow it in robots.txt.
- Google: AI crawlers and Google-Extended ↗. Google's full list of crawlers, including Google-Extended and how each one is governed.
- Cloudflare: PerplexityBot and robots.txt compliance ↗. Cloudflare's analysis of AI bot behavior, including documented cases of PerplexityBot ignoring disallow rules.
Most audit tools that check robots.txt do so for Googlebot only, or report a single pass or fail on whether the file is reachable. Stackra parses per-bot rules so you can see your access posture for each AI platform separately.
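Stackra's parser itself isn't public, but the per-bot idea can be sketched with Python's standard-library robotparser. The bot names below are the real user-agent tokens listed above; the sample robots.txt and URL are purely illustrative:

```python
from urllib.robotparser import RobotFileParser

# The five AI crawlers checked individually (real user-agent tokens).
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]

# Illustrative robots.txt: blocks GPTBot, allows everything else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def per_bot_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return an individual allow/deny verdict for each AI crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}

print(per_bot_access(SAMPLE_ROBOTS))
# {'GPTBot': False, 'OAI-SearchBot': True, 'ClaudeBot': True, ...}
```

A single pass/fail on the file would report this robots.txt as "reachable and valid" while hiding the fact that one platform is blocked and four are not.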
Group 2: Schema readiness
Schema markup is the structured signal layer that tells AI tools what your content means, not just what it says. Stackra divides schema readiness into two sub-categories with different GEO functions.
Entity schemas
Entity schemas establish who you are. AI tools use these to place your site in a knowledge graph and associate it with a real-world entity. Three types are tracked:
- Organization: the primary entity signal for any business without a physical storefront. Should include name, URL, description, and logo at minimum.
- LocalBusiness: extends Organization with address, phone, and hours. The right choice for any business with a physical location.
- Person: for individual practitioners, consultants, and personal brands. Includes name and relevant professional context.
Citability schemas
Citability schemas establish what you publish. These directly influence whether AI tools include your pages when generating answers about topics your content covers:
- Article and BlogPosting: the primary citability signal for editorial content. Applies to blog posts, guides, and opinion pieces. The most broadly applicable citability schema type.
- HowTo: highest citability signal for step-by-step and instructional content. Well-suited to tutorials, process guides, and how-to articles.
- BreadcrumbList: present in roughly 15 percent of pages cited by ChatGPT Search and 20 percent of pages cited by Google AI Mode, per Semrush's 2025 AI search study. Signals site hierarchy and helps AI systems understand where a page sits within a site's structure.
- FAQPage: detected and stored on every scan. For healthcare providers and government organizations, FAQPage remains a first-class citability signal (Google still surfaces rich results for those site types) and is counted in citabilityTypeCount for them. For all other site types, Google restricted FAQPage rich results in August 2023, so it is excluded from the count. It still has residual value for non-Google AI tools such as Perplexity and ChatGPT, which apply no equivalent restriction.
- Semrush: Technical SEO Impact on AI Search (2025) ↗. Study analyzing schema markup patterns across pages cited by ChatGPT Search and Google AI Mode.
Stackra also computes a citabilityTypeCount: a count of distinct active citability schema types detected on a site. For most sites the count covers Article/BlogPosting and HowTo, ranging from 0 to 2. For healthcare and government sites, FAQPage is also counted — Google still surfaces FAQPage rich results for those categories — making the range 0 to 3. BreadcrumbList is detected and stored but excluded from the count because it is a subpage schema that will always appear absent on a homepage scan. The count is reported directly in your GEO evidence card.
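The counting rule can be sketched as follows. The site-type labels and function name are illustrative stand-ins, not Stackra's actual identifiers:

```python
# Site types for which Google still surfaces FAQPage rich results,
# per the FAQPage rule above; labels here are illustrative.
FAQ_ELIGIBLE_SITE_TYPES = {"healthcare", "government"}

def citability_type_count(detected: set, site_type: str) -> int:
    """Count distinct active citability schema types.

    Article and BlogPosting form one combined signal, so most sites
    range from 0 to 2; FAQ-eligible site types can reach 3.
    BreadcrumbList is deliberately excluded (subpage schema that is
    always absent on a homepage scan).
    """
    count = 0
    if detected & {"Article", "BlogPosting"}:
        count += 1
    if "HowTo" in detected:
        count += 1
    if "FAQPage" in detected and site_type in FAQ_ELIGIBLE_SITE_TYPES:
        count += 1
    return count
```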
Group 3: Entity clarity
Entity clarity answers whether AI tools can confidently identify who runs your site and where you operate. Stackra derives three entity signals from multiple detection layers, with no AI call involved. The detection is fully deterministic:
- Business name detected: checked first in JSON-LD Organization or LocalBusiness schema (name property), then in microdata itemprop values, then in page metadata (og:site_name, publisher tag).
- Location detected: checked first in JSON-LD address, geo, and areaServed fields, then in microdata address properties, then in geo meta tags (geo.region, geo.placename).
- Named person detected: checked first in JSON-LD Person schema, then in microdata author itemprop values, then in page metadata author tags.
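Each of the three signals follows the same layered-fallback pattern, which can be sketched generically. The lambdas below are stand-ins for the real JSON-LD, microdata, and metadata extractors:

```python
from typing import Callable, List, Optional

def first_hit(detectors: List[Callable[[], Optional[str]]]) -> Optional[str]:
    """Run ordered detection layers; return the first non-empty result."""
    for detect in detectors:
        value = detect()
        if value:
            return value
    return None

# Illustrative layers for business-name detection: the JSON-LD and
# microdata layers find nothing, so the metadata fallback wins.
page_meta = {"og:site_name": "Example Co"}
name = first_hit([
    lambda: None,                           # JSON-LD Organization/LocalBusiness name
    lambda: None,                           # microdata itemprop values
    lambda: page_meta.get("og:site_name"),  # page metadata fallback
])
print(name)  # -> Example Co
```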
The three signals combine into a three-tier confidence rating. Zero confirmed signals: low. One confirmed signal: moderate. Two or more confirmed signals: high. A site with Organization schema (business name) and a LocalBusiness address (location) automatically reaches high confidence with no additional work required.
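The tier logic is simple enough to state directly as code; the function name is illustrative:

```python
def entity_confidence(name_found: bool, location_found: bool, person_found: bool) -> str:
    """Map the number of confirmed entity signals to a confidence tier."""
    confirmed = sum([name_found, location_found, person_found])
    if confirmed >= 2:
        return "high"
    if confirmed == 1:
        return "moderate"
    return "low"

# Organization name + LocalBusiness address: two signals, so high.
print(entity_confidence(True, True, False))  # -> high
```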
Group 4: Supporting signals
Two infrastructure signals that determine whether AI crawlers can discover and fully index your site:
- Sitemap reachable: whether your sitemap.xml is present and returns a valid response. AI crawlers use sitemaps the same way Googlebot does, to discover pages they might not find through link crawling alone.
- Robots.txt reachable: whether your robots.txt returns a valid response. A missing or unreachable robots.txt means crawlers fall back to permissive defaults, but it also signals a configuration gap that affects GEO signal reliability.
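A minimal reachability probe in the same spirit, assuming a plain HEAD request and treating any 2xx status as a valid response. Stackra's actual checks may differ in method, redirects handling, and timeouts:

```python
from urllib import request, error

def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a URL, or 0 if no response at all."""
    try:
        req = request.Request(url, method="HEAD")
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        return exc.code
    except (error.URLError, OSError):
        return 0

def is_reachable(status: int) -> bool:
    # Any 2xx counts as a valid response; 0 means the request failed outright.
    return 200 <= status < 300

def infrastructure_signals(origin: str) -> dict:
    """Probe the two supporting-signal files for a site origin."""
    return {
        "sitemap_reachable": is_reachable(fetch_status(origin + "/sitemap.xml")),
        "robots_reachable": is_reachable(fetch_status(origin + "/robots.txt")),
    }
```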
How this compares to other audit tools
Most website auditors that mention GEO or AI visibility do one of three things:
- Check whether your robots.txt blocks Googlebot — but not AI-specific bots.
- Show a raw list of detected schema types with no grouping by function.
- Run a generic structured data validator focused on markup errors.
None of these tell you whether your site is citable, what your entity confidence level is, or whether the specific bots powering ChatGPT Search and Perplexity can reach you at all.
Stackra's GEO check is fully deterministic. No AI call is made during signal collection. Signals are read from your robots.txt, your structured data, your HTML metadata, and your site infrastructure. The result is a grouped evidence view that separates what you allow, what you publish, who you are, and whether your crawlability foundations are in place.