I wanted AI-powered reviews from three perspectives: a CMO focused on growth, an SEO specialist on search visibility, and a CTO on technical quality. But in early tests, the CMO said "You have 17 CTAs above the fold" while the SEO said "You have 0 CTAs." Same website. Same crawl. The personas were hallucinating, contradicting each other, and repeating the same advice three different ways.

How I fixed it

Four changes made the system reliable:

  • Each persona only sees relevant signals. The CMO doesn't get raw header data; the CTO doesn't get content readability scores.
  • Shared facts layer: same evidence, different interpretations. All three personas work from the same crawl data.
  • Universal guardrails: no inventing data, no raw scores in prose. If a fact isn't in the input pack, it can't appear in the output.
  • Gemini reviews every AI output for unsupported claims before it reaches the user.

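The first three changes can be sketched together. This is a minimal illustration, not the actual implementation: the fact names (`cta_total`, `page_weight_kb`, etc.), the persona-to-signal mapping, and the guardrail check are all hypothetical stand-ins for whatever the real fact pack contains. The guardrail here is deliberately crude: it flags any number in a persona's output that doesn't come from that persona's input pack.

```python
import re

# One crawl produces one shared fact pack: the single source of truth.
# All keys and values below are illustrative, not the real schema.
FACT_PACK = {
    "cta_total": 17,
    "cta_above_fold": 14,
    "page_weight_kb": 2400,
    "readability_grade": 9,
    "raw_headers": {"server": "nginx"},
}

# Each persona only sees the signals relevant to its role.
PERSONA_SIGNALS = {
    "cmo": {"cta_total", "cta_above_fold", "readability_grade"},
    "seo": {"cta_total", "cta_above_fold"},
    "cto": {"page_weight_kb", "raw_headers"},
}

def input_pack(persona: str) -> dict:
    """Filtered view of the shared fact pack for one persona."""
    allowed = PERSONA_SIGNALS[persona]
    return {k: v for k, v in FACT_PACK.items() if k in allowed}

def unsupported_numbers(output: str, pack: dict) -> list[str]:
    """Guardrail: numbers in the output that aren't in the input pack."""
    known = {str(v) for v in pack.values() if isinstance(v, (int, float))}
    return [n for n in re.findall(r"\d+", output) if n not in known]

cmo_pack = input_pack("cmo")
assert "raw_headers" not in cmo_pack                      # CMO never sees header data
assert unsupported_numbers("You have 0 CTAs.", cmo_pack) == ["0"]   # hallucinated count
assert unsupported_numbers("17 total CTAs, 14 above fold.", cmo_pack) == []
```

In the real system the final check is an LLM pass rather than a regex, but the contract is the same: anything not traceable to the input pack gets rejected before it reaches the user.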
Why disagreement is valuable

The personas still disagree sometimes. A CMO might recommend adding more CTAs while a CTO flags page weight. That's good: it's a real disagreement about priorities, not bad data. The data confusion, by contrast, was fixed with explicit formatting: "17 total CTAs, 14 above fold" leaves no room for misinterpretation.
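The explicit formatting is trivial to implement, which is the point: the ambiguity lived in how facts were phrased, not in the crawl. A sketch, with a hypothetical formatter name:

```python
def format_cta_fact(total: int, above_fold: int) -> str:
    """Render CTA counts as one unambiguous sentence, so no persona
    can misread 'CTAs above the fold' as 'total CTAs' (or vice versa)."""
    return f"{total} total CTAs, {above_fold} above fold"

assert format_cta_fact(17, 14) == "17 total CTAs, 14 above fold"
```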

Once every persona works from the same facts, the disagreement that remains is the useful kind: a genuine difference in priorities, not an artifact of bad data.