I wanted AI-powered reviews from three perspectives: a CMO focused on growth, an SEO specialist on search visibility, and a CTO on technical quality. But in early tests, the CMO said "You have 17 CTAs above the fold" while the SEO said "You have 0 CTAs." Same website. Same crawl. The personas were hallucinating, contradicting each other, and repeating the same advice three different ways.
How I fixed it
Four changes made the system reliable:
- Each persona only sees relevant signals: the CMO doesn't get raw header data, and the CTO doesn't get content readability scores.
- Shared facts layer: all three personas work from the same crawl data, so they produce different interpretations of the same evidence, never different evidence.
- Universal guardrails: no inventing data, no raw scores in prose. If a fact isn't in the input pack, it can't appear in the output.
- Gemini reviews every AI output for unsupported claims before it reaches the user.
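A minimal sketch of how these pieces fit together. Everything here is illustrative, not the actual implementation: the fact names, the `PERSONA_SIGNALS` mapping, and the regex-based `guardrail` are stand-ins, and the real system uses Gemini rather than a local check for the final review.

```python
import re

# Shared facts layer: every persona reads from the same crawl-derived facts.
# (Values are made up for illustration.)
FACTS = {
    "cta_total": 17,
    "cta_above_fold": 14,
    "page_weight_kb": 2400,
    "readability_grade": 9,
}

# Each persona only sees the signals relevant to its perspective.
PERSONA_SIGNALS = {
    "cmo": ["cta_total", "cta_above_fold"],
    "cto": ["page_weight_kb"],
    "seo": ["readability_grade", "cta_total"],
}

def input_pack(persona: str) -> dict:
    """Build the filtered evidence pack for one persona."""
    return {key: FACTS[key] for key in PERSONA_SIGNALS[persona]}

def guardrail(output: str, pack: dict) -> bool:
    """Reject output containing numbers that aren't in the input pack.
    A crude stand-in for the Gemini review pass."""
    allowed = {str(value) for value in pack.values()}
    numbers = re.findall(r"\d+", output)
    return all(n in allowed for n in numbers)

cmo_pack = input_pack("cmo")
print(guardrail("17 total CTAs, 14 above fold", cmo_pack))        # supported by the pack
print(guardrail("You have 0 CTAs above the fold", cmo_pack))      # invented number, blocked
```

The key property: a persona physically cannot cite a number it was never given, because the guardrail only whitelists values from that persona's own input pack.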
Why disagreement is valuable
The personas still disagree sometimes. A CMO might recommend adding more CTAs while a CTO flags page weight. That's good. It's real disagreement about priorities, not bad data. Explicit formatting fixed the confusion: "17 total CTAs, 14 above fold" leaves no room for misinterpretation.
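The explicit-formatting fix can be sketched as a tiny helper that renders a fact into one unambiguous string before any persona sees it. The function name and signature are hypothetical; the point is that personas consume pre-formatted statements, not raw counts they might slice differently.

```python
def format_cta_fact(total: int, above_fold: int) -> str:
    """Render CTA counts as a single unambiguous statement.
    Personas quote this string instead of interpreting raw numbers."""
    return f"{total} total CTAs, {above_fold} above fold"

print(format_cta_fact(17, 14))  # "17 total CTAs, 14 above fold"
```

Because every persona receives the same pre-formatted sentence, "17 above the fold" and "0 CTAs" can no longer both be derived from one crawl.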
Once the facts are shared and the formatting is explicit, disagreement stops being a bug and becomes the most useful part of the output.