Day one, I pointed my crawler at a site. Got nothing, just a Cloudflare challenge page. I worked out how to get past that, then hit the same wall with Imperva. Then again with Turnstile. I wasn't even getting to the content.
What I used
The crawling stack is built on Crawlee with Playwright-Extra and the Stealth Plugin. But the tech alone wasn't enough. The real work was:
- Custom detection for protection patterns (Cloudflare, Imperva, Turnstile, Akamai)
- Consistent browser identity: user-agent, fingerprints, and referrer chains that actually align
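To make the first bullet concrete, here is a minimal sketch of how a detector for protection pages could look. It is not the actual rule set from this project. The `ResponseSnapshot` shape, the `detectProtection` name, and the specific markers are assumptions for illustration, based on commonly seen public signals (for example the `cf-mitigated: challenge` header on Cloudflare challenges, or the "Incapsula incident ID" text on Imperva block pages). Treat every match as a hint rather than proof.

```typescript
// Hypothetical detector: all names and markers here are illustrative assumptions.
type Protection = "cloudflare" | "turnstile" | "imperva" | "akamai";

interface ResponseSnapshot {
  status: number;
  headers: Record<string, string>; // header names assumed lower-cased
  body: string;
}

interface Rule {
  vendor: Protection;
  matches: (r: ResponseSnapshot) => boolean;
}

// Order matters: the first matching rule wins.
const RULES: Rule[] = [
  {
    // Cloudflare marks challenge responses with this header; the title check is a fallback.
    vendor: "cloudflare",
    matches: (r) =>
      r.headers["cf-mitigated"] === "challenge" ||
      ((r.status === 403 || r.status === 503) &&
        "cf-ray" in r.headers &&
        /just a moment/i.test(r.body)),
  },
  {
    // A page can embed a Turnstile widget on a form without blocking the crawl,
    // so this rule also requires an error status.
    vendor: "turnstile",
    matches: (r) =>
      r.status >= 400 && r.body.includes("challenges.cloudflare.com/turnstile"),
  },
  {
    vendor: "imperva",
    matches: (r) =>
      /incapsula incident id/i.test(r.body) ||
      (r.status === 403 && "x-iinfo" in r.headers),
  },
  {
    vendor: "akamai",
    matches: (r) =>
      r.status === 403 &&
      /access denied/i.test(r.body) &&
      /reference\s*#/i.test(r.body),
  },
];

// Returns the first vendor whose rule matches, or null if nothing looks like a block page.
function detectProtection(r: ResponseSnapshot): Protection | null {
  for (const rule of RULES) {
    if (rule.matches(r)) return rule.vendor;
  }
  return null;
}
```

With Playwright, a snapshot like this could be assembled from `response.status()`, `response.headers()` (which returns lower-cased header names), and `page.content()`. Keeping the detector a pure function over a plain object makes it easy to test against saved block pages without launching a browser.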
The key insight: degradation, not failure
I stopped treating "blocked" as failure. Instead, I built a degradation ladder with four levels:
- Full 30-page crawl: the best case
- Partial crawl: the fallback if the full crawl is blocked
- Homepage only: if the partial crawl fails too
- Preflight check: if even the homepage is blocked, this still extracts DNS, SSL, and header data
Every step down still returns value. And critically, I tell the user what happened.
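The ladder can be sketched as a loop that tries each level in order and records why the higher ones were skipped. This is a simplified illustration, not the production code. `crawlWithDegradation`, `CrawlResult`, and the `attempts` map are hypothetical names, and the real crawl functions (Crawlee runs, DNS and SSL lookups) are assumed to be supplied by the caller.

```typescript
// Hypothetical sketch of a four-level degradation ladder.
type Level = "full" | "partial" | "homepage" | "preflight";

interface CrawlResult {
  level: Level;     // the level that actually succeeded
  data: unknown;    // whatever that level returned
  notes: string[];  // why each higher level was skipped, to show the user
}

// Each attempt resolves with data or throws when it is blocked or fails.
type Attempt = () => Promise<unknown>;

// Best case first, most degraded last.
const LADDER: Level[] = ["full", "partial", "homepage", "preflight"];

async function crawlWithDegradation(
  attempts: Record<Level, Attempt>,
): Promise<CrawlResult> {
  const notes: string[] = [];
  for (const level of LADDER) {
    try {
      const data = await attempts[level]();
      return { level, data, notes };
    } catch (err) {
      // Record the reason and step down to the next level.
      const reason = err instanceof Error ? err.message : String(err);
      notes.push(`${level} crawl unavailable: ${reason}`);
    }
  }
  // Only reached if even the preflight check failed.
  throw new Error(`All levels failed: ${notes.join("; ")}`);
}
```

The `notes` array is what makes the "tell the user" part possible: the caller gets both the data from the level that worked and a plain-language record of what could not be done.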
Blocked ≠ broken. Users trust you more when you explain what you couldn't do, instead of pretending you did it.