How to Block AI Crawlers in robots.txt
Blocking AI crawlers takes one robots.txt group and has zero effect on your Google ranking — AI crawlers like GPTBot are completely separate from Googlebot. The only real decision is whether to block just training crawlers, or AI search crawlers too. This guide gives you both files and shows you how to verify they work.
The quick answer
Copy this into your robots.txt to block the major AI training crawlers while staying fully visible in Google, Bing, and every other search engine:
# Block AI training crawlers, keep search engines
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Applebot-Extended
Disallow: /
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
This does not hurt your SEO
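To check offline that the group above blocks the AI tokens while leaving search crawlers untouched, one option is Python's standard-library `urllib.robotparser`. The sketch below parses the same file and asks whether a few sample user-agents may fetch a page. The helper name `is_allowed` and the URL `https://example.com/some-page` are placeholders for illustration, and `robotparser` only approximates how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The quick-answer policy from this guide, reproduced as a string.
POLICY = """\
# Block AI training crawlers, keep search engines
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def is_allowed(policy: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `policy` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(policy.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-page"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "Googlebot", "Bingbot"):
        verdict = "allowed" if is_allowed(POLICY, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

If the AI tokens come back as blocked and Googlebot and Bingbot as allowed, the file is doing what the comment at its top says.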
Training crawlers vs. AI search crawlers
AI crawlers do two different jobs, and the right policy is usually different for each:
| Type | Crawlers | What blocking costs you |
|---|---|---|
| Training | GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider | Nothing measurable — your content just stops feeding model training. |
| AI search | OAI-SearchBot, Claude-SearchBot, PerplexityBot | Citations and referral clicks from ChatGPT Search, Claude, and Perplexity answers. |
Most sites block training and allow AI search: it protects content from being absorbed into models while keeping the attributed, linked visibility AI answers can bring. If you want out of AI entirely, use the full block below.
# Block AI training AND AI search crawlers
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Applebot-Extended
User-agent: Amazonbot
Disallow: /
User-agent: *
Allow: /
Verify the block actually works
- Upload the file to https://yourdomain.com/robots.txt (it must be at the root; see where robots.txt lives).
- Run your domain through the Analyzer. The access matrix shows exactly which AI crawlers are blocked and confirms Googlebot is still allowed.
- Spot-check a URL with the URL Tester against a specific user-agent like GPTBot. It traces the exact rule that matched.
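For intuition about what "tracing the exact rule that matched" involves, here is a minimal sketch of group and rule matching. It is not the implementation behind the Analyzer or URL Tester. The names `parse_groups`, `trace`, and `SAMPLE` are hypothetical, and the matcher is deliberately simplified: it compares tokens case-insensitively, picks the longest matching path prefix, and ignores `*` and `$` wildcards in paths.

```python
from typing import List, Tuple

Rule = Tuple[str, str]               # (directive, path), e.g. ("disallow", "/")
Group = Tuple[List[str], List[Rule]]  # (lowercased user-agent tokens, rules)

# Trimmed sample policy for illustration only.
SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def parse_groups(text: str) -> List[Group]:
    """Split robots.txt text into groups of user-agent tokens and rules."""
    groups: List[Group] = []
    agents: List[str] = []
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups


def trace(text: str, user_agent: str, path: str) -> Tuple[bool, str]:
    """Return (allowed, matched rule or reason) for `user_agent` requesting `path`."""
    token = user_agent.lower()
    groups = parse_groups(text)
    # Groups naming the token win; otherwise fall back to the wildcard group.
    matched = [g for g in groups if token in g[0]] or [g for g in groups if "*" in g[0]]
    if not matched:
        return True, "no matching group"
    best = None
    for _, group_rules in matched:
        for directive, rule_path in group_rules:
            if not rule_path or not path.startswith(rule_path):
                continue  # an empty path or a non-matching prefix does not apply
            longer = best is None or len(rule_path) > len(best[1])
            tie_allow = best is not None and len(rule_path) == len(best[1]) and directive == "allow"
            if longer or tie_allow:
                best = (directive, rule_path)
    if best is None:
        return True, "group matched, no rule applies"
    return best[0] == "allow", f"{best[0].capitalize()}: {best[1]}"


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot"):
        print(agent, trace(SAMPLE, agent, "/blog/post"))
```

Real crawlers add details this sketch leaves out, so treat its verdicts as a reading aid and not as proof of how a specific bot will behave.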
robots.txt is a request, not a firewall
Common mistakes
Putting Disallow: / under User-agent: *
That blocks every crawler, including Google and Bing. AI blocks belong in their own group; your * group should keep Allow: /.
Blocking GPTBot and assuming you're done
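Each company runs multiple crawlers under different tokens. OpenAI and Anthropic each run separate training and search crawlers (GPTBot and OAI-SearchBot, ClaudeBot and Claude-SearchBot), and Google controls AI training use through the Google-Extended token, which is separate from Googlebot. A complete policy names each user-agent.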
Expecting removal from existing models
robots.txt controls future crawling. Content already collected in past training runs is not retroactively removed.
Forgetting CCBot
Common Crawl's dataset feeds dozens of downstream model trainers. Blocking CCBot is the single highest-leverage line in the file.
Per-crawler deep dives
Every crawler in these examples has a dedicated guide with operator documentation, nuances, and FAQs — or browse the full AI Crawler Directory to compare all of them at a glance.
Does blocking AI crawlers hurt my Google ranking?
No. AI crawlers such as GPTBot, ClaudeBot, and CCBot use separate user-agents from Googlebot and Bingbot, and Google-Extended is a separate robots.txt control token, not part of Googlebot's rules. You can block all of them and remain fully indexed and ranked in search. Google explicitly documents that Google-Extended has no effect on Search.
How do I block all AI bots at once?
List each AI user-agent on its own User-agent line in a single robots.txt group, followed by Disallow: /. There is no single wildcard that means 'all AI crawlers', so a complete policy names each token — or use the AI Crawler Manager to generate the group automatically.
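As a rough illustration of that structure, the sketch below generates such a group from a list of tokens. It is not the AI Crawler Manager. The function name `build_robots_txt` is hypothetical, and the two token lists are copied from the examples in this guide, so they are not an exhaustive inventory of AI crawlers.

```python
from typing import Iterable, Optional

# Tokens taken from this guide's examples; not an exhaustive list.
TRAINING_TOKENS = [
    "GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
    "Bytespider", "meta-externalagent", "Applebot-Extended",
]
SEARCH_TOKENS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]


def build_robots_txt(blocked: Iterable[str], sitemap_url: Optional[str] = None) -> str:
    """Build a robots.txt that blocks `blocked` tokens and allows everyone else."""
    tokens = list(dict.fromkeys(blocked))  # drop duplicates, keep order
    if not tokens:
        raise ValueError("need at least one user-agent token to block")
    # One group: every blocked token on its own line, then a single Disallow.
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append("Disallow: /")
    # Separate wildcard group so all other crawlers stay allowed.
    lines += ["", "User-agent: *", "Allow: /"]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Training-only policy; add SEARCH_TOKENS to opt out of AI search as well.
    print(build_robots_txt(TRAINING_TOKENS, "https://example.com/sitemap.xml"))
```

Keeping the token lists in one place makes it easier to add a new crawler later without hand-editing several copies of the file.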
Do AI crawlers actually obey robots.txt?
The major operators — OpenAI, Anthropic, Google, Common Crawl, Apple, Meta — document that their crawlers honor robots.txt. Bytespider has a reported history of ignoring it. robots.txt is a voluntary standard, so verify with an analyzer and use server-level blocking when compliance matters.
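Server-level blocking can take many forms (web server rules, a CDN or firewall, application code). As one hedged illustration, the sketch below is a small WSGI middleware that returns 403 when the request's User-Agent header contains a blocked substring. The names `block_ai_crawlers`, `BLOCKED_SUBSTRINGS`, `demo_app`, and `probe` are hypothetical, and the substring list is an assumption to adapt. User-Agent headers can be spoofed, and a control token such as Google-Extended has no request user-agent of its own, so this approach cannot act on it.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed substrings to match against the User-Agent header (lowercase).
BLOCKED_SUBSTRINGS = ("gptbot", "claudebot", "ccbot", "bytespider")


def block_ai_crawlers(app: Callable, blocked: Iterable[str] = BLOCKED_SUBSTRINGS) -> Callable:
    """Wrap a WSGI app so requests from blocked user-agents get a 403."""
    blocked = tuple(token.lower() for token in blocked)

    def middleware(environ: dict, start_response: Callable) -> List[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)  # everyone else passes through

    return middleware


def demo_app(environ: dict, start_response: Callable) -> List[bytes]:
    """Stand-in application used only for the demonstration below."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def probe(app: Callable, user_agent: str) -> Tuple[str, bytes]:
    """Call `app` with a fake request and return (status, body)."""
    seen: List[str] = []
    body = app({"HTTP_USER_AGENT": user_agent}, lambda status, headers: seen.append(status))
    return seen[0], b"".join(body)


if __name__ == "__main__":
    wrapped = block_ai_crawlers(demo_app)
    print(probe(wrapped, "Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(probe(wrapped, "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"))
```

Unlike robots.txt, this refuses the request itself, so it applies even to a crawler that never reads or honors the file, as long as the crawler identifies itself honestly.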
Should I block AI search crawlers like PerplexityBot too?
It's a trade-off. AI search crawlers cite and link your pages, which can drive referral traffic. Blocking them removes you from those answers. Most sites block training crawlers and keep AI search allowed.
Will blocking remove my content from ChatGPT or Claude?
Blocking prevents future crawling. It does not remove content from models that were already trained on it. The earlier you block, the less of your content enters future training runs.
Robots.txt Studio Editorial · Technical SEO & crawling
We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.