How to Block AI Crawlers in robots.txt

Blocking AI crawlers takes one robots.txt group and has zero effect on your Google ranking — AI crawlers like GPTBot are completely separate from Googlebot. The only real decision is whether to block just training crawlers, or AI search crawlers too. This guide gives you both files and shows you how to verify they work.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309


The quick answer

Copy this into your robots.txt to block the major AI training crawlers while staying fully visible in Google, Bing, and every other search engine:

Block AI training, keep search

# Block AI training crawlers, keep search engines
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

This does not hurt your SEO

Google-Extended, GPTBot, ClaudeBot and the rest are separate from Googlebot and Bingbot. You can block every AI training crawler and lose nothing in search rankings.


Training crawlers vs. AI search crawlers

AI crawlers do two different jobs, and the right policy is usually different for each:

Type	Crawlers	What blocking costs you
Training	GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider, meta-externalagent, Applebot-Extended	Nothing measurable in search: your content just stops feeding model training.
AI search	OAI-SearchBot, Claude-SearchBot, PerplexityBot	Citations and referral clicks from ChatGPT Search, Claude, and Perplexity answers.

A common policy is to block training and allow AI search: it protects content from being absorbed into models while keeping the attributed, linked visibility AI answers can bring. If you want out of AI entirely, use the full block below.

Block every AI crawler, training and search

# Block AI training AND AI search crawlers
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Applebot-Extended
User-agent: Amazonbot
Disallow: /

User-agent: *
Allow: /
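
Both files above follow the same pattern: one group that names every blocked token, then a wildcard group that allows everyone else. As a minimal sketch, the pattern can be generated from token lists so the two policies stay in sync. The function name build_robots_txt and the list names are hypothetical, chosen for illustration; the tokens are the ones used in this guide.

```python
# Token lists copied from the two example files in this guide.
TRAINING = [
    "GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
    "Bytespider", "meta-externalagent", "Applebot-Extended",
]
AI_SEARCH = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]
OTHER = ["Amazonbot"]  # appears only in the full block above


def build_robots_txt(blocked, sitemap=None):
    """Return a robots.txt that blocks `blocked` and allows everyone else."""
    if not blocked:
        raise ValueError("need at least one user-agent token to block")
    # All blocked tokens share one group, closed by a single Disallow rule.
    lines = [f"User-agent: {token}" for token in blocked]
    lines.append("Disallow: /")
    # The wildcard group keeps every other crawler allowed.
    lines += ["", "User-agent: *", "Allow: /"]
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


training_only = build_robots_txt(TRAINING, sitemap="https://example.com/sitemap.xml")
full_block = build_robots_txt(TRAINING + AI_SEARCH + OTHER)

if __name__ == "__main__":
    print(training_only)
```

The generated full block lists tokens in a different order from the hand-written file; the order of User-agent lines inside one group does not change which crawlers the group applies to.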

Verify the block actually works

1. Upload the file to https://yourdomain.com/robots.txt. It must sit at the root of the host.
2. Run your domain through the Analyzer: the access matrix shows exactly which AI crawlers are blocked and confirms Googlebot is still allowed.
3. Spot-check a URL with the URL Tester against a specific user-agent like GPTBot; it traces the exact rule that matched.
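
For a quick local check before uploading, the same kind of access matrix can be approximated with Python's standard-library urllib.robotparser. This is a rough sketch, not a substitute for the tools above: the helper name access_matrix is hypothetical, and Python's parser uses its own matching rules, so it may not agree with every crawler's implementation in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The "block AI training, keep search" file from this guide.
ROBOTS_TXT = """\
# Block AI training crawlers, keep search engines
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def access_matrix(robots_txt, agents, url="https://example.com/"):
    """Map each user-agent token to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


matrix = access_matrix(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "CCBot", "OAI-SearchBot", "Googlebot", "Bingbot"],
)

if __name__ == "__main__":
    for agent, allowed in matrix.items():
        print(f"{agent:15} {'allowed' if allowed else 'blocked'}")
```

To check a deployed file instead of a local string, RobotFileParser also offers set_url() followed by read(), which fetches the file over HTTP.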

robots.txt is a request, not a firewall

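OpenAI, Anthropic, Google, and Common Crawl document that their crawlers honor robots.txt. Bytespider has a patchy record, and bad actors ignore robots.txt entirely. For hard enforcement, block user-agents or IPs at your CDN or server on top of robots.txt. Google-Extended and Applebot-Extended are robots.txt control tokens rather than separate crawlers: they never appear as a request user-agent, so robots.txt is the only place to set them.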
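
As an illustration of what server-level blocking by user-agent can look like, here is a minimal WSGI middleware sketch. It is only one possible approach, and a CDN or web-server rule is usually the better place for this. The names block_ai_crawlers, demo_app, and status_for are hypothetical, and the user-agent strings in the usage note are shortened examples rather than the operators' exact strings. Google-Extended and Applebot-Extended are left out because they are robots.txt tokens that do not appear in request headers.

```python
# Lowercase substrings to look for in the request User-Agent header.
BLOCKED_UA_SUBSTRINGS = (
    "gptbot", "claudebot", "ccbot", "bytespider", "meta-externalagent",
)


def block_ai_crawlers(app, blocked=BLOCKED_UA_SUBSTRINGS):
    """Wrap a WSGI app and answer 403 to requests from matching user-agents."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        path = environ.get("PATH_INFO", "/")
        # Keep robots.txt reachable so compliant crawlers can still read the policy.
        if path != "/robots.txt" and any(token in user_agent for token in blocked):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in for a real site: always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def status_for(user_agent, path="/"):
    """Send one fake request through the wrapped demo app; return the status line."""
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    wrapped = block_ai_crawlers(demo_app)
    list(wrapped({"HTTP_USER_AGENT": user_agent, "PATH_INFO": path}, start_response))
    return seen[0]
```

Calling status_for("Mozilla/5.0 (compatible; GPTBot/1.1)") goes down the 403 branch, while a Googlebot-style string passes through to the wrapped app. User-agent matching only stops crawlers that identify themselves honestly; spoofed user-agents need IP-based rules.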


Common mistakes

Putting Disallow: / under User-agent: *
That blocks every crawler, including Google and Bing. AI blocks belong in their own group; your * group should keep Allow: /.
Blocking GPTBot and assuming you're done
Each company runs multiple crawlers under different tokens. Anthropic, OpenAI and Google each split training and search — a complete policy names each user-agent.
Expecting removal from existing models
robots.txt controls future crawling. Content already collected in past training runs is not retroactively removed.
Forgetting CCBot
Common Crawl's public dataset is reused by many downstream model trainers, so blocking CCBot is one of the highest-leverage lines in the file.
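
Two of these mistakes, a Disallow: / under the wildcard group and a forgotten token, can be caught mechanically. The sketch below is a simplified check, not a full robots.txt parser: the helper names parse_groups, wildcard_blocks_everything, and missing_tokens are hypothetical, and the logic only looks for an exact Disallow: / rule without weighing Allow overrides.

```python
def parse_groups(robots_txt):
    """Return a list of (user_agents, rules) tuples, one per group."""
    groups, agents, rules = [], [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def wildcard_blocks_everything(robots_txt):
    """True if a group containing User-agent: * also has Disallow: /."""
    return any(
        "*" in agents and ("disallow", "/") in rules
        for agents, rules in parse_groups(robots_txt)
    )


def missing_tokens(robots_txt, expected):
    """Return the expected tokens that no Disallow: / group names explicitly."""
    blocked = {
        agent.lower()
        for agents, rules in parse_groups(robots_txt)
        if ("disallow", "/") in rules
        for agent in agents
    }
    return [token for token in expected if token.lower() not in blocked]


BAD = "User-agent: *\nDisallow: /\n"
GOOD = "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
```

With these samples, wildcard_blocks_everything(BAD) is True and wildcard_blocks_everything(GOOD) is False, while missing_tokens(GOOD, ["GPTBot", "CCBot", "ClaudeBot"]) reports ClaudeBot as the token still to add.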

Per-crawler deep dives

Every crawler in these examples has a dedicated guide with operator documentation, nuances, and FAQs — or browse the full AI Crawler Directory to compare all of them at a glance.

Frequently asked questions

Does blocking AI crawlers hurt my Google ranking?

No. AI crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended) use separate user-agent tokens from Googlebot and Bingbot. You can block all of them and remain fully indexed and ranked in search. Google explicitly documents that Google-Extended does not affect a site's inclusion or ranking in Search.

How do I block all AI bots at once?

List each AI user-agent on its own User-agent line in a single robots.txt group, followed by Disallow: /. There is no single wildcard that means 'all AI crawlers', so a complete policy names each token — or use the AI Crawler Manager to generate the group automatically.

Do AI crawlers actually obey robots.txt?

The major operators (OpenAI, Anthropic, Google, Common Crawl, Apple, Meta) document that their crawlers honor robots.txt. Bytespider has a reported history of ignoring it. robots.txt is a voluntary standard, so verify your file with an analyzer, check your server logs for crawlers that ignore it, and use server-level blocking when compliance matters.

Should I block AI search crawlers like PerplexityBot too?

It's a trade-off. AI search crawlers cite and link your pages, which can drive referral traffic. Blocking them removes you from those answers. A common choice is to block training crawlers and keep AI search allowed.

Will blocking remove my content from ChatGPT or Claude?

Blocking prevents future crawling. It does not remove content from models that were already trained on it. The earlier you block, the less of your content enters future training runs.


Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.