AI Crawler Directory

AI companies crawl your site to train models and answer questions about it. This directory lists the crawlers that matter, what each one actually does, whether it respects robots.txt, and the exact rules to allow or block it. To apply any of this in one click, use the AI Crawler Manager.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309


Every AI crawler at a glance

Two questions decide every AI crawler policy: does it use my content for training, and does it honor robots.txt? Training crawlers (GPTBot, ClaudeBot, CCBot) are the ones most sites block. AI-search crawlers (OAI-SearchBot, PerplexityBot) can send referral traffic, so blocking them is a trade-off.

Crawler	Company	Trains AI	AI search	Honors robots.txt	Recommended
GPTBot	OpenAI	Yes	No	Yes	Block
OAI-SearchBot	OpenAI	No	Yes	Yes	Depends
ClaudeBot	Anthropic	Yes	No	Yes	Block
Claude-SearchBot	Anthropic	No	Yes	Yes	Depends
PerplexityBot	Perplexity	Partial	Yes	Partial	Depends
CCBot	Common Crawl	Yes	No	Yes	Block
Google-Extended	Google	Yes	No	Yes	Block
Bytespider	ByteDance	Yes	No	Partial	Block
Amazonbot	Amazon	Yes	Partial	Yes	Depends

Blocking AI crawlers does not affect Google ranking

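Google-Extended, GPTBot, and the other AI tokens are matched separately from Googlebot, so rules addressed to them do not change how Googlebot crawls or indexes your pages. You can block every AI training crawler and remain fully visible in Google Search.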


How to allow or block any AI crawler

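Each crawler is identified by its User-agent token. Add a group for the crawler with Disallow: / to block it everywhere, or with an empty Disallow: to allow it. Multiple crawlers can share one group by listing several User-agent lines above the same rules. The per-crawler sections below show the block and allow variants side by side; use one or the other, because RFC 9309 combines groups that name the same user agent, so a file containing both would still block the crawler.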
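The group structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the output of any tool named on this page: the function name `render_robots` and the `policy` mapping are hypothetical, and the sketch only covers site-wide allow or block decisions.

```python
def render_robots(policy: dict[str, bool]) -> str:
    """Render robots.txt groups from a {user-agent token: allowed?} mapping.

    Tokens that share a decision are placed in one group, because a group
    may list several User-agent lines before its rules.
    """
    blocked = sorted(token for token, ok in policy.items() if not ok)
    allowed = sorted(token for token, ok in policy.items() if ok)

    groups = []
    # "Disallow: /" blocks every path; an empty "Disallow:" blocks nothing.
    for tokens, rule in ((blocked, "Disallow: /"), (allowed, "Disallow:")):
        if tokens:
            lines = [f"User-agent: {token}" for token in tokens]
            lines.append(rule)
            groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Hypothetical policy: block two training crawlers, allow one search crawler.
    print(render_robots({"GPTBot": False, "CCBot": False, "OAI-SearchBot": True}))
```

Sorting the tokens keeps the generated file stable between runs, which makes diffs easier to review before an edit ships.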

Block AI training, keep search engines

# Block the major AI training crawlers, allow everything else
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
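To sanity-check the file above before deploying it, one option is Python's standard-library `urllib.robotparser`. The sketch below parses the same rules from a string and reports which user agents may fetch a sample URL. The helper name `access_matrix` and the URL `https://example.com/pricing` are illustrative assumptions, and the standard-library parser is not a full RFC 9309 implementation, so treat the result as a rough check rather than proof of how any given crawler resolves the file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def access_matrix(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent token to whether the parsed rules let it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
              "Googlebot", "OAI-SearchBot"]
    matrix = access_matrix(ROBOTS_TXT, agents, "https://example.com/pricing")
    for agent, allowed in matrix.items():
        print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

With these rules the four training tokens resolve to blocked, while Googlebot and OAI-SearchBot fall through to the `*` group and stay allowed.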

Compliance varies

Well-behaved crawlers (OpenAI, Anthropic, Google) honor these rules. Some, like Bytespider, have a patchy record. robots.txt is a request, not an enforced firewall, so check how your rules resolve in the Analyzer and confirm actual crawler behavior in your server logs.
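Because robots.txt is only a request, the evidence of compliance lives in your access logs. The following sketch counts requests from blocked crawler tokens in combined-format log lines. Everything specific here is an assumption for illustration: the `BLOCKED_TOKENS` list, the function name `blocked_crawler_hits`, and the sample log lines, whose user-agent strings are simplified stand-ins rather than the exact strings these crawlers send. Google-Extended is left out because it has no user-agent string of its own and so does not appear in logs.

```python
import re
from collections import Counter

# Tokens assumed to be disallowed site-wide in your robots.txt.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

# Captures the request path and the user-agent field of a combined-format line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def blocked_crawler_hits(lines):
    """Count page requests whose user agent contains a blocked token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # Fetching robots.txt itself is expected, so it is not counted.
        if not match or match["path"] == "/robots.txt":
            continue
        user_agent = match["ua"].lower()
        for token in BLOCKED_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
    return hits


# Made-up sample lines using documentation IP ranges.
SAMPLE = [
    '203.0.113.7 - - [08/Jun/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleAgent (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [08/Jun/2026:10:00:05 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "ExampleAgent (compatible; Bytespider)"',
    '198.51.100.4 - - [08/Jun/2026:10:01:00 +0000] "GET /blog HTTP/1.1" 200 2048 "-" "ExampleAgent (compatible; Googlebot/2.1)"',
]

if __name__ == "__main__":
    print(dict(blocked_crawler_hits(SAMPLE)))
```

A hit shortly after you change the file is not necessarily a violation, since RFC 9309 lets crawlers cache robots.txt for up to 24 hours. A crawler that hides behind an undeclared user agent will not show up in this count at all.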

GPTBot — OpenAI

Crawls the web to gather training data for OpenAI's foundation models.

Trains AI	AI search	Honors robots.txt
Yes	No	Yes

GPTBot allow / block config

# Block GPTBot
User-agent: GPTBot
Disallow: /

# Allow GPTBot
User-agent: GPTBot
Disallow:

Recommended: Block

Pure training crawler: block it if you don't want your content used to train the models behind ChatGPT.


OAI-SearchBot — OpenAI

Fetches pages so ChatGPT Search can surface and link to them in its results.

Trains AI	AI search	Honors robots.txt
No	Yes	Yes

OAI-SearchBot allow / block config

# Block OAI-SearchBot
User-agent: OAI-SearchBot
Disallow: /

# Allow OAI-SearchBot
User-agent: OAI-SearchBot
Disallow:

Recommended: Depends

Allow it for referral traffic from ChatGPT Search; block it to stay out of AI answers.


ClaudeBot — Anthropic

Crawls the web to gather training data for Anthropic's Claude models.

Trains AI	AI search	Honors robots.txt
Yes	No	Yes

ClaudeBot allow / block config

# Block ClaudeBot
User-agent: ClaudeBot
Disallow: /

# Allow ClaudeBot
User-agent: ClaudeBot
Disallow:

Recommended: Block

Training crawler — block it to keep your content out of Claude's training set.


Claude-SearchBot — Anthropic

Fetches pages so Claude can cite and answer with current web results.

Trains AI	AI search	Honors robots.txt
No	Yes	Yes

Claude-SearchBot allow / block config

# Block Claude-SearchBot
User-agent: Claude-SearchBot
Disallow: /

# Allow Claude-SearchBot
User-agent: Claude-SearchBot
Disallow:

Recommended: Depends

Allow it to be cited in Claude's web answers; block it to opt out of AI search.


PerplexityBot — Perplexity

Indexes pages for Perplexity's AI answer engine and citations.

Trains AI	AI search	Honors robots.txt
Partial	Yes	Partial

PerplexityBot allow / block config

# Block PerplexityBot
User-agent: PerplexityBot
Disallow: /

# Allow PerplexityBot
User-agent: PerplexityBot
Disallow:

Recommended: Depends

Drives referral traffic, but has been reported to fetch some pages with an undeclared user agent, which rules addressed to PerplexityBot cannot match. Verify with the Analyzer and your server logs.


CCBot — Common Crawl

Builds the open Common Crawl dataset that seeds many third-party AI models.

Trains AI	AI search	Honors robots.txt
Yes	No	Yes

CCBot allow / block config

# Block CCBot
User-agent: CCBot
Disallow: /

# Allow CCBot
User-agent: CCBot
Disallow:

Recommended: Block

One block stops your content from reaching the dozens of downstream model trainers that use Common Crawl.


Google-Extended — Google

Opt-out token controlling whether Google uses your content for Gemini training and grounding.

Trains AI	AI search	Honors robots.txt
Yes	No	Yes

Google-Extended allow / block config

# Block Google-Extended
User-agent: Google-Extended
Disallow: /

# Allow Google-Extended
User-agent: Google-Extended
Disallow:

Recommended: Block

Blocking it has zero effect on Google Search ranking — Googlebot is separate — so you can opt out of AI training safely.


Bytespider — ByteDance

Aggressively crawls the web for ByteDance/TikTok AI training.

Trains AI	AI search	Honors robots.txt
Yes	No	Partial

Bytespider allow / block config

# Block Bytespider
User-agent: Bytespider
Disallow: /

# Allow Bytespider
User-agent: Bytespider
Disallow:

Recommended: Block

High-volume training crawler with a patchy compliance record — most sites block it to save bandwidth.


Amazonbot — Amazon

Crawls pages for Amazon products including Alexa answers and AI features.

Trains AI	AI search	Honors robots.txt
Yes	Partial	Yes

Amazonbot allow / block config

# Block Amazonbot
User-agent: Amazonbot
Disallow: /

# Allow Amazonbot
User-agent: Amazonbot
Disallow:

Recommended: Depends

Block to opt out of Amazon AI; allow if you want Alexa to answer from your content.


How to test your AI crawler policy

A rule you haven't tested is a rule you're hoping works. Run your domain through the Analyzer to see every crawler's resolved access in one matrix, or check one URL against one user-agent with the URL Tester, which traces the exact rule that matched. Before shipping any edit, the Validator catches syntax errors that would silently change who gets in. For the complete blocking playbook, read How to Block AI Crawlers.
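To make "the exact rule that matched" concrete, here is a simplified sketch of how RFC 9309 resolves one user agent against one path: groups naming the same user agent are combined, the longest matching rule wins, and Allow wins a tie. It is not how the URL Tester is implemented. The function names `parse_groups` and `trace` are hypothetical, user agents are matched by exact token only, and wildcard (`*`) and end-of-path (`$`) patterns are not supported.

```python
def parse_groups(robots_txt):
    """Return {lowercased token: [(kind, path), ...]}, merging same-agent groups."""
    rules = {}
    current_agents, seen_rule = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:                     # a rule line closed the last group
                current_agents, seen_rule = [], False
            current_agents.append(value.lower())
            rules.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            if value:                         # an empty value matches nothing
                for agent in current_agents:
                    rules[agent].append((field, value))
    return rules


def trace(robots_txt, agent, path):
    """Return (allowed, matched_rule) for one user-agent token and one URL path."""
    rules = parse_groups(robots_txt)
    # Use the agent's own group if one exists, otherwise fall back to "*".
    group = rules.get(agent.lower(), rules.get("*", []))
    best = None
    for kind, prefix in group:
        if path.startswith(prefix):
            longer = best is None or len(prefix) > len(best[1])
            tie_for_allow = (best is not None and len(prefix) == len(best[1])
                             and kind == "allow")
            if longer or tie_for_allow:
                best = (kind, prefix)
    if best is None:
        return True, None                     # no rule matched: allowed by default
    return best[0] == "allow", f"{best[0].capitalize()}: {best[1]}"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private\nAllow: /private/press\n"
    print(trace(robots, "AnyBot", "/private/press/kit"))
    print(trace(robots, "AnyBot", "/private/drafts"))
```

Returning the matched rule alongside the decision is the useful part: when a crawler gets in unexpectedly, the trace shows whether a longer Allow overrode the Disallow you were counting on.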


Frequently asked questions

What is an AI crawler?

An AI crawler is a bot that fetches web pages to train an AI model or to answer questions in an AI product. Examples include OpenAI's GPTBot, Anthropic's ClaudeBot, and Perplexity's PerplexityBot. They are separate from search-engine crawlers like Googlebot.

Do AI crawlers respect robots.txt?

The major operators (OpenAI, Anthropic, Google, Common Crawl) document that their crawlers honor robots.txt. Some crawlers, such as Bytespider, have been reported to ignore it. robots.txt is a voluntary standard, so use the Analyzer to confirm a crawler is actually being blocked.

Should I block all AI crawlers?

Block training crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended) if you don't want your content used to train models — it has no effect on search ranking. AI-search crawlers like OAI-SearchBot and PerplexityBot can drive referral traffic, so blocking those is a business trade-off, not an automatic win.

Which AI crawler should I worry about most?

CCBot has the widest reach: Common Crawl's dataset feeds dozens of downstream model trainers, so one block stops many of them. GPTBot, ClaudeBot, and Google-Extended cover the largest commercial models directly.


Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.