Crawler Directory

A reference to the bots that crawl the web: search engines, AI crawlers, SEO tools, and social preview bots. For each one, it covers what the bot does, whether it respects robots.txt, and the exact rules to control it. To act on any of this, run the Analyzer or open the AI Crawler Manager.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309

Every crawler that visits your site

A web crawler is an automated bot that fetches your pages. Some build search indexes that send you traffic (Googlebot, Bingbot), some gather data to train AI models (GPTBot, ClaudeBot, Bytespider), some power SEO tools (AhrefsBot, SemrushBot), and some build social link previews. robots.txt is how you tell each one what it may access. This directory covers the crawlers worth knowing, grouped by what they do.

Blocking AI and SEO crawlers does not affect your Google ranking

Only search-engine crawlers (Googlebot, Bingbot) control your search visibility. You can block every AI and SEO crawler and remain fully indexed in Google and Bing.

Search engines

Crawler | Operator | What it does | Recommended
Googlebot | Google | Google Search's crawler; controls your visibility in Google. | Allow
Bingbot | Microsoft | Microsoft Bing's crawler; controls your visibility in Bing. | Allow
Slurp | Yahoo | Yahoo Search's crawler. | Allow

AI training crawlers

Crawler | Operator | What it does | Recommended
GPTBot | OpenAI | Collects content to train OpenAI's models (e.g. ChatGPT). | Block
ClaudeBot | Anthropic | Collects content to train Anthropic's Claude models. | Block
CCBot | Common Crawl | Builds the Common Crawl public dataset used to train many AI models. | Block
Bytespider | ByteDance | ByteDance's crawler, used for AI training. | Block
Meta-ExternalAgent | Meta | Meta's crawler, used for AI training. | Block

SEO crawlers

Crawler | Operator | What it does | Recommended
AhrefsBot | Ahrefs | Ahrefs' SEO backlink crawler. | Allow
SemrushBot | Semrush | Semrush's SEO analytics crawler. | Allow
MJ12bot | Majestic | Majestic's backlink crawler. | Allow

Social crawlers

Crawler | Operator | What it does | Recommended
Twitterbot | X (Twitter) | Generates link previews for X/Twitter. | Allow
facebookexternalhit | Meta | Generates link previews for Facebook. | Allow

Other / data crawlers

Crawler | Operator | What it does | Recommended
Amazonbot | Amazon | Amazon's crawler (used by Alexa and AI products). | Block
cohere-ai | Cohere | Cohere's crawler for AI products. | Block
Diffbot | Diffbot | Diffbot's structured-data crawler. | Block
ImagesiftBot | ImageSift | ImageSift's image crawler. | Block

How to allow or block any crawler

Every crawler is identified by a User-agent token. Add a group naming the crawler, then Disallow: / to block it everywhere or an empty Disallow: to allow it. A crawler that matches a named group follows only that group and ignores the User-agent: * rules; crawlers not named anywhere fall back to the User-agent: * group.
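As a rough illustration of the group structure described above, the following Python sketch assembles a robots.txt file from a list of tokens to block. The function names `build_group` and `build_robots_txt` are hypothetical helpers invented for this example, not part of any tool mentioned here, and the output covers only the simple block-everything or allow-everything cases.

```python
def build_group(user_agents, block=True):
    """Return one robots.txt group for the given User-agent tokens."""
    if not user_agents:
        raise ValueError("a group needs at least one User-agent token")
    lines = [f"User-agent: {token}" for token in user_agents]
    # "Disallow: /" blocks the whole site; an empty "Disallow:" allows everything.
    lines.append("Disallow: /" if block else "Disallow:")
    return "\n".join(lines)


def build_robots_txt(blocked, sitemap_url=None):
    """Block the listed crawlers and leave every other crawler allowed."""
    groups = []
    if blocked:
        groups.append(build_group(blocked, block=True))
    # Crawlers not named above fall back to this wildcard group.
    groups.append(build_group(["*"], block=False))
    text = "\n\n".join(groups) + "\n"
    if sitemap_url:
        text += f"\nSitemap: {sitemap_url}\n"
    return text


if __name__ == "__main__":
    print(build_robots_txt(["GPTBot", "CCBot"], "https://example.com/sitemap.xml"))
```

Listing several tokens in one group keeps the file short, since they all share the same rules; a crawler that needs different rules would get its own group.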

Block AI training, allow search engines
# Block AI training crawlers, keep search engines
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
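
One way to sanity-check a file like the one above before deploying it is to run it through Python's standard-library `urllib.robotparser`. The sketch below is only an approximation: Python's parser is not the matcher any particular crawler uses, so treat its answers as a quick local check. The `check_access` helper and the sample URL are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Block AI training crawlers, keep search engines
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def check_access(robots_txt, agents, url):
    """Map each User-agent token to True (may fetch) or False (blocked)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ClaudeBot", "Googlebot", "Bingbot"]
    result = check_access(ROBOTS_TXT, agents, "https://example.com/blog/post")
    for agent, allowed in result.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

The four named AI crawlers match the first group and are refused, while Googlebot and Bingbot are not named anywhere and so fall through to the wildcard group.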

robots.txt is a request, not a firewall

Reputable crawlers obey it; some (like Bytespider) have a patchy record. Confirm a crawler is actually blocked with the Analyzer, and verify a bot's identity by reverse DNS before trusting its user-agent.

Frequently asked questions

What is a web crawler?

A web crawler (or bot/spider) is an automated program that fetches web pages. Search engines crawl to build their index, AI companies crawl to train models or answer questions, and SEO tools crawl to map links. You control what each crawler can access using robots.txt.

How do I block a specific crawler?

Add a group to your robots.txt naming the crawler's User-agent token, followed by Disallow: / to block your whole site. For example, User-agent: AhrefsBot then Disallow: /. Use an empty Disallow: to allow it.

Will blocking crawlers hurt my SEO?

Only if you block search-engine crawlers like Googlebot or Bingbot. Blocking AI crawlers (GPTBot, ClaudeBot) or SEO tools (AhrefsBot, SemrushBot) has no effect on your Google or Bing rankings.

How do I know a crawler is really who it claims to be?

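User-agent strings can be spoofed. For major crawlers, run a reverse DNS lookup on the request IP and check that the hostname belongs to the operator's domain (e.g. googlebot.com), then run a forward DNS lookup on that hostname and confirm it returns the same IP. Alternatively, match the IP against the operator's published IP ranges.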
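
The reverse-then-forward DNS check can be sketched in Python with the standard `socket` module. This is a minimal sketch under stated assumptions: `verify_crawler_ip` is a hypothetical helper, the list of accepted domains must come from each operator's own documentation, and `socket.gethostbyname_ex` resolves IPv4 only, so IPv6 clients would need a different forward lookup. The lookup functions are passed in as parameters so the logic can be exercised without network access.

```python
import socket


def verify_crawler_ip(ip, allowed_domains,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to an allowed domain and that
    hostname forward-resolves back to the same ip."""
    try:
        # Reverse lookup: IP -> hostname (first element of the result tuple).
        hostname = reverse_lookup(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure

    # Require a dot before the domain so "evilgooglebot.com" does not match.
    if not any(hostname.endswith("." + d.lower()) for d in allowed_domains):
        return False

    try:
        # Forward lookup: hostname -> list of IPv4 addresses (third element).
        forward_ips = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The claim holds only if the hostname maps back to the original IP.
    return ip in forward_ips


if __name__ == "__main__":
    # The address below is from a documentation range; substitute a real request IP.
    print(verify_crawler_ip("192.0.2.10", ["googlebot.com"]))
```

Both lookups are needed because anyone who controls the reverse DNS for their own IP range can make it return an arbitrary hostname; only the forward lookup is answered by the operator's own DNS.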



Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.