What Is robots.txt?
robots.txt is a plain-text file at the root of your domain that tells web crawlers which parts of your site they may request. Well-behaved search engines and AI bots fetch it before they crawl anything else, and it is one of the easiest files to get subtly wrong.
The short definition
A robots.txt file is a set of instructions for automated clients (crawlers, bots, spiders). It lives at one fixed location — the root of your domain — and uses a simple, line-based format defined by the Robots Exclusion Protocol (standardized in 2022 as RFC 9309).
User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://example.com/sitemap.xml
In plain English, that file says: “every crawler may access the whole site, except anything under /admin/, and the sitemap is over here.”
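To check how a crawler would read a file like this, you can feed it to Python's standard-library `urllib.robotparser`. This is a sketch: `ExampleBot` is a made-up user-agent. Python's parser also checks rules in the order they appear (first match wins) rather than applying RFC 9309's longest-match rule. For this simple file the answers are the same, but that is not true of every file.

```python
import urllib.robotparser

# The example file from above, verbatim.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://example.com/sitemap.xml
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())  # parse text directly, no network fetch

# "ExampleBot" is hypothetical; it has no group of its own, so the "*" group applies.
for url in (
    "https://example.com/",
    "https://example.com/blog/post",
    "https://example.com/admin/users",
):
    print(url, parser.can_fetch("ExampleBot", url))

# Sitemap lines are collected separately from the groups (Python 3.8+).
print(parser.site_maps())
```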
What robots.txt actually controls
robots.txt controls crawling — whether a well-behaved bot will request a URL. That's it. It is an access-management file, not a security or privacy mechanism.
- Tell search engines which directories to skip (e.g. internal search, faceted filters, admin areas).
- Point crawlers to your XML sitemap.
- Allow or block AI crawlers like GPTBot, ClaudeBot, and PerplexityBot.
- Reduce wasted crawling on low-value URLs (which helps conserve crawl budget on large sites).
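As a sketch, a file that covers the first three items above (skipped directories, a sitemap, and blocked AI crawlers) can be assembled in Python. The `build_robots_txt` helper and the example paths are hypothetical. The product tokens are the ones named above, but check each vendor's documentation for its current token before relying on it.

```python
from collections.abc import Sequence


def build_robots_txt(
    disallow_paths: Sequence[str],
    blocked_agents: Sequence[str],
    sitemap_url: str,
) -> str:
    """Assemble a robots.txt string (hypothetical helper, not a standard API)."""
    # Group 1: rules for every crawler that has no more specific group.
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallow_paths]
    lines.append("Allow: /")

    # Group 2: several User-agent lines sharing one rule that blocks the whole site.
    if blocked_agents:
        lines.append("")
        lines += [f"User-agent: {agent}" for agent in blocked_agents]
        lines.append("Disallow: /")

    # Sitemap lines are not tied to any group.
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(
    build_robots_txt(
        disallow_paths=["/search", "/admin/"],
        blocked_agents=["GPTBot", "ClaudeBot", "PerplexityBot"],
        sitemap_url="https://example.com/sitemap.xml",
    )
)
```

Listing several User-agent lines in one group has the same effect as giving each bot its own `Disallow: /` group, and it keeps the file shorter.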
What robots.txt does NOT do
robots.txt is not a security control
- It does not guarantee a page stays out of Google. A disallowed URL can still be indexed (without its content) if other pages link to it. To keep a page out of the index, use a noindex meta tag or X-Robots-Tag header, and leave the page crawlable so search engines can see that directive.
- It does not stop malicious bots. Disreputable scrapers simply ignore it.
- It does not remove already-indexed pages by itself.
This distinction trips up almost everyone. If your goal is to keep a page out of search results, see our robots.txt for SEO guide, which covers when to use robots.txt versus noindex.
The anatomy of a rule
robots.txt is organized into groups. Each group starts with one or more User-agent lines (which crawler the rules apply to) followed by Allow and Disallow rules (which paths).
User-agent: Googlebot
Disallow: /private/
User-agent: GPTBot
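Disallow: /
Here Googlebot may crawl everything except /private/, and GPTBot is blocked from the entire site. Any other crawler matches neither group, and because there is no User-agent: * group, it faces no restrictions. For a directive-by-directive reference, read robots.txt syntax. To see exactly how a crawler chooses which group applies, read how robots.txt works.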
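The group-selection logic can be sketched in a few lines of Python. This is a simplified, hypothetical implementation of our reading of RFC 9309, covering four rules:

- Product tokens match case-insensitively.
- Consecutive User-agent lines share a group.
- A crawler with no group of its own falls back to `User-agent: *`.
- Among matching rules, the longest path wins, with Allow winning a tie.

It deliberately ignores the `*` and `$` path wildcards. The names `parse_groups`, `rules_for`, and `is_allowed` are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (directive, path), e.g. ("disallow", "/private/")


@dataclass
class Group:
    user_agents: List[str] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)


def parse_groups(text: str) -> List[Group]:
    """Split robots.txt text into groups (simplified: no wildcard handling)."""
    groups: List[Group] = []
    current: Optional[Group] = None
    previous_was_agent = False
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            # Consecutive User-agent lines share one group; otherwise start a new one.
            if current is None or not previous_was_agent:
                current = Group()
                groups.append(current)
            current.user_agents.append(value.lower())
            previous_was_agent = True
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
            previous_was_agent = False
        # Other records (e.g. Sitemap) belong to no group and are skipped here.
    return groups


def rules_for(groups: List[Group], product_token: str) -> List[Rule]:
    """Pick the rules a crawler obeys: its own group(s), else the '*' group, else none."""
    token = product_token.lower()
    matched = [g for g in groups if token in g.user_agents]
    if not matched:
        matched = [g for g in groups if "*" in g.user_agents]
    # Several matching groups are merged into one rule set.
    return [rule for g in matched for rule in g.rules]


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest matching path wins; on a tie, Allow wins. No matching rule means allowed."""
    best_length, allowed = -1, True
    for directive, rule_path in rules:
        if rule_path and path.startswith(rule_path):  # an empty Disallow matches nothing
            if len(rule_path) > best_length or (
                len(rule_path) == best_length and directive == "allow"
            ):
                best_length, allowed = len(rule_path), directive == "allow"
    return allowed


ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
User-agent: GPTBot
Disallow: /
"""

groups = parse_groups(ROBOTS_TXT)
for bot, path in [
    ("Googlebot", "/private/report"),
    ("Googlebot", "/blog/"),
    ("GPTBot", "/blog/"),
    ("Bingbot", "/private/report"),
]:
    print(bot, path, is_allowed(rules_for(groups, bot), path))
```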
Common misconceptions
“Disallow hides my page from Google.”
It blocks crawling, not indexing. Use noindex to remove a page from results.
“I need robots.txt for security.”
It's public and advisory. Use authentication for anything sensitive.
“Blocking CSS/JS speeds up my site.”
Disallowing assets stops Google from rendering your pages correctly and can hurt rankings.
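One way to catch this mistake before deploying is to test a list of asset URLs against your rules. This sketch uses Python's `urllib.robotparser`. The `/assets/` paths and the `blocked_assets` helper are hypothetical. This parser uses first-match ordering rather than longest match, so treat the result as a quick check rather than a replica of Googlebot.

```python
import urllib.robotparser
from typing import List


def blocked_assets(robots_txt: str, asset_urls: List[str], user_agent: str = "Googlebot") -> List[str]:
    """Return the asset URLs that the given user-agent is not allowed to fetch."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules that accidentally block the JavaScript directory.
ROBOTS = """\
User-agent: *
Disallow: /assets/js/
Disallow: /admin/
"""
ASSETS = [
    "https://example.com/assets/css/site.css",
    "https://example.com/assets/js/app.js",
]

print(blocked_assets(ROBOTS, ASSETS))
```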
Does every website need a robots.txt file?
No. If you're happy for crawlers to access everything, you can omit it — a missing robots.txt is treated as “allow all.” But most sites benefit from one to declare a sitemap and control AI crawlers.
What happens if I don't have a robots.txt file?
Crawlers receive a 404 and assume they may crawl the entire site. Nothing breaks, but you lose the chance to point them to your sitemap or restrict low-value paths.
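The decision a crawler makes from the robots.txt response can be sketched as a small function. It follows our reading of RFC 9309 (section 2.3.1) and is a hypothetical helper, not any crawler's actual code. A 4xx such as 404 means no restrictions. A 5xx is handled very differently: it is treated as a full disallow, so a broken server is not the same as a missing file.

```python
from enum import Enum


class RobotsPolicy(Enum):
    PARSE_FILE = "obey the rules in the returned file"
    ALLOW_ALL = "no restrictions: crawl anything"
    DISALLOW_ALL = "assume everything is disallowed"


def policy_for_status(status: int) -> RobotsPolicy:
    """Map the HTTP status of a /robots.txt fetch to a crawl policy (hypothetical helper)."""
    if 200 <= status < 300:
        return RobotsPolicy.PARSE_FILE
    if 400 <= status < 500:
        # 404 and other 4xx: the file is "unavailable", so no rules apply.
        # (Google documents treating 429 like a server error instead.)
        return RobotsPolicy.ALLOW_ALL
    if 500 <= status < 600:
        # Server errors: the file is "unreachable", so crawlers assume complete disallow.
        return RobotsPolicy.DISALLOW_ALL
    # 3xx redirects should be followed (RFC 9309 suggests at least five hops) before deciding.
    raise ValueError(f"follow redirects or handle status {status} before deciding")


for status in (200, 404, 503):
    print(status, policy_for_status(status).value)
```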
Is robots.txt the same as a robots meta tag?
No. robots.txt controls crawling at the site level (in a file). The robots meta tag (and X-Robots-Tag header) controls indexing at the page level. To keep a page out of Google, use the meta tag, not robots.txt.
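For non-HTML files or server-wide rules, the header form is often easier than a meta tag. This sketch is a standard-library WSGI app. The `/thank-you` and `/internal-search` prefixes and the `headers_for` helper are hypothetical, and the equivalent in-page tag would be `<meta name="robots" content="noindex">`. Search engines only see the header when they fetch the page, so these URLs must not also be disallowed in robots.txt.

```python
# Hypothetical URL prefixes that should stay crawlable but out of search results.
NOINDEX_PREFIXES = ("/thank-you", "/internal-search")


def app(environ, start_response):
    """Tiny WSGI app that adds X-Robots-Tag: noindex to selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Page-level indexing control, sent as an HTTP header instead of a meta tag.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Demo</title><p>Hello</p>"]


def headers_for(path):
    """Call the app directly (no server) and return its response headers as a dict."""
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response)
    return captured


print(headers_for("/thank-you"))
print(headers_for("/pricing"))
```

To serve it locally, pass `app` to `wsgiref.simple_server.make_server`.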
Where is the robots.txt file located?
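Always at the root of the host: https://example.com/robots.txt. A robots.txt in a subdirectory is ignored. Each subdomain needs its own, and so does each protocol and port, because RFC 9309 scopes the file to the scheme, host, and port it was fetched from.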
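Because the location is fixed, every page URL maps to exactly one robots.txt URL. This sketch uses Python's `urllib.parse` to keep the scheme, host, and port and discard everything else. The `robots_url_for` name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme, host, and port)."""
    parts = urlsplit(page_url)
    host = parts.hostname or ""  # hostname is lowercased and stripped of any user:pass@
    if ":" in host:  # IPv6 literal: restore the brackets that urlsplit removes
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port else host
    # The page's path, query, and fragment are irrelevant: the file always sits at the root.
    return urlunsplit((parts.scheme, netloc, "/robots.txt", "", ""))


for url in (
    "https://example.com/blog/post?ref=home",
    "https://shop.example.com/cart",
    "http://example.com:8080/docs/",
):
    print(url, "->", robots_url_for(url))
```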
Robots.txt Studio Editorial · Technical SEO & crawling
We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.