robots.txt Syntax
robots.txt has a small vocabulary: a handful of directives, two wildcard characters, and a grouping rule. This page is a precise reference for each one, with the gotchas that cause real outages.
The directives
| Directive | Purpose |
|---|---|
| User-agent | Starts a group; names the crawler the following rules apply to (* = all). |
| Disallow | A path prefix the crawler must not request. Empty value = allow everything. |
| Allow | A path prefix that overrides a broader Disallow (used for exceptions). |
| Sitemap | Absolute URL of an XML sitemap. File-level, not tied to a group. |
| Crawl-delay | Seconds between requests. Honored by Bing/Yandex; ignored by Google. |
Groups and User-agent
Rules are organized into groups. A group is one or more User-agent lines followed by Allow/Disallow rules. Consecutive User-agent lines share the same rule block.
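The grouping rule can be sketched as a small parser. This is a minimal illustration, not any crawler's actual implementation: `parse_groups` is a hypothetical helper, and it assumes that a User-agent line arriving after at least one Allow/Disallow rule starts a new group.

```python
def parse_groups(text):
    """Return a list of (user_agents, rules) groups from robots.txt text."""
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and padding
        if ":" not in line:
            continue                             # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()                    # directive names are case-insensitive
        if field == "user-agent":
            if rules:                            # rules already seen: close the group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)                 # consecutive agents share one group
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


example = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /tmp/
"""
groups = parse_groups(example)
print(groups)
```

In this sketch GPTBot and CCBot end up in one group that shares the single `Disallow: /` rule, while `*` gets its own group.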
User-agent: GPTBot
User-agent: CCBot
Disallow: /
Path rules: Allow and Disallow
Paths are matched as prefixes from the start of the URL path. A trailing slash matters: /admin matches /admin, /admin/, and /administrator, while /admin/ matches only URLs inside that directory.
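The prefix behavior can be checked with a few lines of Python. This is a sketch under stated assumptions: `prefix_matches` and `is_allowed` are hypothetical names, wildcards are not handled here, and the Allow-versus-Disallow decision uses the longest-match rule described in RFC 9309 (longest matching path wins, Allow wins a tie), which goes slightly beyond what this page states.

```python
def prefix_matches(rule_path, url_path):
    """Plain, case-sensitive prefix test (no wildcard handling)."""
    return url_path.startswith(rule_path)


def is_allowed(rules, url_path):
    """rules: list of (directive, path) pairs; directive is 'allow' or 'disallow'."""
    best = None
    for directive, rule_path in rules:
        if not rule_path or not prefix_matches(rule_path, url_path):
            continue                      # an empty value matches nothing
        # Assumption: longest rule wins; on equal length, Allow wins.
        key = (len(rule_path), directive == "allow")
        if best is None or key > best:
            best = key
    return True if best is None else best[1]


rules = [("disallow", "/tmp/"), ("allow", "/tmp/public/")]
print(prefix_matches("/admin", "/administrator"))
print(prefix_matches("/admin/", "/administrator"))
print(is_allowed(rules, "/tmp/public/logo.png"))
```

A URL that matches no rule is treated as allowed, which is also why an empty Disallow value blocks nothing in this sketch.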
User-agent: *
Disallow: /tmp/
Disallow: /*.json$
Allow: /tmp/public/
Empty Disallow means allow-all
A Disallow line with an empty value blocks nothing, so the crawler may request every path.
Wildcards: * and $
- * matches any sequence of characters, including none. /private*/ matches /private/ and /private-data/.
- $ anchors to the end of the path. /*.pdf$ matches URLs that end in .pdf.
- These are the only two special characters in path patterns. Everything else is matched literally.
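One way to reason about the two wildcards is to translate a pattern into a regular expression. The sketch below is an illustration, not a reference implementation: `pattern_to_regex` and `matches` are hypothetical names, and it assumes that `$` is special only as the last character of the pattern.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]            # strip the end anchor before escaping
    # Escape every literal piece, then join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, url_path):
    """True if the pattern matches from the start of url_path."""
    return pattern_to_regex(pattern).match(url_path) is not None


print(matches("/private*/", "/private-data/"))
print(matches("/*.pdf$", "/docs/report.pdf"))
print(matches("/*.pdf$", "/docs/report.pdf?download=1"))
```

Because `re.match` only anchors at the start, an unanchored pattern keeps its prefix semantics; adding `$` is what forces the match to reach the end of the URL.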
Syntax mistakes that break files
Missing the colon
Disallow /admin (no colon) is not a valid directive and is ignored.
Relative sitemap URL
Sitemap: /sitemap.xml is invalid; the value must be a full absolute URL, including scheme and host.
Crawl-delay: 0
A value of 0 has no effect, and Google ignores Crawl-delay entirely. Use Search Console crawl settings instead.
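The three mistakes above are simple enough to detect mechanically. The following is a rough sketch of such a check, not the Validator's actual logic: `lint` is a hypothetical function, the messages are made up for illustration, and `example.com` is a placeholder host.

```python
from urllib.parse import urlparse

KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint(text):
    """Return (line_number, message) pairs for the three mistakes above."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()          # ignore comments
        if not line:
            continue
        if ":" not in line:
            # A known directive name with no colon is silently ignored by crawlers.
            if line.split()[0].lower() in KNOWN_DIRECTIVES:
                problems.append((number, "missing colon"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            parts = urlparse(value)
            if not (parts.scheme and parts.netloc):  # needs scheme and host
                problems.append((number, "sitemap URL is not absolute"))
        elif field == "crawl-delay" and value == "0":
            problems.append((number, "crawl-delay of 0 has no effect"))
    return problems


sample = """\
User-agent: *
Disallow /admin
Sitemap: /sitemap.xml
Crawl-delay: 0
Sitemap: https://example.com/sitemap.xml
"""
for number, message in lint(sample):
    print(number, message)
```

The check is deliberately narrow: it reports only the three cases listed in this section and says nothing about other problems a file may have.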
Catch all of these automatically with the Validator.
Is robots.txt case sensitive?
User-agent names are matched case-insensitively. Directive names are case-insensitive too. But URL paths in Allow/Disallow are case-sensitive: /Admin and /admin are different paths.
What does Disallow: / mean?
It blocks the entire site for the matching crawler — every path. Use it deliberately (for example, to block an AI crawler).
What does the * wildcard do in robots.txt?
It matches any run of characters, including none, so /*.pdf matches any URL whose path contains .pdf, not only URLs that end in it (add $ for that). Major crawlers support it; it was an extension to the original standard and is now described in RFC 9309.
What is the $ symbol in robots.txt?
It anchors a pattern to the end of the URL. /page$ matches /page exactly but not /page/sub or /page2.
Robots.txt Studio Editorial · Technical SEO & crawling
We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.
Comments and Sitemap
Anything after a # is a comment (whole-line or inline). The Sitemap directive takes a full absolute URL and can appear multiple times.
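Both rules are easy to express in code. The sketch below uses two hypothetical helpers, `strip_comment` and `sitemap_urls`, and placeholder `example.com` URLs; it treats every `#` as the start of a comment, which is a simplification.

```python
def strip_comment(line):
    """Drop everything from the first '#' onward, plus trailing whitespace."""
    return line.split("#", 1)[0].rstrip()


def sitemap_urls(text):
    """Collect the value of every Sitemap line, in file order."""
    urls = []
    for raw in text.splitlines():
        field, sep, value = strip_comment(raw).partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


robots = """\
# whole-line comment
User-agent: *   # inline comment
Disallow: /tmp/
Sitemap: https://example.com/sitemap-pages.xml
sitemap: https://example.com/sitemap-posts.xml
"""
print(sitemap_urls(robots))
```

Splitting on the first colon only matters here: the URL itself contains a colon after the scheme, so `partition` keeps the value intact.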
For where the Sitemap line should go and how to list several, see robots.txt sitemap.