robots.txt Syntax

robots.txt has a small vocabulary: a handful of directives, two wildcard characters, and a grouping rule. This page is a precise reference for each one, with the gotchas that cause real outages.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309

The directives

Directive     Purpose
User-agent    Starts a group; names the crawler the following rules apply to (* = all).
Disallow      A path prefix the crawler must not request. Empty value = allow everything.
Allow         A path prefix that overrides a broader Disallow (used for exceptions).
Sitemap       Absolute URL of an XML sitemap. File-level, not tied to a group.
Crawl-delay   Seconds between requests. Honored by Bing/Yandex; ignored by Google.

Groups and User-agent

Rules are organized into groups. A group is one or more User-agent lines followed by Allow/Disallow rules. Consecutive User-agent lines share the same rule block.

# Both crawlers share one rule: block everything
User-agent: GPTBot
User-agent: CCBot
Disallow: /
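
The grouping rule above can be sketched in a few lines of Python. This is a minimal illustration, not a complete parser: the function name parse_groups and the tuple layout it returns are hypothetical choices made for this example, and it ignores everything except User-agent, Allow, and Disallow lines.

```python
def parse_groups(text):
    """Split robots.txt text into a list of (user_agents, rules) groups.

    Consecutive User-agent lines accumulate into one group; the first
    User-agent line that follows a rule starts a new group.
    """
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line: skipped in this sketch
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # directive names are case-insensitive
        if field == "user-agent":
            if rules:  # a rule was already seen, so this opens a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /tmp/
"""

for agents, rules in parse_groups(sample):
    print(agents, rules)
# ['GPTBot', 'CCBot'] [('disallow', '/')]
# ['*'] [('disallow', '/tmp/')]
```

The two named crawlers end up in a single group with one shared rule, while the * group that follows gets its own rule list.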

Path rules: Allow and Disallow

Paths are matched as prefixes from the start of the URL path. A trailing slash matters: /admin matches /admin, /admin/, and /administrator, while /admin/ matches only URLs inside that directory.

# Block /tmp/ and any URL ending in .json, but allow /tmp/public/
User-agent: *
Disallow: /tmp/
Disallow: /*.json$
Allow: /tmp/public/
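
To make the prefix behavior and the Allow exception concrete, here is a rough Python sketch. It assumes the longest-match precedence described in RFC 9309 (the most specific rule wins, and Allow wins a tie), and it handles literal prefixes only, so the wildcard rule from the example is left out. The function name is_allowed and the (kind, prefix) pairs are hypothetical, chosen for illustration.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (kind, prefix) pairs, where kind is "allow" or
    "disallow". Wildcards are not handled in this sketch.
    """
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty value or non-matching prefix: no effect
        is_allow = kind == "allow"
        # The longer prefix wins; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


rules = [("disallow", "/tmp/"), ("allow", "/tmp/public/")]
print(is_allowed("/tmp/cache.html", rules))       # False
print(is_allowed("/tmp/public/logo.png", rules))  # True
print(is_allowed("/about", rules))                # True

# The trailing slash changes what a prefix covers.
print(is_allowed("/administrator", [("disallow", "/admin")]))   # False
print(is_allowed("/administrator", [("disallow", "/admin/")]))  # True
```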

Empty Disallow means allow-all

Disallow: (with no value) places no restriction. Disallow: / blocks the whole site. The single missing slash is the difference between “everything” and “nothing.”

Wildcards: * and $

  • * matches any sequence of characters, including none. /private*/ matches /private/ and /private-data/.
  • $ anchors the pattern to the end of the URL, query string included. /*.pdf$ matches URLs that end in .pdf, so /report.pdf matches but /report.pdf?download=1 does not.
  • These are the only two special characters. Everything else is literal.
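
One way to reason about the two wildcards is to translate a pattern into a regular expression. The sketch below does that in Python. The helper names pattern_to_regex and matches are hypothetical, and it is an illustration of the rules listed above rather than any crawler's actual implementation.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape every literal piece, then join the pieces with "any characters".
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start only, which gives prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/private*/", "/private/"))           # True
print(matches("/private*/", "/private-data/"))      # True
print(matches("/*.pdf$", "/docs/report.pdf"))       # True
print(matches("/*.pdf$", "/docs/report.pdf?dl=1"))  # False
print(matches("/page$", "/page2"))                  # False
```

Because every other character goes through re.escape, characters such as . and ? stay literal, in line with the last bullet above.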

Comments and Sitemap

Anything after a # is a comment (whole-line or inline). The Sitemap directive takes a full absolute URL and can appear multiple times.

robots.txt
# Block faceted navigation
User-agent: *
Disallow: /*?color=  # query-parameter facets

Sitemap: https://example.com/sitemap.xml

For where the Sitemap line should go and how to list several, see robots.txt sitemap.
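
As a rough illustration of the comment and Sitemap rules in this section, the Python sketch below strips comments and collects Sitemap URLs from a file. The helper names strip_comment and sitemap_urls are hypothetical. It also assumes that a # always starts a comment, so a sitemap URL containing a fragment would be truncated.

```python
from urllib.parse import urlparse


def strip_comment(line):
    """Remove a trailing # comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def sitemap_urls(text):
    """Return every absolute http(s) URL named by a Sitemap line."""
    urls = []
    for raw in text.splitlines():
        field, sep, value = strip_comment(raw).partition(":")
        if not sep or field.strip().lower() != "sitemap":
            continue  # not a Sitemap directive
        value = value.strip()
        parts = urlparse(value)
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(value)  # relative values are skipped
    return urls


sample = """\
# Block faceted navigation
User-agent: *
Disallow: /*?color=  # query-parameter facets

Sitemap: https://example.com/sitemap.xml
"""

print(strip_comment("Disallow: /*?color=  # query-parameter facets"))
# Disallow: /*?color=
print(sitemap_urls(sample))
# ['https://example.com/sitemap.xml']
```

Sitemap lines are collected from the whole file regardless of which group they appear next to, because the directive is file-level.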

Syntax mistakes that break files

  • Missing the colon

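    Disallow /admin (no colon) is not a valid rule under RFC 9309, and strict parsers ignore the line. Some lenient parsers, Google's open-source one among them, accept whitespace in place of the colon, so the line may be honored by one crawler and skipped by another. Always include the colon.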

  • Relative sitemap URL

    Sitemap: /sitemap.xml is invalid — it must be a full absolute URL.

  • Crawl-delay: 0

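    A value of 0 has no effect, and Google ignores Crawl-delay entirely. Search Console's crawl-rate limiter was retired in January 2024; to slow Googlebot, Google's documented approach is to temporarily return 500, 503, or 429 responses.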

Catch all of these automatically with the Validator.
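
For a sense of what such checks involve, here is a minimal Python sketch that flags the three mistakes listed above. It is not the Validator's implementation: the function name lint, the KNOWN tuple, and the message strings are all hypothetical.

```python
from urllib.parse import urlparse

# Directive names covered by this page.
KNOWN = ("user-agent", "disallow", "allow", "sitemap", "crawl-delay")


def lint(text):
    """Return a list of (line_number, message) pairs for common mistakes."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        field, sep, value = line.partition(":")
        if not sep:
            # No colon at all: flag it if the line starts with a directive name.
            if line.split()[0].lower() in KNOWN:
                problems.append((number, "missing colon"))
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "sitemap":
            parts = urlparse(value)
            if not (parts.scheme and parts.netloc):
                problems.append((number, "sitemap URL is not absolute"))
        elif field == "crawl-delay" and value == "0":
            problems.append((number, "Crawl-delay: 0 has no effect"))
    return problems


sample = """\
User-agent: *
Disallow /admin
Sitemap: /sitemap.xml
Crawl-delay: 0
"""

for number, message in lint(sample):
    print(number, message)
# 2 missing colon
# 3 sitemap URL is not absolute
# 4 Crawl-delay: 0 has no effect
```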

Frequently asked questions
Is robots.txt case sensitive?

User-agent names are matched case-insensitively. Directive names are case-insensitive too. But URL paths in Allow/Disallow are case-sensitive: /Admin and /admin are different paths.

What does Disallow: / mean?

It blocks the entire site for the matching crawler — every path. Use it deliberately (for example, to block an AI crawler).

What does the * wildcard do in robots.txt?

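It matches any run of characters, including none. Because rules are prefix matches, /*.pdf matches any URL that contains .pdf anywhere after the leading slash, such as /files/report.pdf or /report.pdf.html; add $ to require that the URL end there. The wildcard began as an extension to the original 1994 convention, but RFC 9309 now specifies it and major crawlers support it.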

What is the $ symbol in robots.txt?

It anchors a pattern to the end of the URL. /page$ matches /page exactly but not /page/sub or /page2.


Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.