How to Read robots.txt

A robots.txt file can look cryptic, but it follows a predictable structure. Once you can spot the groups and apply the precedence rule, you can read any file — yours or a competitor's — in under a minute.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309

A worked example

We'll read this top to bottom
User-agent: *
Disallow: /search
Disallow: /*?
Allow: /search/about

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

There are two groups and one sitemap. The first group applies to all crawlers; the second applies only to GPTBot. Let's decode each.

Step 1 — Find the groups

Scan for User-agent lines. A User-agent line, or a run of consecutive User-agent lines, starts a group. Every rule until the next User-agent line (ignoring blank lines and comments) belongs to that group.

  • Group 1 — User-agent: * → the default rules for any crawler without its own group.
  • Group 2 — User-agent: GPTBot → rules just for OpenAI's GPTBot.
  • Sitemap is file-level — it applies regardless of group.
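
To make Step 1 concrete, here is a minimal Python sketch of the grouping pass. It is not how any particular crawler is implemented: it assumes simple `Field: value` lines, recognizes only User-agent, Allow, Disallow and Sitemap, and the function name `parse_robots` and its return shape are our own illustration.

```python
"""Hypothetical sketch: split a robots.txt file into groups and sitemaps."""


def parse_robots(text):
    """Return (groups, sitemaps).

    Each group is a dict {"agents": [...], "rules": [(kind, path), ...]}.
    Only User-agent, Allow, Disallow and Sitemap are recognized; other
    fields are skipped. Illustrative only, not a spec-complete parser.
    """
    groups = []
    sitemaps = []
    current = None          # the group that rules are currently added to
    in_agent_run = False    # True while reading consecutive User-agent lines

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or ":" not in line:
            continue                         # blank lines do not end a group
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()

        if field == "user-agent":
            if not in_agent_run:
                # A new run of User-agent lines starts a new group.
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value)
            in_agent_run = True
            continue

        in_agent_run = False                 # any other field ends the run
        if field in ("allow", "disallow"):
            if current is not None:          # rules before any User-agent are ignored
                current["rules"].append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)           # file-level: belongs to no group

    return groups, sitemaps


if __name__ == "__main__":
    sample = """User-agent: *
Disallow: /search
Disallow: /*?
Allow: /search/about

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""
    groups, sitemaps = parse_robots(sample)
    for g in groups:
        print(g["agents"], g["rules"])
    print("Sitemaps:", sitemaps)
```

Blank lines and comment-only lines are skipped rather than treated as boundaries, which mirrors the reading rule above: only a User-agent line that follows at least one other field opens a new group.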

Step 2 — Pick the group for your crawler

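A crawler follows only the most specific group that matches its name. (If the same crawler is named in several groups, those groups are merged into one.) GPTBot uses group 2. Googlebot has no named group, so it falls back to the wildcard group 1. This is the step people skip, and it's why they misread files.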

Named groups don't inherit the wildcard

Because GPTBot has its own group, it ignores group 1 entirely. Its only rule is Disallow: / — it's blocked from everything.
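
Step 2 can be sketched the same way. The hypothetical `select_group` below works on hand-written group dicts that mirror the example file. It matches crawler names exactly and case-insensitively, falls back to `*` only when no named group exists, and merges multiple groups for the same name. Real crawlers may apply more elaborate product-token matching, so treat this as a simplification.

```python
"""Hypothetical sketch: choose the rules a crawler obeys (Step 2)."""

# The worked example, written out by hand as agent/rule groups.
EXAMPLE_GROUPS = [
    {"agents": ["*"],
     "rules": [("disallow", "/search"), ("disallow", "/*?"), ("allow", "/search/about")]},
    {"agents": ["GPTBot"],
     "rules": [("disallow", "/")]},
]


def select_group(groups, crawler):
    """Return the rule list for `crawler`, or None if no group applies.

    Simplification: names match exactly, ignoring case.
    """
    name = crawler.lower()
    named = [g for g in groups if name in (a.lower() for a in g["agents"])]
    # Fall back to the wildcard group only when no named group exists.
    chosen = named or [g for g in groups if "*" in g["agents"]]
    if not chosen:
        return None  # no applicable group: nothing is restricted
    # Several groups for the same crawler are merged; the wildcard group
    # is never mixed into a named group.
    rules = []
    for g in chosen:
        rules.extend(g["rules"])
    return rules


if __name__ == "__main__":
    for bot in ("GPTBot", "Googlebot"):
        print(bot, select_group(EXAMPLE_GROUPS, bot))
```

GPTBot's result contains only its own Disallow: /; none of the wildcard rules carry over, which is exactly the non-inheritance described above.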

Step 3 — Apply the rules to a URL

Within the chosen group, find every Allow/Disallow that matches the URL, then keep the one with the longest path (Allow wins ties). For Googlebot reading group 1:

  • /search → matches Disallow: /search → blocked.
  • /search/about → matches both Disallow: /search and Allow: /search/about; the Allow is longer → allowed.
  • /products?color=red → matches Disallow: /*? → blocked (this rule matches any URL that contains a query string).
  • /about → matches nothing → allowed by default.
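
Step 3's longest-match rule is also short enough to sketch. This is an illustration under stated assumptions, not a reference matcher: `is_allowed` and `pattern_to_regex` are hypothetical names, `*` and a trailing `$` are the only special characters handled, and URLs are compared as given, with no percent-encoding normalization.

```python
"""Hypothetical sketch: resolve Allow/Disallow for one URL path (Step 3)."""
import re


def pattern_to_regex(path):
    """Translate a robots.txt path into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Longest matching rule path wins; Allow wins a tie; no match means allowed."""
    best_len, best_allow = -1, True
    for kind, path in rules:
        if not path:
            continue  # an empty Disallow (or Allow) matches nothing
        if pattern_to_regex(path).match(url_path):  # match() anchors at the start
            is_allow = kind == "allow"
            if len(path) > best_len or (len(path) == best_len and is_allow):
                best_len, best_allow = len(path), is_allow
    return best_allow


if __name__ == "__main__":
    group1 = [("disallow", "/search"), ("disallow", "/*?"), ("allow", "/search/about")]
    # Expected, per the walkthrough: blocked, allowed, blocked, allowed
    for url in ("/search", "/search/about", "/products?color=red", "/about"):
        print(url, "allowed" if is_allowed(group1, url) else "blocked")
```

The sketch measures specificity by the length of the rule's path as written, which follows the longest-path rule above; real parsers can differ on edge cases such as how wildcards count toward length.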

That longest-match logic is covered in depth in our guide How robots.txt works. To check your reading against a specific URL, use the URL Tester.

The shortcut

Reading by hand is a great skill, but you don't have to. The Explainer turns any robots.txt into plain-English sentences — per group and per AI crawler — so you can confirm your interpretation instantly.

Frequently asked questions
How do I know which rules apply to a specific crawler?

Find the group whose User-agent best matches the crawler's name; if none matches, use the User-agent: * group. A crawler never combines a named group with the wildcard group: once it has its own group, the * rules no longer apply to it.

What does an empty Disallow line mean?

Disallow: with no value matches nothing, so it blocks nothing; a group whose only rule is an empty Disallow allows everything. It's the opposite of Disallow: /, which blocks everything.

How are Allow and Disallow conflicts resolved?

The rule with the longest matching path wins. If a matching Allow and a matching Disallow have paths of equal length, the Allow (the less restrictive rule) wins.

Do comments affect how a file is read?

No. Crawlers ignore everything from # to the end of the line. Comments only help humans understand the file.

Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.