How to Read robots.txt

A robots.txt file can look cryptic, but it follows a predictable structure. Once you can spot the groups and apply the precedence rule, you can read any file — yours or a competitor's — in under a minute.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309

A worked example

We'll read this file top to bottom:

User-agent: *
Disallow: /search
Disallow: /*?
Allow: /search/about

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

There are two groups and one sitemap. The first group applies to all crawlers; the second applies only to GPTBot. Let's decode each.

Step 1 — Find the groups

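Scan for User-agent lines. Each one starts a group, except that consecutive User-agent lines with no rules between them share a single group. Every Allow and Disallow line up to the next User-agent line (ignoring blank lines and comments) belongs to that group.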

Group 1 — User-agent: * → the default rules for any crawler without its own group.
Group 2 — User-agent: GPTBot → rules just for OpenAI's GPTBot.
Sitemap is file-level — it applies regardless of group.
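
The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not a full RFC 9309 parser: the names `Group`, `parse_robots`, and `EXAMPLE` are hypothetical, and the sketch assumes one `key: value` directive per line.

```python
from dataclasses import dataclass, field

EXAMPLE = """\
User-agent: *
Disallow: /search
Disallow: /*?
Allow: /search/about

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

@dataclass
class Group:
    agents: list = field(default_factory=list)  # User-agent tokens
    rules: list = field(default_factory=list)   # (directive, path) pairs

def parse_robots(text):
    """Split a robots.txt body into groups plus file-level sitemaps."""
    groups, sitemaps, current = [], [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue                         # skip blank or malformed lines
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            # A new group starts unless we are still in a run of
            # consecutive User-agent lines (no rules seen yet).
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value)
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)           # file-level, not tied to a group
    return groups, sitemaps

groups, sitemaps = parse_robots(EXAMPLE)
for g in groups:
    print(g.agents, g.rules)
print(sitemaps)
```

Run against the worked example, this yields two groups (one for `*`, one for `GPTBot`) and one sitemap URL, matching the reading above.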

Step 2 — Pick the group for your crawler

A crawler uses the single most specific matching group. GPTBot uses group 2. Googlebot has no named group, so it uses the wildcard group 1. This is the step people skip — and why they misread files.

Named groups don't inherit the wildcard

Because GPTBot has its own group, it ignores group 1 entirely. Its only rule is Disallow: / — it's blocked from everything.
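
Group selection can be sketched as follows. The function name `select_group` and the `GROUPS` dictionary are hypothetical, and the matching rule used here (case-insensitive, longest token contained in the crawler's name wins) is a simplifying assumption; real crawlers may match their product token differently.

```python
# Groups from the worked example, keyed by User-agent token.
GROUPS = {
    "*": ["Disallow: /search", "Disallow: /*?", "Allow: /search/about"],
    "GPTBot": ["Disallow: /"],
}

def select_group(groups, crawler):
    """Return the rules of the one group a crawler should follow."""
    name = crawler.lower()
    # Named groups whose token appears in the crawler's name (assumption).
    named = [tok for tok in groups if tok != "*" and tok.lower() in name]
    if named:
        # Most specific match: the longest token. The wildcard group
        # is NOT merged in; the named group replaces it entirely.
        return groups[max(named, key=len)]
    return groups.get("*", [])  # fall back to the wildcard group, if any

print(select_group(GROUPS, "GPTBot"))     # only the GPTBot rules
print(select_group(GROUPS, "Googlebot"))  # falls back to the * group
```

The key design point is the early return: once a named group matches, the wildcard rules are never consulted.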

Step 3 — Apply the rules to a URL

Within the chosen group, find every Allow/Disallow that matches the URL, then keep the one with the longest path (Allow wins ties). For Googlebot reading group 1:

/search → matches Disallow: /search → blocked.
/search/about → matches both Disallow: /search and Allow: /search/about; the Allow is longer → allowed.
/products?color=red → matches Disallow: /*? → blocked (query strings).
/about → matches nothing → allowed by default.
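
The four readings above can be checked with a small sketch of the longest-match rule. The helper names `pattern_to_regex` and `is_allowed` are hypothetical, "longest" is taken to mean the length of the rule's path string, and the `$` end anchor is supported even though the example file does not use it.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")          # trailing $ pins the end of the URL
    body = pattern[:-1] if anchored else pattern
    # '*' matches any run of characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """Apply longest-match precedence; Allow wins ties."""
    best_len, allowed = -1, True              # no matching rule means allowed
    for directive, pattern in rules:
        if not pattern:
            continue                          # empty value restricts nothing
        if pattern_to_regex(pattern).match(url_path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and is_allow
            if longer or tie_for_allow:
                best_len, allowed = len(pattern), is_allow
    return allowed

# Group 1 from the worked example.
GROUP_1 = [("Disallow", "/search"), ("Disallow", "/*?"), ("Allow", "/search/about")]

for path in ["/search", "/search/about", "/products?color=red", "/about"]:
    print(path, "allowed" if is_allowed(GROUP_1, path) else "blocked")
```

Because `re.match` anchors at the start of the string, each pattern behaves as a prefix match, which is how plain paths such as `/search` are interpreted.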

That longest-match logic is covered in depth in how robots.txt works. To verify your reading of any specific URL, use the URL Tester.

The shortcut

Reading by hand is a great skill, but you don't have to. The Explainer turns any robots.txt into plain-English sentences — per group and per AI crawler — so you can confirm your interpretation instantly.

Frequently asked questions

How do I know which rules apply to a specific crawler?

Find the group whose User-agent best matches the crawler's name; if none match, use the User-agent: * group. A crawler only ever follows one group — it does not combine a named group with the wildcard group.

What does an empty Disallow line mean?

Disallow: with no value places no restriction — it effectively allows everything. It's the opposite of Disallow: /, which blocks everything.
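
One way to confirm the difference is with Python's standard-library `urllib.robotparser`. The wrapper `may_fetch`, the agent name `ExampleBot`, and the two sample files are hypothetical. Note that this module uses a simpler matching model than the longest-match rule described in this guide, so treat it as a quick check for this case only.

```python
from urllib.robotparser import RobotFileParser

def may_fetch(robots_lines, url, agent="ExampleBot"):
    """Parse robots.txt lines in memory and test one URL for one agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # no network access: lines are supplied directly
    return parser.can_fetch(agent, url)

open_file = ["User-agent: *", "Disallow:"]      # empty value: no restriction
closed_file = ["User-agent: *", "Disallow: /"]  # slash: everything blocked

print(may_fetch(open_file, "https://example.com/any/page"))
print(may_fetch(closed_file, "https://example.com/any/page"))
```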

How are Allow and Disallow conflicts resolved?

The rule with the longest matching path wins. If an Allow rule and a Disallow rule match with paths of equal length, the Allow (the least restrictive rule) wins.

Do comments affect how a file is read?

No. Crawlers ignore everything from a # to the end of that line. Comments only help humans understand the file.

Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.