How to Read robots.txt
A robots.txt file can look cryptic, but it follows a predictable structure. Once you can spot the groups and apply the precedence rule, you can read any file — yours or a competitor's — in under a minute.
A worked example
User-agent: *
Disallow: /search
Disallow: /*?
Allow: /search/about
User-agent: GPTBot
Disallow: /
Sitemap: https://example.com/sitemap.xml
There are two groups and one sitemap. The first group applies to all crawlers; the second applies only to GPTBot. Let's decode each.
Step 1 — Find the groups
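Scan for User-agent lines. Each one starts a group (consecutive User-agent lines with no rules between them share a single group). Every Allow or Disallow rule up to the next User-agent line belongs to that group; blank lines and comments are ignored.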
- Group 1 — User-agent: * → the default rules for any crawler without its own group.
- Group 2 — User-agent: GPTBot → rules just for OpenAI's GPTBot.
- Sitemap is file-level — it applies regardless of group.
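The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not a complete parser: the function name parse_groups and the dictionary layout are assumptions made for this example, and it handles only the User-agent, Allow, Disallow, and Sitemap fields.

```python
def parse_groups(text):
    """Split robots.txt text into groups of rules plus file-level sitemaps."""
    groups = []    # each group: {"agents": [...], "rules": [(directive, path), ...]}
    sitemaps = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows rules starts a new group.
            if current is None or current["rules"]:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value)
        elif field in ("allow", "disallow") and current is not None:
            current["rules"].append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)  # file-level: not attached to any group
    return groups, sitemaps


EXAMPLE = """\
User-agent: *
Disallow: /search
Disallow: /*?
Allow: /search/about

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

groups, sitemaps = parse_groups(EXAMPLE)
for number, group in enumerate(groups, start=1):
    print(number, group["agents"], group["rules"])
print(sitemaps)
```

Run against the worked example, this yields two groups and one sitemap, matching the manual reading above.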
Step 2 — Pick the group for your crawler
A crawler uses the single most specific matching group. GPTBot uses group 2. Googlebot has no named group, so it uses the wildcard group 1. This is the step people skip — and why they misread files.
Named groups don't inherit the wildcard: GPTBot follows only group 2 and ignores every rule in group 1.
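Group selection can be sketched as follows. The function select_group and the GROUPS data structure are hypothetical names for this example. The matching rule is also an assumption: a group matches when the crawler's name starts with the group's token (case-insensitive), and the longest matching token counts as the most specific. Real crawlers may match their names differently.

```python
def select_group(groups, crawler_name):
    """Return the single group a crawler follows, or None if no group applies."""
    name = crawler_name.lower()
    best, best_len = None, -1
    wildcard = None
    for group in groups:
        for agent in group["agents"]:
            token = agent.lower()
            if token == "*":
                if wildcard is None:
                    wildcard = group
            elif name.startswith(token) and len(token) > best_len:
                best, best_len = group, len(token)
    # A named match replaces the wildcard group entirely; rules are never merged.
    return best if best is not None else wildcard


# The two groups from the worked example, written out by hand.
GROUPS = [
    {"agents": ["*"], "rules": [("disallow", "/search"),
                                ("disallow", "/*?"),
                                ("allow", "/search/about")]},
    {"agents": ["GPTBot"], "rules": [("disallow", "/")]},
]

gptbot_group = select_group(GROUPS, "GPTBot")
googlebot_group = select_group(GROUPS, "Googlebot")
print(gptbot_group["agents"], googlebot_group["agents"])
```

The function returns exactly one group, which mirrors the point of this step: GPTBot gets only its own rules, and Googlebot falls back to the wildcard group.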
Step 3 — Apply the rules to a URL
Within the chosen group, find every Allow/Disallow that matches the URL, then keep the one with the longest path (Allow wins ties). For Googlebot reading group 1:
- /search → matches Disallow: /search → blocked.
- /search/about → matches both Disallow: /search and Allow: /search/about; the Allow is longer → allowed.
- /products?color=red → matches Disallow: /*? → blocked (query strings).
- /about → matches nothing → allowed by default.
That longest-match logic is covered in depth in how robots.txt works. To verify your reading of any specific URL, use the URL Tester.
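The four readings above can be checked with a short sketch of longest-match evaluation. The helper names pattern_to_regex and is_allowed are assumptions for this example. It treats * as "any run of characters" and a trailing $ as "end of URL", and it uses the length of the rule's path as written as the measure of specificity.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex matched from the start of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Apply the longest matching rule; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty value places no restriction
        if pattern_to_regex(pattern).match(url_path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Group 1 from the worked example.
RULES = [("disallow", "/search"), ("disallow", "/*?"), ("allow", "/search/about")]

for path in ["/search", "/search/about", "/products?color=red", "/about"]:
    print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

Each printed verdict should agree with the manual walkthrough: blocked, allowed, blocked, allowed.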
The shortcut
Reading by hand is a great skill, but you don't have to. The Explainer turns any robots.txt into plain-English sentences — per group and per AI crawler — so you can confirm your interpretation instantly.
How do I know which rules apply to a specific crawler?
Find the group whose User-agent best matches the crawler's name; if none match, use the User-agent: * group. A crawler only ever follows one group — it does not combine a named group with the wildcard group.
What does an empty Disallow line mean?
Disallow: with no value places no restriction — it effectively allows everything. It's the opposite of Disallow: /, which blocks everything.
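The contrast is easy to confirm with Python's standard-library parser. In this sketch the wrapper function can_fetch, the agent name ExampleBot, and the example.com URL are placeholders. Note that urllib.robotparser applies the first matching rule rather than the longest one, so it is used here only to compare the two Disallow forms.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_lines, url, agent="ExampleBot"):
    """Parse robots.txt lines and report whether the agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


open_site = ["User-agent: *", "Disallow:"]      # empty value: nothing is blocked
closed_site = ["User-agent: *", "Disallow: /"]  # "/" matches every path

print(can_fetch(open_site, "https://example.com/any/page"))
print(can_fetch(closed_site, "https://example.com/any/page"))
```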
How are Allow and Disallow conflicts resolved?
The rule with the longest matching path wins. If a matching Allow and a matching Disallow have paths of the same length, the Allow (the least restrictive rule) wins.
Do comments affect how a file is read?
No. Crawlers ignore everything from a # to the end of that line. Comments only help humans understand the file.
Robots.txt Studio Editorial · Technical SEO & crawling
We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.