robots.txt Format

“Format” is about the shape of the file rather than the meaning of each rule: how groups are structured, what encoding to use, and what a clean, valid robots.txt looks like end to end.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309

The overall shape

A robots.txt file is a sequence of groups plus file-level lines that belong to no group (Sitemap directives and comments). Each group starts with one or more User-agent lines, followed by that group's Allow and Disallow rules, one directive per line. Blank lines separate groups for readability but aren't required.

A well-formatted file
# 1. file-level comment
User-agent: *          # start of the default group
Allow: /
Disallow: /admin/

User-agent: GPTBot     # a crawler-specific group
Disallow: /

Sitemap: https://example.com/sitemap.xml
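
To make the group structure concrete, here is a minimal Python sketch that splits a file into groups and file-level Sitemap lines. The `Group` class and `parse_robots` function are hypothetical names introduced for illustration, and the sketch deliberately ignores wildcards, unknown fields, and other details of RFC 9309.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    user_agents: list = field(default_factory=list)  # tokens such as "*" or "GPTBot"
    rules: list = field(default_factory=list)        # (field, value) pairs


def parse_robots(text: str):
    """Split robots.txt text into (groups, sitemap URLs)."""
    groups = []
    sitemaps = []
    current: Optional[Group] = None

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue  # blank line, comment-only line, or not a directive
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()  # field names are case-insensitive

        if name == "sitemap":
            sitemaps.append(value)  # file-level, never part of a group
        elif name == "user-agent":
            # A User-agent line that follows rules starts a new group;
            # consecutive User-agent lines share one group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.user_agents.append(value)
        elif name in ("allow", "disallow") and current is not None:
            current.rules.append((name, value))

    return groups, sitemaps


SAMPLE = """\
# 1. file-level comment
User-agent: *          # start of the default group
Allow: /
Disallow: /admin/

User-agent: GPTBot     # a crawler-specific group
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

groups, sitemaps = parse_robots(SAMPLE)
for group in groups:
    print(group.user_agents, group.rules)
print(sitemaps)
```

Run against the sample file above, the sketch yields two groups (one for `*`, one for `GPTBot`) and a single sitemap URL that is attached to neither.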

Formatting rules that matter

  • One directive per line, in the form Field: value.
  • Field names are case-insensitive (User-agent, user-agent, USER-AGENT all work).
  • Group order doesn't change behavior: each crawler picks its most specific group regardless of position.
  • Rule order within a group doesn't matter either for crawlers that follow RFC 9309; the longest matching path wins, and Allow wins a tie, whatever the line order.
  • Sitemap and comments can appear anywhere; Sitemap is always file-level.
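
The order-independence described in these rules can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a full matcher: `select_rules` and `is_allowed` are hypothetical helpers, group matching is reduced to a case-insensitive prefix test on the crawler name, and wildcards (`*`, `$`) in paths are not handled.

```python
def select_rules(groups, crawler):
    """Return the rules of the most specific group for a crawler.

    Simplification: a group matches when the crawler name starts with its
    token (case-insensitive); the longest such token wins, else "*".
    """
    name = crawler.lower()
    matches = [t for t in groups if t != "*" and name.startswith(t.lower())]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])


def is_allowed(rules, path):
    """Longest matching rule wins; on a tie, Allow beats Disallow."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value matches nothing
        if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
            best_len, allowed = len(prefix), kind == "allow"
    return allowed


groups = {
    "GPTBot": [("disallow", "/")],
    "*": [("disallow", "/admin/"), ("allow", "/")],
}
# Same rules, listed in the opposite order.
reordered = {
    "*": [("allow", "/"), ("disallow", "/admin/")],
    "GPTBot": [("disallow", "/")],
}

for table in (groups, reordered):
    print(is_allowed(select_rules(table, "ExampleBot"), "/admin/users"))
    print(is_allowed(select_rules(table, "ExampleBot"), "/blog/post"))
    print(is_allowed(select_rules(table, "GPTBot"), "/blog/post"))
```

Both tables produce the same three answers because the decision depends only on which group matches and on how long each matching path is, never on where a line sits in the file.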

Encoding and delivery

  • Encode as UTF-8. Avoid a leading byte-order mark (BOM).
  • Serve with Content-Type: text/plain and a 200 status.
  • Either LF or CRLF line endings are fine.
  • Keep it under 500 KiB; Google ignores any content past that limit, so rules beyond it have no effect.
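
These delivery checks are easy to automate once you have the response in hand. The sketch below assumes you have already fetched the file and pass in the status code, the Content-Type header value, and the raw body bytes; `check_delivery` is a hypothetical helper, not part of any library.

```python
import codecs

MAX_BYTES = 500 * 1024  # 500 KiB, the limit cited above for Google


def check_delivery(status, content_type, body):
    """Return a list of delivery problems; an empty list means none found."""
    problems = []
    if status != 200:
        problems.append(f"status is {status}, expected 200")
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"Content-Type is {media_type!r}, expected 'text/plain'")
    if body.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 byte-order mark")
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("body is not valid UTF-8")
    if len(body) > MAX_BYTES:
        problems.append(f"body is {len(body)} bytes, over the 500 KiB limit")
    return problems


good = check_delivery(200, "text/plain; charset=utf-8", b"User-agent: *\r\nAllow: /\r\n")
bad = check_delivery(200, "text/html", codecs.BOM_UTF8 + b"User-agent: *\nAllow: /\n")
print(good)
print(bad)
```

Line endings are deliberately not checked, since both LF and CRLF are acceptable.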

Beware the HTML error page

If /robots.txt returns your site's HTML 404 page with a 200 status, crawlers receive HTML instead of your rules, find no valid directives in it, and may crawl as if the file were empty. The Analyzer flags this as an invalid content type.
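
A rough way to spot this case yourself is to check whether the body starts like an HTML document and whether it contains any recognizable directive. The following is a heuristic sketch, not how the Analyzer works; `looks_like_html` and `has_directives` are hypothetical helpers, and the list of known fields is an assumption limited to the directives covered in this guide.

```python
KNOWN_FIELDS = ("user-agent", "allow", "disallow", "sitemap")


def looks_like_html(body):
    """Heuristic: does the response body start like an HTML document?"""
    # Skip a UTF-8 BOM and leading whitespace before inspecting the start.
    head = body[:512].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    return head.startswith((b"<!doctype html", b"<html", b"<head", b"<body"))


def has_directives(body):
    """True if at least one line looks like a known robots.txt directive."""
    text = body.decode("utf-8", errors="replace")
    for line in text.splitlines():
        name, sep, _ = line.split("#", 1)[0].partition(":")
        if sep and name.strip().lower() in KNOWN_FIELDS:
            return True
    return False


error_page = b"<!DOCTYPE html>\n<html><head><title>Not found</title></head></html>\n"
real_file = b"User-agent: *\nDisallow: /admin/\n"

print(looks_like_html(error_page), has_directives(error_page))
print(looks_like_html(real_file), has_directives(real_file))
```

A body that looks like HTML and has no directives is a strong hint that the server is answering /robots.txt with an error page.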

A safe default template

Allow everything + declare a sitemap
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

That's the safest starting point for most sites. Build a tailored one — with the right disallows and AI-crawler rules — in the Generator, then confirm it parses cleanly in the Validator.
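
If you want a quick local sanity check of the template before using any tool, Python's standard-library `urllib.robotparser` can parse it from a string. This is only a sketch: the crawler name `ExampleBot` and the test URL are made up, and the standard-library parser is not guaranteed to mirror any particular search engine's matching behavior on more complex files.

```python
from urllib.robotparser import RobotFileParser

TEMPLATE = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(TEMPLATE.splitlines())  # parse from memory, no network request

# "ExampleBot" is a hypothetical crawler name used for illustration.
allowed = parser.can_fetch("ExampleBot", "https://example.com/any/page")
sitemaps = parser.site_maps()  # available in Python 3.8 and later

print(allowed)
print(sitemaps)
```

For this template the check should report that the page is fetchable and list the one declared sitemap.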

Formatting mistakes

  • Multiple directives on one line

    User-agent: * Disallow: / is invalid; each directive needs its own line.

  • Indented directives

    Leading whitespace can confuse some parsers; keep directives flush-left.

  • Smart quotes / rich text

    Save the file as plain text from a text editor, not from a word processor that inserts curly quotes.
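
The three mistakes above are mechanical enough to catch with a small line-by-line check. Here is one possible sketch; `lint_line` is a hypothetical helper, the field list is an assumption limited to the directives used in this guide, and the Validator may well apply different or stricter checks.

```python
import re

# Two "Field:" tokens on one line suggest merged directives.
FIELD = re.compile(r"\b(user-agent|allow|disallow|sitemap)\s*:", re.IGNORECASE)
SMART_QUOTES = "\u2018\u2019\u201c\u201d"  # curly single and double quotes


def lint_line(line):
    """Return formatting warnings for a single robots.txt line."""
    warnings = []
    content = line.split("#", 1)[0]  # ignore trailing comments
    if not content.strip():
        return warnings  # blank or comment-only line
    if content[0] in " \t":
        warnings.append("indented directive")
    if len(FIELD.findall(content)) > 1:
        warnings.append("multiple directives on one line")
    if any(ch in SMART_QUOTES for ch in content):
        warnings.append("smart quotes")
    return warnings


for sample in (
    "User-agent: * Disallow: /",
    "  Disallow: /tmp/",
    "Disallow: \u201c/private/\u201d",
    "Sitemap: https://example.com/sitemap.xml",
):
    print(repr(sample), lint_line(sample))
```

Only the last sample line passes without warnings; each of the others trips exactly one check.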

Frequently asked questions

What should a robots.txt file look like?

A series of groups, each beginning with one or more User-agent lines followed by Allow/Disallow rules, plus an optional file-level Sitemap line. One directive per line, plain UTF-8 text. See the template above for a safe default.

Does the order of rules matter in robots.txt?

No. Crawlers that follow RFC 9309 select the most specific user-agent group and then apply the longest-matching path rule; neither step depends on the order in which lines appear in the file.

Can robots.txt have comments?

Yes. Anything after a # is a comment, on its own line or after a directive. Comments are ignored by crawlers but help maintainers.

Where does the Sitemap line go?

Anywhere in the file — it's file-level, not part of any group. Putting it at the top or bottom is purely a style choice.

Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.