The robots.txt File
The robots.txt file is a single UTF-8 text file served from /robots.txt at your domain root. This guide covers exactly what belongs in it, what a healthy file looks like, and how to create one whether you run WordPress, a static site, or a custom app.
Where the file must live
Crawlers only check one location per host: the root. The file must be reachable at https://yourdomain.com/robots.txt and returned with a 2xx status and a text/plain content type.
- https://example.com/robots.txt — correct.
- https://example.com/blog/robots.txt — ignored; subdirectory files don't count.
- https://blog.example.com/robots.txt — required separately; each subdomain has its own.
- Protocol and hostname variants (http vs https, www vs non-www) are treated separately, so each must answer for /robots.txt, usually by redirecting to the canonical host.
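The location rule above can be sketched in a few lines of Python: keep the scheme and host of any page URL and replace everything else with /robots.txt. The function name robots_txt_url is hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (illustrative helper)."""
    parts = urlsplit(page_url)
    # Keep scheme and host (netloc also carries any port); replace the path
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in (
    "https://example.com/blog/post?id=1",
    "https://blog.example.com/",
    "http://www.example.com/a/b",
):
    print(url, "->", robots_txt_url(url))
```

Each of the three sample URLs maps to a different robots.txt, which is why subdomains and protocol variants each need their own file or a redirect.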
What goes in a robots.txt file
A complete, healthy file usually contains four things:
- A default group (User-agent: *) with any site-wide disallows.
- Optional crawler-specific groups (e.g. blocking AI training bots).
- A Sitemap line pointing to your XML sitemap.
- Comments (lines starting with #) explaining non-obvious rules.
# Default policy
User-agent: *
Allow: /
Disallow: /cart/
Disallow: /checkout/
# Keep AI training crawlers out
User-agent: GPTBot
Disallow: /
Sitemap: https://example.com/sitemap.xml
Not sure what to disallow for your platform? The examples page has copy-paste files for blogs, WordPress, and ecommerce.
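To see how the sample file resolves for a given crawler and path, here is a minimal Python sketch of the longest-match rule described in RFC 9309: the most specific matching path wins, and an Allow wins a tie. It is a simplification, not a full parser: it assumes exact user-agent names, ignores the * and $ wildcards in paths, and the helper names parse_groups and is_allowed are hypothetical.

```python
ROBOTS_TXT = """\
# Default policy
User-agent: *
Allow: /
Disallow: /cart/
Disallow: /checkout/

# Keep AI training crawlers out
User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def parse_groups(text):
    """Map each lowercased user-agent name to its list of (directive, path) rules."""
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                current_agents = []
                seen_rule = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            seen_rule = True
            for agent in current_agents:
                groups[agent].append((key, value))
    return groups


def is_allowed(groups, user_agent, path):
    """Apply the longest-match rule; no matching rule means the path is allowed."""
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    verdict, matched = "allow", ""
    for directive, rule_path in rules:
        if not rule_path or not path.startswith(rule_path):
            continue
        longer = len(rule_path) > len(matched)
        tie_for_allow = len(rule_path) == len(matched) and directive == "allow"
        if longer or tie_for_allow:
            verdict, matched = directive, rule_path
    return verdict == "allow"


groups = parse_groups(ROBOTS_TXT)
for agent, path in [("Googlebot", "/products/1"), ("Googlebot", "/cart/items"), ("GPTBot", "/products/1")]:
    print(agent, path, "allowed" if is_allowed(groups, agent, path) else "blocked")
```

Note that some parsers, including Python's built-in urllib.robotparser, evaluate rules in file order rather than by longest match, so a leading Allow: / can mask later Disallow lines there. Test against the crawlers you actually care about.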
How to create and upload it
- Static / custom sites: create a file named robots.txt and place it in your public/web root. Next.js, Astro, and most frameworks serve files from a public directory or a generated route.
- WordPress: a virtual robots.txt exists by default; edit it via an SEO plugin (Yoast, Rank Math) or upload a physical file to the web root to override it.
- Shopify: robots.txt is generated automatically; customize it with the robots.txt.liquid template.
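For static or custom sites, the file can also be produced by a small build step instead of being typed by hand. The following is a minimal sketch; build_robots_txt and its argument layout are assumptions made for illustration, not part of any framework.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text.

    groups: list of (comment, user_agent, rules), where rules is a list of
    (directive, path) pairs such as ("Disallow", "/cart/").
    """
    lines = []
    for comment, user_agent, rules in groups:
        if comment:
            lines.append(f"# {comment}")
        lines.append(f"User-agent: {user_agent}")
        for directive, path in rules:
            lines.append(f"{directive}: {path}")
        lines.append("")  # blank line between groups
    for sitemap in sitemaps:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    [
        ("Default policy", "*", [("Disallow", "/cart/"), ("Disallow", "/checkout/")]),
        ("Keep AI training crawlers out", "GPTBot", [("Disallow", "/")]),
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(robots_txt)
```

The returned string can then be written to robots.txt in whatever directory your framework serves as the web root.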
Always validate before you ship
A single misplaced slash can block crawling of an entire site: Disallow: / shuts out everything, whereas Disallow: /cart/ blocks one directory. After creating the file, validate the syntax and test a few real URLs before deploying.
Run it through the robots.txt checker and confirm a key page resolves the way you expect in the URL Tester.
Common file mistakes
Wrong location
Placing robots.txt anywhere but the root means it's never read.
Wrong content type
Serving the file as text/html (for example, an HTML error page returned with a 200 status) can lead crawlers to ignore it or find no usable rules in it.
BOM / wrong encoding
Save as UTF-8 without a byte-order mark; some parsers mishandle a leading BOM.
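The serving mistakes above (status, content type, encoding) can be checked mechanically once you have the response in hand. This is a minimal sketch: check_response is a hypothetical helper that takes the status code, the Content-Type header value, and the raw body bytes from whatever HTTP client you use.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 byte-order mark


def check_response(status: int, content_type: str, body: bytes) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not 200 <= status < 300:
        problems.append(f"status {status} is not 2xx")
    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"content type is {media_type or 'missing'}, expected text/plain")
    if body.startswith(BOM):
        problems.append("body starts with a UTF-8 byte-order mark")
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("body is not valid UTF-8")
    return problems


print(check_response(200, "text/plain; charset=utf-8", b"User-agent: *\nDisallow:\n"))
print(check_response(200, "text/html", b"<html>Not found</html>"))
print(check_response(200, "text/plain", BOM + b"User-agent: *\n"))
```

It does not verify the file's location; that still depends on requesting /robots.txt at the root of each host.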
What should be in a robots.txt file?
At minimum, a User-agent: * group with any paths you want to disallow, and a Sitemap line. Add crawler-specific groups (for example, to block AI bots) as needed. If you want everything crawled, a file with just a sitemap is fine.
How do I create a robots.txt file?
Create a plain-text file named exactly robots.txt, add your rules, and place it at your web root so it's served from /robots.txt. The Generator can produce a valid file for you to download.
What file format and encoding should robots.txt use?
Plain text, UTF-8 encoded, served as text/plain. Line endings can be LF or CRLF. Avoid a leading byte-order mark (BOM).
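If you write the file from a script, the encoding and line-ending choices can be made explicit. The sketch below writes to a temporary directory that stands in for your web root; write_robots_txt is a hypothetical helper name.

```python
import os
import tempfile


def write_robots_txt(path, text):
    # encoding="utf-8" (not "utf-8-sig") writes no byte-order mark;
    # newline="\n" keeps LF line endings on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)


# The temporary directory is a stand-in for a real public/web root.
with tempfile.TemporaryDirectory() as public_dir:
    target = os.path.join(public_dir, "robots.txt")
    write_robots_txt(target, "User-agent: *\nDisallow: /cart/\n")
    with open(target, "rb") as handle:
        written = handle.read()

print(written)
```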
Can a site have more than one robots.txt file?
One per host. Each subdomain (and each protocol/host variant) is treated separately and needs its own robots.txt at its root.
Robots.txt Studio Editorial · Technical SEO & crawling
We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.