robots.txt for SEO: Best Practices
robots.txt is a precision instrument in technical SEO — powerful for crawl control, dangerous when misused for indexing. This guide covers the practices that help, the traps that hurt, and a checklist to run before every deploy.
Crawling vs indexing — the core distinction
This is the most important concept in robots.txt SEO. robots.txt controls crawling (whether a bot fetches a URL). It does not control indexing (whether a URL appears in results).
Disallow does not equal noindex
| Goal | Use |
|---|---|
| Stop a page appearing in Google | noindex meta tag / X-Robots-Tag (allow crawling) |
| Stop crawlers fetching a section | robots.txt Disallow |
| Keep content out of AI training datasets | robots.txt Disallow for AI user-agents (honored only by compliant crawlers) |
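The split between the two mechanisms can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the robots.txt content, the `example.com` URLs, and the helper names `may_crawl` and `noindex_headers` are assumptions made up for the example. It uses the standard library's `urllib.robotparser`, which only ever answers the crawl question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks crawling of internal search only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Answer the crawl question only: may this agent fetch this URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

def noindex_headers() -> dict[str, str]:
    """Header for a page that stays crawlable but should not be indexed."""
    return {"X-Robots-Tag": "noindex"}

print(may_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/search/shoes"))  # False
print(may_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/pricing"))       # True
```

Nothing in the parser's answer says whether a URL will appear in results. That decision travels with the page itself, via the meta tag or the response header.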
Best practices that help SEO
- Disallow infinite/low-value URL spaces: internal search, faceted filters, session parameters.
- Always declare your sitemap with an absolute https URL.
- Keep crawler-specific groups (e.g. AI bots) explicit so policy is intentional, not inherited.
- Re-audit after migrations and CMS changes — replatforming often regenerates robots.txt.
- Prefer canonical tags and parameter handling for duplicate content; use robots.txt for crawl-budget control.
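As a rough sketch of the practices above, the file below blocks a few low-value URL spaces, gives an AI crawler its own explicit group, and declares a sitemap. The paths, the `example.com` host, and the helper `sitemap_problems` are hypothetical. The check assumes Python 3.8 or later, where `RobotFileParser.site_maps()` is available.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical production file following the practices listed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /filter/
Disallow: /session/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

def sitemap_problems(robots_txt: str) -> list[str]:
    """Return problems with the Sitemap lines; an empty list means none found."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    sitemaps = parser.site_maps() or []  # site_maps() returns None when absent
    if not sitemaps:
        return ["no Sitemap line declared"]
    problems = []
    for url in sitemaps:
        parts = urlparse(url)
        # Absolute means scheme plus host; the scheme should be https.
        if parts.scheme != "https" or not parts.netloc:
            problems.append(f"not an absolute https URL: {url}")
    return problems

print(sitemap_problems(ROBOTS_TXT))  # []
```

This only inspects the declaration. Confirming that the sitemap URL actually resolves still needs a real HTTP request.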
Anti-patterns that hurt SEO
Blocking CSS and JavaScript
Google renders pages to evaluate them. Disallowing assets produces broken renders and can lower rankings.
Using Disallow to remove pages
Disallow prevents crawling, not indexing, and it stops Google from seeing any noindex on the page. To remove a page, allow crawling and add noindex instead.
Blocking pages that have inbound links
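Google can discover a blocked URL through the links pointing at it, so the page can still appear as a URL-only result. Allow crawling and use noindex instead.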
Leaving Disallow: / in production
The classic staging leak: a blanket block copied to production stops all crawling and can wipe out a site's search visibility.
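One of these anti-patterns, a noindex hidden behind a Disallow, is easy to detect mechanically. The sketch below assumes you already have a list of URLs that carry noindex (for example from a crawl export). The list, the robots.txt content, and the helper `hidden_noindex` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a retired campaign section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-campaign/
"""

# Hypothetical URLs whose pages carry a noindex directive.
NOINDEX_URLS = [
    "https://example.com/old-campaign/landing",
    "https://example.com/thank-you",
]

def hidden_noindex(robots_txt, noindex_urls, user_agent="Googlebot"):
    """Return noindex URLs the crawler may not fetch, so the tag is never seen."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]

print(hidden_noindex(ROBOTS_TXT, NOINDEX_URLS))
# ['https://example.com/old-campaign/landing']
```

Any URL this returns needs its Disallow lifted before the noindex can take effect.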
AI crawlers and SEO are separate decisions
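Blocking GPTBot or Google-Extended has no effect on your Google Search ranking. GPTBot is OpenAI's crawler and Google-Extended is a separate Google control token for AI training; neither is Googlebot. You can opt out of AI training by crawlers that honor robots.txt while remaining fully visible in search.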
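The separation is visible in how groups are matched. The following sketch uses a hypothetical file that blocks two AI tokens in one group and leaves everyone else under the general group. It is evaluated with Python's `urllib.robotparser`, which is a simplified matcher and not Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of AI training, stay open to search crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://example.com/blog/post"
for agent in ("GPTBot", "Google-Extended", "Googlebot"):
    print(agent, parser.can_fetch(agent, URL))
# GPTBot False
# Google-Extended False
# Googlebot True
```

Googlebot falls through to the `*` group because neither AI token matches its name, so the AI opt-out leaves search crawling untouched.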
See why robots.txt is important for the strategic view, and manage it in the AI Crawler Manager.
Pre-deploy checklist
- No accidental Disallow: / in the production file.
- CSS/JS and image directories are crawlable.
- Sitemap line present, absolute, https, and resolves.
- AI crawler policy is intentional (explicit allow or block).
- Validated for syntax, then spot-tested on 2–3 real URLs.
Run the live file through the Analyzer for a health score and findings, and verify key URLs in the URL Tester.
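Part of the checklist can be scripted as a rough pre-deploy gate. The sketch below is an assumption-laden illustration: the function `predeploy_findings`, the sample files, and the asset paths are hypothetical. Python's `urllib.robotparser` also applies the first matching rule and ignores wildcards, whereas Google applies the most specific rule, so treat the result as a smoke test rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

def predeploy_findings(robots_txt, must_be_crawlable, user_agent="Googlebot"):
    """Return human-readable problems; an empty list means the checks passed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    findings = []
    # A blocked homepage usually means a staging "Disallow: /" leaked.
    if not parser.can_fetch(user_agent, "/"):
        findings.append("homepage is blocked: check for a leftover 'Disallow: /'")
    # Spot-test assets and a few real pages.
    for url in must_be_crawlable:
        if not parser.can_fetch(user_agent, url):
            findings.append(f"blocked: {url}")
    return findings

# Hypothetical files and paths for illustration.
STAGING_LEAK = "User-agent: *\nDisallow: /\n"
PRODUCTION = "User-agent: *\nDisallow: /search/\n"
URLS = ["/assets/site.css", "/assets/site.js", "/pricing"]

print(predeploy_findings(PRODUCTION, URLS))         # []
print(len(predeploy_findings(STAGING_LEAK, URLS)))  # 4
```

Failing the build whenever the list is non-empty catches the staging leak before it reaches production.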
Should I block a page with robots.txt or noindex?
If you want it out of Google, use noindex and let Google crawl the page so it can see the tag. Use robots.txt Disallow only when you want to prevent crawling (e.g. crawl-budget control or AI bots), not to control indexing.
Does Disallow remove a page from Google?
No. It stops crawling, but the URL can still be indexed without content if it's linked elsewhere. To remove it, allow crawling and add noindex, or use a removal request.
Should I disallow CSS and JavaScript?
No. Google needs to render pages with their CSS and JS. Blocking assets leads to incomplete rendering and can harm rankings.
Is robots.txt good or bad for SEO?
It's good when used for crawl control and neutral-to-helpful for rankings. It becomes bad only when misused — for example, blocking assets or using Disallow where noindex is needed.
Robots.txt Studio Editorial · Technical SEO & crawling
We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.