robots.txt for SEO: Best Practices

robots.txt is a precision instrument in technical SEO — powerful for crawl control, dangerous when misused for indexing. This guide covers the practices that help, the traps that hurt, and a checklist to run before every deploy.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309

Crawling vs indexing — the core distinction

This is the most important concept in robots.txt SEO. robots.txt controls crawling (whether a bot fetches a URL). It does not control indexing (whether a URL appears in results).

Disallow does not equal noindex

A disallowed URL can still be indexed and shown in results, usually as a bare URL with no description, if other pages link to it. Worse, because Google can't crawl the page, it can't see a noindex tag on it.
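
Because a disallowed page's noindex is never read, it is worth checking for pages that carry both. Here is a minimal sketch, assuming you already have a list of noindexed URLs (for example from a site crawl). It uses Python's standard urllib.robotparser, which predates RFC 9309 and only does simple prefix matching. The function name noindex_hidden_by_disallow and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def noindex_hidden_by_disallow(robots_txt, noindex_urls, agent="Googlebot"):
    """Return noindexed URLs that robots.txt stops `agent` from fetching.

    If a page carries noindex but is disallowed, the crawler never
    downloads it, so the noindex is never seen.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(agent, url)]


robots_txt = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical: URLs found (e.g. by a site crawl) to carry a noindex tag or header.
noindex_pages = [
    "https://example.com/private/old-offer",
    "https://example.com/thank-you",
]

print(noindex_hidden_by_disallow(robots_txt, noindex_pages))
# ['https://example.com/private/old-offer']
```

Any URL it returns needs its Disallow rule removed or narrowed so the noindex can take effect.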
Goal | Use
Stop a page appearing in Google | noindex meta tag or X-Robots-Tag header (crawling must be allowed)
Stop crawlers fetching a section | robots.txt Disallow
Keep content out of AI datasets | robots.txt Disallow for AI user-agents
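
For the first goal, noindex can be sent as an HTTP header instead of a meta tag, which also works for PDFs and other non-HTML files. Below is a minimal sketch, assuming a Python WSGI application. The names noindex_middleware, NOINDEX_PREFIXES and headers_for, and the paths, are hypothetical. robots.txt must leave these paths crawlable, or Google never sees the header.

```python
# Hypothetical: path prefixes that should stay out of search results.
NOINDEX_PREFIXES = ("/thank-you", "/cart")


def noindex_middleware(app, prefixes=NOINDEX_PREFIXES):
    """Wrap a WSGI app so responses for matching paths send X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "/")

        def start_with_header(status, headers, exc_info=None):
            if path.startswith(prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def site(environ, start_response):
    # Stand-in application: every page returns a tiny HTML body.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Thanks for your order</p>"]


def headers_for(path):
    """Run the wrapped app for one path and return its response headers."""
    captured = {}

    def fake_start_response(status, headers, exc_info=None):
        captured.update(headers)

    noindex_middleware(site)({"PATH_INFO": path}, fake_start_response)
    return captured


print(headers_for("/thank-you").get("X-Robots-Tag"))       # noindex
print(headers_for("/products/shoes").get("X-Robots-Tag"))  # None
```

Doing this in middleware keeps the indexing decision in one place instead of scattering meta tags across templates.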

Best practices that help SEO

  • Disallow infinite/low-value URL spaces: internal search, faceted filters, session parameters.
  • Always declare your sitemap with an absolute https URL.
  • Keep crawler-specific groups (e.g. AI bots) explicit so policy is intentional, not inherited.
  • Re-audit after migrations and CMS changes — replatforming often regenerates robots.txt.
  • Prefer canonical tags and consistent parameter handling for duplicate content, and reserve robots.txt for crawl-budget control: Google can't read the canonical tag on a page it isn't allowed to crawl.
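
The practices above translate into a short file. Here is a minimal sketch that uses Python's standard urllib.robotparser to confirm three things: the low-value paths are blocked, real content is not, and the sitemap is declared. The /search and /filter/ paths and the sitemap URL are hypothetical. urllib.robotparser does not implement the * and $ wildcards from RFC 9309 and applies rules in file order rather than by longest match, so this sketch sticks to plain path prefixes. Session-parameter rules such as Disallow: /*?sessionid= need a wildcard-aware tester.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /filter/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Expected crawlability for Googlebot (hypothetical URLs).
checks = {
    "https://example.com/search?q=boots": False,         # internal search
    "https://example.com/filter/colour-red/": False,     # faceted filter path
    "https://example.com/products/boots": True,          # real content
}
for url, expected in checks.items():
    assert parser.can_fetch("Googlebot", url) == expected, url

# The Sitemap line is independent of any group and must be absolute https.
sitemaps = parser.site_maps()
assert sitemaps and all(u.startswith("https://") for u in sitemaps)
print(sitemaps)  # ['https://example.com/sitemap.xml']
```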

Anti-patterns that hurt SEO

  • Blocking CSS and JavaScript

    Google renders pages to evaluate them. Disallowing assets produces broken renders and can lower rankings.

  • Using Disallow to remove pages

    It prevents crawling, not indexing — and blocks Google from seeing your noindex. Allow crawling + noindex instead.

  • Blocking pages that have inbound links

    They can still appear as URL-only results. Leave them crawlable and use noindex instead.

  • Leaving Disallow: / in production

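    The classic staging leak. Under User-agent: * it stops every compliant crawler from fetching the site, so listings lose their snippets and search visibility collapses, even though the URLs are not formally removed from the index.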
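
To catch the first anti-pattern before Google does, you can test the CSS and JS URLs a page actually loads against the file. Here is a minimal sketch with Python's urllib.robotparser. The names blocked_assets, legacy and fixed, and the asset paths, are hypothetical. Because robots.txt applies per host, only pass URLs on the same host as the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, agent="Googlebot"):
    """Return the asset URLs that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(agent, url)]


# Hypothetical legacy file that hides the whole /assets/ tree.
legacy = """\
User-agent: *
Disallow: /assets/
Disallow: /cgi-bin/
"""

# The same file with the asset rule removed.
fixed = """\
User-agent: *
Disallow: /cgi-bin/
"""

# Stylesheets and scripts the page loads (same host as the robots.txt).
page_assets = [
    "https://example.com/assets/css/site.css",
    "https://example.com/assets/js/app.js",
]

print(blocked_assets(legacy, page_assets))  # both URLs: rendering would break
print(blocked_assets(fixed, page_assets))   # []
```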

AI crawlers and SEO are separate decisions

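Blocking GPTBot or Google-Extended has no effect on your Google Search ranking. GPTBot is OpenAI's crawler, and Google-Extended is a robots.txt product token that governs whether Google may use your content for its generative AI models; neither one is Googlebot. You can opt out of AI training by crawlers that honor these tokens while remaining fully visible in search.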
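
A crawler follows the group that names it and ignores the * group. That means a file can block AI agents without touching Googlebot. Here is a minimal sketch that checks this with Python's urllib.robotparser, using a hypothetical URL. It only shows what the file says, not how any vendor behaves. The parser's simple user-agent substring matching is also only an approximation of RFC 9309 product-token matching.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/guides/robots-txt"  # hypothetical page
for agent in ("Googlebot", "GPTBot", "Google-Extended"):
    print(agent, parser.can_fetch(agent, url))
# Googlebot True
# GPTBot False
# Google-Extended False
```

The empty Disallow in the * group means "allow everything", so search crawlers keep full access.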

See "Why robots.txt is important" for the strategic view, and manage AI crawler rules in the AI Crawler Manager.

Pre-deploy checklist

  1. No accidental Disallow: / in the production file.
  2. CSS/JS and image directories are crawlable.
  3. Sitemap line present, absolute, https, and resolves.
  4. AI crawler policy is intentional (explicit allow or block).
  5. Validated for syntax, then spot-tested on 2–3 real URLs.
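
Items 1, 3 and 4 of the checklist can be linted from the file's text alone. Here is a minimal sketch, assuming Python and a hypothetical AI_AGENTS list of the tokens your policy should name. lint_robots is illustrative and uses a deliberately simple RFC 9309-style grouping: consecutive User-agent lines share the rules that follow them. It does not fetch the sitemap to confirm that it resolves.

```python
from urllib.parse import urlparse

# Hypothetical: AI user-agent tokens your policy should name explicitly.
AI_AGENTS = ("GPTBot", "Google-Extended")


def lint_robots(robots_txt):
    """Return human-readable problems for checklist items 1, 3 and 4."""
    groups = []            # (user-agents, rules) per group
    agents, rules = [], []
    sitemaps = []

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow"):
            rules.append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)
    if agents:
        groups.append((agents, rules))

    problems = []
    # 1. No blanket Disallow: / for every crawler.
    for group_agents, group_rules in groups:
        if "*" in group_agents and ("disallow", "/") in group_rules:
            problems.append("User-agent: * has Disallow: / (blocks the whole site)")
    # 3. Sitemap present, absolute and https (resolving it needs a network fetch).
    if not sitemaps:
        problems.append("no Sitemap line")
    for url in sitemaps:
        parts = urlparse(url)
        if parts.scheme != "https" or not parts.netloc:
            problems.append(f"Sitemap is not an absolute https URL: {url}")
    # 4. AI crawler policy is explicit rather than inherited from *.
    named = {agent for group_agents, _ in groups for agent in group_agents}
    for bot in AI_AGENTS:
        if bot.lower() not in named:
            problems.append(f"no explicit group for {bot}")
    return problems


staging_leak = """\
User-agent: *
Disallow: /
Sitemap: /sitemap.xml
"""

for problem in lint_robots(staging_leak):
    print("-", problem)
```

A clean file returns an empty list, which makes the function easy to wire into a CI step that fails the deploy on any finding.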

Run the live file through the Analyzer for a health score and findings, and verify key URLs in the URL Tester.

Frequently asked questions
Should I block a page with robots.txt or noindex?

If you want it out of Google, use noindex and let Google crawl the page so it can see the tag. Use robots.txt Disallow only when you want to prevent crawling (e.g. crawl-budget control or AI bots), not to control indexing.

Does Disallow remove a page from Google?

No. It stops crawling, but the URL can still be indexed without content if it's linked elsewhere. To remove it, allow crawling and add noindex, or use a removal request in Google Search Console (which only hides the URL temporarily).

Should I disallow CSS and JavaScript?

No. Google needs to render pages with their CSS and JS. Blocking assets leads to incomplete rendering and can harm rankings.

Is robots.txt good or bad for SEO?

It's good for SEO when used for crawl control, and its effect on rankings ranges from neutral to helpful. It becomes harmful only when misused, for example by blocking CSS/JS assets or by using Disallow where noindex is needed.

Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.