robots.txt vs noindex
This is the single most consequential robots.txt distinction. robots.txt controls crawling; noindex controls indexing. They solve different problems, and using the wrong one — or both together — is how pages end up stuck in Google with no description.
Crawling vs indexing
Crawling is a bot fetching a URL. Indexing is a search engine storing that URL so it can appear in results. robots.txt Disallow stops the fetch. The noindex directive (a meta tag or X-Robots-Tag header) tells engines not to store the page even if they do fetch it. Because they act at different stages, they are not interchangeable.
| | robots.txt Disallow | noindex |
|---|---|---|
| Acts on | Crawling (the fetch) | Indexing (the storage) |
| Where it lives | /robots.txt | Meta tag or HTTP header on the page |
| Keeps a page out of Google? | No — can still be indexed URL-only | Yes |
| Saves crawl budget? | Yes | No (page is still crawled) |
| Right for AI crawlers? | Yes | Not respected by most AI bots |
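To make the two stages concrete, here is a minimal Python sketch that treats them as two unrelated checks. It relies only on the standard library (`urllib.robotparser` and `html.parser`). The function names `crawl_allowed` and `has_noindex`, the `example.com` URLs, and the sample inputs are illustrative assumptions, not part of any crawler's real implementation.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Crawl stage: may this user-agent fetch the URL at all?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaFinder(HTMLParser):
    """Collects directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str, headers: dict[str, str]) -> bool:
    """Index stage: does the fetched response ask not to be stored?

    Simplification: only the plain `noindex` token is recognised;
    agent-prefixed header values such as `googlebot: noindex` are ignored.
    """
    header = {k.lower(): v for k, v in headers.items()}.get("x-robots-tag", "")
    if "noindex" in [part.strip().lower() for part in header.split(",")]:
        return True
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical sample inputs for illustration.
ROBOTS_TXT = "User-agent: *\nDisallow: /search\n"
PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'

print(crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/search?q=shoes"))  # False
print(crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/thank-you"))       # True
print(has_noindex(PAGE, {}))                                                         # True
```

Note that `has_noindex` needs the response body or headers, which a compliant crawler only has if `crawl_allowed` returned true for that URL.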
The trap: blocking and noindex together
Disallow hides your noindex
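If a page is disallowed in robots.txt, Google never fetches it, so it never sees the noindex on that page. The URL can then stay in the index, listed without a description. This is the most common way teams accidentally keep unwanted pages in Google. The fix is counterintuitive: remove the Disallow, keep the noindex, and let Google crawl the page once so it can see the noindex and drop the page.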
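The four possible combinations can be written down as a small decision function. This is a sketch of the reasoning in this section, not of how any search engine is implemented; the name `diagnose` and the result strings are hypothetical.

```python
def diagnose(crawl_allowed: bool, has_noindex: bool) -> str:
    """Describe what the combination of Disallow and noindex leads to."""
    if has_noindex and not crawl_allowed:
        # The trap: the noindex exists but can never be read.
        return "trap: noindex is hidden by Disallow; remove the Disallow"
    if has_noindex:
        return "ok: page is dropped once it is recrawled"
    if not crawl_allowed:
        return "crawl blocked: URL can still be indexed without a description"
    return "crawlable and indexable"


# A disallowed page that also carries noindex is the problem case.
print(diagnose(crawl_allowed=False, has_noindex=True))
```

Only the first branch needs a fix. The other three are each a legitimate configuration, depending on the goal.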
Which one should you use?
- Want a page out of search results → use noindex, and allow crawling so it's seen.
- Want to stop crawlers wasting budget on a section (internal search, filters) → use robots.txt Disallow.
- Want to keep content out of AI training → use robots.txt Disallow for the AI user-agents (noindex won't help).
- Want a sensitive page gone immediately → use authentication or a removal request, not robots.txt.
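For the AI-crawler case, per-agent groups in robots.txt do the work. The sketch below generates such a file and checks it with Python's standard `urllib.robotparser`. The helper names `build_robots_txt` and `is_allowed` are hypothetical, and `GPTBot` and `CCBot` are used only as example user-agent tokens; check each vendor's documentation for the tokens it actually honours.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents: list[str], shared_disallows: list[str]) -> str:
    """Block the named agents site-wide; apply shared rules to everyone else."""
    lines: list[str] = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines.append("User-agent: *")
    if shared_disallows:
        lines += [f"Disallow: {path}" for path in shared_disallows]
    else:
        lines.append("Disallow:")  # an empty Disallow allows everything
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the generated file for one user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(["GPTBot", "CCBot"], ["/search"])
print(robots_txt)
print(is_allowed(robots_txt, "GPTBot", "https://example.com/blog/post"))     # False
print(is_allowed(robots_txt, "Googlebot", "https://example.com/blog/post"))  # True
```

Search crawlers fall through to the `User-agent: *` group, so the page stays crawlable for them while the named AI agents are blocked from the whole site.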
Examples
Deindex a thank-you page (allow crawl, add noindex):
<!-- In the page <head> -->
<meta name="robots" content="noindex">
Or via HTTP header (good for non-HTML like PDFs):
X-Robots-Tag: noindex
Stop crawlers spending budget on internal search (crawl control, not deindex):
User-agent: *
Disallow: /search
Common mistakes
Using Disallow to remove a page from Google
It blocks crawling, not indexing. The URL can still appear. Use noindex.
Disallowing a page that has noindex
Google can't see the noindex if it can't crawl the page. Allow the crawl.
Trusting noindex for AI bots
Most AI crawlers act on robots.txt, not noindex. Use Disallow for them.
Using robots.txt for private data
Listed paths are public. Use authentication for anything sensitive.
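To illustrate the last point, the following sketch extracts every Disallow path from a robots.txt body, which is what any visitor can do after requesting /robots.txt. The function name `disallowed_paths` and the sample file, including the `/admin/backups/` path, are made up for illustration.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow value, in file order."""
    paths: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file: the second rule advertises a location it meant to hide.
SAMPLE = """\
User-agent: *
Disallow: /search
Disallow: /admin/backups/  # do not crawl
Allow: /
"""

print(disallowed_paths(SAMPLE))  # ['/search', '/admin/backups/']
```

A Disallow line asks well-behaved bots to stay away; it does nothing to stop a person or a non-compliant client from requesting the path directly.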
What is the difference between robots.txt and noindex?
robots.txt Disallow controls crawling — whether a bot fetches a URL. noindex controls indexing — whether a search engine stores the page in its results. robots.txt does not remove pages from Google; noindex does.
Should I use robots.txt or noindex to remove a page from Google?
Use noindex, and make sure the page is not blocked in robots.txt so Google can crawl it and see the tag. A robots.txt Disallow can actually keep a URL in the index without a description.
Why is my noindex page still in Google?
Most often because the page is also blocked in robots.txt, so Google never crawls it and never sees the noindex. Remove the Disallow, keep the noindex, and let Google recrawl.
Can I use robots.txt and noindex together?
Generally no — they conflict. Blocking a page in robots.txt prevents Google from reading its noindex. Use one or the other based on whether your goal is crawl control (robots.txt) or de-indexing (noindex).
Robots.txt Analyzer
Fetch and audit any site's live robots.txt in one report.
Robots.txt Studio Editorial · Technical SEO & crawling
We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.