Bytespider in robots.txt

Bytespider is ByteDance's crawler. Here's exactly what it does, whether it respects robots.txt, and the rules to control it. To apply a policy in one click, use the AI Crawler Manager.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309


What Bytespider is

Bytespider is the web crawler operated by ByteDance, the company behind TikTok, and is used to gather training data for its AI products. It is known for high-volume crawling and a patchy record of honoring robots.txt, so many sites block it to save bandwidth.

Property	Value
User-agent	`Bytespider`
Operator	ByteDance
Category	AI training crawlers
Honors robots.txt	Partial
Affects search ranking	No

What Bytespider does

Crawls large volumes of pages to collect AI training data for ByteDance.
Has been widely reported as one of the most aggressive AI crawlers by request volume.

Why site owners care

Heavy crawl volume can meaningfully increase server load and bandwidth costs.
Reports indicate inconsistent robots.txt compliance, so a rule may not fully stop it.
Blocking it keeps your content out of ByteDance's AI training.


How to allow or block Bytespider

Add a group targeting the `Bytespider` user-agent. `Disallow: /` blocks it from your whole site; an empty `Disallow:` allows it to crawl everything.

Block Bytespider

User-agent: Bytespider
Disallow: /

Allow Bytespider

User-agent: Bytespider
Disallow:
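
Before deploying either group, it can help to confirm that a standards-following parser reads it the way you intend. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the helper name `is_allowed` and the `example.com` URL are illustrative assumptions, and the result only shows how a compliant parser interprets the rules, not how Bytespider itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The two groups shown above, as they would appear in robots.txt.
BLOCK_RULES = """\
User-agent: Bytespider
Disallow: /
"""

ALLOW_RULES = """\
User-agent: Bytespider
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url.

    is_allowed is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/any/page"  # placeholder URL
    print("block group, Bytespider:", is_allowed(BLOCK_RULES, "Bytespider", url))
    print("allow group, Bytespider:", is_allowed(ALLOW_RULES, "Bytespider", url))
    # Crawlers not named in the group are unaffected by it.
    print("block group, Googlebot:", is_allowed(BLOCK_RULES, "Googlebot", url))
```

Because the group names only `Bytespider`, other crawlers fall through to whatever other groups your file defines (or to "allowed" if there are none).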

No effect on search ranking

Bytespider is an AI training crawler, not a search engine, so blocking it does not affect rankings.

How to verify Bytespider

There is no widely published reverse-DNS verification method; match the Bytespider user-agent and watch request IPs and volume in your logs. Because compliance is imperfect, server-level blocking (firewall / WAF) may be needed in addition to robots.txt.
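
One way to watch request IPs and volume is to tally Bytespider requests per client IP from your access log. The sketch below assumes the common "combined" log layout (client IP first, user-agent as the last quoted field); the function name `bytespider_hits_by_ip`, the sample lines, and the shortened user-agent string are illustrative assumptions, so adjust the pattern to your own server's format.

```python
import re
from collections import Counter

# Assumed layout: client IP is the first token, user-agent is the last
# quoted field on the line (as in the widely used "combined" format).
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')


def bytespider_hits_by_ip(lines):
    """Count requests per client IP whose user-agent mentions Bytespider."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "bytespider" in match.group("ua").lower():
            hits[match.group("ip")] += 1
    return hits


# Made-up sample lines using documentation-reserved IP ranges.
SAMPLE = [
    '203.0.113.5 - - [08/Jun/2026:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.5 - - [08/Jun/2026:10:00:01 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.7 - - [08/Jun/2026:10:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

if __name__ == "__main__":
    for ip, count in bytespider_hits_by_ip(SAMPLE).most_common():
        print(ip, count)
```

Keep in mind that a user-agent match identifies only what the client claims to be; the string can be spoofed, so treat the counts as a starting point for firewall or WAF rules rather than proof of origin.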

Does it honor robots.txt?

ByteDance states Bytespider honors robots.txt, but third-party reports describe cases where it crawled despite disallow rules. Verify behavior on your own logs.
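
To check compliance against your own logs, you can compare the paths Bytespider actually requested with the rules you published. This is a rough sketch, again assuming the "combined" log layout; `disallowed_fetches`, the sample rules, and the sample log lines are hypothetical names and data made up for illustration.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed layout: a quoted request line ("GET /path HTTP/1.1") followed
# later by the user-agent as the last quoted field.
REQUEST = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*".*"(?P<ua>[^"]*)"\s*$'
)


def disallowed_fetches(robots_txt, log_lines, agent="Bytespider"):
    """List logged paths that `agent` requested despite a Disallow rule."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for line in log_lines:
        match = REQUEST.search(line)
        if not match or agent.lower() not in match.group("ua").lower():
            continue
        path = match.group("path")
        # Fetching robots.txt itself is expected, so it is never counted.
        if path != "/robots.txt" and not parser.can_fetch(agent, path):
            violations.append(path)
    return violations


# Made-up rules and log lines for demonstration only.
RULES = "User-agent: Bytespider\nDisallow: /private/\n"
LOGS = [
    '203.0.113.5 - - [08/Jun/2026:10:00:00 +0000] "GET /private/a HTTP/1.1" 200 1 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.5 - - [08/Jun/2026:10:00:01 +0000] "GET /public/b HTTP/1.1" 200 1 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.7 - - [08/Jun/2026:10:00:02 +0000] "GET /private/a HTTP/1.1" 200 1 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

if __name__ == "__main__":
    print(disallowed_fetches(RULES, LOGS))
```

A hit is not automatically a violation: requests logged shortly after you changed robots.txt may come from a crawler still acting on a cached copy, so look for disallowed fetches that persist over several days.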

Recommendation

Recommended: Block

Bytespider is a high-volume training crawler with imperfect compliance. Many sites block it to protect bandwidth and opt out of training.

Frequently asked questions

Does Bytespider respect robots.txt?

Officially yes, but multiple site operators have reported Bytespider crawling pages that robots.txt disallowed. If your logs show that a robots.txt rule is not stopping it, block it at the server or WAF level as well.

Why is Bytespider crawling my site so much?

Bytespider is one of the highest-volume AI training crawlers. On larger sites it can generate substantial request traffic, which is why many operators block it to control bandwidth.


Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.