Bytespider in robots.txt

Bytespider is ByteDance's crawler. Here's exactly what it does, whether it respects robots.txt, and the rules to control it. To apply a policy in one click, use the AI Crawler Manager.

Robots.txt Studio Editorial · Updated June 8, 2026 · Reviewed against Google Search Central and RFC 9309

What Bytespider is

Bytespider is the web crawler operated by ByteDance (the company behind TikTok), used to gather training data for its AI products. It is known for high-volume crawling and a patchy record of honoring robots.txt, so many sites block it to save bandwidth.

Property                 Value
User-agent               Bytespider
Operator                 ByteDance
Category                 AI training crawlers
Honors robots.txt        Partial
Affects search ranking   No

What Bytespider does

  • Crawls large volumes of pages to collect AI training data for ByteDance.
  • Has been widely reported as one of the most aggressive AI crawlers by request volume.

Why site owners care

  • Heavy crawl volume can meaningfully increase server load and bandwidth costs.
  • Reports indicate inconsistent robots.txt compliance, so a rule may not fully stop it.
  • Blocking it keeps your content out of ByteDance's AI training.

How to allow or block Bytespider

Add a group targeting the Bytespider user-agent. Disallow: / blocks it from your whole site; an empty Disallow: allows it.

Block Bytespider

User-agent: Bytespider
Disallow: /

Allow Bytespider

User-agent: Bytespider
Disallow:
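
One way to sanity-check either group before deploying it is to feed the rules to a parser and ask whether Bytespider may fetch a URL. The sketch below uses Python's standard-library urllib.robotparser. The may_fetch helper and the example.com URL are illustrative assumptions, and the parser's matching may differ in detail from what any particular crawler implements.

```python
from urllib.robotparser import RobotFileParser

# The two groups shown above, as they would appear in robots.txt.
BLOCK_RULES = """\
User-agent: Bytespider
Disallow: /
"""

ALLOW_RULES = """\
User-agent: Bytespider
Disallow:
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url.

    Hypothetical helper for illustration: it parses the rules from a
    string instead of downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/any/page"  # placeholder URL
    print("block group:", may_fetch(BLOCK_RULES, "Bytespider", url))
    print("allow group:", may_fetch(ALLOW_RULES, "Bytespider", url))
    # A crawler with no matching group and no "*" group is unaffected.
    print("other agent:", may_fetch(BLOCK_RULES, "Googlebot", url))
```

This only confirms what the rules say. Whether a crawler obeys them is a separate question that your logs have to answer.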

No effect on search ranking

Bytespider is an AI training crawler, not a search engine crawler, so blocking it does not affect search rankings.

How to verify Bytespider

There is no widely published reverse-DNS verification method; match the Bytespider user-agent and watch request IPs and volume in your logs. Because compliance is imperfect, server-level blocking (firewall / WAF) may be needed in addition to robots.txt.
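
As a rough illustration of watching request IPs and volume, the sketch below counts requests per client IP for any log line whose user-agent contains "Bytespider". It assumes your server writes the common "combined" access log format. The sample lines, their IPs (taken from documentation-reserved ranges), and the simplified user-agent strings are made up for the example and are not real Bytespider traffic.

```python
import re
from collections import Counter

# Combined log format:
# IP ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bytespider_hits_by_ip(lines):
    """Count requests per IP where the user-agent mentions Bytespider."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "bytespider" in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts


# Fabricated sample data, for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [08/Jun/2026:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.7 - - [08/Jun/2026:10:00:01 +0000] "GET /b HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.4 - - [08/Jun/2026:10:00:02 +0000] "GET /c HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '192.0.2.10 - - [08/Jun/2026:10:00:03 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for ip, hits in bytespider_hits_by_ip(SAMPLE_LOG).most_common():
        print(ip, hits)
```

To run it against a real file, pass an open file object instead of SAMPLE_LOG. Keep in mind that a user-agent string can be spoofed, so this identifies requests that claim to be Bytespider rather than proving who sent them.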

Does it honor robots.txt?

ByteDance states Bytespider honors robots.txt, but third-party reports describe cases where it crawled despite disallow rules. Verify behavior on your own logs.
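
A minimal sketch of that check, assuming you have already extracted the paths that Bytespider requested from your logs: it compares each path against your robots.txt rules and returns the ones the rules disallow. The disallowed_fetches helper, the sample rules, and the sample paths are hypothetical. Python's urllib.robotparser does simple prefix matching, so rules that rely on wildcards may need a different parser.

```python
from urllib.robotparser import RobotFileParser


def disallowed_fetches(robots_txt, user_agent, requested_paths):
    """Return the requested paths that the rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path for path in requested_paths
        if not parser.can_fetch(user_agent, path)
    ]


# Illustrative rules and request paths, not taken from a real site.
SAMPLE_RULES = """\
User-agent: Bytespider
Disallow: /private/
"""

SAMPLE_PATHS = ["/", "/private/report", "/blog/post", "/private/"]

if __name__ == "__main__":
    for path in disallowed_fetches(SAMPLE_RULES, "Bytespider", SAMPLE_PATHS):
        print("requested despite disallow:", path)
```

A hit here is a lead rather than proof of non-compliance. RFC 9309 lets crawlers cache robots.txt (it recommends not relying on a cached copy for more than 24 hours), so requests made shortly after you change a rule may reflect the old file.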

Recommendation

Recommended: Block

Bytespider is a high-volume training crawler with imperfect compliance, so most sites block it to protect bandwidth and opt out of training.

Frequently asked questions

Does Bytespider respect robots.txt?

Officially yes, but multiple site operators have reported Bytespider crawling pages that robots.txt disallowed. If your logs show that a robots.txt rule isn't stopping it, block it at the server or WAF level as well.
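
To show what server-level blocking can look like in application code, here is a sketch of a WSGI middleware that answers 403 to any request whose user-agent contains "Bytespider". The names block_crawlers, demo_app, and status_for are hypothetical. In practice this kind of rule usually lives in the web server, firewall, or WAF configuration, and it only stops requests that identify themselves as Bytespider.

```python
from wsgiref.util import setup_testing_defaults

# Lowercase substrings to look for in the User-Agent header.
BLOCKED_AGENT_TOKENS = ("bytespider",)


def block_crawlers(app, blocked_tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app and reject requests from blocked user-agents."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked_tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in for your real application."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(agent):
    """Send one fake request through the middleware; return the status."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = agent
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    list(block_crawlers(demo_app)(environ, start_response))
    return captured["status"]


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; Bytespider)"))
    print(status_for("Mozilla/5.0 (X11; Linux x86_64)"))
```

Matching on a substring keeps the check tolerant of version suffixes or extra tokens in the header, at the cost of also blocking any client that merely mentions the name.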

Why is Bytespider crawling my site so much?

Bytespider is one of the highest-volume AI training crawlers. On larger sites it can generate substantial request traffic, which is why many operators block it to control bandwidth.


Robots.txt Studio Editorial · Technical SEO & crawling

We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.