Bytespider in robots.txt
Bytespider is ByteDance's crawler. Here's exactly what it does, whether it respects robots.txt, and the rules to control it. To apply a policy in one click, use the AI Crawler Manager.
What Bytespider is
Bytespider is the web crawler operated by ByteDance, the company behind TikTok. It is used to gather training data for ByteDance's AI products. It is known for high-volume crawling and a patchy record of honoring robots.txt, so many sites block it to save bandwidth.
| Property | Value |
|---|---|
| User-agent | Bytespider |
| Operator | ByteDance |
| Category | AI training crawlers |
| Honors robots.txt | Partial |
| Affects search ranking | No |
What Bytespider does
- Crawls large volumes of pages to collect AI training data for ByteDance.
- Has been widely reported as one of the most aggressive AI crawlers by request volume.
Why site owners care
- Heavy crawl volume can meaningfully increase server load and bandwidth costs.
- Reports indicate inconsistent robots.txt compliance, so a rule may not fully stop it.
- Blocking it keeps your content out of ByteDance's AI training.
How to allow or block Bytespider
Add a group targeting the Bytespider user-agent. Disallow: / blocks it from your whole site; an empty Disallow: allows it.
Block Bytespider:
User-agent: Bytespider
Disallow: /

Allow Bytespider:
User-agent: Bytespider
Disallow:

Neither rule has any effect on search ranking.
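Before deploying a rule group, it can help to check how a standards-following parser reads it. The sketch below uses Python's standard-library `urllib.robotparser`; the helper name `can_fetch` and the `example.com` URL are illustrative assumptions, not part of any ByteDance tooling. It only shows how a compliant client would interpret the rules, not how Bytespider actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The two rule groups described above, as they would appear in robots.txt.
BLOCK_RULES = """\
User-agent: Bytespider
Disallow: /
"""

ALLOW_RULES = """\
User-agent: Bytespider
Disallow:
"""


def can_fetch(rules: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL used purely for illustration.
blocked = can_fetch(BLOCK_RULES, "Bytespider", "https://example.com/page")
allowed = can_fetch(ALLOW_RULES, "Bytespider", "https://example.com/page")
```

With these rules, `blocked` should come out `False` and `allowed` should come out `True`, while other user-agents remain unaffected because the group names only Bytespider.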
How to verify Bytespider
There is no widely published reverse-DNS verification method; match the Bytespider user-agent and watch request IPs and volume in your logs. Because compliance is imperfect, server-level blocking (firewall / WAF) may be needed in addition to robots.txt.
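One way to watch request IPs and volume is to tally Bytespider hits per client IP from an access log. The following is a minimal sketch that assumes the common "combined" log format, where the client IP is the first field and the user-agent is the last quoted field; adjust the parsing if your server logs differently. The function name, the sample lines, and the shortened user-agent string are all made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: the user-agent is the last double-quoted field on the line,
# as in the combined log format used by many web servers.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def bytespider_hits_by_ip(log_lines: Iterable[str]) -> Counter:
    """Count requests per client IP whose user-agent mentions Bytespider."""
    hits: Counter = Counter()
    for line in log_lines:
        match = UA_AT_END.search(line)
        if not match or "bytespider" not in match.group(1).lower():
            continue
        # Assumption: the client IP is the first whitespace-separated field.
        client_ip = line.split(maxsplit=1)[0]
        hits[client_ip] += 1
    return hits


# Illustrative log lines using documentation-reserved IP addresses and a
# shortened, hypothetical user-agent string.
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.7 - - [01/Jan/2025:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.4 - - [01/Jan/2025:00:00:03 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

sample_hits = bytespider_hits_by_ip(SAMPLE_LOG)
```

Calling `sample_hits.most_common(10)` lists the busiest source IPs first, which is a reasonable starting point for deciding what to block at the firewall or WAF.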
Recommendation
Recommended: Block
Does Bytespider respect robots.txt?
Officially yes, but multiple site operators have reported Bytespider crawling pages that robots.txt disallowed. If your logs show that a robots.txt rule isn't stopping it, block it at the server or WAF level as well.
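Where a firewall or WAF rule is not available, the same idea can be approximated inside the application. The sketch below is a hypothetical WSGI middleware (the name `block_bytespider` is an assumption) that answers 403 to any request whose user-agent contains "Bytespider". It is a stand-in for a proper server or WAF rule, and it only catches clients that identify themselves honestly.

```python
def block_bytespider(app):
    """Wrap a WSGI app so requests identifying as Bytespider get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        # Case-insensitive substring match on the user-agent header.
        if "bytespider" in user_agent.lower():
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Every other client is passed through to the wrapped app untouched.
        return app(environ, start_response)

    return middleware
```

Wrapping an existing app is a one-liner, for example `application = block_bytespider(application)`. Rejecting the request this early keeps the response small, though the request still reaches your server, so a network-level block saves more bandwidth.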
Why is Bytespider crawling my site so much?
Bytespider is one of the highest-volume AI training crawlers. On larger sites it can generate substantial request traffic, which is why many operators block it to control bandwidth.
Robots.txt Studio Editorial · Technical SEO & crawling
We build robots.txt tooling and parse thousands of real-world files. Guides are written by practitioners and reviewed against the Google and RFC 9309 specifications.