GPTBot and AI Crawlers
GPTBot is OpenAI's web crawler, one of a growing list of AI-specific crawlers and robots.txt control tokens (PerplexityBot, ClaudeBot, Google-Extended, Applebot-Extended, etc.) that govern how content is fetched or used for model training and for live retrieval by AI assistants.
Why it matters
If your robots.txt blocks these crawlers, the corresponding AI platforms cannot fetch your pages and are unlikely to cite them. Many sites have blocked AI crawlers inadvertently, as a side effect of broad anti-scraping rules. You can allow the bots that cite and block the ones that only train, or accept the trade and allow both, but make the choice deliberately.
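One way to check for an inadvertent block is to parse the robots.txt and ask, per crawler, whether the home page is fetchable. The sketch below uses Python's standard urllib.robotparser. The sample file, the agent list, and the helper name blocked_agents are illustrative assumptions, and Python's parser only approximates how each crawler actually interprets the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "broad defence" file: known search engines allowed, everything else blocked.
BROAD_DEFENCE = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""

# Illustrative subset of AI user agents to audit.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(BROAD_DEFENCE, AI_AGENTS))
# → ['GPTBot', 'OAI-SearchBot', 'PerplexityBot', 'ClaudeBot']
```

Here none of the AI crawlers is named, so all of them fall through to the wildcard group and are blocked, even though the rule was never written with them in mind.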
The major AI crawlers
- GPTBot, ChatGPT-User, OAI-SearchBot — OpenAI / ChatGPT.
- PerplexityBot, Perplexity-User — Perplexity.
- ClaudeBot, anthropic-ai, Claude-Web — Anthropic / Claude.
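- Google-Extended — Google's robots.txt control token for Gemini training and grounding. It is not a separate crawler; AI Overviews and AI Mode are part of Search and follow Googlebot rules.
- Applebot-Extended — Apple's robots.txt control token for use of Applebot-crawled content in Apple Intelligence model training. It does not crawl on its own.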
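- cohere-ai, meta-externalagent, DuckAssistBot, Bingbot, Amazonbot — others. Bingbot is Microsoft's general search crawler rather than an AI-specific one; it is listed because Bing's index also feeds Copilot.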
Training crawlers vs citation crawlers
Two functionally different jobs:
- Training crawlers — pull content into model training pipelines. The brand gets no direct attribution.
- Citation crawlers — pull content for live answer-engine retrieval. The brand often gets cited or linked when the AI answers.
Some crawlers do both. A common posture is to allow citation crawlers, for the immediate answer engine optimization (AEO) upside, and to decide separately on training crawlers, whose benefit is longer-term and comes without attribution.
Configuring robots.txt
Use explicit per-bot User-agent groups rather than a single wildcard, so you can change posture per crawler over time without rewriting the whole file. A wide-open Allow: / under User-agent: * works but doesn't signal intent, and some platforms reportedly look for explicit allow rules.
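As a minimal sketch of the per-bot approach, the Python below renders one User-agent group per crawler from a posture table and then checks the result with the standard urllib.robotparser. The POSTURE choices (search and user-triggered agents allowed, GPTBot blocked) are a hypothetical example, not a recommendation. They assume each vendor's published description of its agents, which should be verified against current documentation. The helper names render_robots_txt and is_allowed are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical posture: True = allow the whole site, False = block it.
POSTURE = {
    "OAI-SearchBot": True,   # assumed citation/search crawler
    "ChatGPT-User": True,    # assumed user-triggered fetcher
    "PerplexityBot": True,   # assumed citation/search crawler
    "GPTBot": False,         # assumed training crawler
}


def render_robots_txt(posture):
    """Emit one explicit User-agent group per crawler, separated by blank lines."""
    groups = []
    for agent, allowed in posture.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, agent, url="https://example.com/"):
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ROBOTS_TXT = render_robots_txt(POSTURE)
print(ROBOTS_TXT)
print(is_allowed(ROBOTS_TXT, "OAI-SearchBot"))  # → True
print(is_allowed(ROBOTS_TXT, "GPTBot"))         # → False
```

Changing posture for one crawler then means flipping a single entry in the table. Crawlers not listed are unaffected by these groups and fall back to whatever default the rest of the file defines.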