
GPTBot and AI Crawlers

GPTBot is OpenAI's web crawler, one of a growing list of AI-specific crawlers and robots.txt tokens (PerplexityBot, ClaudeBot, Google-Extended, Applebot-Extended, and others) that govern how content is fetched for model training and for live retrieval by AI assistants.

Why it matters

If your robots.txt blocks a crawler, the platform behind it cannot fetch your pages, which makes it far less likely to cite or link to you. Many sites have inadvertently blocked AI crawlers as a side effect of broad scraping defences. Allow the AI bots that cite and block the ones that only train (or accept the trade and allow both), but make the choice deliberately.
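
One way to check for an inadvertent block is to run your current robots.txt through a parser and ask, token by token, whether each AI crawler may fetch a page. The following is a minimal sketch using Python's standard urllib.robotparser. The function name blocked_ai_agents and the SAMPLE file are hypothetical, chosen for illustration, and the token list simply mirrors the crawlers named in this entry.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the crawler list in this entry.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "ClaudeBot", "Google-Extended", "Applebot-Extended",
]


def blocked_ai_agents(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI user-agent tokens that are not allowed to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, path)]


# Hypothetical file: a broad anti-scraping rule with a single exception.
SAMPLE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    # Every AI token falls through to the wildcard group and is blocked.
    print(blocked_ai_agents(SAMPLE))
```

Python's parser applies the first matching rule in a group and matches agent names by substring, so treat its answer as an approximation of how a given crawler will read the file, not a guarantee.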

The major AI crawlers

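GPTBot, ChatGPT-User, OAI-SearchBot: OpenAI / ChatGPT.
PerplexityBot, Perplexity-User: Perplexity.
ClaudeBot, anthropic-ai, Claude-Web: Anthropic / Claude.
Google-Extended: Google's Gemini models. It is a robots.txt control token rather than a separate crawler; inclusion in Google Search features such as AI Overviews and AI Mode is governed by Googlebot, not by this token.
Applebot-Extended: Apple Intelligence. It is also a control token; the fetching itself is done by Applebot.
Cohere-AI, meta-externalagent, DuckAssistBot, Amazonbot: others. Bingbot is Microsoft's general search crawler, listed here because Bing search also feeds Microsoft Copilot.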

Training crawlers vs citation crawlers

Two functionally different jobs:

Training crawlers — pull content into model training pipelines. The brand gets no direct attribution.
Citation crawlers — pull content for live answer-engine retrieval. The brand often gets cited or linked when the AI answers.

Some crawlers do both. Most brands choose to allow citation crawlers (immediate AEO upside) and may or may not allow training crawlers (longer-term, no attribution).
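
The split can be written down as a small lookup table so the allow/block decision follows from a stated posture instead of being made bot by bot. This is a sketch only. The role assigned to each token is an assumption based on a reading of each vendor's public crawler documentation and should be verified before use; Role, CRAWLER_ROLES and decide_access are hypothetical names.

```python
from enum import Enum


class Role(Enum):
    TRAINING = "training"
    CITATION = "citation"


# ASSUMPTION: illustrative role assignments, not an authoritative registry.
# A crawler that does both jobs would carry both roles in its set.
CRAWLER_ROLES: dict[str, set[Role]] = {
    "GPTBot": {Role.TRAINING},
    "OAI-SearchBot": {Role.CITATION},
    "ChatGPT-User": {Role.CITATION},
    "PerplexityBot": {Role.CITATION},
    "Perplexity-User": {Role.CITATION},
    "ClaudeBot": {Role.TRAINING},
    "Google-Extended": {Role.TRAINING},
    "Applebot-Extended": {Role.TRAINING},
}


def decide_access(
    allowed_roles: set[Role],
    crawler_roles: dict[str, set[Role]] = CRAWLER_ROLES,
) -> dict[str, bool]:
    """Allow a crawler only if every role it performs is one you accept."""
    return {
        agent: roles <= allowed_roles  # subset test
        for agent, roles in crawler_roles.items()
    }


if __name__ == "__main__":
    # Posture: accept citation crawlers, decline training crawlers.
    for agent, allowed in decide_access({Role.CITATION}).items():
        print(f"{agent}: {'allow' if allowed else 'block'}")
```

The subset test is a deliberately conservative choice: a crawler that both trains and cites is blocked unless you accept both roles. Swapping it for an intersection test would give the more permissive reading.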

Configuring robots.txt

Use explicit per-bot User-agent groups rather than a single wildcard (User-agent: *) group, so you can change posture per crawler over time without rewriting the whole file. A wide-open Allow: / works but does not signal intent, and some platforms reportedly look for explicit allow rules.
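
As an illustration of the per-bot layout, the sketch below renders one explicit group per crawler from a simple allow/block mapping and then re-parses the result with Python's standard urllib.robotparser as a sanity check. The POLICY values are a hypothetical posture, not a recommendation, and render_robots_txt is an illustrative helper, not part of any library.

```python
from urllib.robotparser import RobotFileParser


def render_robots_txt(policy: dict[str, bool]) -> str:
    """Render one explicit User-agent group per crawler token."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}\n")
    # Fallback group for crawlers not named above (assumed open in this sketch).
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)


# Hypothetical posture: allow retrieval bots, block training tokens.
POLICY = {
    "OAI-SearchBot": True,
    "ChatGPT-User": True,
    "PerplexityBot": True,
    "GPTBot": False,
    "Google-Extended": False,
}

robots_txt = render_robots_txt(POLICY)

# Sanity check: parse the generated file and confirm it says what we meant.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

if __name__ == "__main__":
    print(robots_txt)
    for agent in POLICY:
        print(agent, parser.can_fetch(agent, "/"))
```

Changing posture for a single crawler is then a one-value change in the mapping, which is the practical benefit of keeping each bot in its own group.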

Related terms

Large Language Model (LLM): A neural network trained on massive text corpora that can understand and generate human language, typically with billions of parameters.
Answer Engine Optimisation (AEO): The practice of making a brand discoverable, accurately described, and frequently cited by AI assistants like ChatGPT, Gemini, Perplexity, and Google's AI Overviews.
AI Visibility: The measurable presence and accuracy of a brand inside AI-assistant responses, covering how often it's mentioned, in what tone, with what facts, and against which competitors.
AI Citation: A single instance of an AI assistant mentioning a brand by name (or linking to its content) in a generated answer; the unit of measurement that AEO platforms aggregate into visibility scores.
ChatGPT: OpenAI's conversational AI assistant, the consumer product that ignited the generative AI category in November 2022 and reached 900 million weekly active users by February 2026.
Claude: Anthropic's family of large language models and the chat product that exposes them, built around safety-first design, longer context windows, and Constitutional AI alignment.
Perplexity: An AI-native answer engine that always cites its sources, built from the ground up around web search and inline citations rather than retrofitting a chat product to add browsing.
Microsoft Copilot: Microsoft's AI assistant family powered by GPT models and Bing search, embedded across Windows, Microsoft 365, GitHub, Edge, and as a standalone chat product at copilot.microsoft.com.
