Retrieval & Data

Topic Clustering

Topic clustering is the grouping of related prompts, AI answers, or content pieces into thematic clusters. In AEO (answer engine optimization), it is used to consolidate prompt-level visibility data into actionable narratives about how AI systems describe a category.

Why it matters

Prompt-level data is too granular to act on; brand-level aggregates are too coarse. Topic clustering is the middle layer that turns 'we got mentioned 47 times across 200 prompts' into 'AI describes us as fast but expensive, and here is the prompt cluster where price comes up.'

How it works

Embeddings convert each prompt and each AI response into a vector representation. A clustering algorithm (k-means, HDBSCAN, hierarchical clustering) groups vectors by semantic similarity. The resulting clusters represent themes — buyer-intent groups, feature-comparison groups, problem-discovery groups.
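A minimal sketch of the embed-then-cluster step, written in pure Python so it runs without dependencies. The "embeddings" are made-up 3-dimensional vectors standing in for a real embedding model's output, the prompts are hypothetical, and the centroid seeding uses a simple deterministic farthest-first rule rather than the k-means++ seeding that scikit-learn's KMeans uses by default. Vectors are normalized to unit length first, so squared Euclidean distance tracks cosine distance.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale to unit length; on unit vectors, squared Euclidean distance is 2 - 2 * cosine."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def farthest_first_init(vectors: list[Vector], k: int) -> list[Vector]:
    """Deterministic seeding: start at the first vector, then repeatedly add
    the vector farthest from all centroids chosen so far."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(dist2(v, c) for c in centroids)))
    return centroids


def kmeans(vectors: list[Vector], k: int, max_iter: int = 100) -> tuple[list[int], list[Vector]]:
    """Plain k-means (Lloyd's algorithm); returns a cluster label per vector and the centroids."""
    centroids = farthest_first_init(vectors, k)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # no vector changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Hypothetical prompts with made-up 3-d "embeddings". Real embeddings come
# from an embedding model and have hundreds of dimensions.
prompts = {
    "crm pricing for small teams": [0.90, 0.10, 0.00],
    "cheapest crm tools": [0.85, 0.15, 0.05],
    "is crm x worth the price": [0.80, 0.20, 0.10],
    "fastest crm to set up": [0.10, 0.90, 0.10],
    "crm with quick onboarding": [0.05, 0.85, 0.20],
    "how long does crm setup take": [0.15, 0.80, 0.05],
}

vectors = [normalize(v) for v in prompts.values()]
labels, centroids = kmeans(vectors, k=2)
for prompt, label in zip(prompts, labels):
    print(label, prompt)  # label 0 for the three pricing prompts, 1 for the three setup-speed prompts
```

With real embeddings you would more likely use a library implementation, such as scikit-learn's KMeans, or HDBSCAN when you do not want to fix the number of clusters up front, rather than hand-rolling the loop.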

Why it beats pure aggregation

Without clustering, you have a flat list of prompts and answers. With clustering, you can ask:

  • Which themes drive most of our share-of-voice?
  • Which themes describe us most negatively?
  • Which themes do competitors dominate that we don't appear in at all?
  • How is theme distribution shifting over time?

These questions can't be answered from prompt-level data alone, and they are the ones that set content priorities and PR strategy.
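To make the first and third questions concrete, here is a sketch that computes share-of-voice per cluster and lists clusters where competitors appear but the brand does not. It assumes each AI answer has already been assigned a cluster label and parsed into the brands it mentions; the records, the brand names, and the share-of-voice definition used here (the brand's mentions divided by all brand mentions in the cluster) are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical records: (cluster label, brands mentioned in one AI answer).
answers = [
    ("pricing", ["OurBrand", "RivalA"]),
    ("pricing", ["RivalA"]),
    ("onboarding speed", ["OurBrand"]),
    ("onboarding speed", ["OurBrand", "RivalB"]),
    ("integrations", ["RivalA", "RivalB"]),
]


def share_of_voice(answers: list[tuple[str, list[str]]], brand: str) -> dict[str, float]:
    """Per cluster: the brand's mentions divided by all brand mentions."""
    per_cluster = defaultdict(Counter)
    for cluster, brands in answers:
        per_cluster[cluster].update(set(brands))  # count each brand once per answer
    shares = {}
    for cluster, counts in per_cluster.items():
        total = sum(counts.values())
        shares[cluster] = counts[brand] / total if total else 0.0
    return shares


def competitor_only_clusters(answers: list[tuple[str, list[str]]], brand: str) -> list[str]:
    """Clusters where at least one brand is mentioned but `brand` never is."""
    seen = defaultdict(set)
    for cluster, brands in answers:
        seen[cluster].update(brands)
    return sorted(c for c, brands in seen.items() if brands and brand not in brands)


shares = share_of_voice(answers, "OurBrand")
for cluster, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cluster}: {share:.0%}")
# onboarding speed: 67%
# pricing: 33%
# integrations: 0%
print(competitor_only_clusters(answers, "OurBrand"))  # ['integrations']
```

Running the same functions on each measurement run and comparing the per-cluster results is one way to approach the fourth question about how theme distribution shifts over time.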

Practical considerations

  • Cluster granularity: too few clusters lose detail; too many fragment the data. Around 10–30 clusters is typical for category-level analysis.
  • Stability over time: re-clustering at every measurement run shifts cluster boundaries and reshuffles cluster IDs, so run-to-run changes partly reflect the clustering itself rather than shifts in AI answers. Lock the cluster definitions and classify new prompts into them.
  • Human labelling: clusters need human-readable names to be actionable. LLM-generated labels work as a starting point but need editorial review.
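The stability advice can be sketched as nearest-centroid classification against frozen cluster definitions. In this sketch the cluster labels, the centroid values, and the 0.8 similarity cutoff are hypothetical; a prompt that matches no locked cluster closely enough returns None, so it can be queued for human review instead of being forced into a poor fit.

```python
import math
from typing import Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Locked cluster definitions: human-reviewed label -> frozen centroid,
# saved once and reused on every measurement run. Values are hypothetical.
LOCKED_CLUSTERS: dict[str, Vector] = {
    "pricing": [0.95, 0.15, 0.05],
    "onboarding speed": [0.10, 0.95, 0.15],
}


def classify(embedding: Vector, clusters: dict[str, Vector] = LOCKED_CLUSTERS,
             min_similarity: float = 0.8) -> Optional[str]:
    """Return the most similar locked cluster's label, or None if no cluster
    reaches min_similarity (a hypothetical cutoff)."""
    label, similarity = max(
        ((name, cosine(embedding, centroid)) for name, centroid in clusters.items()),
        key=lambda pair: pair[1],
    )
    return label if similarity >= min_similarity else None


# Hypothetical new prompts from a later run, with made-up embeddings.
new_prompts = {
    "crm discounts for nonprofits": [0.88, 0.12, 0.02],
    "crm that is live in a day": [0.10, 0.90, 0.10],
    "crm data residency in the eu": [0.00, 0.00, 1.00],
}
for prompt, vec in new_prompts.items():
    print(classify(vec), "<-", prompt)
# pricing <- crm discounts for nonprofits
# onboarding speed <- crm that is live in a day
# None <- crm data residency in the eu
```

Keeping the centroids fixed makes one run's cluster shares directly comparable with the next. If the unassigned queue keeps growing, that is a cue to revise the cluster definitions deliberately rather than letting them drift on every run.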