AI Glossary
Plain-language definitions for the AI terms that actually matter, with practical context on why each one is relevant.
AI Agent
An AI system that can autonomously plan, reason, and take actions to accomplish goals, often using tools and external APIs.
AI Alignment
The challenge of ensuring AI systems act in accordance with human intentions and values — making them do what we actually want, not just what we literally ask for.
AI Bias
Systematic unfairness in AI outputs caused by skewed training data, flawed labelling, or model design choices that reflect and amplify existing societal inequities.
AI Orchestration
The coordination and management of multiple AI models, tools, data flows, and human inputs within a unified workflow to accomplish complex tasks that no single component could handle alone.
Agentic AI
AI systems that can autonomously plan, reason, use tools, and execute multi-step tasks with minimal human oversight — going beyond simple question-answering to take actions on behalf of users.
Analytical AI
AI systems that find patterns, anomalies, and insights in large datasets, used for tasks like fraud detection, medical imaging analysis, and business intelligence.
Artificial Intelligence (AI)
A broad field of computer science focused on building systems that can perform tasks normally requiring human intelligence, such as understanding language, recognizing patterns, and making decisions.
Chain-of-Thought Prompting
A prompting technique that improves LLM accuracy on complex tasks by guiding the model to show its reasoning step-by-step before arriving at a final answer.
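In practice this often just means wrapping the question in reasoning instructions. A minimal sketch, using a hypothetical `with_chain_of_thought` helper (the wording of the instruction is an arbitrary choice, not a standard):

```python
def with_chain_of_thought(question: str) -> str:
    """Wrap a question in instructions asking the model to reason step by step."""
    return (
        "Think through the problem step by step, showing your reasoning, "
        "then give your final answer on a line starting with 'Answer:'.\n\n"
        f"Question: {question}"
    )
```

Sending `with_chain_of_thought("What is 17 * 24?")` instead of the bare question typically improves accuracy on multi-step problems, at the cost of a longer, slower response.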
Chunking
The process of splitting documents into smaller, semantically meaningful segments optimised for embedding and retrieval in AI systems like RAG pipelines.
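As an illustration (not a production splitter), fixed-size chunking with overlap can be sketched in a few lines. Word counts stand in for token counts here to keep the sketch dependency-free, and the defaults are arbitrary:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks for embedding."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already covers the tail of the document
    return chunks
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, which helps retrieval quality.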
Compound AI Systems
AI architectures that combine multiple models, retrievers, tools, and control logic to tackle tasks that no single model could reliably handle on its own.
Context Window
The maximum amount of text (measured in tokens) that a language model can consider at once — including both the input prompt and the generated output.
Conversational AI
AI systems designed to engage in natural language dialogue with humans, ranging from simple chatbots with scripted responses to advanced assistants powered by large language models.
Copilot
An AI assistant embedded directly into a workflow tool (IDE, browser, email) that suggests actions, generates content, or automates tasks inline.
Data Privacy
The set of concerns and practices around protecting personal and sensitive information when using AI systems, covering training data, user inputs, model outputs, and data retention.
Deep Learning
A subset of machine learning that uses neural networks with many layers to learn increasingly abstract representations of data, powering breakthroughs in language, vision, and generation.
Diffusion Model
A generative AI architecture that creates images, video, and other media by learning to gradually remove noise from random static until a coherent output emerges.
Few-Shot Prompting
A prompting technique where you include a few examples of the desired input-output format in your prompt, helping the model understand exactly what you want without any fine-tuning.
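A minimal sketch of assembling such a prompt, using a hypothetical `few_shot_prompt` helper (the "Input:"/"Output:" labels are one common convention, not a requirement):

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: demonstration pairs, then the new input."""
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)
```

Given `few_shot_prompt([("cat", "animal"), ("rose", "plant")], "oak")`, the model sees two worked examples and is left to complete the third `Output:` line in the same format.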
Fine-Tuning
The process of further training a pre-trained model on a smaller, task-specific dataset to improve its performance on that particular task or domain.
Function Calling
The mechanism that allows LLMs to interact with external tools and APIs by outputting structured data — typically JSON — specifying which function to invoke and with what parameters.
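The application side of this loop is simple: parse the model's JSON and dispatch it to real code. A sketch, assuming the model emits a `{"name": ..., "arguments": ...}` shape (providers vary in the exact schema):

```python
import json

def dispatch_tool_call(raw_call: str, tools: dict) -> str:
    """Parse a model's structured tool call and invoke the matching function."""
    call = json.loads(raw_call)  # e.g. {"name": "get_weather", "arguments": {...}}
    return tools[call["name"]](**call["arguments"])
```

With `tools = {"get_weather": lambda city: ...}`, a model output like `'{"name": "get_weather", "arguments": {"city": "Oslo"}}'` is routed to that function, and the result is usually fed back to the model for its final answer.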
Generative AI
Artificial intelligence that creates new content — text, images, video, audio, or code — by learning patterns from existing data and producing original outputs in response to prompts.
Guardrails
Programmatic constraints placed around AI model inputs and outputs to prevent harmful, off-topic, or policy-violating behaviour.
Hallucination
When an AI model generates information that sounds plausible but is factually incorrect, fabricated, or unsupported by its training data or provided context.
Hybrid Search
A retrieval approach that combines traditional keyword matching (BM25) with semantic vector search to capture both the precision of exact term matches and the contextual understanding of meaning-based search.
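One common way to merge the two result lists is reciprocal rank fusion (RRF), sketched here over hypothetical document IDs; `k = 60` is the value used in the original RRF paper, but it is a tunable constant:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists (e.g. one from BM25, one from vector search).
    Each document scores 1/(k + rank) per list; shared hits rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both retrievers accumulates two scores, so it beats a document that only one retriever liked.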
Knowledge Cutoff
The date beyond which a language model has no training data, meaning it cannot know about events, discoveries, or changes that occurred after that point.
Knowledge Graph
A structured representation of information as a network of entities and their relationships, enabling machines to reason about connections between concepts.
MLOps
The set of practices and tools for deploying, monitoring, and maintaining machine learning models in production — essentially DevOps principles applied to the ML lifecycle.
Machine Learning
A subset of AI where systems learn patterns from data rather than following explicitly programmed rules, improving their performance as they see more examples.
Mixture of Experts (MoE)
A neural network architecture that scales model capacity efficiently by routing each input through only a small subset of specialized sub-networks ("experts"), keeping compute costs manageable even as total model size grows.
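The routing idea can be shown with a toy example: scalar inputs and plain callables standing in for expert networks. Real MoE layers operate on tensors inside a transformer, but the top-k selection and gate-weighted mixing look like this:

```python
def moe_forward(x: float, experts: list, gate_scores: list[float], k: int = 2) -> float:
    """Toy sparse mixture-of-experts: route the input through only the
    top-k experts, weighted by their renormalised gate scores."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)
    return sum(gate_scores[i] / total * experts[i](x) for i in top)
```

With eight experts and k = 2, only a quarter of the network runs per token, which is how MoE models keep inference cost far below their total parameter count.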
Model Context Protocol (MCP)
An open standard for connecting AI assistants to external data sources and tools through a unified, composable interface.
Multimodal AI
AI systems that can process, understand, and generate across multiple types of data — text, images, audio, video, and code — within a single model.
Pre-training
The initial training phase where a language model learns general language patterns from a massive text corpus, before being fine-tuned for specific tasks or behaviours.
Predictive AI
AI systems that forecast outcomes based on historical data patterns, used for tasks like demand forecasting, risk assessment, and recommendation engines.
Prompt Engineering
The practice of designing and refining inputs to language models to elicit more accurate, useful, and consistent outputs.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human preferences are used to fine-tune a language model through reinforcement learning, teaching it to produce responses that humans judge as helpful, accurate, and safe.
Reasoning Models
LLMs trained with reinforcement learning to "think before they answer" by generating internal chains of reasoning — producing more accurate results on complex tasks like maths, coding, and multi-step logic at the cost of higher latency and token usage.
Red Teaming
Systematic adversarial testing of AI systems to identify vulnerabilities, failure modes, and unintended behaviours before deployment — adapted from cybersecurity to probe AI-specific weaknesses like prompt injection and jailbreaks.
Reinforcement Learning
A machine learning approach where an agent learns by taking actions in an environment and receiving rewards or penalties, gradually discovering which strategies produce the best outcomes.
Reranking
A second-stage retrieval technique that re-scores and reorders an initial set of retrieved documents using a more computationally expensive cross-encoder model to surface the most relevant results.
Retrieval-Augmented Generation (RAG)
A technique that grounds a language model's output in external data by retrieving relevant documents before generating a response.
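The retrieve-then-generate loop fits in a few lines. In this sketch, naive word overlap stands in for a real embedding-based retriever, and `generate` is a placeholder for an actual LLM call:

```python
def rag_answer(query: str, documents: list[str], generate, top_k: int = 1) -> str:
    """Retrieve the most relevant documents, then ground generation in them."""
    q_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    context = "\n".join(ranked[:top_k])
    return generate(
        f"Answer using only this context.\n\nContext:\n{context}\n\nQuestion: {query}"
    )
```

The key move is the prompt: the model is told to answer from the retrieved context rather than from its (possibly stale or hallucinated) parametric memory.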
Semantic Search
A search technique that understands the meaning and intent behind queries rather than matching exact keywords, using vector embeddings to find conceptually relevant results even when different words are used.
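Under the hood this is usually cosine similarity between embedding vectors. The sketch below uses hand-made 2-d vectors; a real system would get high-dimensional embeddings from a model:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the embeddings point the same way; near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def semantic_search(query_vec: list[float], doc_vecs: dict) -> list[str]:
    """Rank document IDs by embedding similarity to the query vector."""
    return sorted(doc_vecs,
                  key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
                  reverse=True)
```

Because similarity is computed in embedding space, "feline care" can match a query about cats even though the word "cat" never appears.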
Small Language Model (SLM)
A compact AI language model — typically under 10 billion parameters — designed to run efficiently on edge devices and single GPUs while delivering strong task-specific performance.
Structured Output
A technique for constraining a language model's output to follow a specific format like JSON, XML, or a defined schema, ensuring the response can be reliably parsed by downstream code.
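The downstream half of this contract is a parser that fails loudly when the model drifts from the schema. A minimal sketch (key-presence checking stands in for full schema validation):

```python
import json

def parse_structured_output(raw: str, required_keys: set[str]) -> dict:
    """Parse a model response constrained to JSON; raise if keys are missing."""
    data = json.loads(raw)
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {sorted(missing)}")
    return data
```

Raising instead of silently continuing matters: a malformed response can then trigger a retry or a re-prompt rather than corrupting downstream state.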
Supervised Learning
A machine learning approach where the model learns from labelled examples — input-output pairs where the correct answer is provided during training.
System Prompt
The foundational instruction set given to an LLM that defines its role, behaviour, tone, and constraints for a particular application — set once at the application level and shaping all subsequent user interactions.
Temperature
A parameter that controls how random or deterministic an LLM's output is — lower values produce more predictable, focused responses while higher values increase creativity and variation.
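Mechanically, temperature divides the model's logits before the softmax that turns them into next-token probabilities. This small sketch shows the effect on a two-token distribution:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature before softmax: low values sharpen
    the distribution, high values flatten it towards uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At a low temperature the highest-logit token gets nearly all the probability mass (near-deterministic output); at a high temperature the distribution approaches uniform, so sampling becomes much more varied.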
Token
The basic unit of text that a language model processes — typically a word, subword, or punctuation mark, roughly equivalent to 3/4 of an English word.
Training Data
The dataset used to teach a machine learning model, containing the examples and patterns the model learns to recognize and reproduce.
Transformer
A neural network architecture that powers modern AI by processing entire input sequences simultaneously through an attention mechanism, rather than reading them word by word.