Agentic RAG
Moving from "retrieve then generate" to "agent decides how to retrieve, validates evidence, and iterates" — production frameworks make this viable, but debugging and cost control are harder than vanilla RAG.
Our Take
Strong signal and real results. Worth committing a pilot to.
What It Is
Agentic RAG wraps an AI agent around the retrieval pipeline. Instead of a fixed sequence (embed query, search index, generate answer), the agent plans its retrieval strategy, decides which tools to use (keyword search, semantic search, chunk-level reads), evaluates the evidence, and iterates until it has enough context. The A-RAG paper (February 2026) formalised hierarchical retrieval interfaces that expose search capabilities directly to the agent.
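The loop described above can be sketched in a few lines. This is a toy illustration, not A-RAG's actual interface: the corpus, the `keyword_search` tool, and the sufficiency check are all stand-ins (a real system would plan sub-queries with an LLM, choose between multiple retrieval tools, and use an LLM judge to decide when evidence is sufficient).

```python
# Minimal sketch of an agentic retrieval loop. All helpers here are
# hypothetical stand-ins for LLM-driven planning and tool selection.

CORPUS = {
    "doc1": "LangGraph provides stateful agent orchestration.",
    "doc2": "Hybrid search combines keyword and semantic retrieval.",
    "doc3": "Rerankers reorder retrieved chunks by relevance.",
}

def keyword_search(query: str) -> list[str]:
    """Toy keyword tool: return docs sharing any query term."""
    terms = set(query.lower().split())
    return [doc for doc, text in CORPUS.items()
            if terms & set(text.lower().split())]

def agentic_retrieve(question: str, max_iters: int = 3) -> list[str]:
    evidence: list[str] = []
    queries = [question]          # a real agent plans sub-queries with an LLM
    for _ in range(max_iters):
        if not queries:
            break
        q = queries.pop(0)
        hits = keyword_search(q)  # a real agent also picks semantic/chunk tools
        evidence.extend(h for h in hits if h not in evidence)
        if len(evidence) >= 2:    # stand-in for an LLM sufficiency check
            break
    return evidence

print(agentic_retrieve("keyword and semantic retrieval"))
```

The key structural difference from static RAG is the loop: retrieval results feed back into the agent's decision about whether and how to retrieve again.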
Why It Matters
Standard RAG fails on complex queries that require information from multiple documents, reasoning across evidence, or following chains of references. Agentic RAG handles these by decomposing questions, retrieving from multiple sources, and cross-referencing results. In radiology QA, agentic decomposition improved diagnostic accuracy from 68% to 73%.
For practitioners, the production stacks have converged: LangGraph for orchestration, LlamaIndex AgentWorkflow for retrieval, hybrid search plus rerankers, and critic loops for evidence validation. This isn't theoretical anymore.
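Of the stack pieces above, hybrid search is the easiest to illustrate concretely. A common way to merge keyword and semantic result lists before reranking is reciprocal rank fusion (RRF); the ranked lists below are hard-coded for illustration, where in production they would come from something like BM25 and a vector index.

```python
# Sketch of the hybrid-search leg: fuse two ranked lists with
# reciprocal rank fusion. score(d) = sum over lists of 1 / (k + rank).

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists into one ordering by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc2"]   # e.g. BM25 order
semantic_hits = ["doc2", "doc3", "doc4"]  # e.g. cosine-similarity order
print(rrf([keyword_hits, semantic_hits]))  # doc3 and doc2 rise to the top
```

Documents that appear high in both lists (here doc3 and doc2) outrank documents that appear in only one, which is exactly the behaviour a downstream reranker benefits from.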
Key Developments
- Feb 2026: A-RAG paper published with hierarchical retrieval interfaces for agents.
- 2025: LlamaIndex pivots entirely to AgentWorkflow as primary abstraction.
- 2025: MA-RAG demonstrates collaborative chain-of-thought across specialised agents.
- 2025: Microsoft GraphRAG reaches 20K+ GitHub stars as a parallel agentic approach.
What to Watch
The gap between "agentic RAG" in papers and production reality is still significant. Most deployed systems use predefined step sequences rather than truly autonomous agents. Watch for evaluation frameworks that can measure agent planning quality and iteration efficiency — without those, it's hard to know if the agentic layer is actually helping or just adding cost and latency.
Strengths
- Handles multi-hop queries: Agents decompose questions, retrieve from multiple sources, and iterate until evidence is sufficient.
- Adaptive retrieval strategy: A-RAG's hierarchical interfaces let agents choose between keyword, semantic, and chunk-level search per query.
- Production frameworks are mature: LangGraph and LlamaIndex AgentWorkflow provide state machines, traceability, and debuggability.
- Composable with existing infrastructure: Layers on top of existing vector stores and rerankers. An orchestration upgrade, not rip-and-replace.
Considerations
- Higher cost per query: Multiple retrieval and LLM calls per query. A single agentic RAG query can cost 3-10x the cost of a static RAG query.
- Debugging complexity: Non-deterministic agent behaviour makes failure reproduction harder. Observability tooling is essential.
- Latency increases with iteration: Each agent reasoning step adds latency. Planning overhead may be unacceptable for real-time applications.
- Evaluation is harder: Standard RAG metrics need extension to measure planning quality, iteration efficiency, and tool selection accuracy.
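The cost multiplier above is easy to sanity-check with back-of-envelope arithmetic. The per-call token counts and the blended price below are illustrative assumptions, not measurements; the point is that an agentic query spends tokens on planning, per-iteration evaluation, and critique on top of the final generation.

```python
# Back-of-envelope cost comparison: static vs. agentic RAG.
# All numbers are assumed for illustration.

PRICE_PER_1K_TOKENS = 0.01  # assumed blended input/output price

def query_cost(call_tokens: list[int]) -> float:
    """Cost of one query given tokens used by each LLM call."""
    return sum(call_tokens) * PRICE_PER_1K_TOKENS / 1000

static_rag = query_cost([2000])                         # one generate call
agentic_rag = query_cost([500, 1500, 1500, 800, 2500])  # plan, 2 evidence evals, critic, generate
print(round(agentic_rag / static_rag, 1))
```

Even this modest five-call trace lands at the low end of the 3-10x range; more iterations or a larger critic model push it toward the high end.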