Agentic RAG
Moving from "retrieve then generate" to "agent decides how to retrieve, validates evidence, and iterates" — production frameworks make this viable, but debugging and cost control are harder than vanilla RAG.
Our Take
Strong signal and real results. Worth committing a pilot to.
What It Is
Agentic RAG wraps an AI agent around the retrieval pipeline. Instead of a fixed sequence (embed query, search index, generate answer), the agent plans its retrieval strategy, decides which tools to use (keyword search, semantic search, chunk-level reads), evaluates the evidence, and iterates until it has enough context. The A-RAG paper (February 2026) formalised hierarchical retrieval interfaces that expose search capabilities directly to the agent.
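The loop described above can be sketched in a few lines. This is a toy illustration, not A-RAG's actual interface: the corpus, the `keyword_search` tool, and the sufficiency check are all stand-ins (a real system would plan sub-queries with an LLM, choose between multiple retrieval tools, and use an LLM judge to decide when evidence is sufficient).

```python
# Minimal sketch of an agentic retrieval loop. All helpers here are
# hypothetical stand-ins for LLM-driven planning and tool selection.

CORPUS = {
    "doc1": "LangGraph provides stateful agent orchestration.",
    "doc2": "Hybrid search combines keyword and semantic retrieval.",
    "doc3": "Rerankers reorder retrieved chunks by relevance.",
}

def keyword_search(query: str) -> list[str]:
    """Toy keyword tool: return docs sharing any query term."""
    terms = set(query.lower().split())
    return [doc for doc, text in CORPUS.items()
            if terms & set(text.lower().split())]

def agentic_retrieve(question: str, max_iters: int = 3) -> list[str]:
    evidence: list[str] = []
    queries = [question]          # a real agent plans sub-queries with an LLM
    for _ in range(max_iters):
        if not queries:
            break
        q = queries.pop(0)
        hits = keyword_search(q)  # a real agent also picks semantic/chunk tools
        evidence.extend(h for h in hits if h not in evidence)
        if len(evidence) >= 2:    # stand-in for an LLM sufficiency check
            break
    return evidence

print(agentic_retrieve("keyword and semantic retrieval"))
```

The key structural difference from static RAG is the loop: retrieval results feed back into the agent's decision about whether and how to retrieve again.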
Why It Matters
Standard RAG fails on complex queries that require information from multiple documents, reasoning across evidence, or following chains of references. Agentic RAG handles these by decomposing questions, retrieving from multiple sources, and cross-referencing results. In radiology QA, agentic decomposition improved diagnostic accuracy from 68% to 73%.
For practitioners, the production stacks have converged: LangGraph for orchestration, LlamaIndex AgentWorkflow for retrieval, hybrid search plus rerankers, and critic loops for evidence validation. This isn't theoretical anymore.
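Of the stack pieces above, hybrid search is the easiest to illustrate concretely. A common way to merge keyword and semantic result lists before reranking is reciprocal rank fusion (RRF); the ranked lists below are hard-coded for illustration, where in production they would come from something like BM25 and a vector index.

```python
# Sketch of the hybrid-search leg: fuse two ranked lists with
# reciprocal rank fusion. score(d) = sum over lists of 1 / (k + rank).

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists into one ordering by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc2"]   # e.g. BM25 order
semantic_hits = ["doc2", "doc3", "doc4"]  # e.g. cosine-similarity order
print(rrf([keyword_hits, semantic_hits]))  # doc3 and doc2 rise to the top
```

Documents that appear high in both lists (here doc3 and doc2) outrank documents that appear in only one, which is exactly the behaviour a downstream reranker benefits from.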
Key Developments
- Feb 2026: A-RAG paper published with hierarchical retrieval interfaces for agents.
- 2025: LlamaIndex pivots entirely to AgentWorkflow as primary abstraction.
- 2025: MA-RAG demonstrates collaborative chain-of-thought across specialised agents.
- 2025: Microsoft GraphRAG reaches 20K+ GitHub stars as a parallel agentic approach.
What to Watch
The gap between "agentic RAG" in papers and production reality is still significant. Most deployed systems use predefined step sequences rather than truly autonomous agents. Watch for evaluation frameworks that can measure agent planning quality and iteration efficiency — without those, it's hard to know if the agentic layer is actually helping or just adding cost and latency.
Strengths
- Handles multi-hop queries: Agents decompose questions, retrieve from multiple sources, and iterate until evidence is sufficient.
- Adaptive retrieval strategy: A-RAG's hierarchical interfaces let agents choose between keyword, semantic, and chunk-level search per query.
- Production frameworks are mature: LangGraph and LlamaIndex AgentWorkflow provide state machines, traceability, and debuggability.
- Composable with existing infrastructure: Layers on top of existing vector stores and rerankers. An orchestration upgrade, not rip-and-replace.
Considerations
- Higher cost per query: Multiple retrieval and LLM calls per query. A single agentic RAG query can cost 3-10x the cost of a static RAG query.
- Debugging complexity: Non-deterministic agent behaviour makes failure reproduction harder. Observability tooling is essential.
- Latency increases with iteration: Each agent reasoning step adds latency. Planning overhead may be unacceptable for real-time applications.
- Evaluation is harder: Standard RAG metrics need extension to measure planning quality, iteration efficiency, and tool selection accuracy.
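The cost multiplier above is easy to sanity-check with back-of-envelope arithmetic. The per-call token counts and the blended price below are illustrative assumptions, not measurements; the point is that an agentic query spends tokens on planning, per-iteration evaluation, and critique on top of the final generation.

```python
# Back-of-envelope cost comparison: static vs. agentic RAG.
# All numbers are assumed for illustration.

PRICE_PER_1K_TOKENS = 0.01  # assumed blended input/output price

def query_cost(call_tokens: list[int]) -> float:
    """Cost of one query given tokens used by each LLM call."""
    return sum(call_tokens) * PRICE_PER_1K_TOKENS / 1000

static_rag = query_cost([2000])                         # one generate call
agentic_rag = query_cost([500, 1500, 1500, 800, 2500])  # plan, 2 evidence evals, critic, generate
print(round(agentic_rag / static_rag, 1))
```

Even this modest five-call trace lands at the low end of the 3-10x range; more iterations or a larger critic model push it toward the high end.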