Function Calling
The mechanism that allows LLMs to interact with external tools and APIs by outputting structured data — typically JSON — specifying which function to invoke and with what parameters.
Why it matters
Function calling is the bridge that transforms LLMs from text generators into systems that can take real-world actions — querying databases, calling APIs, and triggering workflows.
How it works
Function calling bridges the gap between text generation and real-world action. The process follows a consistent pattern across providers: the developer defines a set of tool specifications (function name, description, parameter schema) and passes them alongside the user's prompt. The model then decides whether any tool is relevant to the request, and if so, outputs a structured invocation — a JSON object specifying the function name and arguments.
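The pattern above can be sketched in a few lines of Python. The tool name, parameters, and model output below are purely illustrative (no real provider SDK is involved); the parameter schema follows the JSON Schema convention the major providers share:

```python
import json

# Hypothetical tool specification in the common JSON Schema style.
# The names ("get_weather", "city") are illustrative, not a real API.
get_weather_spec = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

# A model that decides the tool is relevant emits a structured
# invocation rather than prose -- something like:
model_output = json.loads(
    '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
)

print(model_output["name"])       # which function to invoke
print(model_output["arguments"])  # with what parameters
```

The developer sends the specification alongside the prompt; the structured invocation is what comes back when the model judges the tool relevant.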
The critical distinction: the model does not execute anything. It expresses intent in a structured format. Your application code receives that intent, runs the actual function, and feeds the result back to the model for the next step. This separation is what makes function calling safe and controllable.
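A minimal sketch of the application-side half of that separation, assuming a hypothetical `get_weather` tool and a simple name-to-function registry: the model's structured intent is looked up, executed by your code, and the result serialised for the next model turn.

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in for a real API call.
    return {"city": city, "temp_c": 7, "conditions": "overcast"}

# Application-owned mapping from tool names to real implementations.
TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool_call(call: dict) -> str:
    """Run the function the model requested; the model itself executes nothing."""
    func = TOOL_REGISTRY[call["name"]]    # resolve intent to real code
    result = func(**call["arguments"])    # execute in application code
    return json.dumps(result)             # serialise to feed back to the model

tool_result = execute_tool_call(
    {"name": "get_weather", "arguments": {"city": "Oslo"}}
)
```

Because the registry is owned by the application, it also acts as an allowlist: the model can only ever trigger functions you have explicitly exposed.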
Key implementations
- OpenAI — supports parallel function calling, where the model can request multiple tool invocations in a single turn. Tool definitions use JSON Schema for parameter validation.
- Anthropic — uses `tool_use` content blocks within the message structure, making tool calls first-class parts of the response rather than a side channel. Supports forced tool use and sequential chaining.
- Google — can auto-generate tool schemas from Python function signatures, reducing boilerplate. Supports both automatic and manual tool selection modes.
Despite surface-level differences, all implementations share the same core loop: define tools, let the model choose, execute externally, return results.
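That shared loop can be written provider-agnostically. In the sketch below, `call_model` is a hypothetical stand-in for any provider SDK (here it simulates a model that requests one tool call, then answers); only the loop structure is the point.

```python
import json

def add(a: int, b: int) -> int:
    return a + b

TOOLS = {"add": add}

def call_model(messages, tools):
    # Placeholder for a real provider call. Simulates a model that
    # first requests a tool, then answers once it sees the result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "add", "arguments": {"a": 2, "b": 3}}}
    return {"content": "The sum is 5."}

def run_loop(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = call_model(messages, TOOLS)        # let the model choose
        if "tool_call" not in reply:               # no tool needed: done
            return reply["content"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])  # execute externally
        messages.append({"role": "tool", "content": json.dumps(result)})  # return results

answer = run_loop("What is 2 + 3?")
```

Swapping providers changes the message shapes and SDK calls, not this control flow.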
Relationship to agentic AI
Function calling is the foundational capability that makes AI agents possible. Without it, an LLM is limited to generating text. With it, the same model can search databases, send emails, create calendar events, query APIs, and trigger entire workflows.
The quality of function calling directly determines an agent's reliability. A model that picks the wrong function, hallucinates parameters, or fails to recognise when no tool is needed will produce agents that break in unpredictable ways. This is why function calling accuracy has become one of the most important benchmarks for evaluating models in production settings — it is not enough for a model to be articulate if it cannot reliably select and parameterise the right tool at the right time.