Tool Use Isn't Magic: The Three-Stage Pipeline That Makes LLMs Actually Do Things
For Western developers building AI agents, this is the canonical architecture behind every tool-calling system from OpenAI, Anthropic, and open-source models. Understanding the three-stage pipeline — not just the API surface — is what separates production-grade agent implementations from toy demos.
An LLM is a brain trapped in a server — it can't see a screen, touch a keyboard, or call an API. Yet users watch it search the web, analyze spreadsheets, and control computers. The mechanism behind this illusion is Tool Use, and it works through a precise three-stage pipeline.
First, cognitive implantation: every tool is described as a JSON Schema — a text-based instruction manual that translates a function's name, parameters, and purpose into language the LLM can understand. The LLM doesn't know what an API is, but it reads descriptions. Second, intent recognition: when the LLM encounters a question it can't answer from training data (like a real-time stock price), it outputs a structured `tool_calls` object specifying which function to call and with what arguments. The LLM never executes anything — it only produces instructions.
Third, code intervention: application-layer code parses the `tool_calls`, executes the actual function, and pushes the result back into the message array with a `tool` role. The LLM then reads that result and generates a natural-language response. The entire flow requires two LLM calls — one to decide, one to summarize — and the messages array acts as the nervous system connecting the brain to its tools.
The most common production bug — pushing assistant messages twice — reveals how fragile the message protocol is and why developers need to understand the state machine, not just copy-paste SDK examples.
The fact that LLMs can't execute code but can describe function signatures in JSON Schema is a profound architectural constraint: the model's power is in pattern matching, not action.
Calling Tool Use 'just API calls' misses the point — the real innovation is the message protocol that lets a probabilistic text generator orchestrate deterministic function execution.
The two-call pattern (decision then summary) is an elegant solution to the fundamental problem: LLMs are good at reasoning but bad at doing, and code is good at doing but bad at reasoning.