Tool Use Isn't Magic: The Three-Stage Pipeline That Makes LLMs Actually Do Things
The Technical Logic Behind Tool Use
More than just API calls.
Introduction: Those "Smart" AIs
- Doubao can automatically search the web, using two tools: a date retrieval tool and a web search tool
- Claude can analyze Excel spreadsheets, using two tools: a file reading tool and an Excel analysis tool
- AI Agents can operate computers
Agent = LLM + tools
Does AI have self-awareness? From a developer's point of view, this is a carefully crafted illusion: users see the LLM "completing" the work, but in reality it's calling tools.
An LLM running on a GPU is, at its core, still playing a word-guessing game. It's a brain trapped in a server, unable to see the screen or touch the keyboard. So how does a probabilistic model that can only do Next Token Prediction break through those physical limits to call APIs, read databases, and act on the physical world?
The answer is Tool Use.
Three-Stage Model
Tools are functions. The LLM's ability to call tools relies on three stages:
Stage One: Cognitive Implantation
Before executing a task, when you configure tools for the model (via the tools request parameter, which the provider places into the model's context much like a system prompt), you are doing something very subtle: cognitive implantation.
Turn the tool into language: describe in words what the function is, what it does, what parameters it needs, and what it returns. The LLM doesn't understand what a weather API or a database query is, but it does understand language.
JSON Schema translates complex software interface functions into an "instruction manual" that the LLM can understand:
- The outer layer uses JSON to declare the tool format (function name, description, parameter list)
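- The inner `parameters` field uses JSON Schema to constrain each parameter's type (string, number, etc.) and to mark which parameters are required
Because LLM output is probabilistic, tool descriptions must be specific and clear. A vague description makes it more likely that the model picks the wrong tool or passes the wrong arguments.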
```json
{
  "type": "function",
  "function": {
    "name": "get_closing_price",
    "description": "Get the closing price of a specified stock",
    "parameters": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "Stock name"
        }
      },
      "required": ["name"]
    }
  }
}
```
At this stage, a complex software tool (get_closing_price) has been reduced to a pure text description (JSON Schema). When a user asks "What is the closing price of Tsingtao Beer?", the LLM can't answer from its own knowledge, but it knows it has a tool it can use.
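The article does not show exactly what the model "reads"; that depends on the provider and its chat template. As a rough illustration of reducing a function to language, the sketch below renders the get_closing_price definition as a plain-text manual. `describeTool` and its output format are hypothetical and do not reproduce any provider's real rendering.

```javascript
// Hypothetical helper: renders an OpenAI-style tool definition as plain text.
// The output format is illustrative only; real chat templates differ.
function describeTool(tool) {
  const fn = tool.function;
  const props = fn.parameters.properties || {};
  const required = new Set(fn.parameters.required || []);
  const lines = [`Tool: ${fn.name}`, `Purpose: ${fn.description}`, "Parameters:"];
  for (const [param, spec] of Object.entries(props)) {
    const flag = required.has(param) ? "required" : "optional";
    lines.push(`  - ${param} (${spec.type}, ${flag}): ${spec.description}`);
  }
  return lines.join("\n");
}

// The same definition as the JSON above, as a JS object.
const getClosingPriceTool = {
  type: "function",
  function: {
    name: "get_closing_price",
    description: "Get the closing price of a specified stock",
    parameters: {
      type: "object",
      properties: {
        name: { type: "string", description: "Stock name" }
      },
      required: ["name"]
    }
  }
};

console.log(describeTool(getClosingPriceTool));
// Tool: get_closing_price
// Purpose: Get the closing price of a specified stock
// Parameters:
//   - name (string, required): Stock name
```

Everything the model learns about the tool comes from these strings, which is why the descriptions need to be precise.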
Stage Two: Intent Recognition
The LLM starts reasoning: its training data has no real-time stock prices → it can't answer → it checks the cognitively implanted tools → finds get_closing_price → decides to call the tool.
Instead of replying to the user directly, the LLM outputs tool_calls: structured call instructions generated according to the JSON Schema instruction manual. Models can still occasionally produce malformed or incomplete arguments, so treat them as untrusted input. Note that content might be an empty string or a transitional phrase (like "Let me check for you"), depending on the model implementation:
```json
{
  "role": "assistant",
  "content": "",
  "tool_calls": [
    {
      "id": "call_xxx",
      "type": "function",
      "function": {
        "name": "get_closing_price",
        "arguments": "{\"name\": \"Tsingtao Beer\"}"
      }
    }
  ]
}
```
The LLM cannot execute the tool; the developer's code can. Through pattern recognition and reasoning, the LLM is responsible only for outputting "which function to call and which arguments to pass". The actual execution happens entirely outside the model.
Stage Three: Your Code Intervenes
The LLM stops after outputting tool_calls (in the OpenAI-compatible API, the response's finish_reason is "tool_calls"). Next, application-layer code (Node/Python/Java, etc.) takes over: it parses the tool_calls, matches the function name, passes in the arguments, actually executes the tool function, and collects the result.
Key point: the result is not returned directly to the user. Instead, it is appended to messages as a message with the tool role and sent to the LLM again. The LLM then generates the final response from the complete context (the user's original question → its own decision → the tool's returned result).
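The take-over step can be sketched as a lookup table from function name to implementation, producing the tool-role message described above. This sketch assumes the tool_calls shape shown earlier. `toolRegistry`, `executeToolCall`, and the hard-coded price are hypothetical placeholders, not the article's index.mjs.

```javascript
// Hypothetical local implementation; a real one would query a market-data source.
function get_closing_price(name) {
  const prices = { "Tsingtao Beer": "67.92" }; // placeholder value from the example
  return prices[name] ?? `No price found for ${name}`;
}

// Map each tool name the model may emit to the code that actually runs it.
const toolRegistry = {
  get_closing_price: (args) => get_closing_price(args.name)
};

// Parse one tool call, dispatch it by name, and wrap the result as a `tool` message.
function executeToolCall(toolCall) {
  const impl = toolRegistry[toolCall.function.name];
  const content = impl
    ? String(impl(JSON.parse(toolCall.function.arguments)))
    : `Error: unknown tool ${toolCall.function.name}`;
  return { role: "tool", tool_call_id: toolCall.id, content };
}

const toolMessage = executeToolCall({
  id: "call_xxx",
  type: "function",
  function: { name: "get_closing_price", arguments: "{\"name\": \"Tsingtao Beer\"}" }
});
console.log(toolMessage);
```

In this sketch, an unknown tool name is reported back as text rather than thrown, so the model can see the problem on its next turn. That is one design choice among several.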
Practical: Complete Message Flow
Using the "query Tsingtao Beer closing price" example from index.mjs, let's trace how the messages array changes at each stage.
Core Roles
| Role | Who Produces It | Meaning |
|---|---|---|
| `user` | User/Frontend | The user's question |
| `assistant` | LLM | What the LLM says (or tool call instructions) |
| `tool` | Your Code | The execution result of the tool function |
Stage 0: Initial messages
```javascript
let messages = [{ role: 'user', content: 'What is the closing price of Tsingtao Beer?' }];
```
```json
[
  { "role": "user", "content": "What is the closing price of Tsingtao Beer?" }
]
```
Just one user question, the starting point of the entire conversation.
Stage 1: First LLM Call → Corresponds to "Intent Recognition" in the Three-Stage Model
```javascript
const response = await sendMessage(messages);
```
The LLM sees a question about a stock price, which its training data cannot answer in real time. So it checks its tool list, finds get_closing_price, and returns tool_calls instead of ordinary text:
```json
{
  "role": "assistant",
  "content": "Let me check the closing price of Tsingtao Beer.",
  "tool_calls": [
    {
      "index": 0,
      "id": "call_00_B05dAElKlxAKyio9AXDN4208",
      "type": "function",
      "function": {
        "name": "get_closing_price",
        "arguments": "{\"name\": \"Tsingtao Beer\"}"
      }
    }
  ]
}
```
tool_calls Structure
| Field | Meaning |
|---|---|
| `id` | Unique identifier for this call; the subsequent `tool` message refers back to it via `tool_call_id` |
| `type` | Fixed as `"function"` |
| `function.name` | Name of the function to call |
| `function.arguments` | The parameters to pass, encoded as a JSON string |
content and tool_calls can coexist: content is a polite remark to the user, while tool_calls is the real action. Either way, the LLM only "says" what to call; it doesn't execute anything itself.
How does the LLM know which tools are available? Through "Cognitive Implantation":
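```javascript
// `client` is an OpenAI SDK client; `tools` is an array of tool definitions
// such as the get_closing_price schema shown earlier
const res = await client.chat.completions.create({
  model: 'deepseek-v4-pro',
  messages,
  tools, // ← Cognitive Implantation: JSON Schema describing the function manual
  tool_choice: 'auto'
});
```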
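The LLM doesn't understand what an API is, but it can read language. This is what reducing functions to language means: translating code functions into text descriptions the LLM can understand. tool_choice: 'auto' lets the LLM decide whether to use a tool at all. In the OpenAI Chat Completions API, 'none' disables tool calls and 'required' forces the model to call at least one tool.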
Stage 2: Push assistant message
```javascript
// `message` is response.choices[0].message from the first call
messages.push({
  role: message.role,
  content: message.content,
  tool_calls: message.tool_calls
});
```
```json
[
  { "role": "user", "content": "What is the closing price of Tsingtao Beer?" },
  {
    "role": "assistant",
    "content": "Let me check the closing price of Tsingtao Beer.",
    "tool_calls": [
      {
        "id": "call_xxx",
        "type": "function",
        "function": { "name": "get_closing_price", "arguments": "{\"name\":\"Tsingtao Beer\"}" }
      }
    ]
  }
]
```
This step records in the context that the LLM saw the user's question and decided to call get_closing_price('Tsingtao Beer'). It is part of the conversation history, and the second call needs it.
Common bug: if you push the assistant message twice here, the API rejects the next request with an error like `insufficient tool messages following tool_calls message`. The first copy is followed by another assistant message instead of its tool result, but each assistant message with `tool_calls` must be followed by a corresponding `tool` message.
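A cheap guard against this bug is to check the history just before every request. The hypothetical helper below mirrors the rule just described. It is only a sketch, and the exact wording of the API error depends on the provider. An assistant message whose tools have not run yet is also reported, which is intended: such a history is not ready to send.

```javascript
// Hypothetical check for the rule behind the error: every assistant message with
// tool_calls must be followed by one tool message per call id.
function findUnansweredToolCalls(messages) {
  const problems = [];
  messages.forEach((msg, i) => {
    if (msg.role !== "assistant" || !msg.tool_calls?.length) return; // next message
    // Collect the ids answered by the run of tool messages right after this one.
    const answered = new Set();
    for (let j = i + 1; j < messages.length && messages[j].role === "tool"; j++) {
      answered.add(messages[j].tool_call_id);
    }
    for (const call of msg.tool_calls) {
      if (!answered.has(call.id)) problems.push({ index: i, id: call.id });
    }
  });
  return problems;
}

const userMsg = { role: "user", content: "What is the closing price of Tsingtao Beer?" };
const assistantMsg = {
  role: "assistant",
  content: "Let me check the closing price of Tsingtao Beer.",
  tool_calls: [{
    id: "call_xxx",
    type: "function",
    function: { name: "get_closing_price", arguments: "{\"name\":\"Tsingtao Beer\"}" }
  }]
};
const toolMsg = { role: "tool", content: "67.92", tool_call_id: "call_xxx" };

console.log(findUnansweredToolCalls([userMsg, assistantMsg, toolMsg])); // []
console.log(findUnansweredToolCalls([userMsg, assistantMsg, assistantMsg, toolMsg]));
// [ { index: 1, id: 'call_xxx' } ]
```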
Stage 3: Execute Tool → Corresponds to "Code Intervention" in the Three-Stage Model
```javascript
if (response.choices[0].message.tool_calls) {
  const toolCall = response.choices[0].message.tool_calls[0];
  // ...parse the arguments, run the tool, and push its result (the steps below)
}
```
Why take [0]
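tool_calls is an array, because the LLM can request multiple tools at once. For example, if the user asks for "Tsingtao Beer closing price and Beijing weather", the LLM might return two tool_calls. Taking [0] here is a simplification: any additional calls would be left without tool messages, which triggers the same error as the double push described above.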
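Handling every entry instead of only [0] takes a small loop. It produces one tool message per call, each carrying its own tool_call_id. The get_weather tool and both stub implementations below are hypothetical.

```javascript
// Hypothetical tool implementations keyed by the names the model may request.
const toolImpls = {
  get_closing_price: ({ name }) => (name === "Tsingtao Beer" ? "67.92" : "unknown"),
  get_weather: ({ city }) => `Weather data for ${city} (stub)`
};

// Execute every requested call, appending one `tool` message per call id.
function answerAllToolCalls(assistantMessage, messages) {
  for (const call of assistantMessage.tool_calls ?? []) {
    const impl = toolImpls[call.function.name];
    const args = JSON.parse(call.function.arguments);
    messages.push({
      role: "tool",
      tool_call_id: call.id,
      content: impl ? String(impl(args)) : `Error: unknown tool ${call.function.name}`
    });
  }
  return messages;
}

const twoCallMessage = {
  role: "assistant",
  content: "",
  tool_calls: [
    { id: "call_1", type: "function",
      function: { name: "get_closing_price", arguments: "{\"name\":\"Tsingtao Beer\"}" } },
    { id: "call_2", type: "function",
      function: { name: "get_weather", arguments: "{\"city\":\"Beijing\"}" } }
  ]
};
const history = answerAllToolCalls(twoCallMessage, [
  { role: "user", content: "Tsingtao Beer closing price and Beijing weather?" },
  twoCallMessage
]);
console.log(history.slice(2)); // the two tool messages, in call order
```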
Why JSON.parse
```javascript
const args = JSON.parse(toolCall.function.arguments);
// "{\"name\": \"Tsingtao Beer\"}" → { name: "Tsingtao Beer" }
const price = get_closing_price(args.name); // "67.92"
```
arguments is a string, not a JS object. Without parsing, you can't access the .name property.
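Because arguments is text generated by the model, JSON.parse can throw and required fields can be missing. Below is a minimal sketch of defensive parsing against the tool's parameters schema. `parseToolArguments` is hypothetical. It checks only the `required` and primitive `type` keywords, not full JSON Schema; a real project might use a dedicated validator instead.

```javascript
// Hypothetical helper: parse a tool call's arguments string and check it against
// a simplified JSON Schema (only `required` and primitive `type` are validated).
const jsTypes = { string: "string", number: "number", boolean: "boolean" };

function parseToolArguments(argumentsJson, parametersSchema) {
  let args;
  try {
    args = JSON.parse(argumentsJson);
  } catch (err) {
    return { ok: false, error: `arguments is not valid JSON: ${err.message}` };
  }
  if (args === null || typeof args !== "object" || Array.isArray(args)) {
    return { ok: false, error: "arguments must be a JSON object" };
  }
  const has = (key) => Object.prototype.hasOwnProperty.call(args, key);
  for (const key of parametersSchema.required ?? []) {
    if (!has(key)) return { ok: false, error: `missing required parameter: ${key}` };
  }
  for (const [key, spec] of Object.entries(parametersSchema.properties ?? {})) {
    const expected = jsTypes[spec.type]; // unmapped types (e.g. integer) are skipped
    if (expected && has(key) && typeof args[key] !== expected) {
      return { ok: false, error: `parameter ${key} must be a ${spec.type}` };
    }
  }
  return { ok: true, args };
}

const schema = {
  type: "object",
  properties: { name: { type: "string", description: "Stock name" } },
  required: ["name"]
};
console.log(parseToolArguments("{\"name\": \"Tsingtao Beer\"}", schema).args); // { name: 'Tsingtao Beer' }
console.log(parseToolArguments("{\"name\": 42}", schema).error); // parameter name must be a string
console.log(parseToolArguments("{name: Tsingtao}", schema).ok); // false
```

On failure, one option is to send the error text back as the tool message content so the model can correct its arguments on the next call.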
Push tool result
```javascript
messages.push({
  role: 'tool',
  content: price,           // "67.92"
  tool_call_id: toolCall.id // "call_xxx"
});
```
```json
[
  { "role": "user", "content": "What is the closing price of Tsingtao Beer?" },
  { "role": "assistant", "content": "...", "tool_calls": [{"id": "call_xxx", ...}] },
  { "role": "tool", "content": "67.92", "tool_call_id": "call_xxx" }
]
```
tool_call_id pairs with tool_calls[0].id in the assistant message. This id is how the LLM knows that "67.92" is the result of the closing price query and not of, say, a weather query.
Key: The result is not returned directly to the user, but returned to the LLM.
Stage 4: Second LLM Call
```javascript
const finalRes = await sendMessage(messages);
```
The LLM now sees the complete context: the user asked about a stock price, it decided to call get_closing_price("Tsingtao Beer"), and the tool returned "67.92". It turns this into natural language:
```json
{
  "role": "assistant",
  "content": "The closing price of Tsingtao Beer is 67.92 yuan."
}
```
Complete Flow Diagram
How messages grows (4 messages, 2 API calls):
```
[user]
  │
  ├─ 1st sendMessage ────────── Intent Recognition: LLM decides which tool to call
  │
  ▼
[user, assistant(tool_calls)]
  │
  ├─ Your code executes get_closing_price ── Code Intervention: actually executes the tool
  │
  ▼
[user, assistant(tool_calls), tool(67.92)]
  │
  ├─ 2nd sendMessage ────────── LLM digests the result, generates response
  │
  ▼
[user, assistant(tool_calls), tool, assistant]
```
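The diagram generalizes to a loop: keep calling the model until it answers without tool_calls. The sketch below runs offline. `fakeModel` is a hypothetical stand-in for sendMessage that returns the two responses shown above, and `toolFns` hard-codes the example price. The `maxRounds` guard is an added assumption that stops a model that keeps requesting tools forever.

```javascript
// Hypothetical stand-in for sendMessage: requests the tool first, then answers
// once a tool result is present. Returns a chat.completions-shaped response.
async function fakeModel(messages) {
  const last = messages[messages.length - 1];
  const message = last.role === "tool"
    ? { role: "assistant", content: `The closing price of Tsingtao Beer is ${last.content} yuan.` }
    : {
        role: "assistant",
        content: "Let me check the closing price of Tsingtao Beer.",
        tool_calls: [{
          id: "call_xxx",
          type: "function",
          function: { name: "get_closing_price", arguments: "{\"name\":\"Tsingtao Beer\"}" }
        }]
      };
  return { choices: [{ message }] };
}

// Placeholder tool implementation mirroring the article's example value.
const toolFns = {
  get_closing_price: ({ name }) => (name === "Tsingtao Beer" ? "67.92" : "unknown")
};

// Call the model, run any requested tools, feed the results back, and repeat.
async function runConversation(userText, sendMessage, maxRounds = 5) {
  const messages = [{ role: "user", content: userText }];
  for (let round = 0; round < maxRounds; round++) {
    const { message } = (await sendMessage(messages)).choices[0];
    messages.push(message); // push the assistant message exactly once
    if (!message.tool_calls?.length) return messages; // plain answer: done
    for (const call of message.tool_calls) {
      const fn = toolFns[call.function.name];
      const content = fn ? String(fn(JSON.parse(call.function.arguments))) : "Error: unknown tool";
      messages.push({ role: "tool", tool_call_id: call.id, content });
    }
  }
  throw new Error(`No final answer after ${maxRounds} rounds`);
}

runConversation("What is the closing price of Tsingtao Beer?", fakeModel)
  .then((messages) => console.log(messages.map((m) => m.role).join(" -> ")));
// user -> assistant -> tool -> assistant
```

Swapping fakeModel for a real sendMessage should leave the loop unchanged, as long as the replacement returns the same choices[0].message shape.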
Difference Between the Two Calls
| | First Call | Second Call |
|---|---|---|
| LLM Role | Decision-maker: whether to call a tool, and which one | Summarizer: turns tool results into human language |
| Corresponding Stage | Intent Recognition | Wrap-up after Code Intervention |
| Does `messages` contain a `tool` message? | ❌ | ✅ |
| Return Value | `tool_calls` (instructions to call tools) | Plain `content` (the answer for the user) |
| Execution | The LLM only gives instructions; your code runs the tool | The LLM only interprets the results |
Essence
The deeper meaning of Agent = LLM + Tools:
- LLM is the brain: makes decisions, understands language, generates responses
- Tools are the hands and feet: query databases, call APIs, manipulate files
- Application code is the nervous system: connects the brain and limbs, translates the LLM's decisions into function calls, translates tool results back into conversation context
The entire Tool Use mechanism is the message-passing protocol among these three. The "smart AI" illusion that users see comes from this carefully designed three-stage pipeline. The LLM handles decision-making and expression, tools handle execution, and a few JSON messages with different roles in the messages array tie it all together.