Tool Use Isn't Magic: The Three-Stage Pipeline That Makes LLMs Actually Do Things

The Technical Logic Behind Tool Use

More than just API calls.

Introduction: Those "Smart" AIs

Doubao can automatically search the web, using two tools: a date retrieval tool and a web search tool
Claude can analyze Excel spreadsheets, using two tools: a file reading tool and an Excel analysis tool
AI Agents can operate computers

Agent = LLM + tools

Does AI have self-awareness? No. From a developer's point of view, this is a carefully crafted illusion: users see the LLM "completing" the work, but in reality it is calling tools.

The LLM running on the GPU is, at its core, still playing a word-guessing game. It's a brain trapped in a server, unable to see the screen or touch the keyboard. How does a probabilistic model that can only do next-token prediction break through those limits to call APIs, read databases, and act on the outside world?

The answer is Tool Use.

Three-Stage Model

Tools are functions. The LLM's ability to call tools relies on three stages:

Stage One: Cognitive Implantation

Before the model executes any task, you hand it a list of tool definitions. In the code later in this article they are passed through the tools request parameter rather than typed into the system prompt by hand, but either way they end up as context the model can read. This is a subtle move: cognitive implantation.

Turn the tool into language: describe in words what the function is, what it does, what parameters it needs, and what results it returns. The LLM has no built-in notion of your weather API or your database query, but it understands language.

JSON Schema translates complex software interface functions into an "instruction manual" that the LLM can understand.

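The outer layer uses JSON to declare the tool format (function name, description, parameter list)
The inner parameters field uses JSON Schema to constrain the arguments: their types (such as string or number) and which of them are required

Because LLM output is sampled probabilistically, a vague description makes the model more likely to pick the wrong tool or fill in the wrong arguments. Tool descriptions must therefore be specific and clear.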

{
    "type": "function",
    "function": {
        "name": "get_closing_price",
        "description": "Get the closing price of a specified stock",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "Stock name"
                }
            },
            "required": ["name"]
        }
    }
}

At this stage, a complex software tool (get_closing_price) is reduced to a pure text description (JSON Schema). When a user asks "What is the closing price of Tsingtao Beer?", the LLM can't answer — but it knows it has a tool it can use.
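On the application side, the schema has to be paired with a real function. The following is a minimal sketch, assuming a hard-coded price table in place of a real market-data API. The `PRICES` table, the not-found message, and the `toolRegistry` map are illustrative assumptions, not part of the original index.mjs.

```javascript
// Hypothetical stand-in for a real market-data lookup.
// The single entry is the sample price used later in this article.
const PRICES = new Map([['Tsingtao Beer', '67.92']]);

function get_closing_price(name) {
  // Return a string, because tool results travel back to the LLM as text.
  return PRICES.get(name) ?? `No closing price found for "${name}"`;
}

// What the LLM sees: a text description of the function.
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_closing_price',
      description: 'Get the closing price of a specified stock',
      parameters: {
        type: 'object',
        properties: {
          name: { type: 'string', description: 'Stock name' },
        },
        required: ['name'],
      },
    },
  },
];

// What your code uses: a map from tool name to a function that takes the
// parsed arguments object (an assumed convention for this sketch).
const toolRegistry = {
  get_closing_price: (args) => get_closing_price(args.name),
};
```

The schema and the registry share only one thing, the function name. That name is the link your code later uses to route a tool call to the right implementation.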

Stage Two: Intent Recognition

The LLM starts reasoning: training data doesn't have real-time stock prices → can't answer → checks the cognitively implanted tools → finds get_closing_price → decides to call the tool.

The LLM no longer replies to the user directly. Instead it outputs tool_calls: structured tool call instructions generated by strictly following the JSON Schema instruction manual. Note that content may be empty (an empty string or null), or it may contain a transitional phrase (like "Let me check for you"), depending on the model implementation:

{
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "call_xxx",
            "type": "function",
            "function": {
                "name": "get_closing_price",
                "arguments": "{\"name\": \"Tsingtao Beer\"}"
            }
        }
    ]
}

The LLM cannot execute the tool — the developer can! The LLM is only responsible, through pattern recognition and reasoning, for outputting "which function to call and what parameters to pass". The actual execution happens entirely outside the model.

Stage Three: Your Code Intervenes

The LLM stops after outputting tool_calls. Next, application-layer code (Node/Python/Java, etc.) takes over — parses the tool_calls, matches the function name, passes the parameters, actually executes the tool function, and gets the result.

Key point: The result is not returned directly to the user, but is stuffed back into messages with the tool role and sent to the LLM again. The LLM generates the final response based on the complete context (user's original question → its own decision → the tool's returned result).
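The round trip described above can be sketched as a single function. This is a hedged illustration, not the original index.mjs: `runToolRound`, `fakeSendMessage`, and the `toolRegistry` shape are assumed names, and `sendMessage` is injected so the flow can be exercised without a network call. It is expected to resolve to an OpenAI-style response object.

```javascript
// One full round: ask the LLM, run any requested tools, ask again.
async function runToolRound(messages, sendMessage, toolRegistry) {
  const first = await sendMessage(messages);
  const message = first.choices[0].message;
  messages.push(message); // keep the LLM's decision in the history

  // No tool requested: the first reply is already the answer.
  if (!message.tool_calls || message.tool_calls.length === 0) {
    return message.content;
  }

  for (const toolCall of message.tool_calls) {
    const fn = toolRegistry[toolCall.function.name];
    const args = JSON.parse(toolCall.function.arguments);
    const result = fn ? fn(args) : `Unknown tool: ${toolCall.function.name}`;
    // The result goes back to the LLM, not to the user.
    messages.push({
      role: 'tool',
      tool_call_id: toolCall.id,
      content: String(result),
    });
  }

  const second = await sendMessage(messages);
  messages.push(second.choices[0].message);
  return second.choices[0].message.content;
}

// Scripted stand-in for the LLM (hypothetical, for illustration only):
// it requests the tool first, then phrases the tool result as an answer.
async function fakeSendMessage(messages) {
  const last = messages[messages.length - 1];
  if (last.role === 'tool') {
    const content = `The closing price is ${last.content}.`;
    return { choices: [{ message: { role: 'assistant', content } }] };
  }
  const toolCall = {
    id: 'call_demo',
    type: 'function',
    function: {
      name: 'get_closing_price',
      arguments: '{"name": "Tsingtao Beer"}',
    },
  };
  return {
    choices: [{ message: { role: 'assistant', content: '', tool_calls: [toolCall] } }],
  };
}
```

Swapping `fakeSendMessage` for a function that calls a real chat completions endpoint should leave `runToolRound` unchanged, which is the point of the design: the model only ever sees and produces messages.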

Practical: Complete Message Flow

Using the "query Tsingtao Beer closing price" example from index.mjs, this section breaks down how the messages array changes at each stage.

Core Roles

Role	Who Produces It	Meaning
`user`	User/Frontend	The user's question
`assistant`	LLM	What the LLM says (or tool call instructions)
`tool`	Your Code	The execution result of the tool function

Stage 0: Initial messages

let messages = [{ role: 'user', content: 'What is the closing price of Tsingtao Beer?' }];

[
  { "role": "user", "content": "What is the closing price of Tsingtao Beer?" }
]

Just one user question, the starting point of the entire conversation.

Stage 1: First LLM Call → Corresponds to "Intent Recognition" in the Three-Stage Model

const response = await sendMessage(messages);

The LLM sees the user asking about a stock price. Its training data has no real-time prices, so it checks its tool list, finds get_closing_price, and returns tool_calls instead of an ordinary text answer:

{
  "role": "assistant",
  "content": "Let me check the closing price of Tsingtao Beer.",
  "tool_calls": [
    {
      "index": 0,
      "id": "call_00_B05dAElKlxAKyio9AXDN4208",
      "type": "function",
      "function": {
        "name": "get_closing_price",
        "arguments": "{\"name\": \"Tsingtao Beer\"}"
      }
    }
  ]
}

tool_calls Structure

Field	Meaning
`id`	Unique identifier for this call, used by the subsequent `tool` message to associate
`type`	Fixed as `"function"`
`function.name`	The function name to call
`function.arguments`	JSON string, the parameters passed

content and tool_calls can coexist — content is a polite remark to the user, tool_calls is the real action. But the LLM only "says" what to call, it doesn't execute it itself.

How does the LLM know what tools are available? — This is "Cognitive Implantation"

const res = await client.chat.completions.create({
    model: 'deepseek-v4-pro',
    messages,
    tools,          // ← Cognitive Implantation: JSON Schema describing the function manual
    tool_choice: 'auto'
});

The LLM doesn't understand what an API is, but it can read language. This is reducing functions to language — translating code functions into text descriptions the LLM can understand. tool_choice: 'auto' means letting the LLM decide whether to use a tool.
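Besides 'auto', the tool_choice field can also force a specific tool. The sketch below assumes an endpoint that follows the OpenAI Chat Completions conventions, where tool_choice accepts 'auto', 'none', 'required', or an object naming one function. Support varies by provider, so check your provider's documentation. `buildRequest` and its `forcedTool` parameter are hypothetical helpers for this illustration.

```javascript
// Builds the request body for an OpenAI-compatible chat completions call.
// `forcedTool` is optional: when given, the model must call that tool;
// when omitted, the model decides for itself ('auto').
function buildRequest(model, messages, tools, forcedTool) {
  return {
    model,
    messages,
    tools,
    tool_choice: forcedTool
      ? { type: 'function', function: { name: forcedTool } }
      : 'auto',
  };
}
```

The returned object can be passed to client.chat.completions.create. Forcing a tool is mostly useful in tests or in pipelines where the next step is already known. For open-ended chat, 'auto' keeps the decision with the LLM.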

Stage 2: Push assistant message

const message = response.choices[0].message;
messages.push({
    role: message.role,
    content: message.content,
    tool_calls: message.tool_calls
});

[
  { "role": "user", "content": "What is the closing price of Tsingtao Beer?" },
  {
    "role": "assistant",
    "content": "Let me check the closing price of Tsingtao Beer.",
    "tool_calls": [
      {
        "id": "call_xxx",
        "type": "function",
        "function": { "name": "get_closing_price", "arguments": "{\"name\":\"Tsingtao Beer\"}" }
      }
    ]
  }
]

This step tells the context: "The LLM just saw the user's question and decided to call get_closing_price('Tsingtao Beer')". This is part of the conversation history, needed for the second call.

Common Bug: If you push the assistant message twice here, the API rejects the request with an error such as "insufficient tool messages following tool_calls message". Every assistant message that carries tool_calls must be followed by a matching tool message for each call, and the duplicated assistant message has none.
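This pairing rule can be checked locally before sending the request. A minimal sketch follows. `findUnansweredToolCalls` is a hypothetical helper, and it only checks the rule as stated here (each assistant message with tool_calls is directly followed by tool messages covering every id). A real API may enforce more.

```javascript
// Returns a list of problems; an empty list means every tool call
// has a matching tool message right after it.
function findUnansweredToolCalls(messages) {
  const problems = [];
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i];
    if (msg.role !== 'assistant' || !msg.tool_calls?.length) continue;

    // Collect the ids answered by the run of tool messages that follows.
    const answered = new Set();
    for (let j = i + 1; j < messages.length && messages[j].role === 'tool'; j++) {
      answered.add(messages[j].tool_call_id);
    }

    for (const call of msg.tool_calls) {
      if (!answered.has(call.id)) {
        problems.push(`messages[${i}]: no tool message for ${call.id}`);
      }
    }
  }
  return problems;
}
```

Run against a history in which the assistant message was pushed twice, the first copy is reported, because the message right after it is another assistant message rather than a tool message.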

Stage 3: Execute Tool → Corresponds to "Code Intervention" in the Three-Stage Model

if (response.choices[0].message.tool_calls) {
    const toolCall = response.choices[0].message.tool_calls[0];
    // ...parse the arguments, run the tool, push the result (next snippets)
}

Why take [0]

tool_calls is an array; the LLM can request multiple tools at once. For example, if the user asks "Tsingtao Beer closing price and Beijing weather", the LLM might return two tool_calls. Taking [0] here is a simplification.
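Without that simplification, every entry in tool_calls needs its own tool message. The sketch below is one way to do it, assuming tools may be asynchronous (for example, real API calls) and are looked up in a `toolRegistry` map from tool name to a function taking the parsed arguments. Both `executeToolCalls` and `toolRegistry` are hypothetical names for this illustration.

```javascript
// Runs every requested tool and returns one `tool` message per call.
async function executeToolCalls(toolCalls, toolRegistry) {
  // Promise.all runs the tools concurrently and keeps the results
  // in the same order as the requests.
  return Promise.all(
    toolCalls.map(async (toolCall) => {
      const { name, arguments: rawArgs } = toolCall.function;
      const fn = toolRegistry[name];
      // Report unknown tools back to the LLM instead of crashing.
      const result = fn ? await fn(JSON.parse(rawArgs)) : `Unknown tool: ${name}`;
      return { role: 'tool', tool_call_id: toolCall.id, content: String(result) };
    })
  );
}
```

The returned messages can be appended with messages.push(...results). Because each one carries its own tool_call_id, the closing price and the weather result cannot be mixed up.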

Why JSON.parse

const args = JSON.parse(toolCall.function.arguments);
// "{\"name\": \"Tsingtao Beer\"}"  →  { name: "Tsingtao Beer" }
const price = get_closing_price(args.name);  // "67.92"

arguments is a string, not a JS object. Without parsing, you can't access the .name property.
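Because the arguments string is generated by the model, it is not guaranteed to be valid JSON, and a bare JSON.parse will throw on malformed output. A defensive sketch follows. `parseToolArguments` and its `{ ok, args, error }` return shape are assumptions made for this illustration.

```javascript
// Parses toolCall.function.arguments without throwing.
// Returns { ok: true, args } on success or { ok: false, error } on failure.
function parseToolArguments(toolCall) {
  try {
    const args = JSON.parse(toolCall.function.arguments);
    // The schema declares an object, so reject strings, arrays, and null.
    if (args === null || typeof args !== 'object' || Array.isArray(args)) {
      return { ok: false, error: 'Arguments must be a JSON object' };
    }
    return { ok: true, args };
  } catch (err) {
    return { ok: false, error: `Invalid JSON in arguments: ${err.message}` };
  }
}
```

On failure, one option is to send the error text back as the content of the tool message, which gives the LLM a chance to retry with corrected arguments.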

Push tool result

messages.push({
    role: 'tool',
    content: price,            // "67.92"
    tool_call_id: toolCall.id  // "call_xxx"
});

[
  { "role": "user", "content": "What is the closing price of Tsingtao Beer?" },
  { "role": "assistant", "content": "...", "tool_calls": [{"id": "call_xxx", ...}] },
  { "role": "tool", "content": "67.92", "tool_call_id": "call_xxx" }
]

tool_call_id pairs with tool_calls[0].id in the assistant message — the LLM uses this id to know that "67.92" is the result of the "closing price query", not a weather query.

Key: The result is not returned directly to the user, but returned to the LLM.

Stage 4: Second LLM Call

const finalRes = await sendMessage(messages);

The LLM sees the complete context — the user asked about stock price, it decided to call get_closing_price("Tsingtao Beer"), the tool returned "67.92" — so it digests this into natural language:

{
  "role": "assistant",
  "content": "The closing price of Tsingtao Beer is 67.92 yuan."
}

Complete Flow Diagram

How the messages array changes (4 messages, 2 API calls):

[user]
  │
  ├─ 1st sendMessage ────────── Intent Recognition: LLM decides which tool to call
  │
  ▼
[user, assistant(tool_calls)]
  │
  ├─ Your code executes get_closing_price ── Code Intervention: actually executes the tool
  │
  ▼
[user, assistant(tool_calls), tool(67.92)]
  │
  ├─ 2nd sendMessage ────────── LLM digests the result, generates response
  │
  ▼
[user, assistant(tool_calls), tool, assistant]

Difference Between the Two Calls

	First Call	Second Call
LLM Role	Decision-maker — whether to call a tool, which one	Summarizer — turns tool results into human language
Corresponding Stage	Intent Recognition	Wrap-up after Code Intervention
Does messages contain `tool`?	❌	✅
Return Value	`tool_calls` (instructions to call tools)	Pure `content` (answer for the user)
Who Executes	LLM only gives instructions	LLM only interprets results

Essence

The deeper meaning of Agent = LLM + Tools:

LLM is the brain: makes decisions, understands language, generates responses
Tools are the hands and feet: query databases, call APIs, manipulate files
Application code is the nervous system: connects the brain and limbs, translates the LLM's decisions into function calls, translates tool results back into conversation context

The entire Tool Use mechanism is the message-passing protocol between these three. The "AI is smart" illusion that users see comes from this carefully designed three-stage pipeline: the LLM handles decision-making and expression, tools handle execution, and what connects them is the handful of JSON messages with different roles in the messages array.