Backend

The Agent Tool Stack: Why Your Runtime Needs Intent Routing, Not Just Function Calling

By lizhongxuan · Jun 26, 2026

Read original on juejin.cn ↗ Google Translate ↗ Alt translation

As AI agents move from demos to production, the tool layer becomes the critical bottleneck. Western developers building multi-model, multi-tenant agent systems face the same problems this guide addresses: token bloat from exposing all tools, model mis-selection from vague tool names, and security risks from ungoverned tool execution. The routing patterns here are directly applicable to any agent framework, regardless of vendor choice.

Summary

A deep technical guide argues that production-grade Agent Runtimes must move beyond the naive "model calls a function" pattern. The real engineering challenge is controlling tool visibility, execution, and result projection across a growing matrix of vendor-hosted tools, custom functions, and MCP servers.

The guide maps out the full tool landscape across OpenAI, Anthropic, Gemini, Mistral, xAI, Alibaba Cloud, Z.AI, and DeepSeek — showing that each vendor's built-in tools (web search, code interpreter, file search) have fundamentally different execution models from custom function calling. The critical architectural insight: business code should declare capability intent ("I need web.search"), not vendor-specific payloads.

A production Runtime needs three routing layers: intent routing (what does the user need?), capability routing (which provider or implementation to use?), and execution routing (hosted, local function, MCP, or human approval?). The guide provides concrete TypeScript interfaces for a Capability Registry, Policy Engine, and Provider Adapter, plus a state machine for the tool execution loop with hard constraints on iterations, costs, and latency.

Takeaways

— Tools are not API wrappers but five-layer contracts: capability, input, execution, output, and governance.

— Vendor-hosted tools (web_search, code_interpreter) execute on the provider's server; custom function calling executes in your Runtime — they have different lifecycles and must be handled separately.

— Production Runtimes need three routing layers: Intent Routing (what capability?), Capability Routing (which implementation?), and Execution Routing (who executes?).

— Business code should declare capability intent ("web.search") and let a Provider Adapter translate it into vendor-specific payloads.

— Tool results must be projected (compressed, structured, citation-preserved) before re-entering the model context to avoid token explosion and noise.

— Tool execution is a state machine with hard constraints: max iterations (3-8), max calls per turn (5-20), max wall time, max cost, and max context tokens from tools.

— Parallel tool execution is essential for latency but must respect dependency groups and side-effect classifications.

— External content from url.fetch must be marked untrusted and isolated from sensitive write tools to prevent indirect prompt injection.

— Every tool call should be a trace span recording input, output, latency, cost, and policy decision.

— Tool eval must measure selection quality, argument correctness, evidence grounding, cost efficiency, and safety — not just final answer accuracy.

Conclusions

The most overlooked capability is tool_search — dynamically loading tools from a registry rather than stuffing all schemas into context. This shifts the tool registry from a static JSON array to a search engine.

Anthropic's tool_choice design (auto, required, forbidden) is a pattern every Runtime should adopt for high-risk enterprise scenarios. The model should not always be allowed to call tools.

The guide's decision table for when to use vendor-hosted vs. custom vs. MCP tools is a practical framework that most agent tutorials skip. The key variable is control over search indexing, ranking, and citation strategy.

The separation of web.search from url.fetch is a security pattern that many production systems miss. Fetching arbitrary URLs is a fundamentally different risk profile from searching indexed content.

The guide's emphasis on structured evidence output (JSON with claims, confidence, source type) over plain text is a strong architectural opinion. It enables better auditing, eval, and UI rendering.

The cost governance section highlights a non-obvious danger: multi-round tool loops can cause costs to balloon non-linearly because each round carries the full history. Evidence stores and artifact references are the mitigation.

The anti-pattern of letting the model decide permissions is a recurring failure in agent demos. The guide correctly assigns permission decisions to the Runtime's Policy Engine, not the model.

The guide's unified capability model (ToolIntent, ExecutionMode, ToolCandidate, ToolRouteContext) is a concrete pattern that could be adopted by any agent framework like LangChain, CrewAI, or AutoGen.

Concepts & terms

Tool Routing

The process of dynamically selecting which tools to expose to a model in a given turn, based on intent detection, capability registry, policy engine, and execution mode. Prevents token waste, model mis-selection, and security risks from exposing all tools.

Result Projector

A component that compresses, structures, and adds citations to tool results before they re-enter the model context. Prevents token explosion, noise, and sensitive data leakage from large tool outputs.

Capability Registry

A centralized store of all available tools, their intents, providers, execution modes, risk classes, and priority scores. Enables the Runtime to dynamically discover and select tools without hardcoding vendor-specific payloads.

Provider Adapter

A translation layer that converts a unified tool intent (e.g., 'web.search') into vendor-specific payloads (e.g., OpenAI's {type: 'web_search'}, Gemini's {type: 'google_search'}). Keeps business code vendor-agnostic.

tool_search

A capability in OpenAI's Responses API that allows the model to dynamically retrieve and load tools from a deferred registry, rather than having all tool schemas injected into the context upfront. Reduces token consumption and improves selection accuracy.

MCP (Model Context Protocol)

An open protocol that standardizes how Agent Runtimes discover and execute tools, resources, and prompts from remote servers. Solves the problem of managing 50+ tools across different teams and systems.

Source: juejin.cn ↗ Google Translate ↗ Backup ↗