The Agent Tool Stack: Why Your Runtime Needs Intent Routing, Not Just Function Calling
As AI agents move from demos to production, the tool layer becomes the critical bottleneck. Western developers building multi-model, multi-tenant agent systems face the same problems this guide addresses: token bloat from exposing every tool schema on every request, wrong tool selection caused by vague tool names, and security risks from ungoverned tool execution. The routing patterns here apply to any agent framework, regardless of vendor choice.
A deep technical guide argues that production-grade Agent Runtimes must move beyond the naive "model calls a function" pattern. The real engineering challenge is controlling tool visibility, execution, and result projection across a growing matrix of vendor-hosted tools, custom functions, and MCP servers.
The guide maps out the full tool landscape across OpenAI, Anthropic, Gemini, Mistral, xAI, Alibaba Cloud, Z.AI, and DeepSeek — showing that each vendor's built-in tools (web search, code interpreter, file search) have fundamentally different execution models from custom function calling. The critical architectural insight: business code should declare capability intent ("I need web.search"), not vendor-specific payloads.
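A minimal sketch of what declaring capability intent could look like. ToolIntent is a name the guide uses, but its fields here, along with CapabilityId, ProviderAdapter, and the placeholder payload, are illustrative assumptions and do not reproduce the guide's interfaces or any vendor's real request format.

```typescript
// Hypothetical capability identifiers; the guide's actual naming may differ.
type CapabilityId = "web.search" | "code.execute" | "file.search";

// What business code declares: the capability it needs, never a vendor payload.
interface ToolIntent {
  capability: CapabilityId;
  input: Record<string, unknown>;
}

// Each adapter translates an intent into whatever its provider expects.
interface ProviderAdapter {
  provider: string;
  supports(capability: CapabilityId): boolean;
  toVendorPayload(intent: ToolIntent): Record<string, unknown>;
}

// Placeholder payload shape: NOT a real vendor request format.
const exampleAdapter: ProviderAdapter = {
  provider: "example-vendor",
  supports: (capability) => capability === "web.search",
  toVendorPayload: (intent) => ({
    tool: "hosted_search_placeholder",
    arguments: intent.input,
  }),
};

// Pick the first adapter that supports the requested capability.
function buildPayload(
  intent: ToolIntent,
  adapters: ProviderAdapter[],
): Record<string, unknown> {
  const adapter = adapters.find((a) => a.supports(intent.capability));
  if (!adapter) {
    throw new Error(`No adapter supports ${intent.capability}`);
  }
  return adapter.toVendorPayload(intent);
}

const payload = buildPayload(
  { capability: "web.search", input: { query: "MCP server security" } },
  [exampleAdapter],
);
```

Business code only ever constructs the ToolIntent; swapping vendors means adding or replacing an adapter, not touching call sites.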
A production Runtime needs three routing layers: intent routing (what does the user need?), capability routing (which provider or implementation to use?), and execution routing (hosted, local function, MCP, or human approval?). The guide provides concrete TypeScript interfaces for a Capability Registry, Policy Engine, and Provider Adapter, plus a state machine for the tool execution loop with hard constraints on iterations, costs, and latency.
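The three layers can be sketched roughly as follows. The type names ExecutionMode, ToolCandidate, and ToolRouteContext come from the guide, but every field, the keyword-based intent matcher, and the cheapest-first selection rule are assumptions made for illustration.

```typescript
type ExecutionMode = "hosted" | "local_function" | "mcp" | "human_approval";

interface ToolCandidate {
  capability: string;
  provider: string;
  mode: ExecutionMode;
  estimatedCostUsd: number;
}

interface ToolRouteContext {
  tenantId: string;
  allowedModes: ExecutionMode[];
  maxCostUsd: number;
}

// Layer 1, intent routing: a naive keyword match standing in for a real classifier.
function routeIntent(userMessage: string): string | null {
  if (/\b(search|latest|news)\b/i.test(userMessage)) return "web.search";
  if (/\b(run|execute|compute)\b/i.test(userMessage)) return "code.execute";
  return null;
}

// Layer 2, capability routing: cheapest candidate that fits the tenant's context.
function routeCapability(
  capability: string,
  candidates: ToolCandidate[],
  ctx: ToolRouteContext,
): ToolCandidate | null {
  const eligible = candidates
    .filter((c) => c.capability === capability)
    .filter((c) => ctx.allowedModes.includes(c.mode))
    .filter((c) => c.estimatedCostUsd <= ctx.maxCostUsd)
    .sort((a, b) => a.estimatedCostUsd - b.estimatedCostUsd);
  return eligible[0] ?? null;
}

// Layer 3, execution routing: dispatch on how the chosen candidate runs.
function routeExecution(candidate: ToolCandidate): string {
  switch (candidate.mode) {
    case "hosted":
      return `delegate to ${candidate.provider}`;
    case "local_function":
      return "invoke in-process handler";
    case "mcp":
      return `call MCP server ${candidate.provider}`;
    case "human_approval":
      return "enqueue for human review";
  }
}

const candidates: ToolCandidate[] = [
  { capability: "web.search", provider: "vendor-a", mode: "hosted", estimatedCostUsd: 0.01 },
  { capability: "web.search", provider: "internal-search", mode: "mcp", estimatedCostUsd: 0.002 },
];
const ctx: ToolRouteContext = {
  tenantId: "tenant-1",
  allowedModes: ["hosted", "mcp"],
  maxCostUsd: 0.05,
};

const capability = routeIntent("search for the latest MCP spec changes");
const chosen = capability ? routeCapability(capability, candidates, ctx) : null;
const plan = chosen ? routeExecution(chosen) : "answer without tools";
```

With these inputs the plan resolves to the cheaper MCP-backed candidate. Keeping the layers as separate functions means each can be tested and replaced independently.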
The most overlooked capability is tool_search: loading tools dynamically from a registry on demand rather than stuffing every schema into context. This turns the tool registry from a static JSON array into something queried like a search engine.
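As a rough illustration of the registry-as-search idea, the sketch below ranks tools by simple term overlap and exposes only the top matches. ToolDescriptor, searchTools, and the sample registry are hypothetical; a production registry would more likely use embeddings or a proper search index.

```typescript
interface ToolDescriptor {
  toolName: string;
  description: string;
  tags: string[];
}

// Score each tool by how many query terms appear in its name, description, or tags.
function searchTools(
  registry: ToolDescriptor[],
  query: string,
  limit = 3,
): ToolDescriptor[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  return registry
    .map((tool) => {
      const haystack = [tool.toolName, tool.description, ...tool.tags]
        .join(" ")
        .toLowerCase();
      const score = terms.filter((t) => haystack.includes(t)).length;
      return { tool, score };
    })
    .filter((entry) => entry.score > 0) // drop tools with no matching term
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, limit)
    .map((entry) => entry.tool);
}

const registry: ToolDescriptor[] = [
  { toolName: "invoice.lookup", description: "Find an invoice by customer or number", tags: ["billing"] },
  { toolName: "web.search", description: "Search indexed web content", tags: ["research"] },
  { toolName: "calendar.create", description: "Create a calendar event", tags: ["scheduling"] },
];

// Only the schemas returned here would be placed in the model's context.
const visibleTools = searchTools(registry, "find customer invoice");
```

Here only invoice.lookup matches, so the other two schemas never consume context tokens for this request.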
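Anthropic's tool_choice design, which lets the caller leave the decision to the model (auto), force a tool call (any, or tool for one specific tool), or forbid tool use entirely (none), is a pattern every Runtime should adopt for high-risk enterprise scenarios. The model should not always be allowed to call tools.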
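One way a Runtime might pick the mode for each turn is sketched below. The mode labels, RiskLevel, TurnContext, and the rules themselves are runtime-level assumptions; a provider adapter would still have to translate the result into each vendor's own tool_choice values.

```typescript
// Runtime-level labels, deliberately vendor-neutral (hypothetical).
type ToolChoiceMode = "auto" | "required" | "forbidden";
type RiskLevel = "low" | "medium" | "high";

interface TurnContext {
  risk: RiskLevel;
  userConfirmed: boolean;
  needsFreshData: boolean;
}

// The Runtime, not the model, decides whether tools may be called this turn.
function decideToolChoice(ctx: TurnContext): ToolChoiceMode {
  // High-risk turns get no tool access until a human has confirmed.
  if (ctx.risk === "high" && !ctx.userConfirmed) return "forbidden";
  // When the answer depends on fresh data, do not let the model skip the lookup.
  if (ctx.needsFreshData) return "required";
  return "auto";
}

const modeForRiskyTurn = decideToolChoice({
  risk: "high",
  userConfirmed: false,
  needsFreshData: true,
});
```

The ordering matters: the risk check runs first, so a need for fresh data can never override a missing confirmation.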
The guide's decision table for when to use vendor-hosted vs. custom vs. MCP tools is a practical framework that most agent tutorials skip. The deciding factor is how much control you need over search indexing, ranking, and citation strategy.
The separation of web.search from url.fetch is a security pattern that many production systems miss. Fetching arbitrary URLs is a fundamentally different risk profile from searching indexed content.
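To make the different risk profile concrete, here is a sketch of a pre-flight check a Runtime could apply to url.fetch but would not need for web.search. FetchPolicy, checkFetchUrl, and the specific rules are assumptions. The check is deliberately incomplete: it does not cover redirects, DNS rebinding, or IPv6 literals beyond what the allowlist already rejects.

```typescript
interface FetchPolicy {
  allowedHosts: string[]; // exact hostnames this tenant may fetch from
}

interface FetchDecision {
  allowed: boolean;
  reason: string;
}

function checkFetchUrl(rawUrl: string, policy: FetchPolicy): FetchDecision {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return { allowed: false, reason: "unparseable URL" };
  }

  if (parsed.protocol !== "https:") {
    return { allowed: false, reason: "only https is permitted" };
  }

  const host = parsed.hostname.toLowerCase();

  // Reject loopback, private, and link-local IPv4 literals (e.g. cloud metadata endpoints).
  const isIpv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
  const isPrivateIpv4 =
    isIpv4 &&
    (/^(10|127)\./.test(host) ||
      /^192\.168\./.test(host) ||
      /^169\.254\./.test(host) ||
      /^172\.(1[6-9]|2\d|3[01])\./.test(host));
  if (host === "localhost" || isPrivateIpv4) {
    return { allowed: false, reason: "internal address" };
  }

  // Default deny: anything not explicitly allowlisted is refused.
  if (!policy.allowedHosts.includes(host)) {
    return { allowed: false, reason: "host not on allowlist" };
  }
  return { allowed: true, reason: "host is allowlisted" };
}

const fetchPolicy: FetchPolicy = { allowedHosts: ["docs.example.com"] };
const decision = checkFetchUrl("https://docs.example.com/page", fetchPolicy);
```

A search capability only ever returns what an index already holds, whereas a fetch capability lets the model name the target, which is why the allowlist sits in the Runtime.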
The guide's emphasis on structured evidence output (JSON with claims, confidence, source type) over plain text is a strong architectural opinion. It enables better auditing, eval, and UI rendering.
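A structured evidence record might look something like the sketch below. The guide names claims, confidence, and source type as fields; the exact shape, the SourceType values, sourceRef, and the validation rules here are assumptions.

```typescript
type SourceType = "web" | "file" | "database" | "model_inference";

interface EvidenceClaim {
  claim: string;
  confidence: number; // assumed to be in the range 0..1
  sourceType: SourceType;
  sourceRef: string; // URL, file id, or other pointer to the source
}

interface EvidenceResult {
  capability: string;
  claims: EvidenceClaim[];
}

const SOURCE_TYPES: readonly string[] = ["web", "file", "database", "model_inference"];

// Runtime check so malformed tool output is rejected before it reaches the model or UI.
function isEvidenceClaim(value: unknown): value is EvidenceClaim {
  if (typeof value !== "object" || value === null) return false;
  const { claim, confidence, sourceType, sourceRef } = value as Record<string, unknown>;
  return (
    typeof claim === "string" &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1 &&
    typeof sourceType === "string" &&
    SOURCE_TYPES.includes(sourceType) &&
    typeof sourceRef === "string"
  );
}

function parseEvidence(json: string): EvidenceResult {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("evidence must be a JSON object");
  }
  const { capability, claims } = data as Record<string, unknown>;
  if (typeof capability !== "string") {
    throw new Error("missing capability");
  }
  if (!Array.isArray(claims) || !claims.every(isEvidenceClaim)) {
    throw new Error("invalid claims");
  }
  return { capability, claims };
}

const evidence = parseEvidence(
  '{"capability":"web.search","claims":[{"claim":"Example claim","confidence":0.8,' +
    '"sourceType":"web","sourceRef":"https://example.com/article"}]}',
);
```

Because every claim carries its own confidence and source, an auditor or eval harness can filter and score claims without parsing free text.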
The cost governance section highlights a non-obvious danger: multi-round tool loops can cause costs to balloon non-linearly because each round resends the full history, including every earlier tool result. The mitigation is to keep large outputs in an evidence store and pass compact artifact references in context instead.
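To see why the growth is non-linear, assume each round resends a fixed base prompt plus every earlier tool result. Round k then costs base + (k - 1) × result tokens, so the total over n rounds grows with n². The token counts, the budget, and both function names below are illustrative assumptions, not figures from the guide.

```typescript
// Total input tokens across all rounds when every round carries the prior results.
function totalInputTokens(
  rounds: number,
  baseTokens: number,
  perResultTokens: number,
): number {
  let total = 0;
  for (let round = 1; round <= rounds; round++) {
    // Round k resends the base prompt plus the (k - 1) earlier results.
    total += baseTokens + (round - 1) * perResultTokens;
  }
  return total;
}

// Hard constraint: stop at whichever comes first, the round cap or the token budget.
function maxRoundsWithinBudget(
  budgetTokens: number,
  baseTokens: number,
  perResultTokens: number,
  hardCap: number,
): number {
  let total = 0;
  let rounds = 0;
  while (rounds < hardCap) {
    const nextRoundCost = baseTokens + rounds * perResultTokens;
    if (total + nextRoundCost > budgetTokens) break;
    total += nextRoundCost;
    rounds++;
  }
  return rounds;
}

// Full tool outputs (4000 tokens each) kept in history.
const fullHistory = totalInputTokens(8, 2000, 4000); // 128000
// Outputs moved to an evidence store; only a ~100-token reference stays in context.
const withReferences = totalInputTokens(8, 2000, 100); // 18800

const roundsFullHistory = maxRoundsWithinBudget(60000, 2000, 4000, 10); // 5
const roundsWithReferences = maxRoundsWithinBudget(60000, 2000, 100, 10); // 10
```

Under these assumed numbers, eight rounds cost 128,000 input tokens with full history and 18,800 with references. The same 60,000-token budget allows five rounds in the first case and hits the ten-round cap in the second.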
The anti-pattern of letting the model decide permissions is a recurring failure in agent demos. The guide correctly assigns permission decisions to the Runtime's Policy Engine, not the model.
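A minimal sketch of keeping permission decisions in the Runtime. Note that nothing the model says is an input to the decision. PolicyRule, ToolCallRequest, the role names, and the default-deny rule are all assumptions for illustration, not the guide's Policy Engine interface.

```typescript
type PolicyDecision = "allow" | "deny" | "require_approval";

// Built by the Runtime from session state, never from model output.
interface ToolCallRequest {
  capability: string;
  tenantId: string;
  callerRoles: string[];
}

interface PolicyRule {
  capability: string;
  requiredRole: string;
  needsApproval: boolean;
}

function evaluatePolicy(
  request: ToolCallRequest,
  rules: PolicyRule[],
): PolicyDecision {
  const rule = rules.find((r) => r.capability === request.capability);
  // Default deny: a capability with no rule is never executed.
  if (!rule) return "deny";
  if (!request.callerRoles.includes(rule.requiredRole)) return "deny";
  return rule.needsApproval ? "require_approval" : "allow";
}

const rules: PolicyRule[] = [
  { capability: "web.search", requiredRole: "analyst", needsApproval: false },
  { capability: "payments.refund", requiredRole: "finance", needsApproval: true },
];

const refundDecision = evaluatePolicy(
  { capability: "payments.refund", tenantId: "tenant-1", callerRoles: ["finance"] },
  rules,
);
```

The model can still request payments.refund, but whether it runs, waits for approval, or is refused depends only on rules and caller identity held by the Runtime.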
The guide's unified capability model (ToolIntent, ExecutionMode, ToolCandidate, ToolRouteContext) is a concrete pattern that could be adopted in any agent framework, such as LangChain, CrewAI, or AutoGen.