
How to Build an Agent Tool System That Won't Collapse After 5 Tools

The project introduced in this article is dskcode — a terminal CLI tool for an AI programming assistant based on DeepSeek, implemented in TypeScript, targeting domestic users only. All design details and code examples in this article come from the actual implementation of dskcode.

Install and try: npm install -g dskcode or npx dskcode

Follow-up note: This article is based on the src/tool/ module implementation of agent-cli. The code is TypeScript, but the design ideas are language-agnostic — any LLM agent connecting tools cannot avoid the "tool system" layer.

Assumed audience: You can already get an LLM to call tools like bash and read_file, but things break down after connecting 5 tools: parameter validation keeps leaking, parallel execution gets tangled, error codes are all over the place, and write tools occasionally escape into ~/.ssh. If you are looking for how to make an agent's tool system robust, this article is for you.

The essence of an Agent tool system = declarative tool registry + strongly typed contracts + unified result protocol + sandboxed execution context. Get these four things right, and connecting 80 tools is as stable as connecting 8.
Three core principles: The runtime doesn't know the type, but the tool itself does (type erasure + factory pattern inference) / The schema the LLM sees and the types the developer writes come from the same source (JSONSchema is generated automatically from the Zod schema) / Any tool failure must be an unambiguous ToolResult (success and failure share the same structure, with an error code).
Key trade-off: Using Zod yields an extremely concise "write once, produce three things" (TS type + JSONSchema + validator), at the cost of introducing a ~50KB dependency — worth it.

1. Why an agent tool system deserves its own article

The trouble with an LLM agent tool system is not "can you get it to work," but "can it keep working without crashing."

After an agent runs for a while, the toolset looks like this:

read_file   # Read file
write_file  # Write file
edit_file   # Precise replacement
bash        # Run command
grep        # Search code
glob        # Find files
fetch       # Fetch web pages
...         # 30+ more, mostly user-defined

When connecting 5 tools, each tool parses, validates, and try/catches on its own; the code volume is 5×80=400 lines, manageable.

When connecting 30 tools, you suddenly find:

For the same "file too large to read" error, some tools return FILE_TOO_LARGE, some return TOOL_ERROR, and some simply throw.
Some tools' args are { path: string }, others quietly use { filePath } or { filepath }.
When the LLM gives wrong parameters (path written as Path), some tools silently fall back, others crash outright.
bash times out without fully killing the process; grep runs on a 100MB log and blows up memory.
The security boundary for write tools relies on each tool remembering to enforce it; within a few days, one forgets.

graph LR
    subgraph Real state without a system
      A1[read_file tool<br/>self-manages exceptions]
      A2[bash tool<br/>self-manages exceptions]
      A3[grep tool<br/>self-manages exceptions]
      A4[fetch tool<br/>self-manages exceptions]
      A5[Your user_defined tool<br/>...also self-manages]
    end
    A1 -.Reworked three times.-> Z[Each tool has<br/>500 lines of boilerplate]
    A2 -.Reworked three times.-> Z
    A3 -.Reworked three times.-> Z
    A4 -.Reworked three times.-> Z

This is the problem the "tool system" abstraction layer solves: Centralizing cross-tool boilerplate code and all the things every tool needs to care about into one place.

2. Design Goals: What kinds of problems the tool system needs to solve

Clearly defining the "scope of responsibility" for the tool system prevents the implementation from bloating into a big ball of mud. Our goals are only four categories:

Category	Problem Solved	Corresponding in `src/tool/`
Registration	Tools have names, descriptions, and invocation methods, and can be found by the framework	`registry.ts`
Types	A tool's own args/return values are strongly typed, but the registry can store "any tool"	`types.ts`'s `AgentTool` + `AnyAgentTool`
Validation	Are the parameters given by the LLM correct? If not, tell it how to fix them	`schema-validator.ts` + `zod-schema-validator.ts`
Sandbox	Cross-cutting concerns like path safety, timeout, output truncation, binary detection	`sandbox.ts` + `eol.ts` + `diff.ts`

Things outside this scope — like "tool dependency injection," "tool remote invocation," "tool trace persistence," "tool dashboard" — we resolutely do not do. Once these are done, the tool system degenerates into "building another microservice framework."

Below, we expand on these four categories one by one.

3. Registration: The three things `ToolRegistry` does

The registry is essentially a Map with filtering capabilities. Three actions:

// src/tool/registry.ts (simplified)
export class ToolRegistry {
  readonly #tools = new Map<string, AnyAgentTool>();

  // ① Register
  register<I, O>(tool: AgentTool<I, O>): this {
    return this.registerErased(eraseTool(tool));   // Automatically erases type
  }

  // ② Look up by name
  get(name: string): AnyAgentTool | undefined {
    if (!this.#isToolEnabled(name)) return undefined;
    return this.#tools.get(name);
  }

  // ③ List available
  list(): AnyAgentTool[] {
    const result: AnyAgentTool[] = [];
    for (const [name, tool] of this.#tools) {
      if (this.#isToolEnabled(name)) result.push(tool);
    }
    return result;
  }
}

The real trouble is "filtering" — three layers of enable checks

list() is not a simple Map.values(). In the "dynamic assembly" scenario of an agent, a tool's availability is composed:

#isToolEnabled(name: string): boolean {
  const tool = this.#tools.get(name);
  if (!tool) return false;

  // 1. Disabled by user in config? — e.g., bash disabled in a project
  if (this.#disabledNames.has(name)) return false;

  // 2. Feature Flag off? — e.g., a tool is still experimental
  if (!this.#featureFlagChecker(name)) return false;

  // 3. Provider compatibility? — e.g., this tool only supports Anthropic
  if (this.#provider && tool.supportedProviders.length > 0) {
    if (!tool.supportedProviders.includes(this.#provider)) return false;
  }

  return true;
}

Hidden in this code is a non-obvious design decision: the three checks are combined with AND, not OR.

Consider this scenario: The user wrote disabledTools: ["bash"] in the config (project policy disables bash), and the tool itself is also marked supportedProviders: ["anthropic"] (only supports Claude). A tool is enabled only when it passes every layer: not disabled by the user, feature flag on, and provider supported. If any one layer fails, it's unavailable.

If OR were used, passing a single layer would be enough: bash would come back as soon as its feature flag was on, and the user's explicit disable would be silently overridden.
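To make the AND-versus-OR point concrete, here is a minimal sketch. Every name in it (the sets, `providerSupport`, `currentProvider`, `enabledAnd`, `enabledOr`) is hypothetical and only mirrors the shape of `#isToolEnabled`; it is not the registry's actual code.

```typescript
// Hypothetical data standing in for config, feature flags, and provider info.
type EnableCheck = (toolName: string) => boolean;

const disabledNames = new Set(["bash"]);        // disabled by the user in config
const experimental = new Set(["web_search"]);   // feature flag currently off
const providerSupport: Record<string, string[]> = {
  bash: [],                                     // empty list = every provider
  read_file: [],
  web_search: ["anthropic"],
};
const currentProvider = "deepseek";

const checks: EnableCheck[] = [
  (name) => !disabledNames.has(name),
  (name) => !experimental.has(name),
  (name) => {
    const supported = providerSupport[name] ?? [];
    return supported.length === 0 || supported.includes(currentProvider);
  },
];

// AND: every layer can veto the tool.
const enabledAnd = (name: string): boolean => checks.every((check) => check(name));

// OR: a single passing layer switches the tool on, which defeats the veto.
const enabledOr = (name: string): boolean => checks.some((check) => check(name));

console.log(enabledAnd("read_file")); // true
console.log(enabledAnd("bash"));      // false: the user's disable wins
console.log(enabledOr("bash"));       // true: the disable is overridden
```

Expressing the layers as a list of predicates also keeps a future fourth layer to a one-line addition.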

The registry also provides a convenience API for "bucketing by kind"

listByKind(kind: ToolKind): AnyAgentTool[] { ... }
listReadTools(): AnyAgentTool[]    { return this.listByKind(ToolKind.Read); }
listWriteTools(): AnyAgentTool[]   { return this.list().filter((t) => !isReadOnly(t.kind)); }

This is not decorative convenience — it is a key dependency for the agent's main loop: Read tools can be parallel / Write tools must be serial. See § 8 for details.

4. Types: `AgentTool<I, O>` and Type Erasure

This is one of the most central design decisions of the entire tool system.

The tool declares its own type

// src/tool/types.ts (excerpt)
export interface AgentTool<I, O extends ToolResult = ToolResult> {
  readonly name: string;
  readonly kind: ToolKind;
  readonly parameters: JSONSchema;
  readonly description: string;

  // Key: The tool's own execute signature, strongly typed with I
  execute(args: I, ctx: ToolContext): Promise<O>;

  initialTitle?(args: I): string;
  // ...
}

Each tool's execute(args, ctx) has a complete TS type in its own file — read_file sees { path: string, startLine?: number }, edit_file sees { path: string, old_text: string, new_text: string }. If written wrong, the IDE reports an error directly.

But the registry can only store "any tool" — here the type must be erased

// src/tool/types.ts (excerpt)
export interface AnyAgentTool {
  readonly name: string;
  readonly description: string;
  readonly kind: ToolKind;
  readonly parameters: JSONSchema;
  readonly supportsInputStreaming: boolean;
  readonly supportedProviders: string[];

  // Key: args become unknown, execute internally asserts
  execute(args: unknown, ctx: ToolContext): Promise<ToolResult>;
  initialTitle?(args: unknown): string;
}

If the registry kept generics, every entry would carry its own I and O, and a single Map cannot hold differently parameterized tools unless every place using the registry knows "oh, this item is readFileTool". The essence of this erasure is: decoupling "finding a tool" from "using a tool" at the type level.

Erasure itself is just one step:

export function eraseTool<I, O extends ToolResult = ToolResult>(
  tool: AgentTool<I, O>,
): AnyAgentTool {
  return {
    get name() { return tool.name; },
    get description() { return tool.description; },
    // ...
    async execute(args: unknown, ctx: ToolContext): Promise<ToolResult> {
      // Internal assertion: the Registry caller knows the name anyway
      return tool.execute(args as I, ctx);
    },
    initialTitle(args: unknown): string {
      return tool.initialTitle?.(args as I) ?? tool.name;
    },
  };
}

After the caller gets an AnyAgentTool, the compiler no longer checks args. The caller passes arguments it knows are right for that tool name, and the cast back to I happens inside execute:

const readTool = registry.get("read_file")!;
// readTool is AnyAgentTool, but we know it's read_file
const result = await readTool.execute({ path: "src/main.ts" }, ctx);
//                                          ^ Strong typing only holds when we know it ourselves

This "upstream strong typing + registry erasure + pre-execution assertion" is a trade-off — safer than "all any", more flexible than "all with generics".

What is the cost of type erasure?

The only cost: When calling execute(args), if args are external input (parsed from LLM JSON), you cannot rely on the TS type system for protection. This is exactly what the next section solves — runtime schema validation.

5. Validation: The dual-track design of Zod and JSONSchema

The parameters given by the LLM are structured JSON, but it can make mistakes — giving read_file a { path: 42 } (number instead of string), missing an old_text field, timeout given as "abc", etc. How to make the LLM self-correct?

Validation results must be structured and feedable back to the LLM

// src/tool/schema-validator.ts (excerpt)
export interface ValidationIssue {
  path: string;         // JSONPath-style locator: `$.path`, `$.items[2].name`
  expected: string;     // "string" / "present" / "enum[A,B,C]" / "length >= 3"
  received: string;      // Value description truncated to 60 chars, to avoid log explosion
  message: string;       // Human-readable Chinese, directly fed to LLM
}

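Note that in dskcode the message is written in Chinese. The rationale given is that the model self-corrects more readily from Chinese error messages; treat that as an observation about this setup (DeepSeek, Chinese-language prompts) rather than a general property of LLMs, and write messages in the language your own prompts use. The path field uses a JSONPath-style locator ($.items[2].name, not an RFC 6901 JSON Pointer) to tell the model precisely "which field is wrong."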
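As an illustration of how a `ValidationIssue[]` might be turned into feedback for the next conversation round, here is a small sketch. `describeReceived` and `formatIssuesForModel` are hypothetical helper names, and the 60-character limit follows the comment in the interface above; the real formatting in dskcode may differ.

```typescript
interface ValidationIssue {
  path: string;
  expected: string;
  received: string;
  message: string;
}

/** Describes a received value, truncated so one bad argument cannot flood the context. */
function describeReceived(value: unknown, maxLength = 60): string {
  let text: string;
  try {
    text = JSON.stringify(value) ?? String(value); // stringify(undefined) yields undefined
  } catch {
    text = String(value); // e.g. circular structures or BigInt
  }
  return text.length > maxLength ? text.slice(0, maxLength - 1) + "…" : text;
}

/** Renders issues as a numbered list, one line per problem field. */
function formatIssuesForModel(toolName: string, issues: ValidationIssue[]): string {
  const lines = issues.map(
    (issue, index) =>
      `${index + 1}. ${issue.path}: expected ${issue.expected}, received ${issue.received}. ${issue.message}`,
  );
  return [`Arguments for "${toolName}" failed validation:`, ...lines].join("\n");
}

const feedback = formatIssuesForModel("read_file", [
  {
    path: "$.path",
    expected: "string",
    received: describeReceived(42),
    message: "path must be a string",
  },
]);
console.log(feedback);
```

Keeping the locator at the start of each line lets the model map every complaint to one field without parsing prose.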

Dual track: Both JSONSchema and Zod work, outputs are isomorphic

There are two ways to declare a tool's parameter schema —

// Method A: JSONSchema (used by the old 8 built-in tools)
export const readFileTool: AgentTool<ReadFileArgs> = {
  name: "read_file",
  kind: ToolKind.Read,
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "..." },
      startLine: { type: "number", description: "..." },
    },
    required: ["path"],
    additionalProperties: false,
  },
  // ...
};

// Method B: Zod-first (recommended for new tools)
import { z } from "zod";
import { defineTool } from "../zod-schema-validator.js";

const ReadFileSchema = z.object({
  path: z.string().min(1, "path cannot be empty"),
  startLine: z.number().int().min(1).optional(),
  endLine: z.number().int().min(1).optional(),
});

export const readFileZodTool = defineTool<z.infer<typeof ReadFileSchema>>({
  name: "read_file_zod",
  kind: ToolKind.Read,
  schema: ReadFileSchema,         // Write once, produce three things
  async execute(args, ctx) { ... }
});

The essence of defineTool: Write the Zod schema once, and simultaneously get TS type + JSONSchema + validator. Single source of truth, cutting maintenance work in half later.

graph LR
    A[Zod schema<br/>z.object... x]
    A -->|z.infer| B[TS Type I]
    A -->|zodSchemaToJSONSchema| C[JSONSchema<br/>fed to LLM]
    A -->|Runtime check _def+safeParse| D[Zod validation path<br/>to ValidationIssue]
    A -.Written only once in source.-> E["Want to add a field?<br/>Change only one place"]

    A2[JSONSchema<br/>handwritten]
    A2 -->|Lightweight recursive validator| D2[Self-built validation path<br/>isomorphic output ValidationIssue]

Isomorphism is more important than you think

There are three downstream consumers of validation failure:

LLM — fed into the next round of conversation for correction
Reflector — decides whether to reflect / retry
UI — shows the developer "which step went wrong"

None of these three consumers should care "whether this tool uses Zod or JSONSchema" — they only face ValidationIssue[].

Implementation uses a duck-typing detection branch:

// src/tool/schema-validator.ts (excerpt)
export function validateArgs(args: unknown, schema: unknown): ValidationResult {
  // Duck-type detection of Zod schema — avoids static import dragging weight
  if (isZodSchema(schema)) {
    return zodSafeValidate(args, schema as never);
  }

  // Otherwise, go through JSONSchema lightweight validation
  if (!isPlainObject(schema)) return { ok: true, issues: [] };
  // ...
}

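function isZodSchema(schema: unknown): schema is ZodType {
  // Guard first: the `in` operator throws a TypeError on null and primitives
  if (typeof schema !== "object" || schema === null) return false;
  const obj = schema as Record<string, unknown>;
  return (
    "_def" in obj &&
    typeof obj.parse === "function" &&
    typeof obj.safeParse === "function"
  );
}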

Why duck typing instead of instanceof z.ZodType? Because statically importing zod would make schema-validator pull in zod at load time, and any project not writing Zod tools would also bear this dependency. Duck detection allows zod to be loaded "on demand" only in tools that actually use it.

The `integer ↔ number` interoperability detail

JS has no real int. 1 is number, and JSON.parse also gives number. If the schema writes integer, the LLM gives 1, and validate says "expected integer, got number" — this is a false positive.

So:

function typeMatches(actual: string, expected: string): boolean {
  if (actual === expected) return true;
  if (expected === "integer" && actual === "number") return true;
  if (expected === "number" && actual === "integer") return true;
  return false;
}

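jsonTypeOf distinguishes integer/number, but validation treats them interchangeably, which matches how LLMs actually emit numbers. The trade-off is that the integer-accepts-number rule also lets 1.5 through where the schema says integer; if that matters for a tool, add an explicit Number.isInteger check.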
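For readers who want to see the JSONSchema track end to end, the following is a minimal sketch of a lightweight object validator that emits `ValidationIssue` records. It is not the code in `schema-validator.ts`: the `ObjectSchema` shape and `validateObject` are assumptions, only flat objects are handled, and `typeMatches` here is a deliberately stricter variant that widens integer to number but not the reverse.

```typescript
interface ValidationIssue { path: string; expected: string; received: string; message: string; }

// Assumed, simplified schema shape: flat objects with primitive-typed properties.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required?: string[];
  additionalProperties?: boolean;
}

function jsonTypeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (typeof value === "number") return Number.isInteger(value) ? "integer" : "number";
  return typeof value;
}

// Stricter variant: an integer is an acceptable number, but 1.5 is not an integer.
function typeMatches(actual: string, expected: string): boolean {
  return actual === expected || (expected === "number" && actual === "integer");
}

function validateObject(args: unknown, schema: ObjectSchema): ValidationIssue[] {
  if (jsonTypeOf(args) !== "object") {
    return [{ path: "$", expected: "object", received: jsonTypeOf(args), message: "arguments must be a JSON object" }];
  }
  const record = args as Record<string, unknown>;
  const issues: ValidationIssue[] = [];

  for (const key of schema.required ?? []) {
    if (!(key in record)) {
      issues.push({ path: `$.${key}`, expected: "present", received: "missing", message: `required field "${key}" is missing` });
    }
  }

  for (const [key, value] of Object.entries(record)) {
    // hasOwnProperty avoids treating inherited names like "constructor" as declared fields.
    const declared = Object.prototype.hasOwnProperty.call(schema.properties, key);
    if (!declared) {
      if (schema.additionalProperties === false) {
        issues.push({ path: `$.${key}`, expected: "no such field", received: jsonTypeOf(value), message: `unknown field "${key}"` });
      }
      continue;
    }
    const expected = schema.properties[key].type;
    const actual = jsonTypeOf(value);
    if (!typeMatches(actual, expected)) {
      issues.push({ path: `$.${key}`, expected, received: actual, message: `"${key}" must be of type ${expected}` });
    }
  }
  return issues;
}

const readFileSchema: ObjectSchema = {
  type: "object",
  properties: { path: { type: "string" }, startLine: { type: "integer" } },
  required: ["path"],
  additionalProperties: false,
};

const bad = validateObject({ Path: "a.ts", startLine: 1.5 }, readFileSchema);
console.log(bad.map((issue) => issue.path)); // [ '$.path', '$.Path', '$.startLine' ]
console.log(validateObject({ path: "a.ts", startLine: 1 }, readFileSchema).length); // 0
```

Because `jsonTypeOf` already classifies `1` as integer, the strict variant avoids the false positive described above without accepting fractional values.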

6. Sandbox: Cross-cutting concerns centralized in `sandbox.ts`

Read tools, write tools, and bash tools all care about "path safety" and "timeout" — these must be centralized. I've seen people write if (path.startsWith("~")) expand... in every tool; this is a nightmare.

6.1 Path resolution and `@` references

// src/tool/sandbox.ts (excerpt)

/** Strip a leading `@` reference marker */
export function stripMentionPrefix(inputPath: string): string {
  if (inputPath.startsWith("@")) return inputPath.slice(1);
  return inputPath;
}

export function resolvePath(inputPath: string, cwd: string): string {
  const stripped = stripMentionPrefix(inputPath);
  const resolved = isAbsolute(stripped) ? stripped : resolve(cwd, stripped);
  return resolve(resolved);   // Secondary normalize
}

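Why the second resolve? It is a normalization guard for the absolute branch: when the input is already absolute, the isAbsolute check passes it through untouched, so a path like /a/b/../c would otherwise keep its .. segment. path.resolve collapses it to /a/c. For the relative branch, resolve(cwd, stripped) has already normalized the path, so the second call changes nothing there. Note that resolve neither expands ~ nor follows symbolic links; it only produces a clean absolute path that the subsequent realpath can compare reliably.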

The @ detail saved me — in the system prompt, @<path> is syntactic sugar for "file path reference," but the LLM faithfully passes @test.ts as-is to the tool.
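The behaviour of `path.resolve` that this section relies on can be checked directly. The sketch below uses `path.posix` so the results are the same on every platform; `expandHome` and `resolveToolPath` are hypothetical helpers (the excerpt from `sandbox.ts` does not show any `~` handling), included only to show where home expansion would have to happen if a tool wants it.

```typescript
import { posix } from "node:path";

function stripMentionPrefix(inputPath: string): string {
  return inputPath.startsWith("@") ? inputPath.slice(1) : inputPath;
}

/** Hypothetical helper: expands a leading "~" against an explicit home directory. */
function expandHome(inputPath: string, home: string): string {
  if (inputPath === "~") return home;
  if (inputPath.startsWith("~/")) return posix.join(home, inputPath.slice(2));
  return inputPath;
}

/** Hypothetical variant of resolvePath that also expands "~". */
function resolveToolPath(inputPath: string, cwd: string, home: string): string {
  const cleaned = expandHome(stripMentionPrefix(inputPath), home);
  // resolve() accepts relative and absolute input alike and collapses ".." segments.
  return posix.resolve(cwd, cleaned);
}

console.log(posix.resolve("/a/b/../c"));                            // /a/c
console.log(posix.resolve("/work", "~/.ssh/id_rsa"));               // /work/~/.ssh/id_rsa
console.log(resolveToolPath("@src/main.ts", "/work", "/home/u"));   // /work/src/main.ts
console.log(resolveToolPath("~/.ssh/id_rsa", "/work", "/home/u"));  // /home/u/.ssh/id_rsa
```

The second line is the one to watch: without explicit expansion, `~` is just a directory name under `cwd`.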

6.2 Write tools must `confine` within a whitelist

import { isAbsolute, relative, sep } from "node:path";

export async function confine(
  allowedRoots: string[],
  target: string,
): Promise<{ ok: true } | { ok: false; error: string }> {
  if (allowedRoots.length === 0) return { ok: true };   // No restrictions configured, allow

  const realTarget = await realPath(target);   // realPath: project helper that resolves symlinks

  for (const root of allowedRoots) {
    const realRoot = await realPath(root);
    const rel = relative(realRoot, realTarget);
    // Inside the root means: the root itself (rel === ""), or a relative path that
    // neither climbs out via ".." nor is absolute (e.g. another drive on Windows).
    // Matching ".." as a whole segment keeps names like "..cache" valid.
    const escapes = rel === ".." || rel.startsWith(".." + sep) || isAbsolute(rel);
    if (rel === "" || !escapes) return { ok: true };
  }

  return { ok: false, error: `Path "${target}" is not within the allowed write scope ${allowedRoots.join(", ")}` };
}

Tool write code:

if (ctx.writeRoots && ctx.writeRoots.length > 0) {
  const conf = await confine(ctx.writeRoots, filePath);
  if (!conf.ok) {
    return { success: false, data: conf.error, error: "OUTSIDE_WRITE_ROOTS" };
  }
}

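This means even if the LLM targets an absolute path such as /home/user/.ssh/authorized_keys, confine rejects it. Keep in mind that path.resolve does not expand ~, so a literal ~/.ssh/authorized_keys resolves to a ~ directory under cwd unless the tool expands the home directory before calling resolvePath.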

realPath is a lifesaving detail — if there are symbolic links in the path, like /var/www -> /home/user/www, resolving via realpath prevents bypassing.
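Why `confine` compares with `relative()` rather than a plain string prefix is easiest to see side by side. The sketch below is string-only and uses `path.posix` for platform-independent results; `isInsideByPrefix` and `isInside` are hypothetical names, and a real check still needs the realpath step first so that symlinks are resolved before comparing.

```typescript
import { posix } from "node:path";

/** Naive check: looks right, but treats "/workspace" as being inside "/work". */
function isInsideByPrefix(root: string, target: string): boolean {
  return target.startsWith(root);
}

/** Containment via relative(): inside means the relative path never climbs out of root. */
function isInside(root: string, target: string): boolean {
  const rel = posix.relative(root, target);
  if (rel === "") return true;                              // target is the root itself
  if (rel === ".." || rel.startsWith("../")) return false;  // climbs out of the root
  return !posix.isAbsolute(rel);                            // defensive; matters on win32
}

console.log(isInsideByPrefix("/work", "/workspace/secret.txt")); // true (wrong)
console.log(isInside("/work", "/workspace/secret.txt"));         // false
console.log(isInside("/work", "/work/src/a.ts"));                // true
console.log(isInside("/work", "/work/../etc/passwd"));           // false
console.log(isInside("/work", "/work/..cache/x"));               // true
```

The last case shows why `..` is matched as a whole path segment: a directory that merely starts with two dots is still inside the root.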

6.3 Timeout abort: External signal + built-in timer

export function createTimeoutSignal(signal?: AbortSignal, timeoutMs = 30_000): AbortController {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  controller.signal.addEventListener("abort", () => clearTimeout(timer), { once: true });
  // An already-aborted signal never fires "abort" again, so check it up front
  if (signal?.aborted) controller.abort();
  else signal?.addEventListener("abort", () => controller.abort(), { once: true });
  return controller;
}

Why link the external signal and the internal timer? The user might click "Stop" in the UI (externalAbort), or bash itself might run over 30s (timeoutAbort); either trigger must abort the child process.

6.4 Two-stage kill for bash child processes

Bash's execCommand also does something brutal:

const timeout = setTimeout(() => {
  child.kill("SIGTERM");
  setTimeout(() => {
    child.kill("SIGKILL");   // If not exited after 5 seconds, force kill
  }, 5000);
}, timeoutMs);

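SIGTERM gives a 5s graceful exit window; if the process is still alive after that, SIGKILL follows. This avoids "fake timeouts" (the process is not really dead and its resources are not released). One caveat for commands like npm install or cargo build that spawn child-of-child processes: child.kill only signals the direct child. On POSIX, reaching the whole tree typically means spawning with detached: true and signalling the process group via process.kill(-child.pid, signal); the excerpt above does not show whether execCommand does this.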

7. Result Protocol: `ToolResult` success and failure share the same structure

An LLM agent's tool results must obey one iron rule: Success and failure both return objects of the same shape. Otherwise the model sees errors in one format and normal output in another, and its handling of failures gets less reliable the more tools it uses.

Core fields of `ToolResult`

// src/tool/types.ts (excerpt)
export interface ToolResult {
  success: boolean;        // The only flag for success or failure
  data: string;            // Content for the LLM to see (output on success, error on failure)
  error?: string;          // Error classification tag, LLM doesn't need to care, but the program does
  diff?: FileDiff;         // Only carried by file modification tools
  summary?: string;        // One-line summary for UI (LLM doesn't see)
  issues?: ValidationIssue[];   // Carried when schema validation fails
  denial?: ToolDenial;     // v0.6 new: permission denial details
}

Every tool error must have a machine-readable error code:

Error Scenario	`success`	`error`	`data`
File not found	`false`	`READ_ERROR`	`"Read file failed: ENOENT..."`
File too large	`false`	`FILE_TOO_LARGE`	`"File too large (15.3MB)..."`
Binary file	`false`	`BINARY_FILE`	`"Looks like a binary file..."`
old_text not found	`false`	`TEXT_NOT_FOUND`	`"Text to replace not found..."`
old_text appears multiple times	`false`	`TEXT_MULTIPLE_MATCHES`	`"Appears multiple times, please provide more context..."`
Outside writeRoots	`false`	`OUTSIDE_WRITE_ROOTS`	`"Path not within allowed scope..."`
Command exited non-0	`false`	`EXIT_CODE_1`	`"...\n[Exit code: 1]"`
Success	`true`	(none)	Tool output

Error codes should be stable identifiers, not descriptive messages: the error field exists for programmatic branching decisions. If the LLM wants to read error details, it reads them from data.
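What "programmatic branching" on the error field might look like in the agent loop is sketched below. The `NextStep` values and the policy table are hypothetical and not taken from dskcode; only the error codes come from the table above.

```typescript
interface ToolResult {
  success: boolean;
  data: string;
  error?: string;
}

// Hypothetical set of follow-up actions the main loop could take.
type NextStep = "continue" | "let_model_retry" | "ask_user" | "abort_turn";

// Hypothetical policy: which stable error code leads to which action.
const policy = new Map<string, NextStep>([
  ["INVALID_ARGS", "let_model_retry"],
  ["TEXT_NOT_FOUND", "let_model_retry"],
  ["TEXT_MULTIPLE_MATCHES", "let_model_retry"],
  ["FILE_TOO_LARGE", "let_model_retry"],
  ["OUTSIDE_WRITE_ROOTS", "ask_user"],
  ["EXECUTION_ERROR", "abort_turn"],
]);

function nextStep(result: ToolResult): NextStep {
  if (result.success) return "continue";
  // Unknown or missing codes fall back to the most conservative action.
  return policy.get(result.error ?? "") ?? "abort_turn";
}

console.log(nextStep({ success: true, data: "ok" }));                                         // continue
console.log(nextStep({ success: false, data: "Not found...", error: "TEXT_NOT_FOUND" }));     // let_model_retry
console.log(nextStep({ success: false, data: "Path ...", error: "OUTSIDE_WRITE_ROOTS" }));    // ask_user
console.log(nextStep({ success: false, data: "???", error: "SOMETHING_NEW" }));               // abort_turn
```

A lookup like this only works if the codes never change wording, which is the reason to keep human-readable detail in `data`.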

`data` must always be directly readable by the LLM

Don't throw Error, don't return undefined, don't call JSON.stringify(obj) (LLMs choke on large object descriptions).

On failure, data is a readable error description that the LLM can see in the next conversation round and self-correct — for example:

return {
  success: false,
  data: "Text to replace not found. Please confirm old_text matches the file content exactly (including indentation and spaces).",
  error: "TEXT_NOT_FOUND",
};

Seeing this, the LLM will likely re-read the file with read_file or correct old_text in the next round, rather than getting stuck.

`summary` for UI, `data` for LLM — two separate lines

return {
  success: true,
  data: `File edited: ${filePath}\nReplacement location: Line 120...`,   // ← Feed LLM
  summary: `📝 Modified: foo.ts (+3 -5)`,                    // ← Feed UI
  diff,                                                  // ← Feed UI diff view
};

Why two lines? The UI cannot usefully render the full output of a large edit (a diff of a 10MB file, say), but the LLM needs enough detail to verify that the modification is correct. data is the "LLM-friendly version", summary is the "human-friendly version", and diff is the "tool-friendly version" (structured data for the diff view).

8. Tool Parallelism and Serialization: Starting from `ToolKind`

The actual capability of a tool system is not just "can it be invoked", but also "can it be invoked smartly". An engineering agent running 3 read_file calls simultaneously is a common need; running 3 edit_file calls editing the same file simultaneously is a disaster.

`ToolKind` is a semantic classification

export enum ToolKind {
  Read = "read",       // Pure read, no side effects, parallelizable
  Edit = "edit",       // File/directory content editing
  Delete = "delete",   // Deletion
  Move = "move",       // Rename/move
  Other = "other",     // bash, fetch fallback
}

export function isReadOnly(kind: ToolKind): boolean {
  return kind === ToolKind.Read;
}

The primary job of this field is for the agent main loop to use as a switch for deciding parallelism:

const reads = registry.listReadTools();
const writes = registry.listWriteTools();

// Main loop:
// 1. Fan-out all read-type calls in parallel (independent IO)
// 2. Writes must wait for the previous batch to complete before executing (single-threaded serial)

Why this division? Read operations have no external side effects: five read_file calls hitting two different files in the same second leave the system state unchanged. Write operations have side effects (one write can overwrite another, and order matters), so they must run serially.
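One way the main loop could turn this rule into a schedule is sketched below. It assumes the model's call order must be preserved, so only consecutive read calls are grouped into a parallel batch; `PlannedCall`, `planBatches`, and `runBatches` are hypothetical names, not dskcode APIs.

```typescript
type Kind = "read" | "edit" | "delete" | "move" | "other";

interface PlannedCall {
  id: string;
  tool: string;
  kind: Kind;
}

/** Groups consecutive read calls into one batch; every other call gets a batch of its own. */
function planBatches(calls: PlannedCall[]): PlannedCall[][] {
  const batches: PlannedCall[][] = [];
  let readBatch: PlannedCall[] = [];
  for (const call of calls) {
    if (call.kind === "read") {
      readBatch.push(call);
      continue;
    }
    if (readBatch.length > 0) {
      batches.push(readBatch); // flush pending reads before the write runs
      readBatch = [];
    }
    batches.push([call]);
  }
  if (readBatch.length > 0) batches.push(readBatch);
  return batches;
}

/** Runs batches in order; calls inside one batch run concurrently. */
async function runBatches<T>(
  batches: PlannedCall[][],
  run: (call: PlannedCall) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (const batch of batches) {
    results.push(...(await Promise.all(batch.map(run))));
  }
  return results;
}

const plan = planBatches([
  { id: "1", tool: "read_file", kind: "read" },
  { id: "2", tool: "read_file", kind: "read" },
  { id: "3", tool: "grep", kind: "read" },
  { id: "4", tool: "edit_file", kind: "edit" },
  { id: "5", tool: "bash", kind: "other" },
  { id: "6", tool: "read_file", kind: "read" },
]);
console.log(plan.map((batch) => batch.map((call) => call.tool).join("+")));
// [ 'read_file+read_file+grep', 'edit_file', 'bash', 'read_file' ]
```

The trailing read_file is not merged into the first batch because it comes after the edit and may depend on it.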

Another hidden benefit: UI categorized display

If tool calls in the terminal are rendered grouped by ToolKind, the user can see at a glance "now reading / now writing":

✓ read_file      src/main.ts        (1ms)
✓ read_file      src/utils.ts       (2ms)
✓ grep           "TODO"             (8ms, 3 matches)
── Now starting writes ──
✓ edit_file      src/main.ts        (+3 -1)
✓ bash           npm test           (exit 0)

This "read / write" grouping is not showing off — it gives the user a sense of which phase the agent is currently in.

9. End-to-End Assembly

Putting all the above parts together looks like this:

graph TB
    subgraph Registration Phase
      RT["readFileTool: AgentTool<br/>strictly typed"]
      ET["editFileTool: AgentTool<br/>strictly typed"]
      BT["bashTool: AgentTool<br/>strictly typed"]
    end

    subgraph "Registry (registry.ts)"
      REG["ToolRegistry<br/>tools: Map"]
    end

    subgraph "Scheduling Phase (Agent Main Loop)"
      SCHED["Caller<br/>1. registry.listReadTools() / listWriteTools()<br/>2. validateArgs(args, tool.parameters)<br/>3. Permission gate check (optional unified entry point)<br/>4. tool.execute(args, ctx)"]
    end

    subgraph Validation
      VJ["JSONSchema validation<br/>→ ValidationIssue[]"]
      VZ["Zod validation<br/>→ ValidationIssue[]"]
    end

    subgraph Sandbox
      SB["ToolContext {<br/>cwd, signal, timeout,<br/>writeRoots[] }<br/>+ sandbox.ts helpers"]
    end

    RT -->|eraseTool| REG
    ET -->|eraseTool| REG
    BT -->|eraseTool| REG
    REG --> SCHED
    SCHED -->|validate schema| VJ
    SCHED -->|validate schema| VZ
    SCHED -->|execute| SB
    SB -->|ToolResult| SCHED
    SCHED -->|deny → lastDenial| GT["Permission Gate<br/>(unified policy entry point)"]
    SCHED -->|success / fail| LLM[Next round LLM]

The code for a complete invocation looks like this:

// ① Find tool
const tool = registry.get(toolName);
if (!tool) return { success: false, data: `Tool "${toolName}" does not exist`, error: "TOOL_NOT_FOUND" };

// ② Validate parameters (using the tool's own schema, JSONSchema or Zod both work)
const validation = validateArgs(rawArgs, (tool as any).schema ?? tool.parameters);
if (!validation.ok) {
  return {
    success: false,
    data: "Parameter validation failed:\n" + validation.issues.map((i) => i.message).join("\n"),
    error: "INVALID_ARGS",
    issues: validation.issues,
  };
}

// ③ Optional: Permission gate check (if a unified policy entry point exists; skip this section if not passed)
const gate = dispatchOptions.gate;     // Gate held by call context, this section does not depend on specific implementation
if (gate && !(await gate.check(toolName, rawArgs))) {
  return {
    success: false,
    data: `Permission denied: ${gate.lastDenial?.reason ?? "Unknown reason"}`,
    error: "GATE_DENIED",
    denial: gate.lastDenial,
  };
}

// ④ Prepare context
const ctx: ToolContext = {
  cwd: process.cwd(),
  signal: externalSignal,
  timeout: 30_000,
  writeRoots: config.writeRoots,   // Optional
};

// ⑤ Execute
// ⑥ Safety net: any exception a tool throws is wrapped into EXECUTION_ERROR
//    (registry.execute does the same wrapping, see registry.ts)
try {
  return await tool.execute(rawArgs, ctx);
} catch (err) {
  // A thrown value is not guaranteed to be an Error instance
  const message = err instanceof Error ? err.message : String(err);
  return {
    success: false,
    data: `Exception: ${message}`,
    error: "EXECUTION_ERROR",
  };
}

This is the entirety of the core scheduling logic — the rest is the tools' own business.

10. Example of a Real Tool: How `edit_file` is written "by the rules"

Pick edit_file to explain, because it involves three cross-cutting concerns: sandbox, diff, and eol, best illustrating the point:

// src/tool/builtins/edit-file.ts (excerpt)
export const editFileTool: AgentTool<EditFileArgs> = {
  name: "edit_file",
  kind: ToolKind.Edit,
  description: "Performs exact string replacements on a file...",
  parameters: {
    type: "object",
    properties: {
      path:        { type: "string", description: "..." },
      old_text:    { type: "string", description: "..." },
      new_text:    { type: "string", description: "..." },
    },
    required: ["path", "old_text", "new_text"],
    additionalProperties: false,    // ← Prevents LLM from giving extra fields
  },

  async execute(args, ctx) {
    // ① Quick parameter check (defensive, formal validation is during scheduling)
    if (!args.path)   return { success: false, data: "Missing path",   error: "INVALID_ARGS" };
    if (typeof args.old_text !== "string") return { success: false, data: "...",  error: "INVALID_ARGS" };
    if (typeof args.new_text !== "string") return { success: false, data: "...",  error: "INVALID_ARGS" };

    // ② Path resolution
    const filePath = resolvePath(args.path, ctx.cwd);

    // ③ Sandbox: Write operations must be under whitelist roots
    if (ctx.writeRoots && ctx.writeRoots.length > 0) {
      const conf = await confine(ctx.writeRoots, filePath);
      if (!conf.ok) return { success: false, data: conf.error, error: "OUTSIDE_WRITE_ROOTS" };
    }

    // ④ Read + LF normalization (original file might be CRLF, LLM is used to LF)
    const content = await readFile(filePath, "utf-8");
    const contentN = toLf(content);
    const oldTextN = toLf(args.old_text);

    // ⑤ Uniqueness matching — error if appears 0 times or 2+ times
    const first = contentN.indexOf(oldTextN);
    if (first === -1) return { success: false, data: "Not found...", error: "TEXT_NOT_FOUND" };
    if (contentN.indexOf(oldTextN, first + 1) !== -1) {
      return { success: false, data: "Appears multiple times...", error: "TEXT_MULTIPLE_MATCHES" };
    }

    // ⑥ Replace and restore original EOL
    const newContentN =
      contentN.slice(0, first) +
      toLf(args.new_text) +
      contentN.slice(first + oldTextN.length);
    const writtenContent = normalizeEol(content, newContentN);
    await writeFile(filePath, writtenContent, "utf-8");

    // ⑦ Compute diff for UI, plus the 1-based line where the replacement starts
    const diff = computeFileDiff(content, writtenContent, filePath);
    const startLine = contentN.slice(0, first).split("\n").length;

    return {
      success: true,
      data: `File edited: ${filePath}\nReplacement location: Line ${startLine}\nChanges: +${diff.additions} -${diff.deletions}`,
      summary: `📝 Modified: ${basename(filePath)} (+${diff.additions} -${diff.deletions})`,
      diff,
    };
  },
};

A few details worth naming:

additionalProperties: false: prevents the LLM from passing extra fields, which are mostly hallucinated.
Dual error codes: TEXT_NOT_FOUND and TEXT_MULTIPLE_MATCHES are two completely different situations. The LLM's coping strategy differs (for the former it re-reads the file to confirm old_text, for the latter it adds context), so they must be distinguished.
CRLF/LF normalization: this is the biggest pain point for Windows users. toLf flattens EOL for matching, and normalizeEol restores the original file's EOL when writing the replacement back, so the LLM isn't repeatedly tripped up by platform differences.
diff is part of the result, not a side effect: computing the diff only takes a few dozen milliseconds, but the UI experience improvement is huge (changes are clear at a glance).
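The article uses `toLf` and `normalizeEol` from `eol.ts` without showing them. The following is a guess at their behaviour, written only to make the CRLF round trip concrete: the "dominant EOL wins" rule and the `lineOfOffset` helper are assumptions, not the project's implementation.

```typescript
/** Converts CRLF line endings to LF so matching is platform-independent. */
function toLf(text: string): string {
  return text.replace(/\r\n/g, "\n");
}

/** Re-applies the original file's dominant EOL style to LF-normalized text. */
function normalizeEol(original: string, lfText: string): string {
  const crlfCount = (original.match(/\r\n/g) ?? []).length;
  const lfOnlyCount = (original.match(/\n/g) ?? []).length - crlfCount;
  return crlfCount > lfOnlyCount ? lfText.replace(/\n/g, "\r\n") : lfText;
}

/** 1-based line number of a character offset in LF-normalized text. */
function lineOfOffset(lfText: string, offset: number): number {
  return lfText.slice(0, offset).split("\n").length;
}

const original = "a\r\nb\r\nc\r\n";          // a CRLF file as it sits on disk
const contentN = toLf(original);             // "a\nb\nc\n"
const first = contentN.indexOf("b");
const edited = contentN.slice(0, first) + "B" + contentN.slice(first + 1);
const written = normalizeEol(original, edited);

console.log(JSON.stringify(written));        // "a\r\nB\r\nc\r\n"
console.log(lineOfOffset(contentN, first));  // 2
```

Matching on the LF form and restoring the EOL only at write time means the model's `old_text` never has to contain a carriage return.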

11. Trade-offs and Pitfalls Encountered

Pitfall 1: What is the real cost of introducing Zod?

zod is about 50KB (before gzip). If your project would rather not depend on Zod, pure JSONSchema is fully supported: the 8 built-in tools are all JSONSchema and ran for half a year without problems.

However, with JSONSchema every new tool has to be written three times (JSONSchema + TS interface + validation), and keeping the three in sync is particularly tiring. Zod-first gets it right in one go; the human time saved alone earns back that 50KB.

The correct choice: Use Zod-first for new tools, leave old tools untouched, keep all interfaces. That's exactly what we did (_examples/read-file-zod.ts is a template left for posterity).

Pitfall 2: Tool thrown exceptions should be swallowed

If a tool throws (throw new Error), the Registry intercepts the exception and wraps it into EXECUTION_ERROR. This may look overly lenient, but it actually protects the tool author: a tool that forgets try/catch won't crash the entire agent.

Counter-example: Some agent frameworks let exceptions bubble up, so tool authors must wrap everything in try/catch themselves. The result is finger-pointing over whose code threw.

Pitfall 3: Is parallel reading really safe?

read_file is marked Read, so in theory it is parallelizable. But if the LLM reads foo.ts and a log that another call in the same round generates, there is a logical dependency. We leave that ordering to the model; the tool system does not try to manage it.

Conversely, can the model get a write tool treated as Read (trying to be lazy)? No: kind is declared by the tool itself (kind: ToolKind.Edit), not supplied by the model. Whatever the model calls, the scheduler follows the serial path for the declared kind. The model controls the arguments, which schema validation checks; it has no say over kind.

Pitfall 4: Don't stuff objects into `ToolResult.data`

// ❌ Wrong: LLM sees a bunch of JSON it doesn't recognize
return { success: true, data: JSON.stringify(someComplexObject) };

// ✓ Correct: Human language description + key facts
return {
  success: true,
  data: `Found 3 matches:\n1. src/a.ts:12 → const foo = bar\n2. src/b.ts:5 → ...`,
};

In our experience, LLMs handle "natural language + numbered lists" well and often misread raw JSON dumps. The data field is, by default, meant for the LLM to read, so write it in an LLM-friendly way.
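
As an illustration of the "correct" form, a search-style tool could render its hits with a small formatter like the one below. The Match shape and the formatMatches name are hypothetical and exist only for this sketch.

```typescript
// Hypothetical shape of one search hit.
interface Match {
  file: string;
  line: number;
  text: string;
}

// Renders hits as a short headline plus a numbered list, the form
// recommended above for ToolResult.data.
function formatMatches(matches: Match[]): string {
  if (matches.length === 0) return "No matches found.";
  const noun = matches.length === 1 ? "match" : "matches";
  const lines = matches.map(
    (m, i) => `${i + 1}. ${m.file}:${m.line} → ${m.text.trim()}`,
  );
  return [`Found ${matches.length} ${noun}:`, ...lines].join("\n");
}

const data = formatMatches([
  { file: "src/a.ts", line: 12, text: "const foo = bar" },
  { file: "src/b.ts", line: 5, text: "  export const foo = 1" },
]);
// data is:
// Found 2 matches:
// 1. src/a.ts:12 → const foo = bar
// 2. src/b.ts:5 → export const foo = 1
```

The structured object stays inside the tool; only the rendered string goes into data.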

Pitfall 5: Tool description is the system prompt the LLM sees

I initially thought description was documentation for developers. Later I found that it is the basis on which the LLM decides "whether to call you". A poorly written description will make the LLM never call the tool (or call it at random).

Good practices:

Clearly state applicable scenarios: "Suitable for viewing source code, configuration files, and other text files" is more useful than a bare "Read file".
Clearly state forbidden scenarios: "Do not use cat/type to read files, please use read_file". This carves out the tool's niche against similar tools.
Point out error correction paths: "If old_text appears multiple times, please provide more context". This is fed directly to the LLM as a hint.
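
Put together, a description following these three practices might look like the sketch below. The wording is illustrative and not copied from dskcode; in particular, the last sentence is a hypothetical error-correction hint invented for this example.

```typescript
// Hypothetical description for read_file, one sentence per practice.
const readFileDescription = [
  // What the tool does.
  "Read a text file and return its contents.",
  // Applicable scenarios.
  "Suitable for viewing source code, configuration files, and other text files.",
  // Forbidden scenarios: steer the model away from a competing tool.
  "Do not use cat/type through bash to read files; use read_file instead.",
  // Error correction path (assumed wording).
  "If the path does not exist, list the parent directory first and retry with a corrected path.",
].join(" ");
```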

Pitfall 6: The `@` reference marker must be unified early

I've seen projects where each tool does if (path.startsWith("@")) path = path.slice(1) on its own. Two tools forgot to strip the prefix, so the @xxx.ts reference syntax promised in the system prompt was broken for them.

// Solution: Inject stripMentionPrefix into resolvePath, no one can bypass it
export function resolvePath(inputPath: string, cwd: string): string {
  const stripped = stripMentionPrefix(inputPath);   // ← Mandatory step
  // ...
}

Cross-cutting concerns should be centralized and implemented once. Every place that resolves a file path for writing should go through resolvePath rather than calling path.resolve(cwd, args.path) on its own.
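
For reference, a minimal version of the two helpers might look like this. The body of stripMentionPrefix is an assumption, and the real resolvePath performs further steps that are elided in the snippet above; this sketch replaces them with a plain path.resolve.

```typescript
import * as path from "node:path";

// Assumed behavior: strip a single leading "@" so that "@src/a.ts" and
// "src/a.ts" resolve identically. An "@" elsewhere in the path is kept.
function stripMentionPrefix(inputPath: string): string {
  return inputPath.startsWith("@") ? inputPath.slice(1) : inputPath;
}

// Single entry point for path resolution; tools never call path.resolve directly.
function resolvePath(inputPath: string, cwd: string): string {
  const stripped = stripMentionPrefix(inputPath); // mandatory step
  return path.resolve(cwd, stripped);
}
```

With this in place, resolvePath("@src/a.ts", cwd) and resolvePath("src/a.ts", cwd) return the same absolute path, and a new tool gets the @ handling without knowing it exists.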

12. Conclusion: The "Constitution" of the Tool System

Looking back at the entire design, it is the embodiment of several universal principles:

Unified Contract: The single invocation method AgentTool.execute(args, ctx) supports all 8+ tools.
Type Erasure: AnyAgentTool + eraseTool decouple "finding a tool" from "using a tool" at the type level.
Centralized Policy: Sandbox confine is always the last gate, and no one is allowed to write their own copy.
Isomorphic Results: ToolResult.success/data/error has the same shape for success and failure, and in both cases what the LLM sees is a string description.
Default Policy: ToolKind.Other catches all non-file tools.

This tool system itself also has a few "constitutional" level rules:

Separation of Concerns: Sandbox logic stays in the sandbox, validation in validation, invocation in invocation. They must not be stuffed together.
Isomorphic Protocol: Success/Failure, Zod/JSONSchema, and Built-in/UDF must each share a unified interface.
Degradable, Not Crashable: A schema mismatch returns issues the LLM can use to retry, an out-of-bounds file access returns an error code the UI can surface, and exceptions are wrapped into EXECUTION_ERROR by Registry.execute instead of crashing the process.
Cross-cutting concerns must be centralized: Path, timeout, output truncation, EOL, and diff handling only need to be written once, and no tool is allowed to copy them.
Type safety is degradable: Each tool keeps its own strong types, which are erased to AnyAgentTool when the tool enters the Registry; the caller asserts the type back by name. The cost is that args must pass through schema validation as a second line of defense.
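
The type erasure rule can be sketched as follows. AnyAgentTool and eraseTool are names from the article, but everything else is a simplified assumption: AgentTool carries only one type parameter here, the parse method stands in for schema validation, and the ToolResult shape and the INVALID_ARGS code are hypothetical.

```typescript
// Assumed, simplified result shape.
interface ToolResult {
  success: boolean;
  data?: string;
  error?: { code: string; message: string };
}

// A strongly typed tool: I is the argument type the tool author works with.
interface AgentTool<I> {
  name: string;
  parse(args: unknown): I | null; // hypothetical validator: null means invalid
  execute(args: I): ToolResult;
}

// The erased form the registry stores: arguments are unknown until validated.
interface AnyAgentTool {
  name: string;
  execute(args: unknown): ToolResult;
}

// Erases I, but keeps validation in front of the typed execute.
function eraseTool<I>(tool: AgentTool<I>): AnyAgentTool {
  return {
    name: tool.name,
    execute(args: unknown): ToolResult {
      const parsed = tool.parse(args);
      if (parsed === null) {
        return {
          success: false,
          error: { code: "INVALID_ARGS", message: `${tool.name}: arguments failed validation` },
        };
      }
      return tool.execute(parsed);
    },
  };
}

const echo: AgentTool<{ text: string }> = {
  name: "echo",
  parse(args) {
    if (typeof args === "object" && args !== null && "text" in args) {
      const text = (args as { text: unknown }).text;
      if (typeof text === "string") return { text };
    }
    return null;
  },
  execute(args) {
    return { success: true, data: args.text };
  },
};

const registry = new Map<string, AnyAgentTool>();
registry.set(echo.name, eraseTool(echo));
```

Inside echo.execute the argument is fully typed, while the registry only ever sees unknown. The validation call in the wrapper is the price paid for that erasure.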

In code, the core modules come to only ~1200 lines: types.ts 200 lines / registry.ts 130 lines / sandbox.ts 200 lines / schema-validator.ts 250 lines / zod-schema-validator.ts 400 lines. The 8 built-in tools add another ~1500 lines in total.

This design can be copied and used directly: it applies to any scenario where an LLM calls tools. The day your 30th tool goes live and you see that the agent's code volume hasn't doubled, readability hasn't degraded, and error handling isn't chaotic, you'll know those fewer than 2000 lines of code were worth it.
