How to Build an Agent Tool System That Won't Collapse After 5 Tools
The project introduced in this article is dskcode, a terminal CLI tool for an AI programming assistant based on DeepSeek, implemented in TypeScript, targeting domestic users only. All design details and code examples in this article come from the actual implementation of dskcode.
Install and try: npm install -g dskcode or npx dskcode
Follow-up note: This article is based on the src/tool/ module implementation of agent-cli. The code is TypeScript, but the design ideas are language-agnostic: any LLM agent connecting tools cannot avoid the "tool system" layer.
Assumed audience: You can already get an LLM to call tools like bash and read_file, but things break down after connecting 5 tools: parameter validation always leaks, parallel execution gets tangled, error codes are all over the place, and write tools occasionally cross boundaries into ~/.ssh. If you are looking for "how to make an agent's tool system robust," this article is the answer.
- The essence of an Agent tool system = declarative tool registry + strongly typed contracts + unified result protocol + sandboxed execution context. Get these four things right, and connecting 80 tools is as stable as connecting 8.
- Three core principles: The runtime doesn't know the type, but the tool itself does (type erasure + factory pattern inference) / The schema the LLM sees and the types the developer writes come from the same source (Zod ↔ JSONSchema automatic bidirectional) / Any tool failure must be an unambiguous ToolResult (success and failure share the same structure, with an error code).
- Key trade-off: Using Zod yields an extremely concise "write once, produce three things" (TS type + JSONSchema + validator), at the cost of introducing a ~50KB dependency — worth it.
1. Why an agent tool system deserves its own article
The trouble with an LLM agent tool system is not "can you get it to work," but "can it keep working without crashing."
After an agent runs for a while, the toolset looks like this:
read_file # Read file
write_file # Write file
edit_file # Precise replacement
bash # Run command
grep # Search code
glob # Find files
fetch # Fetch web pages
... # 30+ various User-defined
When connecting 5 tools, each tool parses, validates, and try/catches on its own; the code volume is 5×80=400 lines, manageable.
When connecting 30 tools, you suddenly find:
- For the same "file read but too large" error, some tools return FILE_TOO_LARGE, some return TOOL_ERROR, some even throw.
- Some tools' args are { path: string }, others secretly use { filePath } or { filepath }.
- The LLM gives wrong parameters (path written as Path); some tools silently fall back, others crash directly.
- Bash timeouts don't kill the process thoroughly; grep runs on 100M logs and blows up memory.
- The security boundary for write tools relies on each tool remembering it; within a few days, one forgets.
graph LR
subgraph Real state without a system
A1[read_file tool<br/>self-manages exceptions]
A2[bash tool<br/>self-manages exceptions]
A3[grep tool<br/>self-manages exceptions]
A4[fetch tool<br/>self-manages exceptions]
A5[Your user_defined tool<br/>...also self-manages]
end
A1 -.Reworked three times.-> Z[Each tool has<br/>500 lines of boilerplate]
A2 -.Reworked three times.-> Z
A3 -.Reworked three times.-> Z
A4 -.Reworked three times.-> Z
This is the problem the "tool system" abstraction layer solves: Centralizing cross-tool boilerplate code and all the things every tool needs to care about into one place.
2. Design Goals: What kinds of problems the tool system needs to solve
Clearly defining the "scope of responsibility" for the tool system prevents the implementation from bloating into a big ball of mud. Our goals are only four categories:
| Category | Problem Solved | Corresponding in src/tool/ |
|---|---|---|
| Registration | Tools have names, descriptions, and invocation methods, and can be found by the framework | registry.ts |
| Types | A tool's own args/return values are strongly typed, but the registry can store "any tool" | types.ts's AgentTool + AnyAgentTool |
| Validation | Are the parameters given by the LLM correct? If not, tell it how to fix them | schema-validator.ts + zod-schema-validator.ts |
| Sandbox | Cross-cutting concerns like path safety, timeout, output truncation, binary detection | sandbox.ts + eol.ts + diff.ts |
Things outside this scope — like "tool dependency injection," "tool remote invocation," "tool trace persistence," "tool dashboard" — we resolutely do not do. Once these are done, the tool system degenerates into "building another microservice framework."
Below, we expand on these four categories one by one.
3. Registration: The three things ToolRegistry does
The registry is essentially a Map with filtering capabilities. Three actions:
// src/tool/registry.ts (simplified)
export class ToolRegistry {
readonly #tools = new Map<string, AnyAgentTool>();
// ① Register
register<I, O>(tool: AgentTool<I, O>): this {
return this.registerErased(eraseTool(tool)); // Automatically erases type
}
// ② Look up by name
get(name: string): AnyAgentTool | undefined {
if (!this.#isToolEnabled(name)) return undefined;
return this.#tools.get(name);
}
// ③ List available
list(): AnyAgentTool[] {
const result: AnyAgentTool[] = [];
for (const [name, tool] of this.#tools) {
if (this.#isToolEnabled(name)) result.push(tool);
}
return result;
}
}
The real trouble is "filtering" — three layers of enable checks
list() is not a simple Map.values(). In the "dynamic assembly" scenario of an agent, a tool's availability is composed:
#isToolEnabled(name: string): boolean {
const tool = this.#tools.get(name);
if (!tool) return false;
// 1. Disabled by user in config? — e.g., bash disabled in a project
if (this.#disabledNames.has(name)) return false;
// 2. Feature Flag off? — e.g., a tool is still experimental
if (!this.#featureFlagChecker(name)) return false;
// 3. Provider compatibility? — e.g., this tool only supports Anthropic
if (this.#provider && tool.supportedProviders.length > 0) {
if (!tool.supportedProviders.includes(this.#provider)) return false;
}
return true;
}
Hidden in this code is a non-obvious design decision: Why AND and not OR?
Consider this scenario: The user wrote disabledTools: ["bash"] in the config (project policy disables bash), and the tool itself is marked supportedProviders: ["anthropic"] (only supports Claude). Every check must pass for the tool to be enabled; failing any single one makes it unavailable.
If OR were used, passing one check would be enough: bash would still be exposed whenever the current provider is Anthropic, even though the user explicitly disabled it.
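The AND-versus-OR difference can be made concrete with a small sketch. The names here (EnableCheck, ToolMeta, makeEnableChecks) are hypothetical stand-ins for illustration, not the real registry internals; the three checks mirror the ones in #isToolEnabled above.

```typescript
// Hypothetical, simplified stand-ins for the registry's three enable checks.
type EnableCheck = (name: string) => boolean;

interface ToolMeta {
  name: string;
  supportedProviders: string[]; // empty = works with every provider
}

function makeEnableChecks(
  tools: Map<string, ToolMeta>,
  disabledNames: Set<string>,
  flagOn: (name: string) => boolean,
  provider?: string,
): EnableCheck[] {
  return [
    (name) => !disabledNames.has(name), // 1. user config
    (name) => flagOn(name), // 2. feature flag
    (name) => {
      // 3. provider compatibility
      const tool = tools.get(name);
      if (!tool) return false;
      if (!provider || tool.supportedProviders.length === 0) return true;
      return tool.supportedProviders.includes(provider);
    },
  ];
}

// AND semantics: a tool is enabled only if every check passes.
function isEnabled(name: string, checks: EnableCheck[]): boolean {
  return checks.every((check) => check(name));
}

const tools = new Map<string, ToolMeta>([
  ["bash", { name: "bash", supportedProviders: ["anthropic"] }],
  ["read_file", { name: "read_file", supportedProviders: [] }],
]);
const checks = makeEnableChecks(tools, new Set(["bash"]), () => true, "anthropic");

console.log(isEnabled("bash", checks)); // false: provider matches, but the user disabled it
console.log(isEnabled("read_file", checks)); // true
console.log(checks.some((check) => check("bash"))); // true: what OR semantics would wrongly allow
```

Keeping the checks in a list also makes it cheap to add a fourth layer later without touching the combination logic.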
The registry also provides a convenience API for "bucketing by kind"
listByKind(kind: ToolKind): AnyAgentTool[] { ... }
listReadTools(): AnyAgentTool[] { return this.listByKind(ToolKind.Read); }
listWriteTools(): AnyAgentTool[] { return this.list().filter((t) => !isReadOnly(t.kind)); }
This is not decorative convenience — it is a key dependency for the agent's main loop: Read tools can be parallel / Write tools must be serial. See § 8 for details.
4. Types: AgentTool<I, O> and Type Erasure
This is one of the most defining design decisions of the entire tool system.
The tool declares its own type
// src/tool/types.ts (excerpt)
export interface AgentTool<I, O extends ToolResult = ToolResult> {
readonly name: string;
readonly kind: ToolKind;
readonly parameters: JSONSchema;
readonly description: string;
// Key: The tool's own execute signature, strongly typed with I
execute(args: I, ctx: ToolContext): Promise<O>;
initialTitle?(args: I): string;
// ...
}
Each tool's execute(args, ctx) has a complete TS type in its own file — read_file sees { path: string, startLine?: number }, edit_file sees { path: string, old_text: string, new_text: string }. If written wrong, the IDE reports an error directly.
But the registry can only store "any tool" — here the type must be erased
// src/tool/types.ts (excerpt)
export interface AnyAgentTool {
readonly name: string;
readonly description: string;
readonly kind: ToolKind;
readonly parameters: JSONSchema;
readonly supportsInputStreaming: boolean;
readonly supportedProviders: string[];
// Key: args become unknown, execute internally asserts
execute(args: unknown, ctx: ToolContext): Promise<ToolResult>;
initialTitle?(args: unknown): string;
}
If the registry kept generics, it would have to be Map<string, AgentTool<unknown, ToolResult>> — then every place using the registry would need to know "oh, this item is a read_fileTool". The essence of this erasure is: decoupling "finding a tool" from "using a tool" at the type level.
Erasure itself is just one step:
export function eraseTool<I, O extends ToolResult = ToolResult>(
tool: AgentTool<I, O>,
): AnyAgentTool {
return {
get name() { return tool.name; },
get description() { return tool.description; },
// ...
async execute(args: unknown, ctx: ToolContext): Promise<ToolResult> {
// Internal assertion: the Registry caller knows the name anyway
return tool.execute(args as I, ctx);
},
initialTitle(args: unknown): string {
return tool.initialTitle?.(args as I) ?? tool.name;
},
};
}
After the caller gets AnyAgentTool, the only thing needed is: Assert back to the specific tool type based on the name:
const readTool = registry.get("read_file")!;
// readTool is AnyAgentTool, but we know it's read_file
const result = await readTool.execute({ path: "src/main.ts" }, ctx);
// ^ Strong typing only holds when we know it ourselves
This "upstream strong typing + registry erasure + pre-execution assertion" is a trade-off — safer than "all any", more flexible than "all with generics".
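To see the erasure pattern in isolation, here is a minimal self-contained sketch. TypedTool, ErasedTool, and erase are simplified, hypothetical counterparts of AgentTool, AnyAgentTool, and eraseTool (no ToolContext, no kind), so treat it as an illustration of the idea rather than the project's code.

```typescript
interface ToolResult {
  success: boolean;
  data: string;
  error?: string;
}

// Strongly typed view: each tool author works against its own args type.
interface TypedTool<I> {
  name: string;
  execute(args: I): Promise<ToolResult>;
}

// Erased view: what a heterogeneous registry can store.
interface ErasedTool {
  name: string;
  execute(args: unknown): Promise<ToolResult>;
}

function erase<I>(tool: TypedTool<I>): ErasedTool {
  return {
    name: tool.name,
    // The cast is unchecked, so runtime validation must happen before this call.
    execute: (args: unknown) => tool.execute(args as I),
  };
}

const echoTool: TypedTool<{ text: string }> = {
  name: "echo",
  async execute(args) {
    return { success: true, data: args.text.toUpperCase() };
  },
};

const addTool: TypedTool<{ a: number; b: number }> = {
  name: "add",
  async execute(args) {
    return { success: true, data: String(args.a + args.b) };
  },
};

// Two tools with unrelated args types live in one Map.
const registry = new Map<string, ErasedTool>();
for (const tool of [erase(echoTool), erase(addTool)]) registry.set(tool.name, tool);

async function demo(): Promise<string[]> {
  const echo = await registry.get("echo")!.execute({ text: "hi" });
  const sum = await registry.get("add")!.execute({ a: 2, b: 3 });
  return [echo.data, sum.data];
}

demo().then((out) => console.log(out)); // [ 'HI', '5' ]
```

Note that nothing stops a caller from passing { text: 42 } to the erased echo tool; the compiler can no longer help at that boundary.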
What is the cost of type erasure?
The only cost: When calling execute(args), if args are external input (parsed from LLM JSON), you cannot rely on the TS type system for protection. This is exactly what the next section solves — runtime schema validation.
5. Validation: The dual-track design of Zod and JSONSchema
The parameters given by the LLM are structured JSON, but it can make mistakes — giving read_file a { path: 42 } (number instead of string), missing an old_text field, timeout given as "abc", etc. How to make the LLM self-correct?
Validation results must be structured and feedable back to the LLM
// src/tool/schema-validator.ts (excerpt)
export interface ValidationIssue {
path: string; // JSONPath-style location: `$.path`, `$.items[2].name`
expected: string; // "string" / "present" / "enum[A,B,C]" / "length >= 3"
received: string; // Value description truncated to 60 chars, to avoid log explosion
message: string; // Human-readable Chinese, directly fed to LLM
}
Note that message is in Chinese. This detail is a lifesaver: in our setup (DeepSeek, Chinese-speaking users) the model self-corrects more easily from Chinese error messages than from English ones, which we attribute to the amount of Chinese prompt data in its training corpus; this may not carry over to other models. path uses a JSONPath-style location (strictly speaking not a JSON Pointer, which would be written /items/2/name) to tell the model precisely which field is wrong.
Dual track: Both JSONSchema and Zod work, outputs are isomorphic
There are two ways to declare a tool's parameter schema —
// Method A: JSONSchema (used by the old 8 built-in tools)
export const readFileTool: AgentTool<ReadFileArgs> = {
name: "read_file",
kind: ToolKind.Read,
parameters: {
type: "object",
properties: {
path: { type: "string", description: "..." },
startLine: { type: "number", description: "..." },
},
required: ["path"],
additionalProperties: false,
},
// ...
};
// Method B: Zod-first (recommended for new tools)
import { z } from "zod";
import { defineTool } from "../zod-schema-validator.js";
const ReadFileSchema = z.object({
path: z.string().min(1, "path cannot be empty"),
startLine: z.number().int().min(1).optional(),
endLine: z.number().int().min(1).optional(),
});
export const readFileZodTool = defineTool<z.infer<typeof ReadFileSchema>>({
name: "read_file_zod",
kind: ToolKind.Read,
schema: ReadFileSchema, // Write once, produce three things
async execute(args, ctx) { ... }
});
The essence of defineTool: Write the Zod schema once, and simultaneously get TS type + JSONSchema + validator. Single source of truth, cutting maintenance work in half later.
graph LR
A[Zod schema<br/>z.object... x]
A -->|z.infer| B[TS Type I]
A -->|zodSchemaToJSONSchema| C[JSONSchema<br/>fed to LLM]
A -->|Runtime check _def+safeParse| D[Zod validation path<br/>to ValidationIssue]
A -.Written only once in source.-> E["Want to add a field?<br/>Change only one place"]
A2[JSONSchema<br/>handwritten]
A2 -->|Lightweight recursive validator| D2[Self-built validation path<br/>isomorphic output ValidationIssue]
Isomorphism is more important than you think
There are three downstream consumers of validation failure:
- LLM: fed into the next round of conversation for correction
- Reflector: decides whether to reflect / retry
- UI: shows the developer "which step went wrong"
None of these three consumers should care "whether this tool uses Zod or JSONSchema" — they only face ValidationIssue[].
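As a rough illustration of what the hand-written JSONSchema track has to produce, the sketch below validates a flat object schema and emits ValidationIssue entries. It is an assumption-laden miniature: FlatObjectSchema, validateFlat, and the English messages are made up for this example (the article's messages are in Chinese), and nested schemas, enums, and length constraints are left out.

```typescript
// Same shape as the article's ValidationIssue.
interface ValidationIssue {
  path: string;
  expected: string;
  received: string;
  message: string;
}

// Hypothetical, deliberately tiny schema subset: flat objects with primitive fields.
interface FlatObjectSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required?: string[];
  additionalProperties?: boolean;
}

// Short, log-safe description of a received value (truncated to 60 chars).
function describeValue(value: unknown): string {
  const text: string = (JSON.stringify(value) as string | undefined) ?? String(value);
  return text.length > 60 ? text.slice(0, 57) + "..." : text;
}

function typeName(value: unknown): string {
  if (value === null) return "null";
  return Array.isArray(value) ? "array" : typeof value;
}

function validateFlat(args: unknown, schema: FlatObjectSchema): ValidationIssue[] {
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return [{ path: "$", expected: "object", received: describeValue(args), message: "$ must be an object" }];
  }
  const record = args as Record<string, unknown>;
  const issues: ValidationIssue[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in record)) {
      issues.push({ path: `$.${key}`, expected: "present", received: "missing", message: `$.${key} is required` });
    }
  }
  for (const [key, value] of Object.entries(record)) {
    const prop = schema.properties[key];
    if (prop === undefined) {
      if (schema.additionalProperties === false) {
        issues.push({ path: `$.${key}`, expected: "no extra field", received: describeValue(value), message: `$.${key} is not a known parameter` });
      }
      continue;
    }
    if (typeName(value) !== prop.type) {
      issues.push({ path: `$.${key}`, expected: prop.type, received: describeValue(value), message: `$.${key} must be ${prop.type}` });
    }
  }
  return issues;
}

const readFileSchema: FlatObjectSchema = {
  type: "object",
  properties: { path: { type: "string" }, startLine: { type: "number" } },
  required: ["path"],
  additionalProperties: false,
};

const issues = validateFlat({ path: 42, Path: "a.ts" }, readFileSchema);
console.log(issues.map((i) => i.message).join("\n"));
// $.path must be string
// $.Path is not a known parameter
```

Whatever track produced them, the joined messages are what gets fed back to the model, and the structured fields are what the Reflector and UI consume.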
Implementation uses a duck-typing detection branch:
// src/tool/schema-validator.ts (excerpt)
export function validateArgs(args: unknown, schema: unknown): ValidationResult {
// Duck-type detection of Zod schema — avoids static import dragging weight
if (isZodSchema(schema)) {
return zodSafeValidate(args, schema as never);
}
// Otherwise, go through JSONSchema lightweight validation
if (!isPlainObject(schema)) return { ok: true, issues: [] };
// ...
}
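function isZodSchema(schema: unknown): schema is ZodType {
// Guard first: the `in` operator throws a TypeError on null and primitives
if (typeof schema !== "object" || schema === null) return false;
const obj = schema as Record<string, unknown>;
return (
"_def" in obj &&
typeof obj.parse === "function" &&
typeof obj.safeParse === "function"
);
}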
Why duck typing instead of instanceof z.ZodType? Because statically importing zod would make schema-validator pull in zod at load time, and any project not writing Zod tools would also bear this dependency. Duck detection allows zod to be loaded "on demand" only in tools that actually use it.
The integer ↔ number interoperability detail
JS has no real int. 1 is number, and JSON.parse also gives number. If the schema writes integer, the LLM gives 1, and validate says "expected integer, got number" — this is a false positive.
So:
function typeMatches(actual: string, expected: string): boolean {
if (actual === expected) return true;
if (expected === "integer" && actual === "number") return true;
if (expected === "number" && actual === "integer") return true;
return false;
}
jsonTypeOf distinguishes integer/number, but validation treats them interchangeably — this matches the LLM's actual behavior.
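A runnable sketch of the two functions together makes one consequence of this rule visible. The jsonTypeOf body below is an assumption (the article only names it), and typeMatchesStrict is a hypothetical stricter variant added for comparison.

```typescript
// Assumed implementation: the article names jsonTypeOf but does not show it.
function jsonTypeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (typeof value === "number") return Number.isInteger(value) ? "integer" : "number";
  return typeof value;
}

// The article's rule: integer and number are interchangeable in both directions.
function typeMatches(actual: string, expected: string): boolean {
  if (actual === expected) return true;
  if (expected === "integer" && actual === "number") return true;
  if (expected === "number" && actual === "integer") return true;
  return false;
}

// Hypothetical stricter variant: an integer satisfies "number", but not the reverse.
function typeMatchesStrict(actual: string, expected: string): boolean {
  if (actual === expected) return true;
  return expected === "number" && actual === "integer";
}

console.log(jsonTypeOf(1)); // integer
console.log(jsonTypeOf(1.5)); // number
console.log(typeMatches(jsonTypeOf(1), "number")); // true
console.log(typeMatches(jsonTypeOf(1.5), "integer")); // true: 1.5 passes an integer field
console.log(typeMatchesStrict(jsonTypeOf(1.5), "integer")); // false
```

With the symmetric rule, a fractional value such as 1.5 is accepted where the schema says integer. That is a deliberate leniency here; if a tool really needs whole numbers (line numbers, for instance), the one-directional variant or a check inside the tool is the safer choice.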
6. Sandbox: Cross-cutting concerns centralized in sandbox.ts
Read tools, write tools, and bash tools all care about "path safety" and "timeout" — these must be centralized. I've seen people write if (path.startsWith("~")) expand... in every tool; this is a nightmare.
6.1 Path resolution and @ references
// src/tool/sandbox.ts (excerpt)
/** Strip a leading `@` reference marker */
export function stripMentionPrefix(inputPath: string): string {
if (inputPath.startsWith("@")) return inputPath.slice(1);
return inputPath;
}
export function resolvePath(inputPath: string, cwd: string): string {
const stripped = stripMentionPrefix(inputPath);
const resolved = isAbsolute(stripped) ? stripped : resolve(cwd, stripped);
return resolve(resolved); // Secondary normalize
}
Why the second resolve? Because the absolute-path branch returns the input untouched: for an absolute input such as /a/b/../c, isAbsolute is true, so nothing has normalized the .. yet. The final resolve collapses it to /a/c, so the subsequent realpath can compare reliably. Note that path.resolve does not expand ~; a leading ~ is treated as an ordinary directory name.
The @ detail saved me — in the system prompt, @<path> is syntactic sugar for "file path reference," but the LLM faithfully passes @test.ts as-is to the tool.
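The behavior is easy to check with a small sketch. It uses path.posix so the output is the same on every platform, and resolvePathPosix is a hypothetical variant written for this example, not the project's resolvePath.

```typescript
import { posix } from "node:path";

function stripMentionPrefix(inputPath: string): string {
  return inputPath.startsWith("@") ? inputPath.slice(1) : inputPath;
}

// Hypothetical POSIX-only variant: a single posix.resolve covers both branches,
// because an absolute second argument overrides cwd and ".." is always normalized.
function resolvePathPosix(inputPath: string, cwd: string): string {
  return posix.resolve(cwd, stripMentionPrefix(inputPath));
}

console.log(resolvePathPosix("@test.ts", "/work/proj")); // /work/proj/test.ts
console.log(resolvePathPosix("/a/b/../c", "/work/proj")); // /a/c
console.log(resolvePathPosix("../other/x.ts", "/work/proj")); // /work/other/x.ts
console.log(resolvePathPosix("~/.ssh/id_rsa", "/work/proj")); // /work/proj/~/.ssh/id_rsa
```

The last line shows that ~ is not expanded: if home-directory shorthand should be supported, it has to be handled explicitly (and then checked by the sandbox like any other path).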
6.2 Write tools must confine within a whitelist
export async function confine(
allowedRoots: string[],
target: string,
): Promise<{ ok: true } | { ok: false; error: string }> {
if (allowedRoots.length === 0) return { ok: true }; // No restrictions, allow
const realTarget = await realPath(target);
for (const root of allowedRoots) {
const realRoot = await realPath(root);
const rel = relative(realRoot, realTarget);
// Escapes if rel climbs out of the root ("..", "../x") or is absolute (Windows: different drive)
const escapes = rel === ".." || rel.startsWith("../") || rel.startsWith("..\\") || isAbsolute(rel);
// rel === "" means the target is the root itself; a name like "..foo" is a normal file
if (rel === "" || !escapes) return { ok: true };
}
return { ok: false, error: `Path "${target}" is not within the allowed write scope ${allowedRoots.join(", ")}` };
}
Tool write code:
if (ctx.writeRoots && ctx.writeRoots.length > 0) {
const conf = await confine(ctx.writeRoots, filePath);
if (!conf.ok) {
return { success: false, data: conf.error, error: "OUTSIDE_WRITE_ROOTS" };
}
}
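This means that a write aimed outside the whitelist, for example an absolute path into the user's ~/.ssh/authorized_keys or a ../../ chain that climbs out of the project, is rejected by confine. Note the first line of the function: an empty allowedRoots means "no restrictions", so this protection only applies when writeRoots is configured.
realPath is a lifesaving detail: if there are symbolic links in the path, like /var/www -> /home/user/www, resolving via realpath prevents bypassing.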
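The containment test at the heart of confine can be exercised without touching the filesystem. The sketch below is a pure, POSIX-only approximation: isWithinRoot is a hypothetical helper, and it intentionally skips the realpath step, so it does not protect against symlinks the way the real function does.

```typescript
import { posix } from "node:path";

// Hypothetical pure helper: string-level containment only, no symlink resolution.
function isWithinRoot(root: string, target: string): boolean {
  const rel = posix.relative(posix.resolve(root), posix.resolve(target));
  if (rel === "") return true; // target is the root itself
  const climbsOut = rel === ".." || rel.startsWith("../");
  return !climbsOut && !posix.isAbsolute(rel);
}

console.log(isWithinRoot("/work/proj", "/work/proj/src/a.ts")); // true
console.log(isWithinRoot("/work/proj", "/work/proj")); // true
console.log(isWithinRoot("/work/proj", "/work/proj/../secrets")); // false
console.log(isWithinRoot("/work/proj", "/work/project2/x")); // false: shared prefix is not containment
console.log(isWithinRoot("/work/proj", "/work/proj/..foo")); // true: "..foo" is just a file name
console.log(isWithinRoot("/work/proj", "/home/user/.ssh/authorized_keys")); // false
```

The /work/project2 case is why a plain target.startsWith(root) comparison is not enough, and the ..foo case is why the check looks for ".." as a whole path segment rather than as a string prefix.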
6.3 Timeout abort: External signal + built-in timer
export function createTimeoutSignal(signal?: AbortSignal, timeoutMs = 30_000): AbortController {
const controller = new AbortController();
// An already-aborted signal never fires "abort" again, so handle it up front
if (signal?.aborted) {
controller.abort();
return controller;
}
const timer = setTimeout(() => controller.abort(), timeoutMs);
signal?.addEventListener("abort", () => controller.abort(), { once: true });
// Whichever trigger fires first, stop the timer so it cannot leak
controller.signal.addEventListener("abort", () => clearTimeout(timer), { once: true });
return controller;
}
Why link the external signal and the internal timer? The user might click "Stop" in the UI (external abort), and bash itself might run over 30s (timeout abort); either trigger must abort the child process.
6.4 Two-stage kill for bash child processes
Bash's execCommand also does something brutal:
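const timeout = setTimeout(() => {
child.kill("SIGTERM");
// If not exited after 5 seconds, force kill
const forceKill = setTimeout(() => child.kill("SIGKILL"), 5000);
child.once("exit", () => clearTimeout(forceKill));
}, timeoutMs);
// Cancel the timeout once the child exits on its own
child.once("exit", () => clearTimeout(timeout));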
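SIGTERM gives a 5s graceful exit window; if the process hasn't exited after 5s, SIGKILL is sent. This avoids "fake timeouts" (process not really dead, resources not released). One caveat for commands like npm install or cargo build that spawn child-of-child processes: child.kill() only signals the direct child, so reaching the grandchildren requires signalling the whole process group (on POSIX, for example, spawning with detached: true and calling process.kill(-child.pid, signal)).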
7. Result Protocol: ToolResult success and failure share the same structure
An LLM agent's tool results must obey one iron rule: Success and failure both return objects of the same shape. Otherwise the model sees a differently shaped payload for every failure mode, error output blends with normal responses, and behavior gets messier as more tools are added.
Core fields of ToolResult
// src/tool/types.ts (excerpt)
export interface ToolResult {
success: boolean; // The only flag for success or failure
data: string; // Content for the LLM to see (output on success, error on failure)
error?: string; // Error classification tag, LLM doesn't need to care, but the program does
diff?: FileDiff; // Only carried by file modification tools
summary?: string; // One-line summary for UI (LLM doesn't see)
issues?: ValidationIssue[]; // Carried when schema validation fails
denial?: ToolDenial; // v0.6 new: permission denial details
}
Every tool error must have a machine-readable error code:
| Error Scenario | success | error | data |
|---|---|---|---|
| File not found | false | READ_ERROR | "Read file failed: ENOENT..." |
| File too large | false | FILE_TOO_LARGE | "File too large (15.3MB)..." |
| Binary file | false | BINARY_FILE | "Looks like a binary file..." |
| old_text not found | false | TEXT_NOT_FOUND | "Text to replace not found..." |
| old_text appears multiple times | false | TEXT_MULTIPLE_MATCHES | "Appears multiple times, please provide more context..." |
| Outside writeRoots | false | OUTSIDE_WRITE_ROOTS | "Path not within allowed scope..." |
| Command exited non-0 | false | EXIT_CODE_1 | "...\n[Exit code: 1]" |
| Success | true | (none) | Tool output |
Error codes should not be free-form descriptions like a message: the error field is for programmatic branching decisions, so it must be a stable identifier. If the LLM wants to read error details, it reads them from data.
data must always be directly readable by the LLM
Don't throw Error, don't return undefined, don't call JSON.stringify(obj) (LLMs choke on large object descriptions).
On failure, data is a readable error description that the LLM can see in the next conversation round and self-correct — for example:
return {
success: false,
data: "Text to replace not found. Please confirm old_text matches the file content exactly (including indentation and spaces).",
error: "TEXT_NOT_FOUND",
};
Seeing this, the LLM will likely re-check the old text or call read_file first in the next round, rather than getting stuck.
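The split between the two fields (error for the program, data for the model) can be sketched with a pair of constructors and one branching function. The helpers ok, fail, and nextStep, and the retry policy they encode, are hypothetical and only illustrate the division of labour; the error codes come from the table above.

```typescript
interface ToolResult {
  success: boolean;
  data: string;
  error?: string;
  summary?: string;
}

// Hypothetical constructors that make the "same shape" rule hard to violate.
function ok(data: string, summary?: string): ToolResult {
  return summary === undefined ? { success: true, data } : { success: true, data, summary };
}

function fail(error: string, data: string): ToolResult {
  return { success: false, data, error };
}

type NextStep = "continue" | "retry_with_hint" | "ask_user";

// The program branches on the stable code and never parses the text in `data`.
// The policy itself is an assumption made for this example.
function nextStep(result: ToolResult): NextStep {
  if (result.success) return "continue";
  switch (result.error) {
    case "TEXT_NOT_FOUND":
    case "TEXT_MULTIPLE_MATCHES":
    case "INVALID_ARGS":
      return "retry_with_hint"; // the model can fix these itself from `data`
    default:
      return "ask_user";
  }
}

const notFound = fail("TEXT_NOT_FOUND", "Text to replace not found. Please confirm old_text matches the file content exactly.");
console.log(nextStep(notFound)); // retry_with_hint
console.log(nextStep(fail("OUTSIDE_WRITE_ROOTS", "Path not within allowed scope."))); // ask_user
console.log(nextStep(ok("File edited: src/main.ts"))); // continue
```

Rewording the text in data never changes the control flow, which is the point of keeping the code stable.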
summary for UI, data for LLM — two separate lines
return {
success: true,
data: `File edited: ${filePath}\nReplacement location: Line 120...`, // ← Feed LLM
summary: `📝 Modified: foo.ts (+3 -5)`, // ← Feed UI
diff, // ← Feed UI diff view
};
Why two lines? Rendering a diff for a 10MB file directly in the UI doesn't work; but the LLM needs to see the full content to verify the modification is correct. data is the "LLM-friendly version", summary is the "human-friendly version", diff is the "tool-friendly version".
8. Tool Parallelism and Serialization: Starting from ToolKind
The actual capability of a tool system is not just "can it be invoked", but also "can it be invoked smartly". An engineering agent running 3 read_file calls simultaneously is a common need; running 3 edit_file calls editing the same file simultaneously is a disaster.
ToolKind is a semantic classification
export enum ToolKind {
Read = "read", // Pure read, no side effects, parallelizable
Edit = "edit", // File/directory content editing
Delete = "delete", // Deletion
Move = "move", // Rename/move
Other = "other", // bash, fetch fallback
}
export function isReadOnly(kind: ToolKind): boolean {
return kind === ToolKind.Read;
}
The primary job of this field is for the agent main loop to use as a switch for deciding parallelism:
const reads = registry.listReadTools();
const writes = registry.listWriteTools();
// Main loop:
// 1. Fan-out all read-type calls in parallel (independent IO)
// 2. Writes must wait for the previous batch to complete before executing (single-threaded serial)
Why this division? Read operations have no external side effects — 5 read_file calls running on two different files in the same second, system state unchanged. Write operations have side effects (one write overwrites another, order-sensitive), must be serial.
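A minimal scheduler along these lines might look as follows. It is a sketch under a strong assumption: within one batch, all reads are moved ahead of all writes. The article does not spell out its ordering rule, and a real loop may need to preserve the model's requested order when a read depends on an earlier write. Call, runBatch, and makeCall are names invented for this example.

```typescript
type Kind = "read" | "edit" | "delete" | "move" | "other";

interface Call {
  id: string;
  kind: Kind;
  run(): Promise<string>;
}

async function runBatch(calls: Call[]): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  const reads = calls.filter((c) => c.kind === "read");
  const writes = calls.filter((c) => c.kind !== "read");

  // 1. Fan out all read calls in parallel.
  await Promise.all(
    reads.map(async (c) => {
      results.set(c.id, await c.run());
    }),
  );

  // 2. Everything else runs one at a time, in the order requested.
  for (const c of writes) {
    results.set(c.id, await c.run());
  }
  return results;
}

// Test double that records when each call starts and ends.
function makeCall(log: string[], id: string, kind: Kind): Call {
  return {
    id,
    kind,
    async run() {
      log.push(`start ${id}`);
      await Promise.resolve(); // stand-in for real IO
      log.push(`end ${id}`);
      return id;
    },
  };
}

const demoLog: string[] = [];
runBatch([
  makeCall(demoLog, "e1", "edit"),
  makeCall(demoLog, "r1", "read"),
  makeCall(demoLog, "r2", "read"),
  makeCall(demoLog, "e2", "edit"),
]).then(() => console.log(demoLog.join(" | ")));
// start r1 | start r2 | end r1 | end r2 | start e1 | end e1 | start e2 | end e2
```

The log shows the two reads overlapping (both start before either ends), while each write finishes before the next one starts.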
Another hidden benefit: UI categorized display
If tool calls in the terminal are rendered grouped by ToolKind, the user can see at a glance "now reading / now writing":
✓ read_file src/main.ts (1ms)
✓ read_file src/utils.ts (2ms)
✓ grep "TODO" (8ms, 3 matches)
── Now starting writes ──
✓ edit_file src/main.ts (+3 -1)
✓ bash npm test (exit 0)
This "read / write" grouping is not showing off — it gives the user a sense of which phase the agent is currently in.
9. End-to-End Assembly
Putting all the above parts together looks like this:
graph TB
subgraph Registration Phase
RT["readFileTool: AgentTool<br/>strictly typed"]
ET["editFileTool: AgentTool<br/>strictly typed"]
BT["bashTool: AgentTool<br/>strictly typed"]
end
subgraph "Registry (registry.ts)"
REG["ToolRegistry<br/>tools: Map"]
end
subgraph "Scheduling Phase (Agent Main Loop)"
SCHED["Caller<br/>1. registry.listReadTools() / listWriteTools()<br/>2. validateArgs(args, tool.parameters)<br/>3. Permission gate check (optional unified entry point)<br/>4. tool.execute(args, ctx)"]
end
subgraph Validation
VJ["JSONSchema validation<br/>→ ValidationIssue[]"]
VZ["Zod validation<br/>→ ValidationIssue[]"]
end
subgraph Sandbox
SB["ToolContext {<br/>cwd, signal, timeout,<br/>writeRoots[] }<br/>+ sandbox.ts helpers"]
end
RT -->|eraseTool| REG
ET -->|eraseTool| REG
BT -->|eraseTool| REG
REG --> SCHED
SCHED -->|validate schema| VJ
SCHED -->|validate schema| VZ
SCHED -->|execute| SB
SB -->|ToolResult| SCHED
SCHED -->|deny → lastDenial| GT["Permission Gate<br/>(unified policy entry point)"]
SCHED -->|success / fail| LLM[Next round LLM]
The code for a complete invocation looks like this:
// ① Find tool
const tool = registry.get(toolName);
if (!tool) return { success: false, data: `Tool "${toolName}" does not exist`, error: "TOOL_NOT_FOUND" };
// ② Validate parameters (using the tool's own schema, JSONSchema or Zod both work)
const validation = validateArgs(rawArgs, (tool as any).schema ?? tool.parameters);
if (!validation.ok) {
return {
success: false,
data: "Parameter validation failed:\n" + validation.issues.map((i) => i.message).join("\n"),
error: "INVALID_ARGS",
issues: validation.issues,
};
}
// ③ Optional: Permission gate check (if a unified policy entry point exists; skip this section if not passed)
const gate = dispatchOptions.gate; // Gate held by call context, this section does not depend on specific implementation
if (gate && !(await gate.check(toolName, rawArgs))) {
return {
success: false,
data: `Permission denied: ${gate.lastDenial?.reason ?? "Unknown reason"}`,
error: "GATE_DENIED",
denial: gate.lastDenial,
};
}
// ④ Prepare context
const ctx: ToolContext = {
cwd: process.cwd(),
signal: externalSignal,
timeout: 30_000,
writeRoots: config.writeRoots, // Optional
};
// ⑤ Execute
try {
const result = await tool.execute(rawArgs, ctx);
// ⑥ Any thrown exception will be wrapped into EXECUTION_ERROR by registry.execute (see registry.ts)
return result;
} catch (err) {
return {
success: false,
data: `Exception: ${err instanceof Error ? err.message : String(err)}`,
error: "EXECUTION_ERROR",
};
}
This is the entirety of the core scheduling logic — the rest is the tools' own business.
10. Example of a Real Tool: How edit_file is written "by the rules"
Pick edit_file to explain, because it involves three cross-cutting concerns: sandbox, diff, and eol, best illustrating the point:
// src/tool/builtins/edit-file.ts (excerpt)
export const editFileTool: AgentTool<EditFileArgs> = {
name: "edit_file",
kind: ToolKind.Edit,
description: "Performs exact string replacements on a file...",
parameters: {
type: "object",
properties: {
path: { type: "string", description: "..." },
old_text: { type: "string", description: "..." },
new_text: { type: "string", description: "..." },
},
required: ["path", "old_text", "new_text"],
additionalProperties: false, // ← Prevents LLM from giving extra fields
},
async execute(args, ctx) {
// ① Quick parameter check (defensive, formal validation is during scheduling)
if (!args.path) return { success: false, data: "Missing path", error: "INVALID_ARGS" };
if (typeof args.old_text !== "string") return { success: false, data: "...", error: "INVALID_ARGS" };
if (typeof args.new_text !== "string") return { success: false, data: "...", error: "INVALID_ARGS" };
// ② Path resolution
const filePath = resolvePath(args.path, ctx.cwd);
// ③ Sandbox: Write operations must be under whitelist roots
if (ctx.writeRoots && ctx.writeRoots.length > 0) {
const conf = await confine(ctx.writeRoots, filePath);
if (!conf.ok) return { success: false, data: conf.error, error: "OUTSIDE_WRITE_ROOTS" };
}
// ④ Read + LF normalization (original file might be CRLF, LLM is used to LF)
const content = await readFile(filePath, "utf-8");
const contentN = toLf(content);
const oldTextN = toLf(args.old_text);
// ⑤ Uniqueness matching — error if appears 0 times or 2+ times
const first = contentN.indexOf(oldTextN);
if (first === -1) return { success: false, data: "Not found...", error: "TEXT_NOT_FOUND" };
if (contentN.indexOf(oldTextN, first + 1) !== -1) {
return { success: false, data: "Appears multiple times...", error: "TEXT_MULTIPLE_MATCHES" };
}
// ⑥ Replace and restore original EOL
const newContentN =
contentN.slice(0, first) +
toLf(args.new_text) +
contentN.slice(first + oldTextN.length);
const writtenContent = normalizeEol(content, newContentN);
await writeFile(filePath, writtenContent, "utf-8");
// ⑦ Compute diff for UI, plus the 1-based line where the replacement starts
const diff = computeFileDiff(content, writtenContent, filePath);
const startLine = contentN.slice(0, first).split("\n").length;
return {
success: true,
data: `File edited: ${filePath}\nReplacement location: Line ${startLine}\nChanges: +${diff.additions} / -${diff.deletions}`,
summary: `📝 Modified: ${basename(filePath)} (+${diff.additions} -${diff.deletions})`,
diff,
};
},
};
A few details worth naming:
- additionalProperties: false: Prevents the LLM from giving extra fields, which are mostly hallucinations (things the LLM shouldn't give).
- Dual error codes: TEXT_NOT_FOUND and TEXT_MULTIPLE_MATCHES are two completely different situations. The LLM's coping strategy differs (the former reads the file to confirm old_text, the latter adds context), so they must be subdivided.
- CRLF/LF normalization: This is the biggest pain point for Windows users. toLf flattens EOL for matching, and normalizeEol restores the original EOL when writing the replacement back, so the LLM won't be repeatedly tripped up by platform differences.
- diff is part of the result, not a side effect: Computing diff only takes a few dozen milliseconds, but the UI experience improvement is huge (changes are clear at a glance).
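The EOL handling and the uniqueness check are easiest to follow as a pure function with no file IO. The bodies of toLf and normalizeEol below are assumptions (the article uses them without showing them), and replaceUnique is a hypothetical helper that condenses steps ④ to ⑥ of edit_file. The sketch treats a file as CRLF if it contains any CRLF sequence; files with mixed line endings are not handled.

```typescript
// Assumed implementations: the article references these helpers from eol.ts but does not show them.
function toLf(text: string): string {
  return text.replace(/\r\n/g, "\n");
}

// Re-apply the original file's line-ending style to LF-normalized content.
function normalizeEol(original: string, lfContent: string): string {
  return original.includes("\r\n") ? lfContent.replace(/\n/g, "\r\n") : lfContent;
}

type ReplaceOutcome =
  | { ok: true; content: string; startLine: number }
  | { ok: false; error: "TEXT_NOT_FOUND" | "TEXT_MULTIPLE_MATCHES" };

// Hypothetical pure core of edit_file: match on LF text, write back in the original style.
function replaceUnique(original: string, oldText: string, newText: string): ReplaceOutcome {
  const contentN = toLf(original);
  const oldN = toLf(oldText);
  const first = contentN.indexOf(oldN);
  if (first === -1) return { ok: false, error: "TEXT_NOT_FOUND" };
  if (contentN.indexOf(oldN, first + 1) !== -1) return { ok: false, error: "TEXT_MULTIPLE_MATCHES" };
  const replaced = contentN.slice(0, first) + toLf(newText) + contentN.slice(first + oldN.length);
  const startLine = contentN.slice(0, first).split("\n").length; // 1-based
  return { ok: true, content: normalizeEol(original, replaced), startLine };
}

// The model sends LF text, the file on disk is CRLF.
console.log(replaceUnique("a\r\nb\r\nc\r\n", "b\nc", "B\nC"));
// { ok: true, content: 'a\r\nB\r\nC\r\n', startLine: 2 }
console.log(replaceUnique("x\nx\n", "x", "y")); // { ok: false, error: 'TEXT_MULTIPLE_MATCHES' }
console.log(replaceUnique("abc\n", "zzz", "y")); // { ok: false, error: 'TEXT_NOT_FOUND' }
```

Keeping this core free of IO also makes the two error paths trivial to unit-test, separately from the sandbox and diff steps.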
11. Trade-offs and Pitfalls Encountered
Pitfall 1: What is the real cost of introducing Zod?
zod is about 50KB (before gzip). If your project resolutely does not use Zod, you can fully support it with pure JSONSchema — the 8 built-in tools are all JSONSchema, ran for half a year without problems.
However, every time a new tool is added, you have to write it three times (JSONSchema + TS interface + validation), and the synchronization rework is particularly tiring. Zod-first gets it right in one go, the human time saved alone earns back that 50KB.
The correct choice: Use Zod-first for new tools, leave old tools untouched, keep all interfaces. That's exactly what we did (_examples/read-file-zod.ts is a template left for posterity).
Pitfall 2: Tool thrown exceptions should be swallowed
If a tool itself throw new Error, the Registry will intercept it and wrap it into EXECUTION_ERROR. This seems "gentle", but it actually protects the tool author — any tool forgetting try/catch won't crash the entire agent.
Counter-example: Some agent frameworks let exceptions bubble up, tool authors must strictly try/catch. The result is everyone shifting blame "you threw over there".
Pitfall 3: Is parallel reading really safe?
read_file is marked Read, theoretically parallelizable. But if the LLM reads foo.ts and a log generated earlier in the same round, there is a logical dependency. The model itself sequences such dependent calls across rounds; we don't need the tool system to manage it.
Conversely, what if the model tries to treat a write tool as a read (trying to be lazy)? It cannot: kind is declared on the tool definition (kind: ToolKind.Edit), not supplied in the call, so whenever the model calls that tool we follow the Edit serial logic. The scheduling decision comes from the declared schema, not from the model's intent.
Pitfall 4: Don't stuff objects into ToolResult.data
// ❌ Wrong: LLM sees a bunch of JSON it doesn't recognize
return { success: true, data: JSON.stringify(someComplexObject) };
// ✓ Correct: Human language description + key facts
return {
success: true,
data: `Found 3 matches:\n1. src/a.ts:12 → const foo = bar\n2. src/b.ts:5 → ...`,
};
LLMs are particularly sensitive to "natural language + numbered lists" and often misread pure JSON descriptions. The data field is by default for the LLM to see, write it in an LLM-friendly way.
Pitfall 5: Tool description is the system prompt the LLM sees
I initially thought description was documentation for developers, later found — it is the basis for the LLM deciding "whether to call you". A poorly written description will make the LLM never call you (or call you randomly).
Good practices:
- Clearly state applicable scenarios: "Suitable for viewing source code, configuration files, and other text files" is more useful than "Read file".
- Clearly state forbidden scenarios: "Do not use cat/type to read files, please use read_file" carves out the niche against similar tools.
- Point out error correction paths: "If old_text appears multiple times, please provide more context" is fed directly to the LLM as a hint.
Pitfall 6: The @ reference marker must be unified early
I've seen people let each tool do if (path.startsWith("@")) path = path.slice(1) on its own, resulting in two tools forgetting to strip, and the @xxx.ts reference syntax mentioned in the system prompt was broken for them.
// Solution: Inject stripMentionPrefix into resolvePath, no one can bypass it
export function resolvePath(inputPath: string, cwd: string): string {
const stripped = stripMentionPrefix(inputPath); // ← Mandatory step
// ...
}
Cross-cutting concerns, centralized, done once. All places resolving file paths for writing should go through resolvePath, not their own path.resolve(cwd, args.path).
12. Conclusion: The "Constitution" of the Tool System
Looking back at the entire design, it is the embodiment of several universal principles:
- Unified Contract: The single invocation method AgentTool.execute(args, ctx) supports all 8+ tools
- Type Erasure: AnyAgentTool + eraseTool, decoupling "finding a tool" from "using a tool" at the type level
- Centralized Policy: Sandbox confine is always the last gate, no one is allowed to write their own copy
- Isomorphic Results: ToolResult.success/data/error have the same shape; the success and failure the LLM sees are both string descriptions
- Default Policy: ToolKind.Other catches all non-file tools
This tool system itself also has a few "constitutional" level rules:
- Separation of Concerns: Sandbox belongs to sandbox, validation to validation, invocation to invocation, not allowed to be stuffed together
- Isomorphic Protocol: Success/Failure, Zod/JSONSchema, Built-in/UDF must have unified interfaces
- Degradable, Not Crashable: Schema wrong gives issues for LLM to retry, file out-of-bounds gives error code for UI to prompt, exceptions are wrapped into EXECUTION_ERROR by Registry.execute instead of crashing the process
- Cross-cutting concerns must be centralized: Path, timeout, output truncation, EOL, diff only need to be written once, no tool is allowed to copy them again
- Type safety is degradable: Each tool retains its own strong types, erased to AnyAgentTool when unified into the Registry, caller asserts back by name; the cost is args go through schema validation for secondary protection
Landing in code, the core modules are only ~1200 lines: types.ts 200 lines / registry.ts 130 lines / sandbox.ts 200 lines / schema-validator.ts 250 lines / zod-schema-validator.ts 400 lines. Plus 8 built-in tools totaling ~1500 lines.
This design can be directly copied and used: any "let LLM call tools" scenario can apply it. The day you see that your agent's code volume hasn't doubled after the 30th tool goes live, readability hasn't degraded, and error handling isn't chaotic, you'll know these fewer than 2000 lines of code were worth it.
Further Reading
- Type erasure reference: TypeScript: Type Erasure in Generic Containers
- Schema dual-track reference: Zod vs JSON Schema (Zod itself has z.toJSONSchema() conversion)
- Similar designs:
  - LangChain's Tool / ToolExecutor: A similar name + description + func(args) trio
  - Vercel AI SDK's Tool Definition: Uses a Zod-first approach, decoupling tool definition from provider
  - Anthropic Computer Use's tool_use block: Server-side validation as a fallback