The Agent Harness: Why Prompt Engineering Alone Can't Build Reliable AI Agents

1. Prerequisites: From Prompt Engineering to Harness Engineering

Before diving into the design itself, two foundational concepts need clear boundaries:

A single tool call ≠ an Agent. Native Function Calling can only complete a single tool-call decision. Complex tasks like cross-file refactoring, project-level debugging, and long-term research require multiple iterations, state persistence, environmental interaction, and exception handling. Prompts and single tool calls alone cannot support such tasks, and this gap is what gave rise to Harness engineering.
Harness Positioning: The Runtime Control Container for Agents. The Harness (literally "control framework") is the engineering runtime framework wrapped around the large model. It is the bridge between the "model brain" and the "execution environment." Prompts are just one component within the Harness; the execution loop, tool scheduling, state management, security sandbox, and error handling are its core components.

Three-layer standard architecture:

Layer	Core Function	Typical Products
Model Layer	Semantic understanding, reasoning decisions, output generation	Large language models like Claude 4.8, GPT-5.5
Harness Control Layer	Loop scheduling, prompt assembly, tool execution, state maintenance, security control	Core runtime frameworks of Claude Code, Codex
Tool/Environment Layer	Specific capability execution and environmental interaction	File system, terminal, IDE, search engine, external APIs

2. Core Concept: What is an Agent Harness, and how does it differ from the large model?

2.1 Definition

An Agent Harness is a runtime control framework that manages the complete execution lifecycle of an agent. It connects model reasoning, tool calls, result feedback, and state transitions through a standardized execution loop, turning the open-ended semantic decisions of a large model into predictable, controllable operations that can actually be executed in a real environment.

In layman's terms: If the large model is the Agent's brain, the Harness is the torso and nervous system—responsible for receiving brain commands, mobilizing limbs for execution, transmitting sensory information back, controlling behavioral rhythm, and constraining safety boundaries.

2.2 Underlying Principles

The core logic of the Harness is a state machine driven by a closed execution loop:

Decompose the Agent's running process into standard states: thinking, tool calling, waiting for results, task completion, abnormal interruption.
Drive state transitions according to a fixed execution loop (such as the Think-Act-Observe paradigm of ReAct).
The model is only responsible for outputting decisions; execution, exception handling, and context maintenance are all handled by the Harness.

Essentially, it constrains the model's open-ended generation into an engineered, manageable process, addressing the randomness, lack of control, and difficulty of deployment that large models exhibit on their own.
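The states listed above can be sketched as an explicit transition table. This is a minimal illustration, not any product's actual implementation: the state names follow the list above, while the `TRANSITIONS` table and the `advance` helper are assumptions made for the example.

```python
from enum import Enum, auto


class AgentState(Enum):
    THINKING = auto()
    TOOL_CALLING = auto()
    WAITING_FOR_RESULT = auto()
    COMPLETED = auto()
    INTERRUPTED = auto()


# Assumed transition table: anything not listed here is rejected by the harness.
TRANSITIONS = {
    AgentState.THINKING: {AgentState.TOOL_CALLING, AgentState.COMPLETED, AgentState.INTERRUPTED},
    AgentState.TOOL_CALLING: {AgentState.WAITING_FOR_RESULT, AgentState.INTERRUPTED},
    AgentState.WAITING_FOR_RESULT: {AgentState.THINKING, AgentState.INTERRUPTED},
    AgentState.COMPLETED: set(),    # terminal state
    AgentState.INTERRUPTED: set(),  # terminal state
}


def advance(current: AgentState, target: AgentState) -> AgentState:
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Keeping the table outside the model means that an unexpected output can never move the agent into a state the harness did not anticipate.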

2.3 Concrete Example

Take Claude Code refactoring a project's code as an example. The complete execution chain:

User inputs a requirement → Harness dynamically assembles context (identity rules + tool descriptions + project structure + user requirement) and feeds it to the model.
Model outputs a tool call to read the target file → Harness parses the instruction, verifies path permissions, and calls the file-reading tool.
After reading completes, the Harness formats the file content, injects it into the context, and calls the model again.
Model outputs instructions to modify the file → Harness generates a diff preview and determines whether user confirmation is needed.
After confirmation, the Harness executes the modification and returns the result to the model, which determines whether further iteration is needed.
Once everything is complete, the model outputs a summary, and the Harness terminates the loop and outputs the result.

Throughout the entire process, the model only performs reasoning and decision-making; all file operations, permission checks, process control, and state persistence are completed by the Harness.
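The diff-preview-then-confirm step in the chain above can be sketched as follows. This is an assumption-laden illustration rather than Claude Code's actual implementation: `preview_diff`, `apply_edit`, the in-memory `files` map, and the `confirm` callback are all hypothetical names; only `difflib.unified_diff` is a real standard-library call.

```python
import difflib
from typing import Callable


def preview_diff(path: str, old: str, new: str) -> str:
    """Build a unified diff the harness can show before applying a write."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


def apply_edit(files: dict[str, str], path: str, new: str,
               confirm: Callable[[str], bool]) -> str:
    """Apply the edit to an in-memory file map only if confirm(diff) returns True."""
    diff = preview_diff(path, files.get(path, ""), new)
    if not diff:
        return "no changes"
    if not confirm(diff):
        # The model receives this string as its observation and can re-plan.
        return "rejected by user"
    files[path] = new
    return "applied"
```

The returned status string is what the harness would hand back to the model, so a rejection becomes information for the next round instead of a silent failure.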

2.4 Core Value

Stability: Constrains model output format and execution flow, reducing failure rates caused by randomness.
Security: All external operations undergo unified permission verification and sandbox isolation, avoiding misoperation risks.
Maintainability: Rules, tools, and logic are managed at the Harness layer, eliminating the need for frequent model adjustments.
Scalability: Adding new tools and capabilities only requires integration at the Harness layer, without modifying the model itself.

3. The Five Core Modules of a Harness

3.1 Dynamic Prompt Assembly Module

Definition

Responsible for dynamically assembling a complete prompt system layer by layer based on the current scenario, rather than using a fixed single-segment system prompt. This is the core module in the Harness that directly influences model behavior.

Underlying Principles

Adopts a layered assembly + tiered caching architecture. Different layers of prompts have different lifecycles and caching strategies:

Static Global Layer: Identity positioning, core principles, safety boundaries. Unchanged throughout the process, can be globally cached and reused.
Semi-dynamic Scenario Layer: Available tool list, output format specifications, domain rules. Loaded per scenario.
Fully Dynamic Environment Layer: Current environment state, file content, execution results, user context. Updated every round.

In Claude Code, for example, the system prompt is essentially a segmented array in which boundary markers separate the static and dynamic parts. The static part supports global caching, which significantly reduces the cost of resending the same tokens on every call.

Concrete Example

Standard layered prompt structure:

[Static Identity Layer]
You are a senior full-stack engineer, proficient in software engineering and code refactoring, strictly adhering to engineering best practices.

[Semi-dynamic Tool Layer]
You can use the following tools:
1. read_file: Read a file, parameter file_path
2. write_file: Write a file, parameters file_path, content
3. run_command: Execute a terminal command, parameters command, cwd
Output requirement: Tool calls must be wrapped in <tool_call> tags.

[Fully Dynamic Environment Layer]
Working directory: /project
Current file: src/app.js
Git status: 2 files uncommitted
Historical execution result: npm run build execution failed, error message: xxx

Design Points

Load tools on demand so that unused tool definitions do not occupy the context window.
Condense environmental information, retaining only content strongly relevant to the current task.
Place static content first to leverage the model's prompt caching mechanism to reduce costs.
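A minimal sketch of layered assembly with static content first might look like this. The `Segment` type, the layer names, and both helper functions are assumptions for illustration; real harnesses mark cache boundaries through whatever mechanism their model provider exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    layer: str       # "static", "scenario", or "environment"
    text: str
    cacheable: bool  # whether the harness may reuse this segment across calls


LAYER_ORDER = {"static": 0, "scenario": 1, "environment": 2}


def assemble_prompt(segments: list[Segment]) -> list[Segment]:
    """Order segments so stable layers come first (sorted() keeps order within a layer)."""
    return sorted(segments, key=lambda s: LAYER_ORDER[s.layer])


def cacheable_prefix(ordered: list[Segment]) -> str:
    """Join the leading run of cacheable segments, stopping at the first dynamic one."""
    parts = []
    for seg in ordered:
        if not seg.cacheable:
            break
        parts.append(seg.text)
    return "\n\n".join(parts)
```

Because caching typically works on an unchanged prefix, a single dynamic segment placed early would cut the reusable portion short, which is why ordering matters more than total length here.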

3.2 Execution Loop Engine

Definition

The core scheduler of the Harness, responsible for driving the Agent's multi-round execution process, controlling state transitions, termination conditions, and exception handling.

Underlying Principles

A state machine loop based on the extended ReAct paradigm. Standard execution flow:

1. Assemble the complete context for the current round, call the large model.
2. Parse the model output, determine the state: end task / call tool / continue thinking.
3. If calling a tool, inject the result into the context after execution, return to step 1.
4. If ending the task, output the final result, terminate the loop.

The loop also builds in three safeguards:

Maximum execution steps: Prevents infinite loops consuming tokens.
Single-step / total timeout: Prevents hung tools and runaway tasks.
Exception retry: Automatically retries transient errors; returns recoverable errors to the model for self-correction.
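The retry safeguard can be sketched as below. The two exception classes and `run_with_retry` are hypothetical names introduced for this example; how a real harness classifies errors as transient or recoverable is a design decision the sketch does not settle.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a network blip)."""


class RecoverableError(Exception):
    """Hypothetical marker for failures the model can fix (e.g. a bad argument)."""


def run_with_retry(tool, args: dict, max_attempts: int = 3, base_delay: float = 0.0) -> dict:
    """Retry transient failures; hand recoverable ones back to the model as an observation."""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "success", "output": tool(**args)}
        except TransientError as exc:
            if attempt == max_attempts:
                return {"status": "error", "output": f"gave up after {attempt} attempts: {exc}"}
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        except RecoverableError as exc:
            # No retry: the identical call would fail again, so let the model adjust it.
            return {"status": "error", "output": str(exc)}
    return {"status": "error", "output": "no attempts were made"}
```

Both failure paths return a structured result instead of raising, so the loop can inject the error text into the context and continue.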

Concrete Example

Pseudo-code for Codex CLI's execution loop:

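def run_agent(llm, context, tool_router, max_steps=20):
    """Drive the loop until the model emits a final answer or the step budget runs out.

    llm, context, and tool_router are placeholder objects standing in for the
    model client, the context store, and the tool executor.
    """
    for _ in range(max_steps):
        # Assemble context, call model
        response = llm.responses.create(context)
        # Parse output items in the order the model produced them
        for item in response.output_items:
            if item.type == "reasoning":
                context.add_reasoning(item.content)
            elif item.type == "tool_call":
                # Route to tool executor, which verifies permissions
                result = tool_router.execute(item.tool, item.params)
                context.add_observation(result)
            elif item.type == "final_answer":
                return item.content
    # Step budget exhausted: fail loudly instead of silently returning None
    raise RuntimeError(f"no final answer after {max_steps} steps")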

Application Scenarios

All long-duration, multi-step Agent tasks, such as code refactoring, problem investigation, automated research, etc.

3.3 Tool Scheduling and Parsing Module

Definition

Responsible for parsing the model's tool call instructions, completing parameter validation, permission judgment, tool execution, and result formatting.

Underlying Principles

The model only outputs semantic call instructions; the Harness turns them into actual execution. The process has four steps:

Format Parsing: Extract structured tool names and parameters from the model output (XML/JSON tags).
Parameter Validation: Verify parameter types, value ranges, path legality, intercepting illegal inputs.
Permission Tiering: Execute different approval strategies based on tool risk levels: no-risk auto-execute, low-risk silent execution, high-risk user confirmation.
Execution Encapsulation: Call the corresponding tool, uniformly format output results and error messages, inject into context.

In Codex, for example, the ToolRouter module has a built-in three-tier approval mode: auto mode (local file reads and writes execute automatically), read-only mode, and full confirmation mode, adapting to scenarios with different security requirements.
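Parameter validation and permission tiering from the four steps above can be sketched together. The `TOOLS` registry, the `Risk` enum, the hypothetical `delete_file` tool, and the risk levels assigned to each tool are assumptions for illustration and do not describe Codex's ToolRouter.

```python
from enum import Enum


class Risk(Enum):
    NONE = "auto-execute"
    LOW = "silent execution"
    HIGH = "user confirmation"


# Hypothetical registry: required parameter types and an assumed risk level per tool.
TOOLS = {
    "read_file": {"params": {"file_path": str}, "risk": Risk.NONE},
    "write_file": {"params": {"file_path": str, "content": str}, "risk": Risk.LOW},
    "delete_file": {"params": {"file_path": str}, "risk": Risk.HIGH},
}


def validate_call(name: str, params: dict) -> Risk:
    """Check the call against the registry and return the approval strategy to apply."""
    spec = TOOLS.get(name)
    if spec is None:
        raise ValueError(f"unknown tool: {name}")
    missing = spec["params"].keys() - params.keys()
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    for key, expected in spec["params"].items():
        if not isinstance(params[key], expected):
            raise ValueError(f"parameter {key!r} must be {expected.__name__}")
    return spec["risk"]
```

The error messages name the offending tool or parameter, so they can be returned to the model verbatim for self-correction.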

Concrete Example

Model output:

<tool_call>
{"name": "run_command", "params": {"command": "npm run build"}}
</tool_call>

Harness processing flow:

Parse the tool name run_command and the parameter npm run build.
Verify that the command is on the whitelist and confined to the working directory, then auto-execute.
Execute the command in a sandbox environment, capture standard output and errors, record execution duration.
Format the result and inject it into the context:

<observation status="success">
Command execution completed, output:
> build success, 120 modules compiled
</observation>
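The parsing and formatting ends of this flow can be sketched in a few lines. The tag names match the example above; `parse_tool_call` and `format_observation` are hypothetical helper names, and the sketch deliberately handles only one tool call per model output.

```python
import json
import re

# Non-greedy match so that only the first <tool_call> block is captured.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_call(model_output: str):
    """Return (name, params) for the first tool call, or None if there is none.

    Raises ValueError (json.JSONDecodeError) when the payload is not valid JSON,
    which the harness can report back to the model.
    """
    match = TOOL_CALL_RE.search(model_output)
    if match is None:
        return None
    payload = json.loads(match.group(1).strip())
    return payload["name"], payload.get("params", {})


def format_observation(status: str, output: str) -> str:
    """Wrap a tool result in the observation tag used in the example above."""
    return f'<observation status="{status}">\n{output}\n</observation>'
```

A production harness would also need to decide how to treat output that itself contains the closing tag; this sketch leaves that case open.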

Design Points

All tool executions must set a timeout to avoid freezes.
Error messages must be clear and locatable, facilitating model self-correction.
Dangerous operations must be logged, supporting auditing and rollback.
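The timeout requirement maps directly onto the standard library. The sketch below uses `subprocess.run` with its real `timeout` parameter; the `run_command` wrapper and the shape of the returned dictionary are assumptions for this example, and the command is passed as an argument list rather than a shell string.

```python
import subprocess
import sys
import time


def run_command(argv: list[str], timeout_s: float, cwd=None) -> dict:
    """Run a command with a hard timeout and return a structured result."""
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, timeout=timeout_s)
        status = "success" if proc.returncode == 0 else "error"
        stdout, stderr = proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        status, stdout, stderr = "timeout", "", f"command exceeded {timeout_s}s"
    return {
        "status": status,
        "stdout": stdout,
        "stderr": stderr,
        "duration_s": time.monotonic() - start,
    }


# Usage: run the current Python interpreter as a stand-in for a build command.
result = run_command([sys.executable, "-c", "print('build ok')"], timeout_s=30)
```

Returning a distinct "timeout" status, rather than folding it into a generic error, gives the model a locatable reason for the failure.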

3.4 Context State Management Module

Definition

Responsible for maintaining all state information throughout the Agent's lifecycle, dynamically managing the context window to avoid overflow and ensure critical information is not lost.

Underlying Principles

Manages information by category, with a different retention strategy for each:

Permanent Retention: System rules, tool definitions, core task objectives.
Priority Retention: Tool calls and results from the last 3-5 rounds.
Compressible: Earlier history, large segments of file content, subject to summarization and compression.
Real-time Update: Environmental state, file snapshots, synchronized to the latest value each round.

Advanced solution: When the context approaches its limit, call the model's compression interface to generate an encrypted hidden state summary, replacing the original text. This saves window space while retaining semantic information (e.g., Codex's /responses/compact endpoint).
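The four retention categories can be sketched as a single trimming pass. The `Message` type, the `kind` labels, and the default placeholder summarizer are assumptions for illustration; in practice the `summarize` callback would call a model or a compaction endpoint.

```python
from dataclasses import dataclass


@dataclass
class Message:
    kind: str      # "permanent", "round", or "environment"
    round_no: int  # round the message belongs to; 0 for permanent items
    text: str


def trim_context(messages, current_round, keep_rounds=3, summarize=None):
    """Keep permanent items and recent rounds verbatim; compress everything older."""
    summarize = summarize or (lambda texts: f"[summary of {len(texts)} earlier messages]")
    cutoff = current_round - keep_rounds
    permanent = [m for m in messages if m.kind == "permanent"]
    old = [m for m in messages if m.kind == "round" and m.round_no <= cutoff]
    recent = [m for m in messages if m.kind == "round" and m.round_no > cutoff]
    # Environment state is replaced each round, so only the latest snapshot survives.
    env = [m for m in messages if m.kind == "environment"][-1:]
    summary = [Message("round", cutoff, summarize([m.text for m in old]))] if old else []
    return permanent + summary + recent + env
```

The summary takes the place of the older rounds in sequence, so the model still sees a chronological history, just at lower resolution.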

Core Capabilities

Sliding window trimming: Automatically trims old, non-critical information.
Large content summarization: Automatically extracts key information from long files and outputs.
Checkpoint resumption: Supports task recovery after interruption, continuing execution based on checkpoints.
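Checkpoint resumption can be as simple as persisting loop state between steps. The sketch below assumes a JSON-serializable state dictionary and hypothetical `save_checkpoint` / `load_checkpoint` helpers; the write goes through a temporary file so an interruption mid-write does not corrupt the previous checkpoint.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write atomically: dump to a temp file in the same directory, then rename."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not path.exists():
        return {"step": 0, "history": []}
    return json.loads(path.read_text(encoding="utf-8"))
```

On restart, the loop would call `load_checkpoint` first and continue from the recorded step instead of step 0.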

Application Scenarios

Tasks requiring long contexts, such as cross-file refactoring, complex problem troubleshooting, and long-term research.

3.5 Security Sandbox and Guardrail Module

Definition

Responsible for the security control of all external operations, isolating the execution environment, and mitigating risks at the mechanism level.

Underlying Principles

Adopts a three-layer protection of "pre-validation + environment isolation + post-audit":

Pre-validation: Parameter legality, operation permissions, dangerous command interception.
Environment Isolation: Code and commands run in a sandbox, restricting file access scope and network permissions.
Post-audit: Full logging of all tool operations, traceable and rollbackable.

OpenAI further divides this into two layers of quality control in Codex:

Computational layer control: Linter, type checking, structural testing—deterministic, fast validation.
Reasoning layer control: LLM code review, semantic validation—deep quality assurance.
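A minimal sketch of the two layers, under stated assumptions: a syntax check via `ast.parse` stands in for the linter and type checker, and `llm_review` is a hypothetical callback that returns a concern string or None. Running the deterministic layer first is a choice made in this sketch, on the grounds that it is cheaper; the source does not state an ordering.

```python
import ast
from typing import Callable, Optional


def computational_check(source: str) -> Optional[str]:
    """Deterministic gate: a syntax check standing in for linters and type checks."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"syntax error on line {exc.lineno}: {exc.msg}"
    return None


def review(source: str, llm_review: Callable[[str], Optional[str]]) -> dict:
    """Run the cheap deterministic layer first; only call the model if it passes."""
    problem = computational_check(source)
    if problem is not None:
        return {"passed": False, "layer": "computational", "detail": problem}
    concern = llm_review(source)
    if concern is not None:
        return {"passed": False, "layer": "reasoning", "detail": concern}
    return {"passed": True, "layer": None, "detail": ""}
```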

Concrete Example

Claude Code's security mechanisms:

File operations are restricted to the specified working directory, prohibiting access to parent directories.
Forced secondary user confirmation before deleting files or executing high-risk terminal commands.
Automatically rejects operations outside permissions and does not mechanically retry rejected instructions.
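The working-directory restriction can be sketched with `pathlib`. This illustrates the general technique only and is not Claude Code's implementation; `resolve_in_workspace` is a hypothetical helper, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve a requested path and reject it if it lands outside the workspace."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes working directory: {requested}")
    return candidate
```

Checking the resolved path rather than the raw string matters: a string test for ".." would miss absolute paths and symlinks that point outside the workspace.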

4. Typical Product Harness Design Cases

4.1 Claude Code: Strongly Constrained, Layered Code Agent Harness

Design Positioning

A code agent for terminals and editors, focusing on deep engineering operations and a stable, controllable execution experience.

Core Harness Design

Layered Prompt Architecture
- Static layer: Identity definition, behavioral guidelines, security rules, globally cached and reused.
- Dynamic layer: Environmental information, MCP tools, session preferences, dynamically assembled each round.
- Uses XML tags to strongly constrain output format, resulting in extremely high parsing accuracy.

Loop Execution Engine
- Infinite step iteration based on a while-loop, with built-in step reminders and cost estimation.
- Supports user interruption mid-way, insertion of new instructions, and dynamic adjustment of task objectives.
- Uses Git commits as checkpoints, supporting rollback and progress recovery.

Tiered Tool System
- Covers full-chain development tools like file read/write, terminal execution, Git operations, search preview.
- Tiered approval based on risk level; low-risk operations auto-execute, high-risk ones require forced confirmation.

Differentiating Characteristics

Prompts are evolving towards simplification: The new version has cut its system prompt by 80%, relying on the model's native capabilities rather than redundant rules.
Strong environmental awareness, automatically synchronizing project structure and status, loading relevant context on demand.

4.2 OpenAI Codex: A Multi-Client, Production-Grade Harness

Design Positioning

A shared core framework for code agents across multiple clients, supporting all product forms: CLI, Web, VS Code, and Desktop. A feature is developed once and takes effect in every client.

Core Harness Design

Unified Shared Harness
- All clients reuse the same core logic: Agent loop, tool execution, permissions, authentication.
- Exposes capabilities externally via the JSON-RPC protocol; each client only needs to implement its own UI.
- Feature iterations are released once and take effect in all clients simultaneously.

Efficient Context and Cache Design
- Static content (rules, tools) is fixed at the very front of the prompt to maximize cache hit rate.
- Tool order is strictly fixed to avoid cache invalidation due to sequence changes.
- When context overflows, a compression interface is called to generate a hidden state summary, balancing privacy and efficiency.

Dual-Layer Quality Guardrails
- Computational layer: Linter, type checking, structural testing (deterministic, fast validation).
- Reasoning layer: LLM code review, semantic compliance check (deep quality assurance).
- Three-tier approval mode: Auto, Read-only, Full confirmation, adapting to different security scenarios.
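Why a fixed tool order matters for caching can be shown with a small sketch. It assumes that the cache is keyed on an unchanged prompt prefix; `stable_prefix` and `prefix_fingerprint` are hypothetical helpers, not part of Codex, and the SHA-256 fingerprint is only a convenient way to check that two prefixes are byte-identical.

```python
import hashlib
import json


def stable_prefix(rules: str, tools: list[dict]) -> str:
    """Serialize static content deterministically so the prefix stays byte-identical."""
    ordered = sorted(tools, key=lambda t: t["name"])
    # sort_keys and fixed separators remove any dependence on dict insertion order.
    tool_block = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return f"{rules}\n{tool_block}"


def prefix_fingerprint(prefix: str) -> str:
    """Hash the prefix; a changed fingerprint means the cached prefix no longer applies."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()
```

Logging the fingerprint on every call is a cheap way to notice when something upstream has started reordering tools or editing the rule text.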

Differentiating Characteristics

Architecturally prioritizes reuse across clients and engineering efficiency, making it a production-grade solution for large-scale deployment.
Deeply integrated with the Responses API: the execution loop is tightly coupled to model output events, providing a smooth streaming experience.

5. Common Misconceptions

Misconception 1: A Harness is just writing a high-quality System Prompt

Correction: Prompt assembly is just one of the five core modules of a Harness, accounting for less than 20% of the engineering complexity. The execution loop, tool scheduling, state management, and security sandbox are the core of the Harness, determining the Agent's stability, security, and usability. Many Agents perform poorly not because of prompts, but due to a lack of execution and state management.

Misconception 2: The longer the system prompt and the more detailed the rules, the better the effect

Correction: The trend for the new generation of powerful models is exactly the opposite. The new version of Claude Code has streamlined its system prompt by 80%. Redundant rules and examples actually limit model capability and introduce noise. The core of a prompt is clear boundaries and format constraints, not exhaustive regulations.

Misconception 3: Models have built-in tool calling, so a Harness is not needed

Correction: A model's native Function Calling only outputs structured call instructions. It has no execution loop, no parameter validation, no exception handling, no security control, no state management. Without the encapsulation of a Harness, tool calling can only complete single, simple operations and cannot support complex, multi-round Agent tasks.

Misconception 4: One Harness can fit all Agent scenarios

Correction: Harness designs vary greatly across different domains. A code Agent's Harness focuses on file management and sandbox execution; a customer service Agent's Harness focuses on knowledge base and ticketing system integration; a data Agent's Harness focuses on database queries and chart generation. The core module logic, tool systems, and security rules differ so much between them that no single design can serve all of these scenarios.

6. Practical Design Suggestions

6.1 Prompt Design Principles

Layered Assembly: Separate the fixed rule layer, dynamic tool layer, and real-time environment layer; do not hardcode them into a single text segment.
Strong Format Constraints: Use XML or JSON tags to enforce structured output, reducing parsing error rates.
Static Content First: Place unchanging content at the very front of the prompt to maximize the use of model caching capabilities.
Concise and Restrained: Clear rules are sufficient; do not pile on redundant constraints. Strong models rely more on guidance than restrictions.

6.2 Execution Loop Design Principles

Set Hard Limits: Always configure a maximum step count, a single-step timeout, and a total task timeout to avoid unlimited token consumption.
Tiered Error Handling: Auto-retry transient errors, return recoverable errors to the model for self-correction, interrupt and notify the user for severe errors.
Support Human Intervention: Allow users to interrupt at any time, modify requirements, and confirm high-risk operations.
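The hard limits can be collected into one small object that the loop consults on every iteration. `Budget` and `BudgetExceeded` are hypothetical names for this sketch, and the injectable `clock` parameter exists only to make the time limit testable; the single-step timeout is left to the tool executor.

```python
import time


class BudgetExceeded(Exception):
    """Raised when the loop hits a hard limit and must stop."""


class Budget:
    def __init__(self, max_steps: int, max_seconds: float, clock=time.monotonic):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.clock = clock
        self.started = clock()
        self.steps = 0

    def charge_step(self) -> None:
        """Call once at the top of every loop iteration."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} reached")
        elapsed = self.clock() - self.started
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"time limit of {self.max_seconds}s reached")
```

Raising an exception, rather than returning a flag, makes it hard for a caller to ignore the limit by accident.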

6.3 Tool Design Principles

Moderate Granularity: A single tool completes a single independent action; not too fragmented, not too vague.
Error-Friendly: Error messages returned by tools must be clear and locatable, facilitating model self-correction.
Security Tiering: Classify into no-risk, low-risk, and high-risk based on risk level, corresponding to different confirmation mechanisms.
Load on Demand: Dynamically load relevant tools based on the current scenario; do not stuff all tools into the context at once.
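Loading tools on demand can be sketched as a lookup from scenario to tool names. The catalog contents, the scenario labels, and the hypothetical `web_search` tool are assumptions for illustration; how a harness detects the current scenario is outside the scope of the sketch.

```python
# Hypothetical catalog: each tool lists the scenarios in which it is useful.
TOOL_CATALOG = {
    "read_file": {"refactor", "debug", "research"},
    "write_file": {"refactor", "debug"},
    "run_command": {"debug"},
    "web_search": {"research"},
}


def tools_for(scenario: str, catalog=TOOL_CATALOG) -> list[str]:
    """Return the tool names to describe in the prompt for this scenario, in a stable order."""
    return sorted(name for name, scenarios in catalog.items() if scenario in scenarios)
```

Sorting the result keeps the tool list in a fixed order from call to call, which also helps prompt caching.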

6.4 Implementation Iteration Path

Minimum Closed Loop First: First implement basic prompts + single tool call + simple loop to verify that the core process works end to end.
Gradually Supplement Capabilities: Add context management, security sandbox, exception handling, and cache optimization in sequence.
Optimize Based on Real Failure Cases: Refine prompt and tool design in response to the specific failure scenarios encountered in actual use.
Instrumentation and Monitoring: Record execution steps, token consumption, success rate, error types, and drive continuous iteration with data.
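The metrics named above can be captured with very little code. `RunMetrics` and `success_rate` are hypothetical names for this sketch, and where the numbers are shipped afterwards (logs, a metrics service) is left open.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunMetrics:
    steps: int = 0
    tokens: int = 0
    errors: Counter = field(default_factory=Counter)

    def record_step(self, tokens_used: int, error_type: Optional[str] = None) -> None:
        """Record one loop iteration, with an optional error category."""
        self.steps += 1
        self.tokens += tokens_used
        if error_type is not None:
            self.errors[error_type] += 1


def success_rate(outcomes: list[bool]) -> float:
    """Fraction of runs that succeeded; 0.0 when there are no runs yet."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```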