Architecture · Algorithm · Artificial Intelligence

Dewu's AI Harness Wraps the Full PDCA Loop Around Recommendation Agents

By 得物技术 (Dewu Technology)

Most AI coding efforts stop at generation and leave verification, rollback, and knowledge reuse as manual afterthoughts. Dewu's Harness shows how to close that loop in a high-stakes recommendation system, with measurable gains in accuracy and cost that any team running AI agents in production can benchmark against.

Summary

AI coding tools solve the Do phase, but complex recommendation systems fail across the whole PDCA cycle. Dewu's AI Harness embeds constraints, verification, and feedback into the environment itself, so agents operate freely while staying inside engineering contexts that can be verified and rolled back. The system spans seven guardrail stages, from structured requirement contracts (T-PRD) through automated 24/7 AI evaluation to Bad Case capture that feeds directly into the next iteration.
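
To make the idea of a structured requirement contract concrete, here is a minimal sketch of what one executable unit of a T-PRD might look like and how it could be checked. The article only names the four ingredients (scope, metric direction, stability red lines, acceptance assertions); the class name, field names, metric names, and thresholds below are all hypothetical and are not Dewu's actual schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class ExecutableUnit:
    """One executable unit of a T-PRD. Every field name here is an assumption."""
    name: str
    scope: List[str]               # modules the agent is allowed to touch
    metric: str                    # metric the change is expected to move
    direction: str                 # "up" or "down"
    red_lines: Dict[str, float]    # stability metric -> maximum allowed value
    assertions: List[Callable[[Metrics], bool]] = field(default_factory=list)


def check_unit(unit: ExecutableUnit, baseline: Metrics, candidate: Metrics) -> List[str]:
    """Return a list of violations; an empty list means the unit passes."""
    violations: List[str] = []
    delta = candidate[unit.metric] - baseline[unit.metric]
    if unit.direction == "up" and delta <= 0:
        violations.append(f"{unit.metric} did not go up")
    if unit.direction == "down" and delta >= 0:
        violations.append(f"{unit.metric} did not go down")
    for key, limit in unit.red_lines.items():
        if candidate[key] > limit:
            violations.append(f"red line breached: {key}")
    for index, assertion in enumerate(unit.assertions):
        if not assertion(candidate):
            violations.append(f"assertion {index} failed")
    return violations


# Hypothetical unit: raise CTR without breaking latency or emptying the feed.
unit = ExecutableUnit(
    name="boost-new-items",
    scope=["ranker/rerank"],
    metric="ctr",
    direction="up",
    red_lines={"p99_latency_ms": 120.0},
    assertions=[lambda m: m["empty_feed_rate"] < 0.01],
)
baseline = {"ctr": 0.050, "p99_latency_ms": 100.0, "empty_feed_rate": 0.002}
good = {"ctr": 0.053, "p99_latency_ms": 110.0, "empty_feed_rate": 0.002}
slow = {"ctr": 0.053, "p99_latency_ms": 150.0, "empty_feed_rate": 0.002}

print(check_unit(unit, baseline, good))
print(check_unit(unit, baseline, slow))
```

The point of the sketch is that each part of the contract becomes something a machine can evaluate, so an agent's output can be accepted or rejected without a human rereading a prose PRD.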

A hybrid agent architecture called TuiChaCha splits work into a deterministic Highway for the 80% of problems that are high-frequency and reproducible, and an ATV exploration mode for the 20% that are long-tail. Successful ATV explorations get pruned, generalized, and promoted into new Highway capabilities, creating a compounding memory loop. A three-layer knowledge governance model — architecture docs, module design docs, and code comments — lifted simple-task accuracy from 52% to 91% while cutting token consumption by 48%.
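
The Highway/ATV split described above can be sketched as a small router. This is an illustration under stated assumptions, not TuiChaCha's implementation: the class `HybridRouter`, its method names, and the problem-type keys are hypothetical, and the exploration step is a stub standing in for an LLM-driven agent.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[str], str]


class HybridRouter:
    """Route known problem types to deterministic code, everything else to exploration."""

    def __init__(self, explore: Handler) -> None:
        self.highway: Dict[str, Handler] = {}  # deterministic, reproducible paths
        self.explore = explore                 # stand-in for the ATV exploration mode

    def handle(self, problem_type: str, query: str) -> Tuple[str, str]:
        handler = self.highway.get(problem_type)
        if handler is not None:
            return "highway", handler(query)
        return "atv", self.explore(query)

    def promote(self, problem_type: str, handler: Handler) -> None:
        """Install a vetted exploration result as a new Highway path."""
        self.highway[problem_type] = handler


router = HybridRouter(explore=lambda q: f"explored: {q}")
router.promote("missing_exposure", lambda q: f"runbook: {q}")

print(router.handle("missing_exposure", "item A"))  # served by the Highway
print(router.handle("odd_ranking", "item B"))       # falls through to ATV

# After the exploration is pruned and vetted, it becomes a Highway path.
router.promote("odd_ranking", lambda q: f"runbook: {q}")
print(router.handle("odd_ranking", "item B"))
```

In this sketch promotion is a plain dictionary write; in the system the article describes, it happens only after pruning and generalization.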

Takeaways
AI coding alone addresses only the Do phase; recommendation system failures originate across Plan, Check, and Act as well.
A seven-stage guardrail system maps the full PDCA cycle into measurable collaboration surfaces for AI agents.
Structured requirement contracts (T-PRD) decompose product intent into executable units with explicit scope, metric direction, stability red lines, and acceptance assertions.
The Axis AI evaluation platform runs 24/7 automated reviews simulating user profiles to surface experience risks before online experiments.
Bad Case capture feeds into sandbox replay and Story deposition so each incident leaves a reusable diagnostic path.
Three-layer knowledge governance — L1 architecture boundaries, L2 module design, L3 code comments — raised simple-task accuracy from 52% to 91% and cut token use by 48%.
The TuiChaCha hybrid agent routes 80% of troubleshooting through deterministic Highway code and 20% through ATV exploration, then promotes successful explorations into new Highway paths.
Memory pruning generalizes one-time features like UIDs into business variables, so a path can pass a Dry Run admission check before it becomes a default capability.
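
The memory pruning and Dry Run steps in the last takeaway can be sketched as follows. This is a minimal illustration, assuming a trace is a list of tool calls with string arguments; the trace format, the tool names, and the functions `prune_trace` and `dry_run` are hypothetical and not taken from the article.

```python
from typing import Callable, Dict, List

Step = Dict[str, str]  # assumed shape: {"tool": ..., "arg": ...}


def prune_trace(trace: List[Step], literals: Dict[str, str]) -> List[Step]:
    """Replace one-off literal values with named placeholders such as {uid}."""
    pruned: List[Step] = []
    for step in trace:
        arg = step["arg"]
        for variable, value in literals.items():
            arg = arg.replace(value, "{" + variable + "}")
        pruned.append({"tool": step["tool"], "arg": arg})
    return pruned


def dry_run(template: List[Step],
            samples: List[Dict[str, str]],
            execute: Callable[[str, str], bool]) -> bool:
    """Admit the template only if every step succeeds for every sample binding."""
    for binding in samples:
        for step in template:
            if not execute(step["tool"], step["arg"].format(**binding)):
                return False
    return True


# A successful exploration that was tied to a single user.
trace = [
    {"tool": "fetch_profile", "arg": "uid=10086"},
    {"tool": "fetch_recall_log", "arg": "uid=10086&scene=home_feed"},
]
template = prune_trace(trace, {"uid": "10086"})
print(template)

# Stub executor: a call succeeds once all placeholders have been filled in.
admitted = dry_run(template, [{"uid": "1"}, {"uid": "2"}],
                   execute=lambda tool, arg: "{" not in arg)
print(admitted)
```

A real admission check would replay the template against a sandbox rather than a stub, but the shape is the same: generalize first, then require the generalized path to succeed on inputs other than the one that produced it.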
Conclusions

Framing the Harness as an environment rather than a set of hard rules is a useful mental model: constraints that feel natural to the agent produce fewer workarounds than explicit guardrails bolted on afterward.

The 80/20 split between Highway and ATV mirrors how human SRE teams actually work — runbooks for known incidents, ad-hoc investigation for novel ones — and formalizing that split lets each mode optimize for its own cost profile.

A 48% token reduction alongside higher accuracy, with L3 code comments as the most granular knowledge layer, challenges the assumption that more context always helps; proximity and specificity matter more than volume.

Turning Bad Cases into reusable Stories rather than one-off postmortems creates a compounding knowledge asset that directly feeds the Highway, making the system self-improving without retraining.

The observation that humans are being 'interfaced' — constrained by SOPs, input/output contracts, and health metrics — while AI is treated as creative and emergent, is a genuine inversion worth watching as agent orchestration matures.

Concepts & terms
PDCA
Plan-Do-Check-Act: a four-step iterative management method for continuous improvement. In this context, it maps to requirement planning, AI-driven development, automated evaluation, and knowledge deposition.
T-PRD
A structured, machine-readable requirement document that decomposes product intent into executable units (EPs) with explicit scope, metric direction, stability red lines, and acceptance assertions, replacing ambiguous natural-language PRDs.
Highway Agent
A deterministic code path that handles high-frequency, reproducible problems without LLM reasoning. The LLM is used only for final result polishing, keeping execution predictable and debuggable.
ATV Agent
An exploration-mode agent that uses tools, MCP, and constraints to autonomously decompose and investigate long-tail problems via ReAct-style reasoning. Successful trajectories are pruned and promoted into Highway capabilities.
Memory Pruning
The process of taking a successful ATV exploration trace, removing one-time identifiers like UIDs, and generalizing them into reusable business variables so the path can be validated and promoted to a default Highway capability.
L3 Code Comments
The most granular layer of Dewu's three-tier knowledge governance, consisting of inline code annotations that AI agents consume during code reading. Proximity to the code reduces ambiguity and token waste compared to higher-level docs alone.
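
As an illustration of what an L3-style annotation might look like, the sketch below attaches the business reason, an invariant, and a prohibition directly to the code they govern. The function, the brand-cap rule, and the WHY / INVARIANT / DO NOT comment labels are all hypothetical; the article does not specify a comment format.

```python
from typing import Dict, List


def diversify(items: List[Dict[str, str]], max_per_brand: int = 2) -> List[Dict[str, str]]:
    # WHY: a feed dominated by one brand is treated as a bad experience,
    #      so cap how many items each brand may contribute.
    # INVARIANT: input order is the ranking order and must be preserved.
    # DO NOT: re-sort here; scoring belongs to the upstream ranking stage.
    seen: Dict[str, int] = {}
    kept: List[Dict[str, str]] = []
    for item in items:
        count = seen.get(item["brand"], 0)
        if count < max_per_brand:
            kept.append(item)
            seen[item["brand"]] = count + 1
    return kept


ranked = [
    {"id": "1", "brand": "a"},
    {"id": "2", "brand": "a"},
    {"id": "3", "brand": "a"},
    {"id": "4", "brand": "b"},
]
print([item["id"] for item in diversify(ranked)])
```

An agent reading this function gets the constraint at the point of edit, without loading a module design document to learn that re-sorting is off limits.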
Source: juejin.cn