
A Logistics Platform Cut Financial Losses 99.96% by Letting AI Agents Police Their Own Rules

By 货拉拉技术 (Huolala Tech) · Jul 1, 2026

Original article: juejin.cn

Rule decay is the silent killer of financial controls in any high-velocity engineering org. This architecture shows that a distilled small model plus a multi-agent rule-auditing system can outperform both manual processes and monolithic large-model approaches, at a fraction of the cost and latency.

Summary

Manual review and static rule engines collapse under the pace of modern software delivery. Huolala's internal controls team found that over a third of their financial reconciliation rules were dead within six months, leaving the door open to six-figure losses. Their response replaces the entire human-reliant lifecycle with an AI-native architecture that catches risks at the requirements stage and keeps rules fresh after deployment. The core loop uses large models to auto-label a million code and text samples, then distills that knowledge into a small ModernBERT model that runs cheaply in production at 95% recall. A separate multi-agent system—four specialized agents working as surveyor, inspector, communicator, and scout—continuously maps code facts against rule logic to flag what's missing, what's stale, and what needs to change. Writing new reconciliation rules became 90% faster, and the platform now blocks risky code in the CI/CD pipeline before it ships. The next phase aims for fully autonomous internal controls with dedicated agents for deduction, adversarial simulation, and automatic remediation.

Takeaways

— More than one-third of reconciliation rules older than six months no longer worked in practice, making stale rules the primary source of large-scale fund loss.

— An automated labeling pipeline used DeepSeek-R1 with chain-of-thought prompting to pseudo-label one million unlabeled code and requirement samples, bootstrapping a data flywheel from only 2,000 hand-labeled examples.

— Knowledge distillation from DeepSeek-Coder-Lite into ModernBERT pushed code-risk recall to 95% while keeping inference costs low enough for production.

— Context loss (50%) and missing business semantics (30%) were the dominant failure modes in code risk identification before feature engineering and long-sequence models were introduced.

— A four-agent anti-decay system—code analysis, rule analysis, feature relationship mapping, and adversarial checking—automatically identifies rules that need to be added, removed, or updated.

— Writing new reconciliation rules became 90% faster after the multi-agent system was deployed.

— High-risk code changes are now intercepted in the CI/CD pipeline through automated circuit-breaking, shifting the defense line left of production.

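— The 3.0 roadmap adds a blue-team adversarial agent that simulates attacks to proactively discover rule blind spots. (In Chinese usage the blue team is the attacking side, the role Western security teams usually call a red team.)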

Conclusions

Rule decay is a harder problem than initial risk detection because it requires continuous alignment between evolving code and static rule definitions—a task that single-model approaches handle poorly.

The 99.96% loss reduction figure is impressive, but the more replicable insight is the architecture pattern: large models for offline labeling, small distilled models for online inference, and specialized agents for ongoing maintenance.

BERT-class models, not frontier LLMs, proved to be the cost-performance sweet spot for production code-risk classification once distillation was applied.

Embedding internal control gates directly into CI/CD treats financial safety as a build-time property rather than an audit afterthought, which is still rare in most Western engineering organizations.

The multi-agent design mirrors military reconnaissance doctrine—specialization and adversarial checking produce more reliable outputs than a single generalist model trying to do everything.

Concepts & terms

Data Flywheel

A self-reinforcing loop where model outputs (pseudo-labels) are fed back as training data, improving the model, which then produces better labels. The cycle keeps improving the model with little manual annotation beyond the initial hand-labeled seed set.
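One round of such a loop can be sketched as follows. This is a minimal illustration, not the pipeline described in the source: the `pseudo_label_round` function, the 0.9 confidence threshold, and the `toy_predict` stand-in for the labeling model are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Sample = str
Label = str


def pseudo_label_round(
    labeled: List[Tuple[Sample, Label]],
    unlabeled: List[Sample],
    predict: Callable[[Sample], Tuple[Label, float]],
    min_confidence: float = 0.9,  # assumed threshold, not from the source
) -> Tuple[List[Tuple[Sample, Label]], List[Sample]]:
    """Move confidently predicted samples from the unlabeled pool into the labeled set."""
    grown = list(labeled)
    remaining: List[Sample] = []
    for sample in unlabeled:
        label, confidence = predict(sample)
        if confidence >= min_confidence:
            grown.append((sample, label))   # accepted as a pseudo-label
        else:
            remaining.append(sample)        # left for a later, better model
    return grown, remaining


def toy_predict(sample: Sample) -> Tuple[Label, float]:
    # Hypothetical stand-in for the labeling model: keyword rules with fixed confidences.
    if "amount" in sample:
        return "risky", 0.95
    if "log" in sample:
        return "safe", 0.92
    return "safe", 0.55


seed = [("update order amount", "risky")]
pool = ["refund amount changed", "add log line", "rename variable"]
grown, remaining = pseudo_label_round(seed, pool, toy_predict)
```

In a real flywheel the model would be retrained on `grown` and the round repeated on `remaining`, so that samples rejected at low confidence can be picked up once the model improves.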

Knowledge Distillation

A technique where a large, high-capacity 'teacher' model trains a smaller 'student' model by transferring its knowledge through soft probability labels, preserving most of the teacher's accuracy while drastically reducing inference cost and latency.
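The source does not say which loss function was used, so the sketch below shows the common soft-label formulation as an assumption: a KL term between temperature-softened teacher and student distributions, mixed with ordinary cross-entropy on the hard label. The `temperature` and `alpha` values are illustrative defaults.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature that flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    hard_label: int,
    temperature: float = 2.0,  # assumed value
    alpha: float = 0.5,        # assumed weight between soft and hard terms
) -> float:
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student_soft)
        if pt > 0
    )
    # Hard term: cross-entropy against the ground-truth label at temperature 1.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[hard_label])
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce


matched = distillation_loss([2.0, 0.5], [2.0, 0.5], hard_label=0)
mismatched = distillation_loss([0.5, 2.0], [2.0, 0.5], hard_label=0)
```

A student that reproduces the teacher's logits pays only the hard-label term, so `matched` is smaller than `mismatched`.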

Rule Decay (Anti-Decay)

The phenomenon where static business rules become invalid over time as the underlying code, database schemas, or API semantics change. Anti-decay systems automatically detect and repair these stale rules.
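A simple form of decay detection is to check whether every field a rule references still exists in the current schema. The sketch below assumes rules declare their dependencies as `table.column` strings; the `Rule` class, `find_stale_rules`, and the example schema are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Rule:
    name: str
    fields: Set[str]  # "table.column" references the rule depends on


def find_stale_rules(rules: List[Rule], schema: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per rule, the referenced fields that no longer exist in the schema."""
    live = {f"{table}.{column}" for table, columns in schema.items() for column in columns}
    stale: Dict[str, List[str]] = {}
    for rule in rules:
        missing = sorted(rule.fields - live)
        if missing:
            stale[rule.name] = missing
    return stale


# Hypothetical schema after a migration that dropped orders.fee.
schema = {
    "orders": {"id", "paid_amount"},
    "payouts": {"order_id", "amount"},
}
rules = [
    Rule("order_vs_payout", {"orders.paid_amount", "payouts.amount"}),
    Rule("legacy_fee_check", {"orders.fee", "payouts.amount"}),
]
stale = find_stale_rules(rules, schema)
```

Here `legacy_fee_check` is flagged because `orders.fee` is gone. A missing field is only the easiest case to detect; a field that still exists but has changed meaning needs semantic analysis, which is the part the source assigns to its agents.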

Multi-Agent Collaboration

An architecture where several specialized AI agents, each with a narrow, well-defined task, work together in sequence or in parallel to solve a problem that a single generalist model would handle poorly.
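The sequential variant can be sketched as a chain of functions that pass a shared state along. The four stages below borrow the role names from the takeaways (code analysis, rule analysis, relationship mapping, adversarial checking), but their logic is a toy assumption: real agents would call models, while these only use set operations on hypothetical inputs.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Agent = Callable[[State], State]


def code_analysis(state: State) -> State:
    # Collect fields the code writes, from facts such as "write orders.paid_amount".
    state["code_fields"] = {
        line.split()[1] for line in state["code_facts"] if line.startswith("write ")
    }
    return state


def rule_analysis(state: State) -> State:
    # Collect every field that at least one rule references.
    state["rule_fields"] = set().union(*state["rules"].values())
    return state


def relationship_mapping(state: State) -> State:
    code_fields, rules = state["code_fields"], state["rules"]
    suggestions = [("add", field) for field in sorted(code_fields - state["rule_fields"])]
    for name, fields in sorted(rules.items()):
        if not fields & code_fields:
            suggestions.append(("remove", name))   # rule touches nothing the code writes
        elif fields - code_fields:
            suggestions.append(("update", name))   # rule is partly out of date
    state["suggestions"] = suggestions
    return state


def adversarial_check(state: State) -> State:
    # Challenge the proposals: drop "add" suggestions for fields known to be non-financial.
    ignored = state.get("non_financial_fields", set())
    state["suggestions"] = [
        s for s in state["suggestions"] if not (s[0] == "add" and s[1] in ignored)
    ]
    return state


def run_pipeline(state: State, agents: List[Agent]) -> State:
    for agent in agents:
        state = agent(state)
    return state


result = run_pipeline(
    {
        "code_facts": [
            "write orders.paid_amount",
            "write orders.coupon_amount",
            "write orders.updated_at",
            "read users.name",
        ],
        "rules": {
            "paid_vs_payout": {"orders.paid_amount", "payouts.amount"},
            "legacy_fee": {"orders.fee"},
        },
        "non_financial_fields": {"orders.updated_at"},
    },
    [code_analysis, rule_analysis, relationship_mapping, adversarial_check],
)
suggestions = result["suggestions"]
```

Each stage does one narrow job and leaves its output in the shared state, so the final list contains one suggestion of each kind: add a rule for `orders.coupon_amount`, remove `legacy_fee`, and update `paid_vs_payout`.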

Shift-Left

The practice of moving testing, security, or compliance checks earlier in the software development lifecycle—ideally into the IDE or CI/CD pipeline—to catch issues before they reach production.
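In a CI/CD pipeline, a shift-left check usually comes down to a step that fails the build when a risk score crosses a threshold. The sketch below assumes a classifier has already produced a score per changed file; the `gate` function, the 0.8 threshold, and the file paths are hypothetical.

```python
from typing import Dict, List, Tuple


def gate(scores: Dict[str, float], threshold: float = 0.8) -> Tuple[int, List[str]]:
    """Return a process exit code (0 = pass, 1 = block) and the files that tripped the gate."""
    blocked = sorted(path for path, score in scores.items() if score >= threshold)
    return (1 if blocked else 0), blocked


# Hypothetical per-file risk scores from an upstream classifier.
exit_code, blocked = gate({"billing/refund.py": 0.93, "docs/readme.md": 0.02})
```

A CI step would pass `exit_code` to `sys.exit` so that a non-zero value stops the pipeline, and would print `blocked` so the author can see which files need review.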

Chain-of-Thought (CoT) Prompting

A prompting technique that instructs a large language model to break down its reasoning into intermediate steps before arriving at a final answer, improving performance on complex classification and logic tasks.
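For a labeling task, a chain-of-thought prompt typically asks for numbered reasoning steps followed by a machine-readable verdict line. The template wording, the `LABEL:` convention, and both helper functions below are assumptions for illustration; the source does not publish its prompts.

```python
import re
from typing import Optional

# Hypothetical prompt template; the wording is illustrative only.
COT_TEMPLATE = (
    "You are reviewing a code change for financial risk.\n"
    "Think step by step:\n"
    "1. What data does the change read or write?\n"
    "2. Could it alter an amount, a balance, or a payout?\n"
    "3. Is there a reconciliation rule that would catch an error?\n"
    "Finish with one line of the form 'LABEL: risky' or 'LABEL: safe'.\n"
    "\n"
    "Code change:\n"
    "{change}\n"
)


def build_prompt(code_change: str) -> str:
    return COT_TEMPLATE.format(change=code_change)


def parse_label(response: str) -> Optional[str]:
    """Extract the final verdict line; return None if the model gave no usable verdict."""
    matches = re.findall(
        r"^LABEL:\s*(risky|safe)\s*$", response, flags=re.IGNORECASE | re.MULTILINE
    )
    return matches[-1].lower() if matches else None


prompt = build_prompt("order.refund_amount = order.paid_amount * 2")
label = parse_label(
    "1. The change writes refund_amount.\n"
    "2. It can exceed the paid amount.\n"
    "LABEL: risky"
)
```

Responses without a parsable verdict return `None`, which lets a labeling pipeline discard them rather than guess a label.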
