From Form-Filling to Agent-Driven: How Dewu Rebuilt Its Event Setup with a Two-Stage AI Workflow

From Forms to Agents: The AI Practice of Dewu Community Event Setup

I. Project Background

Between planning and launch of a marketing campaign, an operator has to switch among three systems more than 10 times and fill in more than 40 fields. We redesigned this chain with AI, moving from "AI helps you fill forms" to "Two-stage Agent + Aggregated Workbench." This article focuses less on technical details than on the choices and reflections along the way.

Imagine you are a Dewu community operator. Next week you need to launch a "Summer Outdoor Goodies" campaign. You open System A to create a topic, switch to System B to fill in the campaign configuration, then open the venue builder system to configure components, and finally submit for review. The three systems are independent, but their fields are highly coupled. If you change one word in the campaign name, you have to update it in System A and in the venue as well, which means a lot of duplicate entry.

II. First Version Exploration and Agent CLI Feasibility Assessment

Let AI Help You Fill the Form — But Humans Are Still the Protagonist

Our first reaction was: let AI help operators fill in the fields. The approach was a 5-step form wizard. AI parses the planning document in the first step and pre-fills fields in subsequent steps. AI's capability comes from two Dify Workflows (Dify: an open-source LLM application development platform). The first one parses the planning document into structured fields; the second one semantically matches the parsed results with the system's dropdown options. In effect, operators went from "all manual entry" to "AI pre-fill + human verification."
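The second workflow's job, mapping a parsed value onto one of the system's dropdown options, can be sketched as follows. This is a minimal illustration, not the Dify workflow itself: it swaps the LLM's semantic matching for plain string similarity from Python's `difflib`, and the option list and function names are assumptions made up for the example.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical dropdown options; the real lists come from the campaign systems.
CATEGORY_OPTIONS = ["Outdoor", "Sneakers", "Beauty", "Digital"]

def match_option(parsed: str, options: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the closest dropdown option, or None when nothing is close enough."""
    lowered = {opt.lower(): opt for opt in options}
    hits = get_close_matches(parsed.strip().lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None

def prefill(parsed_fields: dict[str, str]) -> dict[str, dict]:
    """Pre-fill each field and flag the ones a human must fill by hand."""
    result = {}
    for name, value in parsed_fields.items():
        option = match_option(value, CATEGORY_OPTIONS)
        result[name] = {"value": option, "needs_manual_entry": option is None}
    return result
```

Whatever the matcher is, the output shape is the point: every pre-filled value either resolves to a real option or is flagged for the operator, which is the "AI pre-fill + human verification" split described above.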

After launch, we found that operation time was reduced, but it was far from a "qualitative leap." The reason is simple: the paradigm didn't change. The operator was still the subject of the process; AI only pre-filled some fields. You still had to understand each field, go through all 5 steps in order, switch between three systems, and judge whether AI filled correctly. More specific problems:

The first version gave us an important insight: if AI's role is only to "help you fill fields," it will never bring a qualitative change. The real change is for AI to drive the process while humans confirm only at key nodes. This matches a pattern we see across AI products: the jump in value tends to occur at the inflection point where AI changes from an auxiliary tool into the subject of the process. GitHub Copilot evolving from line-level completion to Copilot Workspace, and ChatGPT evolving from dialogue to GPTs with Actions, are different manifestations of the same inflection point.

Feasibility Assessment of Agent CLI Approach

Before deciding to rewrite, we evaluated a more radical direction: Agent CLI. Tools like OpenCode CLI, Cursor, and Claude Code demonstrate another possibility — using natural language to guide an AI Agent to complete the entire development process. In theory, venue building could be done similarly: the operator says "I want to create a summer outdoor themed campaign," and the Agent autonomously completes topic creation, campaign configuration, and component setup, without any structured UI cards.

We firmly believe this is the future, but implementing it today runs into three hard obstacles. First, the Agent has no "feel" for the business constraints of the venue. Second, the Agent cannot obtain real-time status. Third, the Agent's operations lack auditability and explainability. This made us realize that we need to find a suitable position between "fully autonomous Agent" and "purely manual operation."

Anthropic's Agentic system complexity spectrum provides exactly this framework — it divides the autonomy of AI systems into multiple levels from low to high. Our workflow roughly corresponds to the combination of Prompt Chaining + Routing + Human-in-the-loop in the middle of the spectrum. This is not the most complex position on the spectrum, but it is the most suitable position for us.

III. Second Version Implementation and Component Module Protocol

From "Filling Forms" to "Reviewing Cards"

The second version was an architecture-level rewrite. The core concept can be summarized in one sentence: change the operator from a "process executor" to a "process supervisor." The operator's tasks are reduced to two:

Provide information: paste a link to a Feishu planning document.

Key confirmation: verify and fine-tune the structured cards that AI presents.

Everything else (fetching the document, parsing fields, creating topics, creating campaigns, copying venue templates, configuring components) is driven by the workflow.

Before choosing a technical solution, there is a more fundamental question to answer first: do we need a Workflow or an Agent?

Our venue building process has clear steps (parse → select components → fill fields → confirm → build), the completion conditions for each step are deterministic, and there is a high requirement for accuracy. This is clearly more suitable for the Workflow model. But this doesn't mean we don't use Agent capabilities at all. In local scenarios — such as AI rewriting rule copy, or understanding natural language to modify component configurations — we used Agent-style LLM calls (give LLM tools, let it autonomously decide how to complete this local task).

A practical rule of thumb: If your business process can be drawn as a finite state machine diagram, use Workflow; if it's more like "given a goal, let AI figure out how to achieve it," use Agent. Most enterprise scenarios are a mix of both — the big framework uses Workflow to ensure controllability, and local nodes use Agent to provide flexibility.
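To make the "can it be drawn as a finite state machine" test concrete, here is a minimal sketch of the venue building steps named above as an explicit transition table. The step names follow the text; the `DONE` state and the edge from confirm back to fill fields are assumptions added for illustration, not something the article specifies.

```python
from enum import Enum

class Step(Enum):
    PARSE = "parse"
    SELECT_COMPONENTS = "select_components"
    FILL_FIELDS = "fill_fields"
    CONFIRM = "confirm"
    BUILD = "build"
    DONE = "done"

# Allowed transitions. Letting CONFIRM return to FILL_FIELDS is an assumption.
TRANSITIONS = {
    Step.PARSE: {Step.SELECT_COMPONENTS},
    Step.SELECT_COMPONENTS: {Step.FILL_FIELDS},
    Step.FILL_FIELDS: {Step.CONFIRM},
    Step.CONFIRM: {Step.BUILD, Step.FILL_FIELDS},
    Step.BUILD: {Step.DONE},
    Step.DONE: set(),
}

def advance(current: Step, target: Step) -> Step:
    """Move to target only if the transition table allows it."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

If a process fits in a table like this, routing can stay in code and the LLM only works inside individual steps. If the table cannot be written down in advance, that is a hint the task is closer to the Agent side.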

This distinction is increasingly valued in the industry. LangChain founder Harrison Chase has emphasized on multiple occasions that most successful AI applications today are Workflows, not Agents. Klarna's AI customer service system is widely reported as an "Agent," but from an architectural perspective, it's closer to a carefully designed Workflow — with clear intent classification routing, standardized tool calling processes, and escalation mechanisms. Truly "fully autonomous Agents" running in production are still very rare.

Choosing LangGraph as the Orchestration Engine

After clarifying the need for a Workflow, we evaluated multiple options and finally chose LangGraph as the orchestration engine. LangGraph is a state orchestration engine released by the LangChain team and open-sourced under the MIT license. It uses directed graphs to define AI workflows: each node is a processing step (an LLM call, a tool call, or a human confirmation), and edges define the transition conditions between steps. The built-in Checkpointer mechanism can persist the complete state of each session, supporting interruption and resumption.

In our system, the LLM is not a "decision-maker" but an "executor within a node" — performing information parsing and copy generation within clearly defined nodes, not participating in process routing.

Interrupt / Resume: The Interaction Language Between Frontend and Backend

The entire frontend-backend interaction is based on LangGraph's interrupt/resume mechanism, which can be understood as "pausing and resuming the program." When the backend process reaches a step that requires human input, it calls interrupt() to pause the graph's execution and sends a structured data packet (payload) to the frontend. The frontend renders the corresponding UI card based on the data type. After the user completes the operation, the frontend sends the result back, the backend passes it to the graph via Command(resume=value), and the graph continues execution from the breakpoint.
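The pause-and-resume contract can be illustrated without any framework. The sketch below deliberately does not use the LangGraph API: a Python generator stands in for the graph, each `yield` plays the role of interrupt(), and `send()` plays the role of resuming with a value. The payload types, field names, and the `Session` class are all hypothetical.

```python
from typing import Any, Generator, Optional

def venue_flow(doc_url: str) -> Generator[dict, Any, dict]:
    """Each `yield` pauses the flow and hands a payload to the frontend."""
    parsed = {"name": "Summer Outdoor Goodies", "source": doc_url}  # stand-in for LLM parsing
    edited = yield {"type": "confirm_fields", "fields": parsed}     # pause 1: operator edits
    approved = yield {"type": "approve_build", "fields": edited}    # pause 2: operator approves
    return {"built": bool(approved), "fields": edited}

class Session:
    """Drives one flow: start() runs to the first pause, resume() continues it."""

    def __init__(self, doc_url: str) -> None:
        self._gen = venue_flow(doc_url)
        self.result: Optional[dict] = None

    def start(self) -> dict:
        return next(self._gen)

    def resume(self, value: Any) -> Optional[dict]:
        try:
            return self._gen.send(value)   # payload of the next pause
        except StopIteration as stop:
            self.result = stop.value       # flow finished, keep its return value
            return None
```

A generator lives only in memory, so unlike the Checkpointer described above, this sketch cannot survive a process restart. It only shows the shape of the conversation: payload out, human value in, execution continues from where it stopped.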

Drawing on LangGraph's interrupt mechanism and LangChain's Human-in-the-loop documentation, human intervention typically covers several types of actions:

Approve / Reject: suitable for confirmation before high-risk operations.

Edit: editing the Agent's actions, state, or intermediate results; suitable for manual correction of AI output.

Respond / Input: providing additional information or supplementary feedback; suitable for data that AI cannot obtain on its own.

Our 6 interrupt points correspond to a mix of these types:

Operator's Perspective: How to Set Up a Campaign

Take the "Summer Outdoor Goodies" campaign as an example:

The entire process is completed within the same dialogue interface, without needing to switch to other systems.

Capability Registry: One Shell for All Scenarios

The frontend does not hardcode the "venue scenario"; instead, it uses a Capability Registry to dynamically mount UI for different scenarios. What is a Capability Registry? It's similar to a plugin system: each business scenario is a "capability module" that registers its UI contributions (welcome state, interrupt cards, result display, etc.) with a unified interaction shell. The interaction shell doesn't care about specific business logic; it's only responsible for dispatching and rendering. When adding a new scenario, you only need to register a new module, without modifying the shell's code.

This design approach shares similarities with Module Federation in micro-frontend architecture: both decouple the "shell" from "capabilities," allowing new capabilities to be developed and registered independently. The difference is that our granularity is finer: not module-level federation, but registration at the level of UI contribution points (welcome state, cards, result display).
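A registry keyed by UI contribution point could look roughly like this. It is a minimal sketch written in Python for consistency with the other examples (the real shell is frontend code), and the class, scenario keys, and contribution point names are assumptions.

```python
from typing import Callable, Dict, Tuple

Renderer = Callable[[dict], str]

class CapabilityRegistry:
    """Maps (scenario, contribution point) to a renderer; the shell only dispatches."""

    def __init__(self) -> None:
        self._renderers: Dict[Tuple[str, str], Renderer] = {}

    def register(self, scenario: str, point: str, renderer: Renderer) -> None:
        key = (scenario, point)
        if key in self._renderers:
            raise ValueError(f"duplicate contribution: {key}")
        self._renderers[key] = renderer

    def render(self, scenario: str, point: str, payload: dict) -> str:
        renderer = self._renderers.get((scenario, point))
        if renderer is None:
            return f"[no UI registered for {scenario}/{point}]"
        return renderer(payload)

registry = CapabilityRegistry()
# A scenario module registers its contributions; the shell code stays unchanged.
registry.register("venue_building", "welcome", lambda p: "Paste a planning document link")
registry.register("venue_building", "interrupt_card", lambda p: f"Confirm fields: {sorted(p)}")
```

Adding a new scenario means more `register` calls in a new module. Nothing in `CapabilityRegistry.render` needs to know which scenarios exist.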

The second version advanced the experience from "filling forms" to "reviewing cards": AI upgraded from "helping you fill fields" to "driving the entire process," supporting interrupt and resume — you can close the browser and reopen to continue. The Capability Registry design gives the system good extensibility. Currently, it has been connected to three scenarios: venue building, dynamic filtering, and general chat. In the future, new scenarios only need to implement the corresponding capability module. However, the second version still has a prerequisite: the operator already has a planning document. In many real business scenarios, operators don't have a document at the beginning, only an idea for a campaign. This directly drove the third version.

Component Module Protocol: Encapsulating Differences into Modules

Why a Protocol is Needed

Before entering the third version, we need to talk about the most important layer of design in the second version — the component form module protocol. It solves a core business extensibility problem. A venue consists of multiple components: header image, dynamic feed, campaign tasks, rule description, banner, sharing incentives, etc. The configuration logic for each component varies greatly.

The first version's approach was to use conditional statements in a single file for dispatching. Every time a component was added, the central dispatch file had to be modified, and the business logic of components was scattered everywhere. It was acceptable with 5 components, but not with 16+ components.

Lifecycle: A Unified Rhythm for Each Component

We abstracted each component into an independent module and defined a unified lifecycle:

The most important rule is that the initialization phase should not produce side effects (such as calling the backend to create resources). Actions that actually produce write operations must be placed in the "pre-build" phase, i.e., after the user explicitly clicks "Build Venue." This rule avoids a real bug: a component secretly called the backend save interface during initialization, causing the system to create campaign data just because the user opened the card to take a look.
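The "no side effects during initialization" rule can be expressed directly in the module interface. The sketch below assumes two lifecycle hooks named `init` and `pre_build`; these names, the `FakeBackend` stand-in, and the campaign task example are hypothetical, since the article names the phases but not the actual method signatures.

```python
from typing import Protocol

class FakeBackend:
    """Stand-in for the real save interface; records every write."""

    def __init__(self) -> None:
        self.writes: list = []

    def save(self, record: dict) -> None:
        self.writes.append(record)

class ComponentModule(Protocol):
    def init(self, context: dict) -> dict: ...
    def pre_build(self, draft: dict, backend: FakeBackend) -> dict: ...

class CampaignTaskModule:
    def init(self, context: dict) -> dict:
        # Pure: derive default form values only, never touch the backend.
        return {"title": context.get("campaign_name", ""), "task_id": None}

    def pre_build(self, draft: dict, backend: FakeBackend) -> dict:
        # Side effects live here, after the operator clicked "Build Venue".
        backend.save({"title": draft["title"]})
        return {**draft, "task_id": len(backend.writes)}
```

Because `init` never receives a backend handle in this sketch, a module cannot create resources just because an operator opened a card, which is the bug described above.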

Dual-Track Registration: Explicit Selection + Conditional Injection

When adding a new component, you only need to implement the module interface and register it in the corresponding track, without modifying any dispatch logic. This is the core advantage of protocol-level design — the Open-Closed Principle: open for extension, closed for modification. This pattern of "automatically injecting based on context, without user awareness" is actually quite common in software architecture: taking the decision of "whether a certain component is needed" away from the user and giving it to the system to judge automatically based on rules. The benefit is lower cognitive load for the user; the downside is that system behavior becomes harder to predict — so our conditional injection track displays an "auto-injected" prompt on the workbench, letting the operator know what decision the system made for them.
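One way to picture the two tracks: components on the explicit track are chosen by the operator, while components on the conditional track carry a rule that the system evaluates against the campaign context. The sketch below is an assumption about how such a registry might be shaped; the function names, component names, and the reward budget rule are invented for illustration.

```python
from typing import Callable, Dict, List

EXPLICIT: Dict[str, str] = {}                        # name -> label, chosen by the operator
CONDITIONAL: Dict[str, Callable[[dict], bool]] = {}  # name -> rule evaluated by the system

def register_explicit(name: str, label: str) -> None:
    EXPLICIT[name] = label

def register_conditional(name: str, rule: Callable[[dict], bool]) -> None:
    CONDITIONAL[name] = rule

def resolve(selected: List[str], context: dict) -> List[dict]:
    """Combine operator choices with rule-based injections, flagging the latter."""
    chosen = [{"name": n, "auto_injected": False} for n in selected if n in EXPLICIT]
    for name, rule in CONDITIONAL.items():
        if name not in selected and rule(context):
            chosen.append({"name": name, "auto_injected": True})
    return chosen

register_explicit("header_image", "Header image")
register_explicit("banner", "Banner")
# Hypothetical rule: inject sharing incentives whenever the plan has a reward budget.
register_conditional("sharing_incentives", lambda ctx: ctx.get("reward_budget", 0) > 0)
```

The `auto_injected` flag is what would let the workbench show the "auto-injected" prompt, so the operator can see which decisions the system made on their behalf.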

Declarative Position Orchestration

Component output is not just configuration data; it can also declare position intent: "I want to be placed before or after a certain component." Take the Feeds component as an example. A campaign may have multiple feeds, and the vertical one needs to be placed at the bottom of the page. The Feeds module doesn't need to know where the bottom is; it only needs to declare that this vertical feed wants to be at the end of the page. Concerns are separated into three layers, and each layer cares only about its own business.
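Declaring intent rather than an index can be sketched with a small resolver. This is a simplified illustration that handles only "start" and "end" intents; relative intents such as "before or after a certain component" are left out, and the field and component names are assumptions.

```python
from typing import List

# Position intents a component may declare (names are illustrative).
ORDER = {"start": 0, "flow": 1, "end": 2}

def arrange(components: List[dict]) -> List[str]:
    """Stable sort by declared intent; components without intent keep flow order."""
    ranked = sorted(components, key=lambda c: ORDER[c.get("position", "flow")])
    return [c["id"] for c in ranked]

page = [
    {"id": "feed_vertical", "position": "end"},   # declares intent, not an index
    {"id": "header_image", "position": "start"},
    {"id": "campaign_tasks"},
    {"id": "feed_horizontal"},
]
```

Because Python's `sorted` is stable, components that declare nothing keep their relative order, and only the resolver needs to know what "the end of the page" means.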

IV. Third Version Design and Engineering Practice

From "I Have a Document" to "Help Me Write a Document"

The second version solved the problem of "how the venue building process is driven by AI," but it had an implicit prerequisite: the operator already has a planning document. In actual business, many operators don't even have a document — they only have a vague idea, but are asked to first write a structured Feishu document, then paste it into the Agent. Planning generation and venue building are two different paradigms:

The difference between these two paradigms determined that we needed a two-stage architecture to carry them separately: Stage 1 solves "from idea to planning document," and Stage 2 solves "from planning document to venue configuration."

Two-Stage Architecture

Stage 1: Campaign Plan Generation Skill

The core constraint of the Stage 1 Skill is read-only — it only queries candidate data (budget pool, categories, tags, historical venues) and does not perform any write operations. All write operations (creating topics, creating campaigns, copying venues, submitting for review) are handled by Stage 2. This "read-only" constraint is not a technical limitation, but a design choice of the permission model. In Agent systems, a recurring architectural decision is: where should the boundary of AI's capabilities be?

Stage 1 only grants read permission; Stage 2 gradually opens write permission; publish permission requires manual confirmation before it can be triggered. This tiered approach follows the Principle of Least Privilege, a general security principle that AWS IAM guidance also recommends: each subject holds only the minimum set of permissions necessary to complete the current task.
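The tiered permission model can be made explicit as a gate in front of every tool call. The sketch below is one possible shape, assuming three permission levels and a per-stage ceiling; the tool names and the `authorize` function are hypothetical.

```python
from enum import IntEnum

class Permission(IntEnum):
    READ = 1
    WRITE = 2
    PUBLISH = 3

# Hypothetical tool names mapped to the permission each one needs.
TOOL_PERMISSIONS = {
    "query_budget_pool": Permission.READ,
    "create_topic": Permission.WRITE,
    "submit_for_review": Permission.PUBLISH,
}
# Highest permission each stage may ever use.
STAGE_CEILING = {"stage1": Permission.READ, "stage2": Permission.PUBLISH}

def authorize(stage: str, tool: str, human_confirmed: bool = False) -> bool:
    """Allow a tool call only within the stage ceiling; publishing also needs a human."""
    needed = TOOL_PERMISSIONS[tool]
    if needed > STAGE_CEILING[stage]:
        return False
    if needed == Permission.PUBLISH and not human_confirmed:
        return False
    return True
```

Keeping the check in code rather than in the prompt means a Stage 1 Skill cannot write even if the model tries to call a write tool.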

The guidance strategy is progressive, not requiring the operator to fill in a complete form all at once:

Minimum three items: campaign name + campaign time + at least one topic. Other information is gradually asked based on the conversation progress.

Progressive Information Collection: A UX Pattern

This strategy of "not asking all questions at once, but gradually asking based on conversation progress" is called Progressive Disclosure in UX design. Nielsen Norman Group (NNG) has systematically described and promoted this classic UX pattern through case studies, although NNG did not originate the concept. The core idea is: don't give users too much information or too many choices at once. Provide exactly enough information when the user needs it, in a way the user can understand.
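The "minimum three items, then ask gradually" strategy reduces to two small questions the Skill can ask of its collected state: what is the next thing to ask, and is there enough to start generating a plan. The sketch below assumes a flat dictionary of collected fields; the required fields follow the text, while the optional field names and their order are illustrative.

```python
from typing import Optional

REQUIRED = ["campaign_name", "campaign_time", "topics"]   # the minimum three items
OPTIONAL = ["budget_pool", "category", "tags"]            # asked later (illustrative)

def is_filled(value) -> bool:
    return value not in (None, "", [])

def next_question(collected: dict) -> Optional[str]:
    """Return the single next field to ask about, required fields first."""
    for field in REQUIRED + OPTIONAL:
        if not is_filled(collected.get(field)):
            return field
    return None

def can_generate_plan(collected: dict) -> bool:
    """Plan generation may start once the three required items are present."""
    return all(is_filled(collected.get(f)) for f in REQUIRED)
```

Asking for one field at a time keeps each conversational turn small, and `can_generate_plan` lets the Skill move on as soon as the minimum is met instead of waiting for every optional field.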

Stage 2: Aggregated Workbench

Stage 2 is an independent page, not a panel within the chat shell. It carries dense interactions more oriented towards production operations. The center panel preview is the core upgrade. It reuses the existing H5 editor page of the builder, with an interactive overlay on top. Operators can click any component in the preview, and the right side automatically displays the configuration form for that component.

The key to this design is "not reimplementing the previewer" — the builder already has complete venue rendering capabilities; we just added an interaction layer on the outside. Real-time draft synchronization ensures operations are not lost: every component addition, deletion, or property modification instantly updates the preview, while asynchronously persisting to the backend.
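The draft synchronization described here, where the preview updates at once and persistence happens asynchronously, can be sketched as follows. This is an assumption about the general pattern rather than the actual implementation: `DraftSync`, its methods, and the `save` callable standing in for the backend API are all hypothetical.

```python
import asyncio
import copy

class DraftSync:
    """Apply edits to the local preview at once; persist them in the background."""

    def __init__(self, save) -> None:
        self.preview: dict = {}
        self._save = save          # async callable, stands in for the backend API
        self._dirty = False

    def edit(self, component_id: str, props: dict) -> None:
        current = self.preview.get(component_id, {})
        self.preview[component_id] = {**current, **props}   # preview changes immediately
        self._dirty = True

    def remove(self, component_id: str) -> None:
        self.preview.pop(component_id, None)
        self._dirty = True

    async def flush(self) -> bool:
        """Persist the latest draft if anything changed since the last flush."""
        if not self._dirty:
            return False
        self._dirty = False
        await self._save(copy.deepcopy(self.preview))
        return True

def demo() -> list:
    saved: list = []

    async def fake_save(draft: dict) -> None:
        saved.append(draft)

    sync = DraftSync(fake_save)
    sync.edit("banner", {"title": "Summer"})
    sync.edit("banner", {"color": "blue"})   # two edits, one persisted snapshot
    asyncio.run(sync.flush())
    return saved
```

Persisting only the latest snapshot means rapid edits collapse into a single write, while the preview never waits on the network.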

Component-level AI transformation has two levels:

Explicit AI Assistance Panel: some components support switching between "AI assistance" and "original form," and operators can use natural language to describe the desired effect.

General Natural Language Editing: the operator enters modification requirements in the workbench dialogue, and the system passes the currently selected component's context to the backend editing sub-process.

The ability to "modify components using natural language" in the workbench (for example, "change this banner's title to a summer cool style") is a microcosm of the current field of AI-Assisted Design. This field is developing rapidly. Several noteworthy products and directions:

Figma AI: Figma is embedding AI capabilities into design tools, including auto-layout suggestions, copy generation, and design search. Its approach is similar to ours: AI makes suggestions, designers make decisions.

Vercel v0: uses natural language descriptions to generate React UI component code. It takes a more radical route: AI directly generates complete code, rather than modifying properties of existing components.

Adobe Firefly: generative AI embedded in Photoshop and Illustrator, following the route of "AI generates locally within the canvas, humans control globally."

The common pattern of these products is: AI does local generation/modification, humans make global decisions. Products that completely replace designers with AI (such as some "AI auto-website-builder" tools) are still difficult to land in professional scenarios — facing the same problem as our venue building: too high requirements for accuracy and brand consistency.

Form Host Runtime: A Pragmatic Engineering Compromise

The component form on the right side of the workbench runs in an independently built sub-project. The reason is that the builder's component forms depend on a specific runtime environment and the Formily form engine, which are incompatible with our frontend technology stack. Instead of reimplementing dozens of component forms, we made an independent build, reused the builder's original runtime, and communicated with the main page via a message protocol. This is an engineering compromise to "avoid duplicate development" — although it increases architectural complexity, it focuses limited development effort on truly new value (preview interaction, AI assistance, draft synchronization) rather than reinventing existing wheels.
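The message protocol between the main page and the independently built form host is not specified in the article. As a rough sketch of what such a contract might contain, the example below defines a small JSON envelope with validation on both sides. The message types and field names are hypothetical, and Python is used only for consistency with the other examples; in the browser the same envelope would travel over whatever cross-frame messaging the two builds share.

```python
import json

# Hypothetical message types exchanged between the main page and the form host.
MESSAGE_TYPES = {"load_form", "form_changed", "form_validated"}

def encode(msg_type: str, component_id: str, payload: dict) -> str:
    """Serialize an outgoing message, refusing types outside the protocol."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    return json.dumps({"type": msg_type, "componentId": component_id, "payload": payload})

def decode(raw: str) -> dict:
    """Parse and validate an incoming message; reject anything off-protocol."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("malformed message")
    if msg.get("type") not in MESSAGE_TYPES or "componentId" not in msg:
        raise ValueError("malformed message")
    return msg
```

Validating on receipt matters more than usual here, because the two sides are built and deployed separately and can drift apart.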

V. Architecture Overview

System Layers

VI. Trade-offs in Practice

"Full AI control" is unrealistic at this stage. The business constraints of venue components are complex: fields come from multiple external systems, rules have priorities, and some components trigger real resource creation. The final solution is: AI is responsible for information extraction, draft generation, field rewriting, and recommendations; code is responsible for process, structure, validation, and side-effect timing.

Structured UI is not "not AI enough"; it's a respect for the operator's cognitive load. Having an operator select a budget pool from a dropdown list is more efficient and accurate than having them describe it in natural language and then having AI guess. The best interaction method is a mix: natural language to describe intent, structured UI to confirm details, and visual preview to check results.

One of the most important rules of the component module protocol: the initialization phase should not produce side effects; write operations are only allowed after the user explicitly confirms. This rule is the protection mechanism for side-effecting actions: it determines whether the system writes dirty data before the user has confirmed.

Form Host is an engineering compromise but necessary. Complete rewrite is costly and prone to introducing behavioral differences. Reusing the old ecosystem allows the workbench to focus limited development effort on truly new value.

Fake progress bars are over-promising. Trust in AI products is built slowly; every "fake feedback" consumes trust. Microsoft's HAX design guidelines suggest making users understand why the system did what it did — each of our interrupt cards does this, not just giving the operator a form, but also explaining "why we need you to fill in these fields" and "what AI has already done and what is still missing."

VII. Summary and Outlook

The evolution path of this project:

What might be next? Perhaps the Agent CLI approach matures, allowing more operations to be completed through natural language; perhaps more scenarios are integrated, allowing the same architecture to support more businesses; perhaps, with improved LLM capabilities, "full AI control of components" becomes feasible. But no matter how it evolves, three principles will not change:

Process Controllable: AI cannot be allowed to run freely to an unpredictable extent.

Accuracy First: venue configuration errors directly impact live operations.

Lower Operator Burden: no matter how cool the technology, it must ultimately make operators feel it's easy to use.

Technology forms will change, but business constraints will not.

Author: Hang Fei

Reproduction is prohibited without permission from Dewu Tech, and legal liability will be pursued accordingly.