
From Chat to Execution: Why Enterprise Agents Need a CLI, Not Just Another Chat Box

Agents Aren't Just for Chat: How We Used a CLI to Organize Business Capability Entry Points

This article comes from the real engineering practice of Huajiao Technology. If you are also interested in AI engineering, enterprise Agents, MCP, Skills, or R&D toolchains, there is a "Huajiao Technology Exchange Group" entry at the end of the article. Welcome to join the discussion.

When building enterprise Agents, many teams initially focus on model capabilities:

Can it understand user intent? Can it break down tasks? Can it organize responses like a reliable assistant?

These all matter, but once the Agent enters a real business process, a different problem shows up first:

After the Agent understands, how can it stably and safely invoke business capabilities?

If the Agent only answers questions, a chat box is enough. But once you want it to complete queries, interactions, analyses, and confirmed operations, it must know:

Huajiao CLI was built to address this problem.

On the surface, it's a command-line tool, but more accurately, it's a layer of business capability entry points we prepared for the Agent:

Skill is responsible for describing capabilities
CLI is responsible for stable execution of actions
Gateway is responsible for converging and controlling invocations
Downstream business services maintain their original boundaries

This article is not about "we released a CLI," but a review of a more specific engineering problem:

When an enterprise Agent moves from chat to execution, why can't it rely solely on temporary tool integration? Why does it ultimately require a capability entry point layer of Skill + CLI + Gateway?

1. The Problem Isn't "Can It Be Connected," but "Can It Be Integrated Long-Term"

In the early stages of Agent capability integration, the most direct approach was usually MCP or temporary tool encapsulation.

First, connect a query capability, then connect an execution capability, and finally let the Agent return the result to the user. This approach is very suitable for validating ideas and quickly running demos.

The initial goal of the first version of Huajiao CLI was also simple: first, prove that an Agent can invoke Huajiao's public capabilities through a stable execution entry point.

For example, finding live streams:

The user describes what they want to watch in the AI assistant, the Agent queries public live stream content via CLI, and then organizes the results into natural language feedback.

Another example, sending interactive content:

The user states the target and the text, the Agent assembles the parameters, and after the user confirms, the Agent sends the content via the CLI.

These scenarios themselves are not complex, but they cover several key issues:

The value of this step is not in the complexity of the commands, but in validating a path:

Business capabilities don't have to be handed directly to the Agent, nor does the Agent need to understand the complexity behind the business. A stable execution entry point can be placed in between.

The real problem appears in the next phase.

As capabilities increase, temporary integration leads to several typical issues:

| Problem | Manifestation |
| --- | --- |
| Scattered capability descriptions | Each tool has its own description method, increasing Agent learning costs |
| Scattered execution boundaries | Which actions need confirmation and which are read-only is easily implemented differently |
| Entry points hard to reuse | Different AI assistants and business scenarios may require repeated encapsulation |
| Fragmented invocation chain | When problems occur, it's hard to tell if it's an Agent parameter, tool execution, or business service anomaly |
| Heavier subsequent integration | Each new capability requires re-handling descriptions, execution, feedback, and boundaries |

So, the problem isn't "Can it be connected."

More critically: after capabilities multiply, can they still be integrated continuously, constrained uniformly, reused across entry points, and traced when something goes wrong?

2. Why Choose CLI as the Execution Entry Point

We later evaluated a CLI as the Agent's execution entry point for a simple reason: a CLI is friendly to developers and to Agents alike.

Tools like Claude Code, Cursor, and Codex can naturally understand commands, execute commands, and read output. For an Agent, a stable command is a clear action.

It doesn't need to understand the complex structure behind the business service; it only needs to know:

When to call
What parameters to pass
What result to get
How to handle failure

For the engineering team, CLI also has several practical advantages.

First, easy local validation.

Developers can run commands directly in the terminal to see if parameters, results, and error messages are stable, without having to go through the complete Agent chain for debugging each time.

Second, convenient for cross-AI assistant reuse.

As long as the commands and output are stable, the same CLI can be invoked from different entry points such as Claude Code, Cursor, and Codex. Each Agent does not need its own wrapper around the same business capabilities.

Third, complex capabilities can be converged into a small number of clear actions.

Agents shouldn't directly face complex business details. They are better suited to face a set of actions with clear commands and interpretable results.

A sanitized pseudo-command can illustrate this idea:

# Sanitized pseudo-commands; these are not the actual command names
hj live list --keyword "singing" --limit 5

hj chat send \
  --room "<room_id>" \
  --text "This song sounds great"

The most critical thing here is not the command name, but the command contract:

The value of CLI is not to make users learn another command-line tool, but to give the Agent a more stable execution surface.
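As a minimal sketch of what a command contract can look like in code, the pseudo-commands above can be modeled with Python's standard `argparse`. Everything here is an assumption for illustration: the `hj` program name, the `live list` and `chat send` subcommands, and the JSON shape returned by `run` are hypothetical and do not represent the actual Huajiao CLI.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for a hypothetical `hj` CLI (all names are illustrative)."""
    parser = argparse.ArgumentParser(prog="hj")
    groups = parser.add_subparsers(dest="group", required=True)

    # Read-only query: each parameter has one fixed meaning.
    live = groups.add_parser("live").add_subparsers(dest="action", required=True)
    live_list = live.add_parser("list", help="query public live streams (read-only)")
    live_list.add_argument("--keyword", required=True)
    live_list.add_argument("--limit", type=int, default=5)

    # Write action: the target and the text are both mandatory.
    chat = groups.add_parser("chat").add_subparsers(dest="action", required=True)
    chat_send = chat.add_parser("send", help="send text to a room (needs confirmation)")
    chat_send.add_argument("--room", required=True)
    chat_send.add_argument("--text", required=True)
    return parser


def run(argv: list[str]) -> str:
    """Parse argv and return one JSON document describing the requested action."""
    args = build_parser().parse_args(argv)
    command = f"{args.group}.{args.action}"
    params = {k: v for k, v in vars(args).items() if k not in ("group", "action")}
    # A real CLI would call the business capability here; this sketch only
    # echoes the parsed action so the contract itself stays visible.
    return json.dumps({"command": command, "params": params}, sort_keys=True)
```

Calling `run(["live", "list", "--keyword", "singing"])` yields a single JSON document with the default `limit` filled in, which is the kind of predictable surface an Agent can call repeatedly.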

3. Skill Is Not a Tutorial, but a Capability Protocol for the Agent

CLI alone is not enough.

When humans meet a new command, they can read the documentation, look at examples, and try it out slowly. Agents should not be left to guess; they need more direct capability descriptions:

This is the role of Skill.

In this design, Skill is not an ordinary usage tutorial, but a capability protocol for the Agent to read. It translates human operational experience into steps the Agent can understand.

For example, for "finding live streams," humans can open a page and filter themselves, but the Agent needs to know:

User expresses viewing intent
-> Extract keywords or preferences
-> Call live stream query command
-> Read structured results
-> Organize recommendations based on user intent
-> Follow up with further questions if necessary

Similarly, for "sending interactive content," the Agent needs to know:

User expresses sending intent
-> Confirm target live room and text
-> Determine if secondary confirmation is needed
-> Call CLI to execute action
-> Explain execution result to user

From this perspective, the core of Huajiao CLI is not that CLI stands alone, but that CLI and Skill work together.

Skill is responsible for describing capabilities, and CLI is responsible for stable execution. Together, they give the Agent the opportunity to move from "being able to answer" to "being able to get things done."
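To make the idea of a capability protocol concrete, here is a small sketch that stores the two flows above as structured data and renders them into text an Agent could read. The `Skill` class, its field names, and the command placeholders are assumptions made for this example; the article does not specify how Huajiao's Skills are actually written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """A hypothetical Skill record: when to act, what to call, in which order."""
    name: str
    when_to_use: str
    command: str
    steps: tuple[str, ...]
    requires_confirmation: bool = False

    def render(self) -> str:
        """Render the Skill as plain text for the Agent's context."""
        confirm = "yes" if self.requires_confirmation else "no"
        lines = [
            f"Skill: {self.name}",
            f"Use when: {self.when_to_use}",
            f"Command: {self.command}",
            f"Requires user confirmation: {confirm}",
            "Steps:",
        ]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)


FIND_LIVE = Skill(
    name="find-live-streams",
    when_to_use="User expresses viewing intent",
    command='hj live list --keyword "<keyword>" --limit <n>',  # sanitized placeholder
    steps=(
        "Extract keywords or preferences",
        "Call live stream query command",
        "Read structured results",
        "Organize recommendations based on user intent",
        "Follow up with further questions if necessary",
    ),
)

SEND_TEXT = Skill(
    name="send-interactive-content",
    when_to_use="User expresses sending intent",
    command='hj chat send --room "<room_id>" --text "<text>"',  # sanitized placeholder
    steps=(
        "Confirm target live room and text",
        "Determine if secondary confirmation is needed",
        "Call CLI to execute action",
        "Explain execution result to user",
    ),
    requires_confirmation=True,
)
```

Keeping the confirmation requirement as an explicit field, rather than burying it in prose, means the same record can be read by the Agent and checked by tooling.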

4. The Command Layer Must Converge into an "Action Contract"

The focus of the second version of Huajiao CLI was to advance from "can connect" to "can use."

"Can connect" means engineers can run it.

"Can use" means external developers or AI assistants can find the entry point, complete installation, know how to invoke it, and get basic feedback when it fails.

Therefore, the command layer should be as converged as possible.

For example, publicly exposed capabilities can be organized into these types:

| Capability Type | Command Layer Focus |
| --- | --- |
| Login and environment check | Can it confirm the current execution environment is available |
| Version and help info | Can the Agent determine the current CLI capability scope |
| Query capabilities | Is the output structured, stable, and convenient for secondary processing |
| Interactive capabilities | Are parameters clear, and is confirmation needed before execution |
| Data capabilities | Is the result scope public, and does it need further desensitization |

The command layer fears two things most.

First, unstable command semantics.

Today a parameter represents a filter condition; tomorrow it is reused with another meaning. An Agent cannot call such a command reliably over time.

Second, unstable output.

Humans can read semi-structured text, but Agents need stable fields. Otherwise, subsequent explanation, summarization, and re-invocation become unreliable.
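One common way to keep output stable is to wrap every result in a fixed envelope. The sketch below assumes three top-level fields (`ok`, `data`, `error`) plus an error `code`, `message`, and `hint`; these names are illustrative assumptions, not the actual output format of Huajiao CLI.

```python
import json
from typing import Any

# Fields every command result is assumed to carry in this sketch.
REQUIRED_KEYS = {"ok", "data", "error"}


def success(data: Any) -> str:
    """Serialize a successful result with the fixed envelope."""
    return json.dumps({"ok": True, "data": data, "error": None}, ensure_ascii=False)


def failure(code: str, message: str, hint: str = "") -> str:
    """Serialize a failure with a machine-readable code and a suggested next step."""
    error = {"code": code, "message": message, "hint": hint}
    return json.dumps({"ok": False, "data": None, "error": error}, ensure_ascii=False)


def read_result(raw: str) -> dict:
    """Parse CLI output on the caller side and reject envelopes with missing fields."""
    result = json.loads(raw)
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"unstable output, missing fields: {sorted(missing)}")
    return result
```

Because the top-level fields never change, an Agent can branch on `ok` and `error.code` without parsing free text, and a failed call still tells it what to try next.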

So the command layer is not an internal implementation detail; it is the set of actions through which the Agent recognizes business capabilities.

Whether a command can enter the CLI depends not only on whether it can execute, but also on whether it meets several conditions:

This is also the biggest difference between CLI and ordinary scripts.

Scripts can serve only the person who wrote them.

Agent CLI must serve an executor that automatically organizes steps, continuously invokes capabilities, and may also misunderstand context.

5. Why Internal Capabilities Also Need a Gateway

After the public capabilities were running smoothly, we started looking at another problem: Can the same approach be used to support internal Agents?

Internal scenarios are much more complex than public capabilities.

There are more capabilities, finer-grained boundaries, and invocations whose consequences must be traceable. If we keep adding a tool wherever one is needed, the result will be hard to maintain later.

At this point, Skill + CLI alone is not enough.

We need a Gateway layer that funnels invocations through one controlled path.

The Gateway solves not "how to make commands execute," but "how to make commands execute in a controlled manner."

It mainly handles several types of general capabilities:

| Capability | Role |
| --- | --- |
| Identity verification | Confirm the caller and execution environment |
| Rate limiting | Prevent abnormal continuous invocations from affecting services |
| Route mapping | CLI does not directly perceive complex backend structures |
| Logging | Facilitate troubleshooting between Agent, CLI, and business services |
| Necessary confirmation | Retain manual confirmation nodes for critical actions |

A boundary to note here:

Agents are good at understanding intent and organizing steps, but they should not be the ones to decide where business boundaries lie. Capabilities that truly need long-term stability should enter the controlled execution chain.

With the Gateway, the overall relationship becomes clearer:

Agent reads Skill, knows when to invoke capabilities
CLI executes standard commands, outputs stable results
Gateway handles identity, rate limiting, routing, logging, and confirmation
Downstream business services maintain their original boundaries

This way, the Agent faces clear commands; the CLI faces a unified entry point; business services face controlled invocations.
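The Gateway responsibilities listed above (identity, rate limiting, routing, logging, confirmation) can be sketched in a few dozen lines. This is an in-memory illustration under stated assumptions: the `Gateway` class, the token table, the sliding-window limiter, and the error codes are all hypothetical, and a production gateway would sit behind a network boundary rather than in the same process.

```python
import logging
import time
from collections import defaultdict, deque
from typing import Any, Callable

logger = logging.getLogger("gateway")


class GatewayError(Exception):
    """Raised with a stable code so the CLI can report a clear failure reason."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


class Gateway:
    def __init__(self, tokens: dict[str, str],
                 routes: dict[str, Callable[[dict], Any]],
                 confirm_required: set[str],
                 max_calls: int = 5, window_seconds: float = 60.0) -> None:
        self._tokens = tokens                      # token -> caller identity
        self._routes = routes                      # command -> downstream handler
        self._confirm_required = confirm_required  # commands needing user confirmation
        self._max_calls = max_calls
        self._window = window_seconds
        self._calls: dict[str, deque] = defaultdict(deque)

    def invoke(self, token: str, command: str, params: dict,
               confirmed: bool = False) -> Any:
        # 1. Identity verification
        caller = self._tokens.get(token)
        if caller is None:
            raise GatewayError("unauthorized", "unknown caller")
        # 2. Rate limiting: sliding window per caller (rejected attempts count too)
        now = time.monotonic()
        recent = self._calls[caller]
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max_calls:
            raise GatewayError("rate_limited", "too many calls in the current window")
        recent.append(now)
        # 3. Route mapping: the CLI only knows command names, not backend structure
        handler = self._routes.get(command)
        if handler is None:
            raise GatewayError("unknown_command", f"no route for {command}")
        # 4. Necessary confirmation for critical actions
        if command in self._confirm_required and not confirmed:
            raise GatewayError("confirmation_required", f"{command} needs user confirmation")
        # 5. Logging for later troubleshooting
        logger.info("caller=%s command=%s", caller, command)
        return handler(params)
```

The order of checks is a design choice in this sketch: identity and rate limiting run before routing so that unknown or abusive callers never learn which commands exist.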

6. From MCP to Skill + CLI + Gateway: Not a Replacement Relationship

We did not pit MCP against CLI.

MCP is suitable for exploration and rapid integration. Especially in the early validation phase, it can quickly hand a tool to the Agent, allowing the team to judge whether the scenario is viable.

But for long-term maintenance, stable capabilities are better consolidated into reusable entry points.

This is also why we moved from MCP exploration to Skill + CLI + Gateway.

It can be simply understood as two phases:

Exploration phase:
Agent -> MCP / Temporary tool -> Single business capability

Engineering phase:
Agent -> Skill -> CLI -> Gateway -> Downstream business service

The exploration phase focuses on "Can this work."

The engineering phase focuses on "Can this type of capability be integrated continuously, constrained uniformly, reused across entry points, and traced when problems occur."

Both phases are needed.

If you build a complete platform from the start, it's easy to over-engineer.

If you always stay with temporary integration, it will later become a fragmented collection of tools.

A more realistic path is:

First, use lightweight methods to validate scenarios, then consolidate the stable, clear, and high-frequency capabilities into the Skill + CLI + Gateway chain.

7. Not Every Capability Is Suitable for CLI

At this point, there's another temptation: cramming all capabilities into the CLI.

We didn't do this.

CLI is suitable for capabilities with clear boundaries, clear actions, and interpretable results. It is not suitable for fully porting complex backends into the command line, nor for replacing existing business systems.

To judge whether a capability is suitable for CLI, you can first look at a few questions:

| Question | Judgment |
| --- | --- |
| Is this action stable enough? | Frequently changing capabilities are not suitable for early consolidation |
| Can parameters be clearly expressed? | Capabilities that rely heavily on page context should be approached with caution |
| Is the result interpretable? | Output must be understandable by both Agent and user |
| Does it require manual confirmation? | Actions needing confirmation must be explicitly expressed in the chain |
| Is failure recoverable? | Failure reasons should be explainable, preferably with next steps |
| Is it suitable for cross-entry point reuse? | Capabilities serving only a single page don't necessarily need CLI-ization |

This is also the boundary of Huajiao CLI:

It is not a universal entry point, but a standardized entry point for Agents to invoke business capabilities.

Capabilities that can be abstracted into stable actions are suitable for CLI.

Capabilities that heavily rely on manual judgment, have heavy page context, or have frequently changing rules should not be rushed in.
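The questions above can also be turned into a lightweight review helper. The sketch below is only one possible encoding: the `CapabilityProfile` fields and the wording of each concern are assumptions derived from the questions in this section, not a tool the Huajiao team describes.

```python
from dataclasses import dataclass, fields

# One concern per question; reported when the matching flag is False.
CONCERNS = {
    "stable_action": "action changes frequently; too early to consolidate",
    "explicit_parameters": "parameters depend heavily on page context",
    "interpretable_result": "output is not understandable by both Agent and user",
    "confirmation_expressed": "required manual confirmation is not explicit in the chain",
    "explainable_failure": "failure reasons cannot be explained or recovered from",
    "cross_entry_reuse": "serves a single page only; a CLI may be unnecessary",
}


@dataclass(frozen=True)
class CapabilityProfile:
    """Hypothetical answers to the suitability questions for one capability."""
    name: str
    stable_action: bool
    explicit_parameters: bool
    interpretable_result: bool
    confirmation_expressed: bool
    explainable_failure: bool
    cross_entry_reuse: bool


def open_concerns(profile: CapabilityProfile) -> list[str]:
    """Return unmet criteria in question order; an empty list suggests a CLI candidate."""
    return [
        CONCERNS[f.name]
        for f in fields(profile)
        if f.name in CONCERNS and not getattr(profile, f.name)
    ]
```

An empty result does not mean a capability must go into the CLI; it only means none of the listed questions argues against it.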

8. A Reusable Integration Checklist

If you are also working on enterprise Agents or internal business capability integration, you can use this checklist to review your approach.

8.1 Is the Capability Suitable for the Agent

8.2 Is the Skill Clearly Written

8.3 Is the CLI Stable Enough

8.4 Does the Gateway Converge General Boundaries

8.5 Do Downstream Businesses Maintain Boundaries

The point of this checklist is not to "do everything perfectly," but to keep Agent capability integration from turning into a collection of temporary tools.

9. Summary

Looking back at Huajiao CLI, it is not a command-line tool that suddenly appeared.

It is more like an execution entry point layer in enterprise Agent engineering.

From this practice, the conclusions we have drawn are:

This approach is not only applicable to Huajiao CLI but also to many enterprise internal Agent projects.

Don't rush to connect everything to the Agent.

Huajiao Technology Exchange Group

Are you still researching AI engineering, AI programming, and Agent implementation on your own, without peers to exchange ideas with or real-world practices to analyze?

Here, front-line technical practitioners gather, focusing on code review and real-world implementation of enterprise internal AI assistants.

If you want to keep up with AI cutting-edge trends, exchange engineering implementation experience, and avoid common pitfalls, feel free to join the "Huajiao Technology Exchange Group."

Exclusive group benefits: Daily curated R&D-oriented AI industry newsletters, exclusive extended materials for articles, and technical details not expanded upon in the text, all shared together.

If the group QR code expires, follow the official WeChat account Huajiao Technology, and reply with "Exchange Group" to get the latest group QR code.