
The 170,000-Line Go CLI That Treats AI Agents as First-Class Users

What I Learned After Reading Through 170,000 Lines of Go Code in lark-cli


TL;DR: lark-cli is Feishu's open-source CLI tool: 170,000 lines of Go, 200+ commands, 18 business domains. It starts from an unusual design premise: both humans and AI Agents are its users. Once you commit to that premise, every layer has to be rethought: how commands are layered, how errors are designed, how identity is resolved, how Skill files are written, and how security is enforced. This article dissects its complete architecture, from the command system, execution pipeline, and factory pattern to the error system, identity system, Skill files, security guardrails, and output system. My biggest takeaway after reading through it all: "Agent-Native" is not about adding a --json parameter; it means treating the Agent as a first-class user and designing for it from the very first line of code, at every layer.


Recently, I read through the source code of Feishu's open-source lark-cli. The reason was simple—the first paragraph of its AGENTS.md states:

This CLI's primary consumers include AI agents (Claude Code, Cursor, Gemini CLI). Your code is read by machines — error messages, output format, and flag design all directly affect agent success rates.

Honestly, my expectations were low before reading it. Most tools claiming to be "AI-Native" just add a --json parameter and call it a day. After reading through it, this article ended up nearly 5,000 words—not because I wanted to write a long piece, but because this project truly deserves this level of detailed breakdown.


1. Command System: Three Layers, Three Trust Boundaries

lark-cli's commands are divided into three layers. These are not "advanced/intermediate/low-level" tiers; they are three trust boundaries.

| Layer | Prefix | For Whom | What the Framework Does for You |
|-------|--------|----------|---------------------------------|
| Shortcuts | Starts with + | Humans + Agents (stable interface) | Identity resolution, Scope validation, parameter validation, error classification, pagination merging, content safety scanning |
| API Commands | No prefix | Developers familiar with the API | Auto-generated from OAPI metadata, 1:1 mapping |
| Raw API | api subcommand | Escape hatch | Does nothing; you are responsible for everything |

The + prefix is a deliberate design choice. It visually separates Shortcuts from ordinary subcommands—an Agent can distinguish at a glance between "this is a stable command encapsulated by the framework" and "this is a command that directly maps to the API."

# Shortcut: The framework handles identity, Scope, error classification, and pagination for you
lark-cli calendar +agenda

# API Command: You handle parameters and responses yourself
lark-cli calendar events instance_view --params '{"calendar_id":"primary",...}'

# Raw API: You handle everything yourself
lark-cli api GET /open-apis/calendar/v4/calendars

Agents should use Shortcuts wherever possible and drop down to API commands or the Raw API only when no Shortcut covers the needed functionality. This is not a "capability progression" but a trust progression: the lower you go, the less the framework does for you, and the more responsibility you bear.
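
To make the three trust boundaries concrete, here is a minimal sketch of how a wrapper or an Agent harness might classify a command line before running it. ClassifyCommand and the Layer constants are hypothetical names, and the rules simply mirror the three layers described above; this is not lark-cli's own dispatch code.

```go
package main

import (
	"fmt"
	"strings"
)

// Layer names the trust boundary a command line falls into.
type Layer string

const (
	LayerShortcut Layer = "shortcut" // framework handles identity, scopes, errors, pagination
	LayerAPI      Layer = "api"      // 1:1 mapping to API metadata
	LayerRaw      Layer = "raw"      // escape hatch: the caller owns everything
)

// ClassifyCommand inspects the arguments that follow the binary name.
// Hypothetical helper: the rules mirror the three layers described above.
func ClassifyCommand(args []string) Layer {
	if len(args) > 0 && args[0] == "api" {
		return LayerRaw
	}
	for _, a := range args {
		if strings.HasPrefix(a, "-") {
			break // flags start here; flag values are never command names
		}
		if strings.HasPrefix(a, "+") {
			return LayerShortcut
		}
	}
	return LayerAPI
}

func main() {
	fmt.Println(ClassifyCommand([]string{"calendar", "+agenda"}))
	fmt.Println(ClassifyCommand([]string{"calendar", "events", "instance_view", "--params", "{}"}))
	fmt.Println(ClassifyCommand([]string{"api", "GET", "/open-apis/calendar/v4/calendars"}))
}
```

A harness could use the result to decide how much validation it must do itself: none for a Shortcut, everything for the Raw API.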



2. Execution Pipeline: What a Shortcut Goes Through from Call to Return

A Shortcut is not as simple as "parse parameters → call API → return result." Its execution pipeline has 6 stages:

lark-cli calendar +agenda --start 2025-03-21
         │
         ▼
  ① Identity Resolution
      ├── Read --as parameter (user / bot / auto)
      ├── Read config file defaultAs
      ├── Read strict mode (force user-only or bot-only)
      └── Validate: Does this Shortcut support the current identity?
         │
         ▼
  ② Config Loading
      ├── Load config from Credential chain (multi-profile support)
      └── Validate: Are app_id and secret configured?
         │
         ▼
  ③ Scope Pre-check
      ├── Read scopes declared by the Shortcut
      ├── Read scopes of the current token
      └── Missing scope → return typed error, telling the Agent which command to run
         │
         ▼
  ④ RuntimeContext Creation
      ├── Inject APIClient (lazy loading, sync.OnceValues)
      ├── Inject Lark SDK client
      ├── Parse --format and --jq
      └── Set bot-only flag
         │
         ▼
  ⑤ Validate
      ├── Enum value validation (--priority can only be high/medium/low)
      ├── @file and stdin input parsing
      ├── --jq expression legality check
      └── Business logic validation (e.g., "bot cannot query user calendar")
         │
         ▼
  ⑥ Execute
      ├── --dry-run? → Print request preview, do not execute
      ├── --print-schema? → Print JSON Schema, do not execute
      ├── High-risk operation? → Check --yes
      └── Call API → Classify error → Format output → Return

Each step is an independent phase with clear inputs and outputs. The design philosophy of this pipeline is: Let the framework do everything Agents are bad at, and let Execute only handle business logic.
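
The staged structure can be sketched in a few lines of Go. This is an illustration under assumptions, not the project's runShortcut code: Request, Stage, StageError, and MissingScopeError are invented names, and only the Scope pre-check stage carries real logic. The point it shows is that a failure in an early stage stops the run before Execute is ever reached.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// Request is the state handed from stage to stage. All fields are illustrative.
type Request struct {
	Declared []string // scopes the command declares
	Granted  []string // scopes the current token carries
	Trace    []string // names of the stages that ran
}

// Stage is one phase of the pipeline.
type Stage struct {
	Name string
	Run  func(*Request) error
}

// StageError records which stage stopped the pipeline.
type StageError struct {
	Stage string
	Err   error
}

func (e *StageError) Error() string { return e.Stage + ": " + e.Err.Error() }
func (e *StageError) Unwrap() error { return e.Err }

// MissingScopeError is a stand-in for a typed error that carries the missing scopes.
type MissingScopeError struct{ Missing []string }

func (e *MissingScopeError) Error() string {
	return "missing scope(s): " + strings.Join(e.Missing, ", ")
}

// checkScopes fails when a declared scope is absent from the token.
func checkScopes(r *Request) error {
	var missing []string
	for _, s := range r.Declared {
		if !slices.Contains(r.Granted, s) {
			missing = append(missing, s)
		}
	}
	if len(missing) > 0 {
		return &MissingScopeError{Missing: missing}
	}
	return nil
}

// RunPipeline executes stages in order and stops at the first failure,
// so later stages (including Execute) never see a half-validated request.
func RunPipeline(r *Request, stages []Stage) error {
	for _, s := range stages {
		r.Trace = append(r.Trace, s.Name)
		if err := s.Run(r); err != nil {
			return &StageError{Stage: s.Name, Err: err}
		}
	}
	return nil
}

func main() {
	ok := func(*Request) error { return nil }
	stages := []Stage{
		{"identity", ok}, {"config", ok}, {"scopes", checkScopes},
		{"runtime", ok}, {"validate", ok}, {"execute", ok},
	}
	r := &Request{Declared: []string{"calendar:event:create"}}
	fmt.Println(RunPipeline(r, stages))
	fmt.Println(r.Trace) // stages after "scopes" never ran
}
```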



3. Factory Pattern: How to Share Dependencies Across 200+ Commands While Keeping Them Independently Testable

lark-cli has 200+ commands, and each command needs access to configuration, HTTP client, Lark SDK, credential chain, file system, and Keychain. Using global variables would be a testing disaster.

Its solution is the Factory pattern—a single struct holding all shared dependencies, with all function fields being lazily loaded:

type Factory struct {
    Config     func() (*core.CliConfig, error) // Lazy-loaded config
    HttpClient func() (*http.Client, error)    // Lazy-loaded HTTP client
    LarkClient func() (*lark.Client, error)    // Lazy-loaded SDK client
    IOStreams  *IOStreams                      // stdin/stdout/stderr
    Keychain   keychain.KeychainAccess         // System keychain
    Credential *credential.CredentialProvider  // Credential chain
    // ...
}

Production uses NewDefault() to create it, and tests directly replace fields:

// Mock all external dependencies in tests
f := &cmdutil.Factory{
    Config: func() (*core.CliConfig, error) {
        return &core.CliConfig{AppID: "test"}, nil
    },
    IOStreams: cmdutil.NewTestIO(),
    // ...
}

No DI framework needed, no wire or dig. Go's struct + function field is sufficient. The simplest solution is often the most testable solution.
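
The lazy-loading half of this pattern is easy to try in isolation. The sketch below assumes Go 1.21 or later for sync.OnceValues (the same standard-library helper the pipeline diagram mentions); NewFactory and the one-field Config are simplified stand-ins, not lark-cli's actual types.

```go
package main

import (
	"fmt"
	"sync"
)

// Config stands in for the CLI configuration; the field is illustrative.
type Config struct{ AppID string }

// Factory holds shared dependencies as function fields.
type Factory struct {
	Config func() (*Config, error)
}

// NewFactory wraps the loader so it runs at most once, on first use.
// Hypothetical constructor; sync.OnceValues requires Go 1.21 or later.
func NewFactory(load func() (*Config, error)) *Factory {
	return &Factory{Config: sync.OnceValues(load)}
}

func main() {
	loads := 0
	f := NewFactory(func() (*Config, error) {
		loads++ // pretend this reads a file or the keychain
		return &Config{AppID: "cli_demo"}, nil
	})
	fmt.Println("loads before first use:", loads)
	a, _ := f.Config()
	b, _ := f.Config()
	fmt.Println("loads after two calls:", loads, "same pointer:", a == b)
}
```

Commands that never touch the config never pay for loading it, and a test can still overwrite the Config field with a plain closure.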


4. Error System: The Part of the Project Most Worth Stealing

Most CLI tools handle errors by printing a human-readable error message and exiting non-zero. The Agent gets this message and can only do string matching.

lark-cli's errors all go through a JSON envelope on stderr:

{
  "ok": false,
  "identity": "user",
  "error": {
    "type": "authorization",
    "subtype": "missing_scope",
    "code": 99991679,
    "message": "missing scope `calendar:event:create` for app cli_xxx",
    "hint": "run lark-cli auth login --scope calendar:event:create",
    "log_id": "20260520-0a1b2c3d",
    "missing_scopes": ["calendar:event:create"],
    "console_url": "https://open.feishu.cn/app/cli_xxx/auth?q=..."
  }
}

The Agent doesn't need to read message. It just reads the type and subtype fields to know what to do.

9 Categories, exhaustive and closed:

| Category | When to Use | Exit | What the Agent Should Do |
|----------|-------------|------|--------------------------|
| validation | Parameter is wrong | 2 | Read params, fix parameters, retry |
| authentication | Not logged in / no token | 3 | Run auth login |
| authorization | Token lacks scope | 3 | Run auth login --scope |
| config | Local config is missing | 3 | Run config init |
| network | DNS / timeout / connection refused | 4 | Wait a moment and retry |
| api | Feishu API returned an error | 1 | Read code and log_id, check documentation |
| policy | Content safety interception | 6 | Read challenge_url, let user handle it |
| internal | Bug in the tool itself | 5 | Stop, do not retry, report bug |
| confirmation | High-risk operation not confirmed | 10 | Add --yes, run again |
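
From the Agent's side, the table above reduces to a switch on the type field. The following is a rough sketch of that routing, assuming the envelope shape shown earlier; NextAction and the action names it returns are made up for illustration and are not part of lark-cli.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorEnvelope models only the fields an Agent needs for routing.
type ErrorEnvelope struct {
	OK    bool `json:"ok"`
	Error struct {
		Type    string `json:"type"`
		Subtype string `json:"subtype"`
		Hint    string `json:"hint"`
	} `json:"error"`
}

// NextAction maps an error category to a coarse recovery step.
// The action names are invented for this sketch.
func NextAction(stderr []byte) (string, error) {
	var env ErrorEnvelope
	if err := json.Unmarshal(stderr, &env); err != nil {
		return "", err
	}
	if env.OK {
		return "continue", nil
	}
	switch env.Error.Type {
	case "validation":
		return "fix-params-and-retry", nil
	case "authentication", "authorization", "config":
		return "run-hint-command", nil
	case "network":
		return "wait-and-retry", nil
	case "confirmation":
		return "retry-with-yes", nil
	case "policy":
		return "hand-to-user", nil
	case "api":
		return "inspect-code-and-log-id", nil
	default: // "internal" and anything unknown: stop and report
		return "stop-and-report", nil
	}
}

func main() {
	stderr := []byte(`{"ok":false,"error":{"type":"authorization","subtype":"missing_scope","hint":"run lark-cli auth login --scope calendar:event:create"}}`)
	action, err := NextAction(stderr)
	fmt.Println(action, err)
}
```

Nothing in the switch reads message, which is exactly why the wording of messages can change freely while the categories cannot.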

Each Category in Go is an independent struct with a builder API. Category is locked to the function name, Subtype must be a declared constant, and Message is for humans (the Agent does not depend on it):

return errs.NewPermissionError(errs.SubtypeMissingScope,
    "missing required scope(s): %s", strings.Join(missing, ", ")).
    WithMissingScopes(missing...).
    WithHint("run: lark-cli auth login --scope %s", strings.Join(missing, " "))

This contract is locked down by lint, not by documentation. The project runs two golangci-lint rules plus a custom AST check module:

| Lint Rule | What It Blocks |
|-----------|----------------|
| forbidigo | fmt.Errorf / errors.New returned at command boundaries fails the lint check directly |
| CheckDeclaredSubtype | Subtype must be a declared constant; hand-written strings fail CI |
| CheckProblemEmbed | Every typed error struct must embed errs.Problem |
| Error Reclassification Ban | `*PermissionError` cannot be wrapped into `*InternalError` and passed upwards |
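
A custom AST check of this kind does not need much machinery. The sketch below uses only the standard go/parser and go/ast packages to flag fmt.Errorf and errors.New calls in a source string. It is a simplified illustration: FindBareErrors is a hypothetical name, the check is purely syntactic (it does not resolve import aliases), and it flags every call rather than only those at command boundaries.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// FindBareErrors returns one entry per fmt.Errorf or errors.New call found in src.
func FindBareErrors(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "cmd.go", src, 0)
	if err != nil {
		return nil, err
	}
	banned := map[string]bool{"fmt.Errorf": true, "errors.New": true}
	var hits []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // keep walking
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		pkg, ok := sel.X.(*ast.Ident)
		if !ok {
			return true
		}
		name := pkg.Name + "." + sel.Sel.Name
		if banned[name] {
			hits = append(hits, fmt.Sprintf("%s: %s", fset.Position(call.Pos()), name))
		}
		return true
	})
	return hits, nil
}

func main() {
	src := `package cmd

import "fmt"

func run() error {
	return fmt.Errorf("permission denied")
}
`
	hits, err := FindBareErrors(src)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println(h)
	}
}
```

Wiring a check like this into CI is what turns "please return typed errors" from a convention into a contract.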

Why is this important? Because once an Agent starts relying on type: "authorization" → re-login, you cannot accidentally change it to type: "api" one day. The Agent will walk into the wrong branch, repeatedly log in, and then report to the user that it "can't do it."

This is not a code style issue. It is an interface contract. Change the API return format for humans, and they will curse you. Change the error type for an Agent, and it will repeatedly do the wrong thing in front of the user.



5. Identity System: user, bot, auto, and strict mode

lark-cli supports three identities, each corresponding to different token types and API permissions:

| Identity | Meaning | Token Type | Use Case |
|----------|---------|------------|----------|
| user | Call as a user | User Access Token | Check own calendar, send messages |
| bot | Call as an application | Tenant Access Token | Group bots, batch operations |
| auto | Auto-detect | Automatically selected based on config and credential | Default behavior |

Identity resolution order: --as parameter > config file defaultAs > auto-detection.

There is also strict mode—an administrator can force --strict-mode bot, so all commands can only run as a bot. An Agent using --as user in strict mode will get an error directly, without silently degrading.

Each Shortcut declares its supported AuthTypes:

var CalendarAgenda = common.Shortcut{
    AuthTypes: []string{"user", "bot"},  // Supports both identities
    // ...
}

If a Shortcut declares AuthTypes: ["bot"], the framework rejects --as user calls during identity resolution (stage ① of the pipeline), before any API request is made. The Agent doesn't need trial and error: it sees the AuthTypes metadata and knows which identity to use.
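
The resolution order and the two rejection rules fit in one small function. This is a sketch under stated assumptions: ResolveIdentity is a hypothetical name, the auto-detection rule (follow strict mode if set, otherwise take the first supported identity) is a guess standing in for the real config-and-credential logic, and a real CLI would return typed errors instead of fmt.Errorf.

```go
package main

import (
	"fmt"
	"slices"
)

// ResolveIdentity applies: --as flag > config defaultAs > auto-detection,
// then enforces strict mode and the command's declared AuthTypes.
func ResolveIdentity(flagAs, defaultAs, strict string, authTypes []string) (string, error) {
	id := flagAs
	if id == "" || id == "auto" {
		id = defaultAs
	}
	if id == "" || id == "auto" {
		// Assumption: auto-detection follows strict mode if set, otherwise
		// it picks the first identity the command supports.
		switch {
		case strict != "":
			id = strict
		case len(authTypes) > 0:
			id = authTypes[0]
		default:
			return "", fmt.Errorf("command declares no AuthTypes")
		}
	}
	// Strict mode never degrades silently: a conflicting identity is an error.
	if strict != "" && id != strict {
		return "", fmt.Errorf("strict mode allows only %q, got %q", strict, id)
	}
	if !slices.Contains(authTypes, id) {
		return "", fmt.Errorf("command does not support identity %q", id)
	}
	return id, nil
}

func main() {
	both := []string{"user", "bot"}
	fmt.Println(ResolveIdentity("user", "bot", "", both))       // flag wins over config
	fmt.Println(ResolveIdentity("", "bot", "", both))           // config default applies
	fmt.Println(ResolveIdentity("user", "", "bot", both))       // rejected by strict mode
	fmt.Println(ResolveIdentity("user", "", "", []string{"bot"})) // rejected by AuthTypes
}
```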


6. Skill Files: The Agent's Operation Manual, Embedded in the Binary

--help can only list parameters. An Agent needs to know "When the user says 'help me schedule a meeting,' which command should I call? What should I do first?"

lark-cli's approach is to write a SKILL.md for each business domain, embedded into the binary via //go:embed:

## Intent Routing
| User Intent | Route To |
|-------------|----------|
| "Help me schedule a meeting" | +create (read schedule-meeting.md first) |
| "Check today's calendar" | +agenda (Note: what the user calls "calendar" is "events") |
| "Yesterday's meeting minutes" | Not calendar, it's lark-vc |

## Prerequisites
| Scenario | Prerequisite |
|----------|--------------|
| Edit existing event | Locate event_id first (recurring events need to locate the instance) |
| Verify after delete/modify | Wait 2 seconds before querying (API eventual consistency) |

26 business domains, each with such a Skill file. The benefit of embedding in the binary is version consistency—upgrade the CLI, and the Skill content upgrades along with it, preventing the Agent from calling new commands based on an old Skill file.

Skill files are not documentation for humans; they are operation manuals for Agents. They contain intent routing tables, prerequisite checks, and terminology mapping ("what the user calls 'calendar' is 'events,' not 'calendar container'")—this is the information Agents truly need when making decisions.


7. Security Guardrails: The Agent Is an Untrusted Caller

When a human opens a terminal, you assume by default that they are allowed to operate on the current directory. When an Agent runs commands in the background, you make no such assumption.

lark-cli's security design revolves around one premise: the flag values filled by the Agent are untrusted input.

| Mechanism | What It Prevents |
|-----------|------------------|
| vfs abstraction layer | All file I/O does not go through os.Open, but through internal/vfs. Path validation rejects absolute paths, ../ traversal, symlink escapes, and control characters |
| Output scanning | output.ScanForSafety scans content before outputting to stdout. Agent A's output may enter Agent B's pipeline, so malicious content must not pass through |
| dry-run | All Shortcuts automatically support --dry-run. When unsure, the Agent previews the request without executing it |
| OS Keychain | Tokens do not go into config files or environment variables; they go into the system keychain. The Agent cannot read them, and prompt injection cannot get them |
| High-risk confirmation | Shortcuts with risk: "high-risk-write" require --yes, otherwise return type: "confirmation" (exit 10) |

These mechanisms are not for security audits. Agents make mistakes, can be prompt-injected, and can call the same command repeatedly in a loop. Guardrails are not to prevent bad actors, but to prevent bad outcomes.
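
The lexical part of path validation is cheap to reproduce. The sketch below relies on the standard library's filepath.IsLocal (Go 1.20 or later) plus a control-character scan; ValidateUserPath is a hypothetical name, and symlink escapes are deliberately not covered because detecting them requires consulting the file system rather than inspecting the string.

```go
package main

import (
	"fmt"
	"path/filepath"
	"unicode"
)

// ValidateUserPath rejects flag values that could reach outside the working directory.
// It is a lexical check only: it does not follow or detect symlinks.
func ValidateUserPath(p string) error {
	for _, r := range p {
		if unicode.IsControl(r) {
			return fmt.Errorf("path contains control character %U", r)
		}
	}
	// filepath.IsLocal (Go 1.20+) is false for empty paths, absolute paths,
	// and paths that climb out of the current directory with "..".
	if !filepath.IsLocal(p) {
		return fmt.Errorf("path %q is not local to the working directory", p)
	}
	return nil
}

func main() {
	for _, p := range []string{"notes/todo.md", "../secrets", "/etc/passwd", "a\x00b"} {
		fmt.Printf("%q -> %v\n", p, ValidateUserPath(p))
	}
}
```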



8. Output System: Five Formats, One Envelope, One Notification System

A command's output has three destinations: a human viewing it in the terminal, an Agent parsing it, and the next command in the pipeline.

lark-cli's --format supports five formats: json / pretty / table / ndjson / csv. With the --jq expression, the Agent can perform JSON filtering inside the command without needing to pipe to jq.

All output goes through the same JSON envelope:

{
  "ok": true,
  "identity": "user",
  "data": { ... },
  "meta": { "count": 42 },
  "_notice": {
    "update": {
      "current": "1.2.0",
      "latest": "1.3.0",
      "command": "lark-cli update"
    }
  }
}

The _notice field is a push system—the Agent can check it to determine if a tool upgrade is needed. If the Agent doesn't check, it still works fine—_notice does not affect the semantics of ok: true. This design turns "push notifications" into "a field in the output envelope," without interfering with the normal flow.


9. Common Pitfalls: Three Mistakes Most CLI Tools Make Regarding "Agent as a User"

After reading through lark-cli and looking back at other CLI tools, three particularly common pitfalls emerge:

| Anti-pattern | Symptom | lark-cli's Solution |
|--------------|---------|---------------------|
| Errors as strings | fmt.Errorf("permission denied"), Agent guesses using regex matching | Structured JSON, stable routing via type + subtype |
| Flat commands | All commands are sibling subcommands; Agent can't distinguish "stable interface" from "raw API" | Three-layer system, isolated by + prefix |
| Agent as trusted caller | No path validation, no output scanning, no dry-run | vfs + ScanForSafety + dry-run + keychain |

The first pitfall is the most common—almost all CLI tools fall into it. The second pitfall is "having functionality but no design." The third pitfall is the most insidious—completely invisible until something goes wrong; once an Agent is prompt-injected, the entire system is running naked.


10. The Design Philosophy Throughout

After reading through 170,000 lines of code, a few principles emerge that run through all modules:

1. The Agent is a first-class user in its own right, not a "simplified version of a human user." The error system needs type and subtype because Agents rely on them for routing. Skill files need intent routing tables because Agents don't know that "what the user calls 'calendar' is 'events'." Security guardrails need to be stricter because Agents can be injected.

2. Contracts are locked down by code, not by documentation. Error classification is locked down by lint, Subtype is locked down by constant declarations, and identity validation is locked down in the pipeline. Documentation becomes outdated; CI does not.

3. The framework does what Agents are bad at; Shortcuts only do business logic. Identity resolution, Scope pre-check, parameter validation, error classification, pagination merging, content safety scanning—all of these are completed in the pipeline. Execute only receives a clean RuntimeContext and directly calls the API.

4. The simplest solution is often the most testable solution. The factory pattern doesn't use a DI framework, the vfs abstraction doesn't use heavyweight mock libraries, and tests use t.Setenv and t.TempDir to isolate state. There is no over-engineering; every layer of abstraction has clear testing benefits.


11. If You Want to Build Your Own Agent CLI: 8 Things You Can Take from lark-cli

Reading through lark-cli is not just for writing a book report. If you are building a CLI tool that will be called by Agents, here are 8 design decisions you can directly take, sorted by priority.

1. Command Layering: + Prefix Isolates Stable Interfaces

Problem: The Agent cannot distinguish which command is a "framework-encapsulated stable interface" and which is a "raw command directly mapping the API."

Takeaway: Add a prefix convention to your CLI—+ or stable: both work. Prefixed commands = stable interfaces, where the framework handles validation, error classification, and pagination for the Agent. Non-prefixed commands = raw interfaces, where the Agent is responsible for everything.

lark-cli's code:

// Shortcuts use + prefix, framework automatically injects dry-run, format, jq, identity resolution
var CalendarAgenda = common.Shortcut{
    Command: "+agenda",
    Scopes:  []string{"calendar:calendar.event:read"},
    Execute: func(ctx context.Context, runtime *common.RuntimeContext) error {
        // Only write business logic, leave everything else to the framework
        return nil
    },
}

2. Error System: Exhaustive Categories, Stability Locked by Lint

Problem: The Agent relies on string matching to guess error types; changing the wording breaks it.

Takeaway: Define 5-10 exhaustive error Categories, each corresponding to a stable type field. Each Category gets an exit code. Use lint to force all command boundaries to return typed errors, banning bare fmt.Errorf.

lark-cli's code:

// Don't write this
return fmt.Errorf("permission denied")

// Write this—Agent reads type and subtype, doesn't need to read message
return errs.NewPermissionError(errs.SubtypeMissingScope,
    "missing required scope(s): %s", strings.Join(missing, ", ")).
    WithMissingScopes(missing...).
    WithHint("run: mycli auth login --scope %s", strings.Join(missing, " "))

Minimum viable version: You don't need 9 Categories from the start. 3 can cover 80% of scenarios: validation (parameter wrong), permission (no permission), internal (tool's own bug). Add more later.

3. Execution Pipeline: Framework Does What Agents Are Bad At

Problem: Every time the Agent calls a command, it has to handle identity, Scope, parameter validation, error classification, and pagination itself—these are not business logic, but Agents often make mistakes here.

Takeaway: Design a Shortcut execution pipeline in which the framework, not each command, owns identity resolution, Scope pre-check, parameter validation, error classification, and pagination merging. Execute only receives a clean RuntimeContext and directly calls the API.

Identity → Config → Scopes → RuntimeContext → Validate → Execute

lark-cli's approach: Shortcuts only declare Scopes, AuthTypes, Flags, and Execute; the rest is automatically completed by the runShortcut pipeline. You don't need to repeat identity validation and Scope checks in every Shortcut.

4. Identity System: user/bot Dual Identity + Enforcement Mode

Problem: Sometimes the Agent needs to call the API as a user, sometimes as a bot. If the Agent judges entirely on its own, it will get confused.

Takeaway: Define 2-3 identities (user, bot, auto), and each command declares which identities it supports. The framework automatically resolves and validates in the pipeline. Add a strict mode so administrators can force bot-only identity.

lark-cli's approach: --as user / --as bot parameter + config file defaultAs + AuthTypes declaration. If the Agent calls a bot-only command with --as user, the framework rejects it during identity resolution instead of waiting for the API call to surface the error.

5. Security Guardrails: Agent Is an Untrusted Caller

Problem: Agents can be prompt-injected, can call the same command repeatedly in a loop, and can fill in malicious flag values.

Takeaway: Three things: path validation (all file I/O goes through an abstraction layer that rejects ../ traversal), dry-run (all write operations support preview), and credentials kept out of config files and environment variables (store them in the system keychain, where the Agent cannot read them).

Minimum viable version: dry-run offers the best cost-benefit ratio. Adding a --dry-run flag costs almost nothing, but the Agent can preview when unsure.

6. Output Envelope: Unified Format + _notice Push

Problem: The Agent needs to simultaneously get data, metadata, and system notifications from the output. If these three things are scattered across stdout and stderr, the Agent struggles to piece them together.

Takeaway: All output goes through the same JSON envelope, containing four fields: ok, data, meta, _notice. _notice is used to push system notifications like "new version available" or "Skill file outdated," without affecting the semantics of ok: true.

{
  "ok": true,
  "data": { "items": [...] },
  "_notice": {
    "update": { "current": "1.2.0", "latest": "1.3.0", "command": "mycli update" }
  }
}

7. Factory Pattern: struct + function field Dependency Injection

Problem: As CLI tools accumulate more commands, each command needs access to configuration, HTTP client, and credential chain. Using global variables makes testing a disaster.

Takeaway: Use a Factory struct to hold all shared dependencies, with all function fields lazily loaded. Replace fields directly in tests; no DI framework needed.

type Factory struct {
    Config  func() (*Config, error)
    Client  func() (*http.Client, error)
    // ...
}

// In tests
f := &Factory{
    Config: func() (*Config, error) { return &Config{...}, nil },
}

lark-cli's approach: NewDefault() creates the production Factory, tests use cmdutil.TestFactory(t, config) to create mocks. No need for wire, dig, or any third-party DI library.

8. Skill Files: Agent Operation Manual Embedded in Binary

Problem: The Agent gets --help and only sees a parameter list; it doesn't see "which command to call when the user says X" or "what needs to be done before calling this command."

Takeaway: Write a SKILL.md for each business domain, containing intent routing tables, prerequisites, and terminology mapping. Embed it in the binary using //go:embed to ensure version consistency.

Minimum viable version: Start with one, not 26. Choose the most commonly used business domain, and clearly write three things—what the user says → which command to call, what to prepare before calling, and how to recover from failure. The Agent's call success rate will noticeably improve with this file.


Priority Suggestions

If you need to build an Agent CLI right now, follow this order:

| Priority | What to Do | Why Do It First | Effort |
|----------|------------|-----------------|--------|
| P0 | Error system (3 Categories + JSON envelope) | Agents rely on errors for decisions; this is the highest-frequency interaction interface | 1-2 days |
| P0 | dry-run | Lowest cost, highest benefit. Agent can preview before executing | Half a day |
| P1 | Command layering (+ prefix) | Lets Agent distinguish "stable interface" from "raw interface" | Half a day |
| P1 | Output envelope (unified JSON format) | Agent doesn't need to parse multiple output formats | 1 day |
| P2 | Execution pipeline (identity + Scope auto-validation) | Reduces Agent errors in non-business logic | 2-3 days |
| P2 | Factory pattern | Makes code testable; otherwise, you won't dare change it after the 10th command | 1 day |
| P3 | Identity system (user/bot + strict mode) | Most tools only have one identity initially | 1-2 days |
| P3 | Skill files | Write one first, validate effectiveness, then expand | Half a day |

Final Thoughts: The Three Stages of Building a CLI

| Stage | Focus | Typical Approach |
|-------|-------|------------------|
| Add a --json | Make output parseable | Add a format parameter, JSON output |
| Structured errors | Let Agent make decisions from errors | Error classification, type/subtype, executable hints |
| Agent is a first-class user | Design the CLI's interface contract from the ground up | Command layering, identity system, Skill files, lint-locked contracts, security guardrails |

Most tools stop at the first stage. lark-cli is at the third stage.

Treating the Agent as a first-class user: what separates this design principle from shipped code is not technical ability, but the willingness to admit that your CLI is not just for humans.

After reading through 170,000 lines of code, the above is what I think is most worth taking away. There are certainly many details I haven't seen. If you are also building an Agent CLI, or have discovered interesting CLI designs, feel free to let me know.


Project repository: https://github.com/larksuite/cli