Stop AI from Wrecking Your Codebase with Spec-Driven Development

1. Foreword

In the past two years of rapid AI Coding adoption, more and more teams have started using tools like Codex, Cursor, and Claude Code to assist development. A large number of tutorials on Prompts, Skills, and MCP have emerged around these tools. However, much of this content remains at the level of "single-point efficiency gains": teaching you how to write Prompts, install tools, and invoke capabilities, but rarely answering a more critical question: Once AI enters the R&D process, how do we establish a stable, sustainable, and reusable engineering collaboration method?

This article revolves around this question. Let's first look at the most common problems teams encounter when using AI-assisted development:

Prompts are too simple, and AI lacks sufficient information, so it fills in the context on its own, leading to implementation results that go off track.
Too much information is thrown at the AI at once without clear priorities and acceptance criteria, causing the final output to easily lose focus.
Code implementation may not conform to team standards and may even inadvertently expand the scope of changes.
Some key terminology and business context only exist within the current session; when a new session starts, everything needs to be explained again.

The crux of these problems often lies in the ineffective organization of the development context, a bottleneck that is difficult to overcome by simply optimizing Prompts. Therefore, we need to shift our focus to engineering and build a process that allows AI to run stably within established rules. This is the core topic we will discuss today—SDD (Spec-Driven Development).

2. SDD (Spec-Driven Development)

What is SDD

SDD (Spec-Driven Development) is not a brand-new concept. It appeared in the field of software engineering long ago, with the core philosophy being: Define the rules first, then implement. However, the cost of writing and maintaining specification documents was high in the past, so it never truly became a mainstream development method.

With the rapid development of AI Coding in 2024 and 2025, AI can now generate code efficiently, but it has also exposed problems such as unstable requirement understanding, uncontrolled scope of changes, and the inability to continuously reuse context. Against this backdrop, SDD is being discussed again. For example, GitHub's Spec Kit solidifies the AI development process into Specify - Plan - Tasks - Implement, making AI more of an executor, while the human focus shifts forward to requirement definition, architectural design, constraint specification, and acceptance criteria.

Differences Between SDD and Traditional Approaches

Those familiar with traditional R&D processes will realize that this is essentially the AI version of "write the technical plan first, then develop," but with slight differences.

Compared to traditional approaches, the differences with SDD are roughly as follows:

Dimension	Traditional Approach	SDD
Document Audience	Humans	AI + Humans
Document Granularity	Readable is enough	Needs to be structured for AI parsing
Acceptance Criteria	Relatively vague	As verifiable and objectively checkable as possible
Document-Code Relationship	Tends to drift apart	Driven by specs, easier to keep consistent

At this point, many people will have a natural question:

Writing plans was already time-consuming; now we have to make documents more detailed and structured—won't the cost be even higher?

The answer is: mature tools now exist to help us draft Specs, so the overall cost is often lower than it first appears.

In fact, when generating a Spec, we also deepen our understanding of the requirements, and this investment pays off with a lower rework rate and more stable collaboration quality in the subsequent coding phase.

How SDD Solves These Problems

The "AI randomly changing code" mentioned earlier is often not a single problem but a combination of several issues:

AI doesn't know the requirement boundaries and will fill in logic on its own.
AI doesn't know the acceptance criteria and might end tasks prematurely.
AI doesn't understand project constraints and easily produces code that doesn't match team habits.
AI lacks long-term context and will repeat the same mistakes in new sessions.

What SDD aims to solve is organizing this information—originally hidden in requirement reviews, technical plans, and development experience—into specification documents that AI can continuously read and reuse. For example:

Spec solves the problem of "what to do and what not to do." It clearly states user scenarios, functional boundaries, edge cases, and acceptance criteria. This way, AI doesn't need to guess requirements during implementation and is less likely to modify unrelated logic.

Design solves the problem of "how to do it." This explains our technical approach, module impact scope, data flow, interface dependencies, and compatibility requirements.

Tasks solves the problem of "which part to do first and to what extent." It breaks down requirements into small, executable, and verifiable tasks. AI handles only one clearly defined scope at a time, making changes easier to converge and easier for humans to review and roll back.

In this way, AI can work around the same set of specification documents from analysis and planning to coding. Humans are responsible for defining rules and judging results, while AI is responsible for completing the implementation within those rules. As these Spec documents continue to accumulate in the project, they will gradually form a project-level knowledge base. When starting new sessions later, AI won't need to understand the business from scratch or have the same set of rules explained repeatedly, significantly improving collaboration stability.

3. Tool Selection

OpenSpec

Many excellent tools have already emerged for SDD to help us improve efficiency. Common solutions include OpenSpec, Spec Kit, superpowers, and others.

If a team wants a lightweight integration and quick implementation, I would lean towards choosing OpenSpec. It may not be suitable for all scenarios, but it is less intrusive to existing team workflows and has a lower barrier to entry.

The initialization method is straightforward: after installing OpenSpec, execute openspec init in the project, then select the corresponding AI Coding tool to generate the following directory structure:

openspec/
├── specs/              # Source of truth (your system's behavior)
│   └── <domain>/
│       └── spec.md
├── changes/            # Proposed updates (one folder per change)
│   └── <change-name>/
│       ├── proposal.md
│       ├── design.md
│       ├── tasks.md
│       └── specs/      # Delta specs (what's changing)
│           └── <domain>/
│               └── spec.md
└── config.yaml         # Project configuration (optional)

We focus on the following core files:

File	Role
`spec.md`	Defines requirements, boundaries, behaviors, constraints, and acceptance criteria, serving as AI's long-term context
`design.md`	Records specific technical solutions and implementation designs
`proposal.md`	Describes the background, goals, impact scope, and rationale
`tasks.md`	Task breakdown and execution progress reference
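As a quick sanity check on this layout, a small script can report which of the expected files are missing from a change folder. This is only a sketch: the function name `missing_change_files` is hypothetical, and treating all three files as required is an assumption that a team may relax for small changes.

```shell
#!/bin/bash
# Sketch: list the expected artifacts that are missing from one change folder.
# The file list mirrors the directory tree above; the function name is hypothetical.

missing_change_files() {
  local change_dir="$1"
  local name
  for name in proposal.md design.md tasks.md; do
    # Print the name of every expected file that is not present
    if [ ! -f "$change_dir/$name" ]; then
      echo "$name"
    fi
  done
}

# Example (hypothetical change name):
#   missing_change_files openspec/changes/add-export
```

An empty result means the change folder has everything the AI is expected to read before coding starts.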

Taking Codex as an example, if the corresponding capabilities are integrated, you can quickly enter this workflow using the /opsx shortcut command.

For more capabilities, refer to the OpenSpec repository on GitHub.

Workflow

OpenSpec is just one way to carry this workflow. The core is not a specific tool but the collaboration method of "defining specs first, then constraining execution, then continuously accumulating." Ideally, we don't start directly from code but first define and continuously refine Spec documents, then let AI code around these documents.

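The lifecycle of an OpenSpec change roughly looks like this: propose a change and draft its Spec (/opsx:propose), generate code from the Spec (/opsx:apply), update the Spec when requirements shift (/opsx:sync), and archive the finished change (/opsx:archive).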

Overall Architecture

OpenSpec solves the main process of "advancing development according to specs," but to make AI output more stable, we also need to add context acquisition, project rules, and automated verification. Overall, this collaboration method layers three things around the OpenSpec flow: MCP for context acquisition, Skills and AGENTS.md for project rules, and Hooks for automated verification.

4. Practical Implementation

Just looking at the flowchart and overall architecture diagram, many people who haven't practiced it will still find it abstract. They might not understand the roles that MCP, Skills, and other constraints play within it, or which command to use at which stage. Below, we'll follow a common AI Coding collaboration process to see how OpenSpec's workflow is implemented in actual development.

Generating Specs

At the start of a complex requirement, we can first use /opsx:propose + requirement content, technical plan, etc. to generate a draft Spec document. However, in practice, you'll find that if you completely rely on copy-pasting information into the dialog box, it's not only inefficient but also easy to miss key information, ultimately affecting the completeness of the Spec.

MCP: Obtaining More Complete Context

Therefore, we need to leverage the capabilities of MCP to allow AI to directly access context scattered across tools like Figma, Yuque, Yapi, etc. By uniformly encapsulating the commonly used data sources in the R&D process into MCP Servers, we can reduce the information loss caused by manual copying on one hand, and make Spec generation closer to the real business context on the other.

With this capability, we can then write Specs like this:

/opsx:propose 
Product Document: link1
Technical Plan: link2
...

After the AI outputs, we still need to intervene and check. Because the current project may not yet have enough historical Specs, the AI's understanding of business knowledge is incomplete, and the generated content might contain incorrect business terms, unreasonable architectural designs, etc. At this point, directly modify the generated Spec document. As Specs continue to accumulate, such problems will become fewer and fewer in subsequent generations.

After the document is modified, we can proceed to the most critical next step: generating code.

Generating Code

In this step, we just need /opsx:apply to let the AI quickly generate code based on the Spec document. However, in practice, you'll find that after using /opsx:apply, although it understands business terms and logic, it might still expand the scope of changes or produce naming and implementations that don't conform to team standards. At this step, we need some additional means to constrain these non-standard behaviors:

1. Skills: Turning Soft Constraints into a Reusable Rule Layer

Every team has its own development standards. Repeating these common rules to the AI through Prompts and Specs every time wastes both effort and context. Instead, we can turn these soft constraints into a reusable rule layer.

In my workflow, Skills are divided into two layers: Bottom-layer Skills (constraining the model) + Top-layer Skills (guiding the model to understand the business).

Bottom-layer Skills

You can understand this Skill as "long-term work habits for the model," placing some general coding rules here, such as:

Think before coding.
Prioritize small changes, don't arbitrarily expand the scope.
After modifications, self-check and explain the impact.
Confirm first when encountering uncertainty, don't guess blindly.
...

These types of general rules are placed globally in the tool and shared across all projects.

Recommended use: andrej-karpathy-Skills

Top-layer Skills

But the things truly strongly related to your project's business are often another layer of content, such as:

Which layer of our project architecture is responsible for requests, and which layer handles data processing.
Component naming conventions.
Domain-specific terms in the business.
Which historical pitfalls should not be stepped into again.

This content is more suitable to be made into project-level Skills, continuously iterated, and placed in our project.

Skills and Specs are essentially both providing context to the AI, but the focus of Specs is on what to do, while the focus of Skills is on how to do it.

2. AGENTS.md: The Project Onboarding Manual for AI

Besides Skills, you can also utilize AGENTS.md. Many AI Coding tools will prioritize reading the rule description file in the project. It is suitable for placing content that "should be known by default after entering the project." You can understand it as: The project's onboarding manual for AI.

For example, suitable content includes:

How to read the project structure.
Which standards have the highest priority.
Which directories or files should not be modified casually.
...

If you don't know how to write the format of this file, you can also use AI to generate it at this step; you just need to verify the document it generates.

Why not put all this content into Skills?

Because the trigger mechanisms and scopes of the two are not exactly the same:

Skills are activated on demand, which suits general rules or individual customized rules, like the general coding standards mentioned above.
AGENTS.md takes effect as soon as the AI enters the project, which suits project-wide rules such as this project's code structure.

In many tools, AGENTS.md is read earlier than Skills, so it is particularly suitable for placing the most fundamental and stable project rules.
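For teams starting from nothing, a starter file can be scaffolded and then filled in by hand. The sketch below is an assumption-heavy illustration: the function name `scaffold_agents_md`, the section titles, and the placeholder text are all made up here, and they simply mirror the three kinds of content listed above.

```shell
#!/bin/bash
# Sketch: create a starter AGENTS.md without overwriting an existing one.
# Section titles and placeholder text are illustrative only.

scaffold_agents_md() {
  local target="${1:-AGENTS.md}"
  # Never clobber rules the team has already written
  if [ -e "$target" ]; then
    echo "$target already exists, leaving it untouched" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
# Project guide for AI agents

## Project structure
- (describe where requests, data processing, and UI code live)

## Highest-priority standards
- (list the rules that override everything else)

## Do not modify without confirmation
- (list protected directories or files)
EOF
}

# Example: scaffold_agents_md ./AGENTS.md
```

Keeping the file short and stable matters more than the exact headings, since it is read at the start of every session.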

3. Hooks: Upgrading Soft Constraints to Hard Constraints

Even with Specs, Skills, and AGENTS.md, AI might still occasionally "go haywire" in complex tasks. All the constraints above are soft constraints, so the AI is free to ignore them. If we want rules that are actually enforced on the code, we need Hooks.

Many AI Coding tools provide Hook capabilities, allowing us to execute scripts at specific lifecycle nodes. Taking Codex Hooks as an example, common nodes include:

UserPromptSubmit: After the user prompt is submitted, before the AI receives it.
PreToolUse: Before calling a tool, can intercept high-risk operations.
PostToolUse: After the tool executes, before the result is returned to the AI.
Stop: Before the session stops, can execute checks, tests, or CR processes.
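To make the PreToolUse case concrete, here is a minimal sketch of a guard that rejects obviously dangerous shell commands. It rests on several assumptions: the function names are hypothetical, the pattern list is illustrative rather than exhaustive, and the command is taken as a plain argument because the way a hook receives the pending command (argument, environment variable, or JSON on stdin) differs between tools. The exit code 2 follows the same "block" convention as the detection script later in this section.

```shell
#!/bin/bash
# Sketch: decide whether a shell command should be blocked before it runs.
# The pattern list is illustrative, not exhaustive.

is_high_risk() {
  local cmd="$1"
  case "$cmd" in
    *"rm -rf"*|*"git push --force"*|*"git reset --hard"*|*"DROP TABLE"*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

guard_command() {
  if is_high_risk "$1"; then
    # Explain the refusal on stderr and signal "blocked" with exit status 2
    echo "Blocked high-risk command: $1" >&2
    return 2
  fi
  return 0
}

# Example: guard_command "git push --force origin main"
```

Substring matching like this is deliberately blunt; it is meant to catch careless operations, not to act as a security boundary.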

For example, in practice you might find that the AGENTS.md or Prompt given to Codex already states "frontend code must not hardcode Chinese text; it must go through internationalization," yet in long sessions or complex tasks the AI still occasionally outputs hardcoded Chinese.

In that case, you can run a detection script after each code modification:

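#!/bin/bash
# Usage: check-i18n.sh <file>
# Requires GNU grep built with PCRE support (-P) and a UTF-8 locale.

FILE="$1"

# Nothing to check if no file was passed or it no longer exists
[ -f "$FILE" ] || exit 0

# Detect Chinese characters (U+4E00 to U+9FA5).
# grep -E does not understand \uXXXX escapes, so use PCRE code points instead.
# Output goes to stderr: tools that treat exit code 2 as a blocking failure
# typically feed stderr back to the model.
if grep -nP '[\x{4e00}-\x{9fa5}]' "$FILE" >&2; then
  {
    echo ""
    echo "Detected hardcoded Chinese"
    echo "Please change to:"
    echo "t('xxx')"
  } >&2
  exit 2
fi

exit 0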

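A sample hook configuration is as follows. Key names and the variable that carries the edited file path ($CLAUDE_FILE here) vary by tool and version, so check them against your tool's Hooks documentation:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/check-i18n.sh $CLAUDE_FILE"
          }
        ]
      }
    ]
  }
}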

With these configurations in place, the quality of AI output should improve noticeably. After AI Coding is complete, the work that remains for us is:

Review the code generated by AI.
Verify whether the interface and interactions meet expectations.
Identify new problems exposed during this collaboration.

These issues can later be fed back into Skills, AGENTS.md, and Hooks, forming a more stable foundation for the next round of collaboration.

Spec Synchronization

In actual development, we often encounter some unplanned changes: halfway through writing code, the product manager temporarily adjusts the interaction; backend interface fields change; we discover during development that the original technical plan is unworkable; or during review, we find a boundary scenario that wasn't considered before, etc.

These changes themselves are normal, but in the AI Coding process, if we only change the code and don't update the Spec, it buries a problem: The code has changed, but the rules the AI reads later are still the old ones.

In the short term, this might not have much impact because everyone still remembers what happened in the current session. But once you switch sessions, or continue iterating a few days later, the AI will still understand the requirements according to the old Spec. The result is: it might change back the logic you just fixed, or continue generating code based on the old rules, causing the code and specs to become increasingly inconsistent.

At this point, there are two approaches:

If the change is known before the code is touched, use /opsx:sync + change description to update the Spec document first, then run /opsx:apply against the latest Spec.
If the code was modified by hand first (the special case), follow up with /opsx:sync + description so the Spec catches up.

This ensures our code and Spec remain consistent.
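One way to notice a forgotten sync is to check whether a set of changes touches code but no Spec file. The following is a rough sketch under stated assumptions: the function name `spec_drift` is hypothetical, and the `src/` and `openspec/` path prefixes are guesses about the project layout that need adjusting per repository.

```shell
#!/bin/bash
# Sketch: flag a change set that touches source code but no Spec file.
# Reads changed paths from stdin, one per line (for example from `git diff --name-only`).
# The "src/" and "openspec/" prefixes are assumptions about the project layout.

spec_drift() {
  local code_changed=0 spec_changed=0 path
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      openspec/*) spec_changed=1 ;;
      src/*)      code_changed=1 ;;
    esac
  done
  # Drift means: code moved, Spec did not
  [ "$code_changed" -eq 1 ] && [ "$spec_changed" -eq 0 ]
}

# Example (branch name is an assumption):
#   git diff --name-only main...HEAD | spec_drift && echo "Code changed but Spec was not updated"
```

A check like this cannot tell whether the Spec edit is correct; it only reminds the author that a sync step was probably skipped.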

Archiving and Accumulation

When all tasks are completed and verification is passed, the final step is /opsx:archive.
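Before archiving, it can help to confirm mechanically that nothing in tasks.md is still open. The sketch below assumes tasks are written as Markdown checkboxes ("- [ ]" for open, "- [x]" for done); the function names are hypothetical, and a tasks file in another format would need a different pattern.

```shell
#!/bin/bash
# Sketch: count unchecked items in a tasks.md file before archiving.
# Assumes tasks are Markdown checkboxes ("- [ ]" open, "- [x]" done).

open_task_count() {
  local tasks_file="$1"
  # grep -c prints the number of matching lines; it exits with status 1 when
  # that number is 0, so "|| true" keeps the function from reporting failure.
  grep -cE '^[[:space:]]*- \[ \]' "$tasks_file" || true
}

ready_to_archive() {
  # Succeeds only when no open checkbox remains
  [ "$(open_task_count "$1")" -eq 0 ]
}

# Example (hypothetical change name):
#   ready_to_archive openspec/changes/add-export/tasks.md && echo "OK to archive"
```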

Many people understand archiving as "ending this task," but in SDD, its more important role is: Accumulating the effective experience generated from a temporary collaboration into reusable project context for the future.

After a requirement is completed, the truly valuable things are not just the code itself, but also include:

The functional boundaries finally confirmed for this requirement.
Which design decisions were adopted, and which plans were abandoned.
What historical problems were discovered during implementation.
Which rules are worth accumulating into the main Spec, Skills, or AGENTS.md.
Which testing and acceptance methods can be reused for subsequent similar requirements.

If not archived, this information often only exists in the current session, a temporary branch, or someone's memory. The next time a similar requirement comes up, the AI will still need to re-understand it, the team will need to re-explain it, and many pitfalls that have already been stepped on will be stepped on again.

For example, in one requirement, we discovered that "backend export functions must distinguish between task creation success and file generation success." If this rule only stays in the current code, the next time the AI works on another export function, it might not know it. But if it is accumulated into the project standards during archiving, this experience can be directly reused for subsequent similar requirements.

In the long run, the more solid the archiving is, the more complete the project's context becomes. The AI's "familiarity" with the project does not arise from nothing but is built up bit by bit through these continuously accumulated specs, rules, and historical decisions. The marginal cost of subsequent collaboration will also gradually decrease with the accumulation of this context.

Applicable Scenarios

Not all requirements are worth going through the complete SDD process. Sometimes, direct Vibe Coding might be more efficient.

Scenarios more recommended for SDD include:

New business modules.
Complex feature development.
Multi-person collaboration projects.
Long-term maintenance projects.
Large-scale refactoring.

A practical rule of thumb: if a requirement is expected to take more than 2 days, it is usually worth establishing a Spec.

Requirements of that size are complex enough that AI can rarely complete them stably in a single round of conversation, and once a Spec is in place, the subsequent collaboration cost drops significantly.

Scenarios less recommended for SDD include:

Overly simple changes.
Pure style adjustments.
One-off solution validation.
Non-persistent temporary scripts.

These scenarios carry little information to begin with, and forcing the full SDD process onto them would only slow things down.

Practical Tips

When first generating a Spec, don't pursue perfection in one step. Get a working draft first, then continuously supplement it during the verification process.
Write down user scenarios and goals clearly first, then discuss implementation details; don't jump straight into technical solutions.
Only advance one minimal closed loop at a time, such as "add a page capability" or "fix a configuration link," to avoid continuous scope creep.
If the requirements are still unstable, explicitly list the uncertain items first before starting implementation, to avoid guessing while doing.
Task breakdown should ideally be cut according to "committable, verifiable, and revertible" units; don't lump multiple business points into one large change.
Perform a minimal verification after completing each sub-task, at least confirming that page rendering, request parameters, and type checking have no obvious issues.
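The "minimal verification after each sub-task" tip can be wrapped in a tiny runner that executes a list of checks and stops at the first failure. This is a sketch, not a prescribed setup: the function name `run_checks` is hypothetical, and the npm script names in the example are placeholders for whatever type-check, lint, or test commands a project actually has.

```shell
#!/bin/bash
# Sketch: run a list of verification commands and stop at the first failure.
# The commands passed in are placeholders; substitute your project's own checks.

run_checks() {
  local check
  for check in "$@"; do
    echo "Running: $check"
    # Each check runs in its own shell so it cannot alter the caller's state
    if ! bash -c "$check"; then
      echo "Check failed: $check" >&2
      return 1
    fi
  done
  echo "All checks passed"
}

# Example (hypothetical npm scripts):
#   run_checks "npm run typecheck" "npm run lint"
```

The same function could also be called from a Stop hook so the checks run before a session ends.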

5. Summary

Model capabilities are now very strong, but models won't automatically make every judgment for us. The quality of the output still depends on how we use them. SDD is a relatively good choice at the moment, and better approaches will likely follow.