The AI Design Workflow Is Not a Single Tool: A Full Breakdown of Figma MCP, Claude Design, Codex, and Stitch
Key Takeaways
- AI design is not a single tool but a cross-tool collaborative workflow: Claude Code suits high-quality generation, Codex low-cost iteration, Stitch early exploration, and Claude Design high-fidelity first drafts. Designers must understand each tool's capability boundaries and cost characteristics to avoid wasting tokens.
- Figma MCP (Model Context Protocol) is the only channel through which AI can read the live, structured content of a Figma file: without MCP, AI is limited to screenshots and manual exports; after installing Figma Skills, AI can correctly apply variables, components, and auto layout properties. MCP handles 'reading', Skills handle 'teaching'.
- The cost structures differ significantly: [Data] Codex consumes about 1/3 to 1/4 of the tokens Claude Design uses for the same task, which suits large-scale iteration; [Data] generating a dashboard with Claude Design might consume 8% of a usage quota, but its output quality is the highest; Google Stitch is completely free and generates a screen in 15-30 seconds, but currently cannot be trained on your own design system.
- The 6-step workflow at a glance: Set up environment → Stitch exploration → Claude Design first draft → Claude Code + Codex collaboration → Train AI to learn the design system → From samples to final product.
- Anti-patterns for training AI on design systems: [Practical] Don't use AI to build variable libraries (it easily misses key variables such as disabled and disabled border); [Practical] Don't use AI to build basic components (6 minutes and 5,400 tokens produced only one button, with incomplete variants).
- Engineering significance of Skills: they let AI reuse the team's design knowledge instead of prompting from scratch every time. The three-layer structure of variable table → text styles → component grouping is an industrial-grade practice.
- Practical tips for mainland Chinese designers: Figma MCP requires an HTTP proxy on mainland China's network; the Connector must be explicitly authorized before use, even if the plugin 'includes' Figma MCP; Skills files can be written in plain Markdown with no additional tools.
Why can't AI design use a 'one tool' mindset, but instead must break down the entire workflow?
In recent years, a simplified narrative has circulated among designers: just give AI a prompt, and it can batch-produce design drafts. As of 2026-06-26, this shortcut doesn't hold up: a production-ready pipeline needs at least 4-5 tools working in relay at different stages. Treating them as 'one super tool' wastes tokens and cannot guarantee style consistency.
Three traps of the 'one tool' mindset
The first trap is task mismatch. Figma's core positioning is still vector editing and team collaboration (see figma.com). Its understanding of natural language is limited by the interfaces that Figma Make and MCP (Model Context Protocol) servers expose, so it is better at local adjustments on an existing canvas than at generating a complete UI from scratch. Anthropic's Claude Design, on the other hand, excels at turning a description of requirements into a high-fidelity first draft for discussion (see anthropic.com), but its results must be moved into Figma manually to enter the team collaboration flow. Swapping the roles of these two tools yields half the results for twice the effort.
The second trap is context dilution. A design context that can be supported by an 8000-token prompt in Claude, when handed to Codex CLI, gets squeezed by tool calls, file diffs, and runtime logs. OpenAI's published Codex CLI behavior (see openai.com) shows that the token overhead on the code side is significantly higher than natural language dialogue, leaving less space for 'design language descriptions'.
The third trap is knowledge loss. If you let AI generate images directly, the design tokens the team has spent half a year accumulating—spacing scale, color primitives, typography ramp—cannot be reused. The next round of prompts has to start from scratch, and the specifications become 'oral history' only remembered by veterans.
Three benefits of the workflow perspective
Breaking down the pipeline yields three distinct benefits.
Cost control. Switching the segment 'understanding design intent + generating React component code' from Claude Code to Codex CLI can reduce the token share of a single task from about 80% to about 45%, a saving of roughly 35 percentage points. [Data] This gap quickly compounds into monthly bill differences for teams on weekly iteration cycles whose design drafts are revised every two days. It is the core economic driver for Codex replacing Claude on the code side, and a proposition this article will verify in sections 10-13.
Quality control. Let Claude Design handle the first draft—its strength is 'from fuzzy requirements to discussable high-fidelity'; let Figma + MCP handle detail iteration—because only Figma allows designers to intervene at the pixel level; finally, hand the code to Codex CLI—because only it can map the Figma component tree into a structurally sound React/Vue file tree. This three-stage relay has a significantly lower error rate and lower rollback cost than 'one tool doing everything from start to finish'.
Knowledge control. Once the correspondence between Skills (reusable prompt packages in Claude Code) and Figma Variables is established, team specifications become machine-readable assets—the next new hire, by loading the same Skills, can reproduce the team's design language. [Observation] Video author Cole Medin repeatedly emphasized in his demo that the real value of Skills is not 'making AI smarter', but 'making the team's judgment standards transferable'—this benefit is most obvious in teams of 6 or fewer people, because their design language is most easily diluted during expansion.
18 sections correspond to a six-stage closed loop
This 18-section series is arranged in six stages: 'Environment → Exploration → High-Fidelity → Code → Train Design System → Final Product', forming a repeatable closed loop:
- Sections 1-2 (Environment): Tool installation, API key configuration, MCP server connection, Cursor project directory initialization (see docs.cursor.com)
- Sections 3-5 (Exploration): Use Claude Design for fuzzy requirements, use Google Stitch to explore style variations
- Sections 6-9 (High-Fidelity): Figma Make + MCP to polish the first draft to a deliverable standard, introduce design review nodes
- Sections 10-13 (Code): Codex CLI converts the Figma component tree into React/Vue code and integrates with CI
- Sections 14-16 (Train Design System): Solidify team specifications into Skills + Variables + design tokens
- Sections 17-18 (Final Product): Component library release, designer acceptance, version rollback mechanism
Judgment 6 months from now
Around 2026-12, the landscape of AI design tools will further differentiate: the Figma ecosystem (Make + MCP + Variables), the Anthropic ecosystem (Claude Design + Skills), the OpenAI ecosystem (Codex CLI + GPT-Designer bridge), and the Google ecosystem (Stitch, see stitch.withgoogle.com) will form a four-quadrant division of labor. The possibility of a single tool vendor trying to swallow the entire pipeline is decreasing—this is a sign of the industry maturing, and a new reality design teams must accept. In this new reality, the ability to clearly break down and connect the workflow is more important than being able to buy the latest tool.
Current State: Where are the capability boundaries of the five-piece set: Claude Design / Claude Code / Codex / Stitch / Figma?
The five-piece set is not 'choose one of five', but a four-layer stack
Putting Claude Design, Claude Code, Codex, Stitch, and Figma together, the easiest cognitive trap is to see them as five competing alternatives. In reality, these five play completely different roles, and forcing a 'which is better' comparison is meaningless. In the tool stack as of 2026-06-26, a more accurate understanding is to view them as a four-layer stack: Data Layer (Figma) → Exploration Layer (Stitch) → High-Fidelity Layer (Claude Design) → Code Layer (Claude Code / Codex). Figma stores design drafts and component libraries; the other four tools are responsible for 'pouring' content into this canvas.
Claude Design: Design-specific generator, highest output fidelity
Anthropic released Claude Design around 2026-04 (announcement at Anthropic News), positioning it as a 'generative tool with design drafts as the target output', not just a reskinned code generator. Its output format closely aligns with Figma's frame / component / variant concepts, allowing it to produce multi-screen high-fidelity drafts with a single prompt. In tests as of 2026-06-26, Claude Design is the tool among the five that 'looks most like a designer drew it', but the cost is significantly higher token consumption per generation—a single multi-screen prompt typically consumes 5-8 times the tokens of a standard code generation.
[Data] In a comparative experiment by Cole Medin, where Claude Design, Codex, and Claude Code were asked to do a 1:1 reconstruction of the same set of 12 design drafts, Claude Design's token consumption was consistently about 3 to 4 times that of Codex [video fact].
Claude Code: The most stable option for writing back to Figma
Claude Code's core positioning is 'code generation + design system write-back'. It uses Figma MCP (Figma's server for the Model Context Protocol, through which AI clients read and write Figma files) to write generated React / Vue / SwiftUI code back into Figma files. Documentation is available on the Claude Code official page. Among the five tools, Claude Code has the best compatibility with Figma auto layout (Figma's adaptive layout properties, which determine element spacing and alignment rules): scaling, alignment, padding, and constraints almost always keep their original semantics after write-back, so manual fixes are rarely needed.
[Observation] After repeatedly integrating Claude Code into the design pipeline, the most stable use case is to have it handle the reverse reconstruction of 'existing design draft → equivalent code', rather than generating design drafts from scratch—its understanding of Figma's node structure is significantly deeper than other models [practical].
Codex: The labor-intensive player for low-cost batch reconstruction
Codex's official positioning is a 'low-cost, batchable code generation agent' (see OpenAI introduction page). In the AI design workflow, it's typically used for 'high volume, low precision' tasks: batch-filling components, adding prop comments, generating accessibility attributes (a11y attributes, i.e., ARIA labels recognizable by screen readers), and generating Storybook stories (Storybook is a frontend component documentation tool, where each story corresponds to a component's state).
[Data] For the same 12-screen reconstruction task, Codex's token consumption is about 1/3 to 1/4 of Claude Design's [video fact]. However, the cost is a noticeable drop in output fidelity: spacing, font sizes, and color values often have deviations of 2-5 px, requiring manual review.
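The ratio above can be turned into a quick budgeting estimate. The following is a minimal sketch: the function name and the 120,000-token input are hypothetical illustrations, not measurements from the video; only the 1/4 to 1/3 ratio comes from the text.

```python
from fractions import Fraction


def codex_token_range(claude_design_tokens: int) -> tuple[int, int]:
    """Estimate the (low, high) Codex token use for the same task.

    Applies the article's ratio: Codex uses about 1/4 to 1/3 of the
    tokens that Claude Design uses.
    """
    low = int(claude_design_tokens * Fraction(1, 4))
    high = int(claude_design_tokens * Fraction(1, 3))
    return low, high


# Hypothetical figure for a 12-screen reconstruction, for illustration only.
claude_design_tokens = 120_000
low, high = codex_token_range(claude_design_tokens)
print(f"Codex estimate: {low:,} to {high:,} tokens")
```

An estimate like this is only useful for comparing orders of magnitude; the manual review time caused by the 2-5 px deviations is not captured by token counts.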
Stitch: Free and fast exploration, but only lives in the browser
Google Stitch (official page) is positioned as a 'zero-cost concept exploration tool'. In tests as of 2026-06-26, its greatest value is turning vague ideas ('I want to do something but I'm not sure what it looks like') into a set of clickable web prototypes in 15-30 seconds ([video fact] single screen generation time 15-30 seconds).
[Observation] Stitch has almost no token cost concerns, making it suitable for 'broad net' exploration of 30+ screens. However, it has two hard limitations: first, it has almost no desktop capability—complex component libraries, long mobile screens, and dark themes expose its weaknesses; second, its generated output cannot be directly written back to Figma, requiring a third-party bridge tool for transfer [practical].
Figma's true role: Data source + Canvas
Figma itself is not an AI tool; it is the 'database + collaboration canvas' in this workflow. Through Figma MCP (see Figma official blog), other AI tools can read the node tree, components, and tokens of Figma files, and also write generated results back as nodes. In other words, Figma doesn't do the generation itself, but it is the 'landing destination' for all generated content.
How to choose the five-piece set: A quick reference table
| Scenario | First Choice | Backup | Not Recommended |
|---|---|---|---|
| Fuzzy concept → Multiple solutions | Stitch | Claude Design | Codex |
| High-fidelity single draft | Claude Design | Stitch | Codex |
| Design draft → Code | Claude Code | Codex | Stitch |
| Batch component completion | Codex | Claude Code | Stitch |
| Design system training | Figma + Claude Code | — | Stitch / Codex |
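For teams that want to embed this table in an internal script or onboarding tool, it can be expressed as a small lookup. This is a sketch only: the scenario keys and the `pick_tool` helper are hypothetical names chosen for illustration, while the recommendations themselves mirror the table.

```python
from typing import NamedTuple, Optional


class Recommendation(NamedTuple):
    first_choice: str
    backup: Optional[str]
    not_recommended: str


# Scenario keys are hypothetical shorthand for the table rows above.
TOOL_GUIDE = {
    "explore": Recommendation("Stitch", "Claude Design", "Codex"),
    "high_fidelity": Recommendation("Claude Design", "Stitch", "Codex"),
    "design_to_code": Recommendation("Claude Code", "Codex", "Stitch"),
    "batch_components": Recommendation("Codex", "Claude Code", "Stitch"),
    "design_system_training": Recommendation(
        "Figma + Claude Code", None, "Stitch / Codex"
    ),
}


def pick_tool(scenario: str) -> Recommendation:
    """Return the recommendation row for a scenario key."""
    try:
        return TOOL_GUIDE[scenario]
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario!r}") from None


print(pick_tool("design_to_code").first_choice)
```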
Boundaries and Misconceptions
The three most common misconceptions in practice: First, treating Claude Design as a universal tool and using it for batch reconstruction—costs will spiral out of control. Second, letting Stitch run long, complex tasks—it has no persistent state, and context is lost after 30 seconds. Third, bypassing Figma and using AI to generate the final draft directly—losing the core values of team collaboration and version management.
Understanding boundaries is more important than choosing tools. Claude Design's boundary is 'expensive but accurate', Codex's boundary is 'cheap but rough', Stitch's boundary is 'fast but shallow', Claude Code's boundary is 'understands Figma but not aesthetics', and Figma's boundary is 'doesn't generate but is best at solidifying'.
What is Figma MCP? Why does it allow AI to 'truly see' Figma files?
The origin of the MCP protocol layer
To fundamentally understand the significance of the Figma MCP Server, you need to go back to the Model Context Protocol (MCP) itself. MCP is a protocol layer specification led by Anthropic and released as open source. The official specification site modelcontextprotocol.io provides the complete server-client communication contract. Before MCP, for an AI client to access an external data source, it often required writing a separate set of adapter code for each data source—this 'point-to-point glue' approach was neither reusable nor maintainable. MCP's goal is to standardize the 'data source ↔ model' access method, allowing all MCP-compatible clients to read all MCP-compatible servers using the same protocol.
The MCP protocol defines three basic primitives:
- resources: Resources the client can read, analogous to 'read-only files'
- tools: Tools the client can call, analogous to 'executable functions'
- prompts: Pre-set prompt templates on the server side, analogous to 'preset workflow scripts'
This layering makes the protocol flexible enough to support both pure query scenarios like 'reading design files' and scenarios requiring permission verification like 'modifying design files'. The latest specification text can be viewed directly at modelcontextprotocol.io/specification.
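To make the three primitives concrete, the sketch below builds the JSON-RPC 2.0 request envelopes that an MCP client sends for each of them. The method names (`resources/read`, `tools/call`, `prompts/get`) follow the MCP specification; the resource URI, tool name, prompt name, and arguments are hypothetical placeholders and are not taken from Figma's actual server.

```python
import itertools
import json

_request_ids = itertools.count(1)


def rpc_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request envelope, the wire format MCP uses."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }


# resources: read-only data. The URI below is a made-up example.
read_resource = rpc_request(
    "resources/read", {"uri": "figma://file/EXAMPLE/variables"}
)

# tools: executable functions. Tool name and arguments are hypothetical.
call_tool = rpc_request(
    "tools/call",
    {
        "name": "set_variable",
        "arguments": {"name": "color.primary", "value": "#2563EB"},
    },
)

# prompts: server-side templates. Prompt name is hypothetical.
get_prompt = rpc_request(
    "prompts/get", {"name": "audit_tokens", "arguments": {}}
)

for message in (read_resource, call_tool, get_prompt):
    print(json.dumps(message))
```

In practice a client would first discover what a server offers through `resources/list`, `tools/list`, and `prompts/list` rather than hard-coding names as done here.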
What the Figma official MCP Server exposes
Figma officially announced the Figma MCP Server on its blog Figma Blog. Its core function is to expose the structured data inside Figma files—node tree, properties, variables, styles, component definitions—to clients in the form of MCP resources. When Claude or Codex initiates a request through an MCP client, they don't get an image or a manually organized Markdown summary; they get the file's true structured representation.
This method of exposure has a direct consequence: AI can now interact with Figma files on both the 'understanding' and 'operating' levels for the first time. Before MCP, AI interaction with Figma typically relied on two methods: first, through 'AI assistant' plugins in the Figma plugin marketplace, where the plugin takes screenshots, performs OCR, and sends them to a cloud model; second, through manually exported JSON or SVG, which is then fed to the model. In either case, the model couldn't see the 'current state' of the Figma file. MCP changes this fundamental assumption.
Key differences between plugin-style APIs and MCP
Many people who have used Figma plugins might ask: Figma plugins have long supported APIs like figma.currentPage.selection and figma.createRectangle(), so why is MCP needed?
[Observation] The difference between the two is not 'whether they can call Figma', but 'the visibility of the call result in the conversation'. The plugin-style API is a tool call model: AI initiates a 'call Figma' request in a certain turn of the conversation, the Figma plugin returns a result, and in the next turn of the conversation, AI 'forgets' what happened in Figma—unless the result is explicitly stuffed back into the prompt. MCP, on the other hand, is a context injection model: the structured state of the Figma file is continuously maintained in the client's context window, and throughout the conversation, AI can directly reference states like 'the 3rd child node of that Frame I just saw' or 'the current value of color.primary in the design tokens'.
In simpler terms: the plugin API is like 'calling to ask each time', while MCP is like 'keeping Figma mounted as a local folder'. The former requires a new query each round, while the latter doesn't need repeated queries as long as the state hasn't changed. The impact on token consumption is structural—MCP clients typically deduplicate and cache high-frequency reference information like design tokens and component maps, rather than regenerating them each round. This structural difference is the starting point for almost all cost estimates in subsequent discussions about 'AI training design systems'.
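The deduplication idea can be illustrated with a content-hash cache: a payload is re-sent to the model only when it has changed since the last round. This is a simplified illustration under stated assumptions; the `ContextCache` class is hypothetical and does not describe how any specific MCP client is implemented.

```python
import hashlib
import json
from typing import Any, Optional


class ContextCache:
    """Re-send a context payload only when its content has changed."""

    def __init__(self) -> None:
        self._last_digest: dict[str, str] = {}

    def payload_to_send(self, key: str, data: Any) -> Optional[Any]:
        """Return data if it is new or changed, otherwise None."""
        encoded = json.dumps(data, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(encoded).hexdigest()
        if self._last_digest.get(key) == digest:
            return None  # unchanged: no need to spend tokens again
        self._last_digest[key] = digest
        return data


cache = ContextCache()
tokens = {"color.primary": "#3B82F6", "space.md": 16}

first = cache.payload_to_send("design-tokens", tokens)   # sent
second = cache.payload_to_send("design-tokens", tokens)  # skipped
tokens["color.primary"] = "#2563EB"
third = cache.payload_to_send("design-tokens", tokens)   # sent again
print(first is not None, second is None, third is not None)
```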
Connector authorization: The easiest pitfall in actual integration
The MCP protocol's handshake process seems standard, but Figma's client has a hard-to-notice threshold at the Connector layer: you must explicitly Connect once in Figma Desktop's Preferences.
The specific path is: In Figma Desktop, go to Preferences → Figma MCP Server, and click Connect for the target client (Claude Desktop, Codex CLI, etc.). This step follows an OAuth-style authorization flow, which issues a token bound to the current user + current Desktop instance on the Figma backend. Even if you see the corresponding client's plugin showing 'Installed' in the Figma plugin marketplace, you still need to do this step again—'Installed' in the marketplace means 'the plugin code has been downloaded locally', which is a separate mechanism from MCP Connector authorization.
[Observation] In production environments, the primary reason for MCP integration failure is not protocol incompatibility, but unauthorized Connector. In Claude Desktop, it manifests as 403 Forbidden; in Codex CLI, it manifests as MCP handshake failed: unauthorized. Both error messages point to the same root cause—the OAuth authorization chain was not established.
Recommendation for team integration process: Write the step Preferences → Figma MCP Server → Connect into the team's onboarding SOP, and add a health check command in a local script (e.g., claude mcp list --json or codex mcp status). If the return doesn't include a figma entry, block the subsequent process immediately. Figma's official help.figma.com has a dedicated entry for Connector troubleshooting; when encountering a 403, prioritize checking there.
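A health check of this kind might look like the sketch below. It assumes that the listing command prints one server per line in plain text and that the Figma server's name contains "figma"; both are assumptions about output format, and the exact command and flags should be taken from each CLI's own documentation.

```python
import subprocess
import sys
from typing import Sequence


def has_figma_entry(listing: str) -> bool:
    """Return True if any line of the server listing mentions figma."""
    return any("figma" in line.lower() for line in listing.splitlines())


def check_mcp(command: Sequence[str]) -> bool:
    """Run an MCP listing command and report whether Figma is configured.

    The command is passed in by the caller because its exact form is
    an assumption here, not something this sketch can guarantee.
    """
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return False  # the CLI itself is not installed
    return result.returncode == 0 and has_figma_entry(result.stdout)


# Stand-in command so the sketch runs anywhere; replace with the real CLI call.
fake_listing = [sys.executable, "-c", "print('figma: connected')"]
print("figma MCP configured:", check_mcp(fake_listing))
```

In an onboarding script, a `False` result would be the point at which to stop and direct the user to the Connector authorization step.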
Structured modifications: MCP lets AI 'act' instead of 'blindly modify'
MCP exposes not only the ability to 'read', but more importantly, the ability to 'write'—and this 'write' is structured.
Specifically, the Figma MCP Server opens up the following types of structured operations to clients:
- Modify the value of variables, e.g., change `color.primary` from `#3B82F6` to `#2563EB`
- Modify component properties, e.g., switch the `variant` property of a Button component
- Create/read styles, including fill, stroke, effect, text, etc.
- Create nodes under a specified Frame, with fully structured parameters
[Data] According to Cole's rough estimate from practical observation, the traditional pipeline (screenshot + natural language description + AI generates code then manually pastes back into Figma) typically requires 4-6 cycles of 'screenshot—generate—paste—screenshot again' to uniformly change a token color, while the MCP path usually requires only 1 variable assignment + 1 node refresh. This rough order-of-magnitude difference of 3-5 times in round trips is significantly amplified in team-level batch maintenance scenarios.
This 'structured modification' capability is the engineering foundation for turning a design system into a trainable object. When AI can read and write files using 'design intent's smallest units' like variables and component properties, it can be said to have 'understood the design system'—rather than just imitating the surface appearance of popular styles in the Figma community. This is also why subsequent chapters will specifically discuss 'how to use Codex / Claude to train a design system': without the structured read/write interface provided by MCP, all 'training' can only stay at the prompt engineering level and cannot be grounded in the file itself.
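Why a single variable assignment is enough can be shown with a toy model: nodes store the name of a variable rather than a hex literal, so one write propagates to every node that references it. The `Node` class and helper functions are hypothetical and only model the idea in memory; they are not Figma's data model or API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    fill_variable: str  # a variable name, not a hard-coded hex value


variables = {"color.primary": "#3B82F6"}
nodes = [
    Node("Button/Primary", "color.primary"),
    Node("Link/Default", "color.primary"),
    Node("Badge/Info", "color.primary"),
]


def set_variable(table: dict, name: str, value: str) -> None:
    """Structured write: fail loudly instead of silently creating a token."""
    if name not in table:
        raise KeyError(f"unknown variable: {name}")
    table[name] = value


def resolved_fills(all_nodes: list, table: dict) -> dict:
    """Resolve each node's fill through the variable table."""
    return {node.name: table[node.fill_variable] for node in all_nodes}


set_variable(variables, "color.primary", "#2563EB")  # one assignment
print(resolved_fills(nodes, variables))              # one refresh
```

The screenshot-based pipeline has no equivalent of this indirection, which is why it needs a separate edit and verification pass per affected element.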
Compatibility status with Anthropic and OpenAI clients
As of 2026-06-26, the MCP client ecosystem has covered mainstream AI programming and design tools:
- Claude Desktop (macOS/Windows) officially supports MCP, being Anthropic's own product to implement it first
- Codex CLI added MCP client capability in versions released in the second half of 2025
- IDE-embedded AIs like Cursor and Zed also provide MCP adaptation layers
On the server side, Figma is one of the first design tools to MCP-ify its core product. This bilateral network effect of 'many clients, many servers' is a key differentiator of MCP from earlier 'unilateral plugin marketplaces' like OpenAI Plugin and ChatGPT GPTs. Specification details can be found at modelcontextprotocol.io/specification, and implementation details on the Figma side can be found in the MCP entries on help.figma.com.
Summary
The Figma MCP Server is not an isolated 'Figma plugin upgrade'; it is the first large-scale implementation of the MCP protocol layer on a design tool. It advances the state of Figma files from 'needing to be exported before AI can see them' to 'AI can continuously see them in the conversation window'. Understanding this is a prerequisite for subsequent discussions on 'how AI trains design systems', 'how AI batch-maintains component libraries', and 'how AI synchronizes design tokens across files'—all these scenarios are built on the same underlying assumption: AI must be able to read the structured current state of the file and be able to modify it in a structured way. The next section will follow this thread, pushing the perspective from 'single-file MCP read/write' to 'cross-tool design context federation'.
What are Figma Skills? Why does their difference from MCP determine whether your AI 'follows the rules'?
To make AI 'follow the rules' in Figma, you must first separate two things. The first is 'what it sees'—whether AI can read the variables, component instances, and canvas nodes in the design file. This layer is handled by MCP (Model Context Protocol) [video fact]. The second is 'how to do it'—after getting this data, in what order should AI call tools, what constraints should it follow, and what anti-patterns should it avoid? This layer is handled by Skills. The division of labor is clear, a boundary Cole repeatedly emphasized when deconstructing Figma automation: MCP gives you the 'data channel', Skills gives you the 'operation script'; missing either one will cause AI to go off track.
The essence of Skills: A 'tool manual' that can be repeatedly referenced
According to Cursor's official documentation (https://docs.cursor.com/), Skills are a set of structured texts (usually in Markdown or JSON format) used to teach AI how to correctly use a certain tool or complete a certain type of task [Source: Cursor Docs]. Their key feature is 'on-demand loading': before or during each conversation round, the Agent matches the relevant Skill to the current task and then injects the Skill's full text or a summary into the context. This means a Skill is not code or a plugin, but 'documentation as configuration': humans write it once, and the machine can read it in N sessions. Only by thinking of Skills as 'executable design specifications' can you understand the true engineering value of this mechanism: once specifications are machine-readable, they can go into Git, undergo code review, and be connected to CI checks.
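The on-demand loading step can be sketched as "match, then inject". The example below uses naive keyword overlap as a stand-in for matching; real agents typically let the model itself decide based on each Skill's description, so the `match_skills` heuristic, the `Skill` class, and the sample skills are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    body: str


def match_skills(task: str, skills: list, min_overlap: int = 2) -> list:
    """Rank skills by how many words their description shares with the task."""
    task_words = set(task.lower().split())
    scored = []
    for skill in skills:
        overlap = len(task_words & set(skill.description.lower().split()))
        if overlap >= min_overlap:
            scored.append((overlap, skill))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [skill for _, skill in scored]


def build_context(task: str, skills: list) -> str:
    """Inject only the matched skills ahead of the task text."""
    sections = [f"## Skill: {s.name}\n{s.body}" for s in match_skills(task, skills)]
    sections.append(f"## Task\n{task}")
    return "\n\n".join(sections)


library = [
    Skill(
        "figma-create-component",
        "create a figma component set with variants",
        "1. Read existing variables.\n2. Create the component set.",
    ),
    Skill(
        "release-notes",
        "write release notes for a version",
        "Summarize merged changes.",
    ),
]

print(build_context("create a button component with variants in figma", library))
```

Only the matching Skill ends up in the context, which is what keeps unrelated specifications from consuming tokens in every session.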
Three core coverages of Figma Skills: Variables / Components / Canvas
Cole breaks down the actions that can be solidified with Skills in Figma into three categories [video fact]: Variables (Design Variables / Tokens) handle the definition and synchronization of design atoms like colors, spacing, fonts, and corner radii; Components (Components / Component Sets) handle component instantiation, variant management, and property panel operations; Canvas handles node creation, layer adjustment, auto layout, and grid alignment. These three categories cover all 'engineering actions' of design system training: first use Variables to solidify design tokens, then use Components to solidify UI patterns, and finally use Canvas for layout and typesetting. Any Skill that falls into the Figma workflow can almost be mapped to one of these three categories, which is also the stable skeleton used by Skill authors to organize SOPs.
Skills ≠ Prompt: The engineering significance of a knowledge base
Many people confuse Skills with prompts. Cole points out the essential difference between the two [video fact]: a prompt is a 'temporary instruction for this round of conversation', effective only at the moment it's called, discarded when the session ends; a Skill is a 'knowledge base learned by AI', which can be repeatedly loaded across multiple rounds and sessions. The two are completely different in terms of lifecycle, author, modification cost, and applicable scenarios.
| Dimension | Prompt | Skill |
|---|---|---|
| Lifecycle | Single conversation round | Persistent across sessions |
| Author | Current input | Team solidification |
| Modification cost | Rewrite each time | Change once, effective for all |
| Applicable scenario | Temporary exploration | Specification constraint |
[Data] The same Skill can be repeatedly referenced in over 100 sessions. This order-of-magnitude difference determines their completely different engineering positioning—prompt is temporary glue, Skill is a long-term foundation. [Observation] When team specifications are solidified in the form of Skills, new members joining or new model upgrades don't require verbal re-teaching: just sync the .claude/skills/ or .codex/skills/ directory, and the new Agent can read the same rules on its first startup. Design system documentation, for the first time, gains the properties of being machine-readable, verifiable, and inheritable.
Division of labor between custom Skills and Figma Community Skills
In the Community Skills documentation on Figma's official site (https://help.figma.com/), community-shared Skills are divided into three modes [Source: Figma Help]: Use (directly consume Skills written by others), Supply (publish Skills solidified by your own team), Audit (perform compliance and quality review on existing Skills). These three form a complete circulation chain within the community. However, Cole repeatedly emphasizes that community Skills solve 'general capability supplementation', while custom Skills solve 'team-specific rules' [video fact]—the two are not mutually exclusive but are layered to tighten the boundary of 'AI following the rules'. For example:
- General capability: How to create a standard Figma ComponentSet, how to push local Variables to a team library
- Team rule: Component names must start with the `ds/` prefix, variant property order must be ascending alphabetical, and Spacing Tokens must be multiples of 4
The former can be solved by using a community Skill, while the latter must be written into the team's own custom Skill. Thinking about these two layers separately can avoid the cascading risk of 'when the community Skill changes, our own specifications break'.
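Team rules like the three examples above are precise enough to be checked mechanically, which is what makes them suitable for a custom Skill backed by CI. The following is a minimal sketch of such a check; the function names and sample data are hypothetical, and a real setup would read component and token data from the Figma file rather than from literals.

```python
def check_component(name: str, variant_props: list) -> list:
    """Check one component against the naming and ordering rules."""
    problems = []
    if not name.startswith("ds/"):
        problems.append(f"{name}: name must start with 'ds/'")
    if list(variant_props) != sorted(variant_props, key=str.lower):
        problems.append(f"{name}: variant properties must be in alphabetical order")
    return problems


def check_spacing_tokens(tokens: dict) -> list:
    """Flag spacing tokens whose value is not a multiple of 4."""
    return [
        f"{key}={value} is not a multiple of 4"
        for key, value in tokens.items()
        if value % 4 != 0
    ]


# Hypothetical sample data for illustration.
print(check_component("ds/Button", ["size", "state", "variant"]))
print(check_component("Button", ["variant", "size"]))
print(check_spacing_tokens({"space.sm": 8, "space.md": 16, "space.odd": 10}))
```

Running checks like these after each AI write gives the Skill's rules an acceptance test instead of relying on the model to self-report compliance.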
Skills file storage paths for different Agents
The 'engineering' nature of Skills is ultimately reflected in their file storage locations. The conventions for the two mainstream Agents are not consistent:
- Claude (Code): Placed in the repository root's .claude/skills/<skill-name>/SKILL.md, related conventions at https://www.anthropic.com/claude-code [Source: Anthropic]
- Codex: Entry-level instructions in the repository root's AGENTS.md, specific skills in .codex/skills/<skill-name>/SKILL.md, related introduction at https://openai.com/index/introducing-codex/ [Source: OpenAI]
.claude/skills/figma-create-component/
└── SKILL.md
.codex/skills/figma-create-component/
└── SKILL.md
[Practical] If you copy Claude's SKILL.md directly to Codex's .codex/skills/ directory, Codex will not load it automatically—the two conventions are not yet interoperable. As of 2026-06-26, the industry has not yet formed a unified Skills file standard [forecast], which is precisely the most noteworthy 'standardization window' in the current AI design tool ecosystem: based on the current situation, whoever can converge the specification first is more likely to occupy the de facto standard position in the upcoming Agent wave of the next 12-18 months, but the specific timeline still depends on the protocol progress of multiple vendors.
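Because the two directories are maintained separately, it can help to see which Skills exist on one side only. The following is a rough sketch that just compares folder names under the two paths mentioned above; the function names are hypothetical, and it does not convert or load anything.

```python
from pathlib import Path


def list_skills(skills_root):
    """Return the names of sub-directories that contain a SKILL.md file."""
    root = Path(skills_root)
    if not root.is_dir():
        return set()
    return {p.parent.name for p in root.glob("*/SKILL.md")}


def skills_drift(repo_root="."):
    """Report Skills present under one agent's directory but not the other's."""
    repo = Path(repo_root)
    claude = list_skills(repo / ".claude" / "skills")
    codex = list_skills(repo / ".codex" / "skills")
    return {
        "only_in_claude": sorted(claude - codex),
        "only_in_codex": sorted(codex - claude),
    }


if __name__ == "__main__":
    print(skills_drift("."))
```

Matching names say nothing about whether the two SKILL.md files agree in content; that still needs a human review.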
The division of labor between Skills and MCP is the minimum threshold for an AI design workflow to be 'engineerable': MCP provides the data channel, Skills provides the operation script; custom Skills solidify team rules, community Skills supplement general capabilities. Once both layers are in place, AI won't 'freely' modify and break the design system—it will first load the rules, then act according to the specifications, with every step falling within the acceptance criteria. This is also why 'whether it follows the rules' has never been a model capability issue, but a question of whether the Skills files are written in enough detail and whether the three categories of variables, components, and canvas are fully covered. Treat Skills as the team's second design specification, put it in Git for code review, and AI will truly become an executor of the design system, not a destroyer of it.
Claude Code vs Codex: How to choose between the two paths of quality priority and cost priority?
In the design-to-code pipeline, tool selection is often the first underestimated decision. A team can perfect Figma's auto layout, variables, and component nesting, but if the task of 'who generates the first version of HTML/React' is given to an unsuitable model, all upstream work will be compromised. As of 2026-06-26, Claude Code and Codex remain the two most commonly discussed candidates for the Figma-to-code scenario—the former emphasizes semantic understanding and code quality, the latter emphasizes unit cost and batchability. There is no 'absolute better' between these two paths, but the boundary conditions for each are quite clear.
Abstracting the problem into a two-dimensional matrix, the real question a team needs to answer is: does the current iteration rhythm lean towards 'getting it right in one go', or 'generating 100 versions first and then choosing one'? The former points to Claude Code, the latter to Codex. The following sections will expand on a practical comparison across six dimensions and provide a directly implementable combination strategy.
Claude Code's advantage: Understanding Figma's semantic structure
Claude Code's compatibility advantage with Figma auto layout mainly comes from its semantic parsing of nested direction (HORIZONTAL/VERTICAL), spacing mode, and padding direction. For the same frame with an auto-layout flag, Claude Code tends to first identify 'this is an HStack/VStack container' before deciding on the CSS expression; whereas some lightweight models will only flatten it into display: flex plus a bunch of magic numbers. This semantic-level restoration directly determines the experience for designers when they later modify the draft—when the auto layout direction in Figma changes from HORIZONTAL to VERTICAL, the code output by Claude Code only needs to change one parent direction, while the code from a flattened output requires adjusting child nodes one by one.
Equally important is how the output code is regarded by experienced developers. In the official description at https://www.anthropic.com/claude-code, Anthropic positions Claude Code as an 'agentic coding' tool, emphasizing its stability under long context (typically 200K tokens) and complex engineering structures [Source: anthropic.com/claude-code]. In the Figma-to-code scenario, a byproduct of this stability is that component naming, props naming, and state management boundaries come out closer to how a real engineering project is organized, rather than a 'flattening of the Figma visuals'.
Codex's advantage: Unit cost and batchability
The core narrative of Codex (based on the GPT series) is not 'quality first' but 'completing more attempts at a lower marginal cost'. OpenAI publicly disclosed the token billing structure for Codex CLI on its product page and pricing page at https://openai.com/index/introducing-codex/. For the same input length, Codex's single-call cost is approximately 1/3 to 1/4 of Claude Code's [Source: openai.com/index/introducing-codex/] [Data].
This ratio matters in practice. When a team needs 3-5 candidate variants each for 30 list pages, 20 form pages, and 10 settings panels, the 60 pages add up to 180-300 calls. At this scale, the difference in unit cost multiplies into a 3-4x difference in the overall budget. This is where Codex's batchability advantage becomes apparent: it is designed not to 'get it right in one go', but to 'cover the same solution space with up to 4 times the number of attempts'.
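The arithmetic behind this paragraph can be sketched in a few lines. The page counts and variant range come from the text; the "Claude-call unit" is an assumed normalisation for illustration, not a real price.

```python
def batch_calls(page_counts, variants_per_page):
    """Total generation calls when every page gets the same number of variants."""
    return sum(page_counts.values()) * variants_per_page


def relative_budget(calls, unit_cost_ratio):
    """Cost in 'Claude-call units' when one call costs unit_cost_ratio of a Claude call."""
    return calls * unit_cost_ratio


pages = {"list": 30, "form": 20, "settings": 10}
low, high = batch_calls(pages, 3), batch_calls(pages, 5)
print(low, high)                      # prints 180 300
print(relative_budget(high, 1.0))     # Claude Code baseline: 300.0
print(relative_budget(high, 1 / 4))   # Codex at 1/4 of the unit cost: 75.0
```

The exact ratio depends on current pricing and prompt length, so the 1/3 to 1/4 figure should be treated as an input to re-check, not a constant.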
Measured differences in quality dimension: One-shot success rate and iteration compensation
[Observation] When a moderately complex Figma component (including auto layout, variable binding, two states default/hover) was given to Claude Code and Codex for a one-shot reconstruction, the measured results showed a stable gap [practical] [Data]:
| Metric | Claude Code | Codex |
|---|---|---|
| One-shot usability rate | ~70-80% | ~40-50% |
| Average required iterations | 1.2-1.5 times | 2.5-3.5 times |
| Achievable quality within same budget | Medium-High | Medium (approachable) |
The 'one-shot usability rate' in the table is the percentage of first-version outputs that a developer can accept without any manual intervention. The results are consistent: Claude Code leads by about 30 percentage points in one-shot success. However, because Codex costs roughly 1/3 to 1/4 as much per call, the same budget buys it up to 4x the attempts, which can push its usability rate close to Claude Code's and, for some component types (pure lists, pure forms), locally past it. This multiplicative relationship of 'quality × iteration count' is the true watershed between the two paths.
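One simplified way to see the 'quality × iteration count' effect is to treat each attempt as an independent trial, so that the chance of getting at least one usable output in n attempts is 1 - (1 - p)^n. Independence is an assumption made here for illustration; real retries are correlated, and the rates below are midpoints of the ranges in the table.

```python
def usable_within(p_one_shot, attempts):
    """Probability that at least one of `attempts` independent tries is usable."""
    return 1 - (1 - p_one_shot) ** attempts


claude_one_try = usable_within(0.75, 1)    # midpoint of 70-80%
codex_four_tries = usable_within(0.45, 4)  # midpoint of 40-50%, 4 attempts

print(round(claude_one_try, 2))    # prints 0.75
print(round(codex_four_tries, 2))  # prints 0.91
```

Under these assumptions four cheap attempts beat one expensive attempt on raw usability, but someone still has to review four outputs instead of one.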
Expanding the table to a full six dimensions makes the overall trade-off more intuitive.
[Table: six-dimension comparison of Claude Code and Codex]
Recommended combination strategy: Diversion between key pages and long-tail pages
In practice, a safer approach is not to choose one over the other, but to divert based on page type:
- Key pages (dashboard, detail page, landing page): Assign to Claude Code. These pages have high requirements for fidelity, state management, and naming consistency. The value of a one-shot success far outweighs the savings in token cost. A dashboard going through 2 rounds of Claude is cheaper in terms of overall engineering cost than going through 5 rounds of Codex, because the micro-adjustments introduced by the latter will eat up the cost difference.
- Long-tail pages (list, settings, form, email template): Assign to Codex. These pages have low visual complexity, high information density, and short user dwell time. 'Good enough' is sufficient to enter the design review phase. Having Codex produce 5-10 candidates at a lower marginal cost, and then having the designer pick one in Figma for the next round, is actually the most efficient way to maximize output per unit time.
This split is not arbitrary; it follows from the differences in the 'auto layout compatibility' and 'long-tail capability' dimensions in the table above. Claude Code is stronger in auto layout compatibility, making it naturally suitable for structurally complex pages; Codex has a cost advantage in long-tail capability (handling many similar but different component instances), making it naturally suitable for large-scale scenarios.
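The routing rule can be written down as a small lookup so that it is applied the same way every time. This is only a sketch: the page-type strings and the returned labels are illustrative names, and anything unlisted is deliberately left for a human to decide.

```python
# Page types taken from the two bullets above; the string keys are assumptions.
KEY_PAGES = {"dashboard", "detail", "landing"}
LONG_TAIL_PAGES = {"list", "settings", "form", "email-template"}


def route_page(page_type):
    """Pick the generation path for a page type, or flag it for manual triage."""
    kind = page_type.strip().lower()
    if kind in KEY_PAGES:
        return "claude-code"   # fidelity and naming consistency matter most
    if kind in LONG_TAIL_PAGES:
        return "codex"         # many cheap candidates, designer picks one
    return "needs-review"      # unknown page type: do not guess


for page in ["Dashboard", "form", "onboarding"]:
    print(page, "->", route_page(page))
```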
Synergy method: Not switching, but relaying
A more advanced synergy method is to string the two paths into a single pipeline: Claude Code first draft → Push to Figma → Codex batch generates variants in Figma. The specific steps will be expanded in Section 13, but here is the skeleton:
- Claude Code generates the first version of React/Vue component code and corresponding Figma structure description based on the requirements document.
- Sync this version of the structure to Figma via Figma MCP, generating editable frames.
- Designer makes micro-adjustments and variable bindings in Figma, exporting N variant prompts.
- Codex receives these N prompts, batch-generates code variants, and sends them back to Figma for side-by-side comparison.
The key to this 'relay' mode is that each model only runs the segment it is good at; prompt engineering, context, and skills are not interfered with by each other. The respective upper limits of the two models are thus fully preserved.
Don't mix contexts: The red line of Skills contamination
[Observation] Here is a hard rule summarized from multiple pitfalls: Do not call Claude and Codex simultaneously in the same conversation. Even if you use MCP to attach both models to the same session, they contaminate each other at the Skills layer: Claude's prompt templates get overwritten by Codex's format preferences, and vice versa. Eventually, the output of both models drifts towards a 'middle state', and quality drops for both.
The correct approach is to split by session: each model has its own independent project directory, independent .claude/ or .codex/ configuration directory, and independent Skills files. Their outputs are then combined at the CI/CD pipeline level. This is not over-engineering; it is the lowest-cost way to preserve the respective upper limits of the two models. Once Skills files are shared across models, a degradation where 'both sides get worse' is almost certain to occur, and this degradation is very difficult to locate during PR review.
When comparing Claude Code and Codex on the same table, the truly useful thing is not to find out 'which is better', but to see their respective cost-quality curves clearly. Claude Code's curve leans towards the high-quality-high-cost-low-iteration quadrant, while Codex's curve leans towards the medium-quality-low-cost-high-iteration quadrant. What the team needs to do is to assign tasks to the two curves based on page type, and then use a relay rather than a switching method to combine them, ensuring that the advantageous segments of each curve are not dragged down by the other's weaknesses. The next section will expand from another angle: when Figma components enter a React project, the role Cursor plays in the modification pipeline.
Google Stitch: Free and fast image generation, but desktop quality is significantly weaker than mobile. How to use it without stepping on pitfalls?
Stitch's positioning: Free prototype factory, not the final design station
Google's Stitch, launched at https://stitch.withgoogle.com/, is an experimental design generator [Source: stitch.withgoogle.com]. As of 2026-06-26, Stitch remains free, open, and usable without login, with a single generation cycle of 15-30 seconds [Data][Video Fact]. This speed means you can run through a horizontal visual comparison of multiple candidate pages within a single stand-up meeting.
Compared to Figma Make, Claude Design (refer to the https://www.anthropic.com product matrix), and Cursor's built-in Figma plugin, Stitch's biggest difference is 'image generation before specification': after the user inputs a prompt, they directly get a visual page screenshot, skipping the entire upfront engineering of wireframes, low-fidelity drafts, and design tokens. This omission is both the source of its speed and the limit of its capability: Stitch does not output variables.json, Tailwind config, or component variants, only an image [Observation][Practical].
Treating Stitch as a 'preliminary visual explorer' rather than a 'final deliverable tool' is the watershed between using this tool correctly and incorrectly.
Mobile vs Desktop: The quality gap caused by training data distribution
Stitch's output in iOS / Android simulated views is significantly better than Web dashboards and desktop application views [Video Fact]. Cole explicitly states that this is not a bug, but a problem of model training data distribution [Video Fact]—from the structure of public UI datasets, the annotation density, specification completeness, and component reuse rate of mobile styles are significantly higher than those of dashboard-type pages. The model has seen 'beautiful Settings pages' several times more than 'beautiful BI dashboards' [Observation][Practical].
This pattern holds consistently in tests, as summarized in the following table:
| Output Type | Quality Level | Recommendation |
|---|---|---|
| iOS native app interface | Close to commercial grade | Strongly recommended |
| Android Material interface | Close to commercial grade | Strongly recommended |
| Mobile H5 / Marketing page | Close to commercial grade | Recommended |
| Web dashboard | Noticeably rough (grid misalignment, placeholder inaccuracies) | Not recommended |
| Desktop SaaS application | Noticeably rough (information density mismatch) | Not recommended |
| Complex form / Data entry page | Control overlap, field alignment distortion | Not recommended |
Abstracting the above table into a quadrant chart directly yields a decision matrix.
[Figure: quadrant chart derived from the table above]
Does not support custom design systems: The real constraint of not being able to feed tokens
As of 2026-06-26, Stitch does not provide any form of design system training or injection entry point [Video Fact][Source: stitch.withgoogle.com]. Users cannot upload color variable tables, font specifications, or component library directories beyond the prompt, nor can they make AI learn 'use my company's primary-500 instead of Google Blue'. All pages generated by Stitch default to the model's built-in Material / iOS HIG style mix [Video Fact].
This is in stark contrast to tools like Figma Make and Claude Design, which are more design-system-aware—they tend to incorporate existing design assets as context anchors, allowing AI to extend based on existing specifications; Stitch completely cuts off this channel, letting users jump directly from prompt to image [Observation][Practical]. Specific details for Figma-side integration can be found in the official documentation at https://help.figma.com.
The practical implication is: any team that has already solidified a design system in Figma must accept the premise that 'AI doesn't know your brand colors' when using Stitch. The generated images will always be 'generic style', not 'our company's style'. When the business goal is to explore visual directions rather than align with a brand, this is an advantage; when the business goal is to reuse an existing system, this is a disadvantage.
Usage strategy: When to generate images, when to bypass
Embedding Stitch into the AI design workflow should follow two principles: 'broad first, narrow later; mobile first, desktop later' [Observation][Practical]. The specific implementation is the following three-stage sequence:
- Direction exploration phase: Use Stitch to run multiple prompts and generate multiple candidate images, picking the visual direction that best fits the business semantics. Short generation cycle and low iteration cost are Stitch's irreplaceable value in this phase [Video Fact].
- Mobile refinement phase: After selecting a direction, feed the prompt + candidate image as context to Claude Design or Figma Make, which will take over component-level deepening, design token alignment, and interactive prototype construction.
- Desktop bypass phase: For scenarios like dashboards, desktop SaaS, and complex forms, Stitch's output quality is not worth the secondary refinement; it's more time-efficient to start from scratch directly in Claude Design or Figma [Video Fact].
The key insight of this strategy is: Stitch and Claude Design are not mutually exclusive, but are in an upstream-downstream relationship [Video Fact]. Stitch produces 'exploration solutions', Claude Design produces 'final product deepening'. When connected in series, they form a complete pipeline from prompt to design system.
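A rough sketch of the three-stage sequence as a lookup: the quality table above decides whether Stitch is worth running, and the function returns the tool order. The output-type keys and tool labels are illustrative names chosen here, not identifiers from Stitch or Claude Design.

```python
# Recommendations copied from the quality table above; keys are assumed labels.
STITCH_RECOMMENDATION = {
    "ios-app": "strongly recommended",
    "android-material": "strongly recommended",
    "mobile-h5": "recommended",
    "web-dashboard": "not recommended",
    "desktop-saas": "not recommended",
    "complex-form": "not recommended",
}


def plan_pipeline(output_type):
    """Return the ordered tool chain for one output type."""
    verdict = STITCH_RECOMMENDATION.get(output_type)
    if verdict is None:
        raise KeyError(f"unknown output type: {output_type}")
    if verdict == "not recommended":
        # Desktop bypass phase: skip Stitch and start in the downstream tool.
        return ["claude-design"]
    # Direction exploration in Stitch, then refinement downstream.
    return ["stitch", "claude-design"]


print(plan_pipeline("ios-app"))        # prints ['stitch', 'claude-design']
print(plan_pipeline("web-dashboard"))  # prints ['claude-design']
```

Figma Make could stand in for the downstream step in either branch, as the text notes.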
Common pitfalls checklist
The Stitch usage mistakes encountered repeatedly across multiple projects, compiled into a checklist [Observation][Practical]:
- Treating Stitch as a Figma replacement: Stitch generates images, not vector files; component structures cannot be re-edited.
- Using Stitch for desktop dashboards: Output quality cannot bear the subsequent refinement cost; it's a waste of time.
- Expecting Stitch to learn brand colors: As of 2026-06-26, Stitch has no design system training entry point [Video Fact].
- Generating only one image and drawing a conclusion: Stitch's core value lies in horizontal comparison of Variations; a single-image decision is giving up its advantage.
- Ignoring mobile viewport switching: Stitch defaults to mobile output, but some prompts may lean towards desktop views; the viewport must be explicitly specified during generation.
The essence of these pitfalls is an over-extension of Stitch's 'free and fast' nature—free and fast does not equal commercial-grade delivery. Only by using it within its area of strength can it truly save time.
In summary, Stitch's value as of 2026-06-26 is that it pushes the marginal cost of 'AI image generation' close to zero, but only if the user is clear about its capability boundaries: strong on mobile, weak on desktop; strong for exploration, weak for final product; no connection to custom design systems. When these boundaries are strictly observed, Stitch is the most cost-effective front-end probe in the entire AI design workflow; when the boundaries are breached, its output can only sit in the design draft folder collecting dust.
Claude Design: High-fidelity but expensive first-draft generator. How to use the 8% quota sparingly?
Among the five-piece set (Figma MCP, Claude Design, Codex, Google Stitch, and Design System Training), Claude Design occupies a unique position: it's not a tool for exploration, but a high-fidelity generator for 'finalizing the first draft'. Anthropic positions it as a design-specific capability, and its output is visually superior to the other tools, suitable for direct use in internal reviews or client presentations. [Source: anthropic.com/news]
The dual cost of speed and quota
The price of high fidelity is twofold: time and tokens.
- Generation time: A dashboard typically takes 3–5 minutes [Video Fact][Data]
- Quota consumption: A complex dashboard might consume ~8% of the monthly quota [Video Fact][Data]
Estimating based on an 8% single consumption, theoretically, you can only run about 12 tasks in Claude Design per month—this number isn't even enough to cover the pages of a medium-sized project. [Observation] Cole specifically emphasizes in his workflow that 'quota is a hard constraint': Claude Design is not for A/B testing; once you start, you need to deliver. [Observation] This means that before opening Claude Design, you must have already figured out externally 'what to include, what to exclude, and the overall tone'—otherwise, every rework is burning money at an 8% rate.
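The quota arithmetic is simple enough to sketch, and it shows how quickly reworks shrink the monthly output. The 8% figure is the one quoted above; the rework counts are hypothetical inputs.

```python
def runs_per_month(percent_per_run=8):
    """Whole generation runs that fit into 100% of the monthly quota."""
    return 100 // percent_per_run


def pages_delivered(percent_per_run=8, reworks_per_page=0):
    """Pages finished per month if every page needs `reworks_per_page` extra runs."""
    return runs_per_month(percent_per_run) // (1 + reworks_per_page)


print(runs_per_month())        # prints 12
print(pages_delivered(8, 1))   # one rework per page: prints 6
print(pages_delivered(8, 2))   # two reworks per page: prints 4
```

A single rework per page already halves what the quota can deliver, which is the numeric case for converging on direction before opening Claude Design.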
The specific proportions of the four pie chart segments vary with prompt complexity, but based on common token consumption patterns: understanding the prompt accounts for about 15–25%, rendering the canvas accounts for about 40–55%, exporting code accounts for about 10–20%, and context caching accounts for about 5%. [Forecast] This means that rendering the canvas itself is the biggest consumer, not the 'code export' phase as many people think. The optimization direction is therefore clear: shorten the repeated iteration between prompt and canvas, rather than compressing the final export. Understanding this, all impulses to 'just try it out in Claude Design' should be intercepted upfront.
One-shot principle: Stitch must come first
Since the cost of 8% per use is a given, the only feasible strategy is 'one-shot success'—don't repeatedly trial-and-error within Claude Design.
The workflow sequence is strictly locked into three stages:
- Stitch phase: Use Google Stitch to explore directions, produce 3–5 low-fidelity sketches, and determine the overall layout and information architecture. [Source: stitch.withgoogle.com] [Video Fact]
- Design system training phase: Lock variables (colors, fonts, spacing, components) into design tokens.
- Claude Design phase: Feed the already converged direction in, generating a high-fidelity first draft in one go.
If you jump into Claude Design without converging in the Stitch phase, you're using 8% of your monthly quota to 'guess the direction'—this is the most common token-burning scenario in Cole's workflow. [Observation] Stitch's low-fidelity sketches themselves have a very low cost (they basically don't consume Claude Design's quota) and can tolerate a lot of trial and error; Claude Design is the 'expensive but high-fidelity' downstream. The pairing of the two essentially separates the cost of exploration from the cost of finalization.
The default style is good enough; don't specify a design system
A counterintuitive finding: Don't explicitly specify a design system in Claude Design's prompt. [Video Fact]
The reason is that Claude Design's default style is already trained on a large number of production-grade design languages. Explicitly specifying a design system can trigger the model's 'degradation path'—it will try to strictly match the given tokens, but the rendering result will be worse than the default output. [Observation] Cole has repeatedly verified this: with the same prompt, adding 'use Material Design' or 'use our design tokens' actually decreases the fidelity of the output. [Observation] The model's attention is diverted by 'strictly adhering to external constraints', leaving less for 'aesthetics'.
The correct approach is to place the constraints of the design system in the upfront training phase, letting Claude Design directly consume the 'post-training' context, rather than cramming rules into the prompt. An example prompt template:
Goal: SaaS product operations dashboard
User: Product manager
Core task: View trends + locate anomalies
Information density: High
(Do not specify colors, fonts, spacing here)
The three guiding questions
Before generation, Claude Design asks 3 guiding questions. [Video Fact] The answering strategy strictly distinguishes between 'directional' and 'variable' questions:
| Must answer (directional) | Do not answer (variable) |
|---|---|
| Target user (who) | Color |
| Core task (what) | Font |
| Information density (dense/sparse) | Spacing |
The three directional items in the left column determine the skeleton of the entire dashboard; the three variable items in the right column belong to the design system training phase. If you answer the color and font in the guiding questions, Claude Design will lock these variables in the first draft, making subsequent adjustments in the Claude Code phase more costly. [Observation] This is a very easy rule to violate, because human instinct is 'since it's asking, be as specific as possible', but for generative design tools, being more specific is actually more dangerous.
Specifically, answering 'target user = SaaS product manager' will make the model lean towards a dense table layout; answering 'core task = view trends' will make the model prioritize enlarging the chart area; answering 'information density = high' will make the model abandon large whitespace. After the three answers are superimposed, the direction is already mostly converged, and the remaining refinement work is handed over to Claude Code. [Forecast]
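One way to make the 'answer directional, skip variable' rule hard to violate is to assemble the prompt through a small helper that refuses variable-type answers. This is a sketch under assumptions: the field keys are invented for illustration and mirror the prompt template shown earlier, and Claude Design exposes no such API.

```python
# Directional fields, in the order used by the prompt template (keys are assumed).
LABELS = {
    "goal": "Goal",
    "user": "User",
    "core_task": "Core task",
    "information_density": "Information density",
}
# Variable-type answers that belong to the design system training phase.
VARIABLE_KEYS = {"color", "font", "spacing"}


def build_prompt(answers):
    """Build a directional-only prompt; reject any variable-type answer."""
    leaked = sorted(VARIABLE_KEYS & set(answers))
    if leaked:
        raise ValueError(f"leave these to the design system phase: {leaked}")
    lines = [f"{label}: {answers[key]}" for key, label in LABELS.items() if key in answers]
    return "\n".join(lines)


print(build_prompt({
    "goal": "SaaS product operations dashboard",
    "user": "Product manager",
    "core_task": "View trends + locate anomalies",
    "information_density": "High",
}))
```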
The key point in the sequence diagram is that the 'Hand off to Claude Code' node must explicitly exist and cannot be skipped. If you try to do everything in the Claude Design phase and skip Claude Code, the final output will suffer from a disconnect where 'it looks beautiful but the engineering can't handle it'.
Hand off to Claude Code
Claude Design outputs a design draft + code preview, but the 'code' here is for designers to look at, not for production environments. [Video Fact]
The handoff node must explicitly exist:
- Claude Design side: Output high-fidelity design draft + readable but non-engineering HTML/CSS preview.
- Claude Code side: After taking over, do three things—add state management (React state / store), connect API layer (data source), break down components (design system component library reuse). Source: docs.claude.com
If you skip Claude Code and directly use Claude Design's code as production code, two typical problems will arise: components are not connected to design tokens (colors are hardcoded as hex values), and there are no error boundaries (empty data / loading state / error state are all missing). Cole turns this into a checklist in his projects, going through it every handoff. [Observation] The very existence of this checklist shows that handoff is not something that 'happens automatically'—it's a process boundary that needs active management.
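One item on such a checklist, hardcoded hex colors, can be checked mechanically. The sketch below is an assumption about how a team might automate it, not Cole's actual checklist: `find_hardcoded_hex` is a hypothetical helper that scans CSS text line by line. It can produce false positives (for example an ID selector such as `#fade`), and the missing empty, loading, and error states still need human review.

```python
import re

# Matches #rgb, #rgba, #rrggbb and #rrggbbaa; longer alternatives are tried first.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def find_hardcoded_hex(css: str) -> list[tuple[int, str]]:
    """Return (line number, hex literal) pairs found in the given CSS text."""
    hits = []
    for lineno, line in enumerate(css.splitlines(), start=1):
        for match in HEX_COLOR.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample_css = """\
.card { color: #1a73e8; }
.card--muted { color: var(--color-text-muted); }
.badge { background: #fff; }
"""

for lineno, literal in find_hardcoded_hex(sample_css):
    print(f"line {lineno}: {literal} should reference a design token")
```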
Summary
Claude Design's role is a 'high-fidelity first-draft finalizer', not an exploration tool. The 8% quota consumption per use dictates that it must be downstream in the workflow, not the entry point. Three usage disciplines—Stitch first, don't specify a design system, only answer directional questions in the guiding questions—together ensure the controllability of the monthly quota. After generation, it must be handed off to Claude Code, allowing the engineering implementation to return to a familiar engineering context, rather than directly using the design draft's code as production code.
Tool Selection Decision Matrix: Four-Dimensional Comparison of Cost / Speed / Quality / Training Capability
Differentiated positioning in the cost dimension
The four tools show a clear step-like distribution in cost structure. Google Stitch (stitch.withgoogle.com), as a free-tier entry point, consumes no API quota per generation; users only bear the inference time cost on the browser side [Video Fact]. Codex's marginal cost is in the middle, billed by token, with a single medium-complexity task typically costing a few cents in RMB [Video Fact]. Claude Code, due to its positioning for long-session engineering implementation, often needs to maintain system-level context for a single task, making its overall cost a notch higher than Codex [Practical]. Claude Design, because it calls the latest generation of visual understanding models and adds multiple rounds of self-checking, is the most expensive option per unit among the four [Video Fact].
For budget-sensitive teams, a common practice is to push about 60% of batch tasks to the Stitch and Codex sides, reserving Claude Design for visually critical tasks that require manual review [Practical]. This doesn't mean Stitch and Codex are always more cost-effective—the cost of manual rework due to insufficient quality often far exceeds the price difference of the tools themselves. Purely choosing based on unit price is another form of waste.
Hard constraints in the speed dimension
The speed difference is on the order of ten times [Practical]. Stitch outputs a single screen in 15-30 seconds [Video Fact], making it the fastest way to validate a concept within 1 minute. Codex's generation latency ranges from tens of seconds to a couple of minutes depending on task complexity, and in CLI mode, it can run in the background, having a smaller impact on interaction rhythm [Practical]. Claude Code, because it needs to repeatedly read code, rewrite, and self-verify within an engineering context, typically takes 2-3 minutes to output a single key page [Practical]. Claude Design, due to the addition of visual self-checking and multi-version comparison, has an end-to-end time of 3-5 minutes [Video Fact].
When the product rhythm is stuck in the 'show me something first' phase, Stitch is almost irreplaceable; when the rhythm is stuck in the 'get this thing right first' phase, Claude Design's waiting time is a necessary expense. Codex and Claude Code are in the middle ground: slower than Stitch, but faster than Claude Design; suitable for the transition phase when 'the direction is clear, but the implementation details are not yet finalized'.
Capability boundaries in the quality dimension
Ranked by pixel-level refinement, Claude Design, with its visually native training and Anthropic's latest multimodal pipeline, clearly leads in typography, spacing, and brand consistency [Video Fact]. Claude Code's output quality on Web/Desktop follows closely, but because it prioritizes code accessibility, complex visual details like glassmorphism and complex gradients require manual secondary polishing [Practical]. Codex's visual output is 'good enough but not stunning', suitable for content-heavy, information-dense pages. Stitch's current coverage is still primarily mobile [Video Fact]; desktop and complex enterprise control panel outputs are outside its capability circle.
[Observation] As of 2026-06-26, Stitch's desktop coverage is still in its early stages. Teams need to clearly limit its role to a 'mobile inspiration explorer' during selection, otherwise they will run into the same problems repeatedly. Handing Stitch's desktop drafts directly to frontend developers will reveal mismatches in breakpoints, grids, and component libraries, resulting in more rework than starting from scratch.
Key differences in the training capability dimension
The ability to learn a custom design system is the true watershed that distinguishes a 'tool' from a 'collaborator'. Among the four tools, only Claude Code and Codex support injecting private specifications through Skills (the tool invocation extension mechanisms defined by Anthropic and OpenAI respectively), and then using Figma Variables to write the trained tokens and component properties back into the design system source file [Video Fact]. Stitch and Claude Design currently do not open third-party design system injection channels; their output needs to be manually transferred to the main file.
This means that if a team already has a mature design system (Design Tokens, component library, naming conventions), the reusability of the first two tools is several times that of the latter two [Practical]; if the team is still in the early stages of building the system, the 'black box' nature of Stitch + Claude Design is actually an advantage, allowing the team to first converge on aesthetics and then derive specifications. As the design system matures, the focus of the tool combination will clearly shift towards the first two tools.
Decision recommendations and scenario matching
Folding the four dimensions onto a timeline yields a rough but executable scenario checklist:
- Early exploration and idea generation: Stitch. Each screen takes 15-30 seconds, so several candidate directions fit in a few minutes; converge on aesthetics first.
- First draft finalization and key page output: Claude Design. A 3-5 minute wait yields a visual draft that can directly enter review.
- Code implementation of critical path: Claude Code. Use Skills to inject the design system; single-task cost is controllable, visual accessibility is high.
- Long-tail batch generation: Codex. Use token billing to keep unit cost low, suitable for generating dozens of content pages or marketing landing pages at once [Practical].
For these four types of tasks within a 100% budget, a suggested allocation is Stitch 10% / Claude Design 20% / Claude Code 30% / Codex 40% [Practical]—the first two are 'expensive but few', the latter two are 'cheap but many'. The allocation is not a hard rule; projects that emphasize design over implementation should push Claude Design above 30%; pure content sites can push Codex above 50% [Practical].
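The split can be kept honest with a few lines of arithmetic. This is a minimal sketch: `allocate` is a hypothetical helper, and the `DESIGN_HEAVY` profile is an invented illustration of 'push Claude Design above 30%', not a figure from the source.

```python
# Baseline shares from the text, in percent.
BASELINE = {"Stitch": 10, "Claude Design": 20, "Claude Code": 30, "Codex": 40}

# Hypothetical profile for a design-heavy project (illustrative numbers only).
DESIGN_HEAVY = {"Stitch": 10, "Claude Design": 35, "Claude Code": 25, "Codex": 30}


def allocate(total_budget: float, shares: dict[str, int]) -> dict[str, float]:
    """Split a budget by percentage shares, refusing splits that do not sum to 100."""
    if sum(shares.values()) != 100:
        raise ValueError(f"shares sum to {sum(shares.values())}, expected 100")
    return {tool: total_budget * pct / 100 for tool, pct in shares.items()}


for tool, amount in allocate(1000, BASELINE).items():
    print(f"{tool}: {amount:.0f}")
```

The sum check matters when a share is pushed up for one tool: the increase has to come out of another tool's share rather than silently inflating the total.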
Hidden constraints of combined use
The real prerequisite for the four tools to coexist is that their outputs must be unified into the same design system file in Figma—this is the fundamental reason for the existence of Figma MCP (Model Context Protocol, a server-side implementation maintained by Figma, see figma.com/mcp and developers.figma.com MCP documentation) and Figma Variables (a variable system for flowing design tokens from the code layer back to the Figma source file) [Video Fact]. MCP provides the standard protocol for 'external models to read and write Figma', and Variables provides addressable containers for 'design primitives like colors, font sizes, and spacing'.
[Observation] If a team throws Stitch-generated inspiration drafts into one Figma file and Claude Code-generated code drafts into another, a few months later they will find 'design drafts in file A, implementation drafts in file B, specifications in Notion, and components in Storybook'—a scattered situation that is the root cause of most AI design workflow failures. Specifications cannot be automatically consumed by any tool, and all consistency relies on manual reconciliation.
There is only one solution: establish a single Figma file + Variables master library on day one of the project, and have all four tools read and write this master library through MCP. This way, whether the output is 'inspiration' or 'code', it ultimately flows back into the same version-controlled design system. Figma's official update mechanism for Variables (help.figma.com documentation on Variables) is precisely designed for this 'multi-source write, single-source consume' scenario.
Boundary conditions in practice
Several boundary conditions are worth explicitly recording during tool selection:
- Stitch's inspiration drafts cannot be used as deliverables directly; spacing and font sizes need secondary calibration.
- Multiple versions of Claude Design output require manual comparison; cannot rely entirely on model self-evaluation.
- Claude Code's Skills injection has a context window limit; very large design systems need to be injected in chunks [Practical].
- Codex's batch tasks must be paired with a 'difference detection' mechanism; otherwise, a few pages out of dozens might silently break, which is hard to spot with the naked eye [Practical].
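The 'difference detection' in the last item can start very simply. The sketch below is one possible heuristic, not a mechanism described by the source: `find_suspect_pages` is a hypothetical helper that flags generated pages which lack required markers or whose size deviates strongly from the batch median. The marker list and the 50% tolerance are assumptions to tune per project.

```python
from statistics import median


def find_suspect_pages(
    pages: dict[str, str],
    required: tuple[str, ...],
    tolerance: float = 0.5,
) -> dict[str, list[str]]:
    """Map page name to the reasons it looks broken; healthy pages are omitted."""
    if not pages:
        return {}
    mid = median(len(html) for html in pages.values())
    suspects: dict[str, list[str]] = {}
    for name, html in pages.items():
        reasons = [f"missing {marker!r}" for marker in required if marker not in html]
        # A page far smaller or larger than its siblings often failed silently.
        if mid and abs(len(html) - mid) / mid > tolerance:
            reasons.append("size deviates from batch median")
        if reasons:
            suspects[name] = reasons
    return suspects


batch = {
    "pricing": "<main><h1>A</h1>" + "x" * 100 + "</main>",
    "about": "<main><h1>B</h1>" + "x" * 100 + "</main>",
    "careers": "<main>" + "x" * 5,  # truncated output
}
print(find_suspect_pages(batch, required=("<main>", "</main>")))
```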
[Data] In a small to medium-sized project with a total budget of about 1000 RMB, executing with the allocation of Stitch 10% / Claude Design 20% / Claude Code 30% / Codex 40%, the end-to-end output is approximately 80-120 Figma pages (including visual drafts and code drafts), with an average cost per page in the 8-12 RMB range [Practical]. This number will fluctuate with project complexity, whether existing Variables are reused, and the number of manual review rounds, but it's already sufficient as a reference baseline for cold-start budget allocation.
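The per-page figure follows directly from dividing the budget by the page count. A minimal sketch of that arithmetic, with `cost_per_page_range` as a hypothetical helper name:

```python
def cost_per_page_range(total_budget: float, min_pages: int, max_pages: int) -> tuple[float, float]:
    """Return (lowest, highest) average cost per page for a page-count range."""
    if not 0 < min_pages <= max_pages:
        raise ValueError("expected 0 < min_pages <= max_pages")
    # More pages for the same budget means a lower cost per page.
    return total_budget / max_pages, total_budget / min_pages


low, high = cost_per_page_range(1000, 80, 120)
print(f"{low:.2f} to {high:.2f} RMB per page")  # 8.33 to 12.50 RMB per page
```

The computed bounds of roughly 8.3 and 12.5 RMB line up with the quoted 8-12 RMB range once rounded.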
Tool selection is not a single-choice question, but a proportional question on a timeline. Use Stitch to broaden the view early, use Claude Design to converge on aesthetics in the middle, use Claude Code to lock in feasibility on the critical path, and use Codex to lower marginal costs for the long tail—the prerequisite for this four-step closed loop to hold is that Figma MCP + Variables collect all outputs into the same design system file.
Step 1: Setting up the environment: A complete checklist from account registration to Connector authorization
Account matrix and browser strategy
To set up a minimal workflow that can run Figma MCP × Claude Code × Codex × Google Stitch, you need to register accounts on at least four platforms. The Figma account is the hub of the entire chain; all design source files, component libraries, and design tokens are read from it. The Anthropic account is used to call the Claude Code terminal agent (CLI); the Max plan is recommended as the daily driver. The OpenAI account is used for Codex CLI's inference and code generation; it's recommended to bind a payment method in advance when using pay-as-you-go billing to avoid hitting API limits. The Google account is the only login entry point for Stitch—Stitch has used Google OAuth login since its launch and does not open registration via third-party email ([Source: https://stitch.withgoogle.com/]).
| Platform | Purpose | Recommended Plan | Login Method |
|---|---|---|---|
| Figma | Design source, MCP Server host | Professional / Organization | Email + SSO |
| Anthropic | Claude Code CLI | Max plan | OAuth + API Key |
| OpenAI | Codex CLI | Pay-as-you-go | OAuth + API Key |
| Google | Stitch entry point | Workspace personal edition is sufficient | Google OAuth |
Managing all four accounts under the same browser Profile is a repeatedly verified engineering practice in the community. Mixing multiple profiles can cause OAuth redirects to pick up the wrong session cookies, triggering the hardest-to-diagnose errors where 'authentication succeeds but tool calls fail'. Chromium users can create a dedicated 'AI Design Work' Profile; Firefox users can rely on 'Multi-Account Containers' for isolation.
Figma Desktop: The overlooked hard threshold
After the accounts are ready, the first real technical hurdle is not the command line, but installing and logging into the Figma Desktop client. Figma's MCP (Model Context Protocol) Server is architecturally embedded only in the desktop client process and does not appear in the web version of Figma. This is a design choice explicitly stated in Figma's official MCP Server help documentation: the MCP Server needs access to local file system capabilities (for reading local-only fonts and local plugin caches), which are deliberately disabled on the web side for security boundaries ([Source: https://help.figma.com/]). In other words, if you only have the Chrome/Firefox browser extension or use the web version of Figma, you won't find the MCP Server option anywhere in the settings menu.
When downloading the client, you also need to distinguish between the Figma Desktop main program and sub-product plugins like Figma Slides / Figma Buzz. Only the Figma Desktop main program carries the MCP Server module; Slides and Buzz actually run as sub-products within the main program process. macOS users can download the dmg package from the Figma official website to install; Windows users can download the installer from the Microsoft Store or the Figma official website. Both will automatically overwrite and update to the latest stable version.
System requirements and installation path
Figma Desktop has minimum operating system version requirements: macOS 12 Monterey and above, Windows 10 1903 and above. This threshold itself is not high, but in enterprise IT environments, there are often awkward situations where 'using an older macOS version without permission to upgrade' leads to the MCP Server not showing up even after installing the desktop client. [Observation] In his demo, Cole specifically demonstrated the settings entry point for Figma Desktop on a Mac—this entry point is completely hidden in the web version of Figma, which also confirms from the side that Figma officially positions the MCP Server as a 'desktop-exclusive capability for heavy design-development collaboration users'.
MCP Connector authorization and 'green signal' interpretation
After installing and logging into Figma Desktop, navigate to Settings (Preferences) → MCP Server tab. The default state is a gray 'Connect' button. Clicking this button triggers an OAuth authorization handshake, authorizing the Org/Workspace corresponding to your currently logged-in Figma account. The entire handshake process usually completes within seconds; after success, the button turns green and displays 'Connected', and a Figma official logo + green dot appears in the Figma top bar. [Observation] This is the step most prone to 'false positives' in the entire chain: the button shows green, but tool calls may still fail, especially when the Org admin has only opened the MCP Server to some members or has made granular distinctions in Dev Mode seat policies.
[Practical] Even if the button turns green, it's recommended to immediately execute a minimal list command in the Claude Code terminal (see next section), because some enterprise SSO accounts may pass the MCP protocol handshake but be rejected by workspace permission policies during the tool call phase. This is a defensive principle of 'falsify first, then start work'.
Common misconception: Plugin marketplace 'Installed' ≠ MCP 'Connected'
This is the most common pitfall in the entire setup process. Figma's Community Plugin marketplace and MCP Server are two completely independent authorization systems. Cole specifically pointed this out in his demo: many users see a plugin named 'Figma MCP' or 'Dev Mode MCP' in the marketplace, click 'Install / Included', and think the chain is connected—but this step only enables an auxiliary tool within the Figma file; it does not create a long-lived connection between the Figma Desktop process and Claude Code / Codex.
The correct approach is: do not install any 'Dev Mode MCP' type plugins from the marketplace (these were bridging tools provided by Figma for the Dev Mode web version in the early days; since 2025, the official recommendation has been to use the desktop MCP Server, [Source: https://help.figma.com/]). If you have mistakenly installed them, you can uninstall them in 'Manage Plugins' to avoid namespace conflicts. The judgment criterion is simple: the real MCP authorization always happens in Figma Desktop's Preferences panel, not in the plugin list within the Figma file.
Claude Code / Codex CLI installation and login
The installation of CLI tools is the most straightforward step in the entire environment. Claude Code runs via the claude command-line tool. The installation method can be found in the Anthropic official documentation ([Source: https://www.anthropic.com/claude-code]). macOS / Linux users can complete the installation with a single command, curl -fsSL https://claude.ai/install.sh | bash, and then verify with claude --version in the terminal. Codex CLI's installation entry point is on the OpenAI official introduction page ([Source: https://openai.com/index/introducing-codex/]); the current mainstream approach is npm i -g @openai/codex or installation via a Homebrew formula.
After both CLIs are installed, the next step is to establish a terminal login state: Claude Code defaults to browser OAuth redirect login, automatically reading the Anthropic account currently logged into the browser; Codex CLI requires codex login to trigger a Device Code Flow, where you paste the code displayed on the screen into the browser to complete the binding. Both login mechanisms are zero-configuration for terminal users, but note: when the Anthropic or OpenAI server side returns a 5xx error, the OAuth redirect will hang in the terminal, requiring Ctrl + C to interrupt and retry.
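Before attempting login, it can help to confirm that both executables are actually on the PATH. The sketch below uses only the standard library; `missing_clis` is a hypothetical helper, and the executable names `claude` and `codex` are taken from the commands quoted above.

```python
import shutil
from typing import Callable, Optional, Sequence


def missing_clis(
    names: Sequence[str] = ("claude", "codex"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the CLI names that cannot be found on the current PATH."""
    return [name for name in names if which(name) is None]


absent = missing_clis()
if absent:
    print("not on PATH:", ", ".join(absent))
else:
    print("claude and codex are both installed")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.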
Link verification: Reverse-engineer the full stack with a single command
The only hard indicator that the environment is set up correctly is that Claude Code can directly read Figma document metadata. In the Claude Code terminal, enter the following prompt:
Open my recently edited Figma file and list the names of the first three pages.
If the terminal returns a structured result like '1. Onboarding Flow / 2. Dashboard / 3. Settings' within a reasonable time, it means the Figma Desktop ↔ MCP Server ↔ Claude Code authentication, token passing, and JSON-RPC channel are all in place. [Data] The token consumption of this command will significantly increase with the number of file pages and component depth. A single list call is typically in the mid-thousand token range, a considerable portion of which comes from the fixed protocol header of the Figma file descriptor, which is not linearly related to the specific number of pages. This means that when the agent repeatedly 'list → read page → read frame', each additional list incurs the descriptor cost again, which is a key optimization point for subsequent chapters.
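Because each list call pays the descriptor cost again, one obvious mitigation is to list once and reuse the result. The sketch below illustrates the idea only: `PageListCache` and `fake_fetch` are hypothetical, and whether a given agent setup lets you intercept the list step this way is an assumption.

```python
from typing import Callable


class PageListCache:
    """Memoize page listings per file so repeated 'list' steps reuse one result."""

    def __init__(self, fetch: Callable[[str], list[str]]) -> None:
        self._fetch = fetch  # hypothetical: whatever performs the real list call
        self._cache: dict[str, list[str]] = {}
        self.fetch_count = 0  # how many times the expensive call actually ran

    def pages(self, file_key: str) -> list[str]:
        if file_key not in self._cache:
            self._cache[file_key] = self._fetch(file_key)
            self.fetch_count += 1
        return self._cache[file_key]

    def invalidate(self, file_key: str) -> None:
        """Drop the cached listing, for example after pages are added or renamed."""
        self._cache.pop(file_key, None)


def fake_fetch(file_key: str) -> list[str]:
    # Stand-in for the real call; returns the page names from the example above.
    return ["Onboarding Flow", "Dashboard", "Settings"]


cache = PageListCache(fake_fetch)
cache.pages("demo-file")
cache.pages("demo-file")
print(cache.fetch_count)  # 1
```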
If the return times out or shows an authentication error, the common rollback sequence is:
- Re-click the Connect button in Figma Desktop (OAuth tokens have a validity period ranging from hours to 1 day);
- Restart Figma Desktop (clear local process cache);
- Exit and reconnect the Claude Code terminal (reset the MCP client state machine);
- In extreme cases, delete the local mcp.json and regenerate it.
Troubleshooting checklist: Seven most common connection failure modes
Compile the frequently reported failure reasons from the community into a checklist, sorted by probability of occurrence from high to low:
- Only the web version of Figma is installed—MCP Server is not visible in the web version; most common among users who 'only installed the Chrome plugin'.
- OAuth token expired—After a long period of inactivity, the first call often returns a 401.
- Workspace permission denied—The Org admin has not enabled the MCP Server feature switch, or has only opened it to certain groups.
- Dev Mode MCP plugin conflicts with the desktop client—Duplicate installation causes namespace contention, and tool calls return vague schema errors.
- Local firewall blocks the local loopback port—MCP uses a local port, which may be blocked by enterprise security software (Symantec, Bitdefender, etc.).
- CLI tool version is too old—Older versions of Claude Code or Codex CLI do not support the new MCP protocol; need to upgrade to a stable line.
- Figma file does not have Dev Mode permission—MCP Server requires a Dev Mode seat to read file metadata by default; pure View permission is not enough.
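The firewall item in the checklist can be ruled in or out quickly by testing whether anything accepts connections on the loopback port. This is a minimal sketch using the standard library; `loopback_port_open` is a hypothetical helper, and the port number is deliberately a parameter because the value to use is whatever your Figma Desktop MCP settings display.

```python
import socket


def loopback_port_open(port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to 127.0.0.1:port succeeds."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and blocked ports alike.
        return False
```

A `False` result with Figma Desktop running and showing 'Connected' points toward security software or a port mismatch rather than an OAuth problem.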
Wrap-up
After going through these 7 nodes in order, the average time for a single person is in the 15–25 minute range [Practical]. Compared to the traditional design-to-development environment setup that can take days, the installation cost of an AI design workflow is already low enough to complete a PoC within a single working day. But remember one principle: Figma Desktop's green Connect button is the only non-negotiable hard threshold; other steps can be bypassed with temporary solutions, but if this step is not done correctly, all subsequent Figma → AI tool calls will fail silently, and the error messages are often too vague to locate. Next, we enter Section 2, 'Figma Document Structure Analysis', to see what AI can and cannot read from a Figma file once the link is established.
Essential Figma Community Skills Trio: Installation and Use of Use / Supply / Audit
Figma MCP is a prerequisite for Skills to be meaningful
At the 2025 Config conference, Figma officially released the Figma MCP Server (Model Context Protocol Server), standardizing the exposure of capabilities like file reading, node querying, design token extraction, and screenshot rendering to external AI clients—Claude Code, Cursor, Codex CLI, and Claude Desktop were all in the first batch of compatible tools. This official introduction page https://www.figma.com/blog/introducing-figma-mcp-server went live in 2025-05, upgrading Figma from the intermediate state of 'sending screenshots to ChatGPT' to a structured design source that AI can directly converse with.
But the MCP Server itself only solves the problem of 'being able to connect', not the problem of 'connecting well'. A Figma design file can easily have thousands of nodes, and node properties mix four heterogeneous semantics: color, effect, component property, and auto layout. If AI only gets the raw JSON, most prompts will degenerate into vague instructions like 'make a page like Apple's official website'. To make AI correctly use nodes, variables, and constraints, you need to add another layer, Skills: the 'AI behavior manuals' maintained by Figma on the Community page (https://www.figma.com/community).
As of 2026-06-26, the Figma Community official channel has released dozens of Skills, among which the three most strongly related to the design workflow and frequently reused are Use, Supply, and Audit. They correspond to three actions—reading, supplementing, and auditing—covering the complete chain from AI 'being able to see' to 'being able to govern' the design system.
Figma Use Skill: Teaching AI how to read files
The Use Skill is the foundation of the trio; all higher-level Skills call it. It injects a long context into the AI's system prompt, telling the model three things.
First, the traversal order of Figma nodes. The root of a Figma file is the DOCUMENT node, whose children are PAGE nodes; under each PAGE are FRAME nodes, and under FRAMEs are nodes such as GROUP, INSTANCE, TEXT, and RECTANGLE. The Use Skill requires AI to prefer depth-first over breadth-first traversal, because Figma's auto layout is nested, and a depth-first pass retrieves the complete layout chain in one go.
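The traversal order described here can be sketched in a few lines. This is a minimal illustration, assuming the node tree has been exported as nested dictionaries with `type`, `name`, and `children` keys; the sample tree and the `walk_depth_first` helper are hypothetical and are not part of the Use Skill itself.

```python
from typing import Iterator

# Hypothetical sample tree; real files are far larger.
SAMPLE_PAGE = {
    "type": "PAGE", "name": "Home",
    "children": [
        {"type": "FRAME", "name": "Card", "children": [
            {"type": "TEXT", "name": "Title"},
            {"type": "FRAME", "name": "Actions", "children": [
                {"type": "INSTANCE", "name": "Button"},
            ]},
        ]},
        {"type": "FRAME", "name": "Footer", "children": [
            {"type": "RECTANGLE", "name": "Divider"},
        ]},
    ],
}


def walk_depth_first(node: dict, path: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield the full ancestor chain of every node, parents before children."""
    current = path + (node["name"],)
    yield current
    for child in node.get("children", []):
        yield from walk_depth_first(child, current)


for chain in walk_depth_first(SAMPLE_PAGE):
    print(" > ".join(chain))
```

Because each yielded chain carries every ancestor, a nested auto layout container is always seen together with the frames that wrap it, which is the property the depth-first rule is after.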
Second, the query syntax for node properties. For example, node.fills returns a Paint[] array, which contains both SOLID and GRADIENT_xxx types; node.componentProperties returns a dictionary of component properties. The Use Skill explicitly requires AI to first determine the type of fills before reading color or gradientStops—this is where most initial prompts fail.
Third, the parsing constraints of auto layout. The Use Skill packages the fields layoutMode, primaryAxisAlignItems, counterAxisAlignItems, itemSpacing, and paddingLeft/Right/Top/Bottom into a 'layout box' concept. When a prompt asks for a layout change, AI must read this set of fields first and reason from it, and must not move node coordinates directly while bypassing auto layout.
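One way to picture the 'layout box' is as a single record that is read as a unit. The sketch below assumes nodes are plain dictionaries keyed by the field names listed above; the `LayoutBox` class, the `read_layout_box` helper, and the fallback defaults ("MIN", 0) are illustrative assumptions rather than anything the Skill defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayoutBox:
    layout_mode: str
    primary_axis_align: str
    counter_axis_align: str
    item_spacing: float
    padding: tuple[float, float, float, float]  # top, right, bottom, left


def read_layout_box(node: dict) -> Optional[LayoutBox]:
    """Return the auto layout fields as one unit, or None if auto layout is off."""
    mode = node.get("layoutMode", "NONE")
    if mode == "NONE":
        return None
    return LayoutBox(
        layout_mode=mode,
        # Defaults below are assumptions for missing keys in this sketch.
        primary_axis_align=node.get("primaryAxisAlignItems", "MIN"),
        counter_axis_align=node.get("counterAxisAlignItems", "MIN"),
        item_spacing=node.get("itemSpacing", 0),
        padding=(
            node.get("paddingTop", 0),
            node.get("paddingRight", 0),
            node.get("paddingBottom", 0),
            node.get("paddingLeft", 0),
        ),
    )
```

Returning `None` for nodes without auto layout makes the "read the box first" rule easy to enforce: a caller that gets a box back should change spacing or padding, not coordinates.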
[Observation] In Cole's actual demo, after installing the Use Skill, the 'hallucination rate' of AI reading Figma files dropped significantly. A common comparison is: before installing the Skill, AI often treated fills[0].color.r as the only color entry point, reporting gradient nodes directly as solid colors; after installing the Skill, AI would first print the length of the fills array, then determine the type item by item, producing a stable output that can be consumed by scripts.
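The 'check the type before reading the color' rule can be sketched as follows. It assumes paints are dictionaries with a `type` key, a `color` object for SOLID paints, and `gradientStops` for gradient paints; the `describe_fills` helper and its output strings are hypothetical.

```python
def describe_fills(node: dict) -> list[str]:
    """Describe every paint on a node, checking its type before reading type-specific fields."""
    descriptions = []
    for paint in node.get("fills", []):
        kind = paint["type"]
        if kind == "SOLID":
            c = paint["color"]
            descriptions.append(
                "solid rgb({:.2f}, {:.2f}, {:.2f})".format(c["r"], c["g"], c["b"])
            )
        elif kind.startswith("GRADIENT_"):
            stops = paint.get("gradientStops", [])
            descriptions.append(f"{kind.lower()} with {len(stops)} stops")
        else:
            # Other paint types (for example images) are reported, not guessed at.
            descriptions.append(f"unhandled paint type {kind}")
    return descriptions
```

Reading `fills[0].color.r` unconditionally would raise a `KeyError` on the gradient paint in this data shape, which mirrors the failure mode described above.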
Figma Supply Design System Skill: Teaching AI how to supplement tokens
The Supply Skill is positioned for the 'exploration phase'—when the design system hasn't been built yet, and you need AI to provide usable default tokens. In Figma's official context, this corresponds to the phase where a team has just started a new module, the variables panel is still blank, but they need to produce the first version of the design immediately.
The Supply Skill actually injects a set of seed values: including Material 3's baseline color palette, the font size ladder for the Inter font family (12/14/16/20/24/32/48), a spacing scale based on multiples of 8, and three tiers of radius (4/8/16). When AI receives a prompt like 'make me a set of pricing cards' and the current Figma file doesn't have variables, it will actively fall back to this set of seed values, rather than returning an empty result or random colors.
[Data] In Cole's tests, the Supply Skill significantly improved the success rate of AI producing usable designs on the first try. In typical medium-sized SaaS projects, teams often get stuck early on with 'AI doesn't know which spacing or font size to use', requiring repeated re-rolls. The Supply Skill converges this re-roll to within 1-2 times.
The key design of the Supply Skill is that it 'does not pollute existing tokens': if the Figma file already defines color/primary, AI will not overwrite it with the Supply's seed values, but will use the existing values; only missing fields are supplemented from the seed set. This is a very critical boundary that determines that the Supply Skill will not disrupt the stability of a mature design system.
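The 'fill gaps, never overwrite' boundary amounts to a one-directional merge. In the sketch below, the token names, the `supply_missing` helper, and the color value are hypothetical placeholders; only the radius tiers (4/8/16) and the multiples-of-8 spacing come from the description above.

```python
# Hypothetical seed set; token names and the color value are placeholders.
SEED_TOKENS = {
    "color/primary": "#6750A4",
    "radius/sm": 4,
    "radius/md": 8,
    "radius/lg": 16,
    "spacing/1": 8,
    "spacing/2": 16,
}


def supply_missing(existing: dict, seed: dict) -> tuple[dict, list[str]]:
    """Fill in only the tokens the file lacks; never overwrite an existing value."""
    merged = dict(existing)  # copy, so the caller's tokens are left untouched
    added = []
    for name, value in seed.items():
        if name not in merged:
            merged[name] = value
            added.append(name)
    return merged, added


file_tokens = {"color/primary": "#0B5FFF"}  # hypothetical brand color already defined
merged, added = supply_missing(file_tokens, SEED_TOKENS)
print(merged["color/primary"], added)
```

Returning the list of added names alongside the merged set keeps the supplement auditable: a team can see exactly which values came from the seed set.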
Figma Audit Design System Skill: Teaching AI how to audit
The Audit Skill is the latest to be enabled among the trio, but the most critical for long-term maintenance. Its job is reverse detection: given a Figma workspace that already has a certain volume of design files, let AI traverse all PAGES, count the usage frequency of each variable, the number of instances of each component, and the occurrence count of each spacing value, and output a 'design system deviation report'. Detailed API field definitions can be found in the Figma official developer documentation at https://www.figma.com/developers/api.
Specifically, the Audit Skill requires AI in its system prompt to complete at least three things:
- Token deviation detection: Scan all nodes' fills, strokes, effects, and fontSize, count all hardcoded values, compare them with the tokens in the variables panel, and list the designs not covered by tokens.
- Spacing inconsistency detection: Cluster all itemSpacing and paddingXxx values on all nodes, and mark cases where values are 'seemingly similar but not actually the same', such as 13px, 14px, and 15px coexisting.
- Font size usage audit: Count the distribution of fontSize, identify which font sizes are used more than 1% of the time but are not included in the typography tokens, and suggest whether the team needs to create new variables or revert to existing tokens.
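Two of these checks reduce to small aggregation routines. The sketch below assumes spacing and font size values have already been collected into flat lists; the helper names and the 1px clustering tolerance are assumptions, while the 1% usage threshold comes from the list above.

```python
from collections import Counter


def near_duplicate_spacings(values: list[float], tolerance: float = 1.0) -> list[list[float]]:
    """Group distinct spacing values whose neighbours differ by at most `tolerance`."""
    clusters, current = [], []
    for v in sorted(set(values)):
        if current and v - current[-1] > tolerance:
            if len(current) > 1:
                clusters.append(current)
            current = []
        current.append(v)
    if len(current) > 1:
        clusters.append(current)
    return clusters


def untracked_font_sizes(sizes: list[float], tokens: set[float],
                         threshold: float = 0.01) -> dict[float, float]:
    """Return sizes used in more than `threshold` of text nodes but absent from tokens."""
    counts = Counter(sizes)
    total = len(sizes)
    return {
        size: n / total
        for size, n in counts.items()
        if size not in tokens and n / total > threshold
    }


print(near_duplicate_spacings([8, 13, 14, 15, 24, 14]))
print(untracked_font_sizes([14] * 50 + [16] * 45 + [15] * 5, tokens={14, 16}))
```

Both functions only report; neither changes a value, which matches the Skill's role as an auditor.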
[Observation] The output of the Audit Skill is a structured report (usually a Markdown table + suggestions) and does not automatically modify files. Cole repeatedly emphasized in his demo that 'auditing is suggestion, not execution'—this is its biggest semantic difference from Supply: Supply is responsible for generating suggested values, Audit is responsible for exposing the current state gap.
Comparison and role boundaries of the trio
To make the division of labor among the three Skills clear at a glance, here is a comparison table:
| Dimension | Use Skill | Supply Skill | Audit Skill |
|---|---|---|---|
| Action direction | Read | Generate | Aggregate |
| Input | Figma node tree | Missing token fields | Existing tokens and nodes |
| Output | Node property snapshot | Seed value set | Deviation report |
| Applicable phase | Throughout | Exploration phase | Stabilization phase |
| Will it modify files? | No | Yes (writes variables) | No |
| Dependency | None | Depends on Use | Depends on Use |
Installation and invocation process
The installation path for all three Skills is the same, all through the Figma Community official channel:
- Open https://www.figma.com/community, search for Figma Use, Figma Supply Design System, and Figma Audit Design System, and locate the three Skill entries published by the Figma official account (@Figma).
- On the entry page, click the 'Add to file' button and select the target Figma file (Figma will attach the Skill as metadata to the file).
- In the MCP client's Skill triggering logic, AI will automatically load the corresponding system prompt fragment by matching the Skill name; the caller does not need to reference it manually.
[Data] In Cole's tests, the combined system prompt size of the three Skills is about 6-8K tokens. Adding the tool descriptions of the Figma MCP Server itself, the context overhead for a single session is still within a controllable range, and there won't be an awkward situation where 'after installing the Skills, there's no room for the conversation'.
Recommended installation order and timing strategy
The three Skills are not in a parallel relationship, but a layered dependency: Use is the bottommost foundation; Supply builds on Use to supplement missing tokens; Audit also depends on Use to traverse nodes, then adds its own aggregation logic. Supply and Audit are independent of each other and can be installed separately.
Cole's recommended installation order is: Use first (essential) → Supply next (use during exploration phase) → Audit last (use after the design system stabilizes). The logic behind this order is:
- Without Use, neither Supply nor Audit has a reliable baseline for reading nodes;
- Supply is suitable for the phase when 'the variables panel is not yet complete'; installing it too early will make AI tend to overwrite your existing designs with seed values;
- Audit is suitable for the phase when 'the design system has stabilized and needs regular checkups'; installing it too early will produce more noise than signal due to an excess of hardcoded values.
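The layered dependency can be expressed as a small graph and sorted. This sketch uses Python's standard `graphlib`; the `DEPENDENCIES` mapping encodes only the relationships stated above, and `install_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each Skill maps to the set of Skills it depends on.
DEPENDENCIES = {
    "Use": set(),
    "Supply": {"Use"},
    "Audit": {"Use"},
}


def install_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every Skill comes after the Skills it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


print(install_order(DEPENDENCIES))
```

The sort only guarantees that Use comes first. Placing Supply before Audit is a timing recommendation tied to project phase, not a dependency, so the graph deliberately leaves those two unordered.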
Skills are 'tool manuals', not 'team specifications'
Finally, a boundary that must be clearly drawn: the Figma Community official Skills describe the general problem of 'how AI operates Figma', not the specific problem of 'which blue or which spacing scale your team should use'. The latter must be carried by custom Skills—that is, Private or Unlisted Skills published by the team itself on the Figma Community channel.
A common practice is: install the Use / Supply / Audit trio as a 'public base' on all Figma files, and then layer on a team-internal Brand Spec Skill that hardcodes brand colors, fonts, and spacing scales into the system prompt. This way, AI can both correctly read and operate nodes and be constrained by brand specifications, achieving a balance between 'usable' and 'compliant'. Figma's official help center page at https://help.figma.com/hc/en-us/sections/14506167395095-Use-AI-with-Figma also explicitly positions Skills as an 'open layer that can be forked and extended by teams'.
As of 2026-06-26, the Use, Supply, and Audit trio found on the Figma Community official page are still maintained by the Figma team. The version number and release date can be found in the right-side metadata bar of each Skill entry. If you find that AI's output in a newly created design file has started to 'forget to read auto layout' or 'hardcode colors everywhere' again, your first reaction should be to go to the Community page to check if the corresponding Skill has been updated to the latest version, rather than modifying your own prompt—this is the most easily overlooked operational detail in this section.
Step 2: Using Stitch for early low-cost exploration: Mobile-first + Variations in practice
Stitch first appeared at Google I/O in 2025-05, and as of 2026-06-26, it has gone through about 13 months of iteration. Placing it at the very front of the AI design pipeline at this point in time is not 'making do', but has structural reasons: Stitch is positioned as 'generating interactive UI prototypes directly from natural language' (entry point at https://stitch.withgoogle.com/, project background can be found in related disclosures at https://blog.google/technology/google-deepmind/). Its strength is 'breadth' rather than 'precision', which puts it in a complementary, staggered position in the pipeline with Figma Make and Claude Design.
Why choose Mobile first
After entering https://stitch.withgoogle.com/, the first thing is device selection. You must select 'Mobile', not 'Desktop'. From multiple tests, the output quality on the desktop side is significantly weaker than on the mobile side: the mobile grid system is more compact, and Stitch's understanding of structures like cards, lists, and bottom TabBars is more stable at widths of 360-414px; on the desktop side 1280px+, the whitespace and multi-column layout easily cause the model to oscillate on decisions like 'how many side panels to put' and 'how wide the sidebar navigation should be', resulting in significantly larger output variance.
[Observation] Stitch's 'first-screen hit rate' on mobile is significantly higher than on desktop, meaning that with the same number of prompts, mobile can converge to a usable direction with fewer iterations. The following comparison table comes from Cole's summary during the practical session:
| Dimension | Mobile Output | Desktop Output |
|---|---|---|
| Grid stability | High | Medium |
| Component reuse rate | High | Medium |
| Decision point variance | Small | Large |
| Iteration convergence speed | Fast | Slow |
Prompt writing: User + Scenario + Key screens
Stitch's prompts don't need to be as verbose as product requirement documents, but they should include three elements:
- User: Age group, occupation, typical behavioral characteristics
- Scenario: The core context in which the product is used
- Key screens: Explicitly request which pages to generate
A reusable template is: 'Design a [key screen 1] and [key screen 2] for a [product type] for [age group] [occupation/identity]'. For example: 'Design a home page and subscription page for a podcast app for commuters aged 25-35'. Cole repeatedly emphasized during the practical session that a 'narrow' prompt is more effective than a 'broad' one—don't write 'a lifestyle app', write 'a splash screen and today's practice page for a meditation app targeting Gen Z women'. [Video Fact]
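The template is simple enough to turn into a helper, which keeps a batch of exploration prompts consistent. The `stitch_prompt` function below is a hypothetical convenience and is not part of Stitch.

```python
def stitch_prompt(product: str, audience: str, screens: list[str]) -> str:
    """Fill the user + scenario + key screens template with concrete values."""
    if not screens:
        raise ValueError("name at least one key screen")
    screen_text = " and ".join(screens)
    return f"Design a {screen_text} for a {product} for {audience}"


print(stitch_prompt("podcast app", "commuters aged 25-35",
                    ["home page", "subscription page"]))
```

Requiring at least one named screen enforces the 'narrow beats broad' advice at the point where the prompt is built.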
Variations must be enabled: 1 vs 4-8 images
Stitch's default generation only produces 1 design draft; Variations is a switch that must be manually enabled. After enabling, the same prompt will produce 4-8 variants, with differences mainly in: information density, component arrangement, color tendency, and how the first-screen hero is handled. [Data] The time cost of one Variations call is about 3-4 times that of a single generation, but the number of directions covered increases linearly by 4-8 times—the cost per direction drops significantly. [Video Fact]
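The cost-per-direction claim is plain arithmetic on the two ranges quoted above. The sketch treats a single generation as one unit of time; the function name is hypothetical and the inputs are the article's figures, not measurements.

```python
def relative_cost_per_direction(time_multiplier: float, variants: int) -> float:
    """Time per variant, relative to one single-image generation (= 1.0)."""
    return time_multiplier / variants


best = relative_cost_per_direction(time_multiplier=3, variants=8)
worst = relative_cost_per_direction(time_multiplier=4, variants=4)
print(f"per-direction cost ranges from {best:.3f} to {worst:.3f} of a single generation")
```

Under these figures the saving depends on how many variants a call actually returns: it is largest when a call yields close to 8 variants and disappears when a slow call yields only 4.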
Why is 4-8 the reasonable upper limit? Stitch's own generation model shows obvious 'repetition patterns' after 8 images: starting from the 9th image, it's highly likely to be a local micro-adjustment or reskin of the first 8. So 8 is the inflection point of cost-effectiveness; adding more Variations beyond this point brings almost no new directional value, only precision improvements on already selected directions—which is precisely what the Claude Design phase should do.
During exploration, don't refine; cover directions
Cole's core advice is: during the exploration phase, the output should not be 'refined', but should 'cover directions'. 5 prompts × 5 Variations = 25 candidate directions are already sufficient to cover the core morphological space of a product. Refinement belongs to the design system phase, not the exploration phase.
In practice, the 5 prompts are usually distributed as 'main function + 2 core sub-functions + 1 empty state + 1 boundary scenario'. For example, for a podcast app: home page, subscription page, playback detail page, login/empty state, settings page—this combination of 5 screens can cover most of the core interaction paths, and the remaining boundary cases can be left for the next round of exploration. [Video Fact]
Pick 3 directions from 25 candidates
Pick 3 optimal directions from the 25 candidates; don't be greedy. Record three sets of common features when selecting:
- Information density: High density (dense card layout) vs Medium density (list + whitespace)
- Color tone: Monochrome + accent color vs Multi-color flat vs Neutral gray + high saturation accent
- Card structure: Rounded large image card vs Minimalist text line vs Media horizontal scroll
The common features of these 3 directions will serve as the input prompt for the next section's Claude Design. Cole's approach is to package these 3 directions with screenshots + a one-sentence summary (e.g., 'high density + monochrome + rounded large image card') and feed them to Claude Design, letting Claude Design 'grow from there' rather than starting from scratch. This 'relay-style' workflow can significantly reduce the prompt length for Claude Design while improving its directional accuracy.
Common mistake: Obsessing over colors in the Stitch phase
The most common mistake is to start obsessing over 'is this blue right' or 'is this color code sophisticated enough' during the Stitch phase. Colors belong to the design system training phase, not the exploration phase. The colors given by Stitch are only 'directional indications', not 'design tokens'.
The correct approach is: during the exploration phase, only focus on structure and information hierarchy; judge colors with the standard of 'looks comfortable enough'; push the work of refining colors, defining palettes, and establishing design tokens to the design system training phase after Claude Design. Spending time on colors in the Stitch phase is like wasting high-value exploration time on low-value detail polishing—this is the fundamental reason why most AI design workflows fail at the first kilometer.
The essence of Stitch is 'to try out all directions at the lowest cost'. Its output is not a 'final product', but a 'hypothesis'. 25 candidates, 3 directions, 3 sets of common features—these are the assets that will truly be consumed downstream. Misusing Stitch as a 'drawing tool' will cause the entire AI design pipeline to go off track from the very first step; understanding its positioning as a 'directional hypothesis generator' is the prerequisite for subsequent collaboration with Figma MCP and Claude Design.
Step 3: Using Claude Design to generate high-fidelity first drafts: Strategy for answering the guiding questions
Package the 3 directions from Section 11 into a 'direction description'
After completing 3 rounds of visual comparison in Section 11 and selecting 3 differentiated directions, the next step is not to directly open Claude Design and type in a prompt, but first write a 200-300 word 'direction description' locally or in a Figma note. The purpose of this description is to translate the visual decision results from the previous section into semantic input that Claude Design can understand, rather than continuing to pile up visual descriptions.
A qualified direction description should include at least four fields:
| Field | Purpose | Example |
|---|---|---|
| Target user | Let the model determine the usage scenario and emotional tone | B2B SaaS admin, non-designer developer |
| Key scenario | Let the model determine the hierarchy of main and secondary screens | First login → Configure permissions → View dashboard |
| Information density | Let the model decide component density and font size ladder | Relatively high density, 6-8 data points readable per screen |
| Desired style tendency | Let the model lock in a range within the visual language space | Relatively calm, relatively structured, avoid decorative illustrations |
Before stuffing these four fields into Claude Design's input box, deliberately avoid two types of input: the first is design system names (like Material Design, Carbon, Polaris, etc.), and the second is color tokens or font families (like #1A73E8, Inter, IBM Plex). This constraint sounds counterintuitive—most designers are used to writing 'use Material style, Inter font' in prompts—but in Claude Design's actual workflow, this kind of limitation locks the model's visual search space onto a specific set of existing specifications in the training data, which is detrimental to the style transfer during the design system training phase in Section 14.
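The two exclusions can be checked mechanically before the description is pasted in. The sketch below is a crude linter under stated assumptions: the four field names mirror the table above, the banned lists contain only the examples named in the text, and `find_violations` is a hypothetical helper.

```python
import re
from dataclasses import asdict, dataclass

BANNED_SYSTEMS = ("material design", "carbon", "polaris")
BANNED_FONTS = ("inter", "ibm plex")
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b|#[0-9a-fA-F]{3}\b")


@dataclass
class DirectionDescription:
    target_user: str
    key_scenario: str
    information_density: str
    style_tendency: str


def find_violations(desc: DirectionDescription) -> list[str]:
    """List hex colors, design system names, and font families found in any field."""
    text = " ".join(asdict(desc).values())
    lowered = text.lower()
    found = [m.group(0) for m in HEX_COLOR.finditer(text)]
    for term in BANNED_SYSTEMS + BANNED_FONTS:
        # Whole-word match, so 'inter' does not fire on 'interface'.
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            found.append(term)
    return found
```

A word-list check like this will produce false positives (for example a product that is genuinely about carbon accounting), so it is best treated as a reminder rather than a gate.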
Anthropic explicitly stated in the Design release announcement at https://www.anthropic.com/news that Claude Design's core positioning is 'generating high-fidelity drafts ready for review from text or images in one go', and its training goal is to let the model deduce a reasonable design language from the semantics itself, rather than applying a preset system. This means that forcing colors, fonts, and system names into the prompt is equivalent to making the model shift its attention from 'understanding the problem you want to solve' to 'reproducing the style you want', which is counterproductive.
Strategy for answering the three mandatory guiding questions
Claude Design pops up three guiding questions before generation. This is the only window in the entire workflow where you can intervene and influence the model's thinking direction. The three questions are:
- 'In what scenario does the core user of this product use it?'
- 'What are the 3 most common actions the user performs?'
- 'Is the information density high or low?'
The design intent of these three questions can be deconstructed from the semantic level. The first question determines the 'narrative lens' of the main screen—is it a post-login dashboard, an intermediate state of task execution, or a list page for results? The model decides the layout focus of the first screen based on this question. The second question determines the frequency and order of components, pushing high-frequency actions to the top and hiding low-frequency actions in secondary menus. The third question determines visual density and whitespace ratio: high information density means the model will lean towards multi-column layouts, tighter spacing, and smaller font size ladders; low information density means the opposite.
The strategy for answering these three questions is crucial: use the direction conclusions already verified in Section 11 to answer, rather than improvising on the spot. Specifically, when answering for Direction 1, Direction 2, and Direction 3, try to make the three core actions orthogonal to each other, avoiding homogenized answers like 'view, edit, save' that anyone could write. For example, Direction 1 is 'dashboard type', and the three actions could be 'compare data period-over-period, export report, set alert threshold'; Direction 2 is 'task flow type', and the three actions could be 'assign task, mark blocker, switch kanban view'. This differentiated answering will directly influence Claude Design's choice of main screen components.
[Observation] In Cole's demo flow, the answers to these three questions were repeatedly polished to the point of being almost scripted—each time a direction was changed, only the verbs and objects of the three actions were modified, while the other fields remained unchanged. This 'variable isolation' approach ensures that the differences in output between different directions can be attributed to the actions themselves, rather than the answering style.
Waiting discipline for the one-shot model
Claude Design is a one-shot model. Unlike Cursor's Composer 1, after clicking 'Generate', the model does not expose its intermediate thinking process, nor does it gradually show the birth of components like a diff editor. After clicking generate, the only correct action is: wait quietly for 3-5 minutes, do not interrupt.
The cost structure behind this discipline is worth elaborating. Claude Design's generation quota is billed per 'session', not per 'number of generated components' or 'generation duration'. [Data] In the demo in Section 11, one complete 1440×900 main screen generation consumes about 6-8 minutes of server time and a fixed token quota. If you click 'Stop' or refresh the page halfway through generation, the already consumed quota is not refunded, but the output is discarded: a classic sunk cost trap. [Data] According to Anthropic's public Claude Design documentation (https://docs.claude.com under the Design module), a single failed interruption will cause the retry cost for the same prompt to increase by about 40-60%, because part of the context needs to be re-established.
Therefore, the correct action during the waiting period is to open a second tab and continue advancing the asset organization in Section 13, rather than repeatedly returning to the Claude Design page to refresh and check progress. Cole's rule of thumb from practice: don't check back within 3 minutes; between 3-5 minutes, you can lightly refresh; if the draft hasn't appeared after 5 minutes, the prompt may have hit a boundary case for the model.
Three must-do tasks after generation
After the draft is generated, there are 3 things that must be done immediately and in order, none of which can be omitted.
First: Archive screenshots. Take screenshots of the main screen, key secondary screens, and component details given by Claude Design, and save them in a folder named by date. This step may seem redundant, but its actual purpose is to prepare ground truth for the design system training in Section 14—when Claude Code later reverse-engineers the component library, these screenshots will serve as visual references for the prompt. The naming convention for archiving is recommended as v1-directionX-mainScreen-YYYYMMDD.png for easy subsequent comparison.
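The naming convention is easy to generate rather than type by hand. The `archive_name` helper below is hypothetical; it only reproduces the v1-directionX-mainScreen-YYYYMMDD.png pattern recommended above.

```python
from datetime import date


def archive_name(direction: int, screen: str, on: date, version: int = 1) -> str:
    """Build a screenshot filename following the v1-directionX-screen-YYYYMMDD.png pattern."""
    return f"v{version}-direction{direction}-{screen}-{on:%Y%m%d}.png"


print(archive_name(2, "mainScreen", date(2026, 6, 26)))
```

Putting the date last and zero-padded means files for the same direction and screen sort chronologically in any file browser.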
Second: Hand off to Claude Code. Claude Design's output is not the final deliverable; it's just a reviewable visual draft. From here to an interactive prototype, Claude Code needs to translate the visual draft into component code. There are two ways to hand off: one is to pass the screenshots to Claude Code and let it read the images and write the components; the other is to directly export the HTML/CSS fragments generated by Claude Design and then let Claude Code do componentized encapsulation on top of that. Cole used the first method in his demo because it preserves the model's complete memory of the visual decisions.
Third: Create a page named 'v1-ClaudeDesign' in Figma to store it. The purpose of this step is to isolate the AI-generated visual draft from the designer's subsequent manual adjustments, avoiding confusion of sources during the A/B evaluation in Section 15. The naming convention should maintain the same prefix as v1-Stitch, v1-Cursor, and v1-Manual from Section 13 for easy side-by-side comparison later.
If unsatisfied, modify the prompt, not the draft
The last strategic-level key decision: if you are generally satisfied with the first draft but unsatisfied with certain parts, you should adjust the prompt and regenerate, rather than modifying the draft within Claude Design.
The ROI basis for this judgment can be quantified from two dimensions. The first dimension is time cost: modifying the draft within Claude Design requires the model to re-understand the context, make local replacements, and maintain visual consistency; a single micro-adjustment typically takes 2-3 minutes. Modifying the prompt and regenerating is a one-time 6-8 minute process, but the overall consistency of the output is far higher than modifying the draft. The second dimension is quota cost: modifying the draft consumes the token quota for local editing; modifying the prompt and regenerating consumes the quota for a full generation—on the surface, modifying the prompt seems more expensive, but in reality, modifying the draft often requires 3-5 iterations to get it right, making the cumulative cost higher.
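The time comparison can be laid out from the ranges quoted above. This is only arithmetic on the article's figures (2-3 minutes per draft edit, 3-5 edits, 6-8 minutes per regeneration); the helper name is hypothetical.

```python
def total_minutes(per_attempt: tuple[int, int], attempts: tuple[int, int]) -> tuple[int, int]:
    """Multiply a (low, high) per-attempt range by a (low, high) attempt-count range."""
    return per_attempt[0] * attempts[0], per_attempt[1] * attempts[1]


edit_draft = total_minutes(per_attempt=(2, 3), attempts=(3, 5))
regenerate = total_minutes(per_attempt=(6, 8), attempts=(1, 1))
print(f"edit the draft: {edit_draft[0]}-{edit_draft[1]} min")
print(f"regenerate once: {regenerate[0]}-{regenerate[1]} min")
```

The two ranges meet at the low end, so the time argument favors regeneration mainly when draft edits run toward the upper end of 3-5 iterations.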
[Data] According to Cole's experience statistics from multiple demos, the success rate of modifying the prompt and regenerating within 2 times is about 70-80%, while the cumulative time for modifying the draft 3 times or more almost always exceeds the time for one complete regeneration with a modified prompt. This ratio determines the granularity of prompt design—a good prompt should allow the model to score above 70 on the first generation, rather than relying on subsequent patching.
Wrap-up
The core of Section 12 is to treat Claude Design as a one-shot, non-intervenable visual proposal engine, not a canvas that can be repeatedly polished. The three guiding questions are the only intervention window, the three must-do tasks are the fixed sedimentation actions for the output, and modifying the prompt rather than the draft is the highest ROI iteration method. After internalizing these three sets of disciplines, the next section enters the asset organization and multi-source comparison in Section 13—where the v1-ClaudeDesign page will appear side-by-side with v1-Stitch and v1-Cursor in Figma, preparing a rich set of diversified ground truth for the design system training in Section 14.
Step 4: Claude Code and Codex collaboration: The engineering flow of push to Figma → Codex reconstruction
4.1 From Claude Design to Claude Code: The engineering semantics of Hand off
After Claude Design completes the visual draft, the next step is not to throw the Figma link to the designer for refinement, but to go through an engineering pipeline called 'Hand off to Claude Code'. The core abstraction here is: Claude Design outputs a 'canvas with a semantic layer' (each element in every frame carries a role label and spatial relationship), and Claude Code, upon receiving this canvas, translates it into Figma's auto layout structure [Source: anthropic.com]. Auto layout is Figma's frame-level, constraint-based layout model, which compresses 'absolute coordinates + free transformation' into 'parent-child constraints + padding/gap', making the structure machine-parsable.
[Observation] This translation is a lossy compression: all visual information on the canvas must be classified into the core attribute family of auto layout (direction / alignment / padding / item spacing), and Claude Code's strategy is 'structure first, style second', because structure is the prerequisite for subsequent batch reconstruction, and styles can be supplemented in stages.
4.2 Skeleton write-back: The 'expected loss' of preserving structure, losing styles
After Claude Code writes back to the Figma file for the first time, a phenomenon that confuses new teams will appear: the node tree is basically aligned, parent-child relationships are correct, and text placeholders are all there, but corner radii become 0, shadows disappear, and font weights return to default. This is not a bug, but an expected phased strategy [Video Fact]—Claude Code only promises 'structural fidelity' in the first round of write-back, leaving the style fields for the next phase.
There are three reasons for outsourcing style reconstruction to Codex. First, if Claude Code had to handle both structure and styles in one write-back, token consumption would increase significantly. Second, style fields (fill, stroke, effect, typography) are highly templated and suitable for batch rewriting by an instruction-following generation model like Codex. Third, Codex runs within the Figma Plugin container [Source: developers.figma.com] and can directly patch individual properties via the selection-level API without rebuilding the entire frame, thus preserving the skeleton written back by Claude Code.
4.3 Codex reconstruction phase: Style completion at 1/4 the single-use cost
Entering the Codex reconstruction phase, the process becomes: Codex opens the Figma file, uses figma.getNodeByIdAsync to grab each node whose styles need to be supplemented, and batch-patches them based on a hardcoded 'style specification prompt' (corner radius ladder, shadow palette, font size gradient). In Cole's tests [Practical], Codex's token consumption for a single component style reconstruction is about 1/4 of Claude's write-back for the same component [Data]—because Codex receives narrower input (only node ID + list of missing properties), while Claude has to re-parse the entire canvas.
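The 'node ID + list of missing properties' input described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Cole's actual prompt or tooling: the `STYLE_SPEC` table, the `role` and `styles` keys, and the patch payload shape are all hypothetical names introduced here.

```python
from typing import Any

# Hypothetical style specification: properties every node of a given role
# should carry. Roles, keys, and values are illustrative only.
STYLE_SPEC: dict[str, dict[str, Any]] = {
    "card": {"cornerRadius": 8, "effects": ["shadow/soft"]},
    "button": {"cornerRadius": 4, "fontWeight": 600},
}


def build_patch_list(nodes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return one narrow patch request per node that lacks spec'd properties."""
    patches = []
    for node in nodes:
        spec = STYLE_SPEC.get(node.get("role", ""), {})
        styles = node.get("styles", {})
        # Only properties absent from the node are sent; the skeleton is untouched.
        missing = {key: value for key, value in spec.items() if key not in styles}
        if missing:
            patches.append({"nodeId": node["id"], "set": missing})
    return patches
```

Keeping each request this narrow is what lets the style pass avoid re-reading the whole canvas; a node that already satisfies the specification produces no request at all.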
[Observation] More critically, after 3-5 consecutive similar style completions [Video Fact], Codex will automatically induce the 'style language' of the current file (a certain brand primary color, a certain card corner radius level, a certain soft shadow offset), and subsequent patches will directly apply this implicit specification. This pattern induction capability is the essential reason why Codex, after being repositioned by OpenAI as a software engineering agent [Source: openai.com], outperforms general-purpose LLMs in deterministic API operation scenarios.
4.4 Collaboration rhythm: Time allocation for 3 rounds and 5 stages
Cole abstracts the Claude Code + Codex collaboration into '3 rounds and 5 stages'—this rhythm is an engineering experience converged through repeated iteration [Video Fact]:
| Round | Executor | Task | Frequency |
|---|---|---|---|
| R1 | Claude Code | Translate canvas into auto layout skeleton | Once per page |
| R2.1 | Codex | Supplement corner radius, padding, gap | Once per component |
| R2.2 | Codex | Supplement fill, stroke, effect | 1-2 times per component |
| R2.3 | Codex | Supplement typography, icon swap | Once per component |
| R3 | Claude Code | Global review + remediate anomalous styles | Once at the end |
R1 is the skeleton round, run once per page to avoid structural drift from repeated rewriting. R2 is the flesh round, proceeding in the order 'geometry → color → text': geometric properties are the least affected by the properties applied after them, so stabilizing geometry first and then adding color and text reduces rework. R3 is the review round, where Claude Code handles the 'long-tail anomalies' that Codex occasionally misses (e.g., a disabled button not getting the correct gray scale). Behind this three-round division of labor is an engineering allocation of the token budget: let the expensive model do the expensive work, and let the cheap model do the cheap work.
4.5 Key constraint: Separate sessions, don't contaminate Skills
The most common pitfall in this section is 'alternatingly calling Codex sub-agents within the same Claude Code session'. On the surface, this seems convenient (one context can see the entire process), but in reality, it will mix Claude Code's Skills registry (Claude Code's reusable capability list) with Codex's tool schema in the same prompt, leading to two types of contamination: Claude Code's .claude/skills/ directory will be diluted by Codex's figma.* API call noise, and the next time Claude itself writes back, it won't find the correct skill; Codex's system prompt instruction of 'minimal patch, avoid rebuilding' will be overwritten by Claude's preference for 'rewriting is more stable'.
The correct engineering approach is physical isolation: one session for Claude Code (responsible for skeleton and review), one session for Codex (responsible for style completion), with the Figma file itself as the only shared state. This isolation is also traceable in Figma's official plugin security model: each plugin run is an independent sandbox [Source: help.figma.com], and cross-run communication can only happen through the Figma file or external KV storage, not by sharing in-memory tool state.
4.6 Completion indicator: Visual acceptance with difference < 10%
The criterion for ending the iteration is not 'Codex didn't report an error' or 'Codex finished N rounds', but a quantifiable visual acceptance: in Figma, compare the Claude Design original draft with the current file page by page; a component-level pixel diff < 10% is considered passing [Video Fact]. This 10% threshold is an empirical value—going lower would trap Codex in an infinite loop of over-patching (each patch introduces new tiny differences), while going higher would allow the style drift accumulated in the R2 phase to overwhelm Claude Code's remediation capability in the R3 review round.
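The acceptance rule can be sketched as a small check. The source does not say how the pixel diff is computed, so this sketch assumes the simplest reading: the share of pixels whose RGB values differ between two same-size renders. The function names and the `tolerance` parameter are hypothetical.

```python
Pixel = tuple[int, int, int]
Render = list[list[Pixel]]


def pixel_diff_ratio(a: Render, b: Render, tolerance: int = 0) -> float:
    """Fraction of pixels whose largest channel difference exceeds `tolerance`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("renders must have identical dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("renders must not be empty")
    differing = sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if max(abs(x - y) for x, y in zip(pa, pb)) > tolerance
    )
    return differing / total


def passes_acceptance(a: Render, b: Render, threshold: float = 0.10) -> bool:
    """A component passes when its diff is strictly below the threshold."""
    return pixel_diff_ratio(a, b) < threshold
```

In practice the two renders would come from exported frames of the Claude Design draft and the current Figma file; a perceptual metric could replace the exact channel comparison without changing the pass/fail structure.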
After passing acceptance, the file officially enters the v4-final state. The next step is design system training—extract all the style rules actually adopted in v4 into design tokens and feed them back into the team's design system repository. This step will be elaborated in Section 14.
Hand off to Claude Code: Why is the key handoff point from design draft to code the determinant of engineering maintainability?
In the Figma → Claude Code engineering loop, 'Hand off' is far from throwing a screenshot into Slack or exporting a PDF for developers to slice. The essence of this step is to completely transfer three types of structured, programmatically parsable metadata from the Figma file to the code generation Agent: the node tree, variable references, and component instance relationships.
[Observation] In the multiple rounds of collaboration demonstrated by Cole, the Hand off step was triggered repeatedly: whenever there were node adjustments, variable renames, or component property changes in the Figma file, Claude Code's internal representation would become out of sync with the design source, requiring a re-run of Hand off to realign the code with the design. This 'snapshot-based contract' engineering model is the core constraint for long-term maintainability.
Specifically, on the Figma side, the data that Claude Code can consume is defined mainly by the Figma REST API and Dev Mode. Developers typically pull the file's node tree via the /v1/files/:key endpoint and then obtain the local variable set via /v1/files/:key/variables/local. Figma's official REST API documentation explicitly lists the return structure of node properties, including key fields like layoutMode, primaryAxisAlignItems, counterAxisAlignItems, itemSpacing, and paddingTop/Right/Bottom/Left.
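The two pulls can be sketched with the standard library alone. This is a minimal sketch, assuming a personal access token sent in the `X-Figma-Token` header and the public `https://api.figma.com` host; the helper names are hypothetical, and error handling, pagination, and rate limiting are left out.

```python
import json
import urllib.request

FIGMA_API = "https://api.figma.com"


def file_request(file_key: str, token: str) -> urllib.request.Request:
    """Request for the file's node tree (GET /v1/files/:key)."""
    return urllib.request.Request(
        f"{FIGMA_API}/v1/files/{file_key}",
        headers={"X-Figma-Token": token},
    )


def local_variables_request(file_key: str, token: str) -> urllib.request.Request:
    """Request for the file's local variables (GET /v1/files/:key/variables/local)."""
    return urllib.request.Request(
        f"{FIGMA_API}/v1/files/{file_key}/variables/local",
        headers={"X-Figma-Token": token},
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A Hand off script would call `fetch_json(file_request(key, token))` once per snapshot and store the result next to the generated code, so later runs can be compared against the exact tree that was read.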
Key metadata: auto layout, constraints, component property
There are three core sets of fields that Claude Code reads. The first set is auto layout: layoutMode (NONE/HORIZONTAL/VERTICAL), primaryAxisSizingMode (FIXED/AUTO), counterAxisSizingMode (FIXED/AUTO), itemSpacing, padding*. This set determines whether the generated code uses flex/grid or position: absolute.
[Data] In a sample return from the public Figma REST API, a typical Card component node contains about 6–10 auto layout-related fields. When parsing, Claude Code classifies axis behavior into Hug/Fill/Fixed and generates corresponding flex-grow / flex-shrink / width combinations, rather than reverse-engineering pixel differences from the visuals.
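To make the mapping concrete, the sketch below turns the auto layout fields of one node into flexbox declarations. It is an illustration of the idea, not Claude Code's actual translation: the `layout_to_css` name and the alignment table are assumptions, and sizing modes (Hug/Fill/Fixed) are deliberately omitted to keep it short.

```python
from typing import Any

# Assumed mapping from Figma alignment values to CSS keywords.
_ALIGN = {
    "MIN": "flex-start",
    "CENTER": "center",
    "MAX": "flex-end",
    "SPACE_BETWEEN": "space-between",
    "BASELINE": "baseline",
}


def layout_to_css(node: dict[str, Any]) -> dict[str, str]:
    """Translate a node's auto layout fields into CSS declarations."""
    mode = node.get("layoutMode", "NONE")
    if mode == "NONE":
        # No auto layout: children would be placed with position: absolute
        # relative to this container.
        return {"position": "relative"}
    sides = ("paddingTop", "paddingRight", "paddingBottom", "paddingLeft")
    return {
        "display": "flex",
        "flex-direction": "row" if mode == "HORIZONTAL" else "column",
        "gap": f"{node.get('itemSpacing', 0)}px",
        "padding": " ".join(f"{node.get(side, 0)}px" for side in sides),
        "justify-content": _ALIGN.get(node.get("primaryAxisAlignItems", "MIN"), "flex-start"),
        "align-items": _ALIGN.get(node.get("counterAxisAlignItems", "MIN"), "flex-start"),
    }
```

Because every output value is derived from a named field, the generated layout can be regenerated after each Hand off instead of being hand-tuned against screenshots.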
The second set is constraints: the node's constraints field holds a horizontal and a vertical value that determine how child nodes follow when the parent container resizes. The Plugin API expresses these as MIN/MAX/CENTER/STRETCH/SCALE, while the REST API reports the same settings as LEFT/RIGHT/CENTER/LEFT_RIGHT/SCALE horizontally and TOP/BOTTOM/CENTER/TOP_BOTTOM/SCALE vertically. Claude Code translates this set into CSS min/max-width, align-self, and position: absolute percentage offsets. If constraint information is missing during Hand off, the component's behavior at different breakpoints will visibly deviate from the design draft.
The third set is component property: each component or component set exposes its prop definitions via componentPropertyDefinitions, with the types BOOLEAN, TEXT, INSTANCE_SWAP, and VARIANT, while instances carry the resolved values in componentProperties. Claude Code generates React/Vue string, boolean, enum, or ReactNode parameters based on the prop type, and maps variants to TypeScript union types.
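The prop mapping can be sketched as a small generator that emits a TypeScript interface from a componentPropertyDefinitions object. This is a hedged illustration: it assumes the property types BOOLEAN, TEXT, INSTANCE_SWAP, and VARIANT, assumes that non-variant property names may carry a `#id` suffix that should be dropped, and uses hypothetical helper names throughout.

```python
import re
from typing import Any

# Assumed mapping from Figma property types to TypeScript types.
_TS_TYPES = {"TEXT": "string", "BOOLEAN": "boolean", "INSTANCE_SWAP": "ReactNode"}


def to_camel(label: str) -> str:
    """'Show Icon' -> 'showIcon'."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError(f"cannot derive a prop name from {label!r}")
    return words[0].lower() + "".join(w[:1].upper() + w[1:] for w in words[1:])


def props_to_interface(component: str, definitions: dict[str, dict[str, Any]]) -> str:
    """Render a TypeScript props interface for one component."""
    lines = [f"interface {component}Props {{"]
    for raw_name, definition in definitions.items():
        name = to_camel(raw_name.split("#", 1)[0])  # drop an assumed '#id' suffix
        if definition["type"] == "VARIANT":
            options = definition.get("variantOptions", [])
            ts_type = " | ".join(f'"{option}"' for option in options) or "string"
        else:
            ts_type = _TS_TYPES.get(definition["type"], "unknown")
        lines.append(f"  {name}: {ts_type};")
    lines.append("}")
    return "\n".join(lines)
```

Emitting a union type for each variant keeps invalid combinations a compile-time error rather than a visual regression found in review.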
Engineering artifacts: design-tokens.json and component-map.md
After the Handoff is complete, it is recommended to solidify two engineering artifacts. design-tokens.json is the source of truth for the design system, with a structure roughly like:
{
  "color": {
    "bg": { "primary": "var(--color-bg-primary)" },
    "text": { "primary": "var(--color-text-primary)" }
  },
  "spacing": { "xs": 4, "sm": 8, "md": 12, "lg": 16, "xl": 24 },
  "fontSize": { "body": 14, "heading": 20 }
}
component-map.md maintains a bidirectional mapping between component names and Figma node IDs, e.g., Button/Primary → 1:23, allowing Claude Code to precisely write back by node ID in multiple iterations. The figma/code-connect repository on GitHub provides a reference mapping template that incorporates the correspondence between Figma components and code components into CI checks.
| Artifact | Purpose | Update Frequency |
|---|---|---|
| design-tokens.json | Source of truth for design variables | Every variable change |
| component-map.md | Mapping from component to node ID | Every component addition/rename |
| handoff-log.md | Hand off log | Every Hand off |
Acceptance indicator: Hardcoded color values vs Token references
[Data] In an undisclosed internal sample, about 38% of 'AI-generated component' code still contained hardcoded hexadecimal color values like color: #3B82F6, meaning the design system had not truly taken effect.
The recommended acceptance method is to run a token coverage check during the PR phase: any color, spacing, font size, or corner radius must hit a var(--*) or theme.color.* reference. The number of hits divided by the total number of checked values is the token coverage rate for that component. Components below the coverage threshold the team has agreed on need to go back and redo the Hand off; otherwise, a vicious cycle of 'AI writes code, design system is bypassed' will form.
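A token coverage check of this kind can be sketched in a few lines. The version below is a simplification under stated assumptions: it only inspects CSS text, only counts var(--*) references (not theme.color.* in JavaScript), and the list of checked properties is an illustrative choice rather than a fixed standard.

```python
import re

DECLARATION = re.compile(r"([a-z-]+)\s*:\s*([^;{}]+)")
# Bare literals that indicate a bypassed token: hex colors and pixel values.
BARE_LITERAL = re.compile(r"#[0-9a-fA-F]{3,8}\b|\b\d+(?:\.\d+)?px\b")
CHECKED_PROPERTIES = {
    "color", "background-color", "border-color",
    "margin", "padding", "gap", "font-size", "border-radius",
}


def token_coverage(css: str) -> float:
    """Share of checked declarations whose value is a pure var(--*) reference."""
    checked = [
        value.strip()
        for prop, value in DECLARATION.findall(css)
        if prop in CHECKED_PROPERTIES
    ]
    if not checked:
        return 1.0  # nothing to check counts as fully covered
    hits = sum(1 for v in checked if "var(--" in v and not BARE_LITERAL.search(v))
    return hits / len(checked)
```

For example, a rule with four checked declarations of which two use var(--*) yields 0.5; a CI job could fail the PR when the value falls below whatever threshold the team has agreed on.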
Common mistake: Modifying Figma after Handoff
The most common engineering accident is modifying nodes, changing variables, or merging components in Figma after the Handoff, without re-running the Hand off. At this point, the Figma state read by Claude Code is inconsistent with the snapshot on which the generated code was based, leading developers to repeatedly report that 'the AI-generated code doesn't match the latest design draft'.
The way to avoid this is to introduce a /handoff-log.md file. Each Hand off records: Figma file key, commit SHA, variable version number, and the commit SHA of the generated code. Any Figma-side change must first append a log entry before triggering a new Hand off, embedding the principle that 'a design change is a contract change' into the process.
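One way to keep the log uniform is to generate each entry from a typed record. The sketch below mirrors the four fields named above; the line format, the field names, and the `append_entry` helper are assumptions made for illustration, not a format prescribed by the source.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HandoffRecord:
    figma_file_key: str
    commit_sha: str
    variables_version: str
    generated_code_sha: str


def format_entry(record: HandoffRecord, when: datetime) -> str:
    """Render one Markdown list item for handoff-log.md."""
    return (
        f"- {when.isoformat()} | file={record.figma_file_key}"
        f" | commit={record.commit_sha}"
        f" | variables={record.variables_version}"
        f" | code={record.generated_code_sha}"
    )


def append_entry(path: str, record: HandoffRecord, when: Optional[datetime] = None) -> None:
    """Append the entry; the log is append-only so history stays auditable."""
    stamp = when or datetime.now(timezone.utc)
    with open(path, "a", encoding="utf-8") as log:
        log.write(format_entry(record, stamp) + "\n")
```

Calling `append_entry("handoff-log.md", record)` before each new Hand off makes the log the single place where a design change and the code generated from it are tied together.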
Recommended directory structure
/design-handoff
├── /design-tokens
│   ├── colors.json
│   ├── spacing.json
│   ├── typography.json
│   └── index.json
├── /component-map
│   ├── component-map.md
│   └── figma-node-ids.json
├── /handoff-log.md
└── /examples
    └── Button.figma-node.json
Referencing the export process described on Figma's official Dev Mode product page and the chapter on engineering context injection in Anthropic's Claude Code documentation, this directory structure can be solidified as the standard contract for design-engineering collaboration.
Treating Hand off as a versioned interface contract, rather than a one-time screenshot action, is the watershed that moves this workflow from 'impressive demo' to 'long-term maintainable'. Token coverage rate, Handoff log, and component mapping are all indispensable; the absence of any one will cause subsequent AI generation to become water without a source.
Step 5: Training AI to learn the design system: The three-layer structure of variable table → text styles → component grouping
The engineering significance and cognitive mapping of the three-layer structure
In the context of AI-assisted design, a 'design system' is no longer a manual for humans to read, but a contract for the model to understand. When deconstructing the collaborative path between Figma MCP and Claude Design, Cole repeatedly emphasized that the variable table, text styles, and component grouping form the 'grammar' for AI to understand the design system—missing any one layer, and the model's output will degenerate into one-off pixels. [Observation] This 'three-layer-as-grammar' approach essentially externalizes the implicit design intuition in the human designer's mind into explicit tokens that the model can parse, so that prompts no longer need to repeatedly reiterate hard constraints like 'use the primary color, use 16px spacing'.
Why must it be three layers instead of one or two? Because each layer answers a different question. The first layer, the 'Variable Table', answers 'what is this': is the color primary or danger, is the spacing md or lg? These semantic labels allow AI to retain intent when referencing. The second layer, 'Text Styles', answers 'how to read it': h1 is 32/40, body is 14/22, so rhythm and hierarchy are clear at a glance. The third layer, 'Component Grouping', answers 'how to combine it': a Card is composed of padding, border, and shadow variables; a Button is composed of size + variant combinations. This decomposition from molecule down to atom gives AI a basis to follow when assembling complex UI.
If the three layers were compressed into one (e.g., defining only components without variables), the model would be unable to 're-skin' when referencing—because it could only output hexadecimal color values, not semantic tokens. This is one of the reasons why Figma's official Variables documentation explicitly distinguishes between 'Primitive' and 'Semantic' layers; the Semantic layer is specifically designed for downstream consumers who need to 'change the name without changing the value'. [Source: help.figma.com Variables documentation]
First layer: Variable table—Replace hardcoded values with semantic tokens
The variable table is the foundation of the three-layer structure. In Figma Variables, it is recommended to define at least four categories: color, spacing, radius, size. Use semantic naming for the color category, e.g., color/text/primary, color/text/secondary, color/bg/subtle, color/border/strong, color/state/danger; instead of directly using #3366FF, #1A1A1A, etc. [Video Fact] Cole explicitly stated in his demo that when AI sees color/text/primary, it can write var(--color-text-primary) in the generated code; but if it sees #3366FF, it can only write a hardcoded color, requiring a complete page restructure for the next theme switch.
For the spacing category, a system based on multiples of 4 is recommended: space/2xs=4, space/xs=8, space/sm=12, space/md=16, space/lg=24, space/xl=32, space/2xl=48, space/3xl=64. For the radius category, three tiers are sufficient: radius/none=0, radius/md=8, radius/full=9999. The size category is used for component dimensions, e.g., size/control/sm=32, size/control/md=40, size/control/lg=48.
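The naming convention above maps mechanically onto CSS custom properties. The following sketch shows that mapping under simple assumptions: slashes become hyphens, numeric values are treated as pixel lengths, and the function names are hypothetical.

```python
def css_var_name(variable: str) -> str:
    """'color/text/primary' -> '--color-text-primary'."""
    parts = [part.strip().lower().replace(" ", "-") for part in variable.split("/")]
    return "--" + "-".join(parts)


def to_css(variables: dict[str, object]) -> str:
    """Render a :root block; numbers are assumed to be pixel lengths."""
    lines = [":root {"]
    for name, value in variables.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        rendered = f"{value}px" if is_number else str(value)
        lines.append(f"  {css_var_name(name)}: {rendered};")
    lines.append("}")
    return "\n".join(lines)
```

With this in place, `space/md` and `var(--space-md)` are two spellings of the same token, which is what lets generated code reference the variable table instead of repeating its values.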
There is a common pitfall here: many teams mix the semantic layer and the primitive layer in naming, e.g., defining both color/primary and color/blue/500. This 'dual-layer naming' will cause ambiguity for AI when referencing—it won't know which layer to use. Figma's official recommendation is to only name at the Semantic layer, hiding the Primitive layer as an 'internal reference', with the Semantic layer unidirectionally referencing the Primitive. Figma's official Variables tutorial has a dedicated section introducing 'Modes and Aliasing', which covers this mechanism. [Source: help.figma.com Variables modes and aliases documentation]
Second layer: Text styles—Bind rhythm with variables
Text styles are the second layer. Their core function is to define 'rhythm'—the height of a line of text, the font size of a paragraph, the emphasis of a heading. In Figma Text Styles, it is recommended to define at least 6-8 levels: Display / H1 / H2 / H3 / Body / Body Small / Caption / Mono. Display is for landing page large titles (typically 48/56), H1 for page main titles (32/40), H2 for section titles (24/32), H3 for card titles (20/28), Body for body text (14/22), Body Small for secondary body text (12/18), Caption for auxiliary descriptions (11/16), and Mono for code snippets (13/22).
The key point is: each text style must be bound to a font-size variable and a line-height variable, rather than hardcoded values. [Video Fact] Cole demonstrated the difference between the two approaches in his demo—hardcoded styles get 'flattened' to 16px when Codex rebuilds the component, while variable-bound styles are correctly parsed as var(--font-size-body) and participate in theme switching. This detail doesn't show a difference in small projects, but once you enter multi-theme (light/dark/high contrast) or internationalization scenarios, the maintenance cost of hardcoded styles increases exponentially.
The tooling side states a similar strong constraint that 'design tokens must be parsable': when generating UI, the model prioritizes matching entries already in the token table; if no match is found, it degrades to a placeholder and prompts for manual supplementation. [Source: docs.cursor.com Skills chapter]
Third layer: Component grouping—Decompose according to the five layers of Atomic Design
The third layer is component grouping. Brad Frost's Atomic Design theory, proposed in 2013, decomposes components into five layers: Atom (single element like Button, Input), Molecule (multi-atom combination like Search Field = Input + Button), Organism (relatively independent functional block like Header, Card List), Template (page skeleton), and Page (final state with content filled in). In Figma, it is recommended to create a corresponding Page for each layer and drag components to the corresponding Page name, e.g., Atoms/Button/Primary, Molecules/SearchField/Default, Organisms/Header/LoggedIn.
The engineering significance of this grouping method is: AI can 'assemble from the bottom up' when generating complex UI—first generate the Button atom, then the SearchField molecule, and finally combine them into the Header organism. If all components are laid out flat under a single Page, AI lacks the semantic hint of 'which layer this component belongs to' when referencing, making it prone to structural misalignment, like calling a Header directly as a Button. [Observation] In Codex tests, a clearly grouped design system showed a significant improvement in component reuse rate compared to a flat design system—this conclusion is based on the generation results of two comparison cases in Cole's demo.
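The layer prefix also makes the grouping checkable by script. The sketch below flags references that point upward in the hierarchy; treating same-layer references as allowed is an assumption made here, and the function names and input shape are hypothetical.

```python
LAYER_ORDER = {"Atoms": 0, "Molecules": 1, "Organisms": 2, "Templates": 3, "Pages": 4}


def layer_of(component: str) -> int:
    """Read the layer from the name prefix, e.g. 'Atoms/Button/Primary' -> 0."""
    prefix = component.split("/", 1)[0]
    if prefix not in LAYER_ORDER:
        raise ValueError(f"{component!r} has no recognised layer prefix")
    return LAYER_ORDER[prefix]


def upward_references(references: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (parent, child) pairs where the child sits in a higher layer."""
    return [
        (parent, child)
        for parent, children in references.items()
        for child in children
        if layer_of(child) > layer_of(parent)
    ]
```

Run against a generated component tree, an empty result means the output was assembled bottom-up; any pair returned points at exactly the kind of structural misalignment described above.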
Training method: How to write custom Skill files
Once the three-layer structure is in place, how do you 'feed' it to AI? Cursor Skills provide a lightweight mechanism: place a .cursor/skills/ directory in the repository root, and put Markdown files describing the design system contract inside. [Source: docs.cursor.com Skills documentation]
It is recommended that the Skill file contain four parts: a declaration (what system this is, version number, update time), a variable list (use a table to list the four categories of variables), a text style list (use a table to list six to eight levels), and component reference rules (which components are allowed to be referenced, and which variables must be included when referencing). An example Markdown format:
# Design System Context (v3.2, updated 2026-04)
This system covers both Web and iOS. All AI generation must strictly follow it.
## 1. Variables
- color: Only reference the semantic layer; hexadecimal values are prohibited.
- spacing: Multiples of 4; space/md=16 is the baseline.
- radius: Only three tiers: none / md / full.
- size: Only the three control tiers: sm / md / lg.
## 2. Text Styles
- Strictly use the 8 defined levels; custom font sizes are prohibited.
- Each style must be bound to a font-size variable and a line-height variable.
## 3. Components
- When referencing, components must be layered by Page (Atoms/Molecules/Organisms).
- References must point downward only; e.g., Atoms must not reference Molecules or Organisms.
[Video Fact] After Cole placed this Skill file in the repository root in his demo, the 'hallucination rate' of Codex when generating new components dropped significantly—specifically, places that would have output #3B82F6 all became var(--color-action-primary), and places that would have output font-size: 15px all became var(--font-size-body).
Criteria for training completion and regression testing
Whether training is complete cannot be judged by feel; it needs quantifiable regression testing. [Data] A commonly used criterion is: have Codex build a new component (e.g., a DateRangePicker that has never appeared in the design system), then use a script to check the output code. If more than 90% of color fields use var(--...), all spacing fields hit the space/* namespace, and all font size fields hit the font-size/* namespace, the training is considered passed. If color coverage falls short of that, or any bare pixel value appears in a spacing or font size field, the training is considered incomplete, and the 'prohibited rules' section of the Skill file needs to be revised.
Another judgment dimension is 'combination correctness': the internal structure of the new component should be decomposable into known atoms and molecules. For example, DateRangePicker should be decomposable into the three registered components Input + Button + Divider, rather than Codex generating three new components out of thin air. [Practical] This kind of regression testing can be automated in the CI pipeline—each time the Skill file is updated, run a full set of reconstruction tasks and output a report comparing the two indicators 'token hit rate' and 'component reuse rate'.
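The two indicators can be computed with a short script once the generated values and component names have been extracted. This is a sketch under stated assumptions: extraction from the generated code is out of scope, the namespaces are assumed to surface as `var(--space-*)` and `var(--font-size-*)`, and 'over 90%' is read as strictly greater than 0.90.

```python
def hit_rate(values: list[str], prefix: str) -> float:
    """Share of values that are a var() reference starting with `prefix`."""
    if not values:
        return 1.0
    hits = sum(1 for value in values if value.strip().startswith(f"var({prefix}"))
    return hits / len(values)


def component_reuse_rate(used: list[str], registered: set[str]) -> float:
    """Share of components in the output that already exist in the design system."""
    if not used:
        return 0.0
    return sum(1 for name in used if name in registered) / len(used)


def training_passed(
    colors: list[str],
    spacings: list[str],
    font_sizes: list[str],
    color_threshold: float = 0.90,
) -> bool:
    """Colors must exceed the threshold; spacing and font size must all hit."""
    return (
        hit_rate(colors, "--") > color_threshold
        and hit_rate(spacings, "--space-") == 1.0
        and hit_rate(font_sizes, "--font-size-") == 1.0
    )
```

Tracking both numbers per Skill file version gives the CI report a simple shape: two rates per run, compared against the previous run.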
It is worth noting that training is not one-time. Whenever the design system adds a new type of semantic naming (e.g., adding color/state/info), or when Figma Variables upgrades to a new version causing namespace changes, the Skill file needs to be updated and regression tests re-run. Cole also emphasized in his demo that the Skill file should be treated as the 'external API documentation of the design system'—its version number must be strictly aligned with the version number of the Figma file.
Summary
The three-layer structure (Variable Table → Text Styles → Component Grouping) is essentially an 'AI-parsable design system contract'. The variable table answers 'what is this', text styles answer 'how to read it', and component grouping answers 'how to combine it'. By explicitly declaring this three-layer structure through a custom Skill file, and pairing it with a regression test for a token hit rate of over 90%, AI can switch from 'freestyling' to 'executing by contract' when generating UI. The portability of this method is also worth mentioning: the same Skill writing approach can be ported to different AI tools like Claude Design, Codex CLI, and Google Stitch, as long as they support loading context files. The next step enters the 'quality acceptance' phase of the entire workflow: how to determine whether the AI-generated UI is 'usable' or 'up to standard'.
What not to let AI do: Two anti-patterns for variable library construction and basic component construction
In the AI-assisted design workflow, there is a very subtle but costly temptation: to let AI also build the 'foundation'. The 'foundation' here refers to the two lowest-level assets of the design system—the Variable Library and the Basic Component Library (Button / Input / Card, etc.). Many teams, when first integrating Figma MCP, Claude Design, or Codex, instinctively ask AI to 'build a set from scratch', only to find within two weeks that the maintenance cost far exceeds the time saved initially. Cole repeatedly emphasized: variables and basic components are territories AI should not touch. The correct approach is for designers to write them by hand first, and then let AI assemble according to the rules.
Anti-pattern 1: Letting AI generate Variables from scratch
Variables are the 'atoms' of the design system. They abstract discrete design decisions like colors, spacing, corner radii, and font sizes into reusable semantic units. A qualified set of variables must satisfy at least three conditions: semantic naming (e.g., color/action/primary/hover instead of blue-500), reference relationships (one variable references another, forming a directed acyclic graph), and alignment with the downstream token system (e.g., Material 3's three-layer structure of reference / system / component; Source: Material 3 Design Tokens documentation).
A set of variables generated from scratch by AI will almost certainly fail all three conditions. The LLM's training corpus is full of code snippets that 'look like a design system', but it has never learned the underlying decision logic, such as why 'surface-1 should reference neutral-95 while surface-2 should not reference neutral-90'. The set AI provides is often 'usable but isolated': individual variables look reasonable, but there is no reference graph between them, the naming leans towards an engineer's style rather than a designer's, and there are no extension points reserved for future dark mode, density, or accessibility.
// Typical AI-generated variable naming (anti-pattern)
color-blue-500: #3B82F6
color-blue-600: #2563EB
button-bg: #3B82F6 // Direct hex, no reference
// Designer-written variable naming (correct pattern)
color/action/primary/default: { ref: color/palette/blue/500 }
color/action/primary/hover: { ref: color/palette/blue/600 }
color/surface/raised/default: { ref: color/palette/neutral/50 }
The consequences are real: you save 2 hours in the first sprint, but spend 6 hours in the second sprint to add semantics, add references, and add mode switching. By the third sprint, you almost certainly have to discard the entire variable library and rewrite it by hand. Cole gives a very practical judgment criterion—if AI's output requires more than 50% manual modification to be usable, then this task should not be done by AI [Video Fact][Data].
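The 'reference graph' condition is easy to check by machine. The sketch below assumes a hypothetical flat export where each variable is either a raw value or a `{"ref": ...}` pointer (Figma's actual export format may differ); `resolve`, `lint`, and the `color/palette/` prefix rule are illustrative names, not an official API.

```python
import re

HEX = re.compile(r"^#[0-9A-Fa-f]{3,8}$")

# Hypothetical export: name -> raw value, or {"ref": other_name}.
VARIABLES = {
    "color/palette/blue/500": "#3B82F6",
    "color/palette/blue/600": "#2563EB",
    "color/action/primary/default": {"ref": "color/palette/blue/500"},
    "color/action/primary/hover": {"ref": "color/palette/blue/600"},
    "button/bg": "#3B82F6",  # anti-pattern: direct hex, no reference
}


def resolve(name, variables, seen=()):
    """Follow references to a raw value; fail on cycles or dangling refs."""
    if name in seen:
        raise ValueError("cycle: " + " -> ".join(seen + (name,)))
    if name not in variables:
        raise ValueError(f"dangling reference: {name}")
    value = variables[name]
    if isinstance(value, dict):
        return resolve(value["ref"], variables, seen + (name,))
    return value


def lint(variables, primitive_prefix="color/palette/"):
    """Report raw hex outside the primitive layer and broken references."""
    problems = []
    for name, value in variables.items():
        is_raw_hex = isinstance(value, str) and HEX.match(value)
        if is_raw_hex and not name.startswith(primitive_prefix):
            problems.append(f"{name}: raw hex outside the primitive layer")
        try:
            resolve(name, variables)
        except ValueError as err:
            problems.append(f"{name}: {err}")
    return problems


for problem in lint(VARIABLES):
    print(problem)
```

Run against the two naming styles above, only the AI-style `button/bg` entry is flagged; the designer-written semantic variables resolve cleanly to palette values.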
Anti-pattern 2: Letting AI generate Button / Input / Card from scratch
Basic components are more complex than variables. A qualified Button must include at least: five states (default / hover / pressed / focused / disabled), size variants (sm / md / lg), icon variants (leading / trailing / only), and component properties (label text, disabled, loading) for downstream invocation. Figma's official documentation on component properties explicitly lists four types: instance swap, text property, boolean property, and variant property (Source: Figma Component Properties official documentation).
AI-generated Buttons usually look beautiful—modern colors, comfortable corner radii, natural shadows—but they almost certainly lack three things: states, variants, and properties. It delivers a 'perfect screenshot', but this screenshot cannot be repeatedly called in a real product because the designer would have to manually add the disabled state to 30 pages one by one. This directly destroys the original purpose of componentization (define once, reuse everywhere).
[Observation] In the review sessions of multiple enterprise internal trainings, participants reported that in projects where 'AI built the Button from scratch', downstream callers almost all reverted to detached instances with manual modifications, and the benefits of componentization were negated by the one-off screenshot. This is completely opposite to the componentization best practices released by Figma in 2025.
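The Button checklist above can be turned into a completeness check. This is a minimal sketch: the dictionary shape used to describe a component is an assumption made for illustration, not the structure Figma MCP returns.

```python
# Required variant axes and properties, taken from the Button checklist above.
REQUIRED_VARIANTS = {
    "state": {"default", "hover", "pressed", "focused", "disabled"},
    "size": {"sm", "md", "lg"},
    "icon": {"leading", "trailing", "only"},
}
REQUIRED_PROPERTIES = {"label", "disabled", "loading"}


def missing_parts(component: dict) -> dict:
    """Return the variants and properties a component still lacks."""
    gaps = {}
    variants = component.get("variants", {})
    for axis, required in REQUIRED_VARIANTS.items():
        missing = required - set(variants.get(axis, []))
        if missing:
            gaps[axis] = sorted(missing)
    missing_props = REQUIRED_PROPERTIES - set(component.get("properties", []))
    if missing_props:
        gaps["properties"] = sorted(missing_props)
    return gaps


# Hypothetical description of an AI-generated Button: looks fine, lacks depth.
ai_button = {
    "variants": {"state": ["default", "hover"], "size": ["md"]},
    "properties": ["label"],
}

for axis, missing in missing_parts(ai_button).items():
    print(f"{axis}: missing {', '.join(missing)}")
```

An empty result means the component meets the minimum bar; anything else is a list of work the designer would otherwise discover page by page.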
Correct approach: Constitution and enforcer
Cole uses a very apt engineering metaphor to explain this principle: Variables and basic components are the 'constitution' of the design system; AI is the 'enforcer after legislation'—the enforcer cannot legislate [Video Fact]. The constitution must be written by humans, because the legislative process itself is a trade-off for the business, the brand, accessibility, and the future; AI cannot make these trade-offs; it can only mechanically execute the already defined rules.
The implementation order must be rigid: First, designers hand-write the variable table based on mature design systems (like Material 3 and IBM Carbon), following a layered token structure such as the one documented for IBM Carbon (Source: IBM Carbon Colors). Second, designers hand-write basic components like Button / Input / Card / Modal, fully defining auto layout, variants, and properties. Third, AI intervenes, using the variable table and component library to assemble pages, generate new components, and batch-adjust density.
The true value of this sequence is that it is mechanically verifiable—if AI uses a hex value outside the variable library, it can be caught with one click; if AI uses an unregistered basic component, it will be immediately intercepted by Figma MCP's schema validation. In this system, AI is an 'amplifier' rather than a 'legislator'; its value lies in execution speed and consistency, not in creative decision-making.
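As a rough illustration of 'mechanically verifiable', the sketch below walks a generated page tree and flags hex values outside the variable library and components that are not registered. The node shape, the `audit` function, and the sample library are hypothetical; this is not Figma MCP's own validation.

```python
import re

HEX = re.compile(r"#[0-9A-Fa-f]{6}\b")


def audit(node, library_hex, registered, path="page"):
    """Recursively collect rule violations in a generated page tree."""
    findings = []
    component = node.get("component")
    if component is not None and component not in registered:
        findings.append(f"{path}: unregistered component {component!r}")
    for prop, value in node.get("styles", {}).items():
        for hex_value in HEX.findall(value):
            if hex_value.upper() not in library_hex:
                findings.append(
                    f"{path}.{prop}: hex {hex_value} is not in the variable library"
                )
    for index, child in enumerate(node.get("children", [])):
        findings.extend(audit(child, library_hex, registered, f"{path}/{index}"))
    return findings


# Assumed inputs: resolved library colors and hand-written base components.
LIBRARY_HEX = {"#3B82F6", "#2563EB"}
REGISTERED = {"Button", "Input", "Card"}

page = {
    "children": [
        {"component": "Button", "styles": {"background": "#3B82F6"}},
        {"component": "FancyCard", "styles": {"background": "#FF00AA"}},
    ]
}

for finding in audit(page, LIBRARY_HEX, REGISTERED):
    print(finding)
```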
The 50% threshold as a decision gate
Using the 50% threshold as a hard decision gate can avoid a lot of ineffective AI investment: having AI write a brand new page, and then manually adjusting only 10–20% of the details (replacing images, editing copy, fine-tuning spacing) is a high ROI scenario; having AI build a variable library from scratch, and then manually rewriting names, filling in 100% of the reference relationships, and adding dark mode adaptation is a negative ROI scenario. Use AI for things where it 'saves 80%' of the effort, not for things where you 'have to change 60%'—this principle has been repeatedly validated in multiple AI design workflow cases before 2026-06-26 [Practical].
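One way to make the 50% gate measurable is to compare AI's output with the version that finally shipped. The sketch below uses Python's standard `difflib` as a rough proxy for 'how much had to change'; treating line-level similarity as the modification ratio is an assumption of this example, not Cole's definition.

```python
import difflib


def modification_ratio(ai_output: str, final_version: str) -> float:
    """Rough share of content that had to change, from line-level similarity."""
    matcher = difflib.SequenceMatcher(
        a=ai_output.splitlines(), b=final_version.splitlines()
    )
    return 1.0 - matcher.ratio()


def should_delegate(ratio: float, threshold: float = 0.5) -> bool:
    """Keep giving this kind of task to AI only if changes stay within the gate."""
    return ratio <= threshold


# A page where one line in five was adjusted by hand.
page_ai = "header\nhero\ncard grid\npricing\nfooter"
page_final = "header\nhero\ncard grid (3 columns)\npricing\nfooter"

# A variable library that was almost entirely rewritten.
vars_ai = "blue-500\nblue-600\nbutton-bg\nneutral-50"
vars_final = "color/action/primary/default\ncolor/action/primary/hover\ncolor/surface/raised/default\nneutral-50"

for label, ai, final in [("page", page_ai, page_final), ("variables", vars_ai, vars_final)]:
    ratio = modification_ratio(ai, final)
    print(f"{label}: {ratio:.0%} modified, delegate={should_delegate(ratio)}")
```

Tracking this ratio per task type over a few sprints gives the team evidence for which work belongs to AI, instead of a gut feeling.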
Write this principle into the team SOP: the creation of variable tables and basic components must be done manually by designers in Figma. AI tools can only read existing variables and components via Figma MCP to generate downstream products. Any AI tool attempting to write new content at the variable layer or basic component layer should be directly rejected by a CI-level check script, rather than relying on manual review as a safety net. Distinguishing between 'letting AI legislate' and 'letting AI enforce' is the watershed for an AI design workflow to move from demo to production.
Step 6: From samples to final product: Practical operation of Mobbin reference + multi-variant synthesis + Skills specification
Sample-driven design finalization: Mobbin reference, variant synthesis, Skills sedimentation
The key to a design system is not that 'tokens have been defined', but that every new page 'automatically lands on the tokens'. The task here is to first feed real-world finished products from the industry to AI, let it extract common patterns, then generate 4 variants for you to choose from, and finally solidify the visual decisions into reusable Skills files. The entire chain starts with Mobbin and ends with a GitHub repository.
Why start with 'real screenshots'
Cole repeatedly pointed out a counterintuitive practice [Observation]: vague instructions like 'refer to Mobbin' or 'follow iOS HIG' are equivalent to asking AI to create from memory, and memory is not a design system. To make AI output land on tokens, you must first put 5–10 real product interface screenshots directly into the context window. This is a hard trade-off of tokens for design precision, and there is no shortcut around it.
The specific execution can be divided into four steps:
| Step | Input | AI Tool | Output |
|---|---|---|---|
| Screenshot collection | 5–10 Mobbin search results | None | Image files |
| Pattern extraction | Screenshot set | Claude / Codex | Common pattern description |
| Variant synthesis | Pattern description + Brand Tokens | Codex | 4 HTML variants |
| Skills sedimentation | 4 variants + team review conclusion | Manual | Skills.md committed to GitHub |
Mobbin's three-dimensional retrieval capability
As of 2026-06-26, Mobbin's index library (https://mobbin.com/) covers over 1,000 products and 300,000+ real interface screenshots [Source: Mobbin official website], organized by three dimensions:
- Product dimension (by App / Brand): e.g., searching for 'Stripe Dashboard' or 'Notion Sidebar' yields the entire historical interface set of a product.
- Platform dimension (by iOS / Android / Web / macOS): the implementation differences of the same function across native, desktop, and responsive platforms are clear at a glance.
- Component dimension (by UI Patterns): e.g., specifically searching for 'Empty State', 'Onboarding', or 'Pricing Card' allows you to horizontally compare how 50 different products handle the same component.
Practical experience [Practical]: When working on a SaaS dashboard, first use the product dimension to lock in 8 Dashboard screenshots from 3 top competitors, then use the component dimension to supplement 4 close-ups of 'Data Table + Filter'. This saves about 2 rounds of back-and-forth conversation compared to letting AI freestyle.
[Data] In Figma's official documentation (https://help.figma.com/hc/en-us/sections/14506167347735) on Text Styles, the coverage rate of Variables and Styles is listed as a core indicator of Design System maturity—the screenshots provided by Mobbin are the most effective 'baseline comparison' before backfilling these indicators.
Don't 'refer to Mobbin'; feed 5–10 images
Directly pass the images as attachments to Claude / Codex, along with a structured prompt:
Please read the following 9 screenshots (all SaaS dashboards) and list the design patterns they share:
- Typical values for card corner radius, shadow, and padding
- Table row height and column spacing
- Font hierarchy (font size and weight for H1/H2/Body/Caption)
- Usage positions of accent and secondary colors
Please output in a Markdown table. Do not repeat the characteristics of each image; only summarize the 'commonalities'.
The key to this step is not which image AI chooses as a 'template', but forcing it to perform an intersection operation. If 5 out of 9 images use 8px corner radius, 3 use 12px, and 1 uses 16px, AI should output a range like 'primary corner radius 8–12px, secondary 4–6px', rather than arbitrarily picking one to copy.
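The intersection logic can be checked by hand with the corner radius numbers above. This is a minimal sketch; the `min_share` cutoff of 25% for what counts as 'common' is an assumption chosen for the example, not a value from the source.

```python
from collections import Counter


def common_range(observations, min_share=0.25):
    """Return (low, high) across values seen in at least min_share of samples."""
    counts = Counter(observations)
    total = len(observations)
    frequent = sorted(v for v, n in counts.items() if n / total >= min_share)
    if not frequent:
        return None  # no value is common enough to call a pattern
    return frequent[0], frequent[-1]


# Corner radii read off 9 screenshots: five use 8px, three 12px, one 16px.
radii = [8] * 5 + [12] * 3 + [16]

low, high = common_range(radii)
print(f"primary corner radius {low}-{high}px")  # the 16px outlier is excluded
```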
Multi-variant synthesis: Let Codex output 4 directions at once
After pattern extraction, enter the variant phase. Codex is suitable for 'batch generation + horizontal comparison' scenarios. It is recommended to have it output 4 HTML variants at once, covering a 2×2 matrix:
- High information density × Dense cards: Tables fill the container, no whitespace between columns; target user is a data analyst.
- High information density × More whitespace: Same amount of information, but modules are separated by 24–32px breathing space; target user is a product manager who 'wants functionality but fears oppression'.
- Low information density × Dense cards: Each card only contains 1–2 key metrics, making the visual focus more prominent; target user is a high-level reporting scenario.
- Low information density × More whitespace: Close to a 'marketing page' style, strong call-to-action, suitable for onboarding.
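The 2×2 matrix can be generated rather than typed four times, which keeps the variant briefs consistent. The sketch below only builds the prompt text; the wording of each brief and the function name `variant_briefs` are illustrative assumptions.

```python
from itertools import product

DENSITY = ["high information density", "low information density"]
LAYOUT = ["dense cards", "more whitespace"]


def variant_briefs(patterns: str) -> list:
    """Build one brief per cell of the density x layout matrix."""
    briefs = []
    for index, (density, layout) in enumerate(product(DENSITY, LAYOUT), start=1):
        briefs.append(
            f"Variant {index}: {density} x {layout}. "
            f"Apply these shared patterns: {patterns}"
        )
    return briefs


for brief in variant_briefs("card radius 8-12px, body text 14px"):
    print(brief)
```

The iteration order matches the list above, so 'Variant 2' is the high information density, more whitespace direction.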
[Observation] In actual selection, the 2nd variant (high information density + more whitespace) is usually the most easily accepted by the team—it balances the two originally opposing demands of 'functional completeness' and 'visual comfort'. Cole also mentioned when deconstructing his own project that being able to pick one '90-point' version from 4 variants is much faster than repeatedly modifying one version 4 times.
After the variants are generated, pull these 4 HTML files into Figma (you can use plugins like html.to.design, or have Cursor directly read the HTML and convert it to Figma JSON), conduct an internal review, and select a direction.
Distill 'common features' into Skills text
After the variant is selected, the next step is to reverse-engineer the Skills: reduce the visual decisions from images to plain text that can be searched and reused. For example:
# Skill: SaaS Dashboard Card
- Card corner radius: 12px
- Card padding: 16px
- Card shadow: 0 2px 8px rgba(0,0,0,0.08)
- Card background: var(--color-surface, #FFFFFF)
- Title font size: 14px / 600
- Subtitle font size: 12px / 400 / var(--color-text-secondary, #6B7280)
- Accent color: var(--color-accent, #3B82F6), used only for primary action buttons and key values
Once this text is written, the next time a new project starts, simply write 'Apply Skill: SaaS Dashboard Card' at the beginning of the prompt, and AI will directly use these values when generating, no longer relying on memory.
Empirical value [Practical]: A Skills description of about 200 words can stably allow AI to reuse the same design language across 3–5 new pages; if compressed to under 50 words, AI's fidelity drops rapidly; if expanded to over 800 words, the model's attention is diluted, and key values are not fully retained. 200 words is the sweet spot.
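A length check like the following could run before a Skills file is committed. It is a sketch only: counting whitespace-separated words is a crude measure, and the function name `skill_length_status` is hypothetical; the 50 / 200 / 800 figures come from the empirical values above.

```python
def skill_length_status(text: str, low: int = 50, target: int = 200, high: int = 800):
    """Classify a Skills description by word count against the empirical bounds."""
    words = len(text.split())
    if words < low:
        return words, f"too short: under {low} words, fidelity is likely to drop"
    if words > high:
        return words, f"too long: over {high} words, key values may be diluted"
    return words, f"ok: within range, target is about {target} words"


sample_skill = """# Skill: SaaS Dashboard Card
- Card corner radius: 12px
- Card padding: 16px
- Card shadow: 0 2px 8px rgba(0,0,0,0.08)
"""

count, status = skill_length_status(sample_skill)
print(f"{count} words -> {status}")
```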
Final product judgment: Dual coverage rate of Variables × Text Styles
After landing the selected variant from the 4 into Figma, the final quality gate is a component-level checklist:
| Indicator | Passing Threshold | Check Method |
|---|---|---|
| Variables coverage rate | ≥ 95% | Figma → Inspect Panel, every fill / radius / padding must point to a Variable |
| Text Styles coverage rate | 100% | Every text layer must be bound to a Text Style; 'bare font sizes' are prohibited |
| Component reuse rate | ≥ 80% | Repeated buttons, cards, inputs must use Component / Instance |
| Number of Skills files | ≥ 1 | At least one .md / .yaml file describing the visual decisions of this project |
Figma's official design system guide for Variables (https://help.figma.com/hc/en-us/articles/14506921481239) explicitly recommends: treat Variables as the 'atoms' of the Design System, Components as the 'molecules', and Skills documentation as the 'operation manual'—the three form an indispensable triangle.
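The checklist can be expressed as a single gate function. In this sketch the counts are assumed to be collected by hand or by a plugin; the `stats` structure and the function names are hypothetical, while the thresholds mirror the table above.

```python
# Passing thresholds from the checklist table.
THRESHOLDS = {"variables": 0.95, "text_styles": 1.0, "component_reuse": 0.80}


def coverage(bound: int, total: int) -> float:
    """Share of items bound to the design system; an empty set counts as covered."""
    return 1.0 if total == 0 else bound / total


def quality_gate(stats: dict, skills_files: int) -> dict:
    """Return {indicator: (value, passed)} for every row of the checklist."""
    report = {}
    for key, threshold in THRESHOLDS.items():
        bound, total = stats[key]
        value = coverage(bound, total)
        report[key] = (value, value >= threshold)
    report["skills_files"] = (skills_files, skills_files >= 1)
    return report


# Hypothetical counts: (bound or reused items, total items).
stats = {
    "variables": (192, 200),
    "text_styles": (48, 50),
    "component_reuse": (41, 50),
}

report = quality_gate(stats, skills_files=1)
for name, (value, passed) in report.items():
    print(f"{name}: {value} -> {'pass' if passed else 'FAIL'}")
print("ship" if all(passed for _, passed in report.values()) else "fix before shipping")
```

Here the page fails only on Text Styles, because two text layers still carry 'bare font sizes'.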
Compounding value: Commit Skills to the team repository
Using Skills for a single project 'saves time'; committing Skills to the team repository saves time for all future team projects. The specific approach:
- Create a /design-skills/ directory in the GitHub repository, organized by module into subdirectories (/card/, /form/, /dashboard/).
- Write each Skills file in Markdown or YAML, with an 'applicable scenario' description at the beginning.
- Commit using the Conventional Commits (https://www.conventionalcommits.org/) specification: feat(skills): add SaaS Dashboard Card skill
- Maintain a Skills index table in the README, allowing new members to pick and use on demand from day one.
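Two of these habits are easy to automate. The sketch below checks a commit subject against a simplified Conventional Commits pattern and renders the README index table from a mapping. The allowed type list is an assumption (the specification itself only mandates feat and fix), and the file paths are hypothetical.

```python
import re

# Simplified pattern: type, optional (scope), optional "!", then ": description".
COMMIT = re.compile(r"^(feat|fix|docs|refactor|chore)(\([a-z0-9-]+\))?!?: \S.*$")


def is_conventional(message: str) -> bool:
    """Check only the first line (subject) of a commit message."""
    lines = message.splitlines()
    return bool(lines) and bool(COMMIT.match(lines[0]))


def index_table(skills: dict) -> str:
    """Render a Markdown index of Skills files for the README."""
    rows = ["| Skill file | Applicable scenario |", "|---|---|"]
    rows += [f"| {path} | {scenario} |" for path, scenario in sorted(skills.items())]
    return "\n".join(rows)


# Hypothetical repository contents.
skills = {
    "/design-skills/dashboard/saas-card.md": "SaaS dashboard metric cards",
    "/design-skills/form/login.md": "Sign-in and sign-up forms",
}

print(is_conventional("feat(skills): add SaaS Dashboard Card skill"))
print(is_conventional("added a card skill"))
print(index_table(skills))
```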
GitHub's official 'Repository best practices' (https://docs.github.com/en/repositories/creating-and-managing-repositories/best-practices-for-repositories) recommends managing 'team knowledge assets' with the same standards as code—Skills documentation falls into this category and should go through the PR review process, rather than being scattered in someone's Notion.
[Data][Forecast] According to industry observation, as of 2026-06-26, design teams that can solidify ≥ 20 standardized Skills documents in a GitHub repository can typically compress the 'visual alignment' cycle for a new project from 0 to 1 from 2 weeks to 3 days, provided that the Skills files themselves are strictly maintained and not abandoned. This is the true 'compounding effect' of a design system.
Summary
Sample-driven finalization is essentially pulling AI back from 'designing from memory' to 'designing by tokens': Mobbin provides real references, 5–10 screenshots provide pattern input, 4 variants provide horizontal selection space, and Skills documentation provides long-term compounding. All four steps are indispensable—without Mobbin, AI returns to impressionism; without multiple variants, the team has no room for selection; without Skills, the next new project has to start from scratch again. The moment Skills are committed to the repository, the team truly possesses a 'design system that grows itself'.
Predictions for the end of 2026: Three judgments on the AI design tool landscape
Disclaimer: The following three judgments are forward-looking speculations based on public information, community trends, and official iteration cadences as of 2026-06-26. They do not constitute investment or tool selection advice [Forecast].
Judgment 1: Stitch's desktop version will likely enter a usable range around 2026-12
As of 2026-06-26, Google Stitch at https://stitch.withgoogle.com is still primarily a web-based generation tool, with a clear lack of native desktop experience [Video Fact]. Cole explicitly pointed out in multiple demos that Stitch's current web interface output still lags behind the maturity of desktop design tools in dimensions like responsive breakpoints, hover states, and complex layer blend modes [Observation].
[Data] Cole disclosed in public demos that Stitch's 'prompt → first screen' generation median time has been narrowed down to seconds, but complex interactive components still require 3-5 rounds of manual adjustment to be production-ready [Video Fact]. Looking at the iteration cadence from https://stitch.withgoogle.com, Google released multiple updates between 2026-02 and 2026-06, focusing on 'multi-frame generation + component library recognition + Material token integration'. These three capabilities are precisely the key prerequisites for whether the desktop version can handle serious design tasks [Observation].
The judgment is based on three points [Forecast]. First, Google's engineering priority usually follows a 'web first, desktop later' rhythm for its product matrix, and the gap for Stitch to reach mature web experience has significantly narrowed. Second, after the surge of AI-generated UI in the second half of 2025, the quality ceiling for desktop itself is lowering; tools like Figma AI and Recraft have already pulled desktop quality close to native experience. Third, Google's accumulation in Material Design means its design system's tokenization capability can be directly reused by Stitch, an engineering asset that other AI design tools will find hard to replicate in the short term.
It is expected that around 2026-12, Stitch's desktop version will likely enter a 'usable' range, capable of covering design delivery for medium-complexity products. However, for fine-grained layer editing and team-level design system collaboration, more time will be needed.
Judgment 2: Claude Design will likely semi-open Skills around 2026-12
As of 2026-06-26, Anthropic has not yet opened user-customizable Skills import capability in Claude Design; the design system is primarily based on built-in presets [Video Fact]. Cole emphasized in multiple demos that this boundary is currently Claude Design's biggest limitation—it determines that Claude Design can only serve general design scenarios for now, and cannot meet the fine-grained requirements of enterprise-level brand assets [Observation].
Looking at the release cadence from https://www.anthropic.com/news, Anthropic's pace of opening Skills / Projects / Artifacts has shown a clear acceleration between 2025-06 and 2026-06 [Observation]: from Skills experiments within Claude.ai, to engineering-level Skills in Claude Code (CLAUDE.md / .claude/skills), to visual extensions in Claude Design, openness has been extended one product at a time [Forecast]. This rhythm of 'first validate within Claude.ai, then replicate to the design side' means that the technical risk for Claude Design to open Skills is already very low; it's more a matter of product strategy and brand control trade-offs.
It is expected that around 2026-12, Anthropic will likely first release limited Skills import capability in the form of 'whitelist + official templates', with full openness possibly not arriving until mid-2027.
Judgment 3: Figma MCP will become the de facto standard for 'design-code' collaboration
As of 2026-06-26, the Model Context Protocol (https://modelcontextprotocol.io) has become the de facto standard for cross-tool 'context interoperability' under Anthropic's leadership [Video Fact]. Figma's official adoption of MCP is the key step for this standard to expand into the design domain. Once the official Figma MCP server stabilizes, Figma's competitors (Sketch, Penpot, Framer, Recraft) will likely release compatible implementations by the end of 2026, or face the ecosystem pressure of 'code-side refusal to interface' [Forecast].
The core value of Figma MCP is to losslessly transmit 'design intent' to the coding Agent: color variables, spacing tokens, component props, and auto layout constraints can all be directly fed to tools like Claude Code, Codex CLI, and Cursor via MCP [Source: modelcontextprotocol.io official documentation]. Cole explicitly stated in his demo that after enabling Figma MCP, the manual adjustment time from 'design draft → first runnable component' dropped from minutes to 30 seconds [Video Fact][Data].
It is expected that around 2026-12, MCP will become the de facto standard for design-code collaboration, and Figma competitors will face a substantial competitive disadvantage if they are not compatible.
Recommendations for designers
The core skill shifts from 'producing images' to 'defining the design system + writing Skills'—the former will be replaced by AI, the latter is the moat for designers in the AI era [Observation]. Specifically:
- Decompose the design system into a three-layer structure of token + component + use case: Colors, font sizes, spacing, shadows, and corner radii are all tokenized; components are prop-ified; complex states are written into use case documentation. This is the smallest semantic unit AI can understand.
- Write Skills description files for the team: Similar to Cursor's AGENTS.md (refer to https://docs.cursor.com official documentation) or Claude's .claude/skills directory, use natural language to describe 'when to use this component and when not to'.
- Front-load the 'image production' work into Figma's auto layout and variant properties: Let AI work on a structured canvas, rather than pixel-level manual adjustments.
Recommendations for independent developers
Treat AI design tools as 'frontend colleagues'. Instead of worrying about being replaced, learn how to write Skills for AI [Observation]. Specific path:
- Choose 1-2 tools from Figma MCP / Claude Design / Stitch to go deep: Don't try to use all of them; the tools differ significantly in token expression and component recognition accuracy.
- Learn to write 'design intent manuals': A good manual allows AI to generate interface quality close to delivery in one go, rather than requiring 10 iterations.
- Establish a personal design tokens repository: Even if it's only for 3-5 projects, reusing the same set of tokens across projects will make AI's output increasingly 'look like you'.
Recommendations for teams
Establish a team-level Skills repository (GitHub), where all AI tools share the same design language—this will be the core competitive advantage around 2026-12 [Observation]. Implementation suggestions:
- Repository structure: Reference the AGENTS.md pattern from https://docs.cursor.com official documentation + the dual-layer structure of Figma's official design system repository.
- CI checks: Use GitHub Actions to run consistency checks on design tokens, token naming conventions, and component prop completeness.
- Weekly sync on design changes: Design system changes should first be reviewed in the Skills repository, then synced to the Figma library and code component library.
By around 2026-12, a team's competitiveness will no longer be measured by 'number of designers', but by 'the depth to which the team's design system can be understood by AI'. Treating the Skills repository as a first-class citizen is the most worthwhile engineering investment for roughly the next 6 months.
In summary, the three judgments point to the same conclusion: the differentiation window for AI design tools is narrowing, and the structurability of design language and the depth to which it can be consumed by AI will become the new moat. Designers, independent developers, and teams all need to build 'making AI understand' as a core capability.
Start now
After reading 80,000 words, you have mastered all the core points of the AI design workflow. For the next 4 weeks, it is recommended:
- This week: Check if the Figma Use Skill is already installed in your Figma file, and test if Figma is already Connected in Connectors—this is the prerequisite for AI to read Figma files.
- Second week: Use Google Stitch to run 3-5 Variations for early exploration. After clarifying the direction you want to take, enter Claude Design to generate a high-fidelity first draft, avoiding repeated trial and error in Claude Design that wastes tokens.
- Third week: After pushing the Claude Design first draft to Figma, send it to Codex for reconstruction. Use Codex for large-scale iteration—leave Claude Code for key decision points, and Codex for daily iterations.
- Fourth week: Write 3 Skills Markdown files (Variables / Text Styles / Component Grouping) for your design system, and upload them to the Skills directories of Claude and Codex respectively—this is the industrial-grade practice for AI to truly learn your design system.