
The AI Developer's Interview Playbook: Multi-Model Workflows, Context Governance, and Agent Architecture

AI Development Related Interview Questions Collection (Including Demo + Deep Dive Simulation)

Three layers per question: ① Core Answer ② Runnable Demo ③ Deep Dive Q&A Simulation Dialogue


1. What are the temperaments of GPT / Gemini / Claude, and how to combine them in a multi-model workflow

Core Answer

| Model | Strengths | Weaknesses |
| --- | --- | --- |
| GPT-4o | Broad ecosystem, stable tool calling, strong multimodal, fast response | Sometimes overly eager to please, insufficient depth in long reasoning |
| Gemini 2.0 | Ultra-long context (1M tokens), good Google ecosystem integration | Instruction following sometimes drifts, slightly less stable code |
| Claude Sonnet/Opus | Precise instruction following, high code quality, strong long-text processing | Smaller tool ecosystem, sometimes overly cautious |

Selection mantra: Claude does the work, GPT reviews, Gemini stuffs the material (ultra-long context).

Multi-model collaboration architecture:

Claude Code (4-5 tabs parallel implementation)
  ├── Tab A: feature/search implementation
  ├── Tab B: feature/auth implementation
  └── Tab C: solution design / architecture review

Codex / GPT-4 (independent session)
  └── Critical review of Tab A/B output
      —— No shared context, ensuring independence

Workflow sedimentation:


🔧 Demo: Unified Multi-Model Access via OpenRouter

// openrouter-demo.ts  ——  One key routes to different models
// Run: npx ts-node openrouter-demo.ts

const BASE = 'https://openrouter.ai/api/v1/chat/completions';
const KEY  = process.env.OPENROUTER_API_KEY!;

async function callModel(model: string, prompt: string) {
  const res = await fetch(BASE, {
    method: 'POST',
    headers: { Authorization: `Bearer ${KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
    }),
  });
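  // Fail loudly on HTTP errors instead of crashing later on a missing `choices` field
  if (!res.ok) throw new Error(`OpenRouter request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content as string;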
}

async function reviewWorkflow(task: string) {
  console.log('=== Step 1: Claude Implementation ===');
  const implementation = await callModel(
    'anthropic/claude-sonnet-4-5',
    `Please implement the following feature, output code only: ${task}`
  );
  console.log(implementation);

  console.log('\n=== Step 2: GPT-4o Review ===');
  const review = await callModel(
    'openai/gpt-4o',
    `Please perform a critical review of the following code, listing potential issues:\n\n${implementation}`
  );
  console.log(review);
}

reviewWorkflow('Implement a React Hook for a search input box with debounce');

💬 Deep Dive Q&A Simulation

Q: How do you ensure the two models don't influence each other, for example, that GPT's review isn't biased by Claude's implementation approach?

The key is context isolation. The GPT review session only receives the final code artifact, not Claude's thought process or intermediate dialogue. My approach is to copy the code into a new GPT session after Claude finishes, with a prompt containing only the code + "Please independently and critically review," without any of Claude's explanations. This way, GPT evaluates from its own perspective, not defending Claude's work.
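
The isolation step can be sketched as a small helper that keeps only the code artifact and starts a fresh message list. This is a minimal sketch under the assumption that the implementing model wraps its code in Markdown fences; `extractCodeBlocks` and `buildReviewMessages` are hypothetical names, not part of any SDK.

```typescript
// Hypothetical helpers: keep only the code artifact from the implementer's reply.
type ChatMessage = { role: 'user'; content: string };

// Collect the bodies of all fenced code blocks; the prose between them is dropped.
function extractCodeBlocks(reply: string): string[] {
  const fence = '`'.repeat(3);
  const pattern = new RegExp(fence + '[^\\n]*\\n([\\s\\S]*?)' + fence, 'g');
  const blocks: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(reply)) !== null) blocks.push(m[1].trimEnd());
  return blocks;
}

// Build a brand-new conversation for the reviewer: no history, no explanations.
function buildReviewMessages(implementerReply: string): ChatMessage[] {
  const blocks = extractCodeBlocks(implementerReply);
  // If the reply has no fences, assume the whole reply is code.
  const code = blocks.length > 0 ? blocks.join('\n\n') : implementerReply.trim();
  return [
    { role: 'user', content: `Please independently and critically review this code:\n\n${code}` },
  ];
}
```

The returned array could be passed as the `messages` field of a new request, so the reviewer never sees the implementer's reasoning.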

Q: You say Claude follows instructions precisely, can you give a specific example?

I have a scenario: asking the AI to modify only one specific function, without touching other files. Claude generally adheres strictly. GPT-4 sometimes "conveniently" changes a few other places it thinks are problematic. In multi-file projects, this difference has a huge impact because you don't know what it changed, increasing review costs.

Q: Have you actually used Gemini's ultra-long context, and did you encounter any pitfalls?

Yes, I've used it, mainly to dump an entire repo in to ask architectural questions. The pitfall is: the longer the context, the easier the model gets "lost," with attention to early information dropping. In practice, after exceeding 200k tokens, the accuracy of answers visibly declines. So I don't blindly dump everything; I filter files first, only including relevant modules.
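
The "filter first" step might look like the sketch below. It assumes a rough heuristic of about 4 characters per token (an approximation, not a real tokenizer), and `selectFilesForContext` is a hypothetical helper name.

```typescript
interface RepoFile { path: string; content: string }

// Rough heuristic (assumption): about 4 characters per token for English text and code.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep only files under the relevant module prefixes,
// then add them smallest-first until the token budget is used up.
function selectFilesForContext(
  files: RepoFile[],
  relevantPrefixes: string[],
  maxTokens: number,
): RepoFile[] {
  const candidates = files
    .filter(f => relevantPrefixes.some(p => f.path.startsWith(p)))
    .sort((a, b) => a.content.length - b.content.length);
  const selected: RepoFile[] = [];
  let used = 0;
  for (const file of candidates) {
    const cost = estimateTokens(file.content);
    if (used + cost > maxTokens) break;
    selected.push(file);
    used += cost;
  }
  return selected;
}
```

A budget well below the point where answers start to degrade (the 200k figure above) leaves room for the question and the reply.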


2. AI Context Governance (Pollution, Rollback, Sub-Agent Isolation, Compression)

Core Answer

| Problem | Strategy |
| --- | --- |
| Context Pollution | Start new chat / /clear / compact; only bring in relevant content |
| Sub-Agent Isolation | Sub-tasks launch independent Agents, parent Agent only sees summary results |
| Context Compression | For ultra-long conversations, have the model summarize the current state, start a new chat and paste the summary to continue |
| Skill Injection | Externalize norms into skill files, inject on demand, don't pollute the main dialogue |
| Version Rollback | Commit at important milestones; if AI messes up, directly git checkout |

🔧 Demo: CLAUDE.md Project Norm Template (Minimum Viable Version)

<!-- .claude/CLAUDE.md  ——  Place in project root, Claude Code reads automatically -->

# Project Norms

## Tech Stack
- React 18 + TypeScript 5 + Vite
- State Management: Zustand (Redux is forbidden)
- Styling: Tailwind CSS (inline styles are forbidden)
- Requests: TanStack Query v5

## Directory Conventions
- Components go in src/components, one directory per component (index.tsx + index.module.css)
- Business hooks go in src/hooks, prefixed with use
- Utility functions go in src/utils, pure functions, no React imports

## Prohibitions
- No direct fetch inside components, must use TanStack Query
- No use of any, must define specific types
- Do not modify files in the src/api directory, they are auto-generated

## Commit Conventions
- Prefixes: feat: / fix: / refactor: / chore:
- Each PR does only one thing

🔧 Demo: Pre-commit Hook Auto-Check (Minimal Reproduction)

#!/bin/sh
# .husky/pre-commit

echo "Running pre-commit checks..."

# TypeScript type checking
npx tsc --noEmit
if [ $? -ne 0 ]; then
  echo "❌ TypeScript type errors, please fix before committing"
  exit 1
fi

# ESLint
npx eslint src --ext .ts,.tsx --max-warnings 0
if [ $? -ne 0 ]; then
  echo "❌ ESLint errors, please fix before committing"
  exit 1
fi

echo "✅ Checks passed"

🔧 Demo: Sub-Agent Isolation Pattern (Claude Code Task Tool Illustration)

# Parent Agent prompt (in CLAUDE.md or system prompt):
You are an orchestrator. Break down the following requirement into independent sub-tasks,
launch an independent Agent for each sub-task using the Task tool,
focus only on the final artifact, not the execution details.

# Sub-Agent only receives:
Task description + relevant files + completion criteria
—— Does not receive the parent Agent's conversation history

# Parent Agent receives:
Execution summary + list of modified files
—— Does not receive the sub-Agent's intermediate reasoning

💬 Deep Dive Q&A Simulation

Q: How did you discover context pollution, are there specific symptoms?

The most obvious symptom is the AI starting to "hold grudges." For example, if I tried a wrong approach before and later switched to the right direction, the AI would still occasionally drift back to the wrong direction because its context is full of failed attempts. Another symptom is answers getting longer and more verbose, as if constantly recapping previous content. At that point, I start a new chat, pasting only the currently valid code and the next goal.

Q: How is Sub-Agent isolation implemented in engineering terms, not just hand-waving?

Using Claude Code's Task tool, the parent Agent launches sub-Agents via Task, and sub-Agents have their own independent context windows. In the parent Agent's prompt, I explicitly write "don't do it yourself, distribute using the Task tool," and sub-Agents only return execution summaries upon completion. If not using Claude Code, the manual implementation is: each sub-task opens a new API call, the system prompt only carries the current sub-task's context, and results are aggregated at the parent level.
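
For the manual variant, the isolation comes down to what goes into each request and what comes back. The sketch below only builds the payloads and makes no API call; `SubTask`, `SubTaskResult`, and both function names are hypothetical and chosen for illustration.

```typescript
interface SubTask { description: string; files: string[]; doneWhen: string }
interface SubTaskResult { summary: string; modifiedFiles: string[]; transcript: string }
type AgentMessage = { role: 'system' | 'user'; content: string };

// Each sub-agent gets a fresh message list: task, files, completion criteria.
// The parent's conversation history is deliberately not a parameter.
function buildSubAgentMessages(task: SubTask): AgentMessage[] {
  return [
    { role: 'system', content: 'You are a focused sub-agent. Work only on the task below.' },
    {
      role: 'user',
      content: [
        `Task: ${task.description}`,
        `Relevant files: ${task.files.join(', ')}`,
        `Done when: ${task.doneWhen}`,
      ].join('\n'),
    },
  ];
}

// The parent only sees summaries and file lists, never the sub-agent transcript.
function aggregateForParent(results: SubTaskResult[]): string {
  return results
    .map((r, i) => `Sub-task ${i + 1}: ${r.summary} (modified: ${r.modifiedFiles.join(', ')})`)
    .join('\n');
}
```

Keeping the transcript out of `aggregateForParent` is what stops intermediate reasoning from leaking back into the parent context.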

Q: Does information get lost after compact/compression, and how do you ensure key decisions aren't lost?

Yes, it does get lost; that's the cost of compression. My approach is, when making important architectural decisions, to have the AI generate a "decision record" (similar to ADR), written into CLAUDE.md or a separate decisions.md. This way, after compressing the context, the decision record remains in the file system. The next time the AI reads the file, it can restore context, rather than relying on conversation history.
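
One possible shape for such a decision record is sketched below. The field names and the Markdown layout are assumptions that loosely follow the ADR idea; writing the result to decisions.md is left to the caller.

```typescript
interface Decision {
  title: string;
  context: string;
  decision: string;
  consequences: string;
  date: string;
}

// Render one decision as a Markdown section, loosely following the ADR layout.
function formatDecisionRecord(d: Decision, index: number): string {
  return [
    `## ${index}. ${d.title} (${d.date})`,
    '',
    `- **Context:** ${d.context}`,
    `- **Decision:** ${d.decision}`,
    `- **Consequences:** ${d.consequences}`,
    '',
  ].join('\n');
}

// Append to the existing file content so earlier decisions are never rewritten.
function appendDecision(existing: string, d: Decision): string {
  const count = (existing.match(/^## \d+\. /gm) ?? []).length;
  const separator = existing === '' || existing.endsWith('\n') ? '' : '\n';
  return existing + separator + formatDecisionRecord(d, count + 1);
}
```

Because the record lives in a file, a fresh session can reload it regardless of how aggressively the conversation was compressed.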


3. Has your company used AI to improve efficiency and empower the team

Core Answer

Expand across five dimensions, ideally with quantitative data:

  1. Code Level: Claude Code implementation + Codex Review, parallel multi-tab
  2. Process Level: Slash commands and a skill system that capture high-frequency prompts for reuse
  3. Documentation Level: AI-generated technical docs, PRD drafts, commit messages
  4. Quality Level: AI-assisted unit test writing, security scanning
  5. Workflow Level: CLAUDE.md + hooks automated checks

🔧 Demo: Slash command /review example

<!-- .claude/commands/review.md  ——  /review command -->

Please perform a code review on the current changes, outputting according to the following dimensions:

## Correctness
- Are there logic bugs?
- Are edge cases handled?

## Security
- Any XSS / CSRF / SQL injection risks?
- Is user input validated?

## Performance
- Any unnecessary re-renders?
- Any memory leak risks?

## Maintainability
- Are names clear?
- Are there magic numbers that should be extracted as constants?

Finally, provide: 🔴 Must Fix / 🟡 Suggested Change / 🟢 Acceptable

🔧 Demo: Hook for AI-generated commit messages

#!/bin/sh
# .husky/prepare-commit-msg

COMMIT_MSG_FILE=$1
COMMIT_SOURCE=$2

# Only run when git reports no message source (plain `git commit` without -m, merge, or template)
if [ -z "$COMMIT_SOURCE" ]; then
  DIFF=$(git diff --cached --stat)
  # Build the JSON body with jq so quotes and newlines in $DIFF are escaped correctly
  BODY=$(jq -n --arg diff "$DIFF" '{
    model: "claude-haiku-4-5",
    max_tokens: 100,
    messages: [{
      role: "user",
      content: ("Generate a single-line conventional commit message (feat/fix/refactor/chore: description) based on the following git diff stat:\n" + $diff)
    }]
  }')
  # Call AI to generate commit message (example using Claude API)
  MSG=$(curl -s https://api.anthropic.com/v1/messages \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d "$BODY" | jq -r '.content[0].text // empty')
  # Leave git's default message untouched if the API call returned nothing
  if [ -n "$MSG" ]; then
    echo "$MSG" > "$COMMIT_MSG_FILE"
  fi
fi

💬 Deep Dive Q&A Simulation

Q: You say AI helps you write unit tests, does test coverage really improve, and what's the quality of AI-written tests like?

Coverage does improve, but quality depends on the prompt. If you just tell the AI "write unit tests for this function," it usually only tests the happy path, with insufficient edge case coverage. I changed it to "write unit tests, must include: normal cases, boundary values, error cases, async failure cases," and the quality improved significantly. Also, AI often overuses mocks; I constrain it with "use as few mocks as possible, prioritize real logic."

Q: You mentioned quantitative gains like "from two days down to half a day," how is that measured, is it credible?

This is mainly based on comparison, comparing PR submission times for similar types of feature development before and after AI intervention. Strictly speaking, it's not scientific because feature complexity varies. A more objective measure is review rounds: after our team introduced AI-assisted review, the average PR review rounds dropped from 3.2 to 1.8, which is recorded.


4. Skill System Design (Discovery, Registration, Hot Update, Isolation)

Core Answer

The essence of a Skill system is pluggable prompt modularization + context injection pipeline, with an architecture similar to a plugin system.

| Phase | Implementation |
| --- | --- |
| Discovery | Scan conventional directories for SKILL.md; priority: project-level > user-level > system-level |
| Registration | Metadata (name/description/trigger) + content → written to in-memory registry |
| Hot Update | fs.watch monitors directory, re-parses on change, hash comparison avoids unnecessary reloads |
| Isolation | Independent context sandbox; minimal toolset (default read-only); Sub-Agent mode |

🔧 Demo: A Minimal SKILL.md

---
name: react-component
description: Generate a React functional component conforming to project norms, including TypeScript types, style modules, unit tests
trigger: /component
---

## Usage
Provide the component name and functional description when calling.

## Output Specification
1. `ComponentName/index.tsx` — Component body
2. `ComponentName/index.module.css` — Styles (Tailwind preferred, use modules for complex styles)
3. `ComponentName/index.test.tsx` — Unit tests (vitest + @testing-library/react)

## Code Standards
- Props interface named `ComponentNameProps`
- Must have JSDoc comment describing component purpose
- Do not use default export, use named export
- Async operations must have loading / error states

## Example Structure
```tsx
export interface SearchBoxProps {
  /** Search result callback */
  onSearch: (keyword: string) => void;
  placeholder?: string;
}

/** Search input that passes the entered keyword to onSearch. */
export function SearchBox({ onSearch, placeholder = 'Search...' }: SearchBoxProps) {
  // implementation
}
```

### 🔧 Demo: Minimal Skill Hot Update Watcher (Node.js)

    // skill-watcher.ts: run with `npx ts-node skill-watcher.ts`
    import fs from 'fs';
    import path from 'path';
    import crypto from 'crypto';

    interface Skill {
      name: string;
      description: string;
      content: string;
      hash: string;
    }

    const registry = new Map<string, Skill>();
    const SKILLS_DIR = path.resolve('.claude/skills');

    function parseSkill(filePath: string): Skill | null {
      try {
        const raw = fs.readFileSync(filePath, 'utf-8');
        // Split the frontmatter (between the --- lines) from the body
        const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
        if (!match) return null;
        const meta: Record<string, string> = {};
        match[1].split('\n').forEach(line => {
          const [k, ...v] = line.split(':');
          if (k) meta[k.trim()] = v.join(':').trim();
        });
        return {
          name: meta.name,
          description: meta.description,
          content: match[2],
          hash: crypto.createHash('md5').update(raw).digest('hex'),
        };
      } catch {
        return null; // unreadable or deleted file
      }
    }

    function loadSkills() {
      if (!fs.existsSync(SKILLS_DIR)) return;
      fs.readdirSync(SKILLS_DIR)
        .filter(f => f.endsWith('.md'))
        .forEach(f => {
          const skill = parseSkill(path.join(SKILLS_DIR, f));
          if (skill) {
            registry.set(skill.name, skill);
            console.log(`[skill] loaded: ${skill.name}`);
          }
        });
    }

    function watchSkills() {
      loadSkills();
      fs.watch(SKILLS_DIR, (event, filename) => {
        if (!filename?.endsWith('.md')) return;
        const filePath = path.join(SKILLS_DIR, filename);
        const skill = parseSkill(filePath);
        if (!skill) return;
        const existing = registry.get(skill.name);
        // Hash comparison to avoid unnecessary reloads
        if (existing?.hash === skill.hash) return;
        registry.set(skill.name, skill);
        console.log(`[skill] hot-reloaded: ${skill.name}`);
      });
      console.log(`Watching ${SKILLS_DIR} for skill changes...`);
    }

    watchSkills();
    // registry is now queryable at any time
    export { registry };


* * *

### 💬 Deep Dive Q&A Simulation

**Q: Skill discovery uses conventional directory scanning, what if skill names conflict?**

> Priority rule: project-level > user-level > system-level, following the proximity principle. Within the same level, a skill loaded later overwrites one loaded earlier. In practice, you can also namespace the `name` field, like `react:component` and `vue:component`, so the registry keys won't conflict.
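
The priority rule can be sketched as a small merge function. This is an illustration under the assumption that each discovered skill is tagged with the level it was found at; `SkillEntry` and `resolveSkills` are hypothetical names.

```typescript
type Level = 'system' | 'user' | 'project';
interface SkillEntry { name: string; level: Level; content: string }

// Higher number wins: project-level overrides user-level, which overrides system-level.
const PRIORITY: Record<Level, number> = { system: 0, user: 1, project: 2 };

function resolveSkills(entries: SkillEntry[]): Map<string, SkillEntry> {
  const registry = new Map<string, SkillEntry>();
  for (const entry of entries) {
    const existing = registry.get(entry.name);
    // ">=" lets a later entry at the same level replace an earlier one.
    if (!existing || PRIORITY[entry.level] >= PRIORITY[existing.level]) {
      registry.set(entry.name, entry);
    }
  }
  return registry;
}
```

Because the comparison uses the level rather than load order alone, a system-level skill scanned last still cannot displace a project-level one.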

**Q: You use `fs.watch` for hot updates, but it's notoriously unreliable on macOS, how do you handle that?**

> Indeed, `fs.watch` sometimes misses events on macOS. A production-grade solution would use `chokidar`, which internally uses different APIs on different platforms (FSEvents on macOS, inotify on Linux), offering much better stability. Also, add a polling fallback, doing a full rescan every 30 seconds as a safety net.

**Q: You mentioned "independent context sandbox" for isolation, have you encountered context leakage scenarios?**

> Yes, I have. The most typical is when skill A injects a role setting like "you are a strict code reviewer," and without proper cleanup, the AI starts being overly critical in subsequent normal conversations. The solution is to send a reset prompt after skill execution: "Skill execution ended, returning to default role," or a more thorough approach is to execute skills with independent Agent instances, achieving natural isolation.

* * *

## 5. What Vibecoding tools do you usually use

### Core Answer

- **Main Implementation**: Claude Code (CLI) — Precise instruction following, high code quality
- **Deep Review**: Codex — Deeper thinking, suitable for critical review
- **IDE Embedded**: Cursor / Windsurf — When needing to operate within the editor
- **Line-level Completion**: GitHub Copilot — Low cognitive load autocomplete
- **Parallel Mode**: Open 4–5 Claude Code sessions simultaneously, running different sub-tasks in parallel

* * *

### 🔧 Demo: Claude Code Multi-Tab Parallel Workflow

Terminal 1: feature/search implementation

    cd ~/project
    claude "Implement search component, debounce 300ms, display results in SearchResults component, style reference Figma link, API docs in docs/api/search.md"

Terminal 2: feature/auth implementation (parallel, independent)

    claude "Implement login page, use react-hook-form + zod for form validation, on success store token in useAuthStore, redirect to /dashboard"

Terminal 3: review Terminal 1 output (new session, independent perspective)

    claude "Please review the code under src/components/Search directory, focus on: correct debounce, complete loading state, error handling coverage"


* * *

### 💬 Deep Dive Q&A Simulation

**Q: Multi-tab parallel, contexts isolated, but what if two tabs modify the same file?**

> This is the biggest risk of parallel AI development, so task splitting must clearly define file ownership. Files assigned to Tab A cannot be touched by Tab B. I explicitly write "only modify the following files: xxx" in the prompt, and after both tabs finish, I manually do a merge review to check for conflicts. Frequent commits are also key; each tab commits upon completing a small milestone, so git has a complete history, and conflicts can be traced via diff.
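
Before starting the tabs, the ownership split itself can be checked for overlap. A minimal sketch, assuming the split is written down as a map from tab name to file list; `findOwnershipConflicts` is a hypothetical helper.

```typescript
// Map each tab to the files it is allowed to modify (hypothetical data shape).
type Ownership = Record<string, string[]>;

// Return every file claimed by more than one tab, with the tabs that claim it.
function findOwnershipConflicts(ownership: Ownership): Map<string, string[]> {
  const claimedBy = new Map<string, string[]>();
  for (const tab of Object.keys(ownership)) {
    for (const file of ownership[tab]) {
      claimedBy.set(file, (claimedBy.get(file) ?? []).concat(tab));
    }
  }
  const conflicts = new Map<string, string[]>();
  claimedBy.forEach((tabs, file) => {
    if (tabs.length > 1) conflicts.set(file, tabs);
  });
  return conflicts;
}
```

An empty result means the task split is safe to run in parallel; any entry is a file that needs a single owner before the sessions start.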

**Q: How do you choose between Cursor and Claude Code?**

> Different scenarios. Cursor suits situations where you need to ask and modify anytime within the editor, with the context being the currently open file. Claude Code suits scenarios requiring precise control, complex multi-file tasks, and needing custom hooks and CLAUDE.md. I mainly use Claude Code now because its instruction following is more controllable, and its hooks and skill system are more flexible.

* * *

## 6. How do you usually use Vibecoding

### Core Answer

1.  **Task Decomposition First**: Break large requirements into independent sub-tasks, one per tab
1.  **CLAUDE.md Injects Norms**: Write architecture, naming, prohibitions in; automatically effective each time
1.  **Plan → Review Separation**: Claude implements, Codex independently reviews, forming checks and balances
1.  **Self-built Slash Commands**: Encapsulate high-frequency operations into `/review`, `/writer`, etc.
1.  **Hooks Automation**: pre-commit triggers lint / typecheck, no manual reminding of AI
1.  **Timely Commits**: Commit at every workable milestone for easy rollback

* * *

### 🔧 Demo: Standard Workflow for a Complete Vibecoding Session

1. Open project, Claude Code automatically reads CLAUDE.md

$ claude

2. Tell AI what to do today (task decomposition)

Today we're building a search feature, broken into three sub-tasks:

  1. SearchInput component (debounce, clear, loading)
  2. SearchResults component (list rendering, empty state, error state)
  3. useSearch hook (API call, state management)

Start with the first one, tell me when done to proceed to the second.

3. After AI completes, review

/review (triggers custom review slash command)

4. Confirm no issues, commit

Help me generate a commit message

5. Start the second sub-task (new prompt, keep context clean)

Now do the SearchResults component, receiving results: SearchResult[] type, empty state shows EmptyState component, loading shows Skeleton, error state shows ErrorBoundary


* * *

### 💬 Deep Dive Q&A Simulation

**Q: You say "plan and review separation," after Codex review finds issues, how do you feed the problems back to Claude?**

> I don't directly paste Codex's review to Claude, because then the context becomes "Claude defending its own code," prone to bias. My approach is: digest each issue in the review myself, confirm it's a real problem, then tell Claude "this code has a problem: ..., please fix it," driven by the problem description, not by another AI's evaluation.

**Q: The habit of timely commits, what if the AI modifies a bunch of files before committing, and the functionality is incomplete?**

> Use WIP commits (Work In Progress). `git commit -m "wip: search component skeleton, functionality incomplete"`, tag it as wip, and configure CI so wip commits don't trigger deployment pipelines. Once functionality is complete, `git rebase -i` to squash wip commits into a clean commit before pushing the PR. This provides a safety net without polluting the mainline history.

* * *

## 7. What to watch out for when using Vibecoding

### Core Answer

1.  **Context Management**: Start a new chat promptly when conversations get too long, use summaries to continue
1.  **Don't Blindly Accept**: Security-related code (SQL/XSS/permissions) must be manually reviewed
1.  **Atomic Commits**: Keep change scope small for easy review and rollback
1.  **Clear Constraints**: Explicitly state what not to change, preventing AI from improvising
1.  **Synchronous Testing**: Write features and tests simultaneously, don't backfill later
1.  **Context Pollution**: Clean up failed directions promptly, start new chats
1.  **Maintain Judgment**: Make architectural decisions yourself, AI provides candidate solutions

* * *

### 🔧 Demo: Good Prompt vs Bad Prompt Comparison

❌ Bad prompt —— Too vague, AI will make many assumptions

Help me write a user management page

✅ Good prompt —— Clear scope, constraints, what not to do

Implement the UserManagement page, requirements:


❌ Bad prompt —— Letting AI change too much at once

Refactor the entire src/api directory, change all requests from axios to fetch, also add error handling, retry logic, auth token injection

✅ Good prompt —— Break into small steps, progress gradually

First, only change src/api/user.ts from axios to fetch, keep interface signatures unchanged, add basic error handling (throw on 4xx/5xx), no retry or token injection, that's for later


* * *

### 💬 Deep Dive Q&A Simulation

**Q: You say security-related code must be manually reviewed, how do you actually determine what's security-related?**

> I have a mental checklist: ① Is user input directly concatenated into SQL / HTML / shell commands ② Does the API have authentication checks ③ Is sensitive data (passwords, tokens) being logged or exposed in responses ④ Does file upload have type and size restrictions. When encountering these types of code, I pause and read line by line myself, not relying on AI self-checks—because when AI self-checks, it's validating its own logic, prone to blind spots.

**Q: When AI improvises and changes things it shouldn't, how do you quickly discover it?**

> `git diff --stat` first to see which files were changed. If the file list includes files I didn't mention, immediately `git diff that file` to see what specifically changed. This is why atomic commits are crucial; every time AI finishes, I diff first then commit, rather than doing a bulk commit after a lot of work, when the diff is too large to review.
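
The same check can be scripted. The sketch below assumes you capture the output of `git diff --name-only` (one path per line) and pass it in as a string; `findUnexpectedChanges` is a hypothetical helper, and paths that git quotes because of special characters are not handled.

```typescript
// Compare the output of `git diff --name-only` with the files named in the prompt.
function findUnexpectedChanges(nameOnlyOutput: string, allowedFiles: string[]): string[] {
  const allowed = new Set(allowedFiles);
  return nameOnlyOutput
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0 && !allowed.has(line));
}

// Example: the prompt allowed only src/api/user.ts, but two files changed.
const unexpected = findUnexpectedChanges(
  'src/api/user.ts\nsrc/api/order.ts\n',
  ['src/api/user.ts'],
);
console.log(unexpected); // [ 'src/api/order.ts' ]
```

Any path in the result is a file worth opening with `git diff` before committing.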

* * *

## 8. Have you used third-party enhancement tools like Superpower, OpenRouter

### Core Answer

| Tool | Purpose | Characteristics |
| -------------------------- | ---------------------- | ---------------------- |
| **OpenRouter** | Unified API gateway, one key routes to multiple models | Convenient multi-model comparison, usage statistics |
| **Superpower for ChatGPT** | Enhance ChatGPT interface | History search, prompt templates, folder categorization |
| **Continue.dev** | Open-source IDE plugin, connects to multiple models | Supports custom context providers |
| **Aider** | CLI AI programming, supports git integration | Auto commit, suitable for pure terminal workflows |

Focus on the core: **precise context control**, **multi-model routing**, **data localization** (data never leaves your environment).

* * *

### 🔧 Demo: OpenRouter Multi-Model Comparison on the Same Question

    // compare-models.ts: run with `npx ts-node compare-models.ts`
    const MODELS = [
      'anthropic/claude-sonnet-4-5',
      'openai/gpt-4o',
      'google/gemini-2.0-flash',
    ];

    const QUESTION = 'Explain what a closure is in under 100 words, give one practical application scenario';

    async function ask(model: string, question: string) {
      const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model,
          messages: [{ role: 'user', content: question }],
          max_tokens: 200,
        }),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    }

    // Ask each model the same question in turn and print the answers
    (async () => {
      for (const model of MODELS) {
        console.log(`\n=== ${model} ===`);
        console.log(await ask(model, QUESTION));
      }
    })();


* * *

### 💬 Deep Dive Q&A Simulation

**Q: Does using OpenRouter pose data security issues, and is it compliant for company code to pass through a third party?**

> This is something to consider in practice. My approach is: core business code for company projects does not pass through third parties; I only use OpenRouter for multi-model comparison on personal and open-source projects. Internally, the company uses direct vendor APIs (Anthropic / OpenAI official). If there's a need for privatization, locally deployed models (Ollama + open-source models) are used. This is a compliance boundary; proactively mentioning it in an interview shows security awareness.

* * *

## 9. Why does AI streaming output use SSE instead of WebSocket

### Core Answer

| Dimension | SSE | WebSocket |
| ---- | ------------------------- | ----------- |
| Communication Direction | Unidirectional (Server → Client) | Bidirectional |
| Protocol Basis | Standard HTTP | Requires Upgrade handshake |
| Reconnection | Browser auto-reconnects | Manual implementation needed |
| Load Balancing | Naturally supported | Requires extra proxy configuration |
| Authentication | EventSource doesn't support custom Headers | Browser WebSocket API can't set custom headers either; pass the token via cookie, subprotocol, or first message |
| Implementation Cost | Low (a few lines of code) | High (need to maintain connection state) |

**Why AI chooses SSE:** Streaming output is unidirectional push; SSE semantics perfectly match. HTTP-based is infrastructure-friendly. Under HTTP/2, performance rivals WebSocket.

**Scenarios for WebSocket:** AI conversations needing real-time interruption (user stops generation mid-speech), multi-user collaboration (multiple users sending messages simultaneously).

* * *

### 🔧 Demo: SSE vs WebSocket Minimal Comparison (Runnable in Browser Console)

    // SSE client (EventSource)
    const es = new EventSource('/sse-stream');
    es.onmessage = e => console.log('SSE received:', e.data);
    es.onerror = () => console.log('SSE error, will auto-reconnect');
    // The browser reconnects automatically on disconnect, no extra code needed

    // WebSocket client
    function connect() {
      const ws = new WebSocket('ws://localhost:3001');
      ws.onopen = () => console.log('WS connected');
      ws.onmessage = e => console.log('WS received:', e.data);
      ws.onclose = () => {
        console.log('WS closed, manually reconnecting...');
        // Manual reconnect: rebuild the socket together with its handlers
        setTimeout(connect, 3000);
      };
    }
    connect();


* * *

### 💬 Deep Dive Q&A Simulation

**Q: You say SSE performance under HTTP/2 rivals WebSocket, can you explain why?**

> Under HTTP/1.1, one SSE connection occupies one TCP connection, and browsers have a 6 concurrent connection limit per domain, so opening multiple SSE streams simultaneously is restricted. Under HTTP/2, multiple SSE streams can multiplex a single TCP connection (multiplexing), eliminating this limit. One of WebSocket's original advantages was not being subject to HTTP connection limits, but the HTTP/2 + SSE combination is already very close in performance, while SSE is simpler operationally.

**Q: SSE can't customize request headers, authentication is an issue, how do you actually solve it?**

> Two approaches, I use the first in production: use `fetch` + `ReadableStream` instead of `EventSource`, allowing you to carry an `Authorization` header and also support POST requests (EventSource is GET only). The second is passing a token parameter in the URL, but the token appears in server access logs, which is less secure and not recommended.

* * *

## 10. The complete process of SSE streaming output, how data is handled at each step

### Core Answer

SSE is an HTTP-based unidirectional push protocol, each message in plain text format, ending with a blank line.

**Protocol Format:**

    data: {"token": "Hello"}\n
    \n
    data: {"token": "World"}\n
    \n
    data: [DONE]\n
    \n


**Data Processing Pipeline:**

    Uint8Array chunk
      → TextDecoder.decode(chunk, { stream: true })   ← prevents truncation of multi-byte (e.g. Chinese) characters
      → Append to buffer
      → Split by \n, leave the incomplete last line in the buffer
      → Filter out lines not starting with data:
      → Stop on [DONE], otherwise JSON.parse(payload)
      → Extract the token field
      → Append to UI
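
The pipeline can be sketched as an incremental parser. This is a minimal sketch, assuming every event carries a single `data:` line whose payload is either `[DONE]` or JSON with a `token` field, as in the format shown; `SseTokenParser` is a hypothetical class name.

```typescript
// Incremental parser: feed raw network chunks, get back completed tokens.
class SseTokenParser {
  private decoder = new TextDecoder('utf-8');
  private buffer = '';
  done = false;

  // Feed one network chunk; returns the tokens completed by this chunk.
  feed(chunk: Uint8Array): string[] {
    // stream: true keeps a split multi-byte character until its remaining bytes arrive
    this.buffer += this.decoder.decode(chunk, { stream: true });
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop() ?? ''; // the last piece may be an incomplete line
    const tokens: string[] = [];
    for (const line of lines) {
      if (!line.startsWith('data:')) continue; // skips blank separators and other fields
      const payload = line.slice('data:'.length).trim();
      if (payload === '[DONE]') {
        this.done = true;
        break;
      }
      tokens.push(JSON.parse(payload).token);
    }
    return tokens;
  }
}
```

In a browser or Node 18+, each value read from `response.body.getReader()` could be passed to `feed`, with the returned tokens appended to the UI.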


* * *

### 🔧 Demo: Minimal Runnable SSE Server + Client

**Server (Node.js, save as `sse-server.js`):**

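    // sse-server.js: run with `node sse-server.js`
    const http = require('http');

    http.createServer((req, res) => {
      if (req.url === '/stream') {
        res.writeHead(200, {
          'Content-Type': 'text/event-stream',
          'Cache-Control': 'no-cache',
          'Connection': 'keep-alive',
          'Access-Control-Allow-Origin': '*',
        });

        const tokens = 'Hello World, this is a streaming text.'.split('');
        let i = 0;

        // Push one character every 100 ms, then the [DONE] sentinel
        const timer = setInterval(() => {
          if (i >= tokens.length) {
            res.write('data: [DONE]\n\n');
            clearInterval(timer);
            res.end();
            return;
          }
          res.write(`data: ${JSON.stringify({ token: tokens[i++] })}\n\n`);
        }, 100);

        // Stop pushing if the client disconnects
        req.on('close', () => clearInterval(timer));
      } else {
        // Return client HTML
        res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
        res.end(`<!doctype html>
    <pre id="out"></pre>
    <script>
      // Minimal stand-in client (assumption: the original page markup is not shown here)
      const es = new EventSource('/stream');
      es.onmessage = e => {
        if (e.data === '[DONE]') { es.close(); return; }
        document.getElementById('out').textContent += JSON.parse(e.data).token;
      };
    </script>`);
      }
    }).listen(3000, () => console.log('Open http://localhost:3000'));
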
# Run
node sse-server.js
# Open browser and visit http://localhost:3000
# You can see text appearing character by character in a stream

💬 Deep Dive Q&A Simulation

Q: Can you clearly explain why TextDecoder's stream: true is necessary?

UTF-8 is a variable-length encoding; most Chinese characters occupy 3 bytes. When a server chunk happens to cut off in the middle of these 3 bytes (say the first 2 bytes are in this chunk and the 3rd byte in the next), a TextDecoder without stream: true will attempt to decode the incomplete byte sequence at the end of the current chunk and output garbled characters (usually the replacement character U+FFFD, rendered as �). With stream: true, TextDecoder knows more data is coming, caches the incomplete bytes, and decodes them together when the next chunk arrives.
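
A small demonstration of the difference, runnable in Node 18+ or a browser console. The sample character and variable names are only illustrative.

```typescript
// "中" is three bytes in UTF-8: E4 B8 AD. Split it across two chunks.
const bytes = new TextEncoder().encode('中');
const first = bytes.slice(0, 2);
const second = bytes.slice(2);

// Without stream: true each call is treated as a complete input,
// so the dangling bytes are replaced with U+FFFD.
const naive = new TextDecoder().decode(first) + new TextDecoder().decode(second);

// With stream: true the decoder holds the partial sequence until it is complete.
const streaming = new TextDecoder();
const correct =
  streaming.decode(first, { stream: true }) + streaming.decode(second, { stream: true });

console.log(naive.includes('\uFFFD')); // true
console.log(correct === '中'); // true
```

Note that the streaming case reuses one decoder instance across chunks; creating a new decoder per chunk discards the cached bytes.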

Q: What's the logic behind keeping the last line in the buffer, why not just process line by line?

Because the chunks read by response.body.getReader() are arbitrary-sized byte fragments, not guaranteed to align with newline characters. An SSE message data: {"token":"A"}\n\n could be split into two chunks: data: {"token":"A"} and \n\n. If you process each chunk directly, you'd try to parse an incomplete line, and JSON.parse would fail. So you append the chunk to a buffer, split by \n, and keep the potentially incomplete last segment to be completed with the next chunk before processing.

Q: How does Last-Event-ID resumption work when SSE disconnects?

The server adds id: 42\n to each message, and the browser's EventSource remembers the last received ID. On reconnection, the browser automatically sends the Last-Event-ID: 42 request header. The server, upon receiving it, starts pushing messages from after ID 42, requiring no extra client-side logic. With fetch + ReadableStream, this mechanism needs manual implementation: record the last event id and include it in the request header upon reconnection.
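
For the fetch + ReadableStream case, the manual bookkeeping might look like the sketch below. It is a simplification: the SSE specification only commits the id when the event is dispatched, while this version records it as soon as the `id:` line is seen. `LastEventIdTracker` is a hypothetical class name.

```typescript
// Tracks the most recent id: field seen in a raw SSE text stream.
class LastEventIdTracker {
  lastEventId: string | null = null;

  // Call with each complete line from the stream.
  observeLine(line: string): void {
    if (line.startsWith('id:')) this.lastEventId = line.slice(3).trim();
  }

  // Headers to send on reconnect; empty until an id has been seen.
  reconnectHeaders(): Record<string, string> {
    const headers: Record<string, string> = {};
    if (this.lastEventId !== null) headers['Last-Event-ID'] = this.lastEventId;
    return headers;
  }
}
```

On reconnect, the result of `reconnectHeaders()` could be merged into the `headers` option of the new `fetch` call so the server can resume after the last delivered message.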


11. In SSE streaming scenarios, how to determine a Markdown table is "complete" before rendering

Core Answer

Core strategy: Line-level state machine + pending buffer + lazy rendering.

Table end signal: Receiving the first line that does not start with | (or stream end).


🔧 Demo: Minimal Runnable Streaming Markdown Renderer

// streaming-md.ts  ——  npx ts-node streaming-md.ts
type Block = { type: 'text'; content: string } | { type: 'table'; lines: string[] };

class StreamingMarkdownParser {
  private buffer: string[] = [];
  private inTable = false;
  private pending = '';
  private blocks: Block[] = [];

  feed(chunk: string) {
    const lines = (this.pending + chunk).split('\n');
    this.pending = lines.pop()!;
    for (const line of lines) this.processLine(line);
  }

  private processLine(line: string) {
    if (/^\s*\|/.test(line)) { // pipe must be escaped; a bare | is regex alternation and matches every line
      this.inTable = true;
      this.buffer.push(line);
    } else {
      if (this.inTable) this.flushTable();
      if (line.trim()) this.blocks.push({ type: 'text', content: line });
    }
  }

  private flushTable() {
    const hasSep = this.buffer.some(l => /^\s*\|\s*[-:|]+\s*\|/.test(l)); // separator row such as | --- | --- |
    if (hasSep) {
      this.blocks.push({ type: 'table', lines: [...this.buffer] });
    } else {
      // Invalid table structure, downgrade to plain text
      this.blocks.push(...this.buffer.map(l => ({ type: 'text' as const, content: l })));
    }
    this.buffer = [];
    this.inTable = false;
  }

  finish(): Block[] {
    if (this.pending) this.processLine(this.pending);
    if (this.inTable) this.flushTable();
    return this.blocks;
  }
}

// Simulate SSE streaming input (feed one token at a time)
const parser = new StreamingMarkdownParser();
const fakeStream = [
  'Normal text\n',
  '| ColA | ColB |\n',
  '| --- | --- |\n',
  '| Data1 | Da',  // ← Cut off mid-line
  'ta2 |\n',
  '\n',           // ← Triggers table end
  'More text\n',
];

for (const chunk of fakeStream) {
  parser.feed(chunk);
}

const blocks = parser.finish();
console.log(JSON.stringify(blocks, null, 2));
// Output: [
//   { type: 'text', content: 'Normal text' },
//   { type: 'table', lines: ['| ColA | ColB |', '| --- | --- |', '| Data1 | Data2 |'] },
//   { type: 'text', content: 'More text' }
// ]

💬 Deep Dive Q&A Simulation

Q: What if the AI's last output is a table, with no subsequent text, what happens to your state machine?

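This is a classic end-of-stream boundary problem. At stream end, inTable is still true and the buffer holds rows, but no non-table line has arrived to trigger flushTable. So finish() must be called explicitly at stream end to perform the final flushTable. With a fetch stream, call it when done === true. EventSource has no native end-of-stream event (it only fires open, message, and error), so the server needs to send an explicit terminator, such as a custom done event or a [DONE] sentinel, and the client calls finish() on receiving it. Otherwise the last table stays in the buffer forever, unrendered.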

Q: If there's a | character inside a code block (``` wrapped content), would it be mistakenly identified as a table row?

Yes! That's a flaw in this simplified implementation. A complete implementation needs to first check whether it is currently inside a code block (tracking an inCodeBlock state that toggles on each ``` line) and skip table detection for lines inside code blocks, passing them through as code. In real projects, I wouldn't hand-write this; I'd build on an established parser such as marked or markdown-it, which already handle these edge cases, and only add the streaming buffer logic around it.
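The inCodeBlock idea can be sketched as a small line classifier. This is an illustrative assumption rather than part of the demo parser above: classifyLines and LineKind are hypothetical names, and only triple-backtick fences are recognized.

```typescript
type LineKind = 'code' | 'table' | 'text';

// Three backticks, built at runtime so this example stays fence-safe.
const FENCE = '`'.repeat(3);

// Classify lines while tracking whether we are inside a fenced code block.
function classifyLines(lines: string[]): LineKind[] {
  let inCodeBlock = false;
  return lines.map((line): LineKind => {
    if (line.trim().startsWith(FENCE)) {
      inCodeBlock = !inCodeBlock; // the fence line itself counts as code
      return 'code';
    }
    if (inCodeBlock) return 'code'; // never run table detection inside code
    return /^\s*\|/.test(line) ? 'table' : 'text';
  });
}

const kinds = classifyLines([
  '| ColA | ColB |',
  FENCE + 'sh',
  'ls | grep foo',
  '| looks like a row |',
  FENCE,
  '| Data1 | Data2 |',
]);
console.log(kinds.join(','));
// table,code,code,code,code,table
```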


12. What experience do you have using SDD (Spec-Driven Development) in frontend AI development

Core Answer

SDD = Write the Spec first, then let AI implement according to the Spec.

Core Process:

Requirement Understanding → Write Spec (Features + Boundaries + Interface Contracts + What Not To Do)
  → Review Spec → AI Implements per Spec → Verify against Spec

Key Experience: "What not to do" in the Spec is more important than "what to do"; Spec is the acceptance criteria; Spec is the context alignment tool for multi-Agent.

In one sentence: SDD converts "communication cost with AI" into "cost of writing specs." Worthwhile for deterministic requirements, not necessarily for fuzzy exploration.


🔧 Demo: A Complete Spec Document Example

<!-- specs/search-box.md -->

# SearchBox Component Spec

## Feature Description
A search input box with debounce, triggering a search callback upon user input.

## Interface Contract
```ts
interface SearchBoxProps {
  onSearch: (keyword: string) => void;  // Search trigger callback
  placeholder?: string;                  // Default: "Search..."
  debounceMs?: number;                   // Default: 300
  disabled?: boolean;
}
```

## Behavior Specification

## Out of Scope

## File Scope

Only modify the following files, do not change any other files:

## Acceptance Criteria

Check against the behavior specification above item by item; all checkboxes passed means complete.


Usage: Pass the spec to Claude Code

claude "Please implement the SearchBox component according to the spec in specs/search-box.md, after implementation, self-check against the spec's checkboxes item by item"


* * *

### 💬 Deep Dive Q&A Simulation

**Q: After writing the Spec and having AI implement it, what if you discover the Spec itself has issues?**

> This is the most common problem in SDD. If you find a Spec issue, fix the Spec first, then have AI re-implement according to the new Spec. Don't just tell AI "the requirement changed" and continue modifying in the same session; that way, the Spec and code will increasingly diverge. The cost of fixing the Spec is worth it because the Spec is the baseline for subsequent acceptance and future requirement changes; the cost of not maintaining it is higher.

**Q: If requirements are vague and you can't write a clear Spec, is SDD still suitable?**

> Not suitable. When requirements are vague, it's better to do a spike (technical exploration) first, run a POC to clarify the technical direction. At this stage, "let AI give me a runnable version to see the effect" is more efficient. Once the POC is done and the direction is confirmed, then write the Spec and enter the SDD process. SDD suits "I know what to do, but implementation is tedious" scenarios, not "I don't know what to do" exploration phases.

**Q: Someone on the team is unwilling to write Specs, saying the time spent writing Specs is better spent letting AI just do it. How do you persuade them?**

> I'd first acknowledge their point—for simple requirements, that's indeed true, no need to SDD for SDD's sake. But if a feature requires 3 rounds of back-and-forth changes, and each time AI misunderstands, the cost of those 3 rounds already exceeds the cost of writing one Spec. More importantly, the Spec isn't just for AI; it's also a tool for aligning with product, design, and backend. If after writing the Spec everyone has no objections, understanding is aligned; if disagreements are found, early discovery and resolution is much cheaper than rework after code is written.

* * *

* * *

# Coding Agent Project Deep Dive Questions

* * *

## 13. What are the special characteristics of Claude Code and Codex respectively

**Answer:**

| Dimension | Claude Code | Codex (OpenAI) |
| ------ | ----------------------------- | ----------------------- |
| Access Method | CLI tool, used directly in terminal | CLI and cloud agent (the original Codex model powered GitHub Copilot) |
| Context Management | CLAUDE.md auto-injection, hooks, skill system | Mainly relies on system prompt |
| Instruction Following | Precise, rarely oversteps with clear constraints | Flexible, sometimes "conveniently" extends |
| Code Review Depth | Execution-focused, fast implementation | High reasoning depth, stronger critical analysis |
| Tool Ecosystem | MCP protocol, extensible tools | Function Calling, broader ecosystem |
| Multi-file Operations | Native support, can track dependencies across files | Requires manual file scope control |
| Suitable Scenarios | Main implementation, complex multi-file tasks | Independent Review, critical analysis |

**Core Difference:** Claude Code is "execution-oriented," fast but shallow thinking; Codex is "review-oriented," slow but deep. Combining the two—Claude Code for implementation, Codex for review—forms checks and balances.

### 💬 Deep Dive Q&A

**Q: You say Codex has higher reasoning depth, can you give a specific example illustrating this difference?**

> Once, I had both models review the same concurrency control code. Claude Code's review mainly focused on style and naming issues. Codex caught a race condition: two async operations both read and then updated the same state variable, leading to data inconsistency under specific timing. In my experience, Codex is better at catching problems that require reasoning across multiple execution paths, because it leans towards deep analysis rather than quick response.

* * *

## 14. What parts does the Prompt input to the model consist of, which must be injected, which are optional

**Answer:**

A complete LLM prompt typically consists of the following layers:

```
┌─────────────────────────────────────────┐
│ System Prompt                           │ ← Mandatory, defines role/capability/constraints
├─────────────────────────────────────────┤
│ Long-term Memory                        │ ← On-demand, user preferences/project norms
├─────────────────────────────────────────┤
│ Skill / Tool Definitions                │ ← On-demand, tools needed for current task
├─────────────────────────────────────────┤
│ Conversation History                    │ ← On-demand, compressible
├─────────────────────────────────────────┤
│ Retrieved Context (RAG recall content)  │ ← On-demand, relevant document snippets
├─────────────────────────────────────────┤
│ User Message (current user input)       │ ← Mandatory
└─────────────────────────────────────────┘
```


**Must Inject:**

- **System Prompt**: Defines model behavior boundaries; without it, the model easily drifts
- **User Message**: Current task input; without it, there's nothing to respond to

**On-Demand Injection (Dynamic Decision):**

- **Long-term Memory**: Only inject when memory is relevant to the current task, avoiding irrelevant information interference
- **Tool Definitions**: Inject only the tools needed for the current task, not all (reduces token consumption, lowers hallucinations)
- **Conversation History**: Compressible; replace raw dialogue with summary when exceeding threshold
- **RAG Recall**: Retrieve based on current query similarity, don't inject the entire knowledge base

**Design Principle:** Minimum Necessary Context. More precise injection leads to higher output quality and lower token cost.

### 💬 Deep Dive Q&A

**Q: Why not inject all tool definitions, only the currently needed ones?**

> Two reasons. First, token cost; one tool definition can be hundreds of tokens, a dozen tools is thousands of tokens, wasted on every call. Second, the model's "attention" issue when selecting tools—the more tools, the higher the probability of the model choosing the wrong tool, called tool selection confusion. Injecting only the tools truly needed for the current task allows the model to select more accurately.

**Q: Long-term memory is injected on-demand, how do you judge "on-demand," what method determines whether to inject?**

> Semantic similarity retrieval. Store long-term memory entries as vectors. Each time a new task comes, embed the user message, perform cosine similarity retrieval against the memory store, and only inject memory entries exceeding a threshold (e.g., 0.75). This avoids full injection while automatically recalling relevant items.
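A minimal sketch of that recall step, assuming the embeddings are already computed. MemoryEntry and recallMemories are hypothetical names, the toy vectors stand in for real embedding output, and 0.75 is the example threshold from the answer above.

```typescript
interface MemoryEntry {
  id: string;
  text: string;
  vector: number[]; // embedding of `text`, produced elsewhere
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Keep only entries whose similarity to the query reaches the threshold, best first.
function recallMemories(query: number[], store: MemoryEntry[], threshold = 0.75): MemoryEntry[] {
  return store
    .map((entry) => ({ entry, score: cosine(query, entry.vector) }))
    .filter((scored) => scored.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .map((scored) => scored.entry);
}

const memoryStore: MemoryEntry[] = [
  { id: 'm1', text: 'Use TypeScript, no any', vector: [1, 0] },
  { id: 'm2', text: 'Prefers dark theme', vector: [0, 1] },
  { id: 'm3', text: 'Lint rules are strict', vector: [0.8, 0.6] },
];
const recalledIds = recallMemories([1, 0], memoryStore).map((m) => m.id);
console.log(recalledIds); // [ 'm1', 'm3' ]
```

The threshold is worth tuning per embedding model, since similarity scores are not comparable across models.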

* * *

## 15. What is the difference between the Coding Agent you built and Claude Code / Codex

**Answer:** (Replace with your own project details, below is a reference framework)

| Dimension | Claude Code / Codex | Your Coding Agent |
| ---------- | -------------------- | -------------------- |
| Customization | General, requires prompt customization | Deeply customized for specific scenarios/team norms |
| Memory System | Basically no persistent memory | Layered long/short-term memory, persistent across sessions |
| Skill System | Relies on slash commands / conventions | Custom skill layering, auto-matching |
| Context Management | Static injection like CLAUDE.md | Dynamic prompt, assembled by task type |
| Multi-Agent Orchestration | Primarily single Agent | Multi-Agent pipeline, task distribution |
| Cost Control | No fine-grained control | Token budget management, compression strategies |
| Knowledge Integration | Relies on file context | RAG integration, dynamic knowledge base recall |

**Core Value of Your Agent:** Deep integration with team/personal workflows, persistent context and memory, maintaining consistency across sessions. Apart from what has been written into CLAUDE.md, Claude Code starts each conversation fresh and does not remember what was done last time.

### 💬 Deep Dive Q&A

**Q: Between the Agent you built yourself and directly using Claude Code, which do you use more daily?**

> Actually, I use both, for different scenarios. Claude Code for temporary, one-off tasks, like quickly refactoring a function or generating a test file; my own Agent for long-term tasks requiring cross-session accumulated context, like a two-week feature development, where it remembers previous decisions and norms without needing to re-introduce the project background each time.

* * *

## 16. How is context compression done, what is the three-layer compression strategy

**Answer:**

The goal of context compression is to retain the most critical information within a limited token window. Three-layer strategy:

**Layer 1: Message-Level Pruning (Lightweight)**

- Trigger: Number of historical messages exceeds a threshold (e.g., 20)
- Method: Sliding window, keep only the most recent N dialogue turns, discard the earliest
- Preserve: The system prompt (never pruned) and the most recent user-assistant pairs

**Layer 2: Summary Compression (Medium)**

- Trigger: Total tokens exceed 60% of the context window
- Method: Call a model to compress the first half of the dialogue into a summary ("Previously we did X, decided Y, current state is Z")
- Summary injected as a system message, replacing the original dialogue

**Layer 3: Key Fact Extraction (Heavy)**

- Trigger: Long-term session (exceeding N hours or M dialogue turns)
- Method: Extract key decisions, code changes, conventions into long-term memory, completely clear short-term dialogue history
- Next session rebuilds context from long-term memory

```
Token Usage:
░░░░░░░░░░ 0%
────────── 60% → Trigger Layer 2 Summary
──────────── 80% → Trigger Layer 3 Extraction
─────────────── 100% → Truncation (worst case)
```
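The trigger logic can be sketched as a single decision function. This is an illustration under assumptions: the 60% and 80% cut-offs follow the token usage thresholds above, the 20-message limit is the Layer 1 example value, and chooseCompression is a hypothetical name. A time-based or turn-based Layer 3 trigger would need an extra input.

```typescript
type CompressionAction = 'none' | 'prune' | 'summarize' | 'extract';

interface ContextStats {
  messageCount: number; // messages currently in short-term history
  usedTokens: number;   // tokens used by the assembled prompt
  windowTokens: number; // model context window size
}

// Pick the heaviest layer whose trigger condition is met.
function chooseCompression(stats: ContextStats, maxMessages = 20): CompressionAction {
  const usage = stats.usedTokens / stats.windowTokens;
  if (usage >= 0.8) return 'extract';                    // Layer 3: key fact extraction
  if (usage >= 0.6) return 'summarize';                  // Layer 2: summary compression
  if (stats.messageCount > maxMessages) return 'prune';  // Layer 1: sliding window
  return 'none';
}

const compressionDecisions = [
  chooseCompression({ messageCount: 10, usedTokens: 1_000, windowTokens: 10_000 }),
  chooseCompression({ messageCount: 25, usedTokens: 1_000, windowTokens: 10_000 }),
  chooseCompression({ messageCount: 25, usedTokens: 6_500, windowTokens: 10_000 }),
  chooseCompression({ messageCount: 5, usedTokens: 9_000, windowTokens: 10_000 }),
];
console.log(compressionDecisions.join(','));
// none,prune,summarize,extract
```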


### 💬 Deep Dive Q&A

**Q: Why the 60% threshold, not 80% or compress when full?**

> Two reasons. First, compression itself consumes tokens (calling a model to generate a summary). If you wait until 90% to compress, the space left after compression might be insufficient to continue the conversation. Second, summary quality degrades as the context fills up: the fuller the context, the more dispersed the model's attention during processing, and the worse the summary. 60% is an empirical value that leaves enough "breathing room" for the compression operation itself.

**Q: Is the summary generated by the same main model, or a lightweight model?**

> Summary generation uses a lightweight model (like Claude Haiku or GPT-4o-mini), because: ① Summary tasks don't require strong reasoning, so a lightweight model suffices ② The cost difference is significant: depending on the model generation, Haiku is several times to roughly 10x cheaper than Sonnet ③ It is fast and doesn't block the main flow. The main model only does core tasks; summarization is an auxiliary operation.

* * *

## 17. How to detect and handle unsatisfactory results due to over-compression

**Answer:**

**How to Detect Over-Compression:**

1. **Output Quality Monitoring**: Maintain a quality scoring mechanism, comparing output quality for the same task before and after compression (can use another model as judge)
2. **User Feedback Signals**: User frequently says "You just said... you forgot," "This was already decided", indicating key information was compressed away
3. **Context Consistency Check**: After each dialogue turn, have the model output a "current task state summary" and compare with the previous turn for information loss
4. **Obvious Errors**: Model starts repeating completed work, or ignoring known constraints

**Handling Strategy:**

```
Over-compression Detected
├── Short-term: Recover key facts from long-term memory, re-inject
├── Mid-term: Reduce compression aggressiveness (raise trigger threshold, increase retained information)
└── Long-term: Improve key fact extraction algorithm, ensure important decisions aren't pruned
```


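```ts
// Compression Quality Check (pseudocode).
// `model.evaluate` is a placeholder for your own LLM client call, not a real library API.
declare const model: { evaluate(prompt: string): Promise<string> };

async function checkCompressionQuality(
  originalCtx: string,
  compressedCtx: string
): Promise<number> {
  const score = await model.evaluate(`
Original Context: ${originalCtx}
Compressed: ${compressedCtx}

Evaluate compression quality (0-10), focusing on:
1. Are key decisions preserved
2. Are important constraints preserved
3. Is the current task state accurate

Output only the score
`);
  return parseFloat(score); // NaN if the model returns anything other than a number
}
```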


* * *

## 18. When the Agent needs to further modify an existing task (adding new features), how does the system handle it

**Answer:**

This is essentially a "session recovery + incremental injection" problem.

**Process:**

```
User initiates new modification request
  ↓
Retrieve relevant context from long-term memory
(previous task description, completed content, current code state)
  ↓
Rebuild prompt from the items listed below
```


**Key: Information that needs re-injection**

- ✅ Summary of completed functionality (so the model knows not to redo)
- ✅ Current code files (latest version, not the version from historical dialogue)
- ✅ Existing constraints and norms (prevent new features from violating old conventions)
- ❌ Not needed: Historical debugging processes, resolved errors, abandoned approaches

### 💬 Deep Dive Q&A

**Q: New features have dependencies on old features, like needing to change an old feature's interface to support the new feature, how does the system handle it?**

> This is the most complex situation. The system needs to explicitly state the change points in the prompt: "Existing interface X needs to change from A to B, here are the affected callers: [list], please modify them all." The key is to use tools to scan the entire codebase to find all callers, not rely on the model's memory—the model's memory of code is unreliable, files must be read in real-time. This is also why an Agent needs file system access tools, not just dialogue.

* * *

## 19. The tool calling process, can skills replace tools

**Answer:**

**Tool Calling Process (Function Calling):**

```
User Input
  ↓
Model generates response (containing tool_use block):
  { "name": "read_file", "input": { "path": "src/api.ts" } }
  ↓
System executes tool, gets result
  ↓
Injects tool result as tool_result into next round
  ↓
Model continues generating based on result (may call tools again)
  ↓
Model outputs final text response (no tool_use, loop ends)
```
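The loop above can be sketched with a scripted stand-in for the model. Everything here is a simplification under assumptions: a real model call would be asynchronous and provider-specific, and ModelTurn, runToolLoop, fakeModel and fakeTools are hypothetical names, not part of any SDK.

```typescript
type ModelTurn =
  | { kind: 'tool_use'; name: string; input: Record<string, string> }
  | { kind: 'text'; text: string };

type Transcript = Array<{ role: 'user' | 'assistant' | 'tool_result'; content: string }>;
type Model = (transcript: Transcript) => ModelTurn;
type ToolFn = (input: Record<string, string>) => string;

function runToolLoop(
  model: Model,
  tools: Record<string, ToolFn>,
  userInput: string,
  maxSteps = 8,
): string {
  const transcript: Transcript = [{ role: 'user', content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(transcript);
    if (turn.kind === 'text') return turn.text; // no tool_use: the loop ends
    const tool = tools[turn.name];
    const result = tool ? tool(turn.input) : `error: unknown tool ${turn.name}`;
    transcript.push({ role: 'assistant', content: JSON.stringify(turn) });
    transcript.push({ role: 'tool_result', content: result }); // fed into the next round
  }
  throw new Error('tool loop exceeded maxSteps');
}

// Scripted stand-in for a real model: request the file once, then answer.
const fakeModel: Model = (transcript) => {
  const last = transcript[transcript.length - 1];
  if (last.role === 'tool_result') {
    return { kind: 'text', text: `File says: ${last.content}` };
  }
  return { kind: 'tool_use', name: 'read_file', input: { path: 'src/api.ts' } };
};
const fakeTools: Record<string, ToolFn> = {
  read_file: ({ path }) => `contents of ${path}`,
};

const loopAnswer = runToolLoop(fakeModel, fakeTools, 'What is in src/api.ts?');
console.log(loopAnswer); // File says: contents of src/api.ts
```

The maxSteps guard is a design choice worth keeping in a real agent, since a model that keeps requesting tools would otherwise loop indefinitely.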


**Can Skills Replace Tools?** Cannot fully replace, the two are fundamentally different:

| Dimension | Tool | Skill |
| ---- | ------------------ | ---------------------- |
| Essence | Code execution, has actual side effects | Prompt template, pure text injection |
| Capability | Read files, write files, call APIs, run commands | Provide task guidance, norm constraints, thinking frameworks |
| Side Effects | Yes (changes file system, calls external services) | None (only affects model's thinking direction) |
| Replacement Relationship | Tools can do what skills cannot | Skills can reduce repetitive prompt writing |

**Conclusion:** Skills complement tools, not replace them. Capabilities requiring interaction with external systems (reading/writing files, executing commands, calling APIs) must be tools; while high-frequency prompt patterns and norm injection can be encapsulated with skills.

* * *

## 20. For complex tasks, how does the Coding Agent's Plan work

**Answer:**

The core of complex task Planning is breaking "one big problem" into "multiple executable sub-tasks" and determining dependencies.

**Planning Process:**

```
Receive complex task
  ↓
Phase 1 - Clarification
  Have the model list unclear points, user confirms before planning
  ↓
Phase 2 - Decomposition
  Output structured Plan:
  {
    "tasks": [
      { "id": "T1", "description": "...", "files": [...], "deps": [] },
      { "id": "T2", "description": "...", "files": [...], "deps": ["T1"] },
      { "id": "T3", "description": "...", "files": [...], "deps": [] }, // Can run parallel with T2
    ]
  }
  ↓
Phase 3 - Orchestration
  Execute in parallel/serial based on dependencies; tasks without dependencies run in parallel
  ↓
Phase 4 - Integration
  After sub-tasks complete, overall review of interface consistency
```
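Phase 3 can be sketched as grouping tasks into "waves" that are safe to run in parallel. PlanTask and scheduleWaves are hypothetical names, and the sketch assumes the plan uses the id/deps shape shown above.

```typescript
interface PlanTask {
  id: string;
  deps: string[]; // ids of tasks that must finish first
}

// Group tasks into waves: every task in a wave has all of its deps in earlier waves.
function scheduleWaves(tasks: PlanTask[]): string[][] {
  const done = new Set<string>();
  let remaining = [...tasks];
  const waves: string[][] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.deps.every((d) => done.has(d)));
    if (ready.length === 0) throw new Error('cyclic or unknown dependency');
    waves.push(ready.map((t) => t.id));
    ready.forEach((t) => done.add(t.id));
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves;
}

const planWaves = scheduleWaves([
  { id: 'T1', deps: [] },
  { id: 'T2', deps: ['T1'] },
  { id: 'T3', deps: [] },
]);
console.log(JSON.stringify(planWaves)); // [["T1","T3"],["T2"]]
```

Wave scheduling is the simplest option. T3 has no dependencies, so it lands in the first wave next to T1; a scheduler that starts each task as soon as its own deps finish would achieve more overlap at the cost of more bookkeeping.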


### 💬 Deep Dive Q&A

**Q: How are dependencies in the Plan determined, can the model accurately identify them?**

> The model's judgment of dependencies sometimes misses things, especially implicit dependencies (like two tasks both needing to modify the same shared type definition file). My approach is, after the model generates the Plan, to use a tool to scan the file list involved in each sub-task, automatically checking for file overlaps. Overlapping tasks are marked as dependent and cannot run in parallel. Tool checks are more reliable than model judgment.
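A sketch of that overlap check, assuming each sub-task already lists the files it will touch. FileTask and addOverlapDeps are hypothetical names; the rule that the later task waits for the earlier one is an arbitrary tie-break chosen for the example.

```typescript
interface FileTask {
  id: string;
  files: string[];
  deps: string[];
}

// If two tasks touch the same file and are not already linked, serialize them.
function addOverlapDeps(tasks: FileTask[]): FileTask[] {
  const result = tasks.map((t) => ({ ...t, deps: [...t.deps] })); // do not mutate the input
  for (let i = 0; i < result.length; i++) {
    for (let j = i + 1; j < result.length; j++) {
      const earlier = result[i];
      const later = result[j];
      const shared = earlier.files.some((f) => later.files.includes(f));
      const linked = later.deps.includes(earlier.id) || earlier.deps.includes(later.id);
      if (shared && !linked) later.deps.push(earlier.id);
    }
  }
  return result;
}

const checkedPlan = addOverlapDeps([
  { id: 'T1', files: ['src/search.ts', 'src/types.ts'], deps: [] },
  { id: 'T2', files: ['src/search.test.ts'], deps: ['T1'] },
  { id: 'T3', files: ['src/auth.ts', 'src/types.ts'], deps: [] },
]);
console.log(checkedPlan.map((t) => `${t.id}:[${t.deps.join(',')}]`).join(' '));
// T1:[] T2:[T1] T3:[T1]
```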

* * *

## 21. How is multi-Agent orchestration specifically done, what are the differences between each sub-Agent, why this design

**Answer:**

**Orchestration Architecture:**

```
Orchestrator Agent (Parent)
├── Receives user requests
├── Generates Plan, distributes sub-tasks
├── Aggregates sub-Agent results
└── Handles conflicts and integration

Worker Agents (Children, divided by role)
├── Implementer Agent: Writes code, focuses only on implementation
├── Reviewer Agent: Reviews code, critical perspective
├── Tester Agent: Writes tests, focuses on coverage and boundaries
└── Documenter Agent: Generates documentation, focuses on readability
```


**Differences Between Each Sub-Agent:**

- **System prompt differs**: Reviewer's system prompt emphasizes "critical, find problems," Implementer's emphasizes "precise execution, conform to norms"
- **Toolset differs**: Implementer has write file permissions, Reviewer only has read permissions (prevents accidentally modifying code during review)
- **Context differs**: Only given files relevant to the current sub-task, not sharing other sub-Agents' work processes

**Why not let all sub-Agents share tools?**

> Principle of least privilege. Reviewer doesn't need to write files; granting write permission introduces risk—it might directly modify during review, bypassing the Implementer's workflow. Tool permissions correspond to scope of responsibility; smaller responsibility, smaller permissions, smaller error surface.
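One way to enforce this is a per-role allowlist checked before every tool call. The sketch below is an assumption for illustration: the tool names and the exact permissions for the Tester and Documenter roles are invented, while the Implementer/Reviewer split follows the description above.

```typescript
type ToolName = 'read_file' | 'write_file' | 'run_tests';
type AgentRole = 'implementer' | 'reviewer' | 'tester' | 'documenter';

// Each role only gets the tools its responsibility requires.
const ROLE_TOOLS: Record<AgentRole, readonly ToolName[]> = {
  implementer: ['read_file', 'write_file'],
  reviewer: ['read_file'], // read-only: cannot modify code during review
  tester: ['read_file', 'write_file', 'run_tests'],
  documenter: ['read_file', 'write_file'],
};

function canCall(role: AgentRole, tool: ToolName): boolean {
  return ROLE_TOOLS[role].includes(tool);
}

console.log(canCall('reviewer', 'read_file'));  // true
console.log(canCall('reviewer', 'write_file')); // false
```

The check belongs in the orchestrator's tool dispatcher rather than in the prompt, so a sub-Agent cannot talk its way past it.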

* * *

# Coding Agent Project Detail Questions

* * *

## 22. The entire chain operation flow of the Coding Agent

**Answer:** (Reference framework, replace with actual project)

```
User Input (natural language task description)
  ↓
① Intent Understanding + Skill Matching
   Embedding retrieval of relevant skills, inject corresponding prompt norms
  ↓
② Long-term Memory Recall
   Vector retrieval, find historical memories related to current task, inject into context
  ↓
③ Dynamic Prompt Assembly
   System prompt + long-term memory + skills + conversation history (compressed) + current task
  ↓
④ Model Reasoning (Plan Phase)
   Output structured execution plan (sub-task list + dependencies)
  ↓
⑤ Tool Execution Loop (ReAct Pattern)
   Read file → Analyze → Write file → Verify → Loop until complete
  ↓
⑥ Sub-task Complete, Report Results
  ↓
⑦ Update Long-term Memory (extract key decisions, change summaries)
  ↓
⑧ Return to User, Wait for Next Input
```


* * *

## 23. How is the Skill layering system designed, why this design

**Answer:**

Three-layer structure:

```
Layer 3: Domain Skills (Most Specific)
  Example: react-component, api-design, test-writing
  Trigger: Specific task types

Layer 2: Norm Skills (Middle Layer)
  Example: code-style, commit-convention, pr-format
  Trigger: Automatically injected on every code write/commit

Layer 1: Base Skills (Most General)
  Example: project-context, team-preference
  Trigger: Injected in every conversation
```


**Why Layering:**

- Avoids unnecessary injection: Not every task needs all skills; on-demand injection saves tokens
- Flexible combination: A complex task can simultaneously trigger multiple layers of skills (base + norm + domain)
- Independent maintenance: Skills at each layer can be updated independently without affecting each other

* * *

## 24. How is user input matched with relevant Skills

**Answer:**

Semantic matching process:

```
User Input
  ↓
Text Embedding (Vectorization)
  ↓
Cosine similarity with each skill's description vector in the skill registry
  ↓
Candidate skills exceeding threshold (e.g., 0.7)
  ↓
Sort by score, take Top K (e.g., Top 3)
  ↓
Inject matched skill content
```


**Auxiliary Strategies:**

- Keyword Trigger: `/review` command directly triggers the review skill, bypassing semantic matching
- Forced Injection: Base layer skills bypass matching, injected every time
- Context Awareness: Skills already triggered in the current conversation are maintained for subsequent turns (avoids repeated retrieval)
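Combining the semantic path with the keyword trigger and forced injection could look like the sketch below. SkillDef and selectSkills are hypothetical names, and the similarity score is passed in as a function (a lookup table here), so the example does not depend on any particular embedding service. Context awareness across turns is left out.

```typescript
interface SkillDef {
  name: string;
  layer: 'base' | 'norm' | 'domain';
  command?: string; // optional slash command that triggers the skill directly
}

function selectSkills(
  input: string,
  skills: SkillDef[],
  scoreOf: (skill: SkillDef) => number, // similarity between input and skill description
  threshold = 0.7,
  topK = 3,
): string[] {
  const forced = skills.filter((s) => s.layer === 'base'); // injected every time
  const byCommand = skills.filter(
    (s) => s.layer !== 'base' && s.command !== undefined && input.startsWith(s.command),
  );
  const matched = skills
    .filter((s) => s.layer !== 'base' && !byCommand.includes(s))
    .map((s) => ({ skill: s, score: scoreOf(s) }))
    .filter((x) => x.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((x) => x.skill);
  return [...forced, ...byCommand, ...matched].map((s) => s.name);
}

const skillRegistry: SkillDef[] = [
  { name: 'project-context', layer: 'base' },
  { name: 'review', layer: 'norm', command: '/review' },
  { name: 'react-component', layer: 'domain' },
  { name: 'api-design', layer: 'domain' },
  { name: 'test-writing', layer: 'domain' },
];
// Made-up similarity scores standing in for embedding results.
const toyScores: Record<string, number> = {
  'react-component': 0.82,
  'api-design': 0.4,
  'test-writing': 0.75,
};
const selectedSkills = selectSkills(
  '/review the SearchBox component',
  skillRegistry,
  (s) => toyScores[s.name] ?? 0,
);
console.log(selectedSkills.join(','));
// project-context,review,react-component,test-writing
```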

* * *

## 25. Skill Sedimentation Mechanism

**Answer:**

Skills aren't just manually created; they can also be extracted and sedimented from conversations:

**Auto-Sedimentation Trigger Conditions:**

- User explicitly indicates a certain AI output is "great, do it this way from now on"
- The same type of prompt is reused more than N times
- User explicitly says "save this as a skill"

**Sedimentation Process:**

```
Identify sedimentation opportunity
  ↓
Extract high-frequency/high-quality prompt patterns
  ↓
Have model auto-generate skill document (name/description/content)
  ↓
User confirmation (or auto-save)
  ↓
Write to skill file, hot-update to registry
```


This way, the skill library grows continuously with usage, rather than being a one-time manual configuration.

* * *

## 26. How are long and short-term memory designed, static vs dynamic long-term memory

**Answer:**

**Memory Layering:**

```
Short-term Memory
├── Current conversation's message history
├── Stored in memory, disappears after session ends
└── Triggers compression when exceeding token threshold

Long-term Memory
├── Persistent across sessions, stored in vector database
└── Divided into static and dynamic types
```


**Static Long-term Memory vs Dynamic Long-term Memory:**

| Dimension | Static Long-term Memory | Dynamic Long-term Memory |
| ---- | --------------------- | ------------------------------------------ |
| Content | User preferences, project norms, fixed conventions | New facts extracted after each conversation |
| Update Frequency | Low, manually maintained | High, automatically updated each dialogue turn |
| Example | "Use TypeScript, no any" | "User decided on 2026-06-01 to switch state management from Redux to Zustand" |
| Role | Provides stable background norms | Records dynamic decisions and progress |

**Why Distinguish the Two:**

If static things are updated every conversation, it introduces noise (AI might "overwrite" user-set norms). If dynamic things aren't automatically accumulated, the user has to re-introduce the background each time. Managing the two separately, with different write strategies and recall logic, addresses both.

* * *

## 27. How to handle rapid accumulation of too much long-term memory

**Answer:**

**Prevention Strategies:**

- Set a memory entry cap (e.g., 1000 entries), trigger consolidation when exceeded
- Deduplication: Merge memory entries with similarity > 0.9
- Set memory TTL (Time To Live), reduce weight or archive entries not recalled for over N days

**Periodic Consolidation (Memory Consolidation):**

```
Trigger: Memory entries exceed threshold OR scheduled (weekly)
  ↓
Cluster all memory entries (semantic clustering)
  ↓
Merge entries within the same cluster into one summary memory
  ↓
Demote original memory entries to "archive," not participating in daily recall
  ↓
Only keep summary memories as active memories
```


This is similar to the human memory "consolidation" process—details are forgotten, key facts are retained.
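A much simplified sketch of the prevention side (deduplication plus TTL) is shown below. It archives near-duplicates instead of merging them into a summary, which a full consolidation pass would do with a model call. MemoryItem, consolidate, the 30-day TTL and the injected similarity function are all assumptions for illustration.

```typescript
interface MemoryItem {
  id: string;
  text: string;
  lastRecalledDay: number; // day index of the most recent recall
}

function consolidate(
  items: MemoryItem[],
  similarity: (a: MemoryItem, b: MemoryItem) => number,
  today: number,
  ttlDays = 30,
  dupThreshold = 0.9,
): { active: MemoryItem[]; archived: MemoryItem[] } {
  const active: MemoryItem[] = [];
  const archived: MemoryItem[] = [];
  for (const item of items) {
    const expired = today - item.lastRecalledDay > ttlDays;
    const duplicate = active.some((kept) => similarity(kept, item) > dupThreshold);
    (expired || duplicate ? archived : active).push(item);
  }
  return { active, archived };
}

// Toy similarity: identical text counts as a duplicate. A real system would compare embeddings.
const sameText = (a: MemoryItem, b: MemoryItem): number => (a.text === b.text ? 1 : 0);

const consolidated = consolidate(
  [
    { id: 'a', text: 'Use Zustand for state', lastRecalledDay: 95 },
    { id: 'b', text: 'Use Zustand for state', lastRecalledDay: 99 },
    { id: 'c', text: 'Old sprint note', lastRecalledDay: 10 },
    { id: 'd', text: 'API uses REST', lastRecalledDay: 90 },
  ],
  sameText,
  100,
);
console.log(consolidated.active.map((m) => m.id).join(','));   // a,d
console.log(consolidated.archived.map((m) => m.id).join(',')); // b,c
```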

* * *

## 28. How does the large model decide whether to recall long-term memory

**Answer:**

Two stages:

**Stage 1: Vector Retrieval (Coarse Recall)**

Embed the user's current input, perform cosine similarity retrieval against the memory store, return Top-K most relevant memory candidates.

**Stage 2: Model Reranking (Fine Recall)**

Send the candidate memories and current query to the model, letting the model judge which memories are truly useful:

```
Below are memories potentially relevant to the current task (sorted by similarity):
[Memory List]

Please judge which memories are helpful for the current task "User wants to implement a search feature," output the IDs of relevant memories, ignore irrelevant ones.
```


**Why Two Stages:** Pure vector retrieval is based on semantic similarity, sometimes recalling items that "look similar but are actually irrelevant." Having the model make the final judgment, combining semantic understanding to determine true relevance, yields higher accuracy.

* * *

## 29. Dynamic Prompt vs Static Prompt

**Answer:**

**Static Prompt:**

- Content is fixed, same for every conversation
- Example: Role definition in System Prompt, base norms
- Characteristics: Stable, predictable, unaffected by runtime state

**Dynamic Prompt:**

- Content is dynamically assembled at runtime based on context
- Components are conditionally injected/not injected
- Example:

```ts
// Context, SYSTEM_BASE and compressHistory are assumed to be defined elsewhere in the project.
function buildPrompt(context: Context): string {
  const parts = [
    SYSTEM_BASE,                                           // Static, always injected
    context.longTermMemory.join('\n'),                     // Dynamic, injected by relevance
    context.relevantSkills.map(s => s.content).join('\n'), // Dynamic, matched by task
    compressHistory(context.messages),                     // Dynamic, compressed when exceeding threshold
    context.ragResults ?? '',                              // Dynamic, injected only if retrieval results exist
    `User's current task: ${context.userInput}`,           // Dynamic, different each time
  ];
  return parts.filter(Boolean).join('\n\n'); // drop empty parts before joining
}
```


**Design Principle:** Keep the static part stable, minimize dynamic part injection (only inject information truly needed for the current task), manage the two separately for easier debugging and iteration.

* * *

## 30. Model base selection, Token consumption and cost

**Answer:** (Reference framework, fill in numbers based on actual situation)

**Model Selection Considerations:**

| Task | Recommended Model | Reason |
| ------------ | -------------------------- | ----------- |
| Main Implementation (complex code tasks) | Claude Sonnet / GPT-4o | Strong reasoning, good instruction following |
| Summary/Compression (auxiliary tasks) | Claude Haiku / GPT-4o-mini | Several times cheaper (up to ~10x, depending on model generation), sufficient |
| Embedding | text-embedding-3-small | Good vector quality, low cost |
| Lightweight Judgment (routing/classification) | GPT-4o-mini / Haiku | Fast, cheap |

**Token Consumption Estimation:**

For a task writing 1000 lines of code:

- Input tokens (prompt + context + tool results): approx 50k–100k tokens
- Output tokens (code + explanation): approx 5k–20k tokens
- Total: approx 55k–120k tokens/task

**Cost Estimation (using Claude Sonnet as example):**

- Input: $3/1M tokens
- Output: $15/1M tokens
- One complex task: approx $0.2–0.6 for a single pass at the volumes above; more when context is re-sent across many turns

**Why costs are this high:** Mainly due to context injection—each dialogue turn must carry complete skills, memory, history; the token count of this background information far exceeds the code itself. The optimization direction is refined context management, injecting only the truly necessary parts.
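The arithmetic behind such an estimate is a one-liner. The sketch below plugs in the example token ranges and the $3 / $15 per million token prices quoted above; Pricing and estimateCostUsd are hypothetical names, and real prices should be checked against the provider's current price list.

```typescript
interface Pricing {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

function estimateCostUsd(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
}

const examplePricing: Pricing = { inputPerMTok: 3, outputPerMTok: 15 };
const lowEstimate = estimateCostUsd(50_000, 5_000, examplePricing);    // 0.225
const highEstimate = estimateCostUsd(100_000, 20_000, examplePricing); // 0.6
console.log(lowEstimate, highEstimate);
```

This covers a single pass over the stated token volumes. Because an agent re-sends skills, memory, and history on every turn, a multi-turn task multiplies the input side, which is where the real bill tends to come from.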

* * *

# RAG Project Deep Dive Questions

* * *

## 31. What are the data sources for RAG

**Answer:** (Replace with actual project, below is a reference framework)

Data sources typically fall into several categories:

| Source Type | Processing Method | Notes |
| --------------- | -------------------- | ------------------- |
| Internal Docs (PDF/Word) | Parse → Chunk → Vectorize | Complex formatting, need to handle tables/images |
| Web / Confluence | Crawl → HTML Cleaning → Chunk | Remove navigation, ads, duplicate content |
| Codebase | AST parsing or text chunking | Chunking by function/class is better than by line count |
| Database Structured Data | Convert to natural language descriptions then vectorize | Needs schema-aware description generation |
| API Docs | OpenAPI / Swagger parsing | Chunk by endpoint, with parameter descriptions |

**Data quality is the upper limit of RAG effectiveness**; garbage in, garbage out. Cleaning before ingestion is more important than model selection.

* * *

## 32. Why did you personally want to do a GraphRAG project

**Answer:** (Reference thought process)

Can be organized from "what problem was discovered → why existing solutions are insufficient → how GraphRAG solves it":

> While using ordinary RAG systems, I found that for questions requiring cross-document reasoning, the answer quality was very poor. For example, questions like "the background and impact of a certain technical decision" involve related information across multiple documents. Pure vector retrieval cannot find the full picture, resulting in one-sided answers. Pure vector RAG excels at "finding relevant paragraphs" but not at "understanding relationships between paragraphs." GraphRAG, by constructing a knowledge graph, gives retrieval multi-hop capabilities with graph structure, able to answer questions like "what is the relationship between A and B" that require reasoning. This is the core starting point for my project.

* * *

## 33. The difference between GraphRAG and pure vector retrieval, what problem does it solve

**Answer:**

**Limitations of Pure Vector Retrieval:**

- Each document chunk is independently vectorized, retrieval based on semantic similarity
- Excels at "where is this passage" (single-hop retrieval)
- Cannot handle questions requiring reasoning across multiple documents/paragraphs (multi-hop reasoning)

**GraphRAG's Approach:**

- On top of vector retrieval, build a knowledge graph: Entities, Relations, Communities
- During retrieval, not only find similar chunks but also traverse along the knowledge graph's edges for multi-hop

**Problems Solved:**

| Problem Type | Pure Vector | GraphRAG |
| ----------------- | --- | -------- |
| Direct Q&A (What is concept X) | ✅ | ✅ |
| Multi-hop Reasoning (Relationship between A and B) | ❌ | ✅ |
| Global Summarization (Themes of entire doc base) | ❌ | ✅ (Community Summaries) |
| Reasoning Questions (Infer conclusion from multiple facts) | ❌ | ✅ |

**What counts as a reasoning question (example):**

The knowledge base contains three separate documents (A, B, C), each holding only part of the picture.

Reasoning question: How does Zhang San's familiarity with Technology Y affect Project X? Answering it requires combining facts from all three documents, which is very difficult for pure vector retrieval.
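
A multi-hop lookup of this kind can be sketched as a breadth-first search over graph edges. The three facts below (including "Module M") are invented for illustration, one per document; `Edge` and `findPath` are hypothetical names, and a real GraphRAG system would combine this traversal with vector retrieval and community summaries.

```typescript
// One extracted fact: (from) -[relation]-> (to), tagged with its source document.
interface Edge { from: string; relation: string; to: string; doc: string; }

// Hypothetical facts, one per document.
const edges: Edge[] = [
  { from: 'Zhang San', relation: 'is expert in', to: 'Technology Y', doc: 'A' },
  { from: 'Module M', relation: 'is built with', to: 'Technology Y', doc: 'B' },
  { from: 'Project X', relation: 'depends on', to: 'Module M', doc: 'C' },
];

// Breadth-first search for the shortest chain of edges linking two entities.
// Edges are treated as undirected; returns null if no chain exists within maxHops.
function findPath(all: Edge[], start: string, goal: string, maxHops: number): Edge[] | null {
  if (start === goal) return [];
  const adj = new Map<string, { next: string; edge: Edge }[]>();
  const link = (a: string, b: string, edge: Edge): void => {
    const list = adj.get(a) ?? [];
    list.push({ next: b, edge });
    adj.set(a, list);
  };
  for (const e of all) { link(e.from, e.to, e); link(e.to, e.from, e); }

  const visited = new Set<string>([start]);
  let frontier: { node: string; path: Edge[] }[] = [{ node: start, path: [] }];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const nextFrontier: { node: string; path: Edge[] }[] = [];
    for (const { node, path } of frontier) {
      for (const { next, edge } of adj.get(node) ?? []) {
        if (visited.has(next)) continue;
        const extended = [...path, edge];
        if (next === goal) return extended;
        visited.add(next);
        nextFrontier.push({ node: next, path: extended });
      }
    }
    frontier = nextFrontier;
  }
  return null;
}

const path = findPath(edges, 'Zhang San', 'Project X', 3);
console.log(path?.map(e => `[${e.doc}] ${e.from} ${e.relation} ${e.to}`));
```

The returned chain touches all three documents, which is the evidence an LLM needs to answer the question. A similarity search over chunks would have no reason to return document C, since it mentions neither Zhang San nor Technology Y.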


### 💬 Deep Dive Q&A

**Q: The cost of building a knowledge graph for GraphRAG is high, how do you control this cost?**

> Two strategies. First, not all content is graphed; only core knowledge (documents rich in entities and complex relationships) is graphed, ordinary Q&A still goes through vector retrieval. Second, graph construction is offline and asynchronous, running in the background upon ingestion, not affecting real-time response. Entity and relation extraction uses lightweight models (Haiku/mini), only community summary generation uses strong models, controlling cost.

**Q: How are relations in the Graph extracted, what accuracy can be achieved?**

> An LLM does the relation extraction: given an entity pair and its context, the model judges the relation type and direction. Accuracy is roughly 80-85%, so there is noise. The key is a denoising mechanism: a relation that appears in multiple documents is confirmed and entered into the graph; a relation that appears only once may be a mis-extraction, so it is either left out or stored with lower confidence.
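
The denoising rule can be sketched as counting how many distinct documents support each triple. This is a minimal illustration; the `minDocs` threshold of 2 and the names `ExtractedRelation`, `ScoredRelation`, and `denoiseRelations` are assumptions made for this example.

```typescript
interface ExtractedRelation { subject: string; relation: string; object: string; docId: string; }
interface ScoredRelation {
  subject: string; relation: string; object: string;
  docCount: number;   // number of distinct documents mentioning the triple
  confirmed: boolean; // true when docCount reaches the threshold
}

// Group extractions by (subject, relation, object) and mark a triple as confirmed
// only when at least `minDocs` distinct documents mention it.
function denoiseRelations(extracted: ExtractedRelation[], minDocs = 2): ScoredRelation[] {
  const docsByTriple = new Map<string, Set<string>>();
  for (const r of extracted) {
    const key = JSON.stringify([r.subject, r.relation, r.object]);
    const docs = docsByTriple.get(key) ?? new Set<string>();
    docs.add(r.docId); // repeated mentions inside one document count once
    docsByTriple.set(key, docs);
  }
  const result: ScoredRelation[] = [];
  docsByTriple.forEach((docs, key) => {
    const [subject, relation, object] = JSON.parse(key) as [string, string, string];
    result.push({ subject, relation, object, docCount: docs.size, confirmed: docs.size >= minDocs });
  });
  return result;
}

const scored = denoiseRelations([
  { subject: 'Service A', relation: 'uses', object: 'Library B', docId: 'd1' },
  { subject: 'Service A', relation: 'uses', object: 'Library B', docId: 'd2' },
  { subject: 'Service A', relation: 'owns', object: 'Library C', docId: 'd1' },
]);
console.log(scored);
```

Unconfirmed triples are kept in the result rather than dropped, so the caller can choose between discarding them and storing them with lower confidence.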

* * *

## 34. How to handle various forms of content in PDFs

**Answer:**

PDF Content Forms:

| Content Type | Processing Method |
| ----------- | -------------------------------------------------- |
| Plain Text | `pdfplumber` / `pymupdf` direct extraction |
| Tables | `pdfplumber` extracts table structure, convert to Markdown / CSV |
| Images | Use vision models (GPT-4o / Claude) for Image Captioning |
| Formulas | MathPix API or `nougat` (specialized model for academic PDF formulas) to LaTeX |
| Scanned Docs (Image PDFs) | OCR (Tesseract / PaddleOCR / Azure Form Recognizer) |
| Mixed Layout (Multi-column) | `pdfplumber` sorts by coordinates, restores reading order |

**Processing Pipeline:**

    PDF Input
      → Determine if scanned (has extractable text layer)
          → Normal PDF: pymupdf extracts text + tables
          → Scanned: OCR to text
      → Detect image blocks → Vision model generates descriptions
      → Detect tables → Convert to structured format
      → Chunk → Vectorize → Ingest


* * *

## 35. Pre-ingestion handling of data redundancy and irrelevant information

**Answer:**

**Cleaning Strategy (Pipeline):**

1.  **Deduplication**: Document-level MD5 hash dedup, preventing the same document from being ingested multiple times; chunk-level semantic dedup (chunks with cosine similarity > 0.95 keep only one)
1.  **Noise Filtering**: Remove headers/footers (usually at fixed coordinates), watermark text, navigation menu text
1.  **Quality Filtering**: Discard overly short chunks (< 50 chars) and chunks that fail garbled-text detection (illegal character ratio > X%)
1.  **Relevance Filtering**: Use a lightweight classification model to judge if a chunk belongs to the target knowledge domain; irrelevant ones are not ingested
1.  **Structure Preservation**: Preserve heading hierarchy (H1/H2/H3), include headings as context prefix when chunking
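
The deduplication and quality-filtering steps above can be sketched as follows. This is a minimal illustration that assumes embeddings are already computed; `Chunk`, `dedupDocuments`, and `filterChunks` are hypothetical names, and the pairwise comparison is quadratic, so a real pipeline would typically lean on the vector index instead.

```typescript
import { createHash } from 'node:crypto';

interface Chunk { text: string; embedding: number[]; }

const md5 = (text: string): string => createHash('md5').update(text).digest('hex');

// Cosine similarity of two equal-length vectors (0 when either vector is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Document-level dedup: keep the first document for each MD5 hash.
function dedupDocuments(docs: string[]): string[] {
  const seen = new Set<string>();
  return docs.filter(doc => {
    const h = md5(doc);
    if (seen.has(h)) return false;
    seen.add(h);
    return true;
  });
}

// Chunk-level filtering: drop chunks shorter than minChars, then drop any chunk
// whose similarity to an already kept chunk exceeds simThreshold.
function filterChunks(chunks: Chunk[], minChars = 50, simThreshold = 0.95): Chunk[] {
  const kept: Chunk[] = [];
  for (const c of chunks) {
    if (c.text.trim().length < minChars) continue;
    if (kept.some(k => cosine(k.embedding, c.embedding) > simThreshold)) continue;
    kept.push(c);
  }
  return kept;
}
```

Running the cheap checks (hash, length) before the similarity check keeps the expensive comparison off chunks that would be discarded anyway.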

* * *

## 36. How is knowledge extraction done

**Answer:**

Knowledge Extraction = Extracting structured knowledge (entities, relations, attributes) from unstructured text.

**Process:**

    Document Chunk
      → Named Entity Recognition (NER): Identify persons/places/organizations/products/concepts
      → Relation Extraction (RE): Determine relation types and direction between entities
      → Attribute Extraction: Extract entity attribute values ("Zhang San's position is CTO")
      → Knowledge Fusion: Merge different mentions of the same entity ("AI" = "Artificial Intelligence")
      → Write to Knowledge Graph


**Implementation Methods:**

- Small scale: LLM directly (specify extraction format in prompt, output JSON)
- Large scale: Use lightweight NER model to identify entities first, then use LLM for relation judgment (reduces cost)
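
The knowledge-fusion step can be sketched with a simple alias table. This is a deliberately small illustration: the `ALIASES` table and the function names are hypothetical, and production systems usually add embedding similarity or an LLM check on top of exact alias matching.

```typescript
// Hypothetical alias table: lower-cased surface form -> canonical entity name.
const ALIASES = new Map<string, string>([
  ['ai', 'Artificial Intelligence'],
]);

// Map one mention to its canonical name; unknown mentions are kept as written.
function canonicalName(mention: string, aliases: Map<string, string>): string {
  const trimmed = mention.trim();
  return aliases.get(trimmed.toLowerCase()) ?? trimmed;
}

// Merge mentions of the same entity and count how often each entity appears.
function fuseMentions(mentions: string[], aliases: Map<string, string>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const m of mentions) {
    const name = canonicalName(m, aliases);
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return counts;
}

const fused = fuseMentions(['AI', 'Artificial Intelligence', ' ai ', 'Zhang San'], ALIASES);
console.log(Array.from(fused.entries()));
```

Fusing before writing to the graph keeps "AI" and "Artificial Intelligence" from becoming two disconnected nodes, which would otherwise break multi-hop traversal.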

* * *

## 37. How is incremental updating done

**Answer:**

**Core Problem:** New document ingestion cannot trigger a full re-index, otherwise costs explode.

**Strategy:**

    New Document Arrives
      ↓
    Hash Comparison: New doc vs ingested docs
      ├── Identical → Skip
      ├── New Document (no record) → Normal processing pipeline ingestion
      └── Existing Document Updated → Differential Update:
            Only re-vectorize changed paragraphs
            Delete old version chunks, insert new version chunks
            Update affected nodes and edges in the knowledge graph


**Implementation Key Points:**

- Each chunk records source document ID + document version number
- When deleting a document, batch delete all chunks by document ID
- Knowledge graph nodes carry source_doc_id, synchronously update related nodes when document is updated
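
The skip / insert / differential-update decision can be sketched as a pure planning function. This is a minimal illustration that assumes documents are already split into paragraphs; `StoredDoc`, `Plan`, and `planUpdate` are hypothetical names, and MD5 is used only as a change detector.

```typescript
import { createHash } from 'node:crypto';

const hash = (text: string): string => createHash('md5').update(text).digest('hex');

// What the index remembers about an ingested document.
interface StoredDoc { docId: string; version: number; docHash: string; paragraphHashes: string[]; }

type Plan =
  | { action: 'skip' }
  | { action: 'insert'; paragraphs: string[] }
  | { action: 'update'; toEmbed: string[]; staleHashes: string[]; nextVersion: number };

// Build the stored record for a document version.
function snapshotOf(docId: string, version: number, paragraphs: string[]): StoredDoc {
  return {
    docId,
    version,
    docHash: hash(paragraphs.join('\n\n')),
    paragraphHashes: paragraphs.map(p => hash(p)),
  };
}

// Decide what to do with an incoming document without touching the index.
function planUpdate(stored: StoredDoc | undefined, paragraphs: string[]): Plan {
  if (!stored) return { action: 'insert', paragraphs };
  if (stored.docHash === hash(paragraphs.join('\n\n'))) return { action: 'skip' };
  const oldHashes = new Set(stored.paragraphHashes);
  const newHashes = new Set(paragraphs.map(p => hash(p)));
  return {
    action: 'update',
    toEmbed: paragraphs.filter(p => !oldHashes.has(hash(p))),              // changed or new paragraphs
    staleHashes: stored.paragraphHashes.filter(h => !newHashes.has(h)),    // chunks to delete
    nextVersion: stored.version + 1,
  };
}

const v1 = snapshotOf('doc_1', 1, ['intro', 'details']);
const plan = planUpdate(v1, ['intro', 'details, revised']);
console.log(plan);
```

Only the paragraphs in `toEmbed` need to be re-vectorized, and the chunks listed in `staleHashes` identify which graph nodes and edges have to be revisited.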

* * *

## 38. Document conflict handling, version control, erroneous document rollback

**Answer:**

**Document Conflict Handling:**

When a conflict is detected (the same fact is described differently in different documents), record it rather than auto-resolving it; the conflict is then dealt with at retrieval time.


**Version Control:**

- Each document records upon ingestion: Document ID, version number, ingestion time, operator
- Retain snapshots of the most recent N versions (vectors + original text)
- New version ingestion does not immediately delete old version; there's a validation period (e.g., 24 hours) before cleanup

**Erroneous Document Rollback:**

Conceptual command:

    knowledge-base rollback --doc-id "doc_xxx" --to-version "v2"   # Roll back to the specified version

Underlying operations:

1. Delete all chunks of the current version (by doc_id + version)
2. Restore the old version's chunks to the vector database
3. Update the knowledge graph (delete new version nodes, restore old version nodes)
4. Log the rollback operation
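
The rollback steps can be sketched with an in-memory stand-in for the vector database. Everything here is an assumption for illustration: `VersionedStore` and its methods are hypothetical, snapshots are kept in a plain `Map`, and the knowledge-graph update (step 3) is only marked by a comment.

```typescript
interface ChunkRecord { docId: string; version: string; text: string; }

class VersionedStore {
  private live: ChunkRecord[] = [];                      // chunks currently served to retrieval
  private snapshots = new Map<string, ChunkRecord[]>();  // key: "docId@version"
  readonly auditLog: string[] = [];

  // Ingest a new version: snapshot it and make it the live version of the document.
  ingest(docId: string, version: string, texts: string[]): void {
    const chunks = texts.map(text => ({ docId, version, text }));
    this.snapshots.set(`${docId}@${version}`, chunks);
    this.live = this.live.filter(c => c.docId !== docId).concat(chunks);
  }

  rollback(docId: string, toVersion: string): void {
    const snapshot = this.snapshots.get(`${docId}@${toVersion}`);
    if (!snapshot) throw new Error(`no snapshot for ${docId}@${toVersion}`);
    // Steps 1 and 2: remove the current version's chunks, restore the old version's chunks.
    this.live = this.live.filter(c => c.docId !== docId).concat(snapshot);
    // Step 3 (not modeled here): update the knowledge graph nodes for this document.
    // Step 4: log the operation.
    this.auditLog.push(`rollback ${docId} -> ${toVersion}`);
  }

  liveChunks(docId: string): ChunkRecord[] {
    return this.live.filter(c => c.docId === docId);
  }
}

const store = new VersionedStore();
store.ingest('doc_xxx', 'v2', ['correct paragraph']);
store.ingest('doc_xxx', 'v3', ['erroneous paragraph']);
store.rollback('doc_xxx', 'v2');
console.log(store.liveChunks('doc_xxx'));
```

Because old snapshots are retained rather than deleted on ingestion, rollback is a swap of stored chunks and needs no re-embedding.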


* * *

# Fine-tuning Project Questions

* * *

## 39. Why do a Fine-tuning project, what was the starting point

**Answer:** (Reference framework, replace with actual project)

Can be answered from three layers: "API call limitations → What Fine-tuning solves → Personal learning value":

> There were two starting points. First, the engineering level: directly calling closed-source APIs has data privacy issues (code/internal docs sent to third parties), uncontrollable latency, and costs that scale linearly with usage. Fine-tuning open-source models allows local deployment, which addresses these problems. Second, the technical learning level: understanding the model training process helps with building Agents, doing RAG, and prompt engineering, because you know "why the model behaves this way," not just "how to use it." Llama3 was chosen because it is among the strongest open-source models overall and has a mature community toolchain (Unsloth, LLaMA-Factory), lowering the barrier to entry.

* * *

## 40. What are SFT, DPO, GRPO, and what are the differences

**Answer:**

**SFT (Supervised Fine-Tuning)**

Uses labeled (question, answer) pairs to directly train the model, maximizing the log probability of the correct answer.

- Training Data: (prompt, chosen_response) pairs
- Goal: Teach the model to generate outputs like chosen_response given a prompt
- Pros: Simple and direct, data is easy to obtain
- Cons: Only learns "what is correct," not "why the wrong one is bad"


**DPO (Direct Preference Optimization)**

Uses preference pairs (good answer vs bad answer for the same prompt) for training, increasing the probability of the good answer and decreasing the probability of the bad answer.

- Training Data: (prompt, chosen_response, rejected_response) triples
- Goal: Maximize the probability gap between chosen and rejected
- Pros: More stable than RLHF (no need to train a separate reward model)
- Cons: Requires human-annotated preference data
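
The per-pair DPO objective can be written down in a few lines. This sketch assumes the summed log-probabilities of each response under the trained policy and under a frozen reference model are already available; `PairLogProbs`, `dpoLoss`, and the default `beta = 0.1` are choices made for this example.

```typescript
// Log-probabilities of the chosen and rejected responses for one prompt.
interface PairLogProbs {
  policyChosen: number;
  policyRejected: number;
  refChosen: number;   // same responses scored by the frozen reference model
  refRejected: number;
}

// DPO loss for one preference pair:
//   -log sigmoid( beta * [ (policyChosen - refChosen) - (policyRejected - refRejected) ] )
function dpoLoss(lp: PairLogProbs, beta = 0.1): number {
  const margin = beta * ((lp.policyChosen - lp.refChosen) - (lp.policyRejected - lp.refRejected));
  // -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m to avoid overflow.
  return margin > 0
    ? Math.log1p(Math.exp(-margin))
    : -margin + Math.log1p(Math.exp(margin));
}

// Policy identical to the reference: margin is 0, loss is ln(2).
const neutral = dpoLoss({ policyChosen: -2, policyRejected: -3, refChosen: -2, refRejected: -3 });
// Policy now prefers the chosen response more than the reference does: loss drops.
const improved = dpoLoss({ policyChosen: -1, policyRejected: -3, refChosen: -2, refRejected: -3 });
console.log(neutral, improved);
```

The loss falls only when the policy widens the chosen-versus-rejected gap relative to the reference model, which is how DPO replaces an explicit reward model.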


**GRPO (Group Relative Policy Optimization)**

Samples multiple responses for the same prompt and uses each response's reward relative to the rest of the group as the training signal, without needing a separate value (critic) model. DeepSeek-R1 training used this method.

- Training Data: (prompt, [response_1, response_2, ..., response_n]), with the responses sampled from the current policy
- Reward: Each response's score relative to the group's average quality
- Pros: No value (critic) model needed, so more memory efficient than PPO-style RLHF; naturally suits verifiable rewards (math/code)
- Cons: Reward function design is complex, hard to apply to non-verifiable tasks
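
The group-relative signal can be sketched as a normalized advantage: each response's reward minus the group mean, divided by the group standard deviation. The function name and the `eps` guard against a zero standard deviation are assumptions for this example; the full GRPO objective also includes a clipped policy ratio and a KL term, which are not shown.

```typescript
// Advantage of each sampled response relative to its group:
//   A_i = (r_i - mean(r)) / (std(r) + eps)
function groupRelativeAdvantages(rewards: number[], eps = 1e-8): number[] {
  const n = rewards.length;
  if (n === 0) return [];
  const mean = rewards.reduce((sum, r) => sum + r, 0) / n;
  const variance = rewards.reduce((sum, r) => sum + (r - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return rewards.map(r => (r - mean) / (std + eps));
}

// Four sampled answers to one math prompt, scored 1 if the final answer is correct.
const advantages = groupRelativeAdvantages([1, 0, 0, 1]);
console.log(advantages); // correct answers get positive values, wrong ones negative
```

When every response in a group gets the same reward, all advantages are zero and the prompt contributes no gradient, which is one reason verifiable tasks with mixed outcomes suit this method.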


| Dimension | SFT | DPO | GRPO |
| --------------- | ------------------ | -------------------------- | ----------------------- |
| Data Format | (prompt, response) | (prompt, chosen, rejected) | (prompt, [multiple responses]) |
| Needs Reward Model | No | No | No (built-in reward function) |
| Training Stability | Highest | High | Medium |
| Suitable Scenarios | Format alignment, knowledge injection | Style preference, safety alignment | Reasoning, math, code (verifiable tasks) |

* * *

## 41. Why based on Llama3 architecture, dataset size, evaluation metrics

**Answer:** (Reference framework, replace with actual project)

**Why Llama3:**

- Open weights, commercially usable subject to the terms of the Llama3 Community License
- Mature architecture, good community support (training frameworks like Unsloth, LLaMA-Factory are well-developed)
- At the same parameter count, Llama3's performance is among the top open-source models
- Chinese support: Llama3's vocabulary includes Chinese tokens (much better than Llama2)

**Dataset Size Reference:**

- SFT: Typically 1k–100k entries, high quality > large quantity of low-quality data
- DPO: Several thousand to tens of thousands of preference pairs
- Specific Domain: 500–5000 high-quality domain data entries can significantly change model behavior

**Common Evaluation Metrics:**

| Task Type | Metrics |
| ---- | --------------------- |
| Code Generation | HumanEval Pass@1, MBPP |
| Math Reasoning | GSM8K, MATH |
| Chinese Understanding | C-Eval, CMMLU |
| Dialogue Quality | MT-Bench (GPT-4 judged) |
| Domain Specific | Self-built test set, manual scoring |
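
For code generation, Pass@1 is the k = 1 case of pass@k, which is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass the tests. The sketch below evaluates it as a running product to avoid large binomial coefficients; the function name is chosen for this example.

```typescript
// Unbiased pass@k estimate for one problem.
//   n: samples generated, c: samples that pass the tests, k: attempt budget being scored
function passAtK(n: number, c: number, k: number): number {
  if (n - c < k) return 1; // fewer than k failing samples: every k-subset contains a pass
  let allFail = 1;         // probability that k samples drawn without replacement all fail
  for (let i = n - c + 1; i <= n; i++) {
    allFail *= 1 - k / i;
  }
  return 1 - allFail;
}

// 10 samples, 3 pass: pass@1 reduces to c / n.
const p1 = passAtK(10, 3, 1);
const p5 = passAtK(10, 3, 5);
console.log(p1, p5);
```

The benchmark score is the mean of this value over all problems, so a self-built test set can reuse the same function as long as each case has an automatic pass/fail check.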

* * *

# Other Questions

* * *

## 42. What are your advantages compared to others

**Reference Thought Process (organize based on actual situation):**

Answer from three levels:

**Deep Practice Level:** Not just using AI tools, but deeply involved in the full chain from tool usage to system construction: built a Coding Agent and a GraphRAG system and did fine-tuning, which gives end-to-end engineering experience rather than stopping at prompt engineering.

**Engineering Mindset Level:** Engineering AI capabilities—skill systems, context governance, multi-Agent orchestration, cost control. These are all engineering problems that need solving to turn AI from an "experiment" into a "production-usable system," with practical implementation experience.

**Continuous Learning Level:** Tracking the latest developments (SDD, GRPO, GraphRAG), and able to quickly apply new technologies to actual projects, not just understanding concepts.

* * *

## 43. Counter-question: AI is developing so fast, where is the biggest empowerment point for business

**Reference Answer:**

> I believe the biggest empowerment point isn't how strong the model itself is, but that **AI drastically lowers the execution cost of knowledge work**. A senior engineer making an architectural decision takes 2 hours, with AI assistance maybe 30 minutes; a product manager writing a PRD takes a day, with AI assistance maybe half a day. This "execution cost reduction" allows teams to try more directions and iterate faster. For business, the biggest empowerment is **lowering the cost of trial and error, increasing iteration speed**, not making any single thing better.

* * *

## 44. Counter-question: With base models iterating so fast, are Agents still necessary

**Reference Answer:**

> I think yes, and increasingly so. Stronger base models solve "how complex a single dialogue can be"; Agents solve "how to organize multiple interactions, multiple tools, and multi-step processes into a reliable system." These are two different dimensions. No matter how strong the base model, it can't automatically decide when to call what tool, how to manage long-term context, or how to recover from errors—these are problems Agent architecture needs to solve. In fact, the stronger the base model, the more complex the tasks an Agent can handle; the two are mutually amplifying, not mutually replacing.

* * *

*Compiled based on the technical landscape as of June 2026. Replacing the Demo examples with your own project experience yields better results in interviews.*