The AI Developer's Interview Playbook: Multi-Model Workflows, Context Governance, and Agent Architecture

By 一tiao咸鱼
Original on juejin.cn

The gap between demo-quality AI coding and production-grade AI engineering is wide, and this material catalogs the specific techniques—context isolation, skill modularization, memory consolidation, streaming parsers—that bridge it. Developers who treat AI as a tool to be architected around, rather than a chatbot to prompt, will ship faster and break fewer things.

Summary

A comprehensive interview-preparation resource maps the terrain of modern AI-assisted development across 44 questions with three layers each: a core answer, a runnable code demo, and a simulated deep-dive dialogue. The material spans practical model selection—using Claude for implementation, GPT for review, and Gemini for long-context ingestion—alongside engineering concerns like context pollution, sub-agent isolation, and compression strategies.

A full section dissects SSE streaming, from UTF-8 byte-boundary handling with `TextDecoder` to state-machine parsing of incomplete Markdown tables. Later sections detail the architecture of a custom Coding Agent, including dynamic prompt assembly, a three-layer skill system with hot-reloading, long/short-term memory design, and multi-agent orchestration with role-based tool permissions. RAG and GraphRAG coverage explains when vector retrieval falls short and how knowledge graphs enable multi-hop reasoning, while fine-tuning sections contrast SFT, DPO, and GRPO.
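The byte-boundary handling described above can be sketched roughly as follows. This is a minimal illustration, not the original article's code: the class name `SseLineParser` is hypothetical, and it simplifies SSE by treating every `data:` line as its own payload instead of joining multi-line events.

```typescript
// Hypothetical helper: turns raw byte chunks into SSE `data:` payloads.
class SseLineParser {
  private decoder = new TextDecoder("utf-8");
  private buffer = ""; // holds a trailing partial line between chunks

  // Feed one network chunk; returns the complete data payloads it finished.
  push(chunk: Uint8Array): string[] {
    // { stream: true } keeps an incomplete multi-byte sequence inside the
    // decoder instead of emitting a replacement character.
    this.buffer += this.decoder.decode(chunk, { stream: true });
    const lines = this.buffer.split("\n");
    // The last element is "" or an unfinished line: keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    const payloads: string[] = [];
    for (const raw of lines) {
      const line = raw.endsWith("\r") ? raw.slice(0, -1) : raw;
      if (line.startsWith("data:")) {
        payloads.push(line.slice(5).replace(/^ /, ""));
      }
    }
    return payloads;
  }
}

// Usage: a 3-byte Chinese character split across two chunks.
const bytes = new TextEncoder().encode("data: 你好\n\n");
const parser = new SseLineParser();
const first = parser.push(bytes.slice(0, 8)); // ends in the middle of "你"
const second = parser.push(bytes.slice(8));
```

In this run `first` is empty because the line is not yet complete, and the character only appears once its remaining byte arrives in the second chunk.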

The document closes with meta-questions on personal advantage and the enduring value of Agents even as base models improve, arguing that stronger models and Agent architectures amplify each other rather than compete.

Takeaways
Claude Sonnet handles precise implementation, GPT-4o performs independent critical review, and Gemini 2.0 Flash ingests ultra-long contexts—each model gets a distinct role in a multi-model pipeline.
Context pollution manifests as the AI 'holding grudges': it keeps drifting back to approaches that were already abandoned. The fix is to start a fresh session and paste in only the current valid code and goals.
Sub-agent isolation requires launching independent Agent instances via a Task tool, with the parent receiving only execution summaries, never intermediate reasoning.
A three-layer compression strategy triggers message-level pruning at 20 turns, summary compression at 60% token usage, and key-fact extraction into long-term memory for sessions exceeding hours or hundreds of turns.
SSE streaming parsers must decode with `TextDecoder` in streaming mode (`decode(chunk, { stream: true })`) to prevent UTF-8 multi-byte character truncation, and must buffer incomplete lines across chunk boundaries.
Markdown table rendering in streams requires a line-level state machine that buffers table rows until a non-pipe line or stream end triggers a flush.
Spec-Driven Development converts AI communication cost into spec-writing cost; the 'out of scope' section matters more than the feature list.
Skill systems use semantic embedding matching with a cosine-similarity threshold, plus keyword triggers for slash commands and forced injection for base-layer norms.
Long-term memory splits into static (user preferences, fixed conventions) and dynamic (decisions, progress), each with different write strategies and TTL policies.
GraphRAG enables multi-hop reasoning across documents that pure vector RAG cannot handle. Graph construction is expensive, so its cost is controlled by limiting the graph to entity-rich core knowledge and using lightweight models for extraction.
Fine-tuning approaches differ sharply: SFT teaches correct answers, DPO contrasts good vs bad responses, and GRPO uses group-relative scoring without a separate value (critic) model, which suits verifiable tasks like math and code.
Tool permissions in multi-agent systems follow least privilege: Implementers get write access, Reviewers get read-only, preventing accidental modifications during review.
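The table-buffering takeaway above (hold pipe-delimited rows until a non-pipe line or the end of the stream) might look like the sketch below. The `TableBuffer` class and `Block` type are hypothetical names, and the sketch assumes the caller already splits the stream into complete lines.

```typescript
type Block = { kind: "text" | "table"; lines: string[] };

// Hypothetical line-level state machine: call feed() once per complete
// line and end() when the stream closes.
class TableBuffer {
  private rows: string[] = []; // non-empty means a table is in progress

  feed(line: string): Block[] {
    if (line.trim().startsWith("|")) {
      this.rows.push(line); // still inside a table: hold the row back
      return [];
    }
    // A non-pipe line closes any open table, then passes through as text.
    return [...this.flush(), { kind: "text", lines: [line] }];
  }

  end(): Block[] {
    return this.flush();
  }

  private flush(): Block[] {
    if (this.rows.length === 0) return [];
    const table: Block = { kind: "table", lines: this.rows };
    this.rows = [];
    return [table];
  }
}

// Usage: the three table rows are emitted together, only after "Done" arrives.
const tb = new TableBuffer();
const out: Block[] = [
  ...tb.feed("Intro"),
  ...tb.feed("| a | b |"),
  ...tb.feed("|---|---|"),
  ...tb.feed("| 1 | 2 |"),
  ...tb.feed("Done"),
  ...tb.end(),
];
```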
Conclusions

Multi-model workflows succeed or fail on context isolation—not on model capability. The practice of copying only final code artifacts into a fresh review session, with no implementation history, is the single highest-leverage habit.

The 60% compression threshold exists not because models break at 61%, but because compression itself consumes tokens and degrades in quality as context fills. It is a buffer for the compression operation, not for the conversation.
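As a rough illustration of how the first two compression layers could be selected, the sketch below uses the 20-turn and 60% figures from the text. The type and function names are assumptions, and the third layer (key-fact extraction into long-term memory) is left out.

```typescript
type CompressionAction = "none" | "prune_old_messages" | "summarize_first_half";

interface SessionStats {
  turns: number;        // completed dialogue turns
  usedTokens: number;   // tokens currently in the context window
  windowTokens: number; // total context window size of the model
}

// Thresholds taken from the text; the constant names are illustrative.
const PRUNE_AT_TURNS = 20;
const SUMMARIZE_AT_USAGE = 0.6;

function chooseCompression(s: SessionStats): CompressionAction {
  const usage = s.usedTokens / s.windowTokens;
  // Summarize while a large share of the window is still free, so the
  // summarization call itself has room to run.
  if (usage >= SUMMARIZE_AT_USAGE) return "summarize_first_half";
  if (s.turns >= PRUNE_AT_TURNS) return "prune_old_messages";
  return "none";
}

// Usage with made-up numbers for a 100k-token window.
const early = chooseCompression({ turns: 5, usedTokens: 1_000, windowTokens: 100_000 });
const long = chooseCompression({ turns: 25, usedTokens: 10_000, windowTokens: 100_000 });
const full = chooseCompression({ turns: 25, usedTokens: 70_000, windowTokens: 100_000 });
```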

Skill hot-reloading via `fs.watch` is unreliable on macOS; production systems need `chokidar` with platform-specific APIs and a polling fallback. This is a known filesystem limitation that most demos ignore.
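Whichever watcher is used, a single save may surface as more than one change event, so a reload step can compare content hashes before doing any work. The sketch below is an assumption-laden illustration: `SkillRegistry` is a hypothetical class, and the FNV-1a-style hash is a dependency-free stand-in for whatever hash a real system would use.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units: a small stand-in so the
// sketch has no dependencies.
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Hypothetical registry: reloads a skill only when its content really changed.
class SkillRegistry {
  private hashes = new Map<string, number>();
  reloads = 0;

  // Call from any watcher callback (fs.watch, chokidar, or a polling loop).
  onFileEvent(path: string, content: string): boolean {
    const hash = fnv1a(content);
    if (this.hashes.get(path) === hash) return false; // duplicate event
    this.hashes.set(path, hash);
    this.reloads++;
    return true;
  }
}

// Usage: the second event carries identical content and is ignored.
const registry = new SkillRegistry();
const r1 = registry.onFileEvent("skills/review.md", "v1");
const r2 = registry.onFileEvent("skills/review.md", "v1");
const r3 = registry.onFileEvent("skills/review.md", "v2");
```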

AI-written unit tests default to happy-path coverage unless the prompt explicitly demands boundary values, error cases, and async failure modes. The quality ceiling is set by the specificity of the test specification, not the model.

Atomic commits during AI-assisted development are not just good practice—they are the only reliable way to detect when an AI has modified files outside its assigned scope. `git diff --stat` after every AI change is a non-negotiable checkpoint.
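The scope check can also be automated. The sketch below assumes the list of changed files has already been obtained (for example from `git diff --name-only`); the function name and the sample paths are made up for illustration.

```typescript
// Returns the changed files that fall outside the directories the AI was
// allowed to touch.
function outOfScope(changedFiles: string[], allowedPrefixes: string[]): string[] {
  return changedFiles.filter(
    (file) => !allowedPrefixes.some((prefix) => file.startsWith(prefix))
  );
}

// Usage: the task was limited to src/parser/, but package.json also changed.
const changed = ["src/parser/sse.ts", "src/parser/sse.test.ts", "package.json"];
const violations = outOfScope(changed, ["src/parser/"]);
```

A non-empty `violations` list is the signal to stop and inspect the diff before committing.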

Spec-Driven Development fails when requirements are fuzzy; it suits 'I know what to build but implementation is tedious,' not 'I don't know what to build.' Forcing SDD onto exploration phases multiplies cost.

The distinction between static and dynamic long-term memory solves a real failure mode: if static preferences are updated every session, the AI can overwrite user-set norms. Separate write strategies prevent this.
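One way to picture the separate write strategies is a store where agent-side writes can never replace a static entry, and dynamic entries expire. Everything here is an assumption for illustration: the class and method names, and the one-week TTL.

```typescript
type MemoryKind = "static" | "dynamic";

interface MemoryEntry {
  kind: MemoryKind;
  value: string;
  expiresAt: number | null; // null means the entry never expires
}

const DYNAMIC_TTL_MS = 7 * 24 * 60 * 60 * 1000; // assumed TTL: one week

class LongTermMemory {
  private entries = new Map<string, MemoryEntry>();

  // Static entries change only on an explicit user request.
  setStatic(key: string, value: string): void {
    this.entries.set(key, { kind: "static", value, expiresAt: null });
  }

  // Agent-side write: may create or refresh dynamic entries, but is
  // rejected if it would overwrite a static one.
  recordDynamic(key: string, value: string, now: number): boolean {
    if (this.entries.get(key)?.kind === "static") return false;
    this.entries.set(key, { kind: "dynamic", value, expiresAt: now + DYNAMIC_TTL_MS });
    return true;
  }

  get(key: string, now: number): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt !== null && entry.expiresAt <= now) {
      this.entries.delete(key); // expired dynamic entry
      return undefined;
    }
    return entry.value;
  }
}

// Usage: the session cannot overwrite a user-set norm.
const memory = new LongTermMemory();
memory.setStatic("indent", "2 spaces");
const overwritten = memory.recordDynamic("indent", "tabs", 0);
const recorded = memory.recordDynamic("progress", "step 3 done", 0);
```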

GraphRAG's multi-hop reasoning advantage is real but narrow—it matters for questions like 'how does X's familiarity with Y affect Z,' which require traversing three documents. For direct fact retrieval, vector search remains cheaper and sufficient.
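Multi-hop retrieval can be pictured as path finding over extracted relations. The sketch below is a toy breadth-first search over a hand-written edge list, not a GraphRAG implementation; the entities and relations are made-up data, with each edge standing in for a fact taken from a different document.

```typescript
type Edge = { from: string; relation: string; to: string };

// Breadth-first search for a chain of relations linking two entities,
// giving up once a path would exceed maxHops edges.
function findPath(edges: Edge[], start: string, goal: string, maxHops: number): Edge[] | null {
  const queue: { node: string; path: Edge[] }[] = [{ node: start, path: [] }];
  const seen = new Set<string>([start]);
  while (queue.length > 0) {
    const { node, path } = queue.shift()!;
    if (node === goal) return path;
    if (path.length >= maxHops) continue;
    for (const edge of edges) {
      if (edge.from === node && !seen.has(edge.to)) {
        seen.add(edge.to);
        queue.push({ node: edge.to, path: [...path, edge] });
      }
    }
  }
  return null;
}

// Made-up graph: three facts, each from a separate document.
const graph: Edge[] = [
  { from: "Alice", relation: "maintains", to: "auth-service" },
  { from: "auth-service", relation: "depends_on", to: "token-lib" },
  { from: "token-lib", relation: "blocks", to: "v2-release" },
];
const threeHop = findPath(graph, "Alice", "v2-release", 3);
const twoHop = findPath(graph, "Alice", "v2-release", 2);
```

No single edge connects the start and goal entities, which is why a similarity search over individual documents would not surface the link while the three-hop traversal does.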

GRPO's lack of a separate value (critic) model makes it memory-efficient and well-suited for code and math where reward functions are verifiable, but applying it to open-ended dialogue quality remains an unsolved design challenge.
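The group-relative scoring at the heart of GRPO can be shown in a few lines: each sampled response is compared with the mean and standard deviation of its own group. This sketch covers only that normalization step, not a training loop, and the function name is illustrative.

```typescript
// Group-relative advantage: score each response against the statistics of
// its own group, so the baseline comes from the group itself.
function groupRelativeAdvantages(rewards: number[]): number[] {
  const mean = rewards.reduce((sum, r) => sum + r, 0) / rewards.length;
  const variance =
    rewards.reduce((sum, r) => sum + (r - mean) ** 2, 0) / rewards.length;
  const std = Math.sqrt(variance);
  // If every response scored the same, the group carries no learning signal.
  if (std === 0) return rewards.map(() => 0);
  return rewards.map((r) => (r - mean) / std);
}

// Four sampled answers to one math prompt; reward 1 means verified correct.
const advantages = groupRelativeAdvantages([1, 0, 0, 1]);
const noSignal = groupRelativeAdvantages([1, 1, 1]);
```

A verifiable reward such as "the answer checks out" plugs straight into `rewards`, which is why the method fits math and code; an open-ended dialogue task has no comparably objective number to put there.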

The argument that stronger base models make Agents obsolete misunderstands the problem: base models improve single-turn capability; Agents solve multi-turn orchestration, tool selection, error recovery, and persistent memory. These are orthogonal dimensions that compound.

Concepts & terms
Context Pollution
The degradation of AI output quality when a conversation history accumulates failed attempts, abandoned approaches, or irrelevant information, causing the model to drift back toward earlier mistakes or become increasingly verbose.
Sub-Agent Isolation
Launching independent Agent instances for sub-tasks, each with its own context window and limited tool permissions, so that the parent Agent receives only execution summaries without intermediate reasoning that could bias subsequent decisions.
Three-Layer Context Compression
A graduated strategy: Layer 1 prunes old messages beyond a count threshold, Layer 2 triggers at 60% token usage to summarize the first half of dialogue, and Layer 3 extracts key facts into long-term memory for sessions spanning hours or hundreds of turns.
SSE (Server-Sent Events)
An HTTP-based unidirectional push protocol where the server streams plain-text messages delimited by double newlines. Chosen for AI streaming because it matches the one-way output pattern, supports automatic browser reconnection, and under HTTP/2 achieves performance comparable to WebSocket with simpler infrastructure.
TextDecoder stream mode
An option (`{ stream: true }`) passed to `TextDecoder.decode()` that tells the UTF-8 decoder to hold incomplete multi-byte sequences across chunk boundaries rather than outputting replacement characters. Essential for streaming Chinese or other multi-byte text without corruption.
Streaming Markdown Table Parser
A line-level state machine that buffers pipe-delimited rows as they arrive in an SSE stream, deferring rendering until a non-pipe line or stream end signals the table is complete. Prevents rendering partial, broken tables mid-stream.
SDD (Spec-Driven Development)
Writing a formal specification—including interface contracts, behavior rules, and explicit 'out of scope' declarations—before handing implementation to an AI. The spec serves as both the prompt and the acceptance criteria, converting communication cost into documentation cost.
Skill System
A pluggable prompt-module architecture where markdown files with frontmatter metadata are discovered by directory scanning, registered into an in-memory registry, matched to user input via embedding similarity, and hot-reloaded on file changes with hash-based deduplication.
Memory Consolidation
A periodic process that clusters semantically similar long-term memory entries, merges each cluster into a single summary memory, and archives the originals. Analogous to human memory consolidation—details fade, key facts persist.
GraphRAG
An extension of RAG that builds a knowledge graph (entities, relations, communities) from documents, enabling multi-hop traversal during retrieval. Answers questions requiring cross-document reasoning that pure vector similarity search cannot handle.
GRPO (Group Relative Policy Optimization)
A fine-tuning method that samples multiple responses per prompt and uses their relative quality within the group as the training signal, eliminating the need for a separate value (critic) model. Used in DeepSeek-R1 training; naturally suits tasks with verifiable rewards like math and code.
DPO (Direct Preference Optimization)
A fine-tuning approach using (prompt, chosen_response, rejected_response) triples to directly maximize the probability gap between good and bad answers, without requiring a separate reward model. More stable than RLHF but requires human preference annotations.
Tool Selection Confusion
The phenomenon where providing too many tool definitions to a model increases the probability of it selecting the wrong tool for a task. Mitigated by injecting only the tools relevant to the current task, following the minimum necessary context principle.
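Injecting only the relevant tools combines naturally with role-based permissions. The sketch below is a hypothetical filter: the `Tool` shape, the tag scheme, and the tool names are all assumptions made for illustration.

```typescript
type Role = "implementer" | "reviewer";

interface Tool {
  name: string;
  writes: boolean; // true if the tool can modify files
  tags: string[];  // task types the tool is relevant to
}

// Expose only tools that the role may use and that the current task needs.
function toolsFor(role: Role, task: string, all: Tool[]): Tool[] {
  return all.filter((tool) => {
    const permitted = role === "implementer" || !tool.writes; // reviewers are read-only
    const relevant = tool.tags.some((tag) => tag === task);
    return permitted && relevant;
  });
}

// Made-up tool catalog.
const catalog: Tool[] = [
  { name: "read_file", writes: false, tags: ["code", "review"] },
  { name: "write_file", writes: true, tags: ["code", "review"] },
  { name: "query_db", writes: false, tags: ["data"] },
];
const reviewerTools = toolsFor("reviewer", "review", catalog).map((t) => t.name);
const implementerTools = toolsFor("implementer", "code", catalog).map((t) => t.name);
```

Here the reviewer sees only `read_file`, while the implementer sees `read_file` and `write_file`; `query_db` is withheld from both because it is unrelated to the task.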