Building an AI Agent from Scratch in 2026: The 10 Skills You'll Need

By 双越AI_club · Jun 9, 2026

Read original on juejin.cn

Why it matters

As AI Agents move from experimental toys to production tools, the architectural patterns are stabilizing. This breakdown shows what one developer now considers essential in a modern Agent, and what is no longer needed. For any developer building or evaluating Agent frameworks, these 10 modules are a practical checklist of what makes an Agent useful and safe.

Summary

After a year of rapid change in AI Agent tooling, a Chinese developer is tearing down his first-generation Agent project and rebuilding it from scratch. The new version, called Zhiyu, is a console-based personal AI assistant modeled after OpenClaw (a popular open-source Agent) and Claude Code. It runs on TypeScript and Node.js, using LangChain and LangGraph for orchestration.

The architecture breaks down into 10 essential modules: an LLM interface layer with streaming, abort, retry, and rate-limit handling; a ReAct reasoning-action loop; built-in tools for file I/O, shell execution, web search, and Python scripting; a Skills system where plain-text instructions replace old flowchart-based workflows; session management via slash commands; a four-layer context compression mechanism; three-tier memory (short-term, long-term, and user profile); a permission system with deny/allow rules and user prompts; hooks for custom security policies; and subagents that run in isolated contexts to prevent context bloat.
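The retry and abort handling in the LLM interface layer can be sketched as below. This is a minimal illustration, not the project's code: `ModelCall`, `isRetryable`, and `callWithRetry` are hypothetical names, and the assumption that failures carry an HTTP-like `status` field is ours.

```typescript
// Hypothetical shape of one model request; a real client would wrap an HTTP call.
type ModelCall = (signal: AbortSignal) => Promise<string>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: failures carry an HTTP-like numeric `status` field.
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null || !("status" in err)) return false;
  const status = (err as { status: unknown }).status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

// Retry rate-limit (429) and server (5xx) failures with exponential backoff.
// An aborted signal stops the loop before the next attempt is made.
async function callWithRetry(
  call: ModelCall,
  signal: AbortSignal,
  maxAttempts = 3,
  baseDelayMs = 10,
): Promise<string> {
  for (let attempt = 1; ; attempt++) {
    if (signal.aborted) throw new Error("aborted");
    try {
      return await call(signal);
    } catch (err) {
      if (!isRetryable(err) || attempt >= maxAttempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Passing the same `AbortSignal` into the call itself lets a real client cancel an in-flight request, while the check at the top of the loop prevents a new attempt after the user has cancelled.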

The developer argues that RAG and vector databases have become less important for personal Agents due to cost, while MCP server integration remains useful but carries efficiency tradeoffs. The project is open for community participation, with the goal of helping front-end developers transition into AI engineering.

Key takeaways

— An Agent is an application built on top of an LLM, adding tools, memory, and permissions to the model's reasoning capability.

— The ReAct pattern (Reasoning + Action) remains the core loop: on each turn the LLM decides whether to call a tool or return a final answer.

— A production Agent needs at least six built-in tools: read_file, write_file, exec, web_search, web_fetch, and run_python.

— Skills are now defined as plain-text instructions rather than flowcharts — LLMs understand text-based workflows directly.

— Context compression uses four layers: offload large tool inputs to disk, simplify tool messages, summarize sessions with templates, and truncate as a last resort.

— Memory has three tiers: short-term (current session), long-term (cross-session with decay), and a persistent user profile.

— Permission systems use four stages: bash pre-check, deny rules, allow rules, and user prompts for uncertain operations.

— Hooks let users inject custom security rules at specific points in the Agent lifecycle.

— Subagents run in isolated contexts to prevent main-agent context bloat and reduce hallucinations.

— RAG and vector databases are deprioritized for personal Agents due to cost — local file access often suffices.

— MCP server integration is supported but carries tradeoffs: it adds many tools to each API call, increasing latency and token usage.
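The three memory tiers from the list above could be modeled roughly as follows. The source only says that long-term memory decays; the exponential half-life formula, the `importance` field, and every name here are assumptions made for illustration.

```typescript
type MemoryItem = { text: string; importance: number; lastUsedDay: number };

// Assumption: relevance halves every `halfLifeDays` since the item was last used.
function decayedScore(item: MemoryItem, today: number, halfLifeDays = 30): number {
  const age = Math.max(0, today - item.lastUsedDay);
  return item.importance * Math.pow(0.5, age / halfLifeDays);
}

class AgentMemory {
  shortTerm: string[] = []; // current session only
  longTerm: MemoryItem[] = []; // survives across sessions, fades over time
  profile: Record<string, string> = {}; // stable facts about the user, never decays

  // At session end, promote one note to long-term memory and clear the session.
  endSession(note: string, importance: number, today: number): void {
    this.longTerm.push({ text: note, importance, lastUsedDay: today });
    this.shortTerm = [];
  }

  // Forget items whose score fell below `minScore`, then return the strongest ones.
  recall(today: number, limit = 3, minScore = 0.1): string[] {
    this.longTerm = this.longTerm.filter((m) => decayedScore(m, today) >= minScore);
    return [...this.longTerm]
      .sort((a, b) => decayedScore(b, today) - decayedScore(a, today))
      .slice(0, limit)
      .map((m) => m.text);
  }
}
```

Keeping the user profile outside the decay mechanism reflects the distinction the list draws between memories that fade and a persistent profile.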

Our take

The shift from flowchart-based Skills to plain-text instructions marks a real maturing of LLM capabilities — the model itself now handles workflow logic that previously required visual tooling.

The four-layer compression strategy suggests that context management, rather than reasoning or tool calling, is where much of the engineering effort in an Agent goes.

Subagents are emerging as the primary solution to context bloat, not smarter summarization — isolation beats compression for complex tasks.

The permission system design (pre-check → deny → allow → ask) resembles the ordered rule evaluation of established access-control systems, where an explicit deny overrides an allow. This suggests Agents are converging on established patterns rather than inventing new ones.
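A minimal sketch of that four-stage order, assuming rules are expressed as regular expressions over the shell command. The `Rules` shape, the `DANGEROUS` patterns, and `checkCommand` are hypothetical; the source does not describe the actual rule format.

```typescript
type Decision = "deny" | "allow" | "ask";
type Rules = { deny: RegExp[]; allow: RegExp[] };

// Stage 1 (hypothetical pre-check): shell constructs rejected regardless of
// the configured rules.
const DANGEROUS: RegExp[] = [
  /\brm\s+-rf\s+\/(\s|$)/, // wiping the filesystem root
  /\bmkfs\b/, // formatting a disk
  /curl[^|]*\|\s*(ba)?sh\b/, // piping a download straight into a shell
];

function checkCommand(command: string, rules: Rules): Decision {
  if (DANGEROUS.some((re) => re.test(command))) return "deny"; // 1. bash pre-check
  if (rules.deny.some((re) => re.test(command))) return "deny"; // 2. deny rules
  if (rules.allow.some((re) => re.test(command))) return "allow"; // 3. allow rules
  return "ask"; // 4. anything uncertain goes to the user
}

const exampleRules: Rules = {
  deny: [/^git push\b/],
  allow: [/^git (status|diff|log)\b/, /^ls\b/],
};
```

Checking deny rules before allow rules means a broad allow pattern cannot accidentally re-enable a command the user has explicitly forbidden.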

The deprioritization of RAG and vector databases is notable — it suggests that for personal, local Agents, the cost and complexity of vector search outweigh its benefits over simpler file-access patterns.

The choice of TypeScript and Node.js over Python signals that Agent development is becoming accessible to the front-end ecosystem, not just AI researchers.

The explicit goal of helping front-end developers transition to AI engineering reflects a broader industry trend: the boundary between front-end and AI engineering is dissolving.

Concepts & terms

ReAct Agent

A pattern where an LLM alternates between reasoning about a user request and taking actions (like calling tools or reading files) until it can produce a final answer. The loop continues until the LLM decides no more actions are needed.
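The loop can be sketched in a few lines. The types and the `runReAct` function below are hypothetical and independent of LangChain or LangGraph; the `maxSteps` cap is our addition to keep a confused model from looping forever.

```typescript
// One model turn: either a tool request or a final answer.
type ModelStep =
  | { kind: "tool"; name: string; input: string }
  | { kind: "final"; answer: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => Promise<ModelStep>;
type Tool = (input: string) => Promise<string>;

async function runReAct(
  model: Model,
  tools: Record<string, Tool>,
  question: string,
  maxSteps = 8,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: question }];
  for (let step = 0; step < maxSteps; step++) {
    const next = await model(history);
    if (next.kind === "final") return next.answer; // the model decided it is done
    // Record the action, run the tool, and feed the observation back.
    history.push({ role: "assistant", content: `call ${next.name}(${next.input})` });
    const tool = tools[next.name];
    const observation = tool ? await tool(next.input) : `unknown tool: ${next.name}`;
    history.push({ role: "tool", content: observation });
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```

Reporting an unknown tool back as an observation, rather than throwing, gives the model a chance to correct itself on the next turn.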

Context Compression

Techniques to reduce the amount of text sent to an LLM with each request, preventing token limits from being exceeded and reducing costs. Methods include summarizing old messages, storing large tool outputs on disk instead of in the prompt, and truncating the conversation history.
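Two of these methods, offloading large tool outputs and truncating as a last resort, are sketched below. Summarization is left out because it needs a model call. The names, the 200-character threshold, and the four-characters-per-token estimate are assumptions, and a `Map` stands in for files on disk.

```typescript
type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Rough size estimate (an assumption; real tokenizers differ by model).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
const totalTokens = (msgs: Msg[]): number =>
  msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);

// Move oversized tool output out of the prompt, leaving a pointer and a preview.
function offloadLargeToolOutputs(msgs: Msg[], store: Map<string, string>, maxChars: number): Msg[] {
  return msgs.map((m, i) => {
    if (m.role !== "tool" || m.content.length <= maxChars) return m;
    const key = `tool-output-${i}.txt`;
    store.set(key, m.content);
    return { ...m, content: `[${m.content.length} chars saved to ${key}] ${m.content.slice(0, 80)}` };
  });
}

// Last resort: drop the oldest messages, keeping system messages and the latest turn.
function truncateOldest(msgs: Msg[], budgetTokens: number): Msg[] {
  const kept = [...msgs];
  while (totalTokens(kept) > budgetTokens) {
    const idx = kept.findIndex((m, i) => m.role !== "system" && i < kept.length - 1);
    if (idx === -1) break; // nothing left that is safe to drop
    kept.splice(idx, 1);
  }
  return kept;
}

function compress(msgs: Msg[], store: Map<string, string>, budgetTokens: number): Msg[] {
  const offloaded = offloadLargeToolOutputs(msgs, store, 200);
  return totalTokens(offloaded) <= budgetTokens ? offloaded : truncateOldest(offloaded, budgetTokens);
}
```

The cheaper, lossless step runs first, and messages are only dropped when the conversation still exceeds the budget afterwards.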

Subagent

A separate, independent Agent instance that runs with its own context, isolated from the main Agent's conversation history. Used to perform complex subtasks without bloating the main Agent's context window.
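The isolation can be illustrated with a small sketch in which a subtask runs against a fresh history and only its final reply is copied back. The `Agent` class and `delegate` method are hypothetical, and the model is a synchronous stand-in to keep the example short.

```typescript
type Msg = { role: "user" | "assistant"; content: string };
// Stand-in for a model call: takes a history, returns the reply text.
type Model = (history: Msg[]) => string;

class Agent {
  history: Msg[] = [];
  model: Model;

  constructor(model: Model) {
    this.model = model;
  }

  send(content: string): string {
    this.history.push({ role: "user", content });
    const reply = this.model(this.history);
    this.history.push({ role: "assistant", content: reply });
    return reply;
  }

  // Run a subtask in a fresh Agent. It starts with an empty history, and only
  // its final reply is added to the parent's context.
  delegate(task: string, followUps: string[]): string {
    const sub = new Agent(this.model);
    let result = sub.send(task);
    for (const step of followUps) result = sub.send(step);
    this.history.push({ role: "assistant", content: `[subagent result] ${result}` });
    return result;
  }
}
```

However many turns the subagent takes, the parent's history grows by a single message.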

MCP Server

Model Context Protocol server — a standardized way to extend an Agent's capabilities by connecting external tools and data sources. The Agent can dynamically discover and call tools exposed by MCP servers, but each server adds to the list of tools sent with every LLM request.

Skills

Text-based instruction sets that guide an Agent on how to perform specific tasks. Unlike traditional workflow definitions that required visual flowcharts, modern Skills are written as natural language documents that the LLM reads and follows.
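As a rough illustration, a Skill can be nothing more than a named block of prose that is placed into the prompt. The `Skill` shape, the sample skill, and the idea of listing only descriptions until a skill is selected are assumptions, not details taken from the source; `exec` and `write_file` refer to the built-in tools listed earlier.

```typescript
type Skill = { name: string; description: string; instructions: string };

// A hypothetical skill: the "workflow" is ordinary prose the model reads and follows.
const weeklyReport: Skill = {
  name: "weekly-report",
  description: "Summarize this week's git activity into a short report.",
  instructions: [
    "1. Run `git log --since='7 days ago' --oneline` with the exec tool.",
    "2. Group the commits by feature area.",
    "3. Write the report to report.md with the write_file tool.",
  ].join("\n"),
};

// List only names and descriptions up front so unused skills cost few tokens.
function skillIndex(skills: Skill[]): string {
  return skills.map((s) => `- ${s.name}: ${s.description}`).join("\n");
}

// Inject the full instructions only once a skill has been selected.
function buildPrompt(skills: Skill[], selected?: string): string {
  const parts = ["Available skills:", skillIndex(skills)];
  const skill = skills.find((s) => s.name === selected);
  if (skill) parts.push(`Follow these instructions for ${skill.name}:`, skill.instructions);
  return parts.join("\n");
}
```

No flowchart engine is involved: the numbered steps are plain text, and the model is trusted to carry them out with its tools.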
