Build an Agent That Thinks Out Loud with LangChain and TypeScript
Building an AI Agent from Scratch — A Hands-On Guide with LangChain + TypeScript
This isn't a "Hello World"-level tutorial on calling an LLM API. What we're building is an AI Agent that can invoke tools autonomously, stream output, remember context, and even show its thought process. We build it from scratch, progressing from simple to complex, and every step explains why it's done that way.
Table of Contents
- Project Initialization: Setting Up the Skeleton
- Understanding the Agent: How Is It Different from a Regular LLM Call?
- Defining Tools: Giving AI "Hands"
- Creating the Agent: Connecting the Brain and Hands
- Streaming Output: Don't Make Users Wait
- Multi-turn Conversations: Making the Agent Remember Context
- Extended Thinking: Seeing the Model's Thought Process
- Review and Outlook
1. Project Initialization: Setting Up the Skeleton
1.1 Create the Project
mkdir lingshi && cd lingshi
pnpm init
Nothing special here; you get a package.json and then add dependencies later.
1.2 Install Dependencies
# Runtime dependencies
pnpm add langchain @langchain/anthropic @langchain/core @langchain/langgraph deepagents zod dotenv
# Dev dependencies (TypeScript related)
pnpm add -D typescript tsx @types/node
A brief explanation of each package's role:
| Package | Role |
|---|---|
| langchain | LangChain core framework |
| @langchain/anthropic | Anthropic-compatible ChatModel interface |
| @langchain/core | Core tool definitions (the tool function comes from here) |
| @langchain/langgraph | Agent's graph structure engine + MemorySaver |
| deepagents | Wraps createDeepAgent, simplifying Agent creation |
| zod | Runtime type validation, used to define tool parameter schemas |
| dotenv | Loads environment variables from .env files |
| tsx | Runs TypeScript directly, no compilation needed |
1.3 Configure TypeScript
Create tsconfig.json:
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true,
    "noEmit": true,
    "types": ["node"]
  },
  "include": ["src/**/*"]
}
A few key configurations:
- module: "ESNext" uses ES modules (import/export), matching "type": "module" in package.json
- moduleResolution: "bundler" adapts to modern tooling's module resolution strategy
- noEmit: true because we only use tsx to run the code directly, so tsc does not need to output compiled artifacts
1.4 Environment Variables
Create a .env file (do not commit sensitive information to Git):
ANTHROPIC_API_KEY=sk-your-key-here
ANTHROPIC_BASE_URL=https://your-proxy-server.com/anthropic
MODEL_NAME=qwen3.7-plus
A small trick here: by pointing ANTHROPIC_BASE_URL to a proxy server, the underlying model actually running is qwen3.7-plus, but because the interface is compatible with the Anthropic protocol, you can call it directly using ChatAnthropic.
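To make a missing key fail early rather than at the first API call, the three variables can be read through a small helper. This is a minimal sketch: readModelConfig and ModelConfig are hypothetical names, not part of the original project, and the fallback model name simply mirrors the one used in this article.

```typescript
// Hypothetical helper (not part of the original project): reads the three
// variables from an env-like object and fails fast if the API key is missing.
interface ModelConfig {
  apiKey: string;
  baseUrl: string | undefined;
  modelName: string;
}

function readModelConfig(env: Record<string, string | undefined>): ModelConfig {
  const apiKey = env.ANTHROPIC_API_KEY;
  if (!apiKey) {
    throw new Error('ANTHROPIC_API_KEY is not set; check your .env file');
  }
  return {
    apiKey,
    // An empty string is treated the same as "not configured".
    baseUrl: env.ANTHROPIC_BASE_URL || undefined,
    // Falls back to the model name used throughout this article.
    modelName: env.MODEL_NAME || 'qwen3.7-plus',
  };
}
```

In the real project you would likely call readModelConfig(process.env) after loading the .env file, for example with import 'dotenv/config' at the top of the entry file.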
1.5 Project Structure
The final file structure is very clean:
lingshi/
├── src/
│ ├── tools.ts # Tool definitions (calculator, get time)
│ ├── agents.ts # Agent creation and configuration
│ └── index.ts # Entry file, runs tests
├── .env # Environment variables (API Key, etc.)
├── package.json
└── tsconfig.json
Three files, each with its own responsibility, explained one by one below.
2. Understanding the Agent: How Is It Different from a Regular LLM Call?
Before writing code, let's clarify a core question: What exactly is an Agent?
Regular LLM Call
User Input → LLM → One-time result returned, done
An LLM is just a "text continuation machine"—you give it a prompt, it spits out a reply, and that's it. It cannot check the weather for you, cannot do math for you, cannot access any external system.
Agent Call
User Input → LLM → Need a tool?
├─ Yes → Execute tool → Feed result back to LLM → Continue judging...
└─ No → Return final reply
An Agent adds a loop on top of the LLM:
while (LLM thinks it still needs a tool) {
  Execute tool → Feed result back to LLM
}
return LLM's final answer
For example: a user asks "Calculate 128 × 47 for me"
- LLM sees there's a calculator tool and decides to call calculator({ a: 128, b: 47, operation: 'multiply' })
- Tool returns "128 multiply 47 = 6016"
- LLM gets the result and generates a natural language reply: "128 × 47 = 6016"
Agent = LLM + Tools + Loop, that's all there is to it.
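The loop can be sketched without any framework. The example below is an illustration only: fakeModel is a scripted stand-in for a real LLM, the tool registry holds a single simplified calculator, and every name here is hypothetical. It is kept synchronous for brevity, whereas real model and tool calls are asynchronous.

```typescript
// Illustrative agent loop with a scripted stand-in for the LLM (no network calls).
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelReply = { text: string; toolCall?: ToolCall };
type LoopMessage = { role: 'user' | 'assistant' | 'tool'; content: string };
type LoopTool = (args: Record<string, unknown>) => string;

// Hypothetical tool registry; this simplified calculator only multiplies.
const loopTools: Record<string, LoopTool> = {
  calculator: (args) => {
    const a = Number(args.a);
    const b = Number(args.b);
    return `${a} multiply ${b} = ${a * b}`;
  },
};

// Fake model: asks for the calculator until a tool result is in the history,
// then turns that result into a final natural language answer.
function fakeModel(history: LoopMessage[]): ModelReply {
  const toolResult = history.find((m) => m.role === 'tool');
  if (!toolResult) {
    return {
      text: '',
      toolCall: { name: 'calculator', args: { a: 128, b: 47, operation: 'multiply' } },
    };
  }
  const value = toolResult.content.split('= ')[1];
  return { text: `128 × 47 = ${value}` };
}

function runAgentLoop(userInput: string, maxSteps = 5): string {
  const history: LoopMessage[] = [{ role: 'user', content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = fakeModel(history);
    if (!reply.toolCall) return reply.text; // no tool needed: final answer
    const tool = loopTools[reply.toolCall.name];
    const result = tool
      ? tool(reply.toolCall.args)
      : `Error: unknown tool "${reply.toolCall.name}"`;
    history.push({ role: 'assistant', content: `tool_call: ${reply.toolCall.name}` });
    history.push({ role: 'tool', content: result }); // feed the result back
  }
  return 'Error: step limit reached';
}

console.log(runAgentLoop('Calculate 128 × 47 for me'));
```

The maxSteps guard is a design choice of this sketch: it keeps a misbehaving model from looping forever.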
3. Defining Tools: Giving AI "Hands"
The complete code for tools is in src/tools.ts.
3.1 Tool Calling Principle
Tool Calling is the core mechanism of an Agent. Its workflow has 5 steps:
- User sends a message → LLM analyzes whether it needs to call a tool
- LLM returns a tool_call → Contains tool name + parameter JSON
- Agent framework executes the tool function → Gets the result
- Tool result fed back to LLM → As a new message
- LLM synthesizes the result → Generates the final reply
Steps 2 through 4 can repeat several times before step 5 is reached; this repetition is the so-called Agent Loop.
3.2 Calculator Tool
import { tool } from '@langchain/core/tools';
import { z } from 'zod';

export const calculatorTool = tool(
  // First parameter: the tool's execution function
  async ({ a, b, operation }) => {
    let result: number;
    switch (operation) {
      case 'add': result = a + b; break;
      case 'subtract': result = a - b; break;
      case 'multiply': result = a * b; break;
      case 'divide':
        if (b === 0) return 'Error: Division by zero is not allowed';
        result = a / b;
        break;
      default:
        return `Error: Unsupported operation "${operation}"`;
    }
    return `${a} ${operation} ${b} = ${result}`;
  },
  // Second parameter: tool metadata
  {
    name: 'calculator',
    description: 'Performs four arithmetic operations (addition, subtraction, multiplication, division) on two numbers',
    schema: z.object({
      a: z.number().describe('The first number'),
      b: z.number().describe('The second number'),
      operation: z
        .enum(['add', 'subtract', 'multiply', 'divide'])
        .describe('The operation to perform: add, subtract, multiply, divide'),
    }),
  }
);
Each tool has three essential elements:
| Element | Description |
|---|---|
name |
The tool's unique identifier, used by the LLM to call it |
description |
Tells the LLM what this tool can do; the LLM decides whether to use it based on this |
schema |
Zod-defined parameter types; the framework converts this to JSON Schema and sends it to the LLM |
3.3 Why Use Zod?
Zod is a TypeScript-first runtime type validation library. In Deep Agents, the Zod schema plays a critical role: telling the LLM how to pass parameters.
z.object({
  a: z.number().describe('The first number'),
  b: z.number().describe('The second number'),
  operation: z.enum(['add', 'subtract', 'multiply', 'divide'])
    .describe('The operation to perform'),
})
The framework internally converts this Zod schema into JSON Schema, roughly looking like this:
{
  "type": "object",
  "properties": {
    "a": { "type": "number", "description": "The first number" },
    "b": { "type": "number", "description": "The second number" },
    "operation": {
      "type": "string",
      "enum": ["add", "subtract", "multiply", "divide"],
      "description": "The operation to perform"
    }
  },
  "required": ["a", "b", "operation"]
}
When the LLM sees this JSON Schema, it knows that calling calculator requires passing a, b (numbers), and operation (an enum string). The descriptions inside .describe() are key for the LLM to understand parameter meanings—without descriptions, the LLM can only guess.
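The schema also serves the other direction: arguments coming back from the LLM can be checked before the tool function runs. The sketch below hand-rolls that check for the calculator schema, purely to show what "runtime type validation" means. It is not how Zod or LangChain implement it, and parseCalculatorArgs is a hypothetical name.

```typescript
// Hand-rolled illustration of runtime validation for the calculator arguments.
// In the real project Zod does this work; this only shows the idea.
type Operation = 'add' | 'subtract' | 'multiply' | 'divide';
interface CalculatorArgs { a: number; b: number; operation: Operation }
type ParseResult =
  | { ok: true; value: CalculatorArgs }
  | { ok: false; error: string };

const OPERATIONS: readonly string[] = ['add', 'subtract', 'multiply', 'divide'];

function parseCalculatorArgs(input: unknown): ParseResult {
  if (typeof input !== 'object' || input === null) {
    return { ok: false, error: 'expected an object' };
  }
  const { a, b, operation } = input as Record<string, unknown>;
  if (typeof a !== 'number' || Number.isNaN(a)) {
    return { ok: false, error: '"a" must be a number' };
  }
  if (typeof b !== 'number' || Number.isNaN(b)) {
    return { ok: false, error: '"b" must be a number' };
  }
  if (typeof operation !== 'string' || !OPERATIONS.includes(operation)) {
    return { ok: false, error: '"operation" must be one of: ' + OPERATIONS.join(', ') };
  }
  return { ok: true, value: { a, b, operation: operation as Operation } };
}

// A model that sends "128" as a string instead of a number is rejected.
console.log(parseCalculatorArgs({ a: '128', b: 47, operation: 'multiply' }));
```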
3.4 Parameterless Tool: Get Current Time
Not all tools need parameters. Getting the current time is a typical example:
export const getCurrentTimeTool = tool(
  async () => {
    const now = new Date();
    return `The current time is: ${now.toLocaleString('zh-CN', { timeZone: 'Asia/Shanghai' })}`;
  },
  {
    name: 'get_current_time',
    description: 'Gets the current system time (Beijing time)',
    schema: z.object({}), // Empty schema → LLM knows no parameters are needed
  }
);
z.object({}) is an empty object schema; the LLM sees it and knows no parameters need to be passed when calling.
4. Creating the Agent: Connecting the Brain and Hands
The code is in src/agents.ts.
4.1 Configuring ChatModel
import { ChatAnthropic } from '@langchain/anthropic';
const model = new ChatAnthropic({
  model: process.env.MODEL_NAME || 'qwen3.7-plus',
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  anthropicApiUrl: process.env.ANTHROPIC_BASE_URL,
  streaming: true,
  maxTokens: 10000,
  thinking: {
    type: 'enabled',
    budget_tokens: 5000,
  },
});
ChatAnthropic is one of LangChain's ChatModel implementations. ChatModel is LangChain's unified abstraction for "conversational models," providing two core methods:
- .invoke(messages): Non-streaming call; the returned Promise resolves once the complete reply is ready
- .stream(messages): Streaming call; yields the reply chunk by chunk as it is generated
Two configurations worth noting here:
- streaming: true enables model-level streaming output, which will be discussed in detail later
- thinking enables Extended Thinking, allowing the model to perform internal reasoning before replying; this will be expanded on in the last section
4.2 Creating the Agent Instance
import { createDeepAgent } from 'deepagents';
import { MemorySaver } from '@langchain/langgraph';
// The two tools defined in src/tools.ts
import { calculatorTool, getCurrentTimeTool } from './tools';

export const agent = createDeepAgent({
  model,
  tools: [calculatorTool, getCurrentTimeTool],
  systemPrompt: 'You are a helpful AI assistant. When the user needs to perform mathematical calculations or query the time, please use the corresponding tools to complete the task.',
  checkpointer: new MemorySaver(),
});
createDeepAgent creates a LangGraph graph-structured Agent with the following internal flow:
[User Message] → [LLM] → Need a tool?
├─ Yes → [Execute Tool] → Back to LLM
└─ No → [Return Final Reply]
The meaning of the four parameters:
| Parameter | Description |
|---|---|
| model | ChatModel instance, the Agent's "brain" |
| tools | Array of tools; the Agent autonomously chooses which to call |
| systemPrompt | System prompt, defining the Agent's role |
| checkpointer | Memory storage; MemorySaver is the in-memory version (lost on restart); for production, swap with PostgresSaver |
5. Streaming Output: Don't Make Users Wait
The code is in src/index.ts.
5.1 invoke vs stream
LangChain Agent provides two calling methods:
- agent.invoke(): Waits for the Agent to complete all tool calls before returning the full result (blocking)
- agent.stream(): Yields each message immediately as it is produced (streaming)
For an Agent that needs to call tools, invoke might take several seconds before any output appears. stream, on the other hand, lets users see the AI "typing" in real-time, creating a completely different experience.
5.2 Types of Streamed Messages
Using stream() with streamMode: 'messages', each yield is a [message, metadata] tuple (config here is the thread_id configuration object defined in Section 6.1):
const stream = await agent.stream(
  { messages: [{ role: 'user', content: 'Help me calculate 128 times 47' }] },
  { ...config, streamMode: 'messages' },
);
for await (const [message] of stream) {
  console.log(message); // You'll see various types of messages
}
message is a LangChain message object; determine its type via message._getType():
| Type | Meaning |
|---|---|
| 'ai' | LLM output (may contain text + tool_call) |
| 'tool' | Result returned after tool execution |
| 'human' | User message (generally doesn't appear in stream) |
5.3 Two Forms of content
The content field of an AI message has two forms, which is an easy pitfall:
Form One: String (plain text reply when no tool is called)
message.content === "128 × 47 = 6016"
Form Two: Array (when a tool is called, contains multiple blocks)
message.content === [
  { type: 'text', text: 'The calculation result is...' },
  { type: 'tool_use', id: '...', name: 'calculator', input: {...} },
]
So when processing streamed messages, both cases must be handled:
async function printStream(stream: AsyncIterable<[any, any]>) {
  for await (const [message] of stream) {
    // Only process AI messages, skip tool / human
    if (message?._getType?.() === 'ai') {
      // Case 1: content is a string
      if (typeof message.content === 'string' && message.content) {
        process.stdout.write(message.content);
      }
      // Case 2: content is an array, iterate to find text blocks
      else if (Array.isArray(message.content)) {
        for (const block of message.content) {
          if (block.type === 'text' && block.text) {
            process.stdout.write(block.text);
          }
        }
      }
    }
  }
}
Pitfall Record: Initially, I only handled the string type content, resulting in empty output when a tool was called, because the content becomes an array when calling a tool. It only worked normally after adding array iteration.
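The two-form handling can also be pulled out into a small pure function, which is easier to test than code that writes to stdout. This is a sketch only: extractText and the ContentBlock type are hypothetical names that model just the fields used in this article, not LangChain's own types.

```typescript
// Hypothetical helper: normalizes both content forms into plain text.
// ContentBlock models only the fields this article relies on.
type ContentBlock = { type: string; text?: string; [key: string]: unknown };
type MessageContent = string | ContentBlock[];

function extractText(content: MessageContent): string {
  // Form one: content is already a plain string.
  if (typeof content === 'string') return content;
  // Form two: keep only text blocks and skip tool_use and other block types.
  return content
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text ?? '')
    .join('');
}

console.log(extractText('128 × 47 = 6016'));
console.log(
  extractText([
    { type: 'text', text: 'The calculation result is...' },
    { type: 'tool_use', id: 'call-1', name: 'calculator', input: { a: 128, b: 47 } },
  ]),
);
```

With a helper like this, printStream could reduce to a single process.stdout.write(extractText(message.content)) call for AI messages.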
6. Multi-turn Conversations: Making the Agent Remember Context
6.1 thread_id and Memory
Regular LLM calls are independent of each other: the model doesn't remember what you said in the previous message. The Agent solves this problem through the checkpointer (memory storage).
const config = { configurable: { thread_id: 'session-1' } };
MemorySaver stores conversation history by thread_id. All messages with the same thread_id are accumulated and stored; the Agent can see the complete previous conversation each time it is called.
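Conceptually, this is a map from thread_id to an accumulated history. The class below is a simplified illustration of that idea only. InMemoryThreadStore is a hypothetical name, and the real MemorySaver stores checkpoints of the graph state rather than a bare message list.

```typescript
// Conceptual illustration only: a per-thread message history held in memory.
// This is NOT MemorySaver's implementation.
type ChatMessage = { role: 'user' | 'assistant'; content: string };

class InMemoryThreadStore {
  private threads = new Map<string, ChatMessage[]>();

  // Append a message to the history of the given thread.
  append(threadId: string, message: ChatMessage): void {
    const history = this.threads.get(threadId) ?? [];
    history.push(message);
    this.threads.set(threadId, history);
  }

  // Return a copy of the history so callers cannot mutate stored state.
  history(threadId: string): ChatMessage[] {
    return [...(this.threads.get(threadId) ?? [])];
  }
}

const store = new InMemoryThreadStore();
store.append('session-1', { role: 'user', content: 'Help me calculate 128 times 47' });
store.append('session-1', { role: 'assistant', content: '128 × 47 = 6016' });
store.append('session-2', { role: 'user', content: 'What time is it?' });
console.log(store.history('session-1').length); // 2
console.log(store.history('session-2').length); // 1
```

Because everything lives in a Map inside the process, restarting the process empties it, which is the same limitation the article describes for MemorySaver.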
6.2 Actual Effect
// Round 1: Calculator
// Pass each stream to printStream (Section 5.3) so it is consumed and printed
await printStream(await agent.stream(
  { messages: [{ role: 'user', content: 'Help me calculate 128 times 47' }] },
  { ...config, streamMode: 'messages' },
));
// Agent replies: "128 × 47 = 6016"
// Round 2: Deliberately question it
await printStream(await agent.stream(
  { messages: [{ role: 'user', content: 'That calculation seems wrong' }] },
  { ...config, streamMode: 'messages' },
));
// Agent will review the previous calculation and re-examine the result
// because it "remembers" what it calculated in the previous round
The second round's "That calculation seems wrong" provides no numerical information, but the Agent understands that this is questioning the previous round's calculation result. This is the effect of thread_id + MemorySaver.
Note that MemorySaver only stores data in process memory, so it is lost on restart. If a production environment needs persistent memory, you can swap to a storage backend like PostgresSaver.
7. Extended Thinking: Seeing the Model's Thought Process
This is an advanced feature added at the end—allowing the model to display its internal reasoning process before giving the final answer.
7.1 What is Extended Thinking?
Extended Thinking is a capability provided by Anthropic: before generating the final reply, the model first performs a segment of "inner monologue" (thinking), showing how it reasons step by step.
For users, this is like a "transparent window"—you can see what the AI is "thinking," not just the final answer.
7.2 Enabling Thinking
Add the thinking parameter to the ChatAnthropic configuration:
const model = new ChatAnthropic({
  model: 'qwen3.7-plus',
  // ...
  maxTokens: 10000, // Must be set explicitly and exceed budget_tokens in thinking mode
  thinking: {
    type: 'enabled',
    budget_tokens: 5000, // The thinking phase consumes at most 5000 tokens
  },
});
Note: After enabling thinking, maxTokens must be explicitly set to a value larger than budget_tokens, because thinking tokens count toward the max_tokens limit; the Anthropic API rejects the request otherwise.
7.3 Handling thinking blocks
After enabling thinking, a new block type appears in the AI message's content array:
message.content === [
  { type: 'thinking', thinking: 'The user asked me to calculate 128 × 47, I need to use the calculator tool...' },
  { type: 'tool_use', ... },
  { type: 'text', text: '128 × 47 = 6016' },
]
Add handling for thinking blocks in printStream:
for (const block of message.content) {
  // thinking block: the model's internal reasoning (displayed in gray)
  if (block.type === 'thinking' && block.thinking) {
    process.stdout.write(`\x1b[90m[Thinking] ${block.thinking}\x1b[0m`);
  }
  // text block: the model's final text reply
  if (block.type === 'text' && block.text) {
    process.stdout.write(block.text);
  }
}
\x1b[90m is an ANSI escape code that displays the thinking content in gray, visually distinguishing it from the final text reply.
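One thing to watch for: if the thinking content arrives as several streamed chunks, printing the [Thinking] label for every chunk would repeat it mid-sentence. The sketch below tracks whether a thinking run is already open so that the label appears once. It assumes chunked delivery of thinking blocks, and createBlockRenderer and StreamBlock are hypothetical names.

```typescript
// Sketch: print the "[Thinking]" label once per run of thinking chunks.
// Assumes thinking content may arrive split across several blocks.
type StreamBlock =
  | { type: 'thinking'; thinking: string }
  | { type: 'text'; text: string };

const GRAY = '\x1b[90m';
const RESET = '\x1b[0m';

// Returns a stateful function that maps each block to the string to print.
function createBlockRenderer(): (block: StreamBlock) => string {
  let inThinking = false;
  return (block) => {
    if (block.type === 'thinking') {
      // Open the gray section only for the first chunk of a thinking run.
      const prefix = inThinking ? '' : `${GRAY}[Thinking] `;
      inThinking = true;
      return prefix + block.thinking;
    }
    // First text chunk after thinking: reset the color and start a new line.
    const suffix = inThinking ? `${RESET}\n` : '';
    inThinking = false;
    return suffix + block.text;
  };
}

const render = createBlockRenderer();
const blocks: StreamBlock[] = [
  { type: 'thinking', thinking: 'The user wants 128 times 47, ' },
  { type: 'thinking', thinking: 'so I should use the calculator tool...' },
  { type: 'text', text: '128 × 47 = 6016' },
];
console.log(blocks.map(render).join(''));
```

If a stream ends while still inside a thinking run, this sketch never emits the reset code, so a caller may want to write \x1b[0m once more after the loop.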
7.4 Running Effect
--- Test 1: Calculator Tool ---
User: Help me calculate 128 times 47
Assistant: [Thinking] The user wants to calculate 128 times 47, this is a multiplication operation, I should use the calculator tool...
128 × 47 = 6016
The gray part is the model's reasoning process; the normal color is the final answer.
Compatibility Note: If the underlying model does not support Extended Thinking (for example, some models using the Anthropic-compatible interface), the thinking block will not appear, and the output behavior remains completely consistent with before, without errors.
8. Review and Outlook
What We Did
Starting from an empty folder, we built step by step:
- TypeScript project skeleton: pnpm init + tsconfig.json + tsx development environment
- Two tools: Calculator (with parameters) and Get Time (parameterless), understanding the role of Zod schema
- An Agent: Used createDeepAgent to string together LLM + Tools + Loop
- Streaming output: agent.stream() + streamMode: 'messages', navigated the content type pitfall
- Multi-turn conversations: thread_id + MemorySaver to implement context memory
- Extended Thinking: Made the model's reasoning process visible
Full Run
pnpm dev
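This prints three test scenarios: calculator tool call, multi-turn conversation context memory, time tool call. It assumes package.json defines a dev script (for example, tsx src/index.ts), which the setup steps above do not create.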
What to Do Next
Main Line
- Let the agent read and write files, execute code, and operate on the file system.
Side Lines
- Add more tools (search, database queries, API calls)
- Swap MemorySaver for persistent storage
- Connect to a real Anthropic Claude model to experience native thinking
- Add error handling and retry mechanisms to the Agent
- Build a web interface, pushing streaming output to the frontend via SSE
The world of Agents has just opened up; this project is only a starting point.