Build an Agent That Thinks Out Loud with LangChain and TypeScript

Building an AI Agent from Scratch — A Hands-On Guide with LangChain + TypeScript

This isn't a "Hello World"-level tutorial on calling an LLM API. We're building an AI Agent that decides on its own when to invoke tools, streams its output, remembers context, and can even show its thought process. We start from an empty folder and move from simple to complex, explaining at every step why it's done that way.

Project Initialization: Setting Up the Skeleton
Understanding the Agent: How Is It Different from a Regular LLM Call?
Defining Tools: Giving AI "Hands"
Creating the Agent: Connecting the Brain and Hands
Streaming Output: Don't Make Users Wait
Multi-turn Conversations: Making the Agent Remember Context
Extended Thinking: Seeing the Model's Thought Process
Review and Outlook

1. Project Initialization: Setting Up the Skeleton

1.1 Create the Project

mkdir lingshi && cd lingshi
pnpm init

Nothing special here: this generates a package.json, and the dependencies are added in the next step.

1.2 Install Dependencies

# Runtime dependencies
pnpm add langchain @langchain/anthropic @langchain/core @langchain/langgraph deepagents zod dotenv

# Dev dependencies (TypeScript related)
pnpm add -D typescript tsx @types/node

A brief explanation of each package's role:

Package	Role
`langchain`	LangChain core framework
`@langchain/anthropic`	Anthropic-compatible ChatModel interface
`@langchain/core`	Core tool definitions (the `tool` function comes from here)
`@langchain/langgraph`	Agent's graph structure engine + MemorySaver
`deepagents`	Wraps `createDeepAgent`, simplifying Agent creation
`zod`	Runtime type validation, used to define tool parameter schemas
`dotenv`	Loads environment variables from `.env` files
`tsx`	Runs TypeScript directly, no compilation needed

1.3 Configure TypeScript

Create tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true,
    "noEmit": true,
    "types": ["node"]
  },
  "include": ["src/**/*"]
}

A few key configurations:

module: "ESNext" selects ES modules (import/export). It pairs with "type": "module" in package.json; add that field by hand if pnpm init did not generate it
moduleResolution: "bundler" follows the module resolution strategy used by modern tooling
noEmit: true stops tsc from writing compiled output, since we only run the code directly through tsx

1.4 Environment Variables

Create a .env file (do not commit sensitive information to Git):

ANTHROPIC_API_KEY=sk-your-key-here
ANTHROPIC_BASE_URL=https://your-proxy-server.com/anthropic
MODEL_NAME=qwen3.7-plus

A small trick here: ANTHROPIC_BASE_URL points to a proxy server, so the model actually running is qwen3.7-plus. Because the proxy's interface is compatible with the Anthropic protocol, you can still call it directly using ChatAnthropic.

1.5 Project Structure

The final file structure is very clean:

lingshi/
├── src/
│   ├── tools.ts      # Tool definitions (calculator, get time)
│   ├── agents.ts     # Agent creation and configuration
│   └── index.ts      # Entry file, runs tests
├── .env              # Environment variables (API Key, etc.)
├── package.json
└── tsconfig.json

Three files, each with its own responsibility, explained one by one below.

2. Understanding the Agent: How Is It Different from a Regular LLM Call?

Before writing code, let's clarify a core question: What exactly is an Agent?

Regular LLM Call

User Input → LLM → One-time result returned, done

An LLM is just a "text continuation machine"—you give it a prompt, it spits out a reply, and that's it. It cannot check the weather for you, cannot do math for you, cannot access any external system.

Agent Call

User Input → LLM → Need a tool?
                    ├─ Yes → Execute tool → Feed result back to LLM → Continue judging...
                    └─ No  → Return final reply

An Agent adds a loop on top of the LLM:

while (LLM thinks it still needs a tool) {
  Execute tool → Feed result back to LLM
}
return LLM's final answer

For example: a user asks "Calculate 128 × 47 for me"

LLM sees there's a calculator tool, decides to call → calculator({ a:128, b:47, operation:'multiply' })
Tool returns "128 multiply 47 = 6016"
LLM gets the result, generates a natural language reply: "128 × 47 = 6016"

Agent = LLM + Tools + Loop, that's all there is to it.
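
To make the loop concrete, here is a minimal sketch in plain TypeScript with no LangChain involved. Everything in it is hypothetical: fakeModel is a scripted stand-in for an LLM, and the names runAgent, toolRegistry, and ChatMessage are made up for illustration. It only shows the control flow, not how a real framework implements it.

```typescript
// Minimal sketch of the Agent loop. The "model" is a scripted stand-in,
// not a real LLM, and none of these names come from LangChain.
type ToolArgs = { a: number; b: number };
type ModelReply =
  | { kind: 'tool_call'; toolName: string; args: ToolArgs }
  | { kind: 'final'; text: string };
type ChatMessage = { role: 'user' | 'assistant' | 'tool'; content: string };

// Tool registry: tool name -> implementation.
const toolRegistry: Record<string, (args: ToolArgs) => string> = {
  multiply: ({ a, b }) => `${a} multiply ${b} = ${a * b}`,
};

// Fake model: asks for the tool until a tool result is in the transcript,
// then produces a final answer based on that result.
function fakeModel(transcript: ChatMessage[]): ModelReply {
  const toolResult = transcript.find((m) => m.role === 'tool');
  if (!toolResult) {
    return { kind: 'tool_call', toolName: 'multiply', args: { a: 128, b: 47 } };
  }
  return { kind: 'final', text: `Result: ${toolResult.content}` };
}

function runAgent(userInput: string, maxSteps = 5) {
  const transcript: ChatMessage[] = [{ role: 'user', content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = fakeModel(transcript);
    if (reply.kind === 'final') {
      transcript.push({ role: 'assistant', content: reply.text });
      return { answer: reply.text, transcript };
    }
    // Execute the requested tool and feed its result back to the model.
    const impl = toolRegistry[reply.toolName];
    const result = impl ? impl(reply.args) : `Error: unknown tool "${reply.toolName}"`;
    transcript.push({ role: 'assistant', content: `tool_call: ${reply.toolName}` });
    transcript.push({ role: 'tool', content: result });
  }
  throw new Error('Agent loop exceeded maxSteps');
}

const run = runAgent('Calculate 128 × 47 for me');
console.log(run.answer);
```

The maxSteps guard is a design choice of this sketch: a loop driven by model output should have an upper bound so a confused model cannot spin forever.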

3. Defining Tools: Giving AI "Hands"

The complete code for tools is in src/tools.ts.

3.1 Tool Calling Principle

Tool Calling is the core mechanism of an Agent. Its workflow has 5 steps:

User sends a message → LLM analyzes whether it needs to call a tool
LLM returns a tool_call → Contains tool name + parameter JSON
Agent framework executes the tool function → Gets the result
Tool result fed back to LLM → As a new message
LLM synthesizes the result → Generates the final reply

Steps 2 through 4 can repeat several times before step 5 is reached; this repetition is the so-called Agent Loop.

3.2 Calculator Tool

import { tool } from '@langchain/core/tools';
import { z } from 'zod';

export const calculatorTool = tool(
  // First parameter: the tool's execution function
  async ({ a, b, operation }) => {
    let result: number;
    switch (operation) {
      case 'add':      result = a + b; break;
      case 'subtract': result = a - b; break;
      case 'multiply': result = a * b; break;
      case 'divide':
        if (b === 0) return 'Error: Division by zero is not allowed';
        result = a / b;
        break;
      default:
        return `Error: Unsupported operation "${operation}"`;
    }
    return `${a} ${operation} ${b} = ${result}`;
  },
  // Second parameter: tool metadata
  {
    name: 'calculator',
    description: 'Performs four arithmetic operations (addition, subtraction, multiplication, division) on two numbers',
    schema: z.object({
      a: z.number().describe('The first number'),
      b: z.number().describe('The second number'),
      operation: z
        .enum(['add', 'subtract', 'multiply', 'divide'])
        .describe('The operation to perform: add, subtract, multiply, divide'),
    }),
  }
);

Each tool has three essential elements:

Element	Description
`name`	The tool's unique identifier, used by the LLM to call it
`description`	Tells the LLM what this tool can do; the LLM decides whether to use it based on this
`schema`	Zod-defined parameter types; the framework converts this to JSON Schema and sends it to the LLM
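
Because the execution function is ordinary code, its logic can be checked without calling a model at all. The sketch below is a hypothetical standalone version of the same arithmetic (the calculate function and Operation type are names invented here, not part of src/tools.ts), which makes it easy to unit-test the error paths.

```typescript
// Hypothetical standalone version of the calculator logic, with no LangChain
// or Zod dependency, so it can be tested without an LLM in the loop.
type Operation = 'add' | 'subtract' | 'multiply' | 'divide';

const operations: Record<Operation, (x: number, y: number) => number> = {
  add: (x, y) => x + y,
  subtract: (x, y) => x - y,
  multiply: (x, y) => x * y,
  divide: (x, y) => x / y,
};

function calculate(a: number, b: number, operation: Operation): string {
  // Same guard as the tool: refuse to divide by zero.
  if (operation === 'divide' && b === 0) {
    return 'Error: Division by zero is not allowed';
  }
  return `${a} ${operation} ${b} = ${operations[operation](a, b)}`;
}

console.log(calculate(128, 47, 'multiply')); // 128 multiply 47 = 6016
console.log(calculate(1, 0, 'divide'));      // Error: Division by zero is not allowed
```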

3.3 Why Use Zod?

Zod is a TypeScript-first runtime type validation library. In Deep Agents, the Zod schema plays a critical role: telling the LLM how to pass parameters.

z.object({
  a: z.number().describe('The first number'),
  b: z.number().describe('The second number'),
  operation: z.enum(['add', 'subtract', 'multiply', 'divide'])
    .describe('The operation to perform'),
})

The framework internally converts this Zod schema into JSON Schema, roughly looking like this:

{
  "type": "object",
  "properties": {
    "a": { "type": "number", "description": "The first number" },
    "b": { "type": "number", "description": "The second number" },
    "operation": {
      "type": "string",
      "enum": ["add", "subtract", "multiply", "divide"],
      "description": "The operation to perform"
    }
  },
  "required": ["a", "b", "operation"]
}

When the LLM sees this JSON Schema, it knows that calling calculator requires passing a, b (numbers), and operation (an enum string). The descriptions inside .describe() are key for the LLM to understand parameter meanings—without descriptions, the LLM can only guess.
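
The schema matters in the other direction too: the arguments an LLM produces are untrusted JSON, so they should be checked before the tool function runs. The sketch below is a hand-written check, not Zod's API, and validateCalculatorArgs is a hypothetical name. It only illustrates the kind of validation a schema library performs against the JSON Schema shown above.

```typescript
// Hand-rolled sketch of validating model-produced tool arguments.
// This is NOT Zod's API; it mirrors the JSON Schema above by hand.
const OPERATIONS: string[] = ['add', 'subtract', 'multiply', 'divide'];

// Returns a list of problems; an empty list means the arguments are valid.
function validateCalculatorArgs(input: unknown): string[] {
  if (typeof input !== 'object' || input === null || Array.isArray(input)) {
    return ['arguments must be a JSON object'];
  }
  const { a, b, operation } = input as Record<string, unknown>;
  const errors: string[] = [];
  // "required" + "type": "number"
  if (typeof a !== 'number') errors.push('a must be a number');
  if (typeof b !== 'number') errors.push('b must be a number');
  // "type": "string" + "enum"
  if (typeof operation !== 'string' || OPERATIONS.indexOf(operation) === -1) {
    errors.push(`operation must be one of: ${OPERATIONS.join(', ')}`);
  }
  return errors;
}

console.log(validateCalculatorArgs({ a: 128, b: 47, operation: 'multiply' }));
console.log(validateCalculatorArgs({ a: '128', b: 47, operation: 'power' }));
```

Returning a list of problems rather than throwing on the first one makes it possible to send all of them back to the model in a single tool error message.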

3.4 Parameterless Tool: Get Current Time

Not all tools need parameters. Getting the current time is a typical example:

export const getCurrentTimeTool = tool(
  async () => {
    const now = new Date();
    return `The current time is: ${now.toLocaleString('zh-CN', { timeZone: 'Asia/Shanghai' })}`;
  },
  {
    name: 'get_current_time',
    description: 'Gets the current system time (Beijing time)',
    schema: z.object({}),  // Empty schema → LLM knows no parameters are needed
  }
);

z.object({}) is an empty object schema; the LLM sees it and knows no parameters need to be passed when calling.

4. Creating the Agent: Connecting the Brain and Hands

The code is in src/agents.ts.

4.1 Configuring ChatModel

import 'dotenv/config'; // Load .env before process.env is read below
import { ChatAnthropic } from '@langchain/anthropic';

const model = new ChatAnthropic({
  model: process.env.MODEL_NAME || 'qwen3.7-plus',
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  anthropicApiUrl: process.env.ANTHROPIC_BASE_URL,
  streaming: true,
  maxTokens: 10000,
  thinking: {
    type: 'enabled',
    budget_tokens: 5000,
  },
});

ChatAnthropic is one of LangChain's ChatModel implementations. ChatModel is LangChain's unified abstraction for "conversational models," providing two core methods:

.invoke(messages) is an asynchronous call; the returned Promise resolves once the complete reply is available
.stream(messages) is a streaming call; it returns an async iterable that yields the reply chunk by chunk

Two configurations worth noting here:

streaming: true — Enables model-level streaming output, which will be discussed in detail later
thinking — Enables Extended Thinking, allowing the model to perform internal reasoning before replying; this will be expanded on in the last section

4.2 Creating the Agent Instance

import { createDeepAgent } from 'deepagents';
import { MemorySaver } from '@langchain/langgraph';
import { calculatorTool, getCurrentTimeTool } from './tools';

export const agent = createDeepAgent({
  model,
  tools: [calculatorTool, getCurrentTimeTool],
  systemPrompt: 'You are a helpful AI assistant. When the user needs to perform mathematical calculations or query the time, please use the corresponding tools to complete the task.',
  checkpointer: new MemorySaver(),
});

createDeepAgent creates a LangGraph graph-structured Agent with the following internal flow:

[User Message] → [LLM] → Need a tool?
                        ├─ Yes → [Execute Tool] → Back to LLM
                        └─ No  → [Return Final Reply]

The meaning of the four parameters:

Parameter	Description
`model`	ChatModel instance, the Agent's "brain"
`tools`	Array of tools; the Agent autonomously chooses which to call
`systemPrompt`	System prompt, defining the Agent's role
`checkpointer`	Memory storage; `MemorySaver` is the in-memory version (lost on restart); for production, swap with `PostgresSaver`

5. Streaming Output: Don't Make Users Wait

The code is in src/index.ts.

5.1 invoke vs stream

LangChain Agent provides two calling methods:

agent.invoke() resolves only after the Agent has completed all tool calls and produced the full result, so nothing is shown until the end
agent.stream() yields each message chunk as soon as it is produced (streaming)

For an Agent that needs to call tools, invoke might take several seconds before any output appears. stream, on the other hand, lets users see the AI "typing" in real-time, creating a completely different experience.

5.2 Types of Streamed Messages

Using stream() with streamMode: 'messages', each yield is a [message, metadata] tuple. The config object is the thread_id setting introduced in section 6.1:

const stream = await agent.stream(
  { messages: [{ role: 'user', content: 'Help me calculate 128 times 47' }] },
  { ...config, streamMode: 'messages' },
);

for await (const [message] of stream) {
  console.log(message);  // You'll see various types of messages
}

message is a LangChain message object; determine its type via message._getType():

Type	Meaning
`'ai'`	LLM output (may contain text + tool_call)
`'tool'`	Result returned after tool execution
`'human'`	User message (generally doesn't appear in stream)

5.3 Two Forms of content

The content field of an AI message has two forms, which is an easy pitfall:

Form One: String (plain text reply when no tool is called)

message.content === "128 × 47 = 6016"

Form Two: Array (when a tool is called, contains multiple blocks)

message.content === [
  { type: 'text', text: 'The calculation result is...' },
  { type: 'tool_use', id: '...', name: 'calculator', input: {...} },
]

So when processing streamed messages, both cases must be handled:

async function printStream(stream: AsyncIterable<[any, any]>) {
  for await (const [message] of stream) {
    // Only process AI messages, skip tool / human
    if (message?._getType?.() === 'ai') {
      // Case 1: content is a string
      if (typeof message.content === 'string' && message.content) {
        process.stdout.write(message.content);
      }
      // Case 2: content is an array, iterate to find text blocks
      else if (Array.isArray(message.content)) {
        for (const block of message.content) {
          if (block.type === 'text' && block.text) {
            process.stdout.write(block.text);
          }
        }
      }
    }
  }
}

Pitfall Record: Initially, I only handled the string type content, resulting in empty output when a tool was called—because the content becomes an array when calling a tool. It only worked normally after adding array iteration.
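
One way to avoid repeating this branching everywhere is a small normalizing helper. The sketch below assumes only the two shapes shown above; extractText, ContentBlock, and MessageContent are hypothetical names, and real message objects may carry more fields than this.

```typescript
// Hypothetical helper: normalize both content shapes into plain text.
// The block shape mirrors the examples above and is not a library type.
type ContentBlock = { type: string; text?: string; [key: string]: unknown };
type MessageContent = string | ContentBlock[];

function extractText(content: MessageContent): string {
  // Form one: content is already a plain string.
  if (typeof content === 'string') return content;
  // Form two: keep only the text blocks and skip tool_use and other types.
  return content
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text as string)
    .join('');
}

console.log(extractText('128 × 47 = 6016'));
console.log(
  extractText([
    { type: 'text', text: 'The calculation result is...' },
    { type: 'tool_use', id: 'call-1', name: 'calculator', input: {} },
  ]),
);
```

With this in place, the streaming loop could call extractText(message.content) and write the result whenever it is non-empty.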

6. Multi-turn Conversations: Making the Agent Remember Context

6.1 thread_id and Memory

Regular LLM calls are independent of one another; the model doesn't remember what you said in the previous turn. The Agent solves this problem through the checkpointer (memory storage).

const config = { configurable: { thread_id: 'session-1' } };

MemorySaver stores conversation history by thread_id. All messages with the same thread_id are accumulated and stored; the Agent can see the complete previous conversation each time it is called.
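
Conceptually, this is not much more than a map from thread_id to an accumulated message list. The sketch below is a deliberately simplified stand-in, with the hypothetical names TinyThreadMemory and StoredMessage. The real MemorySaver stores checkpoints of the whole graph state rather than a bare message array, so treat this only as a mental model.

```typescript
// Conceptual sketch only: a Map keyed by thread_id. This is not how
// MemorySaver is implemented; it just shows per-thread accumulation.
type StoredMessage = { role: 'user' | 'assistant'; content: string };

class TinyThreadMemory {
  private threads = new Map<string, StoredMessage[]>();

  // Append a message to a thread and return the conversation so far.
  append(threadId: string, message: StoredMessage): StoredMessage[] {
    const messages = this.threads.get(threadId) ?? [];
    messages.push(message);
    this.threads.set(threadId, messages);
    return [...messages];
  }

  // Return a copy of the stored conversation (empty for unknown threads).
  get(threadId: string): StoredMessage[] {
    return [...(this.threads.get(threadId) ?? [])];
  }
}

const memory = new TinyThreadMemory();
memory.append('session-1', { role: 'user', content: 'Help me calculate 128 times 47' });
memory.append('session-1', { role: 'assistant', content: '128 × 47 = 6016' });
memory.append('session-1', { role: 'user', content: 'That calculation seems wrong' });

console.log(memory.get('session-1').length); // 3
console.log(memory.get('session-2').length); // 0
```

A different thread_id starts from an empty history, which is why reusing the same config object across rounds matters.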

6.2 Actual Effect

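// Each stream is consumed with printStream (section 5.3); a stream that is
// never read may not run to completion or save its messages to the thread.

// Round 1: Calculator
await printStream(await agent.stream(
  { messages: [{ role: 'user', content: 'Help me calculate 128 times 47' }] },
  { ...config, streamMode: 'messages' },
));
// Agent replies: "128 × 47 = 6016"

// Round 2: Deliberately question it
await printStream(await agent.stream(
  { messages: [{ role: 'user', content: 'That calculation seems wrong' }] },
  { ...config, streamMode: 'messages' },
));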
// Agent will review the previous calculation and re-examine the result
// because it "remembers" what it calculated in the previous round

The second round's "That calculation seems wrong" provides no numerical information, but the Agent understands that this is questioning the previous round's calculation result. This is the effect of thread_id + MemorySaver.

MemorySaver keeps everything in process memory, so the history is lost on restart. In a production environment that needs persistent memory, swap in a storage backend such as PostgresSaver.

7. Extended Thinking: Seeing the Model's Thought Process

This is an advanced feature added at the end—allowing the model to display its internal reasoning process before giving the final answer.

7.1 What is Extended Thinking?

Extended Thinking is a capability provided by Anthropic: before generating the final reply, the model first performs a segment of "inner monologue" (thinking), showing how it reasons step by step.

For users, this is like a "transparent window"—you can see what the AI is "thinking," not just the final answer.

7.2 Enabling Thinking

Add the thinking parameter to the ChatAnthropic configuration:

const model = new ChatAnthropic({
  model: 'qwen3.7-plus',
  // ...
  maxTokens: 10000,       // Must be explicitly set in thinking mode
  thinking: {
    type: 'enabled',
    budget_tokens: 5000,  // The thinking phase consumes at most 5000 tokens
  },
});

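Note: After enabling thinking, set maxTokens explicitly and keep it larger than budget_tokens. The Anthropic API rejects a request whose thinking budget is not smaller than max_tokens, so relying on a small default value can fail.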
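
It can help to check the two numbers before sending a request. The sketch below encodes the limits as I understand them from Anthropic's documentation for the basic case: budget_tokens of at least 1024, and smaller than max_tokens. checkThinkingConfig is a hypothetical helper, not part of any SDK, and a proxy serving a different model may apply different limits.

```typescript
// Hypothetical pre-flight check; checkThinkingConfig is not an SDK function.
// The limits are assumptions taken from Anthropic's documented rules for
// the basic extended-thinking case and may differ behind a proxy.
const MIN_THINKING_BUDGET = 1024;

// Returns a list of problems; an empty list means the pair looks valid.
function checkThinkingConfig(maxTokens: number, budgetTokens: number): string[] {
  const problems: string[] = [];
  if (budgetTokens < MIN_THINKING_BUDGET) {
    problems.push(`budget_tokens must be at least ${MIN_THINKING_BUDGET}`);
  }
  if (budgetTokens >= maxTokens) {
    problems.push('budget_tokens must be smaller than maxTokens');
  }
  return problems;
}

console.log(checkThinkingConfig(10000, 5000)); // []
console.log(checkThinkingConfig(2048, 5000));  // one problem: budget too large
```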

7.3 Handling thinking blocks

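After enabling thinking, a new block type appears in the AI message's content array. The listing below merges one tool-calling turn into a single array for illustration; in practice the final text arrives in a later AI message, after the tool result has been fed back: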

message.content === [
  { type: 'thinking', thinking: 'The user asked me to calculate 128 × 47, I need to use the calculator tool...' },
  { type: 'tool_use', ... },
  { type: 'text', text: '128 × 47 = 6016' },
]

Add handling for thinking blocks in printStream:

for (const block of message.content) {
  // thinking block: the model's internal reasoning (displayed in gray)
  if (block.type === 'thinking' && block.thinking) {
    process.stdout.write(`\x1b[90m[Thinking] ${block.thinking}\x1b[0m`);
  }
  // text block: the model's final text reply
  if (block.type === 'text' && block.text) {
    process.stdout.write(block.text);
  }
}

\x1b[90m is an ANSI escape code that displays the thinking content in gray, visually distinguishing it from the final text reply.
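
One thing to watch when streaming: if the thinking text arrives split across many chunks (which I assume is the usual case with streamMode: 'messages'), writing the "[Thinking]" label for every block repeats it on every chunk. The sketch below keeps a little state so the label is written once per run of thinking chunks. createBlockRenderer and StreamBlock are hypothetical names, and the function returns a string instead of writing to stdout so it stays easy to test.

```typescript
// Sketch of a stateful renderer for streamed content blocks. It assumes
// thinking arrives in several chunks, so the label and the gray color are
// opened once and closed when the first non-thinking block shows up.
type StreamBlock = { type: string; text?: string; thinking?: string };

const GRAY = '\x1b[90m';
const RESET = '\x1b[0m';

function createBlockRenderer() {
  let inThinking = false;
  return function render(blocks: StreamBlock[]): string {
    let out = '';
    for (const block of blocks) {
      if (block.type === 'thinking' && block.thinking) {
        if (!inThinking) {
          out += `${GRAY}[Thinking] `; // open the gray section once
          inThinking = true;
        }
        out += block.thinking;
      } else {
        if (inThinking && block.type !== 'thinking') {
          out += `${RESET}\n`; // close the gray section before other output
          inThinking = false;
        }
        if (block.type === 'text' && block.text) out += block.text;
      }
    }
    return out;
  };
}

const render = createBlockRenderer();
const rendered =
  render([{ type: 'thinking', thinking: 'The user wants ' }]) +
  render([{ type: 'thinking', thinking: 'a multiplication.' }]) +
  render([{ type: 'text', text: '128 × 47 = 6016' }]);
console.log(rendered);
```

Inside printStream, the returned string would be passed to process.stdout.write for each array-valued message.content.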

7.4 Running Effect

--- Test 1: Calculator Tool ---
User: Help me calculate 128 times 47
Assistant: [Thinking] The user wants to calculate 128 times 47, this is a multiplication operation, I should use the calculator tool...
128 × 47 = 6016

The gray part is the model's reasoning process; the normal color is the final answer.

Compatibility Note: If the underlying model does not support Extended Thinking (for example, some models behind an Anthropic-compatible interface), thinking blocks typically just don't appear and the output looks the same as before. This depends on the proxy, though: if it rejects the thinking parameter instead of ignoring it, remove thinking from the model configuration.

8. Review and Outlook

What We Did

Starting from an empty folder, we built step by step:

TypeScript project skeleton — pnpm init + tsconfig.json + tsx development environment
Two tools — Calculator (with parameters) and Get Time (parameterless), understanding the role of Zod schema
An Agent — Used createDeepAgent to string together LLM + Tools + Loop
Streaming output — agent.stream() + streamMode: 'messages', navigated the content type pitfall
Multi-turn conversations — thread_id + MemorySaver to implement context memory
Extended Thinking — Made the model's reasoning process visible

Full Run

pnpm dev

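This assumes package.json defines a dev script, for example "dev": "tsx src/index.ts", which the steps above do not create. The run outputs three test scenarios: calculator tool call, multi-turn conversation context memory, time tool call.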

What to Do Next

Main Line

Let the agent read and write files, execute code, and operate on the file system.

Side Lines

Add more tools (search, database queries, API calls)
Swap MemorySaver for persistent storage
Connect to a real Anthropic Claude model to experience native thinking
Add error handling and retry mechanisms to the Agent
Build a web interface, pushing streaming output to the frontend via SSE

The world of Agents has just opened up; this project is only a starting point.