
Anatomy of an AI Agent

How Claude Code works under the hood — the agentic loop, tool design, context engineering, and the finer points of building autonomous coding agents.

~20 built-in tools · 200K token context · 90% self-written · $6 avg cost/day
01

The agentic loop is just a while loop

Claude Code's entire architecture reduces to a single-threaded read-eval-print loop. The model decides what to do next, calls tools, gets results, and repeats until done.

while response.has_tool_calls:
    results = execute_tools(response.tool_calls)   # run the requested tools
    messages += [response.message, *results]       # append the assistant turn and tool results
    response = call_claude(messages)               # next inference pass
return response.text

This simplicity is deliberate. Anthropic distinguishes between workflows (predefined code paths orchestrating LLMs) and agents (LLMs dynamically directing their own tool usage). Claude Code is firmly an agent — the harness just executes tool calls and manages context.

The loop maintains a flat message history with no threading or branching. An asynchronous dual-buffer queue runs alongside, enabling pause/resume and user interjections mid-task. Users can steer Claude in real time while it works.
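One way to picture the interjection mechanism, as a minimal sketch: keystrokes land in a queue while tools run, and the loop drains the queue into the history between iterations. Names here are illustrative, not Claude Code's actual internals.

import asyncio

user_queue: asyncio.Queue = asyncio.Queue()      # filled by the terminal UI as the user types

async def agent_loop(messages, call_claude, execute_tools):
    response = await call_claude(messages)
    while response.has_tool_calls:
        while not user_queue.empty():            # drain interjections typed mid-task
            messages.append({"role": "user", "content": user_queue.get_nowait()})
        results = await execute_tools(response.tool_calls)
        messages += [response.message, *results]
        response = await call_claude(messages)
    return response.text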


STEP 01

User Input

Receive prompt + load CLAUDE.md, env context, git state

STEP 02

LLM Inference

Send system prompt + tools + history → Claude responds

STEP 03

Tool Execution

Execute tool calls, check permissions, return results

STEP 04

Loop or Return

More tool calls? Loop back. Text only? Return to user.

Input Assembly: The system prompt is built from static identity instructions + dynamically injected environment data (CWD, git branch, open files). CLAUDE.md project files are loaded and prompt-cached. The TODO list state is injected as a system reminder after the user message. Model routing starts here: Haiku checks topic continuity, Sonnet/Opus handles the main loop.
Inference Strategy: Extended thinking is enabled by default — the model reasons through complex problems before responding. On Opus 4.6 / Sonnet 4.6, adaptive reasoning dynamically allocates thinking tokens based on task complexity. Interleaved thinking between tool calls maintains reasoning continuity. Prompt order matters: System → Tools → History → Input (prefix matching for cache stability).
Execution Model: Read-only tools (Read, Glob, Grep) execute concurrently. State-modifying tools (Edit, Write, Bash) run sequentially. Every file edit creates a snapshot checkpoint for instant rollback. Bash maintains a persistent shell session with state across calls. Permission checks happen before execution: static config → rule matching → dynamic approval callback.
Termination Conditions: The loop exits when Claude produces text with no tool calls. Failure modes tracked: error_max_turns, error_max_budget_usd, error_during_execution. Budget limits and turn limits prevent runaway sessions. If a tool is denied permission, Claude receives a rejection message and pivots to a different approach.
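The concurrency split in the execution model above can be sketched roughly as follows. This is a simplified dispatcher, assuming a read_only flag on each tool definition and an external permission check; it is not the actual scheduler.

import asyncio

async def dispatch(tool_calls, tools, check_permission):
    async def run(call):
        if not check_permission(call):                    # static config → rules → approval callback
            return {"tool": call.name, "error": "permission denied"}
        return await tools[call.name].run(**call.args)
    reads  = [c for c in tool_calls if tools[c.name].read_only]
    writes = [c for c in tool_calls if not tools[c.name].read_only]
    results = list(await asyncio.gather(*(run(c) for c in reads)))   # Read/Glob/Grep fan out concurrently
    for call in writes:                                              # Edit/Write/Bash run one at a time
        results.append(await run(call))
    return results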

Claude Code also uses multiple models strategically. Sonnet or Opus handles the main workflow, while Haiku handles lightweight tasks: topic detection, web content processing, quota checks, and exploration subagent work. The tech stack is TypeScript + React with Ink (terminal UI renderer) — chosen because the model already knows TS well, enabling Claude Code to maintain itself.

02

Twenty tools define the action space

The team constantly minimizes tool count. Each tool definition has a name, JSON schema, permission requirements, risk classification, and a read-only annotation for parallelism.

Anthropic's engineering blog emphasizes: tools should target specific high-impact workflows, not atomic operations. Tool descriptions are extensively detailed because even small wording changes measurably affect performance. In one case, Claude was appending "2025" to web search queries — fixed entirely by improving the tool description. Letting Claude Code itself rewrite tool descriptions yielded a 40% decrease in task completion time.
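As a rough sketch, a tool definition carrying the attributes listed above might look like this. Field names are illustrative, not Claude Code's internal schema.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolDefinition:
    name: str                     # e.g. "Grep"
    description: str              # wording here measurably affects model behavior
    input_schema: dict            # JSON Schema for the tool's arguments
    read_only: bool               # read-only tools are eligible for concurrent execution
    risk_level: str               # feeds the permission system, e.g. "low" / "high"
    handler: Callable[..., Any]   # what the harness executes when the model calls the tool

grep_tool = ToolDefinition(
    name="Grep",
    description="Search file contents with a regular expression.",
    input_schema={"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]},
    read_only=True,
    risk_level="low",
    handler=lambda pattern, path=".": [],   # placeholder implementation
)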


File Ops

Read

Read files with line offsets and image support

Supports targeted line ranges to minimize token usage. Can process images. Annotated as read-only, enabling concurrent execution with other reads.
File Ops

Edit

Surgical search-and-replace requiring unique context strings

The unique-string constraint prevents ambiguous edits. State-modifying — runs sequentially. Creates snapshot checkpoint on every invocation for instant rollback.
File Ops

Write / MultiEdit

Full file creation and batched edit operations

Write handles new files and full overwrites. MultiEdit batches multiple edits to the same file in one call, reducing round trips. Both are state-modifying.

Grep / Glob / LS

Regex search, file pattern matching, directory listing

Grep mirrors ripgrep for content search. Glob handles fast file pattern matching. LS provides structured directory listings. All read-only — run concurrently.
Execution

Bash

Persistent shell session with command risk classification

Maintains state across calls (env vars, CWD). Commands are classified by risk level for permission gating. Shell evolved through 3 iterations: persistent process → per-command → snapshot approach (captures aliases once, sources before each transient command).
Web

WebSearch / WebFetch

Search the web and fetch page content via Haiku summarization

WebFetch processes fetched content through Haiku for summarization before injecting into context — prevents raw HTML from consuming the window. WebSearch was subject to the "2025 suffix" bug fixed via tool description editing.
Orchestration

Agent / Task

Spawn subagents with isolated context windows

Subagents get fresh context. Can run foreground (blocking), background (concurrent), or be resumed via SendMessage. Critical constraint: subagents cannot spawn subagents — prevents recursive explosion. Returns 1-2K token summary to parent.
Orchestration

TodoWrite / ThinkTool

Structured task lists and side-effect-free reasoning steps

TodoWrite creates structured JSON task lists for multi-step planning. ThinkTool is a no-op tool that lets the model log reasoning without side effects — acts as a scratchpad. Both are critical for complex, multi-file changes.
Orchestration

ToolSearch

Dynamic on-demand tool discovery from filesystem + MCP

Activates when MCP tool descriptions exceed 10% of context. Instead of loading all tool definitions (25K+ tokens), discovers tools on demand. Achieved 98.7% token reduction and improved accuracy from 79.5% → 88.1%.
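The underlying idea can be sketched as follows. The threshold follows the 10% figure above; the selection logic and field names are assumptions.

def select_tool_definitions(core_tools, mcp_tools, context_budget_tokens, requested_names=frozenset()):
    mcp_cost = sum(t.definition_tokens for t in mcp_tools)
    if mcp_cost <= 0.10 * context_budget_tokens:
        return core_tools + mcp_tools               # small enough: load every definition upfront
    # Otherwise keep MCP definitions out of context until the model searches for them.
    requested = [t for t in mcp_tools if t.name in requested_names]
    return core_tools + requested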
03

Context is the real engineering challenge

Anthropic coined "context engineering" to describe this — curating the optimal token set during inference. It's the most sophisticated part of the system.

Context Window Allocation (~200K tokens)

System Prompt (~8%)
Tool Defs (~5%)
CLAUDE.md (~4%)
History (~75%)
Buffer (~8%)
System Prompt — Core identity, behavioral rules, tool usage guidance. Prompt-cached because it stays static. Dynamic info (current time) goes in later messages to preserve cache prefixing.
Tool Definitions — JSON Schema for each tool. Prompt-cached after system prompt. ToolSearch dynamically loads only needed tools to prevent this growing beyond 10% of context.
CLAUDE.md — Project contract: coding standards, architecture, build commands. Re-loaded from disk after every compaction. Survives context compression because it's the source of truth.
Conversation History — All messages, tool calls, and results. This is what gets compacted when context fills up. Auto-compaction triggers at ~92-95% capacity.
Buffer — Headroom for the model's response generation. If this shrinks too much, compaction kicks in to free space.

Auto-compaction triggers at ~92–95% of context capacity. Sonnet summarizes older history, clears verbose tool outputs, preserves recent exchanges. CLAUDE.md survives because it's reloaded from disk after every compact.
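In outline, the trigger looks something like this. The threshold follows the figures above; the token counting, the summarization call, and the message shapes are placeholders, not the real implementation.

CONTEXT_LIMIT = 200_000
COMPACTION_THRESHOLD = 0.92

def maybe_compact(messages, count_tokens, summarize_with_sonnet, load_claude_md, keep_recent=10):
    if count_tokens(messages) < COMPACTION_THRESHOLD * CONTEXT_LIMIT:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize_with_sonnet(older)                  # lossy: verbose tool output is dropped
    return [
        {"role": "user", "content": load_claude_md()},      # reloaded from disk, so it survives intact
        {"role": "user", "content": "Summary of earlier work:\n" + summary},
        *recent,
    ]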

There's a known weakness: compaction creates a feedback loop. After summarization, the agent re-reads files to recover lost context, generating more tokens and triggering further compaction. Sessions that compact once tend to compact 3–5 more times in succession, with each successive summary drifting further from the original details.

Subagents are the primary isolation mechanism. A subagent might consume tens of thousands of tokens during exploration but returns only a 1–2K token condensed summary to the parent — preventing verbose operations from bloating the main conversation.
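The isolation boundary amounts to running the same loop over a fresh message list and letting only the summary cross back. This is illustrative; the real Agent/Task tool is more involved.

def run_subagent(task_prompt, call_claude, execute_tools):
    sub_messages = [{"role": "user", "content": task_prompt}]   # fresh context, invisible to the parent
    response = call_claude(sub_messages)
    while response.has_tool_calls:                              # exploration may consume tens of thousands of tokens
        results = execute_tools(response.tool_calls)
        sub_messages += [response.message, *results]
        response = call_claude(sub_messages)
    return response.text                                        # only this condensed report reaches the parent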

Pattern

Progressive Disclosure

Agents incrementally discover context through exploration (grep, glob, file reads) rather than loading everything upfront. Static context gets cached; dynamic context is retrieved just-in-time.

Pattern

Git as State Layer

For multi-session workflows, git provides a log of work done plus checkpoints. An "initializer agent" creates a feature list in JSON (not Markdown — the model is less likely to modify JSON inappropriately).

Hierarchy

5-Level Context Loading

Always resident (CLAUDE.md) → Path-loaded (language rules) → On-demand (skills) → Isolated (subagent exploration) → Never in context (hooks — deterministic scripts outside the agent).

04

Layered system prompt architecture

Four prompt layers balance static instructions with dynamic environmental context. The key insight: find the "Goldilocks zone" between overly rigid and overly vague.

L1

System Workflow Prompt

Core identity and behavioral rules
Defines extreme conciseness ("one word answers are best"), mandatory absolute file paths, tool usage guidance, and TodoWrite-based task management. Instructs default-to-action behavior and parallel tool calling. The model should implement changes rather than suggest them.
L2

System Reminder Start

Dynamic environment injection before first user message
Injected before the first user message. Loads current working directory, git state (branch, status, recent commits), and open IDE files. This is where dynamic context goes — not in the system prompt — to preserve prompt-cache prefix matching.
L3

System Reminder End

TODO state + short-term memory after user message
Follows the first user message. Injects the current TODO list state and other short-term memory. This ensures task continuity across turns — the model always knows what it was working on and what remains.
L4

System Compact Prompt

Governs compression format during auto-compaction
Controls how Sonnet summarizes older conversation history. Users can customize preservation rules via "Compact Instructions" in CLAUDE.md. A PreCompact hook can archive the full transcript before summarization begins.

Custom instructions layer on top through CLAUDE.md files (project root, user-level, global), .claude/rules/*.md files, and the --append-system-prompt CLI flag. CLAUDE.md content is injected as system reminders in every message, including subagent conversations.

// Default to action — don't just suggest
"Implement changes rather than only suggesting them."

// Parallel tool calling
"If calls have no dependencies, make all independent calls in parallel."

// Reversibility guard (Opus-specific)
"Take local, reversible actions freely. For hard-to-reverse actions, ask first."

// Anti-over-engineering (Opus-specific)
"Only make changes that are directly requested or clearly necessary."

Prompt-caching insight: The ordering System → Tools → History → Input is deliberate because caching works by prefix matching. Dynamic data (current time, git state) goes in later messages to preserve cache hits. Switching between models rebuilds the entire cache.
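A hedged sketch of that assembly using the public Anthropic Python SDK: static content carries the cache breakpoint, while dynamic context rides in the newest message so the cached prefix stays byte-identical between turns. Placeholder values throughout; this mirrors the general prompt-caching mechanism, not Claude Code's actual request.

import anthropic

STATIC_IDENTITY_PROMPT = "You are a coding agent..."   # placeholder for the static system prompt
TOOL_DEFINITIONS = []                                  # stable JSON-schema tool definitions
history = []                                           # prior turns, unchanged between requests
cwd, branch, user_input = "/repo", "main", "fix the failing test"

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    system=[{
        "type": "text",
        "text": STATIC_IDENTITY_PROMPT,
        "cache_control": {"type": "ephemeral"},        # cache breakpoint on the stable prefix
    }],
    tools=TOOL_DEFINITIONS,
    messages=[
        *history,
        {"role": "user",                               # dynamic context goes in the newest message
         "content": f"<context>cwd={cwd} branch={branch}</context>\n{user_input}"},
    ],
)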

05

Architectural tradeoffs

Every design choice in an agent system trades off against something else. These are the decisions that shaped Claude Code's architecture.

Flat message history. Chose: single thread, no branching. Alternative: tree-structured conversations. Why: debuggability; linear traces are dramatically easier to inspect and replay than branching histories.
Tool count minimization. Chose: ~20 well-designed tools. Alternative: granular micro-tools. Why: each tool adds decision overhead; fewer tools mean higher accuracy in tool selection. Tools get deleted on every model upgrade.
Subagent isolation. Chose: no recursive spawning. Alternative: allow nested subagents. Why: prevents explosion; one-directional communication (prompt in, summary out) keeps context manageable.
Compaction strategy. Chose: lossy summarization. Alternative: sliding window / RAG. Why: summarization preserves semantic meaning better than truncation; the known feedback-loop tradeoff is accepted.
Shell implementation. Chose: snapshot + transient commands. Alternative: persistent process / per-command. Why: third iteration; the persistent process bottlenecked on batch commands, and per-command lost state. The snapshot captures aliases once and is sourced before each command.
Codebase language. Chose: TypeScript. Alternative: Python, Rust. Why: "on distribution"; the model already knows TS, so Claude Code can maintain itself. React + Ink for the terminal UI.
Context retrieval. Chose: hybrid of cached static + JIT dynamic. Alternative: full RAG / preloading. Why: CLAUDE.md is cached upfront, code discovery via grep/glob is just-in-time. Avoids loading entire codebases.
Key Metric

Multi-context > single-context

Anthropic's multi-agent research showed that spreading reasoning across multiple context windows outperformed a single-agent setup by 90%+, at roughly 15x the token cost.

Key Metric

Agent-written tool descriptions

Letting Claude Code analyze eval transcripts and rewrite its own tool descriptions yielded 40% faster task completion — the model knows its own failure modes.

06

Safety requires OS-level enforcement

Model-level instructions alone aren't enough. Claude Code's permission system operates in three layers, backed by OS sandboxing that reduced permission prompts by 84%.

L1

Static Configuration

settings.json and CLI flags — checked first
Five permission modes, ranging from default (unapproved tools prompt the user for confirmation) through acceptEdits and plan (read-only analysis) to bypassPermissions (for isolated CI environments only). The --allowedTools flag provides fine-grained control.
L2

Permission Rules

Glob-pattern matching with tool-specific scoping
Rules like Bash(npm:*), Edit(docs/**), mcp__github__* provide fine-grained, pattern-based access control. MCP tool names follow the mcp__<server>__<tool> convention enabling prefix-based filtering.
L3

Dynamic Approval

Callback or MCP tool for enterprise policy enforcement
For enterprise use, a callback mechanism or MCP-based policy server can gate tool execution dynamically. This is the escape hatch for custom organizational policies that can't be expressed as static rules.
L4

OS Sandboxing

bubblewrap (Linux) / seatbelt (macOS) for real isolation
Filesystem isolation: read/write only to working directory. Network isolation: unix domain socket proxy enforces domain restrictions. This reduced permission prompts by 84% internally — the agent can work autonomously because the OS prevents harmful actions regardless of model behavior.
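A sketch of how the first three layers might compose. The rule syntax follows the examples above, but the evaluation order and glob matching here are simplifications of the real, tool-specific semantics.

from fnmatch import fnmatch

def is_allowed(tool_name, target, mode, allow_rules, approval_callback):
    if mode == "bypassPermissions":                     # isolated CI environments only
        return True
    for rule in allow_rules:
        if "(" in rule:                                 # scoped rules like "Edit(docs/**)"
            rule_tool, pattern = rule.rstrip(")").split("(", 1)
            if rule_tool == tool_name and fnmatch(target, pattern):
                return True
        elif fnmatch(tool_name, rule):                  # bare patterns like "mcp__github__*"
            return True
    return approval_callback(tool_name, target)         # layer 3: dynamic approval / prompt the user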

The hooks system adds event-driven lifecycle control: PreToolUse (validate inputs, block dangerous commands), PostToolUse (audit outputs), UserPromptSubmit (inject context), Stop (validate results). This gives teams programmatic control without modifying the agent itself — hooks are deterministic scripts that never enter the context window.
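A minimal PreToolUse hook along those lines might look like this. The stdin JSON fields and exit-code convention follow the hooks interface as commonly documented; verify the details against your Claude Code version.

#!/usr/bin/env python3
import json, sys

event = json.load(sys.stdin)                            # the pending tool call arrives as JSON on stdin
command = event.get("tool_input", {}).get("command", "")

if event.get("tool_name") == "Bash" and "rm -rf /" in command:
    print("Blocked: destructive command", file=sys.stderr)
    sys.exit(2)                                         # non-zero exit blocks the call; stderr goes back to Claude
sys.exit(0)                                             # allow everything else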

Every file edit creates a snapshot checkpoint enabling instant rollback. Git worktrees provide additional isolation for parallel work — 3–4 concurrent tasks with independent Claude Code sessions and no context pollution.

07

MCP, extended thinking, and git

The extensibility layer, the reasoning engine, and the state persistence mechanism that complete the system.

MCP

Model Context Protocol

"USB-C for AI" — a client-server architecture where Claude Code connects to external servers exposing tools, resources, and prompts. Three transports: stdio (local), SSE (remote), HTTP. Donated to the Linux Foundation's Agentic AI Foundation in March 2025.

Extended Thinking

Adaptive Reasoning

Enabled by default. On Opus 4.6 / Sonnet 4.6, thinking tokens scale dynamically with task complexity. Interleaved thinking between tool calls provides reasoning continuity. Thinking blocks are passed back unchanged for cache optimization.

Git

First-Class Integration

Creates commits with meaningful messages, manages branches, handles selective staging. Commits include Co-Authored-By trailers. Git worktrees enable 3–4 parallel tasks without context pollution.

Pattern

Initializer + Coding Agent

For long-running tasks: an initializer agent creates a feature list (JSON, not Markdown), writes init.sh, and creates a progress file. Subsequent coding agents read git logs and progress files to orient themselves.
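The persisted state might look something like this; file names and fields are illustrative, not a prescribed format.

import json, pathlib

features = [
    {"id": 1, "name": "user auth",   "status": "done"},
    {"id": 2, "name": "rate limits", "status": "in_progress"},
    {"id": 3, "name": "audit log",   "status": "todo"},
]
pathlib.Path("features.json").write_text(json.dumps(features, indent=2))   # JSON, not Markdown
pathlib.Path("progress.md").write_text("Session 3: rate limits half done; see git log.\n")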

Cost

$6/dev/day Average

90% of users stay below $12. Key efficiencies: prompt caching, on-demand tool loading (98.7% token reduction), subagent isolation, and model routing (Haiku for lightweight tasks). /clear and /compact give users manual control over context.

Insight

90% Self-Written

Claude Code wrote roughly 90% of its own codebase — a recursive testament to the architecture. The choice of TypeScript ("on distribution") means the model already knows the technology well enough to maintain itself.

Three principles emerge from Anthropic's published engineering work: Tools are the agent's true interface — tool design has outsized, measurable effects on performance compared to prompt tweaks. Context rot is the primary failure mode for long-running agents, and the solution is architectural (subagents, compaction, structured note-taking) not algorithmic. Safety requires OS-level enforcement — bubblewrap/seatbelt sandboxing and network proxies reduced permission friction by 84% while maintaining real security boundaries.