The agentic loop is just a while loop
Claude Code's entire architecture reduces to a single-threaded read-eval-print loop. The model decides what to do next, calls tools, gets results, and repeats until done.
```python
while response.has_tool_calls:                    # model still wants to act
    results = execute_tools(response.tool_calls)  # run the requested tools
    response = call_claude(messages + results)    # feed results back to the model
return response.text                              # plain text means we're done
```
This simplicity is deliberate. Anthropic distinguishes between workflows (predefined code paths orchestrating LLMs) and agents (LLMs dynamically directing their own tool usage). Claude Code is firmly an agent — the harness just executes tool calls and manages context.
The loop maintains a flat message history with no threading or branching. An asynchronous dual-buffer queue runs alongside, enabling pause/resume and user interjections mid-task. Users can steer Claude in real time while it works.
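As a rough illustration of that queue (the names and structure below are hypothetical, not Claude Code's source), keystrokes land in a buffer that the loop drains between iterations, so steering messages join the flat history without interrupting a tool call in flight:

```ts
// Hypothetical sketch: a buffer that lets users interject mid-task.
class InterjectionQueue {
  private pending: string[] = [];

  push(userMessage: string): void {
    this.pending.push(userMessage); // called from the input handler at any time
  }

  drain(): string[] {
    const batch = this.pending;
    this.pending = [];
    return batch; // the agent loop appends these to history between turns
  }
}
```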
The loop breaks down into four steps:
1. **User Input**: receive the prompt; load CLAUDE.md, environment context, and git state.
2. **LLM Inference**: send system prompt + tools + history → Claude responds.
3. **Tool Execution**: execute tool calls, check permissions, return results.
4. **Loop or Return**: more tool calls? Loop back. Text only? Return to the user.
Sessions terminate in one of three ways: error_max_turns, error_max_budget_usd, or error_during_execution. Turn and budget limits prevent runaway sessions. If a tool call is denied permission, Claude receives a rejection message and pivots to a different approach.
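Putting the four steps and stop conditions together, a minimal sketch of the outer loop; every type and helper below is an assumption for illustration, not Claude Code's actual internals:

```ts
// Everything here is an illustrative assumption, not Claude Code's source.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; input: unknown };
type ModelResponse = { text: string; costUsd: number; toolCalls: ToolCall[]; message: Message };

declare function callClaude(history: Message[]): Promise<ModelResponse>;
declare function executeTools(calls: ToolCall[]): Promise<Message[]>; // denials become rejection messages

type StopReason = "done" | "error_max_turns" | "error_max_budget_usd";

async function agentLoop(
  messages: Message[],
  limits: { maxTurns: number; maxBudgetUsd: number },
): Promise<{ stop: StopReason; text?: string }> {
  let turns = 0;
  let spentUsd = 0;
  let response = await callClaude(messages);

  while (response.toolCalls.length > 0) {
    if (++turns > limits.maxTurns) return { stop: "error_max_turns" };
    if ((spentUsd += response.costUsd) > limits.maxBudgetUsd) return { stop: "error_max_budget_usd" };

    const results = await executeTools(response.toolCalls); // permission checks happen here
    messages.push(response.message, ...results);
    response = await callClaude(messages);
  }
  return { stop: "done", text: response.text }; // a text-only response ends the loop
}
```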
Claude Code also uses multiple models strategically. Sonnet or Opus handles the main workflow, while Haiku handles lightweight tasks: topic detection, web content processing, quota checks, and exploration subagent work. The tech stack is TypeScript + React with Ink (terminal UI renderer) — chosen because the model already knows TS well, enabling Claude Code to maintain itself.
Twenty tools define the action space
The team constantly minimizes tool count. Each tool definition has a name, JSON schema, permission requirements, risk classification, and a read-only annotation for parallelism.
Anthropic's engineering blog emphasizes: tools should target specific high-impact workflows, not atomic operations. Tool descriptions are extensively detailed because even small wording changes measurably affect performance. In one case, Claude was appending "2025" to web search queries — fixed entirely by improving the tool description. Letting Claude Code itself rewrite tool descriptions yielded a 40% decrease in task completion time.
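Piecing together the fields listed above, a tool definition's shape might look something like this sketch (field names are inferred for illustration, not taken from the actual source):

```ts
// Illustrative shape for a tool definition, inferred from the fields above.
interface ToolDefinition {
  name: string;                         // e.g. "Read", "Bash"
  description: string;                  // extensively detailed; wording measurably matters
  inputSchema: Record<string, unknown>; // JSON Schema for the tool's arguments
  permissions: {
    requiresApproval: boolean;          // does this call prompt the user?
    risk: "low" | "medium" | "high";    // risk classification
  };
  readOnly: boolean;                    // read-only tools can run in parallel
}
```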
The core toolset:
- **Read**: read files with line offsets and image support.
- **Edit**: surgical search-and-replace requiring unique context strings.
- **Write / MultiEdit**: full file creation and batched edit operations.
- **Grep / Glob / LS**: regex search, file pattern matching, directory listing.
- **Bash**: persistent shell session with command risk classification.
- **WebSearch / WebFetch**: search the web and fetch page content via Haiku summarization.
- **Agent / Task**: spawn subagents with isolated context windows.
- **TodoWrite / ThinkTool**: structured task lists and side-effect-free reasoning steps.
- **ToolSearch**: dynamic on-demand tool discovery from filesystem + MCP.
Context is the real engineering challenge
Anthropic coined "context engineering" to describe this — curating the optimal token set during inference. It's the most sophisticated part of the system.
Context Window Allocation (~200K tokens)
Auto-compaction triggers at ~92–95% of context capacity. Sonnet summarizes older history, clears verbose tool outputs, preserves recent exchanges. CLAUDE.md survives because it's reloaded from disk after every compact.
There's a known weakness: compaction creates a feedback loop. After summarization, the agent re-reads files to recover lost context, generating more tokens, triggering more compaction. Sessions that compact once tend to compact 3–5 more times in succession, with each summary drifting from original details.
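A sketch of the trigger logic, with the threshold and all helper names assumed rather than taken from the codebase:

```ts
// Threshold and helper names are assumptions; the real implementation is not public.
const COMPACTION_THRESHOLD = 0.92; // fires at ~92-95% of the context window

declare function summarizeWithSonnet(older: string[]): Promise<string>; // the lossy step
declare function reloadClaudeMd(): string; // why CLAUDE.md survives every compact

async function maybeCompact(history: string[], usedTokens: number, maxTokens: number): Promise<string[]> {
  if (usedTokens / maxTokens < COMPACTION_THRESHOLD) return history;

  const recent = history.slice(-10);                                // preserve recent exchanges
  const summary = await summarizeWithSonnet(history.slice(0, -10)); // older history -> summary
  return [reloadClaudeMd(), summary, ...recent];                    // verbose tool outputs are gone
}
```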
Subagents are the primary isolation mechanism. A subagent might consume tens of thousands of tokens during exploration but returns only a 1–2K token condensed summary to the parent — preventing verbose operations from bloating the main conversation.
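In sketch form, the isolation contract is one-directional: a prompt goes in, a condensed summary comes out. The API below is hypothetical:

```ts
// Hypothetical API: communication is one-directional, prompt in and summary out.
declare function spawnSubagent(opts: { prompt: string; tools: string[] }): Promise<string[]>;
declare function condense(transcript: string[], maxTokens: number): Promise<string>;

async function exploreInIsolation(question: string): Promise<string> {
  // The subagent may burn tens of thousands of tokens in its own context window...
  const transcript = await spawnSubagent({ prompt: question, tools: ["Read", "Grep", "Glob"] });
  // ...but only a 1-2K token summary ever reaches the parent conversation.
  return condense(transcript, 2_000);
}
```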
Progressive Disclosure
Agents incrementally discover context through exploration (grep, glob, file reads) rather than loading everything upfront. Static context gets cached; dynamic context is retrieved just-in-time.
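Sketched in code, the split looks roughly like this (the grep helper is an assumed stand-in for the agent's own tool calls):

```ts
import { readFile } from "node:fs/promises";

declare function grep(pattern: string, glob: string): Promise<string[]>; // assumed JIT helper

async function loadContext(query: string) {
  // Static context: read once per session and cached as a stable prompt prefix.
  const claudeMd = await readFile("CLAUDE.md", "utf8");

  // Dynamic context: fetched just-in-time, only when the model asks for it.
  const hits = await grep(query, "src/**/*.ts");
  return { claudeMd, hits };
}
```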
Git as State Layer
For multi-session workflows, git provides a log of work done plus checkpoints. An "initializer agent" creates a feature list in JSON (not Markdown — the model is less likely to modify JSON inappropriately).
5-Level Context Loading
Always resident (CLAUDE.md) → Path-loaded (language rules) → On-demand (skills) → Isolated (subagent exploration) → Never in context (hooks — deterministic scripts outside the agent).
Layered system prompt architecture
Four prompt layers balance static instructions with dynamic environmental context. The key insight: find the "Goldilocks zone" between overly rigid and overly vague.
- System Workflow Prompt
- System Reminder Start
- System Reminder End
- System Compact Prompt
Custom instructions layer on top through CLAUDE.md files (project root, user-level, global), .claude/rules/*.md files, and the --append-system-prompt CLI flag. CLAUDE.md content is injected as system reminders in every message, including subagent conversations.
Representative directives from these prompts:

- **Default to action (don't just suggest)**: "Implement changes rather than only suggesting them."
- **Parallel tool calling**: "If calls have no dependencies, make all independent calls in parallel."
- **Reversibility guard (Opus-specific)**: "Take local, reversible actions freely. For hard-to-reverse actions, ask first."
- **Anti-over-engineering (Opus-specific)**: "Only make changes that are directly requested or clearly necessary."
Prompt-caching insight: The ordering System → Tools → History → Input is deliberate because caching works by prefix matching. Dynamic data (current time, git state) goes in later messages to preserve cache hits. Switching between models rebuilds the entire cache.
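A sketch of what prefix-friendly assembly implies; the request shape is assumed for illustration:

```ts
// Assumed request shape: stable parts first so prefix-matching caches keep hitting.
declare const SYSTEM_PROMPT: string;      // static -> cached
declare const TOOL_DEFINITIONS: object[]; // static -> cached

function buildRequest(history: object[], gitState: string, userInput: string) {
  return {
    system: SYSTEM_PROMPT,
    tools: TOOL_DEFINITIONS,
    messages: [
      ...history, // append-only, so the cached prefix keeps matching
      { role: "user", content: `<env>${gitState}</env>\n${userInput}` }, // volatile data last
    ],
  };
}
```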
Architectural tradeoffs
Every design choice in an agent system trades off against something else. These are the decisions that shaped Claude Code's architecture.
| Decision | Chose | Alternative | Why |
|---|---|---|---|
| Flat message history | Single thread, no branching | Tree-structured conversations | Debuggability. Linear traces are dramatically easier to inspect and replay than branching histories. |
| Tool count minimization | ~20 well-designed tools | Granular micro-tools | Each tool adds decision overhead. Fewer tools = higher accuracy in tool selection. Delete tools on every model upgrade. |
| Subagent isolation | No recursive spawning | Allow nested subagents | Prevents explosion. One-directional communication (prompt in, summary out) keeps context manageable. |
| Compaction strategy | Lossy summarization | Sliding window / RAG | Summarization preserves semantic meaning better than truncation. Known feedback loop tradeoff is accepted. |
| Shell implementation | Snapshot + transient | Persistent process / per-command | 3rd iteration. Persistent had bottlenecks with batch commands; per-command lost state. Snapshot captures aliases once, sources before each command. |
| Codebase language | TypeScript | Python, Rust | "On distribution" — the model already knows TS. Claude Code can maintain itself. React+Ink for terminal UI. |
| Context retrieval | Hybrid: cached static + JIT dynamic | Full RAG / preloading | CLAUDE.md is cached upfront. Code discovery via grep/glob is just-in-time. Avoids loading entire codebases. |
Multi-context > single-context
Anthropic's multi-agent research showed spreading reasoning across multiple context windows outperformed single-agent by 90%+ — at ~15x the token cost.
Agent-written tool descriptions
Letting Claude Code analyze eval transcripts and rewrite its own tool descriptions yielded 40% faster task completion — the model knows its own failure modes.
Safety requires OS-level enforcement
Model-level instructions alone aren't enough. Claude Code's permission system operates in three layers, backed by OS sandboxing that reduced permission prompts by 84%.
- Static Configuration
- Permission Rules
- Dynamic Approval
- OS Sandboxing
The hooks system adds event-driven lifecycle control: PreToolUse (validate inputs, block dangerous commands), PostToolUse (audit outputs), UserPromptSubmit (inject context), Stop (validate results). This gives teams programmatic control without modifying the agent itself — hooks are deterministic scripts that never enter the context window.
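For illustration, a PreToolUse hook might look like the sketch below. It assumes the hook receives the pending tool call as JSON on stdin and that a nonzero exit code blocks the call; verify the exact contract against the hooks documentation.

```ts
#!/usr/bin/env node
// Sketch of a PreToolUse hook: block obviously destructive shell commands.
// Assumes stdin carries JSON like { tool_name, tool_input } and that a
// nonzero exit code blocks the tool call; check the hooks docs for specifics.
import { stdin, stderr, exit } from "node:process";

let raw = "";
for await (const chunk of stdin) raw += chunk;
const event = JSON.parse(raw) as { tool_name: string; tool_input: { command?: string } };

if (event.tool_name === "Bash" && /rm\s+-rf\s+\//.test(event.tool_input.command ?? "")) {
  stderr.write("Blocked: recursive delete from root is not allowed.\n");
  exit(2); // deny the tool call
}
exit(0); // allow
```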
Every file edit creates a snapshot checkpoint enabling instant rollback. Git worktrees provide additional isolation for parallel work — 3–4 concurrent tasks with independent Claude Code sessions and no context pollution.
MCP, extended thinking, and git
The extensibility layer, the reasoning engine, and the state persistence mechanism that complete the system.
Model Context Protocol
"USB-C for AI" — a client-server architecture where Claude Code connects to external servers exposing tools, resources, and prompts. Three transports: stdio (local), SSE (remote), HTTP. Donated to the Linux Foundation's Agentic AI Foundation in March 2025.
Adaptive Reasoning
Enabled by default. On Opus 4.6 / Sonnet 4.6, thinking tokens scale dynamically with task complexity. Interleaved thinking between tool calls provides reasoning continuity. Thinking blocks are passed back unchanged for cache optimization.
First-Class Git Integration
Creates commits with meaningful messages, manages branches, handles selective staging. Commits include Co-Authored-By trailers. Git worktrees enable 3–4 parallel tasks without context pollution.
Initializer + Coding Agent
For long-running tasks: an initializer agent creates a feature list (JSON, not Markdown), writes init.sh, and creates a progress file. Subsequent coding agents read git logs and progress files to orient themselves.
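The feature list's actual schema isn't published; a plausible sketch, with every field hypothetical, shows why a rigid shape helps:

```ts
// Hypothetical shape for the initializer's feature list. A rigid JSON structure
// makes the model less likely to rewrite it than free-form Markdown notes.
interface Feature {
  id: string;
  description: string;
  status: "pending" | "in_progress" | "done";
}

const featureList: Feature[] = [
  { id: "auth-01", description: "Add session middleware", status: "done" },
  { id: "auth-02", description: "Wire login route to middleware", status: "pending" },
];
```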
$6/dev/day Average
90% of users stay below $12. Key efficiency levers: prompt caching, on-demand tool loading (98.7% token reduction), subagent isolation, and model routing (Haiku for lightweight tasks). /clear and /compact give users manual control over context.
90% Self-Written
Claude Code wrote roughly 90% of its own codebase — a recursive testament to the architecture. The choice of TypeScript ("on distribution") means the model already knows the technology well enough to maintain itself.
Three principles emerge from Anthropic's published engineering work:

1. **Tools are the agent's true interface.** Tool design has outsized, measurable effects on performance compared to prompt tweaks.
2. **Context rot is the primary failure mode** for long-running agents, and the solution is architectural (subagents, compaction, structured note-taking), not algorithmic.
3. **Safety requires OS-level enforcement.** bubblewrap/seatbelt sandboxing and network proxies reduced permission friction by 84% while maintaining real security boundaries.