Codebase Curator implements a unique "Two-Claude" architecture where one Claude instance helps another understand codebases. This document explains how it works and why it's powerful.
┌─────────────────┐ MCP Protocol ┌──────────────────┐
│ │ ◄─────────────────────► │ │
│ Coding Claude │ │ Codebase Curator │
│ (You in Code) │ │ (MCP Server) │
│ │ │ │
└─────────────────┘ └────────┬─────────┘
│
│ Spawns
▼
┌──────────────────┐
│ │
│ Curator Claude │
│ (Codebase Expert)│
│ │
└──────────────────┘
When you use a Codebase Curator tool in Claude Code:
- You ask a question through MCP tools
- MCP Server receives the request
- CuratorService orchestrates the response
- CuratorProcessService spawns a Claude CLI instance
- Curator Claude analyzes your codebase
- Response flows back to you
Curator Claude isn't just another Claude - it's given specialized prompts that make it an expert at codebase analysis:
// From CuratorPrompts.ts
"You're Curator Claude, living in the MCP server.
Another Claude needs your help understanding this codebase.
You know exactly what they need..."
Curator Claude has limited but powerful tools:
- ✅ Read, Grep, Glob, LS - For exploration
- ✅ Bash - For running smart grep
- ✅ Limited Write - Only to
.curator/
directory - ❌ No external access
- ❌ No code execution
This keeps it focused and safe.
The main orchestrator that:
- Detects question types (overview, feature, change)
- Manages the curator process
- Handles session persistence
Manages the Claude CLI subprocess:
- Spawns Claude with proper arguments
- Handles dynamic timeouts
- Manages session continuity
- Parses streaming JSON responses
Tracks conversation history:
- Stores metadata about questions asked
- Maintains session continuity
- Provides history for context
Semantic code understanding:
- Parses code structure (not just text)
- Tracks usage counts
- Shows cross-references
- Organized by type (functions, classes, etc.)
One of the key innovations is maintaining context across questions:
First Question → Creates session → Explores codebase
↓
Saves session ID
↓
Next Question → Resumes session → Uses existing knowledge
- Immutable Sessions: Each interaction creates a new session ID
- Context Preserved: The conversation history is maintained
- Cache Benefits: Anthropic's API caches the context
- Claude Sessions:
.curator/session.txt
(UUID format) - Metadata:
~/.codebase-curator/sessions/
(JSON files)
Different operations need different time:
'Task': 600000, // 10 minutes - complex analysis
'Bash': 300000, // 5 minutes - smart grep
'Read': 120000, // 2 minutes - file reading
'LS': 60000, // 1 minute - directory listing
- Files are streamed, never fully loaded
- Batching prevents memory overload
- Bun's native APIs for performance
- Semantic understanding vs text matching
- Cached indexes for repeat searches
- Language-aware parsing
- Curator Claude runs in a separate process
- Limited tool access
- No access to your main environment
- Read access to project only
- Write access only to
.curator/
directory - Respects
.gitignore
patterns
- No internet access
- No arbitrary code execution
- Safe for sensitive codebases
- Incremental Indexing - Only reindex changed files
- Multi-Language Support - Beyond TypeScript
- Team Knowledge Sharing - Shared curator sessions
- IDE Integration - Beyond Claude Code
- Language extractors in
src/semantic/extractors/
- New MCP tools in
src/mcp/server.ts
- Custom prompts in
src/core/CuratorPrompts.ts
- Specialized Expertise - Curator Claude becomes a codebase expert
- Persistent Context - Knowledge builds over time
- Separation of Concerns - Coding vs Understanding
- Scalability - Can handle massive codebases
- Complexity - Two-process architecture
- Latency - Process spawning overhead
- Resource Usage - Two Claude instances
The Two-Claude architecture enables a new paradigm: instead of Claude starting fresh with each question, you get a persistent codebase expert that truly understands your code. This leads to better suggestions, faster responses, and code that actually fits your patterns.