GitHub Issue: MCP Server Recursive Startup and Resource Leaks
🚨 Bug Report: Critical MCP Server Management Issues
Issue Summary
The MCP (Model Context Protocol) server management system exhibits recursive startup behavior, resource leaks, and race conditions that can lead to system instability, especially during development workflows.
Environment
- Affected Files:
  - src/services/mcp/McpHub.ts
  - src/services/mcp/McpServerManager.ts
  - src/core/task/Task.ts
🔍 Critical Issues Identified
Issue 1: File Watcher Infinite Restart Loop
File: src/services/mcp/McpHub.ts, lines 1109-1170
Problem: File watchers create infinite server restart loops during development.
Code Location:
```typescript
// Lines 1131-1134 in setupFileWatcher()
watchPathsWatcher.on("change", async (changedPath) => {
	try {
		await this.restartConnection(name, source) // ← triggers infinite loop
	} catch (error) {
		console.error(`Failed to restart server ${name} after change in ${changedPath}:`, error)
	}
})

// Line 654 in connectToServer() - called by restartConnection()
this.setupFileWatcher(name, config, source) // ← creates NEW watchers
```
Reproduction:
- Start an MCP server that watches `build/index.js`
- Rebuild the server file repeatedly (e.g., during development)
- Each file change triggers `restartConnection()` → `connectToServer()` → `setupFileWatcher()` → new watchers
- Infinite restart loop begins
Issue 2: Singleton Pattern Race Condition
File: src/services/mcp/McpServerManager.ts, lines 20-51
Problem: Multiple MCP hub instances can be created simultaneously due to race conditions.
Code Location:
```typescript
// Lines 37-39 - race condition window
if (!this.instance) {
	this.instance = new McpHub(provider) // ← multiple callers can reach here
	await context.globalState.update(this.GLOBAL_STATE_KEY, Date.now().toString())
}
```
Race Condition Scenario:
- Task 1 calls `getInstance()` → checks `instance === null` → enters creation
- Task 2 calls `getInstance()` → also checks `instance === null` → also enters creation
- Result: two MCP hubs created, both trying to connect to the same servers
Issue 3: Process Cleanup Race Condition
File: src/services/mcp/McpHub.ts, lines 1012-1038
Problem: Previous MCP server processes are not properly terminated before new ones start.
Code Location:
```typescript
// Lines 1021-1030 in deleteConnection()
for (const connection of connections) {
	try {
		if (connection.type === "connected") {
			await connection.transport.close() // ← async, might fail silently
			await connection.client.close()
		}
	} catch (error) {
		console.error(`Failed to close transport for ${name}:`, error)
		// ← continues despite failure, leaving processes running
	}
}

// Lines 1205-1206 in restartConnection() - gap between cleanup and restart
await this.deleteConnection(serverName, connection.server.source)
// ... config parsing happens here ...
await this.connectToServer(serverName, validatedConfig, connection.server.source || "global")
// ← new process starts before the old one has fully terminated
```
Issue 4: Concurrent Initialization Chaos
File: src/services/mcp/McpHub.ts, lines 157-164
Problem: Multiple async initialization paths run simultaneously without coordination.
Code Location:
```typescript
// McpHub constructor - all of these run concurrently
constructor(provider: ClineProvider) {
	// ...
	this.watchMcpSettingsFile() // ← async - sets up file watchers
	this.watchProjectMcpFile().catch(console.error) // ← async - sets up file watchers
	this.setupWorkspaceFoldersWatcher() // ← sets up workspace watchers
	this.initializeGlobalMcpServers() // ← async - connects to servers
	this.initializeProjectMcpServers() // ← async - connects to servers
}
```
Race Condition: File watchers start firing events before server initialization completes.
Issue 5: Task-Level MCP Access Competition
File: src/core/task/Task.ts, lines 2129-2130
Problem: Multiple tasks simultaneously request MCP hub access, triggering singleton race conditions.
Code Location:
```typescript
// getSystemPrompt() called for every API request
mcpHub = await McpServerManager.getInstance(provider.context, provider)

// Lines 2136-2139 - timeout behavior creates retry cycles
await pWaitFor(() => !mcpHub!.isConnecting, { timeout: 10_000 }).catch(() => {
	console.error("MCP servers failed to connect in time")
})
```
Problem: if servers are stuck in a restart loop, `isConnecting` never clears → tasks time out and retry, amplifying the load.
🔧 Proposed Solutions
Solution 1: File Watcher Debouncing with Backoff
Add exponential backoff to prevent immediate restarts:
```typescript
// Add to McpHub class
private restartDebounceTimers: Map<string, NodeJS.Timeout> = new Map()
private restartAttempts: Map<string, { count: number; lastAttempt: number }> = new Map()

private async debouncedRestart(serverName: string, source: string): Promise<void> {
	const key = `${serverName}-${source}`

	// Clear any pending restart timer
	const existingTimer = this.restartDebounceTimers.get(key)
	if (existingTimer) {
		clearTimeout(existingTimer)
	}

	// Track attempts for exponential backoff
	const attempts = this.restartAttempts.get(key) || { count: 0, lastAttempt: 0 }
	const now = Date.now()

	// Reset the count if enough time has passed (1 minute)
	if (now - attempts.lastAttempt > 60_000) {
		attempts.count = 0
	}
	attempts.count++
	attempts.lastAttempt = now
	this.restartAttempts.set(key, attempts)

	// Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 30s
	const backoffDelay = Math.min(1000 * Math.pow(2, attempts.count - 1), 30_000)

	const timer = setTimeout(() => {
		this.restartDebounceTimers.delete(key)
		// Catch here: an async callback passed to setTimeout would otherwise
		// turn a failed restart into an unhandled rejection
		this.restartConnection(serverName, source).catch((error) =>
			console.error(`Failed to restart server ${serverName}:`, error),
		)
	}, backoffDelay)
	this.restartDebounceTimers.set(key, timer)
}

// Modify the file watcher to use the debounced restart
watchPathsWatcher.on("change", async (changedPath) => {
	await this.debouncedRestart(name, source) // ← use debounced version
})
```
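For reference, the backoff expression above can be checked in isolation. This is a standalone sketch of the same formula (the function name is illustrative):

```typescript
// Standalone version of the backoff expression used in debouncedRestart()
function backoffDelay(attemptCount: number): number {
	// 1s, 2s, 4s, 8s, ... capped at 30s
	return Math.min(1000 * Math.pow(2, attemptCount - 1), 30_000)
}

// First six attempts, in milliseconds
const delays = [1, 2, 3, 4, 5, 6].map(backoffDelay)
// → [1000, 2000, 4000, 8000, 16000, 30000]
```

Note the cap kicks in on the sixth attempt (32 s is clamped to 30 s), so a persistently failing server settles into one restart every 30 seconds instead of a tight loop.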
Solution 2: Thread-Safe Singleton
Fix the race condition in `McpServerManager.getInstance()`:
```typescript
// Requires a new static field on McpServerManager:
// private static initializationPromise: Promise<McpHub> | null = null

static async getInstance(context: vscode.ExtensionContext, provider: ClineProvider): Promise<McpHub> {
	// If we already have an instance, register the provider and return it
	if (this.instance) {
		this.providers.add(provider)
		return this.instance
	}

	// If initialization is in progress, wait for it
	if (this.initializationPromise) {
		await this.initializationPromise
		this.providers.add(provider)
		return this.instance!
	}

	// Create the initialization promise (atomic within the event loop)
	this.initializationPromise = (async () => {
		try {
			if (!this.instance) {
				// Double-check with promise protection
				this.instance = new McpHub(provider)
				await context.globalState.update(this.GLOBAL_STATE_KEY, Date.now().toString())
			}
			return this.instance
		} catch (error) {
			this.instance = null // clean up on failure
			throw error
		} finally {
			this.initializationPromise = null
		}
	})()

	const result = await this.initializationPromise
	this.providers.add(provider)
	return result
}
```
Solution 3: Connection State Management
Add locking to prevent concurrent operations:
```typescript
// Add to McpHub class
private connectionStates: Map<string, "idle" | "connecting" | "disconnecting"> = new Map()
private connectionLocks: Map<string, Promise<unknown>> = new Map()

private async withConnectionLock<T>(serverName: string, source: string, operation: () => Promise<T>): Promise<T> {
	const key = `${serverName}-${source}`

	// Wait until no other operation holds the lock. This must be a loop,
	// not a single check: another waiter may acquire the lock while we await.
	let existingLock: Promise<unknown> | undefined
	while ((existingLock = this.connectionLocks.get(key))) {
		await existingLock.catch(() => {}) // a failed previous operation still releases the lock
	}

	// Acquire the lock
	const lockPromise = (async () => {
		try {
			this.connectionStates.set(key, "connecting")
			return await operation()
		} finally {
			this.connectionStates.set(key, "idle")
			this.connectionLocks.delete(key)
		}
	})()
	this.connectionLocks.set(key, lockPromise)
	return lockPromise
}
```
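A minimal, self-contained sketch of the same lock pattern, showing that two overlapping operations on the same key are serialized rather than interleaved (all names here are illustrative, not from the codebase):

```typescript
const locks = new Map<string, Promise<void>>()

async function withLock(key: string, op: () => Promise<void>): Promise<void> {
	// Loop, because another waiter may re-acquire the lock while we await
	let pending: Promise<void> | undefined
	while ((pending = locks.get(key))) {
		await pending
	}
	const lock = (async () => {
		try {
			await op()
		} finally {
			locks.delete(key)
		}
	})()
	locks.set(key, lock)
	return lock
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))
const order: string[] = []

// Two overlapping "restarts" of the same server must not interleave
async function demo(): Promise<string[]> {
	await Promise.all([
		withLock("server-a", async () => {
			order.push("restart-1:start")
			await sleep(20)
			order.push("restart-1:end")
		}),
		withLock("server-a", async () => {
			order.push("restart-2:start")
			order.push("restart-2:end")
		}),
	])
	return order
}
```

Without the lock, the second restart would begin while the first is still mid-flight, which is exactly the cleanup/startup overlap described in Issue 3.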
Solution 4: Process Termination Verification
Ensure previous processes are fully terminated:
```typescript
private async ensureProcessTerminated(transport: any): Promise<void> {
	// The transport does not expose its child process publicly,
	// so fall back to the private field if necessary
	const process = transport._process || transport.process
	if (process && !process.killed) {
		process.kill("SIGTERM")

		// Wait up to 3 seconds for graceful termination
		await new Promise<void>((resolve) => {
			const timeout = setTimeout(() => {
				if (!process.killed) {
					process.kill("SIGKILL") // force kill
				}
				resolve()
			}, 3000)
			process.once("exit", () => {
				clearTimeout(timeout)
				resolve()
			})
		})
	}
}
```
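Solution 5: Sequential Initialization Coordination
Chain the constructor's async paths so watchers attach only after server connections complete, addressing Issue 4 directly. A minimal, self-contained sketch of the pattern (class and method names are illustrative; the real McpHub would chain its existing `watchMcpSettingsFile` / `initializeGlobalMcpServers` calls the same way):

```typescript
// Illustrative sketch: phases run in order, and callers can await readiness
class SequentialHub {
	readonly log: string[] = []
	private readonly ready: Promise<void>

	constructor() {
		// One chained promise instead of five concurrent fire-and-forget calls
		this.ready = this.initialize().catch((error) => {
			console.error("MCP hub initialization failed:", error)
		})
	}

	private async initialize(): Promise<void> {
		// Phase 1: connect servers before any watcher can fire
		await this.connectServers()
		// Phase 2: attach watchers against a stable, fully initialized state
		await this.attachWatchers()
	}

	private async connectServers(): Promise<void> {
		this.log.push("servers-connected")
	}

	private async attachWatchers(): Promise<void> {
		this.log.push("watchers-attached")
	}

	// Tasks await this instead of polling isConnecting
	whenReady(): Promise<void> {
		return this.ready
	}
}
```

With this shape, the `pWaitFor(() => !mcpHub.isConnecting, ...)` polling in Issue 5 could become a simple `await hub.whenReady()`.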
⚠️ Impact Assessment
Current Impact:
- Resource exhaustion from multiple MCP server processes
- Port conflicts from competing server instances
- System instability during development workflows
- Extension crashes from cascading failures
Priority: CRITICAL - Affects core MCP functionality and development experience
Affected Use Cases:
- MCP server development with file watching
- Multiple concurrent tasks using MCP servers
- Extension stability during server restarts
🧪 Testing Recommendations
- File Watcher Test: Create MCP server that watches a file, modify it rapidly, verify no infinite loops
- Concurrent Access Test: Start multiple tasks simultaneously, verify only one MCP hub created
- Process Cleanup Test: Restart MCP servers, verify old processes are terminated
- Resource Leak Test: Run for extended period, monitor process/memory usage
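The Concurrent Access Test above can be sketched against a simplified stand-in for the manager, without VS Code dependencies (all names here are hypothetical):

```typescript
// Simplified stand-ins for McpHub / McpServerManager (hypothetical names)
class FakeHub {
	static created = 0
	constructor() {
		FakeHub.created++
	}
}

class FakeManager {
	private static instance: FakeHub | null = null
	private static initializationPromise: Promise<FakeHub> | null = null

	static async getInstance(): Promise<FakeHub> {
		if (this.instance) return this.instance
		if (!this.initializationPromise) {
			this.initializationPromise = (async () => {
				// Simulate async setup work during which other callers arrive
				await new Promise((resolve) => setTimeout(resolve, 10))
				this.instance = new FakeHub()
				return this.instance
			})()
		}
		return this.initializationPromise
	}
}

// Five "tasks" request the hub at the same time; exactly one hub must exist
async function concurrentAccessTest(): Promise<number> {
	const hubs = await Promise.all(Array.from({ length: 5 }, () => FakeManager.getInstance()))
	if (!hubs.every((h) => h === hubs[0])) {
		throw new Error("different hub instances returned")
	}
	return FakeHub.created
}
```

The same harness run against the current code (without the initialization promise) would report multiple hubs created, reproducing Issue 2.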
📝 Implementation Checklist
- [ ] Implement file watcher debouncing with exponential backoff
- [ ] Fix singleton pattern thread safety in McpServerManager
- [ ] Add connection state locking mechanism
- [ ] Implement process termination verification
- [ ] Add sequential initialization coordination
- [ ] Create comprehensive test suite for MCP server lifecycle
- [ ] Add monitoring/logging for debugging recursive issues
Labels: bug, critical, mcp, resource-leak, race-condition