
[bug] MCP Server Recursive Startup and Resource Leaks #1986

@OleksandrKucherenko

Description


GitHub Issue: MCP Server Recursive Startup and Resource Leaks

🚨 Bug Report: Critical MCP Server Management Issues

Issue Summary

The MCP (Model Context Protocol) server management system exhibits recursive startup behavior, resource leaks, and race conditions that can lead to system instability, especially during development workflows.

Environment

  • Affected Files:
    • src/services/mcp/McpHub.ts
    • src/services/mcp/McpServerManager.ts
    • src/core/task/Task.ts

🔍 Critical Issues Identified


Issue 1: File Watcher Infinite Restart Loop

File: src/services/mcp/McpHub.ts lines 1109-1170

Problem: File watchers create infinite server restart loops during development.

Code Location:

```typescript
// Lines 1131-1134 in setupFileWatcher()
watchPathsWatcher.on("change", async (changedPath) => {
    try {
        await this.restartConnection(name, source) // ← Triggers infinite loop
    } catch (error) {
        console.error(`Failed to restart server ${name} after change in ${changedPath}:`, error)
    }
})

// Line 654 in connectToServer() - called by restartConnection
this.setupFileWatcher(name, config, source) // ← Creates NEW watchers
```

Reproduction:

  1. Start MCP server that watches build/index.js
  2. Rebuild the server file repeatedly (e.g., during development)
  3. Each file change triggers `restartConnection()` → `connectToServer()` → `setupFileWatcher()` → new watchers
  4. Infinite restart loop begins
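The compounding nature of the loop is easy to model in isolation. The toy sketch below (hypothetical names, no real chokidar or McpHub API involved) captures the core defect: each "restart" registers a new change handler without removing the old one, so one file change fans out into several restarts, and the handler count doubles on every event.

```typescript
// Toy model of the watcher leak: restartConnection() re-runs the setup path,
// which registers ANOTHER handler on top of the existing ones.
const handlers: Array<() => void> = []
let restarts = 0

function setupFileWatcher(): void {
    handlers.push(() => restartConnection())
}

function restartConnection(): void {
    restarts++
    setupFileWatcher() // mirrors connectToServer() re-registering a watcher
}

function fireChange(): void {
    // Snapshot first, as a real watcher would invoke every registered listener
    for (const handler of [...handlers]) {
        handler()
    }
}

setupFileWatcher()
fireChange() // 1 handler fires → 1 restart, leaving 2 handlers registered
fireChange() // 2 handlers fire → 2 more restarts, leaving 4 handlers
```

Two change events already produce three restarts and four live handlers; growth from there is exponential.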

Issue 2: Singleton Pattern Race Condition

File: src/services/mcp/McpServerManager.ts lines 20-51

Problem: Multiple MCP hub instances can be created simultaneously due to race conditions.

Code Location:

```typescript
// Lines 37-39 - Race condition window
if (!this.instance) {
    this.instance = new McpHub(provider) // ← Multiple threads can reach here
    await context.globalState.update(this.GLOBAL_STATE_KEY, Date.now().toString())
}
```

Race Condition Scenario:

  • Task 1 calls getInstance() → checks instance=null → enters creation
  • Task 2 calls getInstance() → also checks instance=null → also enters creation
  • Result: Two MCP hubs created, both trying to connect to same servers

Issue 3: Process Cleanup Race Condition

File: src/services/mcp/McpHub.ts lines 1012-1038

Problem: Previous MCP server processes are not properly terminated before new ones start.

Code Location:

```typescript
// Lines 1021-1030 in deleteConnection()
for (const connection of connections) {
    try {
        if (connection.type === "connected") {
            await connection.transport.close() // ← Async, might fail silently
            await connection.client.close()
        }
    } catch (error) {
        console.error(`Failed to close transport for ${name}:`, error)
        // ← Continues despite failure, leaving processes running
    }
}

// Lines 1205-1206 in restartConnection() - gap between cleanup and restart
await this.deleteConnection(serverName, connection.server.source)
// ... parsing happens here ...
await this.connectToServer(serverName, validatedConfig, connection.server.source || "global")
// ← New process starts before old one fully terminated
```

Issue 4: Concurrent Initialization Chaos

File: src/services/mcp/McpHub.ts lines 157-164

Problem: Multiple async initialization paths run simultaneously without coordination.

Code Location:

```typescript
// McpHub constructor - all run concurrently
constructor(provider: ClineProvider) {
    // ...
    this.watchMcpSettingsFile()                     // ← Async - sets up file watchers
    this.watchProjectMcpFile().catch(console.error) // ← Async - sets up file watchers
    this.setupWorkspaceFoldersWatcher()             // ← Sets up workspace watchers
    this.initializeGlobalMcpServers()               // ← Async - connects to servers
    this.initializeProjectMcpServers()              // ← Async - connects to servers
}
```

Race Condition: File watchers start firing events before server initialization completes.
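One way to remove this race is to serialize start-up behind a single promise, attaching watchers only after the initial connections exist. The sketch below is hypothetical: method names mirror the constructor above, but the bodies are stubs that just record execution order.

```typescript
// Hypothetical sketch: one awaited chain instead of five fire-and-forget calls.
class McpHubSketch {
    readonly order: string[] = []
    readonly ready: Promise<void>

    constructor() {
        // Everything hangs off a single initialization promise
        this.ready = this.initialize()
    }

    private async initialize(): Promise<void> {
        // Connect first; attach watchers last so their events cannot race setup
        await this.initializeGlobalMcpServers()
        await this.initializeProjectMcpServers()
        await this.watchMcpSettingsFile()
    }

    private async initializeGlobalMcpServers(): Promise<void> {
        this.order.push("global")
    }
    private async initializeProjectMcpServers(): Promise<void> {
        this.order.push("project")
    }
    private async watchMcpSettingsFile(): Promise<void> {
        this.order.push("watch")
    }
}
```

Callers that need a fully started hub can `await hub.ready`, which also gives the timeout in Issue 5 something deterministic to wait on.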


Issue 5: Task-Level MCP Access Competition

File: src/core/task/Task.ts lines 2129-2130

Problem: Multiple tasks simultaneously request MCP hub access, triggering singleton race conditions.

Code Location:

```typescript
// getSystemPrompt() called for every API request
mcpHub = await McpServerManager.getInstance(provider.context, provider)

// Lines 2136-2139 - timeout behavior creates cycles
await pWaitFor(() => !mcpHub!.isConnecting, { timeout: 10_000 }).catch(() => {
    console.error("MCP servers failed to connect in time")
})
```

Problem: If servers are stuck in a restart loop, `isConnecting` never clears, so every task pays the full 10-second timeout, logs the error, and retries on its next request, repeating the cycle.
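The failure mode can be shown with a minimal stand-in for `pWaitFor` (this is an illustrative polling loop, not the real `p-wait-for` implementation): when the condition can never become true, the caller always burns the entire timeout.

```typescript
// Minimal stand-in for pWaitFor: poll a condition until it holds or a deadline passes
async function waitFor(condition: () => boolean, timeoutMs: number, intervalMs = 5): Promise<void> {
    const deadline = Date.now() + timeoutMs
    while (!condition()) {
        if (Date.now() >= deadline) {
            throw new Error("timed out")
        }
        await new Promise((resolve) => setTimeout(resolve, intervalMs))
    }
}

// While a server is stuck restarting, isConnecting never becomes false,
// so every task that polls it waits the full timeout before giving up.
let isConnecting = true
```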


🔧 Proposed Solutions

Solution 1: File Watcher Debouncing with Backoff

Add exponential backoff to prevent immediate restarts:

```typescript
// Add to McpHub class
private restartDebounceTimers: Map<string, NodeJS.Timeout> = new Map()
private restartAttempts: Map<string, { count: number; lastAttempt: number }> = new Map()

private debouncedRestart(serverName: string, source: string): void {
    const key = `${serverName}-${source}`

    // Clear any pending restart so rapid changes collapse into one
    const existingTimer = this.restartDebounceTimers.get(key)
    if (existingTimer) {
        clearTimeout(existingTimer)
    }

    // Track attempts for exponential backoff
    const attempts = this.restartAttempts.get(key) || { count: 0, lastAttempt: 0 }
    const now = Date.now()

    // Reset the count if enough time has passed (1 minute)
    if (now - attempts.lastAttempt > 60_000) {
        attempts.count = 0
    }

    attempts.count++
    attempts.lastAttempt = now
    this.restartAttempts.set(key, attempts)

    // Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 30s
    const backoffDelay = Math.min(1000 * Math.pow(2, attempts.count - 1), 30_000)

    const timer = setTimeout(() => {
        this.restartDebounceTimers.delete(key)
        // Catch here - an unhandled rejection inside setTimeout would otherwise escape
        this.restartConnection(serverName, source).catch((error) =>
            console.error(`Debounced restart of ${serverName} failed:`, error),
        )
    }, backoffDelay)

    this.restartDebounceTimers.set(key, timer)
}

// Modify the file watcher to use the debounced restart
watchPathsWatcher.on("change", (changedPath) => {
    this.debouncedRestart(name, source) // ← Use debounced version
})
```
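The schedule is easiest to verify with the delay computation pulled out as a pure function (same formula as above, extracted here purely for illustration):

```typescript
// Backoff delay per attempt: 1s, 2s, 4s, 8s, 16s, then capped at 30s
function backoffDelay(attemptCount: number): number {
    return Math.min(1000 * Math.pow(2, attemptCount - 1), 30_000)
}
```

Attempt 6 would be 32 s uncapped, so the 30 s ceiling kicks in from the sixth consecutive restart onward.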

Solution 2: Thread-Safe Singleton

Fix race condition in McpServerManager:

```typescript
// New static field alongside `instance`:
// private static initializationPromise: Promise<McpHub> | null = null

static async getInstance(context: vscode.ExtensionContext, provider: ClineProvider): Promise<McpHub> {
    // If we already have an instance, register the provider and return
    if (this.instance) {
        this.providers.add(provider)
        return this.instance
    }

    // If initialization is in progress, wait for it
    if (this.initializationPromise) {
        await this.initializationPromise
        this.providers.add(provider)
        return this.instance!
    }

    // Create the initialization promise (a single synchronous assignment,
    // so no second caller can slip past the check above)
    this.initializationPromise = (async () => {
        try {
            if (!this.instance) { // Double-check under promise protection
                this.instance = new McpHub(provider)
                await context.globalState.update(this.GLOBAL_STATE_KEY, Date.now().toString())
            }
            return this.instance
        } catch (error) {
            this.instance = null // Clean up on failure
            throw error
        } finally {
            this.initializationPromise = null
        }
    })()

    const result = await this.initializationPromise
    this.providers.add(provider)
    return result
}
```
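A stripped-down model of the same pattern shows why it closes the race: concurrent callers all land on one shared initialization promise, so the constructor runs exactly once. All names below are illustrative; the real manager also tracks providers and global state.

```typescript
// Minimal promise-guarded singleton: N concurrent getInstance() calls,
// one construction.
class SingletonSketch {
    private static instance: SingletonSketch | null = null
    private static initializing: Promise<SingletonSketch> | null = null
    static creations = 0 // public counter, for demonstration only

    static async getInstance(): Promise<SingletonSketch> {
        if (this.instance) {
            return this.instance
        }
        if (!this.initializing) {
            // Assigned synchronously, so later callers join this promise
            this.initializing = (async () => {
                await new Promise((resolve) => setTimeout(resolve, 10)) // simulated async setup
                this.creations++
                this.instance = new SingletonSketch()
                return this.instance
            })()
        }
        try {
            return await this.initializing
        } finally {
            this.initializing = null
        }
    }
}
```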

Solution 3: Connection State Management

Add locking to prevent concurrent operations:

```typescript
// Add to McpHub class
private connectionStates: Map<string, "idle" | "connecting" | "disconnecting"> = new Map()
private connectionLocks: Map<string, Promise<unknown>> = new Map()

private async withConnectionLock<T>(serverName: string, source: string, operation: () => Promise<T>): Promise<T> {
    const key = `${serverName}-${source}`

    // Chain onto any in-flight operation instead of awaiting it and then
    // creating a fresh lock - that pattern lets two waiters proceed at once.
    // A failed predecessor must not block later operations, hence the catch.
    const previous = this.connectionLocks.get(key) ?? Promise.resolve()
    const lockPromise = previous.catch(() => {}).then(async () => {
        try {
            this.connectionStates.set(key, "connecting")
            return await operation()
        } finally {
            this.connectionStates.set(key, "idle")
        }
    })

    this.connectionLocks.set(key, lockPromise)
    try {
        return await lockPromise
    } finally {
        // Only release if no newer operation has been queued behind us
        if (this.connectionLocks.get(key) === lockPromise) {
            this.connectionLocks.delete(key)
        }
    }
}
```
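The chaining idea works the same outside the class. This standalone sketch (illustrative names, not the real McpHub API) demonstrates that a second operation queued on the same key only starts after the first one finishes:

```typescript
// Per-key promise chain: operations on the same key run strictly in order
const locks: Map<string, Promise<unknown>> = new Map()

async function withLock<T>(key: string, operation: () => Promise<T>): Promise<T> {
    const previous = locks.get(key) ?? Promise.resolve()
    // Swallow the predecessor's error so one failure cannot poison the queue
    const next = previous.catch(() => {}).then(operation)
    locks.set(key, next)
    try {
        return await next
    } finally {
        if (locks.get(key) === next) {
            locks.delete(key)
        }
    }
}
```

In the McpHub context, `restartConnection` and `deleteConnection` would each wrap their bodies in this lock so a restart can never overlap a teardown of the same server.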

Solution 4: Process Termination Verification

Ensure previous processes are fully terminated:

```typescript
private async ensureProcessTerminated(transport: any): Promise<void> {
    // Note: the child process is a private field of the stdio transport,
    // so this reaches into an SDK implementation detail
    const child = transport._process ?? transport.process
    if (!child || child.exitCode !== null || child.signalCode !== null) {
        return // No process, or it has already exited
    }

    child.kill("SIGTERM")

    // Wait up to 3 seconds for graceful termination, then force kill.
    // (Checking `child.killed` here would be wrong - it only records that a
    // signal was SENT, not that the process exited.)
    await new Promise<void>((resolve) => {
        const timeout = setTimeout(() => {
            child.kill("SIGKILL") // Force kill
            resolve()
        }, 3000)

        child.once("exit", () => {
            clearTimeout(timeout)
            resolve()
        })
    })
}
```

⚠️ Impact Assessment

Current Impact:

  • Resource exhaustion from multiple MCP server processes
  • Port conflicts from competing server instances
  • System instability during development workflows
  • Extension crashes from cascading failures

Priority: CRITICAL - Affects core MCP functionality and development experience

Affected Use Cases:

  • MCP server development with file watching
  • Multiple concurrent tasks using MCP servers
  • Extension stability during server restarts

🧪 Testing Recommendations

  1. File Watcher Test: Create MCP server that watches a file, modify it rapidly, verify no infinite loops
  2. Concurrent Access Test: Start multiple tasks simultaneously, verify only one MCP hub created
  3. Process Cleanup Test: Restart MCP servers, verify old processes are terminated
  4. Resource Leak Test: Run for extended period, monitor process/memory usage

📝 Implementation Checklist

  • Implement file watcher debouncing with exponential backoff
  • Fix singleton pattern thread safety in McpServerManager
  • Add connection state locking mechanism
  • Implement process termination verification
  • Add sequential initialization coordination
  • Create comprehensive test suite for MCP server lifecycle
  • Add monitoring/logging for debugging recursive issues

Labels: bug, critical, mcp, resource-leak, race-condition
