
[bug] MCP Server Recursive Startup and Resource Leaks #1986

@OleksandrKucherenko

Description


GitHub Issue: MCP Server Recursive Startup and Resource Leaks

🚨 Bug Report: Critical MCP Server Management Issues

Issue Summary

The MCP (Model Context Protocol) server management system exhibits recursive startup behavior, resource leaks, and race conditions that can lead to system instability, especially during development workflows.

Environment

  • Affected Files:
    • src/services/mcp/McpHub.ts
    • src/services/mcp/McpServerManager.ts
    • src/core/task/Task.ts

🔍 Critical Issues Identified


Issue 1: File Watcher Infinite Restart Loop

File: src/services/mcp/McpHub.ts lines 1109-1170

Problem: File watchers create infinite server restart loops during development.

Code Location:

```typescript
// Lines 1131-1134 in setupFileWatcher()
watchPathsWatcher.on("change", async (changedPath) => {
    try {
        await this.restartConnection(name, source) // ← Triggers infinite loop
    } catch (error) {
        console.error(`Failed to restart server ${name} after change in ${changedPath}:`, error)
    }
})

// Line 654 in connectToServer() - called by restartConnection
this.setupFileWatcher(name, config, source) // ← Creates NEW watchers
```

Reproduction:

  1. Start MCP server that watches build/index.js
  2. Rebuild the server file repeatedly (e.g., during development)
  3. Each file change triggers `restartConnection()` → `connectToServer()` → `setupFileWatcher()` → new watchers
  4. Infinite restart loop begins
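The compounding nature of the loop is easy to model in isolation. The toy sketch below (hypothetical names, no real chokidar or McpHub API involved) captures the core defect: each "restart" registers a new change handler without removing the old one, so one file change fans out into several restarts, and the handler count doubles on every event.

```typescript
// Toy model of the watcher leak: restartConnection() re-runs the setup path,
// which registers ANOTHER handler on top of the existing ones.
const handlers: Array<() => void> = []
let restarts = 0

function setupFileWatcher(): void {
    handlers.push(() => restartConnection())
}

function restartConnection(): void {
    restarts++
    setupFileWatcher() // mirrors connectToServer() re-registering a watcher
}

function fireChange(): void {
    // Snapshot first, as a real watcher would invoke every registered listener
    for (const handler of [...handlers]) {
        handler()
    }
}

setupFileWatcher()
fireChange() // 1 handler fires → 1 restart, leaving 2 handlers registered
fireChange() // 2 handlers fire → 2 more restarts, leaving 4 handlers
```

Two change events already produce three restarts and four live handlers; growth from there is exponential.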

Issue 2: Singleton Pattern Race Condition

File: src/services/mcp/McpServerManager.ts lines 20-51

Problem: Multiple MCP hub instances can be created simultaneously due to race conditions.

Code Location:

```typescript
// Lines 37-39 - Race condition window
if (!this.instance) {
    this.instance = new McpHub(provider) // ← Multiple threads can reach here
    await context.globalState.update(this.GLOBAL_STATE_KEY, Date.now().toString())
}
```

Race Condition Scenario:

  • Task 1 calls getInstance() → checks instance=null → enters creation
  • Task 2 calls getInstance() → also checks instance=null → also enters creation
  • Result: Two MCP hubs created, both trying to connect to same servers

Issue 3: Process Cleanup Race Condition

File: src/services/mcp/McpHub.ts lines 1012-1038

Problem: Previous MCP server processes are not properly terminated before new ones start.

Code Location:

```typescript
// Lines 1021-1030 in deleteConnection()
for (const connection of connections) {
    try {
        if (connection.type === "connected") {
            await connection.transport.close() // ← Async, might fail silently
            await connection.client.close()
        }
    } catch (error) {
        console.error(`Failed to close transport for ${name}:`, error)
        // ← Continues despite failure, leaving processes running
    }
}

// Lines 1205-1206 in restartConnection() - gap between cleanup and restart
await this.deleteConnection(serverName, connection.server.source)
// ... parsing happens here ...
await this.connectToServer(serverName, validatedConfig, connection.server.source || "global")
// ← New process starts before old one fully terminated
```

Issue 4: Concurrent Initialization Chaos

File: src/services/mcp/McpHub.ts lines 157-164

Problem: Multiple async initialization paths run simultaneously without coordination.

Code Location:

```typescript
// McpHub constructor - all run concurrently
constructor(provider: ClineProvider) {
    // ...
    this.watchMcpSettingsFile()                     // ← Async - sets up file watchers
    this.watchProjectMcpFile().catch(console.error) // ← Async - sets up file watchers
    this.setupWorkspaceFoldersWatcher()             // ← Sets up workspace watchers
    this.initializeGlobalMcpServers()               // ← Async - connects to servers
    this.initializeProjectMcpServers()              // ← Async - connects to servers
}
```

Race Condition: File watchers start firing events before server initialization completes.
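One way to remove this race is to serialize start-up behind a single promise, attaching watchers only after the initial connections exist. The sketch below is hypothetical: method names mirror the constructor above, but the bodies are stubs that just record execution order.

```typescript
// Hypothetical sketch: one awaited chain instead of five fire-and-forget calls.
class McpHubSketch {
    readonly order: string[] = []
    readonly ready: Promise<void>

    constructor() {
        // Everything hangs off a single initialization promise
        this.ready = this.initialize()
    }

    private async initialize(): Promise<void> {
        // Connect first; attach watchers last so their events cannot race setup
        await this.initializeGlobalMcpServers()
        await this.initializeProjectMcpServers()
        await this.watchMcpSettingsFile()
    }

    private async initializeGlobalMcpServers(): Promise<void> {
        this.order.push("global")
    }
    private async initializeProjectMcpServers(): Promise<void> {
        this.order.push("project")
    }
    private async watchMcpSettingsFile(): Promise<void> {
        this.order.push("watch")
    }
}
```

Callers that need a fully started hub can `await hub.ready`, which also gives the timeout in Issue 5 something deterministic to wait on.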


Issue 5: Task-Level MCP Access Competition

File: src/core/task/Task.ts lines 2129-2130

Problem: Multiple tasks simultaneously request MCP hub access, triggering singleton race conditions.

Code Location:

```typescript
// getSystemPrompt() called for every API request
mcpHub = await McpServerManager.getInstance(provider.context, provider)

// Lines 2136-2139 - timeout behavior creates cycles
await pWaitFor(() => !mcpHub!.isConnecting, { timeout: 10_000 }).catch(() => {
    console.error("MCP servers failed to connect in time")
})
```

Problem: If servers are stuck in a restart loop, `isConnecting` never clears, so every task pays the full 10-second timeout, logs the error, and retries on its next request, repeating the cycle.
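The failure mode can be shown with a minimal stand-in for `pWaitFor` (this is an illustrative polling loop, not the real `p-wait-for` implementation): when the condition can never become true, the caller always burns the entire timeout.

```typescript
// Minimal stand-in for pWaitFor: poll a condition until it holds or a deadline passes
async function waitFor(condition: () => boolean, timeoutMs: number, intervalMs = 5): Promise<void> {
    const deadline = Date.now() + timeoutMs
    while (!condition()) {
        if (Date.now() >= deadline) {
            throw new Error("timed out")
        }
        await new Promise((resolve) => setTimeout(resolve, intervalMs))
    }
}

// While a server is stuck restarting, isConnecting never becomes false,
// so every task that polls it waits the full timeout before giving up.
let isConnecting = true
```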


🔧 Proposed Solutions

Solution 1: File Watcher Debouncing with Backoff

Add exponential backoff to prevent immediate restarts:

```typescript
// Add to McpHub class
private restartDebounceTimers: Map<string, NodeJS.Timeout> = new Map()
private restartAttempts: Map<string, { count: number; lastAttempt: number }> = new Map()

private debouncedRestart(serverName: string, source: string): void {
    const key = `${serverName}-${source}`

    // Clear any pending restart so rapid changes collapse into one
    const existingTimer = this.restartDebounceTimers.get(key)
    if (existingTimer) {
        clearTimeout(existingTimer)
    }

    // Track attempts for exponential backoff
    const attempts = this.restartAttempts.get(key) || { count: 0, lastAttempt: 0 }
    const now = Date.now()

    // Reset the count if enough time has passed (1 minute)
    if (now - attempts.lastAttempt > 60_000) {
        attempts.count = 0
    }

    attempts.count++
    attempts.lastAttempt = now
    this.restartAttempts.set(key, attempts)

    // Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 30s
    const backoffDelay = Math.min(1000 * Math.pow(2, attempts.count - 1), 30_000)

    const timer = setTimeout(() => {
        this.restartDebounceTimers.delete(key)
        // Catch here - an unhandled rejection inside setTimeout would otherwise escape
        this.restartConnection(serverName, source).catch((error) =>
            console.error(`Debounced restart of ${serverName} failed:`, error),
        )
    }, backoffDelay)

    this.restartDebounceTimers.set(key, timer)
}

// Modify the file watcher to use the debounced restart
watchPathsWatcher.on("change", (changedPath) => {
    this.debouncedRestart(name, source) // ← Use debounced version
})
```
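The schedule is easiest to verify with the delay computation pulled out as a pure function (same formula as above, extracted here purely for illustration):

```typescript
// Backoff delay per attempt: 1s, 2s, 4s, 8s, 16s, then capped at 30s
function backoffDelay(attemptCount: number): number {
    return Math.min(1000 * Math.pow(2, attemptCount - 1), 30_000)
}
```

Attempt 6 would be 32 s uncapped, so the 30 s ceiling kicks in from the sixth consecutive restart onward.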

Solution 2: Thread-Safe Singleton

Fix race condition in McpServerManager:

```typescript
// New static field alongside `instance`:
// private static initializationPromise: Promise<McpHub> | null = null

static async getInstance(context: vscode.ExtensionContext, provider: ClineProvider): Promise<McpHub> {
    // If we already have an instance, register the provider and return
    if (this.instance) {
        this.providers.add(provider)
        return this.instance
    }

    // If initialization is in progress, wait for it
    if (this.initializationPromise) {
        await this.initializationPromise
        this.providers.add(provider)
        return this.instance!
    }

    // Create the initialization promise (a single synchronous assignment,
    // so no second caller can slip past the check above)
    this.initializationPromise = (async () => {
        try {
            if (!this.instance) { // Double-check under promise protection
                this.instance = new McpHub(provider)
                await context.globalState.update(this.GLOBAL_STATE_KEY, Date.now().toString())
            }
            return this.instance
        } catch (error) {
            this.instance = null // Clean up on failure
            throw error
        } finally {
            this.initializationPromise = null
        }
    })()

    const result = await this.initializationPromise
    this.providers.add(provider)
    return result
}
```
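A stripped-down model of the same pattern shows why it closes the race: concurrent callers all land on one shared initialization promise, so the constructor runs exactly once. All names below are illustrative; the real manager also tracks providers and global state.

```typescript
// Minimal promise-guarded singleton: N concurrent getInstance() calls,
// one construction.
class SingletonSketch {
    private static instance: SingletonSketch | null = null
    private static initializing: Promise<SingletonSketch> | null = null
    static creations = 0 // public counter, for demonstration only

    static async getInstance(): Promise<SingletonSketch> {
        if (this.instance) {
            return this.instance
        }
        if (!this.initializing) {
            // Assigned synchronously, so later callers join this promise
            this.initializing = (async () => {
                await new Promise((resolve) => setTimeout(resolve, 10)) // simulated async setup
                this.creations++
                this.instance = new SingletonSketch()
                return this.instance
            })()
        }
        try {
            return await this.initializing
        } finally {
            this.initializing = null
        }
    }
}
```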

Solution 3: Connection State Management

Add locking to prevent concurrent operations:

```typescript
// Add to McpHub class
private connectionStates: Map<string, "idle" | "connecting" | "disconnecting"> = new Map()
private connectionLocks: Map<string, Promise<unknown>> = new Map()

private async withConnectionLock<T>(serverName: string, source: string, operation: () => Promise<T>): Promise<T> {
    const key = `${serverName}-${source}`

    // Chain onto any in-flight operation instead of awaiting it and then
    // creating a fresh lock - that pattern lets two waiters proceed at once.
    // A failed predecessor must not block later operations, hence the catch.
    const previous = this.connectionLocks.get(key) ?? Promise.resolve()
    const lockPromise = previous.catch(() => {}).then(async () => {
        try {
            this.connectionStates.set(key, "connecting")
            return await operation()
        } finally {
            this.connectionStates.set(key, "idle")
        }
    })

    this.connectionLocks.set(key, lockPromise)
    try {
        return await lockPromise
    } finally {
        // Only release if no newer operation has been queued behind us
        if (this.connectionLocks.get(key) === lockPromise) {
            this.connectionLocks.delete(key)
        }
    }
}
```
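The chaining idea works the same outside the class. This standalone sketch (illustrative names, not the real McpHub API) demonstrates that a second operation queued on the same key only starts after the first one finishes:

```typescript
// Per-key promise chain: operations on the same key run strictly in order
const locks: Map<string, Promise<unknown>> = new Map()

async function withLock<T>(key: string, operation: () => Promise<T>): Promise<T> {
    const previous = locks.get(key) ?? Promise.resolve()
    // Swallow the predecessor's error so one failure cannot poison the queue
    const next = previous.catch(() => {}).then(operation)
    locks.set(key, next)
    try {
        return await next
    } finally {
        if (locks.get(key) === next) {
            locks.delete(key)
        }
    }
}
```

In the McpHub context, `restartConnection` and `deleteConnection` would each wrap their bodies in this lock so a restart can never overlap a teardown of the same server.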

Solution 4: Process Termination Verification

Ensure previous processes are fully terminated:

```typescript
private async ensureProcessTerminated(transport: any): Promise<void> {
    // Note: the child process is a private field of the stdio transport,
    // so this reaches into an SDK implementation detail
    const child = transport._process ?? transport.process
    if (!child || child.exitCode !== null || child.signalCode !== null) {
        return // No process, or it has already exited
    }

    child.kill("SIGTERM")

    // Wait up to 3 seconds for graceful termination, then force kill.
    // (Checking `child.killed` here would be wrong - it only records that a
    // signal was SENT, not that the process exited.)
    await new Promise<void>((resolve) => {
        const timeout = setTimeout(() => {
            child.kill("SIGKILL") // Force kill
            resolve()
        }, 3000)

        child.once("exit", () => {
            clearTimeout(timeout)
            resolve()
        })
    })
}
```

⚠️ Impact Assessment

Current Impact:

  • Resource exhaustion from multiple MCP server processes
  • Port conflicts from competing server instances
  • System instability during development workflows
  • Extension crashes from cascading failures

Priority: CRITICAL - Affects core MCP functionality and development experience

Affected Use Cases:

  • MCP server development with file watching
  • Multiple concurrent tasks using MCP servers
  • Extension stability during server restarts

🧪 Testing Recommendations

  1. File Watcher Test: Create MCP server that watches a file, modify it rapidly, verify no infinite loops
  2. Concurrent Access Test: Start multiple tasks simultaneously, verify only one MCP hub created
  3. Process Cleanup Test: Restart MCP servers, verify old processes are terminated
  4. Resource Leak Test: Run for extended period, monitor process/memory usage

📝 Implementation Checklist

  • Implement file watcher debouncing with exponential backoff
  • Fix singleton pattern thread safety in McpServerManager
  • Add connection state locking mechanism
  • Implement process termination verification
  • Add sequential initialization coordination
  • Create comprehensive test suite for MCP server lifecycle
  • Add monitoring/logging for debugging recursive issues

Labels: bug, critical, mcp, resource-leak, race-condition
