MCP Server Performance Optimization
An advanced guide to optimizing MCP server performance, covering caching strategies, connection pooling, lazy initialization, memory management, streaming, metrics, and load testing.
Overview
MCP servers sit directly in the critical path between a user's prompt and the LLM's response. Every millisecond your tool call burns is a millisecond the user spends staring at a spinner. This guide covers the full spectrum of performance optimization for production MCP servers built on Node.js — from low-level connection pooling and caching strategies to architectural patterns like lazy initialization and result streaming that keep your server fast, lean, and predictable under real workloads.
Prerequisites
- Strong working knowledge of Node.js and async I/O patterns
- Familiarity with the Model Context Protocol and how tool calls work
- Experience building at least one MCP server (stdio or HTTP transport)
- Basic understanding of database connection management (PostgreSQL examples used)
- Node.js v18+ installed locally
- The @modelcontextprotocol/sdk package (v1.x or later)
Why MCP Server Performance Matters
When an LLM decides to call one of your MCP tools, the entire conversation grinds to a halt. The user is waiting. The LLM is waiting. Nothing else happens until your tool responds.
Most LLM providers impose a latency budget of roughly 30–60 seconds for tool calls before they time out. That sounds generous until you realize a single user prompt might trigger 3–5 sequential tool calls, each waiting on the previous result. If each call takes 4 seconds, you have burned 20 seconds before the LLM even starts generating its response. Users notice.
Here is the latency breakdown for a typical tool-augmented interaction:
User prompt to LLM: ~200ms
LLM decides to call tool: ~1-3s (model thinking time)
MCP tool execution: ??? <-- THIS IS YOUR PROBLEM
LLM processes tool result: ~1-3s (model thinking time)
LLM generates response: ~2-5s (token generation)
The tool execution time is the only part you control. In my experience, the difference between a well-optimized MCP server and a naive one is often 10x — a 2-second call becomes 200ms, and the user experience goes from "sluggish" to "feels instant."
Profiling Tool Call Latency
You cannot optimize what you do not measure. Before touching any code, instrument your tool calls with end-to-end timing.
var { Server } = require("@modelcontextprotocol/sdk/server/index.js");
var { StdioServerTransport } = require("@modelcontextprotocol/sdk/server/stdio.js");
var metrics = {
calls: {},
record: function(toolName, durationMs, success) {
if (!metrics.calls[toolName]) {
metrics.calls[toolName] = {
count: 0,
totalMs: 0,
maxMs: 0,
minMs: Infinity,
errors: 0
};
}
var m = metrics.calls[toolName];
m.count++;
m.totalMs += durationMs;
if (durationMs > m.maxMs) m.maxMs = durationMs;
if (durationMs < m.minMs) m.minMs = durationMs;
if (!success) m.errors++;
},
summary: function(toolName) {
var m = metrics.calls[toolName];
if (!m) return null;
return {
count: m.count,
avgMs: Math.round(m.totalMs / m.count),
maxMs: m.maxMs,
minMs: m.minMs,
errorRate: (m.errors / m.count * 100).toFixed(1) + "%"
};
}
};
function withTiming(toolName, handler) {
return function(params) {
var start = process.hrtime.bigint();
return Promise.resolve(handler(params))
.then(function(result) {
var durationMs = Number(process.hrtime.bigint() - start) / 1e6;
metrics.record(toolName, durationMs, true);
return result;
})
.catch(function(err) {
var durationMs = Number(process.hrtime.bigint() - start) / 1e6;
metrics.record(toolName, durationMs, false);
throw err;
});
};
}
Use process.hrtime.bigint() instead of Date.now(). It gives you nanosecond precision and is not affected by system clock adjustments. This matters when you are comparing a 150ms call to a 180ms call — Date.now() can jitter by 10-15ms on its own.
Optimizing Tool Implementations
Async I/O and Non-Blocking Operations
This should be obvious, but I still see MCP servers that use synchronous file reads or child_process.execSync. Every synchronous call blocks the entire Node.js event loop, meaning no other tool calls can be processed.
// BAD: blocks the event loop
var fs = require("fs");
function handleReadFile(params) {
var content = fs.readFileSync(params.path, "utf8"); // BLOCKS
return { content: [{ type: "text", text: content }] };
}
// GOOD: non-blocking
var fsPromises = require("fs").promises;
function handleReadFile(params) {
return fsPromises.readFile(params.path, "utf8")
.then(function(content) {
return { content: [{ type: "text", text: content }] };
});
}
Connection Pooling
If your tool queries a database or calls an external API, creating a new connection per tool call is a performance killer. A TCP handshake plus TLS negotiation can add 50–150ms per call. Connection pooling eliminates this entirely after the first call.
var { Pool } = require("pg");
var pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 10, // max connections in pool
idleTimeoutMillis: 30000, // close idle connections after 30s
connectionTimeoutMillis: 5000
});
// Every tool call reuses connections from the pool
function queryDatabase(sql, params) {
return pool.query(sql, params);
}
The max setting deserves thought. For an MCP server running over stdio (single user), 3–5 connections is plenty. For an HTTP-transport server handling multiple concurrent users, 10–20 is a reasonable starting point. Never set it above 50 without understanding your database's connection limits.
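If you want one pool definition that works for both transports, a minimal sketch might key the size off an environment variable. Note that MCP_TRANSPORT and MAX_POOL_SIZE are assumed conventions of this sketch, not settings defined by the SDK or pg:
var { Pool } = require("pg");

// MCP_TRANSPORT and MAX_POOL_SIZE are hypothetical environment variables
// used only by this sketch to pick a sensible pool size per transport
var isHttpTransport = process.env.MCP_TRANSPORT === "http";

var pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  // stdio serves a single client; HTTP may serve many concurrent users
  max: Number(process.env.MAX_POOL_SIZE) || (isHttpTransport ? 15 : 5),
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 5000
});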
Avoiding Blocking Operations
Watch out for these common blockers in MCP tool implementations:
- JSON.parse on large payloads — Parsing a 10MB JSON string takes 50–100ms. Stream-parse with JSONStream instead.
- Regex on large strings — A poorly written regex can catastrophically backtrack. Always test against large inputs.
- Crypto operations — crypto.pbkdf2Sync will freeze your server. Use the async variant (see the sketch after this list).
- Large array sorts — Sorting 100K+ items takes noticeable time. Offload to the database with ORDER BY.
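To make the crypto point concrete, here is a minimal sketch of moving password hashing off the event loop with the async pbkdf2 variant. The hashSecret helper and its parameters are illustrative, not part of any tool shown elsewhere in this guide.
var crypto = require("crypto");
var util = require("util");

// The async variant runs on libuv's thread pool, so the event loop stays
// free to process other tool calls; pbkdf2Sync would stall all of them
var pbkdf2 = util.promisify(crypto.pbkdf2);

function hashSecret(secret, salt) {
  return pbkdf2(secret, salt, 100000, 64, "sha512")
    .then(function(derivedKey) {
      return derivedKey.toString("hex");
    });
}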
Caching Strategies for Tool Results
Caching is the single highest-impact optimization for most MCP servers. Many tool calls fetch data that changes infrequently — configuration values, user profiles, metadata catalogs. Serving these from memory instead of hitting the database turns a 50ms call into a sub-millisecond one.
In-Memory LRU Cache
An LRU (Least Recently Used) cache keeps the most frequently accessed items in memory and evicts the oldest when the cache is full.
function createLRUCache(maxSize, ttlMs) {
var cache = new Map();
function get(key) {
var entry = cache.get(key);
if (!entry) return undefined;
if (Date.now() - entry.timestamp > ttlMs) {
cache.delete(key);
return undefined;
}
// Move to end (most recently used)
cache.delete(key);
cache.set(key, entry);
return entry.value;
}
function set(key, value) {
cache.delete(key); // Remove if exists (reinsert at end)
if (cache.size >= maxSize) {
// Delete oldest entry (first key in Map)
var firstKey = cache.keys().next().value;
cache.delete(firstKey);
}
cache.set(key, { value: value, timestamp: Date.now() });
}
function invalidate(key) {
cache.delete(key);
}
function clear() {
cache.clear();
}
function stats() {
return { size: cache.size, maxSize: maxSize, ttlMs: ttlMs };
}
return { get: get, set: set, invalidate: invalidate, clear: clear, stats: stats };
}
// Usage: cache up to 500 items, expire after 5 minutes
var userCache = createLRUCache(500, 5 * 60 * 1000);
TTL-Based Invalidation
Choose TTLs based on how stale the data can be:
| Data Type | Suggested TTL | Reasoning |
|---|---|---|
| Static config | 30–60 minutes | Rarely changes, low staleness cost |
| User profiles | 2–5 minutes | Changes occasionally, moderate cost |
| Search results | 30–60 seconds | Freshness matters, high churn |
| Real-time data | No cache | Staleness is unacceptable |
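Mapped onto the createLRUCache factory above, that table becomes a handful of separate cache instances with different TTLs. The cache names below are illustrative:
// One cache per data class, sized and expired according to the table above
var configCache = createLRUCache(50, 45 * 60 * 1000);   // static config: ~45 minutes
var profileCache = createLRUCache(500, 3 * 60 * 1000);  // user profiles: ~3 minutes
var searchCache = createLRUCache(200, 45 * 1000);       // search results: ~45 seconds
// Real-time data gets no cache at all; always hit the source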
Cache-Aside Pattern for Tool Calls
function getUserTool(params) {
var cacheKey = "user:" + params.userId;
var cached = userCache.get(cacheKey);
if (cached) {
return Promise.resolve({
content: [{ type: "text", text: JSON.stringify(cached) }]
});
}
return pool.query("SELECT * FROM users WHERE id = $1", [params.userId])
.then(function(result) {
var user = result.rows[0];
if (user) {
userCache.set(cacheKey, user);
}
return {
content: [{ type: "text", text: JSON.stringify(user || "User not found") }]
};
});
}
Database Query Optimization in Tools
Prepared Statements
PostgreSQL prepared statements parse and plan the query once, then execute the plan repeatedly. For tool calls that run the same query shape thousands of times, this saves 1–5ms per call.
// pg module automatically prepares named queries
function getArticlesByCategory(category) {
return pool.query({
name: "get-articles-by-category",
text: "SELECT id, title, slug, published_at FROM articles WHERE category = $1 ORDER BY published_at DESC LIMIT 20",
values: [category]
});
}
The name field is the key. The first time this runs, pg sends a Parse message to PostgreSQL. Every subsequent execution skips parsing entirely. On a busy server, this adds up fast.
Index-Aware Query Design
Your MCP tools should only run queries that hit indexes. Use EXPLAIN ANALYZE to verify:
EXPLAIN ANALYZE SELECT * FROM articles WHERE category = 'ai-integration' ORDER BY published_at DESC LIMIT 20;
-- Good output (Index Scan):
-- Index Scan Backward using idx_articles_category_date on articles (cost=0.28..12.45 rows=20 width=256) (actual time=0.032..0.089 rows=20 loops=1)
-- Bad output (Seq Scan):
-- Seq Scan on articles (cost=0.00..1250.00 rows=20 width=256) (actual time=0.015..45.234 rows=20 loops=1)
If your tool queries are doing sequential scans, add the right indexes before caching. Caching masks bad queries — it does not fix them.
HTTP Request Optimization in Tools
If your MCP tools call external APIs, HTTP connection management matters enormously.
Keep-Alive Connections
var http = require("http");
var https = require("https");
var keepAliveAgent = new https.Agent({
keepAlive: true,
maxSockets: 10,
maxFreeSockets: 5,
timeout: 60000
});
function callExternalApi(path) {
return new Promise(function(resolve, reject) {
var options = {
hostname: "api.example.com",
path: path,
agent: keepAliveAgent,
headers: { "Connection": "keep-alive" }
};
var req = https.get(options, function(res) {
var data = "";
res.on("data", function(chunk) { data += chunk; });
res.on("end", function() { resolve(JSON.parse(data)); });
});
req.on("error", reject);
});
}
Without keep-alive, every HTTP request performs a full TCP + TLS handshake. With keep-alive, subsequent requests to the same host reuse the existing connection. I have measured this saving 80–120ms per request in production.
Request Deduplication
When multiple tool calls request the same data simultaneously (common in parallel tool execution), deduplication ensures only one actual request is made:
var inflightRequests = new Map();
function deduplicatedFetch(url) {
if (inflightRequests.has(url)) {
return inflightRequests.get(url);
}
var promise = callExternalApi(url)
.then(function(result) {
inflightRequests.delete(url);
return result;
})
.catch(function(err) {
inflightRequests.delete(url);
throw err;
});
inflightRequests.set(url, promise);
return promise;
}
Lazy Initialization Patterns
Do not pay startup costs until you need to. An MCP server might expose 15 tools, but a given session might only use 3 of them. Lazy initialization defers expensive setup — database connections, API client authentication, file index construction — until the first call to a tool that needs it.
var dbPool = null;
var redisClient = null;
function getDbPool() {
if (!dbPool) {
var { Pool } = require("pg");
dbPool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 5
});
}
return dbPool;
}
function getRedisClient() {
  if (!redisClient) {
    var redis = require("redis");
    redisClient = redis.createClient({ url: process.env.REDIS_URL });
    // connect() is async; log failures so they don't become unhandled rejections
    redisClient.connect().catch(function(err) {
      process.stderr.write("[REDIS] Connection failed: " + err.message + "\n");
    });
  }
  return redisClient;
}
// Tools that need the database call getDbPool()
// Tools that need Redis call getRedisClient()
// Neither is initialized until actually needed
This pattern is especially valuable for stdio-based MCP servers, where startup time directly impacts how long the user waits after launching the server.
Batching Multiple Operations
When an LLM needs to perform several related lookups, having it call one tool five times is slower than calling one tool once with batched parameters. Design your tools to accept arrays where it makes sense.
// Instead of: get_user called 5 times sequentially
// Offer: get_users with an array of IDs
function handleGetUsers(params) {
var ids = params.userIds; // e.g., [101, 102, 103, 104, 105]
var placeholders = ids.map(function(_, i) { return "$" + (i + 1); }).join(", ");
var sql = "SELECT * FROM users WHERE id IN (" + placeholders + ")";
return getDbPool().query(sql, ids)
.then(function(result) {
return {
content: [{ type: "text", text: JSON.stringify(result.rows) }]
};
});
}
A single batched query might take 15ms where five individual queries at 12ms each add up to 60ms: roughly a 4x improvement, with dramatically lower database load. Tell the LLM about this capability in your tool description. Something like: "Accepts a single ID or an array of IDs. Prefer passing multiple IDs when looking up several users."
Resource Pre-Fetching
If your tool calls follow predictable patterns, pre-fetch data before the LLM asks for it. For example, if the LLM typically calls get_project followed by list_project_files, you can eagerly cache the file list when the project is first fetched.
var fileListCache = createLRUCache(100, 60000);
function handleGetProject(params) {
return getDbPool().query("SELECT * FROM projects WHERE id = $1", [params.projectId])
.then(function(result) {
var project = result.rows[0];
// Pre-fetch file list in background (don't await it)
getDbPool().query("SELECT name, path, size FROM files WHERE project_id = $1 ORDER BY path", [params.projectId])
.then(function(fileResult) {
fileListCache.set("files:" + params.projectId, fileResult.rows);
})
.catch(function() { /* ignore pre-fetch failures */ });
return {
content: [{ type: "text", text: JSON.stringify(project) }]
};
});
}
function handleListProjectFiles(params) {
var cached = fileListCache.get("files:" + params.projectId);
if (cached) {
return Promise.resolve({
content: [{ type: "text", text: JSON.stringify(cached) }]
});
}
return getDbPool().query("SELECT name, path, size FROM files WHERE project_id = $1 ORDER BY path", [params.projectId])
.then(function(result) {
fileListCache.set("files:" + params.projectId, result.rows);
return {
content: [{ type: "text", text: JSON.stringify(result.rows) }]
};
});
}
Memory Management
MCP servers running over stdio are long-lived processes. They start when the LLM client launches and may run for hours. Memory leaks that seem trivial in short-lived request handlers become serious over time.
Common Leak Sources
- Unbounded caches — Always use an LRU cache with a max size. A plain object used as a cache grows forever.
- Event listener accumulation — Registering listeners in tool handlers without removing them.
- Closures capturing large objects — A closure that captures a 10MB response body keeps it alive indefinitely.
- Unresolved promises — Promises that never resolve hold their entire closure chain in memory.
Monitoring Memory in Production
function getMemoryStats() {
var usage = process.memoryUsage();
return {
heapUsedMB: Math.round(usage.heapUsed / 1024 / 1024),
heapTotalMB: Math.round(usage.heapTotal / 1024 / 1024),
rssMB: Math.round(usage.rss / 1024 / 1024),
externalMB: Math.round(usage.external / 1024 / 1024)
};
}
// Log memory every 60 seconds
setInterval(function() {
var stats = getMemoryStats();
if (stats.heapUsedMB > 200) {
process.stderr.write("[WARN] High memory usage: " + JSON.stringify(stats) + "\n");
}
}, 60000);
If heapUsedMB grows monotonically over time without plateauing, you have a leak. Use --inspect with Chrome DevTools to take heap snapshots and compare them.
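If attaching DevTools to a long-running production process is awkward, Node's built-in v8 module can write heap snapshots programmatically. The 300MB threshold and the once-only guard below are arbitrary choices for this sketch:
var v8 = require("v8");

var snapshotWritten = false;

// Write a .heapsnapshot file (loadable in Chrome DevTools) the first time
// the heap crosses the threshold, so it can be diffed against a later one
setInterval(function() {
  var heapUsedMB = process.memoryUsage().heapUsed / 1024 / 1024;
  if (heapUsedMB > 300 && !snapshotWritten) {
    snapshotWritten = true;
    var file = v8.writeHeapSnapshot(); // returns the generated filename
    process.stderr.write("[HEAP] Snapshot written to " + file + "\n");
  }
}, 60000);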
Streaming Large Results
When a tool returns large amounts of data (file contents, query results with thousands of rows, log outputs), buffering the entire result into a single string wastes memory and delays the response. The MCP protocol supports returning multiple content blocks, and you should chunk large results.
function handleLargeQuery(params) {
var client;
return getDbPool().connect()
.then(function(c) {
client = c;
var QueryStream = require("pg-query-stream");
var query = new QueryStream("SELECT * FROM events WHERE created_at > $1", [params.since]);
var stream = client.query(query);
var rows = [];
var chunks = [];
var chunkSize = 100;
return new Promise(function(resolve, reject) {
stream.on("data", function(row) {
rows.push(row);
if (rows.length >= chunkSize) {
chunks.push({ type: "text", text: JSON.stringify(rows) });
rows = [];
}
});
stream.on("end", function() {
if (rows.length > 0) {
chunks.push({ type: "text", text: JSON.stringify(rows) });
}
client.release();
resolve({ content: chunks });
});
stream.on("error", function(err) {
client.release();
reject(err);
});
});
});
}
For truly massive results, consider returning a summary with a cursor or pagination token instead of dumping everything. The LLM's context window is finite — sending 50,000 rows is rarely useful anyway.
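As a sketch of that approach, the handler below returns one page of rows plus an opaque cursor the LLM can pass back to fetch the next page. The page size, the base64 offset cursor, and the handleListEvents name are illustrative choices, not part of the streaming example above:
var PAGE_SIZE = 200;

function handleListEvents(params) {
  // The cursor is just a base64-encoded offset in this sketch; a keyset
  // cursor (e.g. "created_at > last seen value") scales better on big tables
  var offset = params.cursor
    ? parseInt(Buffer.from(params.cursor, "base64").toString("utf8"), 10)
    : 0;
  return getDbPool().query(
    "SELECT * FROM events WHERE created_at > $1 ORDER BY created_at LIMIT $2 OFFSET $3",
    [params.since, PAGE_SIZE + 1, offset]
  ).then(function(result) {
    var hasMore = result.rows.length > PAGE_SIZE;
    var rows = hasMore ? result.rows.slice(0, PAGE_SIZE) : result.rows;
    var nextCursor = hasMore
      ? Buffer.from(String(offset + PAGE_SIZE)).toString("base64")
      : null;
    return {
      content: [{
        type: "text",
        text: JSON.stringify({ rows: rows, nextCursor: nextCursor })
      }]
    };
  });
}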
Transport Optimization
Stdio Buffer Sizes
For stdio-based MCP servers, the default buffer sizes work for most cases, but if your tools return large results, you may need to handle backpressure:
// Check if stdout is being overwhelmed
process.stdout.on("drain", function() {
// Resume writing after backpressure is relieved
});
// For high-throughput stdio servers, increase the highWaterMark
var { StdioServerTransport } = require("@modelcontextprotocol/sdk/server/stdio.js");
var transport = new StdioServerTransport();
HTTP Transport Connection Reuse
For HTTP-based MCP servers (SSE transport), ensure the HTTP server is configured for persistent connections:
var http = require("http");
var { SSEServerTransport } = require("@modelcontextprotocol/sdk/server/sse.js");
var httpServer = http.createServer();
httpServer.keepAliveTimeout = 65000; // slightly above typical LB timeout of 60s
httpServer.headersTimeout = 66000; // must be greater than keepAliveTimeout
httpServer.listen(3001, function() {
process.stderr.write("MCP server listening on port 3001\n");
});
Monitoring and Metrics
Production MCP servers need observability. Here is a practical approach that adds a server_metrics tool to your MCP server — the LLM (or a human operator) can call it to get a health report.
var startTime = Date.now();
function handleServerMetrics() {
var uptimeSeconds = Math.round((Date.now() - startTime) / 1000);
var memory = getMemoryStats();
var toolStats = {};
Object.keys(metrics.calls).forEach(function(name) {
toolStats[name] = metrics.summary(name);
});
var report = {
uptime: uptimeSeconds + "s",
memory: memory,
tools: toolStats,
cache: {
users: userCache.stats(),
files: fileListCache.stats()
},
connections: {
dbPool: {
total: getDbPool().totalCount,
idle: getDbPool().idleCount,
waiting: getDbPool().waitingCount
}
}
};
return Promise.resolve({
content: [{ type: "text", text: JSON.stringify(report, null, 2) }]
});
}
Sample output from a production server after a few hours:
{
"uptime": "7243s",
"memory": {
"heapUsedMB": 45,
"heapTotalMB": 68,
"rssMB": 92,
"externalMB": 2
},
"tools": {
"get_user": {
"count": 342,
"avgMs": 12,
"maxMs": 187,
"minMs": 1,
"errorRate": "0.3%"
},
"search_articles": {
"count": 89,
"avgMs": 45,
"maxMs": 320,
"minMs": 8,
"errorRate": "1.1%"
}
},
"cache": {
"users": { "size": 127, "maxSize": 500, "ttlMs": 300000 },
"files": { "size": 34, "maxSize": 100, "ttlMs": 60000 }
},
"connections": {
"dbPool": { "total": 5, "idle": 3, "waiting": 0 }
}
}
When waiting is consistently above 0, increase your pool's max setting. When avgMs creeps up over time, investigate cache hit rates and database query plans.
Load Testing MCP Servers
You cannot load test MCP servers with traditional HTTP benchmarking tools like wrk or ab because the MCP protocol uses JSON-RPC over stdio or SSE. Build a simple load test client instead:
var { Client } = require("@modelcontextprotocol/sdk/client/index.js");
var { StdioClientTransport } = require("@modelcontextprotocol/sdk/client/stdio.js");
var { spawn } = require("child_process");
function runLoadTest(toolName, params, iterations) {
var transport = new StdioClientTransport({
command: "node",
args: ["./server.js"]
});
var client = new Client({ name: "load-test", version: "1.0.0" });
return client.connect(transport)
.then(function() {
var results = [];
var chain = Promise.resolve();
for (var i = 0; i < iterations; i++) {
(function(idx) {
chain = chain.then(function() {
var start = process.hrtime.bigint();
return client.callTool({ name: toolName, arguments: params })
.then(function() {
var duration = Number(process.hrtime.bigint() - start) / 1e6;
results.push({ iteration: idx, durationMs: duration, success: true });
})
.catch(function() {
var duration = Number(process.hrtime.bigint() - start) / 1e6;
results.push({ iteration: idx, durationMs: duration, success: false });
});
});
})(i);
}
return chain.then(function() {
var durations = results.map(function(r) { return r.durationMs; });
durations.sort(function(a, b) { return a - b; });
console.log("Results for " + toolName + " (" + iterations + " iterations):");
console.log(" Avg: " + Math.round(durations.reduce(function(a, b) { return a + b; }, 0) / durations.length) + "ms");
console.log(" P50: " + Math.round(durations[Math.floor(durations.length * 0.5)]) + "ms");
console.log(" P95: " + Math.round(durations[Math.floor(durations.length * 0.95)]) + "ms");
console.log(" P99: " + Math.round(durations[Math.floor(durations.length * 0.99)]) + "ms");
console.log(" Errors: " + results.filter(function(r) { return !r.success; }).length);
return client.close();
});
});
}
// Run it
runLoadTest("get_user", { userId: "user-123" }, 1000);
Target these benchmarks for a well-optimized server:
| Metric | Target | Acceptable | Needs Work |
|---|---|---|---|
| P50 | < 20ms | < 100ms | > 100ms |
| P95 | < 100ms | < 500ms | > 500ms |
| P99 | < 500ms | < 2000ms | > 2000ms |
| Error rate | < 0.1% | < 1% | > 1% |
Scaling Strategies
Process Pooling for Stdio Servers
For stdio-based MCP servers that serve multiple LLM clients, you can use Node.js cluster mode to run multiple worker processes:
var cluster = require("cluster");
var os = require("os");
if (cluster.isMaster) {
var numWorkers = Math.min(os.cpus().length, 4); // cap at 4
process.stderr.write("Starting " + numWorkers + " MCP worker processes\n");
for (var i = 0; i < numWorkers; i++) {
cluster.fork();
}
cluster.on("exit", function(worker) {
process.stderr.write("Worker " + worker.process.pid + " died, restarting...\n");
cluster.fork();
});
} else {
// Each worker runs a full MCP server
require("./server.js");
}
Horizontal Scaling for HTTP Transport
For HTTP-based MCP servers, horizontal scaling is straightforward. Put multiple instances behind a load balancer. The key requirement is sticky sessions — the SSE connection must stay on the same server for the duration of a conversation.
upstream mcp_servers {
ip_hash; # sticky sessions by client IP
server 127.0.0.1:3001;
server 127.0.0.1:3002;
server 127.0.0.1:3003;
}
server {
listen 443 ssl;
location /mcp {
proxy_pass http://mcp_servers;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_read_timeout 300s; # long timeout for SSE
}
}
Complete Working Example
Here is a full MCP server with built-in performance instrumentation: LRU caching, connection pooling, lazy initialization, call duration metrics, and a diagnostic tool.
var { Server } = require("@modelcontextprotocol/sdk/server/index.js");
var { StdioServerTransport } = require("@modelcontextprotocol/sdk/server/stdio.js");
var {
CallToolRequestSchema,
ListToolsRequestSchema
} = require("@modelcontextprotocol/sdk/types.js");
// ---- Metrics Collector ----
var serverStartTime = Date.now();
var metricsStore = {
calls: {},
record: function(toolName, durationMs, success) {
if (!metricsStore.calls[toolName]) {
metricsStore.calls[toolName] = {
count: 0, totalMs: 0, maxMs: 0, minMs: Infinity, errors: 0
};
}
var m = metricsStore.calls[toolName];
m.count++;
m.totalMs += durationMs;
if (durationMs > m.maxMs) m.maxMs = durationMs;
if (durationMs < m.minMs) m.minMs = durationMs;
if (!success) m.errors++;
}
};
// ---- LRU Cache ----
function createLRUCache(maxSize, ttlMs) {
var cache = new Map();
var hits = 0;
var misses = 0;
return {
get: function(key) {
var entry = cache.get(key);
if (!entry || Date.now() - entry.ts > ttlMs) {
if (entry) cache.delete(key);
misses++;
return undefined;
}
cache.delete(key);
cache.set(key, entry);
hits++;
return entry.value;
},
set: function(key, value) {
cache.delete(key);
if (cache.size >= maxSize) {
cache.delete(cache.keys().next().value);
}
cache.set(key, { value: value, ts: Date.now() });
},
stats: function() {
return {
size: cache.size, maxSize: maxSize, ttlMs: ttlMs,
hits: hits, misses: misses,
hitRate: hits + misses > 0 ? ((hits / (hits + misses)) * 100).toFixed(1) + "%" : "N/A"
};
},
clear: function() { cache.clear(); hits = 0; misses = 0; }
};
}
var articleCache = createLRUCache(200, 5 * 60 * 1000);
var userCache = createLRUCache(500, 2 * 60 * 1000);
// ---- Lazy Database Pool ----
var dbPool = null;
function getPool() {
if (!dbPool) {
var { Pool } = require("pg");
dbPool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 5,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000
});
dbPool.on("error", function(err) {
process.stderr.write("[DB POOL ERROR] " + err.message + "\n");
});
process.stderr.write("[INIT] Database pool created (lazy)\n");
}
return dbPool;
}
// ---- Tool Handlers ----
function handleGetUser(params) {
var cacheKey = "user:" + params.userId;
var cached = userCache.get(cacheKey);
if (cached) {
return Promise.resolve({
content: [{ type: "text", text: JSON.stringify(cached) }]
});
}
return getPool().query({
name: "get-user-by-id",
text: "SELECT id, name, email, role, created_at FROM users WHERE id = $1",
values: [params.userId]
}).then(function(result) {
var user = result.rows[0] || null;
if (user) userCache.set(cacheKey, user);
return {
content: [{ type: "text", text: JSON.stringify(user || "User not found") }]
};
});
}
function handleSearchArticles(params) {
var cacheKey = "search:" + params.query + ":" + (params.limit || 10);
var cached = articleCache.get(cacheKey);
if (cached) {
return Promise.resolve({
content: [{ type: "text", text: JSON.stringify(cached) }]
});
}
var limit = Math.min(params.limit || 10, 50);
return getPool().query({
name: "search-articles",
text: "SELECT id, title, category, published_at FROM articles WHERE to_tsvector('english', title || ' ' || body) @@ plainto_tsquery('english', $1) ORDER BY published_at DESC LIMIT $2",
values: [params.query, limit]
}).then(function(result) {
articleCache.set(cacheKey, result.rows);
return {
content: [{ type: "text", text: JSON.stringify(result.rows) }]
};
});
}
function handleGetUsers(params) {
var ids = params.userIds;
if (!Array.isArray(ids) || ids.length === 0) {
return Promise.resolve({
content: [{ type: "text", text: "Error: userIds must be a non-empty array" }]
});
}
// Check cache first for each ID
var cached = [];
var uncachedIds = [];
ids.forEach(function(id) {
var hit = userCache.get("user:" + id);
if (hit) {
cached.push(hit);
} else {
uncachedIds.push(id);
}
});
if (uncachedIds.length === 0) {
return Promise.resolve({
content: [{ type: "text", text: JSON.stringify(cached) }]
});
}
var placeholders = uncachedIds.map(function(_, i) { return "$" + (i + 1); }).join(", ");
var sql = "SELECT id, name, email, role, created_at FROM users WHERE id IN (" + placeholders + ")";
return getPool().query(sql, uncachedIds).then(function(result) {
result.rows.forEach(function(user) {
userCache.set("user:" + user.id, user);
});
var allUsers = cached.concat(result.rows);
return {
content: [{ type: "text", text: JSON.stringify(allUsers) }]
};
});
}
function handleServerDiagnostics() {
var uptimeSeconds = Math.round((Date.now() - serverStartTime) / 1000);
var memory = process.memoryUsage();
var toolStats = {};
Object.keys(metricsStore.calls).forEach(function(name) {
var m = metricsStore.calls[name];
toolStats[name] = {
count: m.count,
avgMs: Math.round(m.totalMs / m.count),
maxMs: Math.round(m.maxMs),
minMs: m.minMs === Infinity ? 0 : Math.round(m.minMs),
errorRate: (m.errors / m.count * 100).toFixed(1) + "%"
};
});
var poolStats = null;
if (dbPool) {
poolStats = {
totalConnections: dbPool.totalCount,
idleConnections: dbPool.idleCount,
waitingClients: dbPool.waitingCount
};
}
var report = {
server: {
uptime: uptimeSeconds + "s",
uptimeHuman: Math.floor(uptimeSeconds / 3600) + "h " + Math.floor((uptimeSeconds % 3600) / 60) + "m",
nodeVersion: process.version
},
memory: {
heapUsedMB: Math.round(memory.heapUsed / 1024 / 1024),
heapTotalMB: Math.round(memory.heapTotal / 1024 / 1024),
rssMB: Math.round(memory.rss / 1024 / 1024)
},
tools: toolStats,
caches: {
articles: articleCache.stats(),
users: userCache.stats()
},
database: poolStats || "Not initialized (lazy)"
};
return Promise.resolve({
content: [{ type: "text", text: JSON.stringify(report, null, 2) }]
});
}
// ---- Timed Wrapper ----
function timed(toolName, handler) {
return function(params) {
var start = process.hrtime.bigint();
return Promise.resolve(handler(params))
.then(function(result) {
var ms = Number(process.hrtime.bigint() - start) / 1e6;
metricsStore.record(toolName, ms, true);
process.stderr.write("[TOOL] " + toolName + " completed in " + ms.toFixed(1) + "ms\n");
return result;
})
.catch(function(err) {
var ms = Number(process.hrtime.bigint() - start) / 1e6;
metricsStore.record(toolName, ms, false);
process.stderr.write("[TOOL] " + toolName + " FAILED in " + ms.toFixed(1) + "ms: " + err.message + "\n");
return {
content: [{ type: "text", text: "Error: " + err.message }],
isError: true
};
});
};
}
// ---- Server Setup ----
var server = new Server(
{ name: "optimized-server", version: "1.0.0" },
{ capabilities: { tools: {} } }
);
var tools = [
{
name: "get_user",
description: "Get a user by ID. Results are cached for 2 minutes.",
inputSchema: {
type: "object",
properties: { userId: { type: "string", description: "The user ID" } },
required: ["userId"]
}
},
{
name: "get_users",
description: "Get multiple users by ID in a single call. More efficient than calling get_user repeatedly. Results are cached individually.",
inputSchema: {
type: "object",
properties: {
userIds: {
type: "array",
items: { type: "string" },
description: "Array of user IDs to fetch"
}
},
required: ["userIds"]
}
},
{
name: "search_articles",
description: "Full-text search across articles. Results are cached for 5 minutes.",
inputSchema: {
type: "object",
properties: {
query: { type: "string", description: "Search query" },
limit: { type: "number", description: "Max results (default 10, max 50)" }
},
required: ["query"]
}
},
{
name: "server_diagnostics",
description: "Returns server health metrics including uptime, memory usage, tool call statistics, cache hit rates, and database connection pool status.",
inputSchema: { type: "object", properties: {} }
}
];
var handlers = {
get_user: timed("get_user", handleGetUser),
get_users: timed("get_users", handleGetUsers),
search_articles: timed("search_articles", handleSearchArticles),
server_diagnostics: timed("server_diagnostics", handleServerDiagnostics)
};
server.setRequestHandler(ListToolsRequestSchema, function() {
return { tools: tools };
});
server.setRequestHandler(CallToolRequestSchema, function(request) {
var toolName = request.params.name;
var handler = handlers[toolName];
if (!handler) {
return Promise.resolve({
content: [{ type: "text", text: "Unknown tool: " + toolName }],
isError: true
});
}
return handler(request.params.arguments || {});
});
// ---- Start ----
var transport = new StdioServerTransport();
server.connect(transport).then(function() {
process.stderr.write("[STARTUP] Optimized MCP server running (PID: " + process.pid + ")\n");
});
// Graceful shutdown
process.on("SIGINT", function() {
process.stderr.write("[SHUTDOWN] Cleaning up...\n");
if (dbPool) {
dbPool.end().then(function() {
process.exit(0);
});
} else {
process.exit(0);
}
});
Save this as server.js and run it with:
node server.js
Or configure it in your MCP client settings:
{
"mcpServers": {
"optimized-server": {
"command": "node",
"args": ["/path/to/server.js"],
"env": {
"DATABASE_URL": "postgresql://user:pass@localhost:5432/mydb"
}
}
}
}
Common Issues & Troubleshooting
1. Connection Pool Exhaustion
Symptom: Tool calls start timing out after running fine for a while. You see errors like:
Error: Timeout acquiring connection from pool after 5000ms
Cause: Queries that throw errors without releasing the client, or unreturned connections from pool.connect().
Fix: Always release the client in a .finally() or use pool.query() (which manages the client lifecycle automatically) instead of pool.connect(). Check your pool's waitingCount — if it is consistently above 0, increase max or investigate slow queries holding connections.
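When you do need a dedicated client (for a transaction, say), a sketch of the release-in-finally pattern looks like this; runInTransaction and its work callback are illustrative names, and getPool() refers to the lazy pool helper from the complete example above:
function runInTransaction(work) {
  var client;
  return getPool().connect()
    .then(function(c) {
      client = c;
      return client.query("BEGIN")
        .then(function() { return work(client); })
        .then(function(result) {
          return client.query("COMMIT").then(function() { return result; });
        })
        .catch(function(err) {
          // Roll back, then re-throw the original error either way
          return client.query("ROLLBACK").then(
            function() { throw err; },
            function() { throw err; }
          );
        });
    })
    .finally(function() {
      // The client always goes back to the pool, even on error
      if (client) client.release();
    });
}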
2. Memory Leak from Unbounded Caches
Symptom: process.memoryUsage().heapUsed grows monotonically, eventually hitting 1.5GB+ and causing garbage collection pauses over 500ms.
[WARN] High memory usage: {"heapUsedMB":1247,"heapTotalMB":1380,"rssMB":1520}
Cause: Using a plain {} object or Map as a cache without any eviction policy.
Fix: Replace with an LRU cache that has a fixed maximum size. Set maxSize based on the size of each cached item — 500 entries of 1KB each is 500KB, but 500 entries of 100KB each is 50MB. Profile your actual cache entry sizes with Buffer.byteLength(JSON.stringify(entry)).
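A quick way to check those sizes at runtime, using the Buffer.byteLength approach just mentioned (the helper name is illustrative):
// Rough estimate of how much heap a cache would hold at capacity, based on
// the serialized size of one representative entry
function estimateCacheFootprintMB(sampleEntry, maxSize) {
  var bytesPerEntry = Buffer.byteLength(JSON.stringify(sampleEntry), "utf8");
  return ((bytesPerEntry * maxSize) / 1024 / 1024).toFixed(2);
}

// e.g. estimateCacheFootprintMB(someUserRow, 500) returns a string like "0.48"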
3. Stdio Buffer Overflow
Symptom: Large tool responses get truncated or the MCP client disconnects with:
Error: Unexpected end of JSON input
Cause: The tool returned a result larger than the stdio buffer can handle in a single write. Node.js streams split large writes into chunks, and if the receiving side does not handle partial JSON correctly, it breaks.
Fix: Limit tool response sizes. If a query could return thousands of rows, paginate it. A good rule of thumb: keep individual tool responses under 100KB. For larger data, return a summary and offer a pagination token for follow-up calls.
4. Slow First Call (Cold Start)
Symptom: The first tool call takes 2–5 seconds, but subsequent calls are fast (< 50ms).
[TOOL] get_user completed in 3847.2ms <-- first call
[TOOL] get_user completed in 12.4ms <-- second call
Cause: Lazy initialization of database connections. The first call pays the cost of establishing TCP connections, TLS handshake, and potentially PostgreSQL authentication.
Fix: This is usually acceptable — it is the whole point of lazy initialization. If the cold start is too slow, you can add a "warmup" phase that pre-connects during server startup:
// Optional warmup: connect eagerly instead of lazily
getPool().query("SELECT 1").then(function() {
process.stderr.write("[WARMUP] Database connection pool primed\n");
});
5. High P99 Latency from Garbage Collection
Symptom: Most calls are fast, but occasional calls spike to 200–500ms for no apparent reason. You see GC pauses in --trace-gc output:
[GC] Scavenge 45.2 -> 38.1 MB, 12.3 ms
[GC] Mark-sweep 120.4 -> 85.2 MB, 234.5 ms <-- PROBLEM
Cause: Large heap size triggers full mark-sweep GC, pausing execution.
Fix: Reduce memory usage (smaller caches, avoid large string concatenations), or tune V8's GC with --max-old-space-size=256 to trigger GC earlier and more frequently with shorter pauses.
Best Practices
- Measure before you optimize. Instrument every tool call with timing data from day one. Without numbers, you are guessing. The process.hrtime.bigint() approach shown above costs less than 1 microsecond — there is no reason not to include it.
- Cache at the right granularity. Cache individual records, not entire query results. Caching "all users" means the entire cache entry goes stale when any user changes. Caching individual users means only the changed record needs invalidation.
- Use prepared statements for repeated queries. The pg module's named queries are free performance. The only cost is holding the prepared statement in PostgreSQL's session memory, which is negligible.
- Design tools for batching. If the LLM will predictably need multiple records, expose a batch variant of the tool. Document it clearly so the LLM knows to prefer get_users(ids: [1,2,3]) over three sequential get_user calls.
- Set response size limits. Never return unbounded query results from a tool. Always include a LIMIT clause, and cap it at a sensible maximum (50–100 records). Returning 10,000 rows wastes the LLM's context window and slows everything down.
- Log to stderr, not stdout. In stdio-based MCP servers, stdout is the protocol channel. Any diagnostic logging must go to stderr. Accidentally writing a console.log() to stdout will corrupt the JSON-RPC stream and crash the connection.
- Implement graceful shutdown. Close database pools and clean up resources when the process receives SIGINT or SIGTERM. Abrupt termination can leave database connections hanging in PostgreSQL's pg_stat_activity for minutes.
- Test with realistic data volumes. A tool that searches 100 records in development will behave differently when searching 1 million records in production. Your load tests should use production-scale data or a realistic subset.
- Monitor connection pool utilization. Expose pool stats (totalCount, idleCount, waitingCount) in your diagnostics tool. A pool that is consistently maxed out is a bottleneck. A pool with too many idle connections is wasting database resources.
- Version your cache keys. When you change a query's schema or add new fields, old cached results will have stale shapes. Prefix cache keys with a version number (e.g., v2:user:123) so new code does not serve old-format data. A tiny helper, sketched after this list, keeps the version prefix in one place.
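Here is that helper as a minimal sketch; the CACHE_VERSION constant and cacheKey() function are illustrative conventions, not part of the SDK or the complete example above:
// Bump CACHE_VERSION whenever the shape of cached records changes;
// entries written under the old prefix are never looked up again and
// simply age out via their TTL
var CACHE_VERSION = "v2";

function cacheKey(namespace, id) {
  return CACHE_VERSION + ":" + namespace + ":" + id;
}

// cacheKey("user", 123) -> "v2:user:123"
var user = userCache.get(cacheKey("user", 123));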
References
- Model Context Protocol Specification — Official MCP spec covering transports, tool schemas, and message formats
- MCP TypeScript SDK — The official SDK used in all examples in this article
- node-postgres Documentation — Connection pooling, prepared statements, and query streaming
- Node.js Performance Best Practices — Official guide to profiling and optimizing Node.js applications
- V8 Garbage Collection Deep Dive — Understanding GC pauses and how to mitigate them
- pg-query-stream — Stream large result sets from PostgreSQL without buffering
