Building Production-Ready MCP Servers That Actually Scale: Our HTTP + Webhook Approach
How we reimagined the Model Context Protocol for enterprise deployments
The Problem with Standard MCP
The Model Context Protocol (MCP) is transforming how AI agents interact with external systems. It provides a standardized way for LLMs to access tools, read resources, and receive real-time updates. But there's a catch.
The standard MCP transport relies on Server-Sent Events (SSE) or stdio for communication. While these work great for desktop applications like Claude Desktop or single-user development tools, they create significant challenges at scale:
- Persistent connections don't scale - Each client maintains a long-lived SSE connection, consuming server resources
- No horizontal scaling - Connection state is tied to a specific server instance
- Load balancer headaches - Sticky sessions become mandatory, defeating the purpose of load balancing
- Third-party integration gap - Services like GitHub and Slack deliver events via webhooks (and MongoDB via change streams), not SSE
We needed something different. We needed MCP servers that could:
- Scale horizontally with zero connection state
- Integrate natively with third-party webhook providers
- Deploy to Kubernetes with trivial multi-replica configurations
- Maintain full MCP protocol compatibility
So we built mcp-http-webhook - and published it to npm.
Our Approach: HTTP + Webhooks
The core insight is simple: replace persistent connections with stateless HTTP requests and webhook callbacks.
Standard MCP:
Client ←──SSE Connection──→ Server (persistent, stateful)
Our Approach:
Client ──HTTP POST──→ Server ──Response──→ Client (stateless)
Third-Party ──Webhook──→ Server ──Webhook──→ Client (event-driven)
How It Works
For Tools and Resources: Clients make standard HTTP POST requests to call tools or read resources. No connection to maintain.
# Call a tool
curl -X POST https://mcp.example.com/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "find_documents",
      "arguments": { "collection": "users", "filter": {} }
    }
  }'
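The reply is a plain JSON-RPC response. Per the MCP spec, tool output comes back as a content array; the document payload below is illustrative:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      { "type": "text", "text": "{ \"documents\": [ ... ] }" }
    ]
  }
}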
For Subscriptions (The Magic Part): Instead of the server pushing updates through SSE, clients provide a webhook URL when subscribing. Updates flow through webhooks.
1. Client subscribes, providing their webhook URL
2. Server registers with third-party service (e.g., GitHub, MongoDB)
3. Third-party sends events to our server's webhook endpoint
4. Our server processes and forwards to client's webhook URL
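Concretely, subscribing is just another stateless POST. resources/subscribe is a standard MCP method; carrying the client's webhook URL alongside it is this library's extension, and the exact parameter name below is illustrative:

curl -X POST https://mcp.example.com/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "resources/subscribe",
    "params": {
      "uri": "github://repo/acme/widgets/issues",
      "webhookUrl": "https://client.example.com/hooks/mcp-updates"
    }
  }'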
This architecture means:
- Zero connection state - Any server instance can handle any request
- Native third-party integration - GitHub webhooks, MongoDB change streams, Slack events all fit naturally
- Trivial scaling - Add more pods, point the load balancer at them, done
The Library: mcp-http-webhook
We packaged this approach into a production-ready TypeScript library. Here's what makes it special:
Built on the Official MCP SDK
We don't reinvent the protocol. We use @modelcontextprotocol/sdk under the hood, ensuring full compatibility with MCP clients and tools like MCP Inspector.
import { createMCPServer } from 'mcp-http-webhook';
import { RedisStore } from 'mcp-http-webhook/stores';

const server = createMCPServer({
  name: 'my-mcp-server',
  version: '1.0.0',
  publicUrl: 'https://mcp.example.com',
  store: new RedisStore(process.env.REDIS_URL),
  tools: [...],
  resources: [...],
});

await server.start();
External State Storage
Subscription data lives in Redis (or any key-value store you implement). This is what enables horizontal scaling - any server instance can look up any subscription.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttl?: number): Promise<void>;
  delete(key: string): Promise<void>;
  scan?(pattern: string): Promise<string[]>;
}
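For a sense of what an implementation involves, here is a minimal in-memory store, assuming TTLs are given in seconds as with Redis. It is fine for local development, but it defeats horizontal scaling, so use a shared store in production:

class InMemoryStore implements KeyValueStore {
  private data = new Map<string, { value: string; expiresAt?: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.data.get(key);
    if (!entry) return null;
    if (entry.expiresAt && Date.now() > entry.expiresAt) {
      this.data.delete(key); // Lazily evict expired entries
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttl?: number): Promise<void> {
    this.data.set(key, { value, expiresAt: ttl ? Date.now() + ttl * 1000 : undefined });
  }

  async delete(key: string): Promise<void> {
    this.data.delete(key);
  }

  async scan(pattern: string): Promise<string[]> {
    // Loosely mirrors Redis SCAN for simple "prefix*" patterns
    const prefix = pattern.replace(/\*$/, '');
    return [...this.data.keys()].filter((k) => k.startsWith(prefix));
  }
}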
First-Class Webhook Support
Resources can define subscription handlers that integrate with any third-party webhook provider:
resources: [{
  uri: 'github://repo/{owner}/{repo}/issues',
  name: 'GitHub Issues',
  subscription: {
    // Called when a client subscribes
    onSubscribe: async (uri, subscriptionId, thirdPartyWebhookUrl, context) => {
      const { owner, repo } = parseUri(uri);
      // Register a webhook with GitHub pointing at thirdPartyWebhookUrl
      const webhook = await octokit.repos.createWebhook({
        owner, repo,
        config: { url: thirdPartyWebhookUrl, content_type: 'json' },
        events: ['issues'],
      });
      return { thirdPartyWebhookId: webhook.data.id };
    },
    // Called when GitHub sends us a webhook
    onWebhook: async (subscriptionId, payload, headers) => {
      const owner = payload.repository.owner.login;
      const repo = payload.repository.name;
      return {
        resourceUri: `github://repo/${owner}/${repo}/issues`,
        changeType: payload.action === 'opened' ? 'created' : 'updated',
        data: payload.issue,
      };
    },
    // Called when the client unsubscribes
    onUnsubscribe: async (uri, subscriptionId, storedData, context) => {
      const { owner, repo } = parseUri(uri);
      await octokit.repos.deleteWebhook({ owner, repo, hook_id: storedData.thirdPartyWebhookId });
    },
  },
}]
Production Hardened
- Automatic retry with exponential backoff for webhook delivery
- Signature verification for incoming and outgoing webhooks (HMAC SHA-256; sketched below)
- Health check endpoints (/health, /ready)
- Prometheus metrics integration
- Graceful shutdown handling
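The signature check is plain HMAC SHA-256 over the raw request body. As a sketch (the header format "sha256=<hex>" is an assumption; check the library docs for the exact convention):

import { createHmac, timingSafeEqual } from 'node:crypto';

function verifyWebhookSignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected = 'sha256=' + createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so guard first
  return a.length === b.length && timingSafeEqual(a, b);
}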
Real-World Example: MongoDB MCP Server
Let's look at how we built a production MongoDB MCP server using this library. This server:
- Exposes MongoDB operations as MCP tools
- Lists collections as MCP resources
- Provides real-time updates via MongoDB change streams
- Falls back to polling when change streams aren't available
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│ MongoDB MCP Server │
├─────────────────────────────────────────────────────────┤
│ Tools: find, insert, update, delete, aggregate │
│ Resources: Collections with schema inference │
│ Subscriptions: Change streams + polling fallback │
├─────────────────────────────────────────────────────────┤
│ mcp-http-webhook │ MongoManager │ ChangeTracker │
├─────────────────────────────────────────────────────────┤
│ Redis Store (subscriptions, credentials) │
└─────────────────────────────────────────────────────────┘
Defining Tools
MongoDB operations become MCP tools with JSON Schema validation:
tools: [
  {
    name: 'find_documents',
    description: 'Find documents in a collection using filter, projection, and sorting.',
    inputSchema: {
      type: 'object',
      properties: {
        database: { type: 'string' },
        collection: { type: 'string' },
        filter: { type: 'object', default: {} },
        projection: { type: 'object' },
        sort: { type: 'object' },
        limit: { type: 'number', default: 100 },
      },
      required: ['collection'],
    },
    handler: async (args, context) => {
      const credentials = await getCredentials(context.userId);
      const documents = await mongoManager.findDocuments(
        credentials,
        args.database,
        args.collection,
        args.filter,
        { projection: args.projection, sort: args.sort, limit: args.limit }
      );
      return { documents: sanitizeDocuments(documents) };
    },
  },
  // insert_documents, update_documents, delete_documents, aggregate_documents...
]
Defining Resources with Pagination
Collections are exposed as resources with built-in pagination:
resources: [
  {
    uri: 'mongodb://{connection}/collection/{database}/{collection}',
    name: 'MongoDB Collection',
    mimeType: 'application/json',

    // List all collections across all connections
    list: async (context, options) => {
      const connections = await listConnections(context.userId);
      const resources = [];
      for (const conn of connections) {
        const collections = await mongoManager.listCollections(conn.credentials);
        for (const coll of collections) {
          resources.push({
            uri: buildResourceUri(conn.name, conn.database, coll.name),
            name: `${conn.name}.${coll.name}`,
            metadata: await buildCollectionMetadata(coll),
          });
        }
      }
      // Apply pagination
      const page = options?.page ?? 1;
      const limit = options?.limit ?? 100;
      const offset = (page - 1) * limit;
      return {
        resources: resources.slice(offset, offset + limit),
        nextCursor: offset + limit < resources.length ? String(page + 1) : undefined,
        pagination: { page, limit, total: resources.length, hasMore: offset + limit < resources.length },
      };
    },

    // Read collection data
    read: async (uri, context, options) => {
      const { connection, database, collection } = parseUri(uri);
      const credentials = await getCredentials(context.userId);
      const page = options?.pagination?.page ?? 1;
      const limit = options?.pagination?.limit ?? 100;
      // countDocuments: a helper assumed alongside findDocuments
      const total = await mongoManager.countDocuments(credentials, database, collection, {});
      const documents = await mongoManager.findDocuments(
        credentials, database, collection, {},
        { limit, skip: (page - 1) * limit }
      );
      const hasMore = page * limit < total;
      return {
        contents: { data: documents.map(convertToRow), mimeType: 'application/json' },
        nextCursorUrl: hasMore ? `${uri}/page/${page + 1}` : undefined,
        pagination: { page, limit, total, hasMore },
      };
    },
  },
]
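Reading a specific page is again a stateless POST. resources/read is the standard MCP method; the /page/2 suffix follows the nextCursorUrl convention in the read handler above, and the connection, database, and collection names are illustrative:

curl -X POST https://mcp.example.com/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "resources/read",
    "params": { "uri": "mongodb://prod/collection/app/users/page/2" }
  }'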
Change Stream Subscriptions
The real power comes from subscriptions. We use MongoDB change streams when available, with polling fallback:
subscription: {
  onSubscribe: async (uri, subscriptionId, webhookUrl, context) => {
    const { database, collection } = parseUri(uri);
    const credentials = await getCredentials(context.userId);
    // Start tracking changes
    await changeTracker.startTracking(
      subscriptionId,
      credentials,
      database,
      collection,
      webhookUrl,
      uri
    );
    return { thirdPartyWebhookId: subscriptionId };
  },
  onUnsubscribe: async (uri, subscriptionId) => {
    await changeTracker.stopTracking(subscriptionId);
  },
  onWebhook: async (subscriptionId, payload, headers) => {
    // Process incoming change events
    return {
      resourceUri: payload.resourceUri,
      changeType: payload.changeType,
      data: { rows: payload.rows, changes: payload.changes }
    };
  }
}
The ChangeTracker class handles the complexity:
class ChangeTracker {
  async startTracking(subscriptionId, credentials, database, collection, webhookUrl, uri) {
    try {
      // Try change streams first
      const changeStream = await mongoManager.watchCollection(credentials, database, collection);
      changeStream.on('change', async (change) => {
        // Convert the MongoDB change event into a webhook payload
        await sendWebhook(webhookUrl, {
          resourceUri: buildDocumentUri(change.documentKey._id),
          changeType: change.operationType,
          rows: [change.fullDocument]
        });
      });
      changeStream.on('error', () => this.startFallback(/*...*/));
    } catch {
      // Fall back to polling for non-replica-set deployments
      this.startFallback(subscriptionId, credentials, database, collection, webhookUrl);
    }
  }

  private async startFallback(subscriptionId, credentials, database, collection, webhookUrl) {
    let lastTimestamp = new Date();
    setInterval(async () => {
      const newDocs = await mongoManager.findDocuments(
        credentials, database, collection,
        { updatedAt: { $gt: lastTimestamp } }
      );
      lastTimestamp = new Date();
      for (const doc of newDocs) {
        await sendWebhook(webhookUrl, {
          resourceUri: buildDocumentUri(doc._id),
          changeType: 'updated',
          rows: [doc]
        });
      }
    }, POLLING_INTERVAL);
  }
}
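The sendWebhook helper used above can stay small. Here is a sketch of delivery with exponential backoff, roughly the behavior the library's built-in retry provides; the attempt count and delay schedule are illustrative:

async function sendWebhook(url: string, payload: unknown, maxAttempts = 5): Promise<void> {
  let lastError: unknown = new Error('webhook delivery failed');
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload),
      });
      if (res.ok) return; // Delivered
      lastError = new Error(`Webhook returned status ${res.status}`);
    } catch (err) {
      lastError = err; // Network error, retry
    }
    // Exponential backoff: 1s, 2s, 4s, 8s... capped at 30s
    await new Promise((r) => setTimeout(r, Math.min(1000 * 2 ** (attempt - 1), 30_000)));
  }
  throw lastError;
}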
Credential Discovery
A unique feature: we expose a .well-known/credentials endpoint that describes what credentials the MCP server needs:
credentials: [
  {
    name: 'mongo_uri',
    description: 'MongoDB connection URI',
    type: 'string',
    required: false,
    config: {
      type: 'keyValue',
      inject: [{ location: 'header', key: 'x-mongo-uri', value: '{{mongo_uri}}' }]
    }
  },
  {
    name: 'mongo_database',
    description: 'Default database name',
    type: 'string',
    required: true,
    config: {
      type: 'keyValue',
      inject: [{ location: 'header', key: 'x-mongo-database', value: '{{mongo_database}}' }]
    }
  },
  // More credential fields...
]
This allows orchestration systems to automatically configure connections without hardcoded integrations.
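In practice, an orchestrator first asks the server what it needs, then injects the values as headers on every MCP call, exactly as the inject config above specifies. The connection string and database name below are placeholders:

# Discover required credentials
curl -s https://mcp.example.com/.well-known/credentials

# Inject them as headers on subsequent calls
curl -X POST https://mcp.example.com/mcp \
  -H "Content-Type: application/json" \
  -H "x-mongo-uri: mongodb+srv://user:pass@cluster.example.net" \
  -H "x-mongo-database: app" \
  -d '{"jsonrpc": "2.0", "id": 4, "method": "tools/list"}'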
Building Your Own MCP with This Library
Ready to build your own production MCP server? Here's the step-by-step:
1. Install the Library
npm install mcp-http-webhook zod
npm install @modelcontextprotocol/sdk express # peer dependencies
npm install ioredis # for Redis store
2. Set Up Basic Server
import { createMCPServer } from 'mcp-http-webhook';
import { RedisStore } from 'mcp-http-webhook/stores';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);
const store = new RedisStore(redis);

const server = createMCPServer({
  name: 'my-custom-mcp',
  version: '1.0.0',
  publicUrl: process.env.PUBLIC_URL, // Your public HTTPS URL
  port: 3000,
  store,
  // Optional: Add authentication
  authenticate: async (req) => {
    const token = req.headers.authorization?.replace('Bearer ', '');
    const user = await validateToken(token);
    return { userId: user.id };
  },
  tools: [],
  resources: [],
});

await server.start();
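With the server running, a quick smoke test hits the built-in health endpoints:

curl -s http://localhost:3000/health
curl -s http://localhost:3000/ready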
3. Add Tools
tools: [
  {
    name: 'my_tool',
    description: 'Does something useful',
    inputSchema: {
      type: 'object',
      properties: {
        input: { type: 'string', description: 'The input value' }
      },
      required: ['input']
    },
    handler: async (args, context) => {
      // context.userId available from authentication
      const result = await doSomething(args.input);
      return { result };
    }
  }
]
4. Add Resources with Subscriptions
resources: [
  {
    uri: 'myservice://{tenant}/data/{id}',
    name: 'My Data Resource',
    list: async (context) => {
      const items = await getItems(context.userId);
      return items.map(item => ({
        uri: `myservice://${item.tenant}/data/${item.id}`,
        name: item.name
      }));
    },
    read: async (uri, context) => {
      const { id } = parseUri(uri);
      const data = await getData(id);
      return { contents: { text: JSON.stringify(data) } };
    },
    // Add subscriptions if your service has webhooks or events
    subscription: {
      onSubscribe: async (uri, subscriptionId, webhookUrl, context) => {
        // Register with your service's webhook system
        const hookId = await registerWebhook(webhookUrl);
        return { thirdPartyWebhookId: hookId };
      },
      onWebhook: async (subscriptionId, payload, headers) => {
        return {
          resourceUri: payload.resourceUri,
          changeType: payload.type,
          data: payload.data
        };
      },
      onUnsubscribe: async (uri, subscriptionId, storedData) => {
        await deleteWebhook(storedData.thirdPartyWebhookId);
      }
    }
  }
]
5. Deploy
# docker-compose.yml
services:
  mcp-server:
    build: .
    # Publish the container port only - with multiple replicas, a fixed
    # host port would conflict; front the replicas with a proxy or LB.
    ports:
      - "3000"
    environment:
      - PUBLIC_URL=https://mcp.example.com
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    deploy:
      replicas: 3 # Scale horizontally!
  redis:
    image: redis:7-alpine
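The compose file's build: . expects a Dockerfile next to it. Here is a minimal sketch for a typical TypeScript project; the dist/server.js entry point is an assumption, so adjust to your layout:

# Dockerfile (illustrative)
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/server.js"]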
Key Differentiators
| Feature | Standard MCP (SSE) | Our Approach (HTTP + Webhooks) |
|---|---|---|
| Scaling | Requires sticky sessions | Stateless, trivial horizontal scaling |
| Third-party integration | Custom adapters needed | Native webhook support |
| State management | In-memory per instance | External store (Redis) |
| Load balancing | Complex | Simple round-robin |
| Serverless deployment | Difficult | Possible |
| Protocol compatibility | Full | Full (uses official SDK) |
Serverless & Scale-to-Zero Deployment
One of the most compelling advantages of our stateless HTTP approach is native compatibility with serverless platforms. Since there's no persistent connection state, MCP servers built with mcp-http-webhook can deploy to environments that scale to zero—meaning you only pay for actual usage.
AWS Lambda Deployment
AWS Lambda is ideal for MCP servers with variable traffic patterns:
// lambda.ts
import { createMCPServer } from 'mcp-http-webhook';
import { DynamoDBStore } from 'mcp-http-webhook/stores';
import { APIGatewayProxyHandler } from 'aws-lambda';

const store = new DynamoDBStore(process.env.DYNAMODB_TABLE!);

const server = createMCPServer({
  name: 'my-mcp-lambda',
  version: '1.0.0',
  publicUrl: process.env.API_GATEWAY_URL!,
  store,
  tools: [...],
  resources: [...],
});

export const handler: APIGatewayProxyHandler = async (event) => {
  return server.handleLambdaEvent(event);
};
Benefits:
- Pay-per-invocation - No cost when idle
- Automatic scaling - Handles traffic spikes without configuration
- Managed infrastructure - No servers to maintain
- Global deployment - Deploy to multiple regions with Lambda@Edge
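Wiring this up with the Serverless Framework takes only a few lines. A sketch, with the service name, table, and URL as placeholders:

# serverless.yml (illustrative)
service: my-mcp-lambda
provider:
  name: aws
  runtime: nodejs20.x
  environment:
    DYNAMODB_TABLE: mcp-subscriptions
    API_GATEWAY_URL: https://abc123.execute-api.us-east-1.amazonaws.com
functions:
  mcp:
    handler: lambda.handler
    events:
      - httpApi: '*'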
Kubernetes Knative (Scale-to-Zero)
For teams already on Kubernetes, Knative Serving provides the same scale-to-zero benefits while staying in the K8s ecosystem:
# knative-service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: mongodb-mcp
spec:
  template:
    metadata:
      annotations:
        # Keep the last pod for 5 minutes of inactivity before scaling to zero
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "5m"
        # Allow up to 100 concurrent requests per pod
        autoscaling.knative.dev/target: "100"
    spec:
      containerConcurrency: 100
      containers:
        - image: your-registry/mongodb-mcp:latest
          ports:
            - containerPort: 3000
          env:
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: redis-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
Knative advantages:
- Cold start optimization - Knative keeps pods warm based on traffic patterns
- Gradual rollouts - Built-in traffic splitting for canary deployments
- Scale-to-zero - Pods terminate during inactivity, reducing costs by 60-80%
- Kubernetes native - Use existing observability, security, and networking
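Deploying is a single apply, and you can watch scale-to-zero happen:

kubectl apply -f knative-service.yaml
kubectl get ksvc mongodb-mcp   # shows the service's public URL
kubectl get pods -w            # pods terminate after the idle retention period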
Architecture for 1000s of Connectors
With scale-to-zero infrastructure, deploying thousands of MCP connectors becomes economically viable:
┌─────────────────────────────────────────────────────────────────┐
│ Load Balancer / API Gateway │
├─────────────────────────────────────────────────────────────────┤
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ MongoDB │ │ Postgres │ │ Salesforce│ │ SAP │ ...x1000 │
│ │ MCP │ │ MCP │ │ MCP │ │ MCP │ │
│ │(scaled=0)│ │(scaled=2)│ │(scaled=0)│ │(scaled=1)│ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Shared Redis / DynamoDB (Subscription State) │
└─────────────────────────────────────────────────────────────────┘
Cost model:
- Traditional approach: 1000 connectors × 2 pods × $50/month = $100,000/month
- Scale-to-zero approach: Pay only for active connectors = $5,000-15,000/month
Production at Scale: Kaman.ai
At Kaman.ai, we've put this architecture to the test. Our enterprise AI agent platform uses mcp-http-webhook to power data integrations across hundreds of customer deployments.
Pre-Built Connectors
We've already built production-ready MCP connectors for the most common enterprise data sources:
Databases:
- PostgreSQL, MySQL, Microsoft SQL Server
- MongoDB, DynamoDB
- SAP HANA, Oracle
- Snowflake, BigQuery, Redshift
File Systems & Storage:
- Google Drive, OneDrive, SharePoint
- AWS S3, Azure Blob Storage
- Dropbox, Box
- Local/network file systems (SFTP, SMB)
Business Applications:
- Salesforce (Sales Cloud, Service Cloud)
- ServiceNow (ITSM, CMDB)
- HubSpot, Zendesk
- SAP ERP, SAP S/4HANA
- Microsoft Dynamics 365
Communication & Collaboration:
- Gmail, Outlook/Exchange
- Slack, Microsoft Teams
- Google Calendar, Outlook Calendar
Developer Tools:
- GitHub, GitLab, Bitbucket
- Jira, Confluence
- Linear, Notion
Enterprise-Grade Features
Every Kaman connector includes:
- Multi-tenant isolation - Credentials and data strictly separated per organization
- OAuth 2.0 / OIDC - Secure authentication without storing passwords
- Incremental sync - Only fetch changed data, reducing API costs
- Schema mapping - Transform source schemas to your data lake format
- Data lineage - OpenLineage integration for compliance and debugging
- Rate limiting - Respect API quotas with intelligent backoff
Real Numbers
Our production deployment handles:
- 500+ active MCP connectors across customer tenants
- 10M+ tool invocations per month
- 99.9% uptime with zero persistent connection overhead
- Cold start times under 2 seconds (Knative) or 500ms (Lambda)
The scale-to-zero architecture means customers only pay for what they use—syncing a SharePoint folder once per hour costs a fraction of a cent, not $50/month for an always-on pod.
What's Next
The mcp-http-webhook library is available on npm (package name: mcp-http-webhook). We're using it in production to power:
- MongoDB data integration
- PostgreSQL connectors
- Google Drive sync
- Slack channel integrations
- And more...
If you're building MCP servers that need to scale, integrate with webhooks, or deploy to Kubernetes - this approach might save you months of engineering effort.
Get Started
npm install mcp-http-webhook
Check out the examples directory for complete implementations, and the Spec.md for detailed API documentation.
Questions? Found a bug? We'd love to hear from you. Open an issue or reach out to our team.
About Surajbhan Satpathy
Surajbhan Satpathy is a driven, forward-thinking tech entrepreneur and the Founder & CEO of Yoctotta — an endeavor rooted in his mission of bridging the gap between technology and business.
With a strong foundation in Java and software development, Surajbhan has demonstrated a commitment to building robust technical solutions. Beyond his technical expertise, he is passionate about education and mentoring: through Yoctotta’s internship initiative (YIP), he has created opportunities for BTech, BE, BSc (IT/CS), BCA and MCA graduates to experience real-world corporate development practices and sharpen their skills.
Surajbhan is also an active thought leader and content creator, regularly sharing insights on technology trends, from AI and large language models to blockchain to software engineering, on his professional network.
With a global outlook grounded in local roots (based in Odisha, India), Surajbhan blends entrepreneurial ambition, technical know-how, and a commitment to nurturing young talent.