The Billion-Message Challenge
When you build a bot on BotBuilder, you see a friendly flowchart. But underneath that UI lies a distributed beast capable of swallowing firehoses of data from Discord, Telegram, and Slack simultaneously.
Scaling a bot platform is uniquely difficult because chat traffic is bursty. A viral giveaway or a breaking news alert can spike traffic from 100 to 100,000 events per second (EPS) in moments. If our runtime blinks, your bot ignores users. That is not an option.
This post peels back the layers of our runtime architecture to explain how we guarantee 99.99% uptime.
1. The Ingestion Layer: Surviving the Tsunami
The first line of defense is the Ingestion Gateway. This is not the bot logic; it is a dumb, ultra-fast shield.
When a webhook arrives from Discord, we don’t process it immediately. We acknowledge it (sending a 200 OK to Discord so they don’t retry) and push the raw payload into our message bus.
Technology Stack
- Language: Go (Golang) for raw throughput.
- Protocol: gRPC for internal comms.
- Message Bus: Apache Kafka.
Architecture Rule #1: Never process synchronously at the edge. Always buffer.
```go
// Simplified ingestion logic: validate, buffer, ack.
func HandleWebhook(w http.ResponseWriter, r *http.Request) {
	payload := readBody(r)

	// 1. Validate signature (security)
	if !validateSignature(payload) {
		w.WriteHeader(http.StatusUnauthorized)
		return
	}

	// 2. Push to Kafka (fire & forget)
	kafkaProducer.Produce("incoming-events", payload)

	// 3. Ack immediately so the platform doesn't retry
	w.WriteHeader(http.StatusOK)
}
```

By decoupling ingestion from processing, we can absorb spikes. If the database slows down, the Kafka lag increases, but no data is dropped, and the platforms don’t disconnect us.
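The consumer side of this decoupling can be sketched with an in-memory stand-in for the Kafka topic. The channel, the `Event` type, and the `processEvent` stub are illustrative, not BotBuilder’s real API; the point is that the producer never blocks on the consumer.

```go
package main

import "fmt"

// Event is the raw payload pulled off the "incoming-events" topic.
type Event struct{ Payload string }

// processEvent stands in for the downstream flow executor; in production
// it hydrates session state and runs the bot logic.
func processEvent(e Event) string { return "handled:" + e.Payload }

func main() {
	// A buffered channel models the topic: the gateway side can keep
	// enqueueing while the consumer catches up.
	topic := make(chan Event, 1024)
	for i := 0; i < 3; i++ {
		topic <- Event{Payload: fmt.Sprintf("msg-%d", i)}
	}
	close(topic)

	// The consumer drains at its own pace; if it slows down, messages
	// accumulate (Kafka lag) instead of being dropped.
	for e := range topic {
		fmt.Println(processEvent(e))
	}
}
```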
2. The Runtime Engine: Stateless & Isolated
The heart of BotBuilder is the Flow Executor. This is the service that reads your flowchart diagram and executes the logic.
Crucially, these workers are stateless. They do not “know” about the user. They fetch the context, apply the logic, and compute the next state.
The “Step” Concept
Every interaction is broken down into atomic “Steps”.
- Fetch: Worker grabs an event from Kafka.
- Hydrate: Worker pulls the UserSession from Redis.
- Execute: Worker runs the logic for just the current block (e.g., an IF condition).
- Persist: Worker saves the new state to Redis/Postgres.
- Dispatch: Worker publishes the “Reply” command to an outgoing queue.
This allows us to run thousands of execution pods on Kubernetes. If one crashes, another picks up the message and retries.
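The five steps above can be compressed into a single worker function. This is a minimal sketch: an in-memory map stands in for the Redis session store, and the block logic (a two-step greeting flow) is invented for illustration.

```go
package main

import "fmt"

// Session holds the hot per-user state (normally kept in Redis).
type Session struct{ Step int }

// store stands in for the Redis cluster, keyed by user ID.
var store = map[string]*Session{}

// executeStep runs one atomic step: hydrate, execute, persist, dispatch.
func executeStep(userID, event string) string {
	// Hydrate: fetch (or create) the session.
	s, ok := store[userID]
	if !ok {
		s = &Session{}
		store[userID] = s
	}

	// Execute: run only the current block of the flow.
	var reply string
	switch s.Step {
	case 0:
		reply = "Welcome! What's your name?"
	default:
		reply = "Nice to meet you, " + event
	}

	// Persist: save the new state.
	s.Step++

	// Dispatch: return the reply command.
	return reply
}

func main() {
	fmt.Println(executeStep("u1", "hi"))
	fmt.Println(executeStep("u1", "Alice"))
}
```

Because the worker holds no state between calls, any pod can handle any event; a crashed pod’s message is simply re-run elsewhere.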
| Component | Responsibility | Scaling Strategy |
|---|---|---|
| Ingress | TLS termination, DDoS protection | Cloud Load Balancers |
| Worker Nodes | CPU-intensive logic execution | HPA (Horizontal Pod Autoscaler) on CPU > 60% |
| Redis Cluster | Hot state (Session variables) | Sharded Cluster Mode |
| Postgres | Cold storage (User history) | Read Replicas + Connection Pooling |
3. Dealing with “The Thundering Herd”
What happens when a popular bot broadcasts a message to 1 million users? They all reply at once.
We implement a multi-tiered Rate Limiting strategy.
Tier 1: Platform Limits
Discord allows ~50 requests/sec per bot token. We track this budget in Redis using a Token Bucket algorithm with Lua scripts for atomicity. If a bot exceeds its quota, we pause its outgoing queue, not the whole system.
Tier 2: Tenant Isolation
To prevent one noisy bot from clogging the workers for everyone else, we use Priority Queues.
- Premium Bots: Dedicated, high-priority lanes.
- Free Bots: Shared lanes with aggressive throttling.
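The lane split above amounts to a strict-priority dequeue under a shared send budget. A minimal sketch (lane names and the `dispatch` helper are illustrative, not the production scheduler):

```go
package main

import "fmt"

// dispatch drains the premium lane first; free bots share whatever
// budget remains. Messages beyond the budget simply wait for the
// next dispatch cycle.
func dispatch(premium, free []string, budget int) []string {
	var sent []string
	for _, lane := range [][]string{premium, free} {
		for _, msg := range lane {
			if len(sent) == budget {
				return sent // throttled: the rest stays queued
			}
			sent = append(sent, msg)
		}
	}
	return sent
}

func main() {
	premium := []string{"p1", "p2"}
	free := []string{"f1", "f2", "f3"}
	fmt.Println(dispatch(premium, free, 3)) // premium drains first
}
```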
```lua
-- Lua script for atomic rate limiting in Redis
-- (simplified fixed-window variant; the script runs atomically server-side)
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local current = tonumber(redis.call('get', key) or "0")
if current + 1 > limit then
    return 0 -- rejected
else
    redis.call('INCR', key)
    if current == 0 then
        redis.call('EXPIRE', key, 1) -- start a fresh one-second window
    end
    return 1 -- allowed
end
```

4. Global State & Latency
BotBuilder operates in 3 major regions: US-East, EU-Central, and Asia-Pacific.
The biggest challenge is Data Locality. If a user in Tokyo talks to a bot hosted in Virginia, the speed of light becomes our enemy (approx. 200ms round trip).
To solve this, we use Geo-Smart Routing:
- A user’s session is pinned to the region closest to them upon first interaction.
- Our Global Edge Database (CockroachDB) handles replication of configuration data (the flowchart design).
- Session data (the chat variables) lives only in the local region’s Redis to ensure single-digit millisecond latency.
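The pinning step can be sketched as a small routing function. This is a toy version: the `pins` map stands in for the globally replicated mapping, and `nearestRegion` is a placeholder for a real GeoIP or anycast lookup.

```go
package main

import "fmt"

// pins maps user ID -> home region. In production this mapping is part
// of the globally replicated configuration, while the session data
// itself stays in the home region's Redis.
var pins = map[string]string{}

// routeUser pins a user to the nearest region on first contact and
// keeps routing them there afterwards, so their session never has to
// cross an ocean.
func routeUser(userID, nearestRegion string) string {
	if home, ok := pins[userID]; ok {
		return home // already pinned: stay local to the session
	}
	pins[userID] = nearestRegion
	return nearestRegion
}

func main() {
	fmt.Println(routeUser("tokyo-user", "ap-southeast")) // pinned on first contact
	fmt.Println(routeUser("tokyo-user", "us-east"))      // sticky: still ap-southeast
}
```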
5. Summary
Scaling is not about buying bigger servers. It is about architectural elegance.
By breaking down complex bot logic into stateless, atomic events and leveraging an event-driven Kafka backbone, BotBuilder ensures that whether you have 10 users or 10 million, your bot replies instantly.
We worry about the infrastructure so you can focus on the conversation.

