Data Infrastructure
Real-Time Data Pipelines: When Milliseconds Matter
October 22, 2024
6 min read
When Real-Time Actually Means Real-Time
Not Real-Time: Processing data in 5-minute micro-batches and calling it "near real-time."
Actually Real-Time: End-to-end latency from event generation to actionable insight measured in single-digit milliseconds to low seconds.
Use Cases That Demand True Real-Time
Financial Trading Systems:
- Market data processing: <10ms
- Trade execution: <1ms
- Risk calculations: <100ms
Industrial Control Systems:
- Sensor data aggregation: <50ms
- Anomaly detection: <100ms
- Control loop adjustments: <10ms
Fraud Detection:
- Transaction scoring: <200ms
- Pattern matching: <500ms
- Decision enforcement: Immediate
Gaming and Live Experiences:
- Player state synchronization: <50ms
- Leaderboard updates: <100ms
- Event aggregation: <200ms
The Three Hard Problems
Problem 1: State Management at Scale
Challenge: Stream processors handle one event at a time, but business logic needs context that spans events. How do you maintain state for millions of entities without killing throughput?
Example: Calculating rolling averages per sensor across 100,000 IoT devices. Each sensor needs independent state, updated every 100ms.
Approaches:
In-Memory State (Fast, Limited):
- RocksDB embedded in stream processor
- Partitioned by key for parallelism
- Periodic snapshots for recovery
- Limited by available memory
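To make the in-memory approach concrete, here is a minimal sketch of the rolling-average example using Apache Flink keyed state. The SensorReading class, the inline source, and the smoothing factor are illustrative placeholders, not a production job.

```java
// Minimal sketch: per-sensor rolling average kept in Flink keyed state.
// SensorReading and the example source are hypothetical; swap in your real source.
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class RollingAverageJob {

    public static class SensorReading {
        public String sensorId;
        public double value;
        public SensorReading() {}
        public SensorReading(String sensorId, double value) {
            this.sensorId = sensorId;
            this.value = value;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                new SensorReading("sensor-1", 21.5),
                new SensorReading("sensor-1", 22.1),
                new SensorReading("sensor-2", 18.3))
            // Partition by key so each sensor's state lives on exactly one task.
            .keyBy(reading -> reading.sensorId)
            .process(new RollingAverage())
            .print();

        env.execute("rolling-average-per-sensor");
    }

    // Keeps an exponentially weighted moving average per sensor in keyed state.
    // State is included in Flink checkpoints, so recovery restores it.
    public static class RollingAverage
            extends KeyedProcessFunction<String, SensorReading, String> {

        private static final double ALPHA = 0.1; // smoothing factor, tune per use case
        private transient ValueState<Double> average;

        @Override
        public void open(Configuration parameters) {
            average = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("ewma", Double.class));
        }

        @Override
        public void processElement(SensorReading reading, Context ctx, Collector<String> out)
                throws Exception {
            Double current = average.value();
            double updated = (current == null)
                    ? reading.value
                    : ALPHA * reading.value + (1 - ALPHA) * current;
            average.update(updated);
            out.collect(reading.sensorId + " -> " + updated);
        }
    }
}
```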
External State Store (Slower, Scalable):
- Redis for hot state access
- Cassandra for large state volumes
- Trade latency for capacity
- Network hops add milliseconds
Hybrid Approach (Optimal):
- Hot state in-memory (recent, frequently accessed)
- Warm state in Redis (less frequent access)
- Cold state in database (historical, rarely accessed)
- Tiering based on access patterns
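One way to wire up the hybrid tiering is sketched below, assuming Redis (via the Jedis client) as the warm tier and a stubbed loadFromDatabase call standing in for the cold tier. Class and method names are illustrative, not a specific product's API.

```java
// Minimal sketch of a tiered state read path: in-memory hot tier, Redis warm tier,
// and a hypothetical database stub for cold state.
import redis.clients.jedis.Jedis;

import java.util.concurrent.ConcurrentHashMap;

public class TieredStateStore {

    private final ConcurrentHashMap<String, String> hotTier = new ConcurrentHashMap<>();
    private final Jedis warmTier = new Jedis("localhost", 6379);
    private static final int WARM_TTL_SECONDS = 3600;

    public String get(String key) {
        // 1. Hot tier: local memory, no network hop.
        String value = hotTier.get(key);
        if (value != null) {
            return value;
        }

        // 2. Warm tier: Redis, typically sub-millisecond plus one network hop.
        value = warmTier.get(key);
        if (value != null) {
            hotTier.put(key, value); // promote on access
            return value;
        }

        // 3. Cold tier: database lookup (stubbed here), milliseconds or more.
        value = loadFromDatabase(key);
        if (value != null) {
            warmTier.setex(key, WARM_TTL_SECONDS, value); // backfill warm tier
            hotTier.put(key, value);
        }
        return value;
    }

    public void put(String key, String value) {
        hotTier.put(key, value);
        warmTier.setex(key, WARM_TTL_SECONDS, value);
        // Cold-tier writes would typically happen asynchronously.
    }

    // Placeholder for the cold tier; wire up your actual database client here.
    private String loadFromDatabase(String key) {
        return null;
    }
}
```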
Problem 2: Exactly-Once Processing
Challenge: Distributed systems fail. Messages get duplicated or lost. How do you guarantee each event is processed exactly once?
Naive Approach (At-Least-Once):
- Process every message
- Duplicate processing on retry
- Downstream systems see duplicates
Better Approach (Idempotency):
- Make processing operations idempotent
- Duplicate processing has no effect
- Requires careful design
Best Approach (Exactly-Once Semantics):
- Transactional state updates with offset commits
- Kafka transactions for multi-stage processing
- Higher latency overhead (2-5ms per transaction)
Trade-off: Exactly-once adds latency. Is correctness worth milliseconds?
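As a rough sketch of what exactly-once looks like with Kafka transactions, the loop below consumes, transforms, and produces within a single transaction and commits the consumer offsets as part of that same transaction. Topic names, the group id, and the transform are placeholders.

```java
// Minimal sketch of exactly-once consume-transform-produce with Kafka transactions.
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ExactlyOncePipeline {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "scoring-pipeline");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Only read messages from committed upstream transactions.
        consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // A stable transactional id enables fencing of zombie producers.
        producerProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "scoring-pipeline-tx-1");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(Collections.singletonList("transactions-in"));
            producer.initTransactions();

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> record : records) {
                        String scored = record.value().toUpperCase(); // placeholder transform
                        producer.send(new ProducerRecord<>("transactions-scored", record.key(), scored));
                        offsets.put(new TopicPartition(record.topic(), record.partition()),
                                    new OffsetAndMetadata(record.offset() + 1));
                    }
                    // Commit consumed offsets inside the same transaction as the outputs.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    // Production code would also handle producer fencing; kept simple here.
                    producer.abortTransaction();
                }
            }
        }
    }
}
```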
Problem 3: Backpressure and Flow Control
Challenge: Upstream produces faster than downstream can consume. How do you prevent system collapse while maintaining low latency?
What Happens Without Backpressure:
- Queues fill up
- Memory exhausted
- System crashes
- Data loss
Backpressure Strategies:
Buffering (Simple, Limited):
- In-memory buffers absorb bursts
- Works for temporary spikes
- Fails on sustained overload
Rate Limiting (Protective, Blunt):
- Throttle upstream to sustainable rate
- Prevents overload
- Increases latency for all messages
Dynamic Scaling (Flexible, Complex):
- Add processing capacity on demand
- Kubernetes pod autoscaling
- Cloud function concurrency increases
- Takes 30-60 seconds to scale
Load Shedding (Aggressive, Necessary):
- Drop lowest-priority messages
- Maintain SLA for critical data
- Requires priority classification
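A minimal sketch of buffering plus load shedding might look like the following: critical messages apply backpressure to the producer, while low-priority messages are dropped once a bounded buffer stays full. The Message shape, buffer size, and timeout are illustrative.

```java
// Minimal sketch: a bounded buffer that sheds low-priority messages under sustained
// load instead of growing without limit.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SheddingBuffer {

    public static class Message {
        final String payload;
        final boolean critical; // SLA-bearing traffic is never shed
        Message(String payload, boolean critical) {
            this.payload = payload;
            this.critical = critical;
        }
    }

    private final BlockingQueue<Message> buffer = new ArrayBlockingQueue<>(10_000);
    private final AtomicLong shedCount = new AtomicLong();

    // Called by the upstream producer for each incoming message.
    public boolean submit(Message msg) throws InterruptedException {
        if (msg.critical) {
            // Critical messages block the producer: explicit backpressure upstream.
            buffer.put(msg);
            return true;
        }
        // Low-priority messages wait briefly, then are dropped (load shedding).
        boolean accepted = buffer.offer(msg, 5, TimeUnit.MILLISECONDS);
        if (!accepted) {
            shedCount.incrementAndGet(); // surface this as a metric in production
        }
        return accepted;
    }

    // Called by the downstream consumer loop.
    public Message take() throws InterruptedException {
        return buffer.take();
    }

    public long shedSoFar() {
        return shedCount.get();
    }
}
```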
Architecture Patterns That Work
Pattern 1: Lambda Architecture (Batch + Stream)
Structure:
- Batch layer: Complete, accurate, slow
- Speed layer: Approximate, fast
- Serving layer: Merge results
When to Use:
- Correctness matters more than speed
- Complex aggregations required
- Historical reprocessing needed
Trade-offs:
- Maintain two processing paths
- Eventual consistency by design
- Higher operational complexity
Pattern 2: Kappa Architecture (Stream-Only)
Structure:
- Single stream processing path
- Replay log for reprocessing
- No separate batch layer
When to Use:
- Low latency critical
- Simpler operational model preferred
- Stream processing handles all cases
Trade-offs:
- Replay entire log for corrections
- Less suitable for complex batch operations
- Requires mature stream processing
Pattern 3: Staged Event-Driven Architecture
Structure:
- Multiple processing stages
- Each stage independently scalable
- Event bus between stages
Stages:
- Ingestion: Validate, normalize, partition
- Enrichment: Add context, join with reference data
- Processing: Business logic, aggregation
- Actuation: Write to outputs, trigger actions
When to Use:
- Different stages have different scaling needs
- Multiple output destinations
- Complex processing pipelines
Trade-offs:
- More network hops = higher latency
- Debugging spans multiple services
- Eventual consistency between stages
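The sketch below shows the staged idea in miniature, with in-process queues standing in for the event bus between stages. In a real deployment each stage would be a separate service and the queues would be topics; the stages and pool sizes here are illustrative.

```java
// Minimal in-process sketch of staged event-driven processing: each stage has its
// own queue and thread pool, so stages can be sized and scaled independently.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class StagedPipeline {

    public static void main(String[] args) {
        BlockingQueue<String> ingested = new LinkedBlockingQueue<>(10_000);
        BlockingQueue<String> enriched = new LinkedBlockingQueue<>(10_000);

        // Separate pools per stage reflect that stages scale independently.
        ExecutorService enrichmentStage = Executors.newFixedThreadPool(4);
        ExecutorService processingStage = Executors.newFixedThreadPool(8);

        // Enrichment: consume validated events, add context, forward to the next stage.
        for (int i = 0; i < 4; i++) {
            enrichmentStage.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        String event = ingested.take();
                        enriched.put(event + "|enriched"); // placeholder enrichment
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        // Processing: apply business logic and hand results to actuation (stubbed).
        for (int i = 0; i < 8; i++) {
            processingStage.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        String event = enriched.take();
                        System.out.println("actuate: " + event); // placeholder actuation
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        // Ingestion stage (stubbed): validate/normalize and enqueue.
        ingested.offer("event-1");
    }
}
```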
Technology Selection by Latency Requirement
<10ms End-to-End Latency
Constraints:
- Single datacenter deployment
- In-memory processing only
- Minimize serialization overhead
- Direct memory communication
Technology Stack:
- Custom C++/Rust applications
- Shared memory IPC
- DPDK for network I/O
- Avoid general-purpose frameworks
10-100ms End-to-End Latency
Constraints:
- Regional deployment acceptable
- Some external state lookups viable
- Optimized serialization important
Technology Stack:
- Apache Flink for stream processing
- Redis for state management
- Kafka for event streaming
- gRPC for inter-service communication
100ms-1s End-to-End Latency
Constraints:
- Multi-region deployment possible
- Database lookups acceptable
- Standard serialization formats work
Technology Stack:
- Apache Kafka Streams
- PostgreSQL with connection pooling
- REST APIs for external calls
- Standard JSON serialization
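At this tier a stream application can stay simple. Below is a minimal Kafka Streams topology sketch to illustrate the shape of such a job; topic names and the transformation are placeholders.

```java
// Minimal sketch of a Kafka Streams topology: consume, filter, transform, produce.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class SimpleStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-enricher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders-raw");
        orders
            .filter((key, value) -> value != null && !value.isEmpty())
            .mapValues(value -> value.trim()) // placeholder transformation
            .to("orders-clean");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```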
1s+ End-to-End Latency
Constraints:
- Latency not critical
- Correctness and cost matter more
Technology Stack:
- Apache Spark Structured Streaming
- Any database system
- Batch-oriented optimizations
- Cost-effective infrastructure
Monitoring Real-Time Systems
Metrics That Matter
Latency Percentiles:
- P50: Typical performance
- P95: User experience boundary
- P99: Outlier detection
- P99.9: Tail latency issues
Do Not Average Latency:
- Average hides problems
- Use histograms and percentiles
- Track full distribution
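One way to track the full distribution is an HdrHistogram per pipeline stage. The sketch below records latencies in microseconds and reports percentiles rather than an average; the bounds and simulated samples are illustrative.

```java
// Minimal sketch: record per-message latency in an HdrHistogram and report
// percentiles instead of an average. Values are in microseconds.
import org.HdrHistogram.Histogram;

public class LatencyTracker {

    // Track values from 1us up to 60s with 3 significant digits of precision.
    private final Histogram histogram = new Histogram(60_000_000L, 3);

    public void record(long latencyMicros) {
        histogram.recordValue(latencyMicros);
    }

    public String report() {
        return String.format(
            "p50=%dus p95=%dus p99=%dus p99.9=%dus max=%dus",
            histogram.getValueAtPercentile(50.0),
            histogram.getValueAtPercentile(95.0),
            histogram.getValueAtPercentile(99.0),
            histogram.getValueAtPercentile(99.9),
            histogram.getMaxValue());
    }

    public static void main(String[] args) {
        LatencyTracker tracker = new LatencyTracker();
        // Simulated latencies: mostly fast, plus a few slow outliers an average would hide.
        for (int i = 0; i < 10_000; i++) tracker.record(800 + (i % 50));
        for (int i = 0; i < 10; i++) tracker.record(250_000);
        System.out.println(tracker.report());
    }
}
```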
Throughput Metrics:
- Messages per second (current)
- Processing capacity (maximum)
- Backlog size (queue depth)
- Consumer lag (time behind)
Error Rates:
- Processing failures
- Deserialization errors
- External dependency timeouts
- Dead letter queue growth
Alerting Strategies
Latency Degradation:
- Alert on P99 > threshold
- Trend analysis for gradual degradation
- Correlate with deployment events
Backlog Growth:
- Consumer lag > 60 seconds
- Queue depth > capacity threshold
- Sustained growth over 5 minutes
Error Rate Spikes:
- Error rate > 1% of throughput
- Sustained errors for 2+ minutes
- Specific error type correlation
Common Anti-Patterns
Anti-Pattern 1: Synchronous External Calls
What Happens:
- Stream processor calls REST API per message
- Network latency dominates
- Throughput collapses
Fix:
- Batch lookups where possible
- Cache reference data locally
- Use async I/O
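A common version of the fix is to refresh reference data in bulk on a background schedule and serve per-message lookups from local memory, as in the sketch below. fetchAllReferenceData is a placeholder for your actual bulk client call.

```java
// Minimal sketch of the fix: refresh reference data in the background and serve
// per-message lookups from local memory, instead of one REST call per message.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ReferenceDataCache {

    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService refresher =
            Executors.newSingleThreadScheduledExecutor();

    public ReferenceDataCache() {
        // One bulk fetch every 30 seconds replaces thousands of per-message calls.
        refresher.scheduleAtFixedRate(this::refresh, 0, 30, TimeUnit.SECONDS);
    }

    private void refresh() {
        try {
            Map<String, String> latest = fetchAllReferenceData();
            cache.putAll(latest);
        } catch (Exception e) {
            // Keep serving the previous snapshot; stale data beats a stalled pipeline.
        }
    }

    // Hot path: pure in-memory lookup, no network hop per message.
    public String lookup(String key) {
        return cache.get(key);
    }

    // Placeholder for the batched/bulk call to the reference-data service.
    private Map<String, String> fetchAllReferenceData() {
        return Map.of();
    }
}
```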
Anti-Pattern 2: Unbounded State Growth
What Happens:
- State grows indefinitely
- Memory exhaustion
- Processing stops
Fix:
- Time-based state eviction
- Size-based limits with LRU eviction
- Separate hot/cold state tiers
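Here is a minimal sketch of size-based bounding with LRU eviction, using a plain LinkedHashMap in access order. The entry cap is illustrative, and stream-processing frameworks offer equivalent state TTL settings.

```java
// Minimal sketch: bound per-key state with an LRU cap so it cannot grow without limit.
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedKeyedState<K, V> {

    private static final int MAX_ENTRIES = 1_000_000;

    // Access-ordered LinkedHashMap evicts the least recently used entry past the cap.
    private final Map<K, V> state = Collections.synchronizedMap(
        new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > MAX_ENTRIES;
            }
        });

    public V get(K key) {
        return state.get(key);
    }

    public void put(K key, V value) {
        state.put(key, value);
    }

    public int size() {
        return state.size();
    }
}
```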
Anti-Pattern 3: Single Point of Failure
What Happens:
- Critical component fails
- Entire pipeline stops
- SLA violated
Fix:
- Replicate stateful components
- Multi-AZ deployment
- Circuit breakers for external dependencies
- Graceful degradation modes
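For external dependencies, even a small circuit breaker goes a long way. The sketch below fails fast to a fallback after repeated failures and retries after a cooldown window; the thresholds are illustrative, and libraries such as Resilience4j provide production-grade versions.

```java
// Minimal sketch of a circuit breaker: after a burst of failures, calls are rejected
// fast for a cooldown window instead of stalling the pipeline.
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class CircuitBreaker {

    private static final int FAILURE_THRESHOLD = 5;
    private static final long OPEN_MILLIS = 10_000;

    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicLong openedAt = new AtomicLong(0);

    public <T> T call(Supplier<T> dependency, T fallback) {
        long opened = openedAt.get();
        if (opened != 0 && System.currentTimeMillis() - opened < OPEN_MILLIS) {
            // Circuit is open: fail fast and degrade gracefully.
            return fallback;
        }
        try {
            T result = dependency.get();
            consecutiveFailures.set(0);
            openedAt.set(0);
            return result;
        } catch (RuntimeException e) {
            if (consecutiveFailures.incrementAndGet() >= FAILURE_THRESHOLD) {
                openedAt.set(System.currentTimeMillis());
            }
            return fallback;
        }
    }
}
```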
Anti-Pattern 4: Ignoring Serialization Cost
What Happens:
- Complex serialization format chosen
- CPU spent on encode/decode
- Throughput limited by serialization
Fix:
- Profile serialization overhead
- Use efficient formats (Protobuf, Avro)
- Consider schema evolution requirements
Key Takeaways
- Real-time means different things—define your latency SLA precisely
- State management is the hardest problem at scale
- Exactly-once processing adds latency—understand the trade-off
- Monitor latency percentiles, not averages
- Different latency tiers require different technology choices
- Backpressure is not optional for production systems
Build for the latency you need, not the latency that sounds impressive. A reliable 100ms system beats an unreliable 10ms system every time.
Related services: StreamMesh, FluxStream, InfraPulse