"We need real-time data." I hear this in almost every initial conversation with new clients. And in almost every case, when I dig deeper, what they actually need is "fresher batch."
The Cost of Real-Time
Real-time streaming architecture is 3-5x more expensive than batch, and not just in infrastructure costs: complexity, maintenance, debugging difficulty, and operational overhead all go up too.
A batch pipeline that fails at 3am can wait until morning. A streaming pipeline that fails at 3am is losing data every second it's down. That's a different level of operational commitment.
The Real Cost
Batch Pipeline
- Infrastructure: $500/month
- Operational overhead: Low
- Debug time: Hours
Streaming Pipeline
- Infrastructure: $2,000/month
- Operational overhead: High (24/7)
- Debug time: Days
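To make the gap concrete, here is a minimal sketch of the arithmetic behind the two infrastructure figures above. The function names are hypothetical, and the calculation covers infrastructure only, not the operational or debugging costs, which are harder to put a number on.

```python
# Monthly infrastructure figures taken from the comparison above.
BATCH_MONTHLY_USD = 500
STREAMING_MONTHLY_USD = 2_000


def cost_multiple(batch: float, streaming: float) -> float:
    """How many times more the streaming pipeline costs than batch."""
    return streaming / batch


def annual_premium(batch: float, streaming: float) -> float:
    """Extra infrastructure spend per year for choosing streaming."""
    return (streaming - batch) * 12


print(cost_multiple(BATCH_MONTHLY_USD, STREAMING_MONTHLY_USD))   # 4.0
print(annual_premium(BATCH_MONTHLY_USD, STREAMING_MONTHLY_USD))  # 18000
```

With these numbers, streaming lands at 4x the batch cost, inside the 3-5x range, and adds $18,000 a year before anyone is paged at 3am.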
The Decision Framework
Ask these questions before committing to streaming:
1. What happens if data is 1 hour old?
If the answer is "nothing catastrophic," you probably don't need real-time. Hourly batch covers 90% of use cases that people think need streaming.
2. Who is consuming this data?
If it's executives looking at dashboards, batch is fine. If it's a fraud detection system that needs to block transactions, you need real-time.
3. What's the actual latency requirement?
"Real-time" means different things to different people. Get specific:
- Sub-second: true streaming required (Kafka, Flink)
- Minutes: micro-batch (Spark Structured Streaming)
- Hourly: standard batch, just run more frequently
- Daily: traditional batch, overnight processing
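The tiers above can be sketched as a small lookup, which is handy for forcing a stakeholder to name an actual number. This is an illustration rather than a rule: `recommend_pipeline` is a hypothetical helper, and the exact cutoffs are my assumptions. In particular, the list above only names "sub-second" for streaming; treating anything under a minute as streaming is a conservative guess.

```python
from datetime import timedelta


def recommend_pipeline(max_staleness: timedelta) -> str:
    """Map the oldest acceptable data age to a pipeline tier.

    Cutoffs are assumed for illustration; adjust them to your own tooling.
    """
    if max_staleness < timedelta(minutes=1):
        return "streaming"      # e.g. Kafka, Flink
    if max_staleness < timedelta(hours=1):
        return "micro-batch"    # e.g. Spark Structured Streaming
    if max_staleness < timedelta(days=1):
        return "hourly batch"
    return "daily batch"


print(recommend_pipeline(timedelta(milliseconds=200)))  # streaming
print(recommend_pipeline(timedelta(minutes=10)))        # micro-batch
print(recommend_pipeline(timedelta(hours=1)))           # hourly batch
print(recommend_pipeline(timedelta(days=1)))            # daily batch
```

The useful part is the input, not the output: once someone has to say "ten minutes" instead of "real-time," the architecture conversation gets much shorter.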
When Real-Time Is Actually Required
Some use cases genuinely need streaming:
- ✓ Fraud detection: must block transactions in milliseconds
- ✓ Real-time personalization: recommendations that react to current behavior
- ✓ Operational monitoring: alerting on system health in real time
- ✓ Trading systems: where milliseconds matter for execution
The Middle Ground: Micro-Batch
If hourly is too slow but true real-time is overkill, consider micro-batch. Process data every 5-15 minutes. You get near-real-time freshness with batch simplicity.
This is often the sweet spot for:
- Operational dashboards for customer support
- Inventory updates for e-commerce
- Marketing attribution for active campaigns
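A micro-batch job can be as simple as a scheduled script that processes the most recent closed time window. The sketch below assumes a 10-minute window, in-memory events with a `ts` timestamp field, and naive UTC datetimes; `window_bounds` and `run_micro_batch` are hypothetical names, and a real job would read from a table or queue rather than a list.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # assumed; anything in the 5-15 minute range works
EPOCH = datetime(1970, 1, 1)


def window_bounds(now: datetime, window: timedelta = WINDOW) -> tuple[datetime, datetime]:
    """Return the most recent fully closed window [start, end) at or before `now`."""
    closed_windows = (now - EPOCH) // window  # whole windows elapsed since the epoch
    end = EPOCH + closed_windows * window
    return end - window, end


def run_micro_batch(events: list[dict], now: datetime) -> dict:
    """Process only the events whose timestamp falls inside the closed window."""
    start, end = window_bounds(now)
    batch = [e for e in events if start <= e["ts"] < end]
    # Stand-in for real work (aggregating, loading into a warehouse, etc.).
    return {"window_start": start, "window_end": end, "count": len(batch)}


events = [
    {"ts": datetime(2024, 1, 1, 11, 52)},
    {"ts": datetime(2024, 1, 1, 11, 58)},
    {"ts": datetime(2024, 1, 1, 12, 3)},  # belongs to the next window
]
result = run_micro_batch(events, now=datetime(2024, 1, 1, 12, 7))
print(result["count"])  # 2
```

Because each run handles one closed window, a failed run can be rerun for that same window later. That is the batch simplicity you keep while getting data that is only minutes old.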
The Bottom Line
Start with batch. Move to micro-batch if needed. Only invest in true streaming when you have a clear business requirement that justifies the complexity.
The money and engineering time you save can go toward things that actually matter: better data quality, more complete coverage, or hiring another analyst who can extract insights from the data you already have.