Data Pipelines

The Real-Time Myth

Why most "real-time requirements" are actually "faster batch" requirements, and how to tell the difference before committing to 3-5x more complexity.

6 min read

"We need real-time data." I hear this in almost every initial conversation with new clients. And in almost every case, when I dig deeper, what they actually need is "fresher batch."

The Cost of Real-Time

Real-time streaming architecture is 3-5x more expensive than batch. Not just in infrastructure costs, but in complexity, maintenance, debugging difficulty, and operational overhead.

A batch pipeline that fails at 3am can wait until morning. A streaming pipeline that fails at 3am is losing data every second it's down. That's a different level of operational commitment.

The Real Cost

Batch Pipeline

  • Infrastructure: $500/month
  • Operational: Low
  • Debug time: Hours

Streaming Pipeline

  • Infrastructure: $2,000/month
  • Operational: High (24/7)
  • Debug time: Days
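The infrastructure figures above already tell most of the story before you even count on-call and debug time. A quick back-of-the-envelope check (monthly figures from the comparison above; the multiplier just falls out of the arithmetic):

```python
# Rough annual cost comparison using the example figures above.
BATCH_MONTHLY = 500        # $/month, batch pipeline
STREAMING_MONTHLY = 2_000  # $/month, streaming pipeline

batch_annual = BATCH_MONTHLY * 12
streaming_annual = STREAMING_MONTHLY * 12
multiplier = streaming_annual / batch_annual

print(f"Batch:     ${batch_annual:,}/year")      # $6,000/year
print(f"Streaming: ${streaming_annual:,}/year")  # $24,000/year
print(f"Streaming costs {multiplier:.0f}x more "
      f"before counting 24/7 on-call and longer debug cycles")
```

And that 4x is infrastructure alone; the operational gap widens it further.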

The Decision Framework

Ask these questions before committing to streaming:

1. What happens if data is 1 hour old?

If the answer is "nothing catastrophic," you probably don't need real-time. Hourly batch covers 90% of use cases that people think need streaming.

2. Who is consuming this data?

If it's executives looking at dashboards, batch is fine. If it's a fraud detection system that needs to block transactions, you need real-time.

3. What's the actual latency requirement?

"Real-time" means different things to different people. Get specific:

  • Sub-second: true streaming required (Kafka, Flink)
  • Minutes: micro-batch (Spark Structured Streaming)
  • Hourly: standard batch, just run more frequently
  • Daily: traditional batch, overnight processing
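The tiers above can be sketched as a simple decision helper. The tool names are the article's own examples; the exact thresholds are illustrative assumptions, and the real point is forcing a concrete number before choosing an architecture:

```python
def recommend_architecture(max_staleness_seconds: float) -> str:
    """Map a concrete latency requirement to a pipeline style.

    Thresholds are illustrative, not hard rules -- the value is in
    making the requirement a number instead of "real-time".
    """
    if max_staleness_seconds < 1:
        return "true streaming (e.g. Kafka, Flink)"
    if max_staleness_seconds < 60 * 60:
        return "micro-batch (e.g. Spark Structured Streaming)"
    if max_staleness_seconds < 24 * 60 * 60:
        return "standard batch, scheduled hourly"
    return "traditional batch, overnight processing"

print(recommend_architecture(0.2))           # fraud blocking
print(recommend_architecture(10 * 60))       # support dashboard
print(recommend_architecture(6 * 60 * 60))   # executive reporting
```

Asking for the number first often reveals that "real-time" was shorthand for "not yesterday's data."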

When Real-Time is Actually Required

Some use cases genuinely need streaming:

  • Fraud detection: must block transactions in milliseconds
  • Real-time personalization: recommendations that react to current behavior
  • Operational monitoring: alerting on system health in real time
  • Trading systems: where milliseconds matter for execution

The Middle Ground: Micro-Batch

If hourly is too slow but true real-time is overkill, consider micro-batch. Process data every 5-15 minutes. You get near-real-time freshness with batch simplicity.

This is often the sweet spot for:

  • Operational dashboards for customer support
  • Inventory updates for e-commerce
  • Marketing attribution for active campaigns
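The micro-batch pattern is just a small, ordinary batch job run often. A minimal sketch, assuming events land in some buffer between runs (in practice a Kafka topic, a staging table, or an object-store prefix) and a scheduler such as cron or Airflow triggers the cycle every 5-15 minutes; all names here are illustrative:

```python
from collections import deque

def drain(buffer: deque) -> list:
    """Take everything that arrived since the last run."""
    batch = []
    while buffer:
        batch.append(buffer.popleft())
    return batch

def run_once(buffer: deque) -> int:
    """One micro-batch cycle: drain, transform, load.

    A scheduler calls this every 5-15 minutes; each invocation is a
    plain batch job with the simplicity that implies.
    """
    batch = drain(buffer)
    transformed = [{"event": e, "processed": True} for e in batch]
    # load(transformed)  # would write to the warehouse in a real pipeline
    return len(transformed)

event_buffer: deque = deque(["click", "purchase", "click"])
print(run_once(event_buffer))  # -> 3: processes everything buffered so far
```

Because each cycle is stateless between runs, a failed run at 3am can simply be retried in the morning; the next drain picks up the backlog.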

The Bottom Line

Start with batch. Move to micro-batch if needed. Only invest in true streaming when you have a clear business requirement that justifies the complexity.

The money and engineering time you save can go toward things that actually matter: better data quality, more complete coverage, or hiring another analyst who can extract insights from the data you already have.

Trying to figure out batch vs. streaming?

Let's talk about your actual requirements.

Get in Touch