Case study
Telemetry pipeline for a connected hardware manufacturer
Turning raw device logs from a global fleet of over four million units into analytics-ready tables that the business could actually plan around.
4M+ devices in fleet
5 yrs engagement
Lead role
AWS platform
Context
A connected fleet, growing fast
The client was a consumer hardware manufacturer with a connected device installed in millions of homes around the world. Each device emitted semi-structured log data, and the volume kept growing as the fleet did.
The data was rich but not usable. It sat in storage as raw files, with no consistent schema and no way for the analytics, product, and support teams to ask straightforward questions of it.
I joined the engagement early, eventually moved into the principal engineer and project lead role, and stayed with it for five years.
Problem
Logs in, nothing out
Three things had to be true at the same time. The pipeline had to scale with the fleet, which meant handling roughly double the volume every couple of years. It had to produce stable, well-shaped tables that could feed everything from personalised recommendations to error analyses to management reports. And it had to be operable: no one wanted to be paged on a Sunday because a single device shipped a malformed payload.
Most of the failure modes came from the data itself. Schemas drifted as firmware versions rolled out. Late-arriving and out-of-order events were normal, not exceptional. Reprocessing a year of history had to be cheap enough that we would actually do it when a definition changed.
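To make those failure modes concrete, here is a minimal sketch in plain Python. It is illustrative rather than lifted from the project, whose jobs ran on Spark and Java. The field names (`device_id`, `seq`, `ts`, `event_time`, `err`, `error_code`) and the two payload shapes are assumptions invented for the example. It shows three ideas: mapping drifted payloads onto one shape, counting malformed payloads instead of failing on them, and deduplicating on a stable key so that late, out-of-order, or replayed events produce the same result.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Event:
    device_id: str
    seq: int                    # hypothetical per-device sequence number
    event_time: datetime        # when the device recorded the event
    error_code: Optional[str]


def normalise(raw: dict) -> Optional[Event]:
    """Map one raw payload onto a single shape; return None if malformed."""
    try:
        # Assumed drift: older firmware sends "ts" (epoch seconds) and "err",
        # newer firmware sends "event_time" (ISO 8601) and "error_code".
        if "event_time" in raw:
            when = datetime.fromisoformat(raw["event_time"])
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
        else:
            when = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
        return Event(
            device_id=str(raw["device_id"]),
            seq=int(raw["seq"]),
            event_time=when,
            error_code=raw.get("error_code") or raw.get("err"),
        )
    except (KeyError, TypeError, ValueError):
        return None


def build_partition(raw_events):
    """Return (events ordered by event time, count of rejected payloads)."""
    latest = {}
    rejected = 0
    for raw in raw_events:
        event = normalise(raw)
        if event is None:
            rejected += 1       # count it and move on; do not fail the run
            continue
        # Keyed on (device, sequence): a duplicate or replay overwrites itself.
        latest[(event.device_id, event.seq)] = event
    ordered = sorted(
        latest.values(), key=lambda e: (e.event_time, e.device_id, e.seq)
    )
    return ordered, rejected
```

Because the output depends only on the set of valid events, not on arrival order or on how many times an event is delivered, rerunning the same input yields the same partition. That property is what keeps reprocessing a year of history a routine job rather than a risky one.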
Approach
Boring on purpose
We built the pipeline on Spark and AWS, with a clear separation between the raw landing layer, a normalised event layer, and the curated tables that downstream consumers actually queried. Each layer had its own contract, and changes flowed through a review process rather than landing as surprises in production.
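A layer contract can be as simple as a declared set of columns and types that is checked before a table is published. The sketch below is a hedged illustration in plain Python, not the project's implementation; the contract name and its columns are hypothetical.

```python
# Hypothetical contract for one curated table: column name -> expected type.
CURATED_ERROR_EVENTS_CONTRACT = {
    "device_id": str,
    "event_date": str,
    "error_code": str,
    "occurrences": int,
}


def contract_violations(rows, contract):
    """Return human-readable violations; an empty list means the rows conform."""
    problems = []
    for index, row in enumerate(rows):
        # Columns the contract promises but the row lacks.
        for name in sorted(contract.keys() - row.keys()):
            problems.append(f"row {index}: missing column {name!r}")
        # Columns that appeared without a contract change.
        for name in sorted(row.keys() - contract.keys()):
            problems.append(f"row {index}: unexpected column {name!r}")
        # Columns that are present but carry the wrong type.
        for name, expected in contract.items():
            if name in row and not isinstance(row[name], expected):
                problems.append(
                    f"row {index}: {name!r} should be {expected.__name__}"
                )
    return problems
```

Run as a gate between layers, a check like this turns a silent schema change into a failed build that someone reviews, which is the point of giving each layer its own contract.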
CI/CD and automated testing were not optional. Every change shipped through GitLab pipelines, with unit and integration tests catching most of the schema and edge-case issues before they hit a real run. Monitoring went into Grafana and Instana, with alerting routed through OpsGenie so the on-call rotation always had context.
Once the foundation was stable, the work shifted toward growing the team and the system together. I moved into the principal and lead role, set the engineering standards, and made sure the next set of engineers could ship without me holding the pen on every change.
Deliverables
What was shipped
ETL codebase
Spark and Java jobs transforming semi-structured logs into a layered data model, with unit tests, integration tests, and clear ownership.
Analytics-ready tables
Curated tables for personalised recommendations, error analysis, and management dashboards, documented and versioned.
CI/CD and quality gates
GitLab pipelines, automated test suites, and code review standards that made deployments routine rather than risky.
Monitoring and on-call
Grafana and Instana dashboards, OpsGenie routing, and runbooks the team could follow at three in the morning without calling for help.
Team and process
Mentored engineers, established review and release practices, and led the project as the system and team scaled together.
Operational track record
Five years of running the pipeline through fleet growth, firmware changes, and product pivots without losing trust in the numbers.
Outcome
Numbers people trust
By the time I rolled off, the pipeline was processing logs from over four million devices and feeding multiple downstream products. Personalised recommendations, error and quality reporting, and the management view of the fleet all sat on top of the same curated tables.
The bigger win was less visible. The team had moved from heroic firefighting to routine releases. New engineers could onboard in days rather than weeks. The client could ask new questions of the data and get an answer in the same sprint, not the next quarter.
Stack
Technologies
Offered today as
If this sounds like your problem
Engagements like this one map onto a few of the services I run today. Most projects start with one of these and grow from there.
Data Pipelines
Batch and streaming pipelines built to run, scale, and stay understandable.
Data Platforms
Foundations that downstream teams can build on without paging the platform team.
Data Strategy
Architecture reviews and roadmap work for teams who want to make the right call before they pour concrete.
Sitting on a pile of telemetry?
Happy to walk through what a sensible first step looks like.