Data Platforms

The foundation for everything else. Warehouse, lake, or lakehouse, designed for your actual needs, not vendor benchmarks.

Context

Why this matters

A well-designed data platform is the foundation for analytics, ML, and operational reporting. Get it wrong, and every downstream use case becomes painful. Get it right, and you've built something that scales with your business.

The key is matching the platform to your stage. A Series A startup doesn't need the same infrastructure as a 500-person company. I design for where you are today with a path to where you're going.

Capabilities

What I build

Data Warehouse

Snowflake, BigQuery, Redshift, or PostgreSQL, depending on what fits. Schema design, modeling, and optimization.

Lakehouse Architecture

Delta Lake or Iceberg, on Databricks or cloud-native storage. Bronze/silver/gold layers with defined data contracts.

Infrastructure as Code

Everything in Terraform, CDK, or Pulumi. Reproducible, version-controlled, and auditable infrastructure.

Cloud Setup

AWS, Azure, or GCP: proper account structure, networking, security, and cost controls from day one.

Cost Optimization

Right-sizing compute, storage tiering, reservation strategies. I've achieved 50% cost reductions with these techniques.

Data Governance

Catalog implementation, lineage tracking, access control. Built-in compliance and audit capabilities.

Philosophy

Right-sized for your stage

Early Stage

< 50GB data

PostgreSQL is usually enough as both the operational database and a lightweight analytics target. dbt handles transformations from day one, so you can swap the warehouse later without rewriting your models. Metabase sits on top for reporting.

~$300-700/month infrastructure

Growth Stage

50GB - 1TB data

Time for a proper team warehouse. MotherDuck on top of Postgres gives you the DuckDB engine with managed storage, sharing, and a DuckLake-based open format. dbt stays the transformation layer, so the warehouse remains swappable and there is no lock-in if you outgrow it.

~$600-1,500/month self-hosted, up to $2k fully managed

Scale Stage

> 1TB data or specific triggers

Snowflake, Databricks, or BigQuery start earning their cost when you hit 20+ concurrent analysts, heavy governance requirements, multi-engine access needs, or transforms that outgrow a single node.

$3k+/month infrastructure
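
As a rough sketch, the three stages above can be read as a small decision function. The names here (Workload, recommend_stage, and the individual fields) are illustrative assumptions, as is treating 1TB as 1,000GB; real engagements weigh more than these inputs.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of a team's data workload."""
    data_gb: float                      # total data volume in GB
    concurrent_analysts: int = 0        # analysts querying at the same time
    heavy_governance: bool = False      # strict compliance or audit requirements
    multi_engine_access: bool = False   # several engines must read the same data
    exceeds_single_node: bool = False   # transforms no longer fit on one node


def recommend_stage(w: Workload) -> str:
    """Return 'early', 'growth', or 'scale' using the thresholds listed above."""
    # Any one of the "specific triggers" is enough to justify the scale stage.
    triggers = (
        w.concurrent_analysts >= 20
        or w.heavy_governance
        or w.multi_engine_access
        or w.exceeds_single_node
    )
    if w.data_gb > 1000 or triggers:    # assumption: 1TB treated as 1,000GB
        return "scale"
    if w.data_gb >= 50:
        return "growth"
    return "early"
```

For example, recommend_stage(Workload(data_gb=10, concurrent_analysts=25)) lands in the scale stage despite the small data volume, because a trigger outweighs raw size.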

Stack

Technologies I work with

Platforms

Databricks Snowflake BigQuery PostgreSQL DuckDB MotherDuck

Infrastructure

Terraform AWS CDK Kubernetes Docker

Cloud

AWS Azure GCP

Building a data platform?

Let's make sure you build the right one for your stage.