Cut your cloud bill by 30%
without slowing down
your team.

Infrastructure engineer and engineering leader who's reduced costs by $1.5M+ annually while improving performance. I optimize what you have, not sell you more tools.

Your CFO wants infrastructure costs down 30%. Your CTO needs reliability up.

Most companies are in this bind: cut cloud spending while maintaining (or improving) performance. The usual answer is "migrate to Kubernetes" or "buy another monitoring tool," which often makes things worse.

I've spent a decade optimizing infrastructure at scale, from operating 10,000+ server Hadoop clusters at LinkedIn to avoiding $1.5M in annual costs at CircleCI and cutting operational costs 30% at Scuba. I find the waste, fix the bottlenecks, and make your existing tools actually work. I use AI extensively to accelerate delivery, which means lower engagement costs for you.

Cloud Cost Optimization

Find 20-40% savings in your existing infrastructure

Most cloud waste comes from over-provisioning, idle resources, and inefficient architectures, not from using the wrong vendor. I analyze actual utilization patterns, eliminate waste, and optimize workload placement. Typical engagements find $200K-$2M+ in annual savings without migrations or downtime.

Platform Engineering

Build internal platforms that accelerate your engineering team

Your developers shouldn't spend 40% of their time on infrastructure. I design self-service platforms that give teams what they need without requiring deep DevOps knowledge. Focus on reducing cognitive load, improving deployment velocity, and eliminating toil through automation that actually fits your workflow.

Observability That Works

Stop paying $200K/year for monitoring that doesn't help you debug

You have Datadog, New Relic, or something similar, but incidents still take 4 hours to resolve. The problem isn't the tool; it's what you're measuring and how it's instrumented. I've built custom observability for Presto, Hadoop, and modern distributed systems. I'll make your existing tools useful or tell you what actually needs to change.

Reliability & Incident Response

Reduce on-call burden while improving availability

Operated infrastructure spanning 10,000+ servers at LinkedIn and led SRE teams at CircleCI. Reduced operational toil by 40% through better automation and processes. I establish incident response that works, implement SLOs that matter, and build systems that page less while running better.

AI-Native Code Repair

Fix rushed "vibe coded" projects that shipped fast but need real foundations

Startups ship fast with AI-generated code, then hit walls when scaling. I refactor AI-generated or rapidly prototyped codebases into production-grade systems. I use AI extensively to accelerate work, which means lower engagement costs for you while maintaining quality and architectural rigor. You can't fix vibe-coded software without knowing how it gets made!

Engineering Leadership & Team Scaling

Add structure as you grow from 5 engineers to 50+ without breaking what works

Built and scaled engineering orgs from scratch multiple times. At Scuba, grew the org from an initial team to 25 people across SRE, DevOps, and feature teams. At CircleCI, created the Security Engineering team and scaled the macOS infrastructure team. I establish processes that scale, implement effective on-call rotations, create hiring frameworks, and mentor engineers without adding bureaucracy that slows teams down.

Where I've done this before

LinkedIn

Senior Site Reliability Engineer, Hadoop & Presto

Operated big data infrastructure spanning 10,000+ servers across multiple datacenters. Replaced LinkedIn's Hadoop monitoring stack with custom metrics infrastructure. Defined Hadoop performance standards and kickstarted Presto SRE. This is where I learned to operate at scale.

CircleCI

Staff SRE → Engineering Manager → Interim Director of Security

Started as Staff SRE, then created and led the macOS cloud team focused on cost avoidance and capacity planning—avoiding $1.5M in annual CAPEX. Moved to lead Security Engineering, building the team from Security Operations members. Eventually acted as Interim Director of Security, leading FedRAMP and SOC 2 compliance efforts. This progression taught me to identify problems, build teams to solve them, and deliver measurable business outcomes.

Scuba Analytics

Director of Software Engineering

Led 25-person org across SRE, DevOps, and feature teams. Reduced SRE toil by 40% and operational costs by 30%. Achieved ISO 27001 and SOC 2 Type II certification. Supported enterprise deals with Salesforce, Microsoft, and others through technical customer calls and reliability guarantees.

Behavure AI

Head of Engineering (Current)

Leading engineering for a behavioral analytics platform. Back in the codebase contributing to frontend, backend, and infrastructure. Built BehavureGPT for natural language querying. Running a lean, focused team with emphasis on fast iteration and customer impact.

This is for you if

Your cloud bill hit $500K-$5M/year and you're not sure where it's all going

Your engineering team spends more time on infrastructure than shipping features

Your CFO wants 20-30% cost cuts but your CTO can't sacrifice reliability

You pay $200K+/year for observability tools but incidents still take hours to debug

You need someone hands-on who fixes problems, not a strategy consultant with slides

You're scaling into the 50-200 engineer range and infrastructure is becoming a bottleneck

You shipped fast with AI-generated code and now need to make it production-grade

Your engineering team grew from 5 to 20+ and you need structure before chaos sets in

Let's talk about your infrastructure

Most projects start with a technical discussion about what's working and what isn't. I also use AI to accelerate work where it matters, which means you get expertise at lower engagement costs. I'll tell you honestly whether I can help, and point you elsewhere if I can't.