Infrastructure engineer and engineering leader who's reduced costs by $1.5M+ annually while improving performance. I optimize what you have instead of selling you more tools.
Most companies are in this bind: cut cloud spending while maintaining (or improving) performance. The usual answer is "migrate to Kubernetes" or "buy another monitoring tool," which makes things worse.
I've spent a decade optimizing infrastructure at scale, from operating 10,000+ server Hadoop clusters at LinkedIn to avoiding $1.5M in annual costs at CircleCI and cutting operational costs 30% at Scuba. I find the waste, fix the bottlenecks, and make your existing tools actually work. I use AI extensively to accelerate delivery, which means lower engagement costs for you.
Find 20-40% savings in your existing infrastructure
Most cloud waste comes from over-provisioning, idle resources, and inefficient architectures, not from using the wrong vendor. I analyze actual utilization patterns, eliminate waste, and optimize workload placement. Typical engagements find $200K-$2M+ in annual savings without migrations or downtime.
Build internal platforms that accelerate your engineering team
Your developers shouldn't spend 40% of their time on infrastructure. I design self-service platforms that give teams what they need without requiring deep DevOps knowledge. The focus is on reducing cognitive load, improving deployment velocity, and eliminating toil through automation that actually fits your workflow.
Stop paying $200K/year for monitoring that doesn't help you debug
You have Datadog, New Relic, or something similar, but incidents still take 4 hours to resolve. The problem isn't the tool; it's what you're measuring and how it's instrumented. I've built custom observability for Presto, Hadoop, and modern distributed systems. I'll make your existing tools useful or tell you what actually needs to change.
Reduce on-call burden while improving availability
Operated infrastructure spanning 10,000+ servers at LinkedIn and led SRE teams at CircleCI. Reduced operational toil by 40% through better automation and processes. I establish incident response that works, implement SLOs that matter, and build systems that page less while running better.
Fix rushed "vibe coded" projects that shipped fast but need real foundations
Startups ship fast with AI-generated code, then hit walls when scaling. I refactor AI-generated or rapidly prototyped codebases into production-grade systems. I use AI extensively in my own work, so I know how this code gets written and where it tends to break, and that keeps engagement costs lower while maintaining quality and architectural rigor. You can't fix vibe-coded software without knowing how it gets made.
Add structure as you grow from 5 engineers to 50+ without breaking what works
Built and scaled engineering orgs from scratch multiple times. At Scuba, grew from initial team to 25-person org across SRE, DevOps, and feature teams. At CircleCI, created the Security Engineering team and scaled macOS infrastructure team. I establish processes that scale, implement effective on-call rotations, create hiring frameworks, and mentor engineers without adding bureaucracy that slows teams down.
LinkedIn
Senior Site Reliability Engineer, Hadoop & Presto
Operated big data infrastructure spanning 10,000+ servers across multiple datacenters. Replaced LinkedIn's Hadoop monitoring stack with custom metrics infrastructure. Defined Hadoop performance standards and kickstarted Presto SRE. This is where I learned to operate at scale.
CircleCI
Staff SRE → Engineering Manager → Interim Director of Security
Started as Staff SRE, then created and led the macOS cloud team focused on cost avoidance and capacity planning, avoiding $1.5M in annual CAPEX. Moved to lead Security Engineering, building the team from Security Operations members. Eventually acted as Interim Director of Security, leading FedRAMP and SOC 2 compliance efforts. This progression taught me to identify problems, build teams to solve them, and deliver measurable business outcomes.
Scuba Analytics
Director of Software Engineering
Led 25-person org across SRE, DevOps, and feature teams. Reduced SRE toil by 40% and operational costs by 30%. Achieved ISO 27001 and SOC 2 Type II certification. Supported enterprise deals with Salesforce, Microsoft, and others through technical customer calls and reliability guarantees.
Behavure AI
Head of Engineering (Current)
Leading engineering for behavioral analytics platform. Back in the codebase contributing to frontend, backend, and infrastructure. Built BehavureGPT for natural language querying. Running lean, focused team with emphasis on fast iteration and customer impact.
Your cloud bill hit $500K-$5M/year and you're not sure where it's all going
Your engineering team spends more time on infrastructure than shipping features
Your CFO wants 20-30% cost cuts but your CTO can't sacrifice reliability
You pay $200K+/year for observability tools but incidents still take hours to debug
You need someone hands-on who fixes problems, not a strategy consultant with slides
You're scaling to 50-200 engineers and infrastructure is becoming a bottleneck
You shipped fast with AI-generated code and now need to make it production-grade
Your engineering team grew from 5 to 20+ and you need structure before chaos sets in
Most projects start with a technical discussion about what's working and what isn't. I use AI to accelerate work where it matters, which means you get expertise at lower engagement costs. I'll tell you honestly whether I can help, and point you elsewhere if I can't.