Scale Your Ops Without Slowing Down

AI has changed the speed of development. Are your operations keeping up? With 10+ years of experience managing operations teams and building observability systems from the ground up, I can help you keep moving fast without breaking things.

Don't Wait Until Things Break To Add Quality

The traditional approach of alternating between roadmap velocity and tech debt paydown takes more time and money to reach the same destination. Worse, the zig-zag between shipping features and fire-fighting creates costly context switches that drain engineering morale and productivity. When you build quality incrementally, your customers never see a slowdown in feature velocity, and your engineers stay in a steady, sustainable flow instead of whiplashing between build mode and fix mode.

Old Way

Add quality when things are starting to break

Teams ship fast early on, but as the codebase grows, tech debt accumulates. Eventually they're forced to stop and refactor, losing velocity. The cycle repeats with each growth phase.

My Approach

Build quality incrementally to keep accelerating

By introducing quality gates and automation early, teams avoid the stop-and-fix cycle. Continuous small improvements compound over time, so quality and velocity grow together.

How I Help You Scale

Drawing from experience operationalizing products at scale, I bring enterprise-grade practices to growing tech companies without the enterprise overhead. I can help you build these capabilities into your culture and stack, while you continue to move fast and stay lean.

Detection & Response

From smart alerting to fast incident resolution, build the detection and response capabilities that keep your systems reliable.

Proactive + Reactive Alerting
Incident Response Best Practices
Operational Dashboards & Observability

Deployment & Safety

Ship with confidence through automated pipelines, safety gates, and blast radius controls that catch issues before customers do.

CI/CD Automation
Pre/Post Deployment Tests
Regionalized Service Deployment
Blast Radius Control
Feature Flags
Canary Testing
Automated Rollbacks
Progressive Deployments

Culture & Process

Sustainable operations start with ownership culture and the right rituals. Build habits that scale with your team.

Build Customer Empathy
Establish Regular Operational Rituals
Post-mortem Reviews
Prioritizing Technical Debt

Agentic AI Engineering

Adopt AI coding agents and agentic workflows with the right guardrails, evaluation frameworks, and human-in-the-loop controls.

Spec-Driven Development Templates
Automated Code Reviews
Human-in-the-loop Best Practices

How We Can Work Together

Choose the approach that fits your needs and budget

Consulting

I embed within your team to produce an operational excellence health scorecard, typically over 2-4 weeks.

Fractional

Hourly services to implement the changes recommended in the health scorecard.

Retainer

A set number of hours per month for support and other extended services your organization needs.

About Me

10+ years building and operating large-scale distributed systems. I've been in the trenches as an engineer, managed teams through 100+ high-severity incidents, and learned what it takes to build systems that don't wake you up at 3 AM.

DigitalOcean

2022 - Present

Senior Engineering Manager

2025 - Present

Leading engineering strategy for the managed database platform.

Engineering Manager

2024 - 2025

Delivered major product launches including Managed Valkey, Scalable Storage, Storage Autoscaling, and RBAC.

Senior Software Engineer

2022 - 2024

Drove technical execution of a new Managed OpenSearch offering and improved CI/CD deployment success rates.

Achievements

99.9% -> 99.99% SLA
30% YoY revenue growth
42% incident reduction
36% increase in CI/CD deployment success

Splunk

2021 - 2022

Software Engineer III

2021 - 2022

Built internal observability infrastructure for all global teams. Designed a service for caching API responses from multiple clouds (AWS, GCP), reducing rate limiting errors.

Achievements

85% reduction in rate limit errors

Microsoft

2017 - 2021

Software Engineer II

2019 - 2021

Bing Local Search team. Deployed ML models for metadata inference, improving recall while maintaining high accuracy.

Software Engineer I

2017 - 2019

Built scalable ETL pipelines for processing location data.

Achievements

23% recall improvement in metadata coverage
97%+ accuracy for global restaurant price inference

Latest from the Blog

Practical insights on operations, observability, and engineering leadership.

March 1, 2026

Learnings from using Claude for PR reviews

Don’t let PR review be the frontier in agentic engineering that slows you TL;DR Utilizing claude-code-action in a PR workflow (i.e. GitHub Actions) su...

February 26, 2026

CI/CD tiered rollouts to control blast radius

Deploy code to production gradually across regions, not all at once. A tiered rollout strategy with CI/CD job dependencies limits blast radius while k...

February 20, 2026

Canary Deployments with Argo

How to implement progressive delivery with Argo Rollouts, including canary strategy configuration, automated rollback with Prometheus-backed AnalysisT...

February 16, 2026

I Called My Claude Coding Agent Incompetent

Does being rude to your AI coding agent actually hurt its performance? I ran an experiment with Claude Opus 4.6 to find out.

February 9, 2026

Observability 101: Start with Logs

Start your observability journey with logs, not complex distributed tracing. A practical guide to building effective monitoring with tools you already...

February 6, 2026

Rollback First, Ask Questions Later

The fastest way to resolve production incidents: rollback first, investigate later. Lessons from a principal engineer on reducing MTTR.

Read All Posts

Technical Skills

Tools and technologies I use to build and operate reliable systems at scale.

Languages

Java

Python

Terraform

JavaScript

Tools

Kubernetes

Docker

React

Git

Grafana

Kafka

Temporal

gRPC

Platforms

DigitalOcean

AWS

Azure

GitHub

GitLab

Databases

PostgreSQL

MySQL

Prometheus

OpenSearch

Elastic

Redis

MongoDB

Ready to Scale Your Operations?

Let's talk about where your company is today, where you want to be, and how to get there without sacrificing speed or quality. Book a free discovery call to explore how we might work together.

What to expect on our call:

Discuss your current operational challenges
Identify quick wins and high-impact improvements
Explore engagement options that fit your needs

Book a Discovery Call

30 minutes to explore how I can help your team

Schedule Your Call

Click below to see available times and book a 30-minute discovery call.

Book a Discovery Call

Have context to share?

Fill out questionnaire

Prefer email?

[email protected]

Common Questions

What size companies do you work with?

I focus on small and medium businesses: tech startups that have found product-market fit and need to operationalize for growth, and other businesses with software development needs that want to leverage best practices to move faster.

What's the typical engagement length?

Project-based work usually runs 2-4 months. Fractional engagements are ongoing, typically 2-3 days per week, with a minimum 3-month commitment.

Do you work remotely?

Yes, I work remotely with companies globally. I'm flexible on overlap hours to accommodate different time zones.

What if we're not sure what we need?

That's exactly what the discovery call is for. I'll ask questions, listen, and help identify where the biggest opportunities are. No commitment required.
