Scale Your Ops Without Slowing Down
AI has changed the speed of development. Are your operations keeping up? With 10+ years of experience managing operations teams and building observability systems from the ground up, I can help you keep moving fast without breaking things.
Top-rated coach on

Don't Wait Until Things Break To Add Quality
Alternating between roadmap work and tech debt paydown takes more time and money to reach the same destination. Worse, the zig-zag between shipping features and fire-fighting creates costly context switches that drain engineering morale and productivity. By building quality incrementally, your customers never detect a visible slowdown in feature velocity, and your engineers stay in a steady, sustainable flow instead of whiplashing between build mode and fix mode.
Old Way
Add quality when things are starting to break
Teams ship fast early on, but as the codebase grows, tech debt accumulates. Eventually they're forced to stop and refactor, losing velocity. The cycle repeats with each growth phase.
My Approach
Build quality incrementally to keep accelerating
By introducing quality gates and automation early, teams avoid the stop-and-fix cycle. Continuous small improvements compound over time, so quality and velocity grow together.
How I Help You Scale
Drawing from experience operationalizing products at scale, I bring enterprise-grade practices to growing tech companies without the enterprise overhead. I can help you build these capabilities into your culture and stack, while you continue to move fast and stay lean.
Detection & Response
From smart alerting to fast incident resolution, build the detection and response capabilities that keep your systems reliable.
- Proactive + Reactive Alerting
- Incident Response Best Practices
- Operational Dashboards & Observability
Deployment & Safety
Ship with confidence through automated pipelines, safety gates, and blast radius controls that catch issues before customers do.
- CI/CD Automation
- Pre/Post Deployment Tests
- Regionalized Service Deployment
- Blast Radius Control
- Feature Flags
- Canary Testing
- Automated Rollbacks
- Progressive Deployments
Culture & Process
Sustainable operations start with ownership culture and the right rituals. Build habits that scale with your team.
- Build Customer Empathy
- Establish Regular Operational Rituals
- Post-mortem Reviews
- Prioritizing Technical Debt
Agentic AI Engineering
Adopt AI coding agents and agentic workflows with the right guardrails, evaluation frameworks, and human-in-the-loop controls.
- Spec-Driven Development Templates
- Automated Code Reviews
- Human-in-the-loop Best Practices
How We Can Work Together
Choose the approach that fits your needs and budget
Consulting
Embed within your team to produce an operational excellence health scorecard, typically 2-4 weeks.
Fractional
Provide hourly services to implement the changes recommended in the health scorecard.
Retainer
A set number of hours per month for support and other extended services your organization needs.
About Me
10+ years building and operating large-scale distributed systems. I've been in the trenches as an engineer, managed teams through 100+ high-severity incidents, and learned what it takes to build systems that don't wake you up at 3 AM.
LinkedIn
DigitalOcean
2022 - Present
Senior Engineering Manager
2025 - Present
Leading engineering strategy for the managed database platform.
Engineering Manager
2024 - 2025
Delivered major product launches including Managed Valkey, Scalable Storage, Storage Autoscaling, and RBAC.
Senior Software Engineer
2022 - 2024
Drove technical execution of a new Managed OpenSearch offering and improved CI/CD deployment success rates.
Splunk
2021 - 2022
Software Engineer III
2021 - 2022
Built internal observability infrastructure for all global teams. Designed a service for caching API responses from multiple clouds (AWS, GCP), reducing rate limiting errors.
Microsoft
2017 - 2021
Software Engineer II
2019 - 2021
Worked on the Bing Local Search team. Deployed ML models for metadata inference, improving recall while maintaining high accuracy.
Software Engineer I
2017 - 2019
Built scalable ETL pipelines for processing location data.
Latest from the Blog
Practical insights on operations, observability, and engineering leadership.
March 1, 2026
Learnings from using Claude for PR reviews
Don’t let PR review be the frontier in agentic engineering that slows you TL;DR Utilizing claude-code-action in a PR workflow (i.e. GitHub Actions) su...
February 26, 2026
CI/CD tiered rollouts to control blast radius
Deploy code to production gradually across regions, not all at once. A tiered rollout strategy with CI/CD job dependencies limits blast radius while k...
February 20, 2026
Canary Deployments with Argo
How to implement progressive delivery with Argo Rollouts, including canary strategy configuration, automated rollback with Prometheus-backed AnalysisT...
February 16, 2026
I Called My Claude Coding Agent Incompetent
Does being rude to your AI coding agent actually hurt its performance? I ran an experiment with Claude Opus 4.6 to find out.
February 9, 2026
Observability 101: Start with Logs
Start your observability journey with logs, not complex distributed tracing. A practical guide to building effective monitoring with tools you already...
February 6, 2026
Rollback First, Ask Questions Later
The fastest way to resolve production incidents: rollback first, investigate later. Lessons from a principal engineer on reducing MTTR.
Technical Skills
Tools and technologies I use to build and operate reliable systems at scale.
Languages
Tools
Platforms
Databases
Ready to Scale Your Operations?
Let's talk about where your company is today, where you want to be, and how to get there without sacrificing speed or quality. Book a free discovery call to explore how we might work together.
What to expect on our call:
- Discuss your current operational challenges
- Identify quick wins and high-impact improvements
- Explore engagement options that fit your needs
Book a Discovery Call
30 minutes to explore how I can help your team
Schedule Your Call
Click below to see available times and book a 30-minute discovery call.
Book a Discovery Call
Common Questions
What size companies do you work with?
I focus on small and medium businesses: tech startups that have found product-market fit and need to operationalize for growth, and other businesses with software development needs that want to use best practices to move faster.
What's the typical engagement length?
Project-based work usually runs 2-4 months. Fractional engagements are ongoing, typically 2-3 days per week, with a minimum 3-month commitment.
Do you work remotely?
Yes, I work remotely with companies globally. I'm flexible on overlap hours to accommodate different time zones.
What if we're not sure what we need?
That's exactly what the discovery call is for. I'll ask questions, listen, and help identify where the biggest opportunities are. No commitment required.