Blog

Thoughts on operations, engineering leadership, and scaling tech companies.

Learnings from using Claude for PR reviews

March 1, 2026

Don’t let PR review be the frontier in agentic engineering that slows you down. TL;DR: Using claude-code-action in a PR workflow (i.e. GitHub Actions) lets you run code reviews on a PR automatically...

claude-code, code-review, github-actions

CI/CD tiered rollouts to control blast radius

February 26, 2026

Deploy code to production gradually across regions, not all at once. A tiered rollout strategy with CI/CD job dependencies limits blast radius while keeping deployment velocity high.

DevOps, CI-CD, progressive-delivery

Canary Deployments with Argo

February 20, 2026

How to implement progressive delivery with Argo Rollouts, including canary strategy configuration, automated rollback with Prometheus-backed AnalysisTemplates, and practical considerations for first deploys and low-traffic services.

canary-deployments, argo, kubernetes

I Called My Claude Coding Agent Incompetent

February 16, 2026

Does being rude to your AI coding agent actually hurt its performance? I ran an experiment with Claude Opus 4.6 to find out.

claude-code, ai-agent, ai

Observability 101: Start with Logs

February 9, 2026

Start your observability journey with logs, not complex distributed tracing. A practical guide to building effective monitoring with tools you already have.

logs, alerts, opensearch

Rollback First, Ask Questions Later

February 6, 2026

The fastest way to resolve production incidents: roll back first, investigate later. Lessons from a principal engineer on reducing MTTR.

operational-excellence, mttr, incident-response

Your Alerts Are Lying to You: Why More Monitoring Won’t Save Your On-Call Engineers

February 3, 2026

Why adding more monitoring won't fix your on-call problems. The uncomfortable truth about alerting strategies and how to build alerts that actually matter.

software-engineering, operational-excellence, prometheus

Grafana vs. Prometheus Agent

May 2, 2023

Compare Grafana Agent and Prometheus Agent for metrics collection. Key differences in architecture, use cases, and when to choose each for your observability stack.

grafana, observability, grafana-agent

Think about solutions, not code

May 13, 2018

Before writing code, understand the problem. How design thinking and solution-first approaches lead to better software engineering outcomes.

code, design-thinking, education