About Brian Teller
The voice behind Ship It Weekly.
- 46 Episodes published
- 25+ Years in production
- 2 Industry ambassadorships
Why Ship It Weekly exists
Brian started Ship It Weekly because most tech news does a decent job saying what happened, but not always why it matters to the people who actually have to run the systems afterward. A new cloud feature, a GitHub outage, a security advisory, an AI tooling release, a supply chain incident, a Kubernetes change, or a weird platform failure can all sound interesting in a headline. For the people on-call, managing infra, supporting developers, watching costs, and trying not to break production on a Friday, the real question is usually simpler: What does this mean for my team? That is the lens Brian brings to every episode.
Brian has spent years working in real production environments across cloud, infrastructure, automation, and reliability. Day-to-day work has covered Terraform and Terragrunt, AWS, Kubernetes, EKS, Kafka, CI/CD, GitHub, incident response, infrastructure guardrails, cost management, platform patterns, and the messy operational details that rarely show up in clean conference demos. Brian has worked as both a hands-on engineer and a technical mentor, helping teams make better decisions around infrastructure design, reliability, security, and delivery.
Ship It Weekly is built for the engineers, SREs, platform teams, DevOps folks, cloud engineers, technical leaders, and curious practitioners who want more than a headline recap. The show filters the noise down to the stories that matter for infrastructure, reliability, security, cost, engineering workflows, and production operations. Some weeks that means a major outage. Some weeks it means digging into GitHub Actions, AI agents, Terraform, Kubernetes, AWS, supply chain risk, or why a "small" tooling change can become a very big production problem.
Brian's style is practical, opinionated, and grounded in the reality of doing the work—not trying to sound like an analyst reading market notes. The focus is tradeoffs, failure modes, operational risk, team impact, and the "okay, what should we actually do with this?" part that often gets skipped.
That usually means asking questions like: Does this change how we build pipelines? Does this affect our blast radius? Should we be tightening permissions? Is this a real platform shift, or just vendor noise? What would I want my team to know before this becomes our incident?
Outside Ship It Weekly, Brian creates DevOps and cloud engineering content through Teller's Tech, with a focus on practical education instead of lab-only demos. The goal is to help engineers connect concepts to the kind of decisions they actually face in production: how to structure Terraform, how to think about reliability, how to avoid fragile automation, how to use AI without outsourcing judgment, and how to build systems that teams can safely operate over time.
At its core, Brian's work is about making infrastructure and operations conversations more useful. Less hype. Less vague "best practices." More context, more judgment, and more respect for the people carrying the pager.
Track record: production systems, leadership, and communication
Production infrastructure
Brian has co-led large-scale AWS → GCP migrations and owned Kafka platform engineering through vendor transitions (Confluent Cloud → MSK → Confluent Cloud with private networking), including Schema Registry and MirrorMaker2 / Replicator during cutover windows. He has supported public-company readiness from an infrastructure and platform angle, run SOC2 evidence programs across multiple audit cycles, and helped prepare teams for SOX expectations. Earlier in his career he operated AWS at depth and contributed heavily to PCI and SOC2 programs in HIPAA-certified environments, and he has led disaster recovery work with explicit RTO/RPO targets. As CTO of a digital-signage company, he led a zero-downtime migration to the cloud.
Leadership under pressure
Brian has operated as CTO and engineering manager, leading technical programs where clarity and accountability matter as much as architecture. Earlier leadership experience included running high-volume restaurant operations with teams of up to ~60. Different domain, same lessons in shift orchestration, standards under stress, and handling incidents that break in front of the customer.
Early technical roots
Brian's first paid work in tech came during high school at fred.net (later xecu.net), a Frederick-area dial-up ISP, where he handled front-line support, Unix administration, colo, and customer website hosting. In high school he also managed Unix mail servers and was president of the web club. He has been doing production-minded tech work since before DevOps was a job title.
Broadcast and communication
Brian came up through radio: a college show on XTSR at Towson University with a dormmate; a run from intern to associate morning show producer at a major Washington, DC station (then Z104 / WWVZ–WWZZ); hosting evenings (7–11pm) and Sunday mornings on Key 103.1 in Frederick, Maryland; and years as a mobile DJ for schools, weddings, and events. That work taught live timing, reading a room, and staying composed when things break on air. Photos and audio from that era are below in On air & behind the mic.
Outside work
Brian coaches youth football. Married, four kids. Fundamentals, repetition, and calm communication carry the same whether you are on a sideline, in a war room, or at the dinner table.
On air & behind the mic
Before infrastructure leadership and podcast hosting, Brian came up through college radio, major-market production in Washington, DC, and Frederick’s Key 103.1 — plus years as a mobile DJ. The photos and audio clips below are archive samples from that era (roughly 15–20 years ago). For how he sounds today on DevOps and platform topics, listen to Ship It Weekly.
Audio samples
- XTSR promo
Junior year at Towson University — college show on XTSR with a dormmate.
- Quiznos commercial
Key 103.1, Frederick, Maryland — on-air commercial voice work.
- Carroll Manor Fire Company commercial
Key 103.1, Frederick, Maryland — on-air commercial voice work.
- Pentagon area report (September 13, 2001)
Z104 (Washington, DC) — news intern coverage after 9/11, including a bomb-threat situation on 9/13/2001.
What I cover on Ship It Weekly
- DevOps
- Site Reliability Engineering
- Platform Engineering
- AI in Operations
- Cloud Engineering
- CI/CD
- Observability
- Incident Response
- Kubernetes
- Infrastructure as Code
- Production Engineering
Available for talks & engagements
Brian also speaks at conferences, internal engineering all-hands, leadership offsites, and on podcasts — same operator-focused lens, in person.
Currently writing
Confidently Wrong — a practical book for DevOps, SRE, platform, and infrastructure engineers on using AI safely across Terraform, Kubernetes, CI/CD, agentic workflows, and operational decisions. Same operator lens you hear on the show, in long form.
Building in the labs
Teller's Tech Labs is where Brian builds practical tools and training systems for DevOps, SRE, platform engineering, and AI-era operational judgment. The flagship is Code Duck, an AI-driven incident simulator that helps engineers practice production judgment without breaking production. It is currently in early access.
Also runs
lmgt.org and lmgt.com — long-running “let me google that” pages that send a steady trickle of search traffic back to Ship It Weekly and this bio. Built once, maintained occasionally, useful indefinitely.
Recent episodes
Ship It Conversations: Meta’s Francois Richard on AI Incident Response, SLOs, and Reliability at Scale
In this episode, Francois Richard from Meta discusses the evolving landscape of reliability at scale, particularly with AI's impact on production risks. He emphasizes the importance of recovery practices alongside prevention, and how SLOs should reflect a commitment to users.
Coinbase Outage, Meta AI Account Recovery, AWS AgentCore Code Injection, Apigee Tenant Isolation, and the Glue That Breaks Production
This episode of Ship It Weekly discusses critical infrastructure failures and their implications. Brian analyzes Coinbase's outage due to an AWS cooling failure, Meta's AI-driven account recovery issues, and vulnerabilities in AWS AgentCore and Google Apigee.
Kiro CLI Approval Bypass, Amazon Braket Pickle Risk, AWS Org Logging, KEDA Upgrades, and Automation’s Hidden Boundaries
This episode of Ship It Weekly explores automation's hidden boundaries, focusing on Kiro CLI's CVE-2026-9255 approval bypass and Amazon Braket's Python pickle risk.
Listen wherever you get podcasts
Never miss an episode
New episodes weekly. Real conversations and news for engineers running production systems.
Find Ship It Weekly on your platform →
Work with me
Be a Guest
Want to share your DevOps journey on Ship It Weekly? We're looking for passionate engineers to interview!
Apply to be a Guest →
Become a Sponsor
Reach thousands of DevOps, Platform, and Cloud Engineering professionals. Partner with Ship It Weekly!
Talk Sponsorship →
Connect with Teller's Tech
Follow the show on the platforms where the conversation actually happens.
Practitioner conversations
Where Brian shows up in practitioner threads: recent Reddit replies from /u/tellerstech across the engineering subreddits. Comments only — episode posts and self-promotion threads are filtered out.
- Re: What do you guys recommend for rightsizing and autoscaling workloads in k8s?
Yeah by locked down I mostly mean dont let Traefik make everything accidentally public lol.
For Traefik I’d look at middlewares: IPAllowList, BasicAuth, ForwardAuth/oauth2-proxy, rate limits, headers, etc. Also make sure the dashboard/API isnt exposed to the internet.
My rule is public apps can be public. Admin stuff, dashboards, db tools, metrics, etc should be VPN/Tailscale only or behind real auth.
View thread on Reddit →
- Re: What do you guys recommend for rightsizing and autoscaling workloads in k8s?
Yeah def learn both, but they solve different problems.
NetworkPolicies help with pod-to-pod traffic, like “this random compromised app pod shouldnt be able to talk to my db.” RBAC is more about what pods/users/service accounts can do against the Kubernetes API.
For blast radius I’d prob do… default deny network policies, only allow what each app actually needs, separate namespaces, dont use default service accounts, turn off automount service account tokens where you d...
View thread on Reddit →
- Re: What do you guys recommend for rightsizing and autoscaling workloads in k8s?
For a hobby cluster I’d prob start with Goldilocks or VPA in recommend mode. Easy enough to mess with and it’ll give you decent starting points.
If you already have Prometheus running, Robusta KRR is cool too.
Only thing I’d be careful with is limits. These tools are usually better at suggesting requests than giving you perfect limits. Memory limits especially can bite you if you get too cute.
View thread on Reddit →
- Re: What do you guys recommend for rightsizing and autoscaling workloads in k8s?
Yep exactly, this is the better way to say what I meant. I’ve just seen too many people see 137 and instantly go “OOM” and then stop digging.
Half the battle is getting folks to look at the pod status/events instead of treating exit codes like the whole RCA lol.
View thread on Reddit →
- Re: How do you track FinOps recommendation ownership after your cost tool finds savings?
Yep thats been what I’ve seen too…
Tags get you to “ok this probably belongs to this team/app”, but thats the easy part. The messy bit is getting someone to actually pick it up, agree the change wont break stuff, and then check if the bill actually moved after.
Also some cost recs look great in the tool, then once you talk to the app team it’s like “oh yeah we cant do that because reasons” lol. So having a real owner to sanity check it matters a lot.
View thread on Reddit →
- Re: Is there anyone else struggling with DevOps hiring timelines? I need HELP
Yeah this doesnt usually “sort itself out” imo. The missing infra person just becomes hidden tax on everyone else.
That $200k looks saved until you add slower deploys, weekend debugging, half-owned reliability work, and product engineers context switching into stuff they dont really want to own.
I’d push to either speed up the hire, bring in a contractor for the worst pain, or make the scope smaller and more realistic. Otherwise you’re just paying for it in worse ways.
View thread on Reddit →
- Re: How do you track FinOps recommendation ownership after your cost tool finds savings?
Not a FinOps engineer, but I’ve been pulled into a bunch of this from the infra side.
IMO it starts with tagging. Resource tags are table stakes, but getting down to k8s workloads / apps / teams is way better. Same idea for Airflow DAGs or any shared compute thing. If nobody can tell who owns the cost, the recomendation just dies.
For tracking, Jira usually works better than a spreadsheet once it’s actionable. Owner, status, PR links, comments, all that.
For proving savings...
View thread on Reddit →
- Re: What do you guys recommend for rightsizing and autoscaling workloads in k8s?
Been through this a bit. I’d start small and prove safety first, not try to rightsize the whole cluster at once.
Pick a few low-risk services, look at p95/p99 over a real window, then lower requests slowly with some headroom. Memory I’d be way more careful with than CPU. Dont go from 20GB to 10GB just because the graph says it peaks at 10GB.
If you’re on AWS, Karpenter is a solid option for node autoscaling. For workloads, HPA is fine for boring CPU/memory stuff, KEDA is...
View thread on Reddit →
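The rightsizing approach in the last reply (look at p95/p99 over a real window, then lower requests slowly with some headroom) could be sketched roughly as below. This is a minimal illustration and not something taken from the thread: the function names `percentile` and `next_request`, the 30% headroom, and the 20% maximum reduction per change are all assumptions picked for the example, and the usage samples are made up.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of usage samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def next_request(current, samples, pct=99, headroom=0.3, max_step=0.2):
    """Propose the next resource request (hypothetical helper).

    The target is the chosen percentile of observed usage plus headroom.
    If the target is below the current request, move toward it by at most
    max_step (a fraction of the current request) per change. If the target
    is at or above the current request, return the target so an undersized
    workload is not left starved.
    """
    target = percentile(samples, pct) * (1 + headroom)
    if target >= current:
        return target
    floor = current * (1 - max_step)
    return max(target, floor)


if __name__ == "__main__":
    # Illustrative memory usage samples in GB, peaking at 10 GB.
    usage_gb = [6, 7, 8, 9, 10]
    print(next_request(20, usage_gb))
```

With a 20 GB request and usage that peaks around 10 GB, `next_request(20, [6, 7, 8, 9, 10])` proposes 16 GB rather than jumping straight to the 13 GB target, which leaves room to watch pod status and events between steps.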