Host Commentary

For this Ship It Weekly episode, I wanted to lean into something I think a lot of us are feeling right now.

The “hard” parts of infra are still hard, but the stuff that keeps biting teams is the glue. The little config assumptions. The trigger rules. The cert renewals that “should be automated.” The Helm defaults everyone copy-pasted three years ago. The automation tool that quietly turned into a control plane holding every credential.

That’s basically the connective tissue for all four stories.

CodeBreach is the cleanest example of “glue is the boundary.” AWS’s bulletin is pretty straightforward, but the bigger takeaway is the part nobody wants to admit: CI triggers are now authorization. If an untrusted event can cause a trusted pipeline to run, you’ve built an execution environment exposed to the internet, and you’re hoping the YAML doesn’t have a footgun in it. That’s why the Wiz angle matters too. It frames how quickly “just CI” can become “supply chain.”
AWS bulletin: https://aws.amazon.com/security/security-bulletins/2026-002-AWS/
Wiz research: https://www.wiz.io/blog/wiz-research-codebreach-vulnerability-aws-codebuild
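The "triggers are authorization" point can be made concrete with a toy policy check. To be clear, none of the names below are AWS's API; this is a hypothetical sketch of the decision a webhook filter group is implicitly making, with made-up event and actor names:

```python
# Hypothetical sketch: a CI trigger is an authorization decision.
# The event names and allowlists below are invented for illustration;
# real CodeBuild webhook filter groups express the same idea declaratively.

TRUSTED_EVENTS = {"push", "pull_request_merged"}   # events that may run the pipeline
TRUSTED_ACTORS = {"release-bot", "core-team"}      # identities allowed to cause them

def should_run_pipeline(event: str, actor: str, target_branch: str) -> bool:
    """Deny by default: run only for trusted events, trusted actors,
    and protected branches. Anything else is untrusted input reaching
    a trusted execution environment."""
    return (
        event in TRUSTED_EVENTS
        and actor in TRUSTED_ACTORS
        and target_branch == "main"
    )

# An unreviewed event from an unknown fork never reaches the trusted pipeline:
assert not should_run_pipeline("pull_request_opened", "random-fork", "main")
assert should_run_pipeline("push", "release-bot", "main")
```

The point of writing it as code: if any branch of that boolean is missing from your actual trigger config, the internet can run your pipeline.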

The Bazel cert expiry story is the other flavor of glue failure. It’s boring, it’s binary, and it’s absolutely capable of blocking an entire engineering org. The part I keep coming back to is that “auto-renew” is not a system. It’s a single step. The real system is issuance, renewal, deployment, reload, and verification, with monitoring that checks what users actually hit. Most orgs have two of the five and call it done.
Postmortem: https://blog.bazel.build/2026/01/16/ssl-cert-expiry.html
Extra context that pairs well with this: https://surfingcomplexity.blog/2025/12/27/the-dangers-of-ssl-certificates/
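The "verification" step, checking the cert users actually hit rather than the one sitting in the renewal pipeline, can be sketched with Python's stdlib `ssl` module. The host name and warning threshold here are placeholders:

```python
import ssl
import socket
from datetime import datetime, timezone

def days_until_expiry(not_after: str) -> float:
    """Parse the 'notAfter' string from a peer cert (OpenSSL text format,
    e.g. 'Jan 16 00:00:00 2026 GMT') and return days remaining (negative
    if already expired)."""
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    return (expires - datetime.now(timezone.utc)).total_seconds() / 86400

def check_served_cert(host: str, port: int = 443, warn_days: float = 14) -> float:
    """Connect the way a user would and inspect the cert actually served."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    remaining = days_until_expiry(cert["notAfter"])
    if remaining < warn_days:
        print(f"WARNING: {host} cert expires in {remaining:.1f} days")
    return remaining
```

Run something like this from outside your network on a schedule. It catches the cases internal renewal dashboards miss, like a renewed cert that never got deployed or reloaded.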

Helm chart reliability is the one that feels less spicy, but it’s probably the most “this will save you pain next month” story. Prequel’s report reads like a postmortem generator because these are exactly the failure modes that stretch incidents out. Missing requests mean unpredictable scheduling and throttling. Weak probes mean traffic going to pods that aren’t actually ready. Missing disruption guardrails mean node drains can turn into brownouts. And the worst part is how often Kubernetes looks “green” while your app is melting.
Report: https://www.prequel.dev/blog-post/the-real-state-of-helm-chart-reliability-2025-hidden-risks-in-100-open-source-charts
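Two of those failure modes are easy to lint for yourself. A minimal, dependency-free sketch that checks a pod spec (as a parsed dict) for missing resource requests and missing readiness probes; the function and the demo manifest are my own, not from the report:

```python
def lint_pod_spec(pod_spec: dict) -> list[str]:
    """Flag two of the failure modes the report calls out: containers
    without resource requests and containers without a readiness probe.
    (Disruption guardrails like a PodDisruptionBudget live alongside the
    workload, so checking for them needs the whole chart, not one spec.)"""
    problems = []
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        if not c.get("resources", {}).get("requests"):
            problems.append(f"{name}: no resource requests (unpredictable scheduling)")
        if "readinessProbe" not in c:
            problems.append(f"{name}: no readinessProbe (traffic hits unready pods)")
    return problems

# A demo-friendly default, the kind that looks fine until real load:
bad = {"containers": [{"name": "web", "image": "web:1"}]}
print(lint_pod_spec(bad))
```

In practice you'd run this over `helm template <chart>` output parsed with a YAML library, in CI, before the chart ever hits a cluster.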

Then n8n. This one keeps coming up on the show for a reason. Workflow automation tools feel harmless until you remember what they really are. They’re a control plane that holds credentials and runs code on behalf of humans. So when you see sandbox escape + code execution, “authenticated” should not make you relax. It should make you ask who can author workflows and what those workflows can touch. Patch fast, narrow authorship, reduce exposure, and least-privilege the credentials it holds.
The Hacker News coverage: https://thehackernews.com/2026/01/two-high-severity-n8n-flaws-allow.html
Canadian advisory (nice summary + tracking): https://www.cyber.gc.ca/en/alerts-advisories/al26-001-vulnerabilities-affecting-n8n-cve-2026-21858-cve-2026-21877-cve-2025-68613
JFrog technical write-up: https://research.jfrog.com/post/achieving-remote-code-execution-on-n8n-via-sandbox-escape/

If you listened to the last two n8n CVE episodes, this is the same theme, just the “next” set of vulns. It’s not about n8n being uniquely bad. It’s that the category is high leverage, and if it’s inside your network with credentials, you treat it like production software.
Past episode 1: S1E12 (Jan 9, 2026, 16:18): n8n Critical CVE (CVE-2026-21858), AWS GPU Capacity Blocks Price Hike, Netflix Temporal
Past episode 2: S1E14 (Jan 16, 2026, 12:28): n8n Auth RCE (CVE-2026-21877), GitHub Artifact Permissions, and AWS DevOps Agent Lessons

Lightning round was intentionally “operator-friendly” this week.

Fence is the kind of primitive I want more of. Safe-by-default beats clever-by-default, especially if you’re experimenting with agents or runbooks that execute.
Fence: https://github.com/Use-Tusk/fence

HashiCorp agent-skills is interesting because it’s a vendor leaning into guardrails and reusable skills instead of “just prompt better.” That’s the right direction if we’re going to operationalize agent workflows without lighting things on fire.
agent-skills: https://github.com/hashicorp/agent-skills

marimo hits a real pain point for me: incident analysis and ops experiments are genuinely useful in notebook form, but notebook JSON is awful to version and review. “Notebook, but it’s just Python files” is a good trade.
marimo: https://marimo.io/

And the Claude loop story is funny, but it’s also the warning label. People are already building “keep going until it works” loops. If you don’t put constraints and verification around that, you get confident nonsense at scale. The failure mode is basically alert fatigue, but for decisions.
The Register: https://www.theregister.com/2026/01/27/ralph_wiggum_claude_loops/
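The fix for "keep going until it works" loops is old-fashioned: a budget and an independent verifier. A minimal sketch of that shape; the `attempt` and `verify` callables are placeholders for whatever your agent or runbook actually does:

```python
def run_with_guardrails(attempt, verify, max_attempts: int = 5):
    """Run `attempt` at most `max_attempts` times, accepting a result
    only when an independent `verify` check passes. Without the budget
    you get an unbounded loop; without `verify` you get confident
    nonsense at scale."""
    for i in range(1, max_attempts + 1):
        result = attempt()
        if verify(result):
            return result
        print(f"attempt {i} failed verification, retrying")
    raise RuntimeError(f"no verified result after {max_attempts} attempts")

# Toy example: a flaky producer plus a strict, separate checker.
counter = {"n": 0}
def flaky():
    counter["n"] += 1
    return counter["n"]

print(run_with_guardrails(flaky, verify=lambda r: r >= 3))  # third try passes
```

The key design point is that `verify` is not the same code path that produced the result. The loop only earns trust if the check is independent.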

So yeah. Big picture, I think this is the real job now. Not “automate everything.” It’s “treat guardrails like product work.” Because the moment your glue gets fast and automated, failures get fast too. And if you only build accelerators, you’re not building a better platform. You’re building a faster incident.

More episodes and links live here: https://shipitweekly.fm

Show Notes

This week on Ship It Weekly, Brian looks at four “glue failures” that can turn into real outages and real security risk.

We start with CodeBreach: AWS disclosed a CodeBuild webhook filter misconfig in a small set of AWS-managed repos. The takeaway is simple: CI trigger logic is part of your security boundary now.

Next is the Bazel TLS cert expiry incident. Cert failures are a binary cliff, and “auto renew” is only one link in the chain.

Third is Helm chart reliability. Prequel reviewed 105 charts and found a lot of demo-friendly defaults that don’t hold up under real load, rollouts, or node drains.

Fourth is n8n. Two new high-severity flaws disclosed by JFrog. “Authenticated” still matters because workflow authoring is basically code execution, and these tools sit next to your secrets.

Lightning round: Fence, HashiCorp agent-skills, marimo, and a cautionary agent-loop story.

Links

AWS CodeBreach bulletin https://aws.amazon.com/security/security-bulletins/2026-002-AWS/

Wiz research https://www.wiz.io/blog/wiz-research-codebreach-vulnerability-aws-codebuild

Bazel postmortem https://blog.bazel.build/2026/01/16/ssl-cert-expiry.html

Helm report https://www.prequel.dev/blog-post/the-real-state-of-helm-chart-reliability-2025-hidden-risks-in-100-open-source-charts

n8n coverage https://thehackernews.com/2026/01/two-high-severity-n8n-flaws-allow.html

Fence https://github.com/Use-Tusk/fence

agent-skills https://github.com/hashicorp/agent-skills

marimo https://marimo.io/

Agent loop story https://www.theregister.com/2026/01/27/ralph_wiggum_claude_loops/

Related n8n episodes:

S1E12 (Jan 9, 2026, 16:18): n8n Critical CVE (CVE-2026-21858), AWS GPU Capacity Blocks Price Hike, Netflix Temporal

S1E14 (Jan 16, 2026, 12:28): n8n Auth RCE (CVE-2026-21877), GitHub Artifact Permissions, and AWS DevOps Agent Lessons

More episodes + details: https://shipitweekly.fm