CISA’s GitHub Leak, AI Root Cause Analysis, Copilot Agents, Claude Code in CI/CD, and Kubernetes Seccomp Risk

Transcript

A repo named Private-CISA was reportedly public

on GitHub, which is a pretty painful reminder

that naming something private is not access control.

This week, we've got leaked AWS keys, internal

deployment docs, Terraform, Kubernetes manifests,

Argo CD files, AI root cause analysis, agents

clicking through legacy apps, Claude Code showing

up in CI/CD, and another Kubernetes security reminder

hiding in a boring default. The theme is pretty

simple. Automation is getting more powerful.

But we are still leaking secrets, missing defaults,

and writing postmortem action items that quietly

go to die. I'm Brian Teller from Teller's Tech,

and this is Ship It Weekly. Welcome back to Ship

It Weekly, the show where we look at the DevOps,

SRE, cloud, platform, and security stories that

actually matter when you are the person who eventually

has to keep the thing running. This week, we're

starting with the CISA contractor GitHub leak.

Then we'll get into AWS DevOps Agent and automated

root cause analysis, Microsoft Copilot Studio

computer-using agents, Atlassian adding Claude

Code to Bitbucket Pipelines, and CVE-2026-46333

with the Kubernetes seccomp angle. Then in the

lightning round, we'll hit GitHub expanding OIDC

support for Dependabot and code scanning, Java

pods getting OOMKilled even when heap looks

fine, and why LLM-generated SQL can be wrong

in ways that still run. And for the human closer,

we'll talk about why postmortem action items

die. So let's get into it. First up, the big

one. A contractor for CISA reportedly had a public

GitHub repository named Private-CISA. Again,

Private-CISA. That is like naming an S3 bucket

definitely-not-customer-data and hoping AWS respects

the vibe. GitGuardian says it found the public

repo on May 14th containing around 844 megabytes

of exposed data. Reporting included plain text

passwords, AWS tokens, Entra ID SAML certificates,

and internal material tied to CISA systems. And

this is the part that matters for us. This was

not just one forgotten .env file. The reporting

describes CI/CD logs, Kubernetes manifests, Argo

CD files, Terraform code, GitHub Actions workflows,

deployment details, and internal docs. This is

not only a secret leak. This is context leakage.
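
To make the secret-versus-context distinction concrete, here is a minimal Python sketch that walks a checked-out repo and sorts findings into two buckets. It assumes long-lived AWS access key IDs look like `AKIA` plus 16 uppercase letters or digits. The file-name lists and the `scan_tree` helper are hypothetical, made up for illustration, and not anything from the reporting or from GitGuardian.

```python
import re
from pathlib import Path

# Assumption: long-lived AWS access key IDs start with "AKIA" followed by
# 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Hypothetical lists of files that carry operational context rather than
# secrets as such. Extend them for your own environment.
CONTEXT_NAMES = ("terraform.tfstate", "kubeconfig", ".env")
CONTEXT_SUFFIXES = (".tf", ".pem", ".zip")


def scan_tree(root: Path) -> dict[str, list[str]]:
    """Return relative paths grouped into 'secret' and 'context' findings."""
    findings: dict[str, list[str]] = {"secret": [], "context": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        # Context: files whose name or extension suggests infrastructure detail.
        if path.name in CONTEXT_NAMES or path.suffix in CONTEXT_SUFFIXES:
            findings["context"].append(rel)
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        # Secret: file content that contains something shaped like a key ID.
        if AWS_KEY_ID.search(text):
            findings["secret"].append(rel)
    return findings
```

A real scanner does far more than this. The point of the sketch is only that the "context" bucket deserves its own report, even when the "secret" bucket comes back empty.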

Credentials are bad. AWS tokens are bad. Plain

text passwords are bad. But when you also leak

deployment docs, infrastructure code, manifests,

CI/CD logs, and internal workflows, you are giving

someone more than a key. You are giving them

the floor plan. Now, we should be careful. From

the outside, we do not know the full operational

impact. Public reporting says CISA was notified

and the repo was taken down. But the lesson is

already clear. Secrets scanning matters, but it

is not enough. The repo should not have been

public. The archive should not have been there.

The credentials should not have been valid. And

once operational context leaks, the response

cannot just be rotate the key and move on. You

need to ask what the leaked context teaches an

attacker. Most teams hear a story like this and

think, wow, how does this happen? And then somewhere

in their own org, there is a repo called

old-prod-migration-backup-final, last touched by a

contractor in 2022, containing a zip file nobody

wants to open because it feels cursed. Most of

us have some version of this risk. Old repos,

personal forks, contractor projects, build logs,

Terraform state, kube configs, migration scripts,

weird backup folders. So the takeaway is not

just don't commit secrets. That is true, but

it is table stakes. The better takeaway is to

inventory the weird stuff. Look for old repos,

archived repos, contractor-owned repos, personal

forks, internal examples, demo projects, and

places where somebody may have dumped operational

context while trying to move fast. And when a

leak happens, treat the context as compromised

too, because sometimes the key is the headline,

but the deployment map is the real prize. Next

up, AWS published a post on using AWS DevOps

Agent to automate root cause analysis across

Datadog and Elasticsearch with CloudTrail and

EKS involved too. This one sits right at the

intersection of useful and slightly unsettling.

The AWS example starts with a Datadog alert.

AWS DevOps Agent gets access to EKS, so it can

describe Kubernetes objects, pull pod logs, and

look at cluster events. It also correlates Elasticsearch

logs, Datadog metrics, and CloudTrail deployment

events to figure out what changed. And honestly,

that sounds a lot like incident response. You

start with one symptom. Then you try to rebuild

the timeline. What changed? What deployed? Which

pod restarted? What metric moved first? Did someone

push a config change? Did a dependency decide

today was a great day to build character? Half

of incident response is not fix the thing. Half

of it is reconstructing the story fast enough

that you fix the right thing. So if an agent

can gather context and build a decent first-pass

timeline, that is useful. Logs over here. Metrics

over there. CloudTrail in another tab. Kubernetes

events in a terminal. Datadog on one monitor

and Slack on another, and someone asking for an update

every 90 seconds, which is understandable and

still emotionally damaging. But automated RCA

can become very convincing very quickly. A tool

that says probable root cause with a clean summary

can easily become the thing everybody believes,

especially when the incident channel is noisy

and everybody is tired. So I like the direction,

but I would treat AI RCA like a fast incident

scribe or junior investigator. Great at pulling

threads. Useful for assembling context. Helpful

for reducing time to understanding. But not the

final authority on causality. Humans still need

to ask, is this correlation or cause? Did this

happen before or after customer impact? What

else changed? What evidence would prove this

wrong? Because if the incident review action

item is that AI said it was Elasticsearch, so

we restarted Elasticsearch, that is not RCA.

That is vibes with a dashboard. Before we get

to the next story, a quick note from this week's

sponsor, Guardsquare. If you are building mobile

apps, good enough security is usually a problem

waiting to happen. Guardsquare focuses on actually

protecting your code in addition to scanning

it. That means code hardening, runtime protection,

testing, and visibility into what's happening

once your app is out in the wild. So if you are

responsible for shipping and securing mobile

apps, Android or iOS, it is definitely worth taking

a look at guardsquare.com. Alright, back to

the show. Third story. Microsoft says computer-using

agents in Copilot Studio are now generally

available. This one is interesting because it

changes the automation boundary. A lot of automation

assumes there is an API. You call an endpoint.

You get a response. You wire it into a workflow.

Everyone pretends the internal CRM is not held

together by three workflows, a CSV export, and

one person named Linda. Computer-using agents

are different. Microsoft describes these agents

as interacting with graphical user interfaces,

websites, desktop apps, screens, buttons, forms.

So instead of saying there is no API, so we cannot

automate this, the pitch becomes the agent can

use the UI like a human. And look, I get why

this is appealing. Every company has legacy systems.

Every company has some vendor portal that looks

like it was designed during the emotional low

point of enterprise software. Every company has

workflows where someone copies data from one

screen into another and calls it a process. If

an agent can take some of that away, great. Nobody

needs a fulfilling career in manually clicking

invoice screens. But this is also where the risk

gets weird. API automation usually gives you

structure: scopes, endpoints, logs, schemas.

UI automation is messier. The agent is looking

at screens, reading labels, clicking buttons,

entering text, and deciding what to do next.

Which means your automation path may now include

a model looking at a webpage and deciding which

button seems right. That sounds funny until the

button says submit payment, delete record, approve

request, or yes, I understand this is permanent.

So the governance questions matter. What apps

can the agent access? What account does it use?

Can it reach production admin screens? Are actions

auditable? Can it pause before destructive steps?

What happens if the UI changes slightly? And

who owns the outcome when it clicks the wrong

thing? Computer-using agents may become useful

because plenty of enterprise systems will never

get good APIs. But when the agent operates through

a UI, treat that UI like an automation interface.

Use restricted accounts. Use test environments.

Use approval gates. Log actions. Limit destructive

workflows. And be very suspicious of anything

involving bulk update. A computer-using agent

is not just a better macro. It is automation

with eyeballs. And like most things with eyeballs

in enterprise software, it probably needs supervision.
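
One way to picture restricted apps, approval gates, and action logging together is a small gate that sits between the agent and the click. This is a sketch under stated assumptions: `UiAction`, `ActionGate`, and the keyword list are all hypothetical names invented here, and none of this is a Copilot Studio API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keyword list. A real deployment would tune this per application.
DESTRUCTIVE_WORDS = ("delete", "submit payment", "approve", "bulk update", "permanent")


@dataclass
class UiAction:
    app: str    # which application the agent is driving
    label: str  # text on the button or control it wants to click


@dataclass
class ActionGate:
    allowed_apps: set[str]
    audit_log: list[str] = field(default_factory=list)

    def decide(self, action: UiAction, approved_by: Optional[str] = None) -> str:
        """Return 'allow', 'deny', or 'needs_approval' and record the decision."""
        label = action.label.lower()
        if action.app not in self.allowed_apps:
            decision = "deny"
        elif any(word in label for word in DESTRUCTIVE_WORDS) and not approved_by:
            decision = "needs_approval"
        else:
            decision = "allow"
        # Every decision is logged, including the ones that were blocked.
        self.audit_log.append(
            f"{action.app}|{action.label}|{decision}|{approved_by or '-'}"
        )
        return decision
```

Matching on button text is crude, and a UI change can dodge it. That is part of why the restricted account and the test environment still matter even with a gate like this in place.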

Fourth story. Atlassian says agentic pipelines

now support Claude Code as a provider in Bitbucket

Pipelines. This sounds like a feature announcement

until you think about where it lives. It lives

in CI/CD. Atlassian's examples include README

updates, security report triage, feature flag

cleanup, PR descriptions, and other repetitive

engineering chores. And this keeps coming back

to a point I've made before. Developer tooling

is production now. CI/CD is not just the stuff

around production. It is the path code takes

to become production. So when you put agents

inside pipeline workflows, you are not just making

developers faster. You are changing the delivery

path. And yes, some of these tasks are genuinely

annoying. Security report triage, feature flag

cleanup, PR descriptions, documentation updates.

Please take them. Nobody is sitting around hoping

for more stale flags and slightly wrong README

files. But once an AI agent is part of a pipeline,

the boring questions matter. What repository

context does it get? What logs does it see? What

secrets are visible to that step? Can it modify

files? Can it open PRs? Can it change tests?

Can it generate security triage notes that humans

treat as authoritative? Can it make a flaky test

look fixed by weakening the assertion? That last

one is not me being dramatic. If the task is

fix the failing pipeline, a bad agentic workflow

might make the pipeline pass without making the

system better. And yes, humans do this too. We

just call it temporary and let it survive three

reorgs. Atlassian also points users to guidance

around third-party agent providers and data

handling. That matters because if you bring Claude

Code into a pipeline, you need to understand

what code, prompts, logs, and generated context

may leave your environment. That does not mean

do not use it. It means do not discover your

data flow by reading the invoice. Treat agentic

pipeline steps like any other privileged CI step.

Start with low-risk tasks. Avoid secrets. Avoid

production deploy authority. Make outputs reviewable.

Require human approval when the agent changes

code. And document what data goes to the provider.

The agent does not need to be terrifying, but

it also should not be a surprise guest in your

delivery path. Fifth story. Let's talk about

CVE-2026-46333 and Kubernetes seccomp defaults.

This one is more technical, but it's a good grounding

story after all of the AI agent stuff. Because

sometimes the most important security decision

is not a new tool or a new model. Sometimes it

is whether your pods are running with the syscall

profile that you thought they were running with.

NVD tracks CVE-2026-46333 as a Linux kernel

issue in the ptrace path. Qualys describes

it as a local privilege escalation and credential

disclosure issue. The Kubernetes angle is that

unset or unconfined seccomp profiles can leave

pods exposed to the tested path, while RuntimeDefault

blocked the tested pidfd_getfd path. The Pod Security

Standards Restricted profile added more protection as well.

The operator version is simple. Your Kubernetes

security defaults matter. And they matter most

when nobody is thinking about them. Seccomp is

easy to mentally file under container security

stuff we should revisit someday. Which is engineering

speak for future incident seasoning. Kubernetes

docs say that if seccompDefault is enabled,

pods use the RuntimeDefault seccomp profile

when no other profile is specified. Otherwise,

the default is unconfined. That difference matters.

Because unset does not always mean safe. Sometimes

unset means congratulations. You are raw-dogging

syscalls in production. Probably do not put that

in your architecture diagram. So the takeaway

is straightforward. Check whether RuntimeDefault

is actually being applied in your clusters. Review

your pod security standards posture. Know where

unconfined seccomp is allowed. Know where privileged

pods exist. Know which namespaces have exceptions.
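
That checklist can start with something small. The Python sketch below takes one pod object, shaped like an item from `kubectl get pods -o json`, and reports the seccomp profile type each container declares. The function names are hypothetical. It only reads `spec.containers`, not init or ephemeral containers, and it reports what the manifest declares, not what the kubelet actually applied.

```python
def effective_seccomp(pod: dict) -> dict[str, str]:
    """Map each container name to its declared seccomp profile type.

    A container-level securityContext.seccompProfile overrides the pod-level
    one. "Unset" means nothing was declared, so the kubelet default applies.
    """
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}
    pod_type = (pod_ctx.get("seccompProfile") or {}).get("type")
    result = {}
    for container in spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        own_type = (ctx.get("seccompProfile") or {}).get("type")
        result[container["name"]] = own_type or pod_type or "Unset"
    return result


def risky_containers(pod: dict) -> list[str]:
    """Return containers whose declared profile is Unset or Unconfined."""
    return [
        name
        for name, kind in effective_seccomp(pod).items()
        if kind in ("Unset", "Unconfined")
    ]
```

Whether "Unset" is actually a problem depends on whether seccompDefault is enabled on the node, which is exactly the thing worth confirming rather than assuming.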

And if you use managed Kubernetes, do not assume

the provider magically made your pod security

posture sane because the control plane has a

nice logo. Containers share the host kernel.

That is the deal. So when there is a local kernel

bug and your workload security posture allows

the relevant path, the cluster configuration

suddenly matters a lot. The boring defaults are

not boring. They are latent decisions. And every

once in a while, a CVE shows up and asks what

you decided. Now let's do a quick lightning round.

First, GitHub expanded OIDC support for Dependabot

and code scanning. GitHub says that Dependabot

and code scanning now support OpenID Connect

authentication for organization-level private

registries for Cloudsmith and Google Artifact

Registry. The short version is fewer long-lived

registry credentials sitting around as secrets.

And that is good. OIDC-based access is not magic.

But short-lived identity-based auth is usually

healthier than, here's a token, please don't

leak it. Best of luck to everyone involved. Second,

Java pods getting OOMKilled in Kubernetes even

when the heap looks fine. Classic operator trap.

Someone sets -Xmx, looks at heap, and thinks we're

fine. Then Kubernetes kills the pod anyway. And

everyone stares at the graphs like the cluster

betrayed them personally. But JVM heap is not

the whole memory footprint. Metaspace, direct

buffers, thread stacks, native memory, JIT, GC

overhead, and off-heap usage still count towards

the container memory limit. Kubernetes does not

care that your heap looked reasonable. It cares

that the process crossed the limit. So if your

Java pods are getting OOMKilled, do not only

look at -Xmx. Look at total container memory and

leave headroom. Because production loves punishing

tight memory math. Third, LLM-generated SQL

can be wrong in ways that still run. A broken

query that fails loudly is annoying, but at least

you know it failed. A query that returns plausible

nonsense is worse. The dashboard loads. The numbers

look reasonable. Someone puts it in a slide deck.

And now your business metric is powered by a

hallucinated join and a missing filter. So

text-to-SQL needs guardrails. Read-only roles. Query

limits. Schema-aware validation. Known templates

where possible. Human review for anything important.
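
Two of those guardrails, the read-only role and the query limit, fit in a few lines. This is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for whatever database you actually run. The `run_guarded` name is hypothetical, and the text checks are deliberately conservative: they reject any query containing a semicolon, even inside a string literal.

```python
import sqlite3


def run_guarded(db_path: str, query: str, max_rows: int = 100) -> list[tuple]:
    """Run a single SELECT against a read-only connection with a row cap."""
    stripped = query.strip().rstrip(";").strip()
    if ";" in stripped:
        raise ValueError("only one statement is allowed")
    if not stripped.lower().startswith(("select", "with")):
        raise ValueError("only SELECT queries are allowed")
    # mode=ro makes SQLite itself refuse writes, whatever the query text says.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cursor = conn.execute(stripped)
        # The cap bounds what comes back, not how hard the database works.
        return cursor.fetchmany(max_rows)
    finally:
        conn.close()
```

None of this catches a query that runs fine and is quietly wrong. It only limits the blast radius, which is why the human review step stays on the list.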

The danger is not always that AI writes broken

SQL. Sometimes it writes SQL that is wrong quietly.

And quiet wrongness is how bad decisions get

confidence. The Human Closer this week is about

postmortem action items. Because if there is

one place that engineering organizations consistently

lie to themselves, it is the bottom of a postmortem

document. Not maliciously, just optimistically.

The incident happens. People jump in. The team

writes a timeline. There is a good discussion.

Nobody blames anyone. Everybody agrees on what

went wrong. Then the action items show up. Improve

monitoring. Review runbooks. Add better alerting.

Investigate retry behavior. Clean up ownership.

These are not action items. These are wishes

wearing a Jira costume. incident.io had a good

piece on why postmortem action items die. The

reasons are painfully familiar. No named owner,

wrong tracking place, vague wording, and no follow-up

cadence. That's basically the whole game.
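
Those failure modes are mechanical enough to check for. Here is a small Python sketch of an action item linter. The `ActionItem` fields, the `lint` function, and the vague-phrase list are all hypothetical, not anything from the incident.io piece. A due date stands in for follow-up cadence, which is an approximation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical list of phrases that signal a wish rather than a task.
VAGUE_PHRASES = ("improve", "review", "investigate", "clean up", "better")


@dataclass
class ActionItem:
    text: str
    owner: Optional[str] = None        # a named person, not a team alias
    tracker_url: Optional[str] = None  # where the work is actually tracked
    due: Optional[date] = None         # stand-in for a follow-up cadence


def lint(item: ActionItem) -> list[str]:
    """Return the reasons this action item is likely to die."""
    problems = []
    if not item.owner:
        problems.append("no named owner")
    if not item.tracker_url:
        problems.append("no tracking location")
    if item.due is None:
        problems.append("no due date")
    if any(phrase in item.text.lower() for phrase in VAGUE_PHRASES):
        problems.append("vague wording")
    return problems
```

An empty list from `lint` does not mean the work will get done. It means the item at least has the shape of something a person could finish.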

A postmortem action item without an owner is

not an action item. It's a group hallucination.

An action item that lives in a doc that nobody

opens again is documentation tax. And an action

item that says improve monitoring is not an action

item. It is a mood. A real action item sounds

more like Maria adds a replication lag alert

for the payment database by Friday. Or Kevin

removes production deploy access from the old

CI token before the next release. Named owner.

Specific verb. Clear outcome. Real tracking

location. Due date. That does not make the work

easy, but it makes it real. This connects back

to every story this week. The CISA leak is not

fixed by saying review GitHub practices. AI RCA

is not useful if the follow-up is improve incident

response. Computer-using agents are not safe

because someone wrote ensure governance. Kubernetes

seccomp is not handled because someone says

harden workloads. This is the staff and principal

engineer part of the job that does not always

look exciting on a roadmap. Turning vague risk

into specific work. Turning incidents into system

changes. That is where reliability actually improves.

Because production does not care how good the

postmortem sounded. Production cares whether

anything changed. Okay, that's it for this week

of Ship It Weekly. We covered the CISA contractor

GitHub leak, AWS DevOps Agent and automated root

cause analysis, Microsoft Copilot Studio computer-using

agents, Atlassian agentic pipelines with

Claude Code, CVE-2026-46333, and Kubernetes

seccomp defaults, plus a lightning round on GitHub

OIDC support, Java pods getting OOMKilled, and

LLM-generated SQL. If this episode was useful,

follow or subscribe wherever you are listening

or watching. If you're on YouTube, hit subscribe.

If you're in a podcast app, follow the show there.

And if you know somebody on a DevOps, SRE, platform,

security, or engineering leadership team who

is dealing with secrets, agents, Kubernetes defaults,

or postmortem follow-up, send this one to them.

It helps the show grow. And honestly, it helps

me keep making this kind of content for people

who actually live with these systems. You can

find the weekly brief at OnCallBrief.com and

more episodes and show notes on ShipItWeekly.fm.

I'm Brian from Teller's Tech, and thanks

for listening. And remember, if your repo is

named Private-CISA, your agent can click buttons,

your pipeline can call Claude, and your postmortem

action item says improve monitoring, maybe take

a breath. Then go find the owner, the token, the

default, and the ticket. Because production does

not run on good intentions. It runs on the stuff

that someone actually fixed.
