Host Commentary

This episode is really about one idea: authority is the new blast radius.

For the last couple of years, most of the AI conversation in engineering has been about productivity. Can it write code faster? Can it explain logs? Can it summarize an incident? Can it help junior engineers get unstuck? Can it save senior engineers from staring at the same YAML for the 900th time?

All of that still matters. But this week’s stories point at something bigger.

AI agents are not just getting smarter. They are getting places to run, APIs to call, identities to assume, and automation systems to trigger.

That changes the conversation.

A coding assistant that suggests a function is one thing. A cloud agent that can be started through an API, work in its own environment, make changes, validate them, and open a pull request is a different thing. At that point, the agent is not just helping a developer type. It is becoming part of the software delivery path.

That does not mean the sky is falling. It does mean the mental model has to change.

GitHub Copilot cloud agent tasks through the REST API are interesting because APIs are how tools become platforms. As soon as something can be started programmatically, other systems will start wiring into it. A ticket can start work. A vulnerability finding can start work. A dependency update can start work. A support escalation can start work. That is useful, but it also means the agent becomes another automation actor inside your engineering system.

And once something becomes an actor, you have to care about authority.

What repository can it touch? What branch can it write to? What token does it use? Can it trigger CI? Can that CI deploy? Can it comment on issues? Can it open a PR against production code? Can it modify tests to make its own change look correct? Who reviews the result? Who owns it if the change breaks something?

That is the part that is easy to skip because the demo feels productive. But production incidents do not care how impressive the demo was.

The Auth0 MCP story is the identity version of the same problem. MCP is quickly becoming one of the connective layers between AI agents and real tools. That means MCP servers cannot just be treated like fun local adapters. If an MCP server can reach customer data, cloud APIs, internal systems, source code, CI/CD, or production operations, then it needs to be treated like a production API.

That means authentication. Authorization. Logging. Revocation. Delegation. Auditability.

The weird part is that agent identity is not as clean as normal user identity. With a normal user, we can say Brian logged in, Brian clicked the button, Brian had these permissions. With an agent, the action might be requested by a human, executed by an application, delegated through a token, and carried out by a model calling a tool.
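
One way to picture that chain is as an audit record that keeps every layer visible. This is only a rough sketch: the `ToolCallAudit` class, its field names, and the sample values are hypothetical and do not come from Auth0, the MCP spec, or any real product.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolCallAudit:
    """Hypothetical audit record for one agent tool call."""
    requested_by: str  # the human who asked for the work
    application: str   # the app or agent runtime that executed it
    token_id: str      # identifier of the delegated credential, never the secret itself
    tool: str          # the tool the model actually invoked
    arguments: dict    # what the tool was asked to do


def to_log_line(record: ToolCallAudit) -> str:
    """Serialize the record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = ToolCallAudit(
    requested_by="brian",
    application="example-agent-runtime",
    token_id="tok-123",
    tool="restart_service",
    arguments={"service": "billing", "env": "staging"},
)
print(to_log_line(record))
```

The point of the shape is that "who did this" has more than one answer, and the log should be able to give all of them.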

That is not impossible to manage, but it is different enough that lazy answers will hurt people.

The Red Hat and Ansible story makes this even more concrete. Ansible is not a toy. It is not just a dev environment helper. It is a real automation platform that teams use to patch servers, restart services, configure systems, manage infrastructure, and run operational workflows. When AI agents start connecting to something like Ansible, the agent is suddenly much closer to the machinery that changes production.

That might actually be the right direction. I would much rather see agents routed through governed automation than freehanding shell commands on production systems because they read three stale wiki pages and felt confident.

But that only works if the automation itself is worth trusting.

A messy playbook does not become safe because an AI agent invoked it. A broad inventory does not become scoped because a model called it. A dangerous script does not become governed because it has a nicer interface. In some cases, AI may just make old operational debt easier to trigger.

That is the risk.

Not that agents exist.

The risk is that agents expose every sloppy permission, every overpowered workflow, every unsafe runbook, every “only Bob knows how to run this” script, and every service account that was supposed to be temporary three years ago.

OpenAI Daybreak fits into this from the security side. AI-assisted vulnerability discovery, patch generation, and remediation validation are going to be useful. I do not think that part is controversial. Security teams are already drowning in findings, and anything that helps triage, validate, patch, and verify could be a real improvement.

But it also changes the bottleneck.

If AI finds more issues, the hard part becomes absorbing the output. Can the organization validate the findings? Can it prioritize them? Can it find the owner? Can it patch safely? Can it ship quickly? Can it prove the fix worked? Can it do all of that without breaking production in the process?

Security does not end when the issue is found. For a lot of companies, that is where the real pain starts.

That is why Daybreak is not just a security story to me. It is an engineering systems story. If your delivery process is slow, brittle, under-tested, or full of unclear ownership, AI-generated security findings may not make you safer right away. They may just make the backlog more honest and more painful.

We also mentioned our special on Project Glasswing / Claude Mythos:

Episode 33 (Apr 15, 2026, 16:28): Special: Claude Mythos Preview and Project Glasswing: AI Exploit Discovery, Zero-Day Risk, Business Fallout, and What It Means for DevOps, Cloud, and Platform Security

The Discord ScyllaDB automation story is the useful counterweight to all of this. That is the kind of automation we should be aiming for before we get too excited about agents doing operational work.

Their story is not magic. It is not “AI fixed databases.” It is a team looking at fragile scripts and turning them into a more reliable control plane with state, preconditions, resumability, notifications, and safer workflows.

That is the boring work that actually matters.

A lot of teams say they have automation, but what they really have is a pile of scripts that work when the right person runs them on the right day in the right order with the right assumptions in their head. That is better than nothing, but it is not the same as safe operational automation.

Safe automation knows when to stop. It checks assumptions. It notices when the cluster is degraded. It does not blindly plow forward because the script got to line 47. It gives humans visibility. It reduces babysitting. It makes the system more predictable instead of just making the command shorter.
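
To make "knows when to stop" concrete, here is a minimal sketch of a workflow runner that checks a precondition before every step and can resume after a halt. It is not Discord's implementation. The `Step` and `Workflow` names are made up for illustration, and a real system would persist the completed set somewhere durable rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Step:
    name: str
    precondition: Callable[[], bool]  # must return True before the step may run
    action: Callable[[], None]


@dataclass
class Workflow:
    steps: List[Step]
    completed: Set[str] = field(default_factory=set)  # in-memory stand-in for durable state

    def run(self) -> str:
        for step in self.steps:
            if step.name in self.completed:
                continue  # resumability: skip work that already finished
            if not step.precondition():
                return f"halted before {step.name}"  # stop instead of plowing forward
            step.action()
            self.completed.add(step.name)
        return "done"
```

Calling `run()` again after a halt picks up at the first unfinished step, so a human can fix the underlying problem and resume without redoing earlier work.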

That matters even more in an agentic world.

If agents are going to call tools, the tools need to be boring, constrained, observable, and designed around failure. Otherwise we are not building reliable operations. We are just giving a very confident system a faster way to trip over our old mistakes.

The lightning stories all point back to the same general theme.

GuardDuty and crypto mining are a reminder that cloud abuse often shows up as cost before it shows up as drama. A compromised credential might not immediately become a headline breach. It might become a weird bill, degraded performance, or a mining workload hiding in an account nobody checks closely enough.

Queues and backpressure are the reliability version. A queue can smooth bursts, but it cannot magically absorb sustained overload forever. It just stores the problem somewhere else until message age, lag, retries, or downstream failure finally make the truth obvious.
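
The arithmetic behind that is simple enough to sketch. The snippet below assumes constant arrival and service rates, which real traffic never has, and the function names are hypothetical. It shows how a backlog grows under sustained overload, and how a bounded queue can push back on producers instead of silently storing the problem.

```python
import queue


def backlog_after(seconds: float, arrival_rate: float, service_rate: float) -> float:
    """Messages waiting after `seconds`, starting from an empty queue."""
    return max(0.0, float(arrival_rate - service_rate) * seconds)


def offer(q: "queue.Queue[str]", item: str) -> bool:
    """Try to enqueue without blocking. False tells the caller to slow down or shed load."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


# 120 msg/s arriving, 100 msg/s processed: the backlog grows by 20 msg/s.
print(backlog_after(60, 120, 100))  # 1200.0

# A bounded queue makes overload visible to the producer right away.
bounded: "queue.Queue[str]" = queue.Queue(maxsize=2)
print([offer(bounded, m) for m in ("a", "b", "c")])  # [True, True, False]
```

An unbounded queue would have accepted all three messages, and the first signal of trouble would be message age or lag much later.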

And the Datadog index scan story is a nice reminder that labels can lie to your intuition. “Using an index” sounds good until the query is still expensive. The plan can be technically correct and still operationally painful. That is true for databases, and honestly, it is true for a lot of AI and automation too.

The label is not enough.

“Agentic” is not enough.

“Authenticated” is not enough.

“Automated” is not enough.

“Uses an index” is not enough.

The details matter.

What is it allowed to do? What path does it take? What assumptions does it make? What happens when those assumptions are wrong? Who gets alerted? Who can stop it? Who owns the outcome?

That is where I think a lot of engineering teams need to focus.

Not on whether AI agents are good or bad. That debate is already too broad to be useful. The better question is where they sit in the system and how much authority they have.

An AI agent with read-only access to logs is one kind of risk.

An AI agent that can open pull requests is another.

An AI agent that can trigger CI/CD is another.

An AI agent that can call MCP servers attached to internal tools is another.

An AI agent that can invoke Ansible against production systems is another.

Those are not the same thing, and we should stop talking about them as if they are.

The more authority an agent has, the more it needs to look like a real production principal. Scoped access. Clear ownership. Good audit logs. Human approval at the right boundaries. Dry-run modes. Kill switches. Reviewable output. Strong defaults. No mystery tokens hiding in a demo server someone forgot about.
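
As a rough illustration of what "real production principal" could mean in code, here is a small policy gate. Everything in it is an assumption made for the example: the `AgentPolicy` fields, the action names, and the decision strings are hypothetical and not tied to GitHub, Auth0, Ansible, or any other product mentioned in this episode.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AgentPolicy:
    owner: str                       # the team accountable for this agent
    allowed_actions: FrozenSet[str]  # scoped access: anything else is denied
    needs_approval: FrozenSet[str]   # actions that require a human sign-off
    kill_switch: bool = False        # when True, the agent can do nothing


def authorize(
    policy: AgentPolicy,
    action: str,
    approved_by: Optional[str] = None,
    dry_run: bool = True,  # strong default: callers must opt in to real changes
) -> str:
    if policy.kill_switch:
        return "deny: kill switch engaged"
    if action not in policy.allowed_actions:
        return "deny: out of scope"
    if dry_run:
        return "allow: dry run only"
    if action in policy.needs_approval and approved_by is None:
        return "deny: human approval required"
    return "allow"


policy = AgentPolicy(
    owner="platform-team",
    allowed_actions=frozenset({"open_pr", "trigger_ci"}),
    needs_approval=frozenset({"trigger_ci"}),
)
print(authorize(policy, "trigger_ci", dry_run=False))  # deny: human approval required
```

The ordering is a design choice in this sketch: the kill switch wins over everything, scope is checked before anything else runs, and a dry run of an in-scope action is allowed without approval because it changes nothing.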

None of that is anti-AI. It is just operations.

The funny thing is, AI may end up forcing teams to clean up the parts of their systems they should have cleaned up anyway. Bad runbooks. Overpowered service accounts. Weak CI permissions. Unowned scripts. Unclear release paths. Missing rollback plans. Poor observability around internal automation.

Those were already risks.

Agents just make them harder to ignore.

So the takeaway from this episode is not “do not use agents.”

The takeaway is to label them correctly.

An agent with repo access is part of your software delivery system.

An MCP server with production reach is part of your control plane.

An automation workflow that changes systems is production infrastructure.

A security tool that generates patches is part of your remediation process.

A queue hiding overload is not resilience.

An index scan is not automatically fast.

And an AI-generated change is still owned by the humans and systems that allowed it to ship.

Authority is the new blast radius.

The teams that handle this well will not be the ones that block everything. They will be the ones that give agents useful jobs, narrow permissions, clear boundaries, and safe paths to action.

The teams that handle it poorly will accidentally build a coworker with API access, hand it a badge, and then act surprised when it finds the side door to production.

Show Notes

This episode of Ship It Weekly is about AI agents moving from helpful coding assistants into real operational actors. Brian covers GitHub making Copilot cloud agent tasks available through a REST API, Auth0 bringing authentication and authorization to MCP servers, Red Hat positioning Ansible as a trusted execution layer for agentic IT operations, and OpenAI Daybreak pushing AI deeper into security research and remediation.

The bigger thread this week is authority: what these agents can reach, what they can change, who approved the action, and who owns the outcome when something breaks.

Brian also covers Discord’s ScyllaDB automation work, AWS GuardDuty crypto mining detection, queues and backpressure, and a Datadog PostgreSQL case where an index scan was still painfully slow.

Sponsored by Guardsquare https://hubs.ly/Q04fJgkJ0

Links

GitHub Copilot cloud agent tasks via REST API https://github.blog/changelog/2026-05-13-start-copilot-cloud-agent-tasks-via-the-rest-api/

GitHub REST API endpoints for agent tasks https://docs.github.com/en/rest/agent-tasks/agent-tasks

Auth0 Auth for MCP is now generally available https://auth0.com/blog/auth0-auth-for-mcp-servers-generally-available/

Red Hat on Ansible as the execution layer for agentic IT https://www.redhat.com/en/about/press-releases/red-hat-establishes-ansible-automation-platform-trusted-execution-layer-it-operations-agentic-era

OpenAI Daybreak https://openai.com/daybreak/

Discord automates ScyllaDB clusters at scale https://discord.com/blog/how-discord-automates-scylladb-clusters-at-scale

AWS GuardDuty crypto mining detection and prevention https://aws.amazon.com/blogs/security/detecting-and-preventing-crypto-mining-in-your-aws-environment/

Queues do not absorb load, they delay failure https://dzone.com/articles/queues-dont-absorb-load-they-delay-bankruptcy

Datadog on inefficient PostgreSQL index scans https://www.datadoghq.com/blog/detect-inefficient-index-scans-with-dbm/

This week’s On Call Brief https://www.tellerstech.com/on-call-brief/2026-W20/

More episodes and show notes https://shipitweekly.fm/

Hosted by Brian Teller

25 years in production: DevOps, SRE, platform, and cloud. DevOps Institute & ITIL Ambassador.
