AI Agents Get API Access and Identity: GitHub Copilot Cloud Agents, MCP Auth, Ansible Automation, OpenAI Daybreak, and the New Production Risk

Transcript

0:00 AI agents just got APIs. They got identity. And

0:03 they're starting to plug into the automation

0:05 tools teams already use to change real systems.

0:09 So the question is moving past, can AI write

0:13 code? The better question is, what happens when

0:16 AI can open pull requests, call tools, authenticate

0:20 to services, and trigger operations workflows?

0:23 Because at that point, you did not build a chatbot.

0:27 You built a coworker with API access. I'm Brian

0:31 Teller from Teller's Tech, and this is Ship It

0:33 Weekly. Welcome back to Ship It Weekly, the show

0:53 where we look at DevOps, SRE, cloud,

0:57 platform, and security stories that actually matter when

1:01 you're the person who eventually has to keep

1:04 the thing running. This week we're looking at

1:06 GitHub making Copilot cloud agent tasks available

1:10 through a REST API, Auth0 bringing authentication

1:14 to MCP servers, Red Hat positioning Ansible as

1:18 an execution layer for agentic IT operations,

1:21 and OpenAI Daybreak pushing AI deeper into security

1:25 research and remediation. Then we'll step away

1:29 from the AI cycle for a really good Discord engineering

1:32 story on automating ScyllaDB operations at scale.

1:37 And in the lightning round, we'll hit AWS GuardDuty

1:40 and crypto mining detection, queues and

1:43 backpressure, and why an index scan can still

1:46 ruin your day. The theme this week is authority,

1:49 not intelligence, not productivity, authority.

1:53 What can these agents reach? What can they change?

1:57 Who approved the action? And when something breaks,

2:00 who owns it? That's the thread for this episode.

2:03 So let's get into it. First up, GitHub Copilot

2:10 Cloud Agent Tasks can now be started through

2:13 the REST API. This is the right place to start

2:16 because it sounds like a small product update,

2:19 but it changes the shape of the thing. GitHub

2:22 says Copilot Business and Enterprise users can

2:25 now programmatically start Copilot cloud agent

2:28 tasks through a new Agent Tasks REST API, currently

2:33 in public preview. The Copilot cloud agent works

2:36 in the background in its own development environment.

2:39 It can make code changes, validate those changes,

2:42 and open a pull request. That part alone is already

2:45 interesting. But the API is the bigger shift.

2:48 Because now this is not just a developer manually

2:51 asking Copilot to work on something from inside

2:54 GitHub. Now another system can kick it off. That

2:57 means you could wire this into custom workflows.

3:00 A support escalation, a bug triage process, a

3:03 security finding, a dependency update workflow,

3:06 a backlog grooming process, or whatever else

3:10 somebody decides to connect. And that's where

3:12 this gets operationally interesting. Because

3:14 once an agent can be started by automation, it

3:17 becomes part of your automation surface. It becomes

3:20 something you need to reason about like any other

3:23 system that can create change. What repos can

3:27 it touch? What permissions does the token need?

3:30 Who approved the task? What branch protection

3:33 applies? Can it create a pull request but not

3:36 merge one? Can it trigger CI? Can that CI deploy?

3:40 And if the workflow is kicked off by another

3:42 tool, do you still have a clear human owner?
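
Those questions can be turned into a pre-flight check that runs before any automation is allowed to start an agent task. The sketch below is a hypothetical illustration, not GitHub's API: `AgentTaskRequest`, `ALLOWED_REPOS`, and `preflight` are invented names, the repository names are made up, and the actual call that starts a task is deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical allowlist: repositories that automation may point an agent at.
ALLOWED_REPOS = {"acme/docs", "acme/internal-tools"}


@dataclass(frozen=True)
class AgentTaskRequest:
    repo: str            # repository the agent would work in
    triggered_by: str    # system that kicked off the task, e.g. "vuln-scanner"
    human_owner: str     # person accountable for the outcome ("" if nobody)
    may_merge: bool      # whether the request asks for merge rights


def preflight(req: AgentTaskRequest) -> list[str]:
    """Return the reasons to refuse the request; an empty list means proceed."""
    problems = []
    if req.repo not in ALLOWED_REPOS:
        problems.append(f"repo {req.repo!r} is not on the agent allowlist")
    if not req.human_owner.strip():
        problems.append("no human owner recorded for this task")
    if req.may_merge:
        problems.append("agents may propose changes but not merge them")
    return problems


if __name__ == "__main__":
    req = AgentTaskRequest("acme/payments", "vuln-scanner", "", may_merge=True)
    for reason in preflight(req):
        print("refused:", reason)
```

Returning a list of reasons rather than a bare yes or no means the refusal can be written to the audit trail, so the workflow that triggered the task has something to show a human.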

3:45 That last one matters because it is very easy

3:48 to imagine a chain like this. A vulnerability

3:51 scanner opens a ticket. A workflow kicks off

3:54 an AI agent. The AI agent makes a patch. CI passes.

3:58 A PR gets opened. Somebody rubber-stamps it because

4:01 the diff looks boring and the scanner says the

4:04 vulnerability is resolved. And maybe that is

4:07 great. Maybe you just saved an engineer three

4:09 hours. Or maybe you just created a subtle production

4:13 issue from a change nobody really understood.
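
One guard against the rubber-stamp ending of that chain is a merge gate that treats green CI as necessary but never sufficient for an agent's change. This is a minimal sketch under assumed names (`PullRequest`, `may_merge`); on a real platform the same policy would be expressed through branch protection and code owner rules rather than custom code.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    author: str                                          # account that opened the PR
    author_is_agent: bool                                # True for agent-created PRs
    approvals: set[str] = field(default_factory=set)     # reviewers who approved
    code_owners: set[str] = field(default_factory=set)   # owners of the touched paths
    ci_passed: bool = False


def may_merge(pr: PullRequest) -> bool:
    """Hypothetical policy: green CI alone never ships an agent's change."""
    if not pr.ci_passed:
        return False
    reviewers = pr.approvals - {pr.author}   # nobody approves their own PR
    if pr.author_is_agent:
        # Agent PRs need sign-off from someone who owns the code being changed.
        return bool(reviewers & pr.code_owners)
    return bool(reviewers)
```

With this policy, an approval from someone outside the code owners does not unblock an agent-authored PR, which is the point: the person approving should be someone who understands the change.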

4:16 The practical takeaway here is not don't use

4:19 it. The practical takeaway is that agent workflows

4:22 need the same boring controls we already expect

4:25 from normal engineering workflows. Branch protection.

4:29 Required reviews. Code owners. Scoped credentials.

4:32 Audit trails. Clear ownership. And a very bright

4:36 line between agent can propose and agent can

4:39 ship. The interesting part of AI agents is not

4:42 that they can do work. The interesting part is

4:45 that we have to decide how much authority that

4:48 work gets. That leads nicely into the second

4:55 story. Auth0 announced that Auth for MCP is generally

5:00 available. MCP, or Model Context Protocol, has

5:03 become one of those terms that shows up everywhere

5:06 now. It is basically a way for agents and AI

5:09 tools to connect to external systems, tools,

5:13 APIs, and data sources in a more standardized

5:17 way. And that matters because agents are only

5:20 as useful as the tools they can reach. A model

5:23 sitting in a chat box can give advice. A model

5:27 connected to tools can take action. And once

5:30 it can take action, authentication and authorization

5:32 stop being side concerns. They become the whole

5:36 game. Auth0's announcement is focused on putting

5:40 an identity layer around MCP servers. They call

5:44 out authentication, CIMD registration, and

5:47 on-behalf-of token exchange. The plain-English version

5:51 is this. If agents are going to call tools, those

5:54 tools need to know who or what is calling them,

5:58 on whose behalf, and what that caller is actually

6:01 allowed to do. That sounds obvious, but a lot

6:04 of MCP usage so far has been the kind of thing

6:07 that feels like local developer convenience first,

6:11 production safety second. You spin up a server,

6:13 you connect it to your agent, you give it access

6:16 to some tools, and suddenly your agent can read

6:19 things, write things, query things, maybe even

6:23 change things. That's fine in a sandbox. It is

6:26 not fine when the tools are attached to customer

6:29 data, production infrastructure, internal admin

6:33 APIs, CI/CD, billing systems, or cloud accounts.

6:38 And this is where identity gets weird. Because

6:41 with a normal user, we mostly know how to think

6:44 about it. Brian logged in. Brian clicked a thing.

6:47 Brian had these permissions. With an agent, the

6:50 story is messier. Was the action taken by the

6:54 agent? By the user who asked the agent? By the

6:57 application hosting the agent? By a service account?

7:00 By a delegated token? And when something goes

7:03 wrong, where does accountability land? That's

7:06 why I think this Auth0 story is more important

7:09 than it looks. MCP is not just a cute connector

7:13 system for demos. It is becoming connective tissue

7:16 for AI tooling. And connective tissue needs identity,

7:20 authorization, logging, and revocation. Otherwise,

7:24 we're just building a faster way for something

7:26 to call the wrong API with too much permission.
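
To make "who is calling, on whose behalf, and what are they allowed to do" concrete, here is a small sketch of a tool-call check that keeps the agent and the user as separate identities and logs every decision. It is an illustration only, not Auth0's product or the MCP specification: `DelegatedToken`, `REQUIRED_SCOPE`, the tool names, and the scope strings are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegatedToken:
    actor: str               # the agent or app making the call
    subject: str             # the user the call is made on behalf of
    scopes: frozenset[str]   # what this token is allowed to do
    revoked: bool = False


# Hypothetical mapping from tool name to the scope it requires.
REQUIRED_SCOPE = {
    "tickets.read": "tickets:read",
    "tickets.close": "tickets:write",
    "billing.refund": "billing:write",
}

audit_log: list[dict] = []


def authorize(token: DelegatedToken, tool: str) -> bool:
    """Allow the call only if the token is live and carries the tool's scope."""
    needed = REQUIRED_SCOPE.get(tool)   # tools missing from the map are denied
    allowed = (not token.revoked) and needed is not None and needed in token.scopes
    # Record every decision, allowed or not, with both identities.
    audit_log.append(
        {"actor": token.actor, "subject": token.subject, "tool": tool, "allowed": allowed}
    )
    return allowed
```

Keeping `actor` and `subject` as separate fields is what lets the log answer the accountability question later: the agent made the call, and this is the person it was acting for.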

7:29 For DevOps and platform teams, this is probably

7:32 where the real work starts. Not how do we let

7:35 every team use agents, but how do we let teams

7:38 use agents without turning every MCP server into

7:42 an ungoverned production backdoor? Before we

7:45 get to the next story, a quick note from this

7:47 week's sponsor, Guardsquare. If you are building

7:49 mobile apps, good enough security is usually

7:53 a problem waiting to happen. Guardsquare focuses

7:56 on actually protecting your code in addition

7:59 to scanning it. That means code hardening, runtime

8:02 protection, testing, and visibility into what's

8:06 happening once your app is out in the wild. So

8:09 if you are responsible for shipping and securing

8:11 mobile apps, Android or iOS, definitely worth

8:15 taking a look at guardsquare.com. All right.

8:18 Back to the show. Third story. Red Hat is pushing

8:25 Ansible Automation Platform as a trusted execution

8:28 layer for IT operations in the agentic era. That

8:33 is a very enterprise sentence. But underneath

8:35 the marketing language, this is actually a big

8:38 deal. Because Ansible is not theoretical. Ansible

8:41 is already used to patch systems, restart services,

8:45 configure servers, manage network gear, run operational

8:49 tasks, and handle a bunch of work that is very

8:52 close to production reality. So when you connect

8:55 AI agents to Ansible, you are not just giving

8:58 an agent a little toy function. You are connecting

9:00 it to the machinery that already changes real

9:03 systems. Red Hat's angle is basically this. Agents

9:06 may be good at reasoning, planning, or interpreting

9:10 intent, but enterprises still need a governed,

9:13 trusted, auditable execution layer when it is

9:17 time to actually do something. That is the right

9:19 framing. Because the dangerous version of agentic

9:22 operations is not an agent saying, here's the

9:26 runbook. The dangerous version is the agent saying,

9:29 I ran the runbook. And then everyone hoping it

9:31 did the right thing. Now, to be fair, this is

9:34 also where something like Ansible can help. Because

9:37 mature automation gives you structure. You have

9:40 inventories. You have playbooks. You have idempotency,

9:43 at least when things are written well.

9:46 You have logs. You have a known execution path.

9:49 You have a place to put approval gates. That

9:52 is much better than an agent freehanding shell

9:55 commands on a production box because it read

9:57 three Confluence pages and felt confident. But

10:00 the same rules apply here. The agent should not

10:03 get more authority than the automation deserves.

10:06 If your existing playbooks are messy, overly

10:09 broad, poorly scoped, or rely on tribal knowledge,

10:13 an agent does not magically make them safe. It

10:16 may just make them easier to invoke. And that

10:19 is the part I'd be nervous about. A bad script

10:21 that an agent can discover and execute through

10:24 a tool interface is a different class of problem.

10:27 So the takeaway is not Ansible plus AI is bad.

10:30 It is actually the opposite. If agentic ops is

10:33 coming, I'd much rather see agents routed through

10:36 controlled automation than improvised commands.

10:39 But teams should treat this as a forcing function.

10:42 Clean up your automation. Narrow the blast radius.

10:45 Split read-only diagnostics from mutating actions.

10:49 Make destructive playbooks require approval.

10:52 Add dry-run modes where possible. Make sure the

10:55 logs clearly say who asked for the action, what

10:58 agent or system executed it, and what changed.
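
That checklist can be sketched as a thin gate in front of the automation. Nothing below is an Ansible or Ansible Automation Platform API: the `Playbook` record, the `CATALOG` entries, and `plan_run` are hypothetical names used to show the read-only versus mutating split, the approval requirement, the dry-run default, and an audit record that names both the requester and the executor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Playbook:
    name: str
    mutating: bool       # False for read-only diagnostics
    destructive: bool    # True when a mistake is hard to undo


# Hypothetical catalog; a real one would come from your automation platform.
CATALOG = {
    "collect_diagnostics": Playbook("collect_diagnostics", False, False),
    "restart_service": Playbook("restart_service", True, False),
    "wipe_data_volume": Playbook("wipe_data_volume", True, True),
}


def plan_run(name: str, requested_by: str, agent: str,
             approved_by: Optional[str] = None, dry_run: bool = True) -> dict:
    """Decide whether a run may proceed and return the audit record for it."""
    pb = CATALOG[name]  # KeyError for playbooks outside the catalog
    if pb.destructive and not approved_by:
        decision = "blocked: approval required"
    elif pb.mutating and dry_run:
        decision = "dry-run only"
    else:
        decision = "execute"
    return {
        "playbook": pb.name,
        "requested_by": requested_by,   # who asked for the action
        "executed_by": agent,           # which agent or system would run it
        "approved_by": approved_by,
        "decision": decision,
    }
```

Dry run is the default here, so a caller has to opt in to a real change. The "what changed" part of the log would be appended to the same record once the run reports back.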

11:02 Because if Ansible becomes the execution layer

11:04 for agents, the quality of your automation becomes

11:08 the quality of your agent safety model. Fourth

11:15 story. OpenAI announced Daybreak, its cybersecurity

11:18 initiative built around GPT-5.5 and Codex Security.

11:23 I'm treating this as a follow-up to the Mythos

11:26 and Project Glasswing episode, not a totally

11:29 separate story. Because the broader trend is

11:31 the same. AI systems are getting better at vulnerability

11:34 discovery, exploit reasoning, patch generation,

11:38 and remediation validation. OpenAI describes

11:41 Daybreak as a way to use AI for cyber defense.

11:45 The pitch is that it can help identify threats,

11:48 generate patches, and verify remediation across

11:51 code and systems. And on one hand, this is exactly

11:54 what we want. Most organizations are drowning

11:57 in vulnerability backlog. They have more findings

11:59 than time. Some findings are noisy, some are

12:02 real, some are technically real but not actually

12:05 reachable. Some are buried in legacy code that

12:08 nobody wants to touch. And even when the fix

12:11 is obvious, there is still work. Open the issue.

12:14 Find the owner. Understand the code path. Patch

12:17 it. Test it. Get it reviewed. Deploy it. Verify

12:21 the scanner is happy and hope nothing broke.

12:24 So an AI system that can help triage, validate,

12:27 patch, and verify is genuinely useful. But here's

12:31 the uncomfortable part. If defenders get this,

12:33 attackers get some version of it too. Maybe not

12:36 the same controlled access. Maybe not the same

12:39 polished product. But the underlying capability

12:41 trend is not one-sided. That means the bottleneck

12:45 for security teams shifts. It is no longer just

12:49 can we find vulnerabilities. It becomes can we

12:52 process, prioritize, patch, and safely ship fixes

12:56 fast enough. And that lands right in the lap

12:59 of DevOps, SRE, platform, and application teams.

13:03 Because finding the bug is only step one. The

13:06 real work is changing the system. And changing

13:08 the system safely requires all the boring stuff.

13:12 Ownership, tests, CI/CD, feature flags, rollback

13:16 plans, dependency strategy, runtime visibility,

13:19 asset inventory, patch windows, and enough architectural

13:23 knowledge to know when the easy fix is actually

13:27 a trap. This is why I keep coming back to the

13:29 same point. AI security tooling will probably

13:32 find more issues. That is good. It will probably

13:35 also create more pressure. That is complicated.

13:38 If your organization already struggles to patch

13:41 known vulnerabilities, adding AI that finds more

13:44 of them does not automatically make you safer.

13:46 It may just make the backlog more honest. So

13:49 the real question is not can Daybreak find things?

13:52 The question is, can your engineering system

13:55 absorb the findings? Can you validate them? Can

13:58 you prioritize them? Can you patch them? Can

14:01 you ship them? Can you prove the fix worked?

14:04 And can you do all of that without creating a

14:07 second incident while fixing the first one? That

14:10 is where this becomes an operations story, not

14:12 just a security story. Now let's step away from

14:19 AI for a minute. Because Discord published a

14:21 really good write-up on how they automate ScyllaDB

14:24 clusters at scale. And honestly, this is the

14:27 kind of engineering story that I love. Discord's

14:30 persistence infrastructure team runs a lot of

14:32 ScyllaDB. Over time, they had accumulated Python

14:36 and shell scripts to help with operations. But

14:39 those scripts had the usual problems. They were

14:42 useful. They were also fragile. They were easy

14:45 to misuse. They relied on humans understanding

14:47 the right order of operations. And for complex

14:50 cluster-wide workflows, that becomes a lot of

14:53 operational risk. So they built what they call

14:56 the Scylla control plane. The goal was to safely

14:58 automate and orchestrate cluster-wide workflows.

15:01 Things like rolling restarts, replacing nodes,

15:04 bootstrapping, and doing work that previously

15:07 required a lot more manual supervision. One of

15:10 the details that I liked from the write-up is

15:12 that webhook notifications mattered more than

15:14 they expected. That sounds small, but it is very

15:17 real. There is also a huge difference between

15:19 babysitting a terminal for two hours and trusting

15:23 the system to notify you when it needs attention.

15:26 That's the difference between automation that

15:28 technically works and automation that actually

15:30 reduces human load. And that distinction matters.

15:34 A lot of teams say they have automation, but

15:36 what they really have is a pile of scripts. A

15:38 script can be automation, but it might not be

15:41 safe automation. Safe automation needs state.

15:44 It needs preconditions. It needs retries. It

15:47 needs idempotency. It needs clear failure modes.

15:50 It needs visibility. It needs a way to resume

15:53 without making things worse. And it needs to

15:56 know when to stop. That last one is underrated.

15:59 Good automation is not automation that blindly

16:02 completes the task no matter what. Good automation

16:05 is automation that knows when the world no longer

16:08 matches its assumptions. If a node is unhealthy,

16:11 stop. If the cluster is already degraded, stop.

16:15 If replication is not where it should be, stop.

16:18 If the previous step did not converge, stop.
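
Those stop conditions translate fairly directly into code. The following is a sketch of the idea, not Discord's control plane: every callable passed in (`is_healthy`, `cluster_degraded`, `restart`, `converged`, `notify`) is a hypothetical stand-in for whatever your cluster tooling provides.

```python
from typing import Callable, Iterable


class StopAutomation(Exception):
    """Raised when the cluster no longer matches the workflow's assumptions."""


def rolling_restart(
    nodes: Iterable[str],
    is_healthy: Callable[[str], bool],
    cluster_degraded: Callable[[], bool],
    restart: Callable[[str], None],
    converged: Callable[[str], bool],
    notify: Callable[[str], None],
) -> list[str]:
    """Restart nodes one at a time; stop and notify instead of pushing on."""
    done: list[str] = []
    try:
        for node in nodes:
            if cluster_degraded():
                raise StopAutomation(f"cluster degraded before {node}")
            if not is_healthy(node):
                raise StopAutomation(f"{node} unhealthy before restart")
            restart(node)
            if not converged(node):
                raise StopAutomation(f"{node} did not converge after restart")
            done.append(node)
    except StopAutomation as stop:
        notify(f"stopped after {len(done)} nodes: {stop}")  # e.g. a webhook call
        raise
    notify(f"rolling restart finished: {len(done)} nodes")
    return done
```

The preconditions are re-checked before every node, not once at the start, and the notification fires on both success and stop, so nobody has to watch a terminal to find out which one happened.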

16:21 That is how you move from script that usually

16:23 works to operational control plane. And this

16:26 connects back to the AI stories in a weird way.

16:29 Because before we let agents run operational

16:32 tasks, we need more automation that looks like

16:35 this. Explicit. Recoverable. Observable. Constrained.

16:40 Designed around failure. If the future is agents

16:43 calling tools, then the tools need to be boring,

16:46 safe, and well-structured. Discord's story is

16:50 a reminder that the best automation is not magic.

16:53 It is just a lot of careful engineering around

16:56 the parts where humans usually get tired, distracted,

16:59 or inconsistent. Now let's do a quick lightning

17:09 round. First, AWS GuardDuty and crypto mining.

17:13 AWS published a guide on detecting and preventing

17:17 crypto mining in AWS environments using GuardDuty.

17:20 This is one of those classic cloud security problems

17:23 where security, reliability, and cost all run

17:26 into each other. A compromised credential does

17:29 not always turn into a dramatic data breach.

17:32 Sometimes it turns into a compute bill. Someone

17:34 gets access. They spin up resources. They run

17:37 mining workloads. They try to persist. And by

17:40 the time anyone notices, the incident is both

17:43 a security problem and a finance problem. The

17:45 practical question for teams is simple. If somebody

17:48 compromised a credential today and started mining

17:51 in your AWS account, how fast would you know?

17:54 Would it be GuardDuty? Would it be Cost Anomaly

17:57 Detection? Would it be Datadog? Would it be a

18:00 budget alert? Would it be a developer asking

18:02 why their workload is slow? Or would it be Finance

18:05 two weeks from now forwarding a bill and asking

18:08 what happened? That is the difference between

18:10 having a detection strategy and having a surprise.
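
As a rough way to think about "how fast would you know", here is a deliberately crude spend check. It is not how GuardDuty or AWS Cost Anomaly Detection work internally; the function name, the 24-hour window, and the 3x threshold are all assumptions chosen for illustration.

```python
from statistics import median
from typing import Optional


def first_anomalous_hour(hourly_spend: list[float], window: int = 24,
                         factor: float = 3.0) -> Optional[int]:
    """Index of the first hour whose spend exceeds `factor` x the trailing median."""
    for i in range(window, len(hourly_spend)):
        baseline = median(hourly_spend[i - window:i])
        if baseline > 0 and hourly_spend[i] > factor * baseline:
            return i
    return None


if __name__ == "__main__":
    # Two quiet days at $10/hour, then mining workloads push spend to $80/hour.
    spend = [10.0] * 48 + [80.0] * 6
    print(first_anomalous_hour(spend))  # 48: flagged in the first hour of the spike
```

Even a check this simple notices within an hour in this made-up example, which is a very different outcome from finding out when the monthly bill arrives.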

18:13 Next, queues and backpressure. There was a good

18:16 piece making the point that queues do not absorb

18:19 load forever. They delay failure. And that is

18:22 exactly right. Queues are great for smoothing

18:24 bursts. They are terrible when teams use them

18:27 to hide sustained overload. If messages are arriving

18:31 faster than consumers can process them, the backlog

18:34 will grow. A bigger queue does not fix that.
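
The arithmetic is worth spelling out. This sketch assumes steady arrival and processing rates, which real traffic never quite has, and the numbers in the example are made up.

```python
def backlog_after(seconds: int, arrival_rate: float, service_rate: float,
                  start: float = 0.0) -> float:
    """Messages waiting after `seconds`, given steady rates in messages/second."""
    return max(0.0, start + (arrival_rate - service_rate) * seconds)


def seconds_until_full(capacity: float, arrival_rate: float,
                       service_rate: float) -> float:
    """Time until a bounded queue fills; infinite if consumers keep up."""
    surplus = arrival_rate - service_rate
    return float("inf") if surplus <= 0 else capacity / surplus


if __name__ == "__main__":
    # Assumed numbers: 1,200 msg/s arriving, 1,000 msg/s processed.
    print(backlog_after(600, 1200, 1000))             # 120000.0 after ten minutes
    print(seconds_until_full(100_000, 1200, 1000))    # 500.0
    print(seconds_until_full(1_000_000, 1200, 1000))  # 5000.0
```

Making the queue ten times larger moves the failure from roughly 8 minutes out to roughly 83 minutes out. The only things that change the outcome are lowering the arrival rate or raising the processing rate.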

18:37 It just gives you a bigger place to store the

18:40 problem. Eventually, you hit freshness issues,

18:43 storage limits, memory pressure, retry storms,

18:46 customer-facing delay, or some downstream dependency

18:50 that finally gives up. So the practical takeaway

18:53 is simple. Monitor queue depth. Monitor message

18:55 age. Monitor consumer lag. Have backpressure.

18:59 Have limits. Know when to shed load. And please,

19:02 do not call a system resilient just because it

19:05 has a queue in front of the fire. Last lightning

19:08 item. Datadog had a nice PostgreSQL performance

19:11 write-up about inefficient index scans. The

19:15 short version is that using an index does not

19:17 automatically mean a query is cheap. Datadog

19:20 walked through a production query where the plan

19:23 used an index scan, but it was still expensive.

19:26 They changed the indexing strategy and cut average

19:29 latency from 300 milliseconds to 38 microseconds.

19:34 That is a ridiculous improvement, and it is a

19:37 good reminder. You cannot stop at the query uses

19:40 an index. You need to understand whether it is

19:43 using the right index, how many rows it is touching,

19:46 how selective the predicate is, what the access

19:49 pattern looks like, and whether the index actually

19:52 matches the way the query behaves in production.
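
Here is a small, self-contained way to see the same effect. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the table, data, and index names are invented for the demo; it is not Datadog's schema or query. The point is only that "uses an index" and "touches few rows" are different things.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, org_id INTEGER, status TEXT)")
# 10,000 rows across 100 orgs; half of all rows are 'open', so status is not selective.
conn.executemany(
    "INSERT INTO events (org_id, status) VALUES (?, ?)",
    [(i % 100, "open" if (i // 100) % 2 == 0 else "closed") for i in range(10_000)],
)

QUERY = "SELECT count(*) FROM events WHERE org_id = 7 AND status = 'open'"


def plan(sql: str) -> str:
    """Return the planner's description of how it will run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# First attempt: an index on the low-selectivity column only.
conn.execute("CREATE INDEX idx_status ON events (status)")
plan_before = plan(QUERY)

# How many rows that index leads to, versus how many the query actually wants.
rows_via_status = conn.execute(
    "SELECT count(*) FROM events WHERE status = 'open'").fetchone()[0]
rows_wanted = conn.execute(QUERY).fetchone()[0]

# Second attempt: an index that matches both predicates in the query.
conn.execute("CREATE INDEX idx_org_status ON events (org_id, status)")
plan_after = plan(QUERY)

print(plan_before)
print(plan_after)
print(f"index on status leads to {rows_via_status} rows to return {rows_wanted}")
```

Both plans report an index search. The first one still has to walk thousands of candidate rows to find a few dozen matches, which is the gap between "the query uses an index" and "the query uses the right index".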

19:54 Sometimes the database is not slow. Sometimes

19:57 your mental model is. The human closer this week

20:08 is about authority because that is really what

20:11 all these agent stories come down to. Not intelligence,

20:14 not productivity, not whether the model is impressive.

20:17 Authority. What is this thing allowed to do?

20:20 What can it read? What can it change? Can it

20:23 trigger work? Can it authenticate? Can it call

20:25 tools? Can it run automation? Can it open pull

20:28 requests? Can it touch production? And maybe

20:31 the hardest question, who owns what happens next?

20:35 Because in real operations, ownership is not

20:38 optional. If I write a Terraform change and it

20:40 breaks something, I own that. If I approve a

20:43 bad pull request, I own that. If I run the playbook

20:46 against the wrong environment, I own that. AI

20:49 does not remove that responsibility. It just

20:51 makes the path to action shorter. And shorter

20:54 paths to action are great when the guardrails

20:56 are good. They are terrifying when the guardrails

20:59 are vibes. That is where I think a lot of teams

21:02 are going to struggle. They're going to treat

21:03 agent adoption like a tooling rollout. Enable

21:07 the feature, give access, write a quick policy,

21:10 maybe do a lunch and learn. And then six months

21:12 later, they will realize that they created a

21:14 new automation layer that nobody fully owns.

21:17 That is not a reason to panic. It is a reason

21:20 to be deliberate. Start small. Keep agents in

21:23 proposal mode before execution mode. Treat MCP

21:26 servers like production APIs. Treat agent tokens

21:29 like service accounts. Treat agent-created pull

21:32 requests like code written by a junior engineer

21:35 who is fast, confident, and occasionally very

21:39 wrong. And before an agent can run a workflow,

21:42 make sure the workflow itself is worth trusting.

21:45 Because the future probably is not humans versus

21:47 agents. It is humans deciding which agents get

21:50 authority, where the boundaries are, and what

21:52 systems are safe enough to let them touch. That

21:55 is engineering work. And honestly, it is probably

21:58 some of the most important engineering work we

22:01 are going to do over the next few years. That's

22:04 it for this week's Ship It Weekly. We covered

22:07 GitHub Copilot Cloud Agent Tasks through the

22:10 REST API. Auth0 bringing identity to MCP servers.

22:14 Red Hat connecting Ansible to agentic IT operations.

22:18 OpenAI Daybreak and the next phase of AI-assisted

22:22 security. Discord's ScyllaDB automation work. And

22:25 a lightning round on GuardDuty crypto mining

22:28 detection, queues, and database indexes. If you

22:32 found this useful, follow the show. Share it

22:34 with someone who is either excited or mildly

22:37 terrified by agentic operations. And check out

22:40 the weekly brief at OnCallBrief.com. I'm Brian

22:43 Teller from Teller's Tech. Thanks for listening.

22:46 And remember, if your AI agent can open a pull

22:49 request, call an MCP server, authenticate through

22:52 your identity provider, and trigger Ansible,

22:55 congratulations. You did not build a chatbot.

22:58 You built a coworker with API access. Maybe give

23:01 it a badge, but maybe don't give it production

23:04 admin on day one.
