On Call Brief

Your weekly SRE/DevOps briefing. Security patches, postmortems, releases, and community reads — curated for the on-call engineer.

Each brief is dated by its editorial week (not the companion podcast release schedule); when inbox or RSS ingest lags affected sourcing, we say so in the draft.

Latest: On Call Brief – Week of June 14–20, 2026

Draft · Updates daily · Publishes Sunday · Last updated 3 hours ago (Jun 17, 2026 3:06 am EDT)

On Call Brief – Week of June 14–20, 2026

2026-06-14 — 2026-06-20

Category:
Tags:

This week's top stories

1. ServiceNow Fixes Flaw That Could Lead to Unauthorized Access to Instances

  • Category: Community
  • What happened: ServiceNow patched a vulnerability in its Australia platform that allowed unauthorized access to customer cloud instance data, with the fix deployed on June 5 following detection of anomalous activity between June 3 and 4. The company has notified affected customers about the potential unauthorized access to their instances. Operators using ServiceNow should verify whether their instances were affected by checking for communications from ServiceNow and reviewing access logs for the June 3-4 timeframe for any suspicious activity. Organizations should confirm their instances have received the patch and consider rotating credentials or reviewing recent data access patterns as a precautionary measure. No CVE number has been publicly disclosed for this vulnerability.
  • Worth reading: This patch addresses a critical security vulnerability that could have exposed sensitive customer data. Operators using ServiceNow should ensure they are on the latest version to mitigate any potential risks.
  • Sources: Security Boulevard Newsletters
  • Tags:

2. ShinyHunters Secret to Success: Breaking the Trust Barrier

  • Category: Community
  • What happened: The ShinyHunters cybercrime group has run a campaign that exploits federated OAuth trust relationships to pivot laterally across multiple cloud vendors, with confirmed attacks affecting Snowflake and Anodot among other services. The attack leverages the transitive trust inherent in OAuth federation: compromising credentials at one service provider can grant access to connected federated services without additional authentication. SRE teams should audit their OAuth federation configurations, review which third-party services have federated access to their infrastructure, and add authentication controls such as conditional access policies that validate device state and network location rather than relying solely on token-based trust. Organizations should also consider break-glass procedures to quickly revoke federated trust relationships when suspicious activity is detected, and ensure comprehensive logging of all OAuth token grants and usage across federated boundaries.
  • Worth reading: This attack method underscores the need for tighter security measures around OAuth implementations and third-party credential management to prevent similar breaches in production environments.
  • Sources: Security Boulevard Newsletters
  • Tags:
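The audit step described above (reviewing which third-party services hold federated access) can be sketched roughly as follows. This is a minimal illustration only: the `Grant` record, its fields, and the scope names in `BROAD_SCOPES` are hypothetical, and a real inventory would come from whatever export your identity provider offers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Grant:
    """One federated OAuth grant (hypothetical schema, not a vendor format)."""
    client: str          # third-party service holding the grant
    scopes: frozenset    # scopes the grant carries
    last_used: datetime  # last time a token from this grant was observed


# Assumed names for scopes considered broad; adjust to your provider.
BROAD_SCOPES = frozenset({"admin", "full_access", "offline_access"})


def flag_grants(grants, now, stale_after=timedelta(days=90)):
    """Return (client, reasons) pairs for grants that deserve a manual review."""
    findings = []
    for grant in grants:
        reasons = []
        broad = sorted(grant.scopes & BROAD_SCOPES)
        if broad:
            reasons.append("broad scopes: " + ", ".join(broad))
        if now - grant.last_used > stale_after:
            reasons.append("unused for over %d days" % stale_after.days)
        if reasons:
            findings.append((grant.client, reasons))
    return findings
```

A flagged grant is only a prompt for review; whether to revoke it depends on context this sketch cannot see.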

3. GitHub Removes PAT Requirement for Agentic Workflows

  • Category: Community
  • What happened: GitHub has eliminated the personal access token (PAT) requirement for agentic workflows, substituting it with short-lived scoped tokens that are issued per session. This change reduces the risk associated with long-lived credentials in CI/CD environments, as the exposure is now limited to the duration of a single run rather than the lifespan of a secret.
  • Worth reading: This update significantly enhances security for teams using CI/CD agents by minimizing the risk of credential leaks, which could lead to unauthorized access. Operators should review their workflows to implement these new scoped tokens effectively.
  • Source: Techstrong Brief
  • Tags:

4. Release 2.68.0 corresponding to NGC container 26.04

  • Category: Breaking Change
  • What happened: The Triton Inference Server release 2.68.0 introduces several new features and improvements, including a breaking change where client shared memory is disabled by default, requiring users to enable it explicitly. It also enforces a shared limit on max inflight requests for ensemble pipelines, adds support for explicit model control in the OpenAI-compatible frontend, and includes various bug fixes and performance improvements.
  • Do this Monday: Before upgrading, check whether existing deployments rely on client shared memory being enabled by default; those deployments will need updated server startup commands that enable it explicitly. Also review ensemble pipelines against the new shared limit on max inflight requests, which may affect performance and resource management in production.
  • Source: Triton Inference Server releases
  • Tags:

5. npm v12 Is Coming in July — Here's What Developers Need to Do Now

  • Category: Breaking Change
  • What happened: npm v12 is scheduled for release in July and will introduce breaking changes affecting lockfile behavior, peer dependency resolution, and provenance enforcement that could disrupt existing build and deployment workflows. According to DevOps.com, SRE and DevOps teams should immediately audit their existing lockfiles to identify potential conflicts, update their CI/CD toolchains to support the new npm version, and validate their entire pipeline with the upcoming changes before the July release. These breaking changes have the potential to cause build failures or unexpected dependency resolution issues if not proactively addressed. Teams should plan testing windows in non-production environments to validate their Node.js build processes against npm v12 release candidates before rolling out to production systems.
  • Do this Monday: Audit existing lockfiles for potential conflicts, update CI/CD toolchains to support the new npm version, and schedule non-production test runs against npm v12 release candidates so that build failures and dependency resolution changes surface before the July release.
  • Sources: DevOps.com
  • Tags:
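As a starting point for the lockfile audit mentioned above, here is a minimal sketch that inspects a parsed package-lock.json. The source does not say which lockfile properties npm v12 will change, so the three checks below (old lockfile format, missing integrity hashes, non-HTTPS sources) are assumptions about what is worth a look, not a list of what v12 enforces.

```python
import json
from pathlib import Path


def audit_lockfile(lock):
    """Return a list of findings for a parsed package-lock.json dict."""
    findings = []
    version = lock.get("lockfileVersion")
    if not isinstance(version, int) or version < 2:
        findings.append(f"old or missing lockfileVersion: {version!r}")
    for path, meta in lock.get("packages", {}).items():
        # Skip the root project entry ("") and symlinked workspace packages.
        if path == "" or meta.get("link"):
            continue
        resolved = meta.get("resolved")
        if resolved and "integrity" not in meta:
            findings.append(f"{path}: no integrity hash")
        if resolved and not resolved.startswith("https://"):
            findings.append(f"{path}: resolved from non-HTTPS source {resolved}")
    return findings


def audit_lockfile_path(path):
    """Convenience wrapper that loads the lockfile from disk first."""
    return audit_lockfile(json.loads(Path(path).read_text(encoding="utf-8")))
```

Running `audit_lockfile_path("package-lock.json")` in each repository gives a rough list to triage before testing against a release candidate.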

6. Anthropic Turns Off Access to Latest AI Models Following U.S. Order

  • Category: Community
  • What happened: Anthropic has disabled access to its latest AI models, Fable 5 and Mythos 5, in response to a U.S. government order that prohibits foreign nationals from using these systems. The restrictions reflect new export controls related to AI safety and national security concerns that directly impact how AI service vendors can operate and who can access their products. Operators using Anthropic's API services should verify which model versions remain accessible to their user base and review whether geographic restrictions or user nationality checks are now required for their applications. Organizations with international teams or customer bases should assess whether this restriction affects their AI-dependent workflows and plan alternative solutions or model versions if Fable 5 or Mythos 5 were part of their technology stack.
  • Worth reading: This change could disrupt operations for organizations using Anthropic's AI models, particularly those with foreign stakeholders or users. It raises concerns about compliance and the availability of AI resources.
  • Sources: Techstrong.ai
  • Tags:

7. Cloudflare launched a new integration that allows security teams to write proactive WAF rules using live threat intelligence

  • Category: Community
  • What happened: Cloudflare has launched a new integration that enables security teams to create proactive Web Application Firewall (WAF) rules based on real-time threat intelligence data. The integration allows teams to automatically respond to emerging threats by dynamically generating WAF rules from live threat feeds rather than relying solely on static rule sets. Security and SRE teams using Cloudflare WAF should evaluate this feature for their environments to improve their ability to block zero-day attacks and adapt to evolving threat patterns. This capability is particularly relevant for teams managing high-traffic applications or those facing frequent targeted attacks, as it reduces the time between threat identification and protection deployment.
  • Worth reading: This integration could significantly enhance the security posture of applications by allowing for real-time adjustments to WAF rules, potentially reducing the risk of successful attacks.
  • Sources: CNCF via TLDR DevOps, Anthropic via TLDR DevOps, GitHub via TLDR DevOps (+3 more)
  • Tags:
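To make the idea of generating rules from a threat feed concrete, the sketch below turns a list of IP indicators into a Cloudflare Rules language expression of the form `(ip.src in {...})`. The source does not describe the integration's actual interface, so the function name and the input format are assumptions; only the expression syntax is taken from Cloudflare's rules language.

```python
import ipaddress


def build_block_expression(indicators):
    """Build an `ip.src in {...}` expression from IP or CIDR strings.

    Invalid entries are skipped and duplicates are collapsed. Returns None
    when no usable indicator remains, so callers never deploy an empty rule.
    """
    networks = set()
    for raw in indicators:
        try:
            networks.add(ipaddress.ip_network(raw.strip(), strict=False))
        except ValueError:
            continue  # not an IP address or CIDR range

    if not networks:
        return None

    ordered = sorted(
        networks,
        key=lambda n: (n.version, int(n.network_address), n.prefixlen),
    )

    def fmt(net):
        # Single hosts are written as bare addresses, ranges as CIDR.
        if net.prefixlen == net.max_prefixlen:
            return str(net.network_address)
        return str(net)

    return "(ip.src in {" + " ".join(fmt(n) for n in ordered) + "})"
```

Feeds can contain false positives, so a generated expression like this is probably safer behind a log-only or challenge action before it is switched to block.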

8. Turning Cloudflare's threat indicators into real-time WAF rules

  • Category: Community
  • What happened: Cloudflare introduced a new feature that enables security teams to create proactive Web Application Firewall (WAF) rules based on real-time threat intelligence data. This integration aims to enhance the security posture by allowing immediate response to emerging threats.
  • Worth reading: This change could significantly improve the effectiveness of WAFs by allowing for dynamic rule creation based on current threat data, potentially reducing the risk of attacks on applications.
  • Source: Blog Cloudflare via TLDR DevOps
  • Tags:

9. DASH 2026 Security & Compliance: Guide to Datadog's newest announcements

  • Category: Community
  • What happened: Datadog introduced several security and compliance enhancements, including AI agents for threat hunting, improvements in SIEM investigations, code remediation capabilities, API security measures, and fixes for sensitive data leaks. The announcements also include expanded integrations, achievement of FedRAMP High certification, and a new API authentication model.
  • Worth reading: These enhancements could improve security posture and compliance for users, potentially affecting how teams manage threat detection and data protection in production environments. The new API authentication model may also require updates to existing integrations.
  • Source: Datadoghq via TLDR DevOps
  • Tags:

10. A backdoor in a LinkedIn job offer

  • Category: Community
  • What happened: A developer was targeted by a fake recruiter on LinkedIn who requested a code review for a malicious GitHub repository. The investigation revealed a hidden backdoor that executes remote payloads when project dependencies are installed. The attackers used stolen identities to gain trust and entice the victim into executing harmful scripts.
  • Worth reading: This incident highlights the risks associated with code reviews and the importance of verifying the legitimacy of job offers and code repositories. Operators should be vigilant about potential backdoors in third-party code and implement security measures to prevent such attacks.
  • Source: Roman Pt via TLDR Dev
  • Tags:
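One cheap check before installing dependencies from an unfamiliar repository is to list any install-time lifecycle scripts it declares. The write-up does not name the package ecosystem involved, so the sketch below assumes an npm-style project with package.json manifests; the hook names are npm's standard lifecycle scripts.

```python
import json
from pathlib import Path

# npm lifecycle scripts that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall", "prepare")


def find_install_hooks(root):
    """Return (manifest path, hook name, command) for every install hook found."""
    hits = []
    for manifest in sorted(Path(root).rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest; skip it
        scripts = data.get("scripts") if isinstance(data, dict) else None
        if not isinstance(scripts, dict):
            continue
        for hook in INSTALL_HOOKS:
            if hook in scripts:
                hits.append((str(manifest), hook, scripts[hook]))
    return hits
```

An empty result does not prove a repository is safe, since payloads can hide elsewhere. Reviewing untrusted code in a disposable environment and installing with `npm install --ignore-scripts` remain sensible precautions.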

CVE & Security

1. Oracle Issues Emergency Guidance as PeopleSoft Flaw Linked to Widespread Data Theft

  • Category: Security / Patch
  • What happened: Oracle has issued emergency guidance for CVE-2026-35273, a critical unauthenticated remote code execution vulnerability affecting PeopleSoft PeopleTools versions 8.61 and 8.62. The flaw is being actively exploited in the wild by the ShinyHunters cybercrime group to conduct widespread data theft campaigns. Operators running PeopleSoft should immediately apply Oracle's security patches and review their environments for indicators of compromise, particularly focusing on unauthorized access attempts and data exfiltration activities. Organizations should prioritize patching of internet-facing PeopleSoft instances and implement additional network segmentation controls until patches can be fully deployed across all affected systems.
  • Do this Monday: This vulnerability poses a significant risk as it allows unauthenticated remote code execution, which could lead to widespread data theft. Organizations using affected versions of PeopleSoft should prioritize applying the mitigation guidance to protect their systems.
  • Sources: Security Boulevard Newsletters
  • Tags:

2. Oracle Issues Emergency Guidance as PeopleSoft Flaw Linked to Widespread Data Theft

  • Category: Security / Patch
  • What happened: Oracle issued emergency guidance following a critical flaw in PeopleSoft that has been linked to widespread data theft. Attackers exploited vulnerabilities in the HR and finance modules to access connected systems undetected. Organizations must treat their ERP application tier as a perimeter to mitigate risks effectively.
  • Do this Monday: This flaw poses a significant risk to enterprises using PeopleSoft, as attackers can exploit it to access sensitive data. Immediate patching and a reevaluation of security postures around ERP systems are essential to prevent data breaches.
  • Source: Techstrong Brief
  • Tags:

3. Tenet's 'Agentjacking' Attack Turns Sentry Errors Into Code Execution

  • Category: Security / Patch
  • What happened: Tenet researchers introduced 'Agentjacking', a new attack method that exploits AI coding agents by manipulating error messages into executing untrusted code. This highlights a vulnerability in current agent security models, which do not account for error-driven workflows as an attack surface. The research emphasizes that any unchecked data source, including error logs, can serve as an injection point for attacks.
  • Do this Monday: This attack vector could lead to significant security risks if AI coding agents are not designed to validate error messages properly. Organizations using such agents should reassess their security models to include error logs as potential threats.
  • Source: Techstrong Brief
  • Tags:

4. The AI Skill Supply Chain Is Under Attack

  • Category: Security / Patch
  • What happened: The article discusses the vulnerabilities in the AI skill supply chain, highlighting the risks posed by malicious actors targeting AI training data and models. It emphasizes the need for robust security measures to protect the integrity of AI systems and the potential consequences of compromised AI capabilities.
  • Do this Monday: Operators should be aware of the security risks associated with AI systems and consider implementing stronger safeguards to protect training data and models from attacks - this could impact the reliability and trustworthiness of AI outputs in production environments.
  • Source: DevOps.com
  • Tags:

Releases

1. Tekton: v0.36.0, v1.12.0

  • Category: Release
  • What happened: Tekton has released two coordinated updates with security and architectural changes requiring operator attention. Tekton Triggers v0.36.0 adds TLS security profile support for core interceptors enabling cluster-wide TLS policy enforcement on OpenShift, while also migrating metrics from OpenCensus to OpenTelemetry. Tekton Pipeline v1.12.0 "Exotic Shorthair Elektrobots LTS" implements TEP-0137 by introducing a dedicated events controller that handles CloudEvents for PipelineRuns and TaskRuns through a new tekton-events-controller Deployment. Operators should verify the tekton-events-controller Deployment is running after upgrading to v1.12.0, as the previous inline event handling mechanism has been replaced with this separate controller architecture. Both releases represent significant architectural shifts in their respective components and should be tested in non-production environments before rollout.
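  • Do this Monday: After upgrading to Pipeline v1.12.0, verify that the tekton-events-controller Deployment is running, and test both releases in a non-production environment before rollout. On OpenShift, the new TLS security profile support in Triggers v0.36.0 enables cluster-wide TLS policy enforcement, which may matter for compliance.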
  • Sources: Tekton Triggers releases, Tekton Pipelines releases
  • Tags:
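The post-upgrade check on the tekton-events-controller Deployment could be scripted along these lines. This is a sketch under assumptions: the `tekton-pipelines` namespace is the usual upstream install location but may differ in your cluster (OpenShift Pipelines typically uses another namespace), and `kubectl` is assumed to be on the PATH with access to the cluster.

```python
import json
import subprocess


def deployment_ready(deployment):
    """True when a Deployment object reports all desired replicas available."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    available = deployment.get("status", {}).get("availableReplicas", 0)
    return desired > 0 and available >= desired


def fetch_deployment(name="tekton-events-controller", namespace="tekton-pipelines"):
    """Fetch a Deployment as a dict via kubectl (namespace is an assumption)."""
    result = subprocess.run(
        ["kubectl", "get", "deployment", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `deployment_ready(fetch_deployment())` after the upgrade returns False when the controller is missing replicas, and `fetch_deployment` raises `subprocess.CalledProcessError` when the Deployment does not exist at all.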

2. Generating a Pulumi Provider from an OpenAPI Spec

  • Category: Release
  • What happened: Pulumi has released v1.0 of the Pulumi Service Provider, which is generated from the Pulumi Cloud OpenAPI specification. This update allows for a broader range of resources to be managed directly through Pulumi, including fine-grained RBAC, Pulumi IDP, and audit log export as code. The provider now reflects new Pulumi Cloud features in real-time, eliminating delays in feature availability.
  • Do this Monday: This release enhances the ability to manage Pulumi Cloud infrastructure more effectively and quickly, which could lead to improved operational efficiency and faster feature adoption.
  • Source: Pulumi Blog
  • Tags:

Also this week

Deep dives & postmortems

12. DevOps'ish 313: Export Controlled, AUR Torched, Lawyers Disqualified, and more

  • Category: Deep Dive
  • What happened: Anthropic's launch of Claude Fable 5 and Mythos 5 models failed when US government export controls required the company to abruptly disable these models for all customers due to national security concerns. The incident demonstrates how export control regulations can directly impact service availability for AI models without advance warning to operators or users. SRE teams managing AI infrastructure should review their dependency on cutting-edge model releases and implement fallback strategies to handle sudden service discontinuations driven by regulatory requirements. Organizations should also assess whether their AI service providers have clear policies and communication channels for handling export control compliance events that could affect production systems.
  • Takeaway: Operators using Arch Linux's AUR in CI/CD pipelines should audit their dependencies immediately due to the compromised packages. The incident with Anthropic raises concerns about regulatory compliance that could affect AI product deployments. The Miasma worm incident highlights the need for vigilance in repository security and credential management.
  • Sources: DevOps'ish
  • Tags:

Community reads

11. AWS Nitro Isolation Engine: Formally verifying the hypervisor in the AWS Nitro System

  • Category: Community
  • What happened: The AWS Nitro Isolation Engine provides a formal verification of the hypervisor in the AWS Nitro System, ensuring that neighboring instances cannot access each other's memory. This feature is exclusive to Graviton5 instances, which come with a 9% price increase. While the mathematical proofs are significant, the requirement for the latest hardware may limit accessibility for some users.
  • Worth reading: The introduction of the Nitro Isolation Engine may necessitate migration to Graviton5 instances for enhanced security, potentially increasing costs for users. This could impact budgeting and planning for infrastructure upgrades.
  • Source: AWS via Last Week in AWS
  • Tags:

13. Homebrew 6.0.0

  • Category: Community
  • What happened: Homebrew 6.0.0 introduces a significant security feature called tap trust, which mandates that users explicitly trust third-party repositories before executing their code. Additionally, it makes the faster internal JSON API the default and adds Linux sandboxing for enhanced security.
  • Worth reading: This update could affect production environments using Homebrew, as the new tap trust feature requires careful management of third-party repositories to ensure security and functionality.
  • Source: Brew Sh via TLDR Dev
  • Tags:

14. The RCE that AMD wouldn't fix

  • Category: Community
  • What happened: A security researcher found that AMD's AutoUpdate software downloaded executable updates over unsecured HTTP without signature verification. Although AMD initially rejected the proposed fix, they later addressed the issue.
  • Worth reading: This vulnerability could potentially allow remote code execution if exploited, affecting systems using AMD's AutoUpdate software - operators should ensure their systems are updated to the latest version to mitigate risks.
  • Source: Mrbruh via TLDR Dev
  • Tags:
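The class of bug described here (fetching executables over plain HTTP with no verification) can be illustrated with a small pre-execution check. This sketch uses a pinned SHA-256 digest as a simplified stand-in for real signature verification; it is not how AMD addressed the issue, which the source does not describe, and the URLs in any usage are placeholders.

```python
import hashlib
import hmac
from urllib.parse import urlparse


def is_safe_update(url, payload, expected_sha256):
    """Accept an update only if it came over HTTPS and matches a pinned digest.

    `expected_sha256` must be obtained out of band (for example, shipped with
    the updater); a digest fetched from the same server proves nothing.
    """
    if urlparse(url).scheme != "https":
        return False
    actual = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison; digests are compared in lowercase hex.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A production updater would more likely verify a publisher signature over the payload, since that lets the vendor ship new versions without redistributing pinned hashes.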

15. Anthropic Reverses Course on Hidden AI Restrictions Following Developer Backlash

  • Category: Community
  • What happened: Anthropic has reversed its implementation of hidden performance throttling in Claude Fable 5 following significant developer backlash, making the model's guardrails and restrictions visible rather than opaque. Developers complained that the hidden throttling caused inefficient API token usage and unpredictable model behavior without clear visibility into when or why requests were being restricted or rerouted. Operators using Claude API integrations should review their implementation to understand how visible guardrails now affect their request patterns and token consumption. This change improves transparency in model routing decisions and refusal behavior, allowing teams to better predict and optimize their Claude API usage. According to DevOps.com, the modification specifically affects the frontier LLM's behavior controls that were previously applied without developer awareness.
  • Worth reading: This change enhances transparency in AI model behavior, which can improve reliability and efficiency in production environments relying on AI services.
  • Sources: DevOps.com
  • Tags:

Lightning links

Human Stories

Looking at the ServiceNow breach and the ShinyHunters campaign exploiting OAuth trust relationships, I keep coming back to how much of our security posture now hinges on trust boundaries we don't directly control. The old perimeter is gone, replaced by this web of federated authentication and third-party integrations where a compromise in one place cascades everywhere. What strikes me about GitHub dropping PATs for short-lived tokens and Anthropic shutting down model access isn't just the technical fixes - it's the growing realization that we're all operating in an ecosystem where vendor security decisions directly impact our incident response playbooks. The Cloudflare WAF integration using live threat intel feels like the right direction, giving us tools that move at the same speed as attackers, but it also means we need to get comfortable with security tooling that's increasingly opaque and automated. We're not just managing our own infrastructure anymore; we're managing trust in an interconnected system where someone else's zero-day becomes our 3am page.

Also worth reading

MAJOR debilitating bug (Cursor Forum)

A user reported a critical bug in Cursor IDE that resulted in the deletion of over 3TB of data on their computer, including applications and games. The issue occurred while building a plan file, rendering Cursor unusable. The user expressed frustration and requested immediate contact regarding the s

In incidents, swarming is a feature, not a bug (SRE Weekly)

The article discusses the concept of swarming during incidents, emphasizing that it is a beneficial practice rather than a flaw. It highlights how swarming can enhance collaboration and problem-solving among teams when responding to incidents.

When Claude changed, everything changed: Managing AI blast radius in production (SRE Weekly)

The article discusses the implications of changes in AI models like Claude on production systems, emphasizing the need for careful management of the potential blast radius when deploying such models. It highlights strategies for mitigating risks associated with AI changes in production environments.

Past Briefs

2026-06-14 — 2026-06-20

On Call Brief – Week of June 14–20, 2026

Draft updated 3 hours ago (Jun 17, 2026 3:06 am EDT)

Breaking Change · Community · Deep Dive · Release · Security / Patch

2026-06-07 — 2026-06-13

On Call Brief – Week of June 7–13, 2026

Updated 4 days ago (Jun 13, 2026 3:05 am EDT)

Community · Deep Dive · Release · Security / Patch

2026-05-31 — 2026-06-06

On Call Brief – Week of May 31 – June 6, 2026

Updated 2 weeks ago (Jun 6, 2026 3:05 am EDT)

Breaking Change · Community · Deep Dive · Release · Security / Patch

2026-05-24 — 2026-05-30

On Call Brief – Week of May 24–30, 2026

Updated 3 weeks ago (May 30, 2026 3:08 am EDT)

Breaking Change · Community · Deep Dive · Release · Security / Patch

2026-05-17 — 2026-05-23

On Call Brief – Week of May 17–23, 2026

Updated 4 weeks ago (May 23, 2026 3:04 am EDT)

Community · Deep Dive · Release · Security / Patch

2026-05-10 — 2026-05-16

On Call Brief – Week of May 10–16, 2026

Updated 1 month ago (May 16, 2026 3:05 am EDT)

Breaking Change · Community · Deep Dive · Release · Security / Patch

2026-05-03 — 2026-05-09

On Call Brief – Week of May 3–9, 2026

Updated 1 month ago (May 7, 2026 3:08 am EDT)

Breaking Change · Deep Dive · Release · Security / Patch

2026-04-26 — 2026-05-02

On Call Brief – Week of April 26 – May 2, 2026

Updated 2 months ago (Apr 30, 2026 3:06 am EDT)

Community · Deep Dive · Release · Security / Patch

2026-04-19 — 2026-04-25

On Call Brief – Week of April 19–25, 2026

Updated 2 months ago (Apr 28, 2026 7:22 pm EDT)

Community · Deep Dive · Release · Security / Patch

2026-04-12 — 2026-04-18

On Call Brief – Week of April 12–18, 2026

Updated 2 months ago (Apr 16, 2026 1:48 pm EDT)

Breaking Change · Deep Dive · Release · Security / Patch

2026-04-05 — 2026-04-11

On Call Brief – Week of April 5–11, 2026

Updated 2 months ago (Apr 9, 2026 3:06 am EDT)

Breaking Change · Deep Dive · Release · Security / Patch

2026-03-29 — 2026-04-04

On Call Brief – Week of March 29 – April 4, 2026

Updated 3 months ago (Apr 2, 2026 3:05 am EDT)

Community · Deep Dive · Release · Security / Patch