On Call Brief – Week of May 17–23, 2026

2026-05-17 to 2026-05-23. Briefing: 2026-05-17

This week's top stories

1. Researcher says Microsoft secretly built a backdoor into BitLocker

  • Category: Deep Dive
  • What happened: A security researcher alleges that Microsoft secretly built a backdoor into BitLocker that could compromise encrypted data. The allegation has raised concerns about trust in encryption technologies and about user data privacy.
  • Takeaway: If true, this would undermine data protection in production environments that rely on BitLocker for encryption; organizations may need to reassess their encryption strategies.
  • Source: TechSpot via Lobsters
  • Discussion: https://lobste.rs/s/ynxkj6/researcher_says_microsoft_secretly
2. Reducing MTTR became a “context reconstruction” problem for us, not a monitoring problem.

    • Category: Community
    • What happened: SRE teams are facing challenges with incident response knowledge retention during team rotations and multi-tenant troubleshooting scenarios. According to discussions on Reddit's r/sre and r/devops communities, reducing MTTR has shifted from a traditional monitoring problem to a "context reconstruction" challenge, particularly in Platform9/OpenStack environments where existing tools can detect failures but fail to explain operational changes. The core issue is that while formal runbooks persist through team rotations, critical undocumented knowledge about alert noise patterns, misleading metrics, and inefficient monitoring checks typically disappears when experienced engineers rotate out. SRE teams should prioritize capturing operational context beyond basic runbooks by documenting alert interpretation nuances, metric correlation patterns, and environment-specific troubleshooting workflows to prevent knowledge loss. Organizations should implement structured knowledge transfer processes that include hands-on shadowing periods and detailed documentation of the reasoning behind monitoring decisions, not just the procedures themselves.
    • Worth reading: Capturing operational context alongside runbooks could give responders clearer context during failures, potentially leading to faster recovery times in production environments.
    • Sources: Reddit r/sre, Reddit r/devops
3. Some of the worst deploy incidents came from things the diff never showed

    • Category: Deep Dive
    • What happened: A Reddit r/kubernetes discussion reveals that the most severe deployment incidents typically originate from adjacent systems, configuration drift, or environmental factors that don't appear in code diffs, rather than the actual code changes being deployed. Related operational challenges identified across SRE communities include post-mortem action items frequently dying due to lack of explicit decision-making about implementation (SRE Weekly), and systems accumulating complexity faster than teams can maintain operational understanding, leading to poor incident knowledge preservation (Reddit r/sre). Additionally, critical troubleshooting knowledge and system insights are lost during on-call handoffs because they exist as informal, verbal knowledge not captured in runbooks (Reddit r/devops). SRE teams should focus on documenting environmental dependencies and adjacent system interactions, establish explicit processes for post-mortem action item ownership and deadlines, and create structured knowledge transfer protocols that capture informal operational insights beyond standard runbook procedures.
    • Takeaway: Understanding that deploy incidents can arise from factors outside of the code changes can help teams improve their deployment strategies and incident response. Creating specific checklists based on past incidents may enhance operational resilience.
    • Sources: Reddit r/kubernetes, Incident via SRE Weekly, Reddit r/sre, Reddit r/devops
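One way to surface changes a diff never shows is to compare the settings a deploy expects against what is actually live. The following is a minimal Python sketch, assuming both sides can be flattened into dictionaries. The function name `find_drift` and the sample keys are hypothetical and do not come from the discussion.

```python
def find_drift(expected: dict, live: dict) -> dict:
    """Return keys whose live value differs from the expected value.

    Each entry maps key -> (expected_value, live_value). A key that is
    absent on one side is reported as None on that side.
    """
    drift = {}
    for key in sorted(set(expected) | set(live)):
        want, have = expected.get(key), live.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: the code diff is empty, but the environment moved.
expected = {"DB_POOL_SIZE": "20", "FEATURE_X": "off", "TLS_MIN": "1.2"}
live = {"DB_POOL_SIZE": "5", "FEATURE_X": "off", "SIDE_CAR": "v2"}
print(find_drift(expected, live))
```

Running a comparison like this before and after each deploy, against settings drawn from past incidents, is one way to turn the "specific checklists" idea into something automated.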
4. Random local web server access failure: ping works but HTTP fails for some users only

    • Category: Community
    • What happened: A user is troubleshooting a local web server access issue where some users can ping the server but cannot access it via HTTP. The problem is intermittent, affecting random users without a clear pattern. Tests show that while ping succeeds, HTTP requests fail. Possible causes include wireless congestion, network loops, duplicate IPs, or connection exhaustion.
    • Worth reading: This issue could indicate underlying network problems that may affect service availability for users, particularly in environments with many wireless devices. Understanding the root cause is crucial to prevent future disruptions.
    • Source: Reddit r/sysadmin
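Because ping only exercises ICMP, a successful ping says nothing about whether a TCP handshake or an HTTP exchange can complete. The sketch below, which is not taken from the thread, separates those two stages using only the Python standard library. The function name `probe` and its return labels are hypothetical.

```python
import http.client
import socket


def probe(host: str, port: int = 80, timeout: float = 3.0) -> str:
    """Report how far a request gets: 'tcp_failed', 'http_failed', or 'ok'."""
    # Stage 1: can we complete a TCP handshake? Ping (ICMP) does not test this.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
    except OSError:
        return "tcp_failed"

    # Stage 2: does the server answer an HTTP request on that port?
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        conn.getresponse()  # any status code counts as an answer
        return "ok"
    except (OSError, http.client.HTTPException):
        return "http_failed"
    finally:
        conn.close()
```

Run from an affected client and an unaffected one at the same moment, a result of `tcp_failed` points toward the network path or connection limits, while `http_failed` points toward the server or something in between that accepts the connection but never answers.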
5. The occasional `ECONNRESET`

    • Category: Community
    • What happened: The article discusses the `ECONNRESET` error, which occurs when a connection is forcibly closed by the remote host. It explores potential causes, including network issues and server configurations, and suggests troubleshooting steps to mitigate the problem.
    • Worth reading: Understanding the causes of `ECONNRESET` can help operators diagnose and resolve connection issues that may affect application performance, especially in distributed systems.
    • Source: movq.de via Lobsters
    • Discussion: https://lobste.rs/s/z306ya/occasional_econnreset
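One common client-side mitigation for an occasional reset is a bounded retry with backoff; this is not necessarily what the article itself recommends. In Python, `ECONNRESET` surfaces as `ConnectionResetError`. The helper name `call_with_retry` and its parameters are hypothetical, and the sketch assumes the wrapped operation is safe to repeat.

```python
import time


def call_with_retry(operation, attempts: int = 3, base_delay: float = 0.1):
    """Run operation(), retrying only when the peer resets the connection."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionResetError:  # Python's mapping of errno ECONNRESET
            if attempt == attempts:
                raise  # out of retries: surface the original error
            # Exponential backoff: base_delay, then 2x, 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Retrying hides the symptom rather than the cause, so it is worth logging each reset; a steady rate often points at an idle timeout on a load balancer or a keep-alive mismatch.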
6. Show HN: Machine – per-project dev VMs with session-only secrets

    • Category: Community
    • What happened: The article introduces a CLI tool called 'machine' that creates a Lima VM for each coding project, enhancing security by isolating development environments. It allows for easy sharing of development setups through declarative profiles and integrates with the macOS Keychain or 1Password for secure SSH signature forwarding and environment variable management.
    • Worth reading: This tool could improve security practices in development environments by preventing local machine vulnerabilities and simplifying environment setup for teams.
    • Source: Hacker News Show HN
    • Discussion: https://news.ycombinator.com/item?id=48166119
7. Max Mode Broken

    • Category: Community
    • What happened: The Cursor IDE is experiencing a critical bug where the 'Max Mode' feature is broken, rendering the IDE unusable for users on Windows 10/11. The issue has been reported with version 3.4.20 of the IDE, and it affects all AI models used within the application.
    • Worth reading: This bug may significantly disrupt development workflows for teams relying on Cursor IDE, necessitating immediate attention and potential workarounds until a fix is released.
    • Source: Cursor Forum

CVE & Security

1. New Windows 'MiniPlasma' zero-day exploit gives SYSTEM access, PoC released

    • Category: Security / Patch
    • What happened: A proof-of-concept exploit for a Windows privilege escalation zero-day named 'MiniPlasma' has been released, allowing attackers to gain SYSTEM privileges on fully patched Windows systems.
    • Do this Monday: This exploit poses a significant security risk as it can be used to escalate privileges on Windows systems, potentially affecting production environments if not mitigated.
    • Source: Bleeping Computer
2. CVE-2026-46333 in Kubernetes: unset seccomp let pods reach pidfd_getfd, RuntimeDefault blocked it

    • Category: Security / Patch
    • What happened: CVE-2026-46333 is a vulnerability in Kubernetes related to the Linux __ptrace_may_access() bug. Testing revealed that pods with unset seccomp profiles could exploit the pidfd_getfd primitive, leading to potential file descriptor theft. The tests showed that while RuntimeDefault and PSS Restricted profiles effectively blocked this vulnerability, the PSS Baseline profile allowed it under certain conditions. Recommendations include enforcing seccomp profiles, patching node kernels, and ensuring proper privilege settings to mitigate risks.
    • Do this Monday: Operators should ensure that effective seccomp profiles are set for workloads to prevent potential file descriptor theft. The vulnerability can be exploited if seccomp is unset or set to Unconfined, which could lead to security breaches in Kubernetes clusters.
    • Source: Reddit r/kubernetes
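The seccomp check described above can be approximated offline against pod manifests. The sketch below assumes a pod has already been loaded into a Python dict (for example from JSON output of the API) and flags containers whose effective profile is unset or `Unconfined`. The function name `containers_without_seccomp` and the sample pod are hypothetical, and the sketch ignores any node-level default profile the cluster may apply.

```python
def containers_without_seccomp(pod: dict) -> list:
    """List container names whose effective seccomp profile is unset or Unconfined."""
    spec = pod.get("spec", {})
    pod_profile = (spec.get("securityContext") or {}).get("seccompProfile") or {}
    flagged = []
    for field in ("initContainers", "containers", "ephemeralContainers"):
        for container in spec.get(field) or []:
            ctx = container.get("securityContext") or {}
            # A container-level profile overrides the pod-level one.
            profile = ctx.get("seccompProfile") or pod_profile
            if profile.get("type") in (None, "Unconfined"):
                flagged.append(container["name"])
    return flagged


# Hypothetical pod: the pod-level default is safe, one container overrides it.
sample_pod = {
    "spec": {
        "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
        "containers": [
            {"name": "app"},
            {"name": "debug",
             "securityContext": {"seccompProfile": {"type": "Unconfined"}}},
        ],
    }
}
print(containers_without_seccomp(sample_pod))
```

A check like this only reports what manifests declare; enforcing the profile through admission policy and patching node kernels, as the post recommends, remain the actual mitigations.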
3. Pwn2Own Berlin 2026: Researchers Exploit 47 Zero-Days for $1.3M Payouts

    • Category: Security / Patch
    • What happened: The Pwn2Own Berlin 2026 hacking contest has concluded with researchers earning over $1.2 million by exploiting 47 zero-day vulnerabilities. This highlights the ongoing security risks associated with unpatched software and the importance of addressing vulnerabilities promptly.
    • Do this Monday: The discovery of multiple zero-day vulnerabilities emphasizes the need for vigilance in patch management and security practices to mitigate potential exploitation in production environments.
    • Source: Bleeping Computer
4. DevOps'ish 309: Dirty Pages All the Way Down, The Cloud Is Hot, and more

    • Category: Security / Patch
    • What happened: DevOps'ish 309 reports an increasing frequency of Linux kernel vulnerabilities requiring enhanced safety measures during system upgrades, though specific CVE numbers and affected kernel versions are not detailed in the available excerpts. The publication highlights Kubernetes updates including the graduation of PSI (Pressure Stall Information) metrics to General Availability status, which provides improved resource pressure monitoring capabilities for cluster operators. SRE teams should review their current kernel patching procedures to ensure robust rollback mechanisms are in place and evaluate upgrading to Kubernetes versions that include the GA PSI metrics for better resource monitoring. Additionally, teams should consider the migration guidance provided regarding transitions from ingress-nginx to Envoy-based solutions when planning infrastructure updates.
    • Do this Monday: Operators should be aware of the kernel vulnerabilities and ensure their systems are prepared for frequent updates. The Kubernetes updates may require adjustments in resource management and scheduling strategies.
    • Source: DevOps'ish
5. Google SecOps releases new parser documentation for log ingestion

    • Category: Security / Patch
    • What happened: Google Cloud has released new parser documentation for Google SecOps, enabling users to ingest and normalize logs from various sources including Arista VeloCloud, Microsoft Defender for Endpoint, SAP products, and more. This documentation aims to enhance log management and security operations.
    • Do this Monday: This update may affect production environments by improving log ingestion capabilities, which can enhance security monitoring and incident response processes.
    • Source: Google Cloud Release Notes

Releases

1. GPT-5.3-Codex is now the base model for Copilot Business and Enterprise

    • Category: Release
    • What happened: GitHub has upgraded Copilot Business and Enterprise to use GPT-5.3-Codex as the base model, replacing the previous GPT-4.1 implementation according to their official changelog. This new model is designated as GitHub's first long-term support (LTS) release with guaranteed 12-month availability, providing enhanced stability for enterprise security workflows. In related AI tooling developments, TerraShark (a Claude-based Terraform assistant) has added support for trusted modules across AWS, Azure, and GCP platforms as reported on Reddit's r/terraform community. SRE teams using GitHub Copilot should monitor for any changes in code suggestion behavior or quality, while those evaluating Terraform AI assistants can now leverage TerraShark's version-locked community modules to reduce infrastructure-as-code errors. Organizations should update their AI tooling policies to account for the LTS model lifecycle and evaluate whether the enhanced stability aligns with their development workflows.
    • Do this Monday: Organizations using Copilot Business and Enterprise should prepare for the transition to GPT-5.3-Codex, as it may affect internal review processes and model approval workflows. The upcoming deprecation of GPT-4.1 could impact teams still relying on that model.
    • Sources: GitHub Changelog, Reddit r/terraform
2. Autonomous AI Penetration Testing with Consent-First Ethical Framework: Research Paper + Working Implementation

    • Category: Release
    • What happened: The article discusses a research paper on autonomous AI penetration testing that emphasizes a consent-first ethical framework. It includes a working implementation of the proposed methodology.
    • Do this Monday: This research could influence how penetration testing is conducted, potentially affecting security practices and compliance requirements.
    • Source: Reddit r/netsec
3. Looking for testers. (Does a PS module even count as a tool?)

    • Category: Release
    • What happened: A developer is seeking testers for a PowerShell module designed to quickly diagnose Kubernetes issues by analyzing information from the Kubernetes API. The module aims to identify edge cases, usefulness, and shortcomings, providing deterministic diagnostics for service failures related to Cilium egress policies.
    • Do this Monday: This tool could enhance troubleshooting efficiency in Kubernetes environments, particularly for issues related to network policies and service communication.
    • Source: Reddit r/kubernetes
4. spr: Stacked Pull Requests on GitHub

    • Category: Release
    • What happened: The article discusses a tool called 'spr' that facilitates managing stacked pull requests on GitHub, allowing developers to create and review multiple related pull requests more efficiently. It highlights the benefits of using this tool for better collaboration and code review processes.
    • Do this Monday: This tool could improve workflow efficiency for teams using GitHub, particularly in managing complex feature branches and pull requests, and it may reduce the overhead associated with code reviews.
    • Source: GitHub via Lobsters
    • Discussion: https://lobste.rs/s/txnyjt/spr_stacked_pull_requests_on_github
5. The Mac mini just became infrastructure

    • Category: Release
    • What happened: Apple's recent earnings call highlighted a significant demand for Mac mini and Mac Studio due to their use as infrastructure for persistent agent AI tools. Developers are increasingly choosing Mac mini as the platform for always-on agents, which can run tasks autonomously. This trend indicates a shift in how personal computing devices are being utilized, with the Mac mini emerging as a new standard for running agent-based applications, driven by its low power consumption and cost-effectiveness compared to cloud VMs.
    • Do this Monday: The growing use of Mac mini for persistent agents could influence infrastructure decisions, especially for teams considering cost-effective, always-on solutions for automation and AI workloads.
    • Source: The New Stack
6. IntelliJ IDEA 2026.1.2 Is Out!

    • Category: Release
    • What happened: IntelliJ IDEA 2026.1.2 includes several fixes such as improved project opening via .ipr files, corrected Java ternary expression indentation, resolved unexpected context menu opening on Windows, and restored functionality for live templates with groovyScript. Additional fixes address issues with dragging and dropping code, opening diffs in external tools, and workspace functionality, along with resolving several IDE freezes.
    • Do this Monday: These updates may improve developer productivity by fixing common issues encountered in the IDE, potentially reducing friction in development workflows.
    • Source: JetBrains Blog
7. Compose Multiplatform 1.11.0 Is Now Available

    • Category: Release
    • What happened: Compose Multiplatform 1.11.0 introduces enhancements for iOS and web, including a native text input implementation for iOS that improves user experience, concurrent rendering enabled by default, upgraded UI testing APIs, and improved scrolling performance on web targets.
    • Do this Monday: These updates may affect production apps using Compose Multiplatform by improving performance and user experience on iOS and web, which could lead to better user engagement and satisfaction.
    • Source: JetBrains Blog
8. 2026 Direction: AI and Traditional Workflows in JetBrains IDEs

    • Category: Release
    • What happened: JetBrains outlines its strategy for integrating AI into its IDEs, emphasizing coexistence with traditional coding workflows. Developers can choose between conventional coding or AI-assisted methods, with a focus on maintaining a seamless user experience. The IDE will support various AI tools and protocols, allowing flexibility and avoiding vendor lock-in. JetBrains aims to ensure that regardless of the method used, the responsibility for the code remains with the developer.
    • Do this Monday: This strategy may affect how developers interact with JetBrains IDEs, potentially changing coding practices and introducing new tools for AI integration. Teams may need to adapt to the dual workflow approach and consider the implications of AI-assisted coding on code quality and responsibility.
    • Source: JetBrains Blog

Lightning links

    Human Stories

    Looking across these incidents and discussions, I keep coming back to how much of our work has become about managing what we can't see. Whether it's the BitLocker backdoor allegations, deployment failures that never show up in diffs, or those maddening ECONNRESET errors that appear without warning, we're constantly wrestling with invisible complexity. The story about MTTR becoming a "context reconstruction problem" really hits home here: we're not just debugging systems anymore, we're debugging our own understanding of systems that have grown beyond what any single person can fully grasp. Tools like the Machine dev VM project represent one response to this challenge: creating cleaner boundaries and isolated contexts where we can actually reason about what's happening. But perhaps the real lesson is that in a world of hidden dependencies, environmental drift, and emergent behaviors, our most valuable skill isn't just technical troubleshooting; it's becoming better detectives of the invisible.

    Also worth reading

    Autopilot user name or password is incorrect 802.11x authentication (Reddit r/sysadmin)

    A user is experiencing issues with Windows Autopilot where they receive a 'user name or password is incorrect' error during the setup process when using 802.1X authentication. The problem does not occur when using Ethernet or WPA2 authentication. The user has simplified their setup by removing app