Category: Kubernetes

25 November 2025

My First KubeCon as a CNCF Ambassador: Atlanta 2025 Highlights

This year’s KubeCon in Atlanta brought together some of the most vibrant and forward-looking voices in the cloud-native ecosystem. Following an insightful and energetic Cloud Native Rejekts conference where the community pushed boundaries and shared unfiltered technical expertise, it was time for KubeCon, the global stage where these ideas scale, mature, and inspire real-world change.

KubeCon + CloudNativeCon North America 2025 also marked a milestone in my personal journey. It was my first time attending as a CNCF Ambassador, and I had the privilege of contributing as a speaker not once, but twice. It was a meaningful and transformative experience, and I’m excited to share what made this edition truly exceptional. With thousands of attendees from across the globe, dozens of tracks spanning everything from platform engineering and security to storage and edge compute, and the usual surge of side-conversations and hallway meetups, the energy was palpable.

I also had the opportunity to join the Platform Engineering Coffee Meetup for the first time: a valuable learning experience and a surprisingly engaging discussion to kick off the day at 7 AM (ouch!).

As a contributor to the Kubernetes security ecosystem and an active member of the cloud-native community, I arrived with two hats: attendee and speaker (twice). This event felt like a clear reflection of how fast our ecosystem is evolving, and how cloud-native security is reshaping itself alongside platform engineering, AI, and runtime detection.

Preparing and delivering these talks was an incredible learning experience. The KubeCon audience is uniquely engaged, filled with practitioners who ask thoughtful questions and share their own experiences. The conversations that continued in the hallway track after my sessions were just as valuable as the presentations themselves.

Several themes dominated the conversations at KubeCon Atlanta 2025:

AI + Kubernetes Integration

Kubernetes shifting from orchestration to GPU-native AI operating substrate
Focus on dynamic GPU partitioning (MIG, vGPU) to reduce cost
NUMA/topology-aware scheduling for low-latency training and inference
Standardized device plugins across NVIDIA, AMD ROCm, Intel
Autoscaling based on tokens & latency, not HTTP metrics
Model, dataset, and embeddings treated as signed supply-chain artifacts
Identity-backed access to models and GPUs leveraging SPIFFE/SPIRE

Platform Engineering for Multi-Tenant AI

Internal platforms must enforce GPU quotas, tenancy, and access control
Integration of model registries with RBAC + SPIFFE/SPIRE identities
Dataset lineage + provenance manifests as mandatory artifacts
Policy-as-code for inference filtering (prompts, outputs)
SLSA-based model pipelines (dataset → training → signed model → inference)
Golden paths now include GPU profiles, dataset hashing, autoscaling hints
Tenant isolation + workload identity using SPIFFE/SPIRE across pipelines

Zero Trust for AI Workloads

Zero trust applied to models, datasets, and GPU hardware
SPIFFE/SPIRE used for identity-bound GPU access + model attestation
Dataset poisoning considered a CI/CD security risk
Prompt abuse across shared tenants treated as data leakage vector
GPU side-channel attacks via shared memory and plugins
Model exfiltration prevention using signed registries + identity controls

eBPF for AI Observability & Security

Collection of GPU telemetry with near-zero overhead
Detection of anomalies in token-level latency and inference cost
Monitoring PCIe/NVLink/GPU bandwidth for distributed training
Real-time introspection into vector-heavy pipelines
eBPF + SPIFFE/SPIRE integration emerging for identity-aware telemetry

Core Takeaway

Kubernetes is evolving into a GPU-native, AI-governed platform
AI demands reshaping scheduling, platform engineering, and security models
SPIFFE/SPIRE is becoming the identity backbone of AI infrastructure
The future of Kubernetes is AI-native by design

Must-Watch Sessions

Looking forward to seeing everyone at future KubeCon events. The next stops are Amsterdam (Europe 2026), Mumbai (India 2026), and Yokohama (Japan 2026)!

Maxime.

9 November 2025

My Experience at Cloud Native Rejekts NA 2025

After speaking last year at Cloud Native Rejekts Salt Lake City 2024 with Mathieu on “Platform Engineering Loves Security: Shift Down to Your Platform, not Left to Your Developers!”, I returned to Cloud Native Rejekts North America 2025 in Atlanta, this time as an attendee, eager to reconnect with the community, discover new research, and exchange ideas on the evolution of Kubernetes Security, AI integration, and platform engineering.

Cloud Native Rejekts has always been a special event for me. It’s intimate, technically rich, and community-driven, the perfect pre-KubeCon gathering where bold ideas and unfiltered discussions shape the future of our ecosystem.

Atlanta’s edition had an incredible mix of platform engineers, SREs, Devs, and security practitioners from across North America and beyond. What I love most about Rejekts is the raw energy: talks are deeply technical, hallway conversations turn into architecture sessions, and everyone genuinely wants to share and learn.

The venue setup encouraged collaboration, and the diversity of topics, from runtime isolation to AI-driven observability, reflected just how fast our space is evolving.

My Top 5 Highlights from Cloud Native Rejekts 2025

1. Catch Me If You Can: A Kubernetes Escape Story

A live, show-stopping demo by Jed Salazar and James Petersen revealed the anatomy of a real-world container escape from the initial breakout to lateral movement across a Kubernetes cluster. The session unpacked how weak isolation, misconfigured permissions, and monitoring blind spots open the door to stealthy takeovers, and how defenses like user namespaces in Kubernetes 1.33, capability hardening, and runtime detection close it.

Through a step-by-step attack reconstruction, the speakers connected kernel-level exploits to cluster-wide compromise, then flipped the lens to show how to build multi-tenant isolation, detect breakout signals early, and contain the blast radius.

A must-watch for anyone serious about runtime hardening and defense-in-depth in Kubernetes.
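
As a rough illustration of two of the defenses mentioned above (user namespaces and capability hardening), here is a minimal Python sketch that builds a hardened Pod manifest. It is not taken from the talk: the pod name and image are hypothetical, and it assumes a cluster and container runtime that support user namespaces through the hostUsers field.

```python
import json

# Hypothetical pod name and image, used only for illustration.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "hardened-demo"},
    "spec": {
        # Run the pod in its own user namespace, so root inside the
        # container does not map to root on the node.
        "hostUsers": False,
        "containers": [{
            "name": "app",
            "image": "registry.example.com/app:1.0",
            "securityContext": {
                "allowPrivilegeEscalation": False,
                # Drop every Linux capability the workload does not need.
                "capabilities": {"drop": ["ALL"]},
                "seccompProfile": {"type": "RuntimeDefault"},
            },
        }],
    },
}

# kubectl accepts JSON as well as YAML manifests.
print(json.dumps(pod, indent=2))
```

The output can be piped to kubectl apply -f - on a test cluster. Runtime detection, the third defense from the talk, needs separate tooling and is not covered by this sketch.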

2. Beyond the Default Scheduler: Navigating GPU Multitenancy in the AI Era

Shivay Lamba, Hrittik Roy, and Saiyam Pathak explored one of the toughest challenges in AI infrastructure: secure GPU sharing.

They broke down how time-slicing improves utilization but weakens isolation, and why NVIDIA MIG’s hardware partitioning (cores, memory, L2 cache) is a game changer.

By leveraging schedulers like KAI, Volcano, and Kueue, they showed how to build secure, fair, and efficient multi-tenant GPU clusters that can power the next generation of AI workloads.

3. Make Your Developer’s Pains Go Away, with the Right Level of Abstraction for Your Platform

Mathieu Benoit and Artem Lajko tackled a reality every engineer knows too well: developers don’t spend their day coding, they spend it battling TicketOps, infrastructure blockers, and security gates.

Their talk presented a battle-tested approach to building Internal Developer Platforms (IDPs) with empathy, powered by Score and Kro.

The key takeaway: successful platforms don’t hide Kubernetes, they abstract it at the right level.

By combining GitOps workflows with automation, they demonstrated how developers can deploy secure, production-grade workloads effortlessly, focusing on their apps while the platform handles the hard parts behind the scenes. It wasn’t about YAMLs or GitOps, it was about developer joy.

4. In-SPIRE-ing Identity: Using SPIRE for Verifiable Container Isolation

Marina Moore delivered a brilliant session on cryptographic attestation for workloads using SPIFFE/SPIRE. Edera’s architecture lets teams prove that workloads run in isolated zones with end-to-end encryption and non-falsifiable build provenance: essentially, identity as a security perimeter.

Her insights into deployment challenges and configuration trade-offs offered a roadmap for teams moving toward verifiable workload trust in cloud-native systems.


5. The Paranoid’s Guide to Deploying Skynet’s Interns

This talk is a reality check for anyone deploying AI agents in production. While autonomous agents are powerful, they’re being plugged into legacy or unsecured architectures, a recipe for chaos.

The speaker (Dan Fernandez) breaks down the anatomy of AI agent ecosystems (Agents, MCP servers, Tools, and Memory) before exposing the major security pitfalls:

Tangled Web of Trust: Agents interact with tools and data sources of mixed trust levels, risking internal system compromise.
Persistent Threats: Because agents “remember,” attacks can persist, evolve, and resurface over time.
Amplified Supply Chain Risks: Every autonomous action turns dependencies into potential attack vectors.
Compounding Complexity: Multi-agent comms and centralized MCPs obscure visibility and weaken control.

The key takeaway: treat AI agents as untrusted, dynamic supply chains. Apply strict segmentation, isolation, and defense-in-depth to every component from MCP servers to memory stores. Paranoia isn’t overkill here, it’s essential for survival in the era of autonomous AI.

Networking and Shared Purpose

Beyond the sessions, hallway conversations were pure gold.
I had deep discussions about:

Integrating Kubernetes security controls within Platform Engineering and Internal Developer Platforms (IDPs) to deliver secure-by-default services while maintaining developer velocity.
Measuring platform security maturity using structured threat models and practical scorecards.
Embedding AI-driven risk assessment directly into CI/CD pipelines for continuous validation.

Watch the Replays

Theater Sessions
- https://www.youtube.com/watch?v=iQo1Sx4wXtY
- https://www.youtube.com/watch?v=ZgnJWnhldyE
Crystal Dining Room Sessions
- https://www.youtube.com/watch?v=DiB6Rwqhn1g
- https://www.youtube.com/watch?v=tROp-nmNYxo

It’s clear that the Cloud Native Rejekts community thrives on transparency, mentorship, and shared improvement, values that continue to guide my own journey in cloud-native security.

And finally, a heartfelt thank you to all the volunteers who made this event possible. Your passion, generosity, and dedication are what make Cloud Native Rejekts such a unique experience. It’s more than a conference: it’s a community space where creativity meets curiosity, where ideas grow into open-source projects, and where the next wave of cloud-native innovation quietly takes shape.

Maxime.

15 October 2025

Restricting Pod Access to Azure IMDS (Preview)

In the world of Kubernetes on Azure, there’s been a longstanding default: any pod in your AKS cluster can query the Azure Instance Metadata Service (IMDS). That’s powerful, but also risky. Today, Microsoft introduces a preview feature that lets you block pod access to IMDS, tightening your cluster’s security boundaries.

Why Restrict IMDS?

IMDS is a REST API that provides VM metadata: VM specs, networking, upcoming maintenance events, and (critically) identity tokens. Because it’s accessible by default (via IP 169.254.169.254), a pod that’s compromised or misbehaving could exploit this to pull sensitive information or impersonate the node’s identity. That’s a serious threat.

By limiting which pods can reach IMDS, you reduce the “blast radius” of potential vulnerabilities.
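
To make the risk concrete, here is a minimal Python sketch of the request a process could build to ask IMDS for a managed identity token. The endpoint path, api-version, and Metadata header follow the public IMDS documentation; the function name and the default resource are illustrative assumptions. The request is only constructed here, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

IMDS_HOST = "http://169.254.169.254"  # link-local IMDS address


def build_token_request(resource: str = "https://management.azure.com/") -> Request:
    """Build (but do not send) an IMDS managed-identity token request."""
    query = urlencode({"api-version": "2018-02-01", "resource": resource})
    url = f"{IMDS_HOST}/metadata/identity/oauth2/token?{query}"
    # IMDS requires this header on every request.
    return Request(url, headers={"Metadata": "true"})


req = build_token_request()
print(req.full_url)
print(req.get_header("Metadata"))
```

Nothing in this request proves who the caller is: no secret and no client certificate. That is why network reachability is the control that matters here.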

How the Restriction Works (Preview)

Non-host-network pods (hostNetwork: false) lose access to IMDS entirely once restriction is enabled.
Host network pods (hostNetwork: true) retain access (they share the same network space as the node).
Azure implements this via iptables rules on the node to block traffic from non-host pods.
Tampering with iptables (e.g. via SSH or privileged containers) can break enforcement, so best practices like disabling SSH or avoiding privileged pods come into play.
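
From inside a pod, the effect of these rules can be observed with a small probe. The following is a minimal sketch: the function name, the timeout, and the injectable opener parameter (there only to make testing easier) are my own assumptions. It also treats any connection failure or timeout as "blocked", since I have not checked whether the node drops or rejects the traffic.

```python
import urllib.error
import urllib.request

IMDS_URL = "http://169.254.169.254/metadata/instance?api-version=2021-02-01"


def imds_reachable(opener=urllib.request.urlopen, timeout: float = 3.0) -> bool:
    """Return True if IMDS answers at all, False if the connection fails or times out."""
    request = urllib.request.Request(IMDS_URL, headers={"Metadata": "true"})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # An HTTP error status still means the service was reached.
        return True
    except OSError:
        # Covers URLError, connection refused, and socket timeouts.
        return False
```

Calling imds_reachable() in a regular pod should return False once the restriction is enforced, and True in a pod running with hostNetwork: true.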

Limitations & Considerations

Because this is still in preview, there are a number of tradeoffs:

Many AKS add-ons do not support IMDS restriction (e.g. Azure Monitor, Application Gateway Ingress, Flux/GitOps, Azure Policy, etc.).
Windows node pools aren’t supported yet.
Enabling restriction on a cluster that uses unsupported add-ons will fail.
After enabling or disabling, you must reimage the nodes (e.g. via az aks upgrade --node-image-only) to apply or remove the iptables rules.
The feature is opt-in and isn’t backed by an SLA or warranty.

Getting Started: Enabling IMDS Restriction

Use Azure CLI 2.61.0+ and install or update aks-preview.
Register the IMDSRestrictionPreview feature and refresh the ContainerService provider.
Ensure OIDC issuer is enabled on your cluster (required).
To create a new cluster with this feature: az aks create ... --enable-imds-restriction
To enable it on an existing cluster: az aks update ... --enable-imds-restriction, then reimage the nodes for enforcement.
To verify, deploy test pods with and without hostNetwork: true and attempt to curl IMDS; the non-host-network pods should fail and the host-network pods should succeed.
To disable, run az aks update --disable-imds-restriction and reimage.
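
For the verification step, here is a minimal Python sketch that generates the two test pods as a single JSON manifest. The pod names and the curlimages/curl image are assumptions (any image that ships curl should work), and the script only prints the manifest; it does not talk to the cluster.

```python
import json

IMDS_URL = "http://169.254.169.254/metadata/instance?api-version=2021-02-01"


def probe_pod(name: str, host_network: bool) -> dict:
    """Return a minimal Pod manifest that curls IMDS once and exits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "hostNetwork": host_network,
            "restartPolicy": "Never",
            "containers": [{
                "name": "probe",
                "image": "curlimages/curl",  # assumed image
                # -m 5 caps the wait so a blocked request fails quickly.
                "command": ["curl", "-sS", "-m", "5",
                            "-H", "Metadata: true", IMDS_URL],
            }],
        },
    }


manifests = [
    probe_pod("imds-probe-pod-net", host_network=False),
    probe_pod("imds-probe-host-net", host_network=True),
]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Saved under a name of your choice (say probe_pods.py), the output can be piped to kubectl apply -f - and each pod checked with kubectl logs. With the restriction enforced, the pod-network probe should end in an error and the host-network probe should print the instance metadata.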

Final Thoughts

This new capability gives AKS users an additional layer of defense: limiting which pods can access VM metadata and identities.

Reference: https://learn.microsoft.com/en-us/azure/aks/imds-restriction

Maxime.