Category: Kubernetes

Understanding Kubernetes API Server Concurrency Controls

Kubernetes API performance depends heavily on how the API server manages concurrent requests. Two important parameters control how many simultaneous operations the control plane can process: --max-requests-inflight and --max-mutating-requests-inflight.

These flags define how many concurrent read and write requests the API server allows before it starts rejecting new ones with HTTP 429 Too Many Requests errors. They exist to prevent resource exhaustion and protect etcd and the API server itself from overload.

How the API Server Handles Requests

The API server processes every incoming request through a pipeline that includes authentication, authorization, admission control, and storage operations.

Before these stages, each request is subject to inflight limits:

  • Non-mutating requests (GET, LIST, WATCH) are controlled by --max-requests-inflight.
  • Mutating requests (POST, PUT, PATCH, DELETE) are limited by --max-mutating-requests-inflight.

Internally, Kubernetes uses semaphore-like counters implemented in Go to manage concurrency. When all available slots are occupied, new requests are rejected immediately.
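
In practice, a throttled client sees the rejection directly in its request logs. For example, with verbose kubectl output (illustrative; the exact wording depends on the client version):

# Increase kubectl verbosity to see the HTTP status of each request
kubectl get pods --all-namespaces -v=6
# When the inflight limits are saturated, client-go reports something like:
# "the server has received too many requests and has asked us to try again later"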

Default Values and Tuning

The default values are usually:

  • --max-requests-inflight: 400
  • --max-mutating-requests-inflight: 200

Increasing these numbers allows more concurrent requests but consumes more CPU and memory and can create backpressure on etcd.
Setting them too low causes frequent throttling and timeouts for controllers and users.

A general rule of thumb is to keep the read limit around twice the write limit.
The optimal configuration depends on the control plane’s CPU, memory, and the overall cluster size.
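
On a self-managed control plane, the flags are typically set in the kube-apiserver static pod manifest. A minimal sketch, assuming the common kubeadm layout (/etc/kubernetes/manifests/kube-apiserver.yaml) and illustrative values that follow the 2:1 rule of thumb:

# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt, illustrative values)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --max-requests-inflight=800
    - --max-mutating-requests-inflight=400
    # ... other flags unchanged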

Monitoring and Observability

Monitoring API server performance is key to proper tuning.
The following Prometheus metrics provide visibility:

  • apiserver_current_inflight_requests
  • apiserver_request_total{code="429"}

If 429 errors appear regularly without corresponding etcd latency increases, the API server limit is likely too restrictive.
If etcd latency rises first, the bottleneck is the storage layer, not the API layer.
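
For example, the following PromQL queries (illustrative) surface both signals:

# Current inflight requests, split by request kind (readOnly vs mutating)
sum(apiserver_current_inflight_requests) by (request_kind)

# Rate of rejected (429) requests over the last 5 minutes
sum(rate(apiserver_request_total{code="429"}[5m]))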

Always adjust these flags gradually and validate the impact using Prometheus or Grafana dashboards.

API Priority and Fairness (APF)

Modern Kubernetes versions enable API Priority and Fairness (APF) by default.
This subsystem provides a dynamic way to manage concurrency through FlowSchemas and PriorityLevels.

The inflight flags still act as global hard limits, while APF handles per-user and per-workload fairness.
The recommended approach is to use these flags as safety caps and rely on APF for traffic shaping and workload isolation.
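
As an illustration, a FlowSchema can route a specific controller's traffic to a dedicated priority level that rejects excess requests rather than queuing them. A minimal sketch (names and values are assumptions; the flowcontrol API version depends on your Kubernetes release):

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: batch-jobs                  # hypothetical priority level for a noisy workload
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 20
    limitResponse:
      type: Reject                  # reject excess requests instead of queuing them
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: batch-jobs
spec:
  priorityLevelConfiguration:
    name: batch-jobs
  matchingPrecedence: 1000
  distinguisherMethod:
    type: ByUser
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: batch-controller      # hypothetical ServiceAccount
        namespace: batch
    resourceRules:
    - verbs: ["get", "list", "watch"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]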

Managed Services (AKS, EKS, GKE)

On managed Kubernetes platforms, these flags can’t be changed directly — the control plane is fully managed by the cloud provider.
However, you can still influence how API requests behave and avoid throttling.

Azure Kubernetes Service (AKS)

  • You cannot change the API server flags directly.
  • Use API Priority and Fairness (APF) to control request behavior.
  • Choose a higher control plane SKU (Standard or Premium) for better performance.

Amazon Elastic Kubernetes Service (EKS)

  • AWS automatically adjusts concurrency limits based on cluster size.
  • For very large clusters or CI/CD-heavy environments, use multiple smaller clusters to spread the load.

Google Kubernetes Engine (GKE)

  • GKE automatically scales the control plane to handle load.
  • You cannot modify inflight flags directly.
  • You can define FlowSchemas for specific workloads if you need fine-grained API control.

Security and DoS Protection

These concurrency flags also play a critical role in protecting against denial-of-service attacks.
Without them, a flood of LIST or WATCH requests could exhaust the API server’s resources and cause a cluster-wide outage.

To protect against such risks:

  • Keep reasonable inflight limits.
  • Enable API Priority and Fairness with limitResponse.reject for low-priority users.
  • Use RBAC and NetworkPolicies to limit who can access the API.
  • Apply client-side throttling in controllers and operators.

Maxime.

Kubernetes 1.34: What’s New in Security

Released on August 27, 2025 under the theme “Of Wind & Will (O’ WaW)”, Kubernetes v1.34 brings a strong security focus, reinforcing zero-trust principles, secure defaults, and identity-aware operations across the platform.

Projected ServiceAccount Tokens for Image Pulls (Beta)

– What’s new: The kubelet can now use short-lived, audience‑bound ServiceAccount tokens to authenticate with container registries, eliminating static Secrets on nodes.

– Why it matters: This significantly shrinks the attack surface by eschewing long-lived credentials, aligning registry access with workload identity rather than node-level secrets.

Scoped Anonymous Access for API Endpoints

– What’s new: Administrators can now safely expose health endpoints (/healthz, /readyz, /livez) to unauthenticated access, while denying broader anonymous access via narrow configuration in AuthenticationConfiguration.

– Why it matters: Prevents accidental overexposure of API capabilities, balancing observability/open health checks with tightened security controls.
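
A sketch of the relevant configuration, assuming a structured authentication configuration file passed to the API server via --authentication-config (the exact apiVersion depends on your release):

apiVersion: apiserver.config.k8s.io/v1
kind: AuthenticationConfiguration
anonymous:
  enabled: true
  conditions:
  - path: /healthz
  - path: /readyz
  - path: /livez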

Pod Identity & mTLS with PodCertificateRequests (Alpha)

– What’s new: Pods can now obtain X.509 certificates via PodCertificateRequests, allowing kubelet-managed issuance for use in mTLS authentication.

– Why it matters: Embeds strong, workload-specific identity into the platform, reinforcing secure communication patterns among services.

Field or Label-Aware RBAC (Enhanced Least Privilege)

– What’s new: Although not yet GA, emerging enhancements allow RBAC rules that consider node or pod-specific attributes (fields or labels) to enforce least-privilege access.

– Why it matters: Granular permissions reduce risk from overbroad role bindings, tightening control over what pods or nodes can access and do.

CEL Mutation Policies & External JWT Signing

– CEL Mutation Policies: Introduce native support for rule-based mutation using Common Expression Language (CEL), enabling secure, declarative policy enforcement within Kubernetes.

– External JWT Signing: Facilitates signing JWTs via external key management services, removing local key storage and enhancing auditability and security.

Mutual TLS (mTLS) for Pod-to-API Traffic

– What’s new: Kubernetes is ramping up mTLS support to secure pod-to-API server communications, though details are still unfolding.

– Why it matters: Ensures encrypted, authenticated channeling between workloads and the control plane, a key zero-trust tenet.

OCI Artifact Volumes & Image Pull Security

– What’s new: Ability to mount OCI images directly as volumes, ensuring secure, versioned delivery of external files to pods.

– Why it matters: Reduces reliance on sidecars or manual injection methods, streamlining configuration while preserving integrity.
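
A minimal sketch of what such a pod could look like, assuming the image volume source introduced for this feature (field names may evolve while the feature matures, and names below are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: oci-artifact-demo
spec:
  containers:
  - name: app
    image: alpine
    command: ["/bin/sh", "-c", "ls /config && sleep 3600"]
    volumeMounts:
    - name: config-artifact
      mountPath: /config
  volumes:
  - name: config-artifact
    image:
      reference: registry.example.com/config-bundle:1.0   # hypothetical OCI artifact
      pullPolicy: IfNotPresent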

Conclusion

Kubernetes v1.34 represents a meaningful step forward in embedding robust security into the platform itself. From per-pod identity to safer defaults, explicit anonymous access handling, and fine-grained policy enforcement, it advances Kubernetes toward a more zero-trust architecture.

Organizations should explore upgrading thoughtfully, especially leveraging the projected ServiceAccount tokens, pod-level certification, and scoped anonymous access to immediately elevate cluster security.

Maxime.

User Namespaces in Kubernetes: Perspectives on Isolation and Escape

User Namespaces in Kubernetes are designed to improve pod isolation by mapping container users to non-root UIDs on the host. While they offer a promising sandboxing mechanism, their security implications are nuanced. For offensive security practitioners, understanding how user namespaces work opens doors to assess potential privilege escalation, misconfigurations, and runtime escape attempts in hardened clusters.


What Are Kubernetes User Namespaces?

In traditional Kubernetes setups, containers often run as root (UID 0), which also maps to root on the host unless otherwise restricted (e.g., with seccomp, AppArmor, or dropping capabilities). With User Namespaces, UID 0 inside the container can be mapped to a non-root UID (e.g., 100000) on the host, drastically reducing the risk of container breakout.

Core Concept:

Container UID    Mapped Host UID
0                100000
1                100001
1000             101000

This mapping isolates privilege levels, ensuring that root inside the container ≠ root on the host.
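
Inside a user-namespaced pod, the kernel exposes the active mapping through /proc; the three columns are the container-side start UID, the host-side start UID, and the length of the range. Illustrative output matching the table above:

$ cat /proc/self/uid_map
         0     100000      65536
# container UIDs 0..65535 map to host UIDs 100000..165535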


Offensive Security Perspective: Attack Surface & Evasion Tactics

Despite the promise of tighter isolation, user namespaces introduce complexity that can be exploited or abused if not configured properly. Let’s analyze common offensive scenarios.


1. Privilege Escalation via Misconfigured Mappings

If the UID/GID mappings are too broad, or improperly configured (e.g., overlapping ranges), an attacker could potentially:

  • Access sensitive host resources via mapped UIDs.
  • Use remapped file permissions to exploit volume mounts (e.g., hostPath or PVCs).
  • Abuse misaligned subuid/subgid ranges to escalate outside the intended sandbox.

Example Attack:

# Container UID 0 maps to Host UID 100000
# But /data is mounted with files owned by Host UID 100000
cat /data/secrets.txt

Result: UID 0 inside the container has effective access to host-owned files, violating isolation.


2. Kernel Exploits Inside Namespaces

Even with UID remapping, the container shares the kernel. User namespaces do not prevent kernel-level exploits such as:

  • DirtyPipe (CVE-2022-0847)
  • Dirty COW (CVE-2016-5195)
  • StackRot (CVE-2023-3269)

Red Team Tactic:
If CAP_SYS_ADMIN is not dropped, or seccomp filters are lax, you can test kernel exploits inside user namespaces with minimal detection due to reduced apparent privileges.

# Run DirtyPipe exploit inside a user-namespaced container
./dirtypipe /etc/passwd "root::0:0:root:/root:/bin/bash\n"

3. Anti-Forensics & Evasion with Mapped UIDs

User namespaces make detection more complex from a blue team’s perspective:

  • Logs might show actions from UID 100000+ on the host instead of UID 0.
  • Traditional forensic tooling might miss attribution of malicious activity if unaware of the mapping.

Example:

# From container: creates a backdoor as UID 0
echo "malicious_user:x:0:0::/root:/bin/bash" >> /etc/passwd

On the host, this action appears to have been performed by UID 100000, which is not obviously suspicious unless it is correlated with the namespace mappings.


4. Side-channel and Shared Resource Attacks

Even with user namespaces, shared resources can become attack vectors:

  • cgroups, /proc, and /sys access
  • Spectre/Meltdown-style attacks
  • CPU time, memory pressure side channels

Tactic:
Use PID or mount namespace escape primitives combined with user namespace to test access to host-level interfaces. For example:

lsns -t user,pid

If misconfigured, you may be able to observe or interfere with host-level processes.


What Doesn’t Work (Defense Successes)

User namespaces offer strong mitigations against:

Threat                           Mitigated by User Namespaces
Direct host root access          ✅
Access to host UID 0 files       ✅
CAP_SYS_ADMIN container use      ✅ (if dropped)
AppArmor/SELinux bypass          ✅ (if enforced properly)

However, they do not protect against:

  • Kernel-level vulnerabilities
  • Volume mount misconfigurations
  • Lax seccomp/bpf policies
  • Insufficient container runtime restrictions (e.g., allowing --privileged)
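
As a quick check for the last point, a cluster can be audited for privileged containers with a plain kubectl query (illustrative one-liner):

# List namespace/pod pairs that run at least one privileged container
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].securityContext.privileged}{"\n"}{end}' | grep true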

Offensive Testing Setup

You can simulate a user namespace-enabled cluster using:

apiVersion: v1
kind: Pod
metadata:
  name: userns-test
spec:
  hostUsers: false          # request a user namespace for this pod (UserNamespacesSupport feature)
  securityContext:
    runAsUser: 0
    runAsGroup: 0
    seccompProfile:
      type: RuntimeDefault
  runtimeClassName: "userns"
  containers:
  - name: test
    image: alpine
    command: ["/bin/sh", "-c", "id && sleep 3600"]

Check the UID mappings inside the container:

cat /proc/self/uid_map

Use host PID or mount checks to assess boundary enforcement.
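
For example, comparing namespace identifiers between the container and PID 1 on the host makes the boundary visible (commands are illustrative and require access to both sides):

# Inside the container: identify the user and PID namespaces in use
readlink /proc/self/ns/user /proc/self/ns/pid

# On the host: compare against the init process; differing inode numbers
# confirm that the pod runs in separate namespaces
sudo readlink /proc/1/ns/user /proc/1/ns/pid

# On the host: list user namespaces and the processes that own them
sudo lsns -t user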


How to Defend Against Offense

Defense Layer               Best Practices
UID/GID Mapping             Use minimal, non-overlapping ranges (subuid/subgid)
Capabilities                Drop all except required ones (especially CAP_SYS_ADMIN)
Seccomp                     Enforce strict syscall profiles (RuntimeDefault or a custom profile)
Volume Management           Avoid hostPath; use projected volumes or CSI
RuntimeClass Enforcement    Use PodSecurityAdmission or Kyverno/Gatekeeper to enforce runtimeClassName
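
For the last row, a hedged sketch of a Kyverno ClusterPolicy that requires the userns RuntimeClass used in the test pod above (the policy name and the expected runtimeClassName are assumptions):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-userns-runtimeclass     # hypothetical policy name
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: require-userns
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Pods must use the user-namespace RuntimeClass."
      pattern:
        spec:
          runtimeClassName: "userns"    # assumed RuntimeClass name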

Conclusion

While User Namespaces are a powerful isolation primitive in Kubernetes, they’re not a silver bullet. Offensive security testing shows that, when misconfigured or poorly integrated with other controls, user namespaces can be subverted or bypassed. A layered defense, including syscall filtering, strict capability drops, and forensic visibility tooling, is essential.

If you’re building hardened Kubernetes platforms, test user namespaces like an attacker. Simulate kernel exploits, map UID collisions, and validate how well the telemetry captures identity mappings.