Skip to content

QuotaExhaustion

Danger Level: Medium

Creates a restrictive ResourceQuota to test operator behavior under resource pressure.

Spec Fields

Field Type Required Default Description
quotaName string Yes - Name for the ResourceQuota to create
cpu string No - CPU limit (e.g., '100m', '1')
memory string No - Memory limit (e.g., '128Mi', '1Gi')
pods string No - Maximum number of pods (e.g., '0', '5')
ttl duration No 300s Auto-cleanup duration

How It Works

QuotaExhaustion creates a Kubernetes ResourceQuota with intentionally tight limits in the target namespace. This forces the operator to handle resource creation failures (pods, PVCs, etc.) that would normally succeed.

API calls: 1. Check if a quota with the given name already exists (reject if so) 2. Build a ResourceList from the provided parameters (cpu, memory, pods, etc.) 3. Create the ResourceQuota with chaos labels 4. On cleanup: Delete the ResourceQuota

At least one resource limit parameter is required. Setting pods: "0" is the most aggressive option, preventing any new pod creation in the namespace.

Cleanup: Deletes the ResourceQuota. Idempotent.

Crash safety: Revert checks for chaos labels before deleting, so it won't accidentally remove user-created quotas. Use operator-chaos clean for orphaned quotas.

Disruption Rubric

Expected behavior on a healthy operator: The operator attempts to create or scale resources and encounters quota errors (403 Forbidden). It should handle these errors gracefully: log the failure, set degraded status conditions on the CR, and retry with backoff. Once the quota is removed, the operator should resume normal operation.

Contract violation indicators: - Operator crashes or panics on quota errors (indicates missing error handling for resource creation) - Operator enters infinite tight loop retrying without backoff (indicates missing retry logic) - Operator does not surface quota errors in CR status (indicates swallowed errors) - Operator does not recover after quota is removed (indicates no retry mechanism)

Collateral damage risks: - Medium to high. The quota affects ALL pod/resource creation in the namespace, not just the operator - Other controllers and workloads in the same namespace are also restricted - Setting pods: "0" blocks all new pod creation, including rollout restarts - Use a dedicated test namespace when possible

Recovery expectations: - Recovery time: immediate after quota removal (pending pods should be created) - Reconcile cycles: 1-3 (detect quota removal, retry resource creation, verify) - What "recovered" means: operator successfully creates resources that were previously blocked

Cross-Component Results

Component Experiment Danger Description
dashboard dashboard-quota-exhaustion medium Exhausting pod quota in the dashboard namespace should prevent new pods from bei...
odh-model-controller odh-model-controller-quota-exhaustion medium Creating a restrictive ResourceQuota that prevents pod creation should cause the...