ConfigDrift

Danger Level: Low

Modifies a key in a ConfigMap or Secret to test configuration reconciliation.

Spec Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| resourceType | string | Yes | - | Target resource type: ConfigMap or Secret |
| name | string | Yes | - | Name of the ConfigMap or Secret |
| key | string | Yes | - | Key within the data map to modify |
| value | string | Yes | - | Value to set (replaces existing value) |
| ttl | duration | No | 300s | Auto-cleanup duration |
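
To make the field rules concrete, here is a minimal sketch of how a spec with these fields could be validated before an experiment runs. The `ConfigDriftSpec` class and the `validate` function are hypothetical names introduced for illustration; they are not part of the tool. The sketch assumes an empty `value` is legal, since emptying a key is one way to induce drift.

```python
from dataclasses import dataclass

ALLOWED_RESOURCE_TYPES = {"ConfigMap", "Secret"}


@dataclass
class ConfigDriftSpec:
    """Illustrative stand-in for the spec fields above; not the tool's real type."""
    resourceType: str
    name: str
    key: str
    value: str          # required, but assumed here to allow the empty string
    ttl: str = "300s"   # documented default


def validate(spec: ConfigDriftSpec) -> list[str]:
    """Return a list of problems; an empty list means the spec looks valid."""
    problems = []
    if spec.resourceType not in ALLOWED_RESOURCE_TYPES:
        problems.append(
            f"resourceType must be ConfigMap or Secret, got {spec.resourceType!r}"
        )
    for field in ("name", "key"):
        if not getattr(spec, field):
            problems.append(f"{field} is required")
    return problems


spec = ConfigDriftSpec(
    resourceType="ConfigMap", name="app-config", key="log-level", value="debug"
)
print(validate(spec))  # no problems expected for this spec
```

Omitting a required field fails at construction time, because the dataclass has no default for it; only `ttl` falls back to its documented default.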

How It Works

ConfigDrift reads the target ConfigMap or Secret, saves the original value of the specified key, and overwrites it with the injected value. For ConfigMaps, the original value is stored in a rollback annotation on the resource itself. For Secrets, a separate rollback Secret is created (to avoid exposing sensitive data in annotations).

API calls:

1. Get the target ConfigMap or Secret
2. Store original value (annotation for ConfigMap, separate Secret for Secret)
3. Update the resource with the new value
4. On cleanup: Get the rollback data, restore original value, remove rollback metadata

Cleanup: Restores the original value from rollback storage. If the key did not originally exist, it is deleted.

Crash safety: Rollback data persists in Kubernetes (annotation or Secret). The Revert method can restore the original value even after a crash.
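
The inject and revert flow described above can be sketched against an in-memory stand-in for the cluster. This is an illustration under stated assumptions, not the tool's implementation: the annotation key, the rollback Secret name suffix, the JSON layout of the rollback record, and the `inject` and `revert` function names are all hypothetical, and real Secret data would be base64-encoded.

```python
import json

# Hypothetical names, chosen for this sketch only.
ROLLBACK_ANNOTATION = "example.invalid/configdrift-rollback"
ROLLBACK_SECRET_SUFFIX = "-configdrift-rollback"


def inject(cluster: dict, resource_type: str, name: str, key: str, value: str) -> None:
    """Persist the original value as rollback data, then overwrite the key."""
    resource = cluster[(resource_type, name)]            # 1. get the target
    rollback = json.dumps({
        "key": key,
        "existed": key in resource["data"],
        "value": resource["data"].get(key, ""),
    })
    if resource_type == "ConfigMap":                     # 2. store the original
        resource["annotations"][ROLLBACK_ANNOTATION] = rollback
    else:
        # Secrets keep rollback data out of annotations, in a separate Secret.
        cluster[("Secret", name + ROLLBACK_SECRET_SUFFIX)] = {
            "annotations": {},
            "data": {"rollback": rollback},
        }
    resource["data"][key] = value                        # 3. apply the drift


def revert(cluster: dict, resource_type: str, name: str) -> bool:
    """Restore the original value. Returns False if no rollback data exists."""
    resource = cluster[(resource_type, name)]
    rollback_secret = ("Secret", name + ROLLBACK_SECRET_SUFFIX)
    if resource_type == "ConfigMap":
        raw = resource["annotations"].get(ROLLBACK_ANNOTATION)
    else:
        raw = cluster.get(rollback_secret, {}).get("data", {}).get("rollback")
    if raw is None:
        return False
    rollback = json.loads(raw)
    if rollback["existed"]:
        resource["data"][rollback["key"]] = rollback["value"]
    else:
        # The key did not exist before injection, so cleanup deletes it.
        resource["data"].pop(rollback["key"], None)
    # Remove rollback metadata last, so an interrupted revert can be retried.
    resource["annotations"].pop(ROLLBACK_ANNOTATION, None)
    cluster.pop(rollback_secret, None)
    return True


cluster = {("ConfigMap", "app-config"): {"annotations": {}, "data": {"log-level": "info"}}}
inject(cluster, "ConfigMap", "app-config", "log-level", "debug")
revert(cluster, "ConfigMap", "app-config")
```

Because `revert` reads only what was persisted alongside the resource, it needs no state from the process that called `inject`, which is the property the crash-safety note relies on.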

Disruption Rubric

Expected behavior on a healthy operator: The operator detects the configuration change and either: (a) reconciles the ConfigMap/Secret back to the expected state, or (b) adapts its behavior to the new configuration gracefully. The steady-state check should pass within recoveryTimeout.
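
A steady-state check bounded by recoveryTimeout amounts to polling until a deadline. The following is a minimal sketch of that idea, not the tool's actual check: the function name `wait_for_steady_state`, the `poll_interval` parameter, and the injectable `clock` and `sleep` arguments are assumptions made for illustration.

```python
import time
from typing import Callable


def wait_for_steady_state(
    check: Callable[[], bool],
    recovery_timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `recovery_timeout` seconds elapse."""
    deadline = clock() + recovery_timeout
    while True:
        if check():
            return True          # operator reconciled or adapted in time
        if clock() >= deadline:
            return False         # contract violation: no recovery within the window
        sleep(poll_interval)
```

Here `check` would be any callable that reports whether the ConfigMap or Secret holds the expected values and the operator is healthy. Passing `clock` and `sleep` as parameters keeps the loop testable without real waiting.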

Contract violation indicators:

- Operator does not detect the change (no reconciliation triggered)
- Operator crashes or enters an error loop due to invalid configuration (indicates missing validation)
- Configuration is silently accepted with incorrect behavior (indicates missing validation)

Collateral damage risks:

- Low. Only the specified key in the specified resource is modified
- If the ConfigMap is mounted as a volume, pods may need a restart to pick up changes (depends on how the operator reads config)
- Config changes that could affect cluster-wide behavior require dangerLevel: high together with allowDangerous: true

Recovery expectations:

- Recovery time: 1-30 seconds (depends on reconciliation interval)
- Reconcile cycles: 1 (detect drift, restore expected state)
- What "recovered" means: ConfigMap/Secret has correct values, operator functioning normally

Cross-Component Results

| Component | Experiment | Danger | Description |
| --- | --- | --- | --- |
| codeflare | codeflare-config-drift | high | When the codeflare operator configuration is corrupted, new cluster configuratio... |
| dashboard | dashboard-config-drift | high | When the kube-rbac-proxy configuration is corrupted, the RBAC proxy sidecar shou... |
| kserve | kserve-isvc-config-corruption | high | When the deploy key in the inferenceservice-config ConfigMap is overwritten with... |
| llamastack | llamastack-config-drift | high | When the llamastack serving configuration is corrupted, new LLM deployments rece... |
| modelmesh | modelmesh-config-drift | high | When the modelmesh serving configuration is corrupted, new model deployments rec... |
| odh-model-controller | odh-model-controller-config-drift | high | When the inferenceservice-config ConfigMap is corrupted with an invalid deployme... |
| odh-model-controller | odh-model-controller-ingress-config-corruption | high | When the ingress key in inferenceservice-config is emptied, the odh-model-contro... |
| odh-model-controller | odh-model-controller-webhook-cert-corrupt | high | All 7 webhooks fail after TLS cert corruption; cert-manager or operator restores... |