ConfigDrift¶
Danger Level: Low
Modifies a key in a ConfigMap or Secret to test configuration reconciliation.
Spec Fields¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
resourceType |
string |
Yes | - | Target resource type: ConfigMap or Secret |
name |
string |
Yes | - | Name of the ConfigMap or Secret |
key |
string |
Yes | - | Key within the data map to modify |
value |
string |
Yes | - | Value to set (replaces existing value) |
ttl |
duration |
No | 300s |
Auto-cleanup duration |
How It Works¶
ConfigDrift reads the target ConfigMap or Secret, saves the original value of the specified key, and overwrites it with the injected value. For ConfigMaps, the original value is stored in a rollback annotation on the resource itself. For Secrets, a separate rollback Secret is created (to avoid exposing sensitive data in annotations).
API calls:
1. Get the target ConfigMap or Secret
2. Store original value (annotation for ConfigMap, separate Secret for Secret)
3. Update the resource with the new value
4. On cleanup: Get the rollback data, restore original value, remove rollback metadata
Cleanup: Restores the original value from rollback storage. If the key did not originally exist, it is deleted.
Crash safety: Rollback data persists in Kubernetes (annotation or Secret). The Revert method can restore the original value even after a crash.
Disruption Rubric¶
Expected behavior on a healthy operator:
The operator detects the configuration change and either: (a) reconciles the ConfigMap/Secret back to the expected state, or (b) adapts its behavior to the new configuration gracefully. The steady-state check should pass within recoveryTimeout.
Contract violation indicators: - Operator does not detect the change (no reconciliation triggered) - Operator crashes or enters error loop due to invalid configuration (indicates missing validation) - Configuration is silently accepted with incorrect behavior (indicates missing validation)
Collateral damage risks:
- Low. Only the specified key in the specified resource is modified
- If the ConfigMap is mounted as a volume, pods may need restart to pick up changes (depends on how the operator reads config)
- Using dangerLevel: high with allowDangerous: true is required for config changes that could affect cluster-wide behavior
Recovery expectations: - Recovery time: 1-30 seconds (depends on reconciliation interval) - Reconcile cycles: 1 (detect drift, restore expected state) - What "recovered" means: ConfigMap/Secret has correct values, operator functioning normally
Cross-Component Results¶
| Component | Experiment | Danger | Description |
|---|---|---|---|
| codeflare | codeflare-config-drift | high | When the codeflare operator configuration is corrupted, new cluster configuratio... |
| dashboard | dashboard-config-drift | high | When the kube-rbac-proxy configuration is corrupted, the RBAC proxy sidecar shou... |
| kserve | kserve-isvc-config-corruption | high | When the deploy key in the inferenceservice-config ConfigMap is overwritten with... |
| llamastack | llamastack-config-drift | high | When the llamastack serving configuration is corrupted, new LLM deployments rece... |
| modelmesh | modelmesh-config-drift | high | When the modelmesh serving configuration is corrupted, new model deployments rec... |
| odh-model-controller | odh-model-controller-config-drift | high | When the inferenceservice-config ConfigMap is corrupted with an invalid deployme... |
| odh-model-controller | odh-model-controller-ingress-config-corruption | high | When the ingress key in inferenceservice-config is emptied, the odh-model-contro... |
| odh-model-controller | odh-model-controller-webhook-cert-corrupt | high | All 7 webhooks fail after TLS cert corruption; cert-manager or operator restores... |