LabelStomping
Danger Level: Medium
Modifies or removes labels on operator-managed resources to test label-based reconciliation.
Spec Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `apiVersion` | `string` | Yes | - | API version of the target resource (e.g., apps/v1) |
| `kind` | `string` | Yes | - | Kind of the target resource (e.g., Deployment) |
| `name` | `string` | Yes | - | Name of the target resource instance |
| `labelKey` | `string` | Yes | - | Label key to modify or delete |
| `action` | `string` | Yes | - | Action to perform: 'overwrite' (set a new value) or 'delete' (remove the label) |
| `newValue` | `string` | No | chaos-stomped | Value to set when action is 'overwrite' |
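
The field rules above (five required fields, one optional field with a default) can be illustrated with a small parser. This is a minimal sketch in Python, not the project's implementation: `LabelStompingSpec` and `parse_spec` are hypothetical names, and the only facts taken from the table are the field names, which ones are required, the two allowed actions, and the `chaos-stomped` default.

```python
from dataclasses import dataclass

VALID_ACTIONS = ("overwrite", "delete")
REQUIRED_FIELDS = ("apiVersion", "kind", "name", "labelKey", "action")


@dataclass(frozen=True)
class LabelStompingSpec:
    """Hypothetical in-memory form of the documented spec fields."""
    api_version: str
    kind: str
    name: str
    label_key: str
    action: str
    new_value: str = "chaos-stomped"  # documented default for newValue


def parse_spec(params: dict) -> LabelStompingSpec:
    """Check required fields and apply the newValue default."""
    missing = [field for field in REQUIRED_FIELDS if not params.get(field)]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    if params["action"] not in VALID_ACTIONS:
        raise ValueError("action must be 'overwrite' or 'delete'")
    return LabelStompingSpec(
        api_version=params["apiVersion"],
        kind=params["kind"],
        name=params["name"],
        label_key=params["labelKey"],
        action=params["action"],
        new_value=params.get("newValue", "chaos-stomped"),
    )
```

Omitting `newValue` from an 'overwrite' spec yields a spec whose `new_value` is `chaos-stomped`.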
How It Works
LabelStomping uses the Unstructured client to read the target resource, records the current label value in a rollback annotation (protected by a SHA-256 integrity checksum), and then applies a JSON merge patch that overwrites or removes the specified label.
API calls:
1. Get the target resource as Unstructured
2. Read the current label value, store rollback data in annotation via safety.WrapRollbackData
3. Patch the resource with the new label value (overwrite) or null (delete via merge patch), plus chaos labels
4. On cleanup: restore original label value from rollback annotation, remove chaos metadata
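
Step 3 relies on JSON merge patch semantics (RFC 7396): a string value replaces the label, while `null` removes it. The sketch below shows the two patch bodies and their effect on a resource dictionary. It is an illustration in Python rather than the tool's own code; `build_label_patch` and `apply_merge_patch` are hypothetical helpers, and the chaos labels that the real patch also carries are left out because their names are not given here.

```python
import json


def build_label_patch(label_key, action, new_value="chaos-stomped"):
    """Build a merge patch body that overwrites or deletes one label."""
    if action == "overwrite":
        value = new_value
    elif action == "delete":
        value = None  # serialized as JSON null, which removes the key
    else:
        raise ValueError("action must be 'overwrite' or 'delete'")
    return {"metadata": {"labels": {label_key: value}}}


def apply_merge_patch(target, patch):
    """Minimal RFC 7396 merge, used only to show the effect locally."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = apply_merge_patch(result.get(key), value)
    return result


resource = {"metadata": {"name": "demo", "labels": {"app": "web", "tier": "backend"}}}
delete_patch = build_label_patch("app", "delete")
print(json.dumps(delete_patch))
print(apply_merge_patch(resource, delete_patch)["metadata"]["labels"])
```

Because a merge patch only touches the keys it names, the other labels on the resource (here `tier`) are left as they were.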
Cleanup: Re-fetches the resource, restores the original label value (or removes it if it didn't exist before), and removes chaos labels and rollback annotation. Idempotent.
Crash safety: Rollback annotation persists on the resource with SHA-256 checksum. Revert reads the annotation to restore state even after a process crash.
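
The rollback flow can be sketched as a checksummed envelope that is written before the patch and verified before the revert. The actual format produced by `safety.WrapRollbackData` is not described here, so the envelope layout, the field names (`labelKey`, `existed`, `originalValue`), and all three function names below are assumptions made for illustration in Python.

```python
import hashlib
import json


def wrap_rollback_data(data: dict) -> str:
    """Serialize rollback data with a SHA-256 checksum (assumed layout)."""
    payload = json.dumps(data, sort_keys=True, separators=(",", ":"))
    checksum = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return json.dumps({"payload": payload, "sha256": checksum})


def unwrap_rollback_data(annotation: str) -> dict:
    """Verify the checksum and return the rollback data, or raise."""
    envelope = json.loads(annotation)
    payload = envelope["payload"]
    actual = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if actual != envelope["sha256"]:
        raise ValueError("rollback data failed checksum verification")
    return json.loads(payload)


def restore_labels(labels: dict, rollback: dict) -> dict:
    """Return labels with the original value restored, or the key removed."""
    restored = dict(labels)
    if rollback["existed"]:
        restored[rollback["labelKey"]] = rollback["originalValue"]
    else:
        restored.pop(rollback["labelKey"], None)
    return restored


annotation = wrap_rollback_data({"labelKey": "app", "existed": True, "originalValue": "web"})
rollback = unwrap_rollback_data(annotation)
print(restore_labels({"app": "chaos-stomped"}, rollback))
```

Applying `restore_labels` a second time to its own output changes nothing, which mirrors the idempotent cleanup described above. Since the annotation lives on the resource itself, a fresh process can run the same verification and restore after a crash.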
Safety Rules
- Chaos-owned labels are rejected: `app.kubernetes.io/managed-by` and any label prefixed with `chaos.operatorchaos.io/` cannot be targeted (prevents rollback corruption)
- System labels require high danger: Labels matching `kubernetes.io/`, `k8s.io/`, or `node-role.kubernetes.io/` patterns require `dangerLevel: high`
- Label key/value validation: Keys and values are validated against Kubernetes label format rules (max 63 char name, optional DNS prefix, alphanumeric with `._-`)
- Delete non-existent label rejected: Attempting to delete a label that doesn't exist returns an error (no-op rejection)
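
The four rules above can be sketched as a single pre-flight check. This Python example is illustrative only: `check_target` and its helpers are hypothetical names, the order of the checks is a choice made here, and treating "matching" as a substring test on the key is an assumption about how the system-label patterns are applied. The format rules follow the Kubernetes label syntax (optional DNS subdomain prefix of up to 253 characters, name of up to 63 characters that starts and ends with an alphanumeric character).

```python
import re

_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")
_DNS_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
_DNS_SUBDOMAIN = re.compile(rf"{_DNS_LABEL}(?:\.{_DNS_LABEL})*")

CHAOS_OWNED_KEYS = {"app.kubernetes.io/managed-by"}
CHAOS_OWNED_PREFIX = "chaos.operatorchaos.io/"
SYSTEM_PATTERNS = ("kubernetes.io/", "k8s.io/", "node-role.kubernetes.io/")


def is_valid_label_key(key: str) -> bool:
    """Optional DNS prefix, then '/', then a name of 1 to 63 characters."""
    prefix, sep, name = key.rpartition("/")
    if sep and (len(prefix) > 253 or not _DNS_SUBDOMAIN.fullmatch(prefix)):
        return False
    return len(name) <= 63 and bool(_NAME.fullmatch(name))


def is_valid_label_value(value: str) -> bool:
    """Empty, or up to 63 characters with alphanumeric first and last."""
    return value == "" or (len(value) <= 63 and bool(_NAME.fullmatch(value)))


def check_target(label_key, action, current_labels,
                 danger_level="medium", new_value="chaos-stomped"):
    """Return an error message, or None if the experiment may proceed."""
    if label_key in CHAOS_OWNED_KEYS or label_key.startswith(CHAOS_OWNED_PREFIX):
        return "chaos-owned label cannot be targeted"
    if not is_valid_label_key(label_key):
        return "invalid label key"
    # Assumption: a pattern "matches" when it appears anywhere in the key.
    if any(p in label_key for p in SYSTEM_PATTERNS) and danger_level != "high":
        return "system label requires dangerLevel: high"
    if action == "overwrite" and not is_valid_label_value(new_value):
        return "invalid label value"
    if action == "delete" and label_key not in current_labels:
        return "cannot delete a label that does not exist"
    return None
```

Under these assumptions, `check_target("kubernetes.io/hostname", "overwrite", {})` is rejected at the default danger level and accepted once `danger_level="high"` is passed.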
Disruption Rubric
Expected behavior on a healthy operator: The operator detects that a label on its managed resource has been modified or removed, and restores it to the expected value during the next reconciliation cycle. This validates that the operator's label-based selectors and reconciliation logic are working correctly.
Contract violation indicators:
- Operator does not detect the label change (indicates missing watch or label reconciliation)
- Operator does not restore the original label value (reconciler only checks spec, not metadata)
- Operator's label selectors break and it loses track of the resource
- Operator creates a duplicate resource because it can no longer find the original by label

Collateral damage risks:
- Medium. Only the target resource's label metadata is modified
- If the operator uses labels for service selectors, traffic routing may be disrupted during the test window
- System labels (kubernetes.io/) can affect scheduling, node affinity, and cluster behavior, hence the high danger requirement

Recovery expectations:
- Recovery time: 5-30 seconds (depends on reconciliation interval)
- Reconcile cycles: 1-2
- What "recovered" means: label restored to its original value
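
One way to check the recovery expectation is to poll the resource's labels until the original value reappears or the 30-second upper bound passes. The Python sketch below assumes a caller-supplied `get_labels` function that returns the resource's current labels. That function, the name `wait_for_label_restored`, and the one-second polling interval are all hypothetical, and the clock and sleep functions are injectable so the logic can be exercised without a cluster.

```python
import time
from typing import Callable, Mapping, Optional


def wait_for_label_restored(
    get_labels: Callable[[], Mapping[str, str]],
    label_key: str,
    expected_value: str,
    timeout_s: float = 30.0,   # upper bound of the documented recovery window
    interval_s: float = 1.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return seconds until the label matched, or None on timeout."""
    start = clock()
    while True:
        if get_labels().get(label_key) == expected_value:
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

A `None` result corresponds to the contract violation cases listed above, where the operator never restores the label within the window.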
Cross-Component Results
| Component | Experiment | Danger | Description |
|---|---|---|---|
| odh-model-controller | odh-model-controller-label-stomping | high | When a label used for resource discovery is overwritten on the odh-model-control... |