Skip to content

OwnerRefOrphan

Danger Level: Medium

Removes ownerReferences from operator-managed resources to test re-adoption logic.

Spec Fields

Field Type Required Default Description
apiVersion string Yes - API version of the target resource (e.g., apps/v1)
kind string Yes - Kind of the target resource (e.g., Deployment)
name string Yes - Name of the target resource instance
ttl duration No 300s Auto-cleanup duration

How It Works

OwnerRefOrphan reads the target resource, saves its ownerReferences in a rollback annotation, then clears all ownerReferences via a JSON merge patch. This simulates a resource becoming "orphaned" from its parent controller.

API calls: 1. Get the target resource as Unstructured 2. Serialize original ownerReferences to rollback annotation 3. Patch the resource with empty ownerReferences array, add chaos labels 4. On cleanup: check if operator re-adopted, restore original ownerReferences only if still orphaned

Cleanup: Checks whether the operator has already re-adopted the resource (non-empty ownerReferences). If so, only removes chaos metadata. If still orphaned, restores the original ownerReferences from the rollback annotation. Idempotent.

Crash safety: Rollback annotation persists on the resource. Revert also checks for re-adoption before restoring.

Disruption Rubric

Expected behavior on a healthy operator: The parent controller detects that its child resource no longer has an ownerReference pointing back to it, and re-adopts the resource by adding a new ownerReference. The resource should never be garbage collected during the test window because the experiment uses a short TTL.

Contract violation indicators: - Operator does not detect the orphaned resource (indicates missing watch or adoption logic) - Resource is garbage collected because the operator relied solely on ownerReferences for lifecycle management - Operator creates a duplicate resource instead of re-adopting the existing one - Operator enters error loop trying to manage a resource it no longer owns

Collateral damage risks: - Medium. Only the target resource's metadata is modified - If the operator uses ownerReferences for cascading deletion, orphaning may prevent cleanup - Protected kinds (Namespace, Node, ChaosExperiment) are rejected by validation

Recovery expectations: - Recovery time: 5-60 seconds (depends on reconciliation interval) - Reconcile cycles: 1-2 - What "recovered" means: resource has ownerReferences restored (either by operator or cleanup)

Cross-Component Results

Component Experiment Danger Description
kserve kserve-ownerref-orphan medium Removing ownerReferences from the kserve-controller-manager Deployment should tr...
odh-model-controller odh-model-controller-ownerref-orphan medium Removing ownerReferences from the odh-model-controller Deployment should trigger...