trustyai-service-operator: Cache Architecture
The controller-runtime cache configuration determines which Kubernetes resources the operator holds in memory. Misconfigured caches, meaning cluster-wide watches on high-cardinality types without filters, are a primary cause of operator OOM kills.
Cache Architecture
Manager Configuration
| Property | Value |
|---|---|
| Manager file | cmd/main.go |
| Cache scope | cluster-wide |
| DefaultTransform | no |
| GOMEMLIMIT | 630MiB |
| Memory limit | 700Mi |
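
The two memory values in the table are in a 90% ratio: 630MiB is 90% of the 700Mi container limit. The sketch below reproduces that arithmetic and applies the result with `runtime/debug.SetMemoryLimit`, the programmatic equivalent of the `GOMEMLIMIT` environment variable. The helper name `softLimit` and the 90% figure are assumptions for illustration; the source does not say how the value was chosen. Note that `GOMEMLIMIT` is a soft limit: it makes the garbage collector work harder as the heap approaches the limit, but it cannot prevent an OOM kill if the live informer cache alone exceeds the container limit.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

const mib = 1 << 20 // bytes per MiB; Kubernetes "Mi" is the same unit

// softLimit is a hypothetical helper: it returns percent% of the
// container memory limit, in bytes, using integer arithmetic.
func softLimit(containerLimitBytes, percent int64) int64 {
	return containerLimitBytes * percent / 100
}

func main() {
	containerLimit := int64(700 * mib) // the 700Mi memory limit from the table
	limit := softLimit(containerLimit, 90)

	// Equivalent to starting the process with GOMEMLIMIT=630MiB.
	debug.SetMemoryLimit(limit)

	fmt.Printf("GOMEMLIMIT=%dMiB\n", limit/mib) // prints "GOMEMLIMIT=630MiB"
}
```

The remaining 70MiB is headroom for memory the Go runtime does not count against the limit, such as memory allocated outside the Go heap.
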
Issues
- No cache configuration: all informers are cluster-wide (OOM risk)
- Type ConfigMap is watched but has no cache filter (cluster-wide informer)
- Type Deployment is watched but has no cache filter (cluster-wide informer)
- Type EvalHub is watched but has no cache filter (cluster-wide informer)
- Type GuardrailsOrchestrator is watched but has no cache filter (cluster-wide informer)
- Type InferenceService is watched but has no cache filter (cluster-wide informer)
- Type Job is watched but has no cache filter (cluster-wide informer)
- Type LMEvalJob is watched but has no cache filter (cluster-wide informer)
- Type Namespace is watched but has no cache filter (cluster-wide informer)
- Type NemoGuardrails is watched but has no cache filter (cluster-wide informer)
- Type Service is watched but has no cache filter (cluster-wide informer)
- Type TrustyAIService is watched but has no cache filter (cluster-wide informer)
- Type Workload is watched but has no cache filter (cluster-wide informer)
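
One common way to address findings like these is to pass `cache.Options` to the manager so that informers for shared types are restricted by label, and to set a `DefaultTransform` that drops `managedFields` before objects are stored. The following is a minimal sketch, not the operator's actual `cmd/main.go`: the function name `filteredCacheOptions` and the label `app.kubernetes.io/part-of: trustyai` are hypothetical, and it assumes a controller-runtime release that provides `cache.Options.ByObject` and `cache.TransformStripManagedFields`.

```go
package main

import (
	"os"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// filteredCacheOptions is a hypothetical helper that narrows the informer
// cache for a few of the high-cardinality types listed above.
func filteredCacheOptions() cache.Options {
	// Assumption: the operator labels every object it creates with this
	// (hypothetical) label. Objects without it are invisible to the cache.
	owned := labels.SelectorFromSet(labels.Set{
		"app.kubernetes.io/part-of": "trustyai",
	})

	return cache.Options{
		// Strip metadata.managedFields from every cached object.
		DefaultTransform: cache.TransformStripManagedFields(),
		// Per-type filters: only labelled objects are listed and watched.
		ByObject: map[client.Object]cache.ByObject{
			&corev1.ConfigMap{}:  {Label: owned},
			&corev1.Service{}:    {Label: owned},
			&appsv1.Deployment{}: {Label: owned},
		},
	}
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: filteredCacheOptions(),
	})
	if err != nil {
		os.Exit(1)
	}
	_ = mgr // controllers would be registered and mgr.Start called here
}
```

A label filter is only safe for types where every object the reconcilers need carries the label. Types the operator reads but does not own, for example user-created ConfigMaps or the InferenceService resources it watches, would need a different filter (such as a namespace or field selector) or an uncached read.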