Skip to content

trustyai-service-operator: Cache Architecture

Controller-runtime cache configuration controls which Kubernetes resources are cached in-memory. Misconfigured caches (cluster-wide watches on high-cardinality types without filters) are a primary cause of operator OOM kills.

Cache Architecture

Manager Configuration

Property Value
Manager file cmd/main.go
Cache scope cluster-wide
DefaultTransform no
GOMEMLIMIT 630MiB
Memory limit 700Mi

Issues

  • No cache configuration: all informers are cluster-wide (OOM risk)
  • Type ConfigMap is watched but has no cache filter (cluster-wide informer)
  • Type Deployment is watched but has no cache filter (cluster-wide informer)
  • Type EvalHub is watched but has no cache filter (cluster-wide informer)
  • Type GuardrailsOrchestrator is watched but has no cache filter (cluster-wide informer)
  • Type InferenceService is watched but has no cache filter (cluster-wide informer)
  • Type Job is watched but has no cache filter (cluster-wide informer)
  • Type LMEvalJob is watched but has no cache filter (cluster-wide informer)
  • Type Namespace is watched but has no cache filter (cluster-wide informer)
  • Type NemoGuardrails is watched but has no cache filter (cluster-wide informer)
  • Type Service is watched but has no cache filter (cluster-wide informer)
  • Type TrustyAIService is watched but has no cache filter (cluster-wide informer)
  • Type Workload is watched but has no cache filter (cluster-wide informer)