Development Setup¶
This guide walks you through setting up a local development environment for Operator Chaos.
Prerequisites¶
Required Tools¶
- Go 1.25 or later — Install Go
- Git — For cloning the repository
- kubectl — Install kubectl
- Access to a Kubernetes cluster — Kind, Minikube, or OpenShift
Optional Tools¶
- kind — Install kind (recommended for local testing)
- golangci-lint — Install golangci-lint (for linting)
- make — For using Makefile targets
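Before continuing, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name used for illustration and is not part of the project.

```shell
# missing_tools prints each named command that is not found on PATH.
# (Hypothetical helper for illustration; not shipped with Operator Chaos.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Required and optional tools, as listed in the Prerequisites section.
required_missing="$(missing_tools go git kubectl)"
optional_missing="$(missing_tools kind golangci-lint make)"

# Unquoted $(echo ...) joins the one-per-line names into a single line.
[ -z "$required_missing" ] || echo "Missing required tools: $(echo $required_missing)"
[ -z "$optional_missing" ] || echo "Missing optional tools: $(echo $optional_missing)"
```

No output means every tool was found. Cluster access is not checked here; `kubectl cluster-info` is one way to confirm that separately.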
Clone the Repository¶
Verify Go Version¶
If your Go version is older than 1.25, update it before proceeding.
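Running `go version` prints the installed toolchain version. If you want a scripted comparison against the 1.25 minimum, the sketch below is one option. It assumes `sort -V` (version sort) is available, and `version_ge` is a hypothetical helper name.

```shell
# version_ge A B succeeds when version A is greater than or equal to version B.
# Relies on `sort -V`, which orders dotted version strings numerically.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.25"

if command -v go >/dev/null 2>&1; then
  # `go env GOVERSION` prints a value such as "go1.25.3"; strip the "go" prefix.
  current="$(go env GOVERSION | sed 's/^go//')"
  if version_ge "$current" "$required"; then
    echo "Go $current satisfies the >= $required requirement"
  else
    echo "Go $current is older than $required; please upgrade"
  fi
else
  echo "go not found on PATH"
fi
```

Pre-release or development toolchains report version strings that this simple comparison does not handle.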
Install Dependencies¶
This downloads all Go module dependencies defined in go.mod.
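With standard Go tooling, this step is usually a single command run from the repository root. The sketch below assumes that is the case here; the project's Makefile may wrap it in a target of its own.

```shell
# Assumes the current directory is the repository root (where go.mod lives).
if [ -f go.mod ]; then
  go mod download   # fetch every module listed in go.mod into the local module cache
  go mod verify     # optional: confirm the cached modules match go.sum
else
  echo "go.mod not found; change into the repository root first" >&2
fi
```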
Build the Project¶
Build All Binaries¶
This compiles all packages and ensures there are no syntax or type errors.
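Assuming the project builds with plain Go tooling, the compile check described above corresponds to building the `./...` package pattern. This is a sketch, not necessarily the exact command the project uses.

```shell
# Assumes the current directory is the repository root.
if [ -f go.mod ]; then
  # ./... matches every package in the module. When several packages are
  # built at once, go build discards the results and only reports errors.
  go build ./...
else
  echo "go.mod not found; change into the repository root first" >&2
fi
```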
Build the CLI¶
The binary will be placed in bin/operator-chaos.
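A build that produces this binary might look like the sketch below. It assumes the CLI's main package is `./cmd/cli`, which the Project Structure section lists as the CLI entrypoint. The project may instead use a Makefile target with additional flags.

```shell
# Assumption: ./cmd/cli is the main package for the CLI.
if [ -d cmd/cli ]; then
  mkdir -p bin
  go build -o bin/operator-chaos ./cmd/cli   # -o sets the output path
  ls -l bin/operator-chaos                   # confirm the binary was written
else
  echo "cmd/cli not found; run this from the repository root" >&2
fi
```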
Build the Controller¶
The controller is the same binary, started with controller start:
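Assuming the CLI was built to `bin/operator-chaos` as described above, and that your current kubeconfig context points at the cluster you intend to use, that looks like:

```shell
if [ -x bin/operator-chaos ]; then
  # Runs in the foreground; stop it with Ctrl+C.
  ./bin/operator-chaos controller start
else
  echo "bin/operator-chaos not found; build the CLI first" >&2
fi
```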
Run Tests¶
Unit Tests¶
Run tests with verbose output:
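With standard Go tooling (an assumption; the Makefile may wrap these in targets), the plain and verbose runs are:

```shell
# Assumes the current directory is the repository root.
if [ -f go.mod ]; then
  go test ./...      # all packages, one summary line per package
  go test -v ./...   # same, printing each test name and its log output
else
  echo "go.mod not found; change into the repository root first" >&2
fi
```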
Test Specific Packages¶
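To run only part of the suite, pass a package pattern instead of `./...`. The paths below are taken from the project layout in this guide. `TestPodKill` is a hypothetical test name used only to show the `-run` flag.

```shell
# Assumes the current directory is the repository root.
if [ -f go.mod ]; then
  go test ./pkg/injection/...                   # one package tree
  go test ./pkg/observer/ ./pkg/evaluator/      # several packages at once
  go test -run 'TestPodKill' ./pkg/injection/   # only tests whose name matches the regex
else
  echo "go.mod not found; change into the repository root first" >&2
fi
```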
Run with Coverage¶
Open coverage.html in a browser to view coverage reports.
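One way to produce `coverage.html` with the standard Go toolchain is sketched below. The intermediate file name `coverage.out` is an assumption; any name works as long as both commands agree.

```shell
# Assumes the current directory is the repository root.
if [ -f go.mod ]; then
  go test -coverprofile=coverage.out ./...            # write raw coverage data
  go tool cover -html=coverage.out -o coverage.html   # render it as an HTML report
  go tool cover -func=coverage.out | tail -n 1        # print the total coverage line
else
  echo "go.mod not found; change into the repository root first" >&2
fi
```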
Project Structure¶
operator-chaos/
├── api/v1alpha1/                # CRD types and validation
│   ├── types.go                 # ChaosExperiment CRD definition
│   └── groupversion_info.go     # API group metadata
├── cmd/
│   ├── cli/                     # CLI entrypoint
│   └── controller/              # Controller entrypoint
├── pkg/
│   ├── injection/               # Injection engine
│   │   ├── engine.go            # Registry and Injector interface
│   │   ├── podkill.go           # PodKill implementation
│   │   ├── network.go           # NetworkPartition implementation
│   │   └── ...                  # Other injectors
│   ├── observer/                # Observation system
│   │   ├── board.go             # Blackboard implementation
│   │   ├── contributor.go       # Contributor interface
│   │   └── ...                  # Specific contributors
│   ├── orchestrator/            # Experiment orchestration
│   │   └── lifecycle.go         # Lifecycle state machine
│   ├── evaluator/               # Verdict computation
│   ├── reporter/                # Report generation
│   ├── safety/                  # Blast radius and safety checks
│   ├── model/                   # Operator knowledge and dependency graph
│   └── sdk/                     # Go SDK for client-side chaos
│       ├── client.go            # ChaosClient wrapper
│       ├── types.go             # FaultConfig types
│       └── faults/              # Fault injection primitives
├── config/
│   ├── crd/                     # CRD manifests
│   ├── controller/              # Controller deployment manifests
│   └── samples/                 # Example experiments
├── experiments/                 # Additional experiment examples
├── site/                        # Documentation (MkDocs)
└── Makefile                     # Build automation
Running the CLI Locally¶
Run an Experiment¶
Validate an Experiment¶
List Available Injection Types¶
Generate Report¶
Reports are saved as JSON files in the specified directory.
Running the Controller Locally¶
1. Set Up a Local Cluster¶
Using kind¶
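A minimal kind setup is sketched below. The cluster name `operator-chaos` is an arbitrary choice for this example, not a name the project requires.

```shell
if command -v kind >/dev/null 2>&1; then
  kind create cluster --name operator-chaos
  # kind names the kubeconfig context "kind-<cluster name>".
  kubectl cluster-info --context kind-operator-chaos
else
  echo "kind not installed; see the Optional Tools section" >&2
fi
```

When you are finished, `kind delete cluster --name operator-chaos` removes the cluster.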
Using Minikube¶
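With Minikube, starting a cluster with default settings is usually enough for local testing. The sketch below assumes the default profile; driver and resource flags are left to your environment.

```shell
if command -v minikube >/dev/null 2>&1; then
  minikube start
  kubectl config current-context   # normally "minikube" for the default profile
  kubectl get nodes                # the node should report Ready
else
  echo "minikube not installed" >&2
fi
```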
2. Install CRDs¶
Verify CRD installation:
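One way to check is to list the cluster's CRDs and filter for the ChaosExperiment kind. The sketch assumes the CRD name contains `chaosexperiment`; the exact API group suffix is not stated in this guide.

```shell
# Assumption: the CRD name is derived from the ChaosExperiment kind.
kubectl get crds 2>/dev/null | grep -i chaosexperiment \
  || echo "No ChaosExperiment CRD found (or the cluster is unreachable)"
```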
3. Run Controller Locally¶
The controller will watch for ChaosExperiment resources and reconcile them.
Expected log output on startup:
INFO controller-runtime.metrics Metrics server is starting to listen
INFO controller-runtime.builder Starting EventSource
INFO controller-runtime.builder Starting Controller
INFO controller-runtime.controller Starting workers
4. Submit an Experiment¶
In another terminal:
Watch experiment progress:
View experiment status:
Running the Dashboard Locally¶
The dashboard is a React + Vite web UI for viewing experiment results.
1. Install Node.js Dependencies¶
2. Start Development Server¶
# Terminal 1: Run the Go backend
go run ./dashboard/cmd/dashboard/ -knowledge-dir knowledge/
# Terminal 2: Run the Vite dev server (with HMR)
cd dashboard/ui && npm run dev
The Vite dev server proxies /api/ requests to the Go backend (port 8080). The dashboard will be available at http://localhost:5173.
3. Build for Production¶
This outputs to dashboard/ui-dist/, which is embedded into the Go binary via go:embed.
Code Quality Tools¶
Linting¶
Fix auto-fixable issues:
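golangci-lint's `--fix` flag rewrites files for linters that support automatic fixes. A guarded sketch of both the reporting and the fixing run follows. The project may ship its own golangci-lint configuration or Makefile target, which this does not assume.

```shell
if command -v golangci-lint >/dev/null 2>&1; then
  golangci-lint run ./...         # report issues only
  golangci-lint run --fix ./...   # apply fixes where a linter supports them
else
  echo "golangci-lint not installed; see the Optional Tools section" >&2
fi
```

Review the resulting diff before committing, since `--fix` edits files in place.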
Formatting¶
Vet¶
Development Workflow¶
1. Create a Feature Branch¶
2. Make Changes¶
Edit code, add tests, update documentation.
3. Run Tests¶
4. Lint and Format¶
5. Commit Changes¶
6. Push and Open PR¶
Open a pull request on GitHub.
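Steps 3 and 4 of this workflow can be bundled into a single check that runs before pushing. The function below is a sketch built on standard Go tooling; `pre_pr_checks` is a hypothetical name, and the project may already provide equivalent Makefile targets.

```shell
# pre_pr_checks runs tests, vet, a formatting check, and (if installed) the
# linter, stopping at the first failure. Run it from the repository root.
pre_pr_checks() {
  go test ./... || return 1
  go vet ./... || return 1

  # gofmt -l lists files whose formatting differs from gofmt's output.
  unformatted="$(gofmt -l .)" || return 1
  if [ -n "$unformatted" ]; then
    echo "Files needing gofmt:" >&2
    echo "$unformatted" >&2
    return 1
  fi

  # golangci-lint is optional; skip it when not installed.
  if command -v golangci-lint >/dev/null 2>&1; then
    golangci-lint run ./... || return 1
  fi
}
```

A typical use is `pre_pr_checks && git push -u origin "$(git branch --show-current)"`, which pushes the current branch only when every check passes.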
Testing Against a Real Cluster¶
1. Deploy Target Operators¶
Deploy the operators you want to test. For example, to test with OpenDataHub, follow the ODH installation guide.
2. Apply Operator Knowledge¶
3. Run Experiments¶
4. View Results¶
Debugging¶
Enable Debug Logging¶
Set CHAOS_LOG_LEVEL=debug when running the controller or CLI:
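For the controller, assuming the binary was built to `bin/operator-chaos`, that looks like the following. Prefixing the command with the variable sets it for that one process only.

```shell
if [ -x bin/operator-chaos ]; then
  # CHAOS_LOG_LEVEL applies to this command only, not to the rest of the shell session.
  CHAOS_LOG_LEVEL=debug ./bin/operator-chaos controller start
else
  echo "bin/operator-chaos not found; build the CLI first" >&2
fi
```

Use `export CHAOS_LOG_LEVEL=debug` instead if you want every subsequent command in the session to inherit it.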
Inspect Chaos-Managed Resources¶
List all resources managed by Operator Chaos:
View Rollback Annotations¶
Controller Restart Recovery¶
To test crash-safe cleanup, kill the controller mid-experiment and restart it:
The controller should detect in-progress experiments and clean them up via Revert().
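The kill-and-restart sequence described above can be scripted roughly as follows. This is a sketch under stated assumptions: the binary lives at `bin/operator-chaos`, you submit an experiment from another terminal during the wait, and the 30-second delay is arbitrary.

```shell
if [ -x bin/operator-chaos ]; then
  ./bin/operator-chaos controller start &   # run the controller in the background
  controller_pid=$!

  sleep 30                                  # submit an experiment during this window

  kill -9 "$controller_pid"                 # SIGKILL: the process gets no chance to clean up
  wait "$controller_pid" 2>/dev/null        # reap the killed process

  ./bin/operator-chaos controller start     # restart and watch the logs for cleanup activity
else
  echo "bin/operator-chaos not found; build the CLI first" >&2
fi
```

SIGKILL is used rather than the default SIGTERM so that any graceful shutdown handler is bypassed, which is the case crash-safe cleanup needs to cover.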
Common Issues¶
"CRD not found"¶
Solution: Install CRDs:
"Permission denied" errors from controller¶
Solution: Apply RBAC manifests:
Experiments stuck in "Pending"¶
Solution: Check controller logs for validation errors:
TTL cleanup not working¶
Solution: Ensure the cleanup controller is running:
Next Steps¶
- Adding Failure Modes — Implement a new injector
- Architecture Overview — Understand system design
- Go SDK Reference — Use SDK in your operators