Analyzing Repositories
Single repository analysis
Extract + render (recommended)
This is the most common operation. It:
- Runs all 17 extractors against the repository
- Produces component-architecture.json with all extracted data
- Renders all 7 diagram/report formats
Extract only
If you only need the JSON data (for custom processing or aggregation):
Render from existing JSON
If you already have extracted JSON and want different renderers or formats:
# All formats
arch-analyzer render component-architecture.json --output-dir diagrams/
# Specific formats only
arch-analyzer render component-architecture.json --formats rbac,component
Available format names: rbac, component, security, dependencies, c4, dataflow, report.
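When the render step is scripted, it can help to validate format names before invoking the CLI. The following is a minimal sketch, not part of the tool: `build_render_command` and `KNOWN_FORMATS` are hypothetical names, and passing `--formats` together with `--output-dir` in one invocation is an assumption (the examples above show them separately).

```python
import shlex

# Format names as listed in this section.
KNOWN_FORMATS = ("rbac", "component", "security", "dependencies",
                 "c4", "dataflow", "report")


def build_render_command(json_path, formats=None, output_dir=None):
    """Return the argv list for `arch-analyzer render`, rejecting unknown formats."""
    argv = ["arch-analyzer", "render", json_path]
    if formats:
        unknown = [name for name in formats if name not in KNOWN_FORMATS]
        if unknown:
            raise ValueError(f"unknown format(s): {', '.join(unknown)}")
        argv += ["--formats", ",".join(formats)]
    if output_dir:
        # Assumption: --output-dir may be combined with --formats.
        argv += ["--output-dir", output_dir]
    return argv


if __name__ == "__main__":
    cmd = build_render_command("component-architecture.json",
                               formats=["rbac", "component"])
    # Print the command instead of running it; to execute it,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting issues if the JSON path contains spaces.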
Full analysis
Combines architecture extraction, diagram rendering, code graph scanning, and schema extraction:
Output includes everything from analyze plus:
- Security findings from code property graph queries
- CRD JSON schemas for contract validation
What each extractor produces
The analyzer walks the repository looking for specific file patterns. Each extractor operates independently:
- YAML extractors (CRDs, RBAC, services, deployments, etc.) parse Kubernetes manifests
- Go source extractors (controller watches, HTTP endpoints, cache config) use tree-sitter AST parsing
- File extractors (Dockerfiles, Helm charts, go.mod) parse specialized formats
Extractors are designed to be resilient: if a file doesn't match the expected format, the extractor skips it and logs a warning instead of failing.
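The skip-and-warn behavior described above can be illustrated with a short sketch. This is not the analyzer's actual code: `extract_all` is a hypothetical helper, and `json.loads` stands in for the real YAML and Go parsers so the example runs with the standard library alone.

```python
import json
import logging

log = logging.getLogger("extractor-sketch")


def extract_all(documents, parse):
    """Parse each (name, text) pair; skip and warn on anything that fails to parse."""
    results, skipped = [], []
    for name, text in documents:
        try:
            results.append(parse(text))
        except ValueError as exc:
            # Resilience: one malformed file must not abort the whole run.
            log.warning("skipping %s: %s", name, exc)
            skipped.append(name)
    return results, skipped


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    docs = [
        ("good.json", '{"kind": "Service"}'),
        ("bad.json", "{not valid"),  # malformed on purpose
    ]
    parsed, skipped = extract_all(docs, json.loads)
    print(parsed, skipped)
```

Returning the skipped names alongside the results lets a caller report which files were ignored without treating them as fatal errors.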
Understanding the output
component-architecture.json
The core data structure containing all extracted information:
{
  "component": "my-operator",
  "repo": "github.com/org/my-operator",
  "extracted_at": "2026-04-14T10:30:00Z",
  "analyzer_version": "0.2.0",
  "crds": [...],
  "rbac": { "cluster_roles": [...], "role_bindings": [...] },
  "services": [...],
  "deployments": [...],
  "network_policies": [...],
  "controller_watches": { ... },
  "dependencies": [...],
  "secrets": [...],
  "dockerfiles": [...],
  "helm": { ... },
  "webhooks": [...],
  "config_maps": [...],
  "http_endpoints": [...],
  "ingress_routing": [...],
  "cache_config": { ... }
}
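For custom processing, a quick first step is to count the entries in each section. The sketch below assumes only the top-level shape shown above (list-valued sections, plus object-valued sections such as `rbac` that may contain lists); `summarize` is a hypothetical helper, and the inline sample stands in for a real file.

```python
import json


def summarize(data):
    """Map each list-valued section (one level of nesting deep) to its entry count."""
    counts = {}
    for key, value in data.items():
        if isinstance(value, list):
            counts[key] = len(value)
        elif isinstance(value, dict):
            # Object sections such as "rbac" hold named lists; report each one.
            for sub_key, sub_value in value.items():
                if isinstance(sub_value, list):
                    counts[f"{key}.{sub_key}"] = len(sub_value)
    return counts


if __name__ == "__main__":
    # Inline sample; in practice, load the real file with
    # json.load(open("component-architecture.json")).
    sample = json.loads(
        '{"component": "my-operator", "crds": [{}, {}],'
        ' "rbac": {"cluster_roles": [{}], "role_bindings": []}}'
    )
    for section, count in sorted(summarize(sample).items()):
        print(f"{section}: {count}")
```

Scalar fields such as `component` and `extracted_at` are ignored, so the same function can be applied to several components' files when aggregating.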
Diagrams
Each .mmd file is a self-contained Mermaid diagram. You can view it in:
- GitHub (renders Mermaid natively in Markdown)
- Mermaid Live Editor
- VS Code with a Mermaid extension
report.md
The structured markdown report contains tables for every extracted category plus cache analysis findings with severity ratings.
Tips
- Run against a clean checkout for the most accurate results
- The analyzer reads from the local filesystem, so initialize any submodules first
- For large repos with many YAML files, extraction typically takes under 10 seconds
- The code property graph (for security scanning) adds a few seconds for tree-sitter parsing
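Because uninitialized submodules are invisible to a filesystem scan, a pre-flight check can be useful. The sketch below parses the output of `git submodule status`, where git prefixes uninitialized entries with `-`. The helper name `uninitialized_submodules` and the sample output are illustrative assumptions, not part of the analyzer.

```python
def uninitialized_submodules(status_output):
    """Return paths of submodules that `git submodule status` marks with a leading '-'."""
    paths = []
    for line in status_output.splitlines():
        # Do not strip the line first: the first character is the status marker.
        if line.startswith("-"):
            # Expected shape: "-<sha1> <path>"
            parts = line[1:].split(maxsplit=1)
            if len(parts) == 2:
                paths.append(parts[1])
    return paths


if __name__ == "__main__":
    # Made-up sample output: one uninitialized and one initialized submodule.
    sample = (
        "-1111111111111111111111111111111111111111 vendor/lib\n"
        " 2222222222222222222222222222222222222222 third_party/tool (v1.2.0)\n"
    )
    for path in uninitialized_submodules(sample):
        print(f"not initialized: {path}")
```

In a real checkout, the input would come from `subprocess.run(["git", "submodule", "status"], capture_output=True, text=True, check=True).stdout`, and `git submodule update --init --recursive` initializes whatever the check reports.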