Analyzing Repositories

Single repository analysis

Extract + render (recommended)

arch-analyzer analyze /path/to/repo --output-dir output/

This is the most common operation. It:

Runs all 17 extractors against the repository
Produces component-architecture.json with all extracted data
Renders all 7 diagram/report formats
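When several checkouts need the same treatment, the `analyze` command above can be driven from a small script. The following is a minimal sketch, not part of the tool: `analyze_command` and `analyze_all` are hypothetical helper names, and the per-repository output layout (`output_root/<checkout folder name>`) is an assumption. Only the documented `analyze` subcommand and `--output-dir` flag are used.

```python
import subprocess
from pathlib import Path


def analyze_command(repo: Path, output_root: Path) -> list[str]:
    """Build the documented `analyze` invocation for one repository."""
    # Assumed layout: one output directory per repository, named after the checkout folder.
    return [
        "arch-analyzer", "analyze", str(repo),
        "--output-dir", str(output_root / repo.name),
    ]


def analyze_all(repos: list[Path], output_root: Path, dry_run: bool = True) -> list[list[str]]:
    """Collect one analyze command per repository and, unless dry_run, execute them."""
    commands = [analyze_command(repo, output_root) for repo in repos]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError at the first failing repository.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `analyze_all(repos, Path("output"))` with the default `dry_run=True` only returns the argument lists, which makes it easy to inspect them before running anything.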

Extract only

If you only need the JSON data (for custom processing or aggregation):

arch-analyzer extract /path/to/repo --output component-architecture.json
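As an illustration of the aggregation use case, the sketch below indexes several extracted files by component name and counts CRDs per component. It assumes each file is named `component-architecture.json` and carries the `component` and `crds` keys shown in the sample output on this page. The helper names `load_components` and `crd_counts` are hypothetical.

```python
import json
from pathlib import Path


def load_components(root: Path) -> dict[str, dict]:
    """Index every component-architecture.json found under root by component name."""
    index: dict[str, dict] = {}
    for path in sorted(root.rglob("component-architecture.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: fall back to the parent directory name if "component" is missing.
        index[data.get("component", path.parent.name)] = data
    return index


def crd_counts(index: dict[str, dict]) -> dict[str, int]:
    """Count CRD entries per component; a missing "crds" key counts as zero."""
    return {name: len(data.get("crds", [])) for name, data in index.items()}
```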

Render from existing JSON

If you already have extracted JSON and want different renderers or formats:

# All formats
arch-analyzer render component-architecture.json --output-dir diagrams/

# Specific formats only
arch-analyzer render component-architecture.json --formats rbac,component

Available format names: rbac, component, security, dependencies, c4, dataflow, report.
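A script that wraps `render` can check format names against the list above before invoking the CLI. This is a minimal sketch: `render_command` is a hypothetical helper, and passing `--formats` and `--output-dir` in the same invocation is an assumption, since the examples on this page only show them separately.

```python
from typing import Optional

# Format names exactly as listed in this section.
AVAILABLE_FORMATS = ("rbac", "component", "security", "dependencies", "c4", "dataflow", "report")


def render_command(json_path: str, formats: list[str], output_dir: Optional[str] = None) -> list[str]:
    """Build a `render` invocation, rejecting format names that are not documented."""
    unknown = [name for name in formats if name not in AVAILABLE_FORMATS]
    if unknown:
        raise ValueError(f"unknown format(s): {', '.join(unknown)}")
    cmd = ["arch-analyzer", "render", json_path]
    if formats:
        # The CLI example shows a comma-separated list without spaces.
        cmd += ["--formats", ",".join(formats)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd
```

An empty `formats` list leaves the flag out entirely, which corresponds to the "all formats" invocation.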

Full analysis

This command combines architecture extraction, diagram rendering, code graph scanning, and schema extraction:

arch-analyzer full-analysis /path/to/repo --output-dir output/

Output includes everything from analyze plus:

Security findings from code property graph queries
CRD JSON schemas for contract validation

What each extractor produces

The analyzer walks the repository looking for specific file patterns. Each extractor operates independently:

YAML extractors (CRDs, RBAC, services, deployments, etc.) parse Kubernetes manifests
Go source extractors (controller watches, HTTP endpoints, cache config) use tree-sitter AST parsing
File extractors (Dockerfiles, Helm charts, go.mod) parse specialized formats

Extractors are designed to be resilient: if a file doesn't match the expected format, the extractor skips it and logs a warning instead of failing.
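The skip-and-warn behavior can be pictured with a short sketch. This is not the analyzer's actual code: `extract_all` is a hypothetical function, and `json.loads` stands in for the real YAML, Go, and file parsers so that the example needs only the standard library.

```python
import json
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger("extractor-sketch")


def extract_all(paths: list[Path], parse: Callable[[str], object]) -> list[object]:
    """Parse each file, skipping (and warning about) files that cannot be read or parsed."""
    results = []
    for path in paths:
        try:
            results.append(parse(path.read_text(encoding="utf-8")))
        except (OSError, ValueError) as exc:
            # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
            # The file is skipped and the run continues instead of aborting.
            logger.warning("skipping %s: %s", path, exc)
    return results
```

With this shape, one malformed manifest costs a single warning line rather than the whole extraction run.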

Understanding the output

component-architecture.json

The core data structure containing all extracted information:

{
  "component": "my-operator",
  "repo": "github.com/org/my-operator",
  "extracted_at": "2026-04-14T10:30:00Z",
  "analyzer_version": "0.2.0",
  "crds": [...],
  "rbac": { "cluster_roles": [...], "role_bindings": [...] },
  "services": [...],
  "deployments": [...],
  "network_policies": [...],
  "controller_watches": { ... },
  "dependencies": [...],
  "secrets": [...],
  "dockerfiles": [...],
  "helm": { ... },
  "webhooks": [...],
  "config_maps": [...],
  "http_endpoints": [...],
  "ingress_routing": [...],
  "cache_config": { ... }
}
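For a quick look at what a given file contains, a few lines of Python can count the entries in each section. This is a sketch based only on the shape shown above: it assumes sections are either lists or objects whose values may be lists (as with `rbac`), and `summarize` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize(data: dict) -> dict[str, int]:
    """Count entries per list-valued section, descending one level into nested objects."""
    summary: dict[str, int] = {}
    for key, value in data.items():
        if isinstance(value, list):
            summary[key] = len(value)
        elif isinstance(value, dict):
            # Nested sections such as "rbac": report each inner list as "rbac.<name>".
            for sub_key, items in value.items():
                if isinstance(items, list):
                    summary[f"{key}.{sub_key}"] = len(items)
    return summary


def summarize_file(path: Path) -> dict[str, int]:
    """Load an extracted JSON file from disk and summarize it."""
    return summarize(json.loads(path.read_text(encoding="utf-8")))
```

Scalar fields such as `component` or `analyzer_version` are ignored, so the result only lists sections that hold extracted items.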

Diagrams

Each .mmd file is a self-contained Mermaid diagram. You can view it in:

GitHub (renders Mermaid natively in Markdown)
Mermaid Live Editor
VS Code with Mermaid extension
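To view the diagrams through GitHub's Markdown rendering, each `.mmd` file can be wrapped in a `mermaid` fenced block. The sketch below does that for a whole directory. `mermaid_markdown` is a hypothetical helper, and since the output file names are not listed on this page, it simply picks up every `*.mmd` file it finds.

```python
from pathlib import Path

# Built programmatically so that this example itself nests cleanly inside Markdown.
FENCE = "`" * 3


def mermaid_markdown(diagram_dir: Path) -> str:
    """Return Markdown that embeds every .mmd file in diagram_dir as a mermaid block."""
    sections = []
    for path in sorted(diagram_dir.glob("*.mmd")):
        body = path.read_text(encoding="utf-8").strip()
        # One heading per diagram, titled after the file name without its extension.
        sections.append(f"## {path.stem}\n\n{FENCE}mermaid\n{body}\n{FENCE}\n")
    return "\n".join(sections)
```

Writing the returned string to a `.md` file in the repository gives a single page with all diagrams.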

report.md

The structured Markdown report contains tables for every extracted category, plus cache analysis findings with severity ratings.

Tips

Run against a clean checkout for the most accurate results
The analyzer reads from the local filesystem, so submodules must be initialized first
Even for large repos with many YAML files, extraction typically takes under 10 seconds
Building the code property graph (for security scanning) adds a few seconds for tree-sitter parsing
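A pre-flight check for the submodule tip might look like the sketch below. It is not part of the analyzer: `uninitialized_submodules` is a hypothetical helper that reads `path = ...` entries from `.gitmodules` and treats a missing or empty directory as uninitialized, which is how Git leaves submodule paths after a plain clone.

```python
import re
from pathlib import Path

# Matches lines such as "    path = libs/foo" in a .gitmodules file.
PATH_LINE = re.compile(r"^[ \t]*path[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def uninitialized_submodules(repo: Path) -> list[str]:
    """Return submodule paths that are declared in .gitmodules but missing or empty on disk."""
    gitmodules = repo / ".gitmodules"
    if not gitmodules.is_file():
        return []  # no submodules declared
    missing = []
    for rel in PATH_LINE.findall(gitmodules.read_text(encoding="utf-8")):
        target = repo / rel
        # An uninitialized submodule is either absent or an empty directory.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(rel)
    return missing
```

If the list is non-empty, running `git submodule update --init --recursive` in the checkout before analysis should populate those directories.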