Design Overview¶

The Architecture Analyzer is a deterministic static analysis tool. No LLM involvement, no non-determinism, no API calls. It reads a repository and produces structured architecture data.

Core pipeline¶

flowchart TB
    REPO["Git Repository"] --> EXTRACT["Extractors (17)"]
    EXTRACT --> JSON["component-architecture.json"]
    JSON --> RENDER["Renderers (7)"]
    JSON --> AGG["Aggregator"]

    REPO --> PARSE["Tree-sitter Parser"]
    PARSE --> CPG["Code Property Graph"]
    CPG --> ANNOTATE["Domain Annotators"]
    ANNOTATE --> QUERY["Security Queries"]
    QUERY --> FINDINGS["Findings (JSON/SARIF)"]

    JSON --> SCHEMA["Schema Extractor"]
    SCHEMA --> VALIDATE["Contract Validator"]

    classDef extract fill:#3498db,stroke:#2980b9,color:#fff
    classDef render fill:#2ecc71,stroke:#27ae60,color:#fff
    classDef cpg fill:#9b59b6,stroke:#8e44ad,color:#fff
    classDef data fill:#e74c3c,stroke:#c0392b,color:#fff

    class EXTRACT extract
    class RENDER render
    class CPG,ANNOTATE,QUERY cpg
    class JSON,FINDINGS data

Design decisions¶

Why static analysis?¶

Deterministic: Same input always produces same output. No model variability.
Fast: Full analysis of a typical K8s operator repo takes under 10 seconds.
Free: No API calls, no token consumption. Run as often as you want.
Source-traceable: Every fact in the output can be traced back to a specific file and line.

Why extractors + renderers separation?¶

Extraction and rendering are decoupled through the JSON intermediate format:

Extract once, render many: Extract JSON, then produce different visualizations without re-scanning
Aggregate: Merge multiple JSON files for cross-component analysis
Custom processing: Use the JSON with any tool (jq, Python, etc.)
CI artifacts: Store JSON as build artifacts, render on demand

Why tree-sitter for Go parsing?¶

No Go toolchain dependency: Tree-sitter parses syntax without needing go build to succeed
Fast incremental parsing: Can parse individual files without resolving the full module graph
Cross-language potential: Same approach extends to Python, TypeScript, etc.
Partial-file resilience: Parses what it can even if the file has errors

Why a code property graph?¶

The CPG provides:

Cross-function analysis: Trace data flow across function boundaries
Annotation layers: Multiple domains (security, testing, upgrade) annotate the same graph
Composable queries: Each query traverses the same graph independently
Architecture integration: CPG nodes link to architecture data for cross-cutting analysis

Package structure¶

pkg/
  extractor/      # 17 architecture extractors
  renderer/       # 7 diagram/report renderers
  aggregator/     # Platform-wide aggregation
  validator/      # CRD contract validation
  parser/         # Tree-sitter Go parser
  builder/        # CPG builder
  graph/          # CPG data structure (thread-safe)
  annotator/      # Annotation engine
  query/          # Query engine + taint analysis
  domains/        # Pluggable domain framework
    security/     # Security domain
    testing/      # Testing domain
    upgrade/      # Upgrade domain
  arch/           # Architecture data types and parsing
  linker/         # Storage linker
  config/         # Configuration types

Data flow¶

Input: Path to a git repository (local checkout)
YAML extraction: Walk filesystem for Kubernetes manifests, parse into typed structs
Go extraction: Tree-sitter parse controller files, extract watches/endpoints/cache config
File extraction: Parse Dockerfiles, Helm charts, go.mod
Assembly: All extracted data merged into ComponentArchitecture struct
Serialization: JSON output
Rendering: Each renderer reads JSON, produces its format
CPG (optional): Tree-sitter parses all Go files, builds graph, runs domain queries
Aggregation (optional): Multiple component JSONs merged into platform view
Validation (optional): CRD schemas compared against baseline contracts