Code Property Graph¶
The code property graph (CPG) is a unified representation of Go source code that supports cross-function analysis and security queries.
What is a CPG?¶
When you write Go code like func main() { fmt.Println("hello") }, the compiler first turns it into an Abstract Syntax Tree (AST): a tree structure where each node represents a syntactic element (function declaration, call expression, string literal, etc.). The AST captures the structure of the code but not the relationships between different parts of the program.
A Code Property Graph goes further: it takes the AST and adds cross-references, data flow edges, and semantic annotations. This means you can ask questions like "which functions call exec.Command with user-supplied input?" or "does any controller watch a CRD it doesn't own?" by traversing the graph rather than doing text search.
The analyzer builds its CPG using tree-sitter, a fast incremental parsing library, via the Go binding go-tree-sitter. Tree-sitter produces a concrete syntax tree for each Go source file, which the builder then transforms into the CPG by resolving cross-file references and adding semantic edges.
Structure¶
graph TD
FILE["File Node"] --> FUNC["Function Node"]
FUNC --> PARAM["Parameter Nodes"]
FUNC --> CALL["Call Nodes"]
FUNC --> STRUCT["Struct Literal Nodes"]
FUNC -->|"EdgeCalls"| FUNC2["Called Function"]
PARAM -->|"EdgeDataFlow"| CALL
STRUCT -->|"EdgeContains"| FUNC
classDef node fill:#9b59b6,stroke:#8e44ad,color:#fff
class FILE,FUNC,PARAM,CALL,STRUCT,FUNC2 node
Node kinds¶
| Kind | Description |
|---|---|
File |
Source file |
Function |
Function or method declaration |
Parameter |
Function parameter with type |
Call |
Function call expression |
StructLiteral |
Composite literal (struct instantiation) |
Edge kinds¶
| Kind | Description |
|---|---|
EdgeCalls |
Function A calls function B |
EdgeAliases |
Type alias relationship |
EdgeContains |
Containment (file contains function, function contains literal) |
Properties¶
Each node carries:
- Name: Identifier
- File: Source file path
- Line: Line number
- Properties: Key-value metadata (e.g., parameter type, return type)
- Annotations: Domain-specific metadata added by annotators
Thread safety¶
The CPG implementation (pkg/graph/cpg.go) is thread-safe:
sync.RWMutexprotects all node and edge operations- Multiple annotators can read concurrently
- Write operations (adding nodes/edges) are serialized
Building the CPG¶
The CPG is built in two stages: parse, then assemble.
flowchart LR
SRC["*.go files"] --> TS["Tree-sitter\nParser"]
TS --> AST["Syntax Trees\n(per file)"]
AST --> EXT["Extractor\n(functions, calls,\nparams, literals)"]
EXT --> PR["ParseResult"]
PR --> BUILD["Builder\n(cross-file\nresolution)"]
BUILD --> CPG["Code Property\nGraph"]
classDef parse fill:#3498db,stroke:#2980b9,color:#fff
classDef build fill:#2ecc71,stroke:#27ae60,color:#fff
class TS,AST,EXT,PR parse
class BUILD build
-
Parser (
pkg/parser/go_parser.go): Tree-sitter parses each.gofile into a syntax tree, then the parser walks the tree extracting:- Function declarations with parameters and return types
- Function call expressions with arguments (including detecting sensitive sinks like
exec.Command,sql.Query) - Composite literals (struct instantiation with field values)
- Switch/case statements (used for detecting unhandled CRD versions)
-
Builder (
pkg/builder/builder.go): Assembles per-file parse results into the unified CPG:- Creates nodes for each function, parameter, call, literal
- Creates edges (calls, contains, aliases)
- Resolves cross-file references (function A in file X calls function B in file Y)
Architecture enrichment¶
When --with-arch is provided, the CPG gains an ArchData sidecar:
type CPG struct {
nodes map[string]*Node
edges map[string][]*Edge
ArchData *arch.ArchitectureData // Optional
}
Architecture data enables queries that cross-reference code against extracted architecture:
- CGA-U01: Compare CRD version references in code against extracted CRD schemas
- Architecture-aware taint analysis: Follow data through known API boundaries
- Finding enrichment: Add
ArchRefto findings linking code to architecture components
Query execution¶
Queries traverse the CPG looking for patterns:
flowchart LR
QUERY["Query\n(e.g., CGA-004)"] --> WALK["Walk nodes\nmatching criteria"]
WALK --> CHECK["Check annotations\nand properties"]
CHECK --> EMIT["Emit finding\nwith file:line"]
classDef query fill:#e74c3c,stroke:#c0392b,color:#fff
class QUERY,WALK,CHECK query
The query engine (pkg/query/engine.go) provides:
RunDomain(domain): Execute all queries for a domain- Results grouped by domain with deduplication
- Per-domain SARIF output support
Taint analysis¶
Taint analysis (pkg/query/taint.go) traces data flow from sources to sinks:
- Sources: Functions that introduce untrusted data (HTTP request params, env vars, user input)
- Propagators: Functions that pass data through (assignments, returns, function calls)
- Sinks: Functions that perform sensitive operations (SQL queries, command execution, file writes)
A taint finding is emitted when data flows from a source to a sink without passing through a sanitizer.