Design Overview¶

Adversarial Review is a multi-agent system where independent specialist agents analyze code or strategy documents, debate their findings, and produce a validated report. The core design principle is that no single agent's judgment is trusted: findings must survive structured adversarial scrutiny.

Architecture¶

The system flows from user invocation through flag parsing, cache initialization, and context loading, then into the 5-phase review pipeline. Each phase has internal subcomponents shown in the expanded boxes below. The main pipeline is linear (phases run sequentially), but within each phase there are loops and conditional paths. Phase 5 (dashed) is optional.

graph TD
    USER["User invocation"] --> SKILL["SKILL.md\n(orchestration procedure)"]

    SKILL --> PARSE["Parse flags\n& resolve scope"]
    PARSE --> CACHE["Initialize cache\n(manage_cache.py)"]
    CACHE --> REFS["Discover references\n(discover_references.py)"]
    REFS --> CONTEXT["Fetch context\n(fetch-context.sh)"]

    CONTEXT --> P1["Phase 1: Self-Refinement"]
    P1 --> P2["Phase 2: Challenge Round"]
    P2 --> P3["Phase 3: Resolution"]
    P3 --> P4["Phase 4: Report"]
    P4 --> P5["Phase 5: Remediation"]

    subgraph P1_detail["Phase 1 internals"]
        direction TB
        SPAWN["Spawn isolated agents\n+ reference modules"] --> ITER["Iterate (2-3x)"]
        ITER --> VALIDATE["validate-output.sh"]
        VALIDATE --> CONVERGE["detect-convergence.sh"]
    end

    subgraph P2_detail["Phase 2 internals"]
        direction TB
        SANITIZE["Sanitize findings"] --> AFFINITY["Domain-aware routing\n(affinity matrix)"]
        AFFINITY --> ROUTE["Route challenges"]
        ROUTE --> DEFENSE["Collect defenses"]
        DEFENSE --> EVIDENCE["Evidence-based rebuttal\n(iteration 3)"]
    end

    subgraph P3_detail["Phase 3 internals"]
        direction TB
        DEDUP["deduplicate.py"] --> CLASSIFY["Classify agreement"]
        CLASSIFY --> RESOLVE["Resolve verdicts"]
    end

    subgraph P4_detail["Phase 4 internals"]
        direction TB
        REPORT["Assemble report"] --> META["Metadata block\n+ prompt versions"]
        META --> PERSIST["Finding persistence\n(fingerprint_findings.py)"]
        PERSIST --> NORM["Output normalization\n(normalize_findings.py)"]
    end

    subgraph P5_detail["Phase 5 internals"]
        direction TB
        FIX_CLASSIFY["Classify findings"] --> FIX_IMPL["Implement fixes"]
        FIX_IMPL --> FIX_VERIFY["Fix verification\n(re-invoke specialist)"]
        FIX_VERIFY --> |"incomplete"| FIX_IMPL
    end

    P1 -.-> P1_detail
    P2 -.-> P2_detail
    P3 -.-> P3_detail
    P4 -.-> P4_detail
    P5 -.-> P5_detail

    style P5 stroke-dasharray: 5 5
    style PERSIST stroke-dasharray: 3 3
    style NORM stroke-dasharray: 3 3

Key design decisions¶

Why multi-agent?¶

A single LLM pass produces findings that reflect one perspective. Multiple independent agents:

Cover different failure modes (security vs. performance vs. correctness)
Challenge each other's assumptions through structured debate
Produce findings with transparent agreement levels
Reduce false positives through adversarial scrutiny

Why isolation?¶

Agents run in separate contexts with no access to each other's raw output. This prevents:

Anchoring bias: Seeing another agent's findings before forming your own
Conformity pressure: Adjusting findings to match what others said
Output manipulation: Crafting output to influence another agent's behavior

Why programmatic validation?¶

LLM outputs are unpredictable. Bash scripts validate structure, detect injection, and enforce guardrails independently of agent compliance. This means:

Malformed findings are caught before they reach the report
Injection attempts in reviewed code don't propagate to agent behavior
Budget and scope constraints are enforced programmatically, not by asking agents nicely

Why convergence detection?¶

Self-refinement without a stopping condition wastes tokens. Convergence detection compares finding sets between iterations and stops when the delta is below threshold. This typically saves 30-40% of the budget compared to fixed iteration counts.

Why domain-aware routing?¶

During the challenge round, agents receive a domain affinity hint that maps finding categories to their primary and adjacent domains. This reduces unnecessary Tier 2 reads (full finding files) by guiding agents to focus on findings in their domain. Agents can still challenge any finding, but the routing hint saves 40-60% of cross-agent token consumption compared to agents reading every finding in full.

Why finding-aware reference selection?¶

Reference modules are filtered by specialist, but when truncation is needed under budget constraints, modules relevant to actual findings are prioritized. The --finding-categories flag lets the orchestrator pass Phase 1 finding categories to discover_references.py, which then truncates non-matching modules first. This keeps the most relevant reference material available even under tight budgets.

Why finding persistence?¶

Without cross-run tracking, each review is a fresh start. Finding persistence fingerprints each finding based on its content (file, line bucket, title, specialist) and stores history in .adversarial-review/findings-history.jsonl. On subsequent runs, findings are classified as new, recurring, resolved, or regressed. This lets teams track whether issues are actually getting fixed and detect regressions.

Why output normalization?¶

LLM outputs are non-deterministic. Running the same review twice produces findings with slightly different wording, ordering, and formatting. Normalization canonicalizes the output (consistent ordering, standardized formatting) so meaningful differences stand out from noise. Stability metrics quantify how much variance exists between runs.

Why prompt versioning?¶

Agent prompts evolve over time. Without version tracking, there's no way to know which prompt version produced which findings. Content-based hashing in prompt frontmatter enables reproducibility analysis: if findings changed between runs, was it the code or the prompt that changed?

Component map¶

Component	Location	Purpose
SKILL.md	`skills/adversarial-reviewing/SKILL.md`	Main orchestration procedure
Phases	`phases/`	Per-phase execution procedures
Protocols	`protocols/`	Operational rules and constraints
Agents	`profiles/<profile>/agents/`	Specialist prompt definitions
Templates	`profiles/<profile>/templates/`	Output format definitions
References	`profiles/<profile>/references/`	Knowledge base modules
Scripts	`scripts/`	Validation and utility scripts
Tests	`tests/`	Test suite with fixtures

Execution flow¶

Parse invocation: Resolve target files, flags, profile, specialists
Initialize cache: Create temp directory, populate with code and context
Phase 1: Spawn isolated agents, self-refine with convergence detection
Phase 2: Mediated cross-agent challenge round
Phase 3: Deduplicate, classify agreement, resolve verdicts
Phase 4: Generate structured report
Phase 5 (optional): Classify, draft Jira, implement fixes
Cleanup: Remove cache, output budget summary