Skip to content

🥋 Self-Review

Code Review Council reviewed its own pull request. Verdict: PASS — confidence 0.90 — in 48 seconds. This page documents what that means, what was fixed to get here, and what's still honest work-in-progress.


📊 The Real Verdict

This is the actual output from Council reviewing its own docs + config PR:

Overall verdict: PASS  (confidence: 0.90)

Reviewer panel:
  secops     PASS   0 findings
  qa         PASS   2 findings
  architect  PASS   0 findings
  docs       PASS   0 findings

Accepted warnings:
  [LOW] site/docs/stylesheets/extra.css:1
        No tests present for CSS assets.
        Accepted: expected for documentation-only changes.
        Suggestion: rely on existing MkDocs CI build + consider HTML validation step.

  [LOW] .github/workflows/pages.yml:1
        Bash validation step can fail deploy if infographic assets are missing.
        Accepted: intentional release gate.
        Suggestion: scope to main branch only, document requirement prominently.

Runtime: 48s

The two warnings were accepted — not dismissed — because they were real observations with valid rationale. This is the evidence-based Chair model working as intended: findings are adjudicated individually, not counted.


🔄 Three Rounds of Review

This project went through two self-review rounds and two GPT-5.2 peer review rounds before reaching V1. 26 issues were identified and fixed across those rounds.

Round 1 — Self-Review Fixes

# Issue Found Fix Applied
1 Secret detection line numbers off Proper target line tracking through hunk content
2 .councilignore loaded from cwd, not repo root repo_root passed through from orchestrator
3 No LLM retry logic num_retries=2 on all LLM calls
4 Signature extraction only handled ast.Name ast.unparse() for all annotation types
5 fnmatch imported inside loop Moved to module-level import

Round 1 — Peer Review Fixes (GPT-5.2)

# Issue Found Fix Applied
6 Gate Zero results missing from reviewer/Chair prompts Full context serialized to both
7 changed_symbols included ALL file symbols, not just changed lines Filtered to changed line ranges; methods classified
8 Diff text lost file boundaries === FILE: path (change_type) === headers added
9 No warnings field in schema First-class in schema, parsing, and all reporters
10 Chair pseudo-JSON schema Replaced with valid JSON example object
11 Malformed findings silently dropped Tracked count, sets error field
12 _file_priority() ignored config Reads config.priorities correctly
13 Path traversal in get_file_content() is_relative_to() containment check added
14 --ci without --branch empty diff risk Warning emitted
15 Stray brace-expansion directory in repo Removed
16 Linter integration config-only, not implemented Implemented with shlex.split, {files} placeholder, timeout/error handling
17 TS/JS analyzer placeholders existed without implementation Shipped dependency-free TS/JS Gate Zero analyzers and kept them disabled by default for an explicit rollout
18 council init missing workflow scaffold Now creates .github/workflows/council-review.yml
19 "Chunking" naming implied splitting — actually truncation Documented honestly as truncation

Round 2 — Peer Review Fixes (GPT-5.2)

# Issue Found Fix Applied
20 Workflow template assumed PyPI publication Changed to pip install .
21 Degraded mode only caught asyncio exceptions Unified: exceptions + invalid JSON + malformed findings all trigger degraded
22 degraded_reasons not visible to users Added degraded_reasons: list[str] to ChairVerdict, propagated through all return paths, surfaced in reporters
23 Deleted symbols invisible to reviewers _extract_deleted_symbols() scans removed hunk lines for function/class defs
24 Linter command used .split() — unsafe for quoted paths shlex.split() + {files} placeholder support
25 repo_policies always empty — config never flowed through Populated from config: require_docs, require_types, check_secrets, max_file_lines, enabled_analyzers
26 github_pr config defaulted true with no implementation Set to false with explicit comment

Post Round 2 — Hardening

Change Summary
Chair injection policy Removed hard override; added full exploitability evidence gate before any blocker is accepted
Prompt guardrails Additional SecOps and QA guardrails to reduce speculative findings
BYOK workflow Fork-safe workflow emitting council-report.json and council-review.md artifacts
Config schema + defaults load_config() accepts nested [[council.reviewer]] / [[council.reviewers]]; model mix updated to GPT-5.2/GPT-4o/GPT-4o-mini
Phase 2 ReviewPack parity ReviewPack and Gate Zero cover Python plus parser-free TypeScript/JavaScript exports and test-path heuristics
Phase 3 portability Shared LiteLLM transport falls back from native JSON mode, council doctor preflights setup, and reports surface transport notes
Phase 3 PR + Windows hardening GitHub PR summaries/inline comments are best-effort, Git diff decoding is lossless with surrogateescape, terminal output is Windows-safe, and generated CI is pinned to Gemini with configurable reviewer timeouts

✅ What's Solid

  • Pipeline stage contractsDiffContext → GateZeroResult → ReviewPack → ReviewerOutput[] → ChairVerdict — each stage emits an explicit artifact consumed by the next
  • Pydantic v2 schema enforcement at every stage boundary — no silent failures
  • Unified degraded mode — if a reviewer times out or returns malformed output, Council continues at reduced confidence and surfaces specific reasons
  • Gate Zero is zero-cost — deterministic checks with real linter integration, no LLM spend
  • Token budget management — configurable priorities, truncation labeled honestly
  • Evidence-based Chair — accept / dismiss / warn three-bucket model; no count-based rules
  • Changed AND deleted symbol detection — reviewers see what was removed, not just what was added
  • Policy context flows end-to-end — config settings reach reviewers and Chair
  • Path traversal protection and CI safety warnings built in
  • 286 collected tests across all modules covering legacy flow plus Phase 3 transport fallback, doctor checks, GitHub reporting behavior, Windows terminal safety, and lossless diff ingestion

⚠️ Remaining Known Limitations

These are documented honestly, not hidden

V1 Alpha ships with known limitations. They are tracked, not papered over.

Accuracy

  • Test coverage map searches diff files only — labeled IN DIFF in prompts. Full repo search is deferred.
  • Test coverage matching is substring-based — false positives possible on short filenames.
  • Deleted symbol detection is a regex heuristic — catches def/class patterns, not complex multiline signatures.
  • parse_diff always runs get_current_branch subprocess — unnecessary overhead in test/CI.

Not Yet Implemented

  • Full-repo test coverage discovery — coverage mapping still works from the diff, not a repository-wide index.
  • GitHub reporting remains best-effort — sticky summaries, workflow annotations, and inline PR comments should not fail the review if the GitHub API is flaky.
  • Editable prompts without code changes — prompts are still in code rather than repo-editable assets. Future scope.
  • Phase 4 intelligence layer — opt-in autofix, repeated-debt detection, confidence calibration, and metrics are planned after the Phase 4A onboarding/parity pass.

Design Note

One deliberate design disagreement

A peer reviewer recommended compact JSON transport for reviewer payloads. We use markdown with all fields present. Rationale: LLMs consume readable structured text more effectively than nested JSON blobs. The information is complete; the format is optimised for the consumer. This is a design choice worth revisiting with real-world false-positive data.


💡 The Meta Point

Council reviewed its own PR and produced a real, structured verdict with evidence. It didn't hallucinate security blockers. It accepted two low-severity warnings with documented rationale. It ran in 48 seconds across 4 specialist reviewers.

That's not a marketing claim. That's the output. You can see it on the PR.

"Not 'trust the AI.' Verify the AI. With another AI. With evidence."