
👁️ Overview

AI tools can now generate code faster than most teams can review it properly. That's exciting. It's also a trust problem that nobody has cleanly solved yet.


💡 Why This Exists

In 2026, a single developer with Cursor, Copilot, or a custom coding agent can ship PRs faster than a team of five used to. Review capacity hasn't scaled with output capacity. Senior engineers become the bottleneck. Rubber-stamping happens. Unreviewed AI-generated code ships quietly, carrying bugs, gaps, and sometimes vulnerabilities.

Code Review Council is the automated trust layer built for this reality. It doesn't replace human judgment; it ensures every PR gets a structured, evidence-based review before it reaches a human, so the human's time goes to decisions, not discovery.


🤖 The Multi-LLM Approach

Council isn't tied to a single LLM. The local config is provider-configurable, while the generated GitHub workflows currently pin every CI reviewer and the Chair to Gemini for predictable secret handling:

| Role | Domain | Local scaffold default | Generated CI default |
|------|--------|------------------------|----------------------|
| 🛡️ SecOps | Security vulnerabilities, secret detection, injection chains | openai/gpt-5.2 | gemini/gemini-3-pro-preview |
| 🧪 QA | Test coverage, edge cases, error handling | openai/gpt-5.2 | gemini/gemini-3-pro-preview |
| 🏗️ Architect | Design patterns, coupling, scalability | openai/gpt-4o | gemini/gemini-3-pro-preview |
| 📝 Docs | Documentation completeness, clarity | openai/gpt-4o-mini | gemini/gemini-3-pro-preview |
| 🪑 Chair | Synthesis, evidence adjudication, final verdict | openai/gpt-4o | gemini/gemini-3-pro-preview |

This matters for two reasons. First, local teams can distribute blind-spot risk by assigning different providers or model families to different roles. Second, they can control cost by using heavier models only where the stakes justify it. Generated CI currently trades that flexibility for single-provider reliability.
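The role-to-model split above can be sketched as a small resolver. This is an illustrative sketch only: the names `LOCAL_DEFAULTS`, `CI_DEFAULT`, and `resolve_model` are invented here and are not Council's actual config API.

```python
# Hypothetical per-role model map mirroring the table above.
# These names are illustrative, not Council's real configuration schema.

LOCAL_DEFAULTS = {
    "secops":    "openai/gpt-5.2",
    "qa":        "openai/gpt-5.2",
    "architect": "openai/gpt-4o",
    "docs":      "openai/gpt-4o-mini",
    "chair":     "openai/gpt-4o",
}

# Generated workflows currently pin all CI roles to one provider.
CI_DEFAULT = "gemini/gemini-3-pro-preview"

def resolve_model(role: str, ci: bool = False) -> str:
    """Single-provider in CI; per-role model families locally."""
    if ci:
        return CI_DEFAULT
    return LOCAL_DEFAULTS[role]
```

The same shape supports the two stated goals: swap a provider per role locally to diversify blind spots, or downgrade low-stakes roles (e.g. Docs) to a cheaper model.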


🔬 The 5-Stage Pipeline

Every PR passes through five stages before a verdict is issued:

| Stage | Name | What Happens |
|-------|------|--------------|
| 0 | Gate Zero | Deterministic checks: secrets, lint, types, missing docs. Zero LLM cost. Under 2 seconds. |
| 1 | Diff Preprocessor | Filters lockfiles and generated code. Enforces token budgets. |
| 2 | ReviewPack Assembly | Builds structured context: changed symbols, test map, policy violations. |
| 3 | Reviewer Panel | 4 specialist agents run in parallel against the same ReviewPack. |
| 4 | Council Chair | Synthesises all findings. Requires an exploit chain for blockers. Renders the verdict. |

The staged design means cheap deterministic checks run first: only PRs that clear Gate Zero proceed to LLM analysis. This keeps cost and latency predictable.
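The fast-fail ordering can be sketched in a few lines. This is a toy sketch of the control flow, not Council's internals; `gate_zero` and `review` are hypothetical names, and the secret check is a deliberately crude stand-in for real deterministic scanners.

```python
# Illustrative sketch of the staged fast-fail flow. Function names and the
# secret heuristic are invented for this example.

def gate_zero(diff: str) -> list[str]:
    """Stage 0: deterministic checks (no LLM) - secrets, lint, types, docs."""
    findings = []
    if "AKIA" in diff:  # toy stand-in for a real secret scanner
        findings.append("possible AWS access key in diff")
    return findings

def review(diff: str) -> dict:
    # Cheap deterministic gate runs first; failures short-circuit.
    blockers = gate_zero(diff)
    if blockers:
        return {"verdict": "FAIL", "stage": 0, "findings": blockers}
    # Stages 1-4 (preprocess, ReviewPack, reviewer panel, Chair) would run
    # here, incurring LLM cost only for diffs that cleared the gate.
    return {"verdict": "PASS", "stage": 4, "findings": []}
```

The point of the structure is that a PR with an obvious blocker never spends a single LLM token.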


🎯 Two Outputs, One Analysis

The same review engine produces two output formats depending on who needs to act:

| Output | Audience | What It Contains |
|--------|----------|------------------|
| 🧑‍💻 Developer | Engineers | File/line findings, evidence chains, fix suggestions, policy references |
| 🧑‍💼 Owner | Product / Leadership | Plain-English risk summary, ship/no-ship recommendation, copy-paste fix prompt |

Neither audience gets a weaker review. The analysis is identical; only the presentation changes.
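The "one analysis, two renderings" idea can be sketched as two formatters over the same findings list. The field names (`file`, `line`, `msg`) and function names here are invented for illustration; they are not Council's actual output schema.

```python
# Sketch: one findings list, two presentations. Field and function names
# are hypothetical, not Council's real schema.

def render_developer(findings: list[dict]) -> str:
    """File/line detail for engineers."""
    return "\n".join(f"{f['file']}:{f['line']} {f['msg']}" for f in findings)

def render_owner(findings: list[dict]) -> str:
    """Plain-English summary plus a ship/no-ship call for non-engineers."""
    verdict = "no-ship" if findings else "ship"
    return f"{len(findings)} risk(s) found. Recommendation: {verdict}."

findings = [{"file": "auth.py", "line": 42, "msg": "token logged in plaintext"}]
```

Both renderers consume the identical findings structure, so neither audience can be shown a diluted analysis by construction.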


🔁 The Autonomous Loop Vision

Council was built with a specific end-state in mind: fully autonomous development with automated quality enforcement at every gate.

AI agent writes code  (e.g. OpenClaw)
        ↓
   PR opened automatically
        ↓
Council reviews (4 reviewers + Chair)
        ↓
  PASS? → merge ✅
  FAIL? → findings fed back to coding agent
        ↓
  Agent patches and resubmits
        ↓
   Council re-reviews
        ↓
 (loop until PASS, then merge)
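The review-patch-resubmit loop can be sketched as a bounded retry. This is a sketch of the intended control flow only; `review_pr` and `patch_with_agent` are hypothetical stand-ins for Council and the coding agent, and the round budget is an assumption, not a documented Council feature.

```python
# Sketch of the autonomous loop. review_pr and patch_with_agent are
# hypothetical callables standing in for Council and the coding agent.

def autonomous_loop(code, review_pr, patch_with_agent, max_rounds=5):
    """Loop until Council passes the change or the round budget runs out."""
    for _ in range(max_rounds):
        verdict, findings = review_pr(code)
        if verdict == "PASS":
            return code                          # merge
        code = patch_with_agent(code, findings)  # feed findings back
    raise RuntimeError("change did not pass review within the round budget")
```

A bounded round count matters in practice: without it, an agent that never satisfies a blocker would loop forever.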

V1 delivered the review gate. V2 expanded ReviewPack parity for Python, TypeScript, and JavaScript. V3 hardened provider portability, council doctor, and GitHub PR reporting. V4 is split deliberately: V4A improves onboarding, fix guidance, local/CI parity, and full-repo context planning; V4B adds the intelligence layer such as opt-in autofix, repeated-debt detection, and metrics.


✅ What Council Does

  • ✅ Runs deterministic checks before any LLM analysis (zero-cost fast-fail)
  • ✅ Builds structured reviewer context to reduce guesswork and hallucination
  • ✅ Uses specialist reviewers in parallel, each on a model matched to its domain
  • ✅ Requires a full exploit chain before accepting any security blocker
  • ✅ Produces outputs for both technical and non-technical audiences
  • ✅ Operates as a CI hard gate or a local advisory tool
  • ✅ Surfaces degraded mode explicitly when a reviewer fails; it never silently passes
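The last point, surfacing failures rather than swallowing them, can be sketched as a panel runner that records a reviewer crash as explicit degradation instead of a silent pass. `run_panel` and its result shape are invented for this example, not Council's real interface.

```python
# Sketch: a failed reviewer is recorded as degraded, never treated as a pass.
# run_panel and the result shape are hypothetical.

def run_panel(reviewers: dict, pack: dict) -> dict:
    results, degraded = {}, []
    for name, fn in reviewers.items():
        try:
            results[name] = fn(pack)
        except Exception as exc:
            degraded.append(name)               # failure is surfaced...
            results[name] = {"error": str(exc)}  # ...and kept in the record
    return {"results": results, "degraded": degraded}
```

Downstream, a non-empty `degraded` list is what lets the Chair (or a human) see that a verdict was rendered on partial evidence.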

โŒ What Council Does Not Do

  • โŒ Does not guarantee bug-free or vulnerability-free software
  • โŒ Does not replace human engineering judgment on complex architectural decisions
  • โŒ Does not make universal promises about speed or cost โ€” these depend on model selection, diff size, and concurrency
  • โŒ Does not audit your entire application โ€” it reviews the diff, not the full codebase

Scope reminder

Council is a PR/diff/code-change review tool. It is not a full holistic application security audit platform. Use it as one layer in a layered quality strategy.