Proposal: Role-based review personas #51

Closed
opened 2026-05-09 05:43:30 +00:00 by rodin · 0 comments
Owner

Summary

Introduce role-based review personas to complement (or replace) the current generic multi-model review approach. Each persona brings a specialized lens, reducing noise and catching domain-specific issues that generic reviewers miss.

Problem

Current twin reviews (Sonnet + GPT-5) both perform generic code review:

  • Redundancy: Often flag the same issues
  • Gaps: Miss specialized concerns (security, domain logic, UX)
  • Noise: NITs about code style mixed with critical findings
  • No ownership: Findings lack clear domain attribution

Proposed Personas

Persona Focus Example Findings
Product Owner Feature completeness, acceptance criteria, scope "PR claims to implement X but Y is missing from the AC"
Security Specialist Attack vectors, auth, data exposure, secrets "User input passed to SQL without parameterization"
Architect Patterns, consistency, bounded contexts, OTP "This GenServer should use via tuple, not named registration"
Domain Expert Business logic correctness (trading, finance) "Fill quantity can exceed order quantity — overfill handling unclear"
User/Docs API ergonomics, clarity, documentation "Function name suggests X but actually does Y"

Implementation Approach

1. Persona Prompts

Each persona gets a tailored system prompt with:

  • Role description and expertise area
  • Specific checklist of what to look for
  • Explicit scope boundaries (what NOT to comment on)
  • Severity calibration for their domain

Example for Security:

You are a security specialist reviewing code for vulnerabilities.
Focus: injection, auth bypass, secrets exposure, access control, data leakage.
Ignore: code style, performance (unless security-related), documentation.
Severity guide: RCE/auth bypass = CRITICAL, data exposure = MAJOR, ...

2. Execution Strategy

Option A: Parallel Panel

  • All personas run simultaneously on the same diff
  • Findings deduplicated and merged
  • Best for: large PRs, security-sensitive changes

Option B: Tiered Review

  • Stage 1: Architect + Product Owner (structural issues)
  • Stage 2: Security + Domain Expert (if Stage 1 passes)
  • Best for: cost optimization, smaller PRs

Option C: Selective Assignment

  • PR labels or file patterns trigger specific personas
  • security label → Security Specialist
  • lib/trading/ changes → Domain Expert
  • Best for: mature repos with clear boundaries

3. Synthesis

After all personas complete:

  1. Deduplicate identical findings
  2. Group by severity (CRITICAL → NIT)
  3. Tag each finding with persona source
  4. Generate unified report with sections per persona

4. PR Size Adaptation

PR Size Personas
XS/S Architect only
M Architect + Security
L Full panel
XL Should be split first

Gargoyle-Specific Personas

For gargoyle, the Domain Expert would specialize in:

  • Order lifecycle and state machine correctness
  • Fill handling, partial fills, overfills
  • Risk calculations and position tracking
  • Event sourcing invariants (idempotency, replay correctness)
  • Decimal precision for money/quantities

Cost Considerations

  • More passes = more cost — but specialized prompts are often shorter
  • Reduced back-and-forth — catching domain issues early saves rework
  • Model selection per persona:
    • Security: Opus/GPT-5 (high stakes, needs reasoning)
    • Architect: Sonnet (pattern matching, fast)
    • Product Owner: Haiku (checklist comparison, cheap)

Open Questions

  1. Should personas have access to different context? (Security sees secrets config, Domain Expert sees related tests)
  2. How to handle disagreements between personas?
  3. Should findings from certain personas block merge? (Security MAJOR = hard block?)
  4. How to evolve persona checklists based on historical findings?

Success Metrics

  • Reduction in post-merge bugs by category
  • Fewer false positives per review
  • Time from PR open to actionable feedback
  • Developer satisfaction with review quality

Next Steps

  1. Prototype with 2-3 personas on gargoyle PRs
  2. Compare findings vs current twin review
  3. Iterate on prompts based on signal/noise ratio
  4. Expand to full panel if validated

Addendum: Model Selection & Prompt Structure

Model Strengths by Persona

Persona Best Model Why
Security Opus, GPT-5 Needs deep reasoning for attack chains, subtle vulnerabilities. High stakes = worth the cost.
Architect Sonnet, GPT-4.1 Pattern matching, consistency checks. Fast iteration, good enough reasoning.
Domain Expert Opus, GPT-5 Business logic requires understanding complex invariants (order states, financial calculations).
Product Owner Haiku, GPT-4.1-mini Checklist comparison against AC. Structured task, doesn't need deep reasoning.
User/Docs Sonnet Clarity assessment, API ergonomics. Middle ground.

Prompt Structure Principles

1. Role Anchoring
Start with a clear identity that constrains scope:

You are a security specialist. Your ONLY job is finding security vulnerabilities.
You do NOT comment on: code style, performance, documentation, naming.

2. Expertise Context
Give the persona relevant background knowledge:

You have expertise in:
- OWASP Top 10
- Elixir/Phoenix security patterns
- Event sourcing attack vectors (replay attacks, event injection)

3. Severity Calibration
Each persona needs domain-specific severity definitions:

CRITICAL: Remote code execution, auth bypass, data exfiltration
MAJOR: Privilege escalation, information disclosure, DoS
MINOR: Missing rate limiting, verbose errors

4. Output Format
Structured output for easy synthesis:

Finding:
- Severity: [CRITICAL|MAJOR|MINOR|NIT]
- Location: file:line
- Issue: <one sentence>
- Evidence: <code snippet>
- Recommendation: <fix>

5. Anti-Patterns to Avoid

  • Don't make prompts too long (model loses focus)
  • Don't mix concerns (security persona shouldn't mention naming)
  • Don't be vague about scope boundaries
  • Don't forget to calibrate what "MAJOR" means for THIS persona

Prompt Evolution

Prompts should evolve based on:

  1. False positive rate — if a persona keeps flagging non-issues, tighten scope
  2. Missed findings — if real bugs slip through, add to checklist
  3. Overlap — if two personas keep finding the same thing, clarify boundaries
  4. Model behavior — different models respond differently to the same prompt

Track prompt versions and correlate with finding quality.

Context Shaping

Different personas may need different context windows:

Persona Context Needs
Security Full diff + auth/config files + dependency versions
Architect Full diff + related module interfaces + bounded context docs
Domain Expert Full diff + domain docs + related tests
Product Owner Diff summary + linked issue/AC + feature docs
User/Docs Public API changes + README + existing docs

Smaller context = faster + cheaper + more focused. Don't give Security the full codebase when they only need auth flows.

## Summary Introduce role-based review personas to complement (or replace) the current generic multi-model review approach. Each persona brings a specialized lens, reducing noise and catching domain-specific issues that generic reviewers miss. ## Problem Current twin reviews (Sonnet + GPT-5) both perform generic code review: - **Redundancy:** Often flag the same issues - **Gaps:** Miss specialized concerns (security, domain logic, UX) - **Noise:** NITs about code style mixed with critical findings - **No ownership:** Findings lack clear domain attribution ## Proposed Personas | Persona | Focus | Example Findings | |---------|-------|------------------| | **Product Owner** | Feature completeness, acceptance criteria, scope | "PR claims to implement X but Y is missing from the AC" | | **Security Specialist** | Attack vectors, auth, data exposure, secrets | "User input passed to SQL without parameterization" | | **Architect** | Patterns, consistency, bounded contexts, OTP | "This GenServer should use via tuple, not named registration" | | **Domain Expert** | Business logic correctness (trading, finance) | "Fill quantity can exceed order quantity — overfill handling unclear" | | **User/Docs** | API ergonomics, clarity, documentation | "Function name suggests X but actually does Y" | ## Implementation Approach ### 1. Persona Prompts Each persona gets a tailored system prompt with: - Role description and expertise area - Specific checklist of what to look for - Explicit scope boundaries (what NOT to comment on) - Severity calibration for their domain Example for Security: ``` You are a security specialist reviewing code for vulnerabilities. Focus: injection, auth bypass, secrets exposure, access control, data leakage. Ignore: code style, performance (unless security-related), documentation. Severity guide: RCE/auth bypass = CRITICAL, data exposure = MAJOR, ... ``` ### 2. Execution Strategy **Option A: Parallel Panel** - All personas run simultaneously on the same diff - Findings deduplicated and merged - Best for: large PRs, security-sensitive changes **Option B: Tiered Review** - Stage 1: Architect + Product Owner (structural issues) - Stage 2: Security + Domain Expert (if Stage 1 passes) - Best for: cost optimization, smaller PRs **Option C: Selective Assignment** - PR labels or file patterns trigger specific personas - `security` label → Security Specialist - `lib/trading/` changes → Domain Expert - Best for: mature repos with clear boundaries ### 3. Synthesis After all personas complete: 1. Deduplicate identical findings 2. Group by severity (CRITICAL → NIT) 3. Tag each finding with persona source 4. Generate unified report with sections per persona ### 4. PR Size Adaptation | PR Size | Personas | |---------|----------| | XS/S | Architect only | | M | Architect + Security | | L | Full panel | | XL | Should be split first | ## Gargoyle-Specific Personas For gargoyle, the Domain Expert would specialize in: - Order lifecycle and state machine correctness - Fill handling, partial fills, overfills - Risk calculations and position tracking - Event sourcing invariants (idempotency, replay correctness) - Decimal precision for money/quantities ## Cost Considerations - **More passes = more cost** — but specialized prompts are often shorter - **Reduced back-and-forth** — catching domain issues early saves rework - **Model selection per persona:** - Security: Opus/GPT-5 (high stakes, needs reasoning) - Architect: Sonnet (pattern matching, fast) - Product Owner: Haiku (checklist comparison, cheap) ## Open Questions 1. Should personas have access to different context? (Security sees secrets config, Domain Expert sees related tests) 2. How to handle disagreements between personas? 3. Should findings from certain personas block merge? (Security MAJOR = hard block?) 4. How to evolve persona checklists based on historical findings? ## Success Metrics - Reduction in post-merge bugs by category - Fewer false positives per review - Time from PR open to actionable feedback - Developer satisfaction with review quality ## Next Steps 1. Prototype with 2-3 personas on gargoyle PRs 2. Compare findings vs current twin review 3. Iterate on prompts based on signal/noise ratio 4. Expand to full panel if validated --- ## Addendum: Model Selection & Prompt Structure ### Model Strengths by Persona | Persona | Best Model | Why | |---------|------------|-----| | **Security** | Opus, GPT-5 | Needs deep reasoning for attack chains, subtle vulnerabilities. High stakes = worth the cost. | | **Architect** | Sonnet, GPT-4.1 | Pattern matching, consistency checks. Fast iteration, good enough reasoning. | | **Domain Expert** | Opus, GPT-5 | Business logic requires understanding complex invariants (order states, financial calculations). | | **Product Owner** | Haiku, GPT-4.1-mini | Checklist comparison against AC. Structured task, doesn't need deep reasoning. | | **User/Docs** | Sonnet | Clarity assessment, API ergonomics. Middle ground. | ### Prompt Structure Principles **1. Role Anchoring** Start with a clear identity that constrains scope: ``` You are a security specialist. Your ONLY job is finding security vulnerabilities. You do NOT comment on: code style, performance, documentation, naming. ``` **2. Expertise Context** Give the persona relevant background knowledge: ``` You have expertise in: - OWASP Top 10 - Elixir/Phoenix security patterns - Event sourcing attack vectors (replay attacks, event injection) ``` **3. Severity Calibration** Each persona needs domain-specific severity definitions: ``` CRITICAL: Remote code execution, auth bypass, data exfiltration MAJOR: Privilege escalation, information disclosure, DoS MINOR: Missing rate limiting, verbose errors ``` **4. Output Format** Structured output for easy synthesis: ``` Finding: - Severity: [CRITICAL|MAJOR|MINOR|NIT] - Location: file:line - Issue: <one sentence> - Evidence: <code snippet> - Recommendation: <fix> ``` **5. Anti-Patterns to Avoid** - Don't make prompts too long (model loses focus) - Don't mix concerns (security persona shouldn't mention naming) - Don't be vague about scope boundaries - Don't forget to calibrate what "MAJOR" means for THIS persona ### Prompt Evolution Prompts should evolve based on: 1. **False positive rate** — if a persona keeps flagging non-issues, tighten scope 2. **Missed findings** — if real bugs slip through, add to checklist 3. **Overlap** — if two personas keep finding the same thing, clarify boundaries 4. **Model behavior** — different models respond differently to the same prompt Track prompt versions and correlate with finding quality. ### Context Shaping Different personas may need different context windows: | Persona | Context Needs | |---------|---------------| | Security | Full diff + auth/config files + dependency versions | | Architect | Full diff + related module interfaces + bounded context docs | | Domain Expert | Full diff + domain docs + related tests | | Product Owner | Diff summary + linked issue/AC + feature docs | | User/Docs | Public API changes + README + existing docs | Smaller context = faster + cheaper + more focused. Don't give Security the full codebase when they only need auth flows.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: rodin/review-bot#51