Proposal: Role-based review personas #51

New Issue

2026-05-09T05:43:30Z

rodin commented

2026-05-09 05:43:30 +00:00

Summary

Introduce role-based review personas to complement (or replace) the current generic multi-model review approach. Each persona brings a specialized lens, reducing noise and catching domain-specific issues that generic reviewers miss.

Problem

Current twin reviews (Sonnet + GPT-5) both perform generic code review:

Redundancy: Often flag the same issues
Gaps: Miss specialized concerns (security, domain logic, UX)
Noise: NITs about code style mixed with critical findings
No ownership: Findings lack clear domain attribution

Proposed Personas

Persona	Focus	Example Findings
Product Owner	Feature completeness, acceptance criteria, scope	"PR claims to implement X but Y is missing from the AC"
Security Specialist	Attack vectors, auth, data exposure, secrets	"User input passed to SQL without parameterization"
Architect	Patterns, consistency, bounded contexts, OTP	"This GenServer should use via tuple, not named registration"
Domain Expert	Business logic correctness (trading, finance)	"Fill quantity can exceed order quantity — overfill handling unclear"
User/Docs	API ergonomics, clarity, documentation	"Function name suggests X but actually does Y"

Implementation Approach

1. Persona Prompts

Each persona gets a tailored system prompt with:

Role description and expertise area
Specific checklist of what to look for
Explicit scope boundaries (what NOT to comment on)
Severity calibration for their domain

Example for Security:

You are a security specialist reviewing code for vulnerabilities.
Focus: injection, auth bypass, secrets exposure, access control, data leakage.
Ignore: code style, performance (unless security-related), documentation.
Severity guide: RCE/auth bypass = CRITICAL, data exposure = MAJOR, ...

2. Execution Strategy

Option A: Parallel Panel

All personas run simultaneously on the same diff
Findings deduplicated and merged
Best for: large PRs, security-sensitive changes

Option B: Tiered Review

Stage 1: Architect + Product Owner (structural issues)
Stage 2: Security + Domain Expert (if Stage 1 passes)
Best for: cost optimization, smaller PRs

Option C: Selective Assignment

PR labels or file patterns trigger specific personas
security label → Security Specialist
lib/trading/ changes → Domain Expert
Best for: mature repos with clear boundaries

3. Synthesis

After all personas complete:

Deduplicate identical findings
Group by severity (CRITICAL → NIT)
Tag each finding with persona source
Generate unified report with sections per persona

4. PR Size Adaptation

PR Size	Personas
XS/S	Architect only
M	Architect + Security
L	Full panel
XL	Should be split first

Gargoyle-Specific Personas

For gargoyle, the Domain Expert would specialize in:

Order lifecycle and state machine correctness
Fill handling, partial fills, overfills
Risk calculations and position tracking
Event sourcing invariants (idempotency, replay correctness)
Decimal precision for money/quantities

Cost Considerations

More passes = more cost — but specialized prompts are often shorter
Reduced back-and-forth — catching domain issues early saves rework
Model selection per persona:
- Security: Opus/GPT-5 (high stakes, needs reasoning)
- Architect: Sonnet (pattern matching, fast)
- Product Owner: Haiku (checklist comparison, cheap)

Open Questions

Should personas have access to different context? (Security sees secrets config, Domain Expert sees related tests)
How to handle disagreements between personas?
Should findings from certain personas block merge? (Security MAJOR = hard block?)
How to evolve persona checklists based on historical findings?

Success Metrics

Reduction in post-merge bugs by category
Fewer false positives per review
Time from PR open to actionable feedback
Developer satisfaction with review quality

Next Steps

Prototype with 2-3 personas on gargoyle PRs
Compare findings vs current twin review
Iterate on prompts based on signal/noise ratio
Expand to full panel if validated

Addendum: Model Selection & Prompt Structure

Model Strengths by Persona

Persona	Best Model	Why
Security	Opus, GPT-5	Needs deep reasoning for attack chains, subtle vulnerabilities. High stakes = worth the cost.
Architect	Sonnet, GPT-4.1	Pattern matching, consistency checks. Fast iteration, good enough reasoning.
Domain Expert	Opus, GPT-5	Business logic requires understanding complex invariants (order states, financial calculations).
Product Owner	Haiku, GPT-4.1-mini	Checklist comparison against AC. Structured task, doesn't need deep reasoning.
User/Docs	Sonnet	Clarity assessment, API ergonomics. Middle ground.

Prompt Structure Principles

1. Role Anchoring
Start with a clear identity that constrains scope:

You are a security specialist. Your ONLY job is finding security vulnerabilities.
You do NOT comment on: code style, performance, documentation, naming.

2. Expertise Context
Give the persona relevant background knowledge:

You have expertise in:
- OWASP Top 10
- Elixir/Phoenix security patterns
- Event sourcing attack vectors (replay attacks, event injection)

3. Severity Calibration
Each persona needs domain-specific severity definitions:

CRITICAL: Remote code execution, auth bypass, data exfiltration
MAJOR: Privilege escalation, information disclosure, DoS
MINOR: Missing rate limiting, verbose errors

4. Output Format
Structured output for easy synthesis:

Finding:
- Severity: [CRITICAL|MAJOR|MINOR|NIT]
- Location: file:line
- Issue: <one sentence>
- Evidence: <code snippet>
- Recommendation: <fix>

5. Anti-Patterns to Avoid

Don't make prompts too long (model loses focus)
Don't mix concerns (security persona shouldn't mention naming)
Don't be vague about scope boundaries
Don't forget to calibrate what "MAJOR" means for THIS persona

Prompt Evolution

Prompts should evolve based on:

False positive rate — if a persona keeps flagging non-issues, tighten scope
Missed findings — if real bugs slip through, add to checklist
Overlap — if two personas keep finding the same thing, clarify boundaries
Model behavior — different models respond differently to the same prompt

Track prompt versions and correlate with finding quality.

Context Shaping

Different personas may need different context windows:

Persona	Context Needs
Security	Full diff + auth/config files + dependency versions
Architect	Full diff + related module interfaces + bounded context docs
Domain Expert	Full diff + domain docs + related tests
Product Owner	Diff summary + linked issue/AC + feature docs
User/Docs	Public API changes + README + existing docs

Smaller context = faster + cheaper + more focused. Don't give Security the full codebase when they only need auth flows.

## Summary Introduce role-based review personas to complement (or replace) the current generic multi-model review approach. Each persona brings a specialized lens, reducing noise and catching domain-specific issues that generic reviewers miss. ## Problem Current twin reviews (Sonnet + GPT-5) both perform generic code review: - **Redundancy:** Often flag the same issues - **Gaps:** Miss specialized concerns (security, domain logic, UX) - **Noise:** NITs about code style mixed with critical findings - **No ownership:** Findings lack clear domain attribution ## Proposed Personas | Persona | Focus | Example Findings | |---------|-------|------------------| | **Product Owner** | Feature completeness, acceptance criteria, scope | "PR claims to implement X but Y is missing from the AC" | | **Security Specialist** | Attack vectors, auth, data exposure, secrets | "User input passed to SQL without parameterization" | | **Architect** | Patterns, consistency, bounded contexts, OTP | "This GenServer should use via tuple, not named registration" | | **Domain Expert** | Business logic correctness (trading, finance) | "Fill quantity can exceed order quantity — overfill handling unclear" | | **User/Docs** | API ergonomics, clarity, documentation | "Function name suggests X but actually does Y" | ## Implementation Approach ### 1. Persona Prompts Each persona gets a tailored system prompt with: - Role description and expertise area - Specific checklist of what to look for - Explicit scope boundaries (what NOT to comment on) - Severity calibration for their domain Example for Security: ``` You are a security specialist reviewing code for vulnerabilities. Focus: injection, auth bypass, secrets exposure, access control, data leakage. Ignore: code style, performance (unless security-related), documentation. Severity guide: RCE/auth bypass = CRITICAL, data exposure = MAJOR, ... ``` ### 2. Execution Strategy **Option A: Parallel Panel** - All personas run simultaneously on the same diff - Findings deduplicated and merged - Best for: large PRs, security-sensitive changes **Option B: Tiered Review** - Stage 1: Architect + Product Owner (structural issues) - Stage 2: Security + Domain Expert (if Stage 1 passes) - Best for: cost optimization, smaller PRs **Option C: Selective Assignment** - PR labels or file patterns trigger specific personas - `security` label → Security Specialist - `lib/trading/` changes → Domain Expert - Best for: mature repos with clear boundaries ### 3. Synthesis After all personas complete: 1. Deduplicate identical findings 2. Group by severity (CRITICAL → NIT) 3. Tag each finding with persona source 4. Generate unified report with sections per persona ### 4. PR Size Adaptation | PR Size | Personas | |---------|----------| | XS/S | Architect only | | M | Architect + Security | | L | Full panel | | XL | Should be split first | ## Gargoyle-Specific Personas For gargoyle, the Domain Expert would specialize in: - Order lifecycle and state machine correctness - Fill handling, partial fills, overfills - Risk calculations and position tracking - Event sourcing invariants (idempotency, replay correctness) - Decimal precision for money/quantities ## Cost Considerations - **More passes = more cost** — but specialized prompts are often shorter - **Reduced back-and-forth** — catching domain issues early saves rework - **Model selection per persona:** - Security: Opus/GPT-5 (high stakes, needs reasoning) - Architect: Sonnet (pattern matching, fast) - Product Owner: Haiku (checklist comparison, cheap) ## Open Questions 1. Should personas have access to different context? (Security sees secrets config, Domain Expert sees related tests) 2. How to handle disagreements between personas? 3. Should findings from certain personas block merge? (Security MAJOR = hard block?) 4. How to evolve persona checklists based on historical findings? ## Success Metrics - Reduction in post-merge bugs by category - Fewer false positives per review - Time from PR open to actionable feedback - Developer satisfaction with review quality ## Next Steps 1. Prototype with 2-3 personas on gargoyle PRs 2. Compare findings vs current twin review 3. Iterate on prompts based on signal/noise ratio 4. Expand to full panel if validated --- ## Addendum: Model Selection & Prompt Structure ### Model Strengths by Persona | Persona | Best Model | Why | |---------|------------|-----| | **Security** | Opus, GPT-5 | Needs deep reasoning for attack chains, subtle vulnerabilities. High stakes = worth the cost. | | **Architect** | Sonnet, GPT-4.1 | Pattern matching, consistency checks. Fast iteration, good enough reasoning. | | **Domain Expert** | Opus, GPT-5 | Business logic requires understanding complex invariants (order states, financial calculations). | | **Product Owner** | Haiku, GPT-4.1-mini | Checklist comparison against AC. Structured task, doesn't need deep reasoning. | | **User/Docs** | Sonnet | Clarity assessment, API ergonomics. Middle ground. | ### Prompt Structure Principles **1. Role Anchoring** Start with a clear identity that constrains scope: ``` You are a security specialist. Your ONLY job is finding security vulnerabilities. You do NOT comment on: code style, performance, documentation, naming. ``` **2. Expertise Context** Give the persona relevant background knowledge: ``` You have expertise in: - OWASP Top 10 - Elixir/Phoenix security patterns - Event sourcing attack vectors (replay attacks, event injection) ``` **3. Severity Calibration** Each persona needs domain-specific severity definitions: ``` CRITICAL: Remote code execution, auth bypass, data exfiltration MAJOR: Privilege escalation, information disclosure, DoS MINOR: Missing rate limiting, verbose errors ``` **4. Output Format** Structured output for easy synthesis: ``` Finding: - Severity: [CRITICAL|MAJOR|MINOR|NIT] - Location: file:line - Issue: <one sentence> - Evidence: <code snippet> - Recommendation: <fix> ``` **5. Anti-Patterns to Avoid** - Don't make prompts too long (model loses focus) - Don't mix concerns (security persona shouldn't mention naming) - Don't be vague about scope boundaries - Don't forget to calibrate what "MAJOR" means for THIS persona ### Prompt Evolution Prompts should evolve based on: 1. **False positive rate** — if a persona keeps flagging non-issues, tighten scope 2. **Missed findings** — if real bugs slip through, add to checklist 3. **Overlap** — if two personas keep finding the same thing, clarify boundaries 4. **Model behavior** — different models respond differently to the same prompt Track prompt versions and correlate with finding quality. ### Context Shaping Different personas may need different context windows: | Persona | Context Needs | |---------|---------------| | Security | Full diff + auth/config files + dependency versions | | Architect | Full diff + related module interfaces + bounded context docs | | Domain Expert | Full diff + domain docs + related tests | | Product Owner | Diff summary + linked issue/AC + feature docs | | User/Docs | Public API changes + README + existing docs | Smaller context = faster + cheaper + more focused. Don't give Security the full codebase when they only need auth flows.

rodin referenced this issue from a commit

2026-05-10 09:49:27 +00:00

feat(persona): add role-based review personas (#51)

rodin referenced a pull request that will close this issue

2026-05-10 09:49:41 +00:00

feat(persona): add role-based review personas #55

rodin referenced this issue from a commit

2026-05-10 15:42:28 +00:00

feat(persona): add role-based review personas (#51)

rodin referenced this issue from a commit

2026-05-10 15:43:28 +00:00

feat(persona): add role-based review personas