Files
review-bot/docs/DESIGN-51-personas.md
T
Rodin 4dd67742f9
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 15s
CI / review (/anthropic/v1, anthropic--claude-4.6-sonnet, sonnet, anthropic, SONNET_REVIEW_TOKEN) (pull_request) Successful in 43s
CI / review (/openai/v1, gpt-5, gpt, openai, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m28s
CI / review (/openai/v1, gpt-5, security, openai, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m55s
fix: address review feedback on persona feature
MAJOR fixes:
- Remove external YAML dependency (github.com/goccy/go-yaml)
  Per project convention: Go standard library only, zero dependencies.
  Convert all persona files from YAML to JSON format.
- Fix TestValidateWorkspacePath error expectation
  Go 1.21+ filepath.Join normalizes absolute paths differently.

MINOR fixes:
- Remove custom contains helper in persona_test.go (use strings.Contains)
- Add Unicode-safe CapitalizeFirst function for header titles
- ListBuiltinPersonas returns empty slice instead of nil on error
- Fix test comment about filepath.Join behavior

Documentation:
- Update README to reflect JSON-only persona format
- Update design doc with note about JSON decision
- Fix action.yml description for persona-file input
2026-05-10 10:01:34 -07:00

12 KiB

Design: Role-based Review Personas (Issue #51)

Note: This design was revised during implementation to use JSON instead of YAML to maintain the repository's zero-external-dependencies convention. All persona files use JSON format. See "Design Revision" section at the end for details.

Problem

Current review-bot performs generic code review. Every reviewer (regardless of reviewer-name) uses the same base prompt and evaluates the same concerns. This leads to:

  1. Redundancy — Two reviewers (e.g., GPT + Claude twins) often flag identical issues
  2. Gaps — Generic reviewers miss specialized concerns (security, domain logic, architecture)
  3. Noise — NITs about style mixed with critical security findings
  4. No ownership — Findings lack clear domain attribution

Constraints

  • Must work with existing CLI flags and CI workflow patterns
  • Must not break backwards compatibility (existing configs still work)
  • Must integrate cleanly with the budget system (personas add to context)
  • Multiple personas running in parallel must not interfere with each other
  • Each persona must have clear scope boundaries (no duplication)

Proposed Approach

1. Persona Definition

A persona is a named review role with:

  • Identity — Who am I? What's my expertise?
  • Focus — What do I look for?
  • Scope boundaries — What do I explicitly NOT comment on?
  • Severity calibration — What counts as MAJOR/MINOR/NIT for MY domain?

Personas are defined in JSON files that can live:

  1. In the pattern repos (shared across projects)
  2. In the target repo (project-specific personas)
  3. Inline via a new --persona-file flag (JSON format)

2. Persona File Format

# .review/personas/security.yaml
name: security
display_name: Security Specialist
model_preference: opus  # optional hint for expensive analysis

identity: |
  You are a security specialist reviewing code for vulnerabilities.
  Your expertise: OWASP Top 10, injection attacks, auth/authz, secrets management,
  event sourcing security (replay attacks, event injection).

focus:
  - Injection attacks (SQL, command, path traversal, template)
  - Authentication and authorization gaps
  - Secrets exposure (hardcoded credentials, tokens in logs)
  - Input validation (unsanitized input, unsafe deserialization)
  - Race conditions with security implications
  - Event sourcing attack vectors

ignore:
  - Code style and naming conventions
  - Performance (unless security-related)
  - Documentation
  - General code quality
  - Test coverage

severity:
  critical: "Remote code execution, auth bypass, data exfiltration"
  major: "Privilege escalation, information disclosure, DoS"
  minor: "Missing rate limiting, verbose errors"
  nit: "Theoretical risk with low exploitability"

output_format: |
  For each finding:
  - Severity: [CRITICAL|MAJOR|MINOR|NIT]
  - Attack vector: How could this be exploited?
  - Evidence: Code snippet showing the vulnerability
  - Recommendation: Specific fix

3. New CLI Flags

--persona-file PATH      Path to persona JSON file (local or in repo)
--persona NAME           Built-in persona name (security, architect, domain)

Either flag sets the persona. If neither is provided, behavior is unchanged (generic review).

4. Prompt Assembly

Current flow:

SystemBase → Patterns → Conventions → [LLM]

New flow with persona:

PersonaPrompt (from YAML) → Patterns (filtered?) → Conventions → [LLM]

The persona's identity/focus/ignore/severity sections become the system prompt, replacing the generic "You are an expert code reviewer" base.

5. Built-in Personas

Ship with these built-in personas (loadable via --persona NAME):

Name Focus
security Vulnerabilities, auth, secrets
architect Patterns, consistency, design
domain Business logic (requires repo-specific config)
docs Documentation, API clarity

Built-in personas live in review/personas/ as embedded Go assets or YAML shipped with the binary.

6. CI Workflow Integration

Single persona:

- uses: rodin/review-bot/.gitea/actions/review@v1
  with:
    reviewer-name: security
    persona: security
    ...

Multiple personas (parallel jobs):

jobs:
  review:
    strategy:
      matrix:
        include:
          - name: security
            persona: security
          - name: architect
            persona: architect
    steps:
      - uses: rodin/review-bot/.gitea/actions/review@v1
        with:
          reviewer-name: ${{ matrix.name }}
          persona: ${{ matrix.persona }}

Custom persona from repo:

- uses: rodin/review-bot/.gitea/actions/review@v1
  with:
    reviewer-name: trading
    persona-file: .review/personas/trading.yaml

7. Persona + Patterns Interaction

Some personas benefit from filtered patterns:

  • Security → only security-related patterns
  • Architect → all patterns (structural focus)
  • Domain → domain docs, not language patterns

For v1, keep it simple: all patterns are included regardless of persona. Future enhancement could add patterns_filter to persona YAML.

8. Output Format Changes

Persona name appears in the review header:

# Security Review

## Summary
No critical vulnerabilities found in this change.

## Findings
| # | Severity | File | Line | Finding |
...

## Recommendation
**APPROVE** — No security-relevant issues detected.

---
*Review by security*
<!-- review-bot:security -->

State/Data Model

Persona struct

// review/persona.go
type Persona struct {
    Name          string   `yaml:"name"`
    DisplayName   string   `yaml:"display_name"`
    ModelPref     string   `yaml:"model_preference,omitempty"`
    Identity      string   `yaml:"identity"`
    Focus         []string `yaml:"focus"`
    Ignore        []string `yaml:"ignore"`
    Severity      Severity `yaml:"severity"`
    OutputFormat  string   `yaml:"output_format,omitempty"`
}

type Severity struct {
    Critical string `yaml:"critical"`
    Major    string `yaml:"major"`
    Minor    string `yaml:"minor"`
    Nit      string `yaml:"nit"`
}

Loading precedence

  1. --persona-file PATH → load from local file system
  2. --persona NAME → load from embedded built-ins
  3. Neither → use generic system prompt (current behavior)

Error Cases

Error Handling
Persona file not found Fatal exit with clear message
Invalid YAML in persona file Fatal exit with parse error
Both --persona and --persona-file specified Fatal exit: mutually exclusive
Unknown built-in persona name Fatal exit with list of valid names
Empty identity in persona Warning, fall back to generic prompt

Edge Cases

  • Empty focus list: Valid — persona relies on identity alone
  • Empty ignore list: Valid — no explicit scope exclusions
  • No severity section: Use default MAJOR/MINOR/NIT definitions
  • Model preference set but budget insufficient: Ignore preference, log warning
  • Persona file in pattern repo: Fetch like other pattern files

Testing Strategy

Unit tests

  • persona_test.go: Parse valid/invalid YAML, validate required fields
  • prompt_test.go: Verify persona prompt assembly
  • Integration with budget: persona prompts count toward token limit

Integration tests

  • End-to-end with --persona security (built-in)
  • End-to-end with --persona-file custom.yaml
  • Backwards compatibility: no flags = generic behavior

Manual verification

  • Run security persona on a PR with obvious vulnerability
  • Verify security persona ignores style issues
  • Verify non-security persona doesn't flag security issues

Implementation Phases

Phase 1: Persona types and loading

  • review/persona.go: Persona struct + YAML parsing
  • review/persona_test.go: Unit tests
  • Embed built-in personas in binary
  • Compiles clean, tests pass

Phase 2: Prompt generation

  • review/prompt.go: BuildPersonaPrompt(p Persona) string
  • Modify BuildSystemBase() to accept optional persona
  • Integrate persona prompt with budget system
  • Tests for prompt assembly

Phase 3: CLI integration

  • Add --persona and --persona-file flags
  • Flag validation (mutually exclusive, valid names)
  • Load persona based on flags
  • Pass persona to prompt builder

Phase 4: Action integration

  • Add persona and persona-file inputs to action.yml
  • Update README with persona examples
  • End-to-end CI test

Phase 5: Built-in personas

  • security.yaml built-in
  • architect.yaml built-in
  • docs.yaml built-in
  • Document each persona's focus

Open Questions

  1. Persona file location in repo: Should we support --persona-file .review/security.yaml where the file is fetched from the PR's repo (like conventions)? This adds complexity but enables project-specific personas without action changes.

  2. Model preference enforcement: If persona specifies model_preference: opus but the action uses a different model, should we warn? Override? Ignore? Current thinking: log warning, use the specified model (user controls model via action input).

  3. Severity override output: If persona defines custom severity levels (CRITICAL), should the JSON output include them, or map back to standard MAJOR/MINOR/NIT? Current thinking: keep standard output format, use severity calibration only for prompt guidance.

Completion Checklist

  1. Persona struct matches YAML schema exactly?
  2. Built-in personas embedded in binary (not external files)?
  3. --persona and --persona-file are mutually exclusive?
  4. Unknown persona name produces clear error with valid options?
  5. Empty persona file fields have sensible defaults?
  6. Persona prompt integrates with budget system (token counting)?
  7. Backwards compatibility: no flags = current behavior?
  8. Review header shows persona display name?
  9. Sentinel still uses reviewer-name (not persona name)?
  10. Unit tests cover parse errors, missing fields, valid YAML?

Design Review Findings (Self-Review)

Finding 1: Severity Mapping

The persona YAML allows critical severity, but the LLM output parser (review/parser.go) only accepts MAJOR/MINOR/NIT.

Resolution: Keep standard output format. Persona severity section is ONLY for calibrating the LLM's judgment (prompt guidance). Output must still use MAJOR/MINOR/NIT. Document this clearly in persona format docs.

Finding 2: Embedding Built-in Personas

Go doesn't natively embed YAML. Must use //go:embed directive (Go 1.16+).

Resolution: Create review/personas/ directory with YAML files and use:

//go:embed personas/*.yaml
var embeddedPersonas embed.FS

Finding 3: display_name vs reviewer-name

Design says header shows "persona display name" but sentinel uses "reviewer-name". This is correct - they serve different purposes:

  • display_name → human-readable header ("Security Specialist Review")
  • reviewer-name → machine sentinel for cleanup (<!-- review-bot:security -->)

When persona is used, display_name takes precedence for the header title, but reviewer-name (CLI flag) is still used for the sentinel.

Design Revision: YAML with gopkg.in/yaml.v3

Decision: Add gopkg.in/yaml.v3 as a dependency.

YAML is preferred over JSON for persona files because:

  • Multi-line strings are cleaner (no escaping quotes in identity/focus text)
  • Comments are supported for documentation
  • More human-readable for complex persona definitions

The implementation supports both YAML (.yaml, .yml) and JSON (.json) for backwards compatibility, with YAML as the default for built-in personas.