docs(#148): add SKILL.md and dev-loop-spec.md for dispatch redesign (#149)

2026-05-15 08:12:02 +00:00
parent 2dac6ce0c8
commit 30fe48d265
2 changed files with 407 additions and 0 deletions
@@ -0,0 +1,129 @@
+# Dev-Loop Skill: review-bot
+
+This file documents the dev-loop architecture for the `review-bot` project.
+It lives in the repo so changes are version-controlled alongside the code.
+
+## Architecture
+
+Dispatch is a **pure shell script** — no model reasoning.
+
+```
+Cron (agentTurn, toolsAllow: [exec, sessions_spawn, read])
+  → runs dispatch script
+  → reads output for SPAWN or HANDOFF lines
+  → spawns worker if instructed
+
+Dispatch script  (~/.openclaw/workspace/scripts/dev-loop-dispatch.sh)
+  → pure bash, all decisions are curl API calls + branches
+  → exits after emitting one SPAWN line (at most one worker per run)
+  → emits HANDOFF for each qualifying PR (does not exit after HANDOFF)
+
+Workers (Opus, spawned by cron model)
+  → receive precise task description
+  → do one job: self-review, fix CI, address feedback, or implement
+  → remove wip label when done, reply NO_REPLY
+```
+
+The cron model's **only** job: run script, read output, spawn worker if told to.
+The model **never** assesses project state or makes dispatch decisions.
+
+## Safety Invariants
+
+1. **NEVER MERGE** — no merge API call exists anywhere in the script or worker templates
+2. **REQUEST_CHANGES always blocks** — checked first, before CI, before self-review, before handoff
+3. **WIP mutex** — one active worker per repo; WIP label gates new issue pickup
+4. **One SPAWN per run** — script emits at most one SPAWN line per execution
+5. **set -euo pipefail** — any curl failure aborts immediately, no partial actions
+6. **Workers reply NO_REPLY** — no dispatch-level side effects (workers may push changes and manage labels as part of their task)
+
+## Dispatch Rules (in order)
+
+| Rule | Condition | Action |
+|------|-----------|--------|
+| 0 | WIP label > 1hr old | Remove stale WIP, continue |
+| 0b | WIP label ≤ 1hr old | Mark ACTIVE_WIP=1, continue (only gates Rule 10) |
+| _(1)_ | _(reserved — intentionally unused)_ | — |
+| 2 | Any reviewer has REQUEST_CHANGES | SPAWN:findings |
+| 3 | PR not mergeable | SPAWN:rebase |
+| 4 | CI failure, no fix plan | SPAWN:ci-fix |
+| 4b | CI failure, fix plan exists | Skip (worker in progress) |
+| 5 | Bot review missing | Wait |
+| 6 | CI pending/unknown | Wait |
+| 7 | No clean self-review, no fix plan | SPAWN:self-review |
+| 7b | Self-review needs attention, no fix plan | SPAWN:sr-fix |
+| 8 | Unacknowledged bot review findings | SPAWN:address-feedback |
+| 9 | Unresolved inline diff comments | SPAWN:address-feedback |
+| 10 | All checks pass | HANDOFF |
+| 11 | No open PRs + no ACTIVE_WIP | SPAWN:impl (next issue) |
+
+## Files
+
+| File | Description |
+|------|-------------|
+| `~/.openclaw/workspace/scripts/dev-loop-dispatch.sh` | Dispatch script — pure bash |
+| `~/.openclaw/workspace/scripts/worker-tasks/self-review.md` | Self-review worker template |
+| `~/.openclaw/workspace/scripts/worker-tasks/sr-fix.md` | Fix findings from self-review |
+| `~/.openclaw/workspace/scripts/worker-tasks/ci-fix.md` | CI fix worker template |
+| `~/.openclaw/workspace/scripts/worker-tasks/address-feedback.md` | Address feedback worker template |
+| `~/.openclaw/workspace/scripts/worker-tasks/findings.md` | Address REQUEST_CHANGES findings |
+| `~/.openclaw/workspace/scripts/worker-tasks/rebase.md` | Rebase worker template |
+| `~/.openclaw/workspace/scripts/worker-tasks/impl.md` | Issue implementation worker template |
+| `~/.openclaw/workspace/scripts/test/dispatch.bats` | Unit tests (bats) |
+| `~/.openclaw/workspace/scripts/test/check-invariants.sh` | Static invariant checks |
+| `~/.openclaw/workspace/memory/projects/review-bot.yaml` | Project config |
+
+## Project Config
+
+Config is at `~/.openclaw/workspace/memory/projects/review-bot.yaml`.
+
+Key fields:
+- `repo`: `rodin/review-bot`
+- `api_base`: `https://gitea.weiker.me/api/v1`
+- `user`: `rodin` (bot Gitea username)
+- `labels.wip`: WIP label ID
+- `labels.ready`: ready label ID
+- `review_bots`: list of bot sentinel names
+
+## Cron Config
+
+```yaml
+- label: review-bot-dev-loop
+  schedule: "*/15 * * * *"
+  prompt: |
+    Run: bash ~/.openclaw/workspace/scripts/dev-loop-dispatch.sh review-bot
+
+    Read the output. If it contains a SPAWN line, load the matching template from
+    ~/.openclaw/workspace/scripts/worker-tasks/<type>.md, substitute {{PROJECT}},
+    {{PR_NUM}}, and {{HEAD_SHA}}, then spawn with sessions_spawn(mode: "run",
+    model: "hai-anthropic/anthropic--claude-4.6-opus", thinking: "high").
+
+    If no SPAWN line in output, reply NO_REPLY.
+
+    See ~/.openclaw/workspace/skills/dev-loop/SKILL.md for full instructions.
+    (This repo's SKILL.md is deployed to that workspace path.)
+  model: hai-anthropic/anthropic--claude-4.5-haiku
+  toolsAllow: [exec, sessions_spawn, read]
+```
+
+## Tests
+
+```bash
+# Unit tests (no real API calls):
+bats ~/.openclaw/workspace/scripts/test/dispatch.bats
+
+# Invariant checks (static analysis):
+bash ~/.openclaw/workspace/scripts/test/check-invariants.sh
+
+# Dry-run against real API:
+DRY_RUN=1 bash ~/.openclaw/workspace/scripts/dev-loop-dispatch.sh review-bot
+```
+
+## Related Issues
+
+- **#144** — autonomous merge: eliminated by removing all merge API calls from dispatch
+- **#145** — merged despite REQUEST_CHANGES: eliminated by checking REQUEST_CHANGES first, unconditionally
+- **#148** — this redesign
+
+## Spec
+
+Full design spec: `docs/dev-loop-spec.md`
@@ -0,0 +1,278 @@
+# Dev-Loop Dispatch Spec
+
+**Version:** 1.0
+**Status:** Implemented
+**Implements:** Issue #148
+
+This document is the authoritative spec for the review-bot dev-loop dispatch architecture.
+The dispatch script (`~/.openclaw/workspace/scripts/dev-loop-dispatch.sh`) and its tests
+are validated against the rules and invariants in this document.
+
+---
+
+## 1. Overview
+
+The dev-loop is a 15-minute cron that advances the state of open pull requests and picks up
+new issues when there is nothing in review. It is designed for **zero human intervention**
+in the normal flow and **hard stops at key safety boundaries**.
+
+### Architecture
+
+```
+Cron (15-min cadence)
+  → exec: bash dev-loop-dispatch.sh <project>
+  → read stdout for SPAWN/HANDOFF lines
+  → if SPAWN: load worker template, spawn subagent
+  → if HANDOFF: log, do nothing else
+  → if neither: NO_REPLY
+```
+
+The cron model has **no ambient knowledge** of the project state. All state is derived
+from the dispatch script's output, which in turn comes from live API calls.
+
+---
+
+## 2. Inputs
+
+### Project Config
+
+```yaml
+# memory/projects/<project>.yaml
+repo: rodin/review-bot         # <owner>/<repo>
+api_base: https://gitea.../v1  # API base URL
+token_path: ~/.openclaw/...    # path to bearer token
+user: rodin                    # bot Gitea username
+labels:
+  wip: <id>
+  ready: <id>
+review_bots:                   # sentinel names in review bodies
+  - sonnet
+  - gpt
+  - security
+```
+
+### Script Arguments
+
+```bash
+bash dev-loop-dispatch.sh <project>   # normal run
+DRY_RUN=1 bash dev-loop-dispatch.sh <project>   # dry-run (no mutations)
+```
+
+---
+
+## 3. State
+
+The dispatch script is **stateless per run**. All state lives in the Gitea API:
+
+| State | API location |
+|-------|-------------|
+| Open PRs | `GET /repos/:repo/pulls?state=open` |
+| PR labels | `GET /repos/:repo/issues/:n/labels` |
+| PR reviews | `GET /repos/:repo/pulls/:n/reviews` |
+| CI status | `GET /repos/:repo/commits/:sha/status` |
+| Issue comments | `GET /repos/:repo/issues/:n/comments` |
+| Inline diff comments | `GET /repos/:repo/pulls/:n/comments` |
+| Issue timeline | `GET /repos/:repo/issues/:n/timeline` |
+
+No file-based state. No cron-to-cron carry-over.
+
+---
+
+## 4. Output Protocol
+
+The script emits structured lines to stdout. Stderr is diagnostic logging.
+
+### `SPAWN:<type>:<number>:<sha>`
+
+A worker is needed. The cron model reads this and spawns a subagent using the
+template at `worker-tasks/<type>.md`.
+
+| Field | Description |
+|-------|-------------|
+| `type` | Worker type: `self-review`, `ci-fix`, `address-feedback`, `findings`, `rebase`, `impl` |
+| `number` | PR number (or issue number for `impl`) |
+| `sha` | HEAD SHA of the PR (empty for `impl`) |
+
+At most **one SPAWN** is emitted per script run.
+
+### `HANDOFF:<pr_num>`
+
+All checks passed for `pr_num`. The script applied the `ready` label and assigned
+to the human reviewer. The cron model logs this and takes no further action.
+
+Multiple HANDOFFs may be emitted in one run (one per qualifying PR).
+
+---
+
+## 5. Dispatch Rules
+
+Rules are evaluated **in order** for each open PR. The first matching condition wins.
+Only one SPAWN is emitted per full pass.
+
+### Rule 0: WIP Cleanup
+
+For each open PR with a `wip` label:
+
+1. Find the timestamp when the label was most recently applied (via timeline events)
+2. If age > 1hr: **remove the label** (stale lock — worker likely crashed)
+3. If age ≤ 1hr: **set ACTIVE_WIP=1** (do not exit, only gates Rule 10)
+
+### Rule 2: REQUEST_CHANGES Blocks
+
+**ALWAYS evaluated before any other per-PR rule.**
+
+For each reviewer, take their **latest** review state. If any reviewer's latest
+state is `REQUEST_CHANGES`:
+
+→ Acquire WIP label on this PR
+→ Emit `SPAWN:findings:<pr_num>:<head_sha>`
+→ Continue to next PR (but only one SPAWN total)
+
+This rule cannot be bypassed by any condition. There is no waiver mechanism.
+
+### Rule 3: Merge Conflicts
+
+If `mergeable == false`:
+
+→ Acquire WIP
+→ Emit `SPAWN:rebase:<pr_num>:<head_sha>`
+
+### Rule 4: CI Failure
+
+If CI state is `failure` or `error`:
+
+- If a fix plan comment exists for this HEAD SHA: **skip** (worker in progress)
+- Otherwise:
+
+→ Acquire WIP
+→ Emit `SPAWN:ci-fix:<pr_num>:<head_sha>`
+
+### Rule 5: Bot Reviews Missing
+
+For each configured `review_bot`, check whether a review body contains the
+sentinel `<!-- review-bot:<name> -->`.
+
+If any sentinel is missing: **wait** (continue to next PR, no SPAWN).
+
+### Rule 6: CI Pending/Unknown
+
+If CI state is `pending` or `unknown`: **wait**.
+
+### Rule 7: Self-Review
+
+Check for a self-review comment from the bot user against the current HEAD SHA:
+- Comment contains `Self-review against <head_sha>`
+
+Sub-cases:
+- **Missing**: No self-review comment →
+  → Acquire WIP, emit `SPAWN:self-review:<pr_num>:<head_sha>`
+- **Needs attention** (`Assessment: ⚠️`): Found, but has findings:
+  - Fix plan exists for HEAD SHA: skip
+  - No fix plan: → Acquire WIP, emit `SPAWN:sr-fix:<pr_num>:<head_sha>`
+- **Clean** (`Assessment: ✅ Clean`): Continue to Rule 8
+
+### Rule 8: Unacknowledged Bot Review Findings
+
+For each **current** (contains `Evaluated against <head_short>`) APPROVED bot review
+that has a findings table:
+
+A finding is **unacknowledged** if it does not appear as `Finding #N` in a fix plan
+comment from the bot user for this HEAD SHA.
+
+If any unacknowledged findings exist:
+- Fix plan exists: skip
+- No fix plan: → Acquire WIP, emit `SPAWN:address-feedback:<pr_num>:<head_sha>`
+
+### Rule 9: Unresolved Inline Diff Comments
+
+An inline diff comment is **unresolved** if:
+1. `in_reply_to_id` is null (top-level comment)
+2. `resolver` is null (not formally resolved)
+3. No other comment has `in_reply_to_id` pointing to this comment (no reply)
+
+If unresolved comments exist:
+- Fix plan exists: skip
+- No fix plan: → Acquire WIP, emit `SPAWN:address-feedback:<pr_num>:<head_sha>`
+
+### Rule 10: Handoff
+
+All rules above passed. Verify all bot reviews are current (contain `Evaluated against <head_short>`).
+
+If all current:
+- Apply `ready` label
+- Assign to `aweiker`
+- Emit `HANDOFF:<pr_num>`
+- Continue evaluating remaining PRs (do NOT exit)
+
+If already assigned to `aweiker`: skip (assume handoff was already performed; continue to next PR without emitting another HANDOFF).
+
+### Rule 11: New Issue Pickup
+
+Only runs if: no open PRs exist AND `ACTIVE_WIP == 0`.
+
+Fetch open, unassigned issues. Priority: bugs first, then by number ascending.
+
+Claim the issue (assign to bot user to prevent double-pick), then:
+→ Emit `SPAWN:impl:<issue_num>:`
+
+---
+
+## 6. Safety Invariants
+
+These are statically checked by `~/.openclaw/workspace/scripts/test/check-invariants.sh` and enforced in all changes:
+
+| ID | Invariant |
+|----|-----------|
+| S1 | Zero merge API calls in dispatch script (`/merge` does not appear) |
+| S2 | REQUEST_CHANGES check (Rule 2) appears before CI check (Rule 4) |
+| S3 | REQUEST_CHANGES check (Rule 2) appears before ready label application (Rule 10) |
+| S4 | No model/AI API references in dispatch script |
+| S5 | `set -euo pipefail` present |
+| S6 | Active WIP does not cause early exit (only sets ACTIVE_WIP flag) |
+| S7 | SPAWN:impl guarded by `ACTIVE_WIP == 0` check |
+| S8 | No merge calls in any worker template |
+
+---
+
+## 7. Error Handling
+
+| Error | Behavior |
+|-------|----------|
+| `curl` returns error | `set -euo pipefail` aborts script — no partial actions |
+| `jq` parse error | Script aborts |
+| Worker crashes | WIP label left on PR; stale WIP cleanup (Rule 0) removes it after 1hr |
+| Race: two crons fire | WIP mutex prevents double-dispatch for same PR |
+| `sessions_spawn` fails | Worker not spawned; WIP label orphaned → cleaned in 1hr |
+| Config file missing | Exit code 2 with error message |
+
+---
+
+## 8. Worker Templates
+
+Each worker receives a precise task description with substituted values:
+
+| Template | Trigger | Key job |
+|----------|---------|---------|
+| `self-review.md` | No clean self-review | Post self-review comment, remove WIP |
+| `sr-fix.md` | Self-review needs attention | Address self-review findings, push, remove WIP |
+| `ci-fix.md` | CI failing | Diagnose, fix, push, remove WIP |
+| `address-feedback.md` | Unacknowledged findings or inline comments | Address feedback, push, remove WIP |
+| `findings.md` | REQUEST_CHANGES present | Address REQUEST_CHANGES, push, remove WIP |
+| `rebase.md` | Merge conflicts | Rebase on main, push, remove WIP |
+| `impl.md` | New issue | Implement feature/fix, open PR |
+
+Workers **always** remove the WIP label on completion and reply `NO_REPLY`.
+
+---
+
+## 9. Fixes for Issues #144 and #145
+
+**Issue #144** (autonomous merge):
+The dispatch script contains no merge API calls anywhere. The `~/.openclaw/workspace/scripts/test/check-invariants.sh`
+invariant `S1` verifies this. Workers do not receive merge instructions.
+
+**Issue #145** (merged despite REQUEST_CHANGES):
+Rule 2 is the **first** rule evaluated per PR. It cannot be skipped, reasoned past,
+or bypassed. It is checked before CI, before self-review, before handoff. The check
+uses latest-per-reviewer state, so a reviewer who re-approved after REQUEST_CHANGES
+is correctly handled.