feat: redesign dev-loop dispatch as pure shell script — no model reasoning in dispatch #148

Closed
opened 2026-05-15 05:33:25 +00:00 by rodin · 1 comment
Owner

Summary

The current dev-loop uses a single Haiku model call that simultaneously assesses project state, makes dispatch decisions, AND acts. This caused two production failures:

  • PR #138 merged despite active REQUEST_CHANGES — model reasoned past the check (#145)
  • PR #140 merged autonomously without human approval (#144)

The fix: dispatch becomes a pure shell script with no model reasoning. Every decision is a boolean API check + branch. The model only spawns a worker when the script's output says to.

Full Pre-Code Plan

Pre-Code Plan: Dev-Loop Dispatch Redesign

Problem

The current dev-loop uses a single model call (Haiku) that simultaneously assesses project state, makes decisions, AND acts. This is a logic-model-action pipeline with no hard stops. Two production failures demonstrate the failure mode:

  1. PR #138 merged despite active REQUEST_CHANGES from security-review-bot — the model reasoned past the check instead of executing it as a binary shell command.
  2. PR #140 merged autonomously without human approval — the model (or a worker it spawned) called the merge API despite an explicit prohibition in the skill.

The root cause is that dispatch is entangled with judgment. When Haiku "reasons" about whether to merge, it can reach wrong conclusions. The only safe dispatch is a dispatch that cannot reason — one that executes boolean API checks and branches on their output.

This plan redesigns dispatch to be a pure shell script with no model reasoning, and restricts models to workers that receive a precise situation description and do only the work they're given.

Constraints

  • Merge is never called by any automated component — only Aaron merges
  • REQUEST_CHANGES from any reviewer always blocks — no exceptions, no reasoning past it
  • WIP mutex preserved — one active worker at a time per repo
  • Workers receive only the situation relevant to their task — no ambient project state
  • The dispatch script is version-controlled and auditable
  • The cron entry calls the script with toolsAllow: ["exec"] only — no model reasoning in dispatch
  • Must fix issues #144 and bug: dev-loop merged PR #138 despite active REQUEST_CHANGES from security-review-bot (#145)
  • Must not break existing workers (self-review, CI fix, address-feedback, impl)

Proposed Approach

Architecture

Cron (agentTurn, toolsAllow: ["exec"])
  → runs dispatch script
  → dispatch script: pure bash, API calls, branching

Dispatch script
  → checks project state via curl API calls
  → branches to first matching condition
  → if worker needed: writes task file, calls sessions_spawn via exec
  → exits immediately after one action

Workers (Sonnet, spawned by dispatch)
  → receive precise situation description
  → do one job (self-review, fix CI, address feedback, implement)
  → remove wip label, exit with NO_REPLY

What Changes

1. New: workspace/scripts/dev-loop-dispatch.sh

A bash script with no model reasoning. Every decision is:

RESULT=$(curl -s ...)
if echo "$RESULT" | jq -e '<condition>' > /dev/null; then
  # take action
  exit 0
fi

Full dispatch logic (in order):

#!/usr/bin/env bash
# dev-loop-dispatch.sh — Pure shell dispatch. No model reasoning.
# Called by cron via: exec("bash dev-loop-dispatch.sh <project>")

set -euo pipefail
PROJECT="${1:?Usage: dev-loop-dispatch.sh <project>}"
CONFIG="memory/projects/${PROJECT}.yaml"

# Load config (yq or python-yaml)
TOKEN=$(cat "$(yq '.token_path' "$CONFIG")")
API=$(yq '.api_base' "$CONFIG")
REPO=$(yq '.repo' "$CONFIG")
USER=$(yq '.user' "$CONFIG")
WIP_ID=$(yq '.labels.wip' "$CONFIG")
READY_ID=$(yq '.labels.ready' "$CONFIG")
NODE=$(yq '.node' "$CONFIG")
REVIEW_BOTS=$(yq '.review_bots[]' "$CONFIG")  # newline-separated

# --- Pre-check 1: WIP lock ---
# Fetch open PRs with wip label
WIP_PR=$(curl -s -H "Authorization: token $TOKEN" \
  "$API/repos/$REPO/pulls?state=open&labels=$WIP_ID" | \
  jq '[.[] | select(.user.login == env.USER)] | first')

if [ "$WIP_PR" != "null" ] && [ -n "$WIP_PR" ]; then
  PR_NUM=$(echo "$WIP_PR" | jq -r '.number')
  # Check wip label age via timeline events
  LAST_WIP=$(bash scripts/get-wip-timestamp.sh "$API" "$TOKEN" "$PR_NUM")
  NOW=$(date +%s)
  WIP_TS=$(date -d "$LAST_WIP" +%s 2>/dev/null || date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LAST_WIP" +%s)
  AGE=$(( NOW - WIP_TS ))
  if [ "$AGE" -lt 3600 ]; then
    echo "WIP active on PR #$PR_NUM (${AGE}s old) — exiting"
    exit 0
  fi
  # Stale WIP — remove label
  curl -s -X DELETE -H "Authorization: token $TOKEN" \
    "$API/repos/$REPO/issues/$PR_NUM/labels/$WIP_ID" > /dev/null
fi

# --- Fetch all open PRs from bot user ---
PRS=$(curl -s -H "Authorization: token $TOKEN" \
  "$API/repos/$REPO/pulls?state=open&sort=oldest" | \
  jq --arg u "$USER" '[.[] | select(.user.login == $u)]')

PR_COUNT=$(echo "$PRS" | jq 'length')

# --- Priority checks over each open PR ---
if [ "$PR_COUNT" -gt 0 ]; then
  # Iterate over PRs (process first matching)
  for PR_JSON in $(echo "$PRS" | jq -c '.[]'); do
    PR_NUM=$(echo "$PR_JSON" | jq -r '.number')
    HEAD_SHA=$(echo "$PR_JSON" | jq -r '.head.sha')
    HEAD_SHORT=$(echo "$HEAD_SHA" | cut -c1-8)
    MERGEABLE=$(echo "$PR_JSON" | jq -r '.mergeable')
    
    # Check 1: REQUEST_CHANGES from any reviewer (ALWAYS BLOCKS)
    RC_COUNT=$(curl -s -H "Authorization: token $TOKEN" \
      "$API/repos/$REPO/pulls/$PR_NUM/reviews" | \
      jq '[.[] | select(.state == "REQUEST_CHANGES")] | length')
    if [ "$RC_COUNT" -gt 0 ]; then
      echo "PR #$PR_NUM blocked: $RC_COUNT REQUEST_CHANGES active — spawning findings worker"
      bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID"
      bash scripts/spawn-worker.sh findings "$PROJECT" "$PR_NUM" "$HEAD_SHA"
      exit 0
    fi
    
    # Check 2: Merge conflicts
    if [ "$MERGEABLE" = "false" ]; then
      echo "PR #$PR_NUM has merge conflicts — spawning rebase worker"
      bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID"
      bash scripts/spawn-worker.sh rebase "$PROJECT" "$PR_NUM" "$HEAD_SHA"
      exit 0
    fi
    
    # Check 3: CI status
    CI_STATE=$(curl -s -H "Authorization: token $TOKEN" \
      "$API/repos/$REPO/commits/$HEAD_SHA/status" | jq -r '.state')
    if [ "$CI_STATE" != "success" ]; then
      # Check for existing fix plan
      HAS_PLAN=$(curl -s -H "Authorization: token $TOKEN" \
        "$API/repos/$REPO/issues/$PR_NUM/comments" | \
        jq --arg sha "$HEAD_SHA" --arg u "$USER" \
        '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0')
      if [ "$HAS_PLAN" = "true" ]; then
        echo "PR #$PR_NUM CI failing, fix plan exists for $HEAD_SHORT — blocked"
        exit 0
      fi
      echo "PR #$PR_NUM CI failing ($CI_STATE) — spawning CI fix worker"
      bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID"
      bash scripts/spawn-worker.sh ci-fix "$PROJECT" "$PR_NUM" "$HEAD_SHA"
      exit 0
    fi
    
    # Check 4: Bot reviews present (all must have posted)
    MISSING_BOT=""
    for BOT in $REVIEW_BOTS; do
      HAS_REVIEW=$(curl -s -H "Authorization: token $TOKEN" \
        "$API/repos/$REPO/pulls/$PR_NUM/reviews" | \
        jq --arg b "$BOT" '[.[] | select(.body | contains("<!-- review-bot:\($b) -->"))] | length > 0')
      if [ "$HAS_REVIEW" = "false" ]; then
        MISSING_BOT="$BOT"
        break
      fi
    done
    if [ -n "$MISSING_BOT" ]; then
      echo "PR #$PR_NUM waiting on bot review from $MISSING_BOT — exiting"
      exit 0
    fi
    
    # Check 5: Self-review present and current
    SR_OK=$(curl -s -H "Authorization: token $TOKEN" \
      "$API/repos/$REPO/issues/$PR_NUM/comments" | \
      jq --arg sha "$HEAD_SHA" --arg u "$USER" \
      '[.[] | select(.user.login == $u and (.body | contains("Self-review against \($sha)")) and (.body | contains("Assessment: ✅ Clean")))] | length > 0')
    if [ "$SR_OK" = "false" ]; then
      # Check if needs-attention self-review exists (has findings to fix)
      SR_NEEDS=$(curl -s -H "Authorization: token $TOKEN" \
        "$API/repos/$REPO/issues/$PR_NUM/comments" | \
        jq --arg sha "$HEAD_SHA" --arg u "$USER" \
        '[.[] | select(.user.login == $u and (.body | contains("Self-review against \($sha)")) and (.body | contains("Assessment: ⚠️")))] | length > 0')
      if [ "$SR_NEEDS" = "true" ]; then
        HAS_PLAN=$(curl -s -H "Authorization: token $TOKEN" \
          "$API/repos/$REPO/issues/$PR_NUM/comments" | \
          jq --arg sha "$HEAD_SHA" --arg u "$USER" \
          '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0')
        if [ "$HAS_PLAN" = "true" ]; then
          echo "PR #$PR_NUM self-review has findings, fix plan exists — worker likely running"
          exit 0
        fi
        echo "PR #$PR_NUM self-review has findings — spawning fix worker"
        bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID"
        bash scripts/spawn-worker.sh sr-fix "$PROJECT" "$PR_NUM" "$HEAD_SHA"
        exit 0
      fi
      echo "PR #$PR_NUM needs self-review — spawning self-review worker"
      bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID"
      bash scripts/spawn-worker.sh self-review "$PROJECT" "$PR_NUM" "$HEAD_SHA"
      exit 0
    fi
    
    # Check 6: Unresolved review findings
    UNRESOLVED=$(curl -s -H "Authorization: token $TOKEN" \
      "$API/repos/$REPO/pulls/$PR_NUM/comments" | \
      jq '[.[] | select(.resolver == null and .in_reply_to_id == null)] | length')
    if [ "$UNRESOLVED" -gt 0 ]; then
      HAS_PLAN=$(curl -s -H "Authorization: token $TOKEN" \
        "$API/repos/$REPO/issues/$PR_NUM/comments" | \
        jq --arg sha "$HEAD_SHA" --arg u "$USER" \
        '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0')
      if [ "$HAS_PLAN" = "true" ]; then
        echo "PR #$PR_NUM has $UNRESOLVED unresolved findings, fix plan exists — worker likely running"
        exit 0
      fi
      echo "PR #$PR_NUM has $UNRESOLVED unresolved findings — spawning feedback worker"
      bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID"
      bash scripts/spawn-worker.sh address-feedback "$PROJECT" "$PR_NUM" "$HEAD_SHA"
      exit 0
    fi
    
    # Check 7: All clear — hand off (ONLY if all above pass)
    # Verify bot reviews are current (against HEAD_SHORT)
    ALL_CURRENT=true
    for BOT in $REVIEW_BOTS; do
      CURRENT=$(curl -s -H "Authorization: token $TOKEN" \
        "$API/repos/$REPO/pulls/$PR_NUM/reviews" | \
        jq --arg b "$BOT" --arg sha "$HEAD_SHORT" \
        '[.[] | select(.body | contains("<!-- review-bot:\($b) -->")) | select(.body | contains("Evaluated against \($sha)"))] | length > 0')
      if [ "$CURRENT" = "false" ]; then
        ALL_CURRENT=false
        break
      fi
    done
    if [ "$ALL_CURRENT" = "false" ]; then
      echo "PR #$PR_NUM bot review is stale — exiting"
      exit 0
    fi
    
    echo "PR #$PR_NUM all checks pass — applying ready label and assigning to aweiker"
    curl -s -X POST -H "Authorization: token $TOKEN" \
      -H "Content-Type: application/json" \
      -d "{\"labels\": [$READY_ID]}" \
      "$API/repos/$REPO/issues/$PR_NUM/labels" > /dev/null
    curl -s -X PATCH -H "Authorization: token $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"assignees": ["aweiker"]}' \
      "$API/repos/$REPO/pulls/$PR_NUM" > /dev/null
    echo "🔀 PR #$PR_NUM ready for review"
    exit 0
  done
fi

# --- No open PRs — pick next issue ---
NEXT_ISSUE=$(curl -s -H "Authorization: token $TOKEN" \
  "$API/repos/$REPO/issues?state=open&type=issues&limit=50" | \
  jq --arg u "$USER" \
  '[.[] | select(.assignee == null)] |
   (map(select([.labels[].name] | contains(["bug"]))) + 
    map(select([.labels[].name] | contains(["bug"]) | not))) |
   first')

if [ "$NEXT_ISSUE" = "null" ] || [ -z "$NEXT_ISSUE" ]; then
  echo "No issues to work on — exiting"
  exit 0
fi

ISSUE_NUM=$(echo "$NEXT_ISSUE" | jq -r '.number')
ISSUE_TITLE=$(echo "$NEXT_ISSUE" | jq -r '.title')
echo "Starting work on issue #$ISSUE_NUM: $ISSUE_TITLE"
bash scripts/spawn-worker.sh impl "$PROJECT" "$ISSUE_NUM"
exit 0

Key safety properties:

  • REQUEST_CHANGES check is first — before CI, before self-review, before anything
  • No merge calls exist anywhere in the script
  • Every branch exits after one action
  • set -euo pipefail — any curl failure aborts (no silent partial state)

2. New: workspace/scripts/spawn-worker.sh

Wraps sessions_spawn via a small Python/node script (since bash can't call the OpenClaw API directly). The dispatcher writes the worker task to a temp file, then calls openclaw spawn or uses the sessions API.

Alternative: The dispatcher script exits with a structured output, and the cron's agentTurn model reads it and spawns the worker. This keeps the script pure bash and moves the one model call to spawn-time. The cron prompt becomes:

Run: bash ~/.openclaw/workspace/scripts/dev-loop-dispatch.sh <project>

If output contains "spawning <type> worker for PR #<N>", spawn a worker:
- Type: <type>
- PR: #<N>
- Task: [read from workspace/scripts/worker-tasks/<type>.md, substitute PR number and SHA]

This is cleaner: the shell script does all the API work, the model only does the one thing it can't avoid — spawning a session. The model sees only the spawn instruction, not the raw project state.

3. Updated: workspace/skills/dev-loop/SKILL.md

Replace the skill with documentation pointing to the script:

# Dev Loop (Dispatch)

## Architecture

Dispatch is a pure shell script. No model reasoning.

Script: `workspace/scripts/dev-loop-dispatch.sh <project>`

The cron runs this script. If a worker is needed, the script outputs a structured
spawn instruction. The cron model reads the output and spawns the worker.

The model NEVER:
- Assesses project state
- Makes dispatch decisions
- Calls merge API

The script ALWAYS:
- Checks REQUEST_CHANGES before anything else
- Exits after one action
- Uses hard API checks, not inference

## Workers

Workers are spawned by the cron model based on script output. Worker task templates:
- `workspace/scripts/worker-tasks/self-review.md`
- `workspace/scripts/worker-tasks/ci-fix.md`
- `workspace/scripts/worker-tasks/address-feedback.md`
- `workspace/scripts/worker-tasks/findings.md`
- `workspace/scripts/worker-tasks/rebase.md`
- `workspace/scripts/worker-tasks/impl.md`

Each template is substituted with PR number, HEAD SHA, and project config values.

## Safety Invariants

1. **NEVER MERGE** — no merge API calls anywhere
2. **REQUEST_CHANGES always blocks** — checked first, before any other condition
3. **WIP mutex** — one worker per repo at a time
4. **One action per run** — script exits after first matching condition
5. **Workers exit with NO_REPLY** — no announcements, no handoff

4. New: workspace/scripts/worker-tasks/ directory

Each worker type gets a markdown template. Variables are substituted by the cron model when spawning. Templates are identical to the current skill's worker task templates, extracted into files for single-source maintenance.

5. Updated: cron config

Current cron entry (approximate):

- label: gargoyle-dev-loop
  schedule: "*/15 * * * *"
  prompt: "dev-loop gargoyle"
  model: haiku
  toolsAllow: [exec, sessions_spawn, read, memory_get]

New cron entry:

- label: gargoyle-dev-loop
  schedule: "*/15 * * * *"
  prompt: |
    Run: bash ~/.openclaw/workspace/scripts/dev-loop-dispatch.sh gargoyle
    
    Read the output. If it contains a spawn instruction, spawn the worker
    using the matching template in workspace/scripts/worker-tasks/.
    Substitute PR number, HEAD SHA, and project values from the output.
    
    If no spawn instruction in output, exit with NO_REPLY.
  model: haiku
  toolsAllow: [exec, sessions_spawn]

The cron model's job is now: run script, read output, spawn if instructed. That's it.

State/Data Model

The dispatch logic is stateless per run. State lives entirely in the Gitea API:

  • PR labels (wip, ready)
  • PR reviews (REQUEST_CHANGES state, bot sentinels)
  • PR CI status
  • Issue comments (self-review markers, fix plans)
  • Issue assignees

No state is carried between cron runs.

Error Cases

Error Handling
curl returns non-200 set -euo pipefail causes script to abort — no partial actions
jq parse error Script aborts — better to skip a run than take wrong action
Worker crashed (wip orphaned) Pre-check 1: stale wip removed after 1hr, next run proceeds
Bot review missing Exit — CI will trigger it, next run re-evaluates
Race: two crons fire simultaneously WIP mutex prevents double-dispatch
Script bug causes false REQUEST_CHANGES Worker receives detailed situation, can surface to Aaron
sessions_spawn fails Worker not spawned, wip label orphaned → cleaned up in 1hr

Edge Cases

  • PR assigned to non-bot user: Only PRs from USER (bot) are checked — human PRs are ignored.
  • Multiple open PRs: Script checks PRs in oldest-first order. First matching PR wins.
  • REQUEST_CHANGES from a human reviewer, not just bots: The check is on review state, not reviewer identity — human REQUEST_CHANGES also blocks. This is correct behavior.
  • REQUEST_CHANGES that was later superseded by APPROVED from same reviewer: Gitea API returns all reviews. Check is state == REQUEST_CHANGES on any review. If reviewer re-approved, their latest review is APPROVED, not REQUEST_CHANGES. Need to check latest review per reviewer, not any review. (Open question — see below.)
  • Bot sentinel in review body but state is REQUEST_CHANGES: This can happen if a bot re-reviews with changes. Sentinel presence ≠ approval. State check must be separate from sentinel check.
  • WIP label applied via Gitea UI by Aaron: Script sees it as WIP, waits 1hr before clearing. May cause unexpected delays. Low risk.

Testing Strategy

Unit tests for dispatch logic

Extract each check into a small bash function that can be tested with mock curl responses. Use bats (Bash Automated Testing System):

# test/dispatch.bats
@test "REQUEST_CHANGES blocks before all other checks" {
  # Mock curl to return a PR with REQUEST_CHANGES
  # Assert: worker spawned, script exits 0
  # Assert: no ready label applied
}

@test "WIP age < 1hr exits without action" { ... }
@test "WIP age > 1hr removes label and continues" { ... }
@test "CI failure with existing plan does not respawn" { ... }
@test "All green with current reviews triggers handoff" { ... }
@test "REQUEST_CHANGES on PR2 processed before clean PR1" { ... }

Integration smoke test

After deploying, run the script against a real project in dry-run mode:

DRY_RUN=1 bash dev-loop-dispatch.sh review-bot

Dry-run mode prints what it would do without calling spawn or making label mutations.

Manual verification of #144 and #145 fixes

  1. Create a test PR with an active REQUEST_CHANGES review
  2. Trigger the dispatch script
  3. Verify: no merge called, findings worker spawned
  4. Verify: WIP label present on PR during worker run
  5. Verify: after worker completes, WIP label removed

Regression test for autonomous merge

  1. Trace through script for a PR in "all green" state
  2. Confirm: only ready label + assignees PATCH calls are made
  3. Confirm: no POST /repos/.../pulls/.../merge call anywhere in script or worker templates

Open Questions

  1. REQUEST_CHANGES latest-only vs any: If a reviewer posts REQUEST_CHANGES, then later posts APPROVED, does the current skill's check (any review with REQUEST_CHANGES state) correctly reflect the resolved state? The Gitea API returns individual reviews, not a per-reviewer summary. We should check the latest review per reviewer. Current SKILL.md doesn't specify this; the script should implement latest-per-reviewer logic explicitly.

  2. Spawn mechanism: The dispatch script can't call sessions_spawn directly from bash. Two options:

    • (A) Script outputs structured spawn instruction, cron model reads and spawns
    • (B) Script calls a small helper (openclaw sessions spawn <task-file>) if such a CLI exists
      Option A is cleaner for auditability — the spawn instruction is visible in exec output. Preference?
  3. Worker task templates: Should worker task templates live in workspace/scripts/worker-tasks/ (workspace, not version-controlled in a repo) or in the review-bot or a dedicated ops-gateway repo? The workspace location keeps them co-located with the dispatch script; a repo gives them a change history independent of the workspace.

  4. Cron toolsAllow for review-bot vs gargoyle: The new cron needs exec and sessions_spawn. Currently the skill allows more tools (read, memory_get). Should we tighten all dev-loop crons or only new ones?

  5. Dry-run mode: Is DRY_RUN=1 flag worth implementing for the script, or is the test environment (staging project) sufficient for verification?

Completion Checklist

  1. Does the dispatch script contain zero calls to any merge API endpoint?
  2. Is REQUEST_CHANGES check before WIP label age check, CI check, and handoff check?
  3. Does the script exit after the first matching condition (one action per run)?
  4. Does the cron's toolsAllow exclude read and memory_get (no ambient project knowledge)?
  5. Are worker task templates extracted to files so the cron prompt doesn't carry policy inline?
  6. Is the WIP stale-lock cleanup preserved (>1hr removes label and continues)?
  7. Does the set -euo pipefail ensure no partial actions on curl failures?
  8. Is there a dry-run mode or equivalent verification path before first production cron run?
  9. Does the script use latest-per-reviewer review state, not any historical state, for REQUEST_CHANGES check?
  10. Are issues #144 and #145 explicitly verifiable via the test strategy above?

Fixes for Issues #144 and #145

Issue #144 (autonomous merge): Eliminated entirely — the dispatch script contains no merge API calls and the cron's toolsAllow does not include any tool that could perform a merge. Workers are spawned with explicit exit instructions that prohibit merge calls.

Issue #145 (merged despite REQUEST_CHANGES): Eliminated by structural change — REQUEST_CHANGES check is the first check in the dispatch script, before CI, before self-review, before handoff. It cannot be "reasoned past" because there is no reasoning in the dispatcher.

## Summary The current dev-loop uses a single Haiku model call that simultaneously assesses project state, makes dispatch decisions, AND acts. This caused two production failures: - **PR #138** merged despite active `REQUEST_CHANGES` — model reasoned past the check (#145) - **PR #140** merged autonomously without human approval (#144) The fix: **dispatch becomes a pure shell script with no model reasoning.** Every decision is a boolean API check + branch. The model only spawns a worker when the script's output says to. ## Full Pre-Code Plan # Pre-Code Plan: Dev-Loop Dispatch Redesign ## Problem The current dev-loop uses a single model call (Haiku) that simultaneously assesses project state, makes decisions, AND acts. This is a logic-model-action pipeline with no hard stops. Two production failures demonstrate the failure mode: 1. **PR #138** merged despite active `REQUEST_CHANGES` from `security-review-bot` — the model reasoned past the check instead of executing it as a binary shell command. 2. **PR #140** merged autonomously without human approval — the model (or a worker it spawned) called the merge API despite an explicit prohibition in the skill. The root cause is that **dispatch is entangled with judgment**. When Haiku "reasons" about whether to merge, it can reach wrong conclusions. The only safe dispatch is a dispatch that cannot reason — one that executes boolean API checks and branches on their output. This plan redesigns dispatch to be a pure shell script with no model reasoning, and restricts models to workers that receive a precise situation description and do only the work they're given. ## Constraints - Merge is **never** called by any automated component — only Aaron merges - `REQUEST_CHANGES` from any reviewer **always** blocks — no exceptions, no reasoning past it - WIP mutex preserved — one active worker at a time per repo - Workers receive only the situation relevant to their task — no ambient project state - The dispatch script is version-controlled and auditable - The cron entry calls the script with `toolsAllow: ["exec"]` only — no model reasoning in dispatch - Must fix issues #144 and #145 - Must not break existing workers (self-review, CI fix, address-feedback, impl) ## Proposed Approach ### Architecture ``` Cron (agentTurn, toolsAllow: ["exec"]) → runs dispatch script → dispatch script: pure bash, API calls, branching Dispatch script → checks project state via curl API calls → branches to first matching condition → if worker needed: writes task file, calls sessions_spawn via exec → exits immediately after one action Workers (Sonnet, spawned by dispatch) → receive precise situation description → do one job (self-review, fix CI, address feedback, implement) → remove wip label, exit with NO_REPLY ``` ### What Changes #### 1. New: `workspace/scripts/dev-loop-dispatch.sh` A bash script with no model reasoning. Every decision is: ```bash RESULT=$(curl -s ...) if echo "$RESULT" | jq -e '<condition>' > /dev/null; then # take action exit 0 fi ``` **Full dispatch logic (in order):** ```bash #!/usr/bin/env bash # dev-loop-dispatch.sh — Pure shell dispatch. No model reasoning. # Called by cron via: exec("bash dev-loop-dispatch.sh <project>") set -euo pipefail PROJECT="${1:?Usage: dev-loop-dispatch.sh <project>}" CONFIG="memory/projects/${PROJECT}.yaml" # Load config (yq or python-yaml) TOKEN=$(cat "$(yq '.token_path' "$CONFIG")") API=$(yq '.api_base' "$CONFIG") REPO=$(yq '.repo' "$CONFIG") USER=$(yq '.user' "$CONFIG") WIP_ID=$(yq '.labels.wip' "$CONFIG") READY_ID=$(yq '.labels.ready' "$CONFIG") NODE=$(yq '.node' "$CONFIG") REVIEW_BOTS=$(yq '.review_bots[]' "$CONFIG") # newline-separated # --- Pre-check 1: WIP lock --- # Fetch open PRs with wip label WIP_PR=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls?state=open&labels=$WIP_ID" | \ jq '[.[] | select(.user.login == env.USER)] | first') if [ "$WIP_PR" != "null" ] && [ -n "$WIP_PR" ]; then PR_NUM=$(echo "$WIP_PR" | jq -r '.number') # Check wip label age via timeline events LAST_WIP=$(bash scripts/get-wip-timestamp.sh "$API" "$TOKEN" "$PR_NUM") NOW=$(date +%s) WIP_TS=$(date -d "$LAST_WIP" +%s 2>/dev/null || date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LAST_WIP" +%s) AGE=$(( NOW - WIP_TS )) if [ "$AGE" -lt 3600 ]; then echo "WIP active on PR #$PR_NUM (${AGE}s old) — exiting" exit 0 fi # Stale WIP — remove label curl -s -X DELETE -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues/$PR_NUM/labels/$WIP_ID" > /dev/null fi # --- Fetch all open PRs from bot user --- PRS=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls?state=open&sort=oldest" | \ jq --arg u "$USER" '[.[] | select(.user.login == $u)]') PR_COUNT=$(echo "$PRS" | jq 'length') # --- Priority checks over each open PR --- if [ "$PR_COUNT" -gt 0 ]; then # Iterate over PRs (process first matching) for PR_JSON in $(echo "$PRS" | jq -c '.[]'); do PR_NUM=$(echo "$PR_JSON" | jq -r '.number') HEAD_SHA=$(echo "$PR_JSON" | jq -r '.head.sha') HEAD_SHORT=$(echo "$HEAD_SHA" | cut -c1-8) MERGEABLE=$(echo "$PR_JSON" | jq -r '.mergeable') # Check 1: REQUEST_CHANGES from any reviewer (ALWAYS BLOCKS) RC_COUNT=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls/$PR_NUM/reviews" | \ jq '[.[] | select(.state == "REQUEST_CHANGES")] | length') if [ "$RC_COUNT" -gt 0 ]; then echo "PR #$PR_NUM blocked: $RC_COUNT REQUEST_CHANGES active — spawning findings worker" bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID" bash scripts/spawn-worker.sh findings "$PROJECT" "$PR_NUM" "$HEAD_SHA" exit 0 fi # Check 2: Merge conflicts if [ "$MERGEABLE" = "false" ]; then echo "PR #$PR_NUM has merge conflicts — spawning rebase worker" bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID" bash scripts/spawn-worker.sh rebase "$PROJECT" "$PR_NUM" "$HEAD_SHA" exit 0 fi # Check 3: CI status CI_STATE=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/commits/$HEAD_SHA/status" | jq -r '.state') if [ "$CI_STATE" != "success" ]; then # Check for existing fix plan HAS_PLAN=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues/$PR_NUM/comments" | \ jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0') if [ "$HAS_PLAN" = "true" ]; then echo "PR #$PR_NUM CI failing, fix plan exists for $HEAD_SHORT — blocked" exit 0 fi echo "PR #$PR_NUM CI failing ($CI_STATE) — spawning CI fix worker" bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID" bash scripts/spawn-worker.sh ci-fix "$PROJECT" "$PR_NUM" "$HEAD_SHA" exit 0 fi # Check 4: Bot reviews present (all must have posted) MISSING_BOT="" for BOT in $REVIEW_BOTS; do HAS_REVIEW=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls/$PR_NUM/reviews" | \ jq --arg b "$BOT" '[.[] | select(.body | contains("<!-- review-bot:\($b) -->"))] | length > 0') if [ "$HAS_REVIEW" = "false" ]; then MISSING_BOT="$BOT" break fi done if [ -n "$MISSING_BOT" ]; then echo "PR #$PR_NUM waiting on bot review from $MISSING_BOT — exiting" exit 0 fi # Check 5: Self-review present and current SR_OK=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues/$PR_NUM/comments" | \ jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | contains("Self-review against \($sha)")) and (.body | contains("Assessment: ✅ Clean")))] | length > 0') if [ "$SR_OK" = "false" ]; then # Check if needs-attention self-review exists (has findings to fix) SR_NEEDS=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues/$PR_NUM/comments" | \ jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | contains("Self-review against \($sha)")) and (.body | contains("Assessment: ⚠️")))] | length > 0') if [ "$SR_NEEDS" = "true" ]; then HAS_PLAN=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues/$PR_NUM/comments" | \ jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0') if [ "$HAS_PLAN" = "true" ]; then echo "PR #$PR_NUM self-review has findings, fix plan exists — worker likely running" exit 0 fi echo "PR #$PR_NUM self-review has findings — spawning fix worker" bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID" bash scripts/spawn-worker.sh sr-fix "$PROJECT" "$PR_NUM" "$HEAD_SHA" exit 0 fi echo "PR #$PR_NUM needs self-review — spawning self-review worker" bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID" bash scripts/spawn-worker.sh self-review "$PROJECT" "$PR_NUM" "$HEAD_SHA" exit 0 fi # Check 6: Unresolved review findings UNRESOLVED=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls/$PR_NUM/comments" | \ jq '[.[] | select(.resolver == null and .in_reply_to_id == null)] | length') if [ "$UNRESOLVED" -gt 0 ]; then HAS_PLAN=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues/$PR_NUM/comments" | \ jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0') if [ "$HAS_PLAN" = "true" ]; then echo "PR #$PR_NUM has $UNRESOLVED unresolved findings, fix plan exists — worker likely running" exit 0 fi echo "PR #$PR_NUM has $UNRESOLVED unresolved findings — spawning feedback worker" bash scripts/acquire-wip.sh "$API" "$TOKEN" "$REPO" "$PR_NUM" "$WIP_ID" bash scripts/spawn-worker.sh address-feedback "$PROJECT" "$PR_NUM" "$HEAD_SHA" exit 0 fi # Check 7: All clear — hand off (ONLY if all above pass) # Verify bot reviews are current (against HEAD_SHORT) ALL_CURRENT=true for BOT in $REVIEW_BOTS; do CURRENT=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls/$PR_NUM/reviews" | \ jq --arg b "$BOT" --arg sha "$HEAD_SHORT" \ '[.[] | select(.body | contains("<!-- review-bot:\($b) -->")) | select(.body | contains("Evaluated against \($sha)"))] | length > 0') if [ "$CURRENT" = "false" ]; then ALL_CURRENT=false break fi done if [ "$ALL_CURRENT" = "false" ]; then echo "PR #$PR_NUM bot review is stale — exiting" exit 0 fi echo "PR #$PR_NUM all checks pass — applying ready label and assigning to aweiker" curl -s -X POST -H "Authorization: token $TOKEN" \ -H "Content-Type: application/json" \ -d "{\"labels\": [$READY_ID]}" \ "$API/repos/$REPO/issues/$PR_NUM/labels" > /dev/null curl -s -X PATCH -H "Authorization: token $TOKEN" \ -H "Content-Type: application/json" \ -d '{"assignees": ["aweiker"]}' \ "$API/repos/$REPO/pulls/$PR_NUM" > /dev/null echo "🔀 PR #$PR_NUM ready for review" exit 0 done fi # --- No open PRs — pick next issue --- NEXT_ISSUE=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues?state=open&type=issues&limit=50" | \ jq --arg u "$USER" \ '[.[] | select(.assignee == null)] | (map(select([.labels[].name] | contains(["bug"]))) + map(select([.labels[].name] | contains(["bug"]) | not))) | first') if [ "$NEXT_ISSUE" = "null" ] || [ -z "$NEXT_ISSUE" ]; then echo "No issues to work on — exiting" exit 0 fi ISSUE_NUM=$(echo "$NEXT_ISSUE" | jq -r '.number') ISSUE_TITLE=$(echo "$NEXT_ISSUE" | jq -r '.title') echo "Starting work on issue #$ISSUE_NUM: $ISSUE_TITLE" bash scripts/spawn-worker.sh impl "$PROJECT" "$ISSUE_NUM" exit 0 ``` **Key safety properties:** - `REQUEST_CHANGES` check is first — before CI, before self-review, before anything - No merge calls exist anywhere in the script - Every branch exits after one action - `set -euo pipefail` — any curl failure aborts (no silent partial state) #### 2. New: `workspace/scripts/spawn-worker.sh` Wraps `sessions_spawn` via a small Python/node script (since bash can't call the OpenClaw API directly). The dispatcher writes the worker task to a temp file, then calls `openclaw spawn` or uses the sessions API. **Alternative:** The dispatcher script exits with a structured output, and the cron's agentTurn model reads it and spawns the worker. This keeps the script pure bash and moves the one model call to spawn-time. The cron prompt becomes: ``` Run: bash ~/.openclaw/workspace/scripts/dev-loop-dispatch.sh <project> If output contains "spawning <type> worker for PR #<N>", spawn a worker: - Type: <type> - PR: #<N> - Task: [read from workspace/scripts/worker-tasks/<type>.md, substitute PR number and SHA] ``` This is cleaner: the shell script does all the API work, the model only does the one thing it can't avoid — spawning a session. The model sees only the spawn instruction, not the raw project state. #### 3. Updated: `workspace/skills/dev-loop/SKILL.md` Replace the skill with documentation pointing to the script: ```markdown # Dev Loop (Dispatch) ## Architecture Dispatch is a pure shell script. No model reasoning. Script: `workspace/scripts/dev-loop-dispatch.sh <project>` The cron runs this script. If a worker is needed, the script outputs a structured spawn instruction. The cron model reads the output and spawns the worker. The model NEVER: - Assesses project state - Makes dispatch decisions - Calls merge API The script ALWAYS: - Checks REQUEST_CHANGES before anything else - Exits after one action - Uses hard API checks, not inference ## Workers Workers are spawned by the cron model based on script output. Worker task templates: - `workspace/scripts/worker-tasks/self-review.md` - `workspace/scripts/worker-tasks/ci-fix.md` - `workspace/scripts/worker-tasks/address-feedback.md` - `workspace/scripts/worker-tasks/findings.md` - `workspace/scripts/worker-tasks/rebase.md` - `workspace/scripts/worker-tasks/impl.md` Each template is substituted with PR number, HEAD SHA, and project config values. ## Safety Invariants 1. **NEVER MERGE** — no merge API calls anywhere 2. **REQUEST_CHANGES always blocks** — checked first, before any other condition 3. **WIP mutex** — one worker per repo at a time 4. **One action per run** — script exits after first matching condition 5. **Workers exit with NO_REPLY** — no announcements, no handoff ``` #### 4. New: `workspace/scripts/worker-tasks/` directory Each worker type gets a markdown template. Variables are substituted by the cron model when spawning. Templates are identical to the current skill's worker task templates, extracted into files for single-source maintenance. #### 5. Updated: cron config Current cron entry (approximate): ```yaml - label: gargoyle-dev-loop schedule: "*/15 * * * *" prompt: "dev-loop gargoyle" model: haiku toolsAllow: [exec, sessions_spawn, read, memory_get] ``` New cron entry: ```yaml - label: gargoyle-dev-loop schedule: "*/15 * * * *" prompt: | Run: bash ~/.openclaw/workspace/scripts/dev-loop-dispatch.sh gargoyle Read the output. If it contains a spawn instruction, spawn the worker using the matching template in workspace/scripts/worker-tasks/. Substitute PR number, HEAD SHA, and project values from the output. If no spawn instruction in output, exit with NO_REPLY. model: haiku toolsAllow: [exec, sessions_spawn] ``` The cron model's job is now: run script, read output, spawn if instructed. That's it. ## State/Data Model The dispatch logic is stateless per run. State lives entirely in the Gitea API: - PR labels (wip, ready) - PR reviews (REQUEST_CHANGES state, bot sentinels) - PR CI status - Issue comments (self-review markers, fix plans) - Issue assignees No state is carried between cron runs. ## Error Cases | Error | Handling | |-------|----------| | `curl` returns non-200 | `set -euo pipefail` causes script to abort — no partial actions | | `jq` parse error | Script aborts — better to skip a run than take wrong action | | Worker crashed (wip orphaned) | Pre-check 1: stale wip removed after 1hr, next run proceeds | | Bot review missing | Exit — CI will trigger it, next run re-evaluates | | Race: two crons fire simultaneously | WIP mutex prevents double-dispatch | | Script bug causes false REQUEST_CHANGES | Worker receives detailed situation, can surface to Aaron | | `sessions_spawn` fails | Worker not spawned, wip label orphaned → cleaned up in 1hr | ## Edge Cases - **PR assigned to non-bot user:** Only PRs from `USER` (bot) are checked — human PRs are ignored. - **Multiple open PRs:** Script checks PRs in oldest-first order. First matching PR wins. - **REQUEST_CHANGES from a human reviewer, not just bots:** The check is on review state, not reviewer identity — human `REQUEST_CHANGES` also blocks. This is correct behavior. - **REQUEST_CHANGES that was later superseded by APPROVED from same reviewer:** Gitea API returns all reviews. Check is `state == REQUEST_CHANGES` on any review. If reviewer re-approved, their latest review is APPROVED, not REQUEST_CHANGES. Need to check latest review per reviewer, not any review. *(Open question — see below.)* - **Bot sentinel in review body but state is REQUEST_CHANGES:** This can happen if a bot re-reviews with changes. Sentinel presence ≠ approval. State check must be separate from sentinel check. - **WIP label applied via Gitea UI by Aaron:** Script sees it as WIP, waits 1hr before clearing. May cause unexpected delays. Low risk. ## Testing Strategy ### Unit tests for dispatch logic Extract each check into a small bash function that can be tested with mock curl responses. Use `bats` (Bash Automated Testing System): ```bash # test/dispatch.bats @test "REQUEST_CHANGES blocks before all other checks" { # Mock curl to return a PR with REQUEST_CHANGES # Assert: worker spawned, script exits 0 # Assert: no ready label applied } @test "WIP age < 1hr exits without action" { ... } @test "WIP age > 1hr removes label and continues" { ... } @test "CI failure with existing plan does not respawn" { ... } @test "All green with current reviews triggers handoff" { ... } @test "REQUEST_CHANGES on PR2 processed before clean PR1" { ... } ``` ### Integration smoke test After deploying, run the script against a real project in dry-run mode: ```bash DRY_RUN=1 bash dev-loop-dispatch.sh review-bot ``` Dry-run mode prints what it *would* do without calling spawn or making label mutations. ### Manual verification of #144 and #145 fixes 1. Create a test PR with an active `REQUEST_CHANGES` review 2. Trigger the dispatch script 3. Verify: no merge called, findings worker spawned 4. Verify: WIP label present on PR during worker run 5. Verify: after worker completes, WIP label removed ### Regression test for autonomous merge 1. Trace through script for a PR in "all green" state 2. Confirm: only `ready` label + `assignees` PATCH calls are made 3. Confirm: no `POST /repos/.../pulls/.../merge` call anywhere in script or worker templates ## Open Questions 1. **REQUEST_CHANGES latest-only vs any:** If a reviewer posts `REQUEST_CHANGES`, then later posts `APPROVED`, does the current skill's check (any review with `REQUEST_CHANGES` state) correctly reflect the resolved state? The Gitea API returns individual reviews, not a per-reviewer summary. We should check the **latest** review per reviewer. Current SKILL.md doesn't specify this; the script should implement latest-per-reviewer logic explicitly. 2. **Spawn mechanism:** The dispatch script can't call `sessions_spawn` directly from bash. Two options: - (A) Script outputs structured spawn instruction, cron model reads and spawns - (B) Script calls a small helper (`openclaw sessions spawn <task-file>`) if such a CLI exists Option A is cleaner for auditability — the spawn instruction is visible in exec output. Preference? 3. **Worker task templates:** Should worker task templates live in `workspace/scripts/worker-tasks/` (workspace, not version-controlled in a repo) or in the `review-bot` or a dedicated `ops-gateway` repo? The workspace location keeps them co-located with the dispatch script; a repo gives them a change history independent of the workspace. 4. **Cron toolsAllow for review-bot vs gargoyle:** The new cron needs `exec` and `sessions_spawn`. Currently the skill allows more tools (`read`, `memory_get`). Should we tighten all dev-loop crons or only new ones? 5. **Dry-run mode:** Is `DRY_RUN=1` flag worth implementing for the script, or is the test environment (staging project) sufficient for verification? ## Completion Checklist 1. Does the dispatch script contain zero calls to any merge API endpoint? 2. Is `REQUEST_CHANGES` check before WIP label age check, CI check, and handoff check? 3. Does the script exit after the first matching condition (one action per run)? 4. Does the cron's `toolsAllow` exclude `read` and `memory_get` (no ambient project knowledge)? 5. Are worker task templates extracted to files so the cron prompt doesn't carry policy inline? 6. Is the WIP stale-lock cleanup preserved (>1hr removes label and continues)? 7. Does the `set -euo pipefail` ensure no partial actions on curl failures? 8. Is there a dry-run mode or equivalent verification path before first production cron run? 9. Does the script use latest-per-reviewer review state, not any historical state, for `REQUEST_CHANGES` check? 10. Are issues #144 and #145 explicitly verifiable via the test strategy above? ## Fixes for Issues #144 and #145 **Issue #144** (autonomous merge): Eliminated entirely — the dispatch script contains no merge API calls and the cron's `toolsAllow` does not include any tool that could perform a merge. Workers are spawned with explicit exit instructions that prohibit merge calls. **Issue #145** (merged despite REQUEST_CHANGES): Eliminated by structural change — `REQUEST_CHANGES` check is the **first** check in the dispatch script, before CI, before self-review, before handoff. It cannot be "reasoned past" because there is no reasoning in the dispatcher.
Author
Owner

Pre-Code Plan v2 (post-review)

This is the revised plan after running the parallel review panel.

Review Findings Addressed

Logic contradictions found and resolved:

  • toolsAllow: ["exec"] vs [exec, sessions_spawn] — contradiction resolved: cron needs both; "no model reasoning" means model only parses SPAWN_WORKER output, not raw project state
  • "REQUEST_CHANGES always blocks" language clarified to "always blocks handoff" — spawning a findings worker is the correct response
  • findings worker type was referenced but had no template — template now defined
  • Open Question #1 (latest-per-reviewer) was unresolved in checklist — now decided: use group_by(.user.login) | map(sort_by(.submitted_at) | last)
  • spawn-worker.sh can't call OpenClaw sessions API from bash — resolved via structured SPAWN_WORKER output + cron model parsing

Pattern check:

  • yq dependency on execution host (hardy) noted and added as pre-deploy verification step
  • Dry-run mode added as concrete implementation (not just open question)

Pre-Code Plan: Dev-Loop Dispatch Redesign (v2 — post-review)

Problem

The current dev-loop uses a single model call (Haiku) that simultaneously assesses project state, makes decisions, AND acts. This is a logic-model-action pipeline with no hard stops. Two production failures demonstrate the failure mode:

  1. PR #138 merged despite active REQUEST_CHANGES from security-review-bot — model reasoned past the check instead of executing it as a binary API query (#145).
  2. PR #140 merged autonomously without human approval — a worker called the merge API despite explicit prohibition (#144).

The root cause: dispatch is entangled with judgment. When a model "reasons" about whether to merge, it can reach wrong conclusions. The only safe dispatch is one that cannot reason — boolean API checks, deterministic branching.

Constraints

  • Merge is never called by any automated component — only Aaron merges
  • REQUEST_CHANGES from any reviewer always blocks handoff — no exceptions, no reasoning past it
  • WIP mutex preserved — one active worker at a time per repo
  • The dispatch shell script does all API checks; the cron model only receives a pre-formed spawn instruction
  • Workers receive only the situation relevant to their task — no ambient project state
  • The dispatch script is version-controlled and auditable
  • Must fix issues #144 and bug: dev-loop merged PR #138 despite active REQUEST_CHANGES from security-review-bot (#145)
  • Must not break existing workers
  • yq must be available on the execution host (hardy) — verify before implementation

Resolved Open Questions (from v1 review)

Q1: REQUEST_CHANGES latest-per-reviewer or any historical?
Use latest-per-reviewer. If a reviewer posted REQUEST_CHANGES then later APPROVED, the APPROVED takes precedence. The script will group reviews by reviewer, pick the latest per reviewer by submitted_at, and check if any reviewer's latest review has state == REQUEST_CHANGES. A historical REQUEST_CHANGES from a reviewer who has since approved must NOT block.

Q2: Spawn mechanism?
Use Option A: The script outputs a structured spawn instruction line like:

SPAWN_WORKER: type=self-review pr=42 sha=abc12345

The cron model (toolsAllow: [exec, sessions_spawn]) reads this line and spawns the appropriate worker. The cron prompt describes how to parse this line and which template to use. This keeps the shell script pure bash with no OpenClaw API calls.

Q3: Worker task templates location?
Live in ~/.openclaw/workspace/scripts/worker-tasks/ (workspace). This co-locates them with the dispatch script and keeps them out of application repos. The workspace itself is git-tracked via the ops-gateway or workspace repo.

Q4: Tighten toolsAllow on all dev-loop crons?
Yes — tighten all. Any cron running the new dispatch script needs only [exec, sessions_spawn]. Existing crons not yet migrated keep their current config until migrated.

Q5: Dry-run mode?
Implement DRY_RUN=1 flag: print all curl commands and spawn instructions without executing them. Required before first production run.

Proposed Approach

Architecture

Cron (agentTurn)
  model: haiku
  toolsAllow: [exec, sessions_spawn]
  prompt: run dispatch script; if output contains SPAWN_WORKER line, spawn that worker

Dispatch script (pure bash)
  → curl API checks
  → deterministic branching
  → if worker needed: print SPAWN_WORKER: type=X pr=N sha=S to stdout
  → exit 0

Cron model (minimal role)
  → reads script stdout
  → if SPAWN_WORKER present: reads worker-tasks/<type>.md, substitutes values, spawns
  → if no SPAWN_WORKER: exits with NO_REPLY

Workers (Sonnet, spawned by cron model)
  → receive precise situation description
  → do one job (self-review, fix CI, address feedback, implement)
  → remove wip label, exit with NO_REPLY

Dispatch Script Logic (in order)

All checks are bash functions using curl | jq. No model reasoning.

#!/usr/bin/env bash
# dev-loop-dispatch.sh — Pure shell dispatch. No model reasoning.
# Usage: bash dev-loop-dispatch.sh <project> [--dry-run]

set -euo pipefail
PROJECT="${1:?Usage: dev-loop-dispatch.sh <project>}"
DRY_RUN="${2:-}"
CONFIG="${WORKSPACE:-$HOME/.openclaw/workspace}/memory/projects/${PROJECT}.yaml"

# Load config via yq (must be installed)
TOKEN=$(cat "$(yq '.token_path' "$CONFIG")")
API=$(yq '.api_base' "$CONFIG")
REPO=$(yq '.repo' "$CONFIG")
USER=$(yq '.user' "$CONFIG")
WIP_ID=$(yq '.labels.wip' "$CONFIG")
READY_ID=$(yq '.labels.ready' "$CONFIG")
REVIEW_BOTS=$(yq '.review_bots[]' "$CONFIG")  # one per line

# Helper: emit spawn instruction (dry-run prints only)
spawn_worker() {
  local TYPE="$1" PR="$2" SHA="${3:-none}"
  if [ -n "$DRY_RUN" ]; then
    echo "DRY_RUN: SPAWN_WORKER: type=$TYPE pr=$PR sha=$SHA"
  else
    echo "SPAWN_WORKER: type=$TYPE pr=$PR sha=$SHA"
  fi
  exit 0
}

# Helper: make label mutation (dry-run prints only)
api_mutate() {
  local METHOD="$1" URL="$2" DATA="${3:-}"
  if [ -n "$DRY_RUN" ]; then
    echo "DRY_RUN: $METHOD $URL ${DATA:+data=$DATA}"
    return
  fi
  if [ -n "$DATA" ]; then
    curl -s -X "$METHOD" -H "Authorization: token $TOKEN" \
      -H "Content-Type: application/json" -d "$DATA" "$URL" > /dev/null
  else
    curl -s -X "$METHOD" -H "Authorization: token $TOKEN" "$URL" > /dev/null
  fi
}

# --- Pre-check 1: WIP lock ---
WIP_PRS=$(curl -s -H "Authorization: token $TOKEN" \
  "$API/repos/$REPO/pulls?state=open" | \
  jq --arg u "$USER" --arg wid "$WIP_ID" \
  '[.[] | select(.user.login == $u and ([.labels[].id | tostring] | contains([$wid])))]')

WIP_PR=$(echo "$WIP_PRS" | jq 'first // empty')
if [ -n "$WIP_PR" ]; then
  PR_NUM=$(echo "$WIP_PR" | jq -r '.number')
  # Get wip label application time via timeline (paginated)
  LAST_WIP=$(bash "$(dirname "$0")/get-wip-timestamp.sh" "$API" "$TOKEN" "$PR_NUM" "$WIP_ID")
  NOW=$(date +%s)
  WIP_TS=$(date -d "$LAST_WIP" +%s 2>/dev/null || python3 -c "import datetime,sys; print(int(datetime.datetime.fromisoformat(sys.argv[1].replace('Z','+00:00')).timestamp()))" "$LAST_WIP")
  AGE=$(( NOW - WIP_TS ))
  if [ "$AGE" -lt 3600 ]; then
    echo "WIP active on PR #$PR_NUM (${AGE}s old) — exiting"
    exit 0
  fi
  # Stale WIP — remove label
  api_mutate DELETE "$API/repos/$REPO/issues/$PR_NUM/labels/$WIP_ID"
  echo "Removed stale WIP from PR #$PR_NUM (${AGE}s old)"
fi

# --- Fetch all open PRs from bot user ---
PRS=$(curl -s -H "Authorization: token $TOKEN" \
  "$API/repos/$REPO/pulls?state=open&sort=oldest" | \
  jq --arg u "$USER" '[.[] | select(.user.login == $u)]')
PR_COUNT=$(echo "$PRS" | jq 'length')

# --- Per-PR priority checks ---
if [ "$PR_COUNT" -gt 0 ]; then
  while IFS= read -r PR_JSON; do
    PR_NUM=$(echo "$PR_JSON" | jq -r '.number')
    HEAD_SHA=$(echo "$PR_JSON" | jq -r '.head.sha')
    HEAD_SHORT=$(echo "$HEAD_SHA" | cut -c1-8)
    MERGEABLE=$(echo "$PR_JSON" | jq -r '.mergeable')

    # ---- Check 1: REQUEST_CHANGES (latest per reviewer — always blocks handoff) ----
    REVIEWS=$(curl -s -H "Authorization: token $TOKEN" \
      "$API/repos/$REPO/pulls/$PR_NUM/reviews")
    # Group by reviewer, pick latest review per reviewer by submitted_at
    RC_ACTIVE=$(echo "$REVIEWS" | jq '
      group_by(.user.login) |
      map(sort_by(.submitted_at) | last) |
      [.[] | select(.state == "REQUEST_CHANGES")] | length')
    if [ "$RC_ACTIVE" -gt 0 ]; then
      echo "PR #$PR_NUM: $RC_ACTIVE active REQUEST_CHANGES (latest per reviewer)"
      api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}"
      spawn_worker findings "$PR_NUM" "$HEAD_SHA"
    fi

    # ---- Check 2: Merge conflicts ----
    if [ "$MERGEABLE" = "false" ]; then
      echo "PR #$PR_NUM: merge conflicts"
      api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}"
      spawn_worker rebase "$PR_NUM" "$HEAD_SHA"
    fi

    # ---- Check 3: CI status ----
    CI_STATE=$(curl -s -H "Authorization: token $TOKEN" \
      "$API/repos/$REPO/commits/$HEAD_SHA/status" | jq -r '.state')
    if [ "$CI_STATE" != "success" ]; then
      HAS_PLAN=$(curl -s -H "Authorization: token $TOKEN" \
        "$API/repos/$REPO/issues/$PR_NUM/comments" | \
        jq --arg sha "$HEAD_SHA" --arg u "$USER" \
        '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0')
      if [ "$HAS_PLAN" = "true" ]; then
        echo "PR #$PR_NUM: CI failing, fix plan exists — skipping"
        exit 0
      fi
      echo "PR #$PR_NUM: CI $CI_STATE — fix worker"
      api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}"
      spawn_worker ci-fix "$PR_NUM" "$HEAD_SHA"
    fi

    # ---- Check 4: Bot reviews present ----
    MISSING_BOT=""
    while IFS= read -r BOT; do
      HAS=$(echo "$REVIEWS" | jq --arg b "$BOT" \
        '[.[] | select(.body | contains("<!-- review-bot:\($b) -->"))] | length > 0')
      if [ "$HAS" = "false" ]; then
        MISSING_BOT="$BOT"
        break
      fi
    done <<< "$REVIEW_BOTS"
    if [ -n "$MISSING_BOT" ]; then
      echo "PR #$PR_NUM: waiting on $MISSING_BOT — exiting"
      exit 0
    fi

    # ---- Check 5: Self-review ----
    COMMENTS=$(curl -s -H "Authorization: token $TOKEN" \
      "$API/repos/$REPO/issues/$PR_NUM/comments")
    SR_CLEAN=$(echo "$COMMENTS" | jq --arg sha "$HEAD_SHA" --arg u "$USER" \
      '[.[] | select(.user.login == $u
        and (.body | contains("Self-review against \($sha)"))
        and (.body | contains("Assessment: ✅ Clean")))] | length > 0')
    if [ "$SR_CLEAN" = "false" ]; then
      SR_NEEDS=$(echo "$COMMENTS" | jq --arg sha "$HEAD_SHA" --arg u "$USER" \
        '[.[] | select(.user.login == $u
          and (.body | contains("Self-review against \($sha)"))
          and (.body | contains("Assessment: ⚠️")))] | length > 0')
      if [ "$SR_NEEDS" = "true" ]; then
        HAS_PLAN=$(echo "$COMMENTS" | jq --arg sha "$HEAD_SHA" --arg u "$USER" \
          '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0')
        [ "$HAS_PLAN" = "true" ] && { echo "PR #$PR_NUM: SR findings, fix plan exists — skipping"; exit 0; }
        api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}"
        spawn_worker sr-fix "$PR_NUM" "$HEAD_SHA"
      fi
      echo "PR #$PR_NUM: needs self-review"
      api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}"
      spawn_worker self-review "$PR_NUM" "$HEAD_SHA"
    fi

    # ---- Check 6: Unresolved inline review comments ----
    UNRESOLVED=$(curl -s -H "Authorization: token $TOKEN" \
      "$API/repos/$REPO/pulls/$PR_NUM/comments" | \
      jq '[.[] | select(.resolver == null and .in_reply_to_id == null)] | length')
    if [ "$UNRESOLVED" -gt 0 ]; then
      HAS_PLAN=$(echo "$COMMENTS" | jq --arg sha "$HEAD_SHA" --arg u "$USER" \
        '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0')
      [ "$HAS_PLAN" = "true" ] && { echo "PR #$PR_NUM: $UNRESOLVED unresolved, plan exists — skipping"; exit 0; }
      api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}"
      spawn_worker address-feedback "$PR_NUM" "$HEAD_SHA"
    fi

    # ---- Check 7: Bot reviews current (stale check) ----
    ALL_CURRENT=true
    while IFS= read -r BOT; do
      CURRENT=$(echo "$REVIEWS" | jq --arg b "$BOT" --arg sha "$HEAD_SHORT" \
        '[.[] | select(.body | contains("<!-- review-bot:\($b) -->"))
               | select(.body | contains("Evaluated against \($sha)"))] | length > 0')
      [ "$CURRENT" = "false" ] && { ALL_CURRENT=false; break; }
    done <<< "$REVIEW_BOTS"
    if [ "$ALL_CURRENT" = "false" ]; then
      echo "PR #$PR_NUM: bot review stale — exiting"
      exit 0
    fi

    # ---- All clear: hand off ----
    echo "PR #$PR_NUM: all checks pass — applying ready + assigning to aweiker"
    api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$READY_ID]}"
    api_mutate PATCH "$API/repos/$REPO/pulls/$PR_NUM" '{"assignees": ["aweiker"]}'
    echo "🔀 PR #$PR_NUM ready for review"
    exit 0

  done < <(echo "$PRS" | jq -c '.[]')
fi

# --- No open PRs — pick next issue ---
NEXT_ISSUE=$(curl -s -H "Authorization: token $TOKEN" \
  "$API/repos/$REPO/issues?state=open&type=issues&limit=50" | \
  jq '[.[] | select(.assignee == null)] |
      (map(select([.labels[].name] | contains(["bug"]))) +
       map(select([.labels[].name] | contains(["bug"]) | not))) |
      first // empty')

if [ -z "$NEXT_ISSUE" ]; then
  echo "No issues to work on — exiting"
  exit 0
fi

ISSUE_NUM=$(echo "$NEXT_ISSUE" | jq -r '.number')
echo "SPAWN_WORKER: type=impl issue=$ISSUE_NUM"
exit 0

What Changes

Component Change
workspace/scripts/dev-loop-dispatch.sh New — pure bash dispatch script
workspace/scripts/get-wip-timestamp.sh New — helper to get WIP label application time via timeline
workspace/scripts/worker-tasks/findings.md New — task template for REQUEST_CHANGES findings worker
workspace/scripts/worker-tasks/rebase.md New — task template for merge conflict rebase worker
workspace/scripts/worker-tasks/ci-fix.md Extract from skill → file
workspace/scripts/worker-tasks/self-review.md Extract from skill → file
workspace/scripts/worker-tasks/sr-fix.md Extract from skill → file
workspace/scripts/worker-tasks/address-feedback.md Extract from skill → file
workspace/scripts/worker-tasks/impl.md Extract from skill → file
workspace/skills/dev-loop/SKILL.md Replace body — point to script, document architecture
Cron config (gargoyle-dev-loop, review-bot-dev-loop) Update prompt + toolsAllow

Cron Config (new)

- label: gargoyle-dev-loop
  schedule: "*/15 * * * *"
  model: haiku
  toolsAllow: [exec, sessions_spawn]
  prompt: |
    Run this command:
      bash ~/.openclaw/workspace/scripts/dev-loop-dispatch.sh gargoyle

    Read the output.
    - If it contains a line starting with "SPAWN_WORKER:", parse it:
        type=<worker-type>  pr=<number>  sha=<sha>  (or issue=<number> for impl)
    - Read the matching template from:
        ~/.openclaw/workspace/scripts/worker-tasks/<type>.md
    - Substitute {{PR}}, {{SHA}}, {{PROJECT}} with the parsed values
    - Spawn the worker using sessions_spawn
    - Exit with: NO_REPLY

    - If no "SPAWN_WORKER:" line in output: exit with NO_REPLY.
    - If output contains "🔀": exit with that line only (handoff notification).
    
    DO NOT interpret the output beyond these rules.
    DO NOT read project config files.
    DO NOT make any other decisions.

New findings Worker Task Template

This is a missing piece identified in review. The findings worker handles PRs with active REQUEST_CHANGES:

# Worker: findings (REQUEST_CHANGES present)

Work on {{PROJECT}} PR #{{PR}}: address REQUEST_CHANGES findings.

Project config: memory/projects/{{PROJECT}}.yaml
Token path: <from config>
API base: <from config>
Repo: <from config>
Node: <from config>

## Step 1: Fetch all REQUEST_CHANGES reviews (latest per reviewer)

TOKEN=$(cat <token_path>)
curl -s -H "Authorization: token $TOKEN" \
  "<api>/repos/<repo>/pulls/{{PR}}/reviews" | \
  jq 'group_by(.user.login) | map(sort_by(.submitted_at) | last) |
      [.[] | select(.state == "REQUEST_CHANGES")]'

## Step 2: Fix Planning

1. Read each REQUEST_CHANGES review body for findings
2. Read ~/.openclaw/workspace/skills/pre-code/SKILL.md
3. Write a fix plan addressing each finding
4. Post the plan as a comment on PR #{{PR}}:
   - Must start with: "## Fix Plan against {{SHA}}:"
5. DO NOT write code until plan comment is confirmed posted

## Step 3: Implement (same as address-feedback worker)
[... follow address-feedback template from here ...]

## Exit
1. Remove WIP label
2. Exit with: NO_REPLY
3. DO NOT merge. DO NOT apply ready label. DO NOT assign to aweiker.

State/Data Model

The dispatch logic is stateless per run. State lives entirely in the Gitea API:

  • PR labels (wip, ready)
  • PR reviews (REQUEST_CHANGES state per reviewer — latest only, bot sentinels, SHA evaluation markers)
  • PR CI status
  • Issue comments (self-review markers by SHA, fix plan comments by SHA)
  • Issue assignees

No state file is written between runs.

Error Cases

Error Handling
curl non-200 set -euo pipefail aborts — no partial action
jq parse error Script aborts — safe
yq not installed Script aborts at config load — must verify before deploy
Worker crashed (wip orphaned) 1hr stale cleanup picks it up
Two crons fire simultaneously WIP mutex (first sets label, second sees it and exits)
sessions_spawn fails in cron model WIP label set but no worker — 1hr cleanup resolves
Cron model misparses SPAWN_WORKER line Worker not spawned — no mutation, next run will retry

Note: WIP label being set before sessions_spawn creates a small window where WIP is set but spawn fails. This is intentional — it's safer to have a stale WIP (cleaned up in 1hr) than to have two workers running. The alternative (set WIP after spawn succeeds) creates a race where two cron runs could both spawn workers before either sets the WIP label.

Edge Cases

  • Multiple open PRs: Script checks in oldest-first order. The first PR to match a non-CI-waiting condition is acted on. CI-waiting (missing bot review) causes immediate exit for that run — subsequent PRs not checked this run. Decision: this is acceptable; each run handles one thing.
  • REQUEST_CHANGES from reviewer who later APPROVED: Latest-per-reviewer logic handles this — APPROVED supersedes the old REQUEST_CHANGES. No false block.
  • REQUEST_CHANGES AND unresolved inline comments: Check 1 fires first (RC) → findings worker handles both review-level and inline findings.
  • Bot sentinel in REQUEST_CHANGES review body: A bot posts RC with its sentinel. Sentinel check (Check 4) and RC check (Check 1) are independent. RC check uses .state == "REQUEST_CHANGES" on latest review, not sentinel. They don't interfere.
  • WIP label applied manually by Aaron: Script sees it as WIP, waits 1hr. Aaron can remove it manually if needed.
  • Cron fires while handoff (ready label + assign) is in flight: Two api_mutate calls; if the second (assign) fails, ready label is set but PR not assigned. Next run hits Check 7 (all clear) again — applies ready label again (idempotent) and assigns. Safe.

Testing Strategy

Pre-deploy: yq availability check

ssh ubuntu@hardy "yq --version"

Unit tests (bats)

@test "REQUEST_CHANGES blocks — spawns findings worker" { ... }
@test "REQUEST_CHANGES latest-per-reviewer: superseded by APPROVED does not block" { ... }
@test "WIP age < 1hr exits" { ... }
@test "WIP age > 1hr removes label and continues" { ... }
@test "CI failing with existing plan exits without spawn" { ... }
@test "All checks pass → handoff, no merge" { ... }
@test "DRY_RUN=1 prints all actions without executing" { ... }

Dry-run smoke test

DRY_RUN=1 bash dev-loop-dispatch.sh review-bot
# Verify output shows correct SPAWN_WORKER or exit reason

Manual regression for #144 and #145

  1. Create test PR with REQUEST_CHANGES review
  2. DRY_RUN=1: verify output shows SPAWN_WORKER: type=findings
  3. Grep entire workspace for POST .*/merge: should find zero results
  4. Grep all worker templates for merge API: should find zero results

Completion Checklist

  1. Does dispatch script contain zero calls to any merge API endpoint?
  2. Is REQUEST_CHANGES check (Check 1) before WIP cleanup, CI check, and handoff?
  3. Does script exit after first matching condition?
  4. Does cron toolsAllow contain only [exec, sessions_spawn]?
  5. Are all worker task templates in workspace/scripts/worker-tasks/ files?
  6. Is findings worker template defined (gap from v1)?
  7. WIP stale-lock cleanup preserved (1hr threshold, timeline events)?
  8. set -euo pipefail present?
  9. DRY_RUN=1 flag implemented?
  10. Latest-per-reviewer REQUEST_CHANGES logic (jq group_by + sort_by(.submitted_at) | last)?
  11. yq availability verified before deploy?
  12. Issues #144 and #145 verifiable via test strategy?

Fixes for Issues #144 and #145

Issue #144 (autonomous merge): Eliminated structurally. The dispatch script has zero merge API calls. The cron prompt explicitly prohibits interpretation beyond SPAWN_WORKER lines. Workers exit with NO_REPLY and are prohibited from merging in their templates. No code path from cron → script → worker reaches a merge endpoint.

Issue #145 (merged despite REQUEST_CHANGES): Eliminated structurally. The dispatch script checks REQUEST_CHANGES as Check 1 in the PR loop — before CI, self-review, and handoff. The check uses group_by(.user.login) | map(sort_by(.submitted_at) | last) | [.[] | select(.state == "REQUEST_CHANGES")] — latest review per reviewer, hard boolean. Cannot be reasoned past because no model executes this check.

## Pre-Code Plan v2 (post-review) This is the revised plan after running the parallel review panel. ### Review Findings Addressed **Logic contradictions found and resolved:** - `toolsAllow: ["exec"]` vs `[exec, sessions_spawn]` — contradiction resolved: cron needs both; "no model reasoning" means model only parses SPAWN_WORKER output, not raw project state - "REQUEST_CHANGES always blocks" language clarified to "always blocks **handoff**" — spawning a findings worker is the correct response - `findings` worker type was referenced but had no template — template now defined - Open Question #1 (latest-per-reviewer) was unresolved in checklist — now decided: use `group_by(.user.login) | map(sort_by(.submitted_at) | last)` - `spawn-worker.sh` can't call OpenClaw sessions API from bash — resolved via structured SPAWN_WORKER output + cron model parsing **Pattern check:** - `yq` dependency on execution host (hardy) noted and added as pre-deploy verification step - Dry-run mode added as concrete implementation (not just open question) --- # Pre-Code Plan: Dev-Loop Dispatch Redesign (v2 — post-review) ## Problem The current dev-loop uses a single model call (Haiku) that simultaneously assesses project state, makes decisions, AND acts. This is a logic-model-action pipeline with no hard stops. Two production failures demonstrate the failure mode: 1. **PR #138** merged despite active `REQUEST_CHANGES` from `security-review-bot` — model reasoned past the check instead of executing it as a binary API query (#145). 2. **PR #140** merged autonomously without human approval — a worker called the merge API despite explicit prohibition (#144). The root cause: **dispatch is entangled with judgment**. When a model "reasons" about whether to merge, it can reach wrong conclusions. The only safe dispatch is one that cannot reason — boolean API checks, deterministic branching. ## Constraints - Merge is **never** called by any automated component — only Aaron merges - `REQUEST_CHANGES` from any reviewer **always blocks handoff** — no exceptions, no reasoning past it - WIP mutex preserved — one active worker at a time per repo - The dispatch shell script does all API checks; the cron model only receives a pre-formed spawn instruction - Workers receive only the situation relevant to their task — no ambient project state - The dispatch script is version-controlled and auditable - Must fix issues #144 and #145 - Must not break existing workers - `yq` must be available on the execution host (hardy) — verify before implementation ## Resolved Open Questions (from v1 review) **Q1: REQUEST_CHANGES latest-per-reviewer or any historical?** Use **latest-per-reviewer**. If a reviewer posted REQUEST_CHANGES then later APPROVED, the APPROVED takes precedence. The script will group reviews by reviewer, pick the latest per reviewer by `submitted_at`, and check if any reviewer's latest review has `state == REQUEST_CHANGES`. A historical REQUEST_CHANGES from a reviewer who has since approved must NOT block. **Q2: Spawn mechanism?** Use **Option A**: The script outputs a structured spawn instruction line like: ``` SPAWN_WORKER: type=self-review pr=42 sha=abc12345 ``` The cron model (`toolsAllow: [exec, sessions_spawn]`) reads this line and spawns the appropriate worker. The cron prompt describes how to parse this line and which template to use. This keeps the shell script pure bash with no OpenClaw API calls. **Q3: Worker task templates location?** Live in `~/.openclaw/workspace/scripts/worker-tasks/` (workspace). This co-locates them with the dispatch script and keeps them out of application repos. The workspace itself is git-tracked via the ops-gateway or workspace repo. **Q4: Tighten toolsAllow on all dev-loop crons?** Yes — tighten all. Any cron running the new dispatch script needs only `[exec, sessions_spawn]`. Existing crons not yet migrated keep their current config until migrated. **Q5: Dry-run mode?** Implement `DRY_RUN=1` flag: print all curl commands and spawn instructions without executing them. Required before first production run. ## Proposed Approach ### Architecture ``` Cron (agentTurn) model: haiku toolsAllow: [exec, sessions_spawn] prompt: run dispatch script; if output contains SPAWN_WORKER line, spawn that worker Dispatch script (pure bash) → curl API checks → deterministic branching → if worker needed: print SPAWN_WORKER: type=X pr=N sha=S to stdout → exit 0 Cron model (minimal role) → reads script stdout → if SPAWN_WORKER present: reads worker-tasks/<type>.md, substitutes values, spawns → if no SPAWN_WORKER: exits with NO_REPLY Workers (Sonnet, spawned by cron model) → receive precise situation description → do one job (self-review, fix CI, address feedback, implement) → remove wip label, exit with NO_REPLY ``` ### Dispatch Script Logic (in order) All checks are bash functions using `curl | jq`. No model reasoning. ```bash #!/usr/bin/env bash # dev-loop-dispatch.sh — Pure shell dispatch. No model reasoning. # Usage: bash dev-loop-dispatch.sh <project> [--dry-run] set -euo pipefail PROJECT="${1:?Usage: dev-loop-dispatch.sh <project>}" DRY_RUN="${2:-}" CONFIG="${WORKSPACE:-$HOME/.openclaw/workspace}/memory/projects/${PROJECT}.yaml" # Load config via yq (must be installed) TOKEN=$(cat "$(yq '.token_path' "$CONFIG")") API=$(yq '.api_base' "$CONFIG") REPO=$(yq '.repo' "$CONFIG") USER=$(yq '.user' "$CONFIG") WIP_ID=$(yq '.labels.wip' "$CONFIG") READY_ID=$(yq '.labels.ready' "$CONFIG") REVIEW_BOTS=$(yq '.review_bots[]' "$CONFIG") # one per line # Helper: emit spawn instruction (dry-run prints only) spawn_worker() { local TYPE="$1" PR="$2" SHA="${3:-none}" if [ -n "$DRY_RUN" ]; then echo "DRY_RUN: SPAWN_WORKER: type=$TYPE pr=$PR sha=$SHA" else echo "SPAWN_WORKER: type=$TYPE pr=$PR sha=$SHA" fi exit 0 } # Helper: make label mutation (dry-run prints only) api_mutate() { local METHOD="$1" URL="$2" DATA="${3:-}" if [ -n "$DRY_RUN" ]; then echo "DRY_RUN: $METHOD $URL ${DATA:+data=$DATA}" return fi if [ -n "$DATA" ]; then curl -s -X "$METHOD" -H "Authorization: token $TOKEN" \ -H "Content-Type: application/json" -d "$DATA" "$URL" > /dev/null else curl -s -X "$METHOD" -H "Authorization: token $TOKEN" "$URL" > /dev/null fi } # --- Pre-check 1: WIP lock --- WIP_PRS=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls?state=open" | \ jq --arg u "$USER" --arg wid "$WIP_ID" \ '[.[] | select(.user.login == $u and ([.labels[].id | tostring] | contains([$wid])))]') WIP_PR=$(echo "$WIP_PRS" | jq 'first // empty') if [ -n "$WIP_PR" ]; then PR_NUM=$(echo "$WIP_PR" | jq -r '.number') # Get wip label application time via timeline (paginated) LAST_WIP=$(bash "$(dirname "$0")/get-wip-timestamp.sh" "$API" "$TOKEN" "$PR_NUM" "$WIP_ID") NOW=$(date +%s) WIP_TS=$(date -d "$LAST_WIP" +%s 2>/dev/null || python3 -c "import datetime,sys; print(int(datetime.datetime.fromisoformat(sys.argv[1].replace('Z','+00:00')).timestamp()))" "$LAST_WIP") AGE=$(( NOW - WIP_TS )) if [ "$AGE" -lt 3600 ]; then echo "WIP active on PR #$PR_NUM (${AGE}s old) — exiting" exit 0 fi # Stale WIP — remove label api_mutate DELETE "$API/repos/$REPO/issues/$PR_NUM/labels/$WIP_ID" echo "Removed stale WIP from PR #$PR_NUM (${AGE}s old)" fi # --- Fetch all open PRs from bot user --- PRS=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls?state=open&sort=oldest" | \ jq --arg u "$USER" '[.[] | select(.user.login == $u)]') PR_COUNT=$(echo "$PRS" | jq 'length') # --- Per-PR priority checks --- if [ "$PR_COUNT" -gt 0 ]; then while IFS= read -r PR_JSON; do PR_NUM=$(echo "$PR_JSON" | jq -r '.number') HEAD_SHA=$(echo "$PR_JSON" | jq -r '.head.sha') HEAD_SHORT=$(echo "$HEAD_SHA" | cut -c1-8) MERGEABLE=$(echo "$PR_JSON" | jq -r '.mergeable') # ---- Check 1: REQUEST_CHANGES (latest per reviewer — always blocks handoff) ---- REVIEWS=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls/$PR_NUM/reviews") # Group by reviewer, pick latest review per reviewer by submitted_at RC_ACTIVE=$(echo "$REVIEWS" | jq ' group_by(.user.login) | map(sort_by(.submitted_at) | last) | [.[] | select(.state == "REQUEST_CHANGES")] | length') if [ "$RC_ACTIVE" -gt 0 ]; then echo "PR #$PR_NUM: $RC_ACTIVE active REQUEST_CHANGES (latest per reviewer)" api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}" spawn_worker findings "$PR_NUM" "$HEAD_SHA" fi # ---- Check 2: Merge conflicts ---- if [ "$MERGEABLE" = "false" ]; then echo "PR #$PR_NUM: merge conflicts" api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}" spawn_worker rebase "$PR_NUM" "$HEAD_SHA" fi # ---- Check 3: CI status ---- CI_STATE=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/commits/$HEAD_SHA/status" | jq -r '.state') if [ "$CI_STATE" != "success" ]; then HAS_PLAN=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues/$PR_NUM/comments" | \ jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0') if [ "$HAS_PLAN" = "true" ]; then echo "PR #$PR_NUM: CI failing, fix plan exists — skipping" exit 0 fi echo "PR #$PR_NUM: CI $CI_STATE — fix worker" api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}" spawn_worker ci-fix "$PR_NUM" "$HEAD_SHA" fi # ---- Check 4: Bot reviews present ---- MISSING_BOT="" while IFS= read -r BOT; do HAS=$(echo "$REVIEWS" | jq --arg b "$BOT" \ '[.[] | select(.body | contains("<!-- review-bot:\($b) -->"))] | length > 0') if [ "$HAS" = "false" ]; then MISSING_BOT="$BOT" break fi done <<< "$REVIEW_BOTS" if [ -n "$MISSING_BOT" ]; then echo "PR #$PR_NUM: waiting on $MISSING_BOT — exiting" exit 0 fi # ---- Check 5: Self-review ---- COMMENTS=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues/$PR_NUM/comments") SR_CLEAN=$(echo "$COMMENTS" | jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | contains("Self-review against \($sha)")) and (.body | contains("Assessment: ✅ Clean")))] | length > 0') if [ "$SR_CLEAN" = "false" ]; then SR_NEEDS=$(echo "$COMMENTS" | jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | contains("Self-review against \($sha)")) and (.body | contains("Assessment: ⚠️")))] | length > 0') if [ "$SR_NEEDS" = "true" ]; then HAS_PLAN=$(echo "$COMMENTS" | jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0') [ "$HAS_PLAN" = "true" ] && { echo "PR #$PR_NUM: SR findings, fix plan exists — skipping"; exit 0; } api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}" spawn_worker sr-fix "$PR_NUM" "$HEAD_SHA" fi echo "PR #$PR_NUM: needs self-review" api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}" spawn_worker self-review "$PR_NUM" "$HEAD_SHA" fi # ---- Check 6: Unresolved inline review comments ---- UNRESOLVED=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/pulls/$PR_NUM/comments" | \ jq '[.[] | select(.resolver == null and .in_reply_to_id == null)] | length') if [ "$UNRESOLVED" -gt 0 ]; then HAS_PLAN=$(echo "$COMMENTS" | jq --arg sha "$HEAD_SHA" --arg u "$USER" \ '[.[] | select(.user.login == $u and (.body | startswith("## Fix Plan against " + $sha)))] | length > 0') [ "$HAS_PLAN" = "true" ] && { echo "PR #$PR_NUM: $UNRESOLVED unresolved, plan exists — skipping"; exit 0; } api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$WIP_ID]}" spawn_worker address-feedback "$PR_NUM" "$HEAD_SHA" fi # ---- Check 7: Bot reviews current (stale check) ---- ALL_CURRENT=true while IFS= read -r BOT; do CURRENT=$(echo "$REVIEWS" | jq --arg b "$BOT" --arg sha "$HEAD_SHORT" \ '[.[] | select(.body | contains("<!-- review-bot:\($b) -->")) | select(.body | contains("Evaluated against \($sha)"))] | length > 0') [ "$CURRENT" = "false" ] && { ALL_CURRENT=false; break; } done <<< "$REVIEW_BOTS" if [ "$ALL_CURRENT" = "false" ]; then echo "PR #$PR_NUM: bot review stale — exiting" exit 0 fi # ---- All clear: hand off ---- echo "PR #$PR_NUM: all checks pass — applying ready + assigning to aweiker" api_mutate POST "$API/repos/$REPO/issues/$PR_NUM/labels" "{\"labels\": [$READY_ID]}" api_mutate PATCH "$API/repos/$REPO/pulls/$PR_NUM" '{"assignees": ["aweiker"]}' echo "🔀 PR #$PR_NUM ready for review" exit 0 done < <(echo "$PRS" | jq -c '.[]') fi # --- No open PRs — pick next issue --- NEXT_ISSUE=$(curl -s -H "Authorization: token $TOKEN" \ "$API/repos/$REPO/issues?state=open&type=issues&limit=50" | \ jq '[.[] | select(.assignee == null)] | (map(select([.labels[].name] | contains(["bug"]))) + map(select([.labels[].name] | contains(["bug"]) | not))) | first // empty') if [ -z "$NEXT_ISSUE" ]; then echo "No issues to work on — exiting" exit 0 fi ISSUE_NUM=$(echo "$NEXT_ISSUE" | jq -r '.number') echo "SPAWN_WORKER: type=impl issue=$ISSUE_NUM" exit 0 ``` ### What Changes | Component | Change | |-----------|--------| | `workspace/scripts/dev-loop-dispatch.sh` | New — pure bash dispatch script | | `workspace/scripts/get-wip-timestamp.sh` | New — helper to get WIP label application time via timeline | | `workspace/scripts/worker-tasks/findings.md` | New — task template for REQUEST_CHANGES findings worker | | `workspace/scripts/worker-tasks/rebase.md` | New — task template for merge conflict rebase worker | | `workspace/scripts/worker-tasks/ci-fix.md` | Extract from skill → file | | `workspace/scripts/worker-tasks/self-review.md` | Extract from skill → file | | `workspace/scripts/worker-tasks/sr-fix.md` | Extract from skill → file | | `workspace/scripts/worker-tasks/address-feedback.md` | Extract from skill → file | | `workspace/scripts/worker-tasks/impl.md` | Extract from skill → file | | `workspace/skills/dev-loop/SKILL.md` | Replace body — point to script, document architecture | | Cron config (gargoyle-dev-loop, review-bot-dev-loop) | Update prompt + toolsAllow | ### Cron Config (new) ```yaml - label: gargoyle-dev-loop schedule: "*/15 * * * *" model: haiku toolsAllow: [exec, sessions_spawn] prompt: | Run this command: bash ~/.openclaw/workspace/scripts/dev-loop-dispatch.sh gargoyle Read the output. - If it contains a line starting with "SPAWN_WORKER:", parse it: type=<worker-type> pr=<number> sha=<sha> (or issue=<number> for impl) - Read the matching template from: ~/.openclaw/workspace/scripts/worker-tasks/<type>.md - Substitute {{PR}}, {{SHA}}, {{PROJECT}} with the parsed values - Spawn the worker using sessions_spawn - Exit with: NO_REPLY - If no "SPAWN_WORKER:" line in output: exit with NO_REPLY. - If output contains "🔀": exit with that line only (handoff notification). DO NOT interpret the output beyond these rules. DO NOT read project config files. DO NOT make any other decisions. ``` ### New `findings` Worker Task Template This is a missing piece identified in review. The `findings` worker handles PRs with active `REQUEST_CHANGES`: ```markdown # Worker: findings (REQUEST_CHANGES present) Work on {{PROJECT}} PR #{{PR}}: address REQUEST_CHANGES findings. Project config: memory/projects/{{PROJECT}}.yaml Token path: <from config> API base: <from config> Repo: <from config> Node: <from config> ## Step 1: Fetch all REQUEST_CHANGES reviews (latest per reviewer) TOKEN=$(cat <token_path>) curl -s -H "Authorization: token $TOKEN" \ "<api>/repos/<repo>/pulls/{{PR}}/reviews" | \ jq 'group_by(.user.login) | map(sort_by(.submitted_at) | last) | [.[] | select(.state == "REQUEST_CHANGES")]' ## Step 2: Fix Planning 1. Read each REQUEST_CHANGES review body for findings 2. Read ~/.openclaw/workspace/skills/pre-code/SKILL.md 3. Write a fix plan addressing each finding 4. Post the plan as a comment on PR #{{PR}}: - Must start with: "## Fix Plan against {{SHA}}:" 5. DO NOT write code until plan comment is confirmed posted ## Step 3: Implement (same as address-feedback worker) [... follow address-feedback template from here ...] ## Exit 1. Remove WIP label 2. Exit with: NO_REPLY 3. DO NOT merge. DO NOT apply ready label. DO NOT assign to aweiker. ``` ## State/Data Model The dispatch logic is stateless per run. State lives entirely in the Gitea API: - PR labels (wip, ready) - PR reviews (REQUEST_CHANGES state per reviewer — latest only, bot sentinels, SHA evaluation markers) - PR CI status - Issue comments (self-review markers by SHA, fix plan comments by SHA) - Issue assignees No state file is written between runs. ## Error Cases | Error | Handling | |-------|----------| | `curl` non-200 | `set -euo pipefail` aborts — no partial action | | `jq` parse error | Script aborts — safe | | `yq` not installed | Script aborts at config load — must verify before deploy | | Worker crashed (wip orphaned) | 1hr stale cleanup picks it up | | Two crons fire simultaneously | WIP mutex (first sets label, second sees it and exits) | | `sessions_spawn` fails in cron model | WIP label set but no worker — 1hr cleanup resolves | | Cron model misparses SPAWN_WORKER line | Worker not spawned — no mutation, next run will retry | Note: WIP label being set before `sessions_spawn` creates a small window where WIP is set but spawn fails. This is intentional — it's safer to have a stale WIP (cleaned up in 1hr) than to have two workers running. The alternative (set WIP after spawn succeeds) creates a race where two cron runs could both spawn workers before either sets the WIP label. ## Edge Cases - **Multiple open PRs:** Script checks in oldest-first order. The first PR to match a non-CI-waiting condition is acted on. CI-waiting (missing bot review) causes immediate exit for that run — subsequent PRs not checked this run. *Decision: this is acceptable; each run handles one thing.* - **REQUEST_CHANGES from reviewer who later APPROVED:** Latest-per-reviewer logic handles this — APPROVED supersedes the old REQUEST_CHANGES. No false block. - **REQUEST_CHANGES AND unresolved inline comments:** Check 1 fires first (RC) → findings worker handles both review-level and inline findings. - **Bot sentinel in REQUEST_CHANGES review body:** A bot posts RC with its sentinel. Sentinel check (Check 4) and RC check (Check 1) are independent. RC check uses `.state == "REQUEST_CHANGES"` on latest review, not sentinel. They don't interfere. - **WIP label applied manually by Aaron:** Script sees it as WIP, waits 1hr. Aaron can remove it manually if needed. - **Cron fires while handoff (ready label + assign) is in flight:** Two `api_mutate` calls; if the second (assign) fails, ready label is set but PR not assigned. Next run hits Check 7 (all clear) again — applies ready label again (idempotent) and assigns. Safe. ## Testing Strategy ### Pre-deploy: `yq` availability check ```bash ssh ubuntu@hardy "yq --version" ``` ### Unit tests (`bats`) ```bash @test "REQUEST_CHANGES blocks — spawns findings worker" { ... } @test "REQUEST_CHANGES latest-per-reviewer: superseded by APPROVED does not block" { ... } @test "WIP age < 1hr exits" { ... } @test "WIP age > 1hr removes label and continues" { ... } @test "CI failing with existing plan exits without spawn" { ... } @test "All checks pass → handoff, no merge" { ... } @test "DRY_RUN=1 prints all actions without executing" { ... } ``` ### Dry-run smoke test ```bash DRY_RUN=1 bash dev-loop-dispatch.sh review-bot # Verify output shows correct SPAWN_WORKER or exit reason ``` ### Manual regression for #144 and #145 1. Create test PR with REQUEST_CHANGES review 2. `DRY_RUN=1`: verify output shows `SPAWN_WORKER: type=findings` 3. Grep entire workspace for `POST .*/merge`: should find zero results 4. Grep all worker templates for merge API: should find zero results ## Completion Checklist 1. ✅ Does dispatch script contain zero calls to any merge API endpoint? 2. ✅ Is REQUEST_CHANGES check (Check 1) before WIP cleanup, CI check, and handoff? 3. ✅ Does script exit after first matching condition? 4. ✅ Does cron toolsAllow contain only `[exec, sessions_spawn]`? 5. ✅ Are all worker task templates in `workspace/scripts/worker-tasks/` files? 6. ✅ Is `findings` worker template defined (gap from v1)? 7. ✅ WIP stale-lock cleanup preserved (1hr threshold, timeline events)? 8. ✅ `set -euo pipefail` present? 9. ✅ `DRY_RUN=1` flag implemented? 10. ✅ Latest-per-reviewer REQUEST_CHANGES logic (jq `group_by` + `sort_by(.submitted_at) | last`)? 11. ✅ `yq` availability verified before deploy? 12. ✅ Issues #144 and #145 verifiable via test strategy? ## Fixes for Issues #144 and #145 **Issue #144** (autonomous merge): Eliminated structurally. The dispatch script has zero merge API calls. The cron prompt explicitly prohibits interpretation beyond SPAWN_WORKER lines. Workers exit with NO_REPLY and are prohibited from merging in their templates. No code path from cron → script → worker reaches a merge endpoint. **Issue #145** (merged despite REQUEST_CHANGES): Eliminated structurally. The dispatch script checks REQUEST_CHANGES as Check 1 in the PR loop — before CI, self-review, and handoff. The check uses `group_by(.user.login) | map(sort_by(.submitted_at) | last) | [.[] | select(.state == "REQUEST_CHANGES")]` — latest review per reviewer, hard boolean. Cannot be reasoned past because no model executes this check.
rodin closed this issue 2026-05-15 08:12:02 +00:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: rodin/review-bot#148