feat: redesign dev-loop dispatch as pure shell script — no model reasoning in dispatch #148
Summary
The current dev-loop uses a single Haiku model call that simultaneously assesses project state, makes dispatch decisions, AND acts. This caused two production failures:
- Autonomous merge (#144)
- Merged despite REQUEST_CHANGES: model reasoned past the check (#145)
The fix: dispatch becomes a pure shell script with no model reasoning. Every decision is a boolean API check + branch. The model only spawns a worker when the script's output says to.
Full Pre-Code Plan
Pre-Code Plan: Dev-Loop Dispatch Redesign
Problem
The current dev-loop uses a single model call (Haiku) that simultaneously assesses project state, makes decisions, AND acts. This is a logic-model-action pipeline with no hard stops. Two production failures demonstrate the failure mode:
- Autonomous merge (#144)
- Merged despite REQUEST_CHANGES from security-review-bot: the model reasoned past the check instead of executing it as a binary shell command (#145)
The root cause is that dispatch is entangled with judgment. When Haiku "reasons" about whether to merge, it can reach wrong conclusions. The only safe dispatch is a dispatch that cannot reason: one that executes boolean API checks and branches on their output.
This plan redesigns dispatch to be a pure shell script with no model reasoning, and restricts models to workers that receive a precise situation description and do only the work they're given.
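As a minimal sketch of what a dispatcher that "cannot reason" might look like, the function below maps already-computed check results to exactly one action. The function name, its two inputs, the CI state values, and the `ready` and `wait` action names are assumptions made for illustration; the plan's real check list is longer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch: map precomputed check results to exactly one action.
# $1 = number of reviewers whose latest review is REQUEST_CHANGES
# $2 = CI state ("success", "failure", anything else); values are assumptions
decide_action() {
  local blocking_reviews="$1" ci_state="$2"

  # REQUEST_CHANGES is checked first and always wins.
  if [ "$blocking_reviews" -gt 0 ]; then
    echo "findings"
    return 0
  fi

  case "$ci_state" in
    failure) echo "ci-fix" ;;  # CI is red: hand off to the ci-fix worker
    success) echo "ready" ;;   # all clear: apply the ready label and assign
    *)       echo "wait" ;;    # anything else: do nothing this run
  esac
}
```

Because the function only compares values, the same inputs always produce the same action, which is the property the plan is after.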
Constraints
- REQUEST_CHANGES from any reviewer always blocks: no exceptions, no reasoning past it
- toolsAllow: ["exec"] only; no model reasoning in dispatch
Proposed Approach
Architecture
What Changes
1. New: workspace/scripts/dev-loop-dispatch.sh
A bash script with no model reasoning. Every decision is:
Full dispatch logic (in order):
Key safety properties:
- REQUEST_CHANGES check is first: before CI, before self-review, before anything
- set -euo pipefail: any curl failure aborts (no silent partial state)
2. New: workspace/scripts/spawn-worker.sh
Wraps sessions_spawn via a small Python/node script (since bash can't call the OpenClaw API directly). The dispatcher writes the worker task to a temp file, then calls openclaw spawn or uses the sessions API.
Alternative: The dispatcher script exits with a structured output, and the cron's agentTurn model reads it and spawns the worker. This keeps the script pure bash and moves the one model call to spawn-time. The cron prompt becomes:
This is cleaner: the shell script does all the API work, the model only does the one thing it can't avoid — spawning a session. The model sees only the spawn instruction, not the raw project state.
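One way the structured output could be produced is sketched below. The `SPAWN_WORKER: type=...` prefix appears elsewhere in this plan; the `repo=` and `pr=` fields and both function names are hypothetical. In the plan the cron model reads the line, so `spawn_field` is only an illustration of how the same line could be checked in tests or a dry run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Emit one spawn instruction line.
# The repo= and pr= fields are hypothetical additions for illustration.
emit_spawn() {
  local type="$1" repo="$2" pr="$3"
  printf 'SPAWN_WORKER: type=%s repo=%s pr=%s\n' "$type" "$repo" "$pr"
}

# Print the value of one key=value field from a spawn line.
# Returns 1 if the line is not a spawn instruction or the field is absent.
spawn_field() {
  local line="$1" key="$2" pair
  local -a pairs=()
  [[ "$line" == "SPAWN_WORKER: "* ]] || return 1
  read -ra pairs <<< "${line#SPAWN_WORKER: }"
  for pair in "${pairs[@]}"; do
    if [[ "$pair" == "$key="* ]]; then
      printf '%s\n' "${pair#"$key"=}"
      return 0
    fi
  done
  return 1
}
```

Keeping the instruction on a single line with a fixed prefix means the exec output can be audited with a plain `grep '^SPAWN_WORKER: '`.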
3. Updated: workspace/skills/dev-loop/SKILL.md
Replace the skill with documentation pointing to the script:
4. New: workspace/scripts/worker-tasks/ directory
Each worker type gets a markdown template. Variables are substituted by the cron model when spawning. Templates are identical to the current skill's worker task templates, extracted into files for single-source maintenance.
5. Updated: cron config
Current cron entry (approximate):
New cron entry:
The cron model's job is now: run script, read output, spawn if instructed. That's it.
State/Data Model
The dispatch logic is stateless per run. State lives entirely in the Gitea API:
No state is carried between cron runs.
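Since every run re-reads its state from the Gitea API, each check starts with a GET. A minimal sketch of such a read helper follows; the `GITEA_URL` and `GITEA_TOKEN` variable names and the `api_get` function name are assumptions. One detail matters for the abort-on-failure guarantee: curl exits 0 on an HTTP error response unless `--fail` is passed, so `set -e` alone would not stop the script on a non-200.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical configuration names; adjust to the real environment.
GITEA_URL="${GITEA_URL:-https://gitea.example.com}"
GITEA_TOKEN="${GITEA_TOKEN:-}"

# GET a Gitea API path and print the JSON body on stdout.
# --fail makes curl exit non-zero on HTTP 4xx/5xx responses; without it
# an error response still exits 0 and `set -e` would not abort.
api_get() {
  local path="$1"
  curl --fail --silent --show-error \
    -H "Authorization: token ${GITEA_TOKEN}" \
    -H "Accept: application/json" \
    "${GITEA_URL}/api/v1${path}"
}
```

A call might look like `reviews=$(api_get "/repos/OWNER/REPO/pulls/INDEX/reviews")`. Declaring the variable separately from the assignment (`local reviews; reviews=$(...)`) keeps a curl failure from being masked by the exit status of `local`.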
Error Cases
- curl returns non-200: set -euo pipefail causes script to abort; no partial actions
- jq parse error
- sessions_spawn fails
Edge Cases
- USER (bot) are checked; human PRs are ignored.
- REQUEST_CHANGES also blocks. This is correct behavior.
- state == REQUEST_CHANGES on any review. If the reviewer re-approved, their latest review is APPROVED, not REQUEST_CHANGES. Need to check the latest review per reviewer, not any review. (Open question; see below.)
Testing Strategy
Unit tests for dispatch logic
Extract each check into a small bash function that can be tested with mock curl responses. Use bats (Bash Automated Testing System):
Integration smoke test
After deploying, run the script against a real project in dry-run mode:
Dry-run mode prints what it would do without calling spawn or making label mutations.
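A dry-run switch of this kind can be sketched as a thin wrapper around every mutating command. The name `api_mutate` is used later in this plan, but its signature here (the full command passed as arguments) is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

DRY_RUN="${DRY_RUN:-0}"

# Run a mutating command, or only print it when DRY_RUN=1.
# Read-only checks would bypass this wrapper so a dry run still sees real state.
api_mutate() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'DRY_RUN:'
    printf ' %q' "$@"   # shell-quote each argument so the line can be replayed
    printf '\n'
    return 0
  fi
  "$@"
}
```

Routing label changes and assignments through one wrapper means the dry-run guarantee depends on a single `if`, not on every call site remembering to check the flag.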
Manual verification of #144 and #145 fixes
- REQUEST_CHANGES review
Regression test for autonomous merge
- ready label + assignees PATCH calls are made
- POST /repos/.../pulls/.../merge call anywhere in script or worker templates
Open Questions
1. REQUEST_CHANGES latest-only vs any: If a reviewer posts REQUEST_CHANGES, then later posts APPROVED, does the current skill's check (any review with REQUEST_CHANGES state) correctly reflect the resolved state? The Gitea API returns individual reviews, not a per-reviewer summary. We should check the latest review per reviewer. Current SKILL.md doesn't specify this; the script should implement latest-per-reviewer logic explicitly.
2. Spawn mechanism: The dispatch script can't call sessions_spawn directly from bash. Two options:
   - (openclaw sessions spawn <task-file>) if such a CLI exists
   Option A is cleaner for auditability: the spawn instruction is visible in exec output. Preference?
3. Worker task templates: Should worker task templates live in workspace/scripts/worker-tasks/ (workspace, not version-controlled in a repo) or in the review-bot or a dedicated ops-gateway repo? The workspace location keeps them co-located with the dispatch script; a repo gives them a change history independent of the workspace.
4. Cron toolsAllow for review-bot vs gargoyle: The new cron needs exec and sessions_spawn. Currently the skill allows more tools (read, memory_get). Should we tighten all dev-loop crons or only new ones?
5. Dry-run mode: Is a DRY_RUN=1 flag worth implementing for the script, or is the test environment (staging project) sufficient for verification?
Completion Checklist
- REQUEST_CHANGES check before WIP label age check, CI check, and handoff check?
- toolsAllow exclude read and memory_get (no ambient project knowledge)?
- set -euo pipefail ensure no partial actions on curl failures?
- REQUEST_CHANGES check?
Fixes for Issues #144 and #145
Issue #144 (autonomous merge): Eliminated entirely. The dispatch script contains no merge API calls and the cron's toolsAllow does not include any tool that could perform a merge. Workers are spawned with explicit exit instructions that prohibit merge calls.
Issue #145 (merged despite REQUEST_CHANGES): Eliminated by structural change. The REQUEST_CHANGES check is the first check in the dispatch script, before CI, before self-review, before handoff. It cannot be "reasoned past" because there is no reasoning in the dispatcher.
Pre-Code Plan v2 (post-review)
This is the revised plan after running the parallel review panel.
Review Findings Addressed
Logic contradictions found and resolved:
- toolsAllow: ["exec"] vs [exec, sessions_spawn]: contradiction resolved. The cron needs both; "no model reasoning" means the model only parses SPAWN_WORKER output, not raw project state
- findings worker type was referenced but had no template; template now defined
- group_by(.user.login) | map(sort_by(.submitted_at) | last)
- spawn-worker.sh can't call the OpenClaw sessions API from bash; resolved via structured SPAWN_WORKER output + cron model parsing
Pattern check:
- yq dependency on execution host (hardy) noted and added as pre-deploy verification step
Pre-Code Plan: Dev-Loop Dispatch Redesign (v2, post-review)
Problem
The current dev-loop uses a single model call (Haiku) that simultaneously assesses project state, makes decisions, AND acts. This is a logic-model-action pipeline with no hard stops. Two production failures demonstrate the failure mode:
- Autonomous merge (#144)
- Merged despite REQUEST_CHANGES from security-review-bot: model reasoned past the check instead of executing it as a binary API query (#145).
The root cause: dispatch is entangled with judgment. When a model "reasons" about whether to merge, it can reach wrong conclusions. The only safe dispatch is one that cannot reason: boolean API checks, deterministic branching.
Constraints
- REQUEST_CHANGES from any reviewer always blocks handoff: no exceptions, no reasoning past it
- yq must be available on the execution host (hardy); verify before implementation
Resolved Open Questions (from v1 review)
Q1: REQUEST_CHANGES latest-per-reviewer or any historical?
Use latest-per-reviewer. If a reviewer posted REQUEST_CHANGES then later APPROVED, the APPROVED takes precedence. The script will group reviews by reviewer, pick the latest per reviewer by submitted_at, and check if any reviewer's latest review has state == REQUEST_CHANGES. A historical REQUEST_CHANGES from a reviewer who has since approved must NOT block.
Q2: Spawn mechanism?
Use Option A: The script outputs a structured spawn instruction line like:
The cron model (toolsAllow: [exec, sessions_spawn]) reads this line and spawns the appropriate worker. The cron prompt describes how to parse this line and which template to use. This keeps the shell script pure bash with no OpenClaw API calls.
Q3: Worker task templates location?
Live in ~/.openclaw/workspace/scripts/worker-tasks/ (workspace). This co-locates them with the dispatch script and keeps them out of application repos. The workspace itself is git-tracked via the ops-gateway or workspace repo.
Q4: Tighten toolsAllow on all dev-loop crons?
Yes, tighten all. Any cron running the new dispatch script needs only [exec, sessions_spawn]. Existing crons not yet migrated keep their current config until migrated.
Q5: Dry-run mode?
Implement a DRY_RUN=1 flag: print all curl commands and spawn instructions without executing them. Required before first production run.
Proposed Approach
Architecture
Dispatch Script Logic (in order)
All checks are bash functions using curl | jq. No model reasoning.
What Changes
- workspace/scripts/dev-loop-dispatch.sh
- workspace/scripts/get-wip-timestamp.sh
- workspace/scripts/worker-tasks/findings.md
- workspace/scripts/worker-tasks/rebase.md
- workspace/scripts/worker-tasks/ci-fix.md
- workspace/scripts/worker-tasks/self-review.md
- workspace/scripts/worker-tasks/sr-fix.md
- workspace/scripts/worker-tasks/address-feedback.md
- workspace/scripts/worker-tasks/impl.md
- workspace/skills/dev-loop/SKILL.md
Cron Config (new)
New findings Worker Task Template
This is a missing piece identified in review. The findings worker handles PRs with active REQUEST_CHANGES:
State/Data Model
The dispatch logic is stateless per run. State lives entirely in the Gitea API:
No state file is written between runs.
Error Cases
- curl non-200: set -euo pipefail aborts; no partial action
- jq parse error
- yq not installed
- sessions_spawn fails in cron model
Note: WIP label being set before sessions_spawn creates a small window where WIP is set but spawn fails. This is intentional: it's safer to have a stale WIP (cleaned up in 1hr) than to have two workers running. The alternative (set WIP after spawn succeeds) creates a race where two cron runs could both spawn workers before either sets the WIP label.
Edge Cases
- .state == "REQUEST_CHANGES" on latest review, not sentinel. They don't interfere.
- api_mutate calls; if the second (assign) fails, the ready label is set but the PR is not assigned. The next run hits Check 7 (all clear) again, applies the ready label again (idempotent), and assigns. Safe.
Testing Strategy
Pre-deploy: yq availability check
Unit tests (bats)
Dry-run smoke test
Manual regression for #144 and #145
- DRY_RUN=1: verify output shows SPAWN_WORKER: type=findings
- POST .*/merge: should find zero results
Completion Checklist
- [exec, sessions_spawn]?
- workspace/scripts/worker-tasks/ files?
- findings worker template defined (gap from v1)?
- set -euo pipefail present?
- DRY_RUN=1 flag implemented?
- (group_by + sort_by(.submitted_at) | last)?
- yq availability verified before deploy?
Fixes for Issues #144 and #145
Issue #144 (autonomous merge): Eliminated structurally. The dispatch script has zero merge API calls. The cron prompt explicitly prohibits interpretation beyond SPAWN_WORKER lines. Workers exit with NO_REPLY and are prohibited from merging in their templates. No code path from cron → script → worker reaches a merge endpoint.
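The "zero merge API calls" claim can be turned into a repeatable check. The sketch below greps the given directories for the `POST .*/merge` pattern named in the testing strategy; the function name and messages are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed only if no file under the given paths matches the merge pattern.
# Returns 1 when a match is found and 2 when grep itself fails
# (for example, a missing directory), so an error is never read as "clean".
assert_no_merge_calls() {
  local pattern='POST .*/merge' status=0
  grep -rnE -- "$pattern" "$@" || status=$?
  case "$status" in
    0) echo "FAIL: merge call found" >&2; return 1 ;;
    1) echo "OK: no merge calls" ;;
    *) echo "ERROR: grep failed" >&2; return 2 ;;
  esac
}
```

It could be run as `assert_no_merge_calls workspace/scripts workspace/skills` before each deploy. A template that mentions the merge endpoint in prose would also trip the check, which errs on the safe side.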
Issue #145 (merged despite REQUEST_CHANGES): Eliminated structurally. The dispatch script checks REQUEST_CHANGES as Check 1 in the PR loop, before CI, self-review, and handoff. The check uses group_by(.user.login) | map(sort_by(.submitted_at) | last) | [.[] | select(.state == "REQUEST_CHANGES")]: latest review per reviewer, hard boolean. Cannot be reasoned past because no model executes this check.
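The latest-per-reviewer filter can be wrapped in a small function and exercised with canned JSON, with no API access needed. This is a sketch under the assumption that the input is the JSON array returned by the pull request reviews endpoint; the function name is hypothetical and `jq` must be installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read a JSON array of reviews on stdin and print how many reviewers
# have REQUEST_CHANGES as their latest review.
count_blocking_reviewers() {
  jq '
    group_by(.user.login)                 # one group per reviewer
    | map(sort_by(.submitted_at) | last)  # keep the latest review per reviewer
    | map(select(.state == "REQUEST_CHANGES"))
    | length
  '
}
```

The dispatcher's boolean would then be `[ "$(count_blocking_reviewers <<< "$reviews")" -gt 0 ]`. One caveat worth checking against real data: if a reviewer's most recent review is a plain COMMENT, this filter treats that as their latest state, so an earlier REQUEST_CHANGES from the same reviewer would stop blocking. Filtering to APPROVED and REQUEST_CHANGES states before `group_by` may be closer to the intent.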