fix(#120): detect VCS host for releases API and derive action-repo #122

Closed
rodin wants to merge 2 commits from issue-120 into feature/github-support
3 changed files with 97 additions and 16 deletions
+26 -9
View File
@@ -104,6 +104,10 @@ inputs:
description: 'Path to custom persona JSON file' description: 'Path to custom persona JSON file'
required: false required: false
default: '' default: ''
action-repo:
description: 'Repository hosting the review-bot binary (owner/name). Defaults to rodin/review-bot on Gitea, or strat/review-bot on GitHub.'
required: false
default: ''
runs: runs:
using: 'composite' using: 'composite'
@@ -112,10 +116,21 @@ runs:
id: version id: version
security-review-bot marked this conversation as resolved
Review

[MINOR] No enforcement of HTTPS for SERVER_URL; allowing http or arbitrary schemes permits MITM and integrity compromise of the downloaded binary and checksum.

**[MINOR]** No enforcement of HTTPS for SERVER_URL; allowing http or arbitrary schemes permits MITM and integrity compromise of the downloaded binary and checksum.
Review

[NIT] Network requests (curl) do not specify explicit timeouts, which can cause long hangs under adverse network conditions. Consider adding --max-time or --connect-timeout to reduce DoS risk from slow/unresponsive endpoints.

**[NIT]** Network requests (curl) do not specify explicit timeouts, which can cause long hangs under adverse network conditions. Consider adding --max-time or --connect-timeout to reduce DoS risk from slow/unresponsive endpoints.
Review

[MINOR] VCS detection relies on checking if the server URL contains the substring 'gitea'. This is brittle (a Gitea instance may not include 'gitea' in the hostname). Consider a more robust detection or a configurable input (e.g., explicit vcs-type or probing /api/v1 vs /api/v3) with fallback.

**[MINOR]** VCS detection relies on checking if the server URL contains the substring 'gitea'. This is brittle (a Gitea instance may not include 'gitea' in the hostname). Consider a more robust detection or a configurable input (e.g., explicit vcs-type or probing /api/v1 vs /api/v3) with fallback.
shell: bash shell: bash
run: | run: |
GITEA_URL="${{ inputs.gitea-url || github.server_url }}" SERVER_URL="${{ inputs.gitea-url || github.server_url }}"
Review

[MINOR] The VCS detection heuristic grep -qi 'gitea' is hostname-based string matching. A GitHub Enterprise instance at a URL like https://github.example-gitea-internal.com would incorrectly be detected as Gitea and use /api/v1/. This is an edge case, but since the input is gitea-url (user-supplied), it's possible a user passes a GHE URL that happens to contain 'gitea' in the domain. The approach is pragmatic and acceptable for the known use cases, but worth documenting or guarding.

**[MINOR]** The VCS detection heuristic `grep -qi 'gitea'` is hostname-based string matching. A GitHub Enterprise instance at a URL like `https://github.example-gitea-internal.com` would incorrectly be detected as Gitea and use `/api/v1/`. This is an edge case, but since the input is `gitea-url` (user-supplied), it's possible a user passes a GHE URL that happens to contain 'gitea' in the domain. The approach is pragmatic and acceptable for the known use cases, but worth documenting or guarding.
REPO="${{ inputs.repo || 'rodin/review-bot' }}" # Detect VCS type: Gitea uses /api/v1/, GitHub uses /api/v3/
if echo "$SERVER_URL" | grep -qi 'gitea'; then
API_BASE="${SERVER_URL}/api/v1"
Review

[MINOR] The VCS detection heuristic grep -qi 'gitea' matches on the word 'gitea' anywhere in the server URL. This works for the known cases (e.g. https://gitea.example.com) but would misfire for a GitHub Enterprise Server instance with 'gitea' in its hostname, or fail for a self-hosted Gitea instance that doesn't include 'gitea' in its URL (e.g. https://code.mycompany.com). A more robust approach would be to attempt the Gitea API path and fall back, or to expose an explicit api-version input. This is acceptable as a pragmatic heuristic for the stated use-cases but worth documenting as a known limitation.

**[MINOR]** The VCS detection heuristic `grep -qi 'gitea'` matches on the word 'gitea' anywhere in the server URL. This works for the known cases (e.g. `https://gitea.example.com`) but would misfire for a GitHub Enterprise Server instance with 'gitea' in its hostname, or fail for a self-hosted Gitea instance that doesn't include 'gitea' in its URL (e.g. `https://code.mycompany.com`). A more robust approach would be to attempt the Gitea API path and fall back, or to expose an explicit `api-version` input. This is acceptable as a pragmatic heuristic for the stated use-cases but worth documenting as a known limitation.
DEFAULT_ACTION_REPO="rodin/review-bot"
else
API_BASE="${SERVER_URL}/api/v3"
Review

[MINOR] VCS host detection relies on a simple substring match (grep -qi 'gitea' on server_url). This will misclassify Gitea instances without 'gitea' in the hostname (e.g., code.example.com), breaking API calls by selecting /api/v3. Consider a more robust detection (e.g., explicit input, or probing a known endpoint) or an input to force API version.

**[MINOR]** VCS host detection relies on a simple substring match (`grep -qi 'gitea'` on server_url). This will misclassify Gitea instances without 'gitea' in the hostname (e.g., code.example.com), breaking API calls by selecting /api/v3. Consider a more robust detection (e.g., explicit input, or probing a known endpoint) or an input to force API version.
DEFAULT_ACTION_REPO="strat/review-bot"
Review

[MINOR] When querying GitHub's releases API, the parameter should be 'per_page=1' rather than 'limit=1'. While GitHub ignores unknown params and still returns results, updating this would be more correct and avoid relying on default pagination. Also consider supporting an Authorization header for private repos.

**[MINOR]** When querying GitHub's releases API, the parameter should be 'per_page=1' rather than 'limit=1'. While GitHub ignores unknown params and still returns results, updating this would be more correct and avoid relying on default pagination. Also consider supporting an Authorization header for private repos.
fi
ACTION_REPO="${{ inputs.action-repo || '' }}"
Review

[NIT] ACTION_REPO="${{ inputs.action-repo || '' }}" — the || '' is redundant since the input already has default: ''. It's harmless but slightly noisy.

**[NIT]** `ACTION_REPO="${{ inputs.action-repo || '' }}"` — the `|| ''` is redundant since the input already has `default: ''`. It's harmless but slightly noisy.
if [ -z "$ACTION_REPO" ]; then
ACTION_REPO="$DEFAULT_ACTION_REPO"
security-review-bot marked this conversation as resolved
Review

[MAJOR] Command injection risk: unvalidated input (ACTION_REPO) is interpolated into a shell command argument for curl ("${API_BASE}/repos/${ACTION_REPO}/..."). If ACTION_REPO contains command substitution like $(...) or backticks, it will be executed by the shell even within double quotes.

**[MAJOR]** Command injection risk: unvalidated input (ACTION_REPO) is interpolated into a shell command argument for curl ("${API_BASE}/repos/${ACTION_REPO}/..."). If ACTION_REPO contains command substitution like $(...) or backticks, it will be executed by the shell even within double quotes.
Review

[MINOR] Potential SSRF/internal network access: SERVER_URL is user-configurable and used for server-side requests without allowlisting or validation, enabling connections to internal services from the runner.

**[MINOR]** Potential SSRF/internal network access: SERVER_URL is user-configurable and used for server-side requests without allowlisting or validation, enabling connections to internal services from the runner.
Review

[MINOR] Missing network timeouts/retries for curl may cause the job to hang or be susceptible to long delays; consider adding --max-time/--connect-timeout and bounded retries.

**[MINOR]** Missing network timeouts/retries for curl may cause the job to hang or be susceptible to long delays; consider adding --max-time/--connect-timeout and bounded retries.
fi
if [ "${{ inputs.version }}" = "latest" ]; then if [ "${{ inputs.version }}" = "latest" ]; then
VERSION=$(curl -sSf "${GITEA_URL}/api/v1/repos/${REPO}/releases?limit=1" \ VERSION=$(curl -sSf "${API_BASE}/repos/${ACTION_REPO}/releases?limit=1" \
| python3 -c "import sys, json; releases = json.load(sys.stdin); print(releases[0]['tag_name'] if releases else '')") | python3 -c "import sys, json; releases = json.load(sys.stdin); print(releases[0]['tag_name'] if releases else '')")
if [ -z "$VERSION" ]; then if [ -z "$VERSION" ]; then
echo "Failed to determine latest version" >&2 echo "Failed to determine latest version" >&2
@@ -125,6 +140,8 @@ runs:
VERSION="${{ inputs.version }}" VERSION="${{ inputs.version }}"
fi fi
echo "version=${VERSION}" >> "$GITHUB_OUTPUT" echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
echo "action-repo=${ACTION_REPO}" >> "$GITHUB_OUTPUT"
echo "server-url=${SERVER_URL}" >> "$GITHUB_OUTPUT"
- name: Cache review-bot binary - name: Cache review-bot binary
id: cache id: cache
@@ -137,14 +154,14 @@ runs:
if: steps.cache.outputs.cache-hit != 'true' if: steps.cache.outputs.cache-hit != 'true'
Review

[MINOR] The binary integrity check relies on a checksums.txt downloaded from the same untrusted release source. If ACTION_REPO or SERVER_URL are misconfigured or attacker-controlled (via workflow inputs), a malicious binary and matching checksum could be served and executed with access to secrets. Consider verifying signatures, pinning to an allowlist of repos, or pinning versions/hashes from a trusted source.

**[MINOR]** The binary integrity check relies on a checksums.txt downloaded from the same untrusted release source. If ACTION_REPO or SERVER_URL are misconfigured or attacker-controlled (via workflow inputs), a malicious binary and matching checksum could be served and executed with access to secrets. Consider verifying signatures, pinning to an allowlist of repos, or pinning versions/hashes from a trusted source.
shell: bash shell: bash
run: | run: |
GITEA_URL="${{ inputs.gitea-url || github.server_url }}" SERVER_URL="${{ steps.version.outputs.server-url }}"
REPO="${{ inputs.repo || 'rodin/review-bot' }}" ACTION_REPO="${{ steps.version.outputs.action-repo }}"
VERSION="${{ steps.version.outputs.version }}" VERSION="${{ steps.version.outputs.version }}"
BINARY="review-bot-linux-amd64" BINARY="review-bot-linux-amd64"
security-review-bot marked this conversation as resolved
Review

[MAJOR] Command injection risk: unvalidated inputs (SERVER_URL and ACTION_REPO) are used in curl URLs ("${SERVER_URL}/${ACTION_REPO}/releases/download..."). Embedded $(...) or backticks in either value will trigger command substitution and arbitrary command execution.

**[MAJOR]** Command injection risk: unvalidated inputs (SERVER_URL and ACTION_REPO) are used in curl URLs ("${SERVER_URL}/${ACTION_REPO}/releases/download..."). Embedded $(...) or backticks in either value will trigger command substitution and arbitrary command execution.
curl -sSfL "${GITEA_URL}/${REPO}/releases/download/${VERSION}/${BINARY}" \ curl -sSfL "${SERVER_URL}/${ACTION_REPO}/releases/download/${VERSION}/${BINARY}" \
-o "${{ runner.temp }}/review-bot" -o "${{ runner.temp }}/review-bot"
curl -sSfL "${GITEA_URL}/${REPO}/releases/download/${VERSION}/checksums.txt" \ curl -sSfL "${SERVER_URL}/${ACTION_REPO}/releases/download/${VERSION}/checksums.txt" \
-o "${{ runner.temp }}/checksums.txt" -o "${{ runner.temp }}/checksums.txt"
# Verify SHA-256 checksum # Verify SHA-256 checksum
@@ -169,8 +186,8 @@ runs:
- name: Run review - name: Run review
shell: bash shell: bash
env: env:
GITEA_URL: ${{ inputs.gitea-url || github.server_url }} GITHUB_SERVER_URL: ${{ inputs.gitea-url || github.server_url }}
GITEA_REPO: ${{ inputs.repo || github.repository }} GITHUB_REPOSITORY: ${{ inputs.repo || github.repository }}
PR_NUMBER: ${{ inputs.pr-number || github.event.pull_request.number }} PR_NUMBER: ${{ inputs.pr-number || github.event.pull_request.number }}
REVIEWER_TOKEN: ${{ inputs.reviewer-token }} REVIEWER_TOKEN: ${{ inputs.reviewer-token }}
REVIEWER_NAME: ${{ inputs.reviewer-name }} REVIEWER_NAME: ${{ inputs.reviewer-name }}
+24 -7
View File
@@ -104,6 +104,10 @@ inputs:
description: 'Path to custom persona JSON file' description: 'Path to custom persona JSON file'
required: false required: false
default: '' default: ''
action-repo:
description: 'Repository hosting the review-bot binary (owner/name). Defaults to rodin/review-bot on Gitea, or strat/review-bot on GitHub.'
required: false
default: ''
runs: runs:
using: 'composite' using: 'composite'
@@ -112,10 +116,21 @@ runs:
id: version id: version
security-review-bot marked this conversation as resolved
Review

[MINOR] No enforcement of HTTPS for SERVER_URL; permitting non-HTTPS allows downgrade/MITM when downloading and executing the binary.

**[MINOR]** No enforcement of HTTPS for SERVER_URL; permitting non-HTTPS allows downgrade/MITM when downloading and executing the binary.
Review

[NIT] Network requests (curl) lack explicit timeouts. Adding timeouts (--max-time/--connect-timeout) can improve robustness and reduce the risk of workflow stalls.

**[NIT]** Network requests (curl) lack explicit timeouts. Adding timeouts (--max-time/--connect-timeout) can improve robustness and reduce the risk of workflow stalls.
Review

[MINOR] Same brittle VCS detection as the Gitea action: using 'grep -qi gitea' on the server URL may mis-detect non-standard Gitea hostnames. Consider a more robust detection or a user-provided override with fallback probing.

**[MINOR]** Same brittle VCS detection as the Gitea action: using 'grep -qi gitea' on the server URL may mis-detect non-standard Gitea hostnames. Consider a more robust detection or a user-provided override with fallback probing.
shell: bash shell: bash
run: | run: |
GITEA_URL="${{ inputs.gitea-url || github.server_url }}" SERVER_URL="${{ inputs.gitea-url || github.server_url }}"
REPO="${{ inputs.repo || 'rodin/review-bot' }}" # Detect VCS type: Gitea uses /api/v1/, GitHub uses /api/v3/
if echo "$SERVER_URL" | grep -qi 'gitea'; then
API_BASE="${SERVER_URL}/api/v1"
DEFAULT_ACTION_REPO="rodin/review-bot"
else
API_BASE="${SERVER_URL}/api/v3"
Review

[MINOR] Same VCS detection heuristic issue as in the Gitea action: relying on 'gitea' substring in server_url may mis-detect some Gitea instances. Provide an explicit override or more robust probe to avoid false negatives.

**[MINOR]** Same VCS detection heuristic issue as in the Gitea action: relying on `'gitea'` substring in server_url may mis-detect some Gitea instances. Provide an explicit override or more robust probe to avoid false negatives.
DEFAULT_ACTION_REPO="strat/review-bot"
Review

[MINOR] Releases API call uses 'limit=1'; for GitHub v3 the canonical parameter is 'per_page=1'. Also consider adding optional Authorization to support private repos.

**[MINOR]** Releases API call uses 'limit=1'; for GitHub v3 the canonical parameter is 'per_page=1'. Also consider adding optional Authorization to support private repos.
fi
ACTION_REPO="${{ inputs.action-repo || '' }}"
if [ -z "$ACTION_REPO" ]; then
ACTION_REPO="$DEFAULT_ACTION_REPO"
security-review-bot marked this conversation as resolved
Review

[MAJOR] Command injection risk: unvalidated ACTION_REPO is expanded into curl URL ("${API_BASE}/repos/${ACTION_REPO}/...") and may execute $(...) or ... via command substitution.

**[MAJOR]** Command injection risk: unvalidated ACTION_REPO is expanded into curl URL ("${API_BASE}/repos/${ACTION_REPO}/...") and may execute $(...) or `...` via command substitution.
Review

[MINOR] Potential SSRF/internal network access: SERVER_URL is used for server-side requests without validation or allowlisting, enabling connections to arbitrary hosts from the runner.

**[MINOR]** Potential SSRF/internal network access: SERVER_URL is used for server-side requests without validation or allowlisting, enabling connections to arbitrary hosts from the runner.
Review

[MINOR] Missing network timeouts/retries for curl may cause hangs; add --max-time/--connect-timeout and safe retry logic.

**[MINOR]** Missing network timeouts/retries for curl may cause hangs; add --max-time/--connect-timeout and safe retry logic.
fi
if [ "${{ inputs.version }}" = "latest" ]; then if [ "${{ inputs.version }}" = "latest" ]; then
VERSION=$(curl -sSf "${GITEA_URL}/api/v1/repos/${REPO}/releases?limit=1" \ VERSION=$(curl -sSf "${API_BASE}/repos/${ACTION_REPO}/releases?limit=1" \
| python3 -c "import sys, json; releases = json.load(sys.stdin); print(releases[0]['tag_name'] if releases else '')") | python3 -c "import sys, json; releases = json.load(sys.stdin); print(releases[0]['tag_name'] if releases else '')")
if [ -z "$VERSION" ]; then if [ -z "$VERSION" ]; then
echo "Failed to determine latest version" >&2 echo "Failed to determine latest version" >&2
@@ -125,6 +140,8 @@ runs:
VERSION="${{ inputs.version }}" VERSION="${{ inputs.version }}"
fi fi
echo "version=${VERSION}" >> "$GITHUB_OUTPUT" echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
echo "action-repo=${ACTION_REPO}" >> "$GITHUB_OUTPUT"
echo "server-url=${SERVER_URL}" >> "$GITHUB_OUTPUT"
- name: Cache review-bot binary - name: Cache review-bot binary
id: cache id: cache
@@ -137,14 +154,14 @@ runs:
if: steps.cache.outputs.cache-hit != 'true' if: steps.cache.outputs.cache-hit != 'true'
Review

[MINOR] Same supply-chain concern as above: the checksum file is fetched from the same source as the binary, which does not provide strong integrity guarantees if inputs are misconfigured. Strengthen verification by using signed releases or a maintained allowlist of action repos and expected hashes.

**[MINOR]** Same supply-chain concern as above: the checksum file is fetched from the same source as the binary, which does not provide strong integrity guarantees if inputs are misconfigured. Strengthen verification by using signed releases or a maintained allowlist of action repos and expected hashes.
shell: bash shell: bash
run: | run: |
GITEA_URL="${{ inputs.gitea-url || github.server_url }}" SERVER_URL="${{ steps.version.outputs.server-url }}"
REPO="${{ inputs.repo || 'rodin/review-bot' }}" ACTION_REPO="${{ steps.version.outputs.action-repo }}"
VERSION="${{ steps.version.outputs.version }}" VERSION="${{ steps.version.outputs.version }}"
BINARY="review-bot-linux-amd64" BINARY="review-bot-linux-amd64"
security-review-bot marked this conversation as resolved
Review

[MAJOR] Command injection risk: unvalidated SERVER_URL and ACTION_REPO are used in curl calls ("${SERVER_URL}/${ACTION_REPO}/releases/download...") allowing command substitution if inputs contain $(...) or backticks.

**[MAJOR]** Command injection risk: unvalidated SERVER_URL and ACTION_REPO are used in curl calls ("${SERVER_URL}/${ACTION_REPO}/releases/download...") allowing command substitution if inputs contain $(...) or backticks.
curl -sSfL "${GITEA_URL}/${REPO}/releases/download/${VERSION}/${BINARY}" \ curl -sSfL "${SERVER_URL}/${ACTION_REPO}/releases/download/${VERSION}/${BINARY}" \
-o "${{ runner.temp }}/review-bot" -o "${{ runner.temp }}/review-bot"
curl -sSfL "${GITEA_URL}/${REPO}/releases/download/${VERSION}/checksums.txt" \ curl -sSfL "${SERVER_URL}/${ACTION_REPO}/releases/download/${VERSION}/checksums.txt" \
-o "${{ runner.temp }}/checksums.txt" -o "${{ runner.temp }}/checksums.txt"
# Verify SHA-256 checksum # Verify SHA-256 checksum
+47
View File
@@ -0,0 +1,47 @@
# Self-review workflow for strat/review-bot on GitHub Enterprise Server.
# Tests that the composite action runs correctly on GitHub runners:
# - GITHUB_SERVER_URL and GITHUB_REPOSITORY env vars are set correctly
# - Binary is downloaded from gitea.weiker.me (where releases live)
# - Review is posted to the corresponding Gitea PR
name: Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-24.04
if: github.event_name == 'pull_request'
strategy:
matrix:
include:
- name: sonnet
token_secret: SONNET_REVIEW_TOKEN
model: anthropic--claude-4.6-sonnet
- name: gpt
token_secret: GPT_REVIEW_TOKEN
model: gpt-5
steps:
- uses: actions/checkout@v4
- name: Run ${{ matrix.name }} review
Review

[NIT] The workflow uses the composite action from '.gitea/actions/review'. Since an equivalent exists in '.github/actions/review' and both are meant to be identical, consider referencing the .github path here for consistency when running on GitHub.

**[NIT]** The workflow uses the composite action from '.gitea/actions/review'. Since an equivalent exists in '.github/actions/review' and both are meant to be identical, consider referencing the .github path here for consistency when running on GitHub.
uses: ./.gitea/actions/review
with:
# Download binary from Gitea (releases live there, not on GHE)
Review

[MINOR] The self-test workflow uses ./.gitea/actions/review (the Gitea action path) even though it runs on GitHub. This means the GitHub runner will execute the .gitea/actions/review/action.yml file. This is intentional based on the comment ('Download binary from Gitea, post review to Gitea'), but it's slightly odd that the GitHub workflow uses the .gitea action path — the two action files are now identical, so it works, but it could confuse future maintainers who expect .github/workflows to use .github/actions.

**[MINOR]** The self-test workflow uses `./.gitea/actions/review` (the Gitea action path) even though it runs on GitHub. This means the GitHub runner will execute the `.gitea/actions/review/action.yml` file. This is intentional based on the comment ('Download binary from Gitea, post review to Gitea'), but it's slightly odd that the GitHub workflow uses the `.gitea` action path — the two action files are now identical, so it works, but it could confuse future maintainers who expect `.github/workflows` to use `.github/actions`.
gitea-url: https://gitea.weiker.me
# Post review to the corresponding Gitea repo
Outdated
Review

[MINOR] The self-test workflow uses ./.gitea/actions/review (line 32: uses: ./.gitea/actions/review) rather than ./.github/actions/review. On GitHub the runner will look for local composite actions under .github/, not .gitea/. While GitHub Actions does allow resolving .gitea/ paths in some environments, this is unexpected and could break on a standard GitHub or GHES runner that doesn't support the Gitea convention. The intent of this workflow is to validate the GitHub path, so it should reference ./.github/actions/review.

**[MINOR]** The self-test workflow uses `./.gitea/actions/review` (line 32: `uses: ./.gitea/actions/review`) rather than `./.github/actions/review`. On GitHub the runner will look for local composite actions under `.github/`, not `.gitea/`. While GitHub Actions does allow resolving `.gitea/` paths in some environments, this is unexpected and could break on a standard GitHub or GHES runner that doesn't support the Gitea convention. The intent of this workflow is to validate the GitHub path, so it should reference `./.github/actions/review`.
repo: rodin/review-bot
reviewer-token: ${{ secrets[matrix.token_secret] }}
reviewer-name: ${{ matrix.name }}
llm-model: ${{ matrix.model }}
llm-provider: aicore
aicore-client-id: ${{ secrets.AICORE_CLIENT_ID }}
aicore-client-secret: ${{ secrets.AICORE_CLIENT_SECRET }}
aicore-auth-url: ${{ secrets.AICORE_AUTH_URL }}
aicore-api-url: ${{ secrets.AICORE_API_URL }}
aicore-resource-group: ${{ secrets.AICORE_RESOURCE_GROUP }}
conventions-file: CONVENTIONS.md
patterns-repo: rodin/go-patterns
patterns-files: 'README.md,patterns/'
dry-run: 'true'
timeout: '600'