@@ -12,7 +12,7 @@ up here, that workflow can be retired.
1212| Platform | Label | Specs | Disk |
1313| --- | --- | --- | --- |
1414| Linux x64 | `warp-ubuntu-2204-x64-32x` | 32 vCPU / 128 GB | 150 GB |
15- | Windows x64 | `warp-windows-2025-x64-32x` | 32 vCPU / 128 GB | 256 GB |
15+ | Windows x64 | `warp-windows-2022-x64-32x` | 32 vCPU / 128 GB | 256 GB |
1616| macOS arm64 | `warp-macos-15-arm64-12x` | M4 Pro, 12 vCPU / 44 GB | 270 GB |
1717
1818There is no 32-core macOS tier; 12x is WarpBuild's largest Mac. WarpBuild
@@ -21,6 +21,51 @@ apply — but `timeout-minutes` must be set explicitly (the implicit default is
2121360). Linux's 150 GB disk is the tightest fit: ~60-75 GB checkout +
2222~25-40 GB out dir + OS image. The workflow prints `df -h` after each build.
2323
24+ ## One-time setup (WarpBuild)
25+
26+ The `warpbuildbot` GitHub app is installed org-wide on `browseros-ai`
27+ (since 2026-06-11). Two more things must be true before any `warp-*` job
28+ leaves `queued`:
29+
30+ 1. **The org must allow self-hosted runners on public repos.** WarpBuild
31+ runners register as org-level self-hosted runners, and GitHub blocks
32+ those on public repositories by default
33+ (https://www.warpbuild.com/docs/ci/public-repos). BrowserOS is public,
34+ so an org admin must check: Organization Settings → Actions → Runner
35+ groups → Default → "Allow public repositories". Via API (needs
36+ `admin:org` scope):
37+
38+ ```bash
39+ gh auth refresh -h github.com -s admin:org
40+ gh api orgs/browseros-ai/actions/runner-groups \
41+ --jq '.runner_groups[] | {id, name, allows_public_repositories}'
42+ gh api -X PATCH "orgs/browseros-ai/actions/runner-groups/<id>" \
43+ -F allows_public_repositories=true
44+ ```
45+
46+ Before flipping the toggle, check what else lives in that group — it
47+ widens exposure for every runner in it:
48+
49+ ```bash
50+ gh api "orgs/browseros-ai/actions/runner-groups/<id>/runners" \
51+ --jq '.runners[] | {name, status, labels: [.labels[].name]}'
52+ ```
53+
54+ Expect only ephemeral `warp-*` runners (usually none while idle). The
55+ signed-nightly Mac (`browseros-builder`) is registered at the repo
56+ level, so this org-group toggle does not change its exposure. If the
57+ group ever holds other persistent org-level runners, give WarpBuild a
58+ dedicated runner group instead of widening Default.
59+
60+ 2. **The WarpBuild org must be active**: sign in at
61+ https://app.warpbuild.com/, confirm the `browseros-ai` connection and
62+ that billing/credits are set up; runners are not provisioned without
63+ an active account.
64+
65+ Smoke test after changing either:
66+ `gh workflow run "Nightly Release Build" -f platforms=linux`, then watch
67+ the build job leave `queued` within ~5 minutes (`gh run watch`).
68+
2469## Per-night pipeline (per platform)
2570
26711. `actions/checkout` + `astral-sh/setup-uv`.
@@ -127,3 +172,44 @@ The first run per platform is the cache warm-up; expect cold timings. If a
127172pin bump lands, the next night is cold again for that version. To force a
128173fresh checkout, bump the `v1` in the cache key (workflow); for Windows also
129174delete the old object under `ci-cache/chromium/` in R2.
175+
176+ ## Troubleshooting: jobs stuck in `queued`
177+
178+ A job that no runner ever picked up shows `runner_id: 0` and empty steps:
179+
180+ ```bash
181+ gh run view <run-id> --json jobs --jq '.jobs[] | {name, status}'
182+ gh api repos/browseros-ai/BrowserOS/actions/jobs/<job-id> \
183+ --jq '{status, runner_id, runner_name, labels}'
184+ ```
185+
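The check above can be reduced to a small helper for scripting. This is a minimal sketch, not part of the workflow: `classify_job` is a hypothetical name, and treating a `null` or empty `runner_id` the same as `0` is an assumption about what the API may return for a job that was never assigned.

```bash
#!/usr/bin/env bash
# classify_job STATUS RUNNER_ID
# Hypothetical helper: prints "never-picked-up" for a queued job that has
# no runner, otherwise echoes the status unchanged.
classify_job() {
  local status="$1" runner_id="$2"
  # Assumption: an unassigned job may report 0, null, or nothing at all.
  case "$runner_id" in
    0|null|"") runner_id=0 ;;
  esac
  if [ "$status" = "queued" ] && [ "$runner_id" = "0" ]; then
    echo "never-picked-up"
  else
    echo "$status"
  fi
}

# Usage (needs gh auth; <job-id> is a placeholder to fill in):
#   read -r status runner_id < <(gh api \
#     "repos/browseros-ai/BrowserOS/actions/jobs/<job-id>" \
#     --jq '"\(.status) \(.runner_id)"')
#   classify_job "$status" "$runner_id"
```

A `never-picked-up` result points at the causes listed next rather than at a slow build.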
186+ Causes, in the order to check:
187+
188+ 1. **Runner group blocks public repos**: see one-time setup above. This
189+ stalls all platforms at once.
190+ 2. **Label not in WarpBuild's catalog**: supported images are Ubuntu
191+ 22.04/24.04 (x64, arm64), macOS 14/15/26 (arm64), Windows Server 2022
192+ (x64) (https://www.warpbuild.com/docs/ci/preinstalled-software). An
193+ unsupported label queues forever (this workflow originally shipped a
194+ `windows-2025` label that WarpBuild does not image); WarpBuild reports
195+ no error back to GitHub.
196+ 3. **WarpBuild account**: org connection or billing lapsed
197+ (https://app.warpbuild.com/).
198+ 4. **WarpBuild capacity or incident**: rare; check their dashboard.
199+
200+ Mechanics worth knowing:
201+
202+ - GitHub discards self-hosted jobs queued for more than 24h, and the
203+ workflow's `nightly-release` concurrency group
204+ (`cancel-in-progress: false`) makes the next run wait (newer pending
205+ runs supersede older pending ones), so one stuck night delays the next
206+ by a full day (runs 27367077749 → 27407228486 did exactly this). The
207+ `queue-watchdog` job therefore steps in at the 20-minute mark: it
208+ cancels the run when no build job is actually running (everything
209+ stuck in queue or already finished), and fails loudly without
210+ cancelling while any build is in progress. In that mixed case, cancel
211+ the run manually once the live builds finish; a still-queued job
212+ otherwise pins the group for up to 24h with no watcher left.
213+ - Fixing the root cause does not revive already-queued jobs: WarpBuild
214+ provisions on the `workflow_job.queued` webhook, which has already
215+ fired. Cancel the stuck run and re-dispatch.
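The watchdog rule described above (cancel when nothing is running, fail without cancelling when a build is live) can be sketched as a pure decision function. This is an illustration under stated assumptions, not the workflow's actual implementation: `watchdog_decision` and its two output strings are hypothetical, and the job name `queue-watchdog` in the usage line is assumed to match the job's display name.

```bash
#!/usr/bin/env bash
# watchdog_decision: reads one build-job status per line on stdin
# (queued, in_progress, completed) and prints the action to take.
# Hypothetical helper; the real queue-watchdog job may differ.
watchdog_decision() {
  local status
  # The "|| [ -n ... ]" clause keeps a final line that lacks a newline.
  while IFS= read -r status || [ -n "$status" ]; do
    if [ "$status" = "in_progress" ]; then
      # A live build exists: fail loudly, leave the run alone.
      echo "fail-without-cancel"
      return 0
    fi
  done
  # Everything is still queued or already finished: cancel the run.
  echo "cancel-run"
}

# Usage (needs gh auth; <run-id> is a placeholder). The watchdog's own
# job is filtered out because it would always report in_progress:
#   gh run view <run-id> --json jobs \
#     --jq '.jobs[] | select(.name != "queue-watchdog") | .status' \
#     | watchdog_decision
```

Keeping the decision separate from the `gh` calls makes the rule easy to check by hand against the mixed case, where the manual cancel is still needed afterwards.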