ci: fix the always-failing PR checks (LFS-404 + upstream-only gating) by Pterjudin · Pull Request #48 · OpenCortexIDE/cortexide

Pterjudin · 2026-05-24T09:20:58Z

Summary

Every PR on this repo currently shows ~15 red checks. Almost all of them were already red long before the 1.118.1 sync (confirmed on PR #44 from Feb 2026). The team has been relying on cortexide-builder for real validation. This PR triages the noise inherited from upstream without disabling any check that catches real regressions.

This is a draft; do not merge until reviewed.

Root causes found

After pulling logs from PRs #46 and #47, the failures collapse into three buckets:

1. LFS-404 (drives ~15 of the failures). `actions/checkout@v6` with `lfs: true` fails because every LFS pointer in the repo resolves to 404 on our LFS server:
   ```
   [d326ad15c05a...] Object does not exist on the server: [404]
   error: failed to fetch some objects from 'https://github.com/OpenCortexIDE/cortexide.git/info/lfs'
   The process '/usr/bin/git' failed with exit code 2
   ```
   The only LFS-tracked files in the repo are `extensions/copilot/test/simulation/cache/*.sqlite` (declared in `extensions/copilot/.gitattributes`). The pointer files are committed; the actual blobs were never uploaded to our LFS endpoint.
2. Microsoft-specific infrastructure that the fork has no credentials, runners, or endpoints for:
   - Monaco Editor checks: part of upstream's npm-publish flow for monaco-editor.
   - Prevent engineering system changes in PRs: queries microsoft/vscode collaborator permissions (`403 Resource not accessible by integration`) and references vs-code-engineering[bot].
   - Check API Proposal Version Changes: enforces upstream's `vscode.proposed.*.d.ts` versioning policy, which we don't manage.
   - Checking Component Screenshots: uploads to hediet-screenshots.azurewebsites.net with an OIDC token our identity isn't authorized for (`curl: (22) The requested URL returned error: 403`).
   - copilot-setup-steps: uses `runs-on: vscode-large-runners` (a Microsoft-only self-hosted runner), so the job sits queued for 24h before GitHub force-fails it.
3. Real test failures. Component Fixture Tests fails 8 Playwright tests in `tests/imageCarousel.spec.ts` (`locator('.image-carousel-editor')` never becomes visible). It fails identically on PR #44 (Sync/vscode 1.110.0) from February, so it is a pre-existing bug unrelated to the 1.118.1 rebase. Not fixed in this PR; see below.

Fixes in this PR

| Check | Root cause | Fix |
| --- | --- | --- |
| Compile & Hygiene | LFS-404 | `lfs: false` in pr.yml (compile doesn't read sqlite) |
| Linux / {Browser, Electron, Remote} | LFS-404 | `lfs: false` in pr-linux-test.yml |
| macOS / {Browser, Electron, Remote} | LFS-404 | `lfs: false` in pr-darwin-test.yml |
| Windows / {Browser, Electron, Remote} | LFS-404 | `lfs: false` in pr-win32-test.yml |
| Linux / CLI | LFS-404 | `lfs: false` in pr-linux-cli-test.yml |
| Copilot - Check Telemetry | LFS-404 | `lfs: false` in pr.yml (telemetry extractor reads TS sources) |
| Copilot - Check Test Cache | needs LFS-tracked sqlite | Gated off on fork via `if: github.repository_owner != 'OpenCortexIDE'` |
| Copilot - Test (Linux) | needs LFS-tracked sqlite for simulate-ci | Gated off on fork |
| Copilot - Test (Windows) | needs LFS-tracked sqlite for simulate-ci | Gated off on fork |
| Monaco Editor checks | upstream npm-publish flow + LFS-404 | Gated off on fork |
| Prevent engineering system changes in PRs | queries microsoft/vscode permissions | Gated off on fork |
| Check API Proposal Version Changes | upstream API governance | Gated off on fork |
| Checking Component Screenshots | upstream Azure endpoint (403) | Gated off on fork |
| copilot-setup-steps | uses upstream's self-hosted runner pool | Gated off on fork |
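
The `lfs: false` rows in the table amount to a one-line change in each workflow's checkout step. A minimal sketch, assuming a job that never reads the sqlite cache; the job name, step name, and runner label are illustrative and not copied from the repo's workflow files. Only `actions/checkout@v6` and its `lfs` input come from the logs quoted earlier.

```yaml
jobs:
  compile:                      # hypothetical job name
    runs-on: ubuntu-latest      # assumed runner label
    steps:
      - name: Checkout without LFS objects   # hypothetical step name
        uses: actions/checkout@v6
        with:
          lfs: false            # skip the LFS download that currently fails with 404
```

This only unblocks jobs that do not open the LFS-tracked files; jobs that read the sqlite databases need the blobs themselves and are handled by gating instead.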

All gating uses `if: github.repository_owner != 'OpenCortexIDE'`, so the workflows are otherwise kept verbatim and will run normally on any other fork that has the upstream infra. To revert, drop the `if:` line.
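
As a rough illustration of the gating pattern, the condition sits at the job level so the whole job is skipped before any step runs. The job name, runner label, and placeholder step below are assumptions for the sketch; the `if:` expression is the one used in this PR.

```yaml
jobs:
  copilot-test:                 # hypothetical job name
    # Evaluates to false when the repository owner is OpenCortexIDE,
    # so the job is skipped there and runs under any other owner.
    if: github.repository_owner != 'OpenCortexIDE'
    runs-on: ubuntu-latest      # assumed runner label
    steps:
      - uses: actions/checkout@v6
        with:
          lfs: true             # this job genuinely needs the sqlite blobs
      - run: echo "simulation tests would run here"   # placeholder step
```

A job skipped this way is reported as skipped rather than failed, which is why the gated checks stop showing up red on PRs.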

Intentionally NOT fixed

- Component Fixture Tests (Playwright imageCarousel timeouts): this is a real test failure, but it pre-existed the rebase by several months and isn't load-bearing for shipping (the actual product build runs in cortexide-builder, which is green). Fixing it requires investigating why `.image-carousel-editor` never mounts in headless Chromium and is out of scope here. Suggest a separate `fix/image-carousel-playwright` PR.
- chat-lib tests (ubuntu/macos/windows): passing and unaffected.
- Check metadata (telemetry.yml): passing.
- pr-node-modules.yml: push-to-main only, doesn't run on PRs.
- sessions-e2e.yml / chat-perf.yml: already `workflow_dispatch`-only.
- The cortexide-builder workflows in the sibling repo were not touched.

Follow-ups requiring user decision (not in this PR)

- LFS data: do we want to actually upload the copilot simulation sqlite files to our LFS server (~1-2 GB) so the simulate-ci jobs can run? If yes, ungate the three copilot test jobs. If no, leave them gated.
- Engineering-system guardrail: upstream's "no PR may modify .github/workflows or build/" rule is a sensible policy. If you want it enforced on this repo too, the workflow could be rewritten to query OpenCortexIDE/cortexide collaborator permissions instead of microsoft/vscode. Happy to do that in a follow-up.
- Screenshot diff service: if there's appetite, we can stand up our own Azure-Static-Web-Apps screenshot service and re-enable the workflow.
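
For the guardrail follow-up, one possible shape is a small workflow that looks up the PR author's permission on the current repository instead of on microsoft/vscode. This is a sketch under stated assumptions, not the upstream workflow: the workflow name, job name, and the "write or admin" threshold are hypothetical, and whether the default `GITHUB_TOKEN` is allowed to read collaborator permissions on this repo should be verified before relying on it. It uses the `gh` CLI against GitHub's "get repository permissions for a user" REST endpoint.

```yaml
name: Guard engineering system changes   # hypothetical workflow name
on:
  pull_request:
    paths:
      - '.github/workflows/**'
      - 'build/**'
permissions:
  contents: read
jobs:
  check-author-permission:               # hypothetical job name
    runs-on: ubuntu-latest               # assumed runner label
    steps:
      - name: Look up the PR author's permission on this repository
        env:
          GH_TOKEN: ${{ github.token }}  # assumption: default token suffices
          AUTHOR: ${{ github.event.pull_request.user.login }}
        run: |
          # Query the repository this workflow runs in, not microsoft/vscode.
          permission=$(gh api "repos/${{ github.repository }}/collaborators/${AUTHOR}/permission" --jq '.permission')
          echo "Permission for ${AUTHOR}: ${permission}"
          # Assumed policy: only write or admin collaborators may touch these paths.
          if [ "${permission}" != "admin" ] && [ "${permission}" != "write" ]; then
            echo "::error::Changes under .github/workflows or build/ require a maintainer"
            exit 1
          fi
```

The `paths` filter means the job only runs when a PR touches the guarded directories, so unrelated PRs see no extra check.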

Test plan

- Open this draft PR and watch the checks list: only valid checks should run; the gated ones should not appear (or appear as skipped).
- Confirm Compile & Hygiene, Linux/macOS/Windows × Browser/Electron/Remote, Linux / CLI, and Copilot - Check Telemetry all get past `actions/checkout` (they may still fail later for other reasons, which is a separate problem we want to see).
- Confirm chat-lib tests and Check metadata still pass.
- Verify gated workflows show no run for this PR head SHA.

🤖 Generated with Claude Code

The repo's only LFS-tracked files are copilot simulation-cache sqlite databases under extensions/copilot/test/simulation/cache/. Their LFS objects are not present on our LFS server (all return 404), which makes every `actions/checkout@v6` with `lfs: true` fail before any real work. The Linux/macOS/Windows electron/browser/remote tests and the Linux CLI Rust tests don't touch those files, so we can safely set `lfs: false` to unblock checkout. The copilot simulation jobs themselves are addressed separately in a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nt jobs

- Compile & Hygiene and Copilot - Check Telemetry don't touch the LFS-tracked simulation cache, so switch `lfs: true` to `lfs: false` to unblock checkout (our LFS server returns 404 for those objects).
- Copilot - Check Test Cache, Copilot - Test (Linux), and Copilot - Test (Windows) genuinely open the sqlite simulation databases via cache-cli check / simulate-ci. Without LFS data they cannot work, so gate them off on our fork (`if: github.repository_owner != 'OpenCortexIDE'`) rather than masking the missing data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Monaco Editor is published from microsoft/vscode; this check belongs to that publishing flow and isn't relevant for downstream forks. It also hit the LFS-404 issue, which the gate makes moot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This workflow queries microsoft/vscode collaborator permissions and enforces upstream's bot allow-list. It returns 403 on our token and references identities (vs-code-engineering[bot], etc.) that don't apply to this fork. Engineering-system changes here are governed by normal PR review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This check enforces upstream VS Code's policy around versioned proposed APIs (`vscode.proposed.*.d.ts`). We don't publish proposed APIs from this fork and don't want to block PRs that touch d.ts files inherited from an upstream sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The workflow uploads to hediet-screenshots.azurewebsites.net (upstream's Azure-hosted screenshot diff service) using a token derived from OIDC. Our OIDC identity is not authorized at that endpoint (403 on every run). Until we run an equivalent screenshot service, skip the workflow on this fork rather than fail every PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This workflow targets `runs-on: vscode-large-runners`, a self-hosted runner pool that exists only in the microsoft/vscode org. On our fork the job sits queued for 24h and is then auto-failed by GitHub.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tajudeen and others added 7 commits May 24, 2026 10:19

Pterjudin marked this pull request as ready for review May 24, 2026 09:24

Pterjudin merged commit a5a6126 into main May 24, 2026
14 of 25 checks passed

Pterjudin deleted the fix/pr-checks-2026-05-24 branch May 24, 2026 09:24

Pterjudin mentioned this pull request May 25, 2026

fix(build): restore type-namespace usage for event-stream #49

Merged

Pterjudin merged 7 commits into main from fix/pr-checks-2026-05-24
