ci: add Socket Firewall (SFW) composite action and test workflow#250
ci: add Socket Firewall (SFW) composite action and test workflow#250
Conversation
|
Updated container with node being built: https://github.com/NomicFoundation/solx/actions/runs/22863167235 |
hedgar2017
left a comment
There was a problem hiding this comment.
Review: SFW CI Integration
Overall the approach is sound — SHA-pinning socketdev/action, the ${SFW_PREFIX:-} pattern for graceful degradation, and the label-gated test workflow are all good design choices. Two bugs need fixing before this is ready.
🔴 Bugs (blocking)
1. setup-sfw does not skip Windows (.github/actions/setup-sfw/action.yml:26)
The PR description and action description both say Windows is skipped, but the detect step only checks for containers. On a Windows runner, available=true is set and socketdev/action is invoked unconditionally. If socketdev/action doesn't support Windows, the workflow fails instead of skipping gracefully. Either add an explicit Windows skip or correct the documentation to say Windows is intentionally included.
2. cargo-checks container job: "Set SFW prefix" step is a no-op (.github/workflows/test.yaml:66–71)
The cargo-checks job runs in the CI container which doesn't have sfw installed. command -v sfw always fails, so SFW_PREFIX is never set and all the prefixed cargo calls in cargo-check/action.yml fall back to plain cargo. The build-and-test matrix job correctly does npm install -g sfw before the prefix check; cargo-checks is missing the equivalent step.
🟡 Medium concerns
3. npm install -g sfw is unpinned
Both sfw-test.yaml:90 and test.yaml:148 install SFW without a version — ironic for a supply-chain security tool. Pin to a specific version (e.g. sfw@1.2.3).
4. Known SFW+cargo SSL failure in production path
The continue-on-error: true on the test workflow's sfw cargo fetch step documents a known SSL incompatibility. The production workflows (test.yaml) also prefix cargo commands with ${SFW_PREFIX:-} and have no such safety net. Confirm the SSL issue is resolved before promoting this beyond the ci:sfw label gate, or link a tracking issue in the comment.
5. sanitizer job has no SFW coverage
The sanitizer job in test.yaml has neither npm install -g sfw nor "Set SFW prefix". This may be intentional (it's a specialized label-gated job), but if so it's worth a comment explaining the omission.
🟢 Things that look correct
socketdev/actionis pinned to a full 40-char SHA — correct supply chain hygiene. ✓socketdev/ socket.dev is a legitimate and well-known supply chain security vendor. ✓${SFW_PREFIX:-}expansion is correct POSIX bash and works correctly in bothbashandmsys2 {0}shells on Windows (both inherit$GITHUB_ENVand$GITHUB_PATH). ✓/.dockerenv+$containerheuristic is reliable for GitHub Actions Docker containers. ✓cargo2junitleft unprefixed is correct — it's a local binary reading stdin, not a network call. ✓- The label-check → dependent-jobs pattern for
ci:sfwgating is correct. ✓ - Node.js 24 via nodesource is appropriate; the
curl | bashrisk exists only at image-build time and is mitigated by GPG package signing. ✓ - Image SHA updated consistently across all six affected workflow files. ✓
Remove redundant sfw-test.yaml since SFW is exercised in all real workflows. Add SFW install to cargo-checks, integration-tests, and coverage jobs. Remove SFW prefix from local-only commands (check, fmt, clippy, udeps) in cargo-check action — only network commands (fetch, install) need the firewall wrapper. Split chained fetch+test in coverage.yaml so only fetch gets the SFW prefix. Replace curl|bash nodesource install in Dockerfile with multi-stage COPY from node:24-slim pinned by digest. Pin ubuntu:24.04 base image by digest. Remove lsb-release and software-properties-common packages that were only needed by the nodesource setup script.
Add setup-sfw composite action that installs SFW with platform detection (skips Windows and container environments) and a label-gated (ci:sfw) test workflow to validate SFW across runners, containers, and Windows.
The container job needs packages:read to pull from GHCR.
The SFW container job needs npm to install the Socket Firewall CLI. Bake Node.js into the image via NodeSource so container jobs don't need a separate setup-node step.
Point all workflows at the newly built solx-ci-runner image which includes Node.js 24. Also remove the setup-node stopgap from the SFW container job since npm is now available in the image.
- Remove Windows skip from setup-sfw action (SFW now ships Windows binaries) - Add macOS x86 (macos-15-intel) and Windows (windows-2025) to sfw-test matrix - Remove windows-skip verification job (no longer needed) - Use `command -v sfw` instead of `which sfw` for cross-platform compatibility - Add SFW setup to test.yaml build-and-test job (native runners via action, container jobs via npm install)
Add SFW_PREFIX environment variable set after SFW installation in
build-and-test and cargo-checks jobs. All cargo invocations in
composite actions now use ${SFW_PREFIX:-} prefix — when set to "sfw"
it routes through Socket Firewall, when unset it's a no-op so other
workflows (release, integration-tests) are unaffected.
Modified actions: rust-unit-tests, cargo-check, build-llvm, build-solc,
build-rust.
SFW intercepts HTTPS via a MITM proxy and auto-injects its CA cert for Node.js (NODE_EXTRA_CA_CERTS), but cargo's TLS backends don't trust it, causing `sfw cargo fetch` to fail on Windows and macOS runners. The setup-sfw action now discovers SFW's CA cert, creates a combined PEM bundle (system CAs + SFW CA), and exports CARGO_HTTP_CAINFO/GIT_SSL_CAINFO so cargo and git trust the proxy cert. sfw-test.yaml gains diagnostic steps to confirm the cert path and a side-by-side comparison of cargo fetch with and without the CA fix.
On Windows, `sfw sh -c '...'` spawns sh via cmd.exe which splits the Git Bash path at spaces (`C:\Program Files\...`), causing "'C:\Program' is not recognized" errors. Using `sfw node -e` avoids this since Node.js handles its own path resolution.
SFW wraps every command's stdout with banner/footer lines, so capturing the cert path via `sfw node -e "console.log(...)"` returns a multi-line string that fails the file existence check on all platforms. On Windows, inline JS quoting also breaks under cmd.exe with set -e aborting the step. Fix by writing a small JS script to a temp file, running it under sfw (with stdout/stderr redirected to /dev/null), and reading the cert path from a temp output file. This avoids both stdout pollution and Windows quoting issues.
Rustup uses rustls-native-certs which honors SSL_CERT_FILE but not CARGO_HTTP_CAINFO. Without this, rustup toolchain syncs fail with InvalidCertificate(UnknownIssuer) when routed through SFW's MITM proxy.
Cargo's libgit2 uses platform-native TLS (LibreSSL on macOS, Schannel on Windows) which rejects SFW's MITM proxy certificate even with GIT_SSL_CAINFO set. Setting CARGO_NET_GIT_FETCH_WITH_CLI=true makes cargo shell out to the system git CLI, which uses OpenSSL and properly honors GIT_SSL_CAINFO on all platforms.
SFW's MITM proxy has fundamental TLS incompatibilities with macOS (LibreSSL rejects cert with RSA padding error) and Windows (Schannel rejects cert signature). These are crypto-level failures, not CA trust issues — no amount of CA bundle configuration can fix them. - Add Linux-only gate in setup-sfw action (outputs available=false on non-Linux with a notice) - Remove macOS/Windows from sfw-test.yaml runner matrix - Remove now-unnecessary macOS/Windows CA paths, keychain export fallback, and CARGO_NET_GIT_FETCH_WITH_CLI workaround
Remove redundant sfw-test.yaml since SFW is exercised in all real workflows. Add SFW install to cargo-checks, integration-tests, and coverage jobs. Remove SFW prefix from local-only commands (check, fmt, clippy, udeps) in cargo-check action — only network commands (fetch, install) need the firewall wrapper. Split chained fetch+test in coverage.yaml so only fetch gets the SFW prefix. Replace curl|bash nodesource install in Dockerfile with multi-stage COPY from node:24-slim pinned by digest. Pin ubuntu:24.04 base image by digest. Remove lsb-release and software-properties-common packages that were only needed by the nodesource setup script.
|
One more docker image build: https://github.com/NomicFoundation/solx/actions/runs/23412242798 |
Remove redundant sfw-test.yaml since SFW is exercised in all real workflows. Add SFW install to cargo-checks, integration-tests, and coverage jobs. Remove SFW prefix from local-only commands (check, fmt, clippy, udeps) in cargo-check action — only network commands (fetch, install) need the firewall wrapper. Split chained fetch+test in coverage.yaml so only fetch gets the SFW prefix. Replace curl|bash nodesource install in Dockerfile with multi-stage COPY from node:24-slim pinned by digest. Pin ubuntu:24.04 base image by digest. Remove lsb-release and software-properties-common packages that were only needed by the nodesource setup script.
Point all workflows at the newly built solx-ci-runner image which includes Node.js 24 via nodesource apt repo.
Address PR #250 review feedback: - Add ${SFW_PREFIX:-} to all bare cargo commands in integration-tests.yaml and coverage.yaml for consistent SFW monitoring coverage. - Pin npm install -g sfw to sfw@2.0.4 across all workflows to avoid unpinned installs of a supply-chain security tool.
Extend setup-sfw action to handle both container and non-container environments, eliminating the repeated install/prefix pattern from all workflow files. The action now detects the environment and either installs via socketdev/action (bare-metal) or npm (container), then sets SFW_PREFIX automatically.
| run: cargo build --frozen --release --bin solx-dev --bin solx-tester | ||
| run: ${SFW_PREFIX:-} cargo build --frozen --release --bin solx-dev --bin solx-tester | ||
|
|
||
| - name: Build LLVM (PR) |
There was a problem hiding this comment.
Duplicate PR build block. Lines 154-179 (Build LLVM (PR), Build solc (PR), Fetch dependencies (PR), Build solx (PR), Build solx-dev and solx-tester (PR)) duplicate the work already done at lines 80-131. Looks like a merge artifact. These should be removed — the LLVM/solc builds are cached so they won't recompile, but the cargo steps and step overhead are unnecessary.
| runs-on: solx-linux-amd64 | ||
| container: | ||
| image: ghcr.io/nomicfoundation/solx-ci-runner@sha256:c0866d146261cd0a51dc7d9077444b8ac3dde12c53d2151137834e6be149dbc7 | ||
| image: ghcr.io/nomicfoundation/solx-ci-runner@sha256:c2b03000a1074d2cc3e6cf25a1c4fdc6eb0a61d23e63bef59205050249fa1d6e |
There was a problem hiding this comment.
This workflow uses build-llvm, build-solc, and rust-unit-tests composite actions that now have ${SFW_PREFIX:-} cargo prefixes, but there's no Setup SFW step in this job. SFW monitoring is silently absent. If this is intentional (specialized label-gated job), a comment explaining the exclusion would help. Otherwise, add uses: ./.github/actions/setup-sfw after the submodule checkout.
| # (GitHub sets 'container' env var for container jobs; | ||
| # /.dockerenv is a fallback heuristic) | ||
| if [[ -n "${container:-}" ]] || [[ -f /.dockerenv ]]; then |
There was a problem hiding this comment.
The comment says "GitHub sets 'container' env var for container jobs" but GitHub Actions exposes the container configuration through the job.container expression context, not as a shell environment variable. ${container:-} will always be empty in bash. The detection works correctly via the /.dockerenv fallback, but the first condition is dead code and the comment is misleading.
Consider simplifying to just [[ -f /.dockerenv ]] or passing ${{ job.container.id }} as an input to the action.
| - name: Set SFW prefix | ||
| id: set-prefix | ||
| if: steps.detect.outputs.method != 'none' |
There was a problem hiding this comment.
When method=none (macOS/Windows), this step is skipped entirely, leaving outputs.available as an empty string rather than "false". The output description on line 16 says true/false, so any consumer checking == 'false' would get wrong results.
Either add a separate step for the none branch that writes available=false, or document that callers must check != 'true'.
| if [ -z "$SFW_CA" ] || [ ! -f "$SFW_CA" ]; then | ||
| echo "::warning::SFW CA cert not found — cargo may fail with SSL errors under sfw" | ||
| exit 0 |
There was a problem hiding this comment.
When the SFW CA cert isn't found, this warns and exits 0, but the set-prefix step at line 113 still runs and sets SFW_PREFIX=sfw. Subsequent ${SFW_PREFIX:-} cargo commands will route through SFW's MITM proxy without proper CA trust, producing opaque SSL errors.
Consider either skipping the prefix setup on cert failure (e.g. set an output that the set-prefix step checks) or failing the step.
| ${SFW_PREFIX:-} cargo install cargo-udeps@0.1.60 --locked | ||
| cargo +nightly udeps --frozen --all-targets |
There was a problem hiding this comment.
cargo install cargo-udeps is prefixed (line 39) but cargo +nightly udeps --frozen on the next line is not. The --frozen flag means it doesn't hit the network, so this is functionally fine, but it's inconsistent with the approach in rust-unit-tests which prefixes all cargo commands including --frozen ones.
| - name: "Linux x86 gnu" | ||
| runner: ubuntu-24.04 | ||
| image: ghcr.io/nomicfoundation/solx-ci-runner@sha256:c0866d146261cd0a51dc7d9077444b8ac3dde12c53d2151137834e6be149dbc7 | ||
| image: ghcr.io/nomicfoundation/solx-ci-runner@sha256:c2b03000a1074d2cc3e6cf25a1c4fdc6eb0a61d23e63bef59205050249fa1d6e |
There was a problem hiding this comment.
slang-tests.yaml, cache-warmup.yaml, and release.yaml all use composite actions that have ${SFW_PREFIX:-} cargo prefixes but never call setup-sfw. The ${SFW_PREFIX:-} graceful degradation means builds still work, but SFW monitoring is silently absent. release.yaml is especially notable since it produces published binaries — the highest-value target for supply chain attacks.
Add setup-sfw composite action that installs SFW with platform detection (skips Windows and container environments) and a label-gated (ci:sfw) test workflow to validate SFW across runners, containers, and Windows.