Skip to content

ci: add Socket Firewall (SFW) composite action and test workflow#250

Open
nebasuke wants to merge 20 commits intomainfrom
add-sfw-ci
Open

ci: add Socket Firewall (SFW) composite action and test workflow#250
nebasuke wants to merge 20 commits intomainfrom
add-sfw-ci

Conversation

@nebasuke
Copy link
Member

@nebasuke nebasuke commented Mar 9, 2026

Add setup-sfw composite action that installs SFW with platform detection (skips Windows and container environments) and a label-gated (ci:sfw) test workflow to validate SFW across runners, containers, and Windows.

@nebasuke nebasuke added the ci:sfw Temp label to test sfw label Mar 9, 2026
@nebasuke
Copy link
Member Author

nebasuke commented Mar 9, 2026

Updated container with node being built: https://github.com/NomicFoundation/solx/actions/runs/22863167235

Copy link
Contributor

@hedgar2017 hedgar2017 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: SFW CI Integration

Overall the approach is sound — SHA-pinning socketdev/action, the ${SFW_PREFIX:-} pattern for graceful degradation, and the label-gated test workflow are all good design choices. Two bugs need fixing before this is ready.


🔴 Bugs (blocking)

1. setup-sfw does not skip Windows (.github/actions/setup-sfw/action.yml:26)
The PR description and action description both say Windows is skipped, but the detect step only checks for containers. On a Windows runner, available=true is set and socketdev/action is invoked unconditionally. If socketdev/action doesn't support Windows, the workflow fails instead of skipping gracefully. Either add an explicit Windows skip or correct the documentation to say Windows is intentionally included.

2. cargo-checks container job: "Set SFW prefix" step is a no-op (.github/workflows/test.yaml:66–71)
The cargo-checks job runs in the CI container which doesn't have sfw installed. command -v sfw always fails, so SFW_PREFIX is never set and all the prefixed cargo calls in cargo-check/action.yml fall back to plain cargo. The build-and-test matrix job correctly does npm install -g sfw before the prefix check; cargo-checks is missing the equivalent step.


🟡 Medium concerns

3. npm install -g sfw is unpinned
Both sfw-test.yaml:90 and test.yaml:148 install SFW without a version — ironic for a supply-chain security tool. Pin to a specific version (e.g. sfw@1.2.3).

4. Known SFW+cargo SSL failure in production path
The continue-on-error: true on the test workflow's sfw cargo fetch step documents a known SSL incompatibility. The production workflows (test.yaml) also prefix cargo commands with ${SFW_PREFIX:-} and have no such safety net. Confirm the SSL issue is resolved before promoting this beyond the ci:sfw label gate, or link a tracking issue in the comment.

5. sanitizer job has no SFW coverage
The sanitizer job in test.yaml has neither npm install -g sfw nor "Set SFW prefix". This may be intentional (it's a specialized label-gated job), but if so it's worth a comment explaining the omission.


🟢 Things that look correct

  • socketdev/action is pinned to a full 40-char SHA — correct supply chain hygiene. ✓
  • socketdev / socket.dev is a legitimate and well-known supply chain security vendor. ✓
  • ${SFW_PREFIX:-} expansion is correct POSIX bash and works correctly in both bash and msys2 {0} shells on Windows (both inherit $GITHUB_ENV and $GITHUB_PATH). ✓
  • /.dockerenv + $container heuristic is reliable for GitHub Actions Docker containers. ✓
  • cargo2junit left unprefixed is correct — it's a local binary reading stdin, not a network call. ✓
  • The label-check → dependent-jobs pattern for ci:sfw gating is correct. ✓
  • Node.js 24 via nodesource is appropriate; the curl | bash risk exists only at image-build time and is mitigated by GPG package signing. ✓
  • Image SHA updated consistently across all six affected workflow files. ✓

Copy link
Contributor

@hedgar2017 hedgar2017 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more suggestion:

nebasuke added a commit that referenced this pull request Mar 18, 2026
Remove redundant sfw-test.yaml since SFW is exercised in all real
workflows. Add SFW install to cargo-checks, integration-tests, and
coverage jobs. Remove SFW prefix from local-only commands (check, fmt,
clippy, udeps) in cargo-check action — only network commands (fetch,
install) need the firewall wrapper. Split chained fetch+test in
coverage.yaml so only fetch gets the SFW prefix.

Replace curl|bash nodesource install in Dockerfile with multi-stage
COPY from node:24-slim pinned by digest. Pin ubuntu:24.04 base image
by digest. Remove lsb-release and software-properties-common packages
that were only needed by the nodesource setup script.
nebasuke added 12 commits March 22, 2026 12:51
Add setup-sfw composite action that installs SFW with platform detection
(skips Windows and container environments) and a label-gated (ci:sfw)
test workflow to validate SFW across runners, containers, and Windows.
The container job needs packages:read to pull from GHCR.
The SFW container job needs npm to install the Socket Firewall CLI.
Bake Node.js into the image via NodeSource so container jobs don't
need a separate setup-node step.
Point all workflows at the newly built solx-ci-runner image which
includes Node.js 24. Also remove the setup-node stopgap from the
SFW container job since npm is now available in the image.
- Remove Windows skip from setup-sfw action (SFW now ships Windows binaries)
- Add macOS x86 (macos-15-intel) and Windows (windows-2025) to sfw-test matrix
- Remove windows-skip verification job (no longer needed)
- Use `command -v sfw` instead of `which sfw` for cross-platform compatibility
- Add SFW setup to test.yaml build-and-test job (native runners via action,
  container jobs via npm install)
Add SFW_PREFIX environment variable set after SFW installation in
build-and-test and cargo-checks jobs. All cargo invocations in
composite actions now use ${SFW_PREFIX:-} prefix — when set to "sfw"
it routes through Socket Firewall, when unset it's a no-op so other
workflows (release, integration-tests) are unaffected.

Modified actions: rust-unit-tests, cargo-check, build-llvm, build-solc,
build-rust.
SFW intercepts HTTPS via a MITM proxy and auto-injects its CA cert for
Node.js (NODE_EXTRA_CA_CERTS), but cargo's TLS backends don't trust it,
causing `sfw cargo fetch` to fail on Windows and macOS runners.

The setup-sfw action now discovers SFW's CA cert, creates a combined PEM
bundle (system CAs + SFW CA), and exports CARGO_HTTP_CAINFO/GIT_SSL_CAINFO
so cargo and git trust the proxy cert.

sfw-test.yaml gains diagnostic steps to confirm the cert path and a
side-by-side comparison of cargo fetch with and without the CA fix.
On Windows, `sfw sh -c '...'` spawns sh via cmd.exe which splits
the Git Bash path at spaces (`C:\Program Files\...`), causing
"'C:\Program' is not recognized" errors. Using `sfw node -e` avoids
this since Node.js handles its own path resolution.
SFW wraps every command's stdout with banner/footer lines, so capturing
the cert path via `sfw node -e "console.log(...)"` returns a multi-line
string that fails the file existence check on all platforms. On Windows,
inline JS quoting also breaks under cmd.exe with set -e aborting the step.

Fix by writing a small JS script to a temp file, running it under sfw
(with stdout/stderr redirected to /dev/null), and reading the cert path
from a temp output file. This avoids both stdout pollution and
Windows quoting issues.
Rustup uses rustls-native-certs which honors SSL_CERT_FILE but not
CARGO_HTTP_CAINFO. Without this, rustup toolchain syncs fail with
InvalidCertificate(UnknownIssuer) when routed through SFW's MITM proxy.
Cargo's libgit2 uses platform-native TLS (LibreSSL on macOS, Schannel
on Windows) which rejects SFW's MITM proxy certificate even with
GIT_SSL_CAINFO set. Setting CARGO_NET_GIT_FETCH_WITH_CLI=true makes
cargo shell out to the system git CLI, which uses OpenSSL and properly
honors GIT_SSL_CAINFO on all platforms.
SFW's MITM proxy has fundamental TLS incompatibilities with macOS
(LibreSSL rejects cert with RSA padding error) and Windows (Schannel
rejects cert signature). These are crypto-level failures, not CA trust
issues — no amount of CA bundle configuration can fix them.

- Add Linux-only gate in setup-sfw action (outputs available=false
  on non-Linux with a notice)
- Remove macOS/Windows from sfw-test.yaml runner matrix
- Remove now-unnecessary macOS/Windows CA paths, keychain export
  fallback, and CARGO_NET_GIT_FETCH_WITH_CLI workaround
nebasuke added a commit that referenced this pull request Mar 22, 2026
Remove redundant sfw-test.yaml since SFW is exercised in all real
workflows. Add SFW install to cargo-checks, integration-tests, and
coverage jobs. Remove SFW prefix from local-only commands (check, fmt,
clippy, udeps) in cargo-check action — only network commands (fetch,
install) need the firewall wrapper. Split chained fetch+test in
coverage.yaml so only fetch gets the SFW prefix.

Replace curl|bash nodesource install in Dockerfile with multi-stage
COPY from node:24-slim pinned by digest. Pin ubuntu:24.04 base image
by digest. Remove lsb-release and software-properties-common packages
that were only needed by the nodesource setup script.
@nebasuke
Copy link
Member Author

nebasuke commented Mar 22, 2026

One more docker image build: https://github.com/NomicFoundation/solx/actions/runs/23412242798

Remove redundant sfw-test.yaml since SFW is exercised in all real
workflows. Add SFW install to cargo-checks, integration-tests, and
coverage jobs. Remove SFW prefix from local-only commands (check, fmt,
clippy, udeps) in cargo-check action — only network commands (fetch,
install) need the firewall wrapper. Split chained fetch+test in
coverage.yaml so only fetch gets the SFW prefix.

Replace curl|bash nodesource install in Dockerfile with multi-stage
COPY from node:24-slim pinned by digest. Pin ubuntu:24.04 base image
by digest. Remove lsb-release and software-properties-common packages
that were only needed by the nodesource setup script.
Point all workflows at the newly built solx-ci-runner image which
includes Node.js 24 via nodesource apt repo.
Address PR #250 review feedback:
- Add ${SFW_PREFIX:-} to all bare cargo commands in
  integration-tests.yaml and coverage.yaml for consistent
  SFW monitoring coverage.
- Pin npm install -g sfw to sfw@2.0.4 across all workflows
  to avoid unpinned installs of a supply-chain security tool.
Extend setup-sfw action to handle both container and non-container
environments, eliminating the repeated install/prefix pattern from
all workflow files. The action now detects the environment and either
installs via socketdev/action (bare-metal) or npm (container), then
sets SFW_PREFIX automatically.
@nebasuke nebasuke requested a review from hedgar2017 March 23, 2026 21:38
run: cargo build --frozen --release --bin solx-dev --bin solx-tester
run: ${SFW_PREFIX:-} cargo build --frozen --release --bin solx-dev --bin solx-tester

- name: Build LLVM (PR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate PR build block. Lines 154-179 (Build LLVM (PR), Build solc (PR), Fetch dependencies (PR), Build solx (PR), Build solx-dev and solx-tester (PR)) duplicate the work already done at lines 80-131. Looks like a merge artifact. These should be removed — the LLVM/solc builds are cached so they won't recompile, but the cargo steps and step overhead are unnecessary.

runs-on: solx-linux-amd64
container:
image: ghcr.io/nomicfoundation/solx-ci-runner@sha256:c0866d146261cd0a51dc7d9077444b8ac3dde12c53d2151137834e6be149dbc7
image: ghcr.io/nomicfoundation/solx-ci-runner@sha256:c2b03000a1074d2cc3e6cf25a1c4fdc6eb0a61d23e63bef59205050249fa1d6e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workflow uses build-llvm, build-solc, and rust-unit-tests composite actions that now have ${SFW_PREFIX:-} cargo prefixes, but there's no Setup SFW step in this job. SFW monitoring is silently absent. If this is intentional (specialized label-gated job), a comment explaining the exclusion would help. Otherwise, add uses: ./.github/actions/setup-sfw after the submodule checkout.

Comment on lines +37 to +39
# (GitHub sets 'container' env var for container jobs;
# /.dockerenv is a fallback heuristic)
if [[ -n "${container:-}" ]] || [[ -f /.dockerenv ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "GitHub sets 'container' env var for container jobs" but GitHub Actions exposes the container configuration through the job.container expression context, not as a shell environment variable. ${container:-} will always be empty in bash. The detection works correctly via the /.dockerenv fallback, but the first condition is dead code and the comment is misleading.

Consider simplifying to just [[ -f /.dockerenv ]] or passing ${{ job.container.id }} as an input to the action.

Comment on lines +113 to +115
- name: Set SFW prefix
id: set-prefix
if: steps.detect.outputs.method != 'none'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When method=none (macOS/Windows), this step is skipped entirely, leaving outputs.available as an empty string rather than "false". The output description on line 16 says true/false, so any consumer checking == 'false' would get wrong results.

Either add a separate step for the none branch that writes available=false, or document that callers must check != 'true'.

Comment on lines +76 to +78
if [ -z "$SFW_CA" ] || [ ! -f "$SFW_CA" ]; then
echo "::warning::SFW CA cert not found — cargo may fail with SSL errors under sfw"
exit 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the SFW CA cert isn't found, this warns and exits 0, but the set-prefix step at line 113 still runs and sets SFW_PREFIX=sfw. Subsequent ${SFW_PREFIX:-} cargo commands will route through SFW's MITM proxy without proper CA trust, producing opaque SSL errors.

Consider either skipping the prefix setup on cert failure (e.g. set an output that the set-prefix step checks) or failing the step.

Comment on lines +39 to 40
${SFW_PREFIX:-} cargo install cargo-udeps@0.1.60 --locked
cargo +nightly udeps --frozen --all-targets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo install cargo-udeps is prefixed (line 39) but cargo +nightly udeps --frozen on the next line is not. The --frozen flag means it doesn't hit the network, so this is functionally fine, but it's inconsistent with the approach in rust-unit-tests which prefixes all cargo commands including --frozen ones.

- name: "Linux x86 gnu"
runner: ubuntu-24.04
image: ghcr.io/nomicfoundation/solx-ci-runner@sha256:c0866d146261cd0a51dc7d9077444b8ac3dde12c53d2151137834e6be149dbc7
image: ghcr.io/nomicfoundation/solx-ci-runner@sha256:c2b03000a1074d2cc3e6cf25a1c4fdc6eb0a61d23e63bef59205050249fa1d6e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slang-tests.yaml, cache-warmup.yaml, and release.yaml all use composite actions that have ${SFW_PREFIX:-} cargo prefixes but never call setup-sfw. The ${SFW_PREFIX:-} graceful degradation means builds still work, but SFW monitoring is silently absent. release.yaml is especially notable since it produces published binaries — the highest-value target for supply chain attacks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:sfw Temp label to test sfw

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants