@iammasterbrucewayne iammasterbrucewayne commented Sep 11, 2025

Description

Fixes #8117

  • Added a probe_worker_security utility to polkadot/node/core/pvf that runs the prepare-worker binary with special CLI flags (similar to the existing version checks).
  • Integrated the security probes into determine_workers_paths so that landlock, seccomp (x86_64 only), unshare+change_root, and secure_clone are validated upfront during node startup.
  • The node now aborts early if required Linux kernel security features are missing, rather than discovering the problem later during PVF execution.
  • Added a test for the probe utility and worker path determination: should_probe_worker_security_successfully.
  • All probes are gated behind #[cfg(target_os = "linux")].

Integration

This PR adds worker security feature probes (landlock, seccomp, unshare+change_root, and secure_clone) during node startup. These checks are executed by calling the prepare-worker binary with CLI flags, in the same way that version checks are already performed.

  • No changes are required for downstream projects consuming polkadot-service or other crates.
  • The interface of determine_workers_paths is unchanged; only its behavior is extended to run additional security checks.
  • On Linux targets, the node will now abort earlier if required kernel security features are unavailable. This does not affect downstream crates unless they rely on bypassing worker binary checks.
  • On non-Linux platforms (macOS, Windows), the probes are compiled out via #[cfg(target_os = "linux")], so downstream projects remain unaffected.
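
To make the platform gating concrete, here is a minimal sketch of which check flags end up being probed on a given target. The function name probe_flags_for_target is hypothetical and is not part of this PR; only the flag strings and the cfg conditions are taken from the change itself.

```rust
/// Hypothetical helper (not in the PR): lists the worker check flags that
/// would be probed on the current compilation target.
fn probe_flags_for_target() -> Vec<&'static str> {
	// `mut` is unused on non-Linux targets, where the block below is compiled out.
	#[allow(unused_mut)]
	let mut flags: Vec<&'static str> = Vec::new();

	#[cfg(target_os = "linux")]
	{
		flags.push("--check-can-enable-landlock");
		// Seccomp is only probed on x86_64.
		#[cfg(target_arch = "x86_64")]
		flags.push("--check-can-enable-seccomp");
		flags.push("--check-can-unshare-user-namespace-and-change-root");
		flags.push("--check-can-do-secure-clone");
	}

	flags
}
```

On macOS or Windows the returned list is empty, which is why downstream builds on those platforms should see no behavioral change.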

Review Notes

This PR extends worker binary validation in determine_workers_paths by adding security capability probes, in addition to the existing version checks.

Previously, determine_workers_paths only:

  • Located worker binaries (prepare and execute).
  • Verified executability.
  • Ensured the workers’ reported version matched the node version.

Now, on Linux hosts, the node also probes security features directly by spawning the worker with special CLI flags. This mirrors the checks already implemented in security.rs but ensures they run at worker selection time, before the node starts up fully.

Implementation Details

  • Added a new utility:
/// Call into a worker binary with a `--check-*` flag (and optionally a path arg).
///
/// Returns `Ok(())` if the check succeeds, otherwise returns an `io::Error` with
/// the worker's stderr.
pub fn probe_worker_security<I, S>(
	worker_path: &Path,
	check_arg: &'static str,
	extra_args: I,
) -> std::io::Result<()>
where
	I: IntoIterator<Item = S>,
	S: AsRef<OsStr>,
{
	let mut command = Command::new(worker_path);

	// Clear env vars. (Running with different envs could affect results.)
	command.env_clear();
	// Restore only what's relevant for logging.
	if let Ok(value) = std::env::var("RUST_LOG") {
		command.env("RUST_LOG", value);
	}

	// Add the flag + any extra args, then execute.
	let output = command.arg(check_arg).args(extra_args).output()?;

	if output.status.success() {
		Ok(())
	} else {
		let stderr = String::from_utf8_lossy(&output.stderr).trim().to_string();
		let msg = if stderr.is_empty() {
			format!("{check_arg}: not available")
		} else {
			format!("{check_arg}: {stderr}")
		};
		Err(std::io::Error::new(std::io::ErrorKind::Other, msg))
	}
}
  • determine_workers_paths now calls this function after version checks:
	let worker_version = polkadot_node_core_pvf::get_worker_version(&exec_worker_path)?;
	if worker_version != node_version { ... }
	#[cfg(target_os = "linux")]
	{
		polkadot_node_core_pvf::probe_worker_security(
			&prep_worker_path,
			"--check-can-enable-landlock",
			std::iter::empty::<&str>(),
		)?;

		// Seccomp (only on x86_64)
		#[cfg(target_arch = "x86_64")]
		polkadot_node_core_pvf::probe_worker_security(
			&prep_worker_path,
			"--check-can-enable-seccomp",
			std::iter::empty::<&str>(),
		)?;

		// Unshare user namespace + change root needs a temp dir
		let tmp = tempfile::Builder::new()
			.prefix("pvf-check-can-unshare-")
			.tempdir()?;
		polkadot_node_core_pvf::probe_worker_security(
			&prep_worker_path,
			"--check-can-unshare-user-namespace-and-change-root",
			std::iter::once(tmp.path()),
		)?;

		// Secure clone
		polkadot_node_core_pvf::probe_worker_security(
			&prep_worker_path,
			"--check-can-do-secure-clone",
			std::iter::empty::<&str>(),
		)?;
	}
  • Added basic unit test:
	#[test]
	fn should_probe_worker_security_successfully() {
		with_temp_dir_structure(|tempdir, _| {
			let worker_path = tempdir.join("usr/bin/polkadot-prepare-worker");

			// Write a dummy worker that succeeds on a specific check flag.
			let program = r#"#!/usr/bin/env bash
if [[ "$1" == "--check-can-enable-landlock" ]]; then
    exit 0
else
    echo "unexpected flag: $1" >&2
    exit 1
fi
"#;
			std::fs::write(&worker_path, program)?;
			std::fs::set_permissions(
				&worker_path,
				std::os::unix::fs::PermissionsExt::from_mode(0o744),
			)?;

			// Expect probe to succeed with the supported flag.
			assert!(polkadot_node_core_pvf::probe_worker_security(
				&worker_path,
				"--check-can-enable-landlock",
				std::iter::empty::<&str>(),
			)
			.is_ok());

			// Expect probe to fail with an unsupported flag.
			assert!(polkadot_node_core_pvf::probe_worker_security(
				&worker_path,
				"--check-can-enable-seccomp",
				std::iter::empty::<&str>(),
			)
			.is_err());

			Ok(())
		})
		.unwrap();
	}
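
The unit test above only asserts is_ok() / is_err(), so the text of the error is not checked. One way to cover it without spawning a process would be to factor the message formatting into a pure function. The sketch below assumes a hypothetical helper named probe_failure_message, which is not part of this PR; its logic is copied from the failure branch of probe_worker_security.

```rust
/// Hypothetical helper (not in the PR): builds the message that
/// `probe_worker_security` puts into its `io::Error` when a check fails.
fn probe_failure_message(check_arg: &str, stderr: &[u8]) -> String {
	// Worker output may not be valid UTF-8, so decode lossily and trim whitespace.
	let stderr = String::from_utf8_lossy(stderr);
	let stderr = stderr.trim();
	if stderr.is_empty() {
		// The worker exited non-zero without explaining why.
		format!("{check_arg}: not available")
	} else {
		format!("{check_arg}: {stderr}")
	}
}
```

With a helper like this, both the empty-stderr branch and the trimming behavior could be asserted directly on the returned string.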

CI check comments:
I presume that none of the tests starting with "Zombienet" are related to my code, but I'm not 100% sure whether the "tests linux stable" checks have anything to do with what this PR introduces.

Check semver is failing right now because I held off on the version bump: I'd like a maintainer's guidance on the versioning first, to avoid breaking something with a wrong bump.

@bkchr bkchr added the T8-polkadot label (This PR/Issue is related to/affects the Polkadot network.) Sep 12, 2025
@iammasterbrucewayne iammasterbrucewayne marked this pull request as ready for review September 12, 2025 23:56
Comment on lines +97 to +126
	#[cfg(target_os = "linux")]
	{
		polkadot_node_core_pvf::probe_worker_security(
			&prep_worker_path,
			"--check-can-enable-landlock",
			std::iter::empty::<&str>(),
		)?;

		// Seccomp (only on x86_64)
		#[cfg(target_arch = "x86_64")]
		polkadot_node_core_pvf::probe_worker_security(
			&prep_worker_path,
			"--check-can-enable-seccomp",
			std::iter::empty::<&str>(),
		)?;

		// Unshare user namespace + change root needs a temp dir
		let tmp = tempfile::Builder::new().prefix("pvf-check-can-unshare-").tempdir()?;
		polkadot_node_core_pvf::probe_worker_security(
			&prep_worker_path,
			"--check-can-unshare-user-namespace-and-change-root",
			std::iter::once(tmp.path()),
		)?;

		// Secure clone
		polkadot_node_core_pvf::probe_worker_security(
			&prep_worker_path,
			"--check-can-do-secure-clone",
			std::iter::empty::<&str>(),
		)?;

Member

I kind of fucked up in my issue description! Sorry :)

The tests you are calling here are already optional and are only enabled if they succeeded before; that is why a command-line flag already exists for them.

What I actually want is for everything in the following range that returns an error to be checked when launching the node: https://github.com/paritytech/polkadot-sdk/blob/d2fd53645654d3b8e12cbf735b67b93078d70113/polkadot/node/core/pvf/common/src/worker/mod.rs#L320-395
This requires introducing a new CLI argument that performs these checks.

Contributor Author

Ah got it, I'll try to get on it around this weekend

Contributor Author

Thanks for the clarification earlier! I wasn’t able to find time over the weekend but I'll try to get this done soon (hopefully within this week).

Labels
T8-polkadot This PR/Issue is related to/affects the Polkadot network.
Development

Successfully merging this pull request may close these issues.

Polkadot: Verify the workers at startup