Support environment with restricted commands#32

Draft
jo-basevi wants to merge 2 commits into main from move-conda-env-vars-from-modulefiles

Conversation

@jo-basevi
Collaborator

The overall issue we are trying to solve is running these containers alongside conda environments. One option is to not activate the environment via the modulefile, and to restrict the external launcher script commands to only the tool itself (e.g. payu).

To avoid activating the environment in the modulefile in 738442c, I tried replicating the modulefile insert that activates the conda environment, but instead writing a script that can be run as part of a launcher script. I don't think my solution here is that good. Alternative solutions to activate the environment in the launcher scripts could be:

  • Run the command using micromamba directly, e.g.
    ${CONDA_EXE} run --prefix "${CONDA_BASE_ENVIRONMENT}" "${cmd_to_run[@]}". A downside is that none of the environments currently depend on the micromamba install at runtime, as it's only used to create the environments, and this would add that dependency.
  • Create an activate script within the environment and run it every time the command inside the environment runs (similar to this previous PR).
  • Have the environment module insert instead set SINGULARITYENV_* variables so they are only set when the container is launched. This might be an issue when multiple modules that set SINGULARITYENV_* are loaded at the same time, as the container will use the values from the most recently loaded module.
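As a rough dry-run sketch of the first option above (all variable names here are taken from the snippet in that bullet, but the values and the `launch` helper are assumptions for illustration), the launcher would delegate activation to `micromamba run` rather than activating the environment itself:

```shell
#!/usr/bin/env bash
# Sketch of option 1: delegate activation to `micromamba run` instead of
# activating the conda environment in the modulefile. `launch` only
# prints the command it would execute, so this is a dry run.
CONDA_EXE="micromamba"                    # hypothetical micromamba binary
CONDA_BASE_ENVIRONMENT="/opt/envs/payu"   # hypothetical environment prefix
cmd_to_run=("payu" "run")

launch() {
    # A real launcher would `exec` this; echo it for illustration.
    echo "${CONDA_EXE} run --prefix ${CONDA_BASE_ENVIRONMENT} $*"
}

launch "${cmd_to_run[@]}"
# → micromamba run --prefix /opt/envs/payu payu run
```

This keeps the modulefile free of activation logic, at the cost of the runtime micromamba dependency noted above.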

In c96af86, I'm creating a -lite version of the module that only activates the environment in the launcher script and exposes a limited set of commands. So there would be a payu/$version module, which works similarly to before, and a payu/$version-lite module. Both modules use the same squashfs environment file, so it wouldn't add too much in terms of storage. The idea of keeping the original environment module is that it's used for creating virtual environments.
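To make the "-lite" restriction concrete, here is a hypothetical sketch (the `launcher_commands` name comes from the commit description below; the function and messages are illustrative, not the actual implementation) of how a restricted launcher could check the allow-list before forwarding a command into the container:

```shell
#!/usr/bin/env bash
# Sketch of a "-lite" launcher check: only commands listed in
# launcher_commands (from the environment's config.sh) are forwarded.
launcher_commands=("payu")   # assumed allow-list; empty means no lite env

run_restricted() {
    local cmd="$1"
    local allowed
    for allowed in "${launcher_commands[@]}"; do
        if [[ "$cmd" == "$allowed" ]]; then
            # Stand-in for launching the command inside singularity.
            echo "launching $cmd inside the container"
            return 0
        fi
    done
    echo "error: '$cmd' is not an exposed command" >&2
    return 1
}

run_restricted payu
# → launching payu inside the container
```

Any command outside the allow-list (e.g. `python`) would be rejected, which is what keeps the lite module safe to load alongside other conda environments.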

This PR shouldn't be merged, as hopefully there's a cleaner solution, and it isn't necessary until the payu environment is trimmed down to only the tools needed for running payu.

…pt commands

- Add a launcher_commands array to the environment's config.sh. If empty, a lite environment isn't set up.
- Add a command_cmd_v1 module file that points to the limited script directory and the same squashfs as the full environment.
It does not activate the environment when the module is loaded.
- Move setting ENV_LAUNCHER_SCRIPT_PATH into the launcher script (used by payu) rather than the common launcher conf file.
@atteggiani
Collaborator

atteggiani commented Oct 3, 2025

Thank you @jo-basevi for the explanation above.
I had a thought about this, and would like to discuss some specifics here.

Please feel free to correct any mistakes or add to my understanding below:

In the current payu environments there are a few commands and packages users might want to use:

  • the payu command, which is an application meant to be run only from the command line (not within a Python script); therefore, each version of it can have its own specific set of dependencies (pretty much its own environment to run in).
  • some other tools similar to payu in kind (i.e. to be used as applications). Some examples are um2nc or Minghang's experiment-generator and experiment-runner, and there might be others that I'm not aware of.
  • some packages to be used as Python APIs, within Python scripts. I can't think of a specific one off the top of my head.

If the above is true, I propose the following general solution:
Separate the "applications" from the Python APIs.

Practically:

  • I would create a module to load all the tools, including payu (for continuity, this could be the "new" payu module). This module would expose only the tools' entry points, which in turn would call the bash script that runs the singularity container with the proper environment. Which specific environment is run can be controlled directly by the specific tool, with the possibility of reusing the same environment for multiple tools if they are compatible (for example, payu could use environment X and experiment-generator could use environment Y, but um2nc might be able to use Y as well).
    Note that different versions of the same tool could use different environments.
    Also note that "environment" here doesn't refer only to the Python environment, but to the more general shell environment (including the Python environment), because that's possible through singularity.
    This solution would allow this module to be loaded and used together with any Python (even conda) environment, since it doesn't load a Python environment. The environments are used only at runtime, in a "subshell" (singularity).

  • For the Python APIs, we might create and expose a Python environment (hopefully just one) through a different module (for example payu-env, though it doesn't need to have "payu" in its name) that works more or less like the current payu module (without exposing the actual payu, um2nc, etc. commands, though).
    For this point, however, I don't think there is much benefit in creating another Python environment when conda/analysis3 already exists. So I would prefer directing any Python API needs to those environments instead.
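The per-tool entry-point idea in the first bullet could be sketched as follows (all paths, the launcher name, and the `--overlay` flag usage here are assumptions for illustration, not the repository's actual scripts): each exposed command is a thin wrapper that selects its own environment and hands off to the container launcher.

```shell
#!/usr/bin/env bash
# Sketch of one tool's entry point: the wrapper knows which environment
# squashfs its tool needs and delegates to a shared launcher script.
TOOL="payu"
TOOL_ENV="/g/data/envs/payu-X.sqsh"     # hypothetical squashfs for this tool
LAUNCHER="/g/data/scripts/launcher.sh"  # hypothetical shared launcher

entrypoint() {
    # Dry-run: print the launch command instead of executing it.
    echo "${LAUNCHER} --overlay ${TOOL_ENV} ${TOOL} $*"
}

entrypoint run
# → /g/data/scripts/launcher.sh --overlay /g/data/envs/payu-X.sqsh payu run
```

A wrapper for um2nc could point `TOOL_ENV` at environment Y instead, which is how one module can expose several tools with distinct (or shared) environments.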

What do you think about this approach?

@jo-basevi
Collaborator Author

Thanks @atteggiani for looking into this.

the payu command, which is an application supposed to be run only from the command line (not within a Python script)

Mostly yes, though I think experiment-generator/experiment_runner has some payu imports, so I imagine they will share an environment.

I think um2nc would be OK as a different environment, as it's normally run in a separate post-processing job for payu configurations. payu has "user-scripts", which are scripts run at various points of the main payu model run job. A current issue with the payu singularity environments is that these user-scripts can't easily load another singularity environment unless the squashfs files were added as an overlay in the main payu singularity launch command. So at the moment, the user-scripts are limited to using the environment that payu is currently using, or loading a separate un-containerised conda environment.

I would create a module to load all the tools, including payu (for continuity, this could be the "new" payu module). This module would expose only the tools entry points, which in turn would call the bash script that runs the singularity container with the proper environment.

Ok, so similar to the "lite" restricted-command environment, but with the environments separated out? I think that sounds good!

We will need to think about what to do for the payu environment used in CI repro testing, as currently the test environments are virtual Python environments built on top of a payu environment so they can be lightweight. So it currently needs the Python executable from the payu environment to build the virtual environment, while still having access to the payu commands.

@atteggiani
Collaborator

atteggiani commented Oct 5, 2025

Thank you for the replies @jo-basevi.

has some payu imports

With this, do you mean they have lines like import payu or from payu import ...?
Or they simply import some packages that are currently in the payu conda environment?

payu has "user-scripts", which are scripts run at various points of the main payu model run job. A current issue with the payu singularity environments is that these user-scripts can't easily load another singularity environment unless the squashfs files were added as an overlay in the main payu singularity launch command. So at the moment, the user-scripts are limited to using the environment that payu is currently using, or loading a separate un-containerised conda environment.

Hmm okay, this is definitely a problem to think about carefully.
Within the entire "model run job" (so when the payu run command is run), I believe the payu singularity environment (which I will refer to as the containerized wrapper environment here) is strictly needed only for certain tasks: the ones that involve utilising payu as the manager to run other scripts and binaries. But there will be some other tasks that I believe don't run within the same containerized wrapper environment (for example the actual model run, which runs in a PBS job and has its own environment, independent of the payu containerized wrapper environment). And these tasks are still triggered by payu from within its containerized wrapper environment. Wouldn't something similar be possible for any task that should not strictly depend on the containerized wrapper environment (for example any user-scripts)?
I can't give more details because I don't know exactly how payu works internally, but based on my rough understanding, I think it should qualitatively be possible.

Ok, so similar to the "lite" restricted-command environment, but with the environments separated out? I think that sounds good!

Yes, I think so. And the environments should be thought of as separate and independent, but of course if an environment is compatible with multiple tools, it can be shared by those tools.

We will need to think about what to do for the payu environment used in CI repro testing, as currently the test environments are virtual Python environments built on top of a payu environment so they can be lightweight. So it currently needs the Python executable from the payu environment to build the virtual environment, while still having access to the payu commands.

So the CI repro testing would need access both to the payu command-line interface and to the Python environment that the payu CLI runs in?
Basically, it needs to run the tests (I think it uses pytest, right?) using the same Python environment used by the payu CLI, right?
If that's the case, to run the tests we could simply run the Python env used by payu manually, which would be something like: /path/to/launcher.sh /path/to/bin/python/inside/singularity/ -m pytest ....
Note this works even without loading the payu module (for example, for the latest payu 1.1.7):

/g/data/vk83/apps/conda_scripts/payu-1.1.7.d/bin/launcher.sh /g/data/vk83/apps/base_conda/envs/payu-1.1.7/bin/python

@jo-basevi
Collaborator Author

With this, do you mean they have lines like import payu or from payu import ...?

Yes, it has lines with "from payu import ...".

But there will be some other tasks that I believe don't run within the same containerized wrapper environment (for example the actual model run, which runs in a PBS job and has its own environment, independent of the payu containerized wrapper environment).

When payu submits a PBS job, it's submitting another payu command that runs within the PBS job, so the payu containerized environment is being launched on the PBS node. It then sets up for the model run, runs the mpirun command directly using the subprocess library, and then runs any archive steps.

So the CI repro testing would need access both to the payu command-line interface and to the Python environment that the payu CLI runs in?

Yes, it uses payu CLI commands, and I think for the "access-om3" tests they also import "payu.model" to reuse a configuration parsing method.

/path/to/launcher.sh /path/to/bin/python/inside/singularity/ -m pytest ...

We would still need a way to install test dependencies such as pytest, and the test package, but it could work, e.g.:

module load payu/<version>
# Launch a shell inside the container
launcher.sh bash
/path/to/bin/python/inside/singularity/ -m venv <path/to/test-venv> --system-site-packages
source <path/to/test-venv>/bin/activate
pip install model-config-tests
# ..

@atteggiani
Collaborator

Yes, it has lines with "from payu import ...".

Oh ok, well in that case this means that the payu dependency needs to be installed in the Python environment used for the experiment-generator tools. This may well mean that the same Python environment is used for the payu tool and the experiment-generator tools, unless payu is incompatible with other dependencies that the experiment-generator might need.

When payu submits a PBS job, it's submitting another payu command that runs within the PBS job, so the payu containerized environment is being launched on the PBS node. It then sets up for the model run, runs the mpirun command directly using the subprocess library, and then runs any archive steps.

Yes, alright. In this case we might have to find a good solution for this.
Even though the payu containerised environment is running on the PBS node, I still don't think the actual mpirun command runs within the container, right? Otherwise, which filesystem is mounted on the container that runs the mpirun? What if the model run job needs data in a project folder that is not mounted within the container?

Is the problem with the user-scripts only related to the impossibility of loading another container (e.g., loading the conda/analysis3 environment, I believe)? Is it also a problem to make sure that all project folders used within the user-scripts are mounted within the container, or is that not an issue because it is handled in some way?

Yes, it uses payu CLI commands, and I think for the "access-om3" tests they also import "payu.model" to reuse a configuration parsing method.

I think in general anything that needs payu as an API might be able to use the same environment.

We would still need a way to install test dependencies such as pytest, and the test package, but it could work, ...

Yes, I think we can find ways to make that work in the best way for our use case.

@jo-basevi
Collaborator Author

Even though the payu containerised environment is running on the PBS node, I still don't think the actual mpirun command runs within the container, right? Otherwise, which filesystem is mounted on the container that runs the mpirun? What if the model run job needs data in a project folder that is not mounted within the container?

When running inside the container, it seems to have access to the /g/data and /scratch directories that the user has access to. This might be a setting configured on gadi for singularity generally, as I can't find any code matches for scratch in this repository.

When running a Python process inside a container, I don't know how to "break out" of the container to run shell commands and then return to the Python process in the container, if that makes sense?

Is the problem with the user-scripts only related to the impossibility of loading another container (e.g., loading the conda/analysis3 environment, I believe)?

Yes, it can't load another container. One way to get around this with the conda/analysis environments (as they use the same container setup) is to load all the required modules before running anything inside a container. Each module prepends its environment squashfs files to the CONTAINER_OVERLAY_PATH environment variable. This means that when the container is launched, all the squashfs files are added as overlays, so all the environment files are accessible inside the container.
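The overlay mechanism described here can be sketched roughly as follows (CONTAINER_OVERLAY_PATH is the variable named above; the squashfs paths and the helper function are hypothetical): each `module load` prepends its squashfs, and the launcher later turns every listed file into a container overlay.

```shell
#!/usr/bin/env bash
# Sketch of how each module could prepend its environment squashfs to
# CONTAINER_OVERLAY_PATH before the container is launched.
unset CONTAINER_OVERLAY_PATH   # start clean for the illustration

prepend_overlay() {
    local sqsh="$1"
    # Prepend, adding a ":" separator only if the variable is non-empty.
    export CONTAINER_OVERLAY_PATH="${sqsh}${CONTAINER_OVERLAY_PATH:+:${CONTAINER_OVERLAY_PATH}}"
}

prepend_overlay "/g/data/envs/payu.sqsh"        # e.g. `module load payu`
prepend_overlay "/g/data/envs/analysis3.sqsh"   # e.g. `module load conda/analysis3`

echo "$CONTAINER_OVERLAY_PATH"
# → /g/data/envs/analysis3.sqsh:/g/data/envs/payu.sqsh
```

Because all the listed squashfs files are mounted as overlays at launch, every loaded environment's files are visible inside the one container.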

@atteggiani
Collaborator

atteggiani commented Oct 7, 2025

When running a Python process inside a container, I don't know how to "break out" of the container to run shell commands and then return to the Python process in the container, if that makes sense?

Yes, I get what you mean: you could do it within payu (for example using ssh), but not just by setting up the launcher in a specific way (i.e. without changing how payu handles that). At least, I can't think of any generalisable solution that would work for any tool run from a containerised environment.

Yes, it can't load another container. One way to get around this with the conda/analysis environments (as they use the same container setup) is to load all the required modules before running anything inside a container. Each module prepends its environment squashfs files to the CONTAINER_OVERLAY_PATH environment variable. This means that when the container is launched, all the squashfs files are added as overlays, so all the environment files are accessible inside the container.

Can you point me to a containerised environment that currently cannot be run within the payu containerised environment?

@atteggiani
Collaborator

atteggiani commented Oct 7, 2025

Using ssh localhost ... to run user-scripts and other parts that don't need the payu environment (for example the mpirun command, I think) is still a viable option.
I don't know if you have considered that, but it would definitely make it possible to run "non-payu-related code" in the host environment (whatever environment payu was called from).

This doesn't fix the other issues with containerised environments, but it would make it possible to load a different conda environment (even a containerised one) in a user-script.

@jo-basevi
Collaborator Author

Oh, I did not know about ssh localhost! Is it possible to run that on PBS nodes? If so, it sounds like it could be an option if we are sticking with running payu in singularity.

Can you point me to a containerised environment that currently cannot be run within the payu containerised environment?

Currently, conda/analysis3 would not work in a userscript: payu currently does not load modules on the PBS node before running the Python code.

@atteggiani
Collaborator

atteggiani commented Oct 7, 2025

Oh, I did not know about ssh localhost! Is it possible to run that on PBS nodes? If so, it sounds like it could be an option if we are sticking with running payu in singularity.

I think by default apptainer uses the same network namespace as the host, so localhost within the container is the same localhost "outside" of the container.

I think from a PBS job localhost would resolve to the same PBS compute node (not the host that started the PBS job; that would be impossible, I believe, for security reasons). However, we could still use it to "get out" of the container. We would definitely need proper testing to be 100% sure.

In general, I would think about using ssh localhost <command> when issuing commands from within payu that use subprocess. That would make those commands execute on the host (either login or compute nodes) outside of the container.
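As an illustration only (untested, as discussed; the script path and working directory are hypothetical), the command such a subprocess call might issue could be built like this, so that the user-script runs on the host rather than inside the container:

```shell
#!/usr/bin/env bash
# Build (but don't execute) an `ssh localhost` wrapper for a user-script,
# preserving the working directory of the payu process.
workdir="/scratch/project/expt"   # hypothetical experiment directory
user_script="./archive-hook.sh"   # hypothetical user-script

host_cmd() {
    # Quote the remote command so `cd` and the script run as one unit.
    echo "ssh localhost 'cd ${workdir} && ${user_script}'"
}

host_cmd
# → ssh localhost 'cd /scratch/project/expt && ./archive-hook.sh'
```

Actually executing this would additionally require passwordless ssh to localhost and any needed environment variables to be forwarded, which is part of what would need testing.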

Currently, conda/analysis3 would not work in a userscript: payu currently does not load modules on the PBS node before running the Python code.

Are userscripts the only problematic part about loading another containerised environment? Are there any other use-cases where containerisation creates problem?

@jo-basevi
Collaborator Author

jo-basevi commented Oct 8, 2025

In general, I would think about using ssh localhost when issuing commands from within payu that use subprocess. That would make those commands execute on the host (either login or compute nodes) outside of the container.

Yeah, I wonder if any commands would need some "state", e.g. environment variables set up, or to be in the same current working directory as the payu Python process. So it would need some testing.

Are userscripts the only problematic part about loading another containerised environment? Are there any other use-cases where containerisation creates problem?

I think userscripts are the only problematic part in payu about loading another containerised environment. Though if I remember more use-cases, I'll make sure to add an update!

@atteggiani
Collaborator

atteggiani commented Oct 8, 2025

Note

Apologies, but I thought I had replied with a different message (like I'm doing here); instead, I had edited your previous message 😅 I didn't mean to do that. I rewrote your initial message.

Yeah, I wonder if any commands would need some "state", e.g. environment variables set up, or to be in the same current working directory as the payu Python process. So it would need some testing.

Yes, there might be some minor setup steps for the commands, but in general I think such a scenario is achievable.

I think userscripts are the only problematic part in payu about loading another containerised environment. Though if I remember more use-cases, I'll make sure to add an update!

Ok, in this case running userscripts via ssh localhost ... would solve the issue.
