Hello dear GRETA developers,
I'm trying to run the GRETA pipeline on our cluster and have encountered a few issues, primarily during the execution of local rules when targeting anl/topo/brain.all.sims_mult.csv. For now I am trying to run the standard pipeline on the brain dataset as a test. Later I would like to run it on a neuron dataset that we produced.
It seems some issues that I have encountered stem from the execution of the pipeline within the conda environment that I initially set up to run the workflow. Would it be possible for you to give a full environment definition of your “snakemake” or “greta” env?
My Setup:
• Snakemake Version: 9.2.0 (installed via Conda/Mamba)
• Conda Environment (greta_env):
o Python: 3.11.12
o Key packages installed: snakemake-executor-plugin-slurm, htslib, r-base, bioconductor-biomart, snapatac2==2.8.0.
• Slurm Profile (config/slurm/config.yaml), which I had to adjust:
o Using executor: slurm (native SLURM plugin).
o Default resources include slurm_partition: mediumq, slurm_account: lab_gsf, slurm_extra: "--qos=mediumq", runtime, mem_mb.
o use-conda: true and conda-prefix: "/path/to/my/conda/envs" are set.
o use-apptainer: true.
• Container Runtime: Apptainer (loaded via module load apptainer on the system).
Issues Encountered:
- Container Image Dependency/Race Condition:
o When Snakemake runs local rules that use Singularity/Apptainer images (e.g., gen_genome_celloracle trying to use workflow/envs/celloracle.sif), it sometimes fails with: FATAL: could not open image /path/to/greta/workflow/envs/celloracle.sif: no such file or directory
o This seems to happen because the rule attempts to use the image before the corresponding dwn_image rule has finished downloading it.
o Observation: It seems these rules (e.g., gen_genome_celloracle) do not list the .sif file as an explicit input: dependency, which might be causing this race condition.
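As a stopgap on my side, I could check that the images exist before starting the run. This is only a minimal sketch of such a pre-flight check; the function name `missing_images` and the hard-coded image list are my own placeholders and not part of GRETA, and it does not replace declaring the .sif file as a rule input.

```python
from pathlib import Path


def missing_images(image_paths, root="."):
    """Return the image paths (relative to root) that are absent or empty.

    An empty file is treated as missing because an interrupted download
    could leave a zero-byte .sif behind (an assumption, not observed).
    """
    root = Path(root)
    missing = []
    for rel in image_paths:
        candidate = root / rel
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Hypothetical list: only the image named in the error message above.
    images = ["workflow/envs/celloracle.sif"]
    absent = missing_images(images)
    if absent:
        print("Images not yet available:", ", ".join(absent))
    else:
        print("All listed images are present.")
```

Run from the repository root, this reports which images would still need to be downloaded before the local rules that use them can start.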
- Dictys Conda Environment Issues:
o Rules related to Dictys (e.g., install_dictys, gen_ann_dictys) are failing with errors related to Conda environment activation. The conda: directive in the Snakefile (e.g., workflow/rules/dbs/gen.smk for gen_ann_dictys) appears to use a hardcoded absolute path: conda: "../../../../../home/user/miniforge3/envs/dictys".
o This results in errors like: EnvironmentLocationNotFound: Not a conda environment: /home/user/miniforge3/envs/dictys and CondaValueError: Invalid environment name: '/home/user/miniforge3/envs/dictys'. Characters not allowed: {':', '/', ' ', '#'}
o This also leads to subsequent errors like dictys_helper: command not found because the environment isn't activated correctly.
o Observation: Using an absolute path as an environment name in the conda: directive seems to be the issue. Snakemake typically expects a relative path to an environment.yaml or a simple environment name.
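To narrow this down, I can check whether the directory named in the error is a conda environment at all. The sketch below assumes, as far as I understand conda's behaviour, that a directory counts as an environment when it contains a conda-meta/history file; the helper name `looks_like_conda_env` is hypothetical.

```python
from pathlib import Path


def looks_like_conda_env(prefix):
    """Return True if prefix contains conda-meta/history.

    Assumption: conda uses this marker file to recognise an environment
    directory; this is my understanding, not something GRETA documents.
    """
    return (Path(prefix) / "conda-meta" / "history").is_file()


if __name__ == "__main__":
    # Path taken from the error message above.
    env_path = "/home/user/miniforge3/envs/dictys"
    if looks_like_conda_env(env_path):
        print(f"{env_path} looks like a conda environment.")
    else:
        print(f"{env_path} is missing or is not a conda environment.")
```

If the check fails, the environment was presumably never created at that location, which would be consistent with the EnvironmentLocationNotFound error.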
Done so far:
• Successfully ran initial data preprocessing steps for the "brain" dataset up to dts/brain/cases/all/mdata.h5mu and dts/brain/cases/all/runs/pando.pre.h5mu
• The current errors arise when targeting anl/topo/brain.all.sims_mult.csv.
- Are there known best practices or specific configurations for the container image dependencies to ensure they are present before use?
- What is the recommended way to set up and specify the Conda environment for Dictys within the GRETA workflow? Is there an environment.yaml for it?
- Is there a specific version of Snakemake or other key dependencies that the current version of GRETA was tested with?
Any guidance on resolving these setup issues for the local rules would be greatly appreciated.
Thanks,
BR
Michael