
Add pipeline manager (#181)
* Add pspipe-run binary to handle the pipeline scheme through yaml files

* Fix script_base_dir path and use the same syntax as slurm for the number of nodes

* Add batch mode and README instructions

The batch mode only produces the script file; no batch script is actually sent to the Slurm
compute farm.

* Update batch mode support

* Finalize batch mode

* Add cmdline argument to feed simulation numbers to heavy computation scripts

* Fix interactive mode

* Fix slurm nnodes
xgarrido authored Jan 8, 2025
1 parent e6127b5 commit 2827b72
Showing 14 changed files with 926 additions and 122 deletions.
5 changes: 0 additions & 5 deletions .github/workflows/testing.yml
@@ -19,11 +19,6 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}

-      - name: Ubuntu dependencies
-        if: matrix.os == 'ubuntu-latest'
-        run: |
-          sudo apt-get install -y libcfitsio-dev libfftw3-dev
       - name: Install dependencies via pip
         run: |
           python -m pip install --upgrade pip wheel numpy
162 changes: 65 additions & 97 deletions project/data_analysis/README.rst
@@ -1,126 +1,94 @@
***************************************************
INSTALLING THE ACT POWER SPECTRUM PIPELINE AT NERSC
***************************************************

Here we give instructions to install and run the full pipeline on interactive nodes; you can of
course also submit it to NERSC standard nodes.

Installation steps
------------------

First, we strongly recommend installing everything in a virtual ``python`` environment in order to
avoid clashes with other ``python`` modules installed, for instance, within the ``.local``
directory. You can use the `setup.sh
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/setup.sh>`_ script file to
install everything (the ``mpi4py`` installation command is especially important @ NERSC). The
script reads as follows

.. code:: shell

   slurm_account=mp107
   export SBATCH_ACCOUNT=${slurm_account}
   export SALLOC_ACCOUNT=${slurm_account}

   module load python
   module load intel

   base_dir=/path/to/base/dir
   pyenv_dir=${base_dir}/pyenv/perlmutter

   if [ ! -d ${pyenv_dir} ]; then
       python -m venv ${pyenv_dir}
       source ${pyenv_dir}/bin/activate
       python -m pip install -U pip wheel
       python -m pip install ipython
       python -m pip install numpy
       module swap PrgEnv-${PE_ENV,,} PrgEnv-gnu
       MPICC="cc -shared" pip install --force-reinstall --no-cache-dir --no-binary=mpi4py mpi4py
   fi

   software_dir=${base_dir}/software
   if [ ! -d ${software_dir} ]; then
       mkdir -p ${software_dir}
       (
           cd ${software_dir}
           git clone git@github.com:simonsobs/pspy.git
           cd pspy
           python -m pip install -e .
       )
       (
           cd ${software_dir}
           git clone git@github.com:simonsobs/pspipe_utils.git
           cd pspipe_utils
           python -m pip install -e .
       )
       (
           cd ${software_dir}
           git clone git@github.com:simonsobs/PSpipe.git
           cd PSpipe
           python -m pip install -e .
       )
       (
           cd ${software_dir}
           git clone git@github.com:AdriJD/optweight.git
           cd optweight
           python -m pip install -e .
       )
       (
           cd ${software_dir}
           git clone git@github.com:amaurea/enlib.git
           cd enlib
           export ENLIB_COMP=cca_intel
           make array_ops
       )
       (
           cd ${software_dir}
           git clone git@github.com:simonsobs/sofind.git
           cd sofind
           python -m pip install -e .
       )
       (
           cd ${software_dir}
           git clone git@github.com:simonsobs/mnms.git
           cd mnms
           python -m pip install -e .
       )
   fi

   export SOFIND_SYSTEM=perlmutter
   export PYTHONPATH=$PYTHONPATH:${software_dir}

   source ${pyenv_dir}/bin/activate
The ``base_dir`` is where everything (virtual env. and ``pspipe`` scripts) will be located. Run the
script with

.. code:: shell

   source setup.sh

The first time you run the script, it will install everything. By default, everything is installed
in the current working directory; you can set the installation path by exporting ``BASE_DIR``
before running the ``source`` command. The installation requires ``pspy`` >= 1.7.0 and
``pspipe_utils`` >= 0.1.4.

Every time you log in to NERSC machines, you **need to source this file** with ``source setup.sh``
to get into the virtual environment and use the proper software suite.
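
To check that the environment is correctly set up, you can try importing the main packages. This
is only an illustrative snippet; the package list follows the setup script above, and the import
names are assumed to match the repository names:

.. code:: python

   import importlib

   # Packages installed by the setup script; import names assumed.
   for pkg in ["mpi4py", "pspy", "pspipe_utils", "sofind", "mnms"]:
       try:
           module = importlib.import_module(pkg)
           print(f"{pkg}: OK ({getattr(module, '__version__', 'unknown version')})")
       except ImportError as err:
           print(f"{pkg}: MISSING ({err})")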

Running the DR6 pipelines
-------------------------

When installing ``pspipe``, you will get a ``pspipe-run`` binary that can be used to sequentially
run the different modules involved in DR6 power spectra production. ``pspipe-run`` is fed a
``yaml`` file that holds the sequence of modules with their corresponding computing resource needs
@ NERSC. For instance, you can run the DR6 pipeline (see next section) with the following command

.. code:: shell

   source setup.sh
   pspipe-run -p data_analysis/yaml/pipeline_dr6.yml

Within the ``yaml/pipeline_dr6.yml`` file, the pipeline itself is defined inside the ``pipeline``
block, where a module block is defined as follows

.. code:: yaml

   get_covariance_blocks:
     force: true
     minimal_needed_time: 03:00:00
     slurm:
       nodes: 2
       ntasks: 8
       cpus_per_task: 64

The module name refers to the ``python`` script located in the ``data_analysis/python``
directory. Another script directory can be set at the top of the ``yaml`` file with the
``script_base_dir`` variable. The ``force: true`` directive means the module will always be
processed even if it was already done. The other parameters relate to the slurm allocation when
running the pipeline on an **interactive node**. If you want to use the pipeline in batch mode,
you can refer to `pipeline_mnms.yml
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/yaml/pipeline_mnms.yml>`_.

The next sections link to their corresponding ``pipeline.yml`` files.
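
For illustration, here is a minimal sketch of how such a pipeline block could be turned into
``srun`` commands for an interactive node. This is **not** the actual ``pspipe-run``
implementation: the field names follow the example above, while the ``script_base_dir`` value and
the ``global.dict`` argument are hypothetical:

.. code:: python

   import yaml  # requires pyyaml

   pipeline_yaml = """
   script_base_dir: data_analysis/python
   pipeline:
     get_covariance_blocks:
       force: true
       minimal_needed_time: 03:00:00
       slurm:
         nodes: 2
         ntasks: 8
         cpus_per_task: 64
   """

   config = yaml.safe_load(pipeline_yaml)
   base_dir = config["script_base_dir"]

   for module, opts in config["pipeline"].items():
       slurm = opts.get("slurm", {})
       # Build the interactive-node command for this module.
       print(f"srun --nodes={slurm.get('nodes', 1)}"
             f" --ntasks={slurm.get('ntasks', 1)}"
             f" --cpus-per-task={slurm.get('cpus_per_task', 1)}"
             f" python {base_dir}/{module}.py global.dict")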


Running the dr6 main analysis
-----------------------------

To run the main dr6 analysis follow the detailed instructions in `dr6
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/dr6.rst>`_. You can also run
the whole pipeline (with a limited set of 50 simulations) with the `pipeline_dr6.yml
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/yaml/pipeline_dr6.yml>`_
file.

Running the dr6xPlanck pipeline
-------------------------------

To run the dr6xPlanck analysis follow the instructions in `dr6xplanck
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/dr6xplanck.rst>`_. The
corresponding pipeline file is `pipeline_dr6xplanck.yml
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/yaml/pipeline_dr6xplanck.yml>`_.

Estimation of the dust
----------------------

To estimate the dust in the dr6 patch we use Planck 353 GHz maps; follow the instructions in `dust
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/dust.rst>`_ and run the
pipeline with the `pipeline_dust.yml
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/yaml/pipeline_dust.yml>`_
file.

Running our reproduction of the Planck pipeline
-----------------------------------------------

To run a reproduction of the Planck official result follow the instructions in `planck
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/planck.rst>`_ (same as before
with the `pipeline_planck.yml
<https://github.com/simonsobs/PSpipe/tree/master/project/data_analysis/yaml/pipeline_planck.yml>`_).
23 changes: 19 additions & 4 deletions project/data_analysis/python/montecarlo/mc_mnms_get_nlms.py
@@ -7,10 +7,19 @@
 from pspipe_utils import log
 import numpy as np
 import time
-import sys
+import argparse
 
+# Parse arguments
+parser = argparse.ArgumentParser()
+
+parser.add_argument("--iStart", help="Set starting index of simulations", type=int)
+parser.add_argument("--iStop", help="Set stopping index of simulations", type=int)
+parser.add_argument("--bunch", help="Set bunch index", type=int)
+parser.add_argument("--nbunch", help="Set number of simulations per bunch", default=50, type=int)
+args, dict_file = parser.parse_known_args()
+
 d = so_dict.so_dict()
-d.read_from_file(sys.argv[1])
+d.read_from_file(dict_file[0])
 log = log.get_logger(**d)
 
 surveys = ["dr6"]
@@ -47,6 +56,12 @@
     for id_split in range(n_splits[sv]):
         mpi_list.append((sv, id_split))
 
+iStart = args.iStart if args.iStart is not None else d["iStart"]
+iStop = args.iStop if args.iStop is not None else d["iStop"]
+if args.bunch is not None:
+    iStart = int(args.bunch * args.nbunch)
+    iStop = int((args.bunch + 1) * args.nbunch) - 1
+
 # we will use mpi over the number of splits
 so_mpi.init(True)
-subtasks = so_mpi.taskrange(imin=d["iStart"], imax=d["iStop"])
+#subtasks = so_mpi.taskrange(imin=d["iStart"], imax=d["iStop"])
@@ -62,7 +77,7 @@
 t1 = time.time()
 for wafer_name in wafers:
 
-    for iii in range(d["iStart"], d["iStop"]+1):
+    for iii in range(iStart, iStop+1):
 
         t2 = time.time()
         sim_arrays = noise_models[wafer_name].get_sim(split_num=k,
@@ -84,4 +99,4 @@
 
     noise_models[wafer_name].cache_clear()
 
-log.info(f"split {k} {d['iStop']-d['iStart']+1} sims generated in {time.time()-t1:.2f} s")
+log.info(f"split {k} {iStop-iStart+1} sims generated in {time.time()-t1:.2f} s")
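
As a quick illustration of the new ``--bunch`` option: bunch ``b`` covers the simulation indices
``[b * nbunch, (b + 1) * nbunch - 1]``, so separate slurm jobs can each process one bunch of
simulations. A minimal sketch of the arithmetic (the helper name and the ``params.dict`` file are
hypothetical):

.. code:: python

   # Hypothetical helper mirroring the --bunch/--nbunch arithmetic above.
   def bunch_range(bunch, nbunch=50):
       iStart = bunch * nbunch
       iStop = (bunch + 1) * nbunch - 1
       return iStart, iStop

   print(bunch_range(0))  # (0, 49)
   print(bunch_range(3))  # (150, 199)

With the default ``nbunch=50``, a call like ``python mc_mnms_get_nlms.py --bunch 3 params.dict``
would generate simulations 150 through 199.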
@@ -4,17 +4,26 @@
 the fg is based on fgspectra, note that the noise sim includes the pixwin so we have to convolve the signal sim with it
 """
 
-import sys
 import time
+import argparse
 
 import healpy as hp
 import numpy as np
 from pixell import curvedsky, enmap
 from pspipe_utils import kspace, log, misc, pspipe_list, simulation, transfer_function
 from pspy import pspy_utils, so_dict, so_map, so_mcm, so_mpi, so_spectra, sph_tools
 
+# Parse arguments
+parser = argparse.ArgumentParser()
+
+parser.add_argument("--iStart", help="Set starting index of simulations", type=int)
+parser.add_argument("--iStop", help="Set stopping index of simulations", type=int)
+parser.add_argument("--bunch", help="Set bunch index", type=int)
+parser.add_argument("--nbunch", help="Set number of simulations per bunch", default=50, type=int)
+args, dict_file = parser.parse_known_args()
+
 d = so_dict.so_dict()
-d.read_from_file(sys.argv[1])
+d.read_from_file(dict_file[0])
 log = log.get_logger(**d)
 
 surveys = d["surveys"]
@@ -30,7 +39,6 @@
 else: raise ValueError(f"Unsupported sim_alm_dtype {sim_alm_dtype}")
 dtype = np.float32 if sim_alm_dtype == "complex64" else np.float64
 
-window_dir = "windows"
 mcm_dir = "mcms"
 spec_dir = "sim_spectra"
 bestfit_dir = "best_fits"
@@ -92,7 +100,13 @@
 
 # we will use mpi over the number of simulations
 so_mpi.init(True)
-subtasks = so_mpi.taskrange(imin=d["iStart"], imax=d["iStop"])
+iStart = args.iStart if args.iStart is not None else d["iStart"]
+iStop = args.iStop if args.iStop is not None else d["iStop"]
+if args.bunch is not None:
+    iStart = int(args.bunch * args.nbunch)
+    iStop = int((args.bunch + 1) * args.nbunch) - 1
+
+subtasks = so_mpi.taskrange(imin=iStart, imax=iStop)
 
 for iii in subtasks:
     t0 = time.time()
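
Both scripts rely on ``parse_known_args``: the simulation flags are consumed by the parser, while
the dict file (a positional argument the parser does not know about) is left in the remainder
list. A small self-contained demo (the ``global.dict`` name is hypothetical):

.. code:: python

   import argparse

   parser = argparse.ArgumentParser()
   parser.add_argument("--iStart", type=int)
   parser.add_argument("--iStop", type=int)

   # Known flags go to `args`; anything else (here the dict file) is
   # returned untouched in the remainder list.
   args, rest = parser.parse_known_args(["--iStart", "0", "--iStop", "9", "global.dict"])
   print(args.iStart, args.iStop)  # 0 9
   print(rest)                     # ['global.dict']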
10 changes: 4 additions & 6 deletions project/data_analysis/python/planck/check_src_subtraction.py
@@ -17,7 +17,7 @@
 pspy_utils.create_directory(out_dir)
 
 # Read catalog
-cat_file = "/global/cfs/cdirs/act/data/tlouis/s17s18s19/catalogs/cat_skn_multifreq_20220526_nightonly.txt"
+cat_file = "/global/cfs/cdirs/act/data/tlouis/dr6v4/catalogs/cat_skn_multifreq_20220526_nightonly.txt"
 cat = pd.read_table(cat_file, escapechar="#", sep=r"\s+")
 cat = cat.shift(1, axis=1)
@@ -75,11 +75,11 @@
 sub = so_map.get_submap_car(m, box)
 sub_srcfree = so_map.get_submap_car(m_srcfree, box)
 
 sub.plot(file_name=f"{out_dir}/{release}_f{freq}_{split}_{task:03d}",
          color_range=[250, 100, 100],
          ticks_spacing_car=0.6
          )
 sub_srcfree.plot(file_name=f"{out_dir}/{release}_srcfree_f{freq}_{split}_{task:03d}",
                  color_range=[250, 100, 100],
                  ticks_spacing_car=0.6
                  )
@@ -116,7 +116,7 @@
 prefix = f"{release}{type}"
 g.write('<div class=map>\n')
 for map_name in maps:
 
     freq, split = map_name.split("_")
     file_name = f"{prefix}_f{freq}_{split}_{src_id:03d}_T.png"
     g.write(f'<h2> {freq} GHz split {split} - {prefix} [Source no {src_id:03d}] </h2> \n')
@@ -128,5 +128,3 @@
 g.write('</body> \n')
 g.write('</html> \n')
 g.close()


@@ -6,11 +6,11 @@
 # subtraction code does not work properly with truncated beams.
 
 # Path to dory.py (part of the tenki python package)
-dory_path=/path/to/tenki/point_sources
+dory_path=${TENKI_PATH}/point_sources
 
 # Path to npipe maps & beams
 map_path=planck_projected
-beam_path=npipe
+beam_path=legacy
 
 # Path to the input point source catalog
 ps_catalog_path=cat_skn_090_20220526_nightonly_ordered.txt
@@ -40,10 +40,9 @@ freqs=("100" "143" "217")
 for freq in ${freqs[@]}; do
     for split in ${splits[@]}; do
 
-
         map_file=${map_path}/HFI_SkyMap_2048_R3.01_halfmission-${split}_f${freq}_map.fits
         ivar_file=${map_path}/HFI_SkyMap_2048_R3.01_halfmission-${split}_f${freq}_ivar.fits
-        beam_file=${beam_path}/bl_T_npipe_${freq}_coadd.dat
+        beam_file=${beam_path}/bl_T_legacy_${freq}_coadd.dat
         out_map_file=${map_path}/HFI_SkyMap_2048_R3.01_halfmission-${split}_f${freq}_map_srcfree.fits
         out_map_model_file=${map_path}/HFI_SkyMap_2048_R3.01_halfmission-${split}_f${freq}_map_model.fits