Commit f509ad6

Remove all references to LSF (#780)
Remove all references to LSF and LSB components, including `JsrunSettings`, `BsubBatchSettings`, and so on. [ committed by @al-rigazzi ]
1 parent 4701e8c

45 files changed (+56, -2706 lines)

Diff for: .github/workflows/run_tests.yml (+1 -1)

@@ -55,7 +55,7 @@ jobs:
 fail-fast: false
 matrix:
 subset: [backends, slow_tests, group_a, group_b]
-os: [macos-12, macos-14, ubuntu-22.04] # Operating systems
+os: [macos-14, ubuntu-22.04] # Operating systems
 compiler: [8] # GNU compiler version
 rai: [1.2.7] # Redis AI versions
 py_v: ["3.9", "3.10", "3.11"] # Python versions
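The matrix hunk above is easy to quantify. GitHub Actions expands a `strategy.matrix` into one job per combination of values, so dropping `macos-12` from `os` cuts the jobs generated by the keys shown here from 36 to 24. The sketch below reproduces that count; it assumes the hunk shows every matrix key and that the workflow has no `include`/`exclude` entries outside the visible lines. The variable and function names are illustrative.

```python
from itertools import product
from math import prod

# Matrix values copied from the hunk above (names are illustrative).
MATRIX_AFTER = {
    "subset": ["backends", "slow_tests", "group_a", "group_b"],
    "os": ["macos-14", "ubuntu-22.04"],
    "compiler": [8],
    "rai": ["1.2.7"],
    "py_v": ["3.9", "3.10", "3.11"],
}

# The same matrix before this commit, when macos-12 was still listed.
MATRIX_BEFORE = {**MATRIX_AFTER, "os": ["macos-12", "macos-14", "ubuntu-22.04"]}


def count_jobs(matrix: dict) -> int:
    """Number of jobs a plain matrix expands to (assumes no include/exclude)."""
    return prod(len(values) for values in matrix.values())


def expand(matrix: dict) -> list:
    """Enumerate every job as a dict mapping matrix key -> value."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*matrix.values())]


print(count_jobs(MATRIX_BEFORE), count_jobs(MATRIX_AFTER))  # 36 24
```

`count_jobs` is just the product of the list lengths; `expand` enumerates the same combinations explicitly, which is handy for checking which job a failing matrix entry corresponds to.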

Diff for: .wci.yml (+2 -2)

@@ -10,7 +10,7 @@
 Machine Learning (ML) libraries, like PyTorch and TensorFlow,
 in combination with High Performance Computing (HPC) simulations and applications.
 SmartSim launches ML infrastructure on HPC systems alongside user workloads
-and supports most HPC workload managers (e.g. Slurm, PBSPro, LSF).
+and supports most HPC workload managers (e.g. Slurm, PBSPro, SGE).
 SmartSim also provides a set of client libraries in Python, C++, C, and Fortran.
 These client libraries allow users to send and receive data between user
 applications and the machine learning infrastructure. Moreover, the
@@ -40,7 +40,7 @@
 resource_managers:
 - Slurm
 - PBSPro
-- LSF
+- SGE
 - Linux/MacOS
 transfer_protocols:
 - TCP/IP

Diff for: README.md (+3 -9)

@@ -144,7 +144,6 @@ SmartSim](https://www.craylabs.org/docs/api/smartsim_api.html#settings).
 - ``MpirunSettings``
 - ``SrunSettings``
 - ``AprunSettings``
-- ``JsrunSettings``

 The following example launches a hello world MPI program using the local launcher
 for single compute node, workstations and laptops.
@@ -177,7 +176,7 @@ SmartSim integrates with common HPC schedulers providing batch and interactive
 launch capabilities for all applications:

 - Slurm
-- LSF
+- SGE
 - PBSPro
 - Local (for laptops/single node, no batch)

@@ -197,11 +196,9 @@ salloc -N 3 --ntasks-per-node=20 --ntasks 60 --exclusive -t 00:10:00
 # get interactive allocation (PBS)
 qsub -l select=3:ncpus=20 -l walltime=00:10:00 -l place=scatter -I -q <queue>

-# get interactive allocation (LSF)
-bsub -Is -W 00:10 -nnodes 3 -P <project> $SHELL
 ```

-This same script will run on a SLURM, PBS, or LSF system as the ``launcher``
+This same script will run on a SLURM, PBS, or SGE system as the ``launcher``
 is set to `auto` in the [Experiment](https://www.craylabs.org/docs/api/smartsim_api.html#experiment)
 initialization. The run command like ``mpirun``,
 ``aprun`` or ``srun`` will be automatically detected from what is available on the
@@ -281,7 +278,7 @@ python hello_ensemble.py
 ```

 Similar to the interactive example, this same script will run on a SLURM, PBS,
-or LSF system as the ``launcher`` is set to `auto` in the
+or SGE system as the ``launcher`` is set to `auto` in the
 [Experiment](https://www.craylabs.org/docs/api/smartsim_api.html#experiment)
 initialization. Local launching does not support batch workloads.

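Both README passages rely on `launcher` being set to `auto`, with SmartSim choosing a workload manager from what the machine provides. The sketch below shows one plausible way to do that kind of detection, by probing `PATH` with `shutil.which`; it is not SmartSim's actual logic. `detect_launcher_sketch`, the probe table, and the use of `qconf` to tell Grid Engine apart from PBS Pro (both provide a `qsub` command) are assumptions for illustration only.

```python
import shutil
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical probe table: a launcher is chosen only if *all* of its
# commands are found on PATH. Entries are tried top to bottom, so order
# matters. Command names are illustrative assumptions, not SmartSim's rules.
DEFAULT_PROBES: Sequence[Tuple[str, Sequence[str]]] = (
    ("slurm", ("sbatch", "srun")),
    ("sge", ("qsub", "qconf")),  # before PBS: both provide a qsub command
    ("pbs", ("qsub", "qstat")),
)


def detect_launcher_sketch(
    probes: Sequence[Tuple[str, Sequence[str]]] = DEFAULT_PROBES,
    which: Callable[[str], Optional[str]] = shutil.which,
    fallback: str = "local",
) -> str:
    """Return the first launcher whose probe commands are all available."""
    for launcher, commands in probes:
        if all(which(cmd) is not None for cmd in commands):
            return launcher
    # No workload manager found: run on the current machine instead.
    return fallback


def fake_pbs_which(cmd: str) -> Optional[str]:
    """Stand-in for shutil.which on a machine that only has PBS tools."""
    return f"/usr/bin/{cmd}" if cmd in {"qsub", "qstat", "mpirun"} else None


print(detect_launcher_sketch(which=fake_pbs_which))  # pbs
```

Passing `which` as a parameter keeps the function testable: the example fakes a PBS-like `PATH` rather than depending on the machine it runs on.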
@@ -343,9 +340,6 @@ salloc -N 3 --ntasks-per-node=1 --exclusive -t 00:10:00
 # get interactive allocation (PBS)
 qsub -l select=3:ncpus=1 -l walltime=00:10:00 -l place=scatter -I -q queue

-# get interactive allocation (LSF)
-bsub -Is -W 00:10 -nnodes 3 -P project $SHELL
-
 ```

 ```python

Diff for: conftest.py (+2 -15)

@@ -62,7 +62,6 @@
 from smartsim.settings import (
 AprunSettings,
 DragonRunSettings,
-JsrunSettings,
 MpiexecSettings,
 MpirunSettings,
 PalsMpiexecSettings,
@@ -120,7 +119,7 @@ def print_test_configuration() -> None:

 def pytest_configure() -> None:
 pytest.test_launcher = test_launcher
-pytest.wlm_options = ["slurm", "pbs", "lsf", "pals", "dragon", "sge"]
+pytest.wlm_options = ["slurm", "pbs", "pals", "dragon", "sge"]
 account = get_account()
 pytest.test_account = account
 pytest.test_device = test_device
@@ -386,15 +385,10 @@ def get_base_run_settings(
 run_args = {"--np": ntasks, "--hostfile": host_file}
 run_args.update(kwargs)
 return RunSettings(exe, args, run_command="mpiexec", run_args=run_args)
-if test_launcher == "lsf":
-run_args = {"--np": ntasks, "--nrs": nodes}
-run_args.update(kwargs)
-settings = RunSettings(exe, args, run_command="jsrun", run_args=run_args)
-return settings
 if test_launcher != "local":
 raise SSConfigError(
 "Base run settings are available for Slurm, PBS, "
-f"and LSF, but launcher was {test_launcher}"
+f"and Dragon, but launcher was {test_launcher}"
 )
 # TODO allow user to pick aprun vs MPIrun
 return RunSettings(exe, args)
@@ -429,13 +423,6 @@ def get_run_settings(
 run_args = {"np": ntasks, "hostfile": host_file}
 run_args.update(kwargs)
 return PalsMpiexecSettings(exe, args, run_args=run_args)
-if test_launcher == "lsf":
-run_args = {
-"nrs": nodes,
-"tasks_per_rs": max(ntasks // nodes, 1),
-}
-run_args.update(kwargs)
-return JsrunSettings(exe, args, run_args=run_args)

 return RunSettings(exe, args)


Diff for: doc/api/smartsim_api.rst (-64)

@@ -59,11 +59,9 @@ Types of Settings:
 MpirunSettings
 MpiexecSettings
 OrterunSettings
-JsrunSettings
 DragonRunSettings
 SbatchSettings
 QsubBatchSettings
-BsubBatchSettings

 Settings objects can accept a container object that defines a container
 runtime, image, and arguments to use for the workload. Below is a list of
@@ -187,41 +185,6 @@ for Slurm and PBS sessions, respectively).
 :members:


-.. _jsrun_api:
-
-JsrunSettings
--------------
-
-
-``JsrunSettings`` can be used on any system that supports the
-IBM LSF launcher.
-
-``JsrunSettings`` can be used in interactive session (on allocation)
-and within batch launches (i.e. ``BsubBatchSettings``)
-
-
-.. autosummary::
-
-JsrunSettings.set_num_rs
-JsrunSettings.set_cpus_per_rs
-JsrunSettings.set_gpus_per_rs
-JsrunSettings.set_rs_per_host
-JsrunSettings.set_tasks
-JsrunSettings.set_tasks_per_rs
-JsrunSettings.set_binding
-JsrunSettings.make_mpmd
-JsrunSettings.set_mpmd_preamble
-JsrunSettings.update_env
-JsrunSettings.set_erf_sets
-JsrunSettings.format_env_vars
-JsrunSettings.format_run_args
-
-
-.. autoclass:: JsrunSettings
-:inherited-members:
-:undoc-members:
-:members:
-
 .. _openmpi_run_api:

 MpirunSettings
@@ -361,33 +324,6 @@ be launched as a batch on PBSPro systems.
 :members:


-.. _bsub_api:
-
-BsubBatchSettings
------------------
-
-
-``BsubBatchSettings`` are used to configure jobs that should
-be launched as a batch on LSF systems.
-
-
-.. autosummary::
-
-BsubBatchSettings.set_walltime
-BsubBatchSettings.set_smts
-BsubBatchSettings.set_project
-BsubBatchSettings.set_nodes
-BsubBatchSettings.set_expert_mode_req
-BsubBatchSettings.set_hostlist
-BsubBatchSettings.set_tasks
-BsubBatchSettings.format_batch_args
-
-
-.. autoclass:: BsubBatchSettings
-:inherited-members:
-:undoc-members:
-:members:
-
 .. _singularity_api:

 Singularity

Diff for: doc/batch_settings.rst (-27)

@@ -16,8 +16,6 @@ launching capabilities tailored for specific workload managers (WLMs). Each Smar
 - :ref:`SbatchSettings<sbatch_api>`
 - The PBS Pro `launcher` supports:
 - :ref:`QsubBatchSettings<qsub_api>`
-- The LSF `launcher` supports:
-- :ref:`BsubBatchSettings<bsub_api>`

 .. note::
 The local `launcher` does not support batch jobs.
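With the LSF entry gone, the batch options documented in this hunk reduce to a small table: ``SbatchSettings`` for Slurm, ``QsubBatchSettings`` for PBS Pro, and nothing for the local launcher. The sketch below encodes that table and returns class *names* as strings so it runs without SmartSim installed. `batch_settings_class_for` is a hypothetical helper, not SmartSim's API; in SmartSim the selection happens inside ``Experiment.create_batch_settings``.

```python
# Hypothetical lookup mirroring the list in doc/batch_settings.rst after this
# commit. Class names are strings so the sketch runs without SmartSim.
BATCH_SETTINGS_BY_LAUNCHER = {
    "slurm": "SbatchSettings",
    "pbs": "QsubBatchSettings",
    # "lsf": "BsubBatchSettings",  <- entry removed by this commit
}

# Per the note in the docs: the local launcher does not support batch jobs.
NO_BATCH_LAUNCHERS = {"local"}


def batch_settings_class_for(launcher: str) -> str:
    """Return the batch settings class name documented for `launcher`."""
    if launcher in NO_BATCH_LAUNCHERS:
        raise ValueError(f"The {launcher!r} launcher does not support batch jobs")
    try:
        return BATCH_SETTINGS_BY_LAUNCHER[launcher]
    except KeyError:
        documented = ", ".join(sorted(BATCH_SETTINGS_BY_LAUNCHER))
        raise ValueError(
            f"No batch settings documented for launcher {launcher!r}; "
            f"documented launchers: {documented}"
        ) from None


print(batch_settings_class_for("pbs"))  # QsubBatchSettings
```

Keeping the mapping in one table means retiring a scheduler is a one-line deletion, whereas the `conftest.py` changes in this commit had to remove a separate `if test_launcher == "lsf"` branch from each function.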
@@ -97,31 +95,6 @@ Below are examples of how to initialize a ``BatchSettings`` object per `launcher
 If `launcher="auto"`, SmartSim will detect that the ``Experiment`` is running on a PBS Pro based
 machine and set the launcher to `"pbs"`.

-.. group-tab:: LSF
-To instantiate the ``BsubBatchSettings`` object, which interfaces with the LSF job scheduler, specify
-`launcher="lsf"` when initializing the ``Experiment``. Upon calling ``create_batch_settings``,
-SmartSim will detect the job scheduler and return the appropriate batch settings object.
-
-.. code-block:: python
-
-from smartsim import Experiment
-
-# Initialize the experiment and provide launcher LSF
-exp = Experiment("name-of-experiment", launcher="lsf")
-
-# Initialize a BsubBatchSettings object
-bsub_batch_settings = exp.create_batch_settings(nodes=1, time="10:00:00", batch_args={"ntasks": 1})
-# Set the account for the lsf batch job
-bsub_batch_settings.set_account("12345-Cray")
-# Set the partition for the lsf batch job
-bsub_batch_settings.set_queue("default")
-
-The initialized ``BsubBatchSettings`` instance can now be passed to a SmartSim entity
-(``Model`` or ``Ensemble``) via the `batch_settings` argument in ``create_batch_settings``.
-
-.. note::
-If `launcher="auto"`, SmartSim will detect that the ``Experiment`` is running on a LSF based
-machine and set the launcher to `"lsf"`.

 .. warning::
 Note that initialization values provided (e.g., `nodes`, `time`, etc) will overwrite the same arguments in `batch_args` if present.
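The warning describes a precedence rule: an explicitly passed initialization value such as `nodes` or `time` wins over the same key inside `batch_args`. The sketch below illustrates that rule with plain dictionaries. `merge_batch_args` is hypothetical; it assumes both sources use the same key names and that an unset initialization value arrives as `None`, neither of which is guaranteed to match how SmartSim stores these values internally.

```python
from typing import Any, Dict, Optional


def merge_batch_args(
    batch_args: Optional[Dict[str, Any]] = None, **init_values: Any
) -> Dict[str, Any]:
    """Combine batch_args with explicit initialization values.

    Explicit values that are not None overwrite same-named keys in
    batch_args, matching the precedence described in the warning.
    """
    merged = dict(batch_args or {})  # copy so the caller's dict is untouched
    merged.update({k: v for k, v in init_values.items() if v is not None})
    return merged


args = merge_batch_args({"nodes": 4, "account": "proj"}, nodes=1, time="10:00:00")
print(args)  # {'nodes': 1, 'account': 'proj', 'time': '10:00:00'}
```

Here `nodes=1` replaces the `4` from `batch_args`, `account` passes through untouched, and `time` is added because `batch_args` did not set it.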

Diff for: doc/changelog.md (+12 -6)

@@ -13,25 +13,31 @@ To be released at some point in the future

 Description

+- Terminate LSF and LSB support
 - Implement workaround for Tensorflow that allows RedisAI to build with GCC-14
 - Add instructions for installing SmartSim on PML's Scylla
 - Fix typos in documentation

 Detailed Notes

+- After the supercomputer Summit was decommissioned, a decision was made to
+terminate SmartSim's support of the LSF launcher and LSB scheduler. If
+this impacts your work, please contact us.
+([SmartSim-PR780](https://github.com/CrayLabs/SmartSim/pull/780))
+- Fix typos in the `train_surrogate` tutorial documentation.
+([SmartSim-PR758](https://github.com/CrayLabs/SmartSim/pull/758))
+- PML's Scylla is still under development. The usual SmartSim
+build instructions do not apply because the GPU dependencies
+have yet to be installed at a system-wide level. Scylla has
+its own entry in the documentation.
+([SmartSim-PR733](https://github.com/CrayLabs/SmartSim/pull/733))
 - In libtensorflow, the input argument to TF_SessionRun seems to be mistyped to
 TF_Output instead of TF_Input. These two types differ only in name. GCC-14
 catches this and throws an error, even though earlier versions allow this. To
 solve this problem, patches are applied to the Tensorflow backend in RedisAI.
 Future versions of Tensorflow may fix this problem, but for now this seems to be
 the best workaround.
 ([SmartSim-PR738](https://github.com/CrayLabs/SmartSim/pull/738))
-- PML's Scylla is still under development. The usual SmartSim
-build instructions do not apply because the GPU dependencies
-have yet to be installed at a system-wide level. Scylla has
-its own entry in the documentation.
-([SmartSim-PR733](https://github.com/CrayLabs/SmartSim/pull/733))
-- Fix typos in the `train_surrogate` tutorial documentation


 ### 0.8.0

Diff for: doc/developer.rst (+2 -5)

@@ -90,7 +90,7 @@ If any of the above commands are used, the test suite will run the "light" test
 suite by default.


-PBSPro, Slurm, LSF
+PBSPro, Slurm, SGE
 ==================

 To run the full test suite, users will have to be on a system with one of the
@@ -105,17 +105,14 @@ of at least 3 nodes.
 # for PBSPro (with aprun)
 qsub -l select=3 -l place=scatter -l walltime=00:10:00 -q queue

-# for LSF (with jsrun)
-bsub -Is -W 00:30 -nnodes 3 -P project $SHELL
-
 Values for queue, account, or project should be substituted appropriately.

 Once in an iterative allocation, users will need to set the test launcher
 environment variable: ``SMARTSIM_TEST_LAUNCHER`` to one of the following values

 - slurm
 - pbs
-- lsf
+- sge
 - local

 If tests have to run on an account or project, the environment variable

Diff for: doc/experiment.rst (+3 -3)

@@ -7,7 +7,7 @@ Overview
 SmartSim helps automate the deployment of AI-enabled workflows on HPC systems. With SmartSim, users
 can describe and launch combinations of applications and AI/ML infrastructure to produce novel and
 scalable workflows. SmartSim supports launching these workflows on a diverse set of systems, including
-local environments such as Mac or Linux, as well as HPC job schedulers (e.g. Slurm, PBS Pro, and LSF).
+local environments such as Mac or Linux, as well as HPC job schedulers (e.g. Slurm, PBS Pro, and SGE).

 The ``Experiment`` API is SmartSim's top level API that provides users with methods for creating, combining,
 configuring, launching and monitoring :ref:`entities<entities_exp_docs>` in an AI-enabled workflow. More specifically, the
@@ -49,7 +49,7 @@ workflow in the :ref:`Example<exp_example>` section of this page.
 Launchers
 =========
 SmartSim supports launching AI-enabled workflows on a wide variety of systems, including locally on a Mac or
-Linux machine or on HPC machines with a job scheduler (e.g. Slurm, PBS Pro, and LSF). When creating a SmartSim
+Linux machine or on HPC machines with a job scheduler (e.g. Slurm, PBS Pro, and SGE). When creating a SmartSim
 ``Experiment``, the user has the opportunity to specify the `launcher` type or defer to automatic `launcher` selection.
 `Launcher` selection determines how SmartSim translates entity configurations into system calls to launch,
 manage, and monitor. Currently, SmartSim supports 7 `launcher` options:
@@ -58,7 +58,7 @@ manage, and monitor. Currently, SmartSim supports 7 `launcher` options:
 2. ``slurm``: for systems using the Slurm scheduler
 3. ``pbs``: for systems using the PBS Pro scheduler
 4. ``pals``: for systems using the PALS scheduler
-5. ``lsf``: for systems using the LSF scheduler
+5. ``sge``: for systems using the SGE scheduler
 6. ``dragon``: if Dragon is installed in the current Python environment, see :ref:`Dragon Install <dragon_install>`
 7. ``auto``: have SmartSim auto-detect the launcher to use (will not detect ``dragon``)

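The numbered list in this hunk fixes the accepted `launcher` strings, and after this commit ``lsf`` is no longer one of them. Option 1 falls outside the displayed hunk; the sketch assumes it is ``local``, the launcher the batch-settings docs say cannot run batch jobs. `normalize_launcher` and its messages are hypothetical, not part of SmartSim's API; the sketch shows one way a caller could reject the retired value with a pointed hint instead of a generic error.

```python
# Launcher names from the numbered list in doc/experiment.rst after this
# commit. Option 1 is outside the displayed hunk; "local" is assumed there.
SUPPORTED_LAUNCHERS = ("local", "slurm", "pbs", "pals", "sge", "dragon", "auto")

# Hypothetical table of retired launchers and the hint shown to users.
RETIRED_LAUNCHERS = {
    "lsf": "LSF support was removed after Summit was decommissioned (PR780)",
}


def normalize_launcher(name: str) -> str:
    """Return the canonical launcher name, or raise ValueError with a hint."""
    key = name.strip().lower()
    if key in RETIRED_LAUNCHERS:
        raise ValueError(
            f"launcher {name!r} is no longer supported: {RETIRED_LAUNCHERS[key]}"
        )
    if key not in SUPPORTED_LAUNCHERS:
        raise ValueError(
            f"unknown launcher {name!r}; choose one of: {', '.join(SUPPORTED_LAUNCHERS)}"
        )
    return key


# The docs state there are 7 launcher options; the tuple should agree.
assert len(SUPPORTED_LAUNCHERS) == 7

print(normalize_launcher(" SGE "))  # sge
```

Calling `normalize_launcher("lsf")` raises `ValueError` carrying the retirement hint rather than the generic "unknown launcher" message, which makes the cause obvious to anyone whose scripts still pass the old value.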
Diff for: doc/overview.rst (+1 -1)

@@ -61,7 +61,7 @@ The key features of the IL are:
 - An API to start, monitor, and stop HPC jobs from Python or from a Jupyter notebook.
 - Automated deployment of in-memory data staging (`Redis <https://redis.io>`_) and computational
 storage (`RedisAI <https://redisai.io>`_).
-- Programmatic launches of batch and in-allocation jobs on PBS, Slurm, and LSF systems.
+- Programmatic launches of batch and in-allocation jobs on PBS, Slurm, and SGE systems.
 - Creating and configuring ensembles of workloads with isolated communication channels.

 The IL can configure and launch batch jobs as well as jobs within interactive