60x Slowdown Using Concurrency #6728

Closed
JohanLoknaQC opened this issue Nov 22, 2024 · 4 comments
@JohanLoknaQC

JohanLoknaQC commented Nov 22, 2024

Description

There seems to be an issue with how LightGBM handles resource sharing: when the number of cores available to a process is restricted, the runtime increases significantly.

In the example provided below, the runtime using all cores (0-15) is about 1.821 seconds. When restricting the process to all cores but one (0-14), the runtime increases to 109.31 seconds, more than a 60x increase. This only happens if the resource restriction is done from within the Python script. If the affinity is set beforehand using taskset -c 0-14, the runtime is approximately the same, 1.796 seconds.

This makes training multiple LightGBM models in parallel undesirable, at least if the subprocesses are launched from within a Python script. As this is a common pattern for implementing concurrency, this appears to be a limitation that can hopefully be addressed easily.

Thanks!

Reproducible example

lgbm_affinity.py

import argparse
import lightgbm as lgb
import numpy as np
import os

np.random.seed(42)


def main(use_setaffinity: bool = False, use_taskset: bool = False):

    n = os.cpu_count()

    # Set affinity using ``os.sched_setaffinity``
    if use_setaffinity:
        os.sched_setaffinity(0, set(range(n - 1)))
        os.environ['OMP_NUM_THREADS'] = str(n - 1)  # Added after suggestion

    # Set affinity using ``taskset``
    if use_taskset:
        pid = os.getpid()
        command = f"taskset -cp 0-{n - 2} {pid}"
        os.system(command)
        os.environ['OMP_NUM_THREADS'] = str(n - 1)  # Added after suggestion

    # Generate a data set
    nrows, ncols = 1_000, 10
    X = np.random.normal(size=(nrows, ncols))
    y = X @ np.random.normal(size=ncols) + np.random.normal(size=nrows)

    lgb_train = lgb.Dataset(X, y)

    # Train model
    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 31,  # the default value
        "learning_rate": 0.05,
        "feature_fraction": 0.9,
        "bagging_fraction": 0.8,
        "bagging_freq": 5,
        "verbose": 0
    }
    lgb.train(params, lgb_train)


if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    parser.add_argument(
        "--use-setaffinity", 
        dest="use_setaffinity", 
        action="store_true",
    )

    parser.add_argument(
        "--use-taskset", 
        dest="use_taskset", 
        action="store_true",
    )

    args = parser.parse_args()
    main(**vars(args))

lgbm_affinity.sh

time python lgbm_affinity.py > /dev/null 2>&1
time python lgbm_affinity.py --use-setaffinity > /dev/null 2>&1
time python lgbm_affinity.py --use-taskset > /dev/null 2>&1
time taskset -c 0-14 python lgbm_affinity.py > /dev/null 2>&1

Output

# Using all cores
real    0m1.821s
user    0m4.394s
sys     0m0.178s

# Using ``sched_setaffinity`` from within the process
real    1m49.313s
user    25m44.344s
sys     0m1.109s

# Using ``taskset`` from within the process
real    1m48.820s
user    25m54.104s
sys     0m0.959s

# Using ``taskset`` before initializing the process
real    0m1.796s
user    0m4.135s
sys     0m0.203s

Environment info

LightGBM version or commit hash:

liblightgbm  4.5.0    cpu_h155599f_3  conda-forge
lightgbm     4.5.0    cpu_py_3        conda-forge

Command(s) you used to install LightGBM

micromamba install lightgbm

Other packages used:

numpy     1.26.4   py312heda63a1_0  conda-forge

The example was run on an AWS instance (ml.m5.4xlarge) with 16 cores.

Additional Comments

@jmoralez
Collaborator

Hey @JohanLoknaQC, thanks for using LightGBM.

By default, LightGBM uses all available threads on the machine unless you tell it otherwise. So in your examples you're submitting n tasks while assigning only n - 1 threads, so they have to fight each other to execute them. I think the easiest way to fix this is something like os.environ['OMP_NUM_THREADS'] = str(n - 1); that way you tell LightGBM to use only as many threads as you've limited the process to.

@JohanLoknaQC
Author

Thanks a lot for the answer! However, after adding the suggested fix (see the code above), the runtimes remain virtually unchanged. It seems like something else might be causing the additional runtime.

@jmoralez
Collaborator

Sorry, I think that only works if provided through the command line. Can you please set the num_threads argument instead? e.g.

    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 31,  # the default value
        "learning_rate": 0.05,
        "feature_fraction": 0.9,
        "bagging_fraction": 0.8,
        "bagging_freq": 5,
        "verbose": 0,
        "num_threads": n - 1,  # <- set this
    }
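
A variation on the same idea, if you don't want to hard-code the count: derive it from the process's current affinity mask with os.sched_getaffinity (Linux-only, like taskset). A minimal sketch, using the same params as above:

    # Size LightGBM's thread pool to the cores this process is actually allowed to use.
    available_cores = len(os.sched_getaffinity(0))

    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 31,  # the default value
        "learning_rate": 0.05,
        "feature_fraction": 0.9,
        "bagging_fraction": 0.8,
        "bagging_freq": 5,
        "verbose": 0,
        "num_threads": available_cores,  # <- follows whatever affinity was set
    }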

@JohanLoknaQC
Author

Thank you very much - this solved the issue.

Just for reference, it also worked when the affinities were set quite arbitrarily, e.g. to cores 3-12, so it seems to be a quite general solution. 👍
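
For anyone finding this later: a minimal sketch of the parallel-training pattern from the description with this fix applied, where each worker pins itself to a subset of cores and passes a matching num_threads. The two-way core split and the toy data are illustrative, not taken from the setup above.

import os
from concurrent.futures import ProcessPoolExecutor

import lightgbm as lgb
import numpy as np


def train_one(seed, cores):
    # Pin this worker to its core subset and size LightGBM's thread pool to match.
    os.sched_setaffinity(0, cores)

    rng = np.random.default_rng(seed)
    X = rng.normal(size=(1_000, 10))
    y = X @ rng.normal(size=10) + rng.normal(size=1_000)

    params = {
        "objective": "regression",
        "verbose": -1,
        "num_threads": len(cores),  # <- matches the affinity mask
    }
    booster = lgb.train(params, lgb.Dataset(X, y))
    return booster.num_trees()


if __name__ == "__main__":
    n = os.cpu_count()
    # Two workers, each restricted to half of the cores.
    core_sets = [set(range(n // 2)), set(range(n // 2, n))]

    with ProcessPoolExecutor(max_workers=len(core_sets)) as pool:
        futures = [pool.submit(train_one, i, cores) for i, cores in enumerate(core_sets)]
        print([f.result() for f in futures])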
