60x Slowdown Using Concurrency #6728

Closed
JohanLoknaQC opened this issue Nov 22, 2024 · 4 comments
@JohanLoknaQC

JohanLoknaQC commented Nov 22, 2024

Description

There seems to be an issue with how LightGBM handles resource sharing: when the number of cores available to a process is restricted, the runtime increases significantly.

In the example provided below, the runtime using all cores (0-15) is about 1.821 seconds. When restricting the process to all cores but one (0-14), the runtime increases to 109.31 seconds, more than a 60x increase. This only happens if the resource restriction is done from within the Python script. If the affinity is set beforehand using taskset -c 0-14, the runtime is approximately the same, 1.796 seconds.

This makes training multiple LightGBM models in parallel undesirable, at least if the subprocesses are launched from within a Python script. As this is a common pattern for implementing concurrency, this appears to be a limitation that can hopefully be addressed easily.

Thanks!

Reproducible example

lgbm_affinity.py

import argparse
import lightgbm as lgb
import numpy as np
import os

np.random.seed(42)


def main(use_setaffinity: bool = False, use_taskset: bool = False):

    n = os.cpu_count()

    # Set affinity using ``os.sched_setaffinity``
    if use_setaffinity:
        os.sched_setaffinity(0, set(range(n - 1)))
        os.environ['OMP_NUM_THREADS'] = str(n - 1)  # Added after suggestion

    # Set affinity using ``taskset``
    if use_taskset:
        pid = os.getpid()
        command = f"taskset -cp 0-{n - 2} {pid}"
        os.system(command)
        os.environ['OMP_NUM_THREADS'] = str(n - 1)  # Added after suggestion

    # Generate a data set
    nrows, ncols = 1_000, 10
    X = np.random.normal(size=(nrows, ncols))
    y = X @ np.random.normal(size=ncols) + np.random.normal(size=nrows)

    lgb_train = lgb.Dataset(X, y)

    # Train model
    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 31,  # the default value
        "learning_rate": 0.05,
        "feature_fraction": 0.9,
        "bagging_fraction": 0.8,
        "bagging_freq": 5,
        "verbose": 0
    }
    lgb.train(params, lgb_train)


if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    parser.add_argument(
        "--use-setaffinity", 
        dest="use_setaffinity", 
        action="store_true",
    )

    parser.add_argument(
        "--use-taskset", 
        dest="use_taskset", 
        action="store_true",
    )

    args = parser.parse_args()
    main(**vars(args))

lgbm_affinity.sh

time python lgbm_affinity.py > /dev/null 2>&1
time python lgbm_affinity.py --use-setaffinity > /dev/null 2>&1
time python lgbm_affinity.py --use-taskset > /dev/null 2>&1
time taskset -c 0-14 python lgbm_affinity.py > /dev/null 2>&1

Output

# Using all cores
real    0m1.821s
user    0m4.394s
sys     0m0.178s

# Using ``sched_setaffinity`` from within the process
real    1m49.313s
user    25m44.344s
sys     0m1.109s

# Using ``taskset`` from within the process
real    1m48.820s
user    25m54.104s
sys     0m0.959s

# Using ``taskset`` before initializing the process
real    0m1.796s
user    0m4.135s
sys     0m0.203s

Environment info

LightGBM version or commit hash:

liblightgbm  4.5.0    cpu_h155599f_3  conda-forge
lightgbm     4.5.0    cpu_py_3        conda-forge

Command(s) you used to install LightGBM

micromamba install lightgbm

Other packages used:

numpy     1.26.4   py312heda63a1_0  conda-forge

The example was run on an AWS instance (ml.m5.4xlarge) with 16 cores.

Additional Comments

@jmoralez
Collaborator

Hey @JohanLoknaQC, thanks for using LightGBM.

By default, LightGBM uses all available threads on the machine unless you tell it otherwise. So in your examples you're submitting n tasks while assigning only n - 1 threads, so they have to fight each other to execute them. I think the easiest way to fix this is something like os.environ['OMP_NUM_THREADS'] = str(n - 1); that way you tell LightGBM to use only as many threads as you've limited the process to.

@JohanLoknaQC
Author

Thanks a lot for the answer! However, after adding the suggested fix (see the code above), the runtimes remain virtually unchanged. It seems like something else might be causing the additional runtime.

@jmoralez
Collaborator

Sorry, I think that only works if provided through the command line. Can you please set the num_threads argument instead? e.g.

    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 31,  # the default value
        "learning_rate": 0.05,
        "feature_fraction": 0.9,
        "bagging_fraction": 0.8,
        "bagging_freq": 5,
        "verbose": 0,
        "num_threads": n - 1,  # <- set this
    }
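
A variation on the same idea, if you don't want to hard-code the count: derive it from the process's current affinity mask with os.sched_getaffinity (Linux-only, like taskset). A minimal sketch, using the same params as above:

    # Size LightGBM's thread pool to the cores this process is actually allowed to use.
    available_cores = len(os.sched_getaffinity(0))

    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 31,  # the default value
        "learning_rate": 0.05,
        "feature_fraction": 0.9,
        "bagging_fraction": 0.8,
        "bagging_freq": 5,
        "verbose": 0,
        "num_threads": available_cores,  # <- follows whatever affinity was set
    }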

@JohanLoknaQC
Author

Thank you very much - this solved the issue.

Just for reference, it also worked when the affinities were set quite arbitrarily, e.g. to cores 3-12, so it seems to be a quite general solution. 👍
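
For anyone finding this later: a minimal sketch of the parallel-training pattern from the description with this fix applied, where each worker pins itself to a subset of cores and passes a matching num_threads. The two-way core split and the toy data are illustrative, not taken from the setup above.

import os
from concurrent.futures import ProcessPoolExecutor

import lightgbm as lgb
import numpy as np


def train_one(seed, cores):
    # Pin this worker to its core subset and size LightGBM's thread pool to match.
    os.sched_setaffinity(0, cores)

    rng = np.random.default_rng(seed)
    X = rng.normal(size=(1_000, 10))
    y = X @ rng.normal(size=10) + rng.normal(size=1_000)

    params = {
        "objective": "regression",
        "verbose": -1,
        "num_threads": len(cores),  # <- matches the affinity mask
    }
    booster = lgb.train(params, lgb.Dataset(X, y))
    return booster.num_trees()


if __name__ == "__main__":
    n = os.cpu_count()
    # Two workers, each restricted to half of the cores.
    core_sets = [set(range(n // 2)), set(range(n // 2, n))]

    with ProcessPoolExecutor(max_workers=len(core_sets)) as pool:
        futures = [pool.submit(train_one, i, cores) for i, cores in enumerate(core_sets)]
        print([f.result() for f in futures])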
