
Commit 2df1473

qibaoyuan authored and brian-dellabetta committed
[Performance] Speed up DataLoader with Multiple DataLoader Workers (#1821)
When the dataset is large, DataLoader throughput can be improved by using min(8, cpu_count // 2) worker processes.

SUMMARY: When the dataset is large, the default single-process DataLoader is too slow. In my experiment, preparing the cache took 1.5 hours; after setting num_workers to min(8, cpu_count // 2), the time dropped to 7 minutes (roughly a 13x speedup).

TEST PLAN: Tested with a 5K-sample dataset, comparing the performance of this patch against the original implementation.

Signed-off-by: Baoyuan Qi <[email protected]>
Co-authored-by: Brian Dellabetta <[email protected]>
1 parent 54dba55 · commit 2df1473

File tree

1 file changed: +10 −0 lines changed


src/llmcompressor/datasets/utils.py

Lines changed: 10 additions & 0 deletions
@@ -1,3 +1,4 @@
+import multiprocessing
 import re
 from typing import Any, Callable, Dict, List, Optional

@@ -138,13 +139,22 @@ def format_calibration_data(
     tokenized_dataset = tokenized_dataset.shuffle()
     tokenized_calibration = tokenized_dataset.select(range(safe_calibration_samples))

+    MAX_DATALOADER_WORKERS = 8
+    try:
+        num_workers = min(MAX_DATALOADER_WORKERS, multiprocessing.cpu_count() // 2)
+    except NotImplementedError:
+        logger.warning(
+            "Could not determine number of CPUs, defaulting to 0 dataloader workers."
+        )
+        num_workers = 0
     dataloader_params = {
         "batch_size": 1,
         "sampler": RandomSampler(tokenized_calibration)
         if do_shuffle
         else SequentialSampler(tokenized_calibration),
         "collate_fn": collate_fn,
         "pin_memory": True,
+        "num_workers": num_workers,
     }

     calibration_dataloader = DataLoader(tokenized_calibration, **dataloader_params)
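For context, below is a minimal standalone sketch of the same worker-count heuristic applied to a generic PyTorch DataLoader. The TensorDataset and the consuming loop are hypothetical stand-ins for llmcompressor's tokenized calibration set, not code from this commit.

import multiprocessing

import torch
from torch.utils.data import DataLoader, TensorDataset

MAX_DATALOADER_WORKERS = 8


def pick_num_workers() -> int:
    # Use at most half the available CPUs, capped at 8, to avoid
    # oversubscribing the machine with prefetch subprocesses.
    try:
        return min(MAX_DATALOADER_WORKERS, multiprocessing.cpu_count() // 2)
    except NotImplementedError:
        # cpu_count() may be unavailable on some platforms; fall back to
        # loading in the main process.
        return 0


if __name__ == "__main__":
    # Hypothetical 5K-sample dataset, mirroring the commit's test plan.
    dataset = TensorDataset(torch.arange(5000))
    loader = DataLoader(
        dataset,
        batch_size=1,
        num_workers=pick_num_workers(),  # batches prefetched in subprocesses
        pin_memory=True,
    )
    for (sample,) in loader:
        pass  # consume batches; workers keep the pipeline fed

With num_workers > 0, PyTorch spawns that many subprocesses to tokenize and collate batches in parallel, which is what turns the 1.5-hour cache preparation into minutes when the dataset is large.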
