[ENH] Precompute data to accelerate training in GPU #1850
base: main
Conversation
The primary issue here is that `__getitem__()` performs pre-processing, which is typically conducted before training actually starts. As a result, the GPU is frequently idle, resulting in slower training completion. It is well known that a GPU achieves higher throughput the busier it can be kept, but the pre-processing done in `__getitem__()` each time an item is retrieved severely limits this. This commit ensures that pre-processing is performed prior to training, as follows:
1. calling `to_dataloader()` will also call the `precompute()` function
2. the `precompute()` function collects pre-computed items from `__precompute__getitem__()` and stores them in a cache; this function relies on the sampler to retrieve the data index
3. `__precompute__getitem__()` is the unmodified algorithm of the original `__getitem__()`, to ensure an equivalent outcome
4. the new `__getitem__()` retrieves items from the cache in order, because relying on `idx` may result in a different index sequence due to the first sampler call from `precompute()`
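For readers skimming the thread, here is a minimal, illustrative sketch of the caching pattern described above. The method names follow the commit description, but the bodies are placeholders rather than the PR's actual implementation:

```python
from torch.utils.data import Dataset


class PrecomputeSketch(Dataset):
    """Illustrative only -- mirrors the control flow described above."""

    def precompute(self, sampler):
        # Walk the sampler once up front and run the original, expensive
        # pre-processing for every index, caching the resulting items.
        self._cache = [self.__precompute__getitem__(idx) for idx in sampler]
        self._cursor = 0

    def __precompute__getitem__(self, idx):
        # Unmodified algorithm of the original __getitem__, so cached items
        # are equivalent to what the slow path would have produced.
        raise NotImplementedError  # placeholder in this sketch

    def __getitem__(self, idx):
        # Serve items from the cache in order rather than by idx, because the
        # sampler was already consumed once during precompute().
        item = self._cache[self._cursor]
        self._cursor = (self._cursor + 1) % len(self._cache)
        return item

    def __len__(self):
        return len(getattr(self, "_cache", []))
```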
Nice!
FYI @phoeenniixx, @PranavBhatP, @xandie985 - is this something we need to take into account in v2? We should also think about adding profiling checks.
Code quality (linting) is failing; this can be fixed by using pre-commit or automated formatting. The pytorch-forecasting docs do not cover this yet (we should add it), but it is the same as in sktime:
https://www.sktime.net/en/stable/developer_guide/coding_standards.html
FYI @agobbifbk
Agree, most of the time you can fit your data in memory, and we should include the precomputation possibility in the d2 layer. We should already have the correct indexes computed; it is just a matter of creating the tensors according to those indexes.
When you say
This is not an issue anymore; I missed removing the fit function (used during my testing) that I had placed prior to the Tuner.
The fast path can be activated by enabling `precompute=True` in `to_dataloader`:
```python
.to_dataloader(..., precompute=True)
```
Hi @jobs-git, I see you are still facing some linting issues. May I suggest setting up pre-commit in your local repo? This will reduce your effort very much (it automatically solves some issues, not requiring you to make the changes yourself), and you do not need to wait here and make changes based on the errors shown in the code-quality CI workflows :)
Codecov Report
❌ Patch coverage is
Additional details and impacted files
```
@@            Coverage Diff            @@
##             main    #1850   +/-   ##
=======================================
  Coverage        ?   87.22%
=======================================
  Files           ?      158
  Lines           ?     9312
  Branches        ?        0
=======================================
  Hits            ?     8122
  Misses          ?     1190
  Partials        ?        0
```
Is this ready for review?
Yes, please. Inputs are welcome.
Thanks - while we wait for reviews, can you kindly add tests? For TimeSeriesDataSet in isolation, and in integration with the networks?
We need exact parity for any previously working code - we cannot push a breaking change with a release - so it would be great if we could also ensure this with a test. Namely, that
Got it, I will test this and add that as a sample, and probably some demonstrations on available real-world samples from sktime too.
This notebook demonstrates the benefit of precomputing tensors prior to model training on the GPU. The GPU is a throughput device, which means we get higher performance the more it is kept active. However, it is well known that stalling data transfer from CPU to GPU is very costly, and the just-in-time calculations the CPU performs between batches and item retrievals further starve the expected throughput. To overcome this, we can precompute all the tensors the model needs so that it can continuously perform forward and backward propagation with minimal delay on the GPU (a minimal timing sketch follows the outline below).
* Initial setup
* Testing on precompute = True
* Testing on default code
* Performance comparisons
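A minimal timing sketch along those lines; `training` is a hypothetical `TimeSeriesDataSet` built earlier in the notebook, and the batch size is arbitrary:

```python
import time


def time_one_pass(dataloader):
    # Time a full pass over the dataloader to isolate the data-feeding cost.
    start = time.perf_counter()
    for _ in dataloader:
        pass
    return time.perf_counter() - start


fast = training.to_dataloader(train=True, batch_size=128, precompute=True)
slow = training.to_dataloader(train=True, batch_size=128, precompute=False)
print(f"precompute=True : {time_one_pass(fast):.2f}s")
print(f"precompute=False: {time_one_pass(slow):.2f}s")
```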
@jobs-git, are you still working on this? Let us know if you need help!
Nothing anymore, it's all complete.
Quick request: can you kindly move the precompute_bench notebooks to another branch? They are quite large.
Sure, will separate the notebooks.
The commit to achieve a close-to-parity algorithm with the old batching lowered the gain. So far, I can achieve about a 900% gain on synthetic data. I will create a separate PR that demonstrates its usage and benchmarks performance.
Description
Fixes: #1849; Fixes: #1426; Fixes: #1860; Fixes: #922; Fixes: #715; Fixes: sktime/sktime#8278
Closes: #1846; Closes: #991
Supersedes: #806
The `__getitem__()` and `_collate_fn()` methods in `TimeSeriesDataset` are performance bottlenecks. They run computation-intensive branching logic on every call, which temporarily stalls compute devices and slows down training.

To improve on these, the following major changes were made (a hedged sketch of the resulting wiring follows the list):

- Added `__precompute__`, which performs the calculations otherwise conducted by `__getitem__` and `_collate_fn()` and stores the batch data in `self.precollate_cache`; `__precompute__` does not assume its own data order, but follows the `idx` provided by `BatchSamplers`
- Added `__fast_collate_fn__`, which retrieves batch data from `self.precollate_cache`
- Moved the logic of `__getitem__` to `__item_tensor__` so it can be used in pre-compute
- Modified `__getitem__` to accommodate both the old code path and the fast path when `precompute=True`
- When `precompute=True`, `__getitem__` returns `None`, as only the return value of `__fast_collate_fn__` is needed by the `DataLoader` to retrieve the batches used to train models
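A hedged sketch of how the fast path could be wired inside `to_dataloader()`, per the list above; the actual PR code may differ in detail:

```python
from torch.utils.data import DataLoader


def to_dataloader(dataset, batch_size=64, precompute=False, **kwargs):
    if precompute:
        # Fill dataset.precollate_cache up front; per the description above,
        # __precompute__ follows the idx order produced by the batch sampler.
        dataset.__precompute__()
        # __getitem__ now returns None; the batches themselves are produced
        # by __fast_collate_fn__, which reads them out of the cache.
        return DataLoader(
            dataset,
            batch_size=batch_size,
            collate_fn=dataset.__fast_collate_fn__,
            **kwargs,
        )
    # Original slow path: per-item pre-processing in __getitem__ plus _collate_fn.
    return DataLoader(
        dataset, batch_size=batch_size, collate_fn=dataset._collate_fn, **kwargs
    )
```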
Note: `TimeSeriesDataset` defaults to the slow path, so the old method is used.

Caveat and Limitations
This feature stores precomputed data in RAM. Therefore, sufficient memory must be available; otherwise, an out-of-memory (OOM) error may occur.
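As a rough way to gauge whether the cache fits in RAM, one can estimate its size from the window dimensions. This is a hypothetical back-of-the-envelope calculation, not taken from this PR; real footprints depend on dtypes and which fields the dataset carries:

```python
# Hypothetical numbers purely for illustration.
n_windows = 100_000       # number of (encoder, decoder) samples in the dataset
encoder_length = 60
decoder_length = 20
n_features = 12
bytes_per_value = 4       # float32

values_per_window = (encoder_length + decoder_length) * n_features
cache_bytes = n_windows * values_per_window * bytes_per_value
print(f"~{cache_bytes / 1024**3:.2f} GiB needed for the precomputed cache")
```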
Super batching could be considered if `precompute=True` is required but memory is insufficient. However, implementing that is beyond the scope of this PR.
Recommend to use `precompute=False` to use the original slow path.
For extremely large datasets, FSDP (Fully Sharded Data Parallel) may be used with PyTorch Lightning for distributed training. This, too, is outside the scope of this PR.
Recommend to use `precompute=False` to use the original slow path.

Checklist
- `__getitem__()`
- `collate_fn()`
- `precompute=True`
- `precompute=True` is consistent with default behaviour (`precompute=False`)
- Hooks installed with `pre-commit install`. To run hooks independent of commit, execute `pre-commit run --all-files`