improve TimeSeries split performance by samirromdhani · Pull Request #3933 · powsybl/powsybl-core

samirromdhani · 2026-05-29T14:31:25Z

Please check if the PR fulfills these requirements

The commit message follows our guidelines
Tests for the changes have been added (for bug fixes / features)
Docs have been added / updated (for bug fixes / features)
A PR or issue has been opened in all impacted repositories (if any)

Does this PR already have an issue describing the problem?

Fixes #1634

What kind of change does this PR introduce?

What is the current behavior?

What is the new behavior (if this is a feature change)?

Does this PR introduce a breaking change or deprecate an API?

Yes
No

If yes, please check if the following requirements are fulfilled

The Breaking Change or Deprecated label has been added
The migration steps are described in the following section

What changes might users need to make in their application due to this PR? (migration steps)

The default behavior of the split method was changed to improve both execution time and memory consumption.
A new method, toCompactArray, was introduced, and split now uses it.
toArray is still available; users can choose toCompactArray for lower memory usage.
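
To picture the difference between the two methods, here is a minimal sketch. It is not the powsybl API: `Chunk` and `CompactArraySketch` are hypothetical names, and the assumed semantics (toArray covers the whole index with NaN fill, toCompactArray covers only the span between the first and last stored point, keeping NaN for gaps in the middle) are an illustration of the idea rather than a description of the merged code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a stored chunk: values starting at a given index offset.
record Chunk(int offset, double[] values) {
    int end() { return offset + values.length; } // exclusive end
}

class CompactArraySketch {

    // Dense view: one slot per index point, NaN where nothing is stored.
    static double[] toArray(List<Chunk> chunks, int pointCount) {
        double[] out = new double[pointCount];
        Arrays.fill(out, Double.NaN);
        for (Chunk c : chunks) {
            System.arraycopy(c.values(), 0, out, c.offset(), c.values().length);
        }
        return out;
    }

    // Compact view (assumed semantics): only the span [minOffset, maxEnd).
    static double[] toCompactArray(List<Chunk> chunks) {
        if (chunks.isEmpty()) {
            return new double[0];
        }
        int min = chunks.stream().mapToInt(Chunk::offset).min().getAsInt();
        int max = chunks.stream().mapToInt(Chunk::end).max().getAsInt();
        double[] out = new double[max - min];
        Arrays.fill(out, Double.NaN); // gaps in the middle stay NaN
        for (Chunk c : chunks) {
            System.arraycopy(c.values(), 0, out, c.offset() - min, c.values().length);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(new Chunk(1000, new double[]{1d, 2d}),
                                     new Chunk(1003, new double[]{4d}));
        System.out.println(toArray(chunks, 100_000).length);         // 100000
        System.out.println(Arrays.toString(toCompactArray(chunks))); // [1.0, 2.0, NaN, 4.0]
    }
}
```

With a compact view like this, a caller that needs a value by its original index has to subtract the minimum offset first.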

Other information:

powsybl-benchmark results with tsSize = 100000:

Benchmark                   Mode  Cnt            Score  Error  Units
splitV0                     avgt           3276192,839         us/op  <- split before performance improvement
split                       avgt              3236,725         us/op  <- split after performance improvement
splitV0:gc.alloc.rate.norm  avgt       40016480045,333         B/op   <- split before performance improvement
split:gc.alloc.rate.norm    avgt           6801644,438         B/op   <- split after performance improvement
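
For a quick reading of these numbers, the before/after ratios can be computed as below. The class name `SplitBenchmarkRatio` is only illustrative; the scores are copied from the table, with the decimal comma written as a dot.

```java
class SplitBenchmarkRatio {

    // How many times smaller the "after" score is compared to the "before" score.
    static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        double timeRatio = ratio(3_276_192.839, 3_236.725);            // us/op
        double allocRatio = ratio(40_016_480_045.333, 6_801_644.438);  // B/op
        System.out.printf("time: about %.0fx lower, allocation: about %.0fx lower%n",
                timeRatio, allocRatio);
    }
}
```

Both ratios come out at roughly three orders of magnitude or more for this benchmark size.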

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

CalculatedTimeSeries is not concerned by compact array, calc split returns copies and not create NaN Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

…nce-for-many-small-chunks

sonarqubecloud · 2026-06-17T09:46:46Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

rolnico

I'm not sure if we go the right way on this.

What if we tried to change the method AbstractTimeSeries.split(int) instead?

    public List<T> split(int newChunkSize) {
        if (chunks.isEmpty()) {
            return List.of();
        }

        int minOffset = getMinOffset();

        // Sort chunks by offset
        List<C> sortedChunks = getSortedChunks();

        // Map from bucket index -> list of chunk pieces that fall in that bucket.
        Map<Integer, List<C>> bucketMap = new LinkedHashMap<>();

        for (C chunk : sortedChunks) {
            int chunkStart = chunk.getOffset();
            int chunkEnd = chunkStart + chunk.getLength() - 1;

            int firstBucket = (chunkStart - minOffset) / newChunkSize;
            int lastBucket = (chunkEnd - minOffset) / newChunkSize;

            for (int b = firstBucket; b <= lastBucket; b++) {
                int bucketStart = minOffset + b * newChunkSize;
                int bucketEnd = bucketStart + newChunkSize - 1;

                // Intersection of chunk and bucket
                int intersectStart = Math.max(chunkStart, bucketStart);
                int intersectEnd = Math.min(chunkEnd, bucketEnd);

                // Trim the chunk to [intersectStart, intersectEnd]
                C slice = chunk;
                if (intersectStart > chunkStart) {
                    slice = slice.splitAt(intersectStart).getChunk2();
                }
                int sliceEnd = intersectStart + slice.getLength() - 1;
                if (sliceEnd > intersectEnd) {
                    slice = slice.splitAt(intersectEnd + 1).getChunk1();
                }

                bucketMap.computeIfAbsent(b, k -> new ArrayList<>()).add(slice);
            }
        }

        // Build one time series per non-empty bucket
        List<T> result = new ArrayList<>(bucketMap.size());
        for (List<C> pieces : bucketMap.values()) {
            result.add(createTimeSeries(pieces));
        }
        return result;
    }

With something like this, we would avoid getting chunks filled with NaN. However, there is still an issue: the method TimeSeries.split(List, int) expects a chunkCount based on the index, so would we have to change that method, or generate empty chunks?

What do you think of it?
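
To see what the bucket arithmetic does on a gapped series, here is a standalone sketch that mirrors the index computations of the proposal above, without any powsybl types. `BucketSketch` and the `{offset, length}` chunk encoding are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BucketSketch {

    // Returns bucket index -> list of {start, end} (inclusive) index ranges landing in it.
    // Each chunk is given as {offset, length}; chunks are assumed sorted and non-overlapping.
    static Map<Integer, List<int[]>> bucketRanges(int[][] chunks, int newChunkSize) {
        Map<Integer, List<int[]>> buckets = new LinkedHashMap<>();
        if (chunks.length == 0) {
            return buckets;
        }
        int minOffset = chunks[0][0];
        for (int[] chunk : chunks) {
            int chunkStart = chunk[0];
            int chunkEnd = chunkStart + chunk[1] - 1;
            int firstBucket = (chunkStart - minOffset) / newChunkSize;
            int lastBucket = (chunkEnd - minOffset) / newChunkSize;
            for (int b = firstBucket; b <= lastBucket; b++) {
                int bucketStart = minOffset + b * newChunkSize;
                int bucketEnd = bucketStart + newChunkSize - 1;
                // Intersection of the chunk and the bucket
                int[] range = {Math.max(chunkStart, bucketStart), Math.min(chunkEnd, bucketEnd)};
                buckets.computeIfAbsent(b, k -> new ArrayList<>()).add(range);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Indexes 0..2 and 6..7 are stored; 3..5 is a gap.
        int[][] chunks = {{0, 3}, {6, 2}};
        bucketRanges(chunks, 2).forEach((b, ranges) ->
                ranges.forEach(r -> System.out.println("bucket " + b + ": [" + r[0] + ", " + r[1] + "]")));
        // bucket 0: [0, 1]
        // bucket 1: [2, 2]
        // bucket 3: [6, 7]
    }
}
```

Bucket 2 never appears, so for 8 index points and a chunk size of 2 this yields 3 pieces where a positional consumer would expect 4.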

rolnico · 2026-06-17T10:00:12Z

    }

+    @Override
+    public List<DoubleTimeSeries> split(int newChunkSize) {


This code would potentially generate chunks/time series with only NaN values if there is a gap between the initial chunks:

    @Test
    void testSplitIssue() {
        RegularTimeSeriesIndex index = RegularTimeSeriesIndex.create(
                Interval.parse("2015-01-01T00:00:00Z/2015-01-01T01:45:00Z"), Duration.ofMinutes(15));
        TimeSeriesMetadata metadata = new TimeSeriesMetadata("ts1", TimeSeriesDataType.DOUBLE, index);
        UncompressedDoubleDataChunk chunk1 = new UncompressedDoubleDataChunk(0, new double[]{1d, 2d, 3d});
        UncompressedDoubleDataChunk chunk2 = new UncompressedDoubleDataChunk(6, new double[]{7d, 8d});
        StoredDoubleTimeSeries timeSeries = new StoredDoubleTimeSeries(metadata, chunk1, chunk2);

        // Split on multiple sizes
        List<DoubleTimeSeries> split2TimeSeries = timeSeries.split(2);
        List<DoubleTimeSeries> split3TimeSeries = timeSeries.split(3);
        List<DoubleTimeSeries> split4TimeSeries = timeSeries.split(4);

        assertEquals(4, split2TimeSeries.size());
        assertEquals(3, split3TimeSeries.size());
        assertEquals(2, split4TimeSeries.size());
        assertArrayEquals(new double[]{1d, 2d}, split2TimeSeries.get(0).toCompactArray(), 0d);
        assertArrayEquals(new double[]{3d, NaN}, split2TimeSeries.get(1).toCompactArray(), 0d);
        assertArrayEquals(new double[]{NaN, NaN}, split2TimeSeries.get(2).toCompactArray(), 0d);
        assertArrayEquals(new double[]{7d, 8d}, split2TimeSeries.get(3).toCompactArray(), 0d);
    }

Do we want this?

You're right. I think that if a range is entirely a gap, split should return an empty series.

I will keep an empty series rather than skipping the window or filling it with NaN values. This preserves the positional contract of split(List, int) and the memory goal (e.g. splitTestHuge).

…nce-for-many-small-chunks

finchello · 2026-06-26T12:20:43Z

Hi @rolnico, thanks for looping me in — happy to share a view.

I like the bucket approach: slicing each chunk into aligned [minOffset + b*newChunkSize, …] buckets is much easier to reason about than the recursive merge logic, and it makes the #3941 behaviour fall out for free: a gap is just an empty bucket, so there's no "merge across a gap" case and no "Chunks are not successive" exception.

On your open question, I think it's the real constraint and it points to the answer. TimeSeries.split(List, int) (TimeSeries.java ~L189-197) zips positionally:

    int chunkCount = computeChunkCount(index, newChunkSize); // ceil(pointCount / newChunkSize)
    for (int i = 0; i < chunkCount; i++) {
        splitList.get(i).add(split.get(i));
    }

so it needs every series' split(int) to return exactly chunkCount pieces, one per window, in order — bucket b must sit at position b. If we build the result only from the non-empty bucketMap.values(), series with gaps return fewer pieces and that positional zip misaligns / throws.

So instead of "change split(List, int)" or "fill with NaN", maybe a third option: emit all chunkCount buckets in order, and represent an empty bucket as a data-less series (empty chunk list) rather than a NaN-filled chunk. That keeps the positional contract, still avoids allocating the NaN fill (the memory win you're after), and resolves #3941 structurally. toArray() on an empty-chunk series already yields the correct all-NaN window via getCheckedChunks's gap fill, so consumers shouldn't see a difference.
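
A rough sketch of that third option, under simplifying assumptions: chunks are encoded as `{offset, length}` pairs, a piece is represented by the list of stored index ranges inside its window, and an empty list stands for the data-less series. `DenseSplitSketch` and its methods are hypothetical names, not existing powsybl code.

```java
import java.util.ArrayList;
import java.util.List;

class DenseSplitSketch {

    // ceil(pointCount / newChunkSize), matching the comment in the snippet above.
    static int computeChunkCount(int pointCount, int newChunkSize) {
        return (pointCount + newChunkSize - 1) / newChunkSize;
    }

    // Returns one entry per window, in order. Each entry lists the stored index
    // ranges {start, end} (inclusive) inside that window; an empty list is a
    // data-less piece, so no NaN array is allocated for a gap.
    // Chunks are assumed to lie entirely inside [0, pointCount).
    static List<List<int[]>> split(int[][] chunks, int pointCount, int newChunkSize) {
        int chunkCount = computeChunkCount(pointCount, newChunkSize);
        List<List<int[]>> pieces = new ArrayList<>(chunkCount);
        for (int b = 0; b < chunkCount; b++) {
            pieces.add(new ArrayList<>());
        }
        for (int[] chunk : chunks) {
            int start = chunk[0];
            int end = start + chunk[1] - 1;
            for (int b = start / newChunkSize; b <= end / newChunkSize; b++) {
                int from = Math.max(start, b * newChunkSize);
                int to = Math.min(end, (b + 1) * newChunkSize - 1);
                pieces.get(b).add(new int[]{from, to});
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        List<List<int[]>> gapped = split(new int[][]{{0, 3}, {6, 2}}, 8, 2);
        List<List<int[]>> full = split(new int[][]{{0, 8}}, 8, 2);
        // Positional zip: piece i of every series describes the same window.
        for (int i = 0; i < computeChunkCount(8, 2); i++) {
            System.out.println("window " + i + ": gapped has data = " + !gapped.get(i).isEmpty()
                    + ", full has data = " + !full.get(i).isEmpty());
        }
    }
}
```

Both series return 4 pieces here, and window 2 of the gapped series is simply empty. The cost is one list entry per window regardless of how sparse the data is.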

Two things I'd check first: createTimeSeries/the constructor path needs to accept an empty chunk list for a bucket (today it takes a single chunk), and whether any caller relies on each split piece being non-empty.

If that direction sounds right, it would also mean #3941 is fixed by this PR directly — happy to drop my separate adjacency-guard patch and instead add a few gapped-chunk test cases for the bucket version (incl. the split(4) repro). Thanks!

rolnico · 2026-06-26T13:03:07Z

Hi @rolnico, thanks for looping me in — happy to share a view.
[...]
If that direction sounds right, it would also mean #3941 is fixed by this PR directly — happy to drop my separate adjacency-guard patch and instead add a few gapped-chunk test cases for the bucket version (incl. the split(4) repro). Thanks!

It seems like an interesting idea worth testing. Could you open a new PR with your proposal, so that we can have a look, compare and test it?

finchello · 2026-06-26T14:25:55Z

Quick update — I prototyped the bucket version locally and ran it against the time-series suite. Testing surfaced the real trade-off behind your "change split(List,int) or generate empty chunks?" question:

If split(int) emits all chunkCount buckets (empty ones as data-less series), the positional split(List, int) zip stays correct and [TimeSeries] split handle non-successive chunks inconsistently #3941 is fixed — but it allocates O(pointCount / newChunkSize) series regardless of data sparsity. The existing splitTestHuge (pointCount ~1e8, split(2)) makes it concrete: ~3 series today → ~50M, which works against this PR's memory goal.
If split(int) emits only non-empty buckets (your original sketch), memory scales with actual data — but the positional zip in split(List, int) (split.get(i) for i in 0..chunkCount) misaligns.

So the crux is the one you flagged: to keep buckets sparse and alignable, split(List, int) would need to align by bucket index rather than list position (each piece tagged with its bucket index, or split returning a Map<bucketIndex, series>). That gets both the memory win and consistent gap handling.

Before I open the PR: which contract do you prefer — (a) dense grid of chunkCount pieces (simpler, but O(pointCount/chunkSize) memory), or (b) sparse buckets + index-based alignment in split(List, int) (more change, keeps the perf win)? Happy to implement either — I have (a) working and can adapt.

(Small aside: bucket alignment should be to absolute index 0, not minOffset, otherwise series with different minOffsets won't line up in split(List, int).)
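
A small arithmetic illustration of that aside, with hypothetical helper names: the same index point lands in different buckets for two series when each aligns to its own minOffset, but in the same bucket when both align to absolute index 0.

```java
class BucketAlignmentSketch {

    // Bucket index when windows start at the series' own first stored offset.
    static int bucketFromMinOffset(int offset, int minOffset, int newChunkSize) {
        return (offset - minOffset) / newChunkSize;
    }

    // Bucket index when windows start at absolute index 0.
    static int bucketFromZero(int offset, int newChunkSize) {
        return offset / newChunkSize;
    }

    public static void main(String[] args) {
        int newChunkSize = 4;
        int point = 9;       // the same index point in both series
        int minOffsetA = 0;  // series A stores data from index 0
        int minOffsetB = 6;  // series B stores data from index 6

        System.out.println(bucketFromMinOffset(point, minOffsetA, newChunkSize)); // 2
        System.out.println(bucketFromMinOffset(point, minOffsetB, newChunkSize)); // 0
        System.out.println(bucketFromZero(point, newChunkSize));                  // 2 for both series
    }
}
```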

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

samirromdhani · 2026-06-30T15:50:03Z

Hello, thanks for the feedback,

I agree that the bucket slicing is clearer than the current recursive split, but it changes split semantics. As @rolnico noted, there are two options:

Generate empty chunks: this breaks some tests that don't treat a gap as a chunk (e.g. splitTest and splitTestHuge).
Or change TimeSeries.split(List, int).

I'm not against that direction, but I'd rather keep it separate:

This PR: performance fix (targeting Timeseries split quadratic performance for many small chunks #1634) on the current split path (toCompactArray, reduced allocations, use of System.arraycopy, ...) without breaking existing functional behavior.
Separate PR: bucket-based split, with the alignment fix in TimeSeries.split(List, int).

What do you think?

…ct chunk view - Rewrite AbstractTimeSeries.split(int) with compact chunk - Remove recursive split and splitChunk helper (inused) - Add tests Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

samirromdhani · 2026-07-01T14:44:19Z

I'm not sure if we go the right way on this.

[...]

What do you think of it?

After testing some cases, it seems that the bucket approach needs a remerge step, not only alignment with TimeSeries.split(List, int).

The bucket split keeps one slice per source chunk (see splitMultiChunkTimeSeriesTest()): a bucket ends up with [2.0] + [3.0] even though they are successive, so adjacent slices have to be merged back into one chunk. Doing that merge here would add complexity.
That is the reason for the compact approach (toCompactChunk + splitAt): the data is already in one continuous block, so there is no merge step, and for a full gap we get an empty series, which covers the existing behaviour.
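
To make the extra remerge step concrete, here is a sketch of what the bucket version would additionally need. `Slice` and `RemergeSketch` are hypothetical names and the slices are plain arrays, not powsybl chunks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical slice of a chunk: values starting at a given index offset.
record Slice(int offset, double[] values) {
    int end() { return offset + values.length; } // exclusive end
}

class RemergeSketch {

    // Merges slices that are successive (one ends exactly where the next starts).
    // Input is assumed sorted by offset and non-overlapping.
    static List<Slice> mergeSuccessive(List<Slice> slices) {
        List<Slice> merged = new ArrayList<>();
        for (Slice s : slices) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).end() == s.offset()) {
                Slice prev = merged.get(last);
                double[] joined = new double[prev.values().length + s.values().length];
                System.arraycopy(prev.values(), 0, joined, 0, prev.values().length);
                System.arraycopy(s.values(), 0, joined, prev.values().length, s.values().length);
                merged.set(last, new Slice(prev.offset(), joined));
            } else {
                merged.add(s); // a gap before this slice: keep it separate
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // One bucket holding [2.0] and [3.0] coming from two different source chunks.
        List<Slice> bucket = List.of(new Slice(1, new double[]{2d}), new Slice(2, new double[]{3d}));
        List<Slice> merged = mergeSuccessive(bucket);
        System.out.println(merged.size() + " slice(s), length " + merged.get(0).values().length);
        // 1 slice(s), length 2
    }
}
```

Each merge reallocates and copies, which is the kind of work the compact block avoids by copying every source chunk once.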

I'm fine with moving further changes to a separate PR if they're considered out of the perf scope. Still, these changes can be justified by the points discussed in this issue:

Gaps are now handled cleanly (empty series, no NaN-filled chunks), with no exceptions from split(List, int).
The recursive split is removed.

sonarqubecloud · 2026-07-01T14:51:44Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
93.5% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

finchello · 2026-07-01T19:40:13Z

Thanks @samirromdhani — that compact-chunk approach is neat. Compacting to one continuous block before splitAt sidesteps exactly the remerge problem I hit prototyping the bucket version (adjacent slices from different source chunks landing in the same window, as in splitMultiChunkTimeSeriesTest), and getting an empty series for full gaps for free is a clean way to keep existing behaviour while dropping the recursive split. Nice.
+1 on the scoping: keep this PR as the perf fix (+ the cleaner gap handling that falls out of it), and move the bucket-based split with the split(List, int) index-alignment change to a separate PR — that one carries the real contract change and deserves its own review.
If your updated split now returns an empty series for a gapped window and no longer throws Chunks are not successive, that effectively resolves #3941 here. Happy to contribute the gapped-chunk regression tests (the #3941 repro incl. split(4), plus a split(List, int) alignment case) so that behaviour is locked in — then I can close my separate #3941 patch. Just say the word and I'll open a small test-only PR against your branch (or paste the cases here). Thanks for driving this!

samirromdhani added 2 commits May 12, 2026 17:35

WIP: add toCompactArray as replacement for toArray

3c413d9

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

add tests: split, toArray and toCompacArray

82753e5

CalculatedTimeSeries is not concerned by compact array, calc split returns copies and not create NaN Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

samirromdhani changed the base branch from main to fix/1609-split-time-series-and-toarray-wastes-a-lot-memory May 29, 2026 15:47

samirromdhani force-pushed the fix/1634-timeseries-split-quadratic-performance-for-many-small-chunks branch 3 times, most recently from 46fb6ab to 6893ad2 Compare May 29, 2026 16:02

add toCompactArray for StringTimeSeries impl

d8771b0

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

samirromdhani force-pushed the fix/1634-timeseries-split-quadratic-performance-for-many-small-chunks branch 5 times, most recently from fc02f7c to 2865686 Compare June 1, 2026 14:17

samirromdhani added Performance Time series labels Jun 1, 2026

samirromdhani self-assigned this Jun 1, 2026

samirromdhani marked this pull request as ready for review June 1, 2026 15:26

samirromdhani changed the title ~~WIP: improve TimeSeries split performance~~ improve TimeSeries split performance Jun 1, 2026

samirromdhani requested review from MatthieuSAUR and rolnico June 1, 2026 15:32

samirromdhani marked this pull request as draft June 2, 2026 09:28

add test: NaN value in the middle should be preserved when compact

08453cf

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

samirromdhani force-pushed the fix/1609-split-time-series-and-toarray-wastes-a-lot-memory branch from 64b7a54 to 08453cf Compare June 2, 2026 13:40

samirromdhani added 6 commits June 2, 2026 16:28

apply requested changes

2cb2904

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

review: added api that allows get value by the original index

7a49080

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

review: add test for get by index

5bbd671

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

review: add tests for get by index

5932654

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

Add coverage

549c779

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

test.

bef6bba

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

samirromdhani force-pushed the fix/1634-timeseries-split-quadratic-performance-for-many-small-chunks branch 2 times, most recently from 6811406 to 1a27bd3 Compare June 3, 2026 12:38

samirromdhani marked this pull request as ready for review June 3, 2026 12:44

samirromdhani added 6 commits June 3, 2026 17:15

review, move tests

84d7a20

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

WIP: reduce split complexity

25f193d

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

fixes related to toArray optimization

66a6bae

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

add same improve for StringTimeSeries

d855cef

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

refactor doSplitTest method

8c9af77

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

test review

57690a9

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

samirromdhani marked this pull request as draft June 4, 2026 07:24

samirromdhani force-pushed the fix/1634-timeseries-split-quadratic-performance-for-many-small-chunks branch from 1a27bd3 to 57690a9 Compare June 4, 2026 07:25

samirromdhani marked this pull request as ready for review June 4, 2026 07:26

Base automatically changed from fix/1609-split-time-series-and-toarray-wastes-a-lot-memory to main June 4, 2026 07:31

samirromdhani and others added 2 commits June 4, 2026 09:48

Merge branch 'main' into fix/1634-timeseries-split-quadratic-performa…

29209f6

…nce-for-many-small-chunks

Merge branch 'main' into fix/1634-timeseries-split-quadratic-performa…

bd8fdca

…nce-for-many-small-chunks

rolnico requested changes Jun 17, 2026

rolnico mentioned this pull request Jun 17, 2026

[TimeSeries] split handle non-successive chunks inconsistently #3941

Open

Merge branch 'main' into fix/1634-timeseries-split-quadratic-performa…

8db44f7

…nce-for-many-small-chunks

Fix javadoc in time-series/time-series-api

35f8909

Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

samirromdhani marked this pull request as draft July 1, 2026 12:34

perf(timeseries): review and improve ts split performance using compa…

32726b2

…ct chunk view - Rewrite AbstractTimeSeries.split(int) with compact chunk - Remove recursive split and splitChunk helper (inused) - Add tests Signed-off-by: Samir Romdhani <samir.romdhani_externe@rte-france.com>

samirromdhani marked this pull request as ready for review July 1, 2026 14:41

improve TimeSeries split performance #3933
samirromdhani wants to merge 21 commits into main from fix/1634-timeseries-split-quadratic-performance-for-many-small-chunks


3 participants
