Vectorise expand8 in ForUtil using JDK Vector API #15198

RamakrishnaChilaka · 2025-09-17T21:46:53Z

This PR optimizes the expand8 routine by leveraging the JDK Vector API.

Benchmarks

I validated performance with a standalone benchmark (see postings_expand_benchmark) at block_size 256. The benchmarks ran on an i5-13600K with 256-bit vectors. The key takeaways follow the table.

Benchmark	Mode	Cnt	Score	Error	Units
expand16 (Scalar)	thrpt	5	112.842	± 0.221	ops/us
expand16 (Vector)	thrpt	5	105.594	± 1.307	ops/us
expand8 (Scalar)	thrpt	5	66.726	± 0.452	ops/us
expand8 (Vector)	thrpt	5	105.821	± 0.272	ops/us

expand8: Vectorized version is ~59% faster than scalar (66.7 → 105.8 ops/us).
expand16: Scalar slightly outperforms vector (112.8 vs 105.6 ops/us).
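
For reference, the percentages quoted above can be reproduced from the JMH scores. This is a minimal sketch; the class and method names are hypothetical and not part of the PR.

```java
import java.util.Locale;

final class SpeedupSketch {
  /** Relative change of the contender score over the baseline score, in percent. */
  static double pctChange(double baseline, double contender) {
    return (contender - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    // Scores (ops/us) copied from the standalone benchmark table above.
    System.out.println(String.format(Locale.ROOT, "expand8:  %+.1f %%", pctChange(66.726, 105.821)));
    System.out.println(String.format(Locale.ROOT, "expand16: %+.1f %%", pctChange(112.842, 105.594)));
  }
}
```

With these inputs the vector expand8 comes out roughly 59% ahead of scalar, and the vector expand16 roughly 6% behind.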

Lucene Microbenchmarks


baseline
Benchmark                                (bpv)   Mode  Cnt   Score   Error   Units
PostingIndexInputBenchmark.decode            2  thrpt   15  35.409 ± 0.120  ops/us
PostingIndexInputBenchmark.decode            3  thrpt   15  29.128 ± 0.017  ops/us
PostingIndexInputBenchmark.decode            4  thrpt   15  41.492 ± 0.305  ops/us
PostingIndexInputBenchmark.decode            5  thrpt   15  32.205 ± 0.350  ops/us
PostingIndexInputBenchmark.decode            6  thrpt   15  31.237 ± 0.245  ops/us
PostingIndexInputBenchmark.decode            7  thrpt   15  29.984 ± 0.582  ops/us
PostingIndexInputBenchmark.decode            8  thrpt   15  56.366 ± 0.134  ops/us
PostingIndexInputBenchmark.decode            9  thrpt   15  22.802 ± 0.077  ops/us
PostingIndexInputBenchmark.decode           10  thrpt   15  23.502 ± 0.037  ops/us
PostingIndexInputBenchmark.decodeVector      2  thrpt   15  53.151 ± 0.070  ops/us
PostingIndexInputBenchmark.decodeVector      3  thrpt   15  48.863 ± 1.455  ops/us
PostingIndexInputBenchmark.decodeVector      4  thrpt   15  54.284 ± 2.195  ops/us
PostingIndexInputBenchmark.decodeVector      5  thrpt   15  39.302 ± 0.659  ops/us
PostingIndexInputBenchmark.decodeVector      6  thrpt   15  38.414 ± 0.830  ops/us
PostingIndexInputBenchmark.decodeVector      7  thrpt   15  39.609 ± 0.551  ops/us
PostingIndexInputBenchmark.decodeVector      8  thrpt   15  56.373 ± 0.118  ops/us
PostingIndexInputBenchmark.decodeVector      9  thrpt   15  27.295 ± 0.351  ops/us
PostingIndexInputBenchmark.decodeVector     10  thrpt   15  30.058 ± 0.172  ops/us


contender
Benchmark                                (bpv)   Mode  Cnt   Score   Error   Units
PostingIndexInputBenchmark.decode            2  thrpt   15  35.238 ± 0.209  ops/us
PostingIndexInputBenchmark.decode            3  thrpt   15  29.214 ± 0.098  ops/us
PostingIndexInputBenchmark.decode            4  thrpt   15  41.559 ± 0.580  ops/us
PostingIndexInputBenchmark.decode            5  thrpt   15  32.543 ± 0.175  ops/us
PostingIndexInputBenchmark.decode            6  thrpt   15  31.323 ± 0.061  ops/us
PostingIndexInputBenchmark.decode            7  thrpt   15  29.525 ± 0.315  ops/us
PostingIndexInputBenchmark.decode            8  thrpt   15  52.348 ± 0.079  ops/us
PostingIndexInputBenchmark.decode            9  thrpt   15  24.919 ± 0.056  ops/us
PostingIndexInputBenchmark.decode           10  thrpt   15  26.581 ± 0.049  ops/us
PostingIndexInputBenchmark.decodeVector      2  thrpt   15  71.223 ± 6.921  ops/us
PostingIndexInputBenchmark.decodeVector      3  thrpt   15  53.237 ± 1.962  ops/us
PostingIndexInputBenchmark.decodeVector      4  thrpt   15  73.437 ± 0.284  ops/us
PostingIndexInputBenchmark.decodeVector      5  thrpt   15  41.201 ± 2.067  ops/us
PostingIndexInputBenchmark.decodeVector      6  thrpt   15  46.622 ± 0.289  ops/us
PostingIndexInputBenchmark.decodeVector      7  thrpt   15  45.505 ± 1.044  ops/us
PostingIndexInputBenchmark.decodeVector      8  thrpt   15  58.368 ± 0.977  ops/us
PostingIndexInputBenchmark.decodeVector      9  thrpt   15  27.243 ± 0.358  ops/us
PostingIndexInputBenchmark.decodeVector     10  thrpt   15  30.059 ± 0.105  ops/us

Summary

bpv 9 and 10 use a primitive size of 16, so they do not go through expand8 and their performance is unchanged.
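
A rough sketch of the size selection behind that statement, assuming the decoder picks the smallest of 8, 16, or 32 bits that can hold the given bits per value. The helper name `primitiveSizeFor` is hypothetical; only the fact that bpv 9 and 10 use size 16 comes from this PR.

```java
final class PrimitiveSizeSketch {
  /** Smallest primitive width (8, 16 or 32) that can hold bitsPerValue bits (assumed rule). */
  static int primitiveSizeFor(int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 32) {
      throw new IllegalArgumentException("bitsPerValue out of range: " + bitsPerValue);
    }
    if (bitsPerValue <= 8) {
      return 8; // decode path ends in expand8, the routine this PR vectorises
    }
    if (bitsPerValue <= 16) {
      return 16; // decode path ends in expand16, untouched by this PR
    }
    return 32; // already full-width ints
  }

  public static void main(String[] args) {
    for (int bpv = 2; bpv <= 10; bpv++) {
      System.out.println("bpv " + bpv + " -> primitive size " + primitiveSizeFor(bpv));
    }
  }
}
```

Under this assumption, bpv 2 through 8 map to size 8 and can benefit from the change, while bpv 9 and 10 map to size 16.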

bpv	baseline vector (ops/μs)	contender vector (ops/μs)	Δ
2	53.2	71.2	+33.8 %
3	48.9	53.2	+8.8 %
4	54.3	73.4	+35.2 %
5	39.3	41.2	+4.8 %
6	38.4	46.6	+21.4 %
7	39.6	45.5	+14.9 %
8	56.4	58.4	+3.5 %
9	27.3	27.2	–0.4 %
10	30.1	30.1	0.0 %

github-actions · 2025-09-17T21:47:44Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

RamakrishnaChilaka · 2025-09-18T10:13:20Z

luceneutil benchmark

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                          Fuzzy1       52.76     (11.7%)       49.37      (9.3%)   -6.4% ( -24% -   16%) 0.055
         AndHighMedDayTaxoFacets       30.57      (3.9%)       29.19      (3.9%)   -4.5% ( -11% -    3%) 0.000
                         Respell       25.63      (9.5%)       24.90     (10.2%)   -2.8% ( -20% -   18%) 0.363
           BrowseMonthTaxoFacets        2.63      (5.1%)        2.56      (6.7%)   -2.8% ( -13% -    9%) 0.136
                      TermDTSort      152.52      (6.0%)      149.44      (7.4%)   -2.0% ( -14% -   12%) 0.344
            HighIntervalsOrdered       26.57      (9.0%)       26.08      (8.5%)   -1.9% ( -17% -   17%) 0.500
            HighTermTitleBDVSort       17.65      (7.5%)       17.36      (6.4%)   -1.6% ( -14% -   13%) 0.457
                          Fuzzy2       50.14     (11.6%)       49.33      (8.6%)   -1.6% ( -19% -   20%) 0.615
                    OrNotHighLow      987.37      (3.1%)      974.78      (4.7%)   -1.3% (  -8% -    6%) 0.313
            BrowseDateSSDVFacets        0.70      (8.8%)        0.69      (9.7%)   -0.9% ( -17% -   19%) 0.753
            BrowseDateTaxoFacets        2.49      (9.9%)        2.47     (11.0%)   -0.8% ( -19% -   22%) 0.800
     BrowseRandomLabelSSDVFacets        2.44     (12.9%)        2.43     (13.8%)   -0.4% ( -24% -   30%) 0.917
                        Wildcard      301.90      (2.7%)      300.69      (2.9%)   -0.4% (  -5% -    5%) 0.651
                         Prefix3      408.43      (2.4%)      407.03      (3.2%)   -0.3% (  -5% -    5%) 0.706
                     LowSpanNear      159.18      (8.2%)      158.74      (6.7%)   -0.3% ( -14% -   15%) 0.907
                   OrNotHighHigh      163.69      (7.6%)      163.29      (8.0%)   -0.3% ( -14% -   16%) 0.920
                      AndHighLow     1087.95      (4.9%)     1085.33      (4.1%)   -0.2% (  -8% -    9%) 0.866
                    OrHighNotMed      561.31      (6.0%)      560.41      (5.7%)   -0.2% ( -11% -   12%) 0.931
                          IntSet      497.27      (6.5%)      496.90      (8.1%)   -0.1% ( -13% -   15%) 0.975
             LowIntervalsOrdered       12.07      (4.1%)       12.07      (4.5%)    0.0% (  -8% -    8%) 0.977
               HighTermMonthSort      851.18      (5.4%)      852.11      (5.5%)    0.1% ( -10% -   11%) 0.950
                    OrNotHighMed      274.91      (7.4%)      275.42      (6.4%)    0.2% ( -12% -   15%) 0.933
        AndHighHighDayTaxoFacets        7.60      (7.3%)        7.61      (8.7%)    0.2% ( -14% -   17%) 0.933
                          IntNRQ      533.39     (13.8%)      534.64     (12.7%)    0.2% ( -23% -   31%) 0.956
             MedIntervalsOrdered        9.57      (4.5%)        9.60      (6.3%)    0.3% ( -10% -   11%) 0.881
            MedTermDayTaxoFacets       16.62      (6.5%)       16.68      (7.4%)    0.3% ( -12% -   15%) 0.882
                     MedSpanNear       87.38     (12.0%)       87.69      (8.5%)    0.4% ( -18% -   23%) 0.914
                      HighPhrase       56.33      (4.7%)       56.58      (5.0%)    0.4% (  -8% -   10%) 0.778
       BrowseDayOfYearTaxoFacets        2.51     (10.5%)        2.52     (11.7%)    0.5% ( -19% -   25%) 0.896
           BrowseMonthSSDVFacets        3.38     (10.8%)        3.40      (8.8%)    0.5% ( -17% -   22%) 0.864
               HighTermTitleSort       90.42      (2.9%)       91.10      (3.0%)    0.8% (  -4% -    6%) 0.414
     BrowseRandomLabelTaxoFacets        1.85      (5.1%)        1.87      (3.2%)    0.8% (  -7% -    9%) 0.540
                    OrHighNotLow      672.02      (8.7%)      679.42      (6.5%)    1.1% ( -13% -   17%) 0.652
           HighTermDayOfYearSort      218.30      (2.9%)      220.71      (3.3%)    1.1% (  -5% -    7%) 0.266
                        PKLookup      146.99     (13.5%)      149.20      (8.5%)    1.5% ( -18% -   27%) 0.673
                       MedPhrase      127.53      (4.9%)      129.64      (3.6%)    1.7% (  -6% -   10%) 0.224
                    HighSpanNear        8.31      (7.2%)        8.46      (5.3%)    1.8% (  -9% -   15%) 0.377
                           range     3829.26      (5.5%)     3901.35      (7.2%)    1.9% ( -10% -   15%) 0.350
                       OrHighMed      414.19      (7.1%)      421.98      (5.2%)    1.9% (  -9% -   15%) 0.338
                         LowTerm     1311.02      (7.6%)     1336.03      (5.8%)    1.9% ( -10% -   16%) 0.373
       BrowseDayOfYearSSDVFacets        3.15     (10.5%)        3.23      (6.2%)    2.5% ( -12% -   21%) 0.355
                      AndHighMed      449.54      (3.0%)      461.19      (2.3%)    2.6% (  -2% -    8%) 0.002
                   OrHighNotHigh      264.69      (7.4%)      271.60      (7.9%)    2.6% ( -11% -   19%) 0.279
                       LowPhrase       64.81      (3.4%)       66.60      (3.4%)    2.8% (  -3% -    9%) 0.009
                     AndHighHigh      183.33     (10.2%)      189.00     (10.8%)    3.1% ( -16% -   26%) 0.352
                 MedSloppyPhrase       50.66      (4.3%)       52.28      (3.6%)    3.2% (  -4% -   11%) 0.012
                      OrHighHigh      194.55     (10.0%)      200.90      (8.5%)    3.3% ( -13% -   24%) 0.266
          OrHighMedDayTaxoFacets        7.88     (11.1%)        8.17      (8.8%)    3.7% ( -14% -   26%) 0.240
                         MedTerm      792.86      (8.5%)      823.44      (7.5%)    3.9% ( -11% -   21%) 0.127
                 LowSloppyPhrase       29.20      (5.7%)       30.44      (7.8%)    4.2% (  -8% -   18%) 0.049
                HighSloppyPhrase       33.31     (10.7%)       34.91      (6.8%)    4.8% ( -11% -   24%) 0.089
                       OrHighLow      615.53      (6.9%)      645.25      (5.6%)    4.8% (  -7% -   18%) 0.015
                        HighTerm      586.61     (10.6%)      618.50      (9.7%)    5.4% ( -13% -   28%) 0.091

mikemccand · 2025-09-19T14:47:17Z

Very cool -- thank you for running JMH (micro benchmark) and luceneutil (macro?)!

What exactly is expand8 and where is it used in Lucene? Is it postings decode when bitwidth is 8?

lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorUtilSupport.java

RamakrishnaChilaka · 2025-09-20T10:30:54Z

Thank you @mikemccand, @jpountz for reviewing the PR.

What exactly is expand8 and where is it used in Lucene? Is it postings decode when bitwidth is 8?

The patch vectorises ForUtil.expand8 with the JDK Vector API.
expand8 is the low-level routine that inflates 1–8-bit packed integers back to 32-bit during postings decode; it is on the hot path for every segment that stores doc IDs, frequencies, or positions with ≤ 8 bits per value.
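
To make the description concrete, here is a scalar sketch of what such an expansion step can look like. It is not the merged code. It assumes a block of 256 values (matching block_size in the PR description) where each of the first 64 ints packs four 8-bit values, and where byte k of every packed int belongs to the k-th quarter of the output. The class name, the constants, and the `collapse8` inverse are illustrative only; the PR replaces this kind of loop with JDK Vector API operations, which are not shown here.

```java
import java.util.Arrays;

final class Expand8Sketch {
  static final int BLOCK_SIZE = 256;          // assumption: matches block_size in the PR description
  static final int QUARTER = BLOCK_SIZE / 4;  // number of packed ints holding the whole block

  /** Packs 256 values of at most 8 bits each into the first 64 ints of arr. */
  static void collapse8(int[] arr) {
    for (int i = 0; i < QUARTER; i++) {
      arr[i] = (arr[i] << 24)
          | (arr[QUARTER + i] << 16)
          | (arr[2 * QUARTER + i] << 8)
          | arr[3 * QUARTER + i];
    }
  }

  /** Inverse of collapse8: spreads each byte of the packed ints back into its own int. */
  static void expand8(int[] arr) {
    for (int i = 0; i < QUARTER; i++) {
      int packed = arr[i]; // read before arr[i] is overwritten below
      arr[i] = (packed >>> 24) & 0xFF;
      arr[QUARTER + i] = (packed >>> 16) & 0xFF;
      arr[2 * QUARTER + i] = (packed >>> 8) & 0xFF;
      arr[3 * QUARTER + i] = packed & 0xFF;
    }
  }

  public static void main(String[] args) {
    int[] values = new int[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      values[i] = (i * 37) & 0xFF; // arbitrary 8-bit test data
    }
    int[] work = values.clone();
    collapse8(work);
    expand8(work);
    System.out.println("round trip ok: " + Arrays.equals(values, work));
  }
}
```

Each iteration is independent of the others and uses only shifts and masks, which is the shape of loop that maps naturally onto SIMD lanes.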

Added Javadocs now!

RamakrishnaChilaka · 2025-09-23T06:53:37Z

This shows a good speedup in the nightly benchmarks (~1-3.5%). I will push an annotation.

https://benchmarks.mikemccandless.com/2025.09.21.18.04.40.html

lucene/core/src/java/org/apache/lucene/util/VectorUtil.java

github-actions bot added the module:core/codecs label Sep 17, 2025

RamakrishnaChilaka force-pushed the vectorise_expand8_expand16 branch from 2070af4 to c95ab5b Compare September 17, 2025 22:02

github-actions bot added this to the 10.4.0 milestone Sep 17, 2025

RamakrishnaChilaka force-pushed the vectorise_expand8_expand16 branch from c95ab5b to 883afea Compare September 18, 2025 09:41

Optimize ForUtil.expand8 using the JDK Vector API

de7f475

RamakrishnaChilaka force-pushed the vectorise_expand8_expand16 branch from 883afea to de7f475 Compare September 18, 2025 10:42

jpountz approved these changes Sep 19, 2025

lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorUtilSupport.java

adding JavaDoc for expand8

66397c4

tidy

ed17ae2

RamakrishnaChilaka merged commit 15ed5d7 into apache:main Sep 21, 2025
8 checks passed

RamakrishnaChilaka deleted the vectorise_expand8_expand16 branch September 21, 2025 05:24

RamakrishnaChilaka mentioned this pull request Sep 23, 2025

annotation for 'Vectorise expand8 in ForUtil using JDK Vector API' mikemccand/luceneutil#469

Merged

uschindler reviewed Sep 23, 2025

lucene/core/src/java/org/apache/lucene/util/VectorUtil.java

RamakrishnaChilaka mentioned this pull request Sep 23, 2025

add JavaDoc for expand8 in VectorUtil #15218

Merged
