
Conversation


@pracucci pracucci commented Jan 5, 2026

What this PR does

In this PR I'm adding a new temporary config option (`ingestion-concurrency-sequential-pusher-enabled`) that is enabled by default (to preserve the existing behaviour) but, when disabled, is expected to speed up ingestion from Kafka by up to 2x on ingesters with mixed-size tenants. The config option is temporary because my plan is to use it to gradually roll out the new behaviour at Grafana Labs and then, if everything goes well, remove the option and always use the new behaviour (that is, remove the sequential pusher entirely).
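For context, opting in to the new behaviour on an ingester would look roughly like the following. The full flag name comes from the follow-up PR quoted at the end of this page; the invocation itself is only illustrative:

```
# Illustrative only: disable the sequential pusher so the parallel pusher is always used.
# The default (true) preserves the existing behaviour.
mimir -target=ingester \
  -ingest-storage.kafka.ingestion-concurrency-sequential-pusher-enabled=false
```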

This PR is large, but the real logic change is just a couple of lines. The rest is tests (more details below).

What's the optimization about?

When concurrent ingestion is enabled, the write request encoded in a Kafka record can be ingested in Mimir using two different paths:

  • Sequential pusher
  • Parallel pusher

When the sequential pusher is used, the write request is ingested into TSDB "as is" and synchronously. When the parallel pusher is used, the content of the write request is optionally sharded and appended to a batch, and the batch (when full) is ingested into TSDB asynchronously.

Since the parallel pusher has some extra work to do for sharding and batching, the old belief was that this overhead would negatively impact tenants with a small number of time series to ingest. For this reason, we dynamically chose between the sequential and parallel pusher based on the estimated number of series to ingest for a given tenant.

What we didn't consider in the past is that the parallel pusher not only shards the metrics into N shards for parallel ingestion, but also makes the ingestion asynchronous. This means that while we ingest metrics for a tenant, other tenants can be ingesting into TSDB too. By contrast, the sequential pusher blocks on each write to TSDB. So when some tenants use the sequential pusher and others use the parallel pusher, each time we use the sequential pusher we're effectively pausing the ingestion for all other tenants, even the ones that would have used the parallel pusher, because the sequential pusher is synchronous.

In this PR I'm just adding an option to never use the sequential pusher and always use the parallel pusher, even when no sharding is required (even with a single shard, the parallel pusher still benefits from batching and asynchronous ingestion). As you will see below, in local testing I couldn't measure any performance degradation in any scenario.
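The whole change boils down to the condition shown in the diff further down this page. Below is a minimal, self-contained sketch of the idea; the type and constructor names are simplified placeholders, not the real Mimir ones:

```
package main

import "fmt"

// pusher stands in for Mimir's PusherCloser; both constructors are placeholders.
type pusher string

func newSequentialPusher() pusher { return "sequential (synchronous, blocks other tenants)" }
func newParallelPusher(shards int) pusher {
	return pusher(fmt.Sprintf("parallel (asynchronous, batched, %d shard(s))", shards))
}

// choosePusher mirrors the condition changed in this PR: the sequential pusher is used
// only when a single shard is enough AND the new option keeps it enabled. With the
// option disabled, every tenant goes through the asynchronous parallel pusher.
func choosePusher(idealShards int, sequentialPusherEnabled bool) pusher {
	if idealShards <= 1 && sequentialPusherEnabled {
		return newSequentialPusher()
	}
	return newParallelPusher(idealShards)
}

func main() {
	fmt.Println(choosePusher(1, true))  // old default: sequential
	fmt.Println(choosePusher(1, false)) // new behaviour: parallel, still a single shard
}
```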

Benchmarks

To benchmark it I've dumped real production data from a few selected Mimir clusters at Grafana Labs, including clusters with a single tenant and clusters with many tenants. Clusters with only 1 tenant don't see any big benefit, because they were already always using the parallel pusher, but clusters with mixed-size tenants (where some tenants used the sequential pusher and others the parallel one) showed up to a 2x speed up.

Then, based on the real production data dump, I've generated some fixtures that try to mimic the production data patterns. This will allow everyone to run these benchmarks over time.
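For example, the fixture-based benchmark can be run and compared with benchstat roughly as follows (the exact flags are illustrative; the dump-based variant additionally requires the production dumps, which are not committed to the repo):

```
# Illustrative only: run the fixture-based replay benchmark on the old code, apply the
# change, run it again, then compare the two runs with benchstat.
go test ./pkg/ingester -run='^$' -bench='Ingester_ReplayFromKafka$' -benchmem -count=6 | tee before.txt
# ...switch to the new code...
go test ./pkg/ingester -run='^$' -bench='Ingester_ReplayFromKafka$' -benchmem -count=6 | tee after.txt
benchstat before.txt after.txt
```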

These are the test results using the real production data dump (I've redacted the actual names of the Grafana Cloud clusters):

```
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/ingester
cpu: Apple M3 Pro
                                                          │ BenchmarkIngester_ReplayFromKafka_Dump-before.txt │ BenchmarkIngester_ReplayFromKafka_Dump-after.txt │
                                                          │                      sec/op                       │          sec/op            vs base               │
Ingester_ReplayFromKafka_Dump/mimir-cell-1.dump-11                                                 9.920 ± 4%                 9.598 ±  4%   -3.24% (p=0.041 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-2.dump-11                                                 7.201 ± 4%                 3.505 ±  9%  -51.32% (p=0.002 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-3.dump-11                                                17.833 ± 7%                 6.982 ± 24%  -60.85% (p=0.002 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-4.dump-11                                                 10.57 ± 6%                 10.67 ±  5%        ~ (p=0.180 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-5.dump-11                                                 7.738 ± 6%                 5.373 ±  6%  -30.56% (p=0.002 n=6)
geomean                                                                                            10.08                      6.696        -33.58%

                                                          │ BenchmarkIngester_ReplayFromKafka_Dump-before.txt │ BenchmarkIngester_ReplayFromKafka_Dump-after.txt │
                                                          │                       B/op                        │           B/op             vs base               │
Ingester_ReplayFromKafka_Dump/mimir-cell-1.dump-11                                               11.84Gi ± 2%                12.08Gi ± 2%   +1.98% (p=0.026 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-2.dump-11                                               10.75Gi ± 1%                10.25Gi ± 1%   -4.66% (p=0.002 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-3.dump-11                                               19.06Gi ± 1%                16.62Gi ± 1%  -12.80% (p=0.002 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-4.dump-11                                               10.32Gi ± 0%                10.34Gi ± 1%        ~ (p=0.310 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-5.dump-11                                               11.24Gi ± 3%                11.19Gi ± 4%        ~ (p=0.240 n=6)
geomean                                                                                          12.30Gi                     11.89Gi        -3.30%

                                                          │ BenchmarkIngester_ReplayFromKafka_Dump-before.txt │ BenchmarkIngester_ReplayFromKafka_Dump-after.txt │
                                                          │                     allocs/op                     │         allocs/op          vs base               │
Ingester_ReplayFromKafka_Dump/mimir-cell-1.dump-11                                                24.88M ± 0%                 25.12M ± 0%   +0.96% (p=0.002 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-2.dump-11                                                42.45M ± 0%                 37.78M ± 0%  -11.01% (p=0.002 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-3.dump-11                                                91.26M ± 1%                 69.48M ± 0%  -23.86% (p=0.002 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-4.dump-11                                                37.92M ± 0%                 37.93M ± 0%        ~ (p=0.937 n=6)
Ingester_ReplayFromKafka_Dump/mimir-cell-5.dump-11                                                48.67M ± 0%                 47.59M ± 0%   -2.22% (p=0.002 n=6)
geomean                                                                                           44.67M                      41.22M        -7.73%

```

These are the tests based on the fixtures generated from the patterns observed in the production workload:

  • "1_large_tenant" fixtures are based on "mimir-cell-1"
  • "100_mixed_tenants" fixtures are based on "mimir-cell-3"
  • "350_mixed_tenants" fixtures are based on "mimir-cell-5"

```
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/ingester
cpu: Apple M3 Pro
                                              │  before.txt  │              after.txt              │
                                              │    sec/op    │    sec/op     vs base               │
Ingester_ReplayFromKafka/350_mixed_tenants-11   3.389µ ± 18%   2.258µ ± 10%  -33.38% (p=0.002 n=6)
Ingester_ReplayFromKafka/1_large_tenant-11      705.2n ±  2%   643.6n ± 65%        ~ (p=0.065 n=6)
Ingester_ReplayFromKafka/100_mixed_tenants-11   2.463µ ±  3%   1.040µ ±  9%  -57.75% (p=0.002 n=6)
geomean                                         1.805µ         1.148µ        -36.43%

                                              │  before.txt   │              after.txt              │
                                              │     B/op      │     B/op      vs base               │
Ingester_ReplayFromKafka/350_mixed_tenants-11   7.810Ki ± 14%   6.654Ki ± 6%  -14.80% (p=0.002 n=6)
Ingester_ReplayFromKafka/1_large_tenant-11      3.960Ki ± 10%   4.227Ki ± 3%        ~ (p=0.065 n=6)
Ingester_ReplayFromKafka/100_mixed_tenants-11   5.555Ki ±  8%   4.611Ki ± 9%  -16.99% (p=0.002 n=6)
geomean                                         5.559Ki         5.062Ki        -8.95%

                                              │  before.txt   │              after.txt               │
                                              │   allocs/op   │  allocs/op   vs base                 │
Ingester_ReplayFromKafka/350_mixed_tenants-11   60.00 ± 20%     41.50 ± 11%  -30.83% (p=0.002 n=6)
Ingester_ReplayFromKafka/1_large_tenant-11      0.000 ±  0%     0.000 ±  0%        ~ (p=1.000 n=6) ¹
Ingester_ReplayFromKafka/100_mixed_tenants-11   22.00 ±  5%     14.00 ±  7%  -36.36% (p=0.002 n=6)
geomean                                                     ²                -23.93%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

```

Other notes

  • No changelog, because my plan is to add a changelog entry once I remove the temporary config option and make the new behaviour the default.
  • I've added a general-purpose fixture generator, because I couldn't reproduce the issue locally with dummy fake data: I had to learn the actual data patterns from production to reproduce it with synthetic data too.
  • I've added `kafkatool dump analyse` to analyse a dump and extract the key information needed to configure the fixture generator (see the example after this list).
  • In a follow-up PR I will slightly optimize `parallelStorageShards.PushToStorageAndReleaseRequest()` for the case where there's only 1 shard, but I've already tested it and it's not very impactful.
  • I suggest reviewing it with "hide whitespace changes" enabled.
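As a rough idea of the fixture workflow mentioned above, the dump-analysis step would look something like the following. Only the `kafkatool dump analyse` subcommand name comes from this PR; the argument is a hypothetical placeholder:

```
# Hypothetical invocation: analyse a previously captured Kafka partition dump and print
# the per-tenant statistics used to configure the fixture generator.
kafkatool dump analyse /path/to/mimir-cell-3.dump
```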

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Note

Introduces an experimental switch to prefer parallel ingestion over sequential for small-tenant workloads and adds tooling/benchmarks to validate performance.

  • Adds ingestion_concurrency_sequential_pusher_enabled (default: true) to Kafka ingest config; updates config descriptors, help text, docs, and defaults
  • Pusher changes: buffer record unmarshalling, plumb flag into parallelStoragePusher; use sequential pusher only when enabled and idealShards<=1
  • Minor API tweaks: export LabelAdaptersHash; rename writer config disableLinger -> DisableLinger
  • New fixture generator (pkg/storage/ingest/fixture_generator*) and extensive benchmarks for Kafka replay and pusher consumer; test updates to pass logger
  • kafkatool: new dump analyse command, refactored dump parsing helpers, and improved offset flag help

Written by Cursor Bugbot for commit 2dd6c27. This will update automatically on new commits. Configure here.

@pracucci pracucci requested review from a team and tacole02 as code owners January 5, 2026 20:06
@pracucci pracucci added the changelog-not-needed PRs that don't need a CHANGELOG.md entry label Jan 5, 2026
// On cancellation (e.g., pushToStorage error), any records remaining in the buffer won't
// be processed. This is acceptable because errors trigger a retry of the entire batch,
// and the memory will be freed by GC.
recordsChannel = make(chan parsedRecord, 128)
Collaborator Author

Note to reviewers: this optimization is only slightly impactful. The big benefit comes from not using the sequential pusher at all.


github-actions bot commented Jan 5, 2026

💻 Deploy preview deleted (Optimize ingestion from Kafka on ingesters with mixed size tenants ).

idealShards := c.idealShardsFor(userID)
var p PusherCloser
- if idealShards <= 1 {
+ if idealShards <= 1 && c.sequentialPusherEnabled {
Collaborator Author

Note to reviewers: this is the optimization. Not using the sequential pusher at all.

Contributor

@tacole02 tacole02 left a comment

Docs look good! I left a few minor suggestions. Thank you!

}
if numLarge == 0 && cfg.LargeTenants.TenantPercent > 0 {
numLarge = 1
}

Fixture generator may create more tenants than requested

Low Severity

The tenant count calculation computes numLarge as the remainder (numTenants - numSmall - numMedium) before applying the "ensure at least 1 tenant" adjustments to numSmall and numMedium. When those values are later incremented, the total tenant count can exceed the requested numTenants. For example, with numTenants=2 and mixed percentages (40%/44%/16%), the initial calculation yields 0/0/2, but after adjustments becomes 1/1/2, creating 4 tenants instead of 2. This only affects the benchmark fixture generator, not production ingestion code.
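A self-contained sketch of the calculation being described, using the example numbers from the report (variable names are assumptions, not the generator's real fields):

```
package main

import "fmt"

func main() {
	// Hypothetical reconstruction of the fixture generator's tenant split, illustrating
	// the overshoot described above. Real field names differ (e.g. cfg.LargeTenants.TenantPercent).
	numTenants := 2
	smallPercent, mediumPercent := 40, 44 // the large bucket gets the remaining 16%

	numSmall := numTenants * smallPercent / 100   // 0
	numMedium := numTenants * mediumPercent / 100 // 0
	numLarge := numTenants - numSmall - numMedium // 2: the remainder is computed first

	// "Ensure at least 1 tenant per non-empty bucket" adjustments applied afterwards:
	if numSmall == 0 && smallPercent > 0 {
		numSmall = 1
	}
	if numMedium == 0 && mediumPercent > 0 {
		numMedium = 1
	}

	fmt.Println(numSmall+numMedium+numLarge, "tenants generated, but only", numTenants, "were requested") // 4 vs 2
}
```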


Collaborator Author

This is expected. We're fine with it.

Contributor

@tcard tcard left a comment

Looks good! Just a few suggestions.

tenant.nextSeriesIdx = (tenant.nextSeriesIdx + 1) % tenant.uniqueSeries
}

return &mimirpb.WriteRequest{
Contributor

I appreciate this is meant for unit tests, but I think it's useful for those "synthetic" WriteRequests to have their BufferHolder set and holding UnsafeMutableStrings pointing into the buffer, e.g. by marshalling and then unmarshalling. This makes them more realistic, and increases the coverage of the reference-leak detection we're introducing in #13609.
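A minimal sketch of the suggested round trip, using the standard gogo-generated methods on `mimirpb.WriteRequest` (as the follow-up below notes, the plain `Unmarshal` here would still need to be replaced by the new `mimirpb.Unmarshal` from #13609 to actually set the buffer-backed unsafe strings):

```
// Illustrative only: make a synthetic WriteRequest more realistic by round-tripping it
// through proto marshalling, so it is backed by a real serialized buffer like requests
// decoded from Kafka records. Assumes "github.com/grafana/mimir/pkg/mimirpb" is imported.
func roundTripWriteRequest(req *mimirpb.WriteRequest) (*mimirpb.WriteRequest, error) {
	buf, err := req.Marshal()
	if err != nil {
		return nil, err
	}
	out := &mimirpb.WriteRequest{}
	if err := out.Unmarshal(buf); err != nil {
		return nil, err
	}
	return out, nil
}
```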

Collaborator Author

Like in 2dd6c27?

Contributor

Sorry, I'm only now realizing this requires the new mimirpb.Unmarshal introduced in #13609. I'll make a note to reconcile this later.

pracucci and others added 6 commits January 7, 2026 16:01
@pracucci pracucci force-pushed the investigate-higher-consumption branch from bdc6faf to 2dd6c27 Compare January 7, 2026 15:22
@seizethedave seizethedave self-requested a review January 7, 2026 16:05
Contributor

@seizethedave seizethedave left a comment

The change makes sense to me and I can't find anything wrong with it.

@pracucci pracucci merged commit 9514688 into main Jan 8, 2026
41 checks passed
@pracucci pracucci deleted the investigate-higher-consumption branch January 8, 2026 08:10
pracucci added a commit that referenced this pull request Jan 8, 2026
… 1 shard (#13961)

#### What this PR does

This PR is a follow-up to #13924.
In this PR I'm doing a micro-optimization to `parallelStorageShards` for
the case where there's only 1 shard. The impact of this optimization is
minimal, but I don't think it makes the code harder to follow, so it may
be worth keeping.
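A self-contained sketch of the single-shard shortcut, based on the summary at the end of this message; the names are simplified placeholders for the real code in `parallelStorageShards.PushToStorageAndReleaseRequest`:

```
package main

import (
	"fmt"
	"hash/fnv"
)

// hashLabels stands in for Mimir's (now exported) LabelAdaptersHash helper; this is a
// hypothetical simplification, not the real signature.
func hashLabels(labels []string) uint64 {
	h := fnv.New64a()
	for _, l := range labels {
		h.Write([]byte(l))
	}
	return h.Sum64()
}

// shardFor mirrors the idea of this follow-up: when there is a single shard, skip the
// label hashing and modulo entirely and always route the series to shard 0.
func shardFor(numShards int, labels []string) int {
	if numShards == 1 {
		return 0
	}
	return int(hashLabels(labels) % uint64(numShards))
}

func main() {
	series := []string{`__name__="up"`, `job="node"`}
	fmt.Println(shardFor(1, series)) // always 0, no hashing performed
	fmt.Println(shardFor(8, series)) // hash % 8
}
```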

My local benchmarks are not super stable, but here you can get an idea
of the impact (minimal). Keep in mind
`PusherConsumer_ParallelPusher_MultiTenant` uses a mocked backend. In
the real world, where metrics are actually ingested in TSDB, the impact
is much smaller:

```
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storage/ingest
cpu: Apple M3 Pro
                                                                       │ BenchmarkPusherConsumer_ParallelPusher_MultiTenant-before.txt │ BenchmarkPusherConsumer_ParallelPusher_MultiTenant-after.txt │
                                                                       │                            sec/op                             │                sec/op                  vs base               │
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=1-11                                                          8.421µ ±  5%                            8.862µ ±  2%   +5.24% (p=0.009 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=10-11                                                         8.578µ ± 81%                            8.971µ ± 46%        ~ (p=0.240 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=100-11                                                        8.394µ ±  3%                            9.005µ ±  1%   +7.28% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=1000-11                                                       8.308µ ±  2%                            8.998µ ±  0%   +8.31% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=1-11                                                         21.22µ ±  4%                            21.35µ ±  2%        ~ (p=0.937 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=10-11                                                        43.02µ ±  6%                            40.24µ ±  0%   -6.46% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=100-11                                                       43.24µ ±  1%                            40.50µ ±  0%   -6.33% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=1000-11                                                      43.37µ ±  2%                            40.73µ ±  1%   -6.10% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=1-11                                                        126.3µ ±  0%                            120.2µ ±  1%   -4.85% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=10-11                                                       137.7µ ±  0%                            122.0µ ±  3%  -11.40% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=100-11                                                      304.7µ ±  3%                            275.2µ ±  0%   -9.69% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=1000-11                                                     303.5µ ±  2%                            275.1µ ±  0%   -9.35% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=1-11                                                       1.142m ±  1%                            1.124m ±  1%   -1.58% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=10-11                                                      1.051m ±  1%                            1.115m ±  0%   +6.10% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=100-11                                                     1.264m ±  1%                            1.301m ±  0%   +2.91% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=1000-11                                                    3.060m ±  3%                            2.996m ±  4%        ~ (p=0.240 n=6)
geomean                                                                                                                   97.29µ                                  95.70µ         -1.63%

                                                                       │ BenchmarkPusherConsumer_ParallelPusher_MultiTenant-before.txt │ BenchmarkPusherConsumer_ParallelPusher_MultiTenant-after.txt │
                                                                       │                             B/op                              │                  B/op                   vs base              │
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=1-11                                                          17.40Ki ± 0%                             17.44Ki ± 0%  +0.19% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=10-11                                                         17.40Ki ± 0%                             17.43Ki ± 0%  +0.19% (p=0.035 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=100-11                                                        17.39Ki ± 0%                             17.43Ki ± 0%  +0.21% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=1000-11                                                       17.39Ki ± 0%                             17.43Ki ± 0%  +0.20% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=1-11                                                         25.15Ki ± 0%                             25.21Ki ± 0%  +0.25% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=10-11                                                        78.70Ki ± 0%                             78.70Ki ± 0%       ~ (p=0.223 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=100-11                                                       78.70Ki ± 0%                             78.70Ki ± 0%  +0.01% (p=0.037 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=1000-11                                                      78.69Ki ± 0%                             78.70Ki ± 0%       ~ (p=0.058 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=1-11                                                        120.9Ki ± 0%                             121.7Ki ± 0%  +0.68% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=10-11                                                       158.2Ki ± 0%                             159.4Ki ± 0%  +0.80% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=100-11                                                      707.4Ki ± 0%                             708.3Ki ± 0%  +0.13% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=1000-11                                                     707.3Ki ± 0%                             708.3Ki ± 0%  +0.14% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=1-11                                                       985.3Ki ± 0%                             978.7Ki ± 1%  -0.66% (p=0.041 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=10-11                                                      1.064Mi ± 0%                             1.091Mi ± 0%  +2.54% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=100-11                                                     1.479Mi ± 0%                             1.547Mi ± 0%  +4.62% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=1000-11                                                    6.935Mi ± 0%                             6.947Mi ± 0%  +0.17% (p=0.002 n=6)
geomean                                                                                                                   156.1Ki                                  157.0Ki       +0.59%

                                                                       │ BenchmarkPusherConsumer_ParallelPusher_MultiTenant-before.txt │ BenchmarkPusherConsumer_ParallelPusher_MultiTenant-after.txt │
                                                                       │                           allocs/op                           │              allocs/op                vs base                │
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=1-11                                                            48.00 ± 0%                             48.00 ± 0%       ~ (p=1.000 n=6) ¹
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=10-11                                                           48.00 ± 0%                             48.00 ± 0%       ~ (p=1.000 n=6) ¹
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=100-11                                                          48.00 ± 0%                             48.00 ± 0%       ~ (p=1.000 n=6) ¹
PusherConsumer_ParallelPusher_MultiTenant/records=1,tenants=1000-11                                                         48.00 ± 0%                             48.00 ± 0%       ~ (p=1.000 n=6) ¹
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=1-11                                                           166.0 ± 0%                             166.0 ± 0%       ~ (p=1.000 n=6) ¹
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=10-11                                                          291.0 ± 0%                             291.0 ± 0%       ~ (p=1.000 n=6) ¹
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=100-11                                                         291.0 ± 0%                             291.0 ± 0%       ~ (p=1.000 n=6) ¹
PusherConsumer_ParallelPusher_MultiTenant/records=10,tenants=1000-11                                                        291.0 ± 0%                             291.0 ± 0%       ~ (p=1.000 n=6) ¹
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=1-11                                                         1.364k ± 0%                            1.366k ± 0%  +0.15% (p=0.004 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=10-11                                                        1.466k ± 0%                            1.467k ± 0%  +0.07% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=100-11                                                       2.664k ± 0%                            2.664k ± 0%       ~ (p=0.545 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=100,tenants=1000-11                                                      2.663k ± 0%                            2.664k ± 0%       ~ (p=0.182 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=1-11                                                        13.28k ± 0%                            13.27k ± 0%       ~ (p=0.093 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=10-11                                                       13.38k ± 0%                            13.41k ± 0%  +0.19% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=100-11                                                      14.36k ± 0%                            14.37k ± 0%  +0.08% (p=0.002 n=6)
PusherConsumer_ParallelPusher_MultiTenant/records=1000,tenants=1000-11                                                     26.23k ± 0%                            26.22k ± 0%  -0.07% (p=0.002 n=6)
geomean                                                                                                                     784.6                                  784.8       +0.02%
¹ all samples are equal
```

No changelog, because I will add one at the end of this work, once
`-ingest-storage.kafka.ingestion-concurrency-sequential-pusher-enabled`
is removed and the new behaviour becomes the default. With the default
config, `parallelStorageShards` is never used with only 1 shard (you
need to set
`-ingest-storage.kafka.ingestion-concurrency-sequential-pusher-enabled=false`
to use `parallelStorageShards` even when there's only 1 shard).

#### Which issue(s) this PR fixes or relates to

N/A

#### Checklist

- [ ] Tests updated.
- [ ] Documentation added.
- [ ] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Optimizes shard routing when `numShards == 1` and refines performance
benchmarks.
> 
> - In `parallelStorageShards.PushToStorageAndReleaseRequest`, bypasses
label hashing/modulo and always targets shard `0` when only one shard;
metadata routing now skips round-robin and random start when
single-shard.
> - `BenchmarkPusherConsumer_ParallelPusher_MultiTenant`: replaces
single-parameter sweep with nested `records×tenants` cases, prebuilds
records per case, and sets
`IngestionConcurrencySequentialPusherEnabled=false` to exercise parallel
pusher consistently.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
49af667. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Signed-off-by: Marco Pracucci <[email protected]>