Skip to content

Conversation

@knutwannheden
Copy link

The fix computes an appropriate highestToLowestValueRatio from the configured minimumExpectedValue and maximumExpectedValue in DistributionStatisticConfig, allowing HdrHistogram to cover the expected range without constant resizing.

Problem

The current implementation hardcodes a ratio of 2:

new DoubleHistogram(percentilePrecision(distributionStatisticConfig));
// This constructor internally calls: this(2, numberOfSignificantValueDigits, ...)

This means the histogram can only handle values that differ by a factor of 2 (e.g., 1ms to 2ms). When recording typical operation times that span microseconds to seconds, HdrHistogram constantly attempts to resize, frequently throwing ArrayIndexOutOfBoundsException during the resize operations.

In production environments, this can result in hundreds of thousands of exceptions being thrown, causing significant performance overhead.

Solution

This PR modifies TimeWindowPercentileHistogram to:

  1. Compute the ratio from configuration: When minimumExpectedValue and maximumExpectedValue are provided, calculate an appropriate ratio
  2. Use the computed ratio: Pass this ratio to HdrHistogram constructors instead of relying on the default
  3. Maintain backward compatibility: When min/max values are not configured, fall back to the original behavior (ratio=2)

Performance Impact

Before

  • Frequent ArrayIndexOutOfBoundsException as values exceed the narrow range
  • Initial HdrHistogram#counts has length of 127

After

  • No exceptions when values are within the configured min/max range
  • Initial HdrHistogram#counts has length of 272 when using defaults (1ms - 30s)

This last bullet could be a problem and require a different solution.

Issues

@shakuzen
Copy link
Member

The changes sound reasonable, but I'd like to make sure I understand.

When recording typical operation times that span microseconds to seconds, HdrHistogram constantly attempts to resize, frequently throwing ArrayIndexOutOfBoundsException during the resize operations.

Do you mean ArrayIndexOutOfBoundsException is uncaught, or it is thrown and caught internally in HdrHistogram? I would not expect an uncaught exception under normal circumstances such as auto-resizing. If the values being recorded would fit in a DoubleHistogram, they should generally fit whether resizing happens or not.

@knutwannheden
Copy link
Author

Do you mean ArrayIndexOutOfBoundsException is uncaught, or it is thrown and caught internally in HdrHistogram? I would not expect an uncaught exception under normal circumstances such as auto-resizing. If the values being recorded would fit in a DoubleHistogram, they should generally fit whether resizing happens or not.

I am talking about the various places in HdrHistogram throwing this exception, like here:
https://github.com/HdrHistogram/HdrHistogram/blob/de84b0a7de2378abfc405da503bf4898e84ea98e/src/main/java/org/HdrHistogram/AbstractHistogram.java#L883-L884 and then end up getting handled: https://github.com/HdrHistogram/HdrHistogram/blob/de84b0a7de2378abfc405da503bf4898e84ea98e/src/main/java/org/HdrHistogram/DoubleHistogram.java#L490-L492

When my use case requires a large range of different values, this can happen a lot (in my profiler metrics I've seen 1K+ exceptions per second). While exceptions generally don't add that much runtime overhead, it would still be nice with some user control.

@shakuzen
Copy link
Member

Thanks for the explanation. I would suggest opening an issue in HdrHistogram about changing the flow control to not use Exceptions for the reason you stated. We can still consider the changes here, but if the use of exceptions for flow control is the concern, this is really a workaround.

@shakuzen
Copy link
Member

Could you also take a look at the DCO check that's failing? See this section of our contributing guide.

@knutwannheden
Copy link
Author

Thanks for the explanation. I would suggest opening an issue in HdrHistogram about changing the flow control to not use Exceptions for the reason you stated. We can still consider the changes here, but if the use of exceptions for flow control is the concern, this is really a workaround.

I could ask there, but I suspect the answer would be to not create the DoubleHistogram with highestToLowestValueRatio=2.

Could you also take a look at the DCO check that's failing? See this section of our contributing guide.

Yes, let me take a look.

…d values

The fix computes an appropriate `highestToLowestValueRatio` from the configured `minimumExpectedValue` and `maximumExpectedValue` in `DistributionStatisticConfig`, allowing HdrHistogram to cover the expected range without constant resizing.

## Problem

The current implementation hardcodes a ratio of 2:
```java
new DoubleHistogram(percentilePrecision(distributionStatisticConfig));
// This constructor internally calls: this(2, numberOfSignificantValueDigits, ...)
```

This means the histogram can only handle values that differ by a factor of 2 (e.g., 1ms to 2ms). When recording typical operation times that span microseconds to seconds, HdrHistogram constantly attempts to resize, frequently throwing `ArrayIndexOutOfBoundsException` during the resize operations.

In production environments, this can result in hundreds of thousands of exceptions being thrown, causing significant performance overhead.

## Solution

This PR modifies `TimeWindowPercentileHistogram` to:

1. **Compute the ratio from configuration**: When `minimumExpectedValue` and `maximumExpectedValue` are provided, calculate an appropriate ratio
2. **Use the computed ratio**: Pass this ratio to HdrHistogram constructors instead of relying on the default
3. **Maintain backward compatibility**: When min/max values are not configured, fall back to the original behavior (ratio=2)

## Performance Impact

### Before
- Frequent `ArrayIndexOutOfBoundsException` as values exceed the narrow range
- Initial `HdrHistogram#counts` has length of 127

### After
- No exceptions when values are within the configured min/max range
- Initial `HdrHistogram#counts` has length of 272 when using defaults (1ms - 30s)

This last bullet could be a problem and require a different solution.

## Issues

- Fixes micrometer-metrics#4327

Signed-off-by: Knut Wannheden <[email protected]>
@knutwannheden knutwannheden force-pushed the percentile-histogram-fix branch from 7b1b2fc to 41e8876 Compare September 24, 2025 09:13
@knutwannheden
Copy link
Author

It looks like the DoubleRecorder cannot be configured to auto-resize. I am assuming this is a deliberate design in HdrHistogram. I will do some more research.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose api in TimeWindowPercentileHistogram for custom upper and lower bounds of the bucket of internal HdrHistogram.

2 participants