Non-exclusive experiment assignments on layer due to bucketization rounding error

**Is this a support request?**
This issue tracker is maintained by LaunchDarkly SDK developers and is intended for feedback on the code in this library. If you're not sure whether the problem you are having is specifically related to this library, or to the LaunchDarkly service overall, it may be more appropriate to contact the LaunchDarkly support team; they can help to investigate the problem and will consult the SDK team if necessary. You can submit a support request by going [here](https://support.launchdarkly.com/) and clicking "submit a request", or by emailing support@launchdarkly.com.

Note that issues filed on this issue tracker are publicly accessible. Do not provide any private account information on your issues. If your problem is specific to your account, you should submit a support request as described above.

**Describe the bug**
We have multiple feature flags and run an experiment on each of them. All experiments are on the same layer and should therefore be mutually exclusive. However, we observe that sometimes more than one feature flag evaluates as "in the experiment" for a single context. This happens consistently for a small fraction of contexts.

By logging the conflict cases, I was able to create an example that reproduces the issue (see below). It turns out that the bucketization in [Evaluator.java](https://github.com/launchdarkly/java-core/blob/main/lib/sdk/server/src/main/java/com/launchdarkly/sdk/server/Evaluator.java) suffers from floating-point precision issues. Concretely, when [iterating through the weighted variations](https://github.com/launchdarkly/java-core/blob/8cac36230cba0c50608dc1196f0ad77848ed1e1d/lib/sdk/server/src/main/java/com/launchdarkly/sdk/server/Evaluator.java#L351), the cumulative sum is computed by converting each weight to float, dividing it by 100,000, and adding up the resulting fractions. With a large number of weights, the floating-point rounding errors accumulate. Since different flags have different weight lists, the rounding errors differ between flags, so experiment buckets that should share a boundary can start to overlap. It would be more accurate, and above all consistent between flags, to sum the integer weights first and only then divide and convert to float.
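
To illustrate the two strategies side by side, here is a minimal sketch. It assumes the accumulation has the shape described above; the class name, method names, and weights are hypothetical and are not taken from the SDK or from our production flag. The two weight lists stand for two flags on the same layer whose experiment slices should meet at the same cumulative weight (30,000).

```java
import java.util.Arrays;

final class BucketBoundaries {
    // Weights are expressed in units of 1/100000, so 100000 means 100%.
    static final int TOTAL_WEIGHT = 100_000;

    // Mimics the accumulation described above: every weight is converted to
    // a float fraction first, and the fractions are summed as floats.
    static float[] floatAccumulated(int[] weights) {
        float[] bounds = new float[weights.length];
        float sum = 0f;
        for (int i = 0; i < weights.length; i++) {
            sum += (float) weights[i] / (float) TOTAL_WEIGHT;
            bounds[i] = sum;
        }
        return bounds;
    }

    // Suggested alternative: sum the integer weights exactly and convert to
    // float once per boundary, so equal cumulative weights always yield
    // the same float value, regardless of how the weights are split up.
    static float[] integerAccumulated(int[] weights) {
        float[] bounds = new float[weights.length];
        int sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            bounds[i] = (float) sum / (float) TOTAL_WEIGHT;
        }
        return bounds;
    }

    public static void main(String[] args) {
        // Hypothetical weight lists. Both reach a cumulative weight of 30000:
        // flag A after its second entry, flag B after its first entry.
        int[] flagA = {12_345, 17_655, 70_000};
        int[] flagB = {30_000, 20_000, 50_000};

        System.out.println("float A:   " + Arrays.toString(floatAccumulated(flagA)));
        System.out.println("float B:   " + Arrays.toString(floatAccumulated(flagB)));
        System.out.println("integer A: " + Arrays.toString(integerAccumulated(flagA)));
        System.out.println("integer B: " + Arrays.toString(integerAccumulated(flagB)));
    }
}
```

With integer accumulation, the shared boundary is computed from the same integer (30,000) in both flags and is therefore bit-identical. With float accumulation, it depends on the preceding weights, so whether it matches varies with the weight list.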

The weights in the example are taken from a production flag, so the case is realistic.

**To reproduce**
Run the "test" (it's more of a demonstration) implemented in https://github.com/atschofen/java-core/tree/atschofen-bucketing-accuracy-test and look at the output.

**Expected behavior**
Experiments on the same layer should never trigger on the same context.

**Logs**
none

**SDK version**
launchdarkly-java-server-sdk: v7.10.2

**Language version, developer tools**
Java 19

**OS/platform**
macOS 15

**Additional context**
none

