Unified API for retrieving global min/max values of a numeric field #15740

@salvatore-campagna

Description

Lucene currently has two ways to retrieve the global min/max value of a numeric field across segments:

  • PointValues.getMinPackedValue() / PointValues.getMaxPackedValue(): return null when no points exist for the field.
  • DocValuesSkipper.globalMinValue() / DocValuesSkipper.globalMaxValue(): return sentinel values (Long.MIN_VALUE or Long.MAX_VALUE) when no data exists or when the skipper is not available for a leaf reader.

These two APIs have different "no data" semantics. PointValues returns null, which callers can check for and handle cleanly. DocValuesSkipper returns sentinel values that callers must know about and filter out. Specifically:

  • globalMinValue() returns Long.MAX_VALUE when no segments have the field, and Long.MIN_VALUE when a leaf reader has the field info but no skipper.
  • globalMaxValue() returns Long.MIN_VALUE when no segments have the field, and Long.MAX_VALUE when a leaf reader has the field info but no skipper.
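To make the two sentinel cases concrete, here is a minimal stdlib-only model of the folding behavior described above. It is not Lucene code: the Segment record and the SentinelModel class are hypothetical stand-ins, and the loop only assumes the semantics stated in the two bullets.

```java
import java.util.List;

/** Stdlib-only model of the sentinel behavior described above; not Lucene code. */
final class SentinelModel {

    /** Hypothetical stand-in for one segment's view of a numeric field. */
    record Segment(boolean hasField, boolean hasSkipper, long min, long max) {}

    /** Models globalMinValue(): folds per-segment minimums across segments. */
    static long globalMin(List<Segment> segments) {
        long min = Long.MAX_VALUE; // remains this sentinel if no segment has the field
        for (Segment s : segments) {
            if (!s.hasField()) {
                continue; // segment contributes nothing
            }
            if (!s.hasSkipper()) {
                return Long.MIN_VALUE; // bounds unknown, so assume the widest possible
            }
            min = Math.min(min, s.min());
        }
        return min;
    }

    /** Models globalMaxValue(): the mirror image of globalMin. */
    static long globalMax(List<Segment> segments) {
        long max = Long.MIN_VALUE; // remains this sentinel if no segment has the field
        for (Segment s : segments) {
            if (!s.hasField()) {
                continue;
            }
            if (!s.hasSkipper()) {
                return Long.MAX_VALUE;
            }
            max = Math.max(max, s.max());
        }
        return max;
    }
}
```

In this model an empty index yields min = Long.MAX_VALUE and max = Long.MIN_VALUE, while a segment without a skipper yields the opposite pair. A caller has to recognize both combinations, and neither can be told apart from a field whose real value happens to equal the sentinel.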

This makes retrieving a field's min/max values error-prone: callers must first determine which data structure is available, then call the right API, and then handle that API's "no data" convention. If a caller picks the wrong API or forgets to filter sentinels, invalid values propagate silently.

Proposal

Introduce a unified API for retrieving the global min/max value of a numeric field, abstracting over the underlying data structure. The API should:

  1. Return null when no data exists, regardless of whether the field uses BKD trees or doc values skippers.
  2. Automatically delegate to whichever data structure is available for the field.
  3. Define clear behavior when both structures are available or when neither is available (return null).

A possible solution:

public record MinMax(long min, long max) {}

// Returns null when the field has no data or neither structure (points, skipper) is available
public static MinMax getGlobalMinMax(IndexReader reader, String field) throws IOException { ... }

This would eliminate the need for callers to know which underlying data structure a field uses and would prevent sentinel values from leaking into application logic.
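One way the delegation could work is sketched below as a stdlib-only model, not as a Lucene implementation. The Leaf record and the UnifiedMinMaxSketch class are hypothetical, and two policies are assumptions rather than part of the proposal: point-derived bounds are preferred when both structures exist, and a segment that has the field but neither structure makes the whole result null. The model also works on plain longs, whereas real point values are packed byte arrays that would need decoding first.

```java
import java.util.List;

/** Stdlib-only sketch of the proposed null-returning lookup; not Lucene code. */
final class UnifiedMinMaxSketch {

    record MinMax(long min, long max) {}

    /**
     * Hypothetical per-segment view of a field. Each bounds source is null
     * when that structure is absent in the segment.
     */
    record Leaf(boolean hasField, MinMax fromPoints, MinMax fromSkipper) {}

    /** Returns the global bounds, or null when they cannot be determined. */
    static MinMax getGlobalMinMax(List<Leaf> leaves) {
        boolean seen = false;
        long min = Long.MAX_VALUE; // internal accumulators only, never returned as-is
        long max = Long.MIN_VALUE;
        for (Leaf leaf : leaves) {
            if (!leaf.hasField()) {
                continue; // nothing to contribute
            }
            // Assumed policy: prefer points when both structures are available.
            MinMax local = leaf.fromPoints() != null ? leaf.fromPoints() : leaf.fromSkipper();
            if (local == null) {
                return null; // field present but bounds unknown in this segment
            }
            min = Math.min(min, local.min());
            max = Math.max(max, local.max());
            seen = true;
        }
        return seen ? new MinMax(min, max) : null;
    }
}
```

With this shape the caller performs a single null check, and the accumulator sentinels never leave the method.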
