Switch to a fixed CFS threshold #14959

@jpountz

By default, Lucene currently uses compound files for flushed segments and for merged segments that are smaller than 10% of the total index size (measured either in number of docs or in bytes, depending on the merge policy).

I am considering switching to a fixed threshold, e.g. using compound files for all segments below 64MB for byte-size-based merge policies (TieredMergePolicy, LogByteSizeMergePolicy) or 65,536 docs for doc-based merge policies (LogDocMergePolicy).

I like it better for a few reasons:

  • Whether a segment is compound or not is more deterministic (and thus easier to reason about) as it doesn't depend on the total size of the index at the time of merging.
  • The current ratio doesn't work well in multi-tenant scenarios where you could still have plenty of small files overall due to many small indexes.
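
To make the difference concrete, here is a minimal sketch of the two decision rules for the byte-size case. It is a simplified illustration, not Lucene's actual code: the class and method names are hypothetical, and it assumes "64MB" means 64 MiB and that both comparisons are strict.

```java
// Hypothetical sketch, not part of Lucene: contrasts the ratio-based rule
// with the proposed fixed threshold for byte-size-based merge policies.
final class CfsDecisionSketch {
    // Assumption: "64MB" is read as 64 MiB.
    static final long FIXED_THRESHOLD_BYTES = 64L * 1024 * 1024;
    // The 10% ratio described above.
    static final double RATIO = 0.10;

    // Ratio-based rule: the answer depends on the total index size
    // at the time of the merge.
    static boolean useCompoundFileByRatio(long segmentBytes, long totalIndexBytes) {
        return segmentBytes < RATIO * totalIndexBytes;
    }

    // Fixed-threshold rule: the answer depends only on the segment itself.
    static boolean useCompoundFileByThreshold(long segmentBytes) {
        return segmentBytes < FIXED_THRESHOLD_BYTES;
    }
}
```

With the ratio rule, a 32 MiB merged segment is not compound in a 100 MiB index but is compound in a 1 GiB index. With the fixed threshold, the same segment is compound in both cases, which also covers the multi-tenant case of many small indexes.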

Ideally we would have a single switch on IndexWriterConfig instead of having flushes and merges independently decide whether a segment qualifies for being compound.

I'm also wondering if we need to keep the current approach that is based on a ratio, or if only supporting a fixed threshold would be good enough.
