Add the capability to turboload segments onto historicals #17775

Open
wants to merge 20 commits into base: master

Conversation

adarshsanjeev
Contributor

Add the capability for historicals to focus on loading segments at the cost of query performance

Context
One of the things that takes the longest when a historical starts up is segment loading. This includes the time to fetch the segments from deep storage, unzip them, and load them into memory. This can be improved by using a larger number of threads to speed up the loading.
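As a rough illustration of the idea (a minimal sketch with hypothetical names, not the actual Druid loading code), fanning the per-segment work out over a pool lets startup time scale with the number of threads:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: each segment is fetched, unzipped and loaded on one of
// numThreads workers, so wall-clock time is roughly numSegments / numThreads
// times the per-segment cost instead of the full sequential sum.
public class ParallelSegmentLoadSketch
{
  public static void loadAll(List<String> segmentIds, int numThreads) throws InterruptedException
  {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    for (String segmentId : segmentIds) {
      pool.submit(() -> {
        // placeholder for: fetch from deep storage, unzip, load into memory
        System.out.println("loaded " + segmentId);
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}
```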

Release Notes

  • Added a new dynamic coordinator configuration, turboLoadHistoricals. Historicals in this list load segments using their bootstrap thread pool, which loads segments faster at the cost of query performance.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@maytasm
Contributor

maytasm commented Mar 4, 2025

One thing I noticed is that when the cluster is starting up, the historical will not (and cannot) serve any queries until all the segments are loaded (I think there is a config for this, which is enabled by default). In this case, we should always use turboLoadHistoricals (by default), since the historical cannot serve queries anyway.

@kfaraz
Contributor

kfaraz commented Mar 4, 2025

> One thing I noticed is that when the cluster is starting up, the historical will not (and cannot) serve any queries until all the segments are loaded (I think there is a config for this, which is enabled by default). In this case, we should always use turboLoadHistoricals (by default), since the historical cannot serve queries anyway.

@maytasm , IIUC, you are referring to the bootstrap segments, which include broadcast segments and segments already present on the historical disk.
For these bootstrap segments, we already use the turbo loading thread pool.

The change here allows a historical to optionally be put in turbo mode to load non-bootstrap segments, i.e. segments assigned later by the coordinator.
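To make the distinction concrete, a minimal sketch of how an executor might be chosen per load request (the class and field names are illustrative, not the actual SegmentLoadDropHandler code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch: bootstrap segments always use the larger bootstrap pool,
// while segments assigned later by the coordinator use it only when the server
// has been placed in turbo mode.
public class LoadExecutorSelectorSketch
{
  private final ExecutorService loadingExec;    // sized by numLoadingThreads
  private final ExecutorService bootstrapExec;  // sized by numBootstrapThreads
  private final AtomicBoolean turboMode = new AtomicBoolean(false);

  public LoadExecutorSelectorSketch(ExecutorService loadingExec, ExecutorService bootstrapExec)
  {
    this.loadingExec = loadingExec;
    this.bootstrapExec = bootstrapExec;
  }

  public void setTurboMode(boolean enabled)
  {
    turboMode.set(enabled);
  }

  public ExecutorService executorFor(boolean isBootstrapSegment)
  {
    return (isBootstrapSegment || turboMode.get()) ? bootstrapExec : loadingExec;
  }
}
```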

Contributor

@kfaraz kfaraz left a comment


Thanks a lot for the changes, @adarshsanjeev ! This will be very helpful for cluster upgrades.

I have done a surface review and left some feedback.
Will take another deeper look today.

@FrankChen021
Member

> One thing I noticed is that when the cluster is starting up, the historical will not (and cannot) serve any queries until all the segments are loaded (I think there is a config for this, which is enabled by default). In this case, we should always use turboLoadHistoricals (by default), since the historical cannot serve queries anyway.

> @maytasm , IIUC, you are referring to the bootstrap segments, which include broadcast segments and segments already present on the historical disk. For these bootstrap segments, we already use the turbo loading thread pool.

> The change here allows a historical to optionally be put in turbo mode to load non-bootstrap segments, i.e. segments assigned later by the coordinator.

But the description (the Context section) of this PR is different from what the changes really do. I think we need to update the PR description so that it is not misleading.

@@ -91,6 +94,10 @@ public SegmentLoadDropHandler(
        Executors.newScheduledThreadPool(
            config.getNumLoadingThreads(),
            Execs.makeThreadFactory("SimpleDataSegmentChangeHandler-%s")
        ),
        Executors.newScheduledThreadPool(
            config.getNumBootstrapThreads(),
Member


note: this will create a threadpool with a core pool size of getNumBootstrapThreads; if those threads are not doing anything, it will just hold on to the stack memory they need.

I wonder if this feature really needs a separate executor - does it really have to be running with a non-zero core pool size all the time?

Comment on lines 94 to 96
        Executors.newScheduledThreadPool(
            config.getNumLoadingThreads(),
            Execs.makeThreadFactory("SimpleDataSegmentChangeHandler-%s")
Member


It's unclear to me why we need a second executor pool; it would possibly be better to use a more aggressively tuned thread pool first and then go back to using a more conservative one.

The usage of Executors.newScheduledThreadPool is a bit unfortunate, as it will retain all these threads forever; a ThreadPoolExecutor with allowCoreThreadTimeOut could go back to 0 threads when not in use.
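For reference, a minimal sketch of the kind of pool suggested above (the sizing parameter is illustrative, not an actual Druid config field):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a pool that can grow to numBootstrapThreads while a
// loading burst is in progress, but releases all of its threads (and their
// stacks) after 60 seconds of inactivity.
public class IdleReleasingPoolSketch
{
  public static ThreadPoolExecutor create(int numBootstrapThreads)
  {
    ThreadPoolExecutor exec = new ThreadPoolExecutor(
        numBootstrapThreads,            // corePoolSize
        numBootstrapThreads,            // maximumPoolSize
        60L, TimeUnit.SECONDS,          // idle timeout, applied to core threads below
        new LinkedBlockingQueue<>()
    );
    // Unlike Executors.newScheduledThreadPool, this lets the pool shrink back
    // to zero threads once loading is done.
    exec.allowCoreThreadTimeOut(true);
    return exec;
  }
}
```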

Member

@vtlim vtlim left a comment


Left some comments on the docs

@@ -885,7 +885,7 @@ These Coordinator static configurations can be defined in the `coordinator/runti
|`druid.coordinator.kill.maxInterval`|The largest interval, as an [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations), of segments to delete per kill task. Set to zero, e.g. `PT0S`, for unlimited. This only applies when `druid.coordinator.kill.on=true`.|`P30D`|
|`druid.coordinator.balancer.strategy`|The [balancing strategy](../design/coordinator.md#balancing-segments-in-a-tier) used by the Coordinator to distribute segments among the Historical servers in a tier. The `cost` strategy distributes segments by minimizing a cost function, `diskNormalized` weights these costs with the disk usage ratios of the servers and `random` distributes segments randomly.|`cost`|
|`druid.coordinator.loadqueuepeon.http.repeatDelay`|The start and repeat delay (in milliseconds) for the load queue peon, which manages the load/drop queue of segments for any server.|1 minute|
|`druid.coordinator.loadqueuepeon.http.batchSize`|Number of segment load/drop requests to batch in one HTTP request. Note that it must be smaller than `druid.segmentCache.numLoadingThreads` config on Historical service.|1|
|`druid.coordinator.loadqueuepeon.http.batchSize`|Number of segment load/drop requests to batch in one HTTP request. Note that it must be smaller than `druid.segmentCache.numLoadingThreads` config on Historical service. If the value is not provided, automatically sets the value to the `numLoadingThreads` on the historical. | `druid.segmentCache.numLoadingThreads` |
Member


If numLoadingThreads will be the default, does the previous sentence need to change to say "smaller than or equal to"?

Also need to capitalize Historical.

@@ -953,6 +953,7 @@ The following table shows the dynamic configuration properties for the Coordinat
|`decommissioningNodes`|List of Historical servers to decommission. Coordinator will not assign new segments to decommissioning servers, and segments will be moved away from them to be placed on non-decommissioning servers at the maximum rate specified by `maxSegmentsToMove`.|none|
|`pauseCoordination`|Boolean flag for whether or not the Coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` interface. Such duties include: segment balancing, segment compaction, submitting kill tasks for unused segments (if enabled), logging of used segments in the cluster, marking of newly unused or overshadowed segments, matching and execution of load/drop rules for used segments, unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS name nodes with downtime and don't want the Coordinator to be directing Historical nodes to hit the name node with API requests until maintenance is done and the deep store is declared healthy for use again.|false|
|`replicateAfterLoadTimeout`|Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the Coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow Historicals in the cluster. However, the slow Historical may still load the segment later and the Coordinator may issue drop requests if the segment is over-replicated.|false|
|`turboLoadingNodes`|List of Historical servers to place in turbo loading mode. This causes the historical to load segments faster at the cost of query performance. For any performance increase, the runtime parameter `druid.coordinator.loadqueuepeon.http.batchSize` must not be configured. |none|
Member


Suggested change
|`turboLoadingNodes`|List of Historical servers to place in turbo loading mode. This causes the historical to load segments faster at the cost of query performance. For any performance increase, the runtime parameter `druid.coordinator.loadqueuepeon.http.batchSize` must not be configured. |none|
|`turboLoadingNodes`|List of Historical servers to place in turbo loading mode. These Historicals will load segments more quickly but at the cost of query performance. For any performance increase, don't configure the runtime parameter `druid.coordinator.loadqueuepeon.http.batchSize`. |none|

Member


It's not immediately clear how turboLoadingNodes relates to batchSize. Does it mean that configuring batchSize together with turboLoadingNodes hurts query performance more than turbo mode alone? Or does it mean that not configuring the batch size will lead to a query performance increase when in turbo mode?
