-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize TimeSeriesBlockHash #125461
Optimize TimeSeriesBlockHash #125461
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Left one comment, LGTM otherwise.
final LongVector timestampVector = Objects.requireNonNull(timestampBlock.asVector(), "timestamp input must be a vector"); | ||
try (var ordsBuilder = blockFactory.newIntVectorBuilder(tsidVector.getPositionCount())) { | ||
final BytesRef spare = new BytesRef(); | ||
// TODO: optimize incoming ordinal block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean here that in a future change the time series source operator should generate the time series ordinal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, I see this is tested indirectly via TimeSeriesAggregationOperatorTests
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I think we should have a separate unit test for this. I'll add it in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean here that in a future change the time series source operator should generate the time series ordinal?
Yes, I will try to improve TimeSeriesSortedSourceOperatorFactory.
/** | ||
* An optimized block hash that receives two blocks: tsid and timestamp, which are sorted. | ||
* Since the incoming data is sorted, this block hash appends the incoming data to the internal arrays without lookup. | ||
*/ | ||
public final class TimeSeriesBlockHash extends BlockHash { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a unit test for this class. In case if it is missing, should we add a unit test for this block hash implementation?
Thanks @martijnvg. |
Since the data returned from time-series indices (shard-level) is sorted by tsid and timestamp, we can optimize the timeseries blockhash by comparing the incoming tsid and timestamp only with the last values, thus avoiding the need for lookups or internal hashes. Additionally, we can reuse the internal large arrays when rebuilding key blocks.
Since the data returned from time-series indices (shard-level) is sorted by tsid and timestamp, we can optimize the timeseries blockhash by comparing the incoming tsid and timestamp only with the last values, thus avoiding the need for lookups or internal hashes. Additionally, we can reuse the internal large arrays when rebuilding key blocks.