[Kernel][Metrics][PR#7] Support ScanReport to log metrics for a Scan operation #4068
These tests are all moved to ScanReportSuite
///////////////////////////////
// Log replay metrics tests ///
///////////////////////////////
The below tests were all copied from ActiveAddFilesLogReplayMetricsSuite
kernel/kernel-api/src/main/java/io/delta/kernel/internal/ScanImpl.java
Looks great! Tests are super solid. Left some comments.
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/MetricsReportSerializers.java
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/metrics/ScanReportSuite.scala
Looks great! 2 minor comments, 1 question, and then LGTM!
expectedNumActiveAddFiles = 2
)

// With partition filter
Are we expecting any of these filters to impact any of the scan metrics, e.g. expectedNumAddFiles, expectedNumAddFilesFromDeltaFiles, or expectedNumActiveAddFiles?
No, we only increment the file counts before filtering. We can add additional metrics for filtering in the future (but this is where it would be really helpful for something like a SV to have an implicit count field or something).
Okay! It would be nice to make it easier for the reader to see that we do not expect scans with partition/data filters to have different scan metrics than scans without filters.
You could do this via a comment, or even save noFilterScanReport = .....
(idk if that method returns a scan report, but it could) and then do checkScanReport(..., expectedNumAddFiles = noFilterScanReport.numAddFiles)
etc.
You choose which you think is best.
Will LGTM after this!
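For readers following the thread, the baseline comparison suggested above can be sketched roughly as follows. This is a standalone illustration, not the Kernel test code: the Report class, the scan helper, and the file names are all hypothetical stand-ins, and the only behavior taken from the discussion is that file counts are incremented before any filter is applied.

```java
import java.util.List;
import java.util.function.Predicate;

public class ScanMetricsBaselineSketch {

    /** Hypothetical stand-in for a scan report; not the Kernel ScanReport API. */
    static final class Report {
        final long numAddFiles;   // counted before filtering
        final long numReturned;   // files that survived the filter

        Report(long numAddFiles, long numReturned) {
            this.numAddFiles = numAddFiles;
            this.numReturned = numReturned;
        }
    }

    /** Hypothetical scan: every add file is counted first, then the filter runs. */
    static Report scan(List<String> addFiles, Predicate<String> filter) {
        long seen = 0;
        long returned = 0;
        for (String file : addFiles) {
            seen++; // incremented before filtering, as described in the thread
            if (filter.test(file)) {
                returned++;
            }
        }
        return new Report(seen, returned);
    }

    public static void main(String[] args) {
        List<String> files = List.of("part=0/a.parquet", "part=1/b.parquet");

        // Save the unfiltered report once and use it as the expected baseline.
        Report noFilterScanReport = scan(files, f -> true);
        Report partitionFilterReport = scan(files, f -> f.startsWith("part=0/"));

        if (partitionFilterReport.numAddFiles != noFilterScanReport.numAddFiles) {
            throw new AssertionError("filters should not change the pre-filter file count");
        }
        System.out.println("numAddFiles matches the no-filter baseline");
    }
}
```

Deriving the expected values from the saved baseline, rather than repeating literals, makes it explicit in the test that the filtered and unfiltered scans are expected to report the same counts.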
Which Delta project/connector is this regarding?
Description
Adds ScanReport for reporting a Scan. We record a ScanReport either after all the scan files have successfully been consumed (and the iterator closed), or if an exception is thrown while reading/filtering/preparing the scan files to be returned to the connector. This is done within the hasNext and next methods on the returned iterator, since that is when all of the Kernel work/evaluation happens (the iterator is lazily loaded). We only record a report for failures that happen within Kernel; if a failure originates in connector code, no report is emitted.
We also add support for serializing ScanReport in this PR.
How was this patch tested?
Adds unit tests.
Does this PR introduce any user-facing changes?
No.
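As an illustration of the reporting behavior in the description (one report on close after successful consumption, or one report when hasNext/next throws), here is a minimal sketch. It is not the ScanImpl implementation: ReportingIterator, Report, and the Consumer-based reporter are hypothetical names chosen for the example, and the real report carries more fields than shown here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

public class ReportingIteratorSketch {

    /** Hypothetical report: the exception is present only for failed scans. */
    static final class Report {
        final long numReturned;
        final Optional<Exception> exception;

        Report(long numReturned, Optional<Exception> exception) {
            this.numReturned = numReturned;
            this.exception = exception;
        }
    }

    /** Wraps a lazy iterator and emits at most one Report. */
    static final class ReportingIterator<T> implements Iterator<T>, AutoCloseable {
        private final Iterator<T> inner;
        private final Consumer<Report> reporter;
        private long numReturned = 0;
        private boolean reported = false;

        ReportingIterator(Iterator<T> inner, Consumer<Report> reporter) {
            this.inner = inner;
            this.reporter = reporter;
        }

        @Override
        public boolean hasNext() {
            try {
                return inner.hasNext(); // the lazy work happens here
            } catch (RuntimeException e) {
                report(Optional.<Exception>of(e));
                throw e;
            }
        }

        @Override
        public T next() {
            try {
                T value = inner.next(); // and here
                numReturned++;
                return value;
            } catch (RuntimeException e) {
                report(Optional.<Exception>of(e));
                throw e;
            }
        }

        /** Success path: report on close unless a failure was already reported. */
        @Override
        public void close() {
            report(Optional.empty());
        }

        private void report(Optional<Exception> error) {
            if (reported) {
                return; // never emit a second report for the same scan
            }
            reported = true;
            reporter.accept(new Report(numReturned, error));
        }
    }

    public static void main(String[] args) {
        List<Report> reports = new ArrayList<>();
        try (ReportingIterator<String> it =
                new ReportingIterator<>(List.of("a", "b").iterator(), reports::add)) {
            while (it.hasNext()) {
                it.next();
            }
        }
        System.out.println(reports.size() + " report(s) emitted");
    }
}
```

Because the try/catch only wraps the wrapped iterator's hasNext and next, an exception thrown by the caller while processing a returned element never passes through the reporting path, which mirrors the statement that connector-side failures produce no report.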