[Kernel][Metrics][PR#7] Support ScanReport to log metrics for a Scan operation #4068

Open
wants to merge 10 commits into
base: master

allisonport-db
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Adds ScanReport for reporting a Scan.

We record a ScanReport either after all the scan files have been successfully consumed (and the iterator closed), or when an exception is thrown while reading, filtering, or preparing the scan files to be returned to the connector. This is done within the hasNext and next methods of the returned iterator, since the iterator is lazily evaluated and that is where all of the Kernel work happens. We only record a report for failures that happen within Kernel; if a failure occurs in connector code, no report is emitted.
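For readers skimming the description, here is a minimal sketch of the reporting pattern described above. It is not the code in this PR: `SketchScanReport`, `ReportingIterator`, and the `Consumer`-based reporter are hypothetical names, and the choice to emit the success report only when the iterator was exhausted before `close()` is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical, simplified report; not the actual Kernel ScanReport interface. */
final class SketchScanReport {
    final long numFilesReturned;
    final Optional<Exception> exception;

    SketchScanReport(long numFilesReturned, Optional<Exception> exception) {
        this.numFilesReturned = numFilesReturned;
        this.exception = exception;
    }
}

/** Wraps a lazily evaluated iterator and emits at most one report. */
final class ReportingIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> delegate;
    private final Consumer<SketchScanReport> reporter;
    private long numReturned = 0;
    private boolean exhausted = false;
    private boolean reported = false;

    ReportingIterator(Iterator<T> delegate, Consumer<SketchScanReport> reporter) {
        this.delegate = delegate;
        this.reporter = reporter;
    }

    @Override
    public boolean hasNext() {
        try {
            boolean more = delegate.hasNext();
            if (!more) {
                exhausted = true;
            }
            return more;
        } catch (RuntimeException e) {
            // Failure inside the wrapped (Kernel-side) work: report, then rethrow.
            report(Optional.of(e));
            throw e;
        }
    }

    @Override
    public T next() {
        T item;
        try {
            item = delegate.next();
        } catch (RuntimeException e) {
            report(Optional.of(e));
            throw e;
        }
        numReturned++;
        return item;
    }

    @Override
    public void close() {
        // Assumption: a success report is emitted only if every element was consumed.
        // A caller that fails in its own code and closes early produces no report.
        if (exhausted) {
            report(Optional.empty());
        }
    }

    private void report(Optional<Exception> error) {
        if (!reported) {
            reported = true;
            reporter.accept(new SketchScanReport(numReturned, error));
        }
    }
}
```

Because exceptions are only caught inside `hasNext` and `next`, anything thrown by the caller between those calls never reaches the reporter, which mirrors the "no report for connector failures" behavior described above.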

We also add support for serializing ScanReport in this PR.

How was this patch tested?

Adds unit tests.

Does this PR introduce any user-facing changes?

No.

Collaborator Author

These tests are all moved to ScanReportSuite

Comment on lines +324 to +326
///////////////////////////////
// Log replay metrics tests ///
///////////////////////////////
Collaborator Author

The below tests were all copied from ActiveAddFilesLogReplayMetricsSuite

Collaborator

scottsand-db left a comment

Looks great! Tests are super solid. Left some comments.

Collaborator

scottsand-db left a comment

Looks great! 2 minor comments, 1 question, and then LGTM!

expectedNumActiveAddFiles = 2
)

// With partition filter
Collaborator

Are we expecting any of these filters to impact any of the scan metrics, e.g. expectedNumAddFiles, expectedNumAddFilesFromDeltaFiles, or expectedNumActiveAddFiles?

Collaborator Author

No, we only increment the file counts before filtering. We can add additional metrics for filtering in the future (this is where it would be really helpful for something like a SV to have an implicit count field or something).

Collaborator

Okay! It would be nice if you made it easier for the reader to know that we do not expect scans with partition/data filters to have different scan metrics than scans without the filters.

You could do this via a comment, or even save noFilterScanReport = ..... (I don't know if that method returns a scan report, but it could) and then do checkScanReport(..., expectedNumAddFiles = noFilterScanReport.numAddFiles), etc.

You choose which you think is best
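To illustrate the baseline idea, here is a rough sketch. It is not the test code in this PR: `SketchScanResult`, `FilterMetricsSketch`, and `scan` are hypothetical stand-ins, and the file paths are made up. It assumes, as stated earlier in this thread, that file counts are incremented before any filter is applied, so a filtered scan should report the same count as an unfiltered one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical result type: the files that survived the filter plus the pre-filter count. */
final class SketchScanResult {
    final List<String> files;
    final long numAddFiles;

    SketchScanResult(List<String> files, long numAddFiles) {
        this.files = files;
        this.numAddFiles = numAddFiles;
    }
}

final class FilterMetricsSketch {
    /** Counts every add file it sees, then applies the filter. */
    static SketchScanResult scan(List<String> addFiles, Predicate<String> filter) {
        long numAddFiles = 0;
        List<String> kept = new ArrayList<>();
        for (String file : addFiles) {
            numAddFiles++; // incremented before filtering, so the filter cannot change it
            if (filter.test(file)) {
                kept.add(file);
            }
        }
        return new SketchScanResult(kept, numAddFiles);
    }

    public static void main(String[] args) {
        // Made-up paths for illustration only.
        List<String> addFiles = Arrays.asList("part=0/f1", "part=0/f2", "part=1/f3");

        // Baseline: no filter.
        SketchScanResult noFilter = scan(addFiles, f -> true);
        // Same scan with a partition-style filter.
        SketchScanResult withFilter = scan(addFiles, f -> f.startsWith("part=1/"));

        // The expectation is expressed relative to the baseline rather than as a literal.
        if (withFilter.numAddFiles != noFilter.numAddFiles) {
            throw new AssertionError("filters should not change numAddFiles");
        }
        System.out.println("numAddFiles=" + withFilter.numAddFiles
            + ", filesReturned=" + withFilter.files.size());
    }
}
```

Writing the expectation as `noFilter.numAddFiles` instead of a hard-coded number tells the reader that the filter is not expected to move the metric.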

Collaborator

Will LGTM after this!


3 participants