Skip to content

Address a performance regression in recent upgrade #177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 28, 2025

Conversation

lcartey
Copy link
Contributor

@lcartey lcartey commented Feb 28, 2025

getCdsDeclaration() experienced a performance regression in larger codebases with the upgrade to 2.20.4. An example of a partially completed set of tuple counts:

Evaluated relational algebra for predicate CDS::UserDefinedApplicationService.getCdsDeclaration/0#dispred#80b6f393@15836fu7 with tuple counts:
                3237       ~0%    {3} r1 = SCAN containerparent OUTPUT In.1, _, In.0
                3237       ~0%    {3}    | REWRITE WITH Out.1 := 2
                3237       ~1%    {5}    | JOIN WITH `Files::Container.splitAbsolutePath/2#dispred#43b82b7b_021#join_rhs` ON FIRST 2 OUTPUT Lhs.2, Lhs.0, _, Rhs.2, _
                 572       ~0%    {3}    | REWRITE WITH Tmp.2 := "", Tmp.4 := ".cds", Out.2 := InverseAppend(Tmp.2,Tmp.4,In.3) KEEPING 3
                9529     ~232%    {4}    | JOIN WITH containerparent ON FIRST 1 OUTPUT Lhs.2, _, Lhs.1, Rhs.1
                9529     ~234%    {4}    | REWRITE WITH Out.1 := 2
               51748    ~1132%    {3}    | JOIN WITH `Files::Container.splitAbsolutePath/2#dispred#43b82b7b_120#join_rhs` ON FIRST 2 OUTPUT Lhs.2, Lhs.3, Rhs.2
          2969847789    ~1182%    {4}    | JOIN WITH `JSON::JsonValue.getJsonFile/0#dispred#2c4d2f29#bf_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1, Lhs.2
            23973500  ~630445%    {3}    | JOIN WITH project#CDL::CdlElement#class#12896241 ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3
        568488449775  ~197988%    {3}    | JOIN WITH `Locations::Locatable.getFile/0#dispred#58f05aaa_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
            24415500  ~145193%    {3}    | JOIN WITH CDL::CdlService#ccbb5e6c ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0
           891311496  ~306096%    {3}    | JOIN WITH `DataFlowPrivate::Node.getFile/0#dispred#4a057003_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
               95500   ~29220%    {3}    | JOIN WITH CDS::UserDefinedApplicationService#class#e1298e76 ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2
                   0       ~0%    {2}    | JOIN WITH `DataFlowPrivate::Node.getFile/0#dispred#4a057003` ON FIRST 2 OUTPUT Lhs.0, Lhs.2
                                  return r1

The join order created a cartesian product on CdlElements and JSON files.

This has been addressed by:

  • Using CdlService.getImplementation() to implement UserDefinedService.getCdsDeclaration(). This avoids duplication of logic.
  • Update getImplementation() to use a simpler path matching definition, to avoid join order problems.

There is a performance issue with getCdsDeclaration, so ensure
we're using a single definition.
This now restricts the files to exist in the same directory, not
just the same name.
Copy link
Contributor

@jeongsoolee09 jeongsoolee09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the changes, thank you!

@lcartey lcartey merged commit 94ba8d2 into main Feb 28, 2025
5 checks passed
@lcartey lcartey deleted the lcartey/performance-improvements branch February 28, 2025 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants