-
Notifications
You must be signed in to change notification settings - Fork 32
perf: fetch_sbom_details optimize CPE-based query (TC-3139) #2076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: mrizzi <[email protected]> Assisted-by: Claude Code
Reviewer's GuideThe PR refactors the product_advisory_info_sql function to leverage CTEs for pre-filtering and replaces the costly OR-based JOIN with two UNIONed match branches, and introduces a new database migration to add an index on product_status.package to accelerate those joins. Entity relationship diagram for product_status and related tables after index additionerDiagram
PRODUCT_STATUS {
string id
string package
string advisory_id
string vulnerability_id
string status_id
string context_cpe_id
}
SBOM_PACKAGE_PURL_REF {
string qualified_purl_id
string sbom_id
string node_id
}
QUALIFIED_PURL {
string id
string versioned_purl_id
}
VERSIONED_PURL {
string id
string base_purl_id
}
BASE_PURL {
string id
string name
string namespace
}
PRODUCT_STATUS ||--o| ADVISORY : "advisory_id"
PRODUCT_STATUS ||--o| VULNERABILITY : "vulnerability_id"
PRODUCT_STATUS ||--o| STATUS : "status_id"
PRODUCT_STATUS ||--o| CPE : "context_cpe_id"
PRODUCT_STATUS ||--o| PRODUCT_STATUS_PACKAGE_IDX : "package (indexed)"
SBOM_PACKAGE_PURL_REF ||--o| QUALIFIED_PURL : "qualified_purl_id"
QUALIFIED_PURL ||--o| VERSIONED_PURL : "versioned_purl_id"
VERSIONED_PURL ||--o| BASE_PURL : "base_purl_id"
BASE_PURL {
string name
string namespace
}
Class diagram for migration adding product_status.package indexclassDiagram
class Migration {
+up(manager: SchemaManager) Result<(), DbErr>
+down(manager: SchemaManager) Result<(), DbErr>
}
class Indexes {
+ProductStatusPackageIdx
}
class ProductStatus {
+Table
+Package
}
Migration --> Indexes
Migration --> ProductStatus
Flow diagram for optimized product_advisory_info_sql query logicflowchart TD
A["Start: Input sbom_id"] --> B["Compute related_nodes (CTE)"]
B --> C["Compute sbom_cpes (CTE)"]
C --> D["Compute filtered_cpes (CTE)"]
D --> E["Compute generalized_cpes (CTE)"]
E --> F["Compute allowed_cpe_ids (CTE)"]
F --> G["Pre-filter sbom_purls (CTE)"]
G --> H["product_status_matches_name (CTE): JOIN on package = name"]
G --> I["product_status_matches_namespace (CTE): JOIN on package = namespace/name"]
H --> J["all_matches (UNION)"]
I --> J
J --> K["Final SELECT with joins to advisory, vulnerability, status, etc."]
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2076 +/- ##
==========================================
+ Coverage 68.02% 68.17% +0.14%
==========================================
Files 367 368 +1
Lines 20590 20680 +90
Branches 20590 20680 +90
==========================================
+ Hits 14006 14098 +92
+ Misses 5747 5741 -6
- Partials 837 841 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks for looking into this.
I hope we can tackle purl_statuses table next and somehow again extract this CPE filtering logic into a single "function".
Should we run scale tests to confirm the improvements?
I had an initial look but there's more work involved to improve that one.
Well, the performance was that bad that the scale test for the |
|
Yeah, are we able to uncomment them now? |
|
/scale-test |
|
🛠️ Scale test has started! Follow the progress here: Workflow Run |
Goose ReportGoose Attack ReportPlan Overview
Request Metrics
Response Time Metrics
Status Code Metrics
Transaction Metrics
Scenario Metrics
📄 Full Report (Go to "Artifacts" and download report) |
|
|
/scale-test |
|
🛠️ Scale test has started! Follow the progress here: Workflow Run |
Goose ReportGoose Attack ReportPlan Overview
Request Metrics
Response Time Metrics
Status Code Metrics
Transaction Metrics
Scenario Metrics
📄 Full Report (Go to "Artifacts" and download report) |
Optimize CPE-based vulnerability query with CTEs and index
Refactor the
product_advisory_info_sql()function to use Common Table Expressions (CTEs) and split OR conditions into UNION queries for performance improvement.The original query had a performance issue caused by an OR condition in the JOIN clause:
The OR condition in a JOIN generates a big Cartesian product intermediate result set whose cardinality slows down the performance.
The change here is splitting the OR into UNION, i.e. creating two separate queries and combines them with UNION.
Using CTEs also allowed for some pre-filtering, e.g. in
sbom_purlsthe packages are pre-filtered to be only the ones with the SBOM.Also added the index
product_status_package_idx(on product_status.package) to support some join conditions, e.g.ps.package = sp.name.The final result is the query's execution time move from ~3 minutes to ~0.5 sec.
Relates to https://issues.redhat.com/browse/TC-3139
Summary by Sourcery
Optimize the CPE-based vulnerability query by refactoring it with Common Table Expressions and splitting OR conditions into UNIONs, and support the changes by adding a new index to speed up package lookups
Enhancements:
Build: