fix: sort CVE records correctly #2020
Reviewer's Guide
Implement numeric-aware sorting for CVE identifiers by introducing a normalized SQL sort key (id_sort_key) and updating the sort translator to route 'id' sorts through it, and add tests to verify correct ascending and descending ordering.
Entity relationship diagram for CVE ID sorting key
erDiagram
VULNERABILITY {
id TEXT
id_sort_key TEXT
}
VULNERABILITY ||--o{ PAGINATED_RESULTS : contains
VULNERABILITY ||--o{ COLUMNS : uses
COLUMNS {
id_sort_key TEXT
}
a5150d6 to 07293c7
Hey there - I've reviewed your changes - here's some feedback:
- Extract the CASE WHEN regex padding logic into a named constant or helper to improve readability and avoid inline SQL clutter.
- Consider persisting the computed id_sort_key as a computed (or materialized) column and indexing it to avoid expensive regex/substring processing on each query.
## Individual Comments
### Comment 1
<location> `modules/fundamental/src/vulnerability/service/test.rs:545-540` </location>
<code_context>
+async fn vulnerability_numeric_sorting(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases such as malformed or non-standard CVE IDs.
Including malformed CVE IDs in tests will help verify that the sort key logic handles unexpected formats correctly.
</issue_to_address>
    assert_eq!(vulns.items[0].advisories[1].score, None);
    assert_eq!(vulns.items[0].advisories[1].severity, None);

    Ok(())
suggestion (testing): Consider adding tests for edge cases such as malformed or non-standard CVE IDs.
Including malformed CVE IDs in tests will help verify that the sort key logic handles unexpected formats correctly.
The change looks good. I'm just not sure it is the right approach. Yes, it makes sorting more convenient for CVE IDs. However, there are a lot of OSV sources which use a similar format:
Now the user would see CVE IDs sorted differently than those, and that would be hard to explain and understand. If we can change this so that the ID is split into components and each component is then sorted as ASCII, or numerically if it consists only of digits, I think this could work.
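The component-wise idea in the comment above can be sketched in Rust. This is only an illustration of the comparison rule, not the code in this PR: `compare_ids`, `compare_segments`, and `is_numeric` are hypothetical names, and splitting on `-` is an assumption about the ID formats involved.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: true when the segment consists only of ASCII digits.
fn is_numeric(segment: &str) -> bool {
    !segment.is_empty() && segment.bytes().all(|b| b.is_ascii_digit())
}

/// Hypothetical helper: compare two segments numerically when both are
/// digits-only, otherwise fall back to a plain byte-wise string comparison.
fn compare_segments(x: &str, y: &str) -> Ordering {
    if is_numeric(x) && is_numeric(y) {
        // Compare by value without parsing, so arbitrarily long sequences
        // cannot overflow: drop leading zeros, then the shorter number is
        // smaller, and equal-length numbers compare lexically.
        let (x, y) = (x.trim_start_matches('0'), y.trim_start_matches('0'));
        x.len().cmp(&y.len()).then_with(|| x.cmp(y))
    } else {
        x.cmp(y)
    }
}

/// Hypothetical helper: compare two IDs segment by segment.
/// Assumption: segments are separated by '-'.
fn compare_ids(a: &str, b: &str) -> Ordering {
    let mut left = a.split('-');
    let mut right = b.split('-');
    loop {
        match (left.next(), right.next()) {
            (None, None) => return Ordering::Equal,
            // An ID that runs out of segments first sorts first.
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (Some(x), Some(y)) => {
                let ord = compare_segments(x, y);
                if ord != Ordering::Equal {
                    return ord;
                }
            }
        }
    }
}
```

With this rule, `ids.sort_by(|a, b| compare_ids(a, b))` places `CVE-2024-9999` before `CVE-2024-10000`, while prefixes and non-numeric segments still sort as plain strings. A database-side version would need an equivalent expression in SQL, which this sketch does not attempt to show.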
Thank you. I wasn't aware of those. I think we could certainly generalize those patterns. Out of those three examples, MAL and PSF do seem to follow the same pattern as CVE. RUSTSEC, if I'm reading the spec correctly, always requires 4 digits in the sequence sections, thus

For my own notes, the different sources are listed here. Interestingly, some sources follow a slightly different pattern: https://github.com/AlmaLinux/osv-database/tree/master/advisories/almalinux10

Let me explore a way to generalize this. Do you have any performance concerns with this approach? We could introduce a new column that stores the computed sort ID, but maybe that's a premature performance improvement right now.
I always have concerns. 😬 And especially for performance. But we do have scale tests, which can be triggered using
    })
    .add_expr(
        "id_sort_key",
        // Create a normalized sort key that preserves prefixes but sorts numbers numerically
Excuse the drive-by comment (coming here from #2024 (comment)) 😄 I have a feeling it might be better to create an expression index on this normalized sort key rather than computing the key for each row at query time. That way the index can potentially be used at query time to return a sorted result set, which may avoid sorting at query time because the index would already be sorted.
Pushed a change to make the sorting handle different vulnerability IDs. It looks like I broke some tests. I'll have a look at those next, but wanted to share the, hmm... creative solution.
CVE records follow a specific format where the last segment represents a numerical sequence. To properly sort CVE records, we must treat this sequence segment differently than the rest of the record ID.
fixes guacsec#1811
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Luiz Carvalho <[email protected]>
Signed-off-by: Luiz Carvalho <[email protected]>
1328ffb to 5ae3648
The tests were broken. I just didn't have the expected locale set on my local system. If we want the approach of using expressions at query time, I believe the changes here achieve that. It would be great for someone with access to approve running the workflows and maybe run
CVE records follow a specific format where the last segment represents a numerical sequence. To properly sort CVE records, we must treat this sequence segment differently than the rest of the record ID.
fixes #1811
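To illustrate the problem described above, here is a minimal Rust sketch of the padding idea: a plain string sort puts `CVE-2024-10000` before `CVE-2024-9999`, and left-padding the trailing sequence with zeros fixes that. The function name `padded_sort_key` and the `width` parameter are hypothetical, and the actual change in this PR builds its key as a SQL expression rather than in Rust.

```rust
/// Hypothetical helper: build a sort key by left-padding the trailing
/// numeric segment of an ID with zeros up to `width` characters.
/// IDs whose last segment is not digits-only are returned unchanged.
fn padded_sort_key(id: &str, width: usize) -> String {
    match id.rsplit_once('-') {
        Some((prefix, seq)) if !seq.is_empty() && seq.bytes().all(|b| b.is_ascii_digit()) => {
            // `0>width$` right-aligns the string and fills with '0'.
            format!("{prefix}-{seq:0>width$}")
        }
        _ => id.to_string(),
    }
}
```

Sorting with `ids.sort_by_key(|id| padded_sort_key(id, 19))` then orders the sequence segment numerically, as long as `width` is at least as large as the longest sequence that occurs. The value 19 here is an arbitrary choice for the sketch, not a value taken from the PR.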
Summary by Sourcery
Implement proper numeric sorting for CVE identifiers by introducing a normalized sort key and updating the sorting translator to use it, ensuring correct ascending and descending order across different ID prefixes.
Enhancements:
- Introduce an `id_sort_key` SQL expression to pad the numeric segment of CVE IDs for accurate numeric sorting.
- Route `id` sort operations to use the new `id_sort_key` when sorting vulnerabilities.

Tests:
- Add a `vulnerability_numeric_sorting` integration test to verify correct ascending and descending ordering for CVE, GHSA, and custom IDs.