fix: sort CVE records correctly #2020

lcarva · 2025-10-08T14:18:01Z

CVE records follow a specific format where the last segment represents a numerical sequence. To properly sort CVE records, we must treat this sequence segment differently than the rest of the record ID.
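To illustrate the problem, here is a minimal Rust sketch of what happens when these IDs are ordered as plain strings. The function name and the sample IDs (other than CVE-2023-1234, which the test in this PR uses) are made up for the illustration; the real ordering happens in the database query.

```rust
/// Sorts IDs as plain strings, comparing them byte by byte.
/// The numeric sequence at the end gets no special treatment.
fn sort_lexicographically(mut ids: Vec<&str>) -> Vec<&str> {
    ids.sort();
    ids
}
```

With the input `["CVE-2023-9999", "CVE-2023-10000", "CVE-2023-1234"]`, this places `CVE-2023-10000` first, because the character `1` sorts before `9` regardless of how many digits follow.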

fixes #1811

Summary by Sourcery

Implement proper numeric sorting for CVE identifiers by introducing a normalized sort key and updating the sorting translator to use it, ensuring correct ascending and descending order across different ID prefixes.

Enhancements:

- Introduce id_sort_key SQL expression to pad the numeric segment of CVE IDs for accurate numeric sorting.
- Translate id sort operations to use the new id_sort_key when sorting vulnerabilities.

Tests:

- Add vulnerability_numeric_sorting integration test to verify correct ascending and descending ordering for CVE, GHSA, and custom IDs.

sourcery-ai · 2025-10-08T14:18:09Z

Reviewer's Guide

Implement numeric-aware sorting for CVE identifiers by introducing a normalized SQL sort key (id_sort_key) and updating the sort translator to route 'id' sorts through it, and add tests to verify correct ascending and descending ordering.

Entity relationship diagram for CVE ID sorting key

erDiagram
    VULNERABILITY {
        id TEXT
        id_sort_key TEXT
    }
    VULNERABILITY ||--o{ PAGINATED_RESULTS : contains
    VULNERABILITY ||--o{ COLUMNS : uses
    COLUMNS {
        id_sort_key TEXT
    }

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Introduce id_sort_key expression and translate 'id' sorts to numeric-aware key | Add a CASE expression (id_sort_key) that pads the trailing CVE sequence to 19 digits; extend filtering_with translator to map sort('id') to id_sort_key:asc/desc; retain alphabetical sorting for non-CVE prefixes by falling back to raw id | `modules/fundamental/src/vulnerability/service/mod.rs` |
| Add vulnerability_numeric_sorting test for mixed identifier ordering | Ingest a set of CVE, GHSA and ABC identifiers with varying numeric lengths; verify ascending id sort returns ABC < CVE-2023-1234 < ... < GHSA; verify descending id sort correctly reverses that order | `modules/fundamental/src/vulnerability/service/test.rs` |
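The padding idea described in the table can be sketched in Rust as follows. This is an in-memory approximation, not the SQL CASE expression from the PR: the helper names are hypothetical, and the rule for what counts as a CVE ID (the `CVE-` prefix followed by two all-digit segments) is an assumption. Only the 19-digit width and the fallback to the raw ID come from the description above.

```rust
/// Width the sequence is left-padded to; 19 digits is the width named in the table above.
const PAD_WIDTH: usize = 19;

/// Returns true when `s` is non-empty and consists only of ASCII digits.
fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Hypothetical helper: builds the sort key for one vulnerability ID.
/// IDs shaped like CVE-<year>-<sequence> get a zero-padded sequence;
/// every other ID is returned unchanged and therefore sorts alphabetically.
fn id_sort_key(id: &str) -> String {
    if let Some(rest) = id.strip_prefix("CVE-") {
        if let Some((year, seq)) = rest.split_once('-') {
            if all_digits(year) && all_digits(seq) {
                return format!("CVE-{year}-{seq:0>width$}", width = PAD_WIDTH);
            }
        }
    }
    id.to_string()
}

/// Sorts IDs in ascending order of their sort key.
fn sort_by_id(mut ids: Vec<&str>) -> Vec<&str> {
    ids.sort_by_key(|id| id_sort_key(id));
    ids
}
```

Because every padded sequence has the same width, comparing the keys as strings gives the same result as comparing the sequences as numbers, while the `ABC` and `GHSA` prefixes still sort before and after `CVE` as plain text.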

Assessment against linked issues

| Issue | Objective | Addressed | Explanation |
| --- | --- | --- | --- |
| #1811 | Ensure that GET /api/v2/vulnerability?sort=id:desc returns vulnerabilities sorted by identifier such that CVE records are ordered numerically by their trailing sequence number, not lexicographically. | ✅ | |
| #1811 | Add or update tests to verify correct numeric sorting of vulnerability identifiers, including CVE records and other formats. | ✅ | |

Possibly linked issues

GET /api/v2/vulnerability sorted by identifier returns unexpected results #1811: The PR introduces a numeric-aware sort key for CVE identifiers in the database query to fix the incorrect ordering reported in the issue.


sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

- Extract the CASE WHEN regex padding logic into a named constant or helper to improve readability and avoid inline SQL clutter.
- Consider persisting the computed id_sort_key as a computed (or materialized) column and indexing it to avoid expensive regex/substring processing on each query.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Extract the CASE WHEN regex padding logic into a named constant or helper to improve readability and avoid inline SQL clutter.
- Consider persisting the computed id_sort_key as a computed (or materialized) column and indexing it to avoid expensive regex/substring processing on each query.

## Individual Comments

### Comment 1
<location> `modules/fundamental/src/vulnerability/service/test.rs:545-540` </location>
<code_context>
+async fn vulnerability_numeric_sorting(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases such as malformed or non-standard CVE IDs.

Including malformed CVE IDs in tests will help verify that the sort key logic handles unexpected formats correctly.
</issue_to_address>


sourcery-ai · 2025-10-08T14:20:03Z

modules/fundamental/src/vulnerability/service/test.rs

    assert_eq!(vulns.items[0].advisories[1].score, None);
    assert_eq!(vulns.items[0].advisories[1].severity, None);

    Ok(())


suggestion (testing): Consider adding tests for edge cases such as malformed or non-standard CVE IDs.

Including malformed CVE IDs in tests will help verify that the sort key logic handles unexpected formats correctly.

ctron · 2025-10-09T08:32:18Z

The change looks good. I'm just not sure it is the right approach.

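Yes, it makes it more convenient for CVE IDs. However, there are a lot of OSV sources which use a similar format:

https://osv.dev/vulnerability/RUSTSEC-2025-0072

https://osv.dev/vulnerability/MAL-2025-47815

https://osv.dev/vulnerability/PSF-2025-12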

Now the user would see CVE IDs sorted differently than those. And that would be hard to explain and understand.

If we can change this to a way that we split this into components and then sort each part as ASCII or numeric (if it's numeric only), I think this could work.
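A component-wise comparison along these lines could look like the Rust sketch below. It is only an illustration of the idea, not the change that was later pushed: the type and function names are hypothetical, splitting on `-` is an assumption, and so is the choice to order numeric components before text ones when the two kinds meet at the same position.

```rust
use std::cmp::Ordering;

/// One dash-separated component of an ID.
/// The derived ordering puts every `Number` before every `Text`.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Segment {
    Number(u64),
    Text(String),
}

/// Splits an ID on '-' and classifies each component.
/// A component is numeric only if it is all ASCII digits and fits in a u64.
fn segments(id: &str) -> Vec<Segment> {
    id.split('-')
        .map(|part| match part.parse::<u64>() {
            Ok(n) if part.bytes().all(|b| b.is_ascii_digit()) => Segment::Number(n),
            _ => Segment::Text(part.to_string()),
        })
        .collect()
}

/// Compares two IDs component by component, numerically where possible.
/// Falls back to the raw strings so that "X-7" and "X-07" still have a stable order.
fn compare_ids(a: &str, b: &str) -> Ordering {
    segments(a).cmp(&segments(b)).then_with(|| a.cmp(b))
}

/// Sorts IDs in ascending order using `compare_ids`.
fn sort_ids(mut ids: Vec<&str>) -> Vec<&str> {
    ids.sort_by(|a, b| compare_ids(a, b));
    ids
}
```

Under this scheme a zero-padded sequence such as `0072` compares as the number 72, so fixed-width and variable-width sequences follow the same rule.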

lcarva · 2025-10-09T17:45:19Z

> The change looks good. I'm just not sure it is the right approach.
>
> Yes, it makes it more convenient for CVE IDs. However, there are a lot of OSV sources which use a similar format:
>
> https://osv.dev/vulnerability/RUSTSEC-2025-0072
>
> https://osv.dev/vulnerability/MAL-2025-47815
>
> https://osv.dev/vulnerability/PSF-2025-12

Thank you. I wasn't aware of those. I think we could certainly generalize those patterns.

Out of those three examples, MAL and PSF do seem to follow the same pattern as CVE. RUSTSEC, if I'm reading the spec correctly, always requires 4 digits in the sequence sections, thus 0072 in the example above. Not sure what happens when there are more than 9,999 RUSTSEC records in a single year.

For my own notes, the different sources are listed here. Interestingly, some sources follow a slightly different pattern: https://github.com/AlmaLinux/osv-database/tree/master/advisories/almalinux10

Let me explore a way to generalize this.

Do you have any performance concerns with this approach? We could introduce a new column that stores the computed sort ID but maybe that's a premature performance improvement right now.

ctron · 2025-10-10T05:49:58Z

> Do you have any performance concerns with this approach? We could introduce a new column that stores the computed sort ID but maybe that's a premature performance improvement right now.

I always have concerns. 😬 And especially for performance. But we do have scale tests, which can be triggered using /scale-test on a PR. Assuming we capture this use case with them (maybe we need to extend) we should be sure enough that we don't impact performance. Or we understand what the impact is and can make a decision.

Strum355 · 2025-10-13T10:29:30Z

modules/fundamental/src/vulnerability/service/mod.rs

+                })
+                .add_expr(
+                    "id_sort_key",
+                    // Create a normalized sort key that preserves prefixes but sorts numbers numerically


Excuse the drive-by comment (coming here from #2024 (comment)) 😄 I have a feeling it might be better to create an expression index on this normalized sort key rather than computing the key for each row at query time. That way the index can potentially be used at query time to return an already-sorted result set, which may avoid having to sort at query time at all.
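As a rough in-memory analogy for that suggestion, the Rust sketch below computes each key once at insert time and keeps the entries in an ordered set, so reading them back in either direction needs no sort. Everything here is hypothetical: `IdIndex` and `sort_key` are illustration names, the key pads every all-digit component to 19 characters as one possible generalization, and a `BTreeSet` only stands in for a database index.

```rust
use std::collections::BTreeSet;

/// Hypothetical key: left-pads every all-digit component to 19 characters
/// and leaves all other components untouched.
fn sort_key(id: &str) -> String {
    id.split('-')
        .map(|part| {
            if !part.is_empty() && part.bytes().all(|b| b.is_ascii_digit()) {
                format!("{part:0>19}")
            } else {
                part.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("-")
}

/// Stand-in for an index: (key, id) pairs kept in key order at all times.
struct IdIndex {
    entries: BTreeSet<(String, String)>,
}

impl IdIndex {
    fn new() -> Self {
        IdIndex { entries: BTreeSet::new() }
    }

    /// Computes the key once, when the ID is stored.
    fn insert(&mut self, id: &str) {
        self.entries.insert((sort_key(id), id.to_string()));
    }

    /// Walks the set in key order; no sorting happens at read time.
    fn ascending(&self) -> Vec<&str> {
        self.entries.iter().map(|(_, id)| id.as_str()).collect()
    }

    /// Walks the same set backwards for descending order.
    fn descending(&self) -> Vec<&str> {
        self.entries.iter().rev().map(|(_, id)| id.as_str()).collect()
    }
}
```

The trade-off mirrors the one discussed in this thread: inserts do a little more work so that sorted reads become a plain ordered scan.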

lcarva · 2025-10-17T21:05:49Z

Pushed a change to make the sorting handle different vulnerability IDs. It looks like I broke some tests, I'll have a look at those next, but wanted to share the hmm... creative solution.

CVE records follow a specific format where the last segment represents a numerical sequence. To properly sort CVE records, we must treat this sequence segment differently than the rest of the record ID.

fixes guacsec#1811

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Luiz Carvalho <[email protected]>

Signed-off-by: Luiz Carvalho <[email protected]>

lcarva · 2025-10-31T18:34:11Z

> Pushed a change to make the sorting handle different vulnerability IDs. It looks like I broke some tests, I'll have a look at those next, but wanted to share the hmm... creative solution.

The tests were failing locally only because I didn't have the expected locale set on my system. Running `LC_ALL=C cargo test` does the job. I expect they will pass here as well.

If we want the approach of using expressions at query time, I believe the changes here achieve that. It would be great for someone with access to approve running the workflows and maybe run /scale-test.

lcarva force-pushed the fix-cve-ordering branch from a5150d6 to 07293c7 on October 8, 2025 14:18

sourcery-ai bot reviewed Oct 8, 2025

ctron mentioned this pull request Oct 13, 2025 in "Implement more generic vendor packages recommendations #2024" (open)

Strum355 reviewed Oct 13, 2025

lcarva and others added 2 commits October 31, 2025 14:33

Generalize vulnerability sorting (5ae3648)

Signed-off-by: Luiz Carvalho <[email protected]>

lcarva force-pushed the fix-cve-ordering branch from 1328ffb to 5ae3648 on October 31, 2025 18:33
