

@lcarva lcarva commented Oct 8, 2025

CVE records follow a specific format where the last segment represents a numerical sequence. To properly sort CVE records, we must treat this sequence segment differently than the rest of the record ID.

fixes #1811
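The IDs below are made up for illustration. A plain text sort compares IDs character by character, so `CVE-2023-10000` lands before `CVE-2023-9999`: `1` < `9`, even though 10000 > 9999. A minimal Rust sketch of that byte-wise ordering:

```rust
/// Sorts IDs as plain strings. Rust compares `str` values byte by byte,
/// so the trailing sequence is NOT compared as a number.
fn lexicographic_sort(mut ids: Vec<&str>) -> Vec<&str> {
    ids.sort();
    ids
}
```

`lexicographic_sort(vec!["CVE-2023-9999", "CVE-2023-10000", "CVE-2023-1234"])` yields `["CVE-2023-10000", "CVE-2023-1234", "CVE-2023-9999"]`. Numeric order would be 1234, 9999, 10000.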

Summary by Sourcery

Implement proper numeric sorting for CVE identifiers by introducing a normalized sort key and updating the sorting translator to use it, ensuring correct ascending and descending order across different ID prefixes.

Enhancements:

  • Introduce id_sort_key SQL expression to pad the numeric segment of CVE IDs for accurate numeric sorting.
  • Translate id sort operations to use the new id_sort_key when sorting vulnerabilities.
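The padding idea can be sketched outside SQL. This Rust function is a hypothetical in-memory counterpart to the `id_sort_key` CASE expression, not the PR's actual SQL. It assumes that only IDs shaped like `CVE-<4-digit year>-<digits>` are rewritten, and that the sequence is left-padded with zeros to 19 digits, the width mentioned in the reviewer's guide. Every other ID is returned unchanged.

```rust
/// Width of the zero-padded sequence: 19 digits covers any i64 value.
const SEQ_WIDTH: usize = 19;

/// True if `s` is non-empty and consists only of ASCII digits.
fn is_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Hypothetical analogue of the `id_sort_key` expression:
/// `CVE-<year>-<seq>` becomes `CVE-<year>-` plus `<seq>` left-padded with
/// zeros to SEQ_WIDTH, so that a plain string comparison orders sequences
/// numerically. Any other ID (GHSA, custom prefixes, malformed CVEs) is
/// returned unchanged, i.e. it keeps plain alphabetical ordering.
fn id_sort_key(id: &str) -> String {
    let mut parts = id.splitn(3, '-');
    if let (Some("CVE"), Some(year), Some(seq)) = (parts.next(), parts.next(), parts.next()) {
        if year.len() == 4 && is_digits(year) && is_digits(seq) && seq.len() <= SEQ_WIDTH {
            return format!("CVE-{}-{:0>width$}", year, seq, width = SEQ_WIDTH);
        }
    }
    id.to_string()
}
```

Sorting with `ids.sort_by_key(|id| id_sort_key(id))` then places `CVE-2023-9999` before `CVE-2023-10000`. IDs such as `ABC-1` and `GHSA-abcd-efgh-ijkl` keep their alphabetical position relative to the CVE block.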

Tests:

  • Add vulnerability_numeric_sorting integration test to verify correct ascending and descending ordering for CVE, GHSA, and custom IDs.

Contributor

sourcery-ai bot commented Oct 8, 2025

Reviewer's Guide

Implement numeric-aware sorting for CVE identifiers by introducing a normalized SQL sort key (id_sort_key) and updating the sort translator to route 'id' sorts through it, and add tests to verify correct ascending and descending ordering.

Entity relationship diagram for CVE ID sorting key

erDiagram
    VULNERABILITY {
        id TEXT
        id_sort_key TEXT
    }
    VULNERABILITY ||--o{ PAGINATED_RESULTS : contains
    VULNERABILITY ||--o{ COLUMNS : uses
    COLUMNS {
        id_sort_key TEXT
    }

File-Level Changes

Change: Introduce id_sort_key expression and translate 'id' sorts to numeric-aware key
File: modules/fundamental/src/vulnerability/service/mod.rs
  • Add a CASE expression (id_sort_key) that pads the trailing CVE sequence to 19 digits
  • Extend the filtering_with translator to map sort('id') to id_sort_key:asc/desc
  • Retain alphabetical sorting for non-CVE prefixes by falling back to the raw id

Change: Add vulnerability_numeric_sorting test for mixed identifier ordering
File: modules/fundamental/src/vulnerability/service/test.rs
  • Ingest a set of CVE, GHSA and ABC identifiers with varying numeric lengths
  • Verify ascending id sort returns ABC < CVE-2023-1234 < ... < GHSA
  • Verify descending id sort correctly reverses that order

Assessment against linked issues

Issue #1811, objective: Ensure that GET /api/v2/vulnerability?sort=id:desc returns vulnerabilities sorted by identifier such that CVE records are ordered numerically by their trailing sequence number, not lexicographically.
Issue #1811, objective: Add or update tests to verify correct numeric sorting of vulnerability identifiers, including CVE records and other formats.

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • Extract the CASE WHEN regex padding logic into a named constant or helper to improve readability and avoid inline SQL clutter.
  • Consider persisting the computed id_sort_key as a computed (or materialized) column and indexing it to avoid expensive regex/substring processing on each query.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `modules/fundamental/src/vulnerability/service/test.rs:545-540` </location>
<code_context>
+async fn vulnerability_numeric_sorting(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases such as malformed or non-standard CVE IDs.

Including malformed CVE IDs in tests will help verify that the sort key logic handles unexpected formats correctly.
</issue_to_address>
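Edge-case tests along those lines might look like the following sketch. The `id_sort_key` here is a hypothetical stand-in, not the PR's implementation. It assumes well-formed `CVE-<4-digit year>-<digits>` IDs get their sequence zero-padded to 19 digits and everything else falls back to the raw ID. The point is the set of malformed inputs being checked.

```rust
/// Hypothetical stand-in for the sort key under test: pads the sequence of a
/// well-formed `CVE-YYYY-N...` ID to 19 digits, returns anything else unchanged.
fn id_sort_key(id: &str) -> String {
    fn is_digits(s: &str) -> bool {
        !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
    }
    let parts: Vec<&str> = id.splitn(3, '-').collect();
    match parts.as_slice() {
        ["CVE", year, seq]
            if year.len() == 4 && is_digits(year) && is_digits(seq) && seq.len() <= 19 =>
        {
            format!("CVE-{year}-{seq:0>19}")
        }
        _ => id.to_string(),
    }
}

/// Malformed or unusual IDs must fall back to their raw value
/// instead of panicking or being padded incorrectly.
fn check_malformed_ids() {
    // Empty sequence, short year, non-digit sequence, lowercase prefix, missing sequence.
    for raw in ["CVE-2023-", "CVE-23-1234", "CVE-2023-12a4", "cve-2023-1234", "CVE-2023"] {
        assert_eq!(id_sort_key(raw), raw, "expected raw fallback for {raw}");
    }
    // More than 19 digits cannot be padded to 19, so it also falls back.
    let too_long = format!("CVE-2023-{}", "1".repeat(20));
    assert_eq!(id_sort_key(&too_long), too_long);
    // Leading zeros produce the same key as the unpadded number, i.e. a tie.
    assert_eq!(id_sort_key("CVE-2023-00042"), id_sort_key("CVE-2023-42"));
}
```

Such tests would pin down one consequence in particular. A malformed ID like `CVE-2023-12a4` keeps its raw form, so it sorts after every well-formed `CVE-2023-` key, because its `1` compares greater than the padding `0`.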


assert_eq!(vulns.items[0].advisories[1].score, None);
assert_eq!(vulns.items[0].advisories[1].severity, None);

Ok(())

Contributor

ctron commented Oct 9, 2025

The change looks good. I'm just not sure it is the right approach.

Yes, it makes it more convenient for CVE IDs. However, there are a lot of OSV sources which use a similar format:

Users would then see CVE IDs sorted differently from those, which would be hard to explain and understand.

If we can change this so that the ID is split into components and each part is sorted either as ASCII or numerically (if it's numeric only), I think this could work.
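The component-wise approach could be sketched as a comparator. This is a hypothetical illustration with made-up IDs. It assumes components are separated by `-`. A component made only of ASCII digits is compared numerically, with leading zeros ignored. Any other component is compared as plain ASCII text.

```rust
use std::cmp::Ordering;

/// True if the component consists only of ASCII digits.
fn is_numeric(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Compares a single component: numerically when both sides are numeric,
/// otherwise as plain ASCII (byte-wise) text.
fn cmp_component(a: &str, b: &str) -> Ordering {
    if is_numeric(a) && is_numeric(b) {
        // Ignore leading zeros; then a shorter digit string is a smaller number,
        // and equal lengths compare correctly byte-wise. No parsing, so no overflow.
        let (a, b) = (a.trim_start_matches('0'), b.trim_start_matches('0'));
        a.len().cmp(&b.len()).then_with(|| a.cmp(b))
    } else {
        a.cmp(b)
    }
}

/// Splits both IDs on '-' and compares them component by component.
/// If all shared components are equal, the ID with fewer components sorts first.
fn cmp_id(a: &str, b: &str) -> Ordering {
    let mut left = a.split('-');
    let mut right = b.split('-');
    loop {
        match (left.next(), right.next()) {
            (Some(x), Some(y)) => match cmp_component(x, y) {
                Ordering::Equal => continue,
                other => return other,
            },
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (None, None) => return Ordering::Equal,
        }
    }
}
```

A single rule applies to every prefix, so CVE, MAL, PSF and RUSTSEC IDs would all be ordered the same way. A zero-padded `0072` compares equal to `72`. The sketch does not cover expressing this in SQL or as a stored key.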

Author

lcarva commented Oct 9, 2025

> The change looks good. I'm just not sure it is the right approach.
>
> Yes, it makes it more convenient for CVE IDs. However, there are a lot of OSV sources which use a similar format:

Thank you. I wasn't aware of those. I think we could certainly generalize those patterns.

Out of those three examples, MAL and PSF do seem to follow the same pattern as CVE. RUSTSEC, if I'm reading the spec correctly, always requires 4 digits in the sequence section, hence 0072 in the example above. I'm not sure what happens when there are more than 9,999 RUSTSEC records in a single year.

For my own notes, the different sources are listed here. Interestingly, some sources follow a slightly different pattern: https://github.com/AlmaLinux/osv-database/tree/master/advisories/almalinux10

Let me explore a way to generalize this.

Do you have any performance concerns with this approach? We could introduce a new column that stores the computed sort ID but maybe that's a premature performance improvement right now.

Contributor

ctron commented Oct 10, 2025

> Do you have any performance concerns with this approach? We could introduce a new column that stores the computed sort ID but maybe that's a premature performance improvement right now.

I always have concerns. 😬 Especially about performance. But we do have scale tests, which can be triggered using /scale-test on a PR. Assuming they capture this use case (maybe we need to extend them), we can be reasonably sure that we don't impact performance. Or we understand what the impact is and can make a decision.

})
.add_expr(
"id_sort_key",
// Create a normalized sort key that preserves prefixes but sorts numbers numerically
Member


Excuse the drive-by comment (coming here from #2024 (comment)) 😄 I have a feeling it might be better to create an expression index on this normalized sort key rather than computing the key for every row at query time. That way the index could potentially be used to return an already sorted result set, avoiding a sort at query time because the index is already in key order.
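Here is a rough in-memory analogy for why an expression index helps. It is not how PostgreSQL's B-tree works internally, and `normalize` and `SortKeyIndex` are hypothetical names. The normalized key is computed once per row when the row is written, and entries are kept in key order. Reading in ascending or descending order is then just a forward or backward scan, with no per-query key computation or sort.

```rust
use std::collections::BTreeSet;

/// Hypothetical normalized key: zero-pads every purely numeric `-` component
/// (up to 19 digits) so that string order matches numeric order per component.
fn normalize(id: &str) -> String {
    id.split('-')
        .map(|part| {
            if !part.is_empty() && part.len() <= 19 && part.bytes().all(|b| b.is_ascii_digit()) {
                format!("{part:0>19}")
            } else {
                part.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("-")
}

/// Rough analogy of an expression index: entries stay ordered by
/// (normalized key, original id) from the moment they are inserted.
#[derive(Default)]
struct SortKeyIndex {
    entries: BTreeSet<(String, String)>,
}

impl SortKeyIndex {
    /// The key is computed once, when the entry is written.
    fn insert(&mut self, id: &str) {
        self.entries.insert((normalize(id), id.to_string()));
    }

    /// Reads IDs in index order; ascending or descending is a forward or
    /// backward scan, and no sorting happens here.
    fn scan(&self, descending: bool) -> Vec<&str> {
        let ids = self.entries.iter().map(|(_, id)| id.as_str());
        if descending {
            ids.rev().collect()
        } else {
            ids.collect()
        }
    }
}
```

The trade-off mirrors the database case. Each insert pays for computing and storing the key, and reads that sort by it do no extra work.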

Author

lcarva commented Oct 17, 2025

Pushed a change to make the sorting handle different vulnerability IDs. It looks like I broke some tests, I'll have a look at those next, but wanted to share the hmm... creative solution.

lcarva and others added 2 commits October 31, 2025 14:33
CVE records follow a specific format where the last segment represents a
numerical sequence. To properly sort CVE records, we must treat this
sequence segment differently than the rest of the record ID.

fixes guacsec#1811

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Luiz Carvalho <[email protected]>
Author

lcarva commented Oct 31, 2025

> Pushed a change to make the sorting handle different vulnerability IDs. It looks like I broke some tests, I'll have a look at those next, but wanted to share the hmm... creative solution.

The test failures were caused by my local setup: I didn't have the expected locale set on my system. LC_ALL=C cargo test does the job. I expect it will pass here as well.
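Here is why the locale can matter. In the C locale, text collation is plain byte order, so every uppercase ASCII letter sorts before every lowercase one. Other locales may apply linguistic rules instead, for example treating case as only a secondary difference, which can move IDs with lowercase prefixes. Rust's `str` ordering is also byte-wise, so it is a convenient reference for C-locale expectations. This sketch uses made-up IDs. That the test expectations were written against C-locale ordering is an assumption here.

```rust
/// Sorts IDs by raw byte values, which is the ordering the C/POSIX collation
/// uses and what Rust's `Ord` for `str` implements.
fn c_locale_order<'a>(ids: &[&'a str]) -> Vec<&'a str> {
    let mut sorted = ids.to_vec();
    sorted.sort();
    sorted
}
```

Byte order puts `ABC-1`, `CVE-2023-1` and `GHSA-abcd-efgh-ijkl` before `abc-1`, since `A` (65) through `Z` (90) precede `a` (97).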

If we want the approach of using expressions at query time, I believe the changes here achieve that. It would be great for someone with access to approve running the workflows and maybe run /scale-test.


Development

Successfully merging this pull request may close these issues.

GET /api/v2/vulnerability sorted by identifier returns unexpected results

3 participants