
Conversation

@crozzy
Contributor

@crozzy crozzy commented Jan 13, 2026

There is a case, during periods when lots of deletes are happening in the VEX data, where files are deleted after we fetch the changes.csv file but before we actually try to read the changes in the referenced VEX file.

Signed-off-by: crozzy <[email protected]>
@crozzy crozzy requested a review from a team as a code owner January 13, 2026 17:54
@crozzy crozzy requested review from hdonnay and removed request for a team January 13, 2026 17:54
@hdonnay
Member

hdonnay commented Jan 13, 2026

I'm disinclined to merge this: what happens when a VEX file isn't deleted, but just missing? Will it just never get picked up until it re-appears in the changes.csv?


@crozzy
Contributor Author

crozzy commented Jan 13, 2026

> I'm disinclined to merge this: what happens when a VEX file isn't deleted, but just missing? Will it just never get picked up until it re-appears in the changes.csv?

It won't be touched in the DB; it'll just stay in its original state. If it has genuinely changed, that change should be picked up the next time the archive is updated (and we ingest it).

I suppose it's a trade-off: nothing getting updated in an updater run (but potentially getting picked up on the next run), or everything apart from the missing/bad-timing-deleted stuff being updated and the remainder being stuck in a funk until the next archive update. I'm also happy to wait until the secdata issues have been resolved.

@hdonnay
Member

hdonnay commented Jan 13, 2026

> If it has genuinely changed, that change should be picked up the next time the archive is updated (and we ingest it).

But this won't happen, as the archive is only used on cold-start. I'm worried about an advisory change being impossible to notice until the file is touched again.

@crozzy
Contributor Author

crozzy commented Jan 13, 2026

Yeah, you're right, it would be in limbo until it's updated again.

```go
	break
case http.StatusNotFound:
	// We don't want to fail the entire fetch if an advisory is not found.
	// It can happen if the advisory has just been deleted.
```
Contributor

@jvdm jvdm Jan 13, 2026


"Just been deleted" seems to include planned deletions but also unexpected deletions (e.g. security data errors), right? Not failing here takes care of both scenarios?

Contributor Author

@crozzy crozzy Jan 13, 2026


It would account for both scenarios, with the caveat being what Hank said: if it wasn't a legitimate delete, that change wouldn't be represented in the DB (and even if the file is added back and differs from the original, those differences won't be represented in the DB until, or unless, it's changed again).

Contributor


Yes. And for use cases where the data is being added "from scratch" (e.g. air-gapped envs, or exports of vulnerabilities) this would not be a problem, as added-back entries would mean recovery, right?

Contributor Author


Right, in those cases we don't have to "keep state"; we just have to represent the current state.

Contributor

@jvdm jvdm Jan 13, 2026


The problem is if the deletion is invalid (not "legit").

When consuming a feed, if part of the feed payload is invalid according to the spec, shouldn't the consumer ignore the invalid part, still ingest the rest, and still report the feed as "updated"?

@hdonnay
Member

hdonnay commented Jan 13, 2026

If we want to catch this, we could fetch the changes.csv and the deletions.csv and process them in their entirety before downloading any VEX files.

This would catch the case where a CVE was created, updated, then retracted. The 404 handling would then happen as is in main currently, though.

@crozzy
Contributor Author

crozzy commented Jan 13, 2026

> If we want to catch this, we could fetch the changes.csv and the deletions.csv and process them in their entirety before downloading any VEX files.
>
> This would catch the case where a CVE was created, updated, then retracted. The 404 handling would then happen as is in main currently, though.

When entries are added to the deletions.csv they are removed from the changes.csv, which is why we're seeing errors on different files as the deletions are being processed. Therefore requesting the changes.csv and the deletions.csv together will give you the behaviour we have today. The edge case we were hitting (I think) is:

  • a file is changed, we request changes.csv and the file is listed, the file is deleted, we try to process the file, 404

If we request both together:

  • a file is changed, we request changes.csv/deletions.csv and the file is listed in changes and not in deletions, the file is deleted, we try to process the file, 404.

@crozzy
Contributor Author

crozzy commented Jan 13, 2026

Because we're requesting the status files before acting on them, I'm not sure there is a way to definitively say whether something should exist. We'd either have to re-request the changes file every time we request a changed VEX file to check it's still listed (even then it's not atomic, it just greatly reduces the staleness window), or take the lax approach: if it's a 404, assume the file no longer exists because it was deleted in the window since we requested the changes.

@hdonnay
Member

hdonnay commented Jan 13, 2026

I guess I'm trying to guard against the case where the csv files are changed in the "incorrect" order w/r/t the deletion of the VEX file. As it stands, there are three logical events:

  1. changes.csv is updated
  2. deletions.csv is updated
  3. VEX file is deleted

Because we process changes and fetch VEX files then process deletions, we assume that these have to happen in the order above, but that's not specified anywhere. Using the csvs to generate one "view" of the state of the VEX files might make it more resilient?

I dunno, this gets very distsys very fast. It'd be much nicer if they atomically swapped between revisions of the VEX corpus.


Another, perhaps worse, approach would be to restart the (little-p) process when we detect an inconsistency.

@crozzy
Contributor Author

crozzy commented Jan 13, 2026

> I guess I'm trying to guard against the case where the csv files are changed in the "incorrect" order w/r/t the deletion of the VEX file. As it stands, there are three logical events:
>
> 1. `changes.csv` is updated
> 2. `deletions.csv` is updated
> 3. VEX file is deleted
>
> Because we process changes and fetch VEX files then process deletions, we assume that these have to happen in the order above, but that's not specified anywhere. Using the csvs to generate one "view" of the state of the VEX files might make it more resilient?
>
> I dunno, this gets very distsys very fast. It'd be much nicer if they atomically swapped between revisions of the VEX corpus.
>
> Another, perhaps worse, approach would be to restart the (little-p) process when we detect an inconsistency.

The problem is, even if prodsec did atomically swap all the data at once (which I suspect is not the case), we still have a race condition: we've requested the changes.csv file (and potentially the deletions.csv) before processing it, so there is always the potential that things have changed or disappeared in that window. If we started the process again it might work, but it also might not, if deletions again fall into the deadly void.

It's like we'd want to make transactional something that cannot logically fit in a transaction (the retrieval of what has changed and what those changes are).

@hdonnay
Member

hdonnay commented Jan 26, 2026

So what's the takeaway here?

I think this should probably not get merged.

If a VEX file should exist per the changes.csv but doesn't, it would become "invisible" to the system until it was updated again. This would be effectively undetectable and would mean an advisory is missing for an unbounded amount of time.

@dcaravel

Agreed: after chatting about this, skipping over 404s seems risky unless we occasionally re-sync data from the main VEX archive.

@crozzy crozzy closed this Jan 26, 2026
