alilleybrinker

The affected array is an array containing product objects, which must at minimum include an "identifier" (which may be a composite identifier composed of multiple fields) along with a set of version bounds or a default status. Products may also specify an assortment of additional fields which further constrain the applicability of the CVE to its intended target hardware or software.

Previously, the set of identifiers available was:

  • A vendor and product
  • A collectionURL and packageName

This commit adds support for a new identifier, called packageURL, which uses the purl (Package URL) specification. The commit adds it as a new field on the product type, with a description and examples, and updates the data constraints on the product type in two ways. First, packageURL becomes an option for fulfilling the identifier requirement already in place on the type. Second, the new packageURL field cannot be mixed with the existing collectionURL or packageName fields, as they are redundant with packageURL and including both increases the possibility of data inconsistency within a single CVE record.

This inclusion of a new packageURL type which can be used instead of the existing pair of collectionURL and packageName would require consumers of CVE records to update their logic both to accept the new field, and to use it in places where they may today use the pair of collectionURL and packageName.

This commit does not include a regular expression to parse Package URLs specifically. Rather, it reuses the existing uriType schema. So we can be sure after validating CVE records against this updated record format that the packageURL field is a URL, but not that it is a valid Package URL per the Package URL specification. It would be the responsibility of CVE Services to further validate the field to ensure values match the Package URL specification. We do not perform this validation in-schema due to the complexity of expressing the validation in the form of a regular expression.
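To illustrate the gap between "is a URI" and "is a Package URL", here is a minimal sketch using only the Python standard library. The function names `looks_like_uri` and `looks_like_purl` are hypothetical, the URI check is only a loose stand-in for the schema's uriType, and the purl check covers only the rough `pkg:<type>/<name>` shape. It is not a conforming purl parser and is not what CVE Services does.

```python
from urllib.parse import urlsplit


def looks_like_uri(value: str) -> bool:
    """Loose stand-in for a URI check: a scheme followed by something."""
    parts = urlsplit(value)
    return bool(parts.scheme) and value != parts.scheme + ":"


def looks_like_purl(value: str) -> bool:
    """Rough shape check for pkg:<type>/<name>; not a full purl parser."""
    if not value.startswith("pkg:"):
        return False
    # Drop the subpath and qualifiers, then any slashes right after the scheme.
    remainder = value[len("pkg:"):].split("#", 1)[0].split("?", 1)[0].lstrip("/")
    purl_type, sep, path = remainder.partition("/")
    # A purl needs a type and at least a name after it.
    return bool(purl_type) and bool(sep) and bool(path.strip("/"))


# An ordinary web URL is a URI, but it is not a purl.
print(looks_like_uri("https://example.com/x"), looks_like_purl("https://example.com/x"))
# A purl is also a URI.
print(looks_like_uri("pkg:pypi/django"), looks_like_purl("pkg:pypi/django"))
```

A value that passes only the first check is the kind of input that schema validation alone would accept, which is why the further validation is left to CVE Services.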

This work is submitted as an alternative formulation of the design proposed in the draft RFD on software identifiers [1], and as an alternative to the existing proposals for making the cpeApplicability structure generic [2] (instead of it being CPE-specific) and enhancing this new generic applicability structure with support for Package URLs [3].

If this change is accepted, then [2] and [3] should not be accepted.

[1]: CVEProject#407
[2]: CVEProject#391
[3]: CVEProject#397

Signed-off-by: Andrew Lilley Brinker <[email protected]>
@alilleybrinker
Author

alilleybrinker commented May 22, 2025

Open questions:

  • Question 1: Should versions be permitted within a packageURL? (thanks @dwelch2344) (summarized in #409 (comment))
    • Option 1: Permit versions in packageURL, do not permit both a version in a packageURL and versions in the product object's versions field.
    • Option 2: Do not permit versions in packageURL, only permit versions in the product object's versions field.
    • Consensus of the group: Option 2, do not permit versions in packageURL.
  • Question 2: Should the packageURL field be able to fulfill the "identifier-like" requirement in a product object? (thanks @ElectricNroff) (summarized in #409 (comment))
    • Option 1: Yes, it can fulfill that requirement.
    • Option 2: No, it can't fulfill that requirement.
    • Group did not achieve consensus between the two options. Resolved to pursue Option 2, as it's the more conservative choice.
  • Question 3: Should the affected array disallow creating CVE Records which only use the OmniBOR Artifact ID for an identifier? (thanks @ccoffin) (summarized in #409 (comment))
    • Option 1: Yes, it should disallow creating CVE Records which only provide an OmniBOR Artifact ID.
    • Option 2: No, it should allow creating CVE Records which only provide an OmniBOR Artifact ID.
    • Proposed and achieved consensus on Option 1, disallowing producing CVE Records which only use "fine-grained" identifiers.

@alilleybrinker
Author

alilleybrinker commented May 22, 2025

Summary: Should versions be permitted within a packageURL?

(This is a summary of a discussion during the Quality Working Group meeting on May 22nd. Positions described here are not necessarily my own, merely an attempt at summarizing the pros and cons of the two options identified during the discussion.)

Option 1: Permit versions in packageURL, do not permit both a version in a packageURL and versions in the product object's versions field.

  • Pros
    • More flexible for CVE Numbering Authorities (CNAs). They can either directly provide a PURL with a version embedded if a CVE affects only a single version, or they can omit the single version and use the versions object to express a version range.
  • Cons
    • More difficult for CVE data consumers, who have to conditionally check two places (inside the packageURL and in the versions field) for the version information, which must then also be handled differently, as one is a single version while the other is a version range.

Option 2: Do not permit versions in packageURL, only permit versions in the product object's versions field.

  • Pros
    • Simpler for CVE data consumers. Version information in a product block is only ever expressed within the versions field. Handling of version data does not need to be conditional.
  • Cons
    • Less flexible for CNAs. If a CVE affects only a single software version, they'd nonetheless need to include a versions field which uses the constraints to express that single version. This makes expressing the single-version case more complex.

Option 3: Permit version information in both the packageURL and in the product object's versions field.

Note

This was added after the initial two based on a comment from @MrMegaZone.

  • Pros
    • Simplest to implement for CVE Services. Does not require parsing package URLs in the CVE Services codebase to validate the presence or lack of a version specifier.
    • Most flexible for CNAs, they can put version information in the packageURL, in the versions field, or in both.
  • Cons
    • Makes it possible that version information in a packageURL could contradict the version data in the versions field. CVE consumers would have to handle that case. The QWG could give guidance on which field would take priority, but consumers would have to correctly implement that priority.

Additional note: any decision made here should be made with consideration of other future formats. The same problem would arise for any "versionable" software identifier added to the affected array which can also itself embed version information. Whichever option is chosen, it should be taken as a decision for future formats as well, to maximize the consistency of the design.

@ElectricNroff

Points in favor of Option 2:

The Option 1 Pro is unimportant because it is rare for a vulnerability to affect only one version. A group at MITRE researched this across OSV data in approximately 2024 and found that, with a few exclusions such as malicious-packages, less than 10% of vulnerabilities affected only one version. Also, there is probably no realistic category of CNA where, across all of their CVE Records, vulnerabilities affect only one version. Thus, every CNA may need a process for creating a range.

Large parts of the open source community are familiar with a constraint where Purl must omit a version, e.g., "The purl field is a string following the Package URL specification that identifies the package, without the @​version component" at https://ossf.github.io/osv-schema/

Placing a version number inside the Purl will, in some cases, increase the complexity of the secondary parser. For example, a Debian package version may have a ':' character (see the https://www.debian.org/doc/debian-policy/ch-controlfields.html#version page) that must be encoded as %3A within a Purl. Depending on my usage of Purl data in CVE, placing all versioning outside of the Purl may mean that my secondary parser doesn't need to know about URL encoding. (Admittedly, most people would still choose a secondary parser that handles 100% of the Purl complexities.)

With ADPs, there can be different reporting practices that cause different data providers to publish different version information for the same vulnerability. To simplify a consumer's process of comparing the differences, it could be helpful if the data were expressed in the same way. For example, one provider might express only 8.0.0 while another expresses "version":"6.0.0","lessThanOrEqual":"8.0.0".
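The Debian epoch point above can be made concrete with a small sketch. It assumes Python's standard `urllib.parse.quote` as a stand-in for purl percent-encoding (it encodes more characters than the Package URL specification strictly requires), and the helper name `encode_purl_version` and the sample version string are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_purl_version(version: str) -> str:
    """Percent-encode a version for embedding in a purl (approximation only)."""
    return quote(version, safe="")


# A Debian-style version with an epoch, which contains a ':' character.
debian_version = "1:2.30-10"

# Embedded in a purl, the ':' has to be written as %3A.
encoded = encode_purl_version(debian_version)
print(encoded)

# Kept in the versions field instead, the raw string is used as-is,
# so a consumer reading only that field never has to URL-decode anything.
versions_entry = {"version": debian_version, "status": "affected"}
print(versions_entry["version"] == unquote(encoded))
```

With all versioning outside the purl, the decoding step only matters to consumers who parse the identifier itself.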

@MrMegaZone
Collaborator

Allowing versioning in both locations, but only one at a time, seems like it would complicate validation for CVE Services as well. We'd need to parse the Purl to determine if version information is present. If it is, then we'd have to ensure no version information is provided in the existing elements; if it is not, then we'd need to ensure it is. Basically, making sure it appears once and only once.

I'd prefer requiring version information in the existing fields and recommending it not be in the Purl - but we don't have to enforce that. We could say in cases of disagreement the non-Purl version information always wins and CNAs SHOULD NOT include version info in the Purl. (As opposed to MUST NOT.) But even if we make it a MUST NOT, it seems easier to check the Purl and reject the record if a version is included, and ensure the other fields are filled in. And it makes things more consistent downstream, for example on the website, etc.
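As a rough sketch of the MUST NOT check described above: reject a product whose purl embeds a version, and make sure version information is present in the existing fields. The function names and error strings are hypothetical, and the version detection is a simplification (it looks for an "@" in the last path segment, and treats a leading "@" as an unencoded npm-style scope). A real implementation would more likely rely on a maintained purl parser.

```python
def purl_has_version(purl: str) -> bool:
    """Heuristic: does the purl carry an @version component?"""
    # Strip the subpath (#...) and qualifiers (?...), which may contain '@'.
    remainder = purl.split("#", 1)[0].split("?", 1)[0]
    last_segment = remainder.rstrip("/").rsplit("/", 1)[-1]
    # Ignore an '@' at index 0, which is assumed to be a scope, not a version.
    return "@" in last_segment[1:]


def validate_product(product: dict) -> list[str]:
    """Return a list of problems found in one product object."""
    errors = []
    purl = product.get("packageURL")
    if purl is not None and purl_has_version(purl):
        errors.append("packageURL must not embed a version")
    if "versions" not in product and "defaultStatus" not in product:
        errors.append("product needs versions or defaultStatus")
    return errors


# A purl with an embedded version is rejected.
print(validate_product({"packageURL": "pkg:pypi/django@4.2.1", "defaultStatus": "affected"}))
# A versionless purl with version data in the existing field is accepted.
print(validate_product({
    "packageURL": "pkg:pypi/django",
    "versions": [{"version": "4.2.1", "status": "affected"}],
}))
```

The check only needs to answer "is a version present", which is simpler than comparing an embedded version against the versions field.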

@sentientcellcluster

(Masters) I support the 2nd option - not permitting version information in the PURL string.

@alilleybrinker
Author

@MrMegaZone I've amended the list of options to reflect a third one based on your comment. If you disagree with my summary or think I've missed any pros or cons, let me know.

@alilleybrinker
Author

alilleybrinker commented May 27, 2025

I've drafted example CVE records representing different possibilities for embedding version information in a packageURL, in the versions field, or in both, including the possibility for inconsistency. You can view it here.


Having thought about this over the weekend, I think option 2: only permitting version information in the versions field, is the right one.

It's the simplest option for CVE data consumers, who only need to look in one place for version data. It doesn't permit data inconsistency (an example of which you can see in the Gist I linked above).

Also, given that the decision we'd reach here should be applied in the same way to any future identifier formats added, not permitting versions embedded in an identifier avoids a future where CVE consumers would need to parse many different identifier formats to get all possible version information.

The current design of the affected array cleanly separates "identifier-like" fields (vendor and product, collectionURL and packageName) from "version-like" fields (versions or defaultStatus). Permitting version information in an identifier would blur this distinction.

This would mean that CNAs need to put version information in the versions field rather than embedding it in a purl, but that's already where they put version information today.

So, in short, only permitting version information in the versions field is:

  • Simpler for CVE data consumers; they only need to look in one place.
  • Simpler for CVE consumers in the future; they don't have to parse different identifier formats to get version data out.
  • Keeps a clear distinction between "identifier-like" fields and "version-like" fields.
  • Avoids possible inconsistent data if version information can be given in two places which may accidentally contradict each other in the same record.
  • Matches the expectation for CNAs today: version information always goes in the versions field.

It would require CVE Services to parse purls, but there is an official JavaScript package for doing just that, which CVE Services could use easily: https://github.com/package-url/packageurl-js

@darakian

+1 for option 2. Decoupling the version information would allow for version types to be added/vetted/validated asynchronously from package identifiers and would allow for a compact representation of a single software with multiple version ranges.

@alilleybrinker
Author

alilleybrinker commented May 29, 2025

One more point in favor of disallowing versions in purls is that OSV disallows them today: https://ossf.github.io/osv-schema/#affectedpackage-field. It would be ideal to match the behavior of other ecosystems where possible to increase interoperability.

This amends the specification for Package URLs to no longer
permit versions in them, updating the description and
examples for the `packageURL` field of the `product` object.

The actual enforcement of this requirement will need to be
done within CVE Services.

Signed-off-by: Andrew Lilley Brinker <[email protected]>
@alilleybrinker
Author

The PR has been updated to no longer permit versions in Package URLs, going with Option 2 outlined above.

@alilleybrinker
Author

alilleybrinker commented Jun 5, 2025

One new issue has been raised by @ElectricNroff (Matt Power) concerning forward compatibility. In short: the proposal today represents a forward-compatibility hazard, because of how it modifies the required field constraints on the product object of the affected array.

The current proposal turns this...

"anyOf": [
    {"required": ["vendor", "product"]},
    {"required": ["collectionURL", "packageName"]}
]

Into this...

"anyOf": [
    {"required": ["vendor", "product"]},
    {"required": ["collectionURL", "packageName"]},
    {"required": ["packageURL"]}
]

The concern is that by adding a new option to the anyOf constraint, you enable users of the updated schema to produce records which won't be accepted by users of the prior schema. This means that CVE consumers would need to adapt their CVE record parsing logic to use the new schema in order to be able to accept all new records coming from CVE Services.
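The hazard can be sketched with a tiny evaluator for the two `anyOf` constraints shown above. This is a minimal illustration in plain Python, not a JSON Schema validator. The helper `satisfies_any_of` and the sample product are hypothetical, and only the `required` keyword is modeled.

```python
# The constraint before this proposal.
OLD_ANY_OF = [
    {"required": ["vendor", "product"]},
    {"required": ["collectionURL", "packageName"]},
]

# The constraint after this proposal: one more way to satisfy it.
NEW_ANY_OF = OLD_ANY_OF + [{"required": ["packageURL"]}]


def satisfies_any_of(obj: dict, any_of: list) -> bool:
    """True if at least one branch has all of its required keys present."""
    return any(all(key in obj for key in branch["required"]) for branch in any_of)


# A product that identifies itself only through a purl.
purl_only = {"packageURL": "pkg:pypi/django", "defaultStatus": "affected"}

print(satisfies_any_of(purl_only, NEW_ANY_OF))  # accepted by the updated schema
print(satisfies_any_of(purl_only, OLD_ANY_OF))  # rejected by the prior schema
```

Under these assumptions, the purl-only product passes the new constraint and fails the old one, which is exactly the case a consumer still validating against the prior schema would reject.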

In this case, a consumer who wants to make use of the Package URL would likely also need to parse the packageURL field (though they could opt to ignore it, or to use it only for exact matching without parsing).

On parsing, it's worth noting that many languages have libraries which offer a facility to generate parsers based on a JSON schema, so in these languages the challenge of updating to a new parser version is generally limited.

This is a distinct compatibility issue from the backwards-compatibility concerns reflected in SchemaVer. Because of this, Matt also raised a view that SchemaVer may not be the right versioning scheme to use for the CVE Record Format. For that reason, I've also removed mention of it from the "RFD to introduce an RFD process" (#405), so the issue of what versioning scheme to use can be fully addressed in a more focused future RFD.


Given the future-compatibility hazard this proposal represents (and the same hazard exists for the OmniBOR PR [#410]), there's a decision to make, with trade-offs:

  • Option 1: keep the proposal as-is, with packageURL as an alternative way to fulfill the "identifier-like" requirement for the product object.
  • Option 2: modify the proposal, meaning packageURL can't fulfill the "identifier-like" requirement.

The following is a breakdown of pros and cons for the two options:

  • Option 1: let packageURL fulfill the "identifier-like" requirement
    • Pros:
      • More likely to drive adoption of the packageURL field, as it can be used as a more-easily-cross-referenced and more powerful alternative to the existing collectionURL and packageName fields.
      • Avoids risk of data inconsistency between packageURL and collectionURL/packageName by ensuring the fields can't be used together.
    • Cons:
      • CVE consumers must upgrade to use the new schema once it's adopted by CVE Services; may need to modify their parsers.
  • Option 2: do not let packageURL fulfill the "identifier-like" requirement
    • Pros:
      • CVE consumers do not need to upgrade their parsers, assuming their parsers do not balk at the presence of unknown fields in an object.
    • Cons:
      • Risk of data inconsistency between packageURL and collectionURL/packageName.
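The Option 2 assumption about unknown fields can be sketched as follows. This is a hypothetical consumer model, not any real parser: `KnownProduct` stands for a consumer written against the prior schema, and the sample product (vendor "Example", product "Widget") is made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class KnownProduct:
    """Fields a consumer of the prior schema already knows about."""
    vendor: Optional[str] = None
    product: Optional[str] = None
    collectionURL: Optional[str] = None
    packageName: Optional[str] = None


def parse_tolerant(raw: dict) -> KnownProduct:
    """Keep the known fields and silently ignore anything else."""
    known = {f.name for f in fields(KnownProduct)}
    return KnownProduct(**{k: v for k, v in raw.items() if k in known})


def parse_strict(raw: dict) -> KnownProduct:
    """Raises TypeError if the object contains any unknown field."""
    return KnownProduct(**raw)


# Under Option 2, packageURL is an extra field next to an existing identifier.
option_2_product = {
    "vendor": "Example",
    "product": "Widget",
    "packageURL": "pkg:pypi/widget",
}

print(parse_tolerant(option_2_product))
```

A tolerant consumer keeps working unchanged, while a strict one would fail on the same record, so the Option 2 benefit depends on how existing consumers treat unknown fields.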

There is a broader question here of whether the CVE Record Format should try to preserve forward compatibility between versions. SchemaVer, which the QWG has previously had at least a loose consensus on adopting, does not consider forward compatibility for versioning, so even ADDITION-level changes are permitted to break forward compatibility by doing things like adding to the set of fields which fulfill a closed set of requirements, or adding new variants to an enum.

So we have both the immediate question of how to resolve this packageURL field issue, and the broader question of how to handle forward compatibility and relatedly what versioning scheme to use (and whether SchemaVer is the right choice).

@zmanion
Contributor

zmanion commented Jun 10, 2025

I'm not sure if we've answered this permanently, but do we expect CVE Services (and the pile-of-JSON-files cache in GitHub) to only accept and return one version of the schema? Or can different records have different schema versions (and Services/file cache would accept/return different schema versions)? The reason I ask:

you enable users of the updated schema to produce records which won't be accepted by users of the prior schema

If there is one schema (version) at a time, and it gets updated, then I don't think use of the prior schema is a concern. Those potential "users of the prior schema" would need to use the updated schema, in which case they would accept packageURL and whatever other new software component IDs we choose to add. TLDR, I don't think forward/backward compatibility is an issue if the Program declares and moves to a single new schema version.

@alilleybrinker
Author

@zmanion I’m not certain if records embed a reference to the schema they’re written against, but I think CVE Services enforces one schema version for input, and downstream consumers would still need to make changes to adapt to new schema versions, otherwise they would be unable to parse newer records when a forward compatibility breaking change is made.

@alilleybrinker
Author

In today's QWG meeting folks raised a desire for examples of what both OmniBOR (see #410) and purl would look like in a product object when they can fulfill vs. cannot fulfill the "identifier-like" requirement for that object.

Here are examples! https://gist.github.com/alilleybrinker/de8f56ba599609f7867bc5589c73505b

@alilleybrinker
Author

alilleybrinker commented Jun 24, 2025

One additional issue, raised by @ccoffin (see: https://gist.github.com/alilleybrinker/de8f56ba599609f7867bc5589c73505b?permalink_comment_id=5633750#gistcomment-5633750) is whether there should be a constraint to disallow the creation of CVE Records where the only identifier used within the affected array is an OmniBOR Artifact ID (using the proposed artifactID/artifactType fields).

Personally, I do not think this constraint is necessary, for two reasons:

  • I think it's very unlikely CNAs would want to submit such records. CNAs already fill in identifier information (vendor/product and/or collectionURL/packageName) and have built out the tooling and processes to do so. It's hard to imagine any CNAs dropping those processes entirely to solely use OmniBOR data.
  • Even if they did, I don't think this would be a data quality issue but a data completeness issue. It's not that using only OmniBOR Artifact IDs would be wrong, but that it may not be useful enough for CVE data consumers.

As I see it, at the very least this is a constraint which could be added later if we observe CVE Records being issued with insufficient identifier information.
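For reference, a sketch of what such a constraint could look like if it were added later. It assumes the proposed artifactID/artifactType field names from #410 and treats only vendor/product and collectionURL/packageName as "coarse" identifiers. The function names and placeholder values are hypothetical, and whether packageURL should also count is left open here.

```python
# Identifier groups assumed to count as "coarse" (not fine-grained).
COARSE_IDENTIFIERS = [
    ("vendor", "product"),
    ("collectionURL", "packageName"),
]


def has_coarse_identifier(product: dict) -> bool:
    """True if the product carries at least one complete coarse identifier."""
    return any(all(key in product for key in group) for group in COARSE_IDENTIFIERS)


def record_uses_only_artifact_ids(affected: list) -> bool:
    """True if no product in a non-empty affected array has a coarse identifier."""
    return bool(affected) and not any(has_coarse_identifier(p) for p in affected)


# Placeholder values; real Artifact IDs are not reproduced here.
only_artifacts = [{"artifactID": "<artifact-id>", "artifactType": "<artifact-type>"}]
mixed = [{"vendor": "Example", "product": "Widget", "artifactID": "<artifact-id>"}]

print(record_uses_only_artifact_ids(only_artifacts))
print(record_uses_only_artifact_ids(mixed))
```

A check of this shape could run in CVE Services or as a later schema constraint, so deferring it does not rule it out.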


Meta note: I'm tracking this issue here, although it may more properly be tracked in #410, because this PR is already used for tracking two other issues and I want to reduce splintering of the conversation.

@alilleybrinker
Author

(This really applies to the RFD #407, but I am pasting it here as well for completeness)

Note

Final Comment Period

A Final Comment Period (FCP) has been called for this proposal. This is a final opportunity to raise new concerns with the proposal.

The FCP will close at 2pm PDT / 5pm EDT July 3rd, at the end of the Quality Working Group Meeting.

@ccoffin ccoffin merged commit e4fe53e into CVEProject:develop Aug 21, 2025
1 check passed
@alilleybrinker alilleybrinker deleted the alilleybrinker/affected-purls branch August 21, 2025 18:47