licenses: allow mix of multiple SPDX expressions AND/OR multiple named/spdx licenses #454

Open
jkowalleck opened this issue Apr 29, 2024 · 22 comments · May be fixed by #582
Assignees
Labels
promote to tc54 Promote to Ecma Technical Committee 54 proposed core enhancement RFC notice sent A public RFC notice was distributed to the CycloneDX mailing list for consideration RFC vote accepted
Milestone

Comments

@jkowalleck
Member

jkowalleck commented Apr 29, 2024

current situation (CDX 1.6):

  • it is allowed to have EITHER one spdx license expression OR multiple named/spdx licenses. see spec
  • each license (expression/named/spdx) can have an acknowledgement - none or "declared" or "concluded". see spec
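
A minimal sketch of the either/or rule described in the two bullets above, assuming license entries are plain dicts shaped like the CycloneDX JSON (`{"expression": ...}` or `{"license": {...}}`). The function name `is_valid_cdx16_license_list` is hypothetical and not part of any CycloneDX library; the authoritative rule is the JSON schema itself.

```python
from typing import Any


def is_valid_cdx16_license_list(licenses: list[dict[str, Any]]) -> bool:
    """Check the CDX 1.6 rule: EITHER one expression OR any number of id/name licenses."""
    expressions = [e for e in licenses if "expression" in e]
    named_or_ids = [e for e in licenses if "license" in e]
    if len(expressions) + len(named_or_ids) != len(licenses):
        return False  # entry with an unknown shape, or with both keys at once
    if expressions:
        # Exactly one expression is allowed, and nothing else next to it.
        return len(expressions) == 1 and not named_or_ids
    return True  # zero or more id/name licenses


only_ids = [{"license": {"id": "MIT"}}, {"license": {"name": "Apache Software License"}}]
one_expression = [{"expression": "MIT OR Apache-2.0"}]
mixed = [{"license": {"id": "MIT"}}, {"expression": "MIT OR Apache-2.0"}]

print(is_valid_cdx16_license_list(only_ids))        # allowed today
print(is_valid_cdx16_license_list(one_expression))  # allowed today
print(is_valid_cdx16_license_list(mixed))           # the mix this ticket asks for
```

The `mixed` list is the shape that the current rule rejects.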

problem

the current situation does not allow the following:

  • situation A: multiple declared license ids (like in Python license trove classifiers) and one concluded expression
    • Declared spdx license id "MIT" - as set in the project manifest
    • Declared spdx license id "PostgreSQL" - as set in the project manifest
    • Declared named license "Apache Software License" - as set in the project manifest
    • License evidence from the README file: "choose the license that applies best to you: PostgreSql or MIT or Apache2"
    • Concluded spdx license expression "(MIT OR PostgreSQL OR Apache-2.0)" - (this is just an example for spec reasons, this is not a real-world law case!)
  • situation B: declared expression and concluded expression
    • Declared spdx expression "MIT OR (GPL-3.0 OR GPL-2.0)"
    • Concluded spdx expression "(GPL-3.0-only AND LGPL-2.0-only)" - after some lawyer checked for actual applied situation - (this is just an example for spec reasons, this is not a real-world law case!)
  • situation C: declared expression and concluded spdx id
    • Declared spdx expression "GPL-3.0-or-later OR GPL-2.0"
    • Concluded spdx id "GPL-3.0-only" - after some lawyer checked for actual applied situation - (this is just an example for spec reasons, this is not a real-world law case!)

▶ more regarding reasons and practical use cases here: #454 (comment)

request

allow the following:

  • multiple SPDX expressions at the same time
  • allow mix of SPDX expression and other licenses at the same time

possible results

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.x",
  // ...
  "components": [

    {
      // ... component properties ...
      "licenses": [
        // situation A -- (this is just an example for spec reasons, this is not a real-world law case!)
        { "license": {
            "id": "MIT",
            "acknowledgement": "declared" } },
        { "license": {
            "id": "PostgreSQL",
            "acknowledgement": "declared" } },
        { "license": {
            "name": "Apache Software License",
            "acknowledgement": "declared" } },
        { "expression": "(MIT OR PostgreSQL OR Apache-2.0)",
          "acknowledgement": "concluded" }
      ]
    },

    {
      // ... component properties ...
      "licenses": [
        // situation B -- (this is just an example for spec reasons, this is not a real-world law case!)
        { "expression": "MIT OR (GPL-3.0 OR GPL-2.0)",
          "acknowledgement": "declared" },
        { "expression": "(GPL-2.0-only AND LGPL-2.0-only)",
          "acknowledgement": "concluded" }
      ]
    },

    {
      // ... component properties ...
      "licenses": [
        // situation C -- (this is just an example for spec reasons, this is not a real-world law case!)
        { "expression": "GPL-3.0+ OR GPL-2.0",
          "acknowledgement": "declared" },
        { "license": {
            "id": "GPL-3.0-only",
            "acknowledgement": "concluded" } }
      ]
    },

    {
      // ... component properties ...
      "licenses": [
        // example with concluded LicenseRef -- (this is just an example for spec reasons, this is not a real-world law case!)
        { "license": {
            "id": "MIT",
            "acknowledgement": "declared" } },
        { "license": {
            "name": "Amazon Software License",
            "acknowledgement": "declared" } },
        { "expression": "MIT AND LicenseRef-.amazon.com.-AmznSL-1.0",
          "acknowledgement": "concluded" }
      ]
    }

  ]
}
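
To illustrate how a consumer might read such a mixed list, here is a small sketch that groups the entries of situation A by their acknowledgement. It assumes the entries are already parsed into Python dicts; the helper names and the `"unspecified"` fallback label are hypothetical and not defined by CycloneDX. Note that the acknowledgement sits inside the `license` object for id/name entries, but next to `expression` for expression entries.

```python
from collections import defaultdict
from typing import Any


def acknowledgement_of(entry: dict[str, Any]) -> str:
    """Read the acknowledgement from either entry shape."""
    if "expression" in entry:
        return entry.get("acknowledgement", "unspecified")
    return entry.get("license", {}).get("acknowledgement", "unspecified")


def label_of(entry: dict[str, Any]) -> str:
    """Return the expression, the SPDX id, or the license name of an entry."""
    if "expression" in entry:
        return entry["expression"]
    lic = entry["license"]
    return lic.get("id") or lic.get("name", "")


def group_by_acknowledgement(licenses: list[dict[str, Any]]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in licenses:
        groups[acknowledgement_of(entry)].append(label_of(entry))
    return dict(groups)


situation_a = [
    {"license": {"id": "MIT", "acknowledgement": "declared"}},
    {"license": {"id": "PostgreSQL", "acknowledgement": "declared"}},
    {"license": {"name": "Apache Software License", "acknowledgement": "declared"}},
    {"expression": "(MIT OR PostgreSQL OR Apache-2.0)", "acknowledgement": "concluded"},
]

print(group_by_acknowledgement(situation_a))
```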
@jkowalleck
Member Author

related: CycloneDX/cyclonedx-python#826

@villaflaminio

villaflaminio commented Nov 8, 2024

I agree with the problem, especially case C.
I would like to be able to have a lawyer or some automatic mechanism review the product SBOM so that I can identify which licence to "refer" to for each individual component based on my use case.
It is more than reasonable to have two licenses defined for a product, e.g. if the component is used in open source projects then the license is type A, otherwise if it is used for commercial purposes then the license is type B.
This is just an example.
And, of course, in the report I would like to avoid replacing the license expression declared by whoever produced the component in question.
One solution might be to allow one license expression plus exactly one license that specifies ‘acknowledgement concluded’, but I don't know whether this would create problems elsewhere.

@jkowalleck
Member Author

I will be working on a solution for this, planned for CycloneDX 1.7.
All comments, discussions, and any help are welcome.

@Joerki

Joerki commented Dec 19, 2024

The example should also mention "LicenseRef-" items to clearly state that such simple expressions are also SPDX expressions by definition of SPDX, not only compound expressions.

Situation B:
After discussing the topic "concluded" license with our OSCO (Open Source Compliance Officer) my understanding is that a concluded license must not contain a compound statement with "OR". A decision must be taken.
The example would be the same as "MIT OR GPL-3.0-only OR GPL-2.0-only"
Based on current existing licenses it is compatible with "MIT OR GPL-2.0-or-later" or "MIT OR GPL-2.0+"

Concluded license possibilities:

  • MIT (MIT)
  • GPL-2.0 (GPL-2.0-only)
  • GPL-3.0 (GPL-3.0-only)
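
A naive sketch of how the individual choices could be listed from an OR-only expression such as the one in situation B. It assumes the expression contains only `OR` and parentheses (so grouping does not change the set of alternatives) and refuses anything with `AND` or `WITH`. The function name `or_alternatives` is hypothetical; this is not a full SPDX expression parser.

```python
def or_alternatives(expression: str) -> list[str]:
    """Split an OR-only license expression into its alternatives.

    Parentheses are dropped because OR is associative; expressions that
    contain AND or WITH are rejected instead of being guessed at.
    """
    tokens = expression.replace("(", " ").replace(")", " ").split()
    if any(token in ("AND", "WITH") for token in tokens):
        raise ValueError("only OR-only expressions are supported in this sketch")
    return [token for token in tokens if token != "OR"]


print(or_alternatives("MIT OR (GPL-3.0 OR GPL-2.0)"))
```

Each returned id is one candidate for a concluded license in the sense described above.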

Situation C:
"GPL-2.0+" cannot be used as concluded license based on the legal information I have (see also link below). It has an implicit "OR". As a consumer I have the choice to select between GPL-2.0 or a higher version (GPL-3.0) and I have to make a choice in my context. The GPL licenses are not fully compatible, so it is a legal requirement to select what applies for me as concluded license.

Clear statement here (Standard License Header):
https://spdx.org/licenses/GPL-2.0-or-later.html

Another significant reference:
BSI-TR-03183-2 Version 2.0.0 (10.10.2024)
Federal Office for Information Security (BSI)
https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/TechGuidelines/TR03183/BSI-TR-03183-2-2_0_0.html
Chapter 6.1 (License identifiers and expressions)
They refer to the SPDX annexes about usage, and they also recommend using the ScanCode LicenseDB from AboutCode!

@jkowalleck
Member Author

re: #454 (comment)

Thanks for pointing that out.
I tried to incorporate your remarks in the description, and also added a section "expected results" to showcase the results.
Could you check whether all your use cases are represented, @Joerki ?

Anyway, the examples exist for showcasing needed options(requirements). As stated

(this is just an example for spec reasons, this is not a real-world law case!)

@jkowalleck
Member Author

"GPL-2.0+" cannot be used as concluded license based on the legal information I have (see also link below). It has an implicit "OR". As a consumer I have the choice to select between GPL-2.0 or a higher version (GPL-3.0) and I have to make a choice in my context. The GPL licenses are not fully compatible, so it is a legal requirement to select what applies for me as concluded license.

this might be true for OBOM and the like, but not for SBOM.
let's say I am pulling a library from the internet, and I have a lawyer analyzing the license posture, and they conclude "I've analyzed the license's README and other evidence, and I conclude the license to be free to choose from A or B ... -- which one is applied can be decided only after the lib was integrated into a system."

@jkowalleck jkowalleck changed the title licenses: allow mix of multiple SPDX expressions AND multiple named/spdx licenses licenses: allow mix of multiple SPDX expressions AND/OR multiple named/spdx licenses Jan 20, 2025
@jkowalleck jkowalleck linked a pull request Jan 20, 2025 that will close this issue
8 tasks
@jkowalleck
Member Author

jkowalleck commented Jan 20, 2025

please review the proposed implementation changes to enable the features outlined in this very ticket:
#582

@pombredanne

@jkowalleck Since you are tackling licensing, I would suggest considering these:

  1. Deprecate entirely "single SPDX license IDs, and named licenses" and only use license expressions in the future (dropping entirely "single SPDX license IDs, and named licenses" in 1.8 or 1.9). The support of "A list of SPDX licenses and/or named licenses and/or SPDX License Expression." is confusing IMHO. Expressions are enough, especially if combined with 2. and 3. Names and ids are mostly unusable for any kind of automation.
  2. Add explicit support for scancode LicenseRef (they are recommended in Germany and India) and local LicenseRef as per @Joerki
  3. Add support for tracking the actual license text and notices which is missing today and needed for local LicenseRef
  4. Also adopt expressions for the various spec artifacts like this
    "$comment" : "CycloneDX JSON schema is published under the terms of the Apache License 2.0.",

@pombredanne

@Joerki re: #454 (comment)

I have to make a choice in my context. The GPL licenses are not fully compatible, so it is a legal requirement to select what applies for me as concluded license.

Actually the mileage may vary. Some lawyers may prefer to convey the choice, some may prefer to pick one. So I am not sure there is any legal requirement to select a license, except in a few rare cases.

One such case is with FreeType https://gitlab.freedesktop.org/freetype/freetype/-/blob/b1f47850878d232eea372ab167e760ccac4c4e32/LICENSE.TXT#L9 ...

This means that you must choose one of the two licenses described
below, then obey all its terms and conditions when using FreeType 2 in
any of your projects or products.

... but this is a really rare case. Most exceptions allow you to keep or drop the exception, and many licenses that allow future versions may not even give you the choice to drop future versions (like the Eclipse license).

@jkowalleck
Member Author

jkowalleck commented Feb 10, 2025

re: #454 (comment)

A friendly reminder: do not mix multiple scopes in one ticket. Deprecating one structure, and enhancing another are two different scopes, and should be handled in separate tickets and separate pull requests.

re topic 1

Deprecate entirely "single SPDX license IDs, and named licenses" and only use license expressions in the future (dropping entirely "single SPDX license IDs, and named licenses" in 1.8 or 1.9). The support of "A list of SPDX licenses and/or named licenses and/or SPDX License Expression." is confusing IMHO. Expressions are enough, especially if combined with 2. and 3. Names and ids are mostly unusable for any kind of automation.

This is not in the scope of this very ticket. I do not intend to switch scopes of this ticket.
I am planning to bring a "fix"/"feature" to CycloneDX that helps users ASAP. I do not intend to drive a years-long discussion about alternatives and such.

Anybody interested in pursuing the goal of deprecating current license structures, while enhancing others is free to do so - I have no problem with that. It all starts with creating a ticket for that and championing/driving the topic. Go ahead. :-)

re topic 2

Add explicit support for scancode LicenseRef

license-refs are anything the SPDX ID is not assigned, right?
this is what named licenses are for.

re topic 3

Add support for tracking the actual license text and notices which is missing today and needed for local LicenseRef

when used in an SPDX expression: this is what #554 is about.
Otherwise: this is what named licenses are for - they have an attachment already

re topic 4:

completely different topic.


PS: feel free to open separate tickets for all the things you mentioned. I will not work on them in this very ticket due to ticket scope.

@Joerki

Joerki commented Feb 10, 2025

Add explicit support for scancode LicenseRef

license-refs are anything the SPDX ID is not assigned, right? this is what named licenses are for.

This is confusing for me. The "name" (also considering the example in the CycloneDX definition) is a human readable title (-> see SPDX). I could print it together with the license text in an attribution report for a customer.

@jkowalleck
Member Author

jkowalleck commented Mar 6, 2025

RFC notice sent. https://groups.io/g/CycloneDX/message/304 https://cyclonedx.slack.com/archives/CVA0G10FN/p1738861352347449

Public RFC period ends March 6, 2025

Period ended today, change was promoted to TC54.

In today's TC54 meeting, some members rejected the feature as it is today, and rejected this original promoted feature. Reason: they expressed, that allowing multiple licenses was a bad idea.

The discussion about that shall be continued in this very ticket.

@jkowalleck
Member Author

jkowalleck commented Mar 6, 2025

@pombredanne , Please provide your reasoning for rejecting this core enhancement regarding the necessary license mix as a whole.

@pombredanne

@jkowalleck you wrote:

license-refs are anything the SPDX ID is not assigned, right?
this is what named licenses are for.

"LicenseRef" are ids designed to be used in expressions, so they may need to be defined as their own reference entries for such use in an expression, but not for use as a licensing statement alone, so they would not fit IMHO in the "named license" section.

Please provide your reasoning for rejecting this core enhancement regarding the necessary license mix as a whole.

About allow mix of SPDX expression and other licenses at the same time:

  • A mix of license ids and license expressions is redundant. An expression with a single license is the same as a license id, therefore I would drop entirely the support for single ids in a list and certainly not promote this further. The list of ids is also outdated and it would be best managed externally rather than in the spec JSON schema.

  • Enabling mixing license names and expressions is not a happy thing as it promotes unclear licensing which cannot be made sense of. A name carries no clear meaning in general, therefore I would not extend support for more mixing of license names with license expressions, as it brings a whole set of new ambiguities. Therefore not mixing "named licenses" with expressions, as it is today, is a better approach.

About multiple SPDX expressions at the same time exclusive of the above:

  • I am fine with this as long as it is clear that no two expressions can have the same "acknowledgement". For instance something like:

EITHER (a list of one or more SPDX License Expressions, each with a different acknowledgement) OR (a list of named licenses. The relationship between the licenses in this list is undefined.)
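
The constraint suggested here (no two expressions sharing the same acknowledgement) could be checked roughly as follows. This is a sketch under assumptions: entries are parsed dicts, the function name is hypothetical, and expressions without an acknowledgement are grouped under an invented `"unspecified"` label, which the proposal above does not address.

```python
from collections import Counter
from typing import Any


def duplicated_acknowledgements(licenses: list[dict[str, Any]]) -> list[str]:
    """Return the acknowledgement values used by more than one expression entry."""
    counts = Counter(
        entry.get("acknowledgement", "unspecified")
        for entry in licenses
        if "expression" in entry
    )
    return sorted(ack for ack, count in counts.items() if count > 1)


situation_b = [
    {"expression": "MIT OR (GPL-3.0 OR GPL-2.0)", "acknowledgement": "declared"},
    {"expression": "(GPL-2.0-only AND LGPL-2.0-only)", "acknowledgement": "concluded"},
]
two_declared = [
    {"expression": "MIT", "acknowledgement": "declared"},
    {"expression": "Apache-2.0", "acknowledgement": "declared"},
]

print(duplicated_acknowledgements(situation_b))   # no duplicates
print(duplicated_acknowledgements(two_declared))  # "declared" is used twice
```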

Also, closely related, we are also missing optional but important notices and license texts that I should be able to add together with and in support of a license expression.

And as mentioned above LicenseRef need something.

One approach that would be backward compatible could be to add optional, extra attributes with each license expression entry such as:

  • extra data about any licenseref, including optional text, and URLs
  • actual notices and attributions

This would not be mixed with a list "named licenses"

So things are not really simple to resolve here, and in all cases adding more opportunities for ambiguities on top of existing ambiguities is not something that helps with clarity.

So in recap, my recommendation is to:

  • Keep the exclusion between a list of named licenses and a list of expressions
  • Deprecate using a list of license ids in favor of expressions
  • Allow a list of expressions with unique "acknowledgement" across all expressions
  • Expand attributes under an expression for licenseref/exceptionref and notice text, and possibly some evidence (e.g., some raw text or license tag or similar found in some manifest or file)

@mjherzog

Circling back to this specific ticket, I think that there is a problem with ambiguous use of "expression".
In the v1.6 spec expression refers to a valid SPDX license expression - https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i1_items_i0_expression. A valid SPDX license expression requires that all elements are one of:

  • Valid SPDX license ids from the SPDX License List
  • LicenseRef-somename identifiers where LicenseRef-somename refers to license text documented somewhere else in the CDX SBOM or the SPDX document
  • Valid operators: AND, OR, WITH
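
The three element kinds listed above can be turned into a simple token check. This is only a sketch: `KNOWN_IDS` is a tiny illustrative subset chosen for this example (it deliberately leaves out deprecated ids such as GPL-2.0, which is an assumption of the sketch and not a statement about the SPDX License List), exception ids after `WITH` and the `+` suffix are not handled, and the function name is hypothetical.

```python
import re

OPERATORS = {"AND", "OR", "WITH"}
# Illustrative subset only; the real SPDX License List is far larger.
KNOWN_IDS = {"MIT", "Apache-2.0", "GPL-2.0-only", "GPL-3.0-only", "GPL-3.0-or-later"}
# LicenseRef-<idstring>, optionally prefixed by DocumentRef-<idstring>:
LICENSE_REF = re.compile(r"^(DocumentRef-[A-Za-z0-9.-]+:)?LicenseRef-[A-Za-z0-9.-]+$")


def unknown_tokens(expression: str, known_ids: set[str] = KNOWN_IDS) -> list[str]:
    """Return tokens that are neither operators, known ids, nor LicenseRef-style ids."""
    tokens = expression.replace("(", " ").replace(")", " ").split()
    return [
        token
        for token in tokens
        if token not in OPERATORS
        and token not in known_ids
        and not LICENSE_REF.match(token)
    ]


print(unknown_tokens("MIT OR (GPL-3.0 OR GPL-2.0)"))
print(unknown_tokens("MIT AND LicenseRef-.amazon.com.-AmznSL-1.0"))
```

With this subset, the first call reports the two ids that are missing from `KNOWN_IDS`, and the second reports nothing.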

I can best explain this with reference to the "backport of a newly added valid example for CDX 1.7." from https://github.com/CycloneDX/specification/pull/582/files - referring specifically to the JSON example.

  • For situation-B, "expression": "MIT OR (GPL-3.0 OR GPL-2.0)" is not a valid SPDX license expression. "MIT OR (LicenseRef-GPL-3.0 OR LicenseRef-GPL-2.0)" would be a valid SPDX license expression.
  • There is a similar problem with situation-C where "GPL-3.0-or-later OR GPL-2.0" is not a valid SPDX license expression - "GPL-3.0-or-later OR LicenseRef-GPL-2.0" would be a valid SPDX license expression.

So either the definition of "expression" needs to change such that it does not need to be a valid SPDX license expression or
we need two fields:

  • spdx_license_expression; requires all elements to be valid
  • other_license_expression: allows the expression to mix SPDX license ids (including those in LicenseRef format) and names

@jkowalleck jkowalleck added the RFC notice sent A public RFC notice was distributed to the CycloneDX mailing list for consideration label Mar 16, 2025
@jkowalleck
Member Author

jkowalleck commented Mar 18, 2025

re: #454 (comment)

I am fine with this [...]

👍

[...] as long as it is clear that no two expressions can have the same "acknowledgement". For instance something like:

this is not in the scope of this very ticket. in fact, this would be a completely different core enhancement - one that is impractical for real-world use:

  • for example in python packaging, you have multiple licenses declared - and in the readme you then describe what that means.
  • some packages might declare a custom license and a license-expression. they just did this, and merging those would already be a conclusion. so we would have a declared license, a declared license expression, and a concluded expression. -- this is exactly what this ticket is about to enable.
  • i might need to triage a license, and ask two lawyers to find a conclusion - so i would have two conclusions then.

you see, not only is the thing you want a different scope, but also not trivial.

Also, closely related, we are also missing optional but important notices and license texts that I should be able to add together with and in support of a license expression.

Again, different scope. Please get more involved in the standard you are triaging.
see

So things are not really simple to resolve here [...]

actually, they are. if you kept with the original scope. :-)
your recommendations clearly show that you don't have a problem with this very ticket, but you are thinking about something completely different. (feel free to describe and champion this)

I conclude, that you actually have no issue with this proposed change/fix. right, @pombredanne ?

@jkowalleck
Member Author

jkowalleck commented Mar 18, 2025

re: #454 (comment)

So either the definition of "expression" needs to change such that it does not need to be a valid SPDX license expression or we need two fields:

  • spdx_license_expression; requires all elements to be valid
  • other_license_expression: allows the expression to mix SPDX license ids (including those in LicenseRef format) and names

@mjherzog ,
this PR is about allowing multiple license-expressions along with multiple license-id/-name.
if you want something else changed, please write a ticket for that and prototype the needed changes.

PS: at the time this ticket was created, the SPDX license IDs you called out as invalid were valid, and they still are - even though they were deprecated

@mrutkows
Contributor

mrutkows commented Mar 19, 2025

I do not believe the suggested solution (proposal) is the correct approach. Declared licenses are already incorporated in individual component objects. Licenses that appear in the top-level metadata should be the concluded license(s) which should be the deterministic location for legal risk assessment. Further annotations, meaningful to other domains, can be conveyed in each license's properties. In addition, I would love to walk through real-world license problems (seen in practice from a scanner/generator) against a real project source code or artifact that is causing this request to provide guidance on how to use the existing fields and perhaps reflect it as an anonymized use case in a guide.

@mjherzog

@jkowalleck I understand your point. Our AboutCode license-expression library treats the deprecated SPDX license-ids as unknown licenses in general, but it would probably be a good idea to recognize cases like GPL-2.0 as deprecated aliases rather than unknown.
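
A rough sketch of what "deprecated alias rather than unknown" could look like. This is not the API of the AboutCode license-expression library; the function name, the return shape, and the small `CURRENT_IDS` set are assumptions made for illustration. The four alias pairs are the replacements named on the SPDX License List for these deprecated GPL ids.

```python
from typing import Optional

# Deprecated SPDX ids and their current replacements (GPL family only).
DEPRECATED_ALIASES = {
    "GPL-2.0": "GPL-2.0-only",
    "GPL-2.0+": "GPL-2.0-or-later",
    "GPL-3.0": "GPL-3.0-only",
    "GPL-3.0+": "GPL-3.0-or-later",
}

# Illustrative subset of current ids; a real tool would load the full list.
CURRENT_IDS = {"MIT", "GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only", "GPL-3.0-or-later"}


def classify(license_id: str) -> tuple[str, Optional[str]]:
    """Classify an id as current, deprecated (with its replacement), or unknown."""
    if license_id in CURRENT_IDS:
        return ("current", license_id)
    if license_id in DEPRECATED_ALIASES:
        return ("deprecated", DEPRECATED_ALIASES[license_id])
    return ("unknown", None)


for candidate in ("MIT", "GPL-2.0", "Some-Custom-License"):
    print(candidate, classify(candidate))
```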

@jkowalleck
Member Author

jkowalleck commented Mar 20, 2025

[...] I would love to walk through real-world license problems (seen in practice from a scanner/generator) against a real project source code or artifact that is causing this request to provide guidance on how to use the existing fields and perhaps reflect it as an anonymized use case in a guide.

the ticket's description includes reasoning and use cases already, but maybe they are too abstract.

Here are multiple real world scenarios that people are actually unable to fulfill with current CycloneDX, I was told at a conference a year ago:

  • cannot collect license evidence as SPDX expression and any additional licenses
    1. ORT users analyze a package A and gather license evidence. They find multiple different licenses, at least one of which is an SPDX expression ("MIT", "Apache-2.0", "(MIT or Apache-2.0)")
    2. they cannot model this scenario in current CycloneDX
  • cannot have declared SPDX expression and any concluded license at the same time
    1. ORT users analyze a package B, and find a declared license in a package manifest. This license is an SPDX expression. ("MIT or Apache-2.0")
    2. ORT users analyze a package B, and gather license evidences from package files headers. (all are "Apache-2.0")
    3. ORT user has a lawyer to decide about license situation. Lawyer concludes a license that is different from the declared SPDX expression. ("Apache-2.0")
    4. they cannot model this scenario in current CycloneDX
  • cannot have licenses and a concluded SPDX expression at the same time
    1. ORT users analyze a package C, and find multiple declared licenses in a package manifest. The licenses are no SPDX expression. (multiple Python classifiers for license - "License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication", "License :: OSI Approved :: MIT License" which translate to individual SPDX license IDs: "CC0-1.0" , "MIT")
    2. ORT users analyze a package C, and gather license evidences from package's readme file. ("This software is made available under a dual-license model: CC0 1.0 Universal (Public Domain Dedication) or the MIT License. Users may choose whichever license is most applicable to their legal jurisdiction and needs.")
    3. ORT user has a lawyer to decide about the general license situation. The lawyer says it's a general license that is an SPDX expression. ("(CC0-1.0 OR MIT)")
    4. they cannot model this scenario in current CycloneDX
  • cannot have declared SPDX expression and a concluded SPDX expression at the same time
    1. ORT users analyze a package D, and in the package manifest they find a declared SPDX expression ("GPL-2.0-only WITH Classpath-exception-2.0")
    2. ORT user has a lawyer to decide about the applied license situation. Lawyer concludes a license that is the same as the declared SPDX expression ("GPL-2.0-only WITH Classpath-exception-2.0")
    3. they cannot model this scenario in current CycloneDX
  • cannot have declared SPDX expression and any other licenses with custom properties
    --> see also [FEATURE]: "properties" on an SPDX-expression license #549

Please be aware, that some of these practical use-cases work with declared/concluded licenses,
but also with license evidences - which are unacknowledged and do not have this kind of property.

Please be aware, that CycloneDX already is able to fulfill all these practical use-cases for licenses that are not SPDX expressions. We already allow only multiple licenses in parallel.

@jkowalleck
Member Author

jkowalleck commented Mar 29, 2025

re: #454 (comment)

  • I am fine with this as long as it is clear that no two expressions can have the same "acknowledgement". For instance something like:

EITHER (a list of one or more SPDX License Expressions, each with a different acknowledgement) OR (a list of named licenses. The relationship between the licenses in this list is undefined.)

we did not have this baked into the spec, yet - it was always possible to have multiple "declared" licenses, along with multiple "concluded" licenses.
Changing this will be a breaking change, so I will create a separate ticket for this.

@Joerki

Joerki commented Apr 6, 2025

But here is my important point (@mrutkows, you asked for the "real world"):

My impression is that at the time when CycloneDX was invented, only the given component outcome types were considered for applying licenses.

But the Linux/Unixoid ecosystem was not properly considered! Here, the authors and licenses are assigned to source code file sets that form a component. So, a proper mapping of Linux components (see e.g. Debian's machine-readable copyright files) would need to be done and thought differently.

A proper mapping would require to have optional authors/copyrights for each license item (id/name/expression), and not just author/authors or a single copyright string for a single component.

An example:
https://metadata.ftp-master.debian.org/changelogs//main/o/openssl/openssl_3.0.15-1~deb12u1_copyright
(This is a short example, others are much longer.)
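
To make the idea of per-license copyrights more concrete, here is a purely hypothetical data model. None of these class or field names exist in CycloneDX; they only sketch how file sets, license items, and copyright holders could be tied together in the way a Debian copyright file does. The sample data is invented and not taken from the OpenSSL file linked above.

```python
from dataclasses import dataclass, field


@dataclass
class LicenseItem:
    """One license entry (id, name, or expression) carrying its own copyrights."""
    value: str
    kind: str  # "id", "name", or "expression"
    copyrights: list[str] = field(default_factory=list)


@dataclass
class FileSet:
    """A group of source files covered by one license item."""
    patterns: list[str]
    license: LicenseItem


def copyrights_by_license(file_sets: list[FileSet]) -> dict[str, list[str]]:
    """Collect copyright statements per license value across all file sets."""
    result: dict[str, list[str]] = {}
    for file_set in file_sets:
        result.setdefault(file_set.license.value, []).extend(file_set.license.copyrights)
    return result


file_sets = [
    FileSet(["*"], LicenseItem("Apache-2.0", "id", ["Example Project Authors"])),
    FileSet(["debian/*"], LicenseItem("GPL-2.0-or-later", "id", ["Example Packager"])),
    FileSet(["contrib/*"], LicenseItem("Apache-2.0", "id", ["Example Contributor"])),
]

print(copyrights_by_license(file_sets))
```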

Existing scanners do this incorrectly in many cases. I made experiments with Syft analyzing components that list many authors in their "copyright" files who contribute code based on different single or multi-licenses EACH.
I created SBOMs in different formats (syft-json, CycloneDX, SPDX). Results were sobering: not unique and not correct. Multi-licenses of the same author(s) were partially declared with "or" in CycloneDX and "and" in SPDX in the output. I used the same source for all SBOM scans.

The conclusion for me is that the tools cannot perform properly for license compliance if the tool authors do not have a proper understanding and SBOM formats that satisfy the needs of all significant ecosystems.

The Linux ecosystem is for me a good example that license lists make sense where single name/id entries and expressions may coexist. This is still human readable, and every license entry can be attributed with further information (like we tried with the expression-based extension: texts etc.). If authors and copyright information could also be specified in each atomic license item, on top of the stuff already discussed, a potential loss of information or falsification in SBOMs might be reduced significantly. Tools would work more accurately after an update when programmers can have a proper mapping in mind.

6 participants