feat: convert scan code to .opossum #174
Merged
Changes from 18 commits
24 commits
c362dd6 feat: Dummy cli interface for scan code files (Hellgartner)
9b658aa feat: extract opossum metadata from scancode json (abraemer)
acbbd90 fix: pydantic.Extra is deprecated use string literal instead (abraemer)
f13c6de refactor: separate model validation from metadata creation (abraemer)
ecbb705 feat: convert scancode to opossum (abraemer)
7587ba9 test: fix test_cli_with_multiple_files on windows? (abraemer)
08a5d9d test: some tests for the conversion[WIP] (abraemer)
6e9ca39 fix: prepend paths with "/" to account for root (abraemer)
bba92a7 test: ensure validity of the tree of Nodes (abraemer)
76e22fd test: verify attribution mapping (abraemer)
771fa4a feat: consider only best match when generating attributions (abraemer)
a22b073 feat: Merge branch 'main' into feat-convert-scan-code (abraemer)
9164d18 test: minor cleanup test_cli.py (abraemer)
93c4580 test: improve attribution mapping test by using deepcopy (abraemer)
f75399e feat(scancode): include license name in key for attribution for bette… (abraemer)
5008333 test: E2E test for scancode with comparison agains a reference (abraemer)
71539e4 test(scancode): create test get_attribution_info (abraemer)
1e6a75d refactor(scancode): remove dependency on Resource and go directly to … (abraemer)
1415bc6 refactor: improve user-facing texts (abraemer)
51ee53d refactor: address review comments (abraemer)
7d4996b refactor: address further comments (abraemer)
52e305b refactor: Rename the LicenseDetections to improve clarity (abraemer)
b1e6f6b refactor: further improvements (abraemer)
5dd967f Merge branch 'main' into feat-convert-scan-code (abraemer)
Empty file.
@@ -0,0 +1,66 @@
# SPDX-FileCopyrightText: TNG Technology Consulting GmbH <https://www.tngtech.com>
#
# SPDX-License-Identifier: Apache-2.0


import json
import logging
import sys
import uuid

from opossum_lib.opossum.opossum_file import (
    Metadata,
    OpossumInformation,
)
from opossum_lib.scancode.model import ScanCodeData
from opossum_lib.scancode.resource_tree import (
    convert_to_opossum_resources,
    create_attribution_mapping,
    scancode_to_file_tree,
)


def convert_scancode_to_opossum(filename: str) -> OpossumInformation:
    logging.info(f"Converting scancode to opossum {filename}")

    try:
        with open(filename) as inp:
            json_data = json.load(inp)
    except json.JSONDecodeError as jsde:

Review comment: unusual to give errors specific acronyms. generally best to avoid acronyms altogether or use standard ones, like

        logging.error(f"Error decoding json for file {filename}. Message: {jsde.msg}")
        sys.exit(1)
    except UnicodeDecodeError:
        logging.error(f"Error decoding json for file {filename}.")
        sys.exit(1)

    scanCodeData = ScanCodeData.model_validate(json_data)
    filetree = scancode_to_file_tree(scanCodeData)
    resources = convert_to_opossum_resources(filetree)
    externalAttributions, resourcesToAttributions = create_attribution_mapping(filetree)

    return OpossumInformation(
        metadata=create_opossum_metadata(scanCodeData),
        resources=resources,
        externalAttributions=externalAttributions,
        resourcesToAttributions=resourcesToAttributions,
        attributionBreakpoints=[],
        externalAttributionSources={},
    )


def create_opossum_metadata(scancode_data: ScanCodeData) -> Metadata:
    if len(scancode_data.headers) == 0:
        logging.error("ScanCode data is missing the header!")
        sys.exit(1)
    elif len(scancode_data.headers) > 1:
        logging.error(f"ScanCode data has {len(scancode_data.headers)} headers!")
        sys.exit(1)

    the_header = scancode_data.headers[0]

    metadata = {}
    metadata["projectId"] = str(uuid.uuid4())
    metadata["fileCreationDate"] = the_header.end_timestamp
    metadata["projectTitle"] = "ScanCode file"

    return Metadata.model_validate(metadata)
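
The review comment on the `except json.JSONDecodeError as jsde:` line concerns the name given to the caught exception. As a minimal sketch only, the same loading logic with a conventional short name might look like the following. The helper name `load_json_or_exit` and the exception name `e` are illustrative assumptions, not necessarily what the PR ended up using.

```python
import json
import logging
import sys
from typing import Any


def load_json_or_exit(filename: str) -> Any:
    """Hypothetical helper: load a JSON file, log an error and exit on failure."""
    try:
        with open(filename) as inp:
            return json.load(inp)
    except json.JSONDecodeError as e:  # conventional name instead of an ad-hoc acronym
        logging.error(f"Error decoding json for file {filename}. Message: {e.msg}")
        sys.exit(1)
    except UnicodeDecodeError:
        logging.error(f"Error decoding json for file {filename}.")
        sys.exit(1)
```

The behavior is unchanged compared to the diff above: both failure modes are logged and terminate the process with exit code 1.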
@@ -0,0 +1,19 @@
# SPDX-FileCopyrightText: TNG Technology Consulting GmbH <https://www.tngtech.com>
#
# SPDX-License-Identifier: Apache-2.0


import os.path

from pydantic import BaseModel
from pydantic_core import SchemaValidator


def path_segments(path: str) -> list[str]:
    path = os.path.normpath(path)
    return path.split(os.sep)


def check_schema(model: BaseModel) -> None:
    schema_validator = SchemaValidator(schema=model.__pydantic_core_schema__)
    schema_validator.validate_python(model.__dict__)
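
To illustrate what `path_segments` returns, here is a small standalone sketch that repeats the helper from the diff above and calls it on made-up example paths. The paths are assumptions chosen for illustration. Note that an absolute path yields a leading empty segment, which may be relevant to the commit that prepends "/" to account for the root.

```python
import os.path


def path_segments(path: str) -> list[str]:
    # Same logic as the helper in the diff above:
    # normalize first, then split on the platform separator.
    path = os.path.normpath(path)
    return path.split(os.sep)


# ".." components and duplicate separators are resolved by normpath.
print(path_segments("src/lib/../main.py"))  # ['src', 'main.py']

# A leading separator produces an empty first segment.
print(path_segments("/src/main.py"))  # ['', 'src', 'main.py']
```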
@@ -0,0 +1,151 @@
# SPDX-FileCopyrightText: TNG Technology Consulting GmbH <https://www.tngtech.com>
#
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from typing import Any

from pydantic import BaseModel


class Options(BaseModel, extra="ignore"):
    input: list[str]


class SystemEnvironment(BaseModel):
    operating_system: str
    cpu_architecture: str
    platform: str
    platform_version: str
    python_version: str


class ExtraData(BaseModel):
    system_environment: SystemEnvironment
    spdx_license_list_version: str
    files_count: int


class Header(BaseModel):
    tool_name: str
    tool_version: str
    options: Options
    notice: str
    start_timestamp: str
    end_timestamp: str
    output_format_version: str
    duration: float
    message: Any
    errors: list
    warnings: list
    extra_data: ExtraData


class ReferenceMatch(BaseModel):
    license_expression: str
    license_expression_spdx: str
    from_file: str
    start_line: int
    end_line: int
    matcher: str
    score: float
    matched_length: int
    match_coverage: float
    rule_relevance: int
    rule_identifier: str
    rule_url: Any


class LicenseDetection(BaseModel):
    identifier: str
    license_expression: str
    license_expression_spdx: str
    detection_count: int
    reference_matches: list[ReferenceMatch]


class Match(BaseModel):
    license_expression: str
    license_expression_spdx: str
    from_file: str
    start_line: int
    end_line: int
    matcher: str
    score: float
    matched_length: int
    match_coverage: float
    rule_relevance: int
    rule_identifier: str
    rule_url: Any


class LicenseDetection1(BaseModel):
    license_expression: str
    license_expression_spdx: str
    matches: list[Match]
    identifier: str


class Copyright(BaseModel):
    copyright: str
    start_line: int
    end_line: int


class Holder(BaseModel):
    holder: str
    start_line: int
    end_line: int


class Url(BaseModel):
    url: str
    start_line: int
    end_line: int


class File(BaseModel):
    path: str
    type: str
    name: str
    base_name: str
    extension: str
    size: int
    date: str | None
    sha1: str | None
    md5: str | None
    sha256: str | None
    mime_type: str | None
    file_type: str | None
    programming_language: str | None
    is_binary: bool
    is_text: bool
    is_archive: bool
    is_media: bool
    is_source: bool
    is_script: bool
    package_data: list
    for_packages: list
    detected_license_expression: str | None
    detected_license_expression_spdx: str | None
    license_detections: list[LicenseDetection1]
    license_clues: list
    percentage_of_license_text: float
    copyrights: list[Copyright]
    holders: list[Holder]
    authors: list
    emails: list
    urls: list[Url]
    files_count: int
    dirs_count: int
    size_count: int
    scan_errors: list


class ScanCodeData(BaseModel):
    headers: list[Header]
    packages: list
    dependencies: list
    license_detections: list[LicenseDetection]
    files: list[File]
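
To make the shape described by these models more concrete, the following sketch parses a heavily trimmed, hypothetical ScanCode-style document with the standard `json` module only. The field names are taken from the models above, but all values are made up, and most required fields are omitted, so this snippet would not pass `ScanCodeData.model_validate` as written.

```python
import json

# Hypothetical, trimmed-down input: only a handful of the fields declared
# by the models above are present, and every value is invented.
raw = """
{
  "headers": [
    {"tool_name": "scancode-toolkit", "end_timestamp": "2024-01-01T00:00:00.000000"}
  ],
  "files": [
    {"path": "project", "type": "directory", "detected_license_expression": null},
    {"path": "project/main.py", "type": "file", "detected_license_expression": "mit"}
  ]
}
"""

data = json.loads(raw)

# The converter in this PR expects exactly one header.
header = data["headers"][0]

# Map each regular file to its detected license expression (None if nothing was detected).
licensed_files = {
    entry["path"]: entry["detected_license_expression"]
    for entry in data["files"]
    if entry["type"] == "file"
}

print(header["end_timestamp"])
print(licensed_files)
```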
not new but inconsistent: here we use "opossum_files" while previously the same variable is just called "opossum".
I agree that this is inconsistent. I think I would prefer to name them all like format_files and made that choice consistently throughout cli.py.
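
As a rough illustration of the naming convention discussed in this thread, the sketch below uses `argparse` with entirely hypothetical option names. The project's actual `cli.py` is not shown on this page, so nothing here should be read as its real interface. The only point is that every input variable follows the same `<format>_files` pattern.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: the flags below are invented for this example.
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument(
        "--opossum", dest="opossum_files", action="append", default=[]
    )
    parser.add_argument(
        "--scan-code-json", dest="scancode_json_files", action="append", default=[]
    )
    return parser


args = build_parser().parse_args(
    ["--opossum", "a.opossum", "--scan-code-json", "b.json", "--opossum", "c.opossum"]
)

# Both destinations follow the same "<format>_files" naming scheme.
print(args.opossum_files)
print(args.scancode_json_files)
```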