feat(trivy): add normalized_id for cross-tool package matching #2388
Add PURL parsing, PEP 503 name normalization, and a normalized_id property to TrivyPackage nodes. This enables matching packages across tools (e.g., Trivy and Syft) despite naming differences like PyNaCl vs pynacl or jaraco.context vs jaraco-context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kunaal Sikka <kunaal@subimage.io>
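For readers unfamiliar with PEP 503, the name normalization it defines can be sketched in a few lines. The function name `pep503_normalize` is hypothetical and is not taken from this PR; only the regular expression and the lowercasing step come from PEP 503 itself.

```python
import re


def pep503_normalize(name: str) -> str:
    """Normalize a Python distribution name as PEP 503 specifies.

    Hypothetical helper for illustration; the PR's own function may differ.
    """
    # Collapse every run of '-', '_' and '.' into a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


print(pep503_normalize("PyNaCl"))          # pynacl
print(pep503_normalize("jaraco.context"))  # jaraco-context
```

With this rule, the two spellings from the description (`jaraco.context` and `jaraco-context`) normalize to the same string, which is what makes a cross-tool match possible.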
2 issues found across 4 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="cartography/models/trivy/package.py">
<violation number="1" location="cartography/models/trivy/package.py:26">
P3: The comment documents the normalized_id format without the optional namespace, but the implementation includes `{namespace/}` when present. Update the comment to match the actual format so readers don’t build incorrect assumptions.</violation>
<violation number="2" location="cartography/models/trivy/package.py:28">
P2: Rule violated: **Tests and documentation quality**
Document the new Package.normalized_id property in docs/root/modules/trivy/schema.md (and bold it as an indexed field). The schema docs currently omit this new indexed field, violating the "Exhaustive documentation" and schema table formatting requirements.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
pkg_id: PropertyRef = PropertyRef("PkgID")
# Normalized ID for cross-tool matching (format: {type}|{normalized_name}|{version})
# Uses PEP 503 normalization for Python packages
normalized_id: PropertyRef = PropertyRef("normalized_id", extra_index=True)
P2: Rule violated: Tests and documentation quality
Document the new Package.normalized_id property in docs/root/modules/trivy/schema.md (and bold it as an indexed field). The schema docs currently omit this new indexed field, violating the "Exhaustive documentation" and schema table formatting requirements.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/trivy/package.py, line 28:
<comment>Document the new Package.normalized_id property in docs/root/modules/trivy/schema.md (and bold it as an indexed field). The schema docs currently omit this new indexed field, violating the "Exhaustive documentation" and schema table formatting requirements.</comment>
<file context>
@@ -23,6 +23,9 @@ class TrivyPackageNodeProperties(CartographyNodeProperties):
pkg_id: PropertyRef = PropertyRef("PkgID")
+ # Normalized ID for cross-tool matching (format: {type}|{normalized_name}|{version})
+ # Uses PEP 503 normalization for Python packages
+ normalized_id: PropertyRef = PropertyRef("normalized_id", extra_index=True)
lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)
</file context>
Address review feedback:
- Add normalized_id (indexed), purl, and pkg_id to Trivy schema docs
- Fix comment to include optional {namespace/} in the format string
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kunaal Sikka <kunaal@subimage.io>
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="cartography/intel/trivy/util.py">
<violation number="1" location="cartography/intel/trivy/util.py:50">
P1: Rule violated: **General coding rules**
Do not silently catch and suppress parsing errors here. The rule requires failures to bubble up rather than returning a fallback when parsing fails.</violation>
</file>
try:
    parsed = PackageURL.from_string(purl)
except ValueError:
    return None
P1: Rule violated: General coding rules
Do not silently catch and suppress parsing errors here. The rule requires failures to bubble up rather than returning a fallback when parsing fails.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/intel/trivy/util.py, line 50:
<comment>Do not silently catch and suppress parsing errors here. The rule requires failures to bubble up rather than returning a fallback when parsing fails.</comment>
<file context>
@@ -43,52 +44,19 @@ def parse_purl(purl: str) -> dict | None:
- # Split type from rest
- type_end = rest.find("/")
- if type_end == -1:
+ try:
+ parsed = PackageURL.from_string(purl)
+ except ValueError:
</file context>
- try:
-     parsed = PackageURL.from_string(purl)
- except ValueError:
-     return None
+ parsed = PackageURL.from_string(purl)
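The difference the reviewer is pointing at can be illustrated with a small, self-contained sketch. Both functions below are hypothetical and use only the standard library; `parse_purl_type` is a deliberately simplified stand-in for a strict PURL parser and is not the `packageurl` implementation or the code in `util.py`.

```python
from typing import Optional


def parse_purl_type(purl: str) -> str:
    """Hypothetical strict parser: return the PURL type or raise ValueError."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a package URL: {purl!r}")
    pkg_type, sep, _ = purl[len("pkg:"):].partition("/")
    if not sep or not pkg_type:
        raise ValueError(f"package URL has no type: {purl!r}")
    return pkg_type.lower()


def type_or_none(purl: str) -> Optional[str]:
    """Swallowing variant: bad input and missing data both look like None."""
    try:
        return parse_purl_type(purl)
    except ValueError:
        return None


print(parse_purl_type("pkg:pypi/pynacl@1.5.0"))  # pypi
print(type_or_none("not-a-purl"))                # None
```

With the swallowing variant, a malformed PURL produces the same `None` a caller would see for an absent one, so the bad record passes through silently. Calling the strict function directly stops the run at the first bad input, which is the behavior the rule asks for.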
Summary
- Add `cartography/intel/trivy/util.py` with PURL parsing, PEP 503 name normalization, and `make_normalized_package_id()` for creating ecosystem-aware normalized IDs
- Add `normalized_id` property (with extra index) to `TrivyPackageNodeProperties` for cross-tool matching
- Set `normalized_id` during Trivy scan transform in both `transform_scan_results()` and `transform_all_packages()`

Context
This is a prerequisite for future ontology unification. The `normalized_id` format (`{type}|{namespace/}{normalized_name}|{version}`) handles:

Test plan
- `uv run pytest tests/unit/cartography/intel/trivy/test_util.py -v`: 36 tests pass

🤖 Generated with Claude Code
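As a rough illustration of the `{type}|{namespace/}{normalized_name}|{version}` format from the Context section, the sketch below assembles an ID from already-parsed PURL parts. The function `normalized_id_sketch` is hypothetical and is not the PR's `make_normalized_package_id()`; it assumes PEP 503 normalization applies only to the `pypi` type and that names in other ecosystems are left unchanged, which the PR may handle differently.

```python
import re
from typing import Optional


def normalized_id_sketch(
    pkg_type: str,
    name: str,
    version: str,
    namespace: Optional[str] = None,
) -> str:
    """Build a {type}|{namespace/}{normalized_name}|{version} string.

    Hypothetical illustration; other ecosystems are passed through as-is.
    """
    if pkg_type == "pypi":
        # PEP 503: collapse runs of '-', '_' and '.' to '-', then lowercase.
        name = re.sub(r"[-_.]+", "-", name).lower()
    # The namespace segment is optional and carries its trailing slash.
    prefix = f"{namespace}/" if namespace else ""
    return f"{pkg_type}|{prefix}{name}|{version}"


print(normalized_id_sketch("pypi", "PyNaCl", "1.5.0"))
# pypi|pynacl|1.5.0
print(normalized_id_sketch("npm", "core", "7.24.0", namespace="@babel"))
# npm|@babel/core|7.24.0
```

Two tools that report `PyNaCl` and `pynacl` at the same version would produce the same string under this scheme, so an indexed equality lookup on `normalized_id` is enough to join them.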