feat(trivy): add normalized_id for cross-tool package matching#2388

Open
kunaals wants to merge 3 commits into master from feat/trivy-normalized-id

Conversation

Collaborator

@kunaals kunaals commented Feb 14, 2026

Summary

  • Adds cartography/intel/trivy/util.py with PURL parsing, PEP 503 name normalization, and make_normalized_package_id() for creating ecosystem-aware normalized IDs
  • Adds normalized_id property (with extra index) to TrivyPackageNodeProperties for cross-tool matching
  • Populates normalized_id during Trivy scan transform in both transform_scan_results() and transform_all_packages()
  • Adds 36 unit tests covering PURL parsing, name normalization, ID generation, and cross-tool matching scenarios (PyNaCl vs pynacl, jaraco.context vs jaraco-context, scoped npm, etc.)

Context

This is a prerequisite for future ontology unification. The normalized_id format ({type}|{namespace/}{normalized_name}|{version}) handles:

  • Case differences (PyNaCl vs pynacl)
  • Separator differences (jaraco.context vs jaraco-context) via PEP 503
  • Ecosystem conflicts (npm lodash vs pip lodash) via type prefix
  • Namespace collisions (@types/node vs node) via namespace inclusion

Test plan

  • uv run pytest tests/unit/cartography/intel/trivy/test_util.py -v — 36 tests pass
  • Verify existing Trivy integration tests still pass

🤖 Generated with Claude Code

Add PURL parsing, PEP 503 name normalization, and a normalized_id property
to TrivyPackage nodes. This enables matching packages across tools (e.g.,
Trivy and Syft) despite naming differences like PyNaCl vs pynacl or
jaraco.context vs jaraco-context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kunaal Sikka <kunaal@subimage.io>
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 4 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cartography/models/trivy/package.py">

<violation number="1" location="cartography/models/trivy/package.py:26">
P3: The comment documents the normalized_id format without the optional namespace, but the implementation includes `{namespace/}` when present. Update the comment to match the actual format so readers don’t build incorrect assumptions.</violation>

<violation number="2" location="cartography/models/trivy/package.py:28">
P2: Rule violated: **Tests and documentation quality**

Document the new Package.normalized_id property in docs/root/modules/trivy/schema.md (and bold it as an indexed field). The schema docs currently omit this new indexed field, violating the "Exhaustive documentation" and schema table formatting requirements.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

    pkg_id: PropertyRef = PropertyRef("PkgID")
    # Normalized ID for cross-tool matching (format: {type}|{normalized_name}|{version})
    # Uses PEP 503 normalization for Python packages
    normalized_id: PropertyRef = PropertyRef("normalized_id", extra_index=True)
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 14, 2026

P2: Rule violated: Tests and documentation quality

Document the new Package.normalized_id property in docs/root/modules/trivy/schema.md (and bold it as an indexed field). The schema docs currently omit this new indexed field, violating the "Exhaustive documentation" and schema table formatting requirements.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/trivy/package.py, line 28:

<comment>Document the new Package.normalized_id property in docs/root/modules/trivy/schema.md (and bold it as an indexed field). The schema docs currently omit this new indexed field, violating the "Exhaustive documentation" and schema table formatting requirements.</comment>

<file context>
@@ -23,6 +23,9 @@ class TrivyPackageNodeProperties(CartographyNodeProperties):
     pkg_id: PropertyRef = PropertyRef("PkgID")
+    # Normalized ID for cross-tool matching (format: {type}|{normalized_name}|{version})
+    # Uses PEP 503 normalization for Python packages
+    normalized_id: PropertyRef = PropertyRef("normalized_id", extra_index=True)
     lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)
 
</file context>

kunaals and others added 2 commits February 13, 2026 19:19
Address review feedback:
- Add normalized_id (indexed), purl, and pkg_id to Trivy schema docs
- Fix comment to include optional {namespace/} in the format string

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kunaal Sikka <kunaal@subimage.io>
Signed-off-by: Kunaal Sikka <kunaal@subimage.io>
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cartography/intel/trivy/util.py">

<violation number="1" location="cartography/intel/trivy/util.py:50">
P1: Rule violated: **General coding rules**

Do not silently catch and suppress parsing errors here. The rule requires failures to bubble up rather than returning a fallback when parsing fails.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment on lines +50 to +53
    try:
        parsed = PackageURL.from_string(purl)
    except ValueError:
        return None
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 14, 2026

P1: Rule violated: General coding rules

Do not silently catch and suppress parsing errors here. The rule requires failures to bubble up rather than returning a fallback when parsing fails.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/intel/trivy/util.py, line 50:

<comment>Do not silently catch and suppress parsing errors here. The rule requires failures to bubble up rather than returning a fallback when parsing fails.</comment>

<file context>
@@ -43,52 +44,19 @@ def parse_purl(purl: str) -> dict | None:
-    # Split type from rest
-    type_end = rest.find("/")
-    if type_end == -1:
+    try:
+        parsed = PackageURL.from_string(purl)
+    except ValueError:
</file context>
Suggested change
-    try:
-        parsed = PackageURL.from_string(purl)
-    except ValueError:
-        return None
+    parsed = PackageURL.from_string(purl)
