diff --git a/docs/adrs/00002-analysis-graph.md b/docs/adrs/00002-analysis-graph.md
new file mode 100644
index 000000000..344ffa2a1
--- /dev/null
+++ b/docs/adrs/00002-analysis-graph.md
@@ -0,0 +1,235 @@
+# 00002. Analysis graph API in Trustify
+
+Date: 2025-01-23
+
+## Status
+
+DRAFT
+
+## Context
+
+This ADR is an addendum to the [previous ADR](00001-graph-analytics.md), intended to clarify the differences between the graph
+relationships we capture and the view we want to create from the forest of graphs.
+
+Ingesting an SBOM captures a set of Trustify relationships, which are instantiated
+in the forest of graphs as:
+
+```mermaid
+graph TD
+    PackageA -->|CONTAINS| PackageOther
+    PackageD -->|CONTAINED_BY| PackageA
+    PackageA -->|DEPENDS_ON| PackageB
+    PackageB -->|DEPENDENCY_OF| PackageA
+    SBOMDOC1 -->|DESCRIBES| PackageA
+    UpstreamComponent -->|ANCESTOR_OF| PackageA
+    image.arch1 -->|VARIANT_OF| ImageIndex1
+    image.arch2 -->|VARIANT_OF| ImageIndex1
+    SBOMDOC2 -->|DESCRIBES| ImageIndex1
+
+    SBOMDOC3 -->|DESCRIBES| srpm_component
+    binarycomponent1 -->|GENERATED_FROM| srpm_component
+    binarycomponent2 -->|GENERATED_FROM| srpm_component
+```
+
+Trustify relationships attempt to put an abstraction over the relationships
+defined by any SBOM format (e.g. CycloneDX, SPDX).
+
+This graph encapsulates the provenance of SBOM relationships, though end users are unlikely
+to navigate such graphs directly, as that would force a concrete understanding of relationship directionality.
+Even if we were to _normalise_ all relationships (similar to the rewrite of all CONTAINS into CONTAINED_BY) the
+resultant tree is still not quite right ... it is good practice to keep the logical model separate and not try to
+overload that model to serve as the conceptual model.
+
+The `api/v2/analysis` endpoints are responsible for building up the conceptual view, in which we want to query, filter and
+traverse the following:
+
+```mermaid
+graph TD
+    SBOMDOC1 -->|DESCRIBES| PackageA
+    PackageA -->|CONTAINS| PackageOther
+    PackageA -->|CONTAINS| PackageD
+    PackageA -->|DEPENDS| PackageB
+
+    SBOMDOC2 -->|DESCRIBES| ImageIndex1
+    UpstreamComponent -->|ANCESTOR_OF| PackageA
+    ImageIndex1 -->|VARIANT| image.arch1
+    ImageIndex1 -->|VARIANT| image.arch2
+
+    SBOMDOC3 -->|DESCRIBES| srpm_component
+    srpm_component -->|GENERATES| binarycomponent1
+    srpm_component -->|GENERATES| binarycomponent2
+```
+
+It is a feature that this conceptual model spans beyond traversal of just transitive software dependencies.
+
+For example, searching for any node in the above view should let us traverse ancestors and descendents ... a
+few illustrative examples:
+
+**Search for 'PackageA'**
+* component ancestors would be `[UpstreamComponent]`
+* component descendents would be the tree underneath 'PackageA' `[PackageOther,PackageD,PackageB]`
+
+**Search for 'image.arch1'**
+* component ancestors would be `[ImageIndex1]`
+* component descendents would be `[]`
+
+_Note: every node in the graph already knows its relationship to the original SBOM, so there is no need
+to enumerate the DESCRIBES relationship ... though in the future we may see other artifacts (e.g. sbom)
+DESCRIBES._
+
+We should make it easy to visualise this conceptual model directly from the endpoints (e.g. `Accept: image/svg`
+would pull down an SVG representation).
+
+## Decision
+
+### Implement `api/v2/analysis/component`
+
+Payload returns immediate ancestor/descendent relations (e.g. 'one deep'):
+```json
+{
+  "sbom_id": "",
+  "node_id": "",
+  "purl": [
+    ""
+  ],
+  "cpe": [],
+  "name": "PackageA",
+  "version": "",
+  "published": "2024-12-19 18:04:12+00",
+  "document_id": "urn:uuid:537c8dc3-6f66-3cac-b504-cc5fb0a09ece",
+  "product_name": "",
+  "product_version": "",
+  "ancestor": [
+    {
+      "link": "http://localhost:8080/api/v2/component/{purl}",
+      "sbom_id": "",
+      "node_id": "",
+      "relationship": "AncestorOf",
+      "purl": [
+        ""
+      ],
+      "cpe": [],
+      "name": "UpstreamPackage",
+      "version": ""
+    }
+  ],
+  "descendent": [
+    {
+      "link": "http://localhost:8080/api/v2/component/{purl}",
+      "sbom_id": "",
+      "node_id": "",
+      "relationship": "Variant",
+      "purl": [
+        ""
+      ],
+      "cpe": [],
+      "name": "PackageC",
+      "version": ""
+    }
+  ]
+}
+```
+
+Items in the `ancestor` array imply _Ancestor component_ VIEWRELATION _Searched component_ ... in the above example that would
+mean _UpstreamPackage_ **AncestorOf** _PackageA_.
+
+Items in the `descendent` array imply _Searched component_ VIEWRELATION _Descendent component_ ... in the above example that would
+mean _PackageA_ **Variant** _PackageC_.
+
+This endpoint accepts the following URL params, with the following defaults:
+- depth=1
+- show_ancestor=True
+- show_descendent=True
+- relationshipType=*
+
+### Implement `api/v2/analysis/ancestor`
+
+Payload returns all ancestor relations:
+```json
+{
+  "sbom_id": "",
+  "node_id": "",
+  "purl": [
+    ""
+  ],
+  "cpe": [],
+  "name": "",
+  "version": "",
+  "published": "2024-12-19 18:04:12+00",
+  "document_id": "urn:uuid:537c8dc3-6f66-3cac-b504-cc5fb0a09ece",
+  "product_name": "",
+  "product_version": "",
+  "ancestor": [
+    {
+      "sbom_id": "",
+      "node_id": "",
+      "relationship": "ANCESTOR_OF",
+      "purl": [
+        ""
+      ],
+      "cpe": [],
+      "name": "",
+      "version": ""
+    }, {} ....
+  ]
+}
+```
+
+The `ancestor` array contains a list of ancestors, with the last item
+in the list being considered the _root component_.
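
A minimal sketch of how a client might read the root component from this payload, assuming the response has already been parsed into a dictionary and that the last `ancestor` entry is the root, as described above. The function name `root_component` and the sample component names are hypothetical and not part of the proposed API.

```python
from typing import Any, Optional


def root_component(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the root component of an `ancestor` payload, or None if it has no ancestors."""
    ancestors = payload.get("ancestor", [])
    # Per the description above, the last entry is treated as the root component.
    return ancestors[-1] if ancestors else None


# Hypothetical response; only the fields used here are filled in.
response = {
    "name": "PackageA",
    "ancestor": [
        {"name": "UpstreamComponent", "relationship": "ANCESTOR_OF"},
        {"name": "RootProduct", "relationship": "ANCESTOR_OF"},
    ],
}

root = root_component(response)
print(root["name"] if root else "no ancestors")
```

Returning `None` for an empty or missing array keeps the "no ancestors" case explicit for callers; whether the real endpoint omits the field or returns an empty list is left open here.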
+
+### Implement `api/v2/analysis/descendent`
+Payload returns all descendent relations:
+```json
+{
+  "sbom_id": "",
+  "node_id": "",
+  "purl": [
+    ""
+  ],
+  "cpe": [],
+  "name": "",
+  "version": "",
+  "published": "2024-12-19 18:04:12+00",
+  "document_id": "urn:uuid:537c8dc3-6f66-3cac-b504-cc5fb0a09ece",
+  "product_name": "",
+  "product_version": "",
+  "descendent": [
+    {
+      "sbom_id": "",
+      "node_id": "",
+      "relationship": "Variant",
+      "purl": [
+        ""
+      ],
+      "cpe": [],
+      "name": "",
+      "version": "",
+      "descendent": [{} ....]
+    }, {} ....
+  ]
+}
+```
+The `descendent` array contains a list of descendents, with each component also containing any nested descendents.
+
+
+### Document analysis graph API
+
+
+## Alternative approaches
+
+**Directly use graphs:** It is likely that we will provide a raw interface to the graphs (aka `api/v2/analysis/relationship`), though
+we do not want to move responsibility for building up the 'view' to the client, so we still need to provide the other endpoints for that.
+
+**Build a new graph representing the conceptual model:** As graphs do not mutate, it's not so far-fetched to
+consider additionally generating a conceptual graph. It might be something we consider as an optimisation in
+the future, though for now we think it would be good to avoid the cost (ram, memory). The conceptual graph model might be
+considered a replacement for the logical model, though that would be flawed thinking as we always need the logical
+model to tell us relationship provenance, i.e. the logical model is absolutely required.
+
+## Consequences
+
+* I would rather not create a whole new set of View relationships (as I have outlined above) ... maybe there is a way to
+present a relationship not as a pure scalar
+* Having a clear conceptual model will reduce the cognitive load of having to mentally reparse graph relations
+* Aligning on a conceptual model means we can also do neat stuff like generate visual representations (mermaid, svg, etc)
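
To illustrate the last point, here is a rough sketch of deriving the conceptual view from logical relationships and rendering it as mermaid. The mapping in `VIEW_RULES` is an assumption read off the two diagrams in the Context section, and `to_view` and `to_mermaid` are hypothetical helpers, not the Trustify implementation.

```python
# Logical relationship -> (view relationship, whether the edge direction is flipped).
# Assumed mapping, inferred from the two diagrams in the Context section.
VIEW_RULES = {
    "CONTAINS": ("CONTAINS", False),
    "CONTAINED_BY": ("CONTAINS", True),
    "DEPENDS_ON": ("DEPENDS", False),
    "DEPENDENCY_OF": ("DEPENDS", True),
    "DESCRIBES": ("DESCRIBES", False),
    "ANCESTOR_OF": ("ANCESTOR_OF", False),
    "VARIANT_OF": ("VARIANT", True),
    "GENERATED_FROM": ("GENERATES", True),
}

Edge = tuple[str, str, str]  # (source, relationship, target)


def to_view(logical_edges: list[Edge]) -> list[Edge]:
    """Rewrite logical edges into view edges, dropping duplicates but keeping order."""
    seen: set[Edge] = set()
    view: list[Edge] = []
    for source, relationship, target in logical_edges:
        view_relationship, flipped = VIEW_RULES[relationship]
        if flipped:
            edge = (target, view_relationship, source)
        else:
            edge = (source, view_relationship, target)
        if edge not in seen:
            seen.add(edge)
            view.append(edge)
    return view


def to_mermaid(edges: list[Edge]) -> str:
    """Render edges in the same mermaid style used in this document."""
    lines = ["graph TD"]
    lines += [f"    {s} -->|{r}| {t}" for s, r, t in edges]
    return "\n".join(lines)


logical = [
    ("PackageA", "CONTAINS", "PackageOther"),
    ("PackageD", "CONTAINED_BY", "PackageA"),
    ("PackageA", "DEPENDS_ON", "PackageB"),
    ("PackageB", "DEPENDENCY_OF", "PackageA"),
    ("image.arch1", "VARIANT_OF", "ImageIndex1"),
    ("binarycomponent1", "GENERATED_FROM", "srpm_component"),
]
print(to_mermaid(to_view(logical)))
```

The logical edges are left untouched and the view is computed from them, which matches the preference above for keeping the logical model as the source of relationship provenance.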