
@dejanb
Contributor

@dejanb dejanb commented Aug 4, 2025

This commit significantly improves the performance of graph analysis by applying parallelization techniques to the run_graph_query and collect_graph functions.

Previously, the ancestors and descendants of a given node were collected sequentially. By refactoring run_graph_query to use futures::join!, we now process both directions concurrently, reducing the overall execution time.
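As a rough illustration of running the two directions side by side, here is a std-only sketch. It assumes a hypothetical adjacency-list `Graph` rather than the project's petgraph-based types, and it uses scoped threads instead of `futures::join!`, so it is an analogue of the idea rather than the PR's code:

```rust
use std::collections::{HashSet, VecDeque};
use std::thread;

/// Hypothetical adjacency-list graph; edges point from parent to child.
pub struct Graph {
    children: Vec<Vec<usize>>,
    parents: Vec<Vec<usize>>,
}

impl Graph {
    pub fn new(n: usize, edges: &[(usize, usize)]) -> Self {
        let mut children = vec![Vec::new(); n];
        let mut parents = vec![Vec::new(); n];
        for &(from, to) in edges {
            children[from].push(to);
            parents[to].push(from);
        }
        Graph { children, parents }
    }
}

/// Breadth-first walk over `adj`, returning every node reachable from `start`
/// (excluding `start` itself), sorted for deterministic output.
fn reachable(adj: &[Vec<usize>], start: usize) -> Vec<usize> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([start]);
    while let Some(n) = queue.pop_front() {
        for &next in &adj[n] {
            if next != start && seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    let mut out: Vec<usize> = seen.into_iter().collect();
    out.sort_unstable();
    out
}

/// Collects ancestors and descendants of `node` at the same time.
/// Scoped threads may borrow `graph` because both are joined before returning.
pub fn ancestors_and_descendants(graph: &Graph, node: usize) -> (Vec<usize>, Vec<usize>) {
    thread::scope(|s| {
        let ancestors = s.spawn(|| reachable(&graph.parents, node));
        let descendants = s.spawn(|| reachable(&graph.children, node));
        (ancestors.join().unwrap(), descendants.join().unwrap())
    })
}
```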

Building on this, the collect_graph function was optimized to process all discovered nodes concurrently using join_all. This lets us use available resources more efficiently when analyzing multiple entry points in the graph.
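For readers less familiar with how `join_all` behaves, the following std-only sketch reimplements a simplified version of it (the real one lives in the `futures` crate) together with a tiny `block_on` executor. `collect_nodes` is a hypothetical stand-in that mirrors the shape of the new `collect_graph` bound: a factory `F: Fn(&'g T) -> Fut` whose futures borrow the graph data for `'g`. Note that `join_all` drives its futures concurrently on a single task; it does not by itself spread work across threads.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the executor thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-threaded executor: poll, park until woken, repeat.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Simplified stand-in for `futures::future::join_all`: every pending future is
/// polled on each wake-up, and outputs are returned in input order.
struct JoinAll<F: Future> {
    futs: Vec<Pin<Box<F>>>,
    outs: Vec<Option<F::Output>>,
}

// Sound: each inner future is pinned in its own Box; `outs` is never pinned.
impl<F: Future> Unpin for JoinAll<F> {}

impl<F: Future> Future for JoinAll<F> {
    type Output = Vec<F::Output>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        let mut all_done = true;
        for (fut, out) in this.futs.iter_mut().zip(this.outs.iter_mut()) {
            if out.is_none() {
                match fut.as_mut().poll(cx) {
                    Poll::Ready(value) => *out = Some(value),
                    Poll::Pending => all_done = false,
                }
            }
        }
        if all_done {
            Poll::Ready(this.outs.iter_mut().map(|o| o.take().unwrap()).collect())
        } else {
            Poll::Pending
        }
    }
}

fn join_all<F: Future>(futs: impl IntoIterator<Item = F>) -> JoinAll<F> {
    let futs: Vec<_> = futs.into_iter().map(Box::pin).collect();
    let outs = futs.iter().map(|_| None).collect();
    JoinAll { futs, outs }
}

/// Future that returns `Pending` once (re-waking itself), then completes.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Hypothetical analogue of the `collect_graph` shape: the factory borrows each
/// item for `'g` and returns a future; all futures are awaited together.
async fn collect_nodes<'g, T, F, Fut>(items: &'g [T], make_node: F) -> Vec<Fut::Output>
where
    F: Fn(&'g T) -> Fut,
    Fut: Future,
{
    join_all(items.iter().map(make_node)).await
}
```

In a call such as `block_on(collect_nodes(&names, |s| async move { YieldNow(false).await; s.len() }))`, the closure parameter is left unannotated so its type, including the `'g` lifetime, is inferred from the `Fn(&'g T) -> Fut` bound.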

Assisted-by: Gemini

Summary by Sourcery

Optimize graph analysis performance by refactoring query and collection routines to execute traversals and node processing in parallel using futures::join! and join_all.

Enhancements:

  • Parallelize run_graph_query to concurrently collect ancestors and descendants using futures::join!
  • Process all candidate nodes in collect_graph concurrently across graphs with join_all instead of sequential iteration
  • Refactor DiscoveredTracker to use tokio::sync::Mutex and make visit asynchronous for non-blocking locking
  • Modify collector.collect_graph to traverse edges in parallel with join_all and concurrent futures
  • Update collect_graph signature to accept async closure factories for node creation
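A minimal std-only sketch of the visit-once tracker, assuming a blocking `std::sync::Mutex`, a `Vec<bool>` in place of `FixedBitSet`, and the graph's address as the map key (the class diagram below shows the PR keying by `*const NodeGraph`). Unlike the PR's async `visit`, this version is synchronous; Tokio's own `Mutex` documentation notes that a standard-library mutex is often fine in async code as long as the guard is not held across an `.await`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Sketch of a visit-once tracker keyed by graph identity. Assumptions: std
/// `Mutex` instead of `tokio::sync::Mutex`, `Vec<bool>` instead of
/// `FixedBitSet`, and the graph's address (as `usize`) as the map key.
#[derive(Default)]
pub struct DiscoveredTracker {
    cache: Mutex<HashMap<usize, Vec<bool>>>,
}

impl DiscoveredTracker {
    /// Returns `true` the first time `node` is visited in `graph`, `false` after.
    /// The address key is only meaningful while `graph` stays alive and unmoved.
    pub fn visit<G>(&self, graph: &G, node: usize) -> bool {
        let key = graph as *const G as usize;
        let mut cache = self.cache.lock().unwrap();
        let seen = cache.entry(key).or_default();
        if node >= seen.len() {
            seen.resize(node + 1, false);
        }
        // Mark as visited and report whether it was unvisited before.
        !std::mem::replace(&mut seen[node], true)
    }
}
```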

@sourcery-ai
Contributor

sourcery-ai bot commented Aug 4, 2025

Reviewer's Guide

This PR refactors the graph analysis pipeline to exploit async concurrency: collect_graph now batches node‐processing futures and uses join_all, per‐node ancestor/descendant queries leverage futures::join!, Collector.collect_graph processes child edges in parallel, and DiscoveredTracker switches to an async Mutex for efficient locking.

Sequence diagram for parallelized node processing in collect_graph

sequenceDiagram
    participant AnalysisService
    participant Graphs
    participant NodeFutures
    participant join_all
    participant Node

    AnalysisService->>Graphs: Iterate over graphs
    Graphs->>NodeFutures: For each node, create future (async closure)
    NodeFutures->>join_all: Batch all node futures
    join_all->>Node: Execute all node futures in parallel
    Node-->>join_all: Return processed Node
    join_all-->>AnalysisService: Return Vec<Node> (all nodes processed in parallel)

Updated class diagram for DiscoveredTracker and Collector with async changes

classDiagram
    class DiscoveredTracker {
        +Arc<tokio::sync::Mutex<HashMap<*const NodeGraph, FixedBitSet>>> cache
        +async fn visit(&self, graph: &NodeGraph, node: NodeIndex) -> bool
    }

    class Collector {
        +async fn collect_graph(&self) -> Vec<Node>
    }

    DiscoveredTracker <.. Collector : used by

File-Level Changes

Refactor collect_graph to batch node‐processing futures and use join_all
  • Updated collect_graph signature to accept a Fn that returns a Future and introduced a Fut generic
  • Replaced stream.then(...) with stream.map(...) to build a Vec of futures
  • Applied join_all on collected futures to execute node creation concurrently
  Files: modules/analysis/src/service/mod.rs

Parallelize per‐node ancestor and descendant collection using futures::join!
  • Rewrote closure passed to collect_graph as an async move block capturing necessary clones
  • Separated ancestors and descendants collectors into futures and applied futures::join!
  • Constructed Node instances using the concurrently gathered results
  Files: modules/analysis/src/service/mod.rs

Parallelize Collector.collect_graph by batching edge traversal with join_all
  • Collected graph edges into a Vec, then mapped each to an async recursion future
  • Applied join_all to run all child-collector futures concurrently
  • Assembled final_result by iterating over concurrent outcomes
  Files: modules/analysis/src/service/collector.rs

Convert DiscoveredTracker to use async Mutex and async visit
  • Replaced parking_lot::Mutex with tokio::sync::Mutex
  • Changed visit method to be async and awaited the lock acquisition
  Files: modules/analysis/src/service/collector.rs

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @dejanb - I've reviewed your changes - here's some feedback:

  • Consider bounding the number of concurrent tasks in collect_graph (for example with buffer_unordered or a semaphore) to avoid spawning too many futures at once on large graphs.
  • Switching DiscoveredTracker to tokio::sync::Mutex may introduce contention under high parallelism; you might explore lock-free data structures or an RwLock to reduce contention.
  • Rather than collecting all edge futures into a Vec and then join_all, using FuturesUnordered would let you process results as they complete and reduce peak memory usage.
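To make the RwLock suggestion concrete, here is a hypothetical std-only `VisitedSet` (not the PR's `DiscoveredTracker`). Repeat visits only take the shared read lock, while a first visit still needs the exclusive write lock, so this mainly helps when most `visit` calls hit nodes that were already seen:

```rust
use std::collections::HashSet;
use std::sync::RwLock;

/// Hypothetical RwLock-based visit-once set.
#[derive(Default)]
pub struct VisitedSet {
    seen: RwLock<HashSet<usize>>,
}

impl VisitedSet {
    /// Returns `true` only for the first caller that visits `node`.
    pub fn visit(&self, node: usize) -> bool {
        // Fast path: a shared read lock, which many threads can hold at once.
        // The guard is a temporary of the `if` condition and is dropped here.
        if self.seen.read().unwrap().contains(&node) {
            return false;
        }
        // Slow path: `insert` re-checks under the exclusive write lock, so two
        // threads that both passed the read check cannot both return `true`.
        self.seen.write().unwrap().insert(node)
    }
}
```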


@dejanb
Contributor Author

dejanb commented Aug 4, 2025

/scale-test

@github-actions

github-actions bot commented Aug 4, 2025

🛠️ Scale test has started! Follow the progress here: Workflow Run

@codecov

codecov bot commented Aug 4, 2025

Codecov Report

❌ Patch coverage is 84.34783% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.29%. Comparing base (5a4c127) to head (d8634d4).
⚠️ Report is 13 commits behind head on main.

File                                        Patch %   Missing lines
modules/analysis/src/service/collector.rs   84.21%    10 missing and 2 partials ⚠️
modules/analysis/src/config.rs              50.00%    1 missing and 2 partials ⚠️
modules/analysis/src/service/mod.rs         90.90%    3 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1911      +/-   ##
==========================================
+ Coverage   68.14%   68.29%   +0.14%     
==========================================
  Files         365      367       +2     
  Lines       23123    23247     +124     
  Branches    23123    23247     +124     
==========================================
+ Hits        15757    15876     +119     
- Misses       6485     6488       +3     
- Partials      881      883       +2     


) -> Vec<Node>
where
    C: AsyncFn(&Graph<graph::Node, Relationship>, NodeIndex, &graph::Node) -> Node,
    F: Fn(&'g Graph<graph::Node, Relationship>, NodeIndex, &'g graph::Node) -> Fut + Clone,
Contributor


Can't we use AsyncFn here, instead of the old-style stuff?

Contributor Author


The signature changed because the function is now moved into the closure, so it needed more explicit lifetimes. After wrestling with it for a while, this is the best I could come up with.

@dejanb
Contributor Author

dejanb commented Aug 4, 2025

The changes in this commit improved the performance of the analysis graph more than tenfold for large SBOMs.

In particular, this call against the load-test dataset went

time curl "http://localhost:8080/api/v2/analysis/component/cpe%3A%2Fa%3Aredhat%3Aopenstack%3A13%3A%3Ael7?descendants=10&limit=20&offset=0"

from

Executed in  671.87 secs - sbom not in the cache
Executed in  651.17 secs - sbom cached

to

Executed in   49.15 secs - sbom not in the cache
Executed in   23.03 secs - sbom cached

For smaller SBOMs, the difference is not that significant when the SBOM is not in the cache, but it is noticeable when the SBOM is already in memory.

time curl "http://localhost:8080/api/v2/analysis/latest/component/cpe%3A%2Fa%3Aredhat%3Aquarkus%3A2.13%3A%3Ael8?descendants=10&limit=20&offset=0"

from

Executed in    3.37 secs
Executed in    2.72 secs

to

Executed in    3.78 secs
Executed in  476.32 millis
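The speedups behind the timings above are simply the ratio of old to new wall-clock time:

```math
\begin{aligned}
S &= \frac{t_{\text{before}}}{t_{\text{after}}} \\
S_{\text{large, uncached}} &= \tfrac{671.87}{49.15} \approx 13.7 &
S_{\text{large, cached}} &= \tfrac{651.17}{23.03} \approx 28.3 \\
S_{\text{small, uncached}} &= \tfrac{3.37}{3.78} \approx 0.89 &
S_{\text{small, cached}} &= \tfrac{2.72}{0.47632} \approx 5.7
\end{aligned}
```

So the large SBOM query became roughly 14× faster uncached and 28× faster cached, while the small SBOM query was slightly slower uncached and about 5.7× faster once cached.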

@dejanb dejanb requested a review from JimFuller-RedHat August 4, 2025 11:08
@dejanb
Contributor Author

dejanb commented Aug 4, 2025

Next, I'll go through the Sourcery (and ctron's :) ) reviews.

@github-actions

github-actions bot commented Aug 4, 2025

Goose Report

Goose Attack Report

Plan Overview

Action Started Stopped Elapsed Users
Increasing 25-08-04 11:22:01 25-08-04 11:22:06 00:00:05 0 → 5
Maintaining 25-08-04 11:22:06 25-08-04 11:27:06 00:05:00 5
Decreasing 25-08-04 11:27:06 25-08-04 11:27:07 00:00:01 0 ← 5

Request Metrics

Method Name # Requests # Fails Average (ms) Min (ms) Max (ms) RPS Failures/s
GET get_advisory_by_doc_id 119 (+43) 0 12.61 (-0.86) 3 (0) 59 (0) 0.40 (+0.14) 0.00 (+0.00)
GET get_analysis_latest_cpe 121 (+41) 0 117.79 (-5.49) 35 (-6) 269 (+46) 0.40 (+0.14) 0.00 (+0.00)
GET get_analysis_status 121 (+41) 0 5.19 (-1.16) 1 (0) 54 (+3) 0.40 (+0.14) 0.00 (+0.00)
GET get_sbom[sha256:720e4451…a939656247164447] 121 (+41) 0 839.74 (-256.28) 206 (+20) 2270 (-1357) 0.40 (+0.14) 0.00 (+0.00)
GET get_sbom_license_ids[urn:uuid:019731…104-331632a21144] 119 (+40) 0 872.92 (-81.35) 512 (+108) 1220 (-155) 0.40 (+0.13) 0.00 (+0.00)
GET list_advisory 119 (+43) 0 466.03 (-13.62) 244 (+105) 898 (+71) 0.40 (+0.14) 0.00 (+0.00)
GET list_advisory_paginated 119 (+43) 0 409.30 (-3.51) 238 (+31) 1688 (+487) 0.40 (+0.14) 0.00 (+0.00)
GET list_importer 118 (+42) 0 4.45 (-1.41) 1 (0) 49 (-6) 0.39 (+0.14) 0.00 (+0.00)
GET list_organizations 119 (+43) 0 8.97 (-3.27) 1 (0) 47 (-1) 0.40 (+0.14) 0.00 (+0.00)
GET list_packages 118 (+42) 0 404.94 (+54.28) 133 (+26) 1005 (+160) 0.39 (+0.14) 0.00 (+0.00)
GET list_packages_paginated 118 (+42) 0 366.36 (+30.70) 185 (+81) 620 (+110) 0.39 (+0.14) 0.00 (+0.00)
GET list_products 123 (+42) 0 11.97 (+2.47) 2 (0) 63 (+4) 0.41 (+0.14) 0.00 (+0.00)
GET list_sboms 123 (+42) 0 1374.61 (+100.62) 564 (-28) 2015 (-141) 0.41 (+0.14) 0.00 (+0.00)
GET list_sboms_paginated 121 (+41) 0 1700.79 (-1212.60) 437 (-64) 4392 (-2921) 0.40 (+0.14) 0.00 (+0.00)
GET list_vulnerabilities 118 (+42) 0 239.83 (-10.72) 97 (+1) 381 (-95) 0.39 (+0.14) 0.00 (+0.00)
GET list_vulnerabilities_paginated 118 (+42) 0 198.99 (+22.44) 92 (+24) 365 (+62) 0.39 (+0.14) 0.00 (+0.00)
GET sbom_by_package[pkg:maven/io.qu…dhat.com%2fga%2f] 119 (+40) 0 70.46 (-7.82) 12 (+2) 196 (-10) 0.40 (+0.13) 0.00 (+0.00)
GET search_advisory 118 (+42) 0 1049.02 (+302.94) 384 (+113) 2518 (+132) 0.39 (+0.14) 0.00 (+0.00)
GET search_exact_purl 123 (+42) 0 12.33 (+3.70) 2 (0) 58 (-1) 0.41 (+0.14) 0.00 (+0.00)
GET search_purls 123 (+42) 0 3699.33 (-5324.54) 1708 (-94) 8672 (-12986) 0.41 (+0.14) 0.00 (+0.00)
POST post_vulnerability_analyze[pkg:rpm/redhat/…h=noarch&epoch=1] 119 (+41) 0 639.94 (-18.78) 313 (+146) 1263 (+181) 0.40 (+0.14) 0.00 (+0.00)
Aggregated 2517 (+877) 0 600.56 (-320.91) 1 (0) 8672 (-12986) 8.39 (+2.92) 0.00 (+0.00)

Response Time Metrics

Method Name 50%ile (ms) 60%ile (ms) 70%ile (ms) 80%ile (ms) 90%ile (ms) 95%ile (ms) 99%ile (ms) 100%ile (ms)
GET get_advisory_by_doc_id 6 (0) 8 (0) 9 (-1) 12 (-1) 44 (-6) 52 (-3) 59 (+4) 59 (0)
GET get_analysis_latest_cpe 110 (-10) 110 (-10) 120 (-20) 170 (0) 180 (0) 180 (-10) 220 (+20) 269 (+49)
GET get_analysis_status 3 (+1) 3 (0) 4 (0) 4 (-1) 6 (-4) 9 (-36) 52 (+3) 54 (+3)
GET get_sbom[sha256:720e4451…a939656247164447] 600 (+180) 800 (+330) 1,000 (+400) 2,000 (-1,000) 2,000 (-1,000) 2,000 (-1,000) 2,000 (-1,000) 2,000 (-1,627)
GET get_sbom_license_ids[urn:uuid:019731…104-331632a21144] 900 (-100) 900 (-100) 900 (-100) 1,000 (0) 1,000 (0) 1,000 (0) 1,000 (0) 1,000 (0)
GET list_advisory 460 (-20) 480 (-20) 490 (-10) 500 (-100) 600 (0) 600 (-100) 800 (0) 898 (+98)
GET list_advisory_paginated 390 (-10) 400 (-10) 410 (-20) 450 (-30) 500 (0) 500 (-100) 600 (-100) 1,688 (+688)
GET list_importer 2 (0) 3 (0) 3 (0) 4 (0) 6 (-1) 9 (-30) 46 (-6) 49 (-6)
GET list_organizations 3 (-1) 4 (-1) 6 (-2) 9 (-23) 31 (-11) 37 (-8) 42 (-4) 47 (-1)
GET list_packages 380 (+30) 400 (+20) 420 (+30) 470 (+70) 500 (+90) 600 (+120) 800 (+100) 1,000 (+200)
GET list_packages_paginated 370 (+10) 390 (+10) 400 (+20) 410 (+20) 460 (+60) 470 (+40) 500 (+30) 600 (+100)
GET list_products 6 (-2) 7 (-2) 8 (-1) 9 (-1) 45 (+32) 54 (+35) 60 (+3) 63 (+4)
GET list_sboms 1,000 (0) 1,000 (0) 2,000 (+1,000) 2,000 (0) 2,000 (0) 2,000 (0) 2,000 (0) 2,000 (0)
GET list_sboms_paginated 2,000 (-1,000) 2,000 (-1,000) 2,000 (-2,000) 2,000 (-2,000) 3,000 (-2,000) 3,000 (-3,000) 3,000 (-4,000) 4,000 (-3,000)
GET list_vulnerabilities 230 (+10) 250 (+30) 270 (+10) 290 (-30) 310 (-80) 320 (-110) 360 (-110) 380 (-96)
GET list_vulnerabilities_paginated 200 (+20) 200 (+10) 210 (+10) 210 (0) 260 (-10) 280 (0) 350 (+60) 365 (+65)
GET sbom_by_package[pkg:maven/io.qu…dhat.com%2fga%2f] 63 (-9) 71 (-3) 93 (+6) 120 (-30) 160 (-20) 180 (-10) 180 (-20) 196 (-10)
GET search_advisory 900 (+200) 1,000 (+300) 1,000 (+200) 1,000 (+200) 2,000 (+1,000) 2,000 (+1,000) 2,000 (0) 2,518 (+518)
GET search_exact_purl 6 (0) 6 (-1) 7 (0) 9 (+1) 50 (+41) 54 (+39) 57 (0) 58 (-1)
GET search_purls 3,000 (0) 3,000 (-7,000) 5,000 (-14,000) 6,000 (-14,000) 7,000 (-14,000) 7,000 (-14,658) 8,000 (-13,658) 8,672 (-12,986)
POST post_vulnerability_analyze[pkg:rpm/redhat/…h=noarch&epoch=1] 600 (0) 700 (0) 700 (0) 800 (0) 900 (0) 900 (0) 900 (-100) 1,000 (0)
Aggregated 310 (+10) 410 (+10) 500 (0) 900 (0) 2,000 (0) 2,000 (-1,000) 6,000 (-14,000) 8,672 (-12,986)

Status Code Metrics

Method Name Status Codes
GET get_advisory_by_doc_id 119 [200]
GET get_analysis_latest_cpe 121 [200]
GET get_analysis_status 121 [200]
GET get_sbom[sha256:720e4451…a939656247164447] 121 [200]
GET get_sbom_license_ids[urn:uuid:019731…104-331632a21144] 119 [200]
GET list_advisory 119 [200]
GET list_advisory_paginated 119 [200]
GET list_importer 118 [200]
GET list_organizations 119 [200]
GET list_packages 118 [200]
GET list_packages_paginated 118 [200]
GET list_products 123 [200]
GET list_sboms 123 [200]
GET list_sboms_paginated 121 [200]
GET list_vulnerabilities 118 [200]
GET list_vulnerabilities_paginated 118 [200]
GET sbom_by_package[pkg:maven/io.qu…dhat.com%2fga%2f] 119 [200]
GET search_advisory 118 [200]
GET search_exact_purl 123 [200]
GET search_purls 123 [200]
POST post_vulnerability_analyze[pkg:rpm/redhat/…h=noarch&epoch=1] 119 [200]
Aggregated 2,517 [200]

Transaction Metrics

Transaction # Times Run # Fails Average (ms) Min (ms) Max (ms) RPS Failures/s
WebsiteUser
0.0 logon 0 (0) 0 (0) 0.00 (+0.00) 0 (0) 0 (0) 0.00 (+0.00) 0.00 (+0.00)
0.1 website_index 0 (0) 0 (0) 0.00 (+0.00) 0 (0) 0 (0) 0.00 (+0.00) 0.00 (+0.00)
0.2 website_openapi 0 (0) 0 (0) 0.00 (+0.00) 0 (0) 0 (0) 0.00 (+0.00) 0.00 (+0.00)
0.3 website_sboms 0 (0) 0 (0) 0.00 (+0.00) 0 (0) 0 (0) 0.00 (+0.00) 0.00 (+0.00)
0.4 website_packages 0 (0) 0 (0) 0.00 (+0.00) 0 (0) 0 (0) 0.00 (+0.00) 0.00 (+0.00)
0.5 website_advisories 0 (0) 0 (0) 0.00 (+0.00) 0 (0) 0 (0) 0.00 (+0.00) 0.00 (+0.00)
0.6 website_importers 0 (0) 0 (0) 0.00 (+0.00) 0 (0) 0 (0) 0.00 (+0.00) 0.00 (+0.00)
RestAPIUser
1.0 logon 119 (+42) 0 (0) 14.02 (+0.24) 6 (-1) 25 (+3) 0.40 (+0.14) 0.00 (+0.00)
1.1 list_organizations 119 (+43) 0 (0) 9.20 (-3.17) 1 (0) 48 (0) 0.40 (+0.14) 0.00 (+0.00)
1.2 list_advisory 119 (+43) 0 (0) 466.12 (-13.61) 244 (+105) 898 (+71) 0.40 (+0.14) 0.00 (+0.00)
1.3 list_advisory_paginated 119 (+43) 0 (0) 409.39 (-3.50) 238 (+31) 1688 (+487) 0.40 (+0.14) 0.00 (+0.00)
1.4 get_advisory_by_doc_id 119 (+43) 0 (0) 12.64 (-0.87) 3 (0) 59 (-2) 0.40 (+0.14) 0.00 (+0.00)
1.5 search_advisory 118 (+42) 0 (0) 1049.09 (+302.96) 385 (+114) 2518 (+132) 0.39 (+0.14) 0.00 (+0.00)
1.6 list_vulnerabilities 118 (+42) 0 (0) 239.90 (-10.72) 97 (+1) 381 (-95) 0.39 (+0.14) 0.00 (+0.00)
1.7 list_vulnerabilities_paginated 118 (+42) 0 (0) 199.03 (+22.42) 92 (+24) 366 (+63) 0.39 (+0.14) 0.00 (+0.00)
1.8 list_importer 118 (+42) 0 (0) 4.51 (-1.36) 1 (0) 49 (-6) 0.39 (+0.14) 0.00 (+0.00)
1.9 list_packages 118 (+42) 0 (0) 405.00 (+54.30) 133 (+26) 1005 (+160) 0.39 (+0.14) 0.00 (+0.00)
1.10 list_packages_paginated 118 (+42) 0 (0) 366.42 (+30.71) 185 (+81) 620 (+110) 0.39 (+0.14) 0.00 (+0.00)
1.11 search_purls 123 (+42) 0 (0) 3699.39 (-5324.54) 1708 (-94) 8672 (-12986) 0.41 (+0.14) 0.00 (+0.00)
1.12 search_exact_purl 123 (+42) 0 (0) 12.36 (+3.65) 2 (0) 58 (-1) 0.41 (+0.14) 0.00 (+0.00)
1.13 list_products 123 (+42) 0 (0) 12.01 (+2.51) 2 (0) 63 (+4) 0.41 (+0.14) 0.00 (+0.00)
1.14 list_sboms 123 (+42) 0 (0) 1374.68 (+100.67) 564 (-28) 2015 (-141) 0.41 (+0.14) 0.00 (+0.00)
1.15 list_sboms_paginated 121 (+41) 0 (0) 1700.83 (-1212.62) 437 (-64) 4392 (-2921) 0.40 (+0.14) 0.00 (+0.00)
1.16 get_analysis_status 121 (+41) 0 (0) 5.25 (-1.16) 1 (0) 54 (+3) 0.40 (+0.14) 0.00 (+0.00)
1.17 get_analysis_latest_cpe 121 (+41) 0 (0) 117.82 (-5.52) 35 (-6) 269 (+46) 0.40 (+0.14) 0.00 (+0.00)
1.18 get_sbom[sha256:720e4451…a939656247164447] 121 (+41) 0 (0) 839.79 (-256.32) 206 (+20) 2270 (-1357) 0.40 (+0.14) 0.00 (+0.00)
1.19 sbom_by_package[pkg:maven/io.qu…dhat.com%2fga%2f] 119 (+40) 0 (0) 70.55 (-7.78) 12 (+2) 196 (-10) 0.40 (+0.13) 0.00 (+0.00)
1.20 get_sbom_license_ids[urn:uuid:019731…104-331632a21144] 119 (+40) 0 (0) 872.99 (-81.34) 512 (+108) 1220 (-155) 0.40 (+0.13) 0.00 (+0.00)
1.21 post_vulnerability_analyze[pkg:rpm/redhat/…h=noarch&epoch=1] 119 (+41) 0 (0) 639.97 (-18.81) 313 (+145) 1263 (+181) 0.40 (+0.14) 0.00 (+0.00)
Aggregated 2636 (+919) 0 (0) 573.45 (-306.70) 1 (0) 8672 (-12986) 8.79 (+3.06) 0.00 (+0.00)

Scenario Metrics

Transaction # Users # Times Run Average (ms) Min (ms) Max (ms) Scenarios/s Iterations
WebsiteUser 0 (0) 0 (0) 0.00 (+0.00) 0 (0) 0 (0) 0.00 (+0.00) 0.00 (+0.00)
RestAPIUser 5 (0) 119 (+42) 12492.84 (-6715.17) 8772 (-116) 18296 (-14538) 0.40 (+0.14) 23.80 (+8.40)
Aggregated 5 (0) 119 (+42) 12492.84 (-6715.17) 8772 (-116) 18296 (-14538) 0.40 (+0.14) 23.80 (+8.40)

📄 Full Report (Go to "Artifacts" and download report)

@ctron
Contributor

ctron commented Aug 4, 2025

My concern with this change (I know I always have concerns) is that it forks an unbounded number of tasks. Sure, that will improve performance, but it might also spike resource consumption. The question is, can we somehow limit this? And make the limit configurable?

@JimFuller-RedHat
Contributor

cool - a few observations:

  • I think the initial dev approach was to do things serially first for simplicity ... it's a good time to optimise
  • I am not sure http://localhost:8080/api/v2/analysis/component would ever be used in 'real life' or maybe I am mistaken and this is a filed perf issue ... otherwise the latest variants are where the 'red meat' is ... hoping some day the ux will take advantage of that view
  • if the perf optimisation is observed with the latest endpoints, then I think this is worth considering, though as @ctron observed this is going to increase memory usage ... is the optimisation worth the trade-off ... unsure ... we want to ensure we have enough memory for the in-memory graph cache first

@dejanb dejanb force-pushed the graph-performance branch 3 times, most recently from 1f675a9 to a5a3d21 August 6, 2025 12:24
@dejanb
Contributor Author

dejanb commented Aug 6, 2025

/scale-test

@github-actions

github-actions bot commented Aug 6, 2025

🛠️ Scale test has started! Follow the progress here: Workflow Run

@dejanb
Contributor Author

dejanb commented Aug 6, 2025

The reworked implementation now uses buffer_unordered() instead of join_all() and bounds the parallelism. I ran some manual tests with different values to find a sweet spot between performance and peak memory usage.
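The bounding idea can be sketched without the `futures` crate. `buffer_unordered(n)` keeps at most `n` futures in flight on one task and yields results as they complete; the std-only sketch below approximates that with `n` scoped worker threads pulling from a shared queue. `bounded_map` is a hypothetical helper, not the PR's code, and because results arrive in completion order each one carries its input index:

```rust
use std::sync::{mpsc, Mutex};
use std::thread;

/// Applies `work` to every item with at most `limit` jobs running at once and
/// returns `(input_index, result)` pairs in completion order, similar in spirit
/// to `buffer_unordered(limit)`. Scoped worker threads stand in for futures.
pub fn bounded_map<T, R, F>(items: Vec<T>, limit: usize, work: F) -> Vec<(usize, R)>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    // At least one worker (0 would never progress), never more than the items.
    let limit = limit.clamp(1, items.len().max(1));
    let queue = Mutex::new(items.into_iter().enumerate());
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for _ in 0..limit {
            let tx = tx.clone();
            let (queue, work) = (&queue, &work);
            s.spawn(move || loop {
                // Take the next job; the queue lock is released before `work` runs.
                let next = queue.lock().unwrap().next();
                match next {
                    Some((index, item)) => tx.send((index, work(item))).unwrap(),
                    None => break,
                }
            });
        }
    });
    drop(tx); // all workers have finished; close the channel so `rx` ends
    rx.into_iter().collect()
}
```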

The test used this call against the load-testing dataset:

time curl "http://localhost:8080/api/v2/analysis/latest/component/cpe%3A%2Fa%3Aredhat%3Aopenstack%3A13%3A%3Ael7?descendants=10&limit=20&offset=0"

Concurrency   Time (s)   Peak memory (MB)
before          638.96              413.2
1               705.16              370.7
2               351.76              407.5
5               139.47              424.1
10              101.33              444.2
100              92.81              673.3
1000             62.72            1,173.1
10000            52.41            1,390.1
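Expressed as ratios against the "before" row, the table shows the trade-off behind the choice of default:

```math
\begin{aligned}
\text{speedup}(c) &= \frac{t_{\text{before}}}{t(c)} & \text{memory ratio}(c) &= \frac{m(c)}{m_{\text{before}}} \\
\text{speedup}(10) &= \tfrac{638.96}{101.33} \approx 6.3 & \text{memory ratio}(10) &= \tfrac{444.2}{413.2} \approx 1.075 \\
\text{speedup}(10000) &= \tfrac{638.96}{52.41} \approx 12.2 & \text{memory ratio}(10000) &= \tfrac{1390.1}{413.2} \approx 3.36
\end{aligned}
```

At a bound of 10, about 92% of the time saved at 10000 is already realised (537.6 s versus 586.6 s) for roughly 7.5% more peak memory; 10000 roughly doubles the speedup but needs more than three times the memory.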

So I picked 10 as the default value, since it improves performance significantly without blowing up memory usage. The value is configurable through an environment variable if anyone needs more performance and has the resources to support it. There are a few more things we could improve in the future, but they're not critical right now.
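A sketch of how such a setting could be read, falling back to the default of 10. The variable name `ANALYSIS_CONCURRENCY` and both helper functions are hypothetical; the actual variable is not named in this thread:

```rust
use std::env;

/// Default bound chosen in the discussion above.
pub const DEFAULT_CONCURRENCY: usize = 10;

/// Hypothetical variable name, for illustration only.
pub const CONCURRENCY_VAR: &str = "ANALYSIS_CONCURRENCY";

/// Parses a concurrency limit, falling back to the default when the value is
/// missing, not a number, or zero (a zero bound would never make progress).
pub fn parse_concurrency(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_CONCURRENCY)
}

/// Reads the limit from the environment, e.g. once at startup.
pub fn concurrency_from_env() -> usize {
    parse_concurrency(env::var(CONCURRENCY_VAR).ok().as_deref())
}
```

Keeping the parsing in a pure function makes the fallback rules testable without touching the process environment.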

It'd be great if we could periodically run memory profiling during load/scaling tests in the future.

@dejanb
Contributor Author

dejanb commented Aug 6, 2025

  • I am not sure http://localhost:8080/api/v2/analysis/component would ever be used in 'real life' or maybe I am mistaken and this is a filed perf issue ... otherwise the latest variants are where the 'red meat' is ... hoping some day the ux will take advantage of that view

@JimFuller-RedHat I tested both the latest and non-latest variants. The behaviour is the same, since the bottleneck is not loading the SBOMs but querying the graphs once they are loaded. Since your comment, I've tested exclusively with the latest API for consistency.

@dejanb dejanb requested a review from ctron August 6, 2025 13:27
Contributor

@JimFuller-RedHat JimFuller-RedHat left a comment


Very nice work! LGTM

@github-actions

github-actions bot commented Aug 6, 2025

Goose Report

Goose Attack Report

Plan Overview

Action Started Stopped Elapsed Users
Increasing 25-08-06 13:54:13 25-08-06 13:54:18 00:00:05 0 → 5
Maintaining 25-08-06 13:54:18 25-08-06 13:59:18 00:05:00 5
Decreasing 25-08-06 13:59:18 25-08-06 13:59:19 00:00:01 0 ← 5

Request Metrics

Method Name # Requests # Fails Average (ms) Min (ms) Max (ms) RPS Failures/s
GET get_advisory_by_doc_id 142 (+92) 0 11.21 (-1.05) 3 (0) 67 (+11) 0.47 (+0.31) 0.00 (+0.00)
GET get_analysis_latest_cpe 144 (+94) 0 120.76 (-32.08) 32 (0) 275 (-40) 0.48 (+0.31) 0.00 (+0.00)
GET get_analysis_status 144 (+94) 0 4.74 (-2.56) 1 (0) 62 (+10) 0.48 (+0.31) 0.00 (+0.00)
GET get_sbom[sha256:720e4451…a939656247164447] 144 (+94) 0 724.94 (-630.34) 200 (-184) 1803 (-1907) 0.48 (+0.31) 0.00 (+0.00)
GET get_sbom_license_ids[urn:uuid:019731…104-331632a21144] 143 (+93) 0 906.92 (+64.64) 472 (+116) 1469 (+377) 0.48 (+0.31) 0.00 (+0.00)
GET list_advisory 143 (+93) 0 472.31 (+30.19) 113 (-4) 826 (+64) 0.48 (+0.31) 0.00 (+0.00)
GET list_advisory_paginated 143 (+93) 0 403.37 (+47.15) 90 (-15) 701 (+64) 0.48 (+0.31) 0.00 (+0.00)
GET list_importer 142 (+92) 0 4.30 (+2.12) 1 (0) 52 (+40) 0.47 (+0.31) 0.00 (+0.00)
GET list_organizations 143 (+93) 0 10.96 (+2.02) 1 (0) 54 (+11) 0.48 (+0.31) 0.00 (+0.00)
GET list_packages 142 (+92) 0 400.63 (+65.19) 105 (+30) 840 (+52) 0.47 (+0.31) 0.00 (+0.00)
GET list_packages_paginated 142 (+92) 0 361.42 (+59.44) 93 (+6) 634 (+151) 0.47 (+0.31) 0.00 (+0.00)
GET list_products 145 (+90) 0 12.71 (+4.67) 2 (-2) 62 (+48) 0.48 (+0.30) 0.00 (+0.00)
GET list_sboms 145 (+90) 0 1481.51 (+188.66) 603 (+15) 2207 (+535) 0.48 (+0.30) 0.00 (+0.00)
GET list_sboms_paginated 145 (+95) 0 1688.94 (-1131.46) 430 (-33) 3494 (-4101) 0.48 (+0.32) 0.00 (+0.00)
GET list_vulnerabilities 142 (+92) 0 236.54 (-26.74) 52 (+3) 376 (-282) 0.47 (+0.31) 0.00 (+0.00)
GET list_vulnerabilities_paginated 142 (+92) 0 185.93 (+27.27) 47 (+6) 379 (+93) 0.47 (+0.31) 0.00 (+0.00)
GET sbom_by_package[pkg:maven/io.qu…dhat.com%2fga%2f] 143 (+93) 0 60.43 (-6.05) 10 (-3) 252 (+59) 0.48 (+0.31) 0.00 (+0.00)
GET search_advisory 142 (+92) 0 948.36 (+90.32) 139 (+19) 2281 (+54) 0.47 (+0.31) 0.00 (+0.00)
GET search_exact_purl 145 (+90) 0 9.06 (+1.46) 2 (-3) 57 (+45) 0.48 (+0.30) 0.00 (+0.00)
GET search_purls 146 (+91) 0 1793.22 (-16525.93) 432 (-14174) 5771 (-14640) 0.49 (+0.30) 0.00 (+0.00)
POST post_vulnerability_analyze[pkg:rpm/redhat/…h=noarch&epoch=1] 143 (+93) 0 649.72 (-41.12) 265 (+112) 1227 (+41) 0.48 (+0.31) 0.00 (+0.00)
Aggregated 3010 (+1940) 0 501.74 (-912.51) 1 (0) 5771 (-14640) 10.03 (+6.47) 0.00 (+0.00)

Response Time Metrics

Method Name 50%ile (ms) 60%ile (ms) 70%ile (ms) 80%ile (ms) 90%ile (ms) 95%ile (ms) 99%ile (ms) 100%ile (ms)
GET get_advisory_by_doc_id 6 (-1) 7 (-1) 8 (-1) 11 (-1) 25 (-17) 55 (+4) 64 (+8) 67 (+11)
GET get_analysis_latest_cpe 110 (-50) 120 (-50) 140 (-40) 170 (-30) 180 (-20) 190 (-30) 270 (-45) 275 (-40)
GET get_analysis_status 2 (0) 2 (-1) 3 (0) 4 (0) 6 (-24) 10 (-41) 55 (+3) 62 (+10)
GET get_sbom[sha256:720e4451…a939656247164447] 430 (-470) 500 (-500) 1,000 (0) 1,000 (0) 1,000 (-2,000) 1,803 (-1,197) 1,803 (-1,907) 1,803 (-1,907)
GET get_sbom_license_ids[urn:uuid:019731…104-331632a21144] 900 (+100) 900 (0) 1,000 (+100) 1,000 (0) 1,000 (0) 1,000 (0) 1,000 (0) 1,000 (0)
GET list_advisory 460 (+10) 480 (+10) 500 (+20) 500 (+10) 600 (-100) 600 (-100) 800 (+38) 800 (+38)
GET list_advisory_paginated 400 (+70) 410 (+30) 430 (+20) 480 (+20) 500 (0) 600 (+100) 700 (+100) 700 (+100)
GET list_importer 2 (0) 3 (+1) 3 (+1) 4 (+1) 6 (+2) 8 (+3) 50 (+38) 52 (+40)
GET list_organizations 3 (-2) 5 (-1) 6 (-1) 15 (+1) 38 (+13) 45 (+13) 49 (+6) 54 (+11)
GET list_packages 390 (+10) 400 (+20) 420 (+30) 460 (+50) 490 (+50) 600 (0) 800 (+12) 800 (+12)
GET list_packages_paginated 370 (+50) 380 (+20) 390 (+10) 410 (0) 430 (-20) 490 (+10) 500 (+20) 600 (+120)
GET list_products 7 (-1) 8 (-1) 10 (+1) 12 (+2) 50 (+39) 55 (+43) 61 (+47) 62 (+48)
GET list_sboms 2,000 (+1,000) 2,000 (+1,000) 2,000 (+1,000) 2,000 (+328) 2,000 (+328) 2,000 (+328) 2,000 (+328) 2,000 (+328)
GET list_sboms_paginated 2,000 (-1,000) 2,000 (-1,000) 2,000 (-1,000) 2,000 (-2,000) 3,000 (-1,000) 3,000 (-3,000) 3,000 (-4,595) 3,000 (-4,595)
GET list_vulnerabilities 230 (0) 240 (0) 270 (-70) 280 (-150) 290 (-180) 320 (-280) 370 (-288) 376 (-282)
GET list_vulnerabilities_paginated 190 (+20) 190 (-10) 200 (-10) 210 (0) 260 (+30) 280 (+20) 300 (+14) 379 (+93)
GET sbom_by_package[pkg:maven/io.qu…dhat.com%2fga%2f] 55 (-11) 63 (-11) 76 (-4) 110 (+23) 120 (-10) 170 (0) 200 (+10) 250 (+60)
GET search_advisory 800 (+100) 900 (+100) 1,000 (+100) 1,000 (0) 2,000 (0) 2,000 (0) 2,000 (0) 2,000 (0)
GET search_exact_purl 5 (-2) 5 (-2) 6 (-2) 8 (-1) 13 (+3) 50 (+40) 55 (+43) 57 (+45)
GET search_purls 1,000 (-18,000) 1,000 (-18,000) 1,000 (-18,000) 3,000 (-16,000) 5,000 (-15,000) 5,000 (-15,000) 5,771 (-14,229) 5,771 (-14,229)
POST post_vulnerability_analyze[pkg:rpm/redhat/…h=noarch&epoch=1] 600 (-100) 700 (0) 700 (-100) 800 (-100) 900 (-100) 900 (-100) 1,000 (0) 1,000 (0)
Aggregated 310 (+60) 400 (0) 500 (-100) 800 (-100) 1,000 (-1,000) 2,000 (-14,000) 3,000 (-16,000) 5,771 (-14,229)

Status Code Metrics

Method Name Status Codes
GET get_advisory_by_doc_id 142 [200]
GET get_analysis_latest_cpe 144 [200]
GET get_analysis_status 144 [200]
GET get_sbom[sha256:720e4451…a939656247164447] 144 [200]
GET get_sbom_license_ids[urn:uuid:019731…104-331632a21144] 143 [200]
GET list_advisory 143 [200]
GET list_advisory_paginated 143 [200]
GET list_importer 142 [200]
GET list_organizations 143 [200]
GET list_packages 142 [200]
GET list_packages_paginated 142 [200]
GET list_products 145 [200]
GET list_sboms 145 [200]
GET list_sboms_paginated 145 [200]
GET list_vulnerabilities 142 [200]
GET list_vulnerabilities_paginated 142 [200]
GET sbom_by_package[pkg:maven/io.qu…dhat.com%2fga%2f] 143 [200]
GET search_advisory 142 [200]
GET search_exact_purl 145 [200]
GET search_purls 146 [200]
POST post_vulnerability_analyze[pkg:rpm/redhat/…h=noarch&epoch=1] 143 [200]
Aggregated 3,010 [200]

Transaction Metrics

| Transaction | # Times Run | # Fails | Average (ms) | Min (ms) | Max (ms) | RPS | Failures/s |
|---|---|---|---|---|---|---|---|
| WebsiteUser | | | | | | | |
| 0.0 logon | 0 (0) | 0 (0) | 0.00 (+0.00) | 0 (0) | 0 (0) | 0.00 (+0.00) | 0.00 (+0.00) |
| 0.1 website_index | 0 (0) | 0 (0) | 0.00 (+0.00) | 0 (0) | 0 (0) | 0.00 (+0.00) | 0.00 (+0.00) |
| 0.2 website_openapi | 0 (0) | 0 (0) | 0.00 (+0.00) | 0 (0) | 0 (0) | 0.00 (+0.00) | 0.00 (+0.00) |
| 0.3 website_sboms | 0 (0) | 0 (0) | 0.00 (+0.00) | 0 (0) | 0 (0) | 0.00 (+0.00) | 0.00 (+0.00) |
| 0.4 website_packages | 0 (0) | 0 (0) | 0.00 (+0.00) | 0 (0) | 0 (0) | 0.00 (+0.00) | 0.00 (+0.00) |
| 0.5 website_advisories | 0 (0) | 0 (0) | 0.00 (+0.00) | 0 (0) | 0 (0) | 0.00 (+0.00) | 0.00 (+0.00) |
| 0.6 website_importers | 0 (0) | 0 (0) | 0.00 (+0.00) | 0 (0) | 0 (0) | 0.00 (+0.00) | 0.00 (+0.00) |
| RestAPIUser | | | | | | | |
| 1.0 logon | 143 (+93) | 0 (0) | 13.61 (+0.63) | 7 (+1) | 20 (0) | 0.48 (+0.31) | 0.00 (+0.00) |
| 1.1 list_organizations | 143 (+93) | 0 (0) | 11.21 (+2.13) | 1 (0) | 54 (+11) | 0.48 (+0.31) | 0.00 (+0.00) |
| 1.2 list_advisory | 143 (+93) | 0 (0) | 472.38 (+30.22) | 113 (-4) | 826 (+64) | 0.48 (+0.31) | 0.00 (+0.00) |
| 1.3 list_advisory_paginated | 143 (+93) | 0 (0) | 403.48 (+47.20) | 90 (-15) | 701 (+63) | 0.48 (+0.31) | 0.00 (+0.00) |
| 1.4 get_advisory_by_doc_id | 142 (+92) | 0 (0) | 11.27 (-1.01) | 3 (0) | 67 (+10) | 0.47 (+0.31) | 0.00 (+0.00) |
| 1.5 search_advisory | 142 (+92) | 0 (0) | 948.40 (+90.34) | 139 (+19) | 2281 (+54) | 0.47 (+0.31) | 0.00 (+0.00) |
| 1.6 list_vulnerabilities | 142 (+92) | 0 (0) | 236.58 (-26.74) | 52 (+3) | 376 (-282) | 0.47 (+0.31) | 0.00 (+0.00) |
| 1.7 list_vulnerabilities_paginated | 142 (+92) | 0 (0) | 186.02 (+27.32) | 47 (+6) | 379 (+93) | 0.47 (+0.31) | 0.00 (+0.00) |
| 1.8 list_importer | 142 (+92) | 0 (0) | 4.31 (+2.13) | 1 (0) | 52 (+40) | 0.47 (+0.31) | 0.00 (+0.00) |
| 1.9 list_packages | 142 (+92) | 0 (0) | 400.68 (+65.20) | 105 (+30) | 840 (+52) | 0.47 (+0.31) | 0.00 (+0.00) |
| 1.10 list_packages_paginated | 142 (+92) | 0 (0) | 361.47 (+59.47) | 93 (+6) | 634 (+151) | 0.47 (+0.31) | 0.00 (+0.00) |
| 1.11 search_purls | 146 (+91) | 0 (0) | 1793.32 (-16525.87) | 432 (-14174) | 5771 (-14640) | 0.49 (+0.30) | 0.00 (+0.00) |
| 1.12 search_exact_purl | 145 (+90) | 0 (0) | 9.10 (+1.44) | 2 (-3) | 57 (+45) | 0.48 (+0.30) | 0.00 (+0.00) |
| 1.13 list_products | 145 (+90) | 0 (0) | 12.74 (+4.68) | 2 (-2) | 62 (+48) | 0.48 (+0.30) | 0.00 (+0.00) |
| 1.14 list_sboms | 145 (+90) | 0 (0) | 1481.56 (+188.69) | 603 (+15) | 2207 (+535) | 0.48 (+0.30) | 0.00 (+0.00) |
| 1.15 list_sboms_paginated | 145 (+95) | 0 (0) | 1689.01 (-1131.51) | 430 (-33) | 3494 (-4101) | 0.48 (+0.32) | 0.00 (+0.00) |
| 1.16 get_analysis_status | 144 (+94) | 0 (0) | 4.81 (-2.55) | 1 (0) | 62 (+10) | 0.48 (+0.31) | 0.00 (+0.00) |
| 1.17 get_analysis_latest_cpe | 144 (+94) | 0 (0) | 120.82 (-32.06) | 32 (0) | 276 (-39) | 0.48 (+0.31) | 0.00 (+0.00) |
| 1.18 get_sbom[sha256:720e4451…a939656247164447] | 144 (+94) | 0 (0) | 725.04 (-630.28) | 200 (-184) | 1803 (-1907) | 0.48 (+0.31) | 0.00 (+0.00) |
| 1.19 sbom_by_package[pkg:maven/io.qu…dhat.com%2fga%2f] | 143 (+93) | 0 (0) | 60.55 (-5.97) | 10 (-3) | 252 (+59) | 0.48 (+0.31) | 0.00 (+0.00) |
| 1.20 get_sbom_license_ids[urn:uuid:019731…104-331632a21144] | 143 (+93) | 0 (0) | 907.03 (+64.73) | 473 (+117) | 1469 (+377) | 0.48 (+0.31) | 0.00 (+0.00) |
| 1.21 post_vulnerability_analyze[pkg:rpm/redhat/…h=noarch&epoch=1] | 143 (+93) | 0 (0) | 649.81 (-41.13) | 265 (+112) | 1227 (+41) | 0.48 (+0.31) | 0.00 (+0.00) |
| Aggregated | 3153 (+2033) | 0 (0) | 478.99 (-872.13) | 1 (0) | 5771 (-14640) | 10.51 (+6.78) | 0.00 (+0.00) |

Scenario Metrics

| Transaction | # Users | # Times Run | Average (ms) | Min (ms) | Max (ms) | Scenarios/s | Iterations |
|---|---|---|---|---|---|---|---|
| WebsiteUser | 0 (0) | 0 (0) | 0.00 (+0.00) | 0 (0) | 0 (0) | 0.00 (+0.00) | 0.00 (+0.00) |
| RestAPIUser | 5 (0) | 143 (+93) | 10436.98 (-17825.78) | 5666 (-12886) | 15667 (-18034) | 0.48 (+0.31) | 28.60 (+18.60) |
| Aggregated | 5 (0) | 143 (+93) | 10436.98 (-17825.78) | 5666 (-12886) | 15667 (-18034) | 0.48 (+0.31) | 28.60 (+18.60) |

📄 Full Report (Go to "Artifacts" and download report)

Contributor

@ctron ctron left a comment


Looks good to me. Thanks for wrestling with this. Two small ideas, but not blockers.

})
.buffer_unordered(self.concurrency)
.filter_map(|nodes| async move { nodes })
.collect::<Vec<_>>()
Contributor


Could you flatten the stream before collecting it, to save one collect?

default_value = "10",
help = "The number of concurrent tasks for analysis."
)]
pub concurrency: usize,
Contributor


I wonder if we should guard against zero. Maybe using https://doc.rust-lang.org/std/num/type.NonZeroUsize.html, or just a manual check.
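A small std-only sketch of both options. `parse_concurrency` is a hypothetical helper, not code from the PR. In the PR the value comes from a CLI option with `default_value = "10"`. `NonZeroUsize` implements `FromStr` and reports a zero input as `IntErrorKind::Zero`, while the manual check corresponds to `NonZeroUsize::new`, which returns `None` for zero.

```rust
// Sketch only: `parse_concurrency` is a hypothetical helper.
use std::num::{IntErrorKind, NonZeroUsize};

/// Parses a concurrency setting, rejecting zero at parse time.
fn parse_concurrency(raw: &str) -> Result<NonZeroUsize, String> {
    raw.parse::<NonZeroUsize>().map_err(|e| match e.kind() {
        IntErrorKind::Zero => "concurrency must be at least 1".to_string(),
        _ => format!("invalid concurrency {raw:?}: {e}"),
    })
}

fn main() {
    let concurrency = parse_concurrency("10").expect("valid value");
    // APIs that take a plain usize (e.g. a stream buffer limit) get it via `.get()`.
    let limit: usize = concurrency.get();
    println!("limit = {limit}");

    // The manual-check alternative: `new` returns None for zero.
    match NonZeroUsize::new(0) {
        Some(n) => println!("unexpected: {n}"),
        None => println!("zero rejected"),
    }
}
```

The `.get()` call at each use site is the main cost of this approach, since most consumers expect a plain `usize`.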

@dejanb dejanb force-pushed the graph-performance branch 4 times, most recently from 3b5bc59 to cb780d4 August 7, 2025 11:01
  This commit significantly improves the performance of graph analysis by applying parallelization techniques to the run_graph_query and collect_graph functions.

  Previously, the collection of ancestors and descendants for a given node was performed sequentially. By refactoring run_graph_query to use futures::join!, we now process both directions concurrently, reducing the overall execution time.

  Building on this, the collect_graph function was optimized to process all discovered nodes in parallel using join_all. This ensures that we leverage available resources more efficiently when analyzing multiple entry points in the graph.

Assisted-by: Gemini
@dejanb dejanb force-pushed the graph-performance branch from cb780d4 to d8634d4 August 7, 2025 11:08
@dejanb
Contributor Author

dejanb commented Aug 7, 2025

@ctron Thanks for the suggestions. Flattening the stream definitely makes sense, and I think it also has a positive impact on memory usage.

I also implemented the concurrency config with NonZeroUsize. I'm not super happy with the usage pattern, but I think it's good for now. There are also a few more places in the code where this could be applied.

Can you give it another quick look before merging?

@dejanb dejanb added this pull request to the merge queue Aug 7, 2025
Merged via the queue into main with commit 92cb5b6 Aug 7, 2025
7 checks passed
@dejanb dejanb deleted the graph-performance branch August 7, 2025 12:39
@dejanb dejanb added the backport release/0.3.z Backport (0.3.z) label Aug 7, 2025
@dejanb
Contributor Author

dejanb commented Aug 7, 2025

/backport

@trustification-ci-bot

Successfully created backport PR for release/0.3.z:


Labels

backport release/0.3.z Backport (0.3.z)

4 participants