docs: Add test reporting doc to benchmarks dir #3238

Open
wants to merge 4 commits into
base: master

@stubbsta stubbsta commented Jan 14, 2025

Description

This PR is a first pass at adding an nwaku test summary page. It aims to give anyone implementing the Waku protocol with nwaku a quick reference for expected performance, as well as quick access to the test reports.

Changes

  • Added file docs/benchmarks/test-results-summary.md

How to test

  1. Pull https://github.com/waku-org/docs.waku.org
  2. Edit the URL to use this branch: https://github.com/waku-org/docs.waku.org/blob/develop/fetch-content.js#L64
  3. yarn build
  4. yarn serve

@stubbsta stubbsta changed the title Add test reporting doc to benchmarks dir docs: Add test reporting doc to benchmarks dir Jan 14, 2025
@stubbsta
Author

This is a draft to get inputs on the formatting and content.
The Quick ref section definitely needs work. I would appreciate some info on how to relate the data back to the Status app.

Contributor

@jm-clius jm-clius left a comment

Thanks for this! I've added a couple of comments and suggestions on what other sections to add.

@stubbsta stubbsta force-pushed the add-performance-benchmarks-overview branch from 2362ed1 to e01a61e Compare January 20, 2025 06:00
@stubbsta
Author

@fryorcraken I wonder whether the TL;DR section is too wordy. Isn't the requirement for it to be something very short that can be read quickly and remembered easily, such as:

  • Relay network average bandwidth usage: x KB/s
  • Discv5 average bandwidth usage: y KB/s
  • etc.

and then, if the reader wants more info (such as the network size and message rate for the simulations where the above values were obtained), they can look at the Insights section, and if they want even more info they can look at the reports on Notion?

@stubbsta stubbsta requested a review from jm-clius January 27, 2025 09:30
Contributor

@jm-clius jm-clius left a comment

A few more comments. :)


> ## TL;DR
>
> - libp2p bandwidth usage fluctuates between 5 and 15 KB/s for topologies of up to 1000 nodes, with average bandwidth usage at **10 KB/s**.
Contributor

For the bandwidth numbers to make sense, we need to add the message rate and size. Perhaps just mentioning the average and max bandwidth is enough?

Author

I updated this section following the comment further down, which simplifies it further.

Comment on lines 39 to 41
| [Relay](https://www.notion.so/Waku-regression-testing-v0-34-1618f96fb65c803bb7bad6ecd6bafff9) (1000 nodes) | 0.05 | 1.6 |
| [Mixed](https://www.notion.so/Mixed-environment-analysis-1688f96fb65c809eb235c59b97d6e15b) (210 nodes) | 0.0125 | 0.007 |
| [Non-persistent Relay](https://www.notion.so/High-Churn-Relay-Store-Reliability-16c8f96fb65c8008bacaf5e86881160c) (510 nodes)| 0.0125 | 0.25 |
Contributor

I would just add a very brief description of what "Relay", "Mixed" and "Non-persistent Relay" mean, so that a reader doesn't have to click the links to get an intuitive understanding.

Author

Added in adcdbee


## Testing
### DST
The VAC DST team performs regression testing on all new **nwaku** releases, comparing performance with previous versions. They simulate large Waku networks with a variety of network and protocol configurations that are representative of real-world usage.
Contributor

Suggested change
The VAC DST team performs regression testing on all new **nwaku** releases, comparing performance with previous versions. They simulate large Waku networks with a variety of network and protocol configurations that are representative of real-world usage.
The VAC DST team performs regression testing on all new **nwaku** releases, comparing performance with previous versions.
They simulate large Waku networks with a variety of network and protocol configurations that are representative of real-world usage.

Semantic breaks, here and further down. :)

Author

Added the breaks in adcdbee, but I'm not sure whether they are formatted correctly now?


> ## TL;DR
>
> - libp2p bandwidth usage fluctuates between 5 and 15 KB/s for topologies of up to 1000 nodes, with average bandwidth usage at **10 KB/s**.
Contributor

Do we know the configured degree D here? I think this is just the default of 6. Perhaps not worth mentioning if this is a "well-known fact" about Waku.

So, on second thought, I think we can simplify this TL;DR, focus on the critical conclusion and use fewer domain terms. For example, our first sentence suggests that we have concluded an average of 10 KB/s only for up to 1000 nodes, but the next sentence says roughly the same, this time for up to 2000 nodes. I'd suggest something like:

Waku bandwidth (minus traffic related to discv5 Discovery) averages ~10KB/s for a message injection rate of X KB/s for any topology size* (*confirmed up to 2000 nodes).

I think X is 1KB/s (i.e. 1KB message every 1 second)?

Author

I think D is the default of 6. @AlbertoSoutullo, could you please confirm this?
This line in the doc refers to these results: https://www.notion.so/Waku-regression-testing-v0-34-1618f96fb65c803bb7bad6ecd6bafff9

Unless you changed the default values in the Waku version on your side, it is 6 afaik, yes c:

Author

Thanks, @AlbertoSoutullo. All the data in this doc is based on the DST test reports, so there are no changes from my side.

@stubbsta stubbsta requested a review from jm-clius February 10, 2025 05:36
Contributor

@jm-clius jm-clius left a comment

Thanks! LGTM now. :) I know there's still an outstanding section on relevance to Status, but happy to see this merged once that is done.

@stubbsta
Author

Hi @plopezlpz, could you please have a look and maybe provide some inputs?
One of the main objectives of the doc is to indicate how these values might be relevant or helpful to Status, but my knowledge of the Status Waku config is limited.
For example, it would be good to be able to compare the overall Status bandwidth usage to the average Waku bandwidth usage.
Does the Status app have its own performance benchmarks that we could use to compare?

Collaborator

@fryorcraken fryorcraken left a comment

This looks really good 👍


> ## TL;DR
>
> - Average Waku bandwidth usage: ~**10 KB/s** (minus discv5 Discovery) for 1KB message size and message injection rate of 1msg/s.
Collaborator

It may be more interesting to read the bandwidth overhead.

I would not exclude discv5.

So something like:

For a 1 KB message per second sent by each of 2000 publisher nodes (all nodes in the network are publishers), the total bandwidth is 3 MB/s. Given that the data injected into the network is 2 MB/s (1 KB * 2000 per second), there is an overhead of 1 MB/s (gossipsub amplification + discv5 peer discovery).
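The overhead arithmetic in this suggestion can be sketched as below. This is a minimal illustration, not part of the doc under review: the function and parameter names are hypothetical, the 3 MB/s total is the illustrative figure from the comment rather than a measured result, and 1 MB is taken as 1000 KB to match the comment's own arithmetic.

```python
def bandwidth_overhead(msg_size_kb, msgs_per_sec, publishers, total_kb_per_sec):
    """Split a network-wide bandwidth figure into injected payload and overhead.

    All names here are hypothetical; units are KB and KB/s.
    """
    # Application data injected into the network by all publishers combined.
    injected = msg_size_kb * msgs_per_sec * publishers
    # Whatever is left is protocol overhead (e.g. gossipsub amplification, discv5).
    overhead = total_kb_per_sec - injected
    # Ratio of total traffic to injected payload.
    amplification = total_kb_per_sec / injected
    return injected, overhead, amplification


# Illustrative inputs from the comment: 1 KB messages, 1 msg/s, 2000 publishers, 3 MB/s total.
injected, overhead, amplification = bandwidth_overhead(1, 1, 2000, 3000)
print(f"injected={injected} KB/s, overhead={overhead} KB/s, amplification={amplification}x")
```

Reporting the overhead and the amplification factor together keeps the headline number meaningful when the message rate or network size changes.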

> - Average time for a message to propagate to 100% of nodes: **0.4s** for topologies of up to 2000 Relay nodes.
> - Average per-node bandwidth usage of the discv5 protocol: **8 KB/s** for incoming traffic and **7.4 KB/s** for outgoing traffic, in a network with 100 continuously online nodes.
> - Relevancy to Status App: **TODO**
Collaborator

The aim is to produce a messaging API whose default values are the ones used by Status, and to run the benchmarks through that messaging API, so that the results are relevant to Status and to any other app using said API.

Comment on lines +22 to +28
The average per-node `libp2p` bandwidth usage in a 1000-node Relay network with 1KB messages at varying injection rates.


| Message Injection Rate | Average libp2p incoming bandwidth (KB/s) | Average libp2p outgoing bandwidth (KB/s) |
|------------------------|------------------------------------------|------------------------------------------|
| 1 msg/s | ~10.1 | ~10.3 |
| 1 msg/10s | ~1.8 | ~1.9 |
Collaborator

It would be good to state the obvious: the application data injected (1 msg/s * 1000 nodes) next to the data measured, so that we understand the overhead.
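One way to put injected and measured data side by side is sketched below, using the two rows of the table above. The helper name and the `publishers` parameter are hypothetical. The sketch assumes the injection rate in the table is network-wide (`publishers=1`), which the thread leaves open; if the rate is per node, pass the node count instead.

```python
MSG_SIZE_KB = 1.0  # message size stated for the table above

# (injection rate in msg/s, avg incoming KB/s, avg outgoing KB/s), copied from the table.
ROWS = {
    "1 msg/s": (1.0, 10.1, 10.3),
    "1 msg/10s": (0.1, 1.8, 1.9),
}


def measured_to_injected_ratio(rate_msgs_per_sec, measured_kb_per_sec, publishers=1):
    """Ratio of measured per-node bandwidth to the injected payload rate.

    `publishers` is an assumption: 1 treats the rate as network-wide,
    the node count treats it as a per-node rate.
    """
    injected_kb_per_sec = MSG_SIZE_KB * rate_msgs_per_sec * publishers
    return measured_kb_per_sec / injected_kb_per_sec


for label, (rate, incoming, outgoing) in ROWS.items():
    ratio = measured_to_injected_ratio(rate, incoming)
    print(f"{label}: injected {MSG_SIZE_KB * rate:.1f} KB/s, "
          f"measured incoming {incoming} KB/s, ratio {ratio:.1f}x")
```

Under the network-wide assumption, the ratio is higher at the lower injection rate, which would be consistent with a fixed background cost that does not scale with message volume; the reports themselves would need to confirm this.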
