Skip to content

Feat/511 Implement Data Collection and Visualization for Web3.Storage Measurement Batch #560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Winter-Soren
Copy link

@Winter-Soren Winter-Soren commented Mar 27, 2025

This PR implements collection and visualization of Web3.Storage measurement batch metrics.

Closes #511

Changes

  1. Created new InfluxDB bucket 'spark-batch-metrics' for dedicated batch metrics storage
  2. Enhanced telemetry recording with separate batch metrics measurement
  3. Added visualization dashboards for batch metrics

Code Changes

  • Modified common/telemetry.js to add dedicated batch metrics write client
  • Enhanced publish/index.js to separate batch metrics recording
  • Added proper tags for improved metric querying

Metrics Collected

  • batch_size_bytes: Total size of the measurement batch
  • avg_measurement_size_bytes: Average size per measurement
  • measurement_count: Number of measurements in batch
  • Tags:
    • cid: Content ID of the batch
    • round_index: Associated round number

Testing

  • Verified metrics collection with test data
  • Confirmed proper data storage in InfluxDB
  • Validated dashboard visualizations

@Winter-Soren
Copy link
Author

Winter-Soren commented Mar 27, 2025

Hi @pyropy can you review this draft and let me know if I’m heading in the right direction?

@Winter-Soren
Copy link
Author

Winter-Soren commented Mar 27, 2025

@pyropy , I am also facing this issue in parallel, where the server runs fine but suddenly outputs an error stating that InfluxDB write failed. I have attached a screenshot below.

image

I created InfluxDB keys from this platform: https://eu-central-1-1.aws.cloud2.influxdata.com/orgs/e0c31560a2dfff87/load-data/tokens.

I tried deleting keys and checking RPC nodes but issue persists.

@juliangruber
Copy link
Member

@Winter-Soren could you please link to the issue this addresses both in the PR title and description?

Copy link
Member

@bajtos bajtos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused.

What problems is this pull request trying to solve?

If the goal is to start collecting telemetry about batch sizes, I would expect the pull request to simply add more fields to the existing publish point.

publish/index.js Outdated
Comment on lines 152 to 153
point.tag('cid', cid.toString())
point.tag('round_index', roundIndex.toString())
Copy link
Member

@bajtos bajtos Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing CIDs and round indexes in tags will cause high cardinality that will degrade performance and/or increase our bill.

Quoting from https://docs.influxdata.com/influxdb/v2/write-data/best-practices/resolve-high-cardinality/

Tags containing highly variable information like unique IDs, hashes, and random strings lead to a large number of series, also known as high series cardinality. High series cardinality is a primary driver of high memory usage for many database workloads.
(...)
Review your tags to ensure each tag does not contain unique values for most entries.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the suggestion, I'll rectify the commit!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what @bajtos said.

@Winter-Soren Winter-Soren changed the title Web3.Storage Measurement Batch Metrics Visualization Feat/511 Implement Data Collection and Visualization for Web3.Storage Measurement Batch Mar 29, 2025
@Winter-Soren
Copy link
Author

@Winter-Soren could you please link to the issue this addresses both in the PR title and description?

@juliangruber, I have edited the name of the PR and added a link to the issue it resolves.

@Winter-Soren Winter-Soren requested a review from bajtos March 29, 2025 18:57
Copy link
Contributor

@pyropy pyropy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job and thank you for your contribution @Winter-Soren! 🎉

I think the PR is going in the right direction but would need to fix few things before continuing. Namely lets stick to established pattern in the common/telemetry.js file (which I've noted in the comments). We would also need to improve naming a bit to make it more obvious of what metrics we're storing.

Keep up the good work! 🚀

@@ -53,6 +53,11 @@ export const publish = async ({

logger.log(`Publishing ${measurements.length} measurements. Total unpublished: ${totalCount}. Batch size: ${maxMeasurements}.`)

// Calculate batch size in bytes
const batchSizeBytes = Buffer.byteLength(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you found some other ways to calculate batch size without serialising objects to JSON? Depending on the batch size that might create consume a lot of memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

We have the following code few lines below:

  const file = new File(
    [measurements.map(m => JSON.stringify(m)).join('\n')],
    'measurements.ndjson',
    { type: 'application/json' }
  )

Please refactor it so that we create only one copy of measurements.map(m => JSON.stringify(m)).join('\n').

publish/index.js Outdated
Comment on lines 152 to 153
point.tag('cid', cid.toString())
point.tag('round_index', roundIndex.toString())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what @bajtos said.

@bajtos
Copy link
Member

bajtos commented Mar 31, 2025

Thank you for adding a link to #511, @Winter-Soren. This pull request makes sense to me now!

Thank you for the contribution ❤️

Before we move forward, I think it's best for me and @pyropy to agree on the higher-level design.

@pyropy Why is it necessary to create a new bucket and a new telemetry writer?

As I wrote before:

If the goal is to start collecting telemetry about batch sizes, I would expect the pull request to simply add more fields to the existing publish point.

If we add more fields to the existing publish point, what problems would that cause?

@pyropy
Copy link
Contributor

pyropy commented Mar 31, 2025

Thank you for adding a link to #511, @Winter-Soren. This pull request makes sense to me now!

Thank you for the contribution ❤️

Before we move forward, I think it's best for me and @pyropy to agree on the higher-level design.

@pyropy Why is it necessary to create a new bucket and a new telemetry writer?

As I wrote before:

If the goal is to start collecting telemetry about batch sizes, I would expect the pull request to simply add more fields to the existing publish point.

If we add more fields to the existing publish point, what problems would that cause?

Sorry, I have just quickly glanced over your comment. Looking at the code I have to say you're right, we could just simply extend publish point. I'm going also update the #511 description accordingly.

@pyropy
Copy link
Contributor

pyropy commented Mar 31, 2025

@Winter-Soren I have published a bad task description without looking deeper into the existing codebase. I have updated the task description according to original comment published by @bajtos.

Copy link
Contributor

@pyropy pyropy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added new comments according to updated description of #511. Please let me know if you need further explanation. 🙏🏻

@@ -25,9 +25,17 @@ const networkInfoWriteClient = influx.getWriteApi(
's' // precision
)

// Add new write client for batch metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not add new bucket and write client, rather let's just extend existing publish metric.

setInterval(() => {
publishWriteClient.flush().catch(console.error)
networkInfoWriteClient.flush().catch(console.error)
batchMetricsWriteClient.flush().catch(console.error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
batchMetricsWriteClient.flush().catch(console.error)

We won't need this one if we extend existing publish metric.

Comment on lines 54 to 55
recordNetworkInfoTelemetry,
batchMetricsWriteClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
recordNetworkInfoTelemetry,
batchMetricsWriteClient
recordNetworkInfoTelemetry

We won't need this one if we extend existing publish metric.

recordTelemetry('publish', point => {
// Existing metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's extend this metric with new point that collects batch size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me correct that - we should add new fields to the existing point.

publish/index.js Outdated
@@ -136,6 +143,16 @@ export const publish = async ({
)
point.intField('add_measurements_duration_ms', ieAddMeasurementsDuration)
})

// Separate batch metrics recording for better organization
recordTelemetry('batch_metrics', point => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's delete this new metric.

bajtos

This comment was marked as off-topic.

@Winter-Soren Winter-Soren force-pushed the feat/511-web3-storage-batch-metrics branch from 9bc327d to 3a83c09 Compare April 5, 2025 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Implement Data Collection and Visualization for Web3.Storage Measurement Batch
4 participants