Feat/511 Implement Data Collection and Visualization for Web3.Storage Measurement Batch #560
base: main
Conversation
Hi @pyropy, can you review this draft and let me know if I'm heading in the right direction?
@pyropy, I am also facing an issue in parallel: the server runs fine but then suddenly reports that the InfluxDB write failed. I have attached a screenshot below. I created the InfluxDB keys from this platform: https://eu-central-1-1.aws.cloud2.influxdata.com/orgs/e0c31560a2dfff87/load-data/tokens. I tried deleting the keys and checking the RPC nodes, but the issue persists.
@Winter-Soren, could you please link to the issue this addresses in both the PR title and description?
I am confused.
What problems is this pull request trying to solve?
If the goal is to start collecting telemetry about batch sizes, I would expect the pull request to simply add more fields to the existing publish point.
publish/index.js
Outdated
point.tag('cid', cid.toString())
point.tag('round_index', roundIndex.toString())
Storing CIDs and round indexes in tags will cause high cardinality that will degrade performance and/or increase our bill.
Quoting from https://docs.influxdata.com/influxdb/v2/write-data/best-practices/resolve-high-cardinality/
Tags containing highly variable information like unique IDs, hashes, and random strings lead to a large number of series, also known as high series cardinality. High series cardinality is a primary driver of high memory usage for many database workloads.
(...)
Review your tags to ensure each tag does not contain unique values for most entries.
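For context, a minimal sketch of the alternative being pointed at here, assuming these values need to be recorded at all: store them as (unindexed) fields rather than tags, so unique values do not create new series.

// Sketch only (assumes cid and roundIndex are still worth recording at all);
// fields are not indexed, so unique values don't inflate series cardinality.
point.stringField('cid', cid.toString())
point.intField('round_index', Number(roundIndex))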
Thanks for the suggestion, I'll rectify the commit!
+1 to what @bajtos said.
@juliangruber, I have edited the name of the PR and added a link to the issue it resolves.
Good job and thank you for your contribution @Winter-Soren! 🎉
I think the PR is going in the right direction, but we would need to fix a few things before continuing. Namely, let's stick to the established pattern in the common/telemetry.js file (which I've noted in the comments). We would also need to improve the naming a bit to make it more obvious what metrics we're storing.
Keep up the good work! 🚀
@@ -53,6 +53,11 @@ export const publish = async ({

  logger.log(`Publishing ${measurements.length} measurements. Total unpublished: ${totalCount}. Batch size: ${maxMeasurements}.`)

  // Calculate batch size in bytes
  const batchSizeBytes = Buffer.byteLength(
Have you found some other way to calculate the batch size without serialising the objects to JSON? Depending on the batch size, that might consume a lot of memory.
+1
We have the following code a few lines below:
const file = new File(
[measurements.map(m => JSON.stringify(m)).join('\n')],
'measurements.ndjson',
{ type: 'application/json' }
)
Please refactor it so that we create only one copy of measurements.map(m => JSON.stringify(m)).join('\n').
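A minimal sketch of the refactor being asked for (payload is an illustrative name, not from the PR): serialize the measurements once and reuse the string for both the size calculation and the uploaded file.

// Sketch only: build the NDJSON string once ("payload" is an illustrative name),
// then reuse it for the batch-size calculation and the File contents.
const payload = measurements.map(m => JSON.stringify(m)).join('\n')
const batchSizeBytes = Buffer.byteLength(payload)
const file = new File([payload], 'measurements.ndjson', { type: 'application/json' })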
publish/index.js
Outdated
point.tag('cid', cid.toString())
point.tag('round_index', roundIndex.toString())
+1 to what @bajtos said.
Thank you for adding a link to #511, @Winter-Soren. This pull request makes sense to me now! Thank you for the contribution ❤️ Before we move forward, I think it's best for me and @pyropy to agree on the higher-level design. @pyropy Why is it necessary to create a new bucket and a new telemetry writer? As I wrote before:
If we add more fields to the existing publish point (…)
Sorry, I had just quickly glanced over your comment. Looking at the code, I have to say you're right: we could simply extend the existing publish metric.
@Winter-Soren, I published a bad task description without looking deeper into the existing codebase. I have updated the task description according to the original comment published by @bajtos.
I've added new comments according to the updated description of #511. Please let me know if you need further explanation. 🙏🏻
@@ -25,9 +25,17 @@ const networkInfoWriteClient = influx.getWriteApi(
  's' // precision
)

// Add new write client for batch metrics
Let's not add a new bucket and write client; instead, let's just extend the existing publish metric.
common/telemetry.js
Outdated
setInterval(() => {
  publishWriteClient.flush().catch(console.error)
  networkInfoWriteClient.flush().catch(console.error)
  batchMetricsWriteClient.flush().catch(console.error)
Suggested change: remove this line.
  batchMetricsWriteClient.flush().catch(console.error)
We won't need this one if we extend the existing publish metric.
common/telemetry.js
Outdated
recordNetworkInfoTelemetry,
batchMetricsWriteClient
Suggested change:
-  recordNetworkInfoTelemetry,
-  batchMetricsWriteClient
+  recordNetworkInfoTelemetry
We won't need this one if we extend the existing publish metric.
recordTelemetry('publish', point => {
  // Existing metrics
Let's extend this metric with a new point that collects the batch size.
Let me correct that: we should add new fields to the existing point.
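To make that concrete, a sketch of what adding the new fields to the existing point could look like; the field names mirror the PR description, and batchSizeBytes is assumed to have been computed earlier in publish().

// Sketch only, inside the existing recordTelemetry('publish', point => { ... }) callback;
// field names follow the PR description and are open to renaming.
point.intField('measurement_count', measurements.length)
point.intField('batch_size_bytes', batchSizeBytes)
point.intField('avg_measurement_size_bytes', Math.round(batchSizeBytes / measurements.length))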
publish/index.js
Outdated
@@ -136,6 +143,16 @@ export const publish = async ({
  )
  point.intField('add_measurements_duration_ms', ieAddMeasurementsDuration)
})

// Separate batch metrics recording for better organization
recordTelemetry('batch_metrics', point => {
Let's delete this new metric.
This PR implements collection and visualization of Web3.Storage measurement batch metrics.
Closes #511
Changes

Code Changes
- Updated common/telemetry.js to add a dedicated batch metrics write client
- Updated publish/index.js to separate batch metrics recording

Metrics Collected
- batch_size_bytes: Total size of the measurement batch
- avg_measurement_size_bytes: Average size per measurement
- measurement_count: Number of measurements in the batch
- cid: Content ID of the batch
- round_index: Associated round number

Testing