Conversation
metrics::incr_grpc_message_sent_counter(&subscriber_id);
metrics::incr_grpc_bytes_sent(&subscriber_id, proto_size);
@linuskendall
This is the main sending code to the client.
Here I measure the size of the proto payload that will be sent to the client via gRPC.
Only payloads that pass the client's filters are measured.
Finally, I increment the subscriber message counter.
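For context, a minimal sketch of this send path (only the two `metrics::` calls are taken from the diff; the function shape, the prost-based size measurement, and the `&str` argument type are assumptions):

```rust
use prost::Message;

// Stub standing in for the plugin's metrics module; the real functions exist
// in this PR, but their exact signatures are assumed here.
mod metrics {
    pub fn incr_grpc_message_sent_counter(_subscriber_id: &str) {}
    pub fn incr_grpc_bytes_sent(_subscriber_id: &str, _bytes: usize) {}
}

/// Sketch: measure the encoded protobuf size of the already-filtered update,
/// then attribute one message and `proto_size` bytes to this subscriber.
fn record_send<T: Message>(subscriber_id: &str, filtered_update: &T) {
    let proto_size = filtered_update.encoded_len();
    metrics::incr_grpc_message_sent_counter(subscriber_id);
    metrics::incr_grpc_bytes_sent(subscriber_id, proto_size);
}
```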
yellowstone-grpc-geyser/src/grpc.rs
Here I measure three things:
1. The overall loop processing pace (set_subscriber_pace), where the unit is "geyser events" per second.
2. The actual bandwidth load we are sending to the downstream client (only filtered data matching the client's filters).
3. The actual bandwidth consumption rate of each client per second.
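For the pace measurement specifically, here is a hypothetical sketch of the per-event bookkeeping; `set_subscriber_pace` is named above, but its signature and this tracker are assumptions, not the PR's actual code:

```rust
use std::time::Instant;

// Stub for the plugin's metrics module; the real function exists in this PR,
// but its exact signature is assumed here.
mod metrics {
    pub fn set_subscriber_pace(_subscriber_id: &str, _events_per_sec: f64) {}
}

struct PaceTracker {
    window_start: Instant,
    events: u64,
}

impl PaceTracker {
    fn new() -> Self {
        Self { window_start: Instant::now(), events: 0 }
    }

    // Called once per geyser event handled in this client's loop.
    fn on_event(&mut self, subscriber_id: &str) {
        self.events += 1;
        let elapsed = self.window_start.elapsed().as_secs_f64();
        if elapsed >= 1.0 {
            // Report the pace in "geyser events / second", then reset the window.
            metrics::set_subscriber_pace(subscriber_id, self.events as f64 / elapsed);
            self.events = 0;
            self.window_start = Instant::now();
        }
    }
}
```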
let subscriber_id = request
    .metadata()
    .get("x-subscription-id")
    .and_then(|h| h.to_str().ok().map(|s| s.to_string()))
    .or(request.remote_addr().map(|addr| addr.to_string()));
To identify a downstream client, I check whether x-subscription-id is present; otherwise I fall back to its IP address.
}
// Dedup accounts by max write_version
Message::Account(msg) => {
    metrics::observe_geyser_account_update_received(msg.account.data.len());
This line of code is inside the "geyser loop", which processes every geyser event coming from Agave.
I measure the account data size and record it in a histogram.
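Since the histogram is bucketed in KiB while `msg.account.data.len()` is in bytes, I'd expect the helper to convert before observing; a sketch under that assumption (it uses the `GEYSER_ACCOUNT_UPDATE_RECEIVED` static defined in the next hunk):

```rust
// Sketch of the observe helper (assumption: bytes are converted to KiB so the
// observed value matches the `_kib` histogram defined below).
pub fn observe_geyser_account_update_received(data_len_bytes: usize) {
    GEYSER_ACCOUNT_UPDATE_RECEIVED.observe(data_len_bytes as f64 / 1024.0);
}
```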
static ref GEYSER_ACCOUNT_UPDATE_RECEIVED: Histogram = Histogram::with_opts(
    HistogramOpts::new(
        "geyser_account_update_data_size_kib",
        "Histogram of all account update data (kib) received from Geyser plugin"
    )
    .buckets(vec![5.0, 10.0, 20.0, 30.0, 50.0, 100.0, 200.0, 300.0, 500.0, 1000.0, 2000.0, 3000.0, 5000.0, 10000.0])
).unwrap();
}
I opted for a histogram to measure the account data our geyser plugin receives, since a histogram gives us two other metrics for "free":
the total sum of data (geyser_account_update_data_size_kib_sum) and the account update count (geyser_account_update_data_size_kib_count).
The buckets are based on previous work I did for Fumarole.
The P90 of account data size should be below 5 KiB.
The P95 should be below 20 KiB.
About 1% of account data can be above 1 MiB.
The max bucket is ~10 MiB, which matches the max account data size.
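To make the "free" series concrete, here is a small self-contained illustration with the prometheus crate (standalone example, not code from the PR): observing a few samples is enough to populate what gets exported as the `_sum` and `_count` series.

```rust
use prometheus::{Histogram, HistogramOpts};

fn main() {
    let hist = Histogram::with_opts(
        HistogramOpts::new(
            "geyser_account_update_data_size_kib",
            "Histogram of all account update data (KiB) received from Geyser plugin",
        )
        .buckets(vec![
            5.0, 10.0, 20.0, 30.0, 50.0, 100.0, 200.0, 300.0, 500.0,
            1000.0, 2000.0, 3000.0, 5000.0, 10000.0,
        ]),
    )
    .unwrap();

    // Observe a few account update sizes (in KiB).
    for kib in [1.5, 4.0, 12.0, 800.0] {
        hist.observe(kib);
    }

    // These come "for free" and are exported as
    // geyser_account_update_data_size_kib_sum / _count.
    assert_eq!(hist.get_sample_count(), 4);
    assert!((hist.get_sample_sum() - 817.5).abs() < 1e-9);
}
```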
/// Exponential Moving Average (EMA) for load tracking.
///
#[derive(Debug)]
pub struct Ema {
This is the Exponential Moving Average (EMA) algorithm, used to compute an average load.
It matters because it gives a smoothed view of the load we send to each downstream client and removes noise from the graphs, which gives us much better visibility.
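For reference, the core of an EMA is a single blended update per sample; a minimal sketch (the actual `Ema` struct in this PR may carry different fields, and `alpha` here is an assumed smoothing factor in (0, 1]):

```rust
/// Minimal EMA sketch; the real `Ema` in this PR may track more state
/// (e.g. timestamps for irregularly spaced samples).
#[derive(Debug)]
pub struct Ema {
    alpha: f64,
    value: Option<f64>,
}

impl Ema {
    pub fn new(alpha: f64) -> Self {
        Self { alpha, value: None }
    }

    /// Fold one new load sample into the average:
    /// ema = alpha * sample + (1 - alpha) * ema
    pub fn update(&mut self, sample: f64) -> f64 {
        let next = match self.value {
            Some(prev) => self.alpha * sample + (1.0 - self.alpha) * prev,
            None => sample, // the first sample seeds the average
        };
        self.value = Some(next);
        next
    }
}
```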
| &["status"] | ||
| ).unwrap(); | ||
|
|
||
| static ref GRPC_MESSAGE_SENT: IntCounterVec = IntCounterVec::new( |
Measures how many geyser events we have sent to each subscriber.
| &["subscriber_id"] | ||
| ).unwrap(); | ||
|
|
||
| static ref GRPC_BYTES_SENT: IntCounterVec = IntCounterVec::new( |
Measures how many bytes (protobuf-encoded data) we have sent to each subscriber.
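Putting the two counters together, a sketch of how the definitions and their increment helpers could fit (only the static names and the `subscriber_id` label come from the diff; the metric name strings and help texts are placeholders):

```rust
use lazy_static::lazy_static;
use prometheus::{IntCounterVec, Opts};

lazy_static! {
    // Static names and the `subscriber_id` label come from the diff above;
    // the metric names and help strings are placeholders.
    static ref GRPC_MESSAGE_SENT: IntCounterVec = IntCounterVec::new(
        Opts::new("grpc_message_sent", "Geyser events sent per subscriber"),
        &["subscriber_id"]
    ).unwrap();

    static ref GRPC_BYTES_SENT: IntCounterVec = IntCounterVec::new(
        Opts::new("grpc_bytes_sent", "Protobuf-encoded bytes sent per subscriber"),
        &["subscriber_id"]
    ).unwrap();
}

pub fn incr_grpc_message_sent_counter(subscriber_id: &str) {
    GRPC_MESSAGE_SENT.with_label_values(&[subscriber_id]).inc();
}

pub fn incr_grpc_bytes_sent(subscriber_id: &str, bytes: usize) {
    GRPC_BYTES_SENT.with_label_values(&[subscriber_id]).inc_by(bytes as u64);
}
```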
Added several metrics for geyser and per-client session load tracking.
Client load uses an EMA (Exponential Moving Average) to smooth out the metrics and remove noise and extreme short-lived spikes in Prometheus/Grafana.