feat: CON-1639 HTTPS outcalls pay-as-you-go and dark launch budget trackers #10519
eichhorl wants to merge 33 commits into
Pull request overview
This PR introduces the pay-as-you-go per-replica pricing tracker for HTTPS outcalls and a dark-launch tracker that computes the pay-as-you-go result side-by-side with the legacy tracker, emitting metrics/logs on divergences while keeping the legacy externally observable behavior unchanged. It also threads subnet pricing inputs (subnet size + cycles cost schedule) to the HTTP adapter path so the pricing logic can compute correct costs.
Changes:
- Add `PayAsYouGoTracker` and `DarkLaunchTracker`, plus pricing metrics and a `PricingFactory` that selects trackers by `PricingVersion`.
- Extend `CanisterHttpRequest` with `subnet_size` and `cost_schedule`, populated by consensus/pocket-ic and consumed by the adapter client.
- Move reject-message truncation to the adapter client and add a gossip-usage accounting step before generating the per-replica receipt.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| rs/types/types/src/canister_http.rs | Adds subnet pricing inputs to CanisterHttpRequest and centralizes the reject-message size limit constant. |
| rs/types/cycles/src/cycles_cost_schedule.rs | Adds Hash derivation to support hashing cost schedule values carried in requests. |
| rs/pocket_ic_server/src/pocket_ic.rs | Populates new CanisterHttpRequest pricing fields in PocketIC request construction. |
| rs/https_outcalls/pricing/src/payg.rs | Implements pay-as-you-go per-replica accounting with tests. |
| rs/https_outcalls/pricing/src/metrics.rs | Introduces dark-launch metrics (total evaluated + incompatible by step/replication). |
| rs/https_outcalls/pricing/src/lib.rs | Adds PricingFactory (with metrics/logger), extends BudgetTracker with gossip accounting, wires dark-launch vs payg selection. |
| rs/https_outcalls/pricing/src/legacy.rs | Updates legacy tracker to satisfy new BudgetTracker API (no-op gossip step). |
| rs/https_outcalls/pricing/src/dark_launch.rs | Implements side-by-side “real vs shadow” tracker with divergence logging + metrics and tests. |
| rs/https_outcalls/pricing/Cargo.toml | Adds dependencies required for logging/metrics and cost schedule types. |
| rs/https_outcalls/pricing/BUILD.bazel | Adds Bazel deps for new pricing crate dependencies. |
| rs/https_outcalls/consensus/src/pool_manager.rs | Fetches subnet pricing inputs from registry and passes them to the HTTP adapter request. |
| rs/https_outcalls/consensus/Cargo.toml | Removes ic-utils dependency no longer needed after moving truncation logic. |
| rs/https_outcalls/consensus/BUILD.bazel | Removes //rs/utils Bazel dep accordingly. |
| rs/https_outcalls/client/src/client.rs | Instantiates pricing factory, truncates oversized rejects, and accounts for gossip usage before receipt creation. |
| rs/https_outcalls/client/Cargo.toml | Adds ic-utils dependency for StrEllipsize. |
| rs/https_outcalls/client/BUILD.bazel | Adds //rs/utils dependency for the client crate and tests. |
| Cargo.lock | Updates lockfile for new/added dependencies. |
✅ No security or compliance issues detected. Reviewed everything up to 81c0f81. Security Overview
| // A multi-byte message whose 1200 bytes exceed the 1024-byte limit, with | ||
| // emoji straddling the truncation boundary to exercise char safety. | ||
| let oversized_message = "😀".repeat(300); |
I don't think this crosses the boundary of 1024 chars.
Why not? There is an assert just below saying that it is
What I mean is that the emoji is encoded using 4 bytes and the byte limit of 1024 bytes is an integer multiple of that so we don't exercise the case of the emoji crossing the byte limit (in this case truncating to exactly 1024 bytes would cut the emoji encoding and the blob won't be a valid string anymore).
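To make the alignment point concrete, here is a minimal sketch of an input that does place a multi-byte character across the 1024-byte limit. The helper `truncate_on_char_boundary` is hypothetical and written only for this illustration; it is not the `StrEllipsize` implementation used by the client, and it assumes truncation simply backs off to the previous character boundary.

```rust
/// Hypothetical helper: returns the longest prefix of `s` that is at most
/// `max_bytes` long and ends on a UTF-8 character boundary.
fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    const LIMIT: usize = 1024;

    // 300 emoji of 4 bytes each: byte 1024 sits exactly between two emoji,
    // so cutting there never splits a character.
    let aligned = "😀".repeat(300);
    assert_eq!(aligned.len(), 1200);
    assert!(aligned.is_char_boundary(LIMIT));

    // A single leading ASCII byte shifts every emoji by one byte, so byte
    // 1024 now falls inside an emoji.
    let straddling = format!("a{}", "😀".repeat(300));
    assert!(!straddling.is_char_boundary(LIMIT));

    // A char-safe truncation must back off to byte 1021 (1 + 4 * 255).
    let truncated = truncate_on_char_boundary(&straddling, LIMIT);
    assert_eq!(truncated.len(), 1021);
    println!("truncated to {} bytes", truncated.len());
}
```

Prefixing the test message with one ASCII character would be one way to exercise the straddling case in the existing test.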
| use ic_metrics::MetricsRegistry; | ||
| use prometheus::IntCounterVec; | ||
|
|
||
| /// Label identifying the accounting step at which the shadow tracker diverged |
"diverged" (or "disagreed" below) sounds like they don't agree on the actual charged amount which is expected; I'd rephrase this so that it is clear that we refer to the step at which the shadow tracker reported an error while the legacy tracker succeeded
There are still occurrences of "divergence" and "disagreed" that I'd suggest to clean up, too.
| /// Charges `amount` against the budget. Returns an error if the total spent | ||
| /// now exceeds the available allowance. | ||
| fn charge(&mut self, amount: u128) -> Result<(), PricingError> { | ||
| // A free cost schedule means the subnet charges nothing for resources. |
I think we should still track the cost so that canister metrics could be updated based on the actual work done. Canister metrics are used on "free" subnets for cost accounting in user space.
| pub struct PayAsYouGoTracker { | ||
| /// Number of nodes (`N`) on the subnet. | ||
| subnet_size: NumberOfNodes, | ||
| /// Whether this responses to this outcalls are gossiped (only flexible and non-replicated). |
| /// Whether this responses to this outcalls are gossiped (only flexible and non-replicated). | |
| /// Whether responses to this outcalls are gossiped (only flexible and non-replicated). |
| self.error_reported = true; | ||
| self.metrics | ||
| .shadow_incompatible_total | ||
| .with_label_values(&[step, self.replication.as_str()]) |
Would it make sense to add a label to distinguish which one (real vs shadow) resulted in insufficient cycles? Such that you could easily query how many requests succeeded with the real but not with the shadow, which is what we are mostly interested in IIUC.
Concerning reporting once per request, wouldn't we be interested to know if the trackers diverge at different steps?
| // + 50 * transformed_response_bytes_i * N + transform_instructions_i / 13 | ||
| const PER_DOWNLOADED_BYTE_FEE: u128 = 50; | ||
| const PER_RESPONSE_MS_FEE: u128 = 300; | ||
| const TRANSFORM_INSTRUCTION_DIVISOR: u128 = 13; |
Just making sure, this doesn't need to be dynamic based on the actual size of the subnet, right?
Background
Currently, HTTPS outcalls are charged upfront based on a `max_response_bytes` parameter. This is done by subtracting the full cost from the caller's payment. The remaining cycles are stored in the request context and refunded once a response is delivered.
Instead, we want to introduce pay-as-you-go pricing, which charges cycles whenever resources are consumed. This happens in three stages:
The per-replica cost (2.) is calculated by the "budget tracker": This struct is instantiated with the per-replica cycles allowance whenever the HTTP adapter starts processing a new request. As the response is downloaded, transformed and gossiped, the tracker charges the consumed cycles against the initial allowance. If at any point the remaining allowance does not cover an outstanding charge, an error is returned and a reject response is gossiped. If the initial allowance covers all charges, the remaining cycles are refunded. To this end, the budget tracker creates a payment receipt which is gossiped alongside the response.
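The lifecycle described above can be sketched as follows. This is a simplified illustration, not the code in this PR: `SketchTracker`, `OutOfCycles` and `Receipt` are hypothetical names, the charge amounts are arbitrary, and the sketch assumes a failed charge leaves the spent total unchanged.

```rust
/// Hypothetical error returned when a charge is not covered by the allowance.
#[derive(Debug, PartialEq)]
struct OutOfCycles {
    required: u128,
    available: u128,
}

/// Hypothetical payment receipt carrying the unspent cycles.
#[derive(Debug, PartialEq)]
struct Receipt {
    refund: u128,
}

/// Simplified budget tracker: starts with a per-replica allowance and
/// charges against it as resources are consumed.
struct SketchTracker {
    allowance: u128,
    spent: u128,
}

impl SketchTracker {
    fn new(allowance: u128) -> Self {
        Self { allowance, spent: 0 }
    }

    /// Charges `amount`; fails if the total would exceed the allowance.
    fn charge(&mut self, amount: u128) -> Result<(), OutOfCycles> {
        let total = self.spent.saturating_add(amount);
        if total > self.allowance {
            return Err(OutOfCycles {
                required: amount,
                available: self.allowance - self.spent,
            });
        }
        self.spent = total;
        Ok(())
    }

    /// Consumes the tracker and reports what is left to refund.
    fn into_receipt(self) -> Receipt {
        Receipt { refund: self.allowance - self.spent }
    }
}

fn main() {
    // All charges covered: the remainder is refunded via the receipt.
    let mut ok = SketchTracker::new(1_000);
    ok.charge(400).unwrap(); // download
    ok.charge(300).unwrap(); // transform
    ok.charge(100).unwrap(); // gossip
    assert_eq!(ok.into_receipt(), Receipt { refund: 200 });

    // A charge that is not covered: the caller would gossip a reject.
    let mut short = SketchTracker::new(500);
    short.charge(400).unwrap();
    let err = short.charge(300).unwrap_err();
    assert_eq!(err, OutOfCycles { required: 300, available: 100 });
}
```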
Before this PR only one budget tracker existed (`LegacyTracker`), which doesn't compute any per-replica cost and doesn't refund anything.
Proposed Changes
This PR introduces the "pay-as-you-go" budget tracker, whose purpose is to calculate the per-replica cost for outcalls using the "pay-as-you-go" pricing (note that such outcalls do not exist yet). This is done by implementing the per-replica part of the pricing formula defined here (internal). To do this, we additionally pass the subnet's size and cycle cost schedule to the HTTP adapter. This is needed to calculate the correct pricing.
"Pay-as-you-go" pricing charges for the amount of bytes that are gossiped explicitly in the case of flexible and non-replicated outcalls (where the whole response is gossiped). To do this, we move the truncation of oversized reject messages into the adapter, such that the correct payload length is charged.
Additionally, we implement and start to use a `DarkLaunchTracker`. This tracker calculates both the real (legacy, i.e. 0) and the new (pay-as-you-go) per-replica cost. In the end, only the "real" refund is gossiped. However, this allows us to compare both trackers and observe whenever the pay-as-you-go tracker returns an out-of-cycles error while the legacy tracker succeeds. Such an event indicates that the outcall would not be covered by enough cycles under the new pricing. In this case, the canister ID is logged and a metric is incremented.
The legacy charging flow should be unchanged by this PR.
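A rough sketch of the side-by-side idea, under simplifying assumptions: the `Tracker` trait, `LegacyLike`, `MeteredLike` and `DarkLaunch` below are hypothetical stand-ins (the PR's `BudgetTracker` API has more steps), a plain `u64` counter stands in for the Prometheus metric, and logging of the canister ID is omitted. The incompatibility is recorded at most once per request.

```rust
/// Hypothetical out-of-cycles error.
#[derive(Debug, PartialEq)]
struct OutOfCycles;

/// Minimal tracker interface for this sketch only.
trait Tracker {
    fn charge(&mut self, amount: u128) -> Result<(), OutOfCycles>;
    fn refund(&self) -> u128;
}

/// Stand-in for the legacy tracker: no per-replica cost, no refund.
struct LegacyLike;

impl Tracker for LegacyLike {
    fn charge(&mut self, _amount: u128) -> Result<(), OutOfCycles> {
        Ok(())
    }
    fn refund(&self) -> u128 {
        0
    }
}

/// Stand-in for the pay-as-you-go tracker.
struct MeteredLike {
    allowance: u128,
    spent: u128,
}

impl Tracker for MeteredLike {
    fn charge(&mut self, amount: u128) -> Result<(), OutOfCycles> {
        self.spent = self.spent.saturating_add(amount);
        if self.spent > self.allowance { Err(OutOfCycles) } else { Ok(()) }
    }
    fn refund(&self) -> u128 {
        self.allowance.saturating_sub(self.spent)
    }
}

/// Runs both trackers, but only the real one affects the outcome.
struct DarkLaunch<R: Tracker, S: Tracker> {
    real: R,
    shadow: S,
    incompatible_total: u64, // stands in for a metric counter
    error_reported: bool,
}

impl<R: Tracker, S: Tracker> DarkLaunch<R, S> {
    fn new(real: R, shadow: S) -> Self {
        Self { real, shadow, incompatible_total: 0, error_reported: false }
    }

    fn charge(&mut self, amount: u128) -> Result<(), OutOfCycles> {
        let real = self.real.charge(amount);
        let shadow = self.shadow.charge(amount);
        // Record the case of interest: real succeeded, shadow ran out of cycles.
        if real.is_ok() && shadow.is_err() && !self.error_reported {
            self.error_reported = true;
            self.incompatible_total += 1;
        }
        real
    }

    /// Only the real tracker's refund is externally visible.
    fn refund(&self) -> u128 {
        self.real.refund()
    }
}

fn main() {
    let shadow = MeteredLike { allowance: 100, spent: 0 };
    let mut tracker = DarkLaunch::new(LegacyLike, shadow);
    assert_eq!(tracker.charge(60), Ok(()));
    assert_eq!(tracker.incompatible_total, 0);
    assert_eq!(tracker.charge(60), Ok(())); // shadow is now out of cycles
    assert_eq!(tracker.incompatible_total, 1);
    assert_eq!(tracker.refund(), 0);
}
```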