Contributor

@ksn6 ksn6 commented Jan 24, 2026

Problem and Summary of Changes

consensus_metrics.rs currently relies on an invariant that does not hold.

It assumes that maybe_new_epoch is invoked with a monotonically non-decreasing epoch. We can't assume this, given that blocks from different forks may be replayed out of order.

To rectify this, the PR introduces a map from epoch to metrics and, upon receiving a block, updates the entry for that block's epoch.

When it comes to metrics emission, suppose that slot S is the highest possible slot in some epoch, S.epoch. We emit "end of epoch" metrics for S.epoch at the earlier of:

  1. A slot in the following epoch, S.epoch + 1, finalizes.
  2. S itself finalizes (if S never finalizes, we fall back to (1)).
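
A minimal sketch of the idea, not the code in this PR: metrics live in a map keyed by epoch, so blocks may arrive in any epoch order, and an epoch's entry is drained once one of the two conditions above is met. `EpochMetrics`, `ConsensusMetricsSketch`, `record_block`, and `on_finalized` are hypothetical names, and the sketch assumes fixed-length epochs rather than consulting a real epoch schedule. As a simplification, it also drains every epoch older than the finalized slot's epoch, not only the immediately preceding one.

```rust
use std::collections::BTreeMap;

type Epoch = u64;
type Slot = u64;

/// Hypothetical per-epoch counters; the real metrics struct likely tracks more.
#[derive(Debug, Default, Clone, PartialEq)]
struct EpochMetrics {
    blocks_seen: u64,
}

/// Hypothetical container keyed by epoch, so no ordering of epochs is assumed.
struct ConsensusMetricsSketch {
    /// Assumption: every epoch has the same length (no warmup epochs).
    slots_per_epoch: u64,
    per_epoch: BTreeMap<Epoch, EpochMetrics>,
}

impl ConsensusMetricsSketch {
    fn new(slots_per_epoch: u64) -> Self {
        Self { slots_per_epoch, per_epoch: BTreeMap::new() }
    }

    fn epoch_of(&self, slot: Slot) -> Epoch {
        slot / self.slots_per_epoch
    }

    fn last_slot_in_epoch(&self, epoch: Epoch) -> Slot {
        (epoch + 1) * self.slots_per_epoch - 1
    }

    /// Update the entry for the block's own epoch, whatever order blocks arrive in.
    fn record_block(&mut self, slot: Slot) {
        let epoch = self.epoch_of(slot);
        self.per_epoch.entry(epoch).or_default().blocks_seen += 1;
    }

    /// Drain and return the epochs whose "end of epoch" metrics are ready.
    fn on_finalized(&mut self, slot: Slot) -> Vec<(Epoch, EpochMetrics)> {
        let epoch = self.epoch_of(slot);
        // Condition (2): the last slot of `epoch` finalized, so `epoch` is done too.
        // Condition (1): otherwise only epochs strictly before `epoch` are done.
        let first_kept = if slot == self.last_slot_in_epoch(epoch) {
            epoch + 1
        } else {
            epoch
        };
        // `split_off` leaves keys < first_kept in place and returns keys >= first_kept.
        let kept = self.per_epoch.split_off(&first_kept);
        let done = std::mem::replace(&mut self.per_epoch, kept);
        done.into_iter().collect()
    }
}
```

With `slots_per_epoch = 10`, recording blocks for slots 12, 3, and 9 (in that order) and then finalizing slot 9 drains epoch 0 only; epoch 1 stays in the map until a slot in epoch 2 finalizes or slot 19 does.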

@ksn6 ksn6 force-pushed the rewrite-consensus-metrics branch from 88814c4 to 9efe7d0 Compare January 24, 2026 04:46
@ksn6 ksn6 requested review from AshwinSekar and akhi3030 January 24, 2026 04:53
@ksn6 ksn6 force-pushed the rewrite-consensus-metrics branch 2 times, most recently from 3a5a65b to cf20225 Compare January 24, 2026 19:31
@ksn6 ksn6 changed the title feat: improve consensus metrics feat: rewrite consensus metrics to not assume epoch monotonicity Jan 24, 2026
@ksn6 ksn6 force-pushed the rewrite-consensus-metrics branch 2 times, most recently from 9038dea to 10b8ed1 Compare January 24, 2026 19:57
@ksn6 ksn6 force-pushed the rewrite-consensus-metrics branch from 10b8ed1 to d1cc352 Compare January 24, 2026 20:22
Contributor

@AshwinSekar AshwinSekar left a comment

I'm fine with the overall direction of this change. I'd like to give @akhi3030 a chance to review this and the histogram change.

He's out this week, but given that we've already removed the problematic assert, we should be good to wait until he gets back.

Contributor

@akhi3030 akhi3030 left a comment

Thanks for catching this problem and for the fix. Generally looks good; a few small comments, though.

};

let root_epoch = sharable_banks.root().epoch();
let epoch_schedule = sharable_banks.root().epoch_schedule().clone();
Contributor

I don't remember seeing any code so far where we copy the epoch schedule out of the bank. I had assumed that the epoch schedule could change in future banks (and that this is why it is stored in banks). In other words, I just want to make sure that this is indeed safe. If it isn't, would the metrics container need access to sharable_banks instead?

Contributor Author

This should be safe: in bank.rs, a bank sets its epoch schedule by cloning it from GenesisConfig, while children build their epoch schedules by cloning from their respective parents. As a result, every bank should carry the same schedule.
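
To illustrate the argument, here is a toy model (not the real `Bank`, `GenesisConfig`, or `EpochSchedule` types; every name below is hypothetical) in which the schedule is cloned once from genesis and then from parent to child. Under that assumption, a copy taken from any bank equals the copy held by every descendant.

```rust
/// Toy stand-in for an epoch schedule; the real type has more fields.
#[derive(Debug, Clone, PartialEq)]
struct ToyEpochSchedule {
    slots_per_epoch: u64,
}

/// Toy stand-in for the genesis configuration.
struct ToyGenesisConfig {
    epoch_schedule: ToyEpochSchedule,
}

/// Toy stand-in for a bank: a slot plus its own copy of the schedule.
struct ToyBank {
    slot: u64,
    epoch_schedule: ToyEpochSchedule,
}

impl ToyBank {
    /// The first bank clones its schedule from genesis.
    fn from_genesis(genesis: &ToyGenesisConfig) -> Self {
        Self { slot: 0, epoch_schedule: genesis.epoch_schedule.clone() }
    }

    /// Each child clones its schedule from its parent and never mutates it.
    fn new_child(parent: &ToyBank, slot: u64) -> Self {
        assert!(slot > parent.slot, "a child must be in a later slot");
        Self { slot, epoch_schedule: parent.epoch_schedule.clone() }
    }
}
```

If some code path ever built a child bank with a different schedule, this reasoning would no longer hold and the metrics container would need to read the schedule from the current bank instead.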

@ksn6 ksn6 force-pushed the rewrite-consensus-metrics branch from d1cc352 to 9b751c7 Compare February 2, 2026 21:55
@ksn6 ksn6 requested review from AshwinSekar and akhi3030 February 2, 2026 23:19
Contributor

@akhi3030 akhi3030 left a comment

LGTM
