feat: improve lookup table overhead for cloning pipeline #457

Open
wants to merge 14 commits into
base: dev

Conversation

thlorenz
Collaborator

@thlorenz thlorenz commented Jul 15, 2025

Summary

This PR fixes the performance regression introduced in the previous PR by no longer waiting for the lookup table transaction to succeed.
Instead, it only logs the result and ensures, right before committing an account, that the necessary pubkeys are in place.

Details

Table Mania Enhancements

  • added ensure_pubkeys_table() method to guarantee pubkeys exist in lookup tables without increasing reference counts
  • implemented get_pubkey_refcount() method for querying refcount of pubkeys across tables
  • added an integration test for the new ensure functionality
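
To illustrate the difference between reserving and ensuring, here is a minimal in-memory sketch. It is not the Table Mania implementation: `ToyTables`, `reserve`, `ensure`, and `refcount` are hypothetical names, the `Pubkey` alias is a stand-in for the real type, and no on-chain lookup table transactions are modeled. Following the commit message, a missing pubkey gets a new table with refcount 1.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a 32-byte pubkey type.
type Pubkey = [u8; 32];

/// Toy model of a set of lookup tables (assumption: one refcount map per table).
#[derive(Default)]
pub struct ToyTables {
    tables: Vec<HashMap<Pubkey, usize>>,
}

impl ToyTables {
    /// Reserve: bumps the refcount if the pubkey exists, otherwise adds it.
    pub fn reserve(&mut self, pubkey: Pubkey) {
        for table in self.tables.iter_mut() {
            if let Some(count) = table.get_mut(&pubkey) {
                *count += 1;
                return;
            }
        }
        self.insert_new(pubkey);
    }

    /// Ensure: leaves existing refcounts untouched and only adds pubkeys
    /// that are not present in any table.
    pub fn ensure(&mut self, pubkeys: &[Pubkey]) {
        for pubkey in pubkeys {
            let present = self.tables.iter().any(|t| t.contains_key(pubkey));
            if !present {
                self.insert_new(*pubkey);
            }
        }
    }

    /// Sum of the pubkey's refcounts across all tables, or None if absent.
    pub fn refcount(&self, pubkey: &Pubkey) -> Option<usize> {
        let counts: Vec<usize> = self
            .tables
            .iter()
            .filter_map(|t| t.get(pubkey).copied())
            .collect();
        if counts.is_empty() {
            None
        } else {
            Some(counts.iter().sum())
        }
    }

    // Simplification: every missing pubkey gets its own new table.
    fn insert_new(&mut self, pubkey: Pubkey) {
        self.tables.push(HashMap::from([(pubkey, 1)]));
    }
}
```

In this sketch, reserving the same pubkey twice yields a refcount of 2, and a later `ensure` call on it leaves that count unchanged.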

Committor Service Optimizations

  • Modified commit process to ensure all pubkeys have tables before proceeding with transactions
  • Improved parallel processing by moving table reservation to async spawned tasks
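
The two bullets above describe a pattern that can be sketched roughly as follows: the reservation is started in the background without blocking the caller, and the commit path makes sure the keys are present before it proceeds. This is only a sketch under assumptions. It uses `std::thread` as a dependency-free stand-in for an async spawned task, and `ReservedKeys`, `reserve_in_background`, and `ensure` are hypothetical names, not the committor service API.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for a 32-byte pubkey type.
type Pubkey = [u8; 32];

/// Hypothetical shared record of pubkeys that already sit in a lookup table.
#[derive(Clone, Default)]
pub struct ReservedKeys(Arc<Mutex<HashSet<Pubkey>>>);

impl ReservedKeys {
    /// Clone path: start the reservation in the background and return at once.
    /// The caller is free to ignore the handle and only log the outcome.
    pub fn reserve_in_background(&self, keys: Vec<Pubkey>) -> JoinHandle<()> {
        let shared = self.clone();
        thread::spawn(move || {
            let mut guard = shared.0.lock().unwrap();
            guard.extend(keys);
        })
    }

    /// Commit path: add whatever the background reservation has not supplied.
    /// Returns the number of keys that were still missing.
    pub fn ensure(&self, keys: &[Pubkey]) -> usize {
        let mut guard = self.0.lock().unwrap();
        let mut added = 0;
        for key in keys {
            // `insert` returns true only when the key was not present yet.
            if guard.insert(*key) {
                added += 1;
            }
        }
        added
    }

    /// True if the key is currently recorded as reserved.
    pub fn contains(&self, key: &Pubkey) -> bool {
        self.0.lock().unwrap().contains(key)
    }
}
```

The design point is that correctness no longer depends on the background task having finished: `ensure` fills any gap at commit time, so the clone path does not have to wait.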

General Improvements

Integration Testing Improvements

  • Added individual make targets for all integration tests (test-schedulecommit, test-cloning, test-committor, etc.)
  • Renamed list-tasks to list in the Makefile
  • Enhanced test runner output with better test name reporting for clearer failure diagnostics

Performance

Performance is back to what it was on master, i.e. the first clone no longer takes much longer
than subsequent clones. The performance results are listed below for comparison; for more details see
this PR.

master

+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 2690   | 1116 | 12516 | 3340 | 7159      | 1801   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 2756   | 1125 | 12559 | 3433 | 7494      | 1901   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+

new committor

+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max     | Avg   | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 2219   | 1104 | 5525915 | 39628 | 3647      | 329493 |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0       | 0     | 0         | 0      |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 2247   | 1116 | 5525967 | 39668 | 3730      | 329498 |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Transactions Per Second (TPS) | 52           | 40     | 13   | 40      | 39    | 17        | 4      |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+

We can see that there is a huge deviation in the new committor results above, due to the first clone
taking a lot longer as it reserves the pubkeys needed to commit the cloned account in a lookup table.
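
As a rough illustration of why a slow first clone shows up in Max, Avg, and Stddev while the median barely moves, the following sketch computes the three statistics for synthetic samples. The numbers are made up for illustration and are not the measurements from this PR; `summarize` and `with_slow_first` are hypothetical helpers.

```rust
/// Returns (median, mean, population standard deviation) of the samples.
pub fn summarize(samples: &[f64]) -> (f64, f64, f64) {
    assert!(!samples.is_empty(), "need at least one sample");
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    let median = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    };
    let mean = sorted.iter().sum::<f64>() / n as f64;
    let variance = sorted.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n as f64;
    (median, mean, variance.sqrt())
}

/// Builds `count` synthetic latencies (in μs) where only the first one differs.
pub fn with_slow_first(count: usize, typical_us: f64, first_us: f64) -> Vec<f64> {
    let mut samples = vec![typical_us; count];
    if let Some(first) = samples.first_mut() {
        *first = first_us;
    }
    samples
}
```

With 2000 synthetic samples of 2000 μs each, replacing just the first one with a multi-second value leaves the median at 2000 μs but pulls the mean and the standard deviation far above it.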

new committor on this branch

+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 1827   | 1250 | 19942 | 1929 | 2537      | 705    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 1867   | 1272 | 21436 | 2003 | 2768      | 848    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+

We can see that the deviation is back to a sane amount.
In this case the max is still higher than on master, but that could be an outlier.

I ran the perf test another time and confirmed this:

+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 1889   | 1221 | 11291 | 1977 | 2656      | 471    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 1922   | 1354 | 40313 | 2064 | 2847      | 1047   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+

thlorenz added 14 commits July 14, 2025 11:15
This commit introduces the ensure_pubkeys_table() method to the Table Mania manager. This method allows
ensuring that a set of pubkeys is present in lookup tables without
incrementing their reference count if they already exist. If a pubkey is
not in any table, a new table is created for it, and the refcount is
initialized to 1.

To support this and provide better introspection, the following was added:

- get_refcount() to query the reference count of a pubkey within a specific table.
- get_pubkey_refcount() to query the reference count of a pubkey across all active tables.

Integration tests are included to verify the behavior of ensure_pubkeys_table() in different scenarios, such as when pubkeys already exist or when they are new.
…nz/committor-improve-table-speed

* thlorenz/committor-increase-compute-budget:
  chore: adding safety multiplier to table mania CU budgets
- this added more complexity and we can treat it just like any table
  mania error instead, keeping the code much simpler
- left in some refactorings done while adding the special handling,
  since they improved the code quality
* dev:
  fix: prevent new subscriptions if no free connections exist (#456)
  chore: avoid CU budget exceeded for lookup table management (#455)
  Cleanup and refactor accountsdb index (#419)
  chore: create dev branch
Contributor

@taco-paco taco-paco left a comment

LGTM. Minor comments

@@ -275,22 +276,34 @@ impl CommittorProcessor {
.await;
commit_stages.extend(failed_finalize.into_iter().flat_map(
|(sig, infos)| {
fn get_sigs_for_bundle(
Contributor

This adds more lines of code than before + extra indirection

@@ -378,19 +410,12 @@ impl CommittorProcessor {
CommitStage::FailedUndelegate((
x.clone(),
CommitStrategy::args(use_lookup),
CommitSignatures {
Contributor

This version is much more explicit and direct imo

fn get_refcount(&self, pubkey: &Pubkey) -> Option<usize> {
    self.pubkeys
        .get(pubkey)
        .map(|count| count.load(Ordering::SeqCst))
Contributor

Suggested change
- .map(|count| count.load(Ordering::SeqCst))
+ .map(|count| count.load(Ordering::Relaxed))

Such strong ordering isn't required here
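
A minimal sketch of the suggested variant, assuming the refcounts are stored as `AtomicUsize` values in a map keyed by pubkey (the `ToyTable` type, its `new` and `bump` methods, and the `Pubkey` alias are hypothetical). A `Relaxed` load is still atomic for that one counter; it just does not impose ordering on surrounding memory accesses, which a plain refcount query may not need.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for a 32-byte pubkey type.
type Pubkey = [u8; 32];

/// Toy lookup table holding one atomic refcount per pubkey.
pub struct ToyTable {
    pubkeys: HashMap<Pubkey, AtomicUsize>,
}

impl ToyTable {
    /// Creates a table in which every given pubkey starts at refcount 1.
    pub fn new(keys: &[Pubkey]) -> Self {
        let pubkeys = keys
            .iter()
            .map(|key| (*key, AtomicUsize::new(1)))
            .collect();
        Self { pubkeys }
    }

    /// Increments the refcount; returns false if the pubkey is not in the table.
    pub fn bump(&self, pubkey: &Pubkey) -> bool {
        match self.pubkeys.get(pubkey) {
            Some(count) => {
                count.fetch_add(1, Ordering::Relaxed);
                true
            }
            None => false,
        }
    }

    /// Reads the refcount with relaxed ordering, as in the suggested change.
    pub fn get_refcount(&self, pubkey: &Pubkey) -> Option<usize> {
        self.pubkeys
            .get(pubkey)
            .map(|count| count.load(Ordering::Relaxed))
    }
}
```

Whether `Relaxed` is sufficient in the real code depends on whether other data is published through the counter, which this sketch does not model.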

// 1. Check which pubkeys already exist in any table
for pubkey in pubkeys {
    let mut found = false;
    for table in self.active_tables.read().await.iter() {
Contributor

self.active_tables.read().await is acquired for every pubkey. The lock could be taken once outside the loop.
Note: the guard needs to be dropped before calling self.reserve_new_pubkeys
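
The restructuring suggested in this comment could look roughly like the sketch below. It uses `std::sync::RwLock` as a synchronous stand-in for the async lock in the PR, and `ToyManager`, `ensure`, `reserve_new`, and `contains` are hypothetical names. The sketch also assumes that the reserving step takes a write lock, which is why the read guard has to go out of scope first.

```rust
use std::collections::HashSet;
use std::sync::RwLock;

// Hypothetical stand-in for a 32-byte pubkey type.
type Pubkey = [u8; 32];

/// Toy manager guarding its tables with a read/write lock.
pub struct ToyManager {
    active_tables: RwLock<Vec<HashSet<Pubkey>>>,
}

impl ToyManager {
    pub fn new(tables: Vec<HashSet<Pubkey>>) -> Self {
        Self { active_tables: RwLock::new(tables) }
    }

    /// Ensures all pubkeys are in some table; returns the ones that were added.
    pub fn ensure(&self, pubkeys: &[Pubkey]) -> Vec<Pubkey> {
        let missing: Vec<Pubkey> = {
            // Take the read lock once for the whole scan, not once per pubkey.
            let tables = self.active_tables.read().unwrap();
            let found: Vec<Pubkey> = pubkeys
                .iter()
                .filter(|pk| !tables.iter().any(|table| table.contains(*pk)))
                .copied()
                .collect();
            found
        }; // The read guard is dropped here, before the write lock is taken.
        self.reserve_new(&missing);
        missing
    }

    // Assumption: reserving new pubkeys needs exclusive access to the tables.
    fn reserve_new(&self, pubkeys: &[Pubkey]) {
        if pubkeys.is_empty() {
            return;
        }
        let mut tables = self.active_tables.write().unwrap();
        tables.push(pubkeys.iter().copied().collect());
    }

    /// True if any table contains the pubkey.
    pub fn contains(&self, pubkey: &Pubkey) -> bool {
        let tables = self.active_tables.read().unwrap();
        tables.iter().any(|table| table.contains(pubkey))
    }
}
```

Scoping the guard in its own block makes the drop point explicit; calling `drop(tables)` before `reserve_new` would be an equivalent alternative.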
