feat: Add balance validation to wrapped block validator #2184
rockysingh wants to merge 12 commits into hiero-ledger:main from
752d5b0 to 6b30bc3
jsync-swirlds left a comment
Looks good, one note regarding var usage for future reference.
...nd-tests/tools/src/main/java/org/hiero/block/tools/blocks/wrapped/WrappedBlockValidator.java
@Option(
        names = {"--validate-balances"},
        description = "Enable validation of account balances against CSV balance files from GCP")
private boolean validateBalances = false;
I feel we should make this true by default. We should also add a local cache for downloaded balance files, so that we download each file only once and can validate many times.
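A minimal sketch of the download-once idea suggested above, not the PR's implementation. The class name `BalanceFileCache` and the `downloader` function are hypothetical stand-ins for the real bucket client, and flattening the object name into a single file name is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Function;

/** Hypothetical helper (not part of the PR): download each balance file at most once. */
final class BalanceFileCache {
    private final Path cacheDir;
    private final Function<String, byte[]> downloader; // stand-in for the real bucket client

    BalanceFileCache(Path cacheDir, Function<String, byte[]> downloader) {
        this.cacheDir = cacheDir;
        this.downloader = downloader;
    }

    /** Returns the local copy of objectName, downloading it only if it is not cached yet. */
    Path get(String objectName) {
        // Flatten the bucket object name into a single file name inside the cache directory.
        Path cached = cacheDir.resolve(objectName.replace('/', '_'));
        if (Files.exists(cached)) {
            return cached; // cache hit: no network access needed
        }
        try {
            Files.createDirectories(cacheDir);
            // Write to a temp file first so a failed download never leaves a partial cache entry.
            Path tmp = Files.createTempFile(cacheDir, "download-", ".tmp");
            try {
                Files.write(tmp, downloader.apply(objectName));
                Files.move(tmp, cached, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                Files.deleteIfExists(tmp); // no-op after a successful move
            }
            return cached;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Repeated validation runs would then call `get(...)` for every checkpoint and only touch the network for files that are not already in the cache directory.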
@Option(
        names = {"--balance-start-day"},
        description = "Start day for balance validation in format YYYY-MM-DD (e.g., 2019-09-13)")
private String balanceStartDay;
Does this only affect balance verification? I think it is not needed; we should validate the complete date range. We could add start/end date options that apply to all validation. For balance files, we can hard-code the first and last available dates for mainnet. We might need to add a network config property.
@Option(
        names = {"--gcp-project"},
        description = "GCP project for requester-pays bucket access")
private String gcpProject;
This should pick up its value from an environment variable by default, like we do elsewhere.
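One way to read the suggestion above, as a sketch only: prefer the command-line value and fall back to the environment. The variable name `GCP_PROJECT_ID` and the class `GcpProjectResolver` are hypothetical; the project may already use a different name or mechanism elsewhere.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: resolve the GCP project from the CLI value or the environment. */
final class GcpProjectResolver {
    /** Assumed variable name; the project may use a different one. */
    static final String ENV_VAR = "GCP_PROJECT_ID";

    /** The CLI value wins when present; otherwise fall back to the environment map. */
    static Optional<String> resolve(String cliValue, Map<String, String> env) {
        if (cliValue != null && !cliValue.isBlank()) {
            return Optional.of(cliValue);
        }
        String fromEnv = env.get(ENV_VAR);
        if (fromEnv == null || fromEnv.isBlank()) {
            return Optional.empty(); // caller decides whether a missing project is an error
        }
        return Optional.of(fromEnv);
    }
}
```

In the command this would be called as `GcpProjectResolver.resolve(gcpProject, System.getenv())`; passing the map in keeps the fallback easy to test.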
        names = {"--verify-signatures"},
        description = "Verify balance file signatures (requires --address-book)",
        defaultValue = "false")
private boolean verifySignatures;
I think this should always be true. It is key to trusting the data and should not be slow, especially once the balance files are downloaded.
        names = {"--cache-dir"},
        description = "Directory for caching downloaded files",
        defaultValue = "data/gcp-cache")
private Path cacheDir;
I wonder if we should do some special caching. The reason I say that is that I now have a tool for converting newer saved states into balance files, and I would like to be able to drop them into a directory we could use, or maybe a separate directory for custom downloaded files. For example, I have accountBalances_89270840.pb.gz and accountBalances_91019204.pb.gz, where the number in the file name is the block number the file is for. We will not have signatures for them, so we can't do that part of the validation, but the tool that converted them checked the saved state signatures.
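As an illustration of the naming convention described above, the block number can be recovered from a custom file name with a small pattern match. This is a sketch; the class name `CustomBalanceFileNames` is hypothetical and only the `accountBalances_{blockNumber}.pb.gz` shape comes from the comment.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extract the block number from accountBalances_{blockNumber}.pb.gz. */
final class CustomBalanceFileNames {
    private static final Pattern NAME = Pattern.compile("accountBalances_(\\d+)\\.pb\\.gz");

    /** Returns the block number, or empty when the name does not follow the convention. */
    static OptionalLong blockNumber(String fileName) {
        Matcher matcher = NAME.matcher(fileName);
        if (!matcher.matches()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(matcher.group(1)));
        } catch (NumberFormatException tooLarge) {
            return OptionalLong.empty(); // more digits than fit in a long
        }
    }
}
```

Files that do not match can simply be skipped when scanning the directory, which lets custom and downloaded files share a location if that is preferred.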
 * @param block the block to extract timestamp from
 * @return the block timestamp as Instant, or null if not found
 */
private static Instant extractBlockTimestamp(Block block) {
Seems like a good util method for our TimeUtils class.
@Parameters(index = "0..1", description = "Block files, directories, or zip archives to process")
private File[] files;

@Option(
It would be good to have an option to pick the granularity for how often we check balance files. I am thinking in days, maybe every 7 for once a week. I just want to balance the number of downloads against the time to failure. A week or even a month seems fine.
I've changed it to a month for now.
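The granularity option discussed here could be sketched as a filter over the days for which a balance file is available. This is an illustration under assumptions, not the PR's filtering logic: `CheckpointSampler` is a hypothetical name and the input is assumed to be sorted ascending.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: thin out checkpoint days to one every intervalDays. */
final class CheckpointSampler {
    /** Keeps the first day, then each day at least intervalDays after the last kept one. */
    static List<LocalDate> sample(List<LocalDate> sortedDays, int intervalDays) {
        if (intervalDays < 1) {
            throw new IllegalArgumentException("intervalDays must be >= 1");
        }
        List<LocalDate> kept = new ArrayList<>();
        LocalDate last = null;
        for (LocalDate day : sortedDays) {
            if (last == null || ChronoUnit.DAYS.between(last, day) >= intervalDays) {
                kept.add(day);
                last = day;
            }
        }
        return kept;
    }
}
```

With an interval of 7 this yields roughly weekly checks and with 30 roughly monthly ones, which is the download-count versus time-to-failure trade-off raised in the comment.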
6b30bc3 to 4437c02
f6e1868 to f55a284
f9a074d to 964cfca
 * </ul>
 */
@SuppressWarnings("CyclomaticComplexity")
public class BalanceProtobufParser {
You should not need to parse manually, as you can just pass the max size into the PBJ parse method. We will also need to parse the tokens as well as the balances.
 * @param amendments the amendment items (missing transactions) to merge in
 * @return a new list containing all items sorted by consensus timestamp
 */
private static List<RecordStreamItem> mergeRecordStreamItems(
This is not right: the amendments can replace transactions in the original file as well as add new transactions.
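A sketch of a merge that handles both cases, keyed by consensus timestamp so that an amendment with the same timestamp replaces the original item. The `Item` record is a hypothetical stand-in for `RecordStreamItem`, and the sketch assumes consensus timestamps are unique within a file.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: amendments either replace an original item or add a new one. */
final class AmendmentMerge {
    /** Stand-in for RecordStreamItem, reduced to the fields the merge needs. */
    record Item(Instant consensusTimestamp, String payload) {}

    /** Returns all items sorted by consensus timestamp, with amendments taking precedence. */
    static List<Item> merge(List<Item> original, List<Item> amendments) {
        TreeMap<Instant, Item> byTimestamp = new TreeMap<>();
        for (Item item : original) {
            byTimestamp.put(item.consensusTimestamp(), item);
        }
        // Inserted second, so an amendment with a matching timestamp overwrites the original.
        for (Item item : amendments) {
            byTimestamp.put(item.consensusTimestamp(), item);
        }
        return new ArrayList<>(byTimestamp.values());
    }
}
```

The sorted map gives the timestamp ordering for free, so no separate sort step is needed after merging.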
 * @param dayPrefix the day prefix in format "YYYY-MM-DD" (e.g., "2019-09-13")
 * @return list of Instant timestamps for available balance files
 */
public List<Instant> listBalanceTimestampsForDay(String dayPrefix) {
It feels like we should be able to target the midnight balance file more directly, rather than doing a list operation for every day needed.
 * <p>Also supports loading custom balance files in the format
 * {@code accountBalances_{blockNumber}.pb.gz} from a directory.
 */
public class BalanceCheckpointsLoader {
It feels like we should just be able to use the balance file protobuf format; I am not sure why we need another file format. Also, this doesn't cover tokens.
There was an issue with the file size. Let me see if I can get around it.
This has been addressed.
f040b08 to f19be68
jsync-swirlds left a comment
Somehow we seem to have added the giant list of cache files here too.
I recommend adding data/gcp-cache to the .gitignore file to prevent adding these again.
Add optional balance file validation that compares computed account
balances against signed protobuf balance files from GCP mainnet bucket.
This provides per-account verification in addition to the existing
50 billion HBAR supply check.
- Add BalanceFileBucket to download balance files from GCP
- Add BalanceCsvValidator to compare balances at checkpoints
- Add signature verification for balance files using address book
- Update WrappedBlockValidator to process amendments in balance tracking
- Add CLI options: --validate-balances, --balance-start-day,
--balance-end-day, --address-book, --verify-signatures, --gcp-project
Balance files are available from September 2019 through October 23, 2023
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
(default: 30 days/monthly) to control how often balance checkpoints
are validated
- Add setCheckIntervalDays() to BalanceCheckpointValidator with
filtering logic based on ~20,000 blocks/day
- Update README.md command tree with fetchBalanceCheckpoints and
validate-wrapped commands
- Add comprehensive documentation for fetchBalanceCheckpoints command
including options, prerequisites, output format, and examples
- Update validate-wrapped documentation with new balance validation
options and clarify relationship between fetch and validation intervals
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Address PR review comments:
- Rename BalanceCsvValidator to BalanceProtobufValidator (downloads .pb.gz files)
- Change checkpoint file format to length-prefixed protobufs:
[blockNumber (8 bytes)][length (4 bytes)][raw protobuf bytes]
- Add token balance support via BalanceProtobufParser.parseWithTokens()
- Add direct midnight targeting in BalanceFileBucket.downloadMidnightBalanceFile()
- Update mergeRecordStreamItems to handle replacements (same timestamp = replace)
- Remove unused BalanceProtobufParser.parseAndWrite() method
Note: Existing .zstd checkpoint files need regeneration with new format.
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
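The length-prefixed layout described in this commit message, [blockNumber (8 bytes)][length (4 bytes)][raw protobuf bytes], can be sketched as follows. This is an illustration, not the PR's code: `CheckpointRecords` is a hypothetical name, big-endian byte order is an assumption (the message does not state it), and the sketch works on the already decompressed bytes rather than the .zstd file itself.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical codec for [blockNumber (8 bytes)][length (4 bytes)][raw protobuf bytes]. */
final class CheckpointRecords {
    private static final int HEADER_LENGTH = Long.BYTES + Integer.BYTES; // 8 + 4

    /** Concatenates all records; ByteBuffer writes big-endian by default (assumed order). */
    static byte[] encode(Map<Long, byte[]> checkpoints) {
        int size = 0;
        for (byte[] protobuf : checkpoints.values()) {
            size += HEADER_LENGTH + protobuf.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (Map.Entry<Long, byte[]> entry : checkpoints.entrySet()) {
            out.putLong(entry.getKey());
            out.putInt(entry.getValue().length);
            out.put(entry.getValue());
        }
        return out.array();
    }

    /** Reads records until the input is exhausted, rejecting truncated or invalid records. */
    static Map<Long, byte[]> decode(byte[] data) {
        Map<Long, byte[]> checkpoints = new LinkedHashMap<>();
        ByteBuffer in = ByteBuffer.wrap(data);
        while (in.hasRemaining()) {
            if (in.remaining() < HEADER_LENGTH) {
                throw new IllegalArgumentException("Truncated record header");
            }
            long blockNumber = in.getLong();
            int length = in.getInt();
            if (length < 0 || length > in.remaining()) {
                throw new IllegalArgumentException("Invalid record length: " + length);
            }
            byte[] protobuf = new byte[length];
            in.get(protobuf);
            checkpoints.put(blockNumber, protobuf);
        }
        return checkpoints;
    }
}
```

Keeping the payload as raw protobuf bytes means the loader can hand each record to the same parser used for downloaded balance files.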
Remove validateCheckpoint(HBAR-only) overloads and parseProtobufBalances method that are no longer called after token balance support was added.
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
f19be68 to 8e01a28
Remove 128 cached GCS balance files that were accidentally committed.
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Prevent cached GCS balance files from being accidentally committed.
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
65cbeb5 to 56bc4ae
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##               main    #2184      +/-   ##
============================================
- Coverage     81.08%   80.96%   -0.12%
+ Complexity     1463     1460       -3
============================================
  Files           139      139
  Lines          6757     6757
  Branches        727      727
============================================
- Hits           5479     5471       -8
- Misses          957      968      +11
+ Partials        321      318       -3

see 2 files with indirect coverage changes
Summary
- fetchBalanceCheckpoints command to pre-compile balance checkpoints into compressed resource files
- lastMerkleLeaf.bin output to wrap command for quick CN access to final block hash

Description
This adds per-account balance verification in addition to the existing 50 billion HBAR supply check. Balance files were published every ~15 minutes and contain signed snapshots of all account balances until October 2023. By comparing our computed balances against these signed files at checkpoint timestamps, we can verify that transactions are being processed correctly.
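The comparison described above can be sketched as a per-account diff plus the total-supply check. This is a simplified illustration, not the PR's validator: `BalanceComparison` is a hypothetical name, accounts are keyed by account number only, and an account missing from one side is treated as having a zero balance, which is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of checkpoint comparison; balances are in tinybars. */
final class BalanceComparison {
    /** 50 billion HBAR in tinybars (1 HBAR = 100,000,000 tinybars). */
    static final long TOTAL_SUPPLY_TINYBARS = 50_000_000_000L * 100_000_000L;

    /** Returns accountNum -> {computed, expected} for every account whose balances differ. */
    static Map<Long, long[]> mismatches(Map<Long, Long> computed, Map<Long, Long> expected) {
        TreeSet<Long> accounts = new TreeSet<>(computed.keySet());
        accounts.addAll(expected.keySet());
        Map<Long, long[]> diffs = new TreeMap<>();
        for (Long account : accounts) {
            long ours = computed.getOrDefault(account, 0L);   // missing = zero (assumption)
            long theirs = expected.getOrDefault(account, 0L);
            if (ours != theirs) {
                diffs.put(account, new long[] {ours, theirs});
            }
        }
        return diffs;
    }

    /** True when all balances add up to exactly 50 billion HBAR. */
    static boolean supplyMatches(Map<Long, Long> balances) {
        long total = 0;
        for (long balance : balances.values()) {
            total = Math.addExact(total, balance); // fail loudly on overflow
        }
        return total == TOTAL_SUPPLY_TINYBARS;
    }
}
```

The supply check alone cannot detect an amount credited to the wrong account, since the total is unchanged; the per-account diff is what catches that case.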
New CLI options for validate-wrapped:
- --validate-balances
- --balance-check-interval-days
- --balance-checkpoints
- --address-book

New command: blocks fetchBalanceCheckpoints
Fetches balance files from GCP, verifies signatures, and compiles them into a compressed resource file for offline validation:
- --interval-days
- --start-day/--end-day
- --skip-signatures

Pre-compiled resource files:
- balance_checkpoints_monthly.zstd - 32 checkpoints (~14MB) for monthly validation
- balance_checkpoints_weekly.zstd - 136 checkpoints (~20MB) for weekly validation

lastMerkleLeaf.bin
The wrap command now outputs lastMerkleLeaf.bin containing the final block number (8 bytes) and block hash (48 bytes, SHA-384), allowing CN to quickly access the latest block hash without loading the full merkle tree. The value is tracked in memory during processing and written once at the end (or on shutdown) for efficiency.
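A sketch of the 56-byte layout described above: an 8-byte block number followed by a 48-byte SHA-384 hash. The class name `LastMerkleLeafCodec` is hypothetical, and big-endian byte order is an assumption since the description does not state it.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical codec for the lastMerkleLeaf.bin layout: block number, then block hash. */
final class LastMerkleLeafCodec {
    static final int HASH_LENGTH = 48; // SHA-384 digest length in bytes
    static final int FILE_LENGTH = Long.BYTES + HASH_LENGTH; // 8 + 48 = 56 bytes

    /** Builds the file contents; ByteBuffer writes big-endian by default (assumed order). */
    static byte[] encode(long blockNumber, byte[] blockHash) {
        if (blockHash.length != HASH_LENGTH) {
            throw new IllegalArgumentException("Expected a 48-byte SHA-384 hash");
        }
        return ByteBuffer.allocate(FILE_LENGTH).putLong(blockNumber).put(blockHash).array();
    }

    /** Reads the block number from the first 8 bytes. */
    static long blockNumber(byte[] file) {
        requireLength(file);
        return ByteBuffer.wrap(file).getLong();
    }

    /** Copies out the 48 hash bytes that follow the block number. */
    static byte[] blockHash(byte[] file) {
        requireLength(file);
        return Arrays.copyOfRange(file, Long.BYTES, FILE_LENGTH);
    }

    private static void requireLength(byte[] file) {
        if (file.length != FILE_LENGTH) {
            throw new IllegalArgumentException("Expected " + FILE_LENGTH + " bytes, got " + file.length);
        }
    }
}
```

Because the file has a fixed size, a reader can validate it with a single length check before trusting either field.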
Files added/modified:
- BalanceFileBucket.java - Downloads balance files from GCP bucket
- BalanceProtobufParser.java - Manual protobuf parser to bypass PBJ's 2M account limit
- BalanceCheckpointsLoader.java - Loads pre-compiled checkpoint files
- FetchBalanceCheckpointsCommand.java - CLI command to fetch and compile checkpoints
- ValidateWrappedBlocksCommand.java - Added CLI options for balance validation
- WrappedBlockValidator.java - Fixed to process amendments in balance tracking
- ToWrappedBlocksCommand.java - Added lastMerkleLeaf.bin output

Limitations
Balance files are available from September 2019 through October 23, 2023. Post-October 2023 validation will be addressed in a follow-up PR.
Test plan
This is partial work for: #2173