feat: blocks backup / restore #3946
Conversation
Hi @n0izn0iz. The CI checks in master are fixed now. Do you want to merge master and run the tests again?
It was up to date, and the failing check is a flaky one, I think. The gnokms merge from yesterday introduced conflicts that I will fix ASAP.
leftover?
```go
// XXX: we unmarshal the block which is remarshalled in the backup writer just after.
// This choice was made to have a more stable interface than bytes.
// If this proves to be a bottleneck, we should revisit.
```
This is part of the bottleneck for sure, but 100% not the main problem. What if, instead of using amino as the endpoint output, we used the JSON Lines format? To reduce bandwidth usage we can compress the HTTP data using gzip, and instead of marshalling-unmarshalling-marshalling-unmarshalling, we can save that JSON directly to the backup file.
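A minimal sketch of that shape, assuming a plain HTTP handler and a stand-in `Block` type (both illustrative; the actual endpoint in this PR is connect-based):

```go
package backup

import (
	"compress/gzip"
	"encoding/json"
	"net/http"
)

// Block is a stand-in for the real tm2 block type.
type Block struct {
	Height int64  `json:"height"`
	Data   []byte `json:"data"`
}

// streamBlocksJSONL writes one JSON object per line (JSON Lines),
// gzip-compressed, so the client can append the decompressed payload
// to the backup file without any re-encoding.
func streamBlocksJSONL(w http.ResponseWriter, blocks <-chan *Block) {
	w.Header().Set("Content-Type", "application/x-ndjson")
	w.Header().Set("Content-Encoding", "gzip")

	gz := gzip.NewWriter(w)
	defer gz.Close()

	enc := json.NewEncoder(gz) // Encode appends '\n': exactly one block per line
	for b := range blocks {
		if err := enc.Encode(b); err != nil {
			return // client went away or encoding failed
		}
	}
}
```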
```go
// StreamBlocks implements backuppbconnect.BackupServiceHandler.
func (b *backupServer) StreamBlocks(_ context.Context, req *connect.Request[backuppb.StreamBlocksRequest], stream *connect.ServerStream[backuppb.StreamBlocksResponse]) error {
	startHeight := req.Msg.StartHeight
```
First of all, check that req and req.Msg are not nil.
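A minimal guard at the top of the handler could look like this (a sketch; connect-go usually guarantees a non-nil request, so treat this as defensive programming):

```go
// Defensive guard: fail fast with a clear error instead of
// panicking on a nil dereference further down.
if req == nil || req.Msg == nil {
	return connect.NewError(connect.CodeInvalidArgument, errors.New("nil request"))
}
startHeight := req.Msg.StartHeight
```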
```go
	for height := startHeight; height <= endHeight; height++ {
		block := b.store.LoadBlock(height)
		if block == nil {
			return fmt.Errorf("block store returned nil block for height %d", height)
		}

		data, err := amino.Marshal(block)
		if err != nil {
			return err
		}

		if err := stream.Send(&backuppb.StreamBlocksResponse{Data: data}); err != nil {
			return err
		}
	}
```
Some things to comment on this:
- Fetching blocks one by one from the K/V storage is extremely slow. We need to use iterators for that.
- We are getting byte arrays from the K/V storage; the `LoadBlock` method unmarshals them into a `Block` struct, after which we marshal the blocks back into a byte array, which is then wrapped in a protobuf `StreamBlocksResponse`... That's a lot of unnecessary processing (see the sketch below).
```go
	if err := state.Validators.VerifyCommit(
		chainID, firstID, first.Height, second.LastCommit); err != nil {
		return fmt.Errorf("invalid commit (%d:%X): %w", first.Height, first.Hash(), err)
	}
```
Can we add a flag to skip commit verification? If we trust the backup, it can improve import speed.
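For example (a sketch; `skipCommitVerify` is a made-up name that would be wired to a restore CLI option):

```go
// skipCommitVerify is a hypothetical option wired to a restore flag;
// skipping verification trades safety for import speed on trusted backups.
if !skipCommitVerify {
	if err := state.Validators.VerifyCommit(
		chainID, firstID, first.Height, second.LastCommit); err != nil {
		return fmt.Errorf("invalid commit (%d:%X): %w", first.Height, first.Hash(), err)
	}
}
```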
```go
		return fmt.Errorf("invalid commit (%d:%X): %w", first.Height, first.Hash(), err)
	}

	bcR.store.SaveBlock(first, firstParts, second.LastCommit)
```
We have to use insert batches here. It will be an order of magnitude faster.
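A sketch of the batched path, assuming a tendermint-style `Batch` (`NewBatch`/`Write`) on the store's underlying DB and a hypothetical `saveBlockToBatch` helper; exact signatures vary between db packages:

```go
const flushEvery = 1000 // blocks per batch; tune empirically

batch := db.NewBatch()
pending := 0
for block := range blocks {
	// saveBlockToBatch is a hypothetical helper that stages the same
	// keys SaveBlock writes (meta, parts, commit) onto the batch.
	saveBlockToBatch(batch, block)
	pending++
	if pending == flushEvery {
		batch.Write() // one write (and fsync) per batch instead of per block
		batch = db.NewBatch()
		pending = 0
	}
}
batch.Write() // flush the tail
```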
Reviewed by core dev ajnavarro. Ready for review by other core devs. (Still need to fix CI checks.)
Fixes #1827
Description

This PR adds:

- A `Backup` service in the tendermint2 node that allows streaming blocks efficiently. It has a single method, `StreamBlocks`, that takes a start and end height. If the end height is 0, it will stream to the latest height. The service is disabled by default and requires enabling it in the `config.toml`.
- A contribs binary named `tm2backup` that pulls blocks from the backup service and stores them in compressed 100-block files. It takes a start and end height as well, and supports resuming. The tar format was chosen to bundle blocks since it's widely supported and efficient; the zstandard format was chosen for compression because it's fast, has a good compression ratio, and is widely supported. (A sketch of this chunk format follows the list.)
- A `restore` subcommand to the gnoland binary that allows replaying blocks from a backup. It takes the options from the `start` subcommand as well as the backup directory and an optional end height. Restoring will start at the current node height + 1.
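For illustration, writing one such chunk looks roughly like this, using `archive/tar` from the stdlib and the `github.com/klauspost/compress/zstd` encoder (the file naming scheme is an assumption, not necessarily what tm2backup does):

```go
package backup

import (
	"archive/tar"
	"fmt"
	"os"
	"path/filepath"

	"github.com/klauspost/compress/zstd"
)

// writeChunk bundles the raw bytes of up to 100 blocks into one
// zstd-compressed tar file, one tar entry per block.
func writeChunk(dir string, firstHeight int64, blocks [][]byte) error {
	// The file naming scheme is illustrative.
	name := filepath.Join(dir, fmt.Sprintf("blocks-%d.tar.zst", firstHeight))
	f, err := os.Create(name)
	if err != nil {
		return err
	}
	defer f.Close()

	zw, err := zstd.NewWriter(f)
	if err != nil {
		return err
	}
	defer zw.Close() // deferred closes run tar -> zstd -> file

	tw := tar.NewWriter(zw)
	defer tw.Close()

	for i, data := range blocks {
		hdr := &tar.Header{
			Name: fmt.Sprintf("block-%d", firstHeight+int64(i)),
			Mode: 0o644,
			Size: int64(len(data)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := tw.Write(data); err != nil {
			return err
		}
	}
	return nil
}
```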
The restore command can only restore up to `backupEndHeight - 1`, because I did not figure out a way to commit block `n` without block `n+1`. I'd gladly take ideas on how to do that.

The backup is fast enough for now IMO (< 20 min for test5 on my MacBook), but it can be optimized since it's not parallelized. The restore bottleneck currently seems to be the gnovm, but I would need to profile to be sure.
How to create a backup

1. Enable the backup service in the node's `config.toml`
2. Run the `tm2backup` command:

```sh
cd contribs/tm2backup
tm2backup -o blocks-backup -remote http://localhost:4242
```

How to create a node from a backup

Run the `gnoland restore` subcommand described above, with the usual `start` options plus the backup directory and an optional end height.
TODO
- find a way to restore up to backup height if possible