
Conversation

@n0izn0iz
Contributor

@n0izn0iz n0izn0iz commented Mar 14, 2025

Fixes #1827

Description

  • Adds a gRPC service called Backup to the tendermint2 node that streams blocks efficiently.
    It has a single method, StreamBlocks, that takes a start and an end height. If the end height is 0, it streams up to the latest height. The service is disabled by default and must be enabled in config.toml.
  • Adds a contribs binary named tm2backup that pulls blocks from the backup service and stores them in compressed files of 100 blocks each. It takes a start and an end height and supports resuming.
    The tar format was chosen to bundle blocks because it is widely supported and efficient. The zstandard format was chosen for compression because it is fast, has a good compression ratio, and is widely supported.
  • Adds a restore subcommand to the gnoland binary that replays blocks from a backup. It takes the options of the start subcommand, plus the backup directory and an optional end height.
    It starts at the current node height + 1.
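
To make the 100-block tar files described above more concrete, here is a minimal sketch of how a chunk could be bundled and read back. It is not the code from this PR: the chunk alignment (1-100, 101-200, ...), the entry naming by height, and the helper names chunkRange, writeChunk, readChunk and bundleAndList are all assumptions made for illustration. The zstandard layer is left out because it needs a third-party package; it would wrap the io.Writer passed to writeChunk.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"strconv"
)

// chunkSize mirrors the 100-block files mentioned in the description.
const chunkSize = 100

// chunkRange returns the inclusive height range of the chunk that contains
// height, assuming chunks are aligned as 1-100, 101-200, ... (an assumption).
func chunkRange(height int64) (first, last int64) {
	first = (height-1)/chunkSize*chunkSize + 1
	return first, first + chunkSize - 1
}

// writeChunk bundles already-encoded blocks into a tar archive,
// one entry per block, named after the block height (hypothetical layout).
func writeChunk(w io.Writer, firstHeight int64, blocks [][]byte) error {
	tw := tar.NewWriter(w)
	for i, data := range blocks {
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     strconv.FormatInt(firstHeight+int64(i), 10),
			Mode:     0o644,
			Size:     int64(len(data)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := tw.Write(data); err != nil {
			return err
		}
	}
	return tw.Close()
}

// readChunk returns the entry names and payloads stored in a chunk.
func readChunk(r io.Reader) ([]string, [][]byte, error) {
	var names []string
	var payloads [][]byte
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, payloads, nil
		}
		if err != nil {
			return nil, nil, err
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, nil, err
		}
		names = append(names, hdr.Name)
		payloads = append(payloads, data)
	}
}

// bundleAndList writes a chunk to memory and lists its entries again.
func bundleAndList(firstHeight int64, blocks [][]byte) ([]string, error) {
	var buf bytes.Buffer
	if err := writeChunk(&buf, firstHeight, blocks); err != nil {
		return nil, err
	}
	names, _, err := readChunk(&buf)
	return names, err
}

func main() {
	names, err := bundleAndList(101, [][]byte{[]byte("block-101"), []byte("block-102")})
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

With aligned chunks, resuming could amount to finding the last complete chunk file and restarting from the first height of the next one, but how tm2backup actually resumes is not described here.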

The restore command can only restore up to backupEndHeight-1 because I have not figured out a way to commit block n without block n+1. I'd gladly take ideas on how to do that.
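
The limitation can be illustrated with a simplified replay loop. This is only a sketch: Block is a hypothetical stand-in for the real tm2 block type, and replay is not a function from this PR. The point is that the commit for block n travels in block n+1 (as LastCommit), so the last block of a backup has no commit to be applied with.

```go
package main

import "fmt"

// Block is a hypothetical, heavily simplified stand-in for a tm2 block.
type Block struct {
	Height     int64
	LastCommit string // commit for the block at Height-1 (simplified to a string)
}

// replay applies every block that has a successor in the slice, because the
// successor carries the commit needed to finalize it. It returns the height
// of the last block that was applied.
func replay(blocks []Block, apply func(b Block, commit string) error) (int64, error) {
	var applied int64
	for i := 0; i+1 < len(blocks); i++ {
		first, second := blocks[i], blocks[i+1]
		if err := apply(first, second.LastCommit); err != nil {
			return applied, fmt.Errorf("apply block %d: %w", first.Height, err)
		}
		applied = first.Height
	}
	return applied, nil
}

func main() {
	blocks := []Block{
		{Height: 1},
		{Height: 2, LastCommit: "commit-for-1"},
		{Height: 3, LastCommit: "commit-for-2"},
	}
	last, err := replay(blocks, func(Block, string) error { return nil })
	if err != nil {
		panic(err)
	}
	fmt.Println("restored up to height", last) // prints "restored up to height 2"
}
```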

The backup is fast enough for now IMO (< 20 min for test5 on my MacBook), but it is not parallelized and could be optimized.
The restore bottleneck currently seems to be the gnovm, but I would need to profile to be sure.

How to create a backup

  • Enable the backup service in your node's config.toml
    [backup]
    
    laddr = "localhost:4242"
  • (Re-)Start your node
  • Run the tm2backup command
    cd contribs/tm2backup
    tm2backup -o blocks-backup -remote http://localhost:4242
    Example output: [two screenshots omitted]

How to create a node from a backup

  • Get the genesis file, for example:
    wget https://example.com/genesis.json
  • Run the restore command
    gnoland restore --lazy --backup-dir ../contribs/tm2backup/blocks-backup
  • Start your node
    gnoland start
    

TODO

  • blocks streaming grpc service
  • 100 blocks files generation command
  • restore command
  • find a way to restore up to backup height if possible
  • tests

@Gno2D2
Collaborator

Gno2D2 commented Mar 14, 2025

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

🟢 Maintainers must be able to edit this pull request (more info)
🟢 Pending initial approval by a review team member, or review from tech-staff

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Automated Checks
Maintainers must be able to edit this pull request (more info)

If

🟢 Condition met
└── 🟢 And
    ├── 🟢 The base branch matches this pattern: ^master$
    └── 🟢 The pull request was created from a fork (head branch repo: n0izn0iz/gno)

Then

🟢 Requirement satisfied
└── 🟢 Maintainer can modify this pull request

Pending initial approval by a review team member, or review from tech-staff

If

🟢 Condition met
└── 🟢 And
    ├── 🟢 The base branch matches this pattern: ^master$
    └── 🟢 Not (🔴 Pull request author is a member of the team: tech-staff)

Then

🟢 Requirement satisfied
└── 🟢 If
    ├── 🟢 Condition
    │   └── 🟢 Or
    │       ├── 🔴 At least one of these user(s) reviewed the pull request: [jefft0 leohhhn n0izn0iz notJoon omarsy x1unix] (with state "APPROVED")
    │       ├── 🟢 At least 1 user(s) of the team tech-staff reviewed pull request
    │       └── 🔴 This pull request is a draft
    └── 🟢 Then
        └── 🟢 Not (🔴 This label is applied to pull request: review/triage-pending)

Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission

@zivkovicmilos zivkovicmilos requested review from sw360cab and zivkovicmilos and removed request for a team March 14, 2025 22:31
@n0izn0iz n0izn0iz changed the title from "feat: backup / restore" to "feat: blocks backup / restore" Mar 14, 2025
@jefft0
Contributor

jefft0 commented Jul 24, 2025

Hi @n0izn0iz. The CI checks in master are fixed now. Do you want to merge master and run the tests again?

@n0izn0iz
Contributor Author

It was up to date, and I think the failing check is a flaky one. The gnokms merge from yesterday introduced conflicts that I will fix ASAP.

@Kouteki Kouteki added this to the ⏭️Next after mainnet beta milestone Aug 4, 2025
@sw360cab sw360cab requested a review from ajnavarro September 3, 2025 12:43
@Kouteki Kouteki removed the request for review from a team November 27, 2025 12:36
Contributor


leftover?

Comment on lines +108 to +110
// XXX: we unmarshal the block which is remarshalled in the backup writer just after.
// This choice was made to have a more stable interface than bytes.
// If this proves to be a bottleneck, we should revisit.
Contributor

@ajnavarro ajnavarro Dec 2, 2025


This is part of the bottleneck for sure, but 100% not the main problem. What if instead of using amino as the endpoint output, we use JSON Lines format? To reduce bandwidth usage, we can compress the HTTP data using gzip, and instead of marshalling-unmarshalling-marshalling-unmarshalling, we can directly save that JSON to the backup file.
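
A rough sketch of what gzip-compressed JSON Lines could look like, using only the Go standard library. The blockLine type and its fields are hypothetical placeholders, not the real block schema, and the HTTP transport is left out: the same writer could sit on top of an HTTP response body or a backup file.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// blockLine is a hypothetical, simplified record: one JSON object per line.
type blockLine struct {
	Height int64  `json:"height"`
	Data   string `json:"data"`
}

// writeJSONLines writes one JSON object per line through a gzip stream.
func writeJSONLines(w io.Writer, blocks []blockLine) error {
	zw := gzip.NewWriter(w)
	enc := json.NewEncoder(zw) // Encode appends a newline after each value
	for _, b := range blocks {
		if err := enc.Encode(b); err != nil {
			return err
		}
	}
	return zw.Close() // flushes the remaining compressed data
}

// readJSONLines decodes the stream back, one value at a time.
func readJSONLines(r io.Reader) ([]blockLine, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	dec := json.NewDecoder(zr)
	var out []blockLine
	for {
		var b blockLine
		if err := dec.Decode(&b); err == io.EOF {
			return out, nil
		} else if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
}

// roundTrip writes the blocks to memory and reads them back.
func roundTrip(blocks []blockLine) ([]blockLine, error) {
	var buf bytes.Buffer
	if err := writeJSONLines(&buf, blocks); err != nil {
		return nil, err
	}
	return readJSONLines(&buf)
}

func main() {
	got, err := roundTrip([]blockLine{{Height: 1, Data: "a"}, {Height: 2, Data: "b"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got), "blocks decoded")
}
```

Because each line is a complete JSON value, the bytes received from the endpoint could be appended to the backup file as they arrive, without decoding them on the client.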


// StreamBlocks implements backuppbconnect.BackupServiceHandler.
func (b *backupServer) StreamBlocks(_ context.Context, req *connect.Request[backuppb.StreamBlocksRequest], stream *connect.ServerStream[backuppb.StreamBlocksResponse]) error {
	startHeight := req.Msg.StartHeight
Contributor


First of all, check that req and req.Msg are not nil.
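
One possible shape for that guard, combined with the "end height 0 means latest" rule from the description. Request and StreamBlocksRequest below are simplified stand-ins for the connect and protobuf types, and resolveRange is a hypothetical helper. Treating a start height below 1 as 1 is also an assumption, not something the PR states.

```go
package main

import (
	"errors"
	"fmt"
)

// StreamBlocksRequest is a simplified stand-in for backuppb.StreamBlocksRequest.
type StreamBlocksRequest struct {
	StartHeight int64
	EndHeight   int64
}

// Request is a simplified stand-in for connect.Request[T].
type Request struct {
	Msg *StreamBlocksRequest
}

// resolveRange validates the request and returns the inclusive height range
// to stream, given the latest height known to the block store.
func resolveRange(req *Request, latest int64) (start, end int64, err error) {
	if req == nil || req.Msg == nil {
		return 0, 0, errors.New("nil request")
	}
	start, end = req.Msg.StartHeight, req.Msg.EndHeight
	if start < 1 {
		start = 1 // assumption: an unset start height means "from the first block"
	}
	if end == 0 {
		end = latest // per the PR description, 0 means "up to the latest height"
	}
	if end > latest {
		return 0, 0, fmt.Errorf("end height %d is above latest height %d", end, latest)
	}
	if start > end {
		return 0, 0, fmt.Errorf("start height %d is above end height %d", start, end)
	}
	return start, end, nil
}

func main() {
	start, end, err := resolveRange(&Request{Msg: &StreamBlocksRequest{StartHeight: 5}}, 10)
	fmt.Println(start, end, err)

	_, _, err = resolveRange(nil, 10)
	fmt.Println(err)
}
```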

Comment on lines +78 to +92
for height := startHeight; height <= endHeight; height++ {
	block := b.store.LoadBlock(height)
	if block == nil {
		return fmt.Errorf("block store returned nil block for height %d", height)
	}

	data, err := amino.Marshal(block)
	if err != nil {
		return err
	}

	if err := stream.Send(&backuppb.StreamBlocksResponse{Data: data}); err != nil {
		return err
	}
}
Contributor


A few comments on this:

  • Fetching blocks one by one from the K/V storage is extremely slow. We need to use iterators for that.
  • We get byte arrays from the K/V storage, and the LoadBlock method unmarshals them into a Block struct; we then marshal the block back into a byte array, and finally marshal a StreamBlocksResponse into protobuf... That's a lot of unnecessary processing.
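
A sketch of the iterator idea, forwarding stored bytes without decoding them. Everything here is hypothetical: Iterator is a minimal stand-in for a K/V store iterator, sliceIterator is an in-memory fake, and the sketch assumes each stored value is one fully encoded block, which may not match how the real block store lays out blocks and block parts.

```go
package main

import "fmt"

// Iterator is a minimal, hypothetical stand-in for a K/V store range iterator.
type Iterator interface {
	Valid() bool
	Next()
	Value() []byte
	Close()
}

// sliceIterator is an in-memory Iterator used only for this example.
type sliceIterator struct {
	values [][]byte
	pos    int
}

func (it *sliceIterator) Valid() bool   { return it.pos < len(it.values) }
func (it *sliceIterator) Next()         { it.pos++ }
func (it *sliceIterator) Value() []byte { return it.values[it.pos] }
func (it *sliceIterator) Close()        {}

// newSliceIterator builds an iterator over the given raw values.
func newSliceIterator(values ...string) Iterator {
	it := &sliceIterator{}
	for _, v := range values {
		it.values = append(it.values, []byte(v))
	}
	return it
}

// streamRaw walks the range once and hands each stored value to send as-is,
// with no unmarshal/marshal round trip. It returns how many values were sent.
func streamRaw(it Iterator, send func(raw []byte) error) (int, error) {
	defer it.Close()
	sent := 0
	for ; it.Valid(); it.Next() {
		if err := send(it.Value()); err != nil {
			return sent, fmt.Errorf("send value %d of range: %w", sent, err)
		}
		sent++
	}
	return sent, nil
}

func main() {
	n, err := streamRaw(newSliceIterator("raw-1", "raw-2", "raw-3"), func(raw []byte) error {
		fmt.Printf("sending %d bytes\n", len(raw))
		return nil
	})
	fmt.Println(n, err)
}
```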

Comment on lines +394 to +397
if err := state.Validators.VerifyCommit(
	chainID, firstID, first.Height, second.LastCommit); err != nil {
	return fmt.Errorf("invalid commit (%d:%X): %w", first.Height, first.Hash(), err)
}
Contributor


Can we add a flag to skip commit verification? If we trust the backup, it can improve import speed.
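
Such a flag might look like the following sketch. The flag name -skip-verify, the restoreOptions struct and the checkCommit helper are all hypothetical; the real restore subcommand has its own flag handling, and verify stands in for the VerifyCommit call quoted above.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// errInvalidCommit is a placeholder error used by the example.
var errInvalidCommit = errors.New("invalid commit")

// restoreOptions holds the hypothetical opt-out switch.
type restoreOptions struct {
	SkipVerify bool
}

// parseOptions reads the hypothetical -skip-verify flag from args.
func parseOptions(args []string) (restoreOptions, error) {
	var opts restoreOptions
	fs := flag.NewFlagSet("restore", flag.ContinueOnError)
	fs.BoolVar(&opts.SkipVerify, "skip-verify", false,
		"trust the backup and skip commit verification")
	err := fs.Parse(args)
	return opts, err
}

// checkCommit runs verify unless the operator opted out.
// Verification stays on by default so an untrusted backup is still checked.
func checkCommit(opts restoreOptions, verify func() error) error {
	if opts.SkipVerify {
		return nil
	}
	return verify()
}

func main() {
	opts, err := parseOptions([]string{"-skip-verify"})
	if err != nil {
		panic(err)
	}
	failing := func() error { return errInvalidCommit }
	fmt.Println(checkCommit(opts, failing))
	fmt.Println(checkCommit(restoreOptions{}, failing))
}
```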

	return fmt.Errorf("invalid commit (%d:%X): %w", first.Height, first.Hash(), err)
}

bcR.store.SaveBlock(first, firstParts, second.LastCommit)
Contributor


We have to use batched inserts here. It will be an order of magnitude faster.
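
The general shape of batching, shown against an in-memory fake. memDB, batch and saveBlocks are hypothetical and only count write operations; the real SaveBlock also stores block parts and commits and updates the store height, so this shows the pattern rather than a drop-in change.

```go
package main

import "fmt"

// memDB is an in-memory fake that counts how many write operations it receives.
type memDB struct {
	data   map[string][]byte
	writes int
}

func newMemDB() *memDB { return &memDB{data: map[string][]byte{}} }

// batch buffers writes and applies them to the database in one operation.
type batch struct {
	db      *memDB
	pending map[string][]byte
}

func (db *memDB) NewBatch() *batch {
	return &batch{db: db, pending: map[string][]byte{}}
}

func (b *batch) Set(key string, value []byte) { b.pending[key] = value }

// Write flushes all pending entries as a single write operation.
func (b *batch) Write() {
	if len(b.pending) == 0 {
		return
	}
	for k, v := range b.pending {
		b.db.data[k] = v
	}
	b.db.writes++
	b.pending = map[string][]byte{}
}

// saveBlocks stores encoded blocks, flushing once every batchSize blocks
// instead of once per block.
func saveBlocks(db *memDB, firstHeight int64, blocks [][]byte, batchSize int) {
	if batchSize < 1 {
		batchSize = 1
	}
	b := db.NewBatch()
	for i, raw := range blocks {
		b.Set(fmt.Sprintf("block/%d", firstHeight+int64(i)), raw)
		if (i+1)%batchSize == 0 {
			b.Write()
		}
	}
	b.Write() // flush the remainder
}

func main() {
	db := newMemDB()
	saveBlocks(db, 1, make([][]byte, 250), 100)
	fmt.Println(len(db.data), "blocks stored in", db.writes, "writes")
}
```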

@jefft0 jefft0 removed the review/triage-pending PRs opened by external contributors that are waiting for the 1st review label Dec 10, 2025
@jefft0
Contributor

jefft0 commented Dec 10, 2025

Reviewed by core dev ajnavarro. Ready for review by other core devs. (Still need to fix CI checks.)

@Villaquiranm
Contributor

Reviewed by core dev ajnavarro. Ready for review by other core devs. (Still need to fix CI checks.)

I will continue the work on top of this PR in place of @n0izn0iz.
The new PR is here: #4950


Labels

  • 🤝 contribs
  • 🐳 devops
  • 🛠️ gnodev
  • 🐹 golang: Pull requests that update Go code
  • 📦 🌐 tendermint v2: Issues or PRs tm2 related
  • 📦 ⛰️ gno.land: Issues or PRs gno.land package related
  • 📦 🤖 gnovm: Issues or PRs gnovm related
  • 🧾 package/realm: Tag used for new Realms or Packages.

Projects

Status: In Progress
Status: In Review

Development

Successfully merging this pull request may close these issues.

[chain] Backup / Restore Functionality

8 participants