NEP-568: Resharding V3 #568
Hi @wacban, thank you for starting this proposal. As the moderator, I labeled this PR as "Needs author revision" because we assume you are still working on it since you submitted it in "Draft" mode.
Filling in the future possibilities section - and looking for more ideas :)
Filling the specification section about flat state.
This is complete except for section `Handling buffered receipts that target parent shard` which is still being discussed.
Some beautification courtesy of ChatGPT. I double-checked everything; we aren't changing any meaning.
As the moderator, I believe this proposal is ready for SME review. @near/wg-protocol, could you help assign SMEs who can review the proposal? Thank you.
As a reviewer I would like to thank the team for putting together such a well-written and detailed proposal! Designing this feature was no easy task, and the proposal here seems to address all the intricacies nicely.
I have left two questions, but otherwise the design looks good to me.
#### Flat State's Status Persistence

Every shard's Flat State has a status associated with it and stored in the database, called `FlatStorageStatus`. We propose extending the existing object by adding a new enum variant named `FlatStorageStatus::Resharding`. This approach has two benefits. First, the progress of any Flat State resharding is persisted to disk, making the operation resilient to a node crash or restart. Second, resuming resharding on node restart shares the same code path as Flat State creation (see `FlatStorageShardCreator`), reducing code duplication.
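The persistence idea above can be illustrated with a minimal sketch. It is not the nearcore implementation: only the names `FlatStorageStatus` and `Resharding` come from the proposal text, while the `batches_done` payload, the `StatusStore` type (an in-memory stand-in for the database), and `run_resharding` are hypothetical names introduced for illustration.

```rust
use std::collections::HashMap;

/// Simplified model of the per-shard status. Only the `Resharding`
/// variant name comes from the proposal; the payload is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FlatStorageStatus {
    Empty,
    Ready,
    /// Hypothetical payload: number of key batches already processed.
    Resharding { batches_done: u32 },
}

/// Hypothetical in-memory stand-in for the database column that stores
/// one status per shard id.
#[derive(Default)]
struct StatusStore {
    by_shard: HashMap<u64, FlatStorageStatus>,
}

impl StatusStore {
    fn get(&self, shard_id: u64) -> FlatStorageStatus {
        self.by_shard
            .get(&shard_id)
            .cloned()
            .unwrap_or(FlatStorageStatus::Empty)
    }

    fn set(&mut self, shard_id: u64, status: FlatStorageStatus) {
        self.by_shard.insert(shard_id, status);
    }
}

/// Starts or resumes resharding, persisting progress after every batch.
/// `stop_after` simulates a crash after that many batches in this run.
fn run_resharding(
    store: &mut StatusStore,
    shard_id: u64,
    total_batches: u32,
    stop_after: Option<u32>,
) -> FlatStorageStatus {
    let mut done = match store.get(shard_id) {
        // A persisted `Resharding` status means we resume, not restart.
        FlatStorageStatus::Resharding { batches_done } => batches_done,
        FlatStorageStatus::Ready => return FlatStorageStatus::Ready,
        FlatStorageStatus::Empty => 0,
    };
    let mut processed_this_run = 0;
    while done < total_batches {
        if stop_after == Some(processed_this_run) {
            // Simulated crash: whatever was persisted is all that survives.
            return store.get(shard_id);
        }
        done += 1;
        processed_this_run += 1;
        store.set(shard_id, FlatStorageStatus::Resharding { batches_done: done });
    }
    store.set(shard_id, FlatStorageStatus::Ready);
    FlatStorageStatus::Ready
}
```

Calling `run_resharding` a second time after a simulated crash picks up from the persisted `batches_done` value, which is the property the paragraph describes: the same entry point serves both the initial run and the restart.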
If the node crashes during resharding, then the `HybridArena` MemTrie objects also need to be reconstructed to resume block processing. How does this recovery happen? I assume it is easy to reconstruct the `FrozenArena` from the Flat State, but what about the changes on top of it? Are they persisted somewhere? If they are not, then I suppose the node would need to recreate the MemTrie by re-applying the chunks that have happened so far in the epoch.
@Trisfald Can you reply please?
Yes, the `HybridArena` memtrie will need to be reconstructed upon a crash or even a simple restart. I think at the current stage of `nearcore/master` we don't support seamless node restarts yet. There are a couple of possible solutions to this problem, since all changes to the state are always persisted in one way or another. In increasing order of complexity:
- Wait for flat storage to complete resharding and load the memtrie for children shards as usual
- Recreate the memtrie of the parent shard as it was at the resharding block, split it again, and apply changes
- Create memtries from current flat storage state of children (even if not complete) and apply remaining changes
- More complex solutions which leverage decoupling of memtrie from flat storage
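As a rough illustration of the second option (recreate the parent as it was at the resharding block, split it again, and apply changes), here is a minimal sketch. It models the trie as a plain `BTreeMap` and is not nearcore code: `State`, `Change`, `split_at_boundary`, `apply_changes`, and `recover_children` are all hypothetical names, and a single boundary key is assumed to separate the two children.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a shard's state: sorted key/value pairs.
type State = BTreeMap<String, Vec<u8>>;

/// A state change recorded after the resharding block.
/// `value == None` means the key was deleted.
struct Change {
    key: String,
    value: Option<Vec<u8>>,
}

/// Splits the parent state into (keys < boundary, keys >= boundary).
fn split_at_boundary(parent: &State, boundary: &str) -> (State, State) {
    let mut left = parent.clone();
    let right = left.split_off(boundary);
    (left, right)
}

/// Routes each change to the child that owns its key and applies it in order.
fn apply_changes(left: &mut State, right: &mut State, boundary: &str, changes: &[Change]) {
    for change in changes {
        let child = if change.key.as_str() < boundary {
            &mut *left
        } else {
            &mut *right
        };
        match &change.value {
            Some(v) => {
                child.insert(change.key.clone(), v.clone());
            }
            None => {
                child.remove(&change.key);
            }
        }
    }
}

/// Rebuilds both children from the parent snapshot plus the persisted changes.
fn recover_children(
    parent_at_resharding: &State,
    boundary: &str,
    changes: &[Change],
) -> (State, State) {
    let (mut left, mut right) = split_at_boundary(parent_at_resharding, boundary);
    apply_changes(&mut left, &mut right, boundary, changes);
    (left, right)
}
```

The sketch relies on the point made above that all state changes are persisted somewhere, so the children can be rebuilt deterministically from the parent snapshot and the ordered list of changes.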
Thanks for the clarification, @Trisfald. I brought this up here because crash recovery is mentioned as part of the motivation for the design. Maybe it is worth adding a section to the MemTrie specification about how restarts during resharding will be handled.
@mfornet, would you be able to review this NEP?
@mfornet, another ping to make sure this catches your eye. As we are trying to ship resharding v3 with the upcoming release, it would be great if you could share your thoughts. Thank you.
NEP Status (Updated by NEP Moderators)
Status: Review
SME reviews: