Event Log Rotation #1581
Comments
As it was only mentioned in passing in this item, we might want to scope separate item(s) about the memory growth in:
|
Created #1618 to cover the API server part of tackling memory growth. |
What remains to do here is the |
While the midnight glacier drop was originally a use case that could benefit from this, the architect of that solution was actually happy to hear that we keep the full log of all transactions (and don't squash things into checkpoints). Disk usage after millions of transactions was also no longer a concern for them. That leaves restart times as the only remaining motivation for this issue. Checkpoints are not the only possible solution for that, as the storage format, the encoding, and the code in general still have lots of room for optimization. See also #1585 |
We need to understand the impact of this request on the ?history API.
Maybe it's enough to make it configurable somehow. |
While startup times are probably the lowest hanging fruit to motivate this issue, my thinking when I wrote it had very little to do with startup times. I also wasn't envisioning that this would (by default) discard the transaction history, but rather that it would segment it, so that operators can build processes around those segments to meet whatever their organizational needs are. This comes up a lot in large enterprise systems, particularly around things like logs or event streams. Large scale enterprise deployments of Hydra are likely going to need to do some combination of the following:
|
Thanks for adding more context @Quantumplation! Would you be fine with this initial scope?
|
yes, that seems totally reasonable, though |
Why
While working on the hydra-doom project, we noticed that both the on-disk state and the in-memory state grew without bound (see #1572).
This meant that, at the sustained load that the hydra-doom demo was producing, nodes became inoperable after just a few hours. The hack in #1572 helped, but on-disk state still needed to be rotated regularly, by hand.
This consisted of stopping the nodes, renaming the data directory, bringing the nodes back up, and then shipping the data directory off to archival storage. And this only worked because we were using offline nodes and didn't mind interrupting the head.
What
I'd like to propose that the hydra head implement checkpointing for the event log.
How
This is just a proposed implementation, feel free to adapt to better fit the intricacies of the hydra codebase.
data/seq-0/state or data/seq-12345/state
This would allow a 3rd party agent to detect the checkpoint and trigger any appropriate archival, backup, or cleanup without interrupting the hydra head. Hydra heads would also be able to recover faster after a failure, and memory usage would be kept within a bounded limit.
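To make the "3rd party agent" idea a bit more concrete, here is a rough sketch of how such an agent might discover segments on disk. It only assumes the data/seq-N directory naming from the proposal above. The function names are made up for illustration, and treating every segment except the newest as safe to archive is an assumption about how a checkpointing hydra node would behave, not something hydra does today.

```python
import re
from pathlib import Path

# Matches the proposed segment directory names, e.g. "seq-0" or "seq-12345".
SEGMENT_PATTERN = re.compile(r"^seq-(\d+)$")


def list_segments(data_dir):
    """Return (sequence_number, path) pairs for every seq-N directory, oldest first."""
    segments = []
    for entry in Path(data_dir).iterdir():
        match = SEGMENT_PATTERN.match(entry.name)
        if match and entry.is_dir():
            segments.append((int(match.group(1)), entry))
    # Sort numerically so that seq-200 comes before seq-12345.
    return sorted(segments, key=lambda segment: segment[0])


def archivable_segments(data_dir):
    """Return every segment directory except the newest one.

    Assumption: the node only ever writes to the segment with the highest
    sequence number, so all older segments are complete.
    """
    return [path for _, path in list_segments(data_dir)[:-1]]
```

An agent could call archivable_segments("data") on a timer, ship each returned directory to archival storage, and delete it locally once the upload is confirmed, all without stopping the node.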
Again, I'm super unfamiliar with the hydra codebase, so there might be more subtleties that are needed, but I just wanted to get the ball rolling on a discussion :)