lightning-liquidity persistence: Add serialization logic for services and event queue #4059
base: main
Conversation
👋 I see @TheBlueMatt was un-assigned.
Force-pushed from 124211d to 26f3ce3
Force-pushed from a98dff6 to d630c4e
Codecov Report ❌
Additional details and impacted files:
@@ Coverage Diff @@
## main #4059 +/- ##
==========================================
- Coverage 88.76% 88.54% -0.22%
==========================================
Files 176 178 +2
Lines 129518 130019 +501
Branches 129518 130019 +501
==========================================
+ Hits 114968 115127 +159
- Misses 11945 12269 +324
- Partials 2605 2623 +18
Force-pushed from d630c4e to 70118e7
Force-pushed from 70118e7 to dd43edc
This all LGTM. I have a small concern: maybe I'm being a little paranoid, but read_lsps2_service_peer_states and read_lsps5_service_peer_states pull every entry from the KVStore into memory with no limit. That could lead to unbounded state, exhausting memory and crashing. Maybe we can add a limit on how many entries we load into memory to protect against this DoS? Not sure how realistic this is, though: an attacker would need access to, or share, the same storage as the victim, and could then dump effectively infinite data onto disk. In that scenario the victim would probably be vulnerable to other attacks too, but still..
Reading state from disk (currently) happens on startup only, so crashing wouldn't be the worst thing, we would simply fail to start up properly. Some even argue that we need to panic if we hit any IO errors at this point to escalate to an operator. We could add some safeguard/upper bound, but I'm honestly not sure what it would protect against.
Heh, well, if we assume the attacker has write access to our …
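If such a safeguard were added, it could look roughly like the sketch below. The trait, the read_peer_states_bounded helper, and the MAX_PEER_STATE_ENTRIES constant are hypothetical stand-ins, not the actual KVStore API or any limit chosen in the PR:

```rust
use std::io;

/// Simplified stand-in for a key-value store interface (hypothetical).
trait SimpleKvStore {
    fn list(&self, namespace: &str) -> Result<Vec<String>, io::Error>;
    fn read(&self, namespace: &str, key: &str) -> Result<Vec<u8>, io::Error>;
}

/// Hypothetical upper bound on how many peer-state entries we are willing to load.
const MAX_PEER_STATE_ENTRIES: usize = 100_000;

/// Loads all peer-state entries, but errors out instead of growing without
/// bound if the store holds more entries than expected.
fn read_peer_states_bounded(
    store: &dyn SimpleKvStore, namespace: &str,
) -> Result<Vec<(String, Vec<u8>)>, io::Error> {
    let keys = store.list(namespace)?;
    if keys.len() > MAX_PEER_STATE_ENTRIES {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "too many persisted peer-state entries",
        ));
    }
    let mut entries = Vec::with_capacity(keys.len());
    for key in keys {
        let value = store.read(namespace, &key)?;
        entries.push((key, value));
    }
    Ok(entries)
}
```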
🔔 1st Reminder Hey @TheBlueMatt! This PR has been waiting for your review.
Force-pushed from dd43edc to f73146b
entropy_source: ES, node_signer: NS, channel_manager: CM, chain_source: Option<C>,
chain_params: Option<ChainParameters>, service_config: Option<LiquidityServiceConfig>,
chain_params: Option<ChainParameters>, kv_store: Arc<dyn KVStore + Send + Sync>, |
Why does the KVStore need to be dyn or an Arc?
See above #4059 (comment): generally, dealing with all the generics gets very cumbersome. In this particular case, if we'd make it a generic, it would need to propagate to all the inner service handlers, in turn requiring that we provide Sync wrappers for all of them. FWIW, we might need such wrappers at least for some of them once we do in-line persistence for some of the API calls, but still, dealing with both KVStore and KVStoreSync generics across the LiquidityManager codebase would be a huge mess then.
> in turn requiring that we provide Sync wrappers for all of them.

I'm confused by this claim. Anywhere a dyn KVStore exists it should be replaceable with a generic KVStore without any change in the semantics. If sync wrappers are needed, it should be the same in either case.
> I'm confused by this claim. Anywhere a dyn KVStore exists it should be replaceable with a generic KVStore without any change in the semantics. If sync wrappers are needed, it should be the same in either case.

In turn, could you expand on why dynamic dispatch is not acceptable here?
Well, for example it doesn't allow the compiler to automatically determine Send + Sync, see #4059 (comment).
But also, we really should try to keep our API consistent...
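To make the trade-off concrete, here is a minimal sketch with a hypothetical Store trait standing in for KVStore: with dynamic dispatch the Send + Sync bounds must be written out on the trait object, while a generic parameter lets the compiler infer the auto traits from the concrete type but has to be threaded through every struct that holds the store.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the store trait under discussion.
trait Store {
    fn write(&self, key: &str, value: &[u8]);
}

// Dynamic dispatch: a single non-generic type, but the `Send + Sync` bounds
// on the trait object have to be spelled out explicitly.
struct ManagerDyn {
    store: Arc<dyn Store + Send + Sync>,
}

// Generic: `Send`/`Sync` are inferred from the concrete store type, but the
// parameter must propagate to every struct that holds the store.
struct ManagerGeneric<S: Store> {
    store: S,
}
```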
/// Wraps [`LiquidityManager::new`].
pub fn new(
entropy_source: ES, node_signer: NS, channel_manager: CM, chain_source: Option<C>,
chain_params: Option<ChainParameters>, kv_store_sync: Arc<dyn KVStoreSync + Send + Sync>, |
Same here, I see no reason why KVStoreSync needs to be dyn or an Arc? I also don't see why it needs to be Send + Sync?
See above.
As to why we need Send + Sync, it's required for spawning the async BP task. If we remove the bounds, we get a lot of errors like:
error[E0277]: `(dyn KVStore + 'static)` cannot be shared between threads safely
--> lightning-background-processor/src/lib.rs:2463:4
|
2455 | let bg_processor = BackgroundProcessor::start(
| -------------------------- required by a bound introduced by this call
...
2463 | Some(Arc::clone(&nodes[0].liquidity_manager)),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `(dyn KVStore + 'static)` cannot be shared between threads safely
|
= help: the trait `std::marker::Sync` is not implemented for `(dyn KVStore + 'static)`
= note: required for `Arc<(dyn KVStore + 'static)>` to implement `std::marker::Sync`
...
error[E0277]: `(dyn KVStore + 'static)` cannot be sent between threads safely
--> lightning-background-processor/src/lib.rs:3376:25
|
3376 | let t1 = tokio::spawn(bp_future);
| ------------ ^^^^^^^^^ `(dyn KVStore + 'static)` cannot be sent between threads safely
| |
| required by a bound introduced by this call
|
= help: the trait `Send` is not implemented for `(dyn KVStore + 'static)`
= note: required for `Arc<(dyn KVStore + 'static)>` to implement `std::marker::Sync`
note: required because it appears within the type `lightning_liquidity::events::event_queue::EventQueue`
--> /home/tnull/workspace/rust-lightning/lightning-liquidity/src/events/event_queue.rs:24:19
in the background processor tests.
Mmm, right, when you make it dyn you have to be more explicit, whereas keeping it generic allows for flexibility for sync users...
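For illustration, a minimal sketch of why the bounds surface here, assuming a tokio runtime and a hypothetical Store trait standing in for KVStore: Arc<T> is only Send when T is Send + Sync, so a future capturing the store can only be handed to tokio::spawn if the trait object carries those bounds.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the KVStore trait under discussion.
trait Store {
    fn persist(&self, key: &str, value: &[u8]);
}

// The future returned here captures the store, and `tokio::spawn` requires
// futures to be `Send`; that in turn requires `Arc<dyn Store + Send + Sync>`
// rather than a bare `Arc<dyn Store>`.
async fn background_task(store: Arc<dyn Store + Send + Sync>) {
    store.persist("event_queue", b"...");
}

#[tokio::main]
async fn main() {
    struct NoopStore;
    impl Store for NoopStore {
        fn persist(&self, _key: &str, _value: &[u8]) {}
    }

    let store: Arc<dyn Store + Send + Sync> = Arc::new(NoopStore);
    // Dropping `+ Send + Sync` from the type above reproduces E0277 errors
    // like the ones quoted in the previous comment.
    tokio::spawn(background_task(store)).await.unwrap();
}
```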
@@ -45,6 +46,10 @@ pub struct LSPS2GetInfoRequest {
pub token: Option<String>,
}

impl_writeable_tlv_based!(LSPS2GetInfoRequest, { |
Do we really want to have two ways to serialize all these types? Wouldn't it make more sense to just use the serde serialization we already have and wrap that so that it can't all be misused?
Yes, I think I'd be in favor of using TLV serialization for our own persistence.
Note that the compat guarantees of LSPS0/the JSON/serde format might not exactly match what we require in LDK, and our Rust representation might also diverge from the pure JSON impl. On top of that JSON is of course much less efficient.
Hmm, is there some easy way to avoid exposing that in the public API, then? Maybe a wrapper struct or extension trait for serialization somehow? Seems like kinda a footgun for users, I think?
> Hmm, is there some easy way to avoid exposing that in the public API, then? Maybe a wrapper struct or extension trait for serialization somehow? Seems like kinda a footgun for users, I think?

Not quite sure I understand the footgun? You mean because these types then have Writeable as well as Serialize implementations on them, and users might wrongly pick Writeable when they use the types independently from/outside of lightning-liquidity?
Sure, for example. Someone who uses serde presumably has some wrapper that serde-writes Writeable structs, and suddenly their code could read/compile totally fine while reading the wrong kind of thing. If they have some less-used codepaths (e.g. writing Events before they process them and then removing them again after) they might not find it immediately.
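For context, the two serialization paths being weighed look roughly like the sketch below. It assumes serde with the derive feature and the lightning crate's impl_writeable_tlv_based! macro being in scope; the TLV type number is illustrative, not necessarily the one chosen in the PR.

```rust
use lightning::impl_writeable_tlv_based;
use serde::{Deserialize, Serialize};

// JSON/serde path, matching the LSPS0 wire format.
#[derive(Serialize, Deserialize)]
pub struct LSPS2GetInfoRequest {
    pub token: Option<String>,
}

// TLV path for LDK-internal persistence: each field gets a TLV type number,
// which is what the macro added in this diff provides.
impl_writeable_tlv_based!(LSPS2GetInfoRequest, {
    (0, token, option),
});
```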
) -> Pin<Box<dyn Future<Output = Result<(), lightning::io::Error>> + Send>> {
let outer_state_lock = self.per_peer_state.read().unwrap();
let mut futures = Vec::new();
for (counterparty_node_id, peer_state) in outer_state_lock.iter() { |
Huh? Why would we ever want to do a single huge persist pass and write every peer's state at once? Shouldn't we be doing this iteratively? Same applies in the LSPS2 service.
Yes, only persisting what's needed/changed will be part of the next PR, as it ties into how we wake the BP to drive persistence (cf. the "Avoid re-persisting peer states if no changes happened (needs_persist flag everywhere)" bullet over at #4058 (comment)).
I'm confused why we're adding this method then? If it's going to be removed in the next PR in the series we should just not add it in the first place.
No, it's not gonna be removed, but extended: PeerState (here as well as in LSPS2) will gain a dirty/needs_persist flag and we'd simply skip persisting any entries that haven't been changed since the last persistence round.
That seems like a weird design if we need to persist something immediately while it's being operated on - we have the node in question, so why walk a whole peer list? Can you put up the followup code so we can see how it's going to be used? Given this PR is mostly boilerplate I honestly wouldn't mind it being a bit bigger, as long as the code isn't too crazy.
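As a rough illustration of the needs_persist approach described above (not the actual follow-up code), with simplified stand-in types for the peer-state map and the store:

```rust
use std::collections::HashMap;
use std::io;
use std::sync::RwLock;

// Simplified stand-ins for the real types.
type NodeId = [u8; 33];

struct PeerState {
    serialized: Vec<u8>,
    needs_persist: bool,
}

trait Store {
    fn write(&self, key: &str, value: &[u8]) -> io::Result<()>;
}

/// Persists only peers whose state changed since the last round, clearing the
/// dirty flag afterwards, instead of unconditionally writing every peer.
fn persist_dirty_peers(
    per_peer_state: &RwLock<HashMap<NodeId, PeerState>>, store: &dyn Store,
) -> io::Result<()> {
    let mut peers = per_peer_state.write().unwrap();
    for (node_id, state) in peers.iter_mut() {
        if state.needs_persist {
            let key = hex_encode(node_id);
            store.write(&key, &state.serialized)?;
            state.needs_persist = false;
        }
    }
    Ok(())
}

// Keys the persisted entry by the counterparty's node id, hex-encoded.
fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```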
We add `KVStore` to `LiquidityManager`, which will be used in the next commits. We also add a `LiquidityManagerSync` wrapper that wraps the `LiquidityManager` interface, which will soon become async due to usage of the async `KVStore`.
We add a simple `persist` call to `LSPS2ServiceHandler` that sequentially persists all the peer states under a key that encodes their node id.
We add a simple `persist` call to `LSPS5ServiceHandler` that sequentially persists all the peer states under a key that encodes their node id.
We add a simple `persist` call to `EventQueue` that persists it under an `event_queue` key.
.. this is likely only temporarily necessary, as we can drop our own `dummy_waker` implementation once we bump MSRV.
We read any previously-persisted state upon construction of `LiquidityManager`.
We read any previously-persisted state upon construction of `LiquidityManager`.
We read any previously-persisted state upon construction of `LiquidityManager`.
Force-pushed from f73146b to 2971982
Rebased to address minor conflict.
Responded to the outstanding comments, not quite sure I fully get all the rationale here.
This is the second PR in a series of PRs adding persistence to lightning-liquidity (see #4058). As this is already >1000 LoC, I decided to put this up as an intermediary step instead of adding everything in one go.

In this PR we add the serialization logic for the LSPS2 and LSPS5 service handlers as well as for the event queue. We also have LiquidityManager take a KVStore towards which it persists the respective peer states keyed by the counterparty's node id. LiquidityManager::new now also deserializes any previously-persisted state from that given KVStore. Note that so far we don't actually persist anything, as wiring up BackgroundProcessor to drive persistence will be part of the next PR (which will also make further optimizations, such as only persisting when needed, and persisting some important things in-line).

This also adds a bunch of boilerplate to account for both KVStore and KVStoreSync variants, following the approach we previously took with OutputSweeper etc.

cc @martinsaposnic
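For readers unfamiliar with the sync-wrapper pattern mentioned above, a minimal sketch of its general shape follows. The types and the persist method are simplified stand-ins rather than the actual LiquidityManagerSync API, and it drives the future with the futures crate's executor for brevity, whereas the PR notes it currently relies on its own dummy_waker implementation.

```rust
use std::sync::Arc;

// Hypothetical async manager whose persistence-related calls return futures.
struct AsyncManager;

impl AsyncManager {
    async fn persist(&self) -> Result<(), std::io::Error> {
        // ... would write state via an async KVStore ...
        Ok(())
    }
}

// Sync wrapper: owns the async manager and drives its futures to completion,
// mirroring the general `*Sync` wrapper approach (e.g. around a sync store).
struct SyncManager {
    inner: Arc<AsyncManager>,
}

impl SyncManager {
    fn persist(&self) -> Result<(), std::io::Error> {
        // With a sync store underneath, the inner future resolves without
        // blocking, so driving it to completion here is cheap.
        futures::executor::block_on(self.inner.persist())
    }
}

fn main() -> Result<(), std::io::Error> {
    let manager = SyncManager { inner: Arc::new(AsyncManager) };
    manager.persist()
}
```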