Refactor networking #66
base: dev
Conversation
Ports under 1024 are privileged and were failing (at least on Linux) when running as a normal user.
We already have oxen-libquic, oxen-logging, and nlohmann via lokinet, so get them via that nested submodule rather than having a duplicated submodule in libsession-util itself. Also removes the macos workaround call to `oxen_logging_add_source_dir` because that directive no longer does anything.
- LOKINET_EMBEDDED=ON is replaced with LOKINET_FULL=OFF
- LOKINET_BOOTSTRAP was removed
- LOKINET_DAEMON=OFF is not strictly needed (it should be the default) but makes it clear what we're doing.
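Assuming a conventional out-of-source CMake configure step (the invocation below is illustrative, not taken from the project's docs), the flag changes described above would look something like:

```shell
# Hypothetical configure command reflecting the new flags:
#   LOKINET_FULL=OFF replaces the old LOKINET_EMBEDDED=ON,
#   LOKINET_DAEMON=OFF is spelled out even though it should be the default.
cmake -B build \
    -DLOKINET_FULL=OFF \
    -DLOKINET_DAEMON=OFF
```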
- Load libquic before oxenc/oxen-logging so that libquic has a chance to set up its oxen-logging, etc. targets before libsession tries.
- Remove unneeded settings to disable tests/docs (these are [now] the dependency defaults when not doing a top-level project build).
- Update to depend on the proper lokinet::liblokinet target.
• Added missing config options
• Added exponential backoffs for retries (and a retry limit for path building)
• Fixed a couple of issues with the logic to finish refreshing the snode pool
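The exponential backoff with a retry cap mentioned above could be sketched roughly as follows (the function name, base delay, cap, and retry limit are all assumptions for illustration, not the actual implementation):

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical sketch of an exponential backoff schedule: the delay doubles
// with each attempt, capped at a maximum, and path building gives up after a
// fixed number of retries.
std::chrono::milliseconds backoff_delay(
        int attempt,
        std::chrono::milliseconds base = std::chrono::milliseconds{500},
        std::chrono::milliseconds cap = std::chrono::milliseconds{30000}) {
    // Clamp the shift so large attempt counts can't overflow.
    auto delay = base * (1L << std::min(attempt, 16));
    return std::min(delay, cap);
}

constexpr int MAX_PATH_BUILD_RETRIES = 5;  // assumed limit, for illustration
```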
• Fixed a use-after-move issue
• Fixed an issue where the OnionRequestRouter would start trying to make requests before the SnodePool bootstrap was completed
• Added a missing import
• Updated the OnionRequestRouter to wait for the SnodePool to be populated before allowing any requests to be sent
• Updated the SnodePool to make ephemeral connections to refresh its cache (that way we won't always use seed node connections for subsequent requests on new accounts)
• Fixed some use-after-move issues
• Fixed an issue where the SnodePool bootstrap request response wasn't being handled
• Fixed an infinite loop with the OnionRequestRouter refreshing the SnodePool while a refresh was already running
• Fixed an edge-case where the SnodePool wouldn't trigger a refresh when all nodes are marked as failed
• Added parsing and exposing of the general network settings the clients use (network_time_offset, hardfork_version, softfork_version)
• Added error handling from old logic
• Added 421 retry handling
• Fixed an issue where retrying the snode refresh would cause a deadlock
• Added initial LokinetRouter wrapper
• Added changes that were missing from previous commit
• Updated QuicTransport to be able to send requests to RemoteAddress directly
• Added factory functions for the FileServer endpoints the clients use
• Ran the formatter
• Fixed a linker error
• Fixed a bug where we were incorrectly reporting successful responses as failures
• Added a log when succeeding after a 421 retry (old code had it)
• Added logic to mark a node as failed after a QUIC handshake timeout
• Added a connection status hook and logic to track the connection status
• Added a function to retrieve the current active paths (TODO for the LokinetRouter)
• Added logic so the OnionRequestRouter can observe connection failures to its guard nodes and trigger path rebuilds when they happen
• Fixed an issue where paths in 'single_path_mode' wouldn't get rebuilt
• Renamed the `ENABLE_ONIONREQ` flag to `ENABLE_NETWORKING`
• Started working on unit tests
• Updated to the latest lokinet
• Tweaked the lokinet config to listen on a random port (multiple simulators were colliding without this)
• Fixed a bug where we would incorrectly use the timestamp value returned from a server for the network offset time (some servers return seconds instead of milliseconds, which breaks things)
• Started fixing up unit tests
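The seconds-vs-milliseconds timestamp bug described above could be guarded against with a normalisation step along these lines (the function name and threshold are assumptions for illustration, not the actual fix):

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical normalisation: a recent Unix timestamp in milliseconds is on
// the order of 1.7e12, while one in seconds is ~1.7e9, so any value below a
// generous threshold can safely be treated as seconds and scaled up.
std::chrono::milliseconds normalize_server_timestamp(int64_t value) {
    constexpr int64_t ms_threshold = 100'000'000'000;  // ~1973 in ms, year ~5138 in s
    if (value < ms_threshold)
        return std::chrono::milliseconds{value * 1000};  // server sent seconds
    return std::chrono::milliseconds{value};  // already milliseconds
}
```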
• Added the `DirectRouter`
• Added unit tests for the SnodePool `get_unused_nodes` function
• Updated SnodePool to use `weak_ptr` everywhere to avoid invalid memory crashes during tests
• Removed old outdated unit tests
• Fixed a bug where the RequestQueue could incorrectly start checking for request timeouts even though it didn't need to
# Conflicts:
#	external/oxen-libquic
#	src/session_network.cpp
#	tests/test_session_network.cpp
// Use 'call_get' to force this to be synchronous
if (_loop)
    _loop->call_get([this] { _close_connections(); });
log::debug(cat, "[OnionRequestRouter] Destroyed.");
All of these `[OnionRequestRouter]` embedded strings feel like a duplication of what the `log::Cat` is meant to be. Can we change this around to `auto cat = oxen::log::Cat("OnionRequestRouter");` and then drop all of these redundant message prefixes?
// Attempt to verify connectivity to the guard node
_pending_paths[path_id] = path_nodes;
auto guard_node = path_nodes.front();
Can we do a global rename of "guard" to "edge", so that we have matching terminology between onion requests and (lokinet) onion routing?
constexpr auto ENDPOINT_FILE = "file";
}  // namespace
Request upload( |
We need to rethink the API for upload/download, because uploads and downloads will no longer be generic "requests" soon (i.e. with fileserver on lokinet) but instead will be per-stream quic transfers, and so shoehorning it into a "Request" means we will have to break the API when we add streamed upload/downloads. (Lokinet requests also don't have headers, but rather use a dict prepended to the file stream data).
Basically, when we use onion requests, we need the file server host, pubkey, endpoint, and so on, but when you are doing lokinet, the transfer will be entirely different and we need something that can accommodate both approaches so that we don't have to break any binding code when we add the alternative.
That means that we need an abstraction that represents an "upload"/"download"/"get_client_version" where the host and keys are implementation details (i.e. inside the onion request handler, or entirely different lokinet keys/handling/etc. in the lokinet handler) rather than request components that can be determined out here when constructing the Request object, simply because of how different these mechanisms are going to be.
The other thing that we likely want is a different callback mechanism that is going to work seamlessly as we transition to streaming encryption and file transfers. This is the current interface for submitting a request:
virtual void send_request(Request request, network_response_callback_t callback);
where you just have a one-shot, all-done callback. That's fine for most requests, but for uploads/downloads we are going to want something different.
Just to brainstorm ideas, something like this would work fine for streaming, and wouldn't be too burdensome for onion request uploads/downloads:
struct file_metadata {
std::string id;
int size;
std::chrono::sys_seconds uploaded;
std::chrono::sys_seconds expiry;
};
struct UploadRequest {
UploadRequest(
std::function<std::vector<unsigned char>()> next_data,
std::optional<std::string> file_name,
std::chrono::milliseconds stall_timeout,
std::function<void(UploadRequest& req, std::variant<file_metadata, int16_t> info_or_errcode, bool timeout)> on_complete
);
// ...
};
struct DownloadRequest {
DownloadRequest(
std::string file_id,
std::chrono::milliseconds stall_timeout,
std::function<void(DownloadRequest& req, std::variant<file_metadata, int16_t> info_or_errcode, bool timeout)> on_complete,
std::function<void(DownloadRequest& req, const file_metadata& info, std::vector<unsigned char>)>
on_data = nullptr,
std::chrono::milliseconds partial_min_interval = 250ms
);
// ...
};
// And in IRouter:
class IRouter {
public:
// ...
virtual void upload(std::shared_ptr<UploadRequest> up) = 0;
virtual void download(std::shared_ptr<DownloadRequest> down) = 0;
};
where `stall_timeout` is a timeout that fires when nothing has progressed (no successful path build, or no new data could be sent (upload) or has been received (download) in the given time interval). This is a bit like `request_timeout`, except that it resets every time something progresses so that large uploads/downloads on slow connections still work without timing out.
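The reset-on-progress behaviour described above can be sketched as a small timer class (the class and member names are illustrative assumptions, not part of any proposed API):

```cpp
#include <chrono>

// Minimal sketch of a stall timeout: unlike a fixed request timeout, the
// deadline slides forward every time progress is observed, so slow-but-alive
// transfers never fire it.
class StallTimer {
    std::chrono::steady_clock::time_point last_progress_;
    std::chrono::milliseconds timeout_;

  public:
    explicit StallTimer(std::chrono::milliseconds timeout) :
            last_progress_{std::chrono::steady_clock::now()}, timeout_{timeout} {}

    // Call whenever bytes are sent/received or a path build succeeds.
    void on_progress() { last_progress_ = std::chrono::steady_clock::now(); }

    // True once no progress has happened for longer than the timeout.
    bool stalled() const {
        return std::chrono::steady_clock::now() - last_progress_ > timeout_;
    }
};
```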
For an upload, `next_data` will be called repeatedly as more data can be sent to the remote until it returns an empty vector (signaling the end of the upload) or throws an exception (cancelling the upload). It can simply return everything in one go if it already has it in memory, but this also allows avoiding the need to slam everything into RAM, especially once we chain it with encryption streaming.
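As a concrete illustration of that contract (the helper name is hypothetical, not part of the proposed API), a caller that already has its data in memory could wrap it in a source that hands everything over on the first call and then signals end-of-upload:

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Sketch of a next_data source satisfying the contract above: each call
// returns the next chunk, and an empty vector means the upload is complete.
// Here the entire payload is returned in one go, which the contract permits
// for data already held in memory.
std::function<std::vector<unsigned char>()> one_shot_source(
        std::vector<unsigned char> data) {
    auto remaining = std::make_shared<std::vector<unsigned char>>(std::move(data));
    return [remaining] {
        // First call yields all the data; every subsequent call yields {}.
        return std::exchange(*remaining, {});
    };
}
```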
For a download, data would get passed as it arrives (but at most once every `partial_min_interval`, so that the caller can decide how to balance callback overhead with memory usage), and then once all data has been received successfully or an error occurs (which could be partway through the download) it calls `on_complete` with metadata (success) or an error code. For onion requests, this is still going to be a one-shot single call to `on_data` when the entire thing is complete, but for lokinet transfers we'll actually get it streaming in (and so can provide metrics like download speed).
Internally, in the onion request router, the implementations of `OnionRequestRouter::upload` and `::download` would basically just take the `UploadRequest`/`DownloadRequest` object and convert it into a call to `send_request` (i.e. accumulate all the data via `next_data()`, load all the file server endpoint/address/pubkeys/etc., and then provide a callback to `send_request` that translates the response into a (possible) call to `on_data` and a call to `on_complete`).
These changes aren't critical for this PR (i.e. we could merge this PR without them), but I think it might be worth building them in now so that code using it doesn't have to change in the near future as we aim to be more stream oriented.
UploadInfo{std::move(file_name)}};
}
Request download( |
I was a bit misled (until I read further) by the names: `upload` and `download` at first blush made me think this is where the upload or download happens, but these functions only make an upload/download request. Perhaps `make_upload` / `make_download` would better convey that?
(However, given my comments above, if each of those becomes a bespoke object then these would just become constructors, e.g. `UploadRequest upload{...};` instead of `auto upload = file_server::upload(...)`, so this comment may be irrelevant).
} else if constexpr (std::is_same_v<T, ServerDestination>) {
    key = PROXIED_REQUESTS_KEY;
}
Currently if the code runs off the bottom of this if / else if / else if then `key` remains nullopt, and later on we throw a runtime_error. All of that can be detected at compile time, however, by changing this final `else if` into an `else` with a static assert:
Replace:
    } else if constexpr (std::is_same_v<T, ServerDestination>) {
        key = PROXIED_REQUESTS_KEY;
    }
with:
    } else {
        static_assert(std::is_same_v<T, ServerDestination>);
        key = PROXIED_REQUESTS_KEY;
    }
That way, if something changes that adds another type to the variant, this code simply won't compile rather than causing runtime errors.
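A toy, self-contained version of that pattern (the destination types here are stand-ins, not the real libsession-util ones) shows how the `static_assert` makes the visitor exhaustive at compile time:

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Stand-in destination types for illustration only.
struct SnodeDestination {};
struct ServerDestination {};
using Destination = std::variant<SnodeDestination, ServerDestination>;

std::string key_for(const Destination& dest) {
    return std::visit(
            [](auto&& arg) -> std::string {
                using T = std::decay_t<decltype(arg)>;
                if constexpr (std::is_same_v<T, SnodeDestination>) {
                    return "snode-key";
                } else {
                    // If a third alternative is ever added to Destination,
                    // this assertion fails at compile time instead of the
                    // code falling through to a runtime_error.
                    static_assert(std::is_same_v<T, ServerDestination>);
                    return "proxied-key";
                }
            },
            dest);
}
```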
[&key](auto&& arg) {
    using T = std::decay_t<decltype(arg)>;
C++20 modernization allows capturing `T` via a template deduction instead of having to recapture it with the uglier `using T = ...` line:
Replace:
    [&key](auto&& arg) {
        using T = std::decay_t<decltype(arg)>;
with:
    [&key]<typename T>(const T& arg) {
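A standalone example of that C++20 feature (names here are illustrative, unrelated to the PR's code) shows the explicit template parameter list binding `T` directly:

```cpp
#include <string>
#include <type_traits>

// C++20 templated lambda: T is deduced from the argument, with no need for
// the `using T = std::decay_t<decltype(arg)>;` recapture inside the body.
auto type_name = []<typename T>(const T&) {
    if constexpr (std::is_same_v<T, int>)
        return std::string{"int"};
    else
        return std::string{"other"};
};
```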
if constexpr (std::is_same_v<T, oxen::quic::RemoteAddress>) {
    key = oxenc::to_hex(arg.view_remote_key());
} else if constexpr (std::is_same_v<T, service_node>) {
This is personal preference, but I'd be tempted to combine these since both have an identical call:

Replace:
    if constexpr (std::is_same_v<T, oxen::quic::RemoteAddress>) {
        key = oxenc::to_hex(arg.view_remote_key());
    } else if constexpr (std::is_same_v<T, service_node>) {
with:
    if constexpr (std::is_same_v<T, oxen::quic::RemoteAddress> || std::is_same_v<T, service_node>) {
        key = oxenc::to_hex(arg.view_remote_key());
return *key;
}
oxen::quic::RemoteAddress address_for_destination( |
I'm a little bit confused by this function.
In Lokinet mode, all we ever really have is remote pubkey ("abcxyz.snode"), or a private IP + port (something like 127.0.0.1:34567) that you obtained from a liblokinet call.
I don't think we can handle the former here (because it means we're missing a call somewhere to give us a mapped port), and in the latter, we should only ever have a RemoteAddress. But this function seems to allow a `service_node` input, which it then seems to be mapping to the public IP and port rather than a mapped one. Should that latter case (the inner `else if` below) be deleted, or am I missing something here?
I think I see what's going on: the `arg.host()` that goes into the `address` isn't used along the lokinet path, but we need the remote key + port in order to build the tunnel. In that case, can we change `arg.host()` to `"0.0.0.0"` to make it clearer that we don't want or care about the actual remote host?
std::visit(
    [&address, &request_id](auto&& arg) {
        using T = std::decay_t<decltype(arg)>;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(C++20 modernize; see earlier comment)
This PR refactors the `session_network` to be far more configurable and easier to extend, as well as fixing a number of bugs which existed in the original implementation. The main interface has now been genericised with the routing and transport mechanisms abstracted from the client; the updated `Request` structure also makes it easier to pre-construct requests, which should allow for abstracting more of the network requests in the future.

It also includes a number of new configuration options; some particularly useful ones include:
- Routing requests via `Onion Requests`, `Lokinet` or `Direct` to their destination
- Using the `devnet` environment

Note: This contains a breaking change for clients which don't currently use networking - `ENABLE_ONIONREQ` has been renamed to `ENABLE_NETWORKING` to be more accurate (this defaults to on, so any clients which have it disabled will need to update their build flag).
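For clients affected by the rename, the build flag change would look something like this (an illustrative configure command, assuming a standard CMake invocation):

```shell
# Old flag name (no longer recognised after this PR):
#   cmake -B build -DENABLE_ONIONREQ=OFF
# New flag name; networking defaults to ON, so only clients that
# disable it need to update:
cmake -B build -DENABLE_NETWORKING=OFF
```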