Binary patching Rust hot-reloading, sub-second rebuilds, independent server/client hot-reload #3797
base: main
Conversation
Force-pushed from e0d52a4 to 68becb3
### progress update

I've migrated everything over from ipbp, so anyone should now be able to run the demos on macOS/iOS. Linux and Android support is next. I've been tinkering with the syntax for subsecond a bit and am generally happy with the API now. You can wrap any closure with `subsecond::call`:

```rust
pub fn launch() {
    loop {
        std::thread::sleep(std::time::Duration::from_secs(1));
        subsecond::call(|| tick());
    }
}

fn tick() {
    println!("boom!");
}
```

If you need more granular control over which functions are "hot", you'll want to call into subsecond directly.
For example, a TUI app with some state:

```rust
struct App {
    should_exit: bool,
    temperatures: Vec<u8>,
}
```

might implement a "run" method that calls subsecond:

```rust
impl App {
    fn run(&mut self, terminal: &mut DefaultTerminal) -> Result<()> {
        while !self.should_exit {
            subsecond::call(|| self.tick(terminal))?;
        }
        Ok(())
    }
}
```
If the struct's size or layout changes, then we want to rebuild the app from scratch. Alternatively, we could somehow migrate it - that's out of scope for this PR, but implementations can be found in libraries like dexterous. We might end up taking an approach that unwinds the stack to the app's constructor and then copies the state into the new size/layout, merging the new fields in. TODO on what this should look like.

Here's a video of the tui_demo in the subsecond_harness crate:

subsecond-tui.mp4

### runtime integration

Originally I wanted to use LLDB to drive the patching system - and we still might need to for proper "patching" - but I ran into a bunch of segfaults and LLDB crashes when we sigstopped the program in the middle of malloc/dealloc. Apparently there's a large list of things you cannot do while a program is sigstopped, and using allocators is one of them. We could look into using a dedicated bump allocator and continue using LLDB, but for now I have an adapter built on websockets (a minimal sketch of the dylib-loading side appears at the end of this comment). We might end up migrating to a shared-memory system so that the HOST and DUT can share the patch table freely. The challenge with those approaches is that they're not very portable, whereas websockets are available literally everywhere.

### zero-link / thinlink

One cool thing spun out of this work is "zerolink" (thinlink, maybe?): our new approach for drastically speeding up Rust compile times by automatically using dynamic linking. This is super useful for tests, benchmarks, and general development, since we can automatically split your workspace crates from your "true" dependencies and skip linking your dependencies on every build. This means you can turn up opt levels and keep debug symbols (two things that generally slow down builds), incur a one-time cost, and then continuously dynamically link your incremental object files against the dependencies dylib. Most OSes support a dyld_cache equivalent which keeps your dependencies dylib cached between runs.

ZeroLink isn't really an "incremental linker" per se, but it behaves like one thanks to Rust's incremental compilation system. In spirit it's very similar to marking a crate as a dylib in your crate graph (see bevy/dynamic), but it doesn't require you to change any of your crates, and it supports WASM.

### dx is standalone

I wanted to use zerolink with non-dioxus projects, so this PR also makes dx usable as a standalone tool.

### making wasm work

WASM does not support dynamic linking, so we need to mess with the binaries ourselves. Fortunately this is as simple as linking the deps together into a relocatable object file, lifting the symbols into the export table, and recording the element segments. When the patches load, they need two things.
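To make the symbol-lifting step concrete, here's a rough sketch of the idea using the walrus crate. This is illustrative only: the file names are placeholders, the real pass also records element segments, and it would need to skip names that are already exported:

```rust
use walrus::{ExportItem, Module};

fn main() -> anyhow::Result<()> {
    // Parse the relocatable object produced by linking the deps together.
    // "deps.wasm" is a placeholder name.
    let mut module = Module::from_file("deps.wasm")?;

    // Collect every named local function so we can lift it into the export
    // table, making it resolvable when a patch module is instantiated later.
    let named: Vec<_> = module
        .funcs
        .iter()
        .filter(|f| matches!(f.kind, walrus::FunctionKind::Local(_)))
        .filter_map(|f| f.name.clone().map(|name| (name, f.id())))
        .collect();

    for (name, id) in named {
        module.exports.add(&name, ExportItem::Function(id));
    }

    module.emit_wasm_file("deps.exported.wasm")?;
    Ok(())
}
```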
Unfortunately, the wasm-bindgen pass runs [...]

### What's left

There are three avenues of work left here:
I expect Windows + WASM to take the longest to get proper support, and I'll prioritize that over propagating the change graph. Dioxus can function properly without a sophisticated change graph, but other libraries will want the richer detail available.
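For the curious, here's the minimal sketch of the dylib-loading side of the runtime integration mentioned above. This is not the actual runtime: it stands in a plain TCP line protocol for the websocket adapter, hard-codes a single hypothetical `tick` symbol, and leans on the libloading crate for the platform dlopen:

```rust
use std::io::{BufRead, BufReader};
use std::net::TcpListener;

fn main() {
    // Stand-in for the websocket adapter: the build tool (HOST) sends the
    // running app (DUT) the path of each freshly built patch dylib.
    let listener = TcpListener::bind("127.0.0.1:9393").expect("bind failed");

    // Old patches must stay alive: previously handed-out function pointers
    // may still point into them.
    let mut patches: Vec<libloading::Library> = Vec::new();

    for stream in listener.incoming().flatten() {
        for path in BufReader::new(stream).lines().flatten() {
            // dlopen the patch and resolve the updated function. The real
            // system swaps entries in a shared patch table instead of
            // re-resolving symbols at each call site.
            let lib = unsafe { libloading::Library::new(&path) }.expect("load failed");
            {
                let tick: libloading::Symbol<fn()> =
                    unsafe { lib.get(b"tick") }.expect("patch is missing `tick`");
                tick();
            }
            patches.push(lib);
        }
    }
}
```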
Awesome work here! I might recommend adding the cranelift codegen backend. I found on my M3 Pro MacBook it brings the average times down from ~600ms to ~300ms. The backend ships as a rustup component now, so it should be a drop-in replacement for desktop and possibly mobile platforms.
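For anyone else who wants to try it: on a nightly toolchain, enabling the backend looks roughly like this, assuming you've run `rustup component add rustc-codegen-cranelift-preview` (check the cranelift backend docs for the current setup):

```toml
# Cargo.toml - nightly only, gated behind the unstable `codegen-backend` feature
cargo-features = ["codegen-backend"]

[profile.dev]
codegen-backend = "cranelift"
```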
Wow, that's incredible! On my M1 I've been getting around 900ms on the dioxus harness with default settings. I've been testing with this profile:

```toml
[profile.subsecond-dev]
inherits = "dev"
debug = 0
strip = "debuginfo"
```

I'll add the cranelift backend option and then report back. In the interim, you can check whether that profile speeds up your cranelift builds at all. I did some profiling of rustc, and about 100-300ms is spent copying incremental artifacts on disk - pretty substantial given the whole process is around 500ms. Hopefully that gets improved upstream; I'd like to see that time drop to 0ms at some point, and then we'd basically have "blink and you miss it" hotpatching.
I tried the profile, and with or without it, it's consistently ~300ms on my Mac.
That seemed to be a fluke in testing; actually, the remaining time is mostly incremental-cache-related file IO. Not sure how much can be done about that. Regardless, this is super exciting work - let me know if there's any other way I can help.
I switched to a slightly modified approach (lower level, faster, more reliable, more complex), implemented to work around a number of very challenging Android issues.

Since this is more flexible, it should work across Linux and Windows (Android and Linux are the same). The last target is WASM. Here's the Android demo:

hotpatch-android.mp4

iOS:

ios-binarypatch.mp4
Inlines the work from https://github.com/jkelleyrtp/ipbp to bring pure Rust hot-reloading to Dioxus.
fast_reload.mp4
The approach we're taking works across all platforms, though each will require some bespoke logic. The object crate is thankfully generic over mac/win/linux/wasm, though we need to handle system linkers differently.
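As a quick illustration of why object helps here: the same few lines can inspect the symbols of a Mach-O, ELF, PE, or wasm artifact behind one interface (a hypothetical sketch; the path is a placeholder):

```rust
use object::{Object, ObjectSymbol};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder path: any Mach-O / ELF / PE / wasm artifact works,
    // since `object` parses all of them behind one API.
    let data = std::fs::read("target/debug/incremental_artifact.o")?;
    let file = object::File::parse(&*data)?;

    // Walk the symbol table generically - the kind of cross-platform
    // inspection a patching system needs before handing off to the linker.
    for symbol in file.symbols() {
        println!("{} defined={}", symbol.name()?, symbol.is_definition());
    }
    Ok(())
}
```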
This change also enables dx to operate as a faster linker, allowing sub-second (in many cases, sub-200ms) incremental rebuilds.
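Conceptually, the "dx as linker" trick is just pointing rustc's `-Clinker` flag at a tool that observes the incremental object files before handing off to the real linker. A hypothetical, heavily stripped-down sketch of that shape (the real dx does far more - patch generation, caching, per-platform handling):

```rust
use std::process::Command;

fn main() {
    // rustc invokes us as if we were the linker, so argv carries the
    // object files plus ordinary linker flags.
    let args: Vec<String> = std::env::args().skip(1).collect();

    // Record the incremental .o files for the patching system.
    for obj in args.iter().filter(|a| a.ends_with(".o")) {
        eprintln!("captured object file: {obj}");
    }

    // Delegate to the platform linker driver so the build still links.
    // ("cc" is a stand-in; a real tool picks the right linker per target.)
    let status = Command::new("cc")
        .args(&args)
        .status()
        .expect("failed to spawn system linker");
    std::process::exit(status.code().unwrap_or(1));
}
```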
Todo:
Notes:
This unfortunately brings a very large refactor to the build system since we need to persist app bundles while allowing new builds to be "merged" into them. I ended up flattening BuildRequest + Bundle together and Runner + Builder together since we need knowledge of previous bundles and currently running processes to get patching to work properly.