
Binary patching rust hot-reloading, sub-second rebuilds, independent server/client hot-reload #3797

Draft · jkelleyrtp wants to merge 36 commits into main

Conversation

@jkelleyrtp (Member) commented Feb 25, 2025

Inlines the work from https://github.com/jkelleyrtp/ipbp to bring pure Rust hot-reloading to Dioxus.

fast_reload.mp4

The approach we're taking works across all platforms, though each requires some bespoke logic. The object crate is thankfully generic over macOS/Windows/Linux/WASM, though we need to handle the system linkers differently.

This change also enables dx to operate as a faster linker, allowing sub-second (in many cases, sub-200ms) incremental rebuilds.
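
To make the "dx as the linker" idea concrete, here is a loose sketch (my illustration, not dx's actual implementation) of how a tool can interpose on the link step: rustc is pointed at the tool via -Clinker=..., so every incremental rebuild hands it the fresh object files, which it can stash for diffing before deferring to the real system linker.

use std::process::Command;

fn main() {
    // Invoked by rustc as if we were the linker (e.g. via `-Clinker=dx`), so argv
    // contains the object files and linker flags for this incremental rebuild.
    let args: Vec<String> = std::env::args().skip(1).collect();

    // Record the incremental codegen objects; the patch engine can diff these
    // against the previous build's objects.
    let objects: Vec<&String> = args.iter().filter(|a| a.ends_with(".o")).collect();
    eprintln!("captured {} object files for patching", objects.len());

    // Fall through to a real linker so the build still produces a binary.
    // "cc" is an assumption for this sketch; the PR handles platform linkers separately.
    let status = Command::new("cc")
        .args(&args)
        .status()
        .expect("failed to spawn system linker");
    std::process::exit(status.code().unwrap_or(1));
}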

Todo:

  • Add logic to the devtools types and generic integration
  • Wire up desktop
  • Rework existing hot-reload engine to be properly compatible
  • Remove old binaries
  • Wire up iOS
  • Wire up macOS
  • Wire up Android
  • Wire up Linux
  • Wire up wasm
  • Wire up windows
  • Wire up server
  • Clean up the app/server impl (support more than two executables in prep for dioxus.json)
  • Fix integration with the old hot-reload engine

Notes:

This unfortunately brings a very large refactor to the build system, since we need to persist app bundles while allowing new builds to be "merged" into them. I ended up flattening BuildRequest + Bundle together and Runner + Builder together, since we need knowledge of previous bundles and currently running processes to get patching to work properly.
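
For orientation, here is a rough sketch of the flattened shape described above; the field names are illustrative (my own), not the PR's actual definitions. The point is that one long-lived value now owns the build request, the persisted bundle, and the running process, so a fresh build can be merged into an existing bundle and patched into the live app.

use std::path::PathBuf;
use std::process::Child;

// Illustrative stubs; the PR's real types carry much more state.
struct BuildRequest { profile: String, target: String }
struct AppBundle { exe: PathBuf, patches: Vec<PathBuf> }

struct AppBuilder {
    request: BuildRequest,     // what to build (formerly a standalone BuildRequest)
    bundle: Option<AppBundle>, // the bundle produced by previous builds, kept around
    child: Option<Child>,      // the currently running app we patch against
}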

@jkelleyrtp (Member, Author) commented Mar 18, 2025

progress update

I've migrated everything over from ipbp, so now anyone should be able to run the demos on macOS/iOS. Going to add Linux + Android support next.

I've been tinkering with the syntax for subsecond a bit and am generally happy with the API now. You can wrap any closure with ::call() and that closure becomes "hot":

pub fn launch() {
    loop {
        std::thread::sleep(std::time::Duration::from_secs(1));
        subsecond::call(|| tick());
    }
}

fn tick() {
    println!("boom!");
}

If you need more granular control over "hot" functions, you'll want to use ::current(closure), which gives you a HotFn with extra flags and methods for running a callback. It also lets you run closures that are FnOnce, which ::call() currently does not support.
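
For illustration, here is a sketch of what that finer-grained path could look like; the exact names and signatures on HotFn are assumptions, not the finalized API:

fn drain(queue: Vec<String>) {
    // ::current can wrap an FnOnce (this closure consumes `queue`), which ::call
    // (FnMut-only) currently cannot.
    let hot = subsecond::current(move || {
        for item in queue {
            println!("processing {item}");
        }
    });
    // Hypothetical: invoke the (possibly patched) closure once.
    hot.call(());
}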

::call() taking an FnMut is meant to provide an "unwind" point that our assembly-diffing logic can bounce up to by emitting panics. This supports cases where you add a field to a struct and need to "rebuild" the app from a higher checkpoint (aka re-instancing).

For example, a TUI app with some state:

struct App {
    should_exit: bool,
    temperatures: Vec<u8>,
}

might implement a "run" method that calls subsecond:

    fn run(&mut self, terminal: &mut DefaultTerminal) -> Result<()> {
        while !self.should_exit {
            subsecond::call(|| self.tick(terminal))?;
        }
        Ok(())
    }

If the struct's size/layout changes, then we want to rebuild the app from scratch. Alternatively, we could somehow migrate it; that's out of scope for this PR, but implementations can be found in libraries like dexterous. We might end up taking an approach that unwinds the stack to the app's constructor and then copies the old state into the new size/layout, merging the new fields in. TODO on what this should look like.
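
Here is a minimal sketch of the rebuild-from-scratch path, under the assumption that a layout-changing patch surfaces as a panic/unwind at the ::call boundary (the struct and field names are just for illustration):

use std::panic::{catch_unwind, AssertUnwindSafe};

struct App {
    running: bool,
    counter: u64,
}

impl App {
    fn new() -> Self {
        App { running: true, counter: 0 }
    }

    fn tick(&mut self) {
        self.counter += 1;
        println!("tick {}", self.counter);
        std::thread::sleep(std::time::Duration::from_millis(500));
    }
}

fn main() {
    loop {
        // Re-instancing checkpoint: if a patch unwinds out of the hot loop because
        // App's size/layout changed, we land back here and construct a fresh App.
        let mut app = App::new();
        let _ = catch_unwind(AssertUnwindSafe(|| {
            while app.running {
                subsecond::call(|| app.tick());
            }
        }));
    }
}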

Here's a video of the tui_demo in the subsecond_harness crate:

subsecond-tui.mp4

runtime integration

Originally I wanted to use LLDB to drive the patching system (and we still might need to for proper "patching"), but I ran into a bunch of segfaults and LLDB crashes when we sigstopped the program in the middle of malloc/dealloc. Apparently there's a long list of things you cannot do while a program is sigstopped, and using allocators is one of them. We could look into using a dedicated bump allocator and continue using LLDB, but for now I have an adapter built on websockets. We might end up migrating to a shared-memory system so the HOST and DUT can share the patch table freely. The challenge with those approaches is that they're not very portable, whereas websockets are available practically everywhere.
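
As a rough stand-in for that adapter (plain TCP here instead of websockets to keep the sketch dependency-light, and using the libloading crate), the running app could listen for paths to freshly linked patch libraries and load them as they arrive. The port and the line-based protocol are arbitrary choices for illustration:

use std::io::{BufRead, BufReader};
use std::net::TcpListener;

fn patch_listener() -> std::io::Result<()> {
    // The host connects here and sends one dylib path per line after each rebuild.
    let listener = TcpListener::bind("127.0.0.1:9000")?;
    for stream in listener.incoming() {
        let reader = BufReader::new(stream?);
        for line in reader.lines() {
            let path = line?;
            // Loading the library maps the patch into the process; the real adapter
            // then resolves the patched symbols and updates the patch table.
            match unsafe { libloading::Library::new(&path) } {
                Ok(lib) => std::mem::forget(lib), // keep the patch mapped for the program's lifetime
                Err(e) => eprintln!("failed to load patch {path}: {e}"),
            }
        }
    }
    Ok(())
}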

zero-link / thinlink

One cool thing spun out of this work is "zerolink" (thinlink, maybe?): our new approach for drastically speeding up Rust compile times by automatically using dynamic linking. This is super useful for tests, benchmarks, and general development, since we can automatically split your workspace crates from your "true" dependencies and skip linking your dependencies on every build.

This means you can turn up opt levels and keep debug symbols (two things that generally slow down builds) as a one-time cost, then continuously dynamically link your incremental object files against the dependencies dylib. Most OSes support a dyld_cache equivalent that keeps the dependencies dylib memory-mapped and cached between invocations, which also greatly speeds up launch times.

ZeroLink isn't really an "incremental linker" per se, but it behaves like one thanks to Rust's incremental compilation system. In spirit it's very similar to marking a crate in your crate graph as a dylib (see bevy/dynamic), but it doesn't require you to change any of your crates and it supports WASM.

dx is standalone

I wanted to use zerolink with non-Dioxus projects, so this PR also makes dx a standalone Rust runner. You can dx run your project, and dioxus does not need to be part of your crate graph for it to work. This lets us bootstrap dx by running dx with itself, which makes it easy to update the TUI without fully rebuilding the CLI.

wasm work

WASM does not support dynamic linking, so we need to mess with the binaries ourselves. Fortunately this is as simple as linking the deps together into a relocatable object file, lifting the symbols into the export table, and recording the element segments.

When the patches load, they need two things:

  • addresses within the ifunc table for ifuncs
  • imports from the main module

Unfortunately, the wasm-bindgen pass runs ::gc, so I don't think there's any clever combination of flags we can pass to wasm-ld to do this for us automatically. However, all the work we put into wasm_split really comes in handy here.
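
For a sense of what that post-wasm-bindgen massaging involves, here is a rough sketch using the walrus crate (which wasm_split already builds on) and anyhow for errors; the real pass has to deal with relocations, duplicate names, and recording the actual function indices in each element segment rather than just counting them:

use std::collections::HashSet;
use std::path::Path;

fn lift_symbols(input: &Path, output: &Path) -> anyhow::Result<()> {
    let mut module = walrus::Module::from_file(input)?;

    // "Lift the symbols into the export table": export every named function so a
    // later patch module can import it from the main module.
    let already_exported: HashSet<String> =
        module.exports.iter().map(|e| e.name.clone()).collect();
    let named: Vec<(String, walrus::FunctionId)> = module
        .funcs
        .iter()
        .filter_map(|f| f.name.clone().map(|n| (n, f.id())))
        .collect();
    for (name, id) in named {
        if !already_exported.contains(&name) {
            module.exports.add(&name, walrus::ExportItem::Function(id));
        }
    }

    // "Record the element segments": patches need the ifunc table layout to resolve
    // indirect calls; here we only count the segments to keep the sketch short.
    println!("element segments: {}", module.elements.iter().count());

    module.emit_wasm_file(output)?;
    Ok(())
}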

What's left

There are three avenues of work left here:

  • Propagating the change graph through the HotFn points
  • More platform support (windows, wasm, server_fn)
  • Bugs (better handling of statics, destructors, renaming symbols, changing signatures, and dioxus integration like Global)

I expect Windows + WASM to take the longest to get proper support, so I'll prioritize those over propagating the change graph. Dioxus can function properly without a sophisticated change graph, but other libraries will want the richer detail it provides.

@DrewRidley

Awesome work here! I might recommend adding .arg("-Zcodegen-backend=cranelift") as an optional, user-facing argument when hot-reloading.

I found that on my M3 Pro MacBook it brings the average time down from ~600ms to ~300ms. The backend ships as a rustup component now, so it should be a drop-in replacement for desktop and possibly mobile platforms.

@jkelleyrtp (Member, Author) commented Mar 19, 2025

Awesome work here! I might recommend adding .arg("-Zcodegen-backend=cranelift") as an optional, user-facing argument when hot-reloading.

I found that on my M3 Pro MacBook it brings the average time down from ~600ms to ~300ms. The backend ships as a rustup component now, so it should be a drop-in replacement for desktop and possibly mobile platforms.

Wow that's incredible!

On my M1 I've been getting around 900ms on the Dioxus harness with the default dev profile, and 500-600ms with the subsecond-dev profile:

[profile.subsecond-dev]
inherits = "dev"
debug = 0
strip = "debuginfo"

I'll add the cranelift backend option and report back. In the interim, you can check whether that profile speeds up your cranelift builds at all.
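
As a sketch of what that opt-in might look like on the CLI side (hypothetical, not actual dx code): forward the flag through cargo rustc when the user asks for it, on a nightly toolchain with the Cranelift codegen backend installed.

use std::process::Command;

fn build_command(use_cranelift: bool) -> Command {
    let mut cmd = Command::new("cargo");
    // The subsecond-dev profile is the one shown above.
    cmd.arg("rustc").arg("--profile").arg("subsecond-dev");
    if use_cranelift {
        // Everything after `--` is handed to rustc for the final crate.
        cmd.arg("--").arg("-Zcodegen-backend=cranelift");
    }
    cmd
}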

I did some profiling of rustc, and about 100-300ms is spent copying incremental artifacts on disk. That's pretty substantial given the whole process takes around 500ms. Hopefully this is improved here:

rust-lang/rust#128320

I would like to see that time drop to 0ms at some point; then we'd basically have "blink and you miss it" hot-patching.

@DrewRidley commented Mar 19, 2025

I tried the profile, and with or without it, it's consistently ~300ms on my Mac. When doing a self-profile, I noticed that register allocation takes a huge portion of the total time spent.

I discovered this (https://docs.wasmtime.dev/api/cranelift_codegen/settings/enum.RegallocAlgorithm.html), which might help if it's been backported to codegen_clif as a flag or option.

That turned out to be a fluke in testing, though; the remaining time is actually mostly incremental-cache-related file IO. Not sure how much can be done about that.

Regardless, this is super exciting work. Let me know if there's any other way I can help.

@jkelleyrtp (Member, Author) commented Mar 22, 2025

I switched to a slightly modified approach (lower level, faster, more reliable, more complex).

This is implemented to work around a number of very challenging Android issues:

  • pointer tagging
  • MTE (memory tagging extension)
  • linker namespaces
  • read/write permissions

Since this approach is more flexible, it should work across Linux and Windows (Android and Linux are the same). The last target is WASM.
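
One way to picture why the lower-level route ports so well (a simplified sketch of the idea, not the PR's exact mechanism): hot calls go through an atomically swappable pointer rather than a linker-resolved symbol, so applying a patch is just a table update and we never have to rewrite executable pages or fight pointer tagging/MTE at the call site.

use std::sync::atomic::{AtomicPtr, Ordering};

// Starts null; call sites fall back to the statically linked function until a patch lands.
static TICK_PTR: AtomicPtr<()> = AtomicPtr::new(std::ptr::null_mut());

fn tick() {
    println!("original tick");
}

fn call_tick() {
    let raw = TICK_PTR.load(Ordering::Relaxed);
    let f: fn() = if raw.is_null() {
        tick as fn()
    } else {
        // Safety: only ever stored from a `fn()` in apply_patch below.
        unsafe { std::mem::transmute(raw) }
    };
    f();
}

// Invoked by the runtime after it loads a patch dylib and resolves the new symbol.
fn apply_patch(new_tick: fn()) {
    TICK_PTR.store(new_tick as *const () as *mut (), Ordering::Relaxed);
}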

Here's the android demo:

hotpatch-android.mp4

iOS:

ios-binarypatch.mp4
