
Commit c990d06

Add docs on full local snapshots

Signed-off-by: Amory Hoste <[email protected]>

1 parent 7bde409

5 files changed (+69, −6 lines)

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
## [Unreleased]

### Added

- Add support for [fullLocal snapshots](docs/fulllocal_snapshots.md) mode

### Changed

configs/.wordlist.txt

Lines changed: 7 additions & 0 deletions
Added words: deterministically, devicemapper, microVM, rootfs, rsync, snapshotted, thinpool

docs/developers_guide.md

Lines changed: 3 additions & 1 deletion
@@ -108,14 +108,16 @@ We also offer self-hosted stock-Knative environments powered by KinD. To be able

* vHive supports both the baseline Firecracker snapshots and our advanced
Record-and-Prefetch (REAP) snapshots.

* vHive integrates with Kubernetes and Knative via its built-in CRI support.
Currently, only Knative Serving is supported.

* vHive supports arbitrary distributed setup of a serverless cluster.

* vHive supports arbitrary functions deployed with OCI (Docker images).

* Remote snapshot restore functionality can be integrated through the
[full local snapshots](./fulllocal_snapshots.md) support.

* vHive has robust Continuous-Integration and our team is committed to delivering
high-quality code.

docs/fulllocal_snapshots.md

Lines changed: 56 additions & 5 deletions
@@ -1,7 +1,58 @@
# vHive full local snapshots

When using Firecracker as the sandbox technology in vHive, two snapshotting modes are supported: a default mode and a
full local mode. The default snapshot mode uses an offloading-based technique that leaves the shim and other resources
running when a microVM is shut down, so that they can be reused in the future. This technique has the advantage that
the shim does not have to be recreated and that the block and network devices of the previously stopped microVM can be
reused, but it limits the number of microVMs that can be booted from a snapshot to the number of microVMs that have
been offloaded. The full local snapshot mode instead allows an arbitrary number of microVMs to be loaded from a single
snapshot. This is done by creating a new shim and the required block and network devices when loading a snapshot, and
by writing an extra patch file at snapshot creation time that captures the filesystem changes made by the microVM. To
enable the full local snapshot functionality, vHive must be run with the `-snapshots` and `-fulllocal` flags. In
addition, the full local snapshot mode can be further configured using the following flags (an example invocation
follows the list):

- `-isSparseSnaps`: store the memory file as a sparse file so that its storage size is closer to the memory actually utilized by the microVM, rather than the memory allocated to the microVM
- `-snapsStorageSize [capacityGiB]`: the amount of capacity, in GiB, that can be used to store snapshots
- `-netPoolSize [capacity]`: the number of network devices kept in a pool, which microVMs can use to keep network initialization off the cold start path
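
As an illustration, the following invocation enables full local snapshots with sparse memory files, a 100 GiB
snapshot store, and a pool of 10 pre-created network devices (the binary path and values are placeholders, not
defaults):

```bash
# Illustrative values; adjust the binary path, capacity, and pool size to your setup.
sudo ./vhive -snapshots -fulllocal -isSparseSnaps -snapsStorageSize 100 -netPoolSize 10
```
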
## Remote snapshots

Rather than only using the snapshots available locally on a node, snapshots can also be transferred between nodes to
potentially accelerate cold start times and reduce memory utilization, given that proper mechanisms are in place to
minimize the snapshot network transfer latency. This could be done by storing snapshots in a global storage solution
such as S3, or by distributing snapshots directly between compute nodes. The full local snapshot functionality in
vHive can be used as a building block for such remote snapshots. To restore a remote snapshot, the container image
used by the snapshotted microVM must be available on the local node where the snapshot will be restored. This
container image can then be combined with the filesystem changes stored in the snapshot patch file to create a device
mapper snapshot that contains the root filesystem needed by the restored microVM. After recreating the root
filesystem block device, the microVM can be created from the fetched memory file and microVM state, similarly to how
this is done for full local snapshots. A conceptual sketch of the rootfs recreation step follows.
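
The sketch below uses dm-thin messages with placeholder names (pool `fc-dev-thinpool`, thin device ids 42 and 43, an
rsync batch-format patch file); these are assumptions for illustration, not vHive's exact commands.

```bash
# Assumptions: the base image is thin device 42 in the pool, the patch file was
# produced as an rsync batch named "patch", and SECTORS holds the image device
# size in 512-byte sectors.

# Snapshot the base image thin device as a new thin device (id 43). The origin
# thin device must be inactive or suspended while this message is sent.
sudo dmsetup message /dev/mapper/fc-dev-thinpool 0 "create_snap 43 42"

# Activate the new thin device as the rootfs block device of the restored microVM.
sudo dmsetup create vm43-rootfs --table "0 ${SECTORS} thin /dev/mapper/fc-dev-thinpool 43"

# Replay the filesystem changes captured in the patch file onto the new rootfs.
sudo mount /dev/mapper/vm43-rootfs /mnt/vm43
sudo rsync --read-batch=patch /mnt/vm43
sudo umount /mnt/vm43
```
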
## Incompatibilities and limitations
### Snapshot filesystem changes capture and restoration
Currently, the filesystem changes are captured in a “patch file”, which is created by mounting both the original
container image and the microVM block device and extracting the differences between the two with rsync.
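
One way to produce such a patch file is with rsync's batch mode; the following is a sketch assuming placeholder
device names and mount points, and vHive's actual patch format may differ:

```bash
# Mount the base container image device and the microVM rootfs device read-only
# (device and mount point names are placeholders).
sudo mount -o ro /dev/mapper/image-rootfs /mnt/image
sudo mount -o ro /dev/mapper/vm-rootfs /mnt/vm

# Record the changes needed to turn the image tree into the microVM tree in the
# batch file "patch", without modifying /mnt/image itself.
sudo rsync --archive --delete --only-write-batch=patch /mnt/vm/ /mnt/image/

sudo umount /mnt/image /mnt/vm
```
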
Even though rsync applies optimisations such as comparing timestamps and file sizes to limit the number of reads, this
procedure is quite inefficient. It could be sped up by extracting the changed block offsets directly from the thinpool
metadata device and reading those blocks straight from the microVM rootfs block device. The extracted blocks could then
be written back at the correct offsets on top of the base image block device to create a root filesystem for the
microVM to be restored. Support for this alternative approach is provided through the `ForkContainerSnap` and
`CreateDeviceSnapshot` functions. However, for this approach to work across nodes for remote snapshots, support to [deterministically flatten a container image into a filesystem](https://www.youtube.com/watch?v=A-7j0QlGwFk)
would be required to ensure that the block devices of identical images pulled to different nodes are bit-identical.
In addition, further optimizations would be needed to extract filesystem changes from the thinpool metadata device
more efficiently than the current method, which relies on the devicemapper `reserve_metadata_snap` message to snapshot
the current metadata state, combined with `thin_delta` to extract the changed blocks (see the sketch below).
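
For reference, the extraction described above could look roughly as follows (pool and metadata device names are
placeholders; the exact devices depend on how the thin pool was created):

```bash
# Reserve a metadata snapshot so the pool metadata can be read consistently
# while the pool is in use.
sudo dmsetup message /dev/mapper/fc-dev-thinpool 0 reserve_metadata_snap

# List the block ranges that differ between the base image thin device (id 42)
# and the microVM rootfs thin device (id 43); the output is printed as XML.
sudo thin_delta --metadata-snap --thin1 42 --thin2 43 /dev/mapper/fc-dev-thinpool_tmeta

# Release the reserved metadata snapshot when done.
sudo dmsetup message /dev/mapper/fc-dev-thinpool 0 release_metadata_snap
```
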
### Performance limitations

The full local snapshot mode requires a new block device and a new network device, with the exact state of the
snapshotted microVM, to be created before the snapshot can be restored. The network namespace and devicemapper block
device creation turn out to be a bottleneck when many snapshots are restored concurrently. Approaches that reduce the
impact of these operations could further speed up the microVM snapshot restore latency at high load.

### UPF snapshot compatibility

The full local snapshot functionality is currently not integrated with the [Record-and-Prefetch (REAP)](papers/REAP_ASPLOS21.pdf)
accelerated snapshots and thus cannot be used in combination with the `-upf` flag.

docs/quickstart_guide.md

Lines changed: 2 additions & 0 deletions
@@ -130,6 +130,8 @@ SSD-equipped nodes are highly recommended. Full list of CloudLab nodes can be fo
> By default, the microVMs are booted; `-snapshots` enables snapshots after the 2nd invocation of each function.
>
> If `-snapshots` and `-upf` are specified, the snapshots are accelerated with the Record-and-Prefetch (REAP) technique that we described in our ASPLOS'21 paper ([extended abstract][ext-abstract], [full paper](papers/REAP_ASPLOS21.pdf)).
>
> If `-snapshots` and `-fulllocal` are specified, a single snapshot can be used to restore many microVMs ([full local snapshots](./fulllocal_snapshots.md)). Note that this mode is currently not compatible with the REAP technique.

### 3. Configure Master Node
**On the master node**, execute the following instructions below **as a non-root user with sudo rights** using **bash**:
