Description
Sandbox ID or Build ID
No response
Environment
- E2B version: Community release 2026.09
- Deployment: Self-hosted on bare-metal (1 API node + 5 sandbox nodes)
- OS: Linux (kernel 5.x)
- Workload: ~100-700 concurrent sandboxes, continuous creation and destruction
Timestamp of the issue
2026-03-01 23:27 UTC
Frequency
One-time occurrence
Expected behavior
Orchestrator memory usage stays stable, and sandbox resources (mmap'd rootfs regions, NBD devices, on-disk directories) are released when a VM is destroyed.
Actual behavior
- Orchestrator's memory grows monotonically
- After 2 days: 37.2 GB RSS, approaching OOM kill threshold (40 GB limit)
- Only 23 running VMs on one of the sandbox nodes, but 382 NBD devices occupied and 394 mmap'd rootfs regions
- 3,983 threads (goroutine leak?)
- Sandbox directories and rootfs CoW files remain on disk after VM destruction
On one of the sandbox nodes, diagnostics are as below:
smaps_rollup (/proc/&lt;pid&gt;/smaps_rollup)
Rss: 39017532 kB (~37.2 GB)
Pss_Anon: 6588208 kB (~6.3 GB) ← Go heap + goroutine stacks
Pss_File: 32427604 kB (~30.9 GB) ← leaked mmap'd rootfs files
Private_Clean: 32790556 kB (~31.3 GB)
AnonHugePages: 2910208 kB (~2.8 GB)
30.9 GB (83%) is file-backed mmap from rootfs/snapshot files that were never unmapped.
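The file-backed share of RSS can be computed directly from the rollup. A minimal sketch: the function takes the rollup file as an argument (normally /proc/&lt;pid&gt;/smaps_rollup) so it can also be run against a saved copy.

```shell
# Sketch: report what percentage of RSS is file-backed, from a
# smaps_rollup-format file. The file is passed as an argument so the
# logic can be checked offline against a captured rollup.
file_backed_pct() {
    awk '/^Rss:/ {rss=$2} /^Pss_File:/ {pf=$2}
         END { if (rss) printf "%d\n", pf * 100 / rss }' "$1"
}

# On a live node (pid is the orchestrator's):
#   file_backed_pct /proc/<pid>/smaps_rollup
```

With the numbers above (Pss_File 32427604 kB of Rss 39017532 kB) this reports 83.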
NBD device leak
382 NBD devices show a kernel PID, but only 23 VMs are running
$ ls /sys/block/nbd*/pid | wc -l
382
Only 23 child firecracker processes
$ pgrep -P <orchestrator_pid> | wc -l
23
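The NBD-vs-VM comparison above can be wrapped in one helper. This is a sketch: the sysfs root is a parameter (normally /sys) so the counting logic can be exercised against a copy of the tree; the firecracker side reuses the pgrep check already shown.

```shell
# Sketch: count NBD devices that still hold a kernel PID. Each connected
# device exposes /sys/block/nbdN/pid; disconnected devices do not.
# $1 = sysfs root (normally /sys), parameterized for offline checking.
nbd_in_use() {
    ls "$1"/block/nbd*/pid 2>/dev/null | wc -l
}

# On a live node, a leak shows up as nbd >> firecracker:
#   echo "nbd=$(nbd_in_use /sys) firecracker=$(pgrep -c firecracker)"
```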
Memory maps
/proc/&lt;pid&gt;/maps shows hundreds of mmap entries for sandbox rootfs paths like:
7521b2e00000-752c520d0000 rw-s ... /orchestrator/sandbox/rootfs/ids7ddu137jqgfb3ayfe-...cow
7536f1200000-754191660000 rw-s ... /orchestrator/sandbox/rootfs/ikwi2ibau1z3a2q80u5ow-...cow
...
These correspond to sandboxes that were destroyed long ago but whose mmap regions were never released.
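The size of the leaked mappings can be quantified by summing the rw-s entries that point at .cow files. A sketch only: the maps file is an argument (normally /proc/&lt;pid&gt;/maps), and the .cow suffix matches the paths observed above.

```shell
# Sketch: sum the byte size of shared, writable (rw-s) mappings of *.cow
# files in a maps-format file. $1 = maps file, parameterized so the logic
# can be run against a saved copy of /proc/<pid>/maps.
cow_mapped_bytes() {
    total=0
    while read -r range perms _offset _dev _inode path; do
        case "$perms" in rw-s) ;; *) continue ;; esac
        case "$path" in *.cow) ;; *) continue ;; esac
        start=${range%-*}; end=${range#*-}       # "start-end" hex range
        total=$(( total + 0x$end - 0x$start ))
    done < "$1"
    echo "$total"
}
```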
Sandbox directory residue
Hundreds of sandbox directories remain on disk after VM destruction
$ ls /orchestrator/sandbox/ | wc -l
396 # but only 23 VMs running
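The stale directories can be identified by checking each directory name against the set of running sandbox IDs. This sketch assumes directory names are sandbox IDs; how the running-ID list is produced (e.g. from firecracker command lines) is deployment-specific.

```shell
# Sketch: print sandbox directories with no matching running sandbox.
# $1 = sandbox root (e.g. /orchestrator/sandbox)
# $2 = file listing running sandbox IDs, one per line (assumed input)
stale_sandboxes() {
    for d in "$1"/*/; do
        id=$(basename "$d")
        grep -qx "$id" "$2" || echo "$id"   # not running -> stale residue
    done
}
```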
Issue reproduction
- Deploy orchestrator 2026.09 on a single API node + 5 sandbox nodes
- Continuously create and destroy sandboxes (100-700 concurrent, sustained over hours)
- Monitor nomad alloc status xxxx-id-of-orchestrator-alloc
- Compare ls /sys/block/nbd*/pid | wc -l vs actual running VMs (pgrep firecracker | wc -l)
Additional context
The orchestrator process leaks memory continuously, growing from ~200 MB to 37+ GB RSS within 2 days on a single sandbox node running community release 2026.09. 83% of the leaked memory is file-backed mmap regions from sandbox rootfs/snapshot files that are never unmapped after sandbox destruction.