
[Bug]: Orchestrator memory leak #2029

@agiping

Description

Sandbox ID or Build ID

No response

Environment

E2B version: Community release 2026.09

  • Deployment: Self-hosted on bare-metal (1 API node + 5 sandbox nodes)
  • OS: Linux (kernel 5.x)
  • Workload: ~100-700 concurrent sandboxes, continuous creation and destruction

Timestamp of the issue

2026-03-01 23:27 UTC

Frequency

One-time occurrence

Expected behavior

The orchestrator's memory usage remains stable: mmap regions, NBD devices, and on-disk sandbox state are released when a sandbox is destroyed.

Actual behavior

  • Orchestrator's memory grows monotonically
  • After 2 days: 37.2 GB RSS, approaching OOM kill threshold (40 GB limit)
  • Only 23 running VMs on one of the sandbox nodes, but 382 NBD devices occupied and 394 mmap'd rootfs regions
  • 3,983 threads (goroutine leak?)
  • Sandbox directories and rootfs CoW files remain on disk after VM destruction

On one of the sandbox nodes, the diagnostics are as follows:

smaps_rollup (/proc/<orchestrator_pid>/smaps_rollup)

Rss: 39017532 kB (~37.2 GB)
Pss_Anon: 6588208 kB (~6.3 GB) ← Go heap + goroutine stacks
Pss_File: 32427604 kB (~30.9 GB) ← leaked mmap'd rootfs files
Private_Clean: 32790556 kB (~31.3 GB)
AnonHugePages: 2910208 kB (~2.8 GB)

30.9 GB (83%) is file-backed mmap from rootfs/snapshot files that were never unmapped.

NBD device leak

382 NBD devices show a kernel PID, but only 23 VMs are running

$ ls /sys/block/nbd*/pid | wc -l
382

Only 23 child firecracker processes

$ pgrep -P <orchestrator_pid> | wc -l
23

Memory maps

/proc/<orchestrator_pid>/maps shows hundreds of mmap entries for sandbox rootfs paths like:

7521b2e00000-752c520d0000 rw-s ... /orchestrator/sandbox/rootfs/ids7ddu137jqgfb3ayfe-...cow
7536f1200000-754191660000 rw-s ... /orchestrator/sandbox/rootfs/ikwi2ibau1z3a2q80u5ow-...cow
...

These correspond to sandboxes that were destroyed long ago but whose mmap regions were never released.

Sandbox directory residue

Hundreds of sandbox directories remain on disk after VM destruction

$ ls /orchestrator/sandbox/ | wc -l
396 # but only 23 VMs running

Issue reproduction

Reproduction

  1. Deploy orchestrator 2026.09 on a single API node + 5 sandbox nodes
  2. Continuously create and destroy sandboxes (100 - 700 concurrent, sustained over hours)
  3. Monitor nomad alloc status xxxx-id-of-orchestrator-alloc
  4. Compare ls /sys/block/nbd*/pid | wc -l vs actual running VMs (pgrep firecracker | wc -l)

Additional context

The orchestrator process leaks memory continuously, growing from ~200 MB to 37+ GB RSS within 2 days on a single-node deployment running community release 2026.09. 83% of the leaked memory is file-backed mmap regions from sandbox rootfs/snapshot files that are never unmapped after sandbox destruction.

Metadata

Labels: bug (Something isn't working)