fix(cli): keep GPU fallback mode internal
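
After this change `gateway start --gpu` is a plain boolean. The CLI always hands the `auto` sentinel to the bootstrap layer, and `legacy` is only ever produced internally when CDI is not enabled on the daemon. The sketch below illustrates that flow under stated assumptions: `gpu_flag_to_request` and `resolve_gpu_device_ids_sketch` are hypothetical names, and the resolver body is reconstructed from the doc-comment table on `resolve_gpu_device_ids` rather than copied from the real implementation.

```rust
/// Hypothetical helper mirroring the new CLI mapping: the boolean `--gpu`
/// flag becomes either the `auto` sentinel or an empty request.
fn gpu_flag_to_request(gpu: bool) -> Vec<String> {
    if gpu {
        vec!["auto".to_string()]
    } else {
        vec![]
    }
}

/// Sketch of the resolution table documented on `resolve_gpu_device_ids`.
/// This body is an assumption based on that table, not the actual code.
fn resolve_gpu_device_ids_sketch(gpu: &[String], cdi_enabled: bool) -> Vec<String> {
    match gpu {
        // `["auto"]` resolves to a CDI device ID when CDI is enabled,
        // otherwise to the internal `legacy` sentinel.
        [only] if only.as_str() == "auto" => {
            if cdi_enabled {
                vec!["nvidia.com/gpu=all".to_string()]
            } else {
                vec!["legacy".to_string()]
            }
        }
        // `[]`, `["legacy"]`, and explicit CDI IDs pass through unchanged.
        other => other.to_vec(),
    }
}

fn main() {
    let requested = gpu_flag_to_request(true);
    // CDI enabled on the daemon: prints ["nvidia.com/gpu=all"]
    println!("{:?}", resolve_gpu_device_ids_sketch(&requested, true));
    // CDI not enabled: prints ["legacy"]
    println!("{:?}", resolve_gpu_device_ids_sketch(&requested, false));
}
```

Keeping `legacy` as an internal sentinel means the fallback path stays available to the resolver without being part of the user-facing CLI surface.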

pimlock · pimlock · commit 1a558bcb9f82 · 2026-03-31T08:08:59.000-07:00
diff --git a/README.md b/README.md
@@ -128,7 +128,7 @@ OpenShell can pass host GPUs into sandboxes for local inference, fine-tuning, or
 openshell sandbox create --gpu --from [gpu-enabled-sandbox] -- claude
 ```
 
-The CLI auto-bootstraps a GPU-enabled gateway on first use. GPU intent is also inferred automatically for community images with `gpu` in the name.
+The CLI auto-bootstraps a GPU-enabled gateway on first use, auto-selecting CDI when it is enabled on the Docker daemon and otherwise falling back to Docker's NVIDIA GPU request path (`--gpus all`). GPU intent is also inferred automatically for community images with `gpu` in the name.
 
 **Requirements:** NVIDIA drivers and the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) must be installed on the host. The sandbox image itself must include the appropriate GPU drivers and libraries for your workload — the default `base` image does not. See the [BYOC example](https://github.com/NVIDIA/OpenShell/tree/main/examples/bring-your-own-container) for building a custom sandbox image with GPU support.
 
diff --git a/architecture/gateway-single-node.md b/architecture/gateway-single-node.md
@@ -318,12 +318,7 @@ Host GPU drivers & NVIDIA Container Toolkit
 
 ### `--gpu` flag
 
-The `--gpu` flag on `gateway start` accepts an optional value that overrides the automatic injection mode:
-
-| Invocation | Behaviour |
-|---|---|
-| `--gpu` | Auto-select: CDI when enabled on the daemon, `--gpus all` otherwise |
-| `--gpu=legacy` | Force `--gpus all` |
+The `--gpu` flag on `gateway start` enables GPU passthrough. OpenShell auto-selects CDI when enabled on the daemon and falls back to Docker's NVIDIA GPU request path (`--gpus all`) otherwise.
 
 The expected smoke test is a plain pod requesting `nvidia.com/gpu: 1` with `runtimeClassName: nvidia` and running `nvidia-smi`.
 
@@ -392,7 +387,7 @@ When `openshell sandbox create` cannot connect to a gateway (connection refused,
 1. `should_attempt_bootstrap()` in `crates/openshell-cli/src/bootstrap.rs` checks the error type. It returns `true` for connectivity errors and missing default TLS materials, but `false` for TLS handshake/auth errors.
 2. If running in a terminal, the user is prompted to confirm.
 3. `run_bootstrap()` deploys a gateway named `"openshell"`, sets it as active, and returns fresh `TlsOptions` pointing to the newly-written mTLS certs.
-4. When `sandbox create` requests GPU explicitly (`--gpu`) or infers it from an image whose final name component contains `gpu` (such as `nvidia-gpu`), the bootstrap path enables gateway GPU support before retrying sandbox creation.
+4. When `sandbox create` requests GPU explicitly (`--gpu`) or infers it from an image whose final name component contains `gpu` (such as `nvidia-gpu`), the bootstrap path enables gateway GPU support before retrying sandbox creation, using the same CDI-or-fallback selection as `gateway start --gpu`.
 
 ## Container Environment Variables
 
diff --git a/crates/openshell-bootstrap/src/docker.rs b/crates/openshell-bootstrap/src/docker.rs
@@ -28,7 +28,7 @@ const REGISTRY_NAMESPACE_DEFAULT: &str = "openshell";
 /// | Input        | Output                                                       |
 /// |--------------|--------------------------------------------------------------|
 /// | `[]`         | `[]`  — no GPU                                               |
-/// | `["legacy"]` | `["legacy"]`  — pass through                                 |
+/// | `["legacy"]` | `["legacy"]`  — pass through to the non-CDI fallback path    |
 /// | `["auto"]`   | `["nvidia.com/gpu=all"]` if CDI enabled, else `["legacy"]`   |
 /// | `[cdi-ids…]` | unchanged                                                    |
 pub(crate) fn resolve_gpu_device_ids(gpu: &[String], cdi_enabled: bool) -> Vec<String> {
@@ -569,8 +569,8 @@ pub async fn ensure_container(
     //
     // The list is pre-resolved by `resolve_gpu_device_ids` before reaching here:
     //   []           — no GPU passthrough
-    //   ["legacy"]   — legacy nvidia DeviceRequest (driver="nvidia", count=-1);
-    //                  relies on the NVIDIA Container Runtime hook
+    //   ["legacy"]   — internal non-CDI fallback path: `driver="nvidia"`,
+    //                  `count=-1`; relies on the NVIDIA Container Runtime hook
     //   [cdi-ids…]   — CDI DeviceRequest (driver="cdi") with the given device IDs;
     //                  Docker resolves them against the host CDI spec at /etc/cdi/
     match device_ids {
diff --git a/crates/openshell-bootstrap/src/lib.rs b/crates/openshell-bootstrap/src/lib.rs
@@ -115,8 +115,8 @@ pub struct DeployOptions {
     /// GPU device IDs to inject into the gateway container.
     ///
     /// - `[]`          — no GPU passthrough (default)
-    /// - `["legacy"]`  — legacy nvidia DeviceRequest (driver="nvidia", count=-1)
-    /// - `["auto"]`    — resolved at deploy time: CDI if enabled on the daemon, else legacy
+    /// - `["legacy"]`  — internal non-CDI fallback path (`driver="nvidia"`, `count=-1`)
+    /// - `["auto"]`    — resolved at deploy time: CDI if enabled on the daemon, else the non-CDI fallback
     /// - `[cdi-ids…]`  — CDI DeviceRequest with the given device IDs
     pub gpu: Vec<String>,
     /// When true, destroy any existing gateway resources before deploying.
@@ -193,9 +193,9 @@ impl DeployOptions {
 
     /// Set GPU device IDs for the cluster container.
     ///
-    /// Pass `vec!["auto"]` to auto-select between CDI and legacy based on Docker
-    /// version at deploy time, or an explicit list of CDI device IDs, or
-    /// `vec!["legacy"]` to force the legacy nvidia DeviceRequest.
+    /// Pass `vec!["auto"]` to auto-select between CDI and the non-CDI fallback
+    /// based on daemon capabilities at deploy time. The `legacy` sentinel is an
+    /// internal implementation detail for the fallback path.
     #[must_use]
     pub fn with_gpu(mut self, gpu: Vec<String>) -> Self {
         self.gpu = gpu;
diff --git a/crates/openshell-cli/src/main.rs b/crates/openshell-cli/src/main.rs
@@ -808,12 +808,11 @@ enum GatewayCommands {
         /// `nvidia.com/gpu` resources. Requires NVIDIA drivers and the
         /// NVIDIA Container Toolkit on the host.
         ///
-        /// An optional argument controls the injection mode:
-        ///
-        ///   --gpu            Auto-select: CDI when enabled on the daemon, legacy otherwise
-        ///   --gpu=legacy     Force legacy nvidia DeviceRequest
-        #[arg(long = "gpu", num_args = 0..=1, default_missing_value = "auto", value_name = "MODE")]
-        gpu: Option<String>,
+        /// When enabled, OpenShell auto-selects CDI when the Docker daemon has
+        /// CDI enabled and falls back to Docker's NVIDIA GPU request path
+        /// (`--gpus all`) otherwise.
+        #[arg(long)]
+        gpu: bool,
     },
 
     /// Stop the gateway (preserves state).
@@ -1117,8 +1116,10 @@ enum SandboxCommands {
         /// Request GPU resources for the sandbox.
         ///
         /// When no gateway is running, auto-bootstrap starts a GPU-enabled
-        /// gateway. GPU intent is also inferred automatically for known
-        /// GPU-designated image names such as `nvidia-gpu`.
+        /// gateway using the same automatic injection selection as
+        /// `openshell gateway start --gpu`. GPU intent is also inferred
+        /// automatically for known GPU-designated image names such as
+        /// `nvidia-gpu`.
         #[arg(long)]
         gpu: bool,
 
@@ -1575,15 +1576,10 @@ async fn main() -> Result<()> {
                 registry_token,
                 gpu,
             } => {
-                let gpu = match gpu.as_deref() {
-                    None => vec![],
-                    Some("auto") => vec!["auto".to_string()],
-                    Some("legacy") => vec!["legacy".to_string()],
-                    Some(other) => {
-                        return Err(miette::miette!(
-                            "unknown --gpu value: {other:?}; expected `legacy`"
-                        ));
-                    }
+                let gpu = if gpu {
+                    vec!["auto".to_string()]
+                } else {
+                    vec![]
                 };
                 run::gateway_admin_deploy(
                     &name,
diff --git a/docs/sandboxes/manage-gateways.md b/docs/sandboxes/manage-gateways.md
@@ -168,7 +168,7 @@ $ openshell gateway info --name my-remote-cluster
 
 | Flag | Purpose |
 |---|---|
-| `--gpu` | Enable NVIDIA GPU passthrough. Requires NVIDIA drivers and the Container Toolkit on the host. Accepts an optional value: omit for auto-select (CDI when enabled on the daemon, `--gpus all` otherwise), or `--gpu=legacy` to force `--gpus all`. |
+| `--gpu` | Enable NVIDIA GPU passthrough. Requires NVIDIA drivers and the Container Toolkit on the host. OpenShell auto-selects CDI when enabled on the daemon and falls back to Docker's NVIDIA GPU request path (`--gpus all`) otherwise. |
 | `--plaintext` | Listen on HTTP instead of mTLS. Use behind a TLS-terminating reverse proxy. |
 | `--disable-gateway-auth` | Skip mTLS client certificate checks. Use when a reverse proxy cannot forward client certs. |
 | `--registry-username` | Username for registry authentication. Defaults to `__token__` when `--registry-token` is set. Only needed for private registries. Also configurable with `OPENSHELL_REGISTRY_USERNAME`. |