Commit 62eb7e7

fix(packaging): add upgrade migration docs and podman socket retry
After #1415 ships, users upgrading from previous releases need guidance on the gateway.env deprecation, port/bind/database path changes, and the podman.socket restart requirement.

- docs(rpm): add 'Migrating from gateway.env' section to TROUBLESHOOTING covering backward compatibility, env-to-TOML key mapping, and three breaking changes (default port 8080->17670, bind address 0.0.0.0->127.0.0.1, database path move). Add podman.socket restart step to upgrade procedure.
- docs(rpm): add upgrade callout to CONFIGURATION.md pointing at migration section.
- fix(podman): retry PodmanComputeDriver ping up to 5 times with 2s delay to tolerate transient socket unavailability after package upgrades. The systemd unit uses Wants=podman.socket (not Requires) so the gateway can start while the socket is briefly re-activating after an RPM upgrade changes its unit file on disk.
- chore(rpm): update EnvironmentFile comment in RPM spec to explain backward-compatibility intent.

Signed-off-by: Adam Miller <admiller@redhat.com>
1 parent af75374 commit 62eb7e7

4 files changed

Lines changed: 101 additions & 3 deletions

crates/openshell-driver-podman/src/driver.rs

Lines changed: 24 additions & 2 deletions
@@ -11,6 +11,7 @@ use crate::watcher::{
 };
 use openshell_core::ComputeDriverError;
 use openshell_core::proto::compute::v1::{DriverSandbox, GetCapabilitiesResponse};
+use std::time::Duration;
 use tracing::{info, warn};

 impl From<PodmanApiError> for ComputeDriverError {
@@ -80,8 +81,29 @@ impl PodmanComputeDriver {

         let client = PodmanClient::new(config.socket_path.clone());

-        // Verify connectivity.
-        client.ping().await?;
+        // Verify connectivity, retrying briefly to tolerate transient socket
+        // unavailability (e.g. podman.socket restarting after a package
+        // upgrade). The systemd unit uses Wants=podman.socket (not Requires),
+        // so the gateway may start while the socket is briefly re-activating.
+        const MAX_PING_RETRIES: u32 = 5;
+        const PING_RETRY_DELAY: Duration = Duration::from_secs(2);
+        let mut attempts = 0;
+        loop {
+            match client.ping().await {
+                Ok(()) => break,
+                Err(e) if attempts < MAX_PING_RETRIES => {
+                    attempts += 1;
+                    warn!(
+                        attempt = attempts,
+                        max_retries = MAX_PING_RETRIES,
+                        error = %e,
+                        "Podman socket not ready, retrying"
+                    );
+                    tokio::time::sleep(PING_RETRY_DELAY).await;
+                }
+                Err(e) => return Err(e),
+            }
+        }

         // Verify cgroups v2, detect rootless mode, and log system info.
         match client.system_info().await {
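
The bounded-retry pattern in the driver.rs hunk can be tried outside the driver with a minimal synchronous sketch that uses only the standard library. The names `retry_with_delay` and `simulate_ping` are hypothetical and not part of this commit; the committed code is async and waits with `tokio::time::sleep`. As in the diff, a budget of N retries allows up to N + 1 calls in total.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not in the commit): calls `op` until it succeeds or
/// `max_retries` retries are used up, then returns the last error.
fn retry_with_delay<T, E>(
    max_retries: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempts = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retries remain: count the failure, wait, and try again.
            Err(_) if attempts < max_retries => {
                attempts += 1;
                sleep(delay);
            }
            // Retry budget exhausted: surface the last error to the caller.
            Err(e) => return Err(e),
        }
    }
}

/// Simulates a socket that starts answering on call number `ready_on_call`.
/// Returns whether the probe eventually succeeded and how many calls were made.
fn simulate_ping(ready_on_call: u32, max_retries: u32) -> (bool, u32) {
    let mut calls = 0;
    let result: Result<(), &str> = retry_with_delay(max_retries, Duration::from_millis(1), || {
        calls += 1;
        if calls >= ready_on_call {
            Ok(())
        } else {
            Err("socket not ready")
        }
    });
    (result.is_ok(), calls)
}
```

Called from a `main` function, `simulate_ping(3, 5)` fails twice and succeeds on the third call, while `simulate_ping(10, 2)` gives up after three calls (the initial attempt plus two retries).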

deploy/rpm/CONFIGURATION.md

Lines changed: 5 additions & 0 deletions
@@ -195,6 +195,11 @@ configuration is required.

 ## Configuration reference

+> **Upgrading from a previous release?** See the
+> ["Migrating from gateway.env"](TROUBLESHOOTING.md#migrating-from-gatewayenv)
+> section in TROUBLESHOOTING.md for the env-to-TOML mapping and notes on
+> the default port, bind address, and database path changes.
+
 Gateway and driver settings have local runtime defaults. The gateway reads
 `~/.config/openshell/gateway.toml` when that file exists. Set
 `OPENSHELL_GATEWAY_CONFIG` in the launch environment to use a different file.

deploy/rpm/TROUBLESHOOTING.md

Lines changed: 69 additions & 0 deletions
@@ -214,12 +214,19 @@ After upgrading the RPM packages:

 ```shell
 sudo dnf update openshell openshell-gateway
+systemctl --user restart podman.socket
 systemctl --user restart openshell-gateway
 ```

 The SQLite database schema is auto-migrated on startup. Running
 sandboxes are stopped during the restart.

+Restarting `podman.socket` after a package upgrade is recommended: if the
+unit file changed on disk during the upgrade, the running socket may become
+non-functional until restarted, causing the gateway to fail with a
+connection error on `/run/user/<uid>/podman/podman.sock`. The gateway
+retries briefly on startup, but a stale socket will not recover on its own.
+
 Package upgrades do not overwrite `~/.config/openshell/gateway.toml` when you
 create one. New gateway process options can be added manually by referencing
 CONFIGURATION.md or running `openshell-gateway --help`.
@@ -230,3 +237,65 @@ To pick up new container images after an upgrade:
 podman pull ghcr.io/nvidia/openshell/supervisor:latest
 podman pull ghcr.io/nvidia/openshell-community/sandboxes/base:latest
 ```
+
+### Migrating from gateway.env
+
+Previous releases generated `~/.config/openshell/gateway.env` on first
+start and used it to configure the gateway at launch. The gateway now
+starts from built-in runtime defaults and reads
+`~/.config/openshell/gateway.toml` when that file exists.
+
+If you have a `gateway.env` file it is still honored: the systemd unit
+reads it via `EnvironmentFile` on every start. You can leave it in place
+or delete it. New installs no longer generate one.
+
+To migrate settings to TOML, create `~/.config/openshell/gateway.toml`
+and map the relevant variables:
+
+| Environment variable | TOML equivalent |
+|---|---|
+| `OPENSHELL_BIND_ADDRESS=A` + `OPENSHELL_SERVER_PORT=P` | `bind_address = "A:P"` under `[openshell.gateway]` |
+| `OPENSHELL_DRIVERS=podman` | `compute_drivers = ["podman"]` under `[openshell.gateway]` |
+| `OPENSHELL_DISABLE_TLS=true` | `disable_tls = true` under `[openshell.gateway]` |
+| `OPENSHELL_TLS_CERT=PATH` | `cert_path = "PATH"` under `[openshell.gateway.tls]` |
+| `OPENSHELL_TLS_KEY=PATH` | `key_path = "PATH"` under `[openshell.gateway.tls]` |
+| `OPENSHELL_TLS_CLIENT_CA=PATH` | `client_ca_path = "PATH"` under `[openshell.gateway.tls]` |
+| `OPENSHELL_DB_URL=URL` | env-only — not accepted in TOML; keep in env or drop-in override |
+| `OPENSHELL_LOG_LEVEL=debug` | env-only — keep as `Environment=OPENSHELL_LOG_LEVEL=debug` in a drop-in |
+
+Other breaking changes in this release:
+
+- **Default port changed from 8080 to 17670.** If you registered the
+  gateway at `https://127.0.0.1:8080`, re-register it:
+
+  ```shell
+  openshell gateway add --local https://127.0.0.1:17670
+  ```
+
+- **Default bind address changed from `0.0.0.0` to `127.0.0.1`.** If
+  you relied on network-accessible access without an explicit bind
+  address, add the following to `~/.config/openshell/gateway.toml`:
+
+  ```toml
+  [openshell.gateway]
+  bind_address = "0.0.0.0:17670"
+  ```
+
+  Also update your firewall rule if applicable:
+
+  ```shell
+  sudo firewall-cmd --remove-port=8080/tcp --permanent
+  sudo firewall-cmd --add-port=17670/tcp --permanent
+  sudo firewall-cmd --reload
+  ```
+
+- **Database path changed** from `~/.local/state/openshell/gateway.db`
+  to `~/.local/state/openshell/gateway/openshell.db`. Existing gateway
+  state (registered sandboxes, etc.) is not migrated automatically. To
+  preserve state across the upgrade, move the file before restarting:
+
+  ```shell
+  mkdir -p ~/.local/state/openshell/gateway
+  mv ~/.local/state/openshell/gateway.db \
+     ~/.local/state/openshell/gateway/openshell.db
+  ```
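
The env-to-TOML table in the TROUBLESHOOTING.md hunk can be applied mechanically. The following is a rough sketch of that mapping, not a tool shipped with this commit: `parse_env` and `env_to_toml` are hypothetical names, the parser handles only plain `KEY=VALUE` lines rather than the full systemd `EnvironmentFile` syntax, and treating `OPENSHELL_DRIVERS` as a comma-separated list is an assumption (the table shows a single driver). Values other than `true` for `OPENSHELL_DISABLE_TLS` are rendered as `false`, which is also an assumption.

```rust
use std::collections::BTreeMap;

/// Parses `KEY=VALUE` lines, skipping blank lines and `#` comments.
/// Surrounding double quotes on values are dropped.
fn parse_env(contents: &str) -> BTreeMap<String, String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| {
            (key.trim().to_string(), value.trim().trim_matches('"').to_string())
        })
        .collect()
}

/// Returns a gateway.toml fragment plus the env-only settings that the
/// table says must stay in the environment. Values are quoted without
/// escaping, so paths containing `"` or `\` would need extra handling.
fn env_to_toml(env: &BTreeMap<String, String>) -> (String, Vec<String>) {
    let mut gateway = Vec::new();
    let mut tls = Vec::new();

    // The table only covers the case where address and port are both set.
    if let (Some(addr), Some(port)) =
        (env.get("OPENSHELL_BIND_ADDRESS"), env.get("OPENSHELL_SERVER_PORT"))
    {
        gateway.push(format!("bind_address = \"{}:{}\"", addr, port));
    }
    // Assumption: multiple drivers are comma-separated; the source shows one.
    if let Some(drivers) = env.get("OPENSHELL_DRIVERS") {
        let quoted: Vec<String> = drivers
            .split(',')
            .map(str::trim)
            .filter(|d| !d.is_empty())
            .map(|d| format!("\"{}\"", d))
            .collect();
        gateway.push(format!("compute_drivers = [{}]", quoted.join(", ")));
    }
    if let Some(value) = env.get("OPENSHELL_DISABLE_TLS") {
        gateway.push(format!("disable_tls = {}", value.as_str() == "true"));
    }
    for (var, key) in [
        ("OPENSHELL_TLS_CERT", "cert_path"),
        ("OPENSHELL_TLS_KEY", "key_path"),
        ("OPENSHELL_TLS_CLIENT_CA", "client_ca_path"),
    ] {
        if let Some(path) = env.get(var) {
            tls.push(format!("{} = \"{}\"", key, path));
        }
    }

    let mut sections = Vec::new();
    if !gateway.is_empty() {
        sections.push(format!("[openshell.gateway]\n{}\n", gateway.join("\n")));
    }
    if !tls.is_empty() {
        sections.push(format!("[openshell.gateway.tls]\n{}\n", tls.join("\n")));
    }

    // Env-only settings are passed through unchanged.
    let env_only: Vec<String> = ["OPENSHELL_DB_URL", "OPENSHELL_LOG_LEVEL"]
        .iter()
        .filter_map(|var| env.get(*var).map(|v| format!("{}={}", var, v)))
        .collect();

    (sections.join("\n"), env_only)
}
```

Feeding the contents of an existing `gateway.env` through `parse_env` and then `env_to_toml` yields a candidate `gateway.toml` body and the list of variables to keep in the environment or a systemd drop-in. Because the default port and bind address changed in this release, an address or port that was previously left unset should be reviewed by hand rather than inferred.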

openshell.spec

Lines changed: 3 additions & 1 deletion
@@ -161,7 +161,9 @@ ExecStartPre=/bin/sh -c 'test -f %%E/openshell/gateway.toml || install -Dm644 /u
 # %%S expands to $XDG_STATE_HOME (~/.local/state) in user units.
 ExecStartPre=/usr/bin/openshell-gateway generate-certs --output-dir %%S/openshell/tls --server-san host.openshell.internal

-# Optional OPENSHELL_* overrides.
+# gateway.env is honored for backward compatibility with pre-1415 installs.
+# New installs use runtime defaults; create gateway.toml to override.
+# See TROUBLESHOOTING.md for the env-to-TOML migration guide.
 EnvironmentFile=-%%E/openshell/gateway.env
 ExecStart=/usr/bin/openshell-gateway
 StateDirectory=openshell
