Skip to content

Commit 96654d0

Browse files
authored
rust: add option to kill server at stdin EOF (#4408)
Summary: If `tensorboard` is to launch RustBoard as a subprocess rather than relying on the user to launch it concurrently, then we should try hard not to leave around a zombie server. An easy solution is an `atexit` handler in Python TensorBoard, but this doesn’t handle the case when TensorBoard crashes or gets SIGTERMed. On Linux, we can be notified when our parent dies by setting the parent-death signal (`man 2 prctl`), but this isn’t portable. On portable Unix, the child can poll its PPID to see when it changes to `1` or the PID of some subreaper, but this isn’t portable to Windows, which doesn’t update PPID when the parent dies. Windows is not my strong suit, but some web searching didn’t turn up an easy and clean solution portable to both Windows and Unix (any Windows internals like “job objects” are a non-starter, since I can’t test them). The one thing that I would expect to work everywhere is “just wait until stdin closes and then exit”. If even that doesn’t work on Windows, well, we can burn that bridge when we come to it. Test Plan: Build `bazel build //tensorboard/data/server`, then write a simple Python driver: ```python import subprocess import time server = "bazel-bin/tensorboard/data/server/server" p = subprocess.Popen( [server, "--logdir", "/tmp/nonexistent", "-v", "--die-after-stdin"], stdin=subprocess.PIPE, ) print(p.pid) time.sleep(2) ``` Run it with `python test.py`, and note that after 2 seconds the server prints “Stdin closed; exiting”. Run it again, and suspend (`^Z`) the process before it finishes sleeping. Run `kill -SIGCONT CHILD_PID` with the PID printed by the Python script to resume server execution; this is just needed because your `^Z` propagates to all processes in the group. Note that the server continues to print logs as it starts and finishes new load cycles. Then, run `fg` and wait for the sleep to complete, and note that the server again exits, as desired. wchargin-branch: rust-die-after-stdin
1 parent 73e71f0 commit 96654d0

File tree

1 file changed

+42
-11
lines changed

1 file changed

+42
-11
lines changed

tensorboard/data/server/main.rs

Lines changed: 42 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -15,9 +15,11 @@ limitations under the License.
1515

1616
use clap::Clap;
1717
use log::{debug, info, LevelFilter};
18+
use std::io::Read;
1819
use std::net::{IpAddr, SocketAddr};
1920
use std::path::PathBuf;
2021
use std::str::FromStr;
22+
use std::thread;
2123
use std::time::{Duration, Instant};
2224
use tokio::net::TcpListener;
2325
use tonic::transport::Server;
@@ -62,6 +64,16 @@ struct Opts {
6264
/// Use verbose output (-vv for very verbose output)
6365
#[clap(long = "verbose", short, parse(from_occurrences))]
6466
verbosity: u32,
67+
68+
/// Kill this server once stdin is closed
69+
///
70+
/// While this server is running, read stdin to end of file and then kill the server. Used to
71+
/// portably ensure that the server exits when the parent process dies, even due to a crash.
72+
/// Don't set this if stdin is connected to a tty and the process will be backgrounded, since
73+
/// then the server will receive `SIGTTIN` and its process will be stopped (in the `SIGSTOP`
74+
/// sense) but not killed.
75+
#[clap(long)]
76+
die_after_stdin: bool,
6577
}
6678

6779
/// A duration in seconds.
@@ -89,6 +101,13 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
89101
});
90102
debug!("Parsed options: {:?}", opts);
91103

104+
if opts.die_after_stdin {
105+
thread::Builder::new()
106+
.name("StdinWatcher".to_string())
107+
.spawn(die_after_stdin)
108+
.expect("failed to spawn stdin watcher thread");
109+
}
110+
92111
let addr = SocketAddr::new(opts.host, opts.port);
93112
let listener = TcpListener::bind(addr).await?;
94113
let bound = listener.local_addr()?;
@@ -98,17 +117,20 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
98117
// leaks the outer commit structure (of constant size), not the pointers to the actual data.
99118
let commit: &'static Commit = Box::leak(Box::new(Commit::new()));
100119

101-
std::thread::spawn(move || {
102-
let mut loader = LogdirLoader::new(commit, opts.logdir);
103-
loop {
104-
info!("Starting load cycle");
105-
let start = Instant::now();
106-
loader.reload();
107-
let end = Instant::now();
108-
info!("Finished load cycle ({:?})", end - start);
109-
std::thread::sleep(opts.reload_interval.duration());
110-
}
111-
});
120+
thread::Builder::new()
121+
.name("Reloader".to_string())
122+
.spawn(move || {
123+
let mut loader = LogdirLoader::new(commit, opts.logdir);
124+
loop {
125+
info!("Starting load cycle");
126+
let start = Instant::now();
127+
loader.reload();
128+
let end = Instant::now();
129+
info!("Finished load cycle ({:?})", end - start);
130+
thread::sleep(opts.reload_interval.duration());
131+
}
132+
})
133+
.expect("failed to spawn reloader thread");
112134

113135
let handler = DataProviderHandler { commit };
114136
Server::builder()
@@ -125,3 +147,12 @@ fn init_logging(default_log_level: LevelFilter) {
125147
use env_logger::{Builder, Env};
126148
Builder::from_env(Env::default().default_filter_or(default_log_level.to_string())).init();
127149
}
150+
151+
/// Locks stdin and reads it to EOF, then exits the process.
152+
fn die_after_stdin() {
153+
let stdin = std::io::stdin();
154+
let stdin_lock = stdin.lock();
155+
for _ in stdin_lock.bytes() {}
156+
info!("Stdin closed; exiting");
157+
std::process::exit(0);
158+
}

0 commit comments

Comments
 (0)