Commit cad1d85

Rename availability modes per review
1 parent 835d293 commit cad1d85

5 files changed: 39 additions, 34 deletions

docs/source/operators/config-availability.md

Lines changed: 21 additions & 21 deletions
@@ -1,6 +1,6 @@
 # Availability modes

-Enterprise Gateway can be optionally configured in one of two "availability modes": _single-instance_ or _multi-instance_. When configured, Enterprise Gateway can recover from failures and reconnect to any active remote kernels that were previously managed by the terminated EG instance. As such, both modes require that kernel session persistence also be enabled via `KernelSessionManager.enable_persistence=True`.
+Enterprise Gateway can be optionally configured in one of two "availability modes": _standalone_ or _replication_. When configured, Enterprise Gateway can recover from failures and reconnect to any active remote kernels that were previously managed by the terminated EG instance. As such, both modes require that kernel session persistence also be enabled via `KernelSessionManager.enable_persistence=True`.

 ```{note}
 Kernel session persistence will be automtically enabled whenever availability mode is configured.
@@ -16,13 +16,13 @@ Known issues include:
 We hope to address these in future releaases (depending on demand).
 ```

-## Single-instance availability
+## Standalone availability

-_Single-instance availability_ assumes that, upon failure of the original EG instance, another EG instance will be started. Upon startup of the second instance (following the termination of the first), EG will attempt to load and reconnect to all kernels that were deemed active when the previous instance terminated. This mode is somewhat analogous to the classic HA/DR mode of _active-passive_ and is typically used when node resources are at a premium or the number of replicas (in the Kubernetes sense) must remain at 1.
+_Standalone availability_ assumes that, upon failure of the original EG instance, another EG instance will be started. Upon startup of the second instance (following the termination of the first), EG will attempt to load and reconnect to all kernels that were deemed active when the previous instance terminated. This mode is somewhat analogous to the classic HA/DR mode of _active-passive_ and is typically used when node resources are at a premium or the number of replicas (in the Kubernetes sense) must remain at 1.

-To enable Enterprise Gateway for 'single-instance' availability, configure `EnterpiseGatewayApp.availability_mode=single-instance` or set env `EG_AVAILABILITY_MODE=single-instance`.
+To enable Enterprise Gateway for 'standalone' availability, configure `EnterpiseGatewayApp.availability_mode=standalone` or set env `EG_AVAILABILITY_MODE=standalone`.

-Here's an example for starting Enterprise Gateway with single-instance availability:
+Here's an example for starting Enterprise Gateway with standalone availability:

 ```bash
 #!/bin/bash
@@ -31,7 +31,7 @@ LOG=/var/log/enterprise_gateway.log
 PIDFILE=/var/run/enterprise_gateway.pid

 jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --log-level=DEBUG \
-  --EnterpriseGatewayApp.availability_mode=single-instance > $LOG 2>&1 &
+  --EnterpriseGatewayApp.availability_mode=standalone > $LOG 2>&1 &

 if [ "$?" -eq 0 ]; then
   echo $! > $PIDFILE
@@ -40,23 +40,23 @@ else
 fi
 ```

-## Multi-instance availability
+## Replication availability

-With _multi-instance availability_, multiple EG instances are operating at the same time, and fronted with some kind of reverse proxy or load balancer. Because state still resides within each `KernelManager` instance executing within a given EG instance, we strongly suggest configuring some form of _client affinity_ (a.k.a, "sticky session") to avoid node switches wherever possible since each node switch requires manual reconnection of the front-end (today).
+With _replication availability_, multiple EG instances (or replicas) are operating at the same time, and fronted with some kind of reverse proxy or load balancer. Because state still resides within each `KernelManager` instance executing within a given EG instance, we strongly suggest configuring some form of _client affinity_ (a.k.a, "sticky session") to avoid node switches wherever possible since each node switch requires manual reconnection of the front-end (today).

 ```{tip}
 Configuring client affinity is **strongly recommended**, otherwise functionality that relies on state within the servicing node (e.g., culling) can be affected upon node switches, resulting in incorrect behavior.
 ```

 In this mode, when one node goes down, the subsequent request will be routed to a different node that doesn't know about the kernel. Prior to returning a `404` (not found) status code, EG will check its persisted store to determine if the kernel was managed and, if so, attempt to "hydrate" a `KernelManager` instance associated with the remote kernel. (Of course, if the kernel was running local to the downed server, chances are it cannot be _revived_.) Upon successful "hydration" the request continues as if on the originating node. Because _client affinity_ is in place, subsequent requests should continue to be routed to the "servicing node".

-To enable Enterprise Gateway for 'multi-instance' availability, configure `EnterpiseGatewayApp.availability_mode=multi-instance` or set env `EG_AVAILABILITY_MODE=multi-instance`.
+To enable Enterprise Gateway for 'replication' availability, configure `EnterpiseGatewayApp.availability_mode=replication` or set env `EG_AVAILABILITY_MODE=replication`.

 ```{attention}
-To preserve backwards compatibility, if only kernel session persistence is enabled via `KernelSessionManager.enable_persistence=True`, the availability mode will be automatically configured to 'multi-instance' if `EnterpiseGatewayApp.availability_mode` is not configured.
+To preserve backwards compatibility, if only kernel session persistence is enabled via `KernelSessionManager.enable_persistence=True`, the availability mode will be automatically configured to 'replication' if `EnterpiseGatewayApp.availability_mode` is not configured.
 ```

-Here's an example for starting Enterprise Gateway with multi-instance availability:
+Here's an example for starting Enterprise Gateway with replication availability:

 ```bash
 #!/bin/bash
@@ -65,7 +65,7 @@ LOG=/var/log/enterprise_gateway.log
 PIDFILE=/var/run/enterprise_gateway.pid

 jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --log-level=DEBUG \
-  --EnterpriseGatewayApp.availability_mode=multi-instance > $LOG 2>&1 &
+  --EnterpriseGatewayApp.availability_mode=replication > $LOG 2>&1 &

 if [ "$?" -eq 0 ]; then
   echo $! > $PIDFILE
@@ -74,25 +74,25 @@ else
 fi
 ```

-## Kernel Session Persistence
+# Kernel Session Persistence

 Enabling kernel session persistence allows Jupyter Notebooks to reconnect to kernels when Enterprise Gateway is restarted and forms the basis for the _availability modes_ described above. Enterprise Gateway provides two ways of persisting kernel sessions: _File Kernel Session Persistence_ and _Webhook Kernel Session Persistence_, although others can be provided by subclassing `KernelSessionManager` (see below).

 ```{attention}
-Due to its experimental nature, kernel session persistence is disabled by default. To enable this functionality, you must configure `KernelSessionManger.enable_persistence=True` or configure `EnterpriseGatewayApp.availability_mode` to either `single-instance` or `multi-instance`.
+Due to its experimental nature, kernel session persistence is disabled by default. To enable this functionality, you must configure `KernelSessionManger.enable_persistence=True` or configure `EnterpriseGatewayApp.availability_mode` to either `standalone` or `replication`.
 ```

 As noted above, the availability modes rely on the persisted information relative to the kernel. This information consists of the arguments and options used to launch the kernel, along with its connection information. In essence, it consists of any information necessary to re-establish communication with the kernel.

-### File Kernel Session Persistence
+## File Kernel Session Persistence

 File Kernel Session Persistence stores kernel sessions as files in a specified directory. To enable this form of persistence, set the environment variable `EG_KERNEL_SESSION_PERSISTENCE=True` or configure `FileKernelSessionManager.enable_persistence=True`. To change the directory in which the kernel session file is being saved, either set the environment variable `EG_PERSISTENCE_ROOT` or configure `FileKernelSessionManager.persistence_root` to the directory. By default, the directory used to store a given kernel's session information is the `JUPYTER_DATA_DIR`.

 ```{note}
 Because `FileKernelSessionManager` is the default class for kernel session persistence, configuring `EnterpriseGatewayApp.kernel_session_manager_class` to `enterprise_gateway.services.sessions.kernelsessionmanager.FileKernelSessionManager` is not necessary.
 ```

-### Webhook Kernel Session Persistence
+## Webhook Kernel Session Persistence

 Webhook Kernel Session Persistence stores all kernel sessions to any database. In order for this to work, an API must be created. The API must include four endpoints:

@@ -112,15 +112,15 @@ To enable the webhook kernel session persistence, set the environment variable `

 Because `WebhookKernelSessionManager` is not the default kernel session persistence class, an additional configuration step must be taken to instruct EG to use this class: `EnterpriseGatewayApp.kernel_session_manager_class = enterprise_gateway.services.sessions.kernelsessionmanager.WebhookKernelSessionManager`.

-#### Enabling Authentication
+### Enabling Authentication

 Enabling authentication is an option if the API requires it for requests. Set the environment variable `EG_AUTH_TYPE` or configure `WebhookKernelSessionManager.auth_type` to be either `Basic` or `Digest`. If it is set to an empty string authentication won't be enabled.

 Then set the environment variables `EG_WEBHOOK_USERNAME` and `EG_WEBHOOK_PASSWORD` or configure `WebhookKernelSessionManager.webhook_username` and `WebhookKernelSessionManager.webhook_password` to provide the username and password for authentication.

-### Bring Your Own Kernel Session Persistence
+## Bring Your Own Kernel Session Persistence

-To introduce a different implementation, you must configure the kernel session manager class. Here's an example for starting Enterprise Gateway using a custom `KernelSessionManager` and 'single-instance' availability. Note that setting `--MyCustomKernelSessionManager.enable_persistence=True` is not necessary because an availability mode is specified, but displayed here for completeness:
+To introduce a different implementation, you must configure the kernel session manager class. Here's an example for starting Enterprise Gateway using a custom `KernelSessionManager` and 'standalone' availability. Note that setting `--MyCustomKernelSessionManager.enable_persistence=True` is not necessary because an availability mode is specified, but displayed here for completeness:

 ```bash
 #!/bin/bash
@@ -131,7 +131,7 @@ PIDFILE=/var/run/enterprise_gateway.pid
 jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --log-level=DEBUG \
   --EnterpriseGatewayApp.kernel_session_manager_class=custom.package.MyCustomKernelSessionManager \
   --MyCustomKernelSessionManager.enable_persistence=True \
-  --EnterpriseGatewayApp.availability_mode=single-instance > $LOG 2>&1 &
+  --EnterpriseGatewayApp.availability_mode=standalone > $LOG 2>&1 &

 if [ "$?" -eq 0 ]; then
   echo $! > $PIDFILE
@@ -142,7 +142,7 @@ fi

 Alternative persistence implementations using SQL and NoSQL databases would be ideal and, as always, contributions are welcome!

-### Testing Kernel Session Persistence
+## Testing Kernel Session Persistence

 Once kernel session persistence has been enabled and configured, create a kernel by opening up a Jupyter Notebook. Save some variable in that notebook and shutdown Enterprise Gateway using `kill -9 PID`, where `PID` is the PID of gateway. Restart Enterprise Gateway and refresh you notebook tab. If all worked correctly, the variable should be loaded without the need to rerun the cell.
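
The replication paragraph in this file describes a lookup that falls back to the persisted store before answering `404`. A minimal, stdlib-only sketch of that flow might look like the following. `GatewayNode`, `KernelNotFound`, and the dictionary-based store are hypothetical stand-ins invented for illustration; they are not Enterprise Gateway classes, and real hydration would reconstruct a `KernelManager` rather than copy a dict.

```python
from typing import Dict, Optional

STANDALONE = "standalone"
REPLICATION = "replication"


class KernelNotFound(Exception):
    """Hypothetical stand-in for the 404 (not found) response."""


class GatewayNode:
    """Hypothetical model of one EG instance; not the real KernelManager API."""

    def __init__(self, availability_mode: Optional[str], persisted: Dict[str, dict]):
        self.availability_mode = availability_mode
        self.persisted = persisted          # shared store that outlives any one node
        self.managed: Dict[str, dict] = {}  # in-memory state local to this node

    def lookup(self, kernel_id: str) -> dict:
        if kernel_id in self.managed:
            return self.managed[kernel_id]
        # Unknown kernel: only 'replication' consults the persisted store
        # before giving up.
        if self.availability_mode == REPLICATION and kernel_id in self.persisted:
            self.managed[kernel_id] = self.persisted[kernel_id]  # "hydrate"
            return self.managed[kernel_id]
        raise KernelNotFound(kernel_id)


if __name__ == "__main__":
    store = {"k1": {"ip": "10.0.0.5"}}
    surviving_node = GatewayNode(REPLICATION, store)
    print(surviving_node.lookup("k1"))
```

After the first successful lookup the kernel is held in the node's own `managed` map, which is why the text recommends client affinity: later requests should keep landing on the node that hydrated it.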

enterprise_gateway/enterprisegatewayapp.py

Lines changed: 4 additions & 4 deletions
@@ -145,7 +145,7 @@ def init_configurables(self):
         # mode is not enabled, go ahead and default availability mode to 'multi-instance'.
         if self.kernel_session_manager.enable_persistence:
             if self.availability_mode is None:
-                self.availability_mode = "multi-instance"
+                self.availability_mode = EnterpriseGatewayConfigMixin.AVAILABILITY_REPLICATION
                 self.log.info(
                     f"Kernel session persistence is enabled but availability mode is not. "
                     f"Setting EnterpriseGatewayApp.availability_mode to '{self.availability_mode}'."
@@ -161,7 +161,7 @@ def init_configurables(self):
                 )

         # If we're using single-instance availability, attempt to start persisted sessions
-        if self.availability_mode == "single-instance":
+        if self.availability_mode == EnterpriseGatewayConfigMixin.AVAILABILITY_STANDALONE:
             self.kernel_session_manager.start_sessions()

         self.contents_manager = None # Gateways don't use contents manager
@@ -272,11 +272,11 @@ def _build_ssl_options(self) -> Optional[ssl.SSLContext]:
         return ssl_context

     def init_http_server(self):
-        """Initializes a HTTP server for the Tornado web application on the
+        """Initializes an HTTP server for the Tornado web application on the
         configured interface and port.

         Tries to find an open port if the one configured is not available using
-        the same logic as the Jupyer Notebook server.
+        the same logic as the Jupyter Notebook server.
         """
         ssl_options = self._build_ssl_options()
         self.http_server = httpserver.HTTPServer(
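
Taken together, the `init_configurables` hunks and the documentation note give the startup rules: persistence alone defaults the mode to `replication`, a configured mode switches persistence on, and only `standalone` reloads persisted sessions at startup. The sketch below restates those rules as a pure function. `resolve_availability` is a hypothetical helper written for this illustration and does not exist in Enterprise Gateway.

```python
from typing import Optional, Tuple

STANDALONE = "standalone"
REPLICATION = "replication"


def resolve_availability(
    mode: Optional[str], persistence: bool
) -> Tuple[Optional[str], bool, bool]:
    """Return (effective_mode, persistence_enabled, start_sessions_at_startup)."""
    if persistence and mode is None:
        # Backwards compatibility: persistence alone implies 'replication'.
        mode = REPLICATION
    elif mode is not None:
        # Configuring an availability mode enables persistence automatically.
        persistence = True
    # Only 'standalone' eagerly reloads persisted sessions when the app starts.
    return mode, persistence, mode == STANDALONE


if __name__ == "__main__":
    print(resolve_availability(None, True))
    print(resolve_availability(STANDALONE, False))
```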

enterprise_gateway/mixins.py

Lines changed: 4 additions & 3 deletions
@@ -682,14 +682,15 @@ def dynamic_config_interval_changed(self, event):
     dynamic_config_poller = None

     # Availability Mode
+    AVAILABILITY_STANDALONE = "standalone"
+    AVAILABILITY_REPLICATION = "replication"
     availability_mode_env = "EG_AVAILABILITY_MODE"
     availability_mode_default_value = None
     availability_mode = CaselessStrEnum(
         allow_none=True,
-        values=["multi-instance", "single-instance"],
+        values=[AVAILABILITY_REPLICATION, AVAILABILITY_STANDALONE],
         config=True,
-        help="""Specifies the type of availability. Values must be one of "single-instance" or "multi-instance".
-        Configuration of this this option requires that KernelSessionManager.enable_persistence is True.
+        help="""Specifies the type of availability. Values must be one of "standalone" or "replication".
         (EG_AVAILABILITY_MODE env var)""",
     )
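
One consequence of this hunk is that the old names are no longer in the accepted `values` list. The stdlib-only sketch below imitates case-insensitive matching against the new constants when reading `EG_AVAILABILITY_MODE`. It is an approximation for illustration and does not import traitlets; `availability_mode_from_env` is a hypothetical helper, not part of the mixin.

```python
import os
from typing import Mapping, Optional

AVAILABILITY_STANDALONE = "standalone"
AVAILABILITY_REPLICATION = "replication"
_VALID = (AVAILABILITY_REPLICATION, AVAILABILITY_STANDALONE)


def availability_mode_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Read EG_AVAILABILITY_MODE and match it case-insensitively."""
    raw = env.get("EG_AVAILABILITY_MODE")
    if raw is None:
        return None  # mirrors allow_none=True with a default of None
    for value in _VALID:
        if value.lower() == raw.lower():
            return value  # hand back the canonical spelling
    raise ValueError(f"availability mode must be one of {_VALID}, got {raw!r}")


if __name__ == "__main__":
    print(availability_mode_from_env({"EG_AVAILABILITY_MODE": "Replication"}))
```

Under this sketch, a deployment that still exports `EG_AVAILABILITY_MODE=single-instance` would be rejected, so existing configurations would need the new names.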

enterprise_gateway/services/kernels/remotemanager.py

Lines changed: 5 additions & 4 deletions
@@ -163,10 +163,11 @@ def check_kernel_id(self, kernel_id):
             raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)

     def _refresh_kernel(self, kernel_id) -> bool:
-        if not self.parent.availability_mode or self.parent.availability_mode == "single-instance":
-            return False
-        self.parent.kernel_session_manager.load_session(kernel_id)
-        return self.parent.kernel_session_manager.start_session(kernel_id)
+        if self.parent.availability_mode == EnterpriseGatewayConfigMixin.AVAILABILITY_REPLICATION:
+            self.parent.kernel_session_manager.load_session(kernel_id)
+            return self.parent.kernel_session_manager.start_session(kernel_id)
+        # else we should throw 404 when not using an availability mode of 'replication'
+        return False

     async def start_kernel(self, *args, **kwargs):
         """

enterprise_gateway/tests/test_gatewayapp.py

Lines changed: 5 additions & 2 deletions
@@ -9,6 +9,7 @@
 from tornado.testing import AsyncHTTPTestCase, ExpectLog

 from enterprise_gateway.enterprisegatewayapp import EnterpriseGatewayApp
+from enterprise_gateway.mixins import EnterpriseGatewayConfigMixin

 RESOURCES = os.path.join(os.path.dirname(__file__), "resources")

@@ -49,7 +50,9 @@ def _assert_envs_to_traitlets(self, env_prefix: str):
         self.assertEqual(app.ssl_version, 3)
         if env_prefix == "EG_": # These options did not exist in JKG
             self.assertEqual(app.kernel_session_manager.enable_persistence, True)
-            self.assertEqual(app.availability_mode, "multi-instance")
+            self.assertEqual(
+                app.availability_mode, EnterpriseGatewayConfigMixin.AVAILABILITY_REPLICATION
+            )

     def test_config_env_vars_bc(self):
         """B/C env vars should be honored for traitlets."""
@@ -96,7 +99,7 @@ def test_config_env_vars(self):
         os.environ["EG_SSL_VERSION"] = "3"
         os.environ[
             "EG_KERNEL_SESSION_PERSISTENCE"
-        ] = "True" # availability mode will be defaulted to multi-instance
+        ] = "True" # availability mode will be defaulted to replication

         self._assert_envs_to_traitlets("EG_")
