Rename availability modes per review
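
Because the `availability_mode` trait now lists only `standalone` and `replication` as allowed values, deployments that still pass the old names will probably need their configuration updated. The sketch below illustrates the rename and the defaulting rule in `init_configurables`. It is not part of this commit: `migrate_availability_mode` and `effective_availability_mode` are hypothetical helpers, and only the two constant values are taken from the change to `mixins.py`.

```python
from typing import Optional

# New mode names (same values as the constants this commit adds to mixins.py).
AVAILABILITY_STANDALONE = "standalone"
AVAILABILITY_REPLICATION = "replication"

# Hypothetical mapping from the pre-rename names; not part of Enterprise Gateway.
_RENAMED_MODES = {
    "single-instance": AVAILABILITY_STANDALONE,
    "multi-instance": AVAILABILITY_REPLICATION,
}


def migrate_availability_mode(value: Optional[str]) -> Optional[str]:
    """Translate an old mode name to its new name; pass other values through."""
    if value is None:
        return None
    normalized = value.strip().lower()  # the trait is a caseless enum
    return _RENAMED_MODES.get(normalized, normalized)


def effective_availability_mode(
    configured: Optional[str], persistence_enabled: bool
) -> Optional[str]:
    """Persistence enabled without an explicit mode defaults to 'replication'."""
    if configured is None and persistence_enabled:
        return AVAILABILITY_REPLICATION
    return configured
```

For example, `migrate_availability_mode("single-instance")` yields `"standalone"`, and `effective_availability_mode(None, True)` yields `"replication"`, matching the backwards-compatibility note in the docs.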

kevin-bates · kevin-bates · commit 44d8b2cf5dda · 2022-06-13T14:20:04.000-07:00
diff --git a/docs/source/operators/config-availability.md b/docs/source/operators/config-availability.md
@@ -1,6 +1,6 @@
 # Availability modes
 
-Enterprise Gateway can be optionally configured in one of two "availability modes": _single-instance_ or _multi-instance_. When configured, Enterprise Gateway can recover from failures and reconnect to any active remote kernels that were previously managed by the terminated EG instance. As such, both modes require that kernel session persistence also be enabled via `KernelSessionManager.enable_persistence=True`.
+Enterprise Gateway can be optionally configured in one of two "availability modes": _standalone_ or _replication_. When configured, Enterprise Gateway can recover from failures and reconnect to any active remote kernels that were previously managed by the terminated EG instance. As such, both modes require that kernel session persistence also be enabled via `KernelSessionManager.enable_persistence=True`.
 
 ```{note}
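-Kernel session persistence will be automtically enabled whenever availability mode is configured.
+Kernel session persistence will be automatically enabled whenever availability mode is configured.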
@@ -16,13 +16,13 @@ Known issues include:
-We hope to address these in future releaases (depending on demand).
+We hope to address these in future releases (depending on demand).
 ```
 
-## Single-instance availability
+## Standalone availability
 
-_Single-instance availability_ assumes that, upon failure of the original EG instance, another EG instance will be started. Upon startup of the second instance (following the termination of the first), EG will attempt to load and reconnect to all kernels that were deemed active when the previous instance terminated. This mode is somewhat analogous to the classic HA/DR mode of _active-passive_ and is typically used when node resources are at a premium or the number of replicas (in the Kubernetes sense) must remain at 1.
+_Standalone availability_ assumes that, upon failure of the original EG instance, another EG instance will be started. Upon startup of the second instance (following the termination of the first), EG will attempt to load and reconnect to all kernels that were deemed active when the previous instance terminated. This mode is somewhat analogous to the classic HA/DR mode of _active-passive_ and is typically used when node resources are at a premium or the number of replicas (in the Kubernetes sense) must remain at 1.
 
-To enable Enterprise Gateway for 'single-instance' availability, configure `EnterpiseGatewayApp.availability_mode=single-instance` or set env `EG_AVAILABILITY_MODE=single-instance`.
+To enable Enterprise Gateway for 'standalone' availability, configure `EnterpriseGatewayApp.availability_mode=standalone` or set env `EG_AVAILABILITY_MODE=standalone`.
 
-Here's an example for starting Enterprise Gateway with single-instance availability:
+Here's an example for starting Enterprise Gateway with standalone availability:
 
 ```bash
 #!/bin/bash
@@ -31,7 +31,7 @@ LOG=/var/log/enterprise_gateway.log
 PIDFILE=/var/run/enterprise_gateway.pid
 
 jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --log-level=DEBUG \
-   --EnterpriseGatewayApp.availability_mode=single-instance > $LOG 2>&1 &
+   --EnterpriseGatewayApp.availability_mode=standalone > $LOG 2>&1 &
 
 if [ "$?" -eq 0 ]; then
   echo $! > $PIDFILE
@@ -40,23 +40,23 @@ else
 fi
 ```
 
-## Multi-instance availability
+## Replication availability
 
-With _multi-instance availability_, multiple EG instances are operating at the same time, and fronted with some kind of reverse proxy or load balancer. Because state still resides within each `KernelManager` instance executing within a given EG instance, we strongly suggest configuring some form of _client affinity_ (a.k.a, "sticky session") to avoid node switches wherever possible since each node switch requires manual reconnection of the front-end (today).
+With _replication availability_, multiple EG instances (or replicas) operate at the same time, fronted by some kind of reverse proxy or load balancer. Because state still resides within each `KernelManager` instance executing within a given EG instance, we strongly suggest configuring some form of _client affinity_ (a.k.a. "sticky session") to avoid node switches wherever possible, since each node switch currently requires manual reconnection of the front-end.
 
 ```{tip}
 Configuring client affinity is **strongly recommended**, otherwise functionality that relies on state within the servicing node (e.g., culling) can be affected upon node switches, resulting in incorrect behavior.
 ```
 
 In this mode, when one node goes down, the subsequent request will be routed to a different node that doesn't know about the kernel. Prior to returning a `404` (not found) status code, EG will check its persisted store to determine if the kernel was managed and, if so, attempt to "hydrate" a `KernelManager` instance associated with the remote kernel. (Of course, if the kernel was running local to the downed server, chances are it cannot be _revived_.) Upon successful "hydration" the request continues as if on the originating node. Because _client affinity_ is in place, subsequent requests should continue to be routed to the "servicing node".
 
-To enable Enterprise Gateway for 'multi-instance' availability, configure `EnterpiseGatewayApp.availability_mode=multi-instance` or set env `EG_AVAILABILITY_MODE=multi-instance`.
+To enable Enterprise Gateway for 'replication' availability, configure `EnterpriseGatewayApp.availability_mode=replication` or set env `EG_AVAILABILITY_MODE=replication`.
 
 ```{attention}
-To preserve backwards compatibility, if only kernel session persistence is enabled via `KernelSessionManager.enable_persistence=True`, the availability mode will be automatically configured to 'multi-instance' if `EnterpiseGatewayApp.availability_mode` is not configured.
+To preserve backwards compatibility, if only kernel session persistence is enabled via `KernelSessionManager.enable_persistence=True`, the availability mode will be automatically configured to 'replication' if `EnterpriseGatewayApp.availability_mode` is not configured.
 ```
 
-Here's an example for starting Enterprise Gateway with multi-instance availability:
+Here's an example for starting Enterprise Gateway with replication availability:
 
 ```bash
 #!/bin/bash
@@ -65,7 +65,7 @@ LOG=/var/log/enterprise_gateway.log
 PIDFILE=/var/run/enterprise_gateway.pid
 
 jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --log-level=DEBUG \
-   --EnterpriseGatewayApp.availability_mode=multi-instance > $LOG 2>&1 &
+   --EnterpriseGatewayApp.availability_mode=replication > $LOG 2>&1 &
 
 if [ "$?" -eq 0 ]; then
   echo $! > $PIDFILE
@@ -74,25 +74,25 @@ else
 fi
 ```
 
-## Kernel Session Persistence
+# Kernel Session Persistence
 
 Enabling kernel session persistence allows Jupyter Notebooks to reconnect to kernels when Enterprise Gateway is restarted and forms the basis for the _availability modes_ described above. Enterprise Gateway provides two ways of persisting kernel sessions: _File Kernel Session Persistence_ and _Webhook Kernel Session Persistence_, although others can be provided by subclassing `KernelSessionManager` (see below).
 
 ```{attention}
-Due to its experimental nature, kernel session persistence is disabled by default. To enable this functionality, you must configure `KernelSessionManger.enable_persistence=True` or configure `EnterpriseGatewayApp.availability_mode` to either `single-instance` or `multi-instance`.
+Due to its experimental nature, kernel session persistence is disabled by default. To enable this functionality, you must configure `KernelSessionManager.enable_persistence=True` or configure `EnterpriseGatewayApp.availability_mode` to either `standalone` or `replication`.
 ```
 
 As noted above, the availability modes rely on the persisted information relative to the kernel. This information consists of the arguments and options used to launch the kernel, along with its connection information. In essence, it consists of any information necessary to re-establish communication with the kernel.
 
-### File Kernel Session Persistence
+## File Kernel Session Persistence
 
 File Kernel Session Persistence stores kernel sessions as files in a specified directory. To enable this form of persistence, set the environment variable `EG_KERNEL_SESSION_PERSISTENCE=True` or configure `FileKernelSessionManager.enable_persistence=True`. To change the directory in which the kernel session file is being saved, either set the environment variable `EG_PERSISTENCE_ROOT` or configure `FileKernelSessionManager.persistence_root` to the directory. By default, the directory used to store a given kernel's session information is the `JUPYTER_DATA_DIR`.
 
 ```{note}
 Because `FileKernelSessionManager` is the default class for kernel session persistence, configuring `EnterpriseGatewayApp.kernel_session_manager_class` to `enterprise_gateway.services.sessions.kernelsessionmanager.FileKernelSessionManager` is not necessary.
 ```
 
-### Webhook Kernel Session Persistence
+## Webhook Kernel Session Persistence
 
 Webhook Kernel Session Persistence stores all kernel sessions to any database. In order for this to work, an API must be created. The API must include four endpoints:
 
@@ -112,15 +112,15 @@ To enable the webhook kernel session persistence, set the environment variable `
 
 Because `WebhookKernelSessionManager` is not the default kernel session persistence class, an additional configuration step must be taken to instruct EG to use this class: `EnterpriseGatewayApp.kernel_session_manager_class = enterprise_gateway.services.sessions.kernelsessionmanager.WebhookKernelSessionManager`.
 
-#### Enabling Authentication
+### Enabling Authentication
 
 Enabling authentication is an option if the API requires it for requests. Set the environment variable `EG_AUTH_TYPE` or configure `WebhookKernelSessionManager.auth_type` to be either `Basic` or `Digest`. If it is set to an empty string authentication won't be enabled.
 
 Then set the environment variables `EG_WEBHOOK_USERNAME` and `EG_WEBHOOK_PASSWORD` or configure `WebhookKernelSessionManager.webhook_username` and `WebhookKernelSessionManager.webhook_password` to provide the username and password for authentication.
 
-### Bring Your Own Kernel Session Persistence
+## Bring Your Own Kernel Session Persistence
 
-To introduce a different implementation, you must configure the kernel session manager class. Here's an example for starting Enterprise Gateway using a custom `KernelSessionManager` and 'single-instance' availability. Note that setting `--MyCustomKernelSessionManager.enable_persistence=True` is not necessary because an availability mode is specified, but displayed here for completeness:
+To introduce a different implementation, you must configure the kernel session manager class. Here's an example for starting Enterprise Gateway using a custom `KernelSessionManager` and 'standalone' availability. Note that setting `--MyCustomKernelSessionManager.enable_persistence=True` is not necessary because an availability mode is specified, but displayed here for completeness:
 
 ```bash
 #!/bin/bash
@@ -131,7 +131,7 @@ PIDFILE=/var/run/enterprise_gateway.pid
 jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --log-level=DEBUG \
    --EnterpriseGatewayApp.kernel_session_manager_class=custom.package.MyCustomKernelSessionManager \
    --MyCustomKernelSessionManager.enable_persistence=True \
-   --EnterpriseGatewayApp.availability_mode=single-instance > $LOG 2>&1 &
+   --EnterpriseGatewayApp.availability_mode=standalone > $LOG 2>&1 &
 
 if [ "$?" -eq 0 ]; then
   echo $! > $PIDFILE
@@ -142,7 +142,7 @@ fi
 
 Alternative persistence implementations using SQL and NoSQL databases would be ideal and, as always, contributions are welcome!
 
-### Testing Kernel Session Persistence
+## Testing Kernel Session Persistence
 
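-Once kernel session persistence has been enabled and configured, create a kernel by opening up a Jupyter Notebook. Save some variable in that notebook and shutdown Enterprise Gateway using `kill -9 PID`, where `PID` is the PID of gateway. Restart Enterprise Gateway and refresh you notebook tab. If all worked correctly, the variable should be loaded without the need to rerun the cell.
+Once kernel session persistence has been enabled and configured, create a kernel by opening up a Jupyter Notebook. Save some variable in that notebook and shut down Enterprise Gateway using `kill -9 PID`, where `PID` is the PID of the gateway process. Restart Enterprise Gateway and refresh your notebook tab. If all worked correctly, the variable should be loaded without the need to rerun the cell.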
 
diff --git a/enterprise_gateway/enterprisegatewayapp.py b/enterprise_gateway/enterprisegatewayapp.py
@@ -145,7 +145,7 @@ def init_configurables(self):
-        # mode is not enabled, go ahead and default availability mode to 'multi-instance'.
+        # mode is not enabled, go ahead and default availability mode to 'replication'.
         if self.kernel_session_manager.enable_persistence:
             if self.availability_mode is None:
-                self.availability_mode = "multi-instance"
+                self.availability_mode = EnterpriseGatewayConfigMixin.AVAILABILITY_REPLICATION
                 self.log.info(
                     f"Kernel session persistence is enabled but availability mode is not.  "
                     f"Setting EnterpriseGatewayApp.availability_mode to '{self.availability_mode}'."
@@ -161,7 +161,7 @@ def init_configurables(self):
                 )
 
-        # If we're using single-instance availability, attempt to start persisted sessions
+        # If we're using standalone availability, attempt to start persisted sessions
-        if self.availability_mode == "single-instance":
+        if self.availability_mode == EnterpriseGatewayConfigMixin.AVAILABILITY_STANDALONE:
             self.kernel_session_manager.start_sessions()
 
         self.contents_manager = None  # Gateways don't use contents manager
@@ -272,11 +272,11 @@ def _build_ssl_options(self) -> Optional[ssl.SSLContext]:
         return ssl_context
 
     def init_http_server(self):
-        """Initializes a HTTP server for the Tornado web application on the
+        """Initializes an HTTP server for the Tornado web application on the
         configured interface and port.
 
         Tries to find an open port if the one configured is not available using
-        the same logic as the Jupyer Notebook server.
+        the same logic as the Jupyter Notebook server.
         """
         ssl_options = self._build_ssl_options()
         self.http_server = httpserver.HTTPServer(
diff --git a/enterprise_gateway/mixins.py b/enterprise_gateway/mixins.py
@@ -682,14 +682,15 @@ def dynamic_config_interval_changed(self, event):
     dynamic_config_poller = None
 
     # Availability Mode
+    AVAILABILITY_STANDALONE = "standalone"
+    AVAILABILITY_REPLICATION = "replication"
     availability_mode_env = "EG_AVAILABILITY_MODE"
     availability_mode_default_value = None
     availability_mode = CaselessStrEnum(
         allow_none=True,
-        values=["multi-instance", "single-instance"],
+        values=[AVAILABILITY_REPLICATION, AVAILABILITY_STANDALONE],
         config=True,
-        help="""Specifies the type of availability.  Values must be one of "single-instance" or "multi-instance".
-                Configuration of this this option requires that KernelSessionManager.enable_persistence is True.
+        help="""Specifies the type of availability.  Values must be one of "standalone" or "replication".
                 (EG_AVAILABILITY_MODE env var)""",
     )
 
diff --git a/enterprise_gateway/services/kernels/remotemanager.py b/enterprise_gateway/services/kernels/remotemanager.py
@@ -163,10 +163,11 @@ def check_kernel_id(self, kernel_id):
                 raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)
 
     def _refresh_kernel(self, kernel_id) -> bool:
-        if not self.parent.availability_mode or self.parent.availability_mode == "single-instance":
-            return False
-        self.parent.kernel_session_manager.load_session(kernel_id)
-        return self.parent.kernel_session_manager.start_session(kernel_id)
+        if self.parent.availability_mode == EnterpriseGatewayConfigMixin.AVAILABILITY_REPLICATION:
+            self.parent.kernel_session_manager.load_session(kernel_id)
+            return self.parent.kernel_session_manager.start_session(kernel_id)
+        # else we should throw 404 when not using an availability mode of 'replication'
+        return False
 
     async def start_kernel(self, *args, **kwargs):
         """
diff --git a/enterprise_gateway/tests/test_gatewayapp.py b/enterprise_gateway/tests/test_gatewayapp.py
@@ -9,6 +9,7 @@
 from tornado.testing import AsyncHTTPTestCase, ExpectLog
 
 from enterprise_gateway.enterprisegatewayapp import EnterpriseGatewayApp
+from enterprise_gateway.mixins import EnterpriseGatewayConfigMixin
 
 RESOURCES = os.path.join(os.path.dirname(__file__), "resources")
 
@@ -49,7 +50,9 @@ def _assert_envs_to_traitlets(self, env_prefix: str):
         self.assertEqual(app.ssl_version, 3)
         if env_prefix == "EG_":  # These options did not exist in JKG
             self.assertEqual(app.kernel_session_manager.enable_persistence, True)
-            self.assertEqual(app.availability_mode, "multi-instance")
+            self.assertEqual(
+                app.availability_mode, EnterpriseGatewayConfigMixin.AVAILABILITY_REPLICATION
+            )
 
     def test_config_env_vars_bc(self):
         """B/C env vars should be honored for traitlets."""
@@ -96,7 +99,7 @@ def test_config_env_vars(self):
         os.environ["EG_SSL_VERSION"] = "3"
         os.environ[
             "EG_KERNEL_SESSION_PERSISTENCE"
-        ] = "True"  # availability mode will be defaulted to multi-instance
+        ] = "True"  # availability mode will be defaulted to replication
 
         self._assert_envs_to_traitlets("EG_")