
Commit 69f2dfc

hot restart: provide a mechanism for obtaining a base-id dynamically (envoyproxy#11357)
Provides a --use-dynamic-base-id flag that selects an unused base-id. This is primarily useful for testing, but the flag is generally available. Adds a --base-id-path flag that makes Envoy write the base id to a file. Converts tests to use dynamic base id selection rather than trying to keep all base ids unique.

Signed-off-by: Stephan Zuercher <[email protected]>
1 parent 8cb2958 commit 69f2dfc

28 files changed: +330 -140 lines

api/envoy/admin/v3/server_info.proto

+7-1
@@ -54,7 +54,7 @@ message ServerInfo {
 CommandLineOptions command_line_options = 6;
 }
 
-// [#next-free-field: 31]
+// [#next-free-field: 33]
 message CommandLineOptions {
 option (udpa.annotations.versioning).previous_message_type =
 "envoy.admin.v2alpha.CommandLineOptions";
@@ -82,6 +82,12 @@ message CommandLineOptions {
 // See :option:`--base-id` for details.
 uint64 base_id = 1;
 
+// See :option:`--use-dynamic-base-id` for details.
+bool use_dynamic_base_id = 31;
+
+// See :option:`--base-id-path` for details.
+string base_id_path = 32;
+
 // See :option:`--concurrency` for details.
 uint32 concurrency = 2;
 

api/envoy/admin/v4alpha/server_info.proto

+7-1
@@ -54,7 +54,7 @@ message ServerInfo {
 CommandLineOptions command_line_options = 6;
 }
 
-// [#next-free-field: 31]
+// [#next-free-field: 33]
 message CommandLineOptions {
 option (udpa.annotations.versioning).previous_message_type = "envoy.admin.v3.CommandLineOptions";
 
@@ -81,6 +81,12 @@ message CommandLineOptions {
 // See :option:`--base-id` for details.
 uint64 base_id = 1;
 
+// See :option:`--use-dynamic-base-id` for details.
+bool use_dynamic_base_id = 31;
+
+// See :option:`--base-id-path` for details.
+string base_id_path = 32;
+
 // See :option:`--concurrency` for details.
 uint32 concurrency = 2;
 

docs/root/intro/arch_overview/operations/hot_restart.rst

+6
@@ -26,3 +26,9 @@ hot restart functionality has the following general architecture:
 the processes takes place only using unix domain sockets.
 * An example restarter/parent process written in Python is included in the source distribution. This
 parent process is usable with standard process control utilities such as monit/runit/etc.
+
+Envoy's default command line options assume that only a single set of Envoy processes is running on
+a given host: an active Envoy server process and, potentially, a draining Envoy server process that
+will exit as described above. The :option:`--base-id` or :option:`--use-dynamic-base-id` options
+may be used to allow multiple, distinctly configured Envoys to run on the same host and hot restart
+independently.

docs/root/operations/cli.rst

+22-8
@@ -66,10 +66,24 @@ following are the command line options that Envoy supports.
 set this option. However, if Envoy needs to be run multiple times on the same machine, each
 running Envoy will need a unique base ID so that the shared memory regions do not conflict.
 
+.. option:: --use-dynamic-base-id
+
+*(optional)* Selects an unused base ID to use when allocating shared memory regions. Using
+preselected values with :option:`--base-id` is preferred, however. If this option is enabled,
+it supersedes the :option:`--base-id` value. This flag may not be used when the value of
+:option:`--restart-epoch` is non-zero. Instead, for subsequent hot restarts, set the
+:option:`--base-id` option to the selected base ID. See :option:`--base-id-path`.
+
+.. option:: --base-id-path <path_string>
+
+*(optional)* Writes the base ID to the given path. While this option is compatible with
+:option:`--base-id`, its intended use is to provide access to the dynamic base ID selected by
+:option:`--use-dynamic-base-id`.
+
 .. option:: --concurrency <integer>
 
 *(optional)* The number of :ref:`worker threads <arch_overview_threading>` to run. If not
-specified, defaults to the number of hardware threads on the machine. If set to zero, Envoy will
+specified, defaults to the number of hardware threads on the machine. If set to zero, Envoy will
 still run one worker thread.
 
 .. option:: -l <string>, --log-level <string>
@@ -79,9 +93,9 @@ following are the command line options that Envoy supports.
 
 .. option:: --component-log-level <string>
 
-*(optional)* A comma-separated list of logging levels per component. Non-developers should generally
-never set this option. For example, if you want `upstream` component to run at `debug` level and
-`connection` component to run at `trace` level, you should pass ``upstream:debug,connection:trace`` to
+*(optional)* A comma-separated list of logging levels per component. Non-developers should generally
+never set this option. For example, if you want `upstream` component to run at `debug` level and
+`connection` component to run at `trace` level, you should pass ``upstream:debug,connection:trace`` to
 this flag. See ``ALL_LOGGER_IDS`` in :repo:`/source/common/common/logger.h` for a list of components.
 
 .. option:: --cpuset-threads
@@ -239,11 +253,11 @@ following are the command line options that Envoy supports.
 
 .. option:: --drain-time-s <integer>
 
-*(optional)* The time in seconds that Envoy will drain connections during
+*(optional)* The time in seconds that Envoy will drain connections during
 a :ref:`hot restart <arch_overview_hot_restart>` or when individual listeners are being
-modified or removed via :ref:`LDS <arch_overview_dynamic_config_lds>`.
-Defaults to 600 seconds (10 minutes). Generally the drain time should be less than
-the parent shutdown time set via the :option:`--parent-shutdown-time-s` option. How the two
+modified or removed via :ref:`LDS <arch_overview_dynamic_config_lds>`.
+Defaults to 600 seconds (10 minutes). Generally the drain time should be less than
+the parent shutdown time set via the :option:`--parent-shutdown-time-s` option. How the two
 settings are configured depends on the specific deployment. In edge scenarios, it might be
 desirable to have a very long drain time. In service to service scenarios, it might be possible
 to make the drain and shutdown time much shorter (e.g., 60s/90s).
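The cli.rst changes define two rules for the new flags: a dynamic selection supersedes `--base-id`, and `--use-dynamic-base-id` is rejected when `--restart-epoch` is non-zero. The following is a minimal C++ sketch of those rules only, not Envoy's option parser; `BaseIdOptions`, `validateBaseIdOptions`, `BaseIdSource`, and `baseIdSource` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical subset of the command line options documented above.
struct BaseIdOptions {
  uint64_t base_id = 0;
  bool use_dynamic_base_id = false;
  std::string base_id_path;
  uint64_t restart_epoch = 0;
};

// Returns an error message for an invalid combination, or std::nullopt if accepted.
std::optional<std::string> validateBaseIdOptions(const BaseIdOptions& opts) {
  if (opts.use_dynamic_base_id && opts.restart_epoch != 0) {
    // Later epochs must reuse the ID chosen at epoch 0 via --base-id.
    return std::string("--use-dynamic-base-id may only be used when --restart-epoch is 0");
  }
  return std::nullopt;
}

enum class BaseIdSource { Fixed, Dynamic };

// A dynamic selection supersedes any --base-id value.
BaseIdSource baseIdSource(const BaseIdOptions& opts) {
  return opts.use_dynamic_base_id ? BaseIdSource::Dynamic : BaseIdSource::Fixed;
}
```

Keeping the rule as a pure function makes it easy to test in isolation. In the diff itself, the only enforcement visible is the `ASSERT` on `restartEpoch() == 0` in `configureHotRestarter()`.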

docs/root/operations/hot_restarter.rst

+10-3
@@ -21,15 +21,22 @@ The restarter is invoked like so:
 
 ulimit -n {{ pillar.get('envoy_max_open_files', '102400') }}
 sysctl fs.inotify.max_user_watches={{ pillar.get('envoy_max_inotify_watches', '524288') }}
-
+
 exec /usr/sbin/envoy -c /etc/envoy/envoy.cfg --restart-epoch $RESTART_EPOCH --service-cluster {{ grains['cluster_name'] }} --service-node {{ grains['service_node'] }} --service-zone {{ grains.get('ec2_availability-zone', 'unknown') }}
 
 Note on `inotify.max_user_watches`: If Envoy is being configured to watch many files for configuration in a directory
 on a Linux machine, increase this value as Linux enforces limits on the maximum number of files that can be watched.
-
-The *RESTART_EPOCH* environment variable is set by the restarter on each restart and can be passed
+
+The *RESTART_EPOCH* environment variable is set by the restarter on each restart and must be passed
 to the :option:`--restart-epoch` option.
 
+.. warning::
+
+Special care must be taken if you wish to use the :option:`--use-dynamic-base-id` option. That
+flag may only be set when the *RESTART_EPOCH* is 0, and your *start_envoy.sh* must obtain the
+chosen base ID (via :option:`--base-id-path`), store it, and use it as the :option:`--base-id`
+value on subsequent invocations (when *RESTART_EPOCH* is greater than 0).
+
 The restarter handles the following signals:
 
 * **SIGTERM** or **SIGINT** (Ctrl-C): Will cleanly terminate all child processes and exit.
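The warning above implies a two-phase wrapper: at epoch 0, Envoy chooses and records the base ID; every later epoch replays the recorded value. The real *start_envoy.sh* is a shell script that is not part of this commit, so the C++ sketch below only illustrates the decision it must make. `baseIdArgs` is a hypothetical helper, the file path is whatever the wrapper chooses, and `--restart-epoch` plus the other flags would still be appended separately.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: the base-ID-related flags a start wrapper would pass
// to Envoy for a given restart epoch.
std::vector<std::string> baseIdArgs(uint64_t restart_epoch, const std::string& base_id_file) {
  if (restart_epoch == 0) {
    // First start: let Envoy pick an unused base ID and record it in base_id_file.
    return {"--use-dynamic-base-id", "--base-id-path", base_id_file};
  }
  // Later epochs: --use-dynamic-base-id is not allowed, so reuse the recorded ID.
  std::ifstream in(base_id_file);
  uint64_t base_id = 0;
  if (!(in >> base_id)) {
    throw std::runtime_error("cannot read base id from " + base_id_file);
  }
  return {"--base-id", std::to_string(base_id)};
}
```

In the main_common.cc diff, the ID is written with `base_id_out_file << base_id` on a `uint32_t`, so the file holds a plain decimal number. That is why a single `>>` extraction is enough to read it back in this sketch.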

docs/root/version_history/current.rst

+2-1
@@ -13,6 +13,7 @@ Minor Behavior Changes
 *Changes that may cause incompatibilities for some users, but should not for most*
 
 * access loggers: applied existing buffer limits to access logs, as well as :ref:`stats <config_access_log_stats>` for logged / dropped logs. This can be reverted temporarily by setting runtime feature `envoy.reloadable_features.disallow_unbounded_access_logs` to false.
+* hot restart: added the option :option:`--use-dynamic-base-id` to select an unused base ID at startup and the option :option:`--base-id-path` to write the base id to a file (for reuse with later hot restarts).
 * http: fixed several bugs with applying correct connection close behavior across the http connection manager, health checker, and connection pool. This behavior may be temporarily reverted by setting runtime feature `envoy.reloadable_features.fix_connection_close` to false.
 * http: fixed a bug where the upgrade header was not cleared on responses to non-upgrade requests.
 Can be reverted temporarily by setting runtime feature `envoy.reloadable_features.fix_upgrade_response` to false.
@@ -68,7 +69,7 @@ New Features
 * listener: added in place filter chain update flow for tcp listener update which doesn't close connections if the corresponding network filter chain is equivalent during the listener update.
 Can be disabled by setting runtime feature `envoy.reloadable_features.listener_in_place_filterchain_update` to false.
 Also added additional draining filter chain stat for :ref:`listener manager <config_listener_manager_stats>` to track the number of draining filter chains and the number of in place update attempts.
-* logger: added :ref:`--log-format-prefix-with-location <operations_cli>` command line option to prefix '%v' with file path and line number.
+* logger: added :option:`--log-format-prefix-with-location` command line option to prefix '%v' with file path and line number.
 * lrs: added new *envoy_api_field_service.load_stats.v2.LoadStatsResponse.send_all_clusters* field
 in LRS response, which allows management servers to avoid explicitly listing all clusters it is
 interested in; behavior is allowed based on new "envoy.lrs.supports_send_all_clusters" capability

generated_api_shadow/envoy/admin/v3/server_info.proto

+7-1

generated_api_shadow/envoy/admin/v4alpha/server_info.proto

+7-1

include/envoy/server/hot_restart.h

+14
@@ -79,6 +79,11 @@ class HotRestart {
 */
 virtual void shutdown() PURE;
 
+/**
+* Return the base id used to generate a domain socket name.
+*/
+virtual uint32_t baseId() PURE;
+
 /**
 * Return the hot restart compatibility version so that operations code can decide whether to
 * perform a full or hot restart.
@@ -96,5 +101,14 @@ class HotRestart {
 virtual Thread::BasicLockable& accessLogLock() PURE;
 };
 
+/**
+* HotRestartDomainSocketInUseException is thrown during HotRestart construction only when the
+* underlying domain socket is in use.
+*/
+class HotRestartDomainSocketInUseException : public EnvoyException {
+public:
+HotRestartDomainSocketInUseException(const std::string& what) : EnvoyException(what) {}
+};
+
 } // namespace Server
 } // namespace Envoy

include/envoy/server/options.h

+11
@@ -59,6 +59,17 @@ class Options {
 */
 virtual uint64_t baseId() const PURE;
 
+/**
+* @return bool whether to choose an unused base ID dynamically. The chosen base id can be written
+* to a file using the baseIdPath option.
+*/
+virtual bool useDynamicBaseId() const PURE;
+
+/**
+* @return const std::string& the path of the base id output file.
+*/
+virtual const std::string& baseIdPath() const PURE;
+
 /**
 * @return the number of worker threads to run in the server.
 */

source/exe/main_common.cc

+57-8
@@ -1,12 +1,14 @@
 #include "exe/main_common.h"
 
+#include <fstream>
 #include <iostream>
 #include <memory>
 #include <new>
 
 #include "envoy/config/listener/v3/listener.pb.h"
 
 #include "common/common/compiler_requirements.h"
+#include "common/common/logger.h"
 #include "common/common/perf_annotation.h"
 #include "common/network/utility.h"
 #include "common/stats/symbol_table_creator.h"
@@ -58,14 +60,7 @@ MainCommonBase::MainCommonBase(const OptionsImpl& options, Event::TimeSystem& ti
 switch (options_.mode()) {
 case Server::Mode::InitOnly:
 case Server::Mode::Serve: {
-#ifdef ENVOY_HOT_RESTART
-if (!options.hotRestartDisabled()) {
-restarter_ = std::make_unique<Server::HotRestartImpl>(options_);
-}
-#endif
-if (restarter_ == nullptr) {
-restarter_ = std::make_unique<Server::HotRestartNopImpl>();
-}
+configureHotRestarter(*random_generator);
 
 tls_ = std::make_unique<ThreadLocal::InstanceImpl>();
 Thread::BasicLockable& log_lock = restarter_->logLock();
@@ -106,6 +101,60 @@ void MainCommonBase::configureComponentLogLevels() {
 }
 }
 
+void MainCommonBase::configureHotRestarter(Runtime::RandomGenerator& random_generator) {
+#ifdef ENVOY_HOT_RESTART
+if (!options_.hotRestartDisabled()) {
+uint32_t base_id = options_.baseId();
+
+if (options_.useDynamicBaseId()) {
+ASSERT(options_.restartEpoch() == 0, "cannot use dynamic base id during hot restart");
+
+std::unique_ptr<Server::HotRestart> restarter;
+
+// Try 100 times to get an unused base ID and then give up under the assumption
+// that some other problem has occurred to prevent binding the domain socket.
+for (int i = 0; i < 100 && restarter == nullptr; i++) {
+// HotRestartImpl is going to multiply this value by 10, so leave head room.
+base_id = static_cast<uint32_t>(random_generator.random()) & 0x0FFFFFFF;
+
+try {
+restarter = std::make_unique<Server::HotRestartImpl>(base_id, 0);
+} catch (Server::HotRestartDomainSocketInUseException& ex) {
+// No luck, try again.
+ENVOY_LOG_MISC(debug, "dynamic base id: {}", ex.what());
+}
+}
+
+if (restarter == nullptr) {
+throw EnvoyException("unable to select a dynamic base id");
+}
+
+restarter_.swap(restarter);
+} else {
+restarter_ = std::make_unique<Server::HotRestartImpl>(base_id, options_.restartEpoch());
+}
+
+// Write the base-id to the requested path whether we selected it
+// dynamically or not.
+if (!options_.baseIdPath().empty()) {
+std::ofstream base_id_out_file(options_.baseIdPath());
+if (!base_id_out_file) {
+ENVOY_LOG_MISC(critical, "cannot open base id output file {} for writing.",
+options_.baseIdPath());
+} else {
+base_id_out_file << base_id;
+}
+}
+}
+#else
+UNREFERENCED_PARAMETER(random_generator);
+#endif
+
+if (restarter_ == nullptr) {
+restarter_ = std::make_unique<Server::HotRestartNopImpl>();
+}
+}
+
 bool MainCommonBase::run() {
 switch (options_.mode()) {
 case Server::Mode::Serve:
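`configureHotRestarter()` combines three ideas: masking random values to leave headroom for the later multiplication by 10, retrying only on the dedicated `HotRestartDomainSocketInUseException`, and giving up after 100 attempts. The self-contained sketch below reproduces that loop outside Envoy under stated assumptions. `SocketInUseError` stands in for the Envoy exception. `try_bind` is a hypothetical callback standing in for constructing `Server::HotRestartImpl`. `std::mt19937` replaces `Runtime::RandomGenerator`.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <random>
#include <stdexcept>
#include <string>

// Stand-in for Server::HotRestartDomainSocketInUseException: signals that the
// candidate ID's domain socket is already bound by another process.
class SocketInUseError : public std::runtime_error {
public:
  using std::runtime_error::runtime_error;
};

// The selected ID is later multiplied by 10, so keep only the low 28 bits.
constexpr uint32_t kBaseIdMask = 0x0FFFFFFF;
static_assert(static_cast<uint64_t>(kBaseIdMask) * 10 <= std::numeric_limits<uint32_t>::max(),
              "masked base id * 10 must fit in uint32_t");

// Hypothetical helper: tries up to max_attempts random IDs. try_bind must throw
// SocketInUseError when the ID is taken; any other exception escapes the loop.
uint32_t selectDynamicBaseId(std::mt19937& rng, const std::function<void(uint32_t)>& try_bind,
                             int max_attempts = 100) {
  for (int i = 0; i < max_attempts; i++) {
    const uint32_t candidate = static_cast<uint32_t>(rng()) & kBaseIdMask;
    try {
      try_bind(candidate);
      return candidate;  // The socket for this ID was free.
    } catch (const SocketInUseError&) {
      // Collision with another running process: try another random ID.
    }
  }
  throw std::runtime_error("unable to select a dynamic base id");
}
```

Because only the narrow exception type is caught, any other construction failure propagates immediately instead of being retried 100 times. This matches the header comment that the exception is thrown "only when the underlying domain socket is in use".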

source/exe/main_common.h

+1
@@ -87,6 +87,7 @@ class MainCommonBase {
 
 private:
 void configureComponentLogLevels();
+void configureHotRestarter(Runtime::RandomGenerator& random_generator);
 };
 
 // TODO(jmarantz): consider removing this class; I think it'd be more useful to
