Commit 77efb47

admin: Support /drain_listeners?graceful (envoyproxy#11639)
Calling /drain_listeners?graceful will trigger the drain manager drain sequence prior to closing listeners.

Risk Level: Low
Testing: Tested that connections are terminated on request complete during the graceful drain period, that new connections can still be opened, H1/H2-specific response behaviour.
Docs Changes: Add docs to admin.rst, improve the overall drain sequence documentation.

Signed-off-by: Auni Ahsan <[email protected]>
1 parent f190f05 commit 77efb47

8 files changed

+160
-14
lines changed

docs/root/intro/arch_overview/operations/draining.rst

+29 -13
@@ -3,16 +3,42 @@
 Draining
 ========

-Draining is the process by which Envoy attempts to gracefully shed connections in response to
-various events. Draining occurs at the following times:
+In a few different scenarios, Envoy will attempt to gracefully shed connections. For instance,
+during server shutdown, existing requests can be discouraged and listeners set to stop accepting,
+to reduce the number of open connections when the server shuts down. Draining behaviour is defined
+by the server options in addition to individual listener configs.

+Draining occurs at the following times:
+
+* The server is being :ref:`hot restarted <arch_overview_hot_restart>`.
+* The server begins the graceful drain sequence via the :ref:`drain_listeners?graceful
+  <operations_admin_interface_drain>` admin endpoint.
 * The server has been manually health check failed via the :ref:`healthcheck/fail
   <operations_admin_interface_healthcheck_fail>` admin endpoint. See the :ref:`health check filter
   <arch_overview_health_checking_filter>` architecture overview for more information.
-* The server is being :ref:`hot restarted <arch_overview_hot_restart>`.
 * Individual listeners are being modified or removed via :ref:`LDS
   <arch_overview_dynamic_config_lds>`.

+By default, the Envoy server will close listeners immediately on server shutdown. To drain listeners
+for some duration of time prior to server shutdown, use :ref:`drain_listeners <operations_admin_interface_drain>`
+before shutting down the server. The listeners will be directly stopped without any graceful draining behaviour,
+and cease accepting new connections immediately.
+
+To add a graceful drain period prior to listeners being closed, use the query parameter
+:ref:`drain_listeners?graceful <operations_admin_interface_drain>`. By default, Envoy
+will discourage requests for some period of time (as determined by :option:`--drain-time-s`).
+The behaviour of request discouraging is determined by the drain manager.
+
+Note that although draining is a per-listener concept, it must be supported at the network filter
+level. Currently the only filters that support graceful draining are
+:ref:`Redis <config_network_filters_redis_proxy>`,
+:ref:`Mongo <config_network_filters_mongo_proxy>`,
+and :ref:`HTTP connection manager <config_http_conn_man>`.
+
+By default, the :ref:`HTTP connection manager <config_http_conn_man>` filter will
+add "Connection: close" to HTTP1 requests, send HTTP2 GOAWAY, and terminate connections
+on request completion (after the delayed close period).
+
 Each :ref:`configured listener <arch_overview_listeners>` has a :ref:`drain_type
 <envoy_v3_api_enum_config.listener.v3.Listener.DrainType>` setting which controls when draining takes place. The currently
 supported values are:
@@ -27,13 +53,3 @@ modify_only
 It may be desirable to set *modify_only* on egress listeners so they only drain during
 modifications while relying on ingress listener draining to perform full server draining when
 attempting to do a controlled shutdown.
-
-Note that although draining is a per-listener concept, it must be supported at the network filter
-level. Currently the only filters that support graceful draining are
-:ref:`HTTP connection manager <config_http_conn_man>`,
-:ref:`Redis <config_network_filters_redis_proxy>`, and
-:ref:`Mongo <config_network_filters_mongo_proxy>`.
-
-Listeners can also be stopped via :ref:`drain_listeners <operations_admin_interface_drain>`. In this case,
-they are directly stopped (without going through the actual draining process) on worker threads,
-so that they will not accept any new requests.

docs/root/operations/admin.rst

+6
@@ -258,6 +258,12 @@ modify different aspects of the server:
 :ref:`Listener <envoy_v3_api_msg_config.listener.v3.Listener>` is used to determine whether a listener
 is inbound or outbound.

+.. http:post:: /drain_listeners?graceful
+
+When draining listeners, enter a graceful drain period prior to closing listeners.
+This behaviour and duration is configurable via server options or CLI
+(:option:`--drain-time-s` and :option:`--drain-strategy`).
+
 .. attention::

 This operation directly stops the matched listeners on workers. Once listeners in a given

include/envoy/server/drain_manager.h

+5
@@ -21,6 +21,11 @@ class DrainManager : public Network::DrainDecision {
    */
   virtual void startDrainSequence(std::function<void()> drain_complete_cb) PURE;

+  /**
+   * @return whether the drain sequence has started.
+   */
+  virtual bool draining() const PURE;
+
   /**
    * Invoked in the newly launched primary process to begin the parent shutdown sequence. At the end
    * of the sequence the previous primary process will be terminated.

source/server/admin/listeners_handler.cc

+15 -1
@@ -16,10 +16,24 @@ ListenersHandler::ListenersHandler(Server::Instance& server) : HandlerContextBas
 Http::Code ListenersHandler::handlerDrainListeners(absl::string_view url, Http::ResponseHeaderMap&,
                                                    Buffer::Instance& response, AdminStream&) {
   const Http::Utility::QueryParams params = Http::Utility::parseQueryString(url);
+
   ListenerManager::StopListenersType stop_listeners_type =
       params.find("inboundonly") != params.end() ? ListenerManager::StopListenersType::InboundOnly
                                                  : ListenerManager::StopListenersType::All;
-  server_.listenerManager().stopListeners(stop_listeners_type);
+
+  const bool graceful = params.find("graceful") != params.end();
+  if (graceful) {
+    // Ignore calls to /drain_listeners?graceful if the drain sequence has
+    // already started.
+    if (!server_.drainManager().draining()) {
+      server_.drainManager().startDrainSequence([this, stop_listeners_type]() {
+        server_.listenerManager().stopListeners(stop_listeners_type);
+      });
+    }
+  } else {
+    server_.listenerManager().stopListeners(stop_listeners_type);
+  }
+
   response.add("OK\n");
   return Http::Code::OK;
 }
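
The control flow of the handler above can be exercised outside Envoy with a small self-contained sketch. Everything here is a hypothetical stand-in: `FakeDrainManager`, `FakeListenerManager`, `hasQueryFlag`, and the `finishDrain` hook are illustrative names, not Envoy classes or APIs. Combining `graceful` with `inboundonly` in one URL is an inference from the two independent `params.find` checks in the diff, not something the commit tests.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are NOT Envoy's real classes.
enum class StopListenersType { InboundOnly, All };

class FakeDrainManager {
public:
  bool draining() const { return draining_; }

  // Marks the drain sequence as started and remembers the completion callback.
  void startDrainSequence(std::function<void()> drain_complete_cb) {
    draining_ = true;
    drain_complete_cb_ = std::move(drain_complete_cb);
  }

  // Assumed test hook: simulates the drain period elapsing.
  void finishDrain() {
    std::function<void()> cb;
    cb.swap(drain_complete_cb_);
    if (cb) {
      cb();
    }
  }

private:
  bool draining_ = false;
  std::function<void()> drain_complete_cb_;
};

struct FakeListenerManager {
  int stop_calls = 0;
  StopListenersType last_type = StopListenersType::All;
  void stopListeners(StopListenersType type) {
    ++stop_calls;
    last_type = type;
  }
};

// Returns true if `flag` appears as a key in the query string of `url`.
bool hasQueryFlag(const std::string& url, const std::string& flag) {
  const std::string::size_type q = url.find('?');
  if (q == std::string::npos) {
    return false;
  }
  std::string::size_type start = q + 1;
  while (start <= url.size()) {
    std::string::size_type end = url.find('&', start);
    if (end == std::string::npos) {
      end = url.size();
    }
    const std::string pair = url.substr(start, end - start);
    if (pair.substr(0, pair.find('=')) == flag) {
      return true;
    }
    start = end + 1;
  }
  return false;
}

// Mirrors the branch structure of handlerDrainListeners in the diff above.
void handleDrainListeners(const std::string& url, FakeDrainManager& dm,
                          FakeListenerManager& lm) {
  const StopListenersType type = hasQueryFlag(url, "inboundonly")
                                     ? StopListenersType::InboundOnly
                                     : StopListenersType::All;
  if (hasQueryFlag(url, "graceful")) {
    // A repeated graceful request is ignored once the sequence has started.
    if (!dm.draining()) {
      dm.startDrainSequence([&lm, type]() { lm.stopListeners(type); });
    }
  } else {
    // Without ?graceful, listeners are stopped right away.
    lm.stopListeners(type);
  }
}
```

In this sketch a graceful request only schedules the stop; listeners are stopped when the fake drain period completes, while a plain `/drain_listeners` request stops them immediately even if a graceful drain is already in progress.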

source/server/drain_manager_impl.h

+1
@@ -28,6 +28,7 @@ class DrainManagerImpl : Logger::Loggable<Logger::Id::main>, public DrainManager

   // Server::DrainManager
   void startDrainSequence(std::function<void()> drain_complete_cb) override;
+  bool draining() const override { return draining_; }
   void startParentShutdownSequence() override;

 private:

test/integration/drain_close_integration_test.cc

+100
@@ -75,6 +75,106 @@ TEST_P(DrainCloseIntegrationTest, DrainCloseImmediate) {

 TEST_P(DrainCloseIntegrationTest, AdminDrain) { testAdminDrain(downstreamProtocol()); }

+TEST_P(DrainCloseIntegrationTest, AdminGracefulDrain) {
+  drain_strategy_ = Server::DrainStrategy::Immediate;
+  drain_time_ = std::chrono::seconds(999);
+  initialize();
+  fake_upstreams_[0]->set_allow_unexpected_disconnects(true);
+  uint32_t http_port = lookupPort("http");
+  codec_client_ = makeHttpConnection(http_port);
+
+  auto response = codec_client_->makeHeaderOnlyRequest(default_request_headers_);
+  waitForNextUpstreamRequest(0);
+  upstream_request_->encodeHeaders(default_response_headers_, true);
+  response->waitForEndStream();
+  ASSERT_TRUE(response->complete());
+  EXPECT_THAT(response->headers(), Http::HttpStatusIs("200"));
+  // The request is completed but the connection remains open.
+  EXPECT_TRUE(codec_client_->connected());
+
+  // Invoke /drain_listeners with graceful drain
+  BufferingStreamDecoderPtr admin_response = IntegrationUtil::makeSingleRequest(
+      lookupPort("admin"), "POST", "/drain_listeners?graceful", "", downstreamProtocol(), version_);
+  EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");
+
+  // With a 999s graceful drain period, the listener should still be open.
+  EXPECT_EQ(test_server_->counter("listener_manager.listener_stopped")->value(), 0);
+
+  response = codec_client_->makeHeaderOnlyRequest(default_request_headers_);
+  waitForNextUpstreamRequest(0);
+  upstream_request_->encodeHeaders(default_response_headers_, true);
+  response->waitForEndStream();
+  ASSERT_TRUE(response->complete());
+  EXPECT_THAT(response->headers(), Http::HttpStatusIs("200"));
+
+  // Connections will terminate on request complete
+  ASSERT_TRUE(codec_client_->waitForDisconnect());
+  if (downstream_protocol_ == Http::CodecClient::Type::HTTP2) {
+    EXPECT_TRUE(codec_client_->sawGoAway());
+  } else {
+    EXPECT_EQ("close", response->headers().getConnectionValue());
+  }
+
+  // New connections can still be made.
+  auto second_codec_client_ = makeRawHttpConnection(makeClientConnection(http_port));
+  EXPECT_TRUE(second_codec_client_->connected());
+
+  // Invoke /drain_listeners and shut down listeners.
+  second_codec_client_->rawConnection().close(Network::ConnectionCloseType::NoFlush);
+  admin_response = IntegrationUtil::makeSingleRequest(
+      lookupPort("admin"), "POST", "/drain_listeners", "", downstreamProtocol(), version_);
+  EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");
+
+  test_server_->waitForCounterEq("listener_manager.listener_stopped", 1);
+  EXPECT_NO_THROW(Network::TcpListenSocket(
+      Network::Utility::getAddressWithPort(*Network::Test::getCanonicalLoopbackAddress(version_),
+                                           http_port),
+      nullptr, true));
+}
+
+TEST_P(DrainCloseIntegrationTest, RepeatedAdminGracefulDrain) {
+  // Use the default gradual probabilistic DrainStrategy so drainClose()
+  // behaviour isn't conflated with whether the drain sequence has started.
+  drain_time_ = std::chrono::seconds(999);
+  initialize();
+  fake_upstreams_[0]->set_allow_unexpected_disconnects(true);
+  uint32_t http_port = lookupPort("http");
+  codec_client_ = makeHttpConnection(http_port);
+
+  auto response = codec_client_->makeHeaderOnlyRequest(default_request_headers_);
+  waitForNextUpstreamRequest(0);
+  upstream_request_->encodeHeaders(default_response_headers_, true);
+  response->waitForEndStream();
+
+  // Invoke /drain_listeners with graceful drain
+  BufferingStreamDecoderPtr admin_response = IntegrationUtil::makeSingleRequest(
+      lookupPort("admin"), "POST", "/drain_listeners?graceful", "", downstreamProtocol(), version_);
+  EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");
+  EXPECT_EQ(test_server_->counter("listener_manager.listener_stopped")->value(), 0);
+
+  admin_response = IntegrationUtil::makeSingleRequest(
+      lookupPort("admin"), "POST", "/drain_listeners?graceful", "", downstreamProtocol(), version_);
+  EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");
+  EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");
+
+  response = codec_client_->makeHeaderOnlyRequest(default_request_headers_);
+  waitForNextUpstreamRequest(0);
+  upstream_request_->encodeHeaders(default_response_headers_, true);
+  response->waitForEndStream();
+  ASSERT_TRUE(response->complete());
+  EXPECT_THAT(response->headers(), Http::HttpStatusIs("200"));
+
+  admin_response = IntegrationUtil::makeSingleRequest(
+      lookupPort("admin"), "POST", "/drain_listeners", "", downstreamProtocol(), version_);
+  EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");
+
+  test_server_->waitForCounterEq("listener_manager.listener_stopped", 1);
+  EXPECT_NO_THROW(Network::TcpListenSocket(
+      Network::Utility::getAddressWithPort(*Network::Test::getCanonicalLoopbackAddress(version_),
+                                           http_port),
+      nullptr, true));
+}
+
 INSTANTIATE_TEST_SUITE_P(Protocols, DrainCloseIntegrationTest,
                          testing::ValuesIn(HttpProtocolIntegrationTest::getProtocolTestParams(
                              {Http::CodecClient::Type::HTTP1, Http::CodecClient::Type::HTTP2},

test/mocks/server/mocks.h

+1
@@ -192,6 +192,7 @@ class MockDrainManager : public DrainManager {

   // Server::DrainManager
   MOCK_METHOD(bool, drainClose, (), (const));
+  MOCK_METHOD(bool, draining, (), (const));
   MOCK_METHOD(void, startDrainSequence, (std::function<void()> completion));
   MOCK_METHOD(void, startParentShutdownSequence, ());


test/server/drain_manager_impl_test.cc

+3
@@ -126,7 +126,10 @@ TEST_P(DrainManagerImplTest, DrainDeadlineProbability) {
   EXPECT_TRUE(drain_manager.drainClose());
   EXPECT_CALL(server_, healthCheckFailed()).WillRepeatedly(Return(false));
   EXPECT_FALSE(drain_manager.drainClose());
+  EXPECT_FALSE(drain_manager.draining());
+
   drain_manager.startDrainSequence([] {});
+  EXPECT_TRUE(drain_manager.draining());

   if (drain_gradually) {
     // random() should be called when elapsed time < drain timeout
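
The tests distinguish `draining()` (has the sequence started) from `drainClose()` (should this connection be closed now), and refer to an immediate and a "gradual probabilistic" strategy. The following is a minimal sketch of how such a decision could be made, assuming a close probability that grows roughly linearly with elapsed drain time. The function `shouldDrainClose` and its parameters are hypothetical, and the random value is passed in so the behaviour is deterministic; this is not taken from this commit and may differ from Envoy's `DrainManagerImpl`.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical names for illustration only.
enum class DrainStrategy { Immediate, Gradual };

// Decides whether a connection should be discouraged/closed right now.
// `random_value` stands in for a random number source.
bool shouldDrainClose(bool draining, DrainStrategy strategy, std::chrono::seconds elapsed,
                      std::chrono::seconds drain_time, uint64_t random_value) {
  if (!draining) {
    // The drain sequence has not started, so nothing is closed.
    return false;
  }
  if (strategy == DrainStrategy::Immediate) {
    // Immediate: every connection is closed as soon as draining starts.
    return true;
  }
  if (drain_time.count() <= 0 || elapsed >= drain_time) {
    // Past the deadline (or no drain period at all): always close.
    return true;
  }
  if (elapsed.count() <= 0) {
    return false;
  }
  // Gradual: close with probability of roughly elapsed / drain_time.
  return static_cast<uint64_t>(elapsed.count()) >
         random_value % static_cast<uint64_t>(drain_time.count());
}
```

Under this assumption, a 999 second drain period with the gradual strategy makes a close very unlikely right after the sequence starts, which is consistent with RepeatedAdminGracefulDrain being able to complete another request on the same connection after the graceful drain has begun.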
