add: use SO_REUSEPORT on platform supporting it #4703
c252511 to 27c16d7
This is failing all tests; I'd suggest opening this type of PR as a draft.
Hmm... it worked before the latest push. Will switch to draft and fix.
@steve-chavez - it looks like the issue is that on my machine PostgREST startup is fast enough that it loads the schema cache before it accepts any requests. Here in CI the new instance fails with […]. The question: is there any particular reason why we return […]?
We were aiming to have requests wait instead of returning a 503; this waiting does happen during schema cache reload but not on startup. We discussed this in #4129. Would it be better to not listen on the socket? How would clients behave in this case?
They would get a "connection refused" error, which means nobody is there and is IMO more confusing than any 5xx error. UX-wise I would prefer some waiting to a presumably hard fail any day.
I am not convinced, see below. This is a complex topic so let's dig into it a little more. The startup sequence right now is:
The alternatives are:
b - listening on a socket only after the schema cache is loaded
So from the point of view of the clients (they don't know when PostgREST was started), we have 3 alternatives:
I am not sure what value clients get from the first two options compared to the third one. Diagnostics and readiness checks should be done using […] In case of […]
So the first two options cause disruptions whereas the third one is fully zero-downtime and transparent to the clients. My take on it would be:
This would require splitting binding from listening on the main socket (i.e. we need to bind without listening first so that we can pass the socket to the admin server). @steve-chavez @develop7 thoughts?
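To make the bind/listen split concrete, here is a minimal Python sketch (not PostgREST's Haskell code). It assumes Linux `SO_REUSEPORT` semantics, and the names `bind_reuseport` and `demo` are hypothetical. The "new" socket is bound to the shared port but does not listen yet, so a client connecting in the meantime is still served by the "old" listener rather than being refused.

```python
import socket


def bind_reuseport(host: str, port: int) -> socket.socket:
    """Create a TCP socket bound with SO_REUSEPORT, but not yet listening."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(). AttributeError if Python does not expose the
    # constant on this platform, OSError if the kernel rejects the option.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    return sock


def demo() -> bool:
    # "Old" instance: bound and listening on an OS-assigned port.
    old = bind_reuseport("127.0.0.1", 0)
    old.listen(16)
    port = old.getsockname()[1]

    # "New" instance: bound to the same port, deliberately not listening yet
    # (think: schema cache still loading).
    new = bind_reuseport("127.0.0.1", port)

    # A client connecting now is accepted by the old instance.
    client = socket.create_connection(("127.0.0.1", port), timeout=2)
    old.settimeout(2)
    conn, _ = old.accept()
    served_by_old = conn.getpeername() == client.getsockname()

    # Once ready, the new instance starts listening and joins the port group.
    new.listen(16)

    for s in (conn, client, old, new):
        s.close()
    return served_by_old
```

Calling `demo()` on a Linux machine exercises the whole sequence. Behaviour on other operating systems may differ, which is one reason the option is only used where the platform supports it.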
Dependent on resolving (or having a workaround for) yesodweb/wai#853
64bb6ca to 2cb6503
Agree, sounds much better. |
7d7375a to bd4c7ec
Conflicted in the changelog. |

Zero-Downtime Upgrades
======================

:author: `mkleczek <https://github.com/mkleczek>`_

We've been doing this for almost all how-tos:

The TCP port to bind the web server. Use ``0`` to automatically assign a port.

On operating systems that support ``SO_REUSEPORT``, you can start multiple

Let's put a heading and anchor here so we can link it from other places

.. _reuseport:

SO_REUSEPORT
~~~~~~~~~~~~~

On operating systems that support ``SO_REUSEPORT``, you can start multiple

0e7d4aa to 680c6e1
c12c6fd to bec258c
5641f65 to 71ce2e3

host=None,
wait_for=Admin.ready,
-wait_max_seconds=1,
+wait_max_seconds=3,

Why did we increase the default? Could this be done for particular tests instead?
It was somewhat flaky on my machine. Can roll it back if needed.
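
For reference, a readiness wait with a deadline can be sketched as follows. This is not the test suite's actual helper: `wait_until`, `probe`, `max_seconds` and `interval` are hypothetical names. It only illustrates why a short limit can be flaky on a slow machine and how a longer limit could be passed by individual tests rather than changed globally.

```python
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], max_seconds: float, interval: float = 0.05) -> bool:
    """Poll probe() until it returns True or max_seconds elapse."""
    deadline = time.monotonic() + max_seconds
    while True:
        try:
            if probe():
                return True
        except OSError:
            # For example, connection refused while the server is still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test that is known to start slowly could then call `wait_until(probe, max_seconds=3)` while the shared default stays at 1 second.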

When running multiple PostgREST instances on the same :ref:`server-port`, use
a different ``admin-server-port`` for each instance. Admin ports are not shared
between instances, so readiness checks always target one specific PostgREST
instance.

I'd suggest putting all these paragraphs under the reuseport section, otherwise it's kinda hard to hunt them down.
Added this paragraph on my suggestion: https://github.com/PostgREST/postgrest/pull/4703/changes#r3470538404
Can be deleted from here if you agree

When :ref:`server-reuseport` is enabled on an operating system that supports
``SO_REUSEPORT``, you can start multiple PostgREST instances on the same
:ref:`server-host` and ``server-port``. For example, two PostgREST processes
can use the same configuration:

.. code:: ini

  server-host = "127.0.0.1"
  server-port = 3000
  server-reuseport = true

New connections are then distributed by the operating system between the
running PostgREST processes. This can be used to start a replacement process
before stopping the old one, or to run several PostgREST processes behind one
port.

If ``server-reuseport`` is disabled, starting another PostgREST process on
the same host and port will fail with the usual address-in-use error.

.. _server-reuseport:

server-reuseport
----------------

  =============== =================================
  **Type**        Bool
  **Default**     false
  **Reloadable**  N
  **Environment** PGRST_SERVER_REUSEPORT
  **In-Database** `n/a`
  =============== =================================

Enables ``SO_REUSEPORT`` on the TCP server socket. This allows multiple
PostgREST processes to bind to the same :ref:`server-host` and
:ref:`server-port` when the operating system supports it.

Enabling this setting on an operating system that does not support
``SO_REUSEPORT`` is a configuration error. PostgREST will fail to start
instead of falling back to a normal TCP socket.

This setting does not apply when :ref:`server-unix-socket` is used.

Ditto here, maybe like:

.. _server-reuseport:

server-reuseport
----------------

  =============== =================================
  **Type**        Bool
  **Default**     false
  **Reloadable**  N
  **Environment** PGRST_SERVER_REUSEPORT
  **In-Database** `n/a`
  =============== =================================

Enables ``SO_REUSEPORT`` on the TCP server socket. This allows multiple
PostgREST processes to bind to the same :ref:`server-host` and
:ref:`server-port` when the operating system supports it.

For example, two PostgREST processes can use the same configuration:

.. code:: ini

  server-host = "127.0.0.1"
  server-port = 3000
  server-reuseport = true

New connections are then distributed by the operating system between the
running PostgREST processes. This can be used to start a replacement process
before stopping the old one, or to run several PostgREST processes behind one
port.

Use a different ``admin-server-port`` for each instance. Admin ports are not shared
between instances:

- Readiness checks always target one specific PostgREST instance.
- Give each instance a different :ref:`admin-server-port`, otherwise the new instance will fail to start.

Enabling this setting on an operating system that does not support
``SO_REUSEPORT`` is a configuration error. PostgREST will fail to start
instead of falling back to a normal TCP socket.

This setting does not apply when :ref:`server-unix-socket` is used.


Multiple PostgREST instances can share the same public API host and port when
:ref:`server-reuseport` is enabled on operating systems that support
``SO_REUSEPORT``. Admin ports are not shared: give each instance a different
:ref:`admin-server-port`, otherwise the new instance will fail to start.

Included this in https://github.com/PostgREST/postgrest/pull/4703/changes#r3470538404. To have everything in one place.

If the machine has multiple network interfaces, configure concrete
:ref:`server-host` and :ref:`admin-server-host` values when you need health
checks to target a specific process. Avoid special values (``!4``, ``*``, etc)
in this case because the health check could report a false positive.

This doesn't look related to this feature?
> This doesn't look related to this feature?

Not directly, but I added it because it is important in this case: we have multiple PostgREST processes running at the same time and it is easy to target the wrong one with health checks.
Right, makes sense. But it feels a bit out of place here. I believe it should go inside the server-reuseport section in the config.
So let me see whether I understand the problem this tries to hint at: I turn on server-reuseport. I leave server-host at the default of !4. This automatically applies to admin-server-host as well, I think. I now accidentally set the admin-server-port to the same value for both instances. According to the note further up, I would expect this to fail, because the same port for the admin server is used.
But it doesn't, because it's using a different interface. So I run two admin servers on the same port, but on different interfaces. Now, things start to break.
Is this what you had in mind?
If yes, I feel like it fits right in here. But it should be framed more as an exception to the above rule ("admin servers on the same port will fail to start").
If not, please elaborate.
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.
Fixes #4694
Stacked on top of #4702: it is not enough to start a new instance; it is also necessary not to fail in-flight requests on the old instance.