Debug logs for port bindings when Docker deploy image fails #761

Open
wants to merge 1 commit into
base: main

MadLittleMods
Collaborator

@MadLittleMods MadLittleMods commented Jan 29, 2025

Debug logs for port bindings when Docker deploy image fails.

This stems from the bind: address already in use failures we have been seeing in CI and will hopefully give a clearer picture of which ports are already bound when a deploy fails.

Example bind: address already in use error:

Deploy: Deploy returned error Deploy: Failed to deploy image {Containers:-1 Created:1738100303 ID:sha256:4d01dfad2d62669c5784270111944528031170173ddc0b66d96f58ae700812d5 Labels:map[complement_blueprint:5_servers complement_context:sidecar_tests.5_servers.hs4 complement_hs_name:hs4 complement_pkg:sidecar_tests gitsha1:68e27e962c946074d6f9e18e98e55b3fe4f4c128 org.opencontainers.image.documentation:https://github.com/element-hq/synapse/blob/master/docker/README.md org.opencontainers.image.licenses:AGPL-3.0-or-later org.opencontainers.image.source:https://github.com/element-hq/synapse.git org.opencontainers.image.url:https://matrix.org/docs/projects/server/synapse] ParentID:sha256:2ae79e697d44ba986e3369e5a5197328e7b9638fa20102c9ca259e885ae71eea RepoDigests:[] RepoTags:[localhost/complement:sidecar_tests.5_servers.hs4] SharedSize:-1 Size:1400625929 VirtualSize:0} : Error response from daemon: driver failed programming external connectivity on endpoint complement_sidecar_tests_3_sidecar_tests.5_servers.hs4_2 (fc1bbb210acbcb5c13e64dc1336b36e914f4c8a5069db8d0ded1e12f0f26f10f): Error starting userland proxy: listen tcp4 127.0.0.1:33090: bind: address already in use

Example logs that this PR will output

$ COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestFederationRoomsInvite
...

============== While deploying complement_fed_1_fed.2_servers.hs2_2 : START ALL COMPLEMENT DOCKER PORT BINDINGS ==============
Container: 6c4665d6a12f063f0a0c0707fe826ac7bb1ef7642d9cc2b8189206c28aa30c12: [/complement_fed_1_fed.2_servers.hs1_1]
    (host) -> (container)
    0.0.0.0:33494 -> 8009/tcp
    0.0.0.0:33495 -> 8080/tcp
    127.0.0.1:33107 -> 8448/tcp
    127.0.0.1:33106 -> 8008/tcp
Container: 8cb4e7a3725d291a3bb6762d7ac2af4cf6f66be64b8947542f8d299e4a01b645: [/complement_fed_1_fed.2_servers.hs2_2]
    (host) -> (container)
    127.0.0.1:33105 -> 8448/tcp
    127.0.0.1:33104 -> 8008/tcp
    0.0.0.0:33492 -> 8009/tcp
    0.0.0.0:33493 -> 8080/tcp
Container: f3f21372d83bc31bd17ccd1636c1084d9187fad9bcb6ce987348d01576ee0406: [/complement_fed.2_servers.hs2]
    (host) -> (container)
Container: aaa75bed21d55804aead3ab5bb10695803bb5c0a1f72d9b892f99d02d2548659: [/complement_fed.2_servers.hs1]
    (host) -> (container)
=============== While deploying complement_fed_1_fed.2_servers.hs2_2 : END ALL COMPLEMENT DOCKER PORT BINDINGS ===============

What's the root cause?

I suspect that we are running into bugs in Docker's port allocation code, since Complement deploys the images in parallel (I think).

Compounding the problem, the out-of-repo Complement image that we're using exposes many more ports than usual, and Complement deploys images with PublishAllPorts: true, which creates a host port binding for every exposed port.

Bugs in this part of Docker are not uncommon; just search for "address already in use" in the Docker codebase. For example, moby/moby#48274 has some smoke.


The first layer of parallelism that we can control is multiple images being deployed within a single deploy, which I think we can serialize by adding a mutex around the deploy step.

The second layer is multiple test packages running in parallel (controlled by -p n), which is probably harder to address.

$ go help build
...

-p n
    the number of programs, such as build commands or
    test binaries, that can be run in parallel.
    The default is GOMAXPROCS, normally the number of CPUs available.

Pull Request Checklist

Signed-off-by: Eric Eastwood [email protected]

@MadLittleMods MadLittleMods marked this pull request as ready for review January 29, 2025 17:56
@MadLittleMods MadLittleMods requested review from kegsay and a team as code owners January 29, 2025 17:56