IPv6 address handling broken in LAN <-> WAN join flooder #22225

Open
tkren opened this issue Mar 14, 2025 · 0 comments · May be fixed by #22226

tkren commented Mar 14, 2025

Overview of the Issue

When `advertise_addr` is an IPv6 address, Consul WAN federation breaks and we get a constant stream of warnings that the other server nodes cannot be resolved:

```
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]: 2025-03-14T23:12:00.837Z [WARN]  agent.server.memberlist.wan: memberlist: Failed to resolve i-02a3c94b46768b01f.aws-eu-west-1/2a05:d018:18a3:f202:64d9:1621:ab22:df45:8302: lookup 2a05:d018:18a3:f202:64d9:1621:ab22:df45:8302: no such host
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]: 2025-03-14T23:12:00.837Z [DEBUG] agent.server: Failed to flood-join server at address: server=i-02a3c94b46768b01f.aws-eu-west-1 address=2a05:d018:18a3:f202:64d9:1621:ab22:df45:8302
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:   error=
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:   | 1 error occurred:
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:   | \t* Failed to resolve i-02a3c94b46768b01f.aws-eu-west-1/2a05:d018:18a3:f202:64d9:1621:ab22:df45:8302: lookup 2a05:d018:18a3:f202:64d9:1621:ab22:df45:8302: no such host
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:   |
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]: 2025-03-14T23:12:00.837Z [WARN]  agent.server.memberlist.wan: memberlist: Failed to resolve i-09a493e6d3274ff9b.aws-eu-west-1/2a05:d018:18a3:f201:7332:d29e:acc1:2a1c:8302: lookup 2a05:d018:18a3:f201:7332:d29e:acc1:2a1c:8302: no such host
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]: 2025-03-14T23:12:00.837Z [DEBUG] agent.server: Failed to flood-join server at address: server=i-09a493e6d3274ff9b.aws-eu-west-1 address=2a05:d018:18a3:f201:7332:d29e:acc1:2a1c:8302
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:   error=
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:   | 1 error occurred:
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:   | \t* Failed to resolve i-09a493e6d3274ff9b.aws-eu-west-1/2a05:d018:18a3:f201:7332:d29e:acc1:2a1c:8302: lookup 2a05:d018:18a3:f201:7332:d29e:acc1:2a1c:8302: no such host
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:   |
Mar 14 23:12:00 i-006ba90b993de97b9 consul[2131]:
```

The cause is the LAN <-> WAN join flooder. It is instantiated with a `router.FloodAddrFn` named `addrFn`, which currently cannot handle IPv6 addresses: its call `fmt.Sprintf("%s:%d", addr, s.WanJoinPort)` appends the port to an IPv6 address without enclosing the address in square brackets. `serf_flooder.FloodJoins` calls `addrFn` and passes the malformed address on to `memberlist.Join`, where it confuses `net.SplitHostPort` in `memberlist.resolveAddr`. As the logs show, the whole string, port included, then ends up being looked up as a host name, and the lookup fails.

I have a local branch that avoids this by calling `net.JoinHostPort(host, port string)` instead of `fmt.Sprintf("%s:%d", addr, s.WanJoinPort)`. `net.JoinHostPort` produces a valid network address of the form `[host]:port` whenever `host` is an IPv6 address.

The PR with a fix will be available in a moment.


Reproduction Steps

  1. Create a Consul 1.20.5 cluster with 3 server nodes (see configuration below).
  2. Running `journalctl -u consul` on a server node shows a constant stream of warnings of the form `[WARN] agent.server.memberlist.wan: memberlist: Failed to resolve i-02a3c94b46768b01f.aws-eu-west-1/2a05:d018:18a3:f202:64d9:1621:ab22:df45:8302: lookup 2a05:d018:18a3:f202:64d9:1621:ab22:df45:8302: no such host`.
  3. Running `consul members -token=${CONSUL_MGMT_TOKEN} -wan` on a server node shows only the local node, although all three servers are members of the LAN pool:
```
# consul members -token=${CONSUL_MGMT_TOKEN}
Node                 Address                                         Status  Type    Build   Protocol  DC             Partition  Segment
i-006ba90b993de97b9  [2a05:d018:18a3:f200:5cb3:7eb1:257e:1750]:8301  alive   server  1.20.5  2         aws-eu-west-1  default    <all>
i-02a3c94b46768b01f  [2a05:d018:18a3:f202:64d9:1621:ab22:df45]:8301  alive   server  1.20.5  2         aws-eu-west-1  default    <all>
i-09a493e6d3274ff9b  [2a05:d018:18a3:f201:7332:d29e:acc1:2a1c]:8301  alive   server  1.20.5  2         aws-eu-west-1  default    <all>

# consul members -token=${CONSUL_MGMT_TOKEN} -wan
Node                               Address                                         Status  Type    Build   Protocol  DC             Partition  Segment
i-006ba90b993de97b9.aws-eu-west-1  [2a05:d018:18a3:f200:5cb3:7eb1:257e:1750]:8302  alive   server  1.20.5  2         aws-eu-west-1  default    <all>
```
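The WAN address reported for the local node already has the expected bracketed form. As a rough illustration (not Consul's code), this Go sketch derives the WAN join addresses for all three LAN members listed above. It assumes the host is taken from the LAN address with `net.SplitHostPort` and that the WAN port is the `serf_wan` value of 8302 from the configuration below; `wanJoinAddr` is a hypothetical name:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// wanJoinAddr is a hypothetical helper: it takes a LAN address as printed by
// `consul members` (e.g. "[2a05:...]:8301"), keeps the host, and joins it with
// the WAN port. net.SplitHostPort strips the brackets from the host, so the
// rejoin has to go through net.JoinHostPort to put them back.
func wanJoinAddr(lanAddr string, wanPort int) (string, error) {
	host, _, err := net.SplitHostPort(lanAddr)
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(host, strconv.Itoa(wanPort)), nil
}

func main() {
	// LAN addresses copied from the `consul members` output.
	lan := []string{
		"[2a05:d018:18a3:f200:5cb3:7eb1:257e:1750]:8301",
		"[2a05:d018:18a3:f202:64d9:1621:ab22:df45]:8301",
		"[2a05:d018:18a3:f201:7332:d29e:acc1:2a1c]:8301",
	}
	for _, a := range lan {
		wan, err := wanJoinAddr(a, 8302) // serf_wan port from the agent config
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(wan)
	}
}
```

The first line printed matches the local node's WAN address in the `-wan` output; the other two are the bracketed forms of the addresses that the failed flood-join attempts in the logs were trying to reach.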

Consul info for both Client and Server

Server info
consul info -token=${CONSUL_MGMT_TOKEN} 
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease = 
        revision = 74efe419
        version = 1.20.5
        version_metadata = 
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 1
        leader = false
        leader_addr = [2a05:d018:18a3:f201:7332:d29e:acc1:2a1c]:8300
        server = true
raft:
        applied_index = 179
        commit_index = 179
        fsm_pending = 0
        last_contact = 24.651741ms
        last_log_index = 179
        last_log_term = 3
        last_snapshot_index = 0
        last_snapshot_term = 0
        latest_configuration = [{Suffrage:Voter ID:7229b91c-b89a-4b73-916a-7495c28c6297 Address:[2a05:d018:18a3:f201:7332:d29e:acc1:2a1c]:8300} {Suffrage:Voter ID:6f8a9bae-0441-6fe1-429b-f12a98095c4b Address:[2a05:d018:18a3:f202:64d9:1621:ab22:df45]:8300} {Suffrage:Voter ID:3a286536-45a5-baf5-005f-b9b0669da969 Address:[2a05:d018:18a3:f200:5cb3:7eb1:257e:1750]:8300}]
        latest_configuration_index = 0
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 3
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 135
        max_procs = 2
        os = linux
        version = go1.23.6
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 3
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 8
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 2
        members = 1
        query_queue = 0
        query_time = 1

# Server agent HCL config
node_name = "NAMEHERE"
data_dir  = "/opt/consul"
log_level = "DEBUG"
disable_update_check = true
datacenter         = "aws-eu-west-1"
primary_datacenter = "aws-eu-west-1"
bootstrap_expect = 3
leave_on_terminate = true
server             = true
autopilot {
  min_quorum = 3
}
acl {
  enabled                  = true
  default_policy           = "deny"
  down_policy              = "extend-cache"
  enable_token_replication = true
  enable_token_persistence = true
}
client_addr = "[::]"
bind_addr = "[::]"
ports = {
  server   = 8300
  serf_lan = 8301
  serf_wan = 8302
  http     = 8500
  https    = 8501
  grpc     = -1
  grpc_tls = -1
  dns      = -1
}
advertise_addr = "{{ GetPublicInterfaces | include `type` `IPv6` | limit 1 | attr `address` }}"
retry_join = ["provider=aws tag_key=consul-server tag_value=myserver"]
tls {
}
auto_encrypt {
  allow_tls = true
}
telemetry {
  prometheus_retention_time = "48h"
  disable_hostname          = true
  disable_per_tenancy_usage_metrics = true
}
limits {
  rpc_max_conns_per_client  = 100
  http_max_conns_per_client = 200
}
ui_config {
  enabled = true
}
audit {
  enabled = false
}

Operating system and Environment details

  • Fedora release 41
  • Kernel 6.13.5-200.fc41.x86_64

Log Fragments

See Overview of the Issue above

tkren added a commit to tkren/consul that referenced this issue Mar 14, 2025
When `advertise_addr` is an IPv6 address, Consul WAN federation breaks
and we get a constant stream of warnings:
```
[WARN]  agent.server.memberlist.wan: memberlist: Failed to resolve
i-02a3c94b46768b01f.aws-eu-west-1/2a05:d018:18a3:f202:64d9:1621:ab22:df45:8302:
lookup 2a05:d018:18a3:f202:64d9:1621:ab22:df45:8302: no such host
```

The LAN <-> WAN join flooder builds host:port strings with
`fmt.Sprintf("%s:%d", addr, s.WanJoinPort)`, which leaves IPv6
addresses without square brackets and confuses `net.SplitHostPort`
in `memberlist.resolveAddr`.

Fixes hashicorp#22225
tkren linked a pull request Mar 14, 2025 that will close this issue