nllb: Envoy health check unhealthy_threshold: 5 causes 25s detection lag on controller failure #7743

## Problem

The Envoy configuration for the NLLB `apiserver` and `konnectivity` clusters uses:

```yaml
health_checks:
- tcp_health_check: {}
  timeout: 1s
  interval: 5s
  healthy_threshold: 3
  unhealthy_threshold: 5
```

With `interval: 5s` and `unhealthy_threshold: 5`, Envoy needs five consecutive failed checks before it marks a host unhealthy, so it takes roughly **25 seconds** after a controller stops before Envoy removes it from rotation. During that window, konnectivity reconnection attempts from workers can still be routed to the dead controller and fail, which delays cluster recovery in HA deployments.

There is already a `FIXME` comment at the relevant location noting the health check needs improvement:

```go
// pkg/component/worker/nllb/envoy.go:375
// FIXME: Better use a proper HTTP based health check, but this needs certs and stuff...
- tcp_health_check: {}
```

This delay was also identified as a contributing factor to the `check-nllb-ipv6` / `check-nllb-traefik-ipv6` test flakiness (see #7742).

## Desired solution

Lower `unhealthy_threshold` from `5` to `2` for both clusters (`apiserver` and `konnectivity`) in `pkg/component/worker/nllb/envoy.go`:

```yaml
# Before
unhealthy_threshold: 5   # 5 × 5s = 25s to declare a controller dead

# After
unhealthy_threshold: 2   # 2 × 5s = 10s; still conservative, avoids flaps
```

This cuts the dead-controller detection window from roughly 25s to roughly 10s. Two consecutive failed checks are still required, so a single transient network blip should not take a healthy controller out of rotation.


