[BUG] Prometheus metric livekit_egress_available is showing misleading information #1110

@sluceno

Describe the bug
No matter what CPU requests or costs I configure for my egress, the egress is shown as NOT available as soon as it is handling 1 request.

The official documentation states that livekit_egress_available is computed from CPU usage: https://docs.livekit.io/transport/self-hosting/egress/#ensuring-availability

But looking at the code, we found:

	promNodeAvailable := prometheus.NewGaugeFunc(prometheus.GaugeOpts{
		Namespace:   "livekit",
		Subsystem:   "egress",
		Name:        "available",
		ConstLabels: prometheus.Labels{"node_id": m.nodeID, "cluster_id": m.clusterID},
	}, m.promIsIdle)

where m.promIsIdle:

func (m *Monitor) promIsIdle() float64 {
	if m.svc.IsIdle() {
		return 1
	}
	return 0
}

where IsIdle is:

func (s *Server) IsIdle() bool {
--
return s.activeRequests.Load() == 0
}

So, livekit_egress_available would be 1 if we have 0 active requests, or 0 (unavailable) if we have one or more active requests.

If I set up autoscaling for the egress pods, they will scale out every time a pod gets 1 request.
My egress has enough CPU resources to handle several requests in parallel.
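
To make the mismatch concrete, here is a minimal, standalone sketch contrasting the idle-based check quoted above with a CPU-headroom check like the one the documentation seems to describe. The `server` type, `idleGauge`, and `headroomGauge` are hypothetical names for illustration only, not the egress code. The values 4.0 and 0.3 come from the log line in this report, and treating them as total CPU and per-request cost is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// server is a hypothetical stand-in for the egress Server; only the
// active request counter matters for this illustration.
type server struct {
	activeRequests atomic.Int32
}

// idleGauge mirrors the logic quoted in this issue: it reports 1 only
// when no request is running at all.
func idleGauge(s *server) float64 {
	if s.activeRequests.Load() == 0 {
		return 1
	}
	return 0
}

// headroomGauge is an assumed alternative, NOT the egress implementation:
// it reports 1 while the CPU left over still covers one more request
// that costs maxCost.
func headroomGauge(s *server, cpuTotal, maxCost float64) float64 {
	used := float64(s.activeRequests.Load()) * maxCost
	if cpuTotal-used >= maxCost {
		return 1
	}
	return 0
}

func main() {
	s := &server{}
	s.activeRequests.Store(1) // a single active request

	// Values taken from the log line: cpu available 4.0, max cost 0.3.
	fmt.Println(idleGauge(s), headroomGauge(s, 4.0, 0.3)) // prints "0 1"
}
```

With one active request the idle-based gauge already drops to 0, while the headroom-based sketch keeps reporting 1 until the remaining CPU can no longer cover another request.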

Egress Version
livekit/egress:v1.9.0

Logs

INFO    egress    stats/monitor.go:139    cpu available: 4.000000 max cost: 0.300000    {"nodeID": "XXX", "clusterID": ""}  

Labels

bug (Something isn't working)
