Streams encoded by Mesa's av1_vaapi (AMD GPUs) decode reliably as files but freeze in Chromium-based WHEP receivers within a few seconds — first GOP looks fine, then the reference chain breaks and the picture stalls while RTP packets keep arriving. Streams encoded by av1_nvenc (NVIDIA) over the same pipeline work flawlessly.
The Chromium-side RTCInboundRtpStreamStats for the AMD case shows the symptom cleanly (12 s sample):
Tested on 1.17.1; AV1 RTMP→WHEP isn't usable on 1.18.x because of #5728, so 1.17.1 is the relevant version for AMD-VAAPI publishers.
Repro
Push an AV1 stream over Enhanced-RTMP from AMD hardware (Mesa ≥ 23 + gpu-screen-recorder's -k av1 -c flv defaults, or any other producer using av1_vaapi), consume via WHEP from Chromium / Electron. The freeze is encoder-shape-dependent, not content-dependent, and reproduces on a static UI just as well as a game capture.
Two small FLV samples (AV1 + Opus, ~10 s, captured from gpu-screen-recorder) illustrate the structural difference:
amd-vaapi-av1.flv (Mesa av1_vaapi) — freezes in WHEP
nvenc-av1.flv (NVIDIA av1_nvenc) — works in WHEP
Happy to share these via a non-public channel if useful — they're screen recordings so I'd rather not attach them to a public issue.
Root cause — two AV1 OBU-shape asymmetries
OBU type counts from ffmpeg -bsf:v trace_headers on each file:
| OBU type | AMD-VAAPI | NVENC |
| --- | --- | --- |
| 1 (Sequence Header) | 6 | 4 |
| 2 (Temporal Delimiter) | 0 | 331 |
| 3 (Frame Header) | 504 | 331 |
| 4 (Tile Group) | 504 | 331 |
| 15 (Padding) | 29 (1232–8230 bytes each) | 0 |
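For anyone who wants to reproduce the counts without ffmpeg, here is a minimal Go sketch that tallies `obu_type` values in a buffer of concatenated OBUs. It assumes the OBUs have already been extracted from the FLV video tags; the helper names `readLEB128` and `countOBUs` are made up for this illustration and are not MediaMTX or mediacommon APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// readLEB128 decodes an AV1 leb128() value and returns the value together
// with the number of bytes consumed (at most 8, per the AV1 spec).
func readLEB128(b []byte) (uint64, int, error) {
	var v uint64
	for i := 0; i < 8; i++ {
		if i >= len(b) {
			return 0, 0, errors.New("truncated leb128")
		}
		v |= uint64(b[i]&0x7f) << (7 * uint(i))
		if b[i]&0x80 == 0 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errors.New("leb128 longer than 8 bytes")
}

// countOBUs walks a buffer of concatenated OBUs and tallies obu_type values.
// An OBU without a size field is assumed to extend to the end of the buffer.
func countOBUs(buf []byte) (map[int]int, error) {
	counts := map[int]int{}
	for len(buf) > 0 {
		hdr := buf[0]
		typ := int(hdr>>3) & 0x0f // obu_type: bits 6..3 of the header byte
		pos := 1
		if hdr&0x04 != 0 { // obu_extension_flag: one extra header byte
			pos++
		}
		if pos > len(buf) {
			return nil, errors.New("truncated OBU header")
		}
		payload := uint64(len(buf) - pos)
		if hdr&0x02 != 0 { // obu_has_size_field
			size, n, err := readLEB128(buf[pos:])
			if err != nil {
				return nil, err
			}
			pos += n
			payload = size
		}
		if payload > uint64(len(buf)-pos) {
			return nil, errors.New("OBU overruns buffer")
		}
		counts[typ]++
		buf = buf[pos+int(payload):]
	}
	return counts, nil
}

func main() {
	// Hand-built sample: temporal delimiter, a 2-byte OBU_FRAME (type 6),
	// and a 1-byte padding OBU. Payload bytes are placeholders.
	sample := []byte{0x12, 0x00, 0x32, 0x02, 0xaa, 0xbb, 0x7a, 0x01, 0x00}
	counts, err := countOBUs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts)
}
```

Running this over each access unit of the two samples should give the same per-type totals as `trace_headers`, provided the extraction step hands over complete OBUs.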
No OBU_TEMPORAL_DELIMITER. Spec-permitted in low-overhead bitstream form (AV1 §5.6), but libwebrtc-side AV1 RTP receivers tend to lean on them for frame-boundary detection. NVENC emits one per frame; Mesa's av1_vaapi emits none. dav1d-on-disk is tolerant either way (which is why the captured file decodes cleanly), but routed through MediaMTX→WHEP→Chromium the missing TempDelim costs ~80 % of frames on its own.
Large OBU_PADDING. Mesa pads the bitstream to meet CBR bitrate on low-motion content with multi-KB padding OBUs (1.2–8.2 KB each in our samples). Per spec these are no-ops, but in practice fragmenting an 8 KB padding OBU across ~6 RTP packets seems to break libwebrtc's reassembly. The receiver-stat pattern "first ~20 frames of each GOP decode, reference chain breaks until the next keyframe" matches a per-GOP reassembly breakdown rather than a per-frame decoder reject.
Either alone is enough to make AMD-AV1 unwatchable; together you get the ~96 %-failure decode rate.
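To put the "~6 RTP packets" figure in context, here is a back-of-the-envelope sketch of the fragmentation arithmetic. The 1400-byte per-packet payload budget is an assumption chosen for illustration, not a value read from MediaMTX or libwebrtc, and `packetsFor` is a hypothetical helper.

```go
package main

import "fmt"

// packetsFor returns how many RTP packets are needed to carry obuSize bytes
// when each packet can hold at most payloadBudget bytes of OBU data
// (integer ceiling division).
func packetsFor(obuSize, payloadBudget int) int {
	return (obuSize + payloadBudget - 1) / payloadBudget
}

func main() {
	const payloadBudget = 1400 // assumed budget per packet, in bytes

	// Smallest and largest padding OBU sizes observed in the AMD sample.
	for _, size := range []int{1232, 8230} {
		fmt.Printf("padding OBU of %d bytes -> %d packet(s)\n",
			size, packetsFor(size, payloadBudget))
	}
}
```

With that budget the smallest padding OBU fits in a single packet, while the largest spans six packets that carry nothing the decoder needs.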
Suggested fix
Normalising the temporal unit in internal/protocols/rtmp/to_stream.go's OnDataAV1 callback handles both at once, and the result is byte-for-byte identical to NVENC's input shape (modulo frame count). Patch:
Equivalent to running ffmpeg -bsf:v "av1_metadata=td=insert:delete_padding=1" on the bitstream before muxing. Verified end-to-end on a forked 1.17.1 image — AMD-VAAPI HQ streams that froze within 5 s now decode cleanly for arbitrary durations, with the receiver stats no longer showing the dav1d fallback or PLI flood.
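As a rough, standalone illustration of the normalisation described above (not the patch itself), the sketch below drops padding OBUs and makes sure each temporal unit starts with exactly one temporal delimiter. It assumes a temporal unit is a `[][]byte` with one OBU per element, each starting at its header byte. `normalizeTU` is a hypothetical name, and mirroring `obu_has_size_field` from the first kept OBU is my own assumption about what the downstream packetizer expects.

```go
package main

import "fmt"

const (
	obuTemporalDelimiter = 2
	obuPadding           = 15
)

// obuType extracts obu_type (bits 6..3) from the first header byte.
func obuType(obu []byte) int {
	return int(obu[0]>>3) & 0x0f
}

// normalizeTU returns a copy of the temporal unit with padding OBUs removed
// and a single temporal delimiter at the front. It returns nil when nothing
// but padding and delimiters was present.
func normalizeTU(tu [][]byte) [][]byte {
	out := make([][]byte, 1, len(tu)+1) // out[0] is reserved for the delimiter
	for _, obu := range tu {
		if len(obu) == 0 {
			continue
		}
		// Existing delimiters are dropped too, so the output has exactly one.
		if t := obuType(obu); t == obuPadding || t == obuTemporalDelimiter {
			continue
		}
		out = append(out, obu)
	}
	if len(out) == 1 {
		return nil
	}
	// Assumption: match the size-field convention of the first kept OBU.
	if out[1][0]&0x02 != 0 {
		out[0] = []byte{0x12, 0x00} // type 2, has_size_field=1, size 0
	} else {
		out[0] = []byte{0x10} // type 2, has_size_field=0
	}
	return out
}

func main() {
	tu := [][]byte{
		{0x7a, 0x02, 0x00, 0x00}, // padding OBU with a 2-byte payload
		{0x32, 0x01, 0xaa},       // OBU_FRAME (type 6), placeholder payload
	}
	for _, obu := range normalizeTU(tu) {
		fmt.Printf("type=%d len=%d\n", obuType(obu), len(obu))
	}
}
```

Dropping and re-inserting the delimiter, rather than only adding one when it is missing, keeps the output shape the same regardless of which encoder produced the input.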
Happy to send a PR if the approach looks right. Putting it inside OnDataAV1 keeps the fix scoped to the RTMP-ingest path; arguably nicer homes would be the AV1 packetizer in gortsplib/pkg/format/rtpav1 or even the Pion level, but that depends on where the project wants to draw the line between "publisher input that should be normalised" and "packetizer/receiver responsibility to be tolerant".
Notes
The publisher (gpu-screen-recorder in our case) uses Enhanced-RTMP with AV_CODEC_FLAG_GLOBAL_HEADER, so the AV1 sequence header is in the FLV config message rather than inline at each IDR. Both encoders treat that the same way, so it's not the differentiator.
Receiver is Chromium 148 / Electron 42; same behaviour reproduces in stock Chromium and (anecdotally) Firefox.
Related: #5728 ("AV1 HLS Muxer Error not enough bytes") is what keeps us on 1.17.x, so this fix would unblock AMD users on the last release where AV1-RTMP→WHEP works at all.