feat(openai): add OpenAI STT provider support #12
nathanael-h wants to merge 4 commits into bigbluebutton:development
Conversation
The application only supports Gladia as the STT backend. Users who already have an OpenAI API key, or who run a self-hosted OpenAI-compatible Whisper server, cannot use the application without signing up for Gladia.

Add an OpenAI STT provider backed by livekit-agents[openai]. A new STT_PROVIDER env var (default: "gladia") selects the backend at startup. When set to "openai", an OpenAiSttAgent is used instead of GladiaSttAgent. Both agents implement the same EventEmitter interface, so main.py requires only minimal changes (provider selection plus an active_stt_config for confidence thresholds).

Key differences from the Gladia agent:
- update_locale_for_user() stops and restarts the pipeline instead of calling stream.update_options() (not supported by the OpenAI plugin).
- Confidence thresholds default to 0.0 because OpenAI STT does not report per-utterance confidence scores.
- alternative.language may be None; fall back to original_lang so the locale-mapping logic does not break.

New env vars: OPENAI_API_KEY, OPENAI_STT_MODEL, OPENAI_BASE_URL, OPENAI_INTERIM_RESULTS, OPENAI_MIN_CONFIDENCE_FINAL/INTERIM. OPENAI_BASE_URL allows pointing at any OpenAI-compatible endpoint (e.g. a local faster-whisper server).
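The startup selection described above can be sketched as a small factory keyed on STT_PROVIDER. The agent classes below are minimal stand-ins for the PR's GladiaSttAgent and OpenAiSttAgent, not the real implementations:

```python
import os


class GladiaSttAgent:
    """Stand-in for the existing Gladia-backed agent."""
    name = "gladia"


class OpenAiSttAgent:
    """Stand-in for the new OpenAI-backed agent."""
    name = "openai"


def select_stt_agent():
    """Pick the STT backend at startup from the STT_PROVIDER env var.

    Defaults to "gladia" to preserve existing behavior.
    """
    provider = os.environ.get("STT_PROVIDER", "gladia").lower()
    if provider == "openai":
        return OpenAiSttAgent()
    if provider == "gladia":
        return GladiaSttAgent()
    raise ValueError(f"Unsupported STT_PROVIDER: {provider!r}")
```

Because both classes expose the same EventEmitter interface, the rest of main.py can stay provider-agnostic.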
@nathanael-h Could you please confirm if you already sent in the signed Contributor License Agreement? See https://docs.bigbluebutton.org/support/faq.html#why-do-i-need-to-sign-a-contributor-license-agreement-to-contribute-source-code Thanks in advance!

Hello @prlanzarin I've just signed the CLA and received a confirmation email about this.
@nathanael-h Over the weekend, I worked on a refactor that should allow adding multiple STT providers in an easier/cleaner way (at least one that reduces code duplication). See this dev branch: https://github.com/bigbluebutton/bbb-livekit-stt/tree/stt/refactor/generic-providers. Feel free to rebase this PR against that branch and use it as target if you have the time. I think it'll make this PR leaner. Otherwise, let me know and I could look into it later.
The openai plugin's stream() always connects via WebSocket to the /realtime endpoint, which is not implemented by all OpenAI-compatible backends. Switch to recognize(), which uses the standard REST /audio/transcriptions endpoint instead. Audio is segmented into speech utterances using energy-based silence detection (RMS threshold) before each recognize() call. Also fixes a NoneType crash in the Redis message handler that occurred when a message arrived before the agent had connected to the room.
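The energy-based segmentation mentioned above can be illustrated with a small sketch: compute the RMS of each 16-bit PCM frame, and close an utterance after a run of low-energy frames. The thresholds and helper names here are illustrative, not the PR's actual values:

```python
import math
import struct

SILENCE_RMS = 500        # assumed int16 RMS threshold; tune per deployment
SILENCE_FRAMES_END = 10  # consecutive silent frames that close an utterance


def frame_rms(frame: bytes) -> float:
    """RMS energy of a 16-bit little-endian PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))


def segment_utterances(frames):
    """Yield speech segments delimited by runs of low-energy frames."""
    buf, silent = [], 0
    for frame in frames:
        if frame_rms(frame) >= SILENCE_RMS:
            buf.append(frame)
            silent = 0
        elif buf:
            # Keep trailing silence inside the segment until the run is long
            # enough to end the utterance, then flush it to recognize().
            silent += 1
            buf.append(frame)
            if silent >= SILENCE_FRAMES_END:
                yield b"".join(buf)
                buf, silent = [], 0
    if buf:
        yield b"".join(buf)  # flush any trailing speech at end of stream
```

Each yielded segment would then be wrapped in a WAV container and passed to a single recognize() call.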
The livekit OpenAI plugin's recognize() uses the OpenAI Python SDK which
constructs the URL as {base_url}/audio/transcriptions (no /v1/), causing
405 Method Not Allowed on backends like my-selfhosted-openwebui.com/api/.
Replace with a direct aiohttp POST to {base_url}/v1/audio/transcriptions,
matching the approach used in bbb-livekit-transcriber. Also manage the
aiohttp session lifecycle within the agent.
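A sketch of the direct-POST approach the commit describes, assuming the standard multipart form fields (`file`, `model`) of the OpenAI transcription API; the function and field names beyond those are illustrative:

```python
import aiohttp


def transcription_url(base_url: str) -> str:
    """Build {base_url}/v1/audio/transcriptions, tolerating a trailing slash."""
    return f"{base_url.rstrip('/')}/v1/audio/transcriptions"


async def transcribe(session: aiohttp.ClientSession, base_url: str,
                     api_key: str, model: str, wav_bytes: bytes) -> str:
    """POST one WAV utterance and return the transcribed text."""
    form = aiohttp.FormData()
    form.add_field("file", wav_bytes,
                   filename="audio.wav", content_type="audio/wav")
    form.add_field("model", model)
    async with session.post(
        transcription_url(base_url),
        data=form,
        headers={"Authorization": f"Bearer {api_key}"},
    ) as resp:
        resp.raise_for_status()
        payload = await resp.json()
        return payload.get("text", "")
```

The explicit `/v1/` prefix is what avoids the 405 seen when the SDK builds `{base_url}/audio/transcriptions` against a base URL like `my-selfhosted-openwebui.com/api/`.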
Oh nice, I looked quickly at your work to decouple this STT plugin from Gladia, and it looks good. On my side, I fixed the branch nathanael-h:feat/openai-stt of this PR, still targeting the main branch from before you decoupled them. Now that I have reached a good milestone, I will create another branch and PR to rebase on top of your work. @prlanzarin are more commits expected in https://github.com/bigbluebutton/bbb-livekit-stt/tree/stt/refactor/generic-providers that could impact adding OpenAI support? If not, I will start "rebasing" on it.
No commits expected for now - go for it. |
…dings

Each speech segment was missing start_time/end_time on SpeechData, causing all transcripts to share the same transcriptId (open_time + 0.0). BBB's AudioCaptions model treated every utterance after the first as a same-ID update, returning empty text, which resulted in an empty VTT file in recordings even though live captions worked correctly.
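The collision can be shown with a tiny model. SpeechData below is a stand-in for livekit's type, and the id format is hypothetical; the grounded part is that the id is derived from open_time plus the utterance's start_time, so a constant 0.0 start made every utterance map to the same id:

```python
from dataclasses import dataclass


@dataclass
class SpeechData:
    """Minimal stand-in for livekit's stt.SpeechData (illustrative only)."""
    text: str
    start_time: float = 0.0  # was always left at 0.0 before the fix
    end_time: float = 0.0


def transcript_id(open_time: float, data: SpeechData) -> str:
    # BBB derives the transcript id from open_time + utterance start time;
    # with start_time stuck at 0.0, every utterance collided on one id.
    return str(int((open_time + data.start_time) * 1000))
```

With real per-segment start times, consecutive utterances get distinct ids and AudioCaptions no longer treats them as updates to the first one.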
Superseded by #13 |
As I run a self-hosted OpenAI-compatible https://speaches.ai/ Faster-Whisper server, I am interested in using this instead of Gladia.
I open this PR as a Draft because it is not ready to be reviewed. But since the code is here, I think it is better for me to build in the open rather than hidden on a local branch on my laptop. I also want to be transparent regarding LLM usage: I used Claude to help me with this. Maintainers can edit this branch!
This PR adds an OpenAI STT provider backed by livekit-agents[openai]. A new STT_PROVIDER env var (default: "gladia") selects the backend at startup. When set to "openai", an OpenAiSttAgent is used instead of GladiaSttAgent. Both agents implement the same EventEmitter interface, so main.py requires only minimal changes (provider selection plus an active_stt_config for confidence thresholds).
LLM assertions that I need to verify; read with caution:
Here are some differences from the Gladia agent:
- update_locale_for_user() stops and restarts the pipeline instead of calling stream.update_options() (not supported by the OpenAI plugin).
- Confidence thresholds default to 0.0 because OpenAI STT does not report per-utterance confidence scores.
- alternative.language may be None; fall back to original_lang so the locale-mapping logic does not break.
New env vars: OPENAI_API_KEY, OPENAI_STT_MODEL, OPENAI_BASE_URL, OPENAI_INTERIM_RESULTS, OPENAI_MIN_CONFIDENCE_FINAL/INTERIM. OPENAI_BASE_URL allows pointing at any OpenAI-compatible endpoint (e.g. a local faster-whisper server).
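A possible .env fragment wiring these variables together; the values are illustrative placeholders (model name, URL, and key are not taken from the PR), with the confidence thresholds at their documented 0.0 defaults:

```shell
# Select the OpenAI-compatible backend instead of Gladia
STT_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_STT_MODEL=whisper-1
# Any OpenAI-compatible endpoint, e.g. a local faster-whisper server
OPENAI_BASE_URL=https://my-whisper.example.com
OPENAI_INTERIM_RESULTS=true
# OpenAI STT reports no per-utterance confidence, so thresholds default to 0.0
OPENAI_MIN_CONFIDENCE_FINAL=0.0
OPENAI_MIN_CONFIDENCE_INTERIM=0.0
```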
Related meta issue bigbluebutton/bigbluebutton#21059