
feat(openai): add OpenAI STT provider support#12

Closed
nathanael-h wants to merge 4 commits into bigbluebutton:development from nathanael-h:feat/openai-stt

Conversation

@nathanael-h

As I run a self-hosted, OpenAI-compatible Faster-Whisper server (https://speaches.ai/), I am interested in using it instead of Gladia.

I open this PR as a draft because it is not ready to be reviewed. But since the code is here, I think it is better to build in the open rather than on a hidden local branch on my laptop. I also want to be transparent about LLM usage: I used Claude to help me with this. Maintainers can edit this branch!

This PR adds an OpenAI STT provider backed by livekit-agents[openai]. A new STT_PROVIDER env var (default: "gladia") selects the backend at startup. When set to "openai", an OpenAiSttAgent is used instead of GladiaSttAgent. Both agents implement the same EventEmitter interface, so main.py requires only minimal changes (provider selection plus using an active_stt_config for confidence thresholds).
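The provider selection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual wiring: the agent classes here are empty stubs standing in for the real GladiaSttAgent/OpenAiSttAgent, and the helper name is made up.

```python
import os


class GladiaSttAgent:
    """Stub standing in for the real Gladia agent."""


class OpenAiSttAgent:
    """Stub standing in for the real OpenAI agent."""


_PROVIDERS = {"gladia": GladiaSttAgent, "openai": OpenAiSttAgent}


def select_stt_agent(provider: str = ""):
    """Pick the STT agent class from STT_PROVIDER (default: gladia)."""
    provider = (provider or os.getenv("STT_PROVIDER", "gladia")).lower()
    try:
        return _PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unsupported STT_PROVIDER: {provider!r}")
```

Keeping both agents behind the same EventEmitter interface is what lets main.py treat the returned class uniformly regardless of the backend.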

LLM-generated assertions that I still need to verify; read with caution:

Here are some differences from the Gladia agent:
- update_locale_for_user() stops and restarts the pipeline instead of calling stream.update_options() (not supported by the OpenAI plugin).
- Confidence thresholds default to 0.0 because OpenAI STT does not report per-utterance confidence scores.
- alternative.language may be None; fall back to original_lang so the locale-mapping logic does not break.
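The stop/restart locale update can be sketched as follows. This is a toy stub, assuming a stream that cannot change language in place; the method names are illustrative, not the agent's real API.

```python
import asyncio


class OpenAiSttAgentStub:
    """Minimal stub illustrating the stop/restart locale update."""

    def __init__(self):
        self.language = "en"
        self.running = False

    async def _stop_stream(self):
        self.running = False

    async def _start_stream(self, language: str):
        self.language = language
        self.running = True

    async def update_locale_for_user(self, locale: str):
        # The OpenAI plugin's stream has no update_options(),
        # so tear the pipeline down and bring it back up instead.
        await self._stop_stream()
        await self._start_stream(locale)


agent = OpenAiSttAgentStub()
asyncio.run(agent._start_stream("en"))
asyncio.run(agent.update_locale_for_user("pt-BR"))
```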

New env vars: OPENAI_API_KEY, OPENAI_STT_MODEL, OPENAI_BASE_URL, OPENAI_INTERIM_RESULTS, OPENAI_MIN_CONFIDENCE_FINAL/INTERIM. OPENAI_BASE_URL allows pointing at any OpenAI-compatible endpoint (e.g. a local faster-whisper server).
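Reading those env vars into a config object might look like this. The defaults here (model name, base URL, fallback values) are my assumptions for illustration, not necessarily the PR's actual defaults.

```python
import os
from dataclasses import dataclass


@dataclass
class OpenAiSttConfig:
    api_key: str
    model: str
    base_url: str
    interim_results: bool
    min_confidence_final: float
    min_confidence_interim: float

    @classmethod
    def from_env(cls) -> "OpenAiSttConfig":
        # Defaults below are assumed for this sketch.
        return cls(
            api_key=os.getenv("OPENAI_API_KEY", ""),
            model=os.getenv("OPENAI_STT_MODEL", "whisper-1"),
            base_url=os.getenv("OPENAI_BASE_URL", "https://api.openai.com"),
            interim_results=os.getenv("OPENAI_INTERIM_RESULTS", "false").lower() == "true",
            min_confidence_final=float(os.getenv("OPENAI_MIN_CONFIDENCE_FINAL", "0.0")),
            min_confidence_interim=float(os.getenv("OPENAI_MIN_CONFIDENCE_INTERIM", "0.0")),
        )
```

Pointing OPENAI_BASE_URL at a local faster-whisper server is then just a matter of setting that one variable.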

Related meta issue bigbluebutton/bigbluebutton#21059

The application only supports Gladia as the STT backend. Users who
already have an OpenAI API key, or who run a self-hosted
OpenAI-compatible Whisper server, cannot use the application without
signing up for Gladia.

@nathanael-h nathanael-h marked this pull request as draft March 3, 2026 09:31
@prlanzarin prlanzarin self-requested a review March 3, 2026 12:14
@prlanzarin
Member

@nathanael-h Could you please confirm if you already sent in the signed Contributor License Agreement? See https://docs.bigbluebutton.org/support/faq.html#why-do-i-need-to-sign-a-contributor-license-agreement-to-contribute-source-code

Thanks in advance!

@nathanael-h
Author

Hello @prlanzarin I've just signed the CLA and received a confirmation email about this.

@prlanzarin
Member

@nathanael-h Over the weekend, I worked on a refactor that should allow adding multiple STT providers in an easier/cleaner way (at least one that reduces code duplication). See this dev branch: https://github.com/bigbluebutton/bbb-livekit-stt/tree/stt/refactor/generic-providers.

Feel free to rebase this PR against that branch and use it as target if you have the time. I think it'll make this PR leaner. Otherwise, let me know and I could look into it later.

The openai plugin's stream() always connects via WebSocket to the
/realtime endpoint, which is not implemented by all OpenAI-compatible
backends. Switch to recognize(), which uses the standard REST
/audio/transcriptions endpoint instead.

Audio is segmented into speech utterances using energy-based silence
detection (RMS threshold) before each recognize() call. Also fixes a
NoneType crash in the Redis message handler that occurred when a message
arrived before the agent had connected to the room.
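The energy-based segmentation could look roughly like this. The threshold, frame size, and silence budget below are illustrative placeholders, not the values used in the PR.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """RMS energy of a frame of 16-bit little-endian PCM samples."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def segment_utterances(frames, threshold=500.0, max_silent_frames=3):
    """Split a PCM frame stream into utterances at runs of silence."""
    utterances, current, silent = [], [], 0
    for frame in frames:
        if rms(frame) >= threshold:
            current.append(frame)
            silent = 0
        elif current:
            # Trailing silence stays in the segment until the budget runs out.
            silent += 1
            current.append(frame)
            if silent >= max_silent_frames:
                utterances.append(b"".join(current))
                current, silent = [], 0
    if current:
        utterances.append(b"".join(current))
    return utterances
```

Each completed utterance would then be handed to a single recognize() call, instead of keeping a realtime WebSocket open.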
The livekit OpenAI plugin's recognize() uses the OpenAI Python SDK, which
constructs the URL as {base_url}/audio/transcriptions (no /v1/), causing a
405 Method Not Allowed on backends like my-selfhosted-openwebui.com/api/.

Replace with a direct aiohttp POST to {base_url}/v1/audio/transcriptions,
matching the approach used in bbb-livekit-transcriber. Also manage the
aiohttp session lifecycle within the agent.
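A direct aiohttp POST along those lines might look like this. It is a sketch under the assumptions stated in the commit (URL built as {base_url}/v1/audio/transcriptions, multipart form with file/model/language fields); field names and the response shape follow the standard OpenAI transcriptions API.

```python
def transcriptions_url(base_url: str) -> str:
    """Build the /v1/audio/transcriptions URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/v1/audio/transcriptions"


async def transcribe(base_url: str, api_key: str, wav_bytes: bytes,
                     model: str = "whisper-1", language: str = "") -> str:
    import aiohttp  # lazy import: only needed when actually transcribing

    form = aiohttp.FormData()
    form.add_field("file", wav_bytes, filename="audio.wav",
                   content_type="audio/wav")
    form.add_field("model", model)
    if language:
        form.add_field("language", language)
    headers = {"Authorization": f"Bearer {api_key}"}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.post(transcriptions_url(base_url), data=form) as resp:
            resp.raise_for_status()
            payload = await resp.json()
            return payload.get("text", "")
```

Owning the ClientSession inside the agent (created on start, closed on stop) avoids leaking connections when the pipeline is restarted for a locale change.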
@nathanael-h
Author

Oh nice, I took a quick look at your work to decouple this STT plugin from Gladia, and it looks good. On my side, I fixed the nathanael-h:feat/openai-stt branch of this PR, still targeting the main branch from before you decoupled them. Now that I have reached a good milestone, I will create another branch and PR rebased on top of your work. @prlanzarin, are more commits expected in https://github.com/bigbluebutton/bbb-livekit-stt/tree/stt/refactor/generic-providers that could affect adding OpenAI support? If not, I will start "rebasing" on it.

@prlanzarin
Member

prlanzarin commented Mar 10, 2026

> Oh nice, I took a quick look at your work to decouple this STT plugin from Gladia, and it looks good. On my side, I fixed the nathanael-h:feat/openai-stt branch of this PR, still targeting the main branch from before you decoupled them. Now that I have reached a good milestone, I will create another branch and PR rebased on top of your work. @prlanzarin, are more commits expected in https://github.com/bigbluebutton/bbb-livekit-stt/tree/stt/refactor/generic-providers that could affect adding OpenAI support? If not, I will start "rebasing" on it.

No commits expected for now - go for it.

…dings

Each speech segment was missing start_time/end_time on SpeechData, causing
all transcripts to share the same transcriptId (open_time + 0.0). BBB's
AudioCaptions model treated every utterance after the first as a same-ID
update, returning empty text, which resulted in an empty VTT file in
recordings even though live captions worked correctly.
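The ID collision described in that commit can be illustrated with a toy sketch. This is hypothetical code, not BBB's actual implementation: it only assumes, per the commit message, that the transcript ID is derived from the open time plus the segment's start offset.

```python
# Hypothetical illustration: if every segment leaves start_time at 0.0,
# all utterances collapse onto one transcript ID and later utterances
# are treated as updates to the first.
def transcript_id(open_time: float, start_time: float) -> float:
    return open_time + start_time


# Missing per-segment start times: every utterance shares one ID...
ids_buggy = {transcript_id(100.0, 0.0) for _ in range(3)}
# ...while real offsets keep each utterance's ID distinct.
ids_fixed = {transcript_id(100.0, t) for t in (0.0, 2.5, 6.1)}
```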
@prlanzarin
Member

Superseded by #13
