Allow text only replies from multimodal (realtime) agent #323

danielmahon · 2025-03-11T05:36:16Z

We should allow the multimodal agent to receive text-only responses from the Realtime API, since the API supports this natively.
https://platform.openai.com/docs/api-reference/realtime

Currently (if I'm interpreting the behavior right), LiveKit treats text responses as errors, even when the modality is set to text only.

Today I used patch-package to patch @livekit/[email protected] for the project I'm working on.

Here is the diff that solved my problem:

diff --git a/node_modules/@livekit/agents/dist/multimodal/multimodal_agent.d.ts b/node_modules/@livekit/agents/dist/multimodal/multimodal_agent.d.ts
index 3902cda..a6a3c1a 100644
--- a/node_modules/@livekit/agents/dist/multimodal/multimodal_agent.d.ts
+++ b/node_modules/@livekit/agents/dist/multimodal/multimodal_agent.d.ts
@@ -34,11 +34,12 @@ export declare class MultimodalAgent extends EventEmitter {
     linkedParticipant: RemoteParticipant | null;
     subscribedTrack: RemoteAudioTrack | null;
     readMicroTask: Promise<void> | null;
-    constructor({ model, chatCtx, fncCtx, maxTextResponseRetries, }: {
+    constructor({ model, chatCtx, fncCtx, maxTextResponseRetries, allowTextReplies, }: {
         model: RealtimeModel;
         chatCtx?: llm.ChatContext;
         fncCtx?: llm.FunctionContext;
         maxTextResponseRetries?: number;
+        allowTextReplies?: boolean;
     });
     get fncCtx(): llm.FunctionContext | undefined;
     set fncCtx(ctx: llm.FunctionContext | undefined);
diff --git a/node_modules/@livekit/agents/dist/multimodal/multimodal_agent.js b/node_modules/@livekit/agents/dist/multimodal/multimodal_agent.js
index b9f9f54..c61d375 100644
--- a/node_modules/@livekit/agents/dist/multimodal/multimodal_agent.js
+++ b/node_modules/@livekit/agents/dist/multimodal/multimodal_agent.js
@@ -26,17 +26,20 @@ class MultimodalAgent extends EventEmitter {
   readMicroTask = null;
   #textResponseRetries = 0;
   #maxTextResponseRetries;
+  #allowTextReplies = false;
   constructor({
     model,
     chatCtx,
     fncCtx,
-    maxTextResponseRetries = 5
+    maxTextResponseRetries = 5,
+    allowTextReplies = false,
   }) {
     super();
     this.model = model;
     this.#chatCtx = chatCtx;
     this.#fncCtx = fncCtx;
     this.#maxTextResponseRetries = maxTextResponseRetries;
+    this.#allowTextReplies = allowTextReplies;
   }
   #participant = null;
   #agentPublication = null;
@@ -186,7 +189,7 @@ class MultimodalAgent extends EventEmitter {
         this.#playingHandle = handle;
       });
       this.#session.on("response_content_done", (message) => {
-        if (message.contentType === "text") {
+        if (message.contentType === "text" && !this.#allowTextReplies) {
           if (this.#textResponseRetries >= this.#maxTextResponseRetries) {
             throw new Error(
               `The OpenAI Realtime API returned a text response after ${this.#maxTextResponseRetries} retries. Please try to reduce the number of text system or assistant messages in the chat context.`

This issue body was partially generated by patch-package.
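
To make the effect of the patched condition easier to follow, here is a minimal, self-contained sketch of the guard in isolation. It is not the library's code: `TextReplyGuard` and `onContentDone` are hypothetical names used only for illustration, and the retry branch is an assumption, since the diff above is cut off before that part of the handler. Only `allowTextReplies`, `maxTextResponseRetries`, and the `contentType === "text"` check come from the patch itself.

```javascript
// Hypothetical stand-in for the patched "response_content_done" handler.
// Only allowTextReplies, maxTextResponseRetries and the contentType check
// are taken from the diff; every other name is illustrative.
class TextReplyGuard {
  #textResponseRetries = 0;
  #maxTextResponseRetries;
  #allowTextReplies;

  constructor({ maxTextResponseRetries = 5, allowTextReplies = false } = {}) {
    this.#maxTextResponseRetries = maxTextResponseRetries;
    this.#allowTextReplies = allowTextReplies;
  }

  // Returns "accept" when the content can be used as-is, or "retry" when the
  // caller should request a new response. Throws once retries are exhausted.
  onContentDone(message) {
    if (message.contentType === "text" && !this.#allowTextReplies) {
      if (this.#textResponseRetries >= this.#maxTextResponseRetries) {
        throw new Error(
          `Text response received after ${this.#maxTextResponseRetries} retries`
        );
      }
      // Assumption: the branch truncated in the diff counts a retry here.
      this.#textResponseRetries++;
      return "retry";
    }
    return "accept";
  }
}

// Default behavior: a text reply triggers a retry.
const strictGuard = new TextReplyGuard();
console.log(strictGuard.onContentDone({ contentType: "text" }));

// With the new option, the same text reply is accepted.
const textGuard = new TextReplyGuard({ allowTextReplies: true });
console.log(textGuard.onContentDone({ contentType: "text" }));
```

Because the option defaults to `false`, existing callers would keep the current retry-then-throw behavior, and only agents that opt in would accept text replies.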


arthurblake · 2025-03-13T15:44:14Z

Thanks for the patch-package recommendation! I wouldn't have thought of that, and it gives me a very clean way to get my PR #283 into my build process while I am waiting for the team to review. I hope you don't have to wait as long as I have been waiting!

Chachamaru127 · 2025-05-02T06:00:45Z

I would like to be able to use only TTS and LLM in the Realtime API and use Cartesia for TTS.

danielmahon changed the title ~~Allow text only replies from multimodal (realtime) LLM~~ Allow text only replies from multimodal (realtime) agent Mar 11, 2025

danielmahon added a commit to danielmahon/agents-js that referenced this issue Mar 11, 2025

Allow text only replies from multimodal (realtime) agent

f4d5f5e

Fixes livekit#323

danielmahon linked a pull request Mar 11, 2025 that will close this issue

Allow text only replies from multimodal (realtime) agent #324

Open
