Enhance context handling in Live API #93
Status: Open (1 commit to merge into base: main)
31 changes: 31 additions & 0 deletions README.md
@@ -101,6 +101,37 @@ Project consists of:
- communication layer for processing audio in and out
- a boilerplate view for starting to build your apps and view logs

## Handling Context with Incomplete Turns

When working with the Live API, you can send contextual text while continuing to stream real-time input. This lets you give the model additional context without ending the current conversation turn.

### Example: Sending Context Without Completing a Turn

```typescript
// Send context without completing the turn
client.sendContext([{ text: "The user is looking at a laptop." }]);

// Continue with real-time input. With the default completeTurn = true,
// this first completes the turn left open by sendContext().
client.sendRealtimeInput([
  {
    mimeType: "audio/pcm;rate=16000",
    data: audioData,
  }
]);

// Check for an incomplete turn, e.g. after calling
// sendRealtimeInput(chunks, false)
if (client.hasIncompleteTurn()) {
  // Manually complete the turn if needed
  client.completeTurn();
}
```

### Differences Between send(), sendContext(), and sendRealtimeInput()

- `send(parts, turnComplete = true)`: Sends text parts as client content. Completes the turn by default.
- `sendContext(parts)`: A convenience method that calls `send(parts, false)`. Use it to provide context without completing the turn.
- `sendRealtimeInput(chunks, completeTurn = true)`: Sends real-time media chunks. When `completeTurn` is `true` and a turn is still incomplete, the client first sends an empty `turnComplete` message to close it.
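
The interaction between the three methods can be sketched with a small stand-in class. This is an illustrative model, not the real `MultimodalLiveClient`: `TurnTracker`, `MediaChunk`, and `Outgoing` are hypothetical names, the `{ role: "user", parts }` turn shape is an assumption, and messages are collected in an array rather than sent over a WebSocket.

```typescript
// Illustrative model only. TurnTracker and its types are hypothetical
// stand-ins that mirror the flag handling described in the list above.
type Part = { text: string };
type MediaChunk = { mimeType: string; data: string };
type Outgoing =
  | { clientContent: { turns: { role: string; parts: Part[] }[]; turnComplete: boolean } }
  | { realtimeInput: { mediaChunks: MediaChunk[] } };

class TurnTracker {
  // Messages that would have gone over the socket, in send order.
  readonly sent: Outgoing[] = [];
  private incompleteTurn = false;

  send(parts: Part[], turnComplete = true): void {
    // Assumed turn shape: a single user turn carrying the parts.
    this.sent.push({
      clientContent: { turns: [{ role: "user", parts }], turnComplete },
    });
    this.incompleteTurn = !turnComplete;
  }

  sendContext(parts: Part[]): void {
    this.send(parts, false);
  }

  sendRealtimeInput(chunks: MediaChunk[], completeTurn = true): void {
    // Close any open turn first, unless the caller opts out.
    if (completeTurn) this.completeTurn();
    this.sent.push({ realtimeInput: { mediaChunks: chunks } });
  }

  completeTurn(): void {
    if (!this.incompleteTurn) return;
    this.sent.push({ clientContent: { turns: [], turnComplete: true } });
    this.incompleteTurn = false;
  }

  hasIncompleteTurn(): boolean {
    return this.incompleteTurn;
  }
}

const tracker = new TurnTracker();
tracker.sendContext([{ text: "The user is looking at a laptop." }]);
const openAfterContext = tracker.hasIncompleteTurn(); // true

tracker.sendRealtimeInput([{ mimeType: "audio/pcm;rate=16000", data: "AAAA" }]);
const openAfterAudio = tracker.hasIncompleteTurn(); // false

// Top-level key of each message, in order:
// ["clientContent", "clientContent", "realtimeInput"]
const kinds = tracker.sent.map((m) => Object.keys(m)[0]);
```

In this model, passing `false` as the second argument to `sendRealtimeInput` skips the completion message, so the turn opened by `sendContext` stays open until `completeTurn()` is called.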

## Available Scripts

In the project directory, you can run:
4 changes: 2 additions & 2 deletions src/components/control-tray/ControlTray.tsx
@@ -97,7 +97,7 @@ function ControlTray({
mimeType: "audio/pcm;rate=16000",
data: base64,
},
- ]);
+ ], true);
};
if (connected && !muted && audioRecorder) {
audioRecorder.on("data", onData).on("volume", setInVolume).start();
@@ -131,7 +131,7 @@ function ControlTray({
ctx.drawImage(videoRef.current, 0, 0, canvas.width, canvas.height);
const base64 = canvas.toDataURL("image/jpeg", 1.0);
const data = base64.slice(base64.indexOf(",") + 1, Infinity);
- client.sendRealtimeInput([{ mimeType: "image/jpeg", data }]);
+ client.sendRealtimeInput([{ mimeType: "image/jpeg", data }], true);
}
if (connected) {
timeoutId = window.setTimeout(sendVideoFrame, 1000 / 0.5);
20 changes: 20 additions & 0 deletions src/components/side-panel/SidePanel.tsx
@@ -42,6 +42,7 @@ export default function SidePanel() {
value: string;
label: string;
} | null>(null);
const [completeTurn, setCompleteTurn] = useState(true);
const inputRef = useRef<HTMLTextAreaElement>(null);

//scroll the log to the bottom when new logs come in
@@ -127,6 +128,25 @@ export default function SidePanel() {
/>
</div>
<div className={cn("input-container", { disabled: !connected })}>
<div className="turn-option">
<label>
<input
type="checkbox"
checked={completeTurn}
onChange={(e) => setCompleteTurn(e.target.checked)}
/>
<span>Complete turn</span>
</label>
{client.hasIncompleteTurn() && (
<button
className="complete-turn-button"
onClick={() => client.completeTurn()}
title="Force complete the current turn"
>
Complete turn
</button>
)}
</div>
<div className="input-content">
<textarea
className="input-area"
35 changes: 34 additions & 1 deletion src/components/side-panel/side-panel.scss
@@ -136,13 +136,46 @@
}

.input-container {
- height: 50px;
+ height: 80px;
flex-grow: 0;
flex-shrink: 0;
border-top: 1px solid var(--Neutral-20);
padding: 14px 25px;
overflow: hidden;

.turn-option {
display: flex;
align-items: center;
justify-content: space-between;
margin-bottom: 8px;
color: var(--Neutral-90);
font-size: 12px;

label {
display: flex;
align-items: center;
cursor: pointer;

input[type="checkbox"] {
margin-right: 5px;
}
}

.complete-turn-button {
background: var(--Neutral-20);
border: none;
color: var(--Neutral-90);
padding: 2px 8px;
border-radius: 4px;
font-size: 11px;
cursor: pointer;

&:hover {
background: var(--Neutral-30);
}
}
}

.input-content {
position: relative;
background: var(--Neutral-10);
57 changes: 56 additions & 1 deletion src/lib/multimodal-live-client.ts
@@ -69,6 +69,8 @@ export class MultimodalLiveClient extends EventEmitter<MultimodalLiveClientEvent
public ws: WebSocket | null = null;
protected config: LiveConfig | null = null;
public url: string = "";
private incompleteTurn: boolean = false;

public getConfig() {
return { ...this.config };
}
@@ -198,6 +200,7 @@ export class MultimodalLiveClient extends EventEmitter<MultimodalLiveClientEvent
if (isTurnComplete(serverContent)) {
this.log("server.send", "turnComplete");
this.emit("turncomplete");
this.incompleteTurn = false;
// plausible there's more to the message, continue
}

@@ -239,7 +242,7 @@ export class MultimodalLiveClient extends EventEmitter<MultimodalLiveClientEvent
/**
* send realtimeInput, this is base64 chunks of "audio/pcm" and/or "image/jpg"
*/
- sendRealtimeInput(chunks: GenerativeContentBlob[]) {
+ sendRealtimeInput(chunks: GenerativeContentBlob[], completeTurn: boolean = true) {
let hasAudio = false;
let hasVideo = false;
for (let i = 0; i < chunks.length; i++) {
@@ -263,6 +266,21 @@ export class MultimodalLiveClient extends EventEmitter<MultimodalLiveClientEvent
? "video"
: "unknown";

// If we're in the middle of a turn that's not complete, send a ClientContentMessage
// with turnComplete=true first to close the previous turn
if (completeTurn && this.incompleteTurn) {
// Force-complete any previous turn by sending an empty ClientContentMessage
const completionMessage: ClientContentMessage = {
clientContent: {
turns: [],
turnComplete: true,
},
};
this._sendDirect(completionMessage);
this.log(`client.completeTurn`, "completing previous turn");
this.incompleteTurn = false;
}

const data: RealtimeInputMessage = {
realtimeInput: {
mediaChunks: chunks,
@@ -303,6 +321,18 @@ export class MultimodalLiveClient extends EventEmitter<MultimodalLiveClientEvent

this._sendDirect(clientContentRequest);
this.log(`client.send`, clientContentRequest);

// Update the incomplete turn state
this.incompleteTurn = !turnComplete;
}

/**
* Sends contextual text data without completing the turn.
* This is useful when you want to provide context to the model
* while continuing to stream realtime input.
*/
sendContext(parts: Part | Part[]) {
this.send(parts, false);
}

/**
@@ -316,4 +346,29 @@ export class MultimodalLiveClient extends EventEmitter<MultimodalLiveClientEvent
const str = JSON.stringify(request);
this.ws.send(str);
}

/**
* Checks if there's an incomplete turn and returns the current state.
*/
hasIncompleteTurn(): boolean {
return this.incompleteTurn;
}

/**
* Forcibly completes any ongoing turn. Use this if you need to ensure
* all incomplete turns are finished before starting something new.
*/
completeTurn(): void {
if (this.incompleteTurn) {
const completionMessage: ClientContentMessage = {
clientContent: {
turns: [],
turnComplete: true,
},
};
this._sendDirect(completionMessage);
this.log(`client.completeTurn`, "manually completing turn");
this.incompleteTurn = false;
}
}
}
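
Both `sendRealtimeInput()` and `completeTurn()` in this diff close a turn by sending the same empty client-content message. A minimal sketch of that payload as a standalone function, which could be handy when checking logs or writing unit tests. `buildTurnCompletion` is a hypothetical helper that does not appear in the PR, and the local `ClientContentMessage` interface is a simplified stand-in for the project's own type.

```typescript
// Simplified stand-in for the project's ClientContentMessage type (assumption).
interface ClientContentMessage {
  clientContent: { turns: unknown[]; turnComplete: boolean };
}

// Hypothetical helper: the PR inlines this object literal in two places.
function buildTurnCompletion(): ClientContentMessage {
  return { clientContent: { turns: [], turnComplete: true } };
}

// Serialized form, as it would be passed to ws.send():
const wire = JSON.stringify(buildTurnCompletion());
// wire === '{"clientContent":{"turns":[],"turnComplete":true}}'
```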