[Feature] Realtime API from OpenAI working #545

Open
wants to merge 4 commits into
base: main
@franpb14 franpb14 commented Nov 4, 2024

This is a very basic PR that we can use to iterate. Right now I'm using it with ActionCable in a Rails app like this:

class OpenAiChannel < ApplicationCable::Channel
  def subscribed
    stream_from "open_ai_channel"
    @client = OpenAI::Client.new(access_token: ENV['OPENAI_API_KEY'])
    @client.real_time.on_message do |event|
      ActionCable.server.broadcast 'open_ai_channel', { message: event.data.force_encoding('UTF-8') }
    end
    @client.real_time.connect
  end

  def send_message(data)
    @client.real_time.send_event(data['event'])
  end
end

In this example, data['event'] can be something like this:

{
  type: "response.create",
  response: {
    modalities: ["text", "audio"],
    instructions: "Please assist the user.",
  }
}

Maybe we could add more functions to make event management easier, something like what I described in this comment.
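
One way such event-management helpers might look is a small router that dispatches each incoming message by its "type" field. This is only a sketch: RealtimeEventRouter, on, and dispatch are hypothetical names that are not part of this PR, and it assumes each message arrives as a JSON string with a top-level "type" key.

```ruby
require "json"

# Hypothetical helper (not part of this PR): routes raw Realtime messages
# to the handlers registered for their event type.
class RealtimeEventRouter
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a block for one event type, e.g. "response.text.delta".
  def on(type, &block)
    @handlers[type] << block
    self
  end

  # Parse one raw JSON message and call every handler registered for its
  # "type". Returns the number of handlers that were invoked.
  def dispatch(raw)
    event = JSON.parse(raw)
    handlers = @handlers.fetch(event["type"], [])
    handlers.each { |handler| handler.call(event) }
    handlers.size
  end
end
```

With something like this, the on_message block in the channel above could hand event.data to dispatch instead of broadcasting every message unchanged.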

Dependencies

  • faye-websocket
  • eventmachine

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?

Closes #524

end

def connect(model: "gpt-4o-realtime-preview-2024-10-01")
uri = "#{File.join(@client.websocket_uri_base, @client.api_version, 'realtime')}?model=#{model}"
Author

This URI probably shouldn't be built here, but I wasn't sure where to put it.
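
One option would be to move the URL construction into a small pure helper that is easy to test on its own. The sketch below is an assumption, not code from this PR: realtime_uri is a hypothetical name, and base and api_version stand in for @client.websocket_uri_base and @client.api_version, whose real values are not shown in the diff.

```ruby
require "uri"

# Hypothetical helper: builds the Realtime WebSocket URL from its parts.
# The caller supplies base and api_version; nothing is hard-coded here.
def realtime_uri(base:, api_version:, model:)
  # Normalize to exactly one trailing slash so URI.join appends path segments.
  uri = URI.join("#{base.chomp('/')}/", "#{api_version}/", "realtime")
  # encode_www_form escapes the model name if it ever contains special characters.
  uri.query = URI.encode_www_form(model: model)
  uri.to_s
end
```

Compared with File.join plus string interpolation, this keeps path joining and query escaping in URI-aware code.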

@drnic
Contributor

drnic commented Nov 7, 2024

Is it out of scope to add a sample Sinatra app to the repo, with some Stimulus JS that demos the client-side setup of the WebSocket connection for sending messages to and receiving messages from the backend?

Comment on lines +21 to +24
EM.run do
  @websocket = Faye::WebSocket::Client.new(uri, nil, headers: openai_realtime_headers)
  @websocket.on :message, @on_message
end
Author

Perhaps we should replace EventMachine, since its last release was 6 years ago. I think we could use async instead. I have tried it, and it works fine with something like this:

    Async do
      endpoint = Async::HTTP::Endpoint.parse(uri, alpn_protocols: Async::HTTP::Protocol::HTTP11.names)
      Async::WebSocket::Client.connect(endpoint, headers: @headers) do |connection|
        @websocket = connection

        while (message = connection.read)
          @on_message.call(message) # invoke the registered callback with each message
        end
      end
    end

@franpb14
Author

franpb14 commented Dec 9, 2024

@drnic Sorry for the delay in answering. I think it'd be a good thing to have; for me, that was the hardest part. If we eventually merge this, I could put that app together easily.

@ngelx

ngelx commented Mar 3, 2025

What is the status of this PR? It looks like just what is needed to ease a Realtime implementation. I can jump in and help move this forward if it needs more hands.

In addition, I'm trying to understand the failure on CircleCI, but the logs seem to be private. Would it be possible to share the details of the failing job? Thank you!

@alexrudall
Owner

Thanks @ngelx - Realtime is my top priority for v8.1 - would you be able to test this PR and see how useful you find it?

@franpb14
Author

franpb14 commented Mar 3, 2025

@alexrudall @ngelx Since I opened this PR, OpenAI has introduced the option of using WebRTC (https://platform.openai.com/docs/guides/realtime-webrtc), and it works pretty well. With it you don't need to set up Faye or any other dependency. I'm not sure whether this PR still makes sense: should we only include the endpoint to get the ephemeral key, or should we support both options? What do you think?

@ngelx

ngelx commented Mar 4, 2025

Why not both?

As you mentioned, the WebRTC implementation only requires exposing the ephemeral key endpoint, leaving the rest to the client. Since it's part of the API, it makes sense to support it.

On the other hand, the WebSocket implementation requires more backend work but simplifies the client-side integration. It's also officially supported, in beta, by other SDKs (e.g., the openai-python Realtime API).

To sum up, @franpb14 raises a valid point by suggesting WebRTC. Maybe @alexrudall already has plans to support both options?
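
For reference, the ephemeral key endpoint discussed above could be sketched with plain Net::HTTP as follows. Everything here is an assumption rather than code from this PR: the /v1/realtime/sessions path and the client_secret response field are taken from OpenAI's Realtime WebRTC guide as it read at the time of this thread and may have changed since, and build_session_request and create_ephemeral_session are hypothetical names.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed endpoint from OpenAI's Realtime WebRTC guide at the time of this
# thread; check the current documentation before relying on it.
SESSIONS_URL = URI("https://api.openai.com/v1/realtime/sessions")

# Builds (but does not send) the request that asks for an ephemeral key.
def build_session_request(api_key, model:)
  request = Net::HTTP::Post.new(SESSIONS_URL)
  request["Authorization"] = "Bearer #{api_key}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(model: model)
  request
end

# Sends the request and returns the parsed JSON body. The ephemeral key is
# assumed to be under "client_secret" => "value" in that body.
def create_ephemeral_session(api_key, model:)
  request = build_session_request(api_key, model: model)
  response = Net::HTTP.start(SESSIONS_URL.host, SESSIONS_URL.port, use_ssl: true) do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end
```

The backend would call create_ephemeral_session with the real API key and pass only the short-lived key to the browser, which then negotiates the WebRTC connection itself.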

@alexrudall
Owner

I think I agree, although I need to understand it better. If one way is easier and simpler for the user, I prefer to support that and only that, even if it's more work in the gem, but in this case maybe both make sense. Have you seen this thread?

@ngelx

ngelx commented Mar 20, 2025

Sorry for the delay in replying. I did try this fork and it worked, but for my application in particular it was too laggy, so I went with the WebRTC implementation.

I opened a separate PR, #582, which follows the structure of this one so that the two can be merged easily. The code is quite simple. The heavy part is in the web client, but I guess that is what WebRTC is all about.

Successfully merging this pull request may close these issues.

Realtime
4 participants