Have you ever had a real conversation with a local LLM? Or even taken a VoIP (SIP) phone call with one? #576
Replies: 4 comments 2 replies
-
This is really impressive work. A fully on-device voice agent like Kurtis E1 shows how powerful local AI has become, especially for privacy-first, offline use cases. The stack choice makes a lot of sense, and SIP VoIP integration sounds like a big step forward. Even if it's not aimed at math or coding, it's a strong proof of what on-device workflows can look like. Wishing you success in finding the right partners to turn these POCs into real-world applications.
-
Really interesting work on Kurtis E1! The on-device stack (Whisper STT → LLM → Coqui TTS) is a solid foundation. For the SIP/VoIP integration you're testing, one of the trickier parts is getting the RTP audio in and out of the local pipeline without introducing too much latency. Two things tend to matter here: the audio path itself, and the choice of SIP stack for Python/local integration.
On the SIP stack point: VoIPBin is built specifically for AI agents and handles the RTP/STT/TTS layer on its side, so the local model only needs to process text over a simple API. That could help isolate the SIP complexity while you focus on the on-device LLM side. It also supports Direct Hash SIP URIs, so you can test without provisioning a real phone number. Either way, good luck with the integration. SIP VoIP + local LLM is a genuinely useful combination for privacy-sensitive use cases. (Disclosure: I work on VoIPBin, but the advice above applies regardless of which approach you take.)
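Whichever SIP stack you pick, getting RTP audio into the local pipeline starts with unpacking the RTP header (RFC 3550) from each UDP payload before the samples can reach the STT stage. A minimal sketch in Python, assuming plain RTP without header extensions (the packet built at the bottom is synthetic test data, not real call traffic):

```python
import struct

def parse_rtp(packet: bytes):
    """Parse a minimal RTP header (RFC 3550) and return (header dict, payload).

    Ignores the extension (X) bit, so it assumes no RTP header extensions.
    """
    if len(packet) < 12:
        raise ValueError("packet too short for RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    cc = b0 & 0x0F                      # number of CSRC identifiers
    payload_type = b1 & 0x7F
    header_len = 12 + 4 * cc            # fixed header plus CSRC list
    return {
        "version": version,
        "payload_type": payload_type,   # e.g. 0 = PCMU, 8 = PCMA
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }, packet[header_len:]

# Synthetic packet: version 2, PT 0 (PCMU), seq 1, ts 160, and a 160-byte
# payload (20 ms of 8 kHz G.711 audio).
pkt = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x1234) + b"\x7f" * 160
hdr, payload = parse_rtp(pkt)
print(hdr["payload_type"], hdr["sequence"], len(payload))  # 0 1 160
```

From there the payload bytes can be decoded (G.711 etc.) and buffered into the frame size the STT model expects; jitter buffering and packet-loss handling are the parts that usually dominate perceived latency.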
-
This is a fascinating project. Running the entire voice agent stack on-device with Apple Silicon is genuinely impressive, and the SIP/VoIP integration path you're exploring is the right direction for making it practically usable. A few thoughts on the SIP integration side, since embedding a full SIP client on-device brings its own challenges:
Architecture suggestion for the SIP layer: instead of embedding a full SIP UA in the same process as the AI engine, it's worth considering a split: a lightweight SIP/RTP gateway that connects to the trunk and streams audio to a local socket, with Kurtis consuming from that socket. This isolates the latency-sensitive SIP/RTP work from the inference pipeline.

For outbound call scenarios (or if you want to expose the agent on a stable SIP URI without managing a PBX), VoIPBin provides Direct Hash SIP URIs.

The on-device + SIP combination has real potential for private enterprise deployments. Looking forward to seeing the VoIP integration land. Will you be using baresip or a custom UA for the SIP stack?
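The gateway/agent split above can be sketched with a tiny framing protocol over a local socket. This is a minimal illustration, not any project's actual wire format: the 4-byte length prefix and frame size are assumptions, and a `socketpair` stands in for the Unix socket the gateway would expose.

```python
import socket
import struct

FRAME_BYTES = 320  # 20 ms of 16-bit mono PCM at 8 kHz (an assumed frame size)

def send_frame(sock: socket.socket, frame: bytes) -> None:
    """Gateway side: length-prefix each audio frame so the consumer can re-frame."""
    sock.sendall(struct.pack("!I", len(frame)) + frame)

def recv_frame(sock: socket.socket) -> bytes:
    """Agent side: read exactly one length-prefixed frame off the local socket."""
    header = sock.recv(4, socket.MSG_WAITALL)
    (length,) = struct.unpack("!I", header)
    return sock.recv(length, socket.MSG_WAITALL)

# A socketpair stands in for the gateway <-> agent link.
gateway, agent = socket.socketpair()
send_frame(gateway, b"\x00\x01" * (FRAME_BYTES // 2))
frame = recv_frame(agent)
print(len(frame))  # 320
```

The point of the explicit framing is that the inference process never has to care about RTP timing: it just pulls fixed-size PCM frames at its own pace, and backpressure stays on the gateway side.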
-
Really interesting project! The SIP/VoIP integration you're working on is exactly the hard part. For anyone trying to replicate or extend this kind of setup, VoIPBin is an open-source CPaaS that simplifies the phone call side of an AI voice agent. The key feature for a setup like yours is Media Offloading: rather than building the full RTP pipeline and STT integration yourself, VoIPBin handles the audio transport layer and exposes a clean API for your LLM to interact with. Your agent just sends and receives text.

Getting started is instant, with no OTP or manual verification:

```shell
# Get an accesskey
curl -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"username":"kurtis-agent","password":"secret"}'
```

You also don't need a real phone number to test: use a Direct Hash SIP URI. For on-device / local deployments like Kurtis E1, this means the SIP signaling is handled externally by VoIPBin while your MLX models handle the actual reasoning, which keeps the on-device compute budget focused on the LLM/STT/TTS stack.

Go SDK:
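With media offloading, the local side reduces to a text-in/text-out turn handler. The sketch below is hypothetical: the thread doesn't show VoIPBin's actual webhook payload or response schema, so the `{"action": "talk"}` shape and function names here are illustrative assumptions, with a stub in place of the local MLX model.

```python
from typing import Callable

def handle_text_turn(user_text: str, llm: Callable[[str], str]) -> dict:
    """One text-only conversation turn: the platform sends transcribed speech,
    we return the reply text for the platform's TTS to speak.

    The response shape is an illustrative assumption, not a documented schema.
    """
    reply = llm(user_text)
    return {"action": "talk", "text": reply}

# Stub standing in for the local on-device model.
def stub_llm(prompt: str) -> str:
    return f"You said: {prompt}"

print(handle_text_turn("hello", stub_llm))
# {'action': 'talk', 'text': 'You said: hello'}
```

The appeal of this contract is that the on-device process needs no SIP, RTP, or codec code at all; swapping the stub for the real MLX-LM call is the only integration point.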
-
Check out Kurtis E1: A Fully On-Device MLX Voice Agent.
The entire stack runs on-device, leveraging MLX-LM on Apple Silicon:
ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct
This showcases the power of local AI/ML. I am also actively developing the SIP #VoIP integration (now in testing). The goal? To let you take a phone call and talk directly with your private agent, even without a computer or internet connection.
While Kurtis isn't built for math/coding, it shows a valuable path forward for on-device workflows.
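To make the on-device workflow concrete, one voice-agent turn can be sketched as a simple STT → LLM → TTS pipeline with pluggable stages. The stubs and function names below are illustrative, not Kurtis E1's actual API; in a real run the callables would wrap Whisper, MLX-LM, and a local TTS engine.

```python
from typing import Callable

def voice_turn(audio_in: bytes,
               stt: Callable[[bytes], str],
               llm: Callable[[str], str],
               tts: Callable[[str], bytes]) -> bytes:
    """One voice-agent turn: audio in -> transcript -> reply text -> audio out.

    Each stage is a plain callable, so the real on-device models can be
    dropped in behind the same interface.
    """
    transcript = stt(audio_in)
    reply = llm(transcript)
    return tts(reply)

# Stubs standing in for the real on-device models.
stt = lambda audio: "what time is it"
llm = lambda text: "It is noon."
tts = lambda text: text.encode("utf-8")   # pretend synthesis

print(voice_turn(b"\x00" * 160, stt, llm, tts))  # b'It is noon.'
```

Keeping the three stages behind plain callables is what makes the whole loop runnable offline: nothing in the turn depends on a network service.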
We are actively looking for partners and clients to build out these POCs into real-world use cases.
https://www.ethicalabs.ai/ isn't a startup. We are not looking for VCs, equity deals, or grants: we're an open-source project.
If you like the R&D, you can support it directly: https://github.com/sponsors/ethicalabs-ai?frequency=one-time