Skip to content

feat: support for Oracle OCI Platform#2058

Open
zshbleaker wants to merge 13 commits intoTEN-framework:mainfrom
zshbleaker:oracle-speech-ai
Open

feat: support for Oracle OCI Platform#2058
zshbleaker wants to merge 13 commits intoTEN-framework:mainfrom
zshbleaker:oracle-speech-ai

Conversation

@zshbleaker
Copy link

Includding:

Feat:

  1. New extension of Oracle ASR
  2. New extension of Oracle TTS
  3. Add custom headers in OpenAI Extension

Fix:

  1. Fix issues when using insecure Agora APP ID (no token)
  2. Fix issues when using GPT-5.1-chat as the previous default model gpt-4o is removed by OpenAI

@zshbleaker zshbleaker changed the title Support for Oracle Speech AI feat: support for Oracle OCI Platform Feb 22, 2026
@github-actions
Copy link

PR Review: feat/support-oracle-oci-platform

Thanks for this contribution adding Oracle OCI ASR and TTS extensions, along with OpenAI LLM2 improvements. The overall structure is solid, with good test coverage and thoughtful error handling. Below is my detailed review.


Critical Issues

1. Incomplete graph in voice_assistant_oracle

The property.json graph for voice_assistant_oracle is missing essential text data-flow connections. The defined connections cover audio routing (agora_rtc <-> stt <-> streamid_adapter) and control signals (main_control <- agora_rtc, main_control <- stt), but the following connections appear to be absent:

  • main_control -> llm (to send ASR text for LLM processing)
  • llm -> tts (to send LLM response for speech synthesis)
  • llm -> message_collector (for transcription display)

Without these, the oracle voice assistant pipeline will not function -- the llm and tts nodes are defined but never receive input. Please compare against the existing voice_assistant graph connections.


2. asyncio.Lock.acquire_nowait() is not a public API

In oracle_asr_python/extension.py:

if not self._reconnect_lock.acquire_nowait():

asyncio.Lock does not expose acquire_nowait() as a documented public method in the Python standard library. Relying on an undocumented/private method creates fragility across Python versions. The correct pattern for non-blocking async lock acquisition:

  • Use asyncio.Lock.locked() (the TOCTOU gap is acceptable in a single-threaded event loop since there is no preemption between locked() and subsequent operations in the same coroutine), or
  • Use a boolean flag guard instead

The test file test_reconnect_lock.py validates this pattern but does not guard against Python version incompatibility.


Moderate Issues

3. Substring match for reasoning models is too broad

In openai_llm2_python/openai.py:

if is_reasoning_model or "o1" in model_lower or "o3" in model_lower:

This would false-positive on model names containing o1 or o3 as substrings (e.g., "tool1", "pool3", "gpt-4o1"). Consider more precise matching:

import re
is_o_series = bool(re.search(r'\bo[13]\b', model_lower))
if is_reasoning_model or is_o_series:

Or at minimum: model_lower.startswith("o1") / model_lower.startswith("o3").

4. Using OCI SDK private internals in recognition.py

headers = signer._basic_signer.sign(...)

_basic_signer is a private attribute of OCI's Signer class. This is fragile and may break on OCI SDK updates. Check whether there is an officially supported method for computing WebSocket auth headers (e.g., via oci.auth.signers or the oci.signer utilities).

5. Empty default region in ASR property.json

"region": "${env:OCI_REGION|}"

If OCI_REGION is unset, region becomes an empty string, producing an invalid WebSocket URL: wss://realtime.aiservice..oci.oraclecloud.com. The TTS property.json correctly defaults to us-phoenix-1. The ASR should follow suit:

"region": "${env:OCI_REGION|us-ashburn-1}"

6. ReconnectManager reports FATAL_ERROR on retryable failures

In reconnect_manager.py, when an exception occurs during a reconnect attempt but retries remain, it reports FATAL_ERROR. Reporting FATAL_ERROR on an intermediate retry is misleading -- it implies the extension is permanently broken while reconnection is still being attempted. This should be NON_FATAL_ERROR. Reserve FATAL_ERROR for when max_attempts is exhausted.

7. Hardcoded model name in _build_model_details

In oracle_tts.py, any model name other than TTS_1_STANDARD silently resolves to TTS_2_NATURAL due to the else branch hardcoding model_name="TTS_2_NATURAL". Consider explicitly enumerating known models or raising a ValueError for unknown model names to surface configuration errors early.


Minor Issues

8. OracleASRConfig.update() bypasses pydantic validation

setattr on a pydantic BaseModel bypasses field validators. Confirm this behaves as expected with the pydantic version in use (pydantic v1 vs v2 differ significantly here).

9. Missing README.md files

Both extension manifest.json files include README.md in their package includes, but no README files were added. This creates a packaging gap.

10. URL-encoding of language code in recognition.py

The language code is appended to the URL without encoding (f"languageCode={self._language}"). While language codes are typically safe strings, using quote(self._language) is a defensive best practice consistent with how customizations already uses quote().

11. OpenAI LLM2 default model not updated in code

.env.example changes the default to gpt-5.1-chat-latest, but OpenAILLM2Config still defaults to gpt-4o in the Pydantic field definition. If the env var is not set, the fallback remains the now-deprecated model. Consider updating the Pydantic field default as well.


Positive Notes

  • The reconnect logic with exponential backoff is well-designed and has thorough unit test coverage (test_reconnect_manager.py, test_reconnect_lock.py).
  • _strip_wav_header in oracle_tts.py carefully handles both the streaming placeholder case and trailing metadata chunks, with comprehensive tests covering all edge cases.
  • Sensitive credential masking via encrypt() in to_json(sensitive_handling=True) is a good security practice.
  • _sanitize_default_headers in openai.py correctly blocks injection of api-key and proxy-authorization headers.
  • Using asyncio.to_thread() for the synchronous OCI API call avoids blocking the event loop.
  • Sampling parameters (temperature, top_p, etc.) being made Optional is the correct fix for LLM provider compatibility.

@zshbleaker
Copy link
Author

  1. Issue 1 (Incomplete graph in voice_assistant_oracle): No changes made -- both graphs have identical connections arrays. The data flows main_control -> llm, llm -> tts, and llm -> message_collector are not defined in either graph's connections. They are implemented programmatically by the main_python extension via direct ten_env API calls (_send_cmd_ex, _send_data). Recommend replying to the reviewer to clarify this is by design.
  2. Issue 2: extension.py -- Replaced asyncio.Lock.acquire_nowait() (undocumented API) with locked() guard + await acquire() pattern. This is TOCTOU-safe in a single-threaded async event loop. Updated test_reconnect_lock.py accordingly.
  3. Issue 3: openai.py -- Changed "o1" in model_lower to model_lower.startswith("o1") (same for "o3"), preventing false positives on model names like "tool1" or "pool3".
  4. Issue 4: recognition.py -- Replaced private signer._basic_signer.sign(...) with the public OCI Signer call API by constructing a requests.PreparedRequest and calling signer(prepared).
  5. Issue 5: Set sensible default regions when OCI_REGION env var is unset -- ASR property.json defaults to us-ashburn-1, TTS defaults to us-phoenix-1. Also fixed the same issue in the voice_assistant_oracle graph definition.
  6. Issue 6: reconnect_manager.py -- Changed intermediate retry failure error level from FATAL_ERROR to NON_FATAL_ERROR. FATAL_ERROR is now only reported when all retry attempts are exhausted.
  7. Issue 7: oracle_tts.py -- Added explicit model name validation with _KNOWN_MODELS set. Unknown model names now raise ValueError instead of silently falling back to TTS_2_NATURAL.
  8. Issue 8: config.py -- OracleASRConfig.update() now uses pydantic's model_validate() to ensure field validators are applied, instead of raw setattr.
  9. Issue 9: Created README.md files for both oracle_asr_python and oracle_tts_python extensions, documenting configuration parameters and environment variables.
  10. Issue 10: recognition.py -- Applied quote() to the languageCode URL parameter for defensive URL encoding, consistent with how customizations is already handled.
  11. Issue 11: openai.py -- Updated the default model from gpt-4o to gpt-4o-mini to align with .env.example and use a currently supported model.

@github-actions
Copy link

PR Review: feat: support for Oracle OCI Platform

Thanks for adding Oracle OCI support! The extensions are well-structured with good test coverage. Below are findings ranging from blocking issues to minor suggestions.


Critical / Functional

1. Incomplete graph connections in voice_assistant_oracle

The property.json graph is missing connections between the llm, message_collector, and the rest of the pipeline. The connections section only wires:

  • main_control <- ASR result from stt
  • agora_rtc <-> streamid_adapter <-> stt
  • agora_rtc <- audio from tts

But there are NO connections from:

  • main_control -> llm (chat completion requests)
  • llm -> tts (synthesize text)
  • llm/tts -> message_collector (transcript forwarding)

As-is, this graph will not function. Audio goes into ASR and stops there. Compare with the working voice_assistant graph for the missing edges.


2. Shared OCI_REGION env var with conflicting defaults

In the voice_assistant_oracle graph, ASR uses us-ashburn-1 as fallback while TTS uses us-phoenix-1 as fallback. The README for oracle_tts explicitly notes TTS may only be available in us-phoenix-1. Sharing a single OCI_REGION variable means:

  • If unset: ASR goes to us-ashburn-1, TTS goes to us-phoenix-1 (inconsistent)
  • If OCI_REGION=us-ashburn-1 is explicitly set, TTS will likely fail in that region

Consider separate env vars (OCI_ASR_REGION / OCI_TTS_REGION), or document clearly that OCI_REGION must be set to us-phoenix-1 when using TTS.


Bugs / Correctness

3. gpt-5.1-chat-latest model name inconsistency

.env.example sets OPENAI_MODEL=gpt-5.1-chat-latest, but openai.py changes the code default to gpt-4o-mini. These are inconsistent. Please verify that gpt-5.1-chat-latest is a valid current OpenAI model identifier, or align both to the same value.

4. on_close() callback can fire during intentional close()

In recognition.py, when close() cancels _message_task, the finally block of _message_handler fires callback.on_close(). In extension.py, on_close() then calls _handle_reconnect(). The guard (if not self.stopped) should prevent a spurious reconnect, but this relies on self.stopped being True before recognition.close() is called. If stop_connection() is ever called outside of a shutdown path, a reconnect loop could be triggered. Adding a suppress_callback flag to OracleASRRecognition.close() would make the intent explicit.

5. _strip_wav_header trailing chunk detection is fragile

The heuristic checks if bytes at the next position are ASCII letters or spaces to detect trailing WAV chunks. PCM audio data can contain ASCII byte values, producing false positives. Consider explicitly matching known chunk IDs (LIST, fact, id3, INFO) instead of the character range check.


Code Quality

6. OracleTTS.reset() is unused

The async reset() method in oracle_tts.py re-initializes the client but is never called from extension.py or anywhere else. Wire it up or remove it.

7. vendor() in TTS extension missing @OverRide

oracle_tts_python/extension.py defines vendor() without the @OverRide decorator while the ASR extension uses it consistently.

8. default_headers blocked list is incomplete

In openai.py _sanitize_default_headers, the blocked set is only api-key and proxy-authorization. The common variant x-api-key is not blocked. Also, authorization only triggers a warning (not a block), allowing callers to silently replace the built-in Bearer token. Consider whether this is intended.

9. Taskfile symlink step uses ln -snf silently

ln -snf silently replaces any existing symlink or real directory at the destination. Add a guard or at minimum a comment explaining the intent, to avoid masking errors if ten_packages/system already exists.


Minor / Nits

10. .gitignore PEM scope too narrow
ai_agents/.pem only covers the root of ai_agents/. Consider ai_agents/**/.pem.

11. key_file path not validated at startup
An os.path.isfile(key_file) check in start_connection would produce a clearer error than a cryptic OCI SDK exception when the file is missing.

12. Verify ORACLE_LLM_BASE_URL endpoint path
The URL in .env.example ends with /20231130/actions/v1. Please confirm this is the correct path for OpenAI-compatible inference on OCI Generative AI, and add a comment clarifying which service it targets.


Positives

  • Solid reconnect-with-backoff in ReconnectManager, well tested
  • Sensitivity masking in to_json(sensitive_handling=True) is well done
  • WAV header stripping correctly handles both streaming placeholder (size=0) and trailing metadata chunk edge cases
  • Thorough unit tests for error classification, lock pattern, config validation, and WAV parsing
  • _sanitize_default_headers is a thoughtful guard for the custom headers feature
  • Conditional stripping of sampling params for reasoning models (o1, o3, gpt-5.x) is a needed fix

The most important items to address before merge are #1 (incomplete graph connections) and #2 (OCI_REGION conflict), as they would prevent the oracle voice assistant from working out of the box.

@github-actions
Copy link

PR Review: feat: support for Oracle OCI Platform

Thanks for this well-structured contribution! The Oracle ASR/TTS extensions follow the existing patterns cleanly and include solid test coverage. Below are my findings, from critical to minor.


🚨 Critical — Functional

1. Incomplete graph connections in voice_assistant_oracle (property.json)

The oracle voice-assistant graph is missing several connections:

  • main_controlllm (send text queries)
  • llmtts (send LLM output for synthesis)
  • llmmessage_collector (chat transcription)

Without these connections the oracle assistant will hear the user (ASR works) but never call the LLM or produce speech. Compare with the existing voice_assistant graph in the same file to see the missing edges.


🐛 Bugs

2. OracleASRRecognitionCallback — abstract methods not enforced

recognition.py uses @abstractmethod but the class doesn't inherit from ABC:

# current (no enforcement)
class OracleASRRecognitionCallback:
    @abstractmethod
    async def on_open(self): ...

# fix
from abc import ABC, abstractmethod
class OracleASRRecognitionCallback(ABC):
    @abstractmethod
    async def on_open(self): ...

Python only enforces abstract methods when the class inherits from ABC (or ABCMeta). Without it a subclass can be instantiated without implementing all callbacks, causing silent failures at runtime.

3. godotenv.Overload() reverses env-var precedence (server/main.go)

The change from Load to Overload means the .env file now overwrites variables that are already set in the OS environment:

// before: OS env vars win over .env (standard 12-factor behaviour)
err := godotenv.Load()

// after: .env file silently shadows real env vars (breaks containers)
err := godotenv.Overload()

This is a subtle production footgun — Kubernetes/Docker deployments that inject secrets via env vars will have them silently overridden by a stale .env file. The fallback to ../.env is a nice developer UX improvement; consider combining it with the original Load behaviour:

if err := godotenv.Load(); err != nil {
    if err2 := godotenv.Load("../.env"); err2 != nil {
        slog.Warn("load .env file failed", "err", err)
    }
}

⚠️ Design / Code Quality

4. Missing license headers in oracle_asr_python

The TTS extension files have the standard Apache 2.0 header block but the ASR extension files (addon.py, config.py, const.py, extension.py, recognition.py, reconnect_manager.py) are missing it. Align with the project convention.

5. EVENT_TTS_INVALID_KEY_ERROR handling is duplicated in request_tts

In oracle_tts_python/extension.py, the EVENT_TTS_INVALID_KEY_ERROR branch inside request_tts manually re-implements the has_text_input_end check + send_tts_audio_end + finish_request logic that _handle_error_with_end already encapsulates. Consider delegating:

elif event == EVENT_TTS_INVALID_KEY_ERROR:
    error_msg = audio_chunk.decode("utf-8") if audio_chunk else "OCI authentication error"
    # send fatal error
    await self.send_tts_error(request_id=..., error=ModuleError(...FATAL_ERROR...))
    await self._handle_error_with_end(request_id, error_msg)  # handles audio_end
    return

6. default_headers allows overriding Authorization with only a warning

_sanitize_default_headers correctly blocks api-key and proxy-authorization, but merely warns when Authorization is overridden. Since the constructor always sets Authorization: Bearer <api_key>, a user-supplied Authorization header in default_headers will silently replace the real auth (the **safe_default_headers spread comes last). Either also block it, or at least make the warning message clearer about the consequence.

7. Default model changed to gpt-4o-mini without documentation

Changing the code default from gpt-4o to gpt-4o-mini is a capability downgrade for anyone relying on the implicit default. The PR description mentions gpt-4o was removed, but the .env.example now shows gpt-5.1-chat-latest. Consider:

  • Adding a brief note to the PR or README about the change
  • Using gpt-4o-mini only as the fallback in OpenAILLM2Config and relying on .env.example to guide users toward the intended default

8. ReconnectManager hard-codes MODULE_NAME_ASR

reconnect_manager.py is placed under oracle_asr_python but references MODULE_NAME_ASR (from const.py) in its error messages. If TTS ever needs the same manager pattern, the module name will be wrong. Consider accepting the module name as a constructor parameter.

9. Minor: reasoning-model check could be unified

The existing code already defines is_reasoning_model = model.startswith("gpt-5"). The PR appends another check:

if is_reasoning_model or model_lower.startswith("o1") or model_lower.startswith("o3"):

For clarity, fold the o1/o3 cases into the original definition:

is_reasoning_model = model_lower.startswith(("gpt-5", "o1", "o3"))

✅ Positives

  • The Agora insecure-token fix (token: nil instead of token: AppId) and the playground token || null fix are correct and well-scoped.
  • _strip_wav_header is thoughtfully documented, handles both the accurate and placeholder data chunk-size cases, and guards against 16-bit PCM alignment.
  • _sanitize_default_headers is a solid approach — explicit block-list with logging.
  • Making temperature, top_p, etc. Optional (default None) so they're omitted rather than sent as zero is the right fix for reasoning models.
  • Test coverage for config validation, language normalisation, reconnect backoff, and the asyncio lock pattern is thorough.
  • Adding ai_agents/*.pem to .gitignore is a good security hygiene addition.
  • asyncio.to_thread for the synchronous OCI TTS call avoids blocking the event loop.

Summary: Items 1–3 should be addressed before merge. Items 4–9 are strongly recommended quality improvements. Happy to discuss any of these!

@zshbleaker
Copy link
Author

zshbleaker commented Feb 26, 2026

All 18 fixes are complete with no linter errors. Here's the summary:

Critical / Bugs (1-4):

  1. Added llm, tts, message_collector graph connections to voice_assistant_oracle in property.json
  2. Added ABC inheritance to OracleASRRecognitionCallback so @abstractmethod is enforced
  3. Changed godotenv.Overload() back to godotenv.Load() in server/main.go to preserve OS env var precedence
  4. Added _closing flag in recognition.py to suppress on_close() callback during intentional close()

Design / Quality (5-13):

  1. Added Apache 2.0 license headers to all 6 oracle ASR Python files
  2. Refactored EVENT_TTS_INVALID_KEY_ERROR to delegate to _handle_error_with_end() with configurable error_code parameter
  3. Blocked authorization and x-api-key in _sanitize_default_headers (moved from warn to block)
  4. Changed default model from gpt-4o-mini to gpt-5.1-chat-latest to match .env.example
  5. Added module_name parameter to ReconnectManager.init() instead of hardcoding MODULE_NAME_ASR
  6. Unified reasoning model check -- folded o1/o3 into is_reasoning_model definition, removed duplicate check
  7. Aligned OCI_REGION default to us-phoenix-1 for both ASR and TTS (property.json, code, README)
  8. Added @OverRide decorator to vendor() in TTS extension
  9. Removed unused reset() method from oracle_tts.py

Minor (14-18):

  1. Changed .gitignore from ai_agents/.pem to ai_agents/**/.pem for recursive coverage
  2. Added os.path.isfile(key_file) validation in start_connection() for clearer errors
  3. Replaced ASCII range heuristic with known WAV chunk ID set in _strip_wav_header
  4. Added comment clarifying the Oracle LLM URL in .env.example
  5. Added comment explaining the ln -snf symlink intent in Taskfile.yml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants