The current OrcaHello pipeline is designed to ingest live streaming audio. Audio segments are continuously pulled from S3, processed into 60-second WAV files, analyzed, and written to CosmosDB if there are detections.
This architecture does not currently expose a "send me a WAV and I'll return detections" API. We would like a lightweight way to analyze static audio files without setting up the full Azure/S3 ingestion flow.
Proposal
Expose a small API endpoint that accepts a WAV/FLAC file and returns time-stamped hits. Outputs should match the live system, including confidence levels and a spectrogram representation.
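A minimal sketch of the core step such an endpoint would wrap: slide a fixed window over a decoded clip, score each window, and keep time-stamped hits above a threshold. Everything here is a hypothetical placeholder, not the live OrcaHello code — `predict_segment` stands in for the real model, and the sample rate, window length, and threshold are assumed values for illustration.

```python
from dataclasses import dataclass

SAMPLE_RATE = 16000     # assumed decode rate; the live pipeline's may differ
SEGMENT_SECONDS = 2.0   # assumed analysis window length, placeholder only
THRESHOLD = 0.5         # assumed confidence cutoff, placeholder only

@dataclass
class Hit:
    start_s: float      # window start, seconds from clip start
    end_s: float        # window end, seconds from clip start
    confidence: float   # model score for this window

def predict_segment(samples):
    """Placeholder for the real model call; returns a fake confidence.

    Stand-in heuristic (mean absolute amplitude) -- NOT the real detector.
    """
    return sum(abs(s) for s in samples) / max(len(samples), 1)

def analyze_clip(samples, sample_rate=SAMPLE_RATE):
    """Slide a fixed, non-overlapping window over the clip and collect hits."""
    win = int(SEGMENT_SECONDS * sample_rate)
    hits = []
    for i in range(0, len(samples) - win + 1, win):
        conf = predict_segment(samples[i:i + win])
        if conf >= THRESHOLD:
            hits.append(Hit(i / sample_rate, (i + win) / sample_rate, conf))
    return hits
```

In a real version, decoding the uploaded WAV/FLAC (ffmpeg or torchaudio) and rendering the spectrogram would happen around this function, and the returned `Hit` list would serialize directly to the JSON response.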
Benefits / use cases
- testing model behavior
- comparing model performance against other models
- re-running false negatives to find blind spots
- live endpoint for an 'analyze this clip' UI feature
Skills needed
- Python / PyTorch
- ffmpeg / torchaudio
- FastAPI or ASP.NET
Originally discussed in orcasite#931