Fix Zhipu batch ASR chunking #509

Merged: H-Chris233 merged 1 commit into Open-Less:beta from H-Chris233:issue-508-zhipu-asr-chunking on May 20, 2026

@H-Chris233 commented on May 20, 2026

User description

Summary

  • Split Zhipu GLM-ASR batch uploads into 30-second PCM chunks before WAV encoding.
  • Keep other Whisper-compatible providers on the existing single-request path.
  • Concatenate chunk transcripts with CJK/ASCII/punctuation-aware spacing.

Official docs checked

  • Zhipu GLM-ASR transcription endpoint accepts multipart file + model at /api/paas/v4/audio/transcriptions.
  • Official limit: audio duration <= 30 seconds, file size <= 25 MB.
  • Current upload path already matches Bearer auth, WAV upload, model field, and JSON text response.
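As a rough sanity check on the two limits above, the sketch below estimates the WAV size of one 30-second chunk. The PCM format (16 kHz, mono, 16-bit) and the 44-byte header are assumptions, not values stated in this PR, and `wav_size_bytes` is a hypothetical helper. Under those assumptions the duration cap is the binding limit; the file size cap is not close to being reached.

```rust
// Assumed PCM format (not stated in the PR): 16 kHz, mono, 16-bit samples.
const SAMPLE_RATE_HZ: u64 = 16_000;
const BYTES_PER_SAMPLE: u64 = 2;
// Size of the canonical PCM WAV header.
const WAV_HEADER_BYTES: u64 = 44;
// 25 MB read as decimal megabytes, the stricter interpretation (an assumption).
const MAX_FILE_BYTES: u64 = 25 * 1000 * 1000;

/// Hypothetical helper: size in bytes of a WAV file holding `duration_ms`
/// of audio in the assumed format.
fn wav_size_bytes(duration_ms: u64) -> u64 {
    let samples = SAMPLE_RATE_HZ * duration_ms / 1000;
    WAV_HEADER_BYTES + samples * BYTES_PER_SAMPLE
}

fn main() {
    let size = wav_size_bytes(30_000);
    println!("30 s chunk: {} bytes, limit: {} bytes", size, MAX_FILE_BYTES);
    assert!(size < MAX_FILE_BYTES);
}
```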

Validation

  • cargo test --manifest-path "/home/chris233/openless/openless-all/app/src-tauri/Cargo.toml" -- --test-threads=1
  • git diff --check
  • Subagent review: APPROVE

Fixes #508


PR Type

Bug fix, Tests


Description

  • Split batch ASR uploads by duration

  • Apply 30-second Zhipu limit

  • Join chunk transcripts intelligently

  • Add chunking and routing tests
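To make the duration-based split concrete, here is a minimal sketch that mirrors the `split_pcm_by_duration` excerpt quoted in the reviewer guide. The constant values (16 kHz, 2 bytes per sample) are assumptions; the PR names the constants but does not show them.

```rust
// Assumed values; the PR excerpt names these constants but does not show them.
const PCM_SAMPLE_RATE_HZ: u64 = 16_000;
const PCM_BYTES_PER_SAMPLE: usize = 2;

/// Splits raw PCM into slices of at most `max_chunk_duration_ms` each.
/// `None` or `Some(0)` means "no limit" and returns the input as one chunk.
fn split_pcm_by_duration(pcm: &[u8], max_chunk_duration_ms: Option<u64>) -> Vec<&[u8]> {
    let Some(max_ms) = max_chunk_duration_ms else {
        return vec![pcm];
    };
    let samples_per_chunk = PCM_SAMPLE_RATE_HZ * max_ms / 1000;
    let bytes_per_chunk = samples_per_chunk as usize * PCM_BYTES_PER_SAMPLE;
    if bytes_per_chunk == 0 || pcm.len() <= bytes_per_chunk {
        return vec![pcm];
    }
    // The chunk size is a multiple of the sample size, so no sample is cut in half.
    pcm.chunks(bytes_per_chunk).collect()
}

fn main() {
    // 70 seconds of silence in the assumed format.
    let pcm = vec![0u8; 70 * 16_000 * 2];
    let chunks = split_pcm_by_duration(&pcm, Some(30_000));
    let lens: Vec<usize> = chunks.iter().map(|c| c.len()).collect();
    // Two full 30-second chunks and a 10-second remainder.
    assert_eq!(lens, vec![960_000, 960_000, 320_000]);
    // Providers without a limit keep the single-request path.
    assert_eq!(split_pcm_by_duration(&pcm, None).len(), 1);
}
```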


Diagram Walkthrough

flowchart LR
  A["Dictation coordinator"] -- "passes provider limit" --> B["WhisperBatchASR"]
  B -- "splits PCM by duration" --> C["Multiple ASR requests"]
  C -- "merges chunk transcripts" --> D["Final transcript"]

File Walkthrough

Relevant files
Bug fix
whisper.rs: Add PCM chunking and transcript merge

openless-all/app/src-tauri/src/asr/whisper.rs

  • Adds optional max_chunk_duration_ms to batch ASR requests.
  • Splits long PCM audio into duration-based chunks before upload.
  • Joins chunk transcripts with punctuation and CJK-aware spacing.
  • Expands tests for splitting, joining, and request flow.
+350/-7 
dictation.rs: Wire chunk limit into dictation setup

openless-all/app/src-tauri/src/coordinator/dictation.rs

  • Supplies the active provider chunk limit when creating batch Whisper ASR.
  • Ensures Zhipu sessions use the 30-second upload cap.
+20/-2   
Enhancement
coordinator.rs: Route Zhipu to batch chunk limit

openless-all/app/src-tauri/src/coordinator.rs

  • Passes a provider-specific chunk limit into WhisperBatchASR::new.
  • Adds batch_asr_chunk_limit_ms for Zhipu only.
  • Verifies the limit mapping in coordinator tests.
+1/-0     
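The provider routing described in the walkthrough can be sketched as follows. Only the function name `batch_asr_chunk_limit_ms` comes from the PR; its signature, the provider id string, and the constant names here are assumptions for illustration.

```rust
// Hypothetical provider id; the real identifier used in the coordinator is not shown in the PR.
const ZHIPU_PROVIDER_ID: &str = "zhipu";
// 30-second cap from the Zhipu GLM-ASR limit cited in the PR description.
const ZHIPU_MAX_CHUNK_MS: u64 = 30_000;

/// Returns the per-request duration cap for a batch ASR provider,
/// or `None` when the provider keeps the single-request path.
/// The signature is an assumption; only the name appears in the PR.
fn batch_asr_chunk_limit_ms(provider_id: &str) -> Option<u64> {
    match provider_id {
        ZHIPU_PROVIDER_ID => Some(ZHIPU_MAX_CHUNK_MS),
        _ => None,
    }
}

fn main() {
    assert_eq!(batch_asr_chunk_limit_ms("zhipu"), Some(30_000));
    // Any other provider id (this one is made up) gets no chunk limit.
    assert_eq!(batch_asr_chunk_limit_ms("other-whisper-compatible"), None);
}
```

Returning `Option<u64>` lets the result feed directly into the optional `max_chunk_duration_ms` setting, so providers without a limit need no special casing.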

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

508 - Partially compliant

Compliant requirements:

  • Split Zhipu GLM-ASR batch audio into chunks no longer than 30 seconds.
  • Send each chunk as a separate request for batch providers.
  • Concatenate the per-chunk transcripts into one final result.
  • Keep other Whisper-compatible providers on the existing single-request path.
  • Add tests covering chunking and provider routing behavior.

Non-compliant requirements:

  • None

Requires further human verification:

  • None
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Mid-word split

The new chunking splits purely by byte count, so a 30-second boundary can fall in the middle of a spoken word or phrase. The two chunk transcripts are then joined with an inserted space, which can produce incorrect output such as "Open Less" or "hel lo" when continuous speech has no silence near the cut point.

fn split_pcm_by_duration(pcm: &[u8], max_chunk_duration_ms: Option<u64>) -> Vec<&[u8]> {
    let Some(max_chunk_duration_ms) = max_chunk_duration_ms else {
        return vec![pcm];
    };
    if max_chunk_duration_ms == 0 {
        return vec![pcm];
    }

    let samples_per_chunk = PCM_SAMPLE_RATE_HZ * max_chunk_duration_ms / 1000;
    let bytes_per_chunk = samples_per_chunk as usize * PCM_BYTES_PER_SAMPLE;
    if bytes_per_chunk == 0 || pcm.len() <= bytes_per_chunk {
        return vec![pcm];
    }

    pcm.chunks(bytes_per_chunk).collect()
}

fn join_transcript_chunks(chunks: &[String]) -> String {
    let mut joined = String::new();
    for chunk in chunks.iter().map(|chunk| chunk.trim()) {
        if chunk.is_empty() {
            continue;
        }
        if needs_chunk_separator(&joined, chunk) {
            joined.push(' ');
        }
        joined.push_str(chunk);
    }
    joined
}

fn needs_chunk_separator(current: &str, next: &str) -> bool {
    let Some(prev) = current.chars().last() else {
        return false;
    };
    let Some(first) = next.chars().next() else {
        return false;
    };

    if is_closing_punctuation(first) || is_opening_punctuation(prev) {
        return false;
    }
    if is_cjk(prev) && (is_cjk(first) || is_opening_punctuation(first)) {
        return false;
    }
    if is_cjk(first) && is_closing_punctuation(prev) {
        return false;
    }
    if is_cjk_punctuation(prev) && is_cjk(first) {
        return false;
    }
    true

@H-Chris233 merged commit e889858 into Open-Less:beta on May 20, 2026
4 checks passed
Linked issue (#508): No audio splitting for batch ASR providers (e.g. GLM-ASR 30s limit)