fix: adjustment in audio transcription with official api #1556

Merged
merged 1 commit into from
Jun 7, 2025

Conversation

splusoficial
@splusoficial splusoficial commented Jun 5, 2025

Summary by Sourcery

Refine audio message handling by structuring payloads, standardizing media downloads, and integrating optional OpenAI speech-to-text transcription with improved error handling.

New Features:

  • Add optional speech-to-text transcription for audio messages using OpenAI after media download

Bug Fixes:

  • Correct media download requests to use only the Authorization header
  • Improve error logging and rethrow errors during media downloads

Enhancements:

  • Introduce a dedicated helper to build audio message JSON including PTT flags
  • Determine audio file extensions dynamically based on MIME type
  • Streamline payload construction to consistently merge contextInfo when present
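
As a rough illustration of the audio JSON helper and the contextInfo merge listed above, a minimal sketch might look like the following. The payload shapes, the `voice` field used to derive the PTT flag, the `stanzaId` key, and the function name `buildAudioMessageJson` are assumptions made for this example; they are not taken from the merged code.

```typescript
// Hypothetical, reduced shapes of an incoming audio webhook message.
interface IncomingAudio {
  id: string;
  mime_type: string;
  voice?: boolean; // assumed to mark a voice note (push-to-talk)
}

interface IncomingMessage {
  id: string;
  audio: IncomingAudio;
  context?: { id: string }; // present when the message quotes another one
}

interface AudioMessageContent {
  audioMessage: IncomingAudio & { ptt: boolean };
  contextInfo?: { stanzaId: string };
}

// Builds the audio payload and merges contextInfo only when context exists.
function buildAudioMessageJson(message: IncomingMessage): AudioMessageContent {
  const content: AudioMessageContent = {
    audioMessage: { ...message.audio, ptt: message.audio.voice ?? false },
  };
  if (message.context) {
    content.contextInfo = { stanzaId: message.context.id };
  }
  return content;
}
```

The explicit `if` keeps the result free of a `contextInfo` key when the message is not a reply, so downstream consumers never see an empty object there.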

Contributor

sourcery-ai bot commented Jun 5, 2025

Reviewer's Guide

Refactors media download and audio message handling in the WhatsApp Business service to use the official API, integrates OpenAI speech-to-text for audio messages, improves error logging, and assigns correct audio file extensions.

Sequence Diagram for Modified Media Download from Meta API

sequenceDiagram
    participant BSS as BusinessStartupService
    participant AX as axios
    participant META as Meta API Server

    BSS->>AX: GET /<version>/<id> (to get media URL, Headers: Content-Type, Auth)
    AX->>META: HTTP GET Request
    META-->>AX: Response { data: { url: mediaFileUrl } }
    AX-->>BSS: Return { data: { url: mediaFileUrl } }

    BSS->>AX: GET mediaFileUrl (to download file, Headers: Auth only, responseType: arraybuffer)
    AX->>META: HTTP GET Request
    META-->>AX: Response (media file as arraybuffer)
    AX-->>BSS: Return media file data
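
The two-request flow in the diagram above could be sketched roughly as follows. To keep the example self-contained, the HTTP getter is passed in as a parameter shaped like `axios.get`; the type `HttpGet`, the function name `downloadMedia`, its parameters, and the `Bearer` token scheme are assumptions of this sketch rather than details confirmed by the pull request.

```typescript
// Hypothetical getter type, shaped like axios.get so it can be mocked.
type HttpGet = (
  url: string,
  config: { headers: Record<string, string>; responseType?: 'arraybuffer' },
) => Promise<{ data: any }>;

async function downloadMedia(
  get: HttpGet,
  baseUrl: string,
  version: string,
  mediaId: string,
  token: string,
): Promise<Uint8Array> {
  try {
    // Step 1: fetch the media metadata, which contains the download URL.
    const meta = await get(`${baseUrl}/${version}/${mediaId}`, {
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    });

    // Step 2: download the file itself, sending only the Authorization header.
    const file = await get(meta.data.url, {
      headers: { Authorization: `Bearer ${token}` },
      responseType: 'arraybuffer',
    });
    return new Uint8Array(file.data);
  } catch (error) {
    // Log with context, then rethrow so the caller can decide what to do.
    console.error(`Error downloading media ${mediaId}:`, error);
    throw error;
  }
}
```

Injecting the getter is only a convenience for testing this sketch; a service that already depends on axios could call it directly.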

Sequence Diagram for Audio Processing with Speech-to-Text (S3 Enabled)

sequenceDiagram
    actor User
    participant WP as WhatsAppPlatform
    participant BSS as BusinessStartupService
    participant META as Meta API Server
    participant S3 as MinioService/S3
    participant DB as PrismaRepository
    participant OAI as OpenAIService
    participant WS as WebhookSystem

    User->>WP: Sends audio message
    WP->>BSS: Receives audio message (received)
    BSS->>BSS: Calls messageAudioJson(received)
    BSS->>META: Download audio file
    META-->>BSS: Audio file data
    BSS->>S3: Upload audio file (buffer, fileName with correct extension)
    S3-->>BSS: mediaUrl
    BSS->>DB: findFirst OpenAI settings (instanceId)
    DB-->>BSS: openAiDefaultSettings
    alt OpenAI STT Enabled and Configured
        BSS->>OAI: speechToText(creds, { message: { mediaUrl, ... } })
        OAI-->>BSS: transcribedText
        BSS->>BSS: Update messageRaw with speechToText
    end
    BSS->>WS: sendDataWebhook(MESSAGES_UPSERT, messageRaw)

Sequence Diagram for Audio Processing with Speech-to-Text (S3 Disabled)

sequenceDiagram
    actor User
    participant WP as WhatsAppPlatform
    participant BSS as BusinessStartupService
    participant DB as PrismaRepository
    participant OAI as OpenAIService
    participant WS as WebhookSystem

    User->>WP: Sends audio message
    WP->>BSS: Receives audio message (received)
    BSS->>BSS: Calls messageAudioJson(received)
    BSS->>BSS: downloadMediaMessage(received.messages[0]) from Meta API
    BSS-->>BSS: Audio file buffer
    BSS->>BSS: Convert buffer to base64
    BSS->>DB: findFirst OpenAI settings (instanceId)
    DB-->>BSS: openAiDefaultSettings
    alt OpenAI STT Enabled and Configured
        BSS->>OAI: speechToText(creds, { message: { base64, ... } })
        OAI-->>BSS: transcribedText
        BSS->>BSS: Update messageRaw with speechToText
    end
    BSS->>WS: sendDataWebhook(MESSAGES_UPSERT, messageRaw)
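
Both diagrams above share the same optional transcription step: look up the OpenAI settings, transcribe only when speech-to-text is enabled and credentials exist, then attach the text to the message before the webhook fires. A minimal sketch of that branch is shown below. The settings shape, the `MediaRef` union, the `speechToText` key on the message, and the choice to log and continue when transcription fails are all assumptions of this example, not confirmed behavior of the merged code.

```typescript
// Hypothetical shapes used only for this sketch.
interface SttSettings {
  speechToText: boolean;
  credentials?: { apiKey: string };
}
type MediaRef = { mediaUrl: string } | { base64: string }; // S3 branch or direct download
interface MessageRaw {
  message: Record<string, unknown>;
}

async function attachTranscription(
  messageRaw: MessageRaw,
  media: MediaRef,
  findSettings: () => Promise<SttSettings | null>,
  transcribe: (creds: { apiKey: string }, media: MediaRef) => Promise<string>,
): Promise<MessageRaw> {
  const settings = await findSettings();

  // Skip quietly when the feature is off or not configured.
  if (!settings?.speechToText || !settings.credentials) return messageRaw;

  try {
    messageRaw.message['speechToText'] = await transcribe(settings.credentials, media);
  } catch (error) {
    // Assumed policy: a failed transcription should not block message delivery.
    console.error('Speech-to-text failed; continuing without transcript:', error);
  }
  return messageRaw;
}
```

The same function serves both branches because the only difference between them is whether the audio is referenced by `mediaUrl` or carried inline as `base64`.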

Updated Class Diagram for BusinessStartupService

classDiagram
    class BusinessStartupService {
        -configService: ConfigService
        -logger: Logger
        -token: string
        -instanceId: string
        -prismaRepository: PrismaRepository
        -openaiService: OpenAIService
        +downloadMediaMessage(received: any): Promise<Buffer>  // Modified logic for headers
        -messageAudioJson(received: any): any  // New private method for audio message structure
        +onMessage(received: any): void // Represents main handler with updated audio processing & OpenAI STT
    }
    BusinessStartupService ..> axios : uses
    BusinessStartupService ..> ConfigService : uses
    BusinessStartupService ..> Logger : uses
    BusinessStartupService ..> PrismaRepository : uses
    BusinessStartupService ..> OpenAIService : uses

File-Level Changes

Change: Refactor media download flow
Details:
  • Separate metadata fetch and file download into two axios calls
  • Use only the Authorization header when downloading media
  • Enhance error logging to include context and rethrow errors
Files: src/api/integrations/channel/meta/whatsapp.business.service.ts

Change: Introduce dedicated audio JSON builder and refine message parsing
Details:
  • Replace inline context merge with explicit if-statement
  • Add private messageAudioJson method with ptt flag
  • Switch messageRaw construction to use messageAudioJson for audio types
Files: src/api/integrations/channel/meta/whatsapp.business.service.ts

Change: Ensure correct audio file extensions based on mimetype
Details:
  • Add logic to assign .ogg, .mp3, or .m4a extension for audio media
Files: src/api/integrations/channel/meta/whatsapp.business.service.ts

Change: Integrate OpenAI speech-to-text for audio messages
Details:
  • Invoke openaiService.speechToText after mediaUrl is available in S3 branch
  • Invoke openaiService.speechToText in direct download branch
  • Remove obsolete OpenAI processing block at the end
Files: src/api/integrations/channel/meta/whatsapp.business.service.ts
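
The extension logic mentioned in the changes above could be sketched as a small lookup. The specific MIME types mapped to each extension, the handling of parameters such as `; codecs=opus`, and the `.ogg` fallback are assumptions of this example; the pull request only states that .ogg, .mp3, or .m4a is assigned based on the MIME type.

```typescript
// Hypothetical helper: picks a file extension from an audio MIME type.
function audioExtensionFor(mimeType: string): string {
  // Drop parameters such as "; codecs=opus" before matching.
  const base = (mimeType.split(';')[0] ?? '').trim().toLowerCase();
  switch (base) {
    case 'audio/ogg':
      return '.ogg';
    case 'audio/mpeg':
      return '.mp3';
    case 'audio/mp4':
      return '.m4a';
    default:
      return '.ogg'; // assumed fallback, not confirmed by the source
  }
}

// Example: build a file name for upload (the id value is illustrative).
const fileName = `audio-123${audioExtensionFor('audio/ogg; codecs=opus')}`;
```

A correct extension matters here because the stored file may later be handed to a transcription service that infers the audio format from the file name.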

@DavidsonGomes DavidsonGomes merged commit 614ad7c into EvolutionAPI:develop Jun 7, 2025
1 check passed
3 participants