fix: adjustment in audio transcription with official api #1556

splusoficial · 2025-06-05T23:33:14Z

Summary by Sourcery

Refine audio message handling by structuring payloads, standardizing media downloads, and integrating optional OpenAI speech-to-text transcription with improved error handling.

New Features:

Add optional speech-to-text transcription for audio messages using OpenAI after media download

Bug Fixes:

Correct media download requests to use only the Authorization header
Improve error logging and rethrow errors during media downloads

Enhancements:

Introduce a dedicated helper to build audio message JSON including PTT flags
Determine audio file extensions dynamically based on MIME type
Streamline payload construction to consistently merge contextInfo when present
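
The extension selection mentioned above can be sketched as a small pure function. This is only an illustration, not the code from the PR: the name `audioExtensionFromMime`, the exact MIME strings matched, and the `.ogg` fallback are all assumptions; the PR only states that `.ogg`, `.mp3`, or `.m4a` is chosen from the MIME type.

```typescript
// Hypothetical helper: maps an audio MIME type to a file extension.
// The MIME strings and the fallback are assumptions made for this sketch.
function audioExtensionFromMime(mimeType: string | undefined): string {
  // Drop parameters such as "; codecs=opus" and normalise case.
  const [rawBase = ''] = (mimeType ?? '').split(';');
  const base = rawBase.trim().toLowerCase();

  switch (base) {
    case 'audio/ogg':
      return '.ogg';
    case 'audio/mpeg':
      return '.mp3';
    case 'audio/mp4':
      return '.m4a';
    default:
      // Assumed fallback for unknown or missing MIME types.
      return '.ogg';
  }
}
```

Stripping the parameters first matters because audio MIME types may arrive with a codec suffix, for example `audio/ogg; codecs=opus`.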

sourcery-ai · 2025-06-05T23:33:18Z

Reviewer's Guide

Refactored media download and audio message handling in WhatsApp business service to use the official API, integrated OpenAI speech-to-text for audio, improved error logging, and ensured correct audio file extensions.

Sequence Diagram for Modified Media Download from Meta API

sequenceDiagram
    participant BSS as BusinessStartupService
    participant AX as axios
    participant META as Meta API Server

    BSS->>AX: GET /<version>/<id> (to get media URL, Headers: Content-Type, Auth)
    AX->>META: HTTP GET Request
    META-->>AX: Response { data: { url: mediaFileUrl } }
    AX-->>BSS: Return { data: { url: mediaFileUrl } }

    BSS->>AX: GET mediaFileUrl (to download file, Headers: Auth only, responseType: arraybuffer)
    AX->>META: HTTP GET Request
    META-->>AX: Response (media file as arraybuffer)
    AX-->>BSS: Return media file data
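
The two requests in the diagram above can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's actual `downloadMediaMessage`: the function name `downloadMediaSketch`, the `HttpGet` type, the `Bearer` scheme, and the `Content-Type` value are assumptions. The HTTP client is injected so the sketch runs without network access; the service itself uses axios.

```typescript
// Options passed to the injected HTTP client (hypothetical shape).
interface GetOptions {
  headers: Record<string, string>;
  responseType?: 'json' | 'arraybuffer';
}

// Minimal client contract for this sketch: resolves with a `data` field.
type HttpGet = (url: string, options: GetOptions) => Promise<{ data: unknown }>;

async function downloadMediaSketch(
  get: HttpGet,
  apiBase: string, // assumed form: "<host>/<version>"
  mediaId: string,
  token: string,
): Promise<Uint8Array> {
  // Assumption: the token is sent as a Bearer credential.
  const auth = { Authorization: `Bearer ${token}` };

  try {
    // Step 1: fetch the media metadata, which carries the download URL.
    const meta = await get(`${apiBase}/${mediaId}`, {
      headers: { 'Content-Type': 'application/json', ...auth },
    });
    const data = meta.data as { url?: unknown } | null;
    const url = data?.url;
    if (typeof url !== 'string') {
      throw new Error('Media metadata response has no url');
    }

    // Step 2: download the file itself, sending only the Authorization header.
    const file = await get(url, { headers: auth, responseType: 'arraybuffer' });
    return new Uint8Array(file.data as ArrayBuffer);
  } catch (error) {
    // Log with context, then rethrow so the caller can react.
    console.error(`Failed to download media ${mediaId}:`, error);
    throw error;
  }
}
```

Rethrowing after logging keeps the failure visible to the caller instead of silently returning an empty result.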

Sequence Diagram for Audio Processing with Speech-to-Text (S3 Enabled)

sequenceDiagram
    actor User
    participant WP as WhatsAppPlatform
    participant BSS as BusinessStartupService
    participant META as Meta API Server
    participant S3 as MinioService/S3
    participant DB as PrismaRepository
    participant OAI as OpenAIService
    participant WS as WebhookSystem

    User->>WP: Sends audio message
    WP->>BSS: Receives audio message (received)
    BSS->>BSS: Calls messageAudioJson(received)
    BSS->>META: Download audio file
    META-->>BSS: Audio file data
    BSS->>S3: Upload audio file (buffer, fileName with correct extension)
    S3-->>BSS: mediaUrl
    BSS->>DB: findFirst OpenAI settings (instanceId)
    DB-->>BSS: openAiDefaultSettings
    alt OpenAI STT Enabled and Configured
        BSS->>OAI: speechToText(creds, { message: { mediaUrl, ... } })
        OAI-->>BSS: transcribedText
        BSS->>BSS: Update messageRaw with speechToText
    end
    BSS->>WS: sendDataWebhook(MESSAGES_UPSERT, messageRaw)

Sequence Diagram for Audio Processing with Speech-to-Text (S3 Disabled)

sequenceDiagram
    actor User
    participant WP as WhatsAppPlatform
    participant BSS as BusinessStartupService
    participant DB as PrismaRepository
    participant OAI as OpenAIService
    participant WS as WebhookSystem

    User->>WP: Sends audio message
    WP->>BSS: Receives audio message (received)
    BSS->>BSS: Calls messageAudioJson(received)
    BSS->>BSS: downloadMediaMessage(received.messages[0]) from Meta API
    BSS-->>BSS: Audio file buffer
    BSS->>BSS: Convert buffer to base64
    BSS->>DB: findFirst OpenAI settings (instanceId)
    DB-->>BSS: openAiDefaultSettings
    alt OpenAI STT Enabled and Configured
        BSS->>OAI: speechToText(creds, { message: { base64, ... } })
        OAI-->>BSS: transcribedText
        BSS->>BSS: Update messageRaw with speechToText
    end
    BSS->>WS: sendDataWebhook(MESSAGES_UPSERT, messageRaw)
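
Both diagrams share the same optional transcription step, which differs only in whether the audio is passed as `mediaUrl` or `base64`. A rough sketch of that step follows. All type and function names here (`SttSettings`, `AudioMessageRaw`, `attachTranscription`, the `creds` shape) are hypothetical, and swallowing a transcription error so the webhook is still sent is an assumption of this sketch rather than confirmed behavior of the PR.

```typescript
// Hypothetical settings record, standing in for the stored OpenAI settings.
interface SttSettings {
  speechToText: boolean;
  creds: { apiKey: string } | null;
}

// Hypothetical message shape: audio is referenced by URL (S3) or base64.
interface AudioMessageRaw {
  message: {
    mediaUrl?: string;
    base64?: string;
    speechToText?: string;
    [key: string]: unknown;
  };
}

type FindSettings = (instanceId: string) => Promise<SttSettings | null>;
type SpeechToText = (
  creds: { apiKey: string },
  payload: { message: AudioMessageRaw['message'] },
) => Promise<string>;

async function attachTranscription(
  messageRaw: AudioMessageRaw,
  instanceId: string,
  findSettings: FindSettings,
  speechToText: SpeechToText,
): Promise<AudioMessageRaw> {
  const settings = await findSettings(instanceId);

  // Transcription is optional: skip unless enabled and credentials exist.
  if (!settings || !settings.speechToText || !settings.creds) return messageRaw;

  // Nothing to transcribe without an audio reference.
  if (!messageRaw.message.mediaUrl && !messageRaw.message.base64) return messageRaw;

  try {
    const text = await speechToText(settings.creds, { message: messageRaw.message });
    return { ...messageRaw, message: { ...messageRaw.message, speechToText: text } };
  } catch (error) {
    // Assumption: a failed transcription should not block the webhook.
    console.error('Speech-to-text failed:', error);
    return messageRaw;
  }
}
```

The caller would then pass the returned object to the webhook step regardless of whether a transcription was attached.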

Updated Class Diagram for BusinessStartupService

classDiagram
    class BusinessStartupService {
        -configService: ConfigService
        -logger: Logger
        -token: string
        -instanceId: string
        -prismaRepository: PrismaRepository
        -openaiService: OpenAIService
        +downloadMediaMessage(received: any): Promise<Buffer>  // Modified logic for headers
        -messageAudioJson(received: any): any  // New private method for audio message structure
        +onMessage(received: any): void // Represents main handler with updated audio processing & OpenAI STT
    }
    BusinessStartupService ..> axios : uses
    BusinessStartupService ..> ConfigService : uses
    BusinessStartupService ..> Logger : uses
    BusinessStartupService ..> PrismaRepository : uses
    BusinessStartupService ..> OpenAIService : uses

File-Level Changes

Change	Details	Files
Refactor media download flow	Separate metadata fetch and file download into two axios calls; use only the Authorization header when downloading media; enhance error logging to include context and rethrow errors	`src/api/integrations/channel/meta/whatsapp.business.service.ts`
Introduce dedicated audio JSON builder and refine message parsing	Replace inline context merge with explicit if-statement; add private messageAudioJson method with ptt flag; switch messageRaw construction to use messageAudioJson for audio types	`src/api/integrations/channel/meta/whatsapp.business.service.ts`
Ensure correct audio file extensions based on mimetype	Add logic to assign .ogg, .mp3, or .m4a extension for audio media	`src/api/integrations/channel/meta/whatsapp.business.service.ts`
Integrate OpenAI speech-to-text for audio messages	Invoke openaiService.speechToText after mediaUrl is available in S3 branch; invoke openaiService.speechToText in direct download branch; remove obsolete OpenAI processing block at the end	`src/api/integrations/channel/meta/whatsapp.business.service.ts`
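
The audio JSON builder and the explicit contextInfo merge listed in the table might look roughly like the sketch below. The real `messageAudioJson` is a private method whose body is not shown here, so everything in this block is an assumption: the function name `buildAudioMessageJson`, the incoming field names (`audio`, `voice`, `context`), and the output shape.

```typescript
// Hypothetical shape of the incoming audio payload.
interface IncomingAudio {
  id: string;
  mime_type?: string;
  voice?: boolean; // assumed flag marking a recorded voice note
}

interface IncomingContext {
  id: string;
  from?: string;
}

interface IncomingMessage {
  type: string;
  audio?: IncomingAudio;
  context?: IncomingContext;
}

// Hypothetical output shape: audio fields plus a push-to-talk (ptt) flag.
interface AudioContent {
  audioMessage: IncomingAudio & { ptt: boolean };
  contextInfo?: IncomingContext;
}

function buildAudioMessageJson(received: { messages: IncomingMessage[] }): AudioContent {
  const message = received.messages[0];
  if (!message || !message.audio) {
    throw new Error('Expected an audio message');
  }

  const content: AudioContent = {
    audioMessage: { ...message.audio, ptt: message.audio.voice ?? false },
  };

  // Explicit if-statement: attach contextInfo only when context is present.
  if (message.context) {
    content.contextInfo = message.context;
  }

  return content;
}
```

Using an explicit `if` keeps the key absent when there is no context, rather than emitting `contextInfo: undefined`.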


fix: adjustment in audio transcription with official api

77b3b33

DavidsonGomes merged commit 614ad7c into EvolutionAPI:develop Jun 7, 2025
1 check passed
