
⚡ Bolt: Add Parakeet model warmup to reduce first-inference latency #66

Open
Whamp wants to merge 1 commit into main from bolt/parakeet-warmup-13605976518554018741


@Whamp (Owner) commented Feb 6, 2026

User description

💡 What: Implemented ParakeetManager.warmup() and called it during ChirpApp initialization.
🎯 Why: The first transcription request suffered from "cold start" latency caused by ONNX Runtime's lazy initialization.
📊 Impact: Reduces the latency of the first user interaction by moving the initialization cost to application startup.
🔬 Measurement: Verified via a unit test that warmup() invokes transcribe(). A simulated benchmark indicates that the first-run penalty is now paid at startup.
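The PR does not include its simulated benchmark, so the following is only a sketch of how a cold-versus-warm comparison could be timed with `time.perf_counter()`. `LazyModel` is a hypothetical stand-in whose first `transcribe()` call sleeps to simulate ONNX Runtime's one-time setup; real numbers would have to come from timing the actual `ParakeetManager`.

```python
import time


class LazyModel:
    """Hypothetical stand-in for a model with expensive lazy initialization."""

    def __init__(self, init_cost_s: float = 0.05) -> None:
        self._init_cost_s = init_cost_s
        self._ready = False

    def transcribe(self, audio: list[float]) -> str:
        if not self._ready:
            # Simulate one-time setup (allocations, kernel selection, ...).
            time.sleep(self._init_cost_s)
            self._ready = True
        return ""


def time_call(model: LazyModel) -> float:
    """Return the wall-clock seconds taken by one transcribe() call."""
    start = time.perf_counter()
    model.transcribe([0.0] * 16_000)
    return time.perf_counter() - start


cold = LazyModel()
cold_latency = time_call(cold)  # first call pays the init cost

warm = LazyModel()
warm.transcribe([0.0] * 16_000)  # warmup at "startup"
warm_latency = time_call(warm)  # first user-facing call skips the init cost

print(f"cold: {cold_latency:.3f}s, after warmup: {warm_latency:.3f}s")
```

Because the simulated setup runs only once per instance, warming up the second instance leaves its next timed call without the sleep.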


PR created automatically by Jules for task 13605976518554018741 started by @Whamp


PR Type

Enhancement


Description

  • Add warmup() method to ParakeetManager for model initialization

  • Call warmup during ChirpApp startup to reduce first-inference latency

  • Shift ONNX Runtime initialization cost from user interaction to startup

  • Add a unit test verifying that warmup invokes transcribe with the expected dummy input


Diagram Walkthrough

flowchart LR
  A["ChirpApp initialization"] -->|calls| B["ParakeetManager.warmup()"]
  B -->|runs dummy inference| C["ONNX Runtime initialization"]
  C -->|shifts cost to startup| D["Faster first user interaction"]
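A minimal Python sketch of the startup flow in the diagram. `ParakeetManager` and `ChirpApp` here are hypothetical stand-ins (the real constructors are not shown in this PR), `transcribe()` only counts calls instead of running ONNX inference, and the 16 kHz sample rate is an assumption consistent with the 16000-sample dummy input. The `warmup()` body follows the version quoted in the review comments.

```python
import logging

import numpy as np


class ParakeetManager:
    """Hypothetical stand-in; the real class wraps an ONNX Runtime model."""

    SAMPLE_RATE = 16_000  # assumption: model consumes 16 kHz mono float32 audio

    def __init__(self) -> None:
        self._logger = logging.getLogger("chirp.parakeet")
        self.transcribe_calls = 0

    def transcribe(self, audio: np.ndarray) -> str:
        # The real method would run ONNX inference; here we only count calls.
        self.transcribe_calls += 1
        return ""

    def warmup(self) -> None:
        """Run a dummy inference so lazy runtime setup happens now, not later."""
        self._logger.debug("Warming up Parakeet model...")
        dummy_audio = np.zeros(self.SAMPLE_RATE, dtype=np.float32)  # 1 s of silence
        try:
            self.transcribe(dummy_audio)
        except Exception as exc:
            self._logger.warning("Warmup failed (non-fatal): %s", exc)


class ChirpApp:
    """Hypothetical stand-in showing only where warmup() is invoked."""

    def __init__(self) -> None:
        self.parakeet = ParakeetManager()
        self.parakeet.warmup()  # cost is paid here, before any user input


app = ChirpApp()
print(app.parakeet.transcribe_calls)  # 1
```

The key design point is ordering: because `warmup()` runs inside `__init__`, the app is not handed to the user until the dummy inference has completed.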

File Walkthrough

Relevant files

Enhancement

main.py: Invoke warmup during ChirpApp initialization (+1/-0)
src/chirp/main.py
  • Call self.parakeet.warmup() after ParakeetManager initialization in ChirpApp.__init__
  • Ensures the model is warmed up before any user interaction occurs

parakeet_manager.py: Implement warmup method for model initialization (+9/-0)
src/chirp/parakeet_manager.py
  • Add warmup() method that runs a dummy inference on zero-filled audio
  • Initializes ONNX Runtime internal buffers and optimizes the execution graph
  • Includes error handling that logs a warning instead of crashing if warmup fails
  • Uses debug logging to track warmup execution

Tests

test_parakeet_manager.py: Add unit test for warmup functionality (+23/-0)
tests/test_parakeet_manager.py
  • Add test_warmup() unit test to verify that warmup calls transcribe
  • Verify the dummy audio has the correct shape (16000,) and dtype (float32)
  • Mock transcribe to isolate warmup behavior and confirm invocation

Documentation

bolt.md: Document Parakeet cold start optimization (+4/-0)
.jules/bolt.md
  • Document the learning about ONNX Runtime cold start latency
  • Record the action taken: implementing warmup during startup initialization
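The `test_warmup()` check described in the walkthrough can be sketched with `unittest.mock.patch.object`. The `ParakeetManager` class below is a minimal hypothetical stand-in that mirrors the `warmup()` body shown in the review comments; it is not the real class from `src/chirp/parakeet_manager.py`, and the actual test file may be structured differently (for example, as a pytest function).

```python
import logging
import unittest
from unittest import mock

import numpy as np


class ParakeetManager:
    """Hypothetical minimal stand-in for chirp's ParakeetManager."""

    def __init__(self) -> None:
        self._logger = logging.getLogger("chirp.parakeet")

    def transcribe(self, audio: np.ndarray) -> str:
        raise NotImplementedError("real model not loaded in this sketch")

    def warmup(self) -> None:
        self._logger.debug("Warming up Parakeet model...")
        dummy_audio = np.zeros(16_000, dtype=np.float32)
        try:
            self.transcribe(dummy_audio)
        except Exception as exc:
            self._logger.warning("Warmup failed (non-fatal): %s", exc)


class TestWarmup(unittest.TestCase):
    def test_warmup_calls_transcribe_with_silence(self) -> None:
        manager = ParakeetManager()
        # Replace transcribe on this instance so no model is needed.
        with mock.patch.object(manager, "transcribe", return_value="") as fake:
            manager.warmup()
        fake.assert_called_once()
        (audio,), _kwargs = fake.call_args
        self.assertEqual(audio.shape, (16_000,))
        self.assertEqual(audio.dtype, np.float32)
        self.assertFalse(audio.any())  # all samples are zero
```

Run it with `python -m unittest` or pytest; mocking `transcribe` keeps the test fast and independent of the ONNX model files.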

Adds a `warmup()` method to `ParakeetManager` that runs a dummy inference during `ChirpApp` initialization. This shifts the ONNX Runtime initialization cost (graph optimization, memory allocation) from the first user interaction to application startup, improving perceived responsiveness.

Co-authored-by: Whamp <1115485+Whamp@users.noreply.github.com>
@google-labs-jules (Contributor)

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@qodo-code-review

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified: no security vulnerabilities were detected by AI analysis. Human verification is advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting.

Status: Passed


Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging.

Status: Passed


Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data.

Status: Passed


Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities.

Status: Passed


Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation.

Status:
Warmup exceptions swallowed: warmup() catches all exceptions and only logs a warning, which may mask real startup or model failures, and the warning lacks the traceback and context needed for effective debugging.

Referred Code
def warmup(self) -> None:
    """Run a dummy inference to initialize internal buffers and optimize execution graph."""
    self._logger.debug("Warming up Parakeet model...")
    dummy_audio = np.zeros(16_000, dtype=np.float32)
    try:
        self.transcribe(dummy_audio)
    except Exception as exc:
        self._logger.warning("Warmup failed (non-fatal): %s", exc)
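If the non-fatal behavior is kept, the missing-traceback concern can be addressed without changing control flow by passing `exc_info=True` to the standard `logging` call, which attaches the full traceback to the log record. A sketch, where `_failing_transcribe` is a hypothetical stand-in for a broken model and the logger name is illustrative:

```python
import io
import logging
from collections.abc import Callable

import numpy as np

logger = logging.getLogger("chirp.parakeet.sketch")


def _failing_transcribe(audio: np.ndarray) -> str:
    """Hypothetical transcribe() that fails, e.g. because the model is missing."""
    raise RuntimeError("model not loaded")


def warmup(transcribe: Callable[[np.ndarray], str] = _failing_transcribe) -> None:
    logger.debug("Warming up Parakeet model...")
    dummy_audio = np.zeros(16_000, dtype=np.float32)
    try:
        transcribe(dummy_audio)
    except Exception:
        # exc_info=True records the traceback, unlike formatting only str(exc).
        logger.warning("Warmup failed (non-fatal)", exc_info=True)


# Capture the log output to show that the traceback is included.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
warmup()
logger.removeHandler(handler)
output = stream.getvalue()
```

`logger.exception(...)` attaches the traceback the same way but logs at ERROR level, which may be preferable if a failed warmup should stand out in the logs.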


Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Allow warmup failures to crash startup

In the warmup method, remove the try...except block to allow exceptions to propagate. This ensures the application fails fast at startup if the core transcription functionality is broken, rather than starting in a non-functional state.

src/chirp/parakeet_manager.py [126-133]

 def warmup(self) -> None:
-    """Run a dummy inference to initialize internal buffers and optimize execution graph."""
+    """
+    Run a dummy inference to initialize internal buffers and optimize the execution graph.
+
+    Raises:
+        Exception: If the underlying transcription call fails, indicating a problem
+                   with the model or runtime environment.
+    """
     self._logger.debug("Warming up Parakeet model...")
     dummy_audio = np.zeros(16_000, dtype=np.float32)
-    try:
-        self.transcribe(dummy_audio)
-    except Exception as exc:
-        self._logger.warning("Warmup failed (non-fatal): %s", exc)
+    self.transcribe(dummy_audio)
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies that suppressing exceptions during warmup can lead to a poor user experience by allowing the application to start in a broken state. Adopting a fail-fast approach by letting exceptions propagate is a more robust design for handling critical startup failures.
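One possible middle ground between the PR's non-fatal warmup and the fail-fast suggestion is to make fail-fast the default while keeping an explicit opt-out. The `strict` parameter below is a hypothetical addition that appears in neither the PR nor the suggestion, and the stub class stands in for the real manager with a deliberately broken `transcribe()`.

```python
import logging

import numpy as np


class ParakeetManager:
    """Hypothetical stand-in whose transcribe() always fails."""

    def __init__(self) -> None:
        self._logger = logging.getLogger("chirp.parakeet")

    def transcribe(self, audio: np.ndarray) -> str:
        raise RuntimeError("ONNX session failed to initialize")

    def warmup(self, strict: bool = True) -> None:
        """Run a dummy inference; re-raise failures unless strict is False."""
        self._logger.debug("Warming up Parakeet model...")
        dummy_audio = np.zeros(16_000, dtype=np.float32)
        try:
            self.transcribe(dummy_audio)
        except Exception:
            if strict:
                raise  # fail fast: startup aborts with the original traceback
            self._logger.warning("Warmup failed (non-fatal)", exc_info=True)


manager = ParakeetManager()
manager.warmup(strict=False)  # logs a warning and returns normally

try:
    manager.warmup()  # default strict=True propagates the error
    raised = False
except RuntimeError:
    raised = True
```

With this shape, `ChirpApp` would fail fast by default, while a context that prefers a degraded start (for example, a settings screen that does not need transcription) could opt out explicitly.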

Medium
