A JavaScript example app that runs OpenAI's Whisper automatic speech recognition (ASR) model entirely in the browser using Transformers.js and WebAssembly — no server GPU required. Audio files and transcripts are stored in Backblaze B2 cloud storage.
Upload audio (MP3, WAV, M4A, FLAC, OGG, WEBM), transcribe it to text client-side with Whisper, and save both the recording and the transcript to S3-compatible Backblaze B2 object storage — all from a single-page web app.
- No GPU server costs — the Whisper model runs in your browser via WebAssembly, so there's no inference server to provision or pay for
- Privacy — audio never leaves the user's device for transcription
- Simple to deploy — a static frontend + a lightweight Node.js backend for pre-signed URLs is all you need
- Transformers.js — Run Hugging Face AI models like Whisper in the browser with WebAssembly
- OpenAI Whisper — State-of-the-art open-source automatic speech recognition (ASR) model
- Backblaze B2 — S3-compatible cloud object storage at $6/TB/month
- Client-side AI transcription: Run OpenAI Whisper entirely in the browser — no server GPU required
- Cost-effective cloud storage: Store audio files and transcripts in Backblaze B2
- Secure direct uploads: Browser-to-cloud uploads using S3 pre-signed URLs
- Simple architecture: End-to-end flow from upload → transcribe → store
```
User → Upload Audio → B2 Storage
              ↓
Browser Whisper (Transformers.js) → Transcribe
              ↓
Transcript → B2 Storage
```
- User selects/drops audio file in browser
- Backend generates pre-signed PUT URL for B2
- Browser uploads audio directly to B2
- Browser loads Whisper model (Xenova/whisper-tiny.en)
- Browser transcribes audio locally
- Backend generates pre-signed PUT URL for transcript
- Browser uploads transcript JSON to B2
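The browser side of the steps above can be sketched roughly as follows. This is a hedged sketch, not the repo's actual frontend code: the route names `/api/upload-url` and `/api/transcript-url`, the `buildTranscript` helper, and the transcript JSON shape are assumptions; the response fields (`uploadUrl`, `fileId`) follow the API examples later in this README, and Whisper is loaded via the `@xenova/transformers` package.

```javascript
const API_BASE = 'http://localhost:3000';

// Build the transcript JSON saved to B2 (payload shape is an assumption).
function buildTranscript(fileId, text) {
  return JSON.stringify({ fileId, text, model: 'Xenova/whisper-tiny.en' }, null, 2);
}

async function transcribeAndStore(file) {
  // Step 2: ask the backend for a pre-signed PUT URL for the audio file.
  let res = await fetch(`${API_BASE}/api/upload-url`, { // hypothetical route
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ filename: file.name, contentType: file.type }),
  });
  const { uploadUrl, fileId } = await res.json();

  // Step 3: upload the audio bytes directly to B2 with the signed URL.
  await fetch(uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': file.type },
    body: file,
  });

  // Steps 4-5: load Whisper in-browser and transcribe locally.
  const { pipeline } = await import('@xenova/transformers');
  const asr = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
  const { text } = await asr(URL.createObjectURL(file));

  // Steps 6-7: get a second signed URL, then PUT the transcript JSON to B2.
  res = await fetch(`${API_BASE}/api/transcript-url`, { // hypothetical route
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileId }),
  });
  const transcript = await res.json();
  await fetch(transcript.uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: buildTranscript(fileId, text),
  });
  return text;
}
```

Because the audio never goes anywhere except directly to B2, the backend only ever handles small JSON requests for signed URLs.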
- Node.js 18+
- Backblaze B2 Account (free tier available)
- Create a bucket
- Generate an Application Key with `readFiles`, `writeFiles`, and `writeBuckets` permissions
```bash
git clone https://github.com/backblaze-b2-samples/b2-whisper-transformersjs-transcriber.git
cd b2-whisper-transformersjs-transcriber/backend
npm install
cp .env.example .env
```

Edit `.env` with your B2 credentials:

```bash
B2_ENDPOINT=https://s3.us-west-002.backblazeb2.com
B2_REGION=us-west-002
B2_KEY_ID=your_key_id_here
B2_APP_KEY=your_app_key_here
B2_BUCKET=your-bucket-name
```

Get your B2 endpoint and region from your bucket details page.
```bash
npm start
```

That's it! The server automatically:

- ✅ Configures B2 CORS for browser uploads
- ✅ Serves both frontend and API
- ✅ Opens at http://localhost:3000
- Open http://localhost:3000 in your browser
- Upload an audio file (MP3, WAV, M4A, etc.)
- Click "Transcribe with Whisper"
- View transcription and access files in B2
⚠️ First run downloads the Whisper model (~40MB) - this takes 1-2 minutes
If auto-setup fails (missing permissions), run manually:
```bash
npm run setup-cors
```

Required B2 Key Permissions:

- `listBuckets`
- `readFiles`
- `writeFiles`
- `writeBucketSettings` ← required for CORS setup
Alternative - B2 CLI:

```bash
b2 update-bucket --cors-rules '[
  {
    "corsRuleName": "allowBrowserUploads",
    "allowedOrigins": ["*"],
    "allowedHeaders": ["*"],
    "allowedOperations": ["s3_put", "s3_get", "s3_head"],
    "maxAgeSeconds": 3600
  }
]' <bucket-name> allPublic
```

Alternative - B2 Web Console:

- Go to https://secure.backblaze.com/b2_buckets.htm
- Click your bucket → Bucket Settings → CORS Rules
- Add the rules shown above
- Open the frontend in your browser
- Ensure the Backend API URL is correct (default: http://localhost:3000)
- Drag and drop an audio file or click to browse
- Or download this example audio clip to test
- Audio automatically uploads to B2
- Click "Transcribe with Whisper"
- Wait for transcription (first run downloads model)
- View results and access files in B2
Railway / Render / Fly.io:
- Set environment variables from `.env`
- Deploy the `backend/` directory
- Update the frontend `apiUrl` to the deployed URL
Docker:
```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY backend/package*.json ./
RUN npm install
COPY backend/ ./
CMD ["node", "server.js"]
```

Static Hosting (Netlify, Vercel, Cloudflare Pages):

- Deploy the `frontend/` directory
- Set the API URL in settings or hardcode it in `index.html:170`
B2 Static Hosting:
- Upload
frontend/index.htmlto B2 bucket - Enable website hosting on bucket
- Access via B2 website URL
- Create bucket (Private or Public based on needs)
- For public access to audio/transcripts, set bucket to Public
- Enable CORS if frontend hosted on different domain:
```json
[
  {
    "corsRuleName": "allowAll",
    "allowedOrigins": ["*"],
    "allowedHeaders": ["*"],
    "allowedOperations": ["s3_put", "s3_get"],
    "maxAgeSeconds": 3600
  }
]
```

```bash
# Using B2 CLI
b2 create-key <keyName> listBuckets,readFiles,writeFiles
```

Or use the B2 Web UI → App Keys → Create Key
Request:

```json
{
  "filename": "audio.mp3",
  "contentType": "audio/mpeg"
}
```

Response:

```json
{
  "uploadUrl": "https://...",
  "publicUrl": "https://...",
  "key": "audio/uuid.mp3",
  "fileId": "uuid"
}
```

Request:

```json
{
  "fileId": "uuid"
}
```

Response:

```json
{
  "uploadUrl": "https://...",
  "publicUrl": "https://...",
  "key": "transcripts/uuid.json"
}
```

This example uses the Xenova/whisper-tiny.en model, a quantized version of OpenAI's Whisper optimized for in-browser inference via Transformers.js. You can swap it for larger Whisper variants (base, small, medium) for higher accuracy at the cost of longer load times.
- Model: Xenova/whisper-tiny.en (English only, 39M params)
- Library: Transformers.js — Run Hugging Face transformer models in the browser
- Quantization: q8 (8-bit) for faster WebAssembly inference
- Size: ~40MB download (cached in browser after first load)
- Speed: ~30 seconds to transcribe 1 minute of audio
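As a sketch of how the model choice above plugs into Transformers.js: the `modelId` helper below is illustrative (not from the repo), and the loading code assumes the `@xenova/transformers` package. Xenova's English-only Whisper checkpoints carry a `.en` suffix; dropping it selects the multilingual variant.

```javascript
// Map a size and language choice to a Hugging Face model id.
// Helper name is illustrative, not from the repo.
function modelId(size, multilingual) {
  // English-only checkpoints end in '.en', e.g. Xenova/whisper-tiny.en
  return `Xenova/whisper-${size}${multilingual ? '' : '.en'}`;
}

// Load Whisper in the browser; larger sizes ('base', 'small', 'medium')
// trade a longer first-time download for better accuracy.
async function loadTranscriber(size = 'tiny', multilingual = false) {
  const { pipeline } = await import('@xenova/transformers');
  return pipeline('automatic-speech-recognition', modelId(size, multilingual), {
    quantized: true, // q8 weights: smaller download, faster WASM inference
  });
}
```

The downloaded weights are cached by the browser, so swapping models only costs the one-time download of the new checkpoint.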
- Provider: Backblaze B2
- API: S3-compatible API with pre-signed URLs
- Pricing: $6/TB/month storage, uploads are FREE
- Documentation: B2 S3-Compatible API Docs
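Because B2 speaks the S3 API, a backend can sign PUT URLs with the standard AWS SDK v3 pointed at the B2 endpoint. The following is a hedged sketch, not the repo's actual server code: the `objectKey` naming helper and the `kind` parameter are assumptions, while the environment variable names match the `.env` shown in Setup.

```javascript
// Object key layout matching the API examples above (scheme is an assumption).
const OBJECT_PREFIX = { audio: 'audio', transcript: 'transcripts' };

function objectKey(kind, fileId, ext) {
  return `${OBJECT_PREFIX[kind]}/${fileId}.${ext}`;
}

// Sign a PUT URL for B2 using the AWS SDK v3. Signing happens locally;
// no network call is made until the browser uses the URL.
async function signPutUrl(kind, fileId, ext, contentType) {
  const { S3Client, PutObjectCommand } = await import('@aws-sdk/client-s3');
  const { getSignedUrl } = await import('@aws-sdk/s3-request-presigner');

  const s3 = new S3Client({
    endpoint: process.env.B2_ENDPOINT, // e.g. https://s3.us-west-002.backblazeb2.com
    region: process.env.B2_REGION,
    credentials: {
      accessKeyId: process.env.B2_KEY_ID,
      secretAccessKey: process.env.B2_APP_KEY,
    },
  });

  const cmd = new PutObjectCommand({
    Bucket: process.env.B2_BUCKET,
    Key: objectKey(kind, fileId, ext),
    ContentType: contentType,
  });
  return getSignedUrl(s3, cmd, { expiresIn: 3600 }); // URL valid for 1 hour
}
```

Keeping the credentials server-side and handing the browser only short-lived signed URLs is what makes the direct browser-to-B2 upload safe.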
MP3, WAV, OGG, M4A, WEBM, FLAC
- Chrome 90+
- Edge 90+
- Firefox 90+
- Safari 15.4+
Requires WebAssembly and ES6 modules support.
- First transcription loads model (~40MB, one-time)
- Whisper-tiny less accurate than base/small/medium
- English only (use `Xenova/whisper-tiny` for multilingual)
- Browser must stay open during transcription
- Large files (>30min) may be slow
- Add recording directly in browser (MediaRecorder API)
- Support larger Whisper models (base, small, medium)
- Progress callback for transcription
- Batch processing multiple files
- Word-level timestamps
- Speaker diarization
- Multi-language support using `Xenova/whisper-tiny` (not `-tiny.en`)
- Transformers.js Documentation — Run Hugging Face AI models in the browser with WebAssembly
- Transformers.js GitHub — Source code and examples
- OpenAI Whisper — Original Whisper automatic speech recognition model
- Whisper Models on Hugging Face — Pre-trained Whisper model variants (tiny, base, small, medium, large)
- Backblaze B2 Documentation — Cloud storage API docs
- B2 S3-Compatible API — Use standard S3 SDKs with Backblaze B2
Problem: Browser shows CORS error when uploading audio.
Solution:
- Run `npm run setup-cors` in the backend directory
- Or manually configure CORS on your B2 bucket (see Setup section)
- Verify CORS is set: Go to B2 Console → Your Bucket → Settings → CORS Rules

Required CORS settings:

- Allowed Origins: `*` (or specific origins like `http://localhost:8080`)
- Allowed Methods: `GET`, `PUT`, `HEAD`
- Allowed Headers: `*`
Problem: Frontend can't connect to backend API.
Solution:
- Verify the backend is running: `curl http://localhost:3000/health`
- Check the API URL in the frontend matches the backend (default: http://localhost:3000)
- Look for CORS errors in backend logs
Problem: Whisper model fails to load or transcribe.
Solution:
- First run takes time: Model downloads ~40MB, wait 1-2 minutes
- Check browser console: Look for specific errors
- Try smaller file: Test with <1 minute audio first
- Clear cache: Hard refresh browser (Ctrl+Shift+R / Cmd+Shift+R)
- Use supported browser: Chrome, Edge, or Firefox recommended
Problem: Files upload but URLs don't work.
Solution:
- Check bucket is public or URLs are pre-signed
- Verify endpoint URL matches bucket region
- Try accessing URL directly in browser
- Check B2 bucket lifecycle rules aren't deleting files
Problem: Console shows errors from contentScript.bundle.js.
Solution: These are from browser extensions (like Claude Code). Safe to ignore - they don't affect the app.
This project is licensed under the MIT License. See the LICENSE file for details.