A JavaScript example app that runs OpenAI's CLIP zero-shot image classification model entirely in the browser using Transformers.js and WebAssembly — no server GPU required. Images and classification results are stored in Backblaze B2 cloud storage.
Upload an image (JPG, PNG, GIF, WebP, BMP), provide custom labels, classify it with CLIP client-side, and save both the image and results to S3-compatible Backblaze B2 object storage — all from a single-page web app.
- No GPU server costs — the CLIP model runs in your browser via WebAssembly, so there's no inference server to provision or pay for
- Privacy — images never leave the user's device for classification
- Simple to deploy — a static frontend + a lightweight Node.js backend for pre-signed URLs is all you need
- Transformers.js — Run Hugging Face AI models like CLIP in the browser with WebAssembly
- OpenAI CLIP — State-of-the-art open-source vision-language model for zero-shot image classification
- Backblaze B2 — S3-compatible cloud object storage at $6/TB/month
- Client-side AI classification: Run OpenAI CLIP entirely in the browser — no server GPU required
- Zero-shot flexibility: Classify images with any custom labels — no retraining needed
- Cost-effective cloud storage: Store images and results in Backblaze B2
- Secure direct uploads: Browser-to-cloud uploads using S3 pre-signed URLs
- Simple architecture: End-to-end flow from upload → classify → store
User → Upload Image → B2 Storage
↓
Browser CLIP (Transformers.js) → Classify with custom labels
↓
Results → B2 Storage
- User selects/drops image file in browser
- Backend generates pre-signed PUT URL for B2
- Browser uploads image directly to B2
- User enters classification labels (or picks a preset)
- Browser loads CLIP model (Xenova/clip-vit-base-patch32)
- Browser classifies image locally against provided labels
- Backend generates pre-signed PUT URL for results
- Browser uploads classification results JSON to B2
- Node.js 18+
- Backblaze B2 Account (free tier available)
- Create a bucket
- Generate an Application Key with readFiles, writeFiles, and writeBuckets permissions
git clone https://github.com/backblaze-b2-samples/b2-zeroshot-image-classifier.git
cd b2-zeroshot-image-classifier/backend
npm install
cp .env.example .env

Edit .env with your B2 credentials:
B2_ENDPOINT=https://s3.us-west-002.backblazeb2.com
B2_REGION=us-west-002
B2_KEY_ID=your_key_id_here
B2_APP_KEY=your_app_key_here
B2_BUCKET=your-bucket-name

Get your B2 endpoint and region from your bucket details page.
npm start

That's it! The server automatically:
- Configures B2 CORS for browser uploads
- Serves both frontend and API
- Opens at http://localhost:3000
- Open http://localhost:3000 in your browser
- Upload an image (JPG, PNG, GIF, etc.)
- Enter classification labels or pick a preset
- Click "Classify with CLIP"
- View results bar chart and access files in B2
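The bar chart in the last step only needs sorted label/percentage pairs. A hypothetical helper for that mapping might look like the following — the function name and object shapes are invented for illustration, not taken from the app's source:

```javascript
// Hypothetical helper: convert classifier output (objects with `label`
// and `score` in [0, 1]) into descending percentage rows for a bar chart.
function toChartRows(predictions) {
  return [...predictions]                       // copy; don't mutate input
    .sort((a, b) => b.score - a.score)
    .map(p => ({ label: p.label, pct: Math.round(p.score * 100) }));
}

const rows = toChartRows([
  { label: 'cat', score: 0.15 },
  { label: 'dog', score: 0.82 },
  { label: 'car', score: 0.03 },
]);
// rows: [{ label: 'dog', pct: 82 }, { label: 'cat', pct: 15 }, { label: 'car', pct: 3 }]
```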
First run downloads the CLIP model (~350MB) - this may take a few minutes
If auto-setup fails (missing permissions), run manually:
npm run setup-cors

Required B2 Key Permissions:
- listBuckets, readFiles, writeFiles, writeBucketSettings — required for CORS setup
Alternative - B2 CLI:
b2 update-bucket --cors-rules '[
{
"corsRuleName": "allowBrowserUploads",
"allowedOrigins": ["*"],
"allowedHeaders": ["*"],
"allowedOperations": ["s3_put", "s3_get", "s3_head"],
"maxAgeSeconds": 3600
}
]' <bucket-name> allPublic

Alternative - B2 Web Console:
- Go to https://secure.backblaze.com/b2_buckets.htm
- Click your bucket -> Bucket Settings -> CORS Rules
- Add the rules shown above
- Open the frontend in your browser
- Ensure the Backend API URL is correct (default: http://localhost:3000)
- Drag and drop an image or click to browse
- Image automatically uploads to B2
- Enter comma-separated labels or click a preset button
- Click "Classify with CLIP"
- Wait for classification (first run downloads model)
- View bar chart results and access files in B2
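Since the labels arrive as free-form comma-separated text, some light validation is involved before they reach CLIP. A sketch of the kind of parsing this implies (the function name is hypothetical, not from the app):

```javascript
// Hypothetical helper: parse the comma-separated label field, dropping
// blanks and duplicates. Zero-shot classification needs at least 2
// labels to make a meaningful comparison.
function parseLabels(input) {
  const labels = [...new Set(
    input.split(',').map(s => s.trim()).filter(Boolean)
  )];
  if (labels.length < 2) {
    throw new Error('Enter at least 2 comma-separated labels');
  }
  return labels;
}

// parseLabels('cat, dog, , cat') → ['cat', 'dog']
```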
Railway / Render / Fly.io:
- Set environment variables from .env
- Deploy the backend/ directory
- Update the frontend apiUrl to the deployed URL
Docker:
FROM node:18-alpine
WORKDIR /app
COPY backend/package*.json ./
RUN npm install
COPY backend/ ./
CMD ["node", "server.js"]

Static Hosting (Netlify, Vercel, Cloudflare Pages):
- Deploy the frontend/ directory
- Set the API URL in settings or hardcode it in index.html
B2 Static Hosting:
- Upload frontend/index.html to the B2 bucket
- Enable website hosting on the bucket
- Access via B2 website URL
- Create bucket (Private or Public based on needs)
- For public access to images/results, set bucket to Public
- Enable CORS if frontend hosted on different domain:
[
{
"corsRuleName": "allowAll",
"allowedOrigins": ["*"],
"allowedHeaders": ["*"],
"allowedOperations": ["s3_put", "s3_get"],
"maxAgeSeconds": 3600
}
]

# Using B2 CLI
b2 create-key <keyName> listBuckets,readFiles,writeFiles

Or use the B2 Web UI -> App Keys -> Create Key
Request:
{
"filename": "photo.jpg",
"contentType": "image/jpeg"
}

Response:
{
"uploadUrl": "https://...",
"publicUrl": "https://...",
"key": "images/uuid.jpg",
"fileId": "uuid"
}

Request:
{
"fileId": "uuid"
}

Response:
{
"uploadUrl": "https://...",
"publicUrl": "https://...",
"key": "results/uuid.json"
}

This example uses the Xenova/clip-vit-base-patch32 model, a quantized version of OpenAI's CLIP optimized for in-browser inference via Transformers.js.
- Model: Xenova/clip-vit-base-patch32 (ViT-B/32 vision encoder + text encoder)
- Library: Transformers.js — Run Hugging Face transformer models in the browser
- Size: ~350MB download (cached in browser after first load)
- Task: Zero-shot image classification — classify images against any set of text labels without retraining
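Under the hood, zero-shot classification embeds the image and each text label with CLIP's two encoders, then applies a softmax over the scaled image–text cosine similarities. A toy illustration of that last step — the similarity values below are invented, not real model output (CLIP's learned logit scale is roughly 100 after training):

```javascript
// Toy illustration of how CLIP turns image-text similarities into label
// probabilities. The cosine similarities are made up for this example.
function softmax(logits) {
  const max = Math.max(...logits);              // subtract max for numerical stability
  const exps = logits.map(x => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

const labels = ['cat', 'dog', 'car'];
const similarities = [0.31, 0.28, 0.12];        // hypothetical cosine similarities
const logitScale = 100;                          // approx. CLIP's learned scale
const probs = softmax(similarities.map(s => s * logitScale));
// probs sum to 1; 'cat' gets the highest probability in this example
```

This scaling is why the reported probabilities are often sharply peaked even when the raw similarities are close.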
- Provider: Backblaze B2
- API: S3-compatible API with pre-signed URLs
- Pricing: $6/TB/month storage, uploads are FREE
- Documentation: B2 S3-Compatible API Docs
JPG, JPEG, PNG, GIF, WebP, BMP
- Chrome 90+
- Edge 90+
- Firefox 90+
- Safari 15.4+
Requires WebAssembly and ES6 modules support.
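A page could verify that support before triggering the ~350MB model download. A small illustrative check — not necessarily present in the sample app:

```javascript
// Illustrative capability check: bail out early on browsers without
// WebAssembly rather than wasting a large model download.
function supportsClipRuntime() {
  return typeof WebAssembly === 'object'
    && typeof WebAssembly.instantiate === 'function';
}
```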
- First classification loads model (~350MB, one-time)
- ViT-B/32 is a base model — larger CLIP variants may be more accurate
- Browser must stay open during classification
- Maximum image size: 10 MB
- Requires at least 2 labels for meaningful zero-shot classification
- Add webcam/camera capture
- Support larger CLIP models (ViT-L/14)
- Batch classification of multiple images
- Image segmentation with region-specific labels
- Multi-language label support
- Confidence threshold filtering
- Export results as CSV
- Transformers.js Documentation — Run Hugging Face AI models in the browser with WebAssembly
- Transformers.js GitHub — Source code and examples
- OpenAI CLIP — Original CLIP vision-language model
- CLIP Models on Hugging Face — Pre-trained CLIP model variants
- Backblaze B2 Documentation — Cloud storage API docs
- B2 S3-Compatible API — Use standard S3 SDKs with Backblaze B2
Problem: Browser shows CORS error when uploading image.
Solution:
- Run npm run setup-cors in the backend directory
- Or manually configure CORS on your B2 bucket (see the Setup section)
- Verify CORS is set: Go to B2 Console -> Your Bucket -> Settings -> CORS Rules
Required CORS settings:
- Allowed Origins: * (or specific origins like http://localhost:8080)
- Allowed Methods: GET, PUT, HEAD
- Allowed Headers: *
Problem: Frontend can't connect to backend API.
Solution:
- Verify the backend is running: curl http://localhost:3000/health
- Check that the API URL in the frontend matches the backend (default: http://localhost:3000)
- Look for CORS errors in the backend logs
Problem: CLIP model fails to load or classify.
Solution:
- First run takes time: Model downloads ~350MB, wait a few minutes
- Check browser console: Look for specific errors
- Try smaller image: Test with a small JPG first
- Clear cache: Hard refresh browser (Ctrl+Shift+R / Cmd+Shift+R)
- Use supported browser: Chrome, Edge, or Firefox recommended
Problem: Files upload but URLs don't work.
Solution:
- Check bucket is public or URLs are pre-signed
- Verify endpoint URL matches bucket region
- Try accessing URL directly in browser
- Check B2 bucket lifecycle rules aren't deleting files
Problem: Console shows errors from contentScript.bundle.js.
Solution: These are from browser extensions. Safe to ignore - they don't affect the app.
This project is licensed under the MIT License. See the LICENSE file for details.