See Your Meal Before You Order — AI-powered menu translator and dish visualizer for travelers.
Live Demo: predine-ar.web.app | Video Demo
A mobile-first Progressive Web App that helps foreign tourists understand unfamiliar restaurant menus in real time through multimodal AI (camera + voice + text + image generation + AR + speech).
You're dining in a foreign country. The menu has no photos, descriptions are vague, there's a language barrier, and you're afraid of ordering the wrong thing.
- Scan — Point your camera at any menu. OCR extracts dishes, AI interprets them.
- Understand — Tap any dish for a clear description, allergen warnings, and spice level.
- Visualize — Generate AI images of dishes to see what you're ordering.
- AR Preview — Place dish images on your table via camera overlay (drag/pinch/rotate).
- Communicate — Translate dietary preferences to the local language. Show your waiter a full-screen card with local text, transliteration, and audio playback.
- Converse — Ask the AI assistant anything by voice or text. Interrupt anytime.
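The drag/pinch/rotate gestures in AR Preview can be derived by comparing two touch points between consecutive frames. The sketch below is one way to do that, not the code in `ar-overlay.tsx`: the `Point` and `OverlayTransform` types, the `applyTwoFingerGesture` name, and the scale limits are all assumptions made for illustration.

```typescript
// Hypothetical types for illustration; not taken from ar-overlay.tsx.
interface Point { x: number; y: number }

interface OverlayTransform {
  x: number;        // translation in px
  y: number;
  scale: number;    // 1 = original size
  rotation: number; // radians
}

const distance = (a: Point, b: Point): number => Math.hypot(b.x - a.x, b.y - a.y);
const angle = (a: Point, b: Point): number => Math.atan2(b.y - a.y, b.x - a.x);
const midpoint = (a: Point, b: Point): Point => ({ x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 });

// Assumed limits so the dish image can neither vanish nor fill the screen.
const MIN_SCALE = 0.25;
const MAX_SCALE = 4;

function applyTwoFingerGesture(
  current: OverlayTransform,
  prev: [Point, Point],
  next: [Point, Point],
): OverlayTransform {
  const prevDist = distance(prev[0], prev[1]);
  const nextDist = distance(next[0], next[1]);
  const prevMid = midpoint(prev[0], prev[1]);
  const nextMid = midpoint(next[0], next[1]);
  // Pinch: ratio of finger distances. Guard against a zero starting distance.
  const scaleFactor = prevDist > 0 ? nextDist / prevDist : 1;
  return {
    // Drag: the overlay follows the midpoint between the two fingers.
    x: current.x + (nextMid.x - prevMid.x),
    y: current.y + (nextMid.y - prevMid.y),
    scale: Math.min(MAX_SCALE, Math.max(MIN_SCALE, current.scale * scaleFactor)),
    // Rotate: change in the angle of the line joining the two fingers.
    rotation: current.rotation + (angle(next[0], next[1]) - angle(prev[0], prev[1])),
  };
}
```

The resulting values could be applied as a CSS `transform` (translate, rotate, scale) on the overlay element. A single-finger drag is the special case where only the translation changes.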
┌─────────────────────────────────────────────────┐
│  FRONTEND: Next.js 15 App Router (PWA)          │
│  Camera · Voice · AR Overlay · Bottom Sheet     │
├─────────────────────────────────────────────────┤
│  REAL-TIME LAYER: Gemini Live API Client        │
│  Voice streaming · Interruptions · Turn-taking  │
├─────────────────────────────────────────────────┤
│  AGENT LAYER: ADK (DiningAssistantAgent)        │
│  Intent routing · Tool orchestration · Sessions │
├─────────────────────────────────────────────────┤
│  BACKEND: Firebase + Provider Abstractions      │
│  Auth · Firestore · Analytics · App Check       │
└─────────────────────────────────────────────────┘
The DiningAssistantAgent orchestrates all AI workflows:
| Tool | Purpose |
|---|---|
| `parse_menu` | OCR + dish detection from camera frames |
| `interpret_dish` | Explain dish with allergens, spice, confidence |
| `generate_dish_image` | AI image generation for dish visualization |
| `translate_for_waiter` | Translate preferences with transliteration + back-translation |
| `speak_waiter_phrase` | TTS playback in local language |
| `recommend_dishes` | Personalized recommendations based on preferences |
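To make the `recommend_dishes` idea concrete, here is a minimal sketch of score-based matching against user preferences. It is an illustration under stated assumptions, not the project's implementation: the `Dish` and `Preferences` shapes, the scoring rule (hard-exclude allergens and too-spicy dishes, then count matching tags), and the function names are all hypothetical.

```typescript
// Hypothetical data shapes; the project's real types live in src/lib/types/.
interface Dish {
  name: string;
  allergens: string[];
  spiceLevel: number; // assumed scale: 0 (mild) to 3 (very hot)
  tags: string[];
}

interface Preferences {
  avoidAllergens: string[];
  maxSpiceLevel: number;
  likedTags: string[];
}

// Returns null when a dish must be excluded, otherwise a score (higher is better).
function scoreDish(dish: Dish, prefs: Preferences): number | null {
  if (dish.allergens.some((a) => prefs.avoidAllergens.includes(a))) return null;
  if (dish.spiceLevel > prefs.maxSpiceLevel) return null;
  return dish.tags.filter((t) => prefs.likedTags.includes(t)).length;
}

function recommendDishes(dishes: Dish[], prefs: Preferences, limit = 3): Dish[] {
  return dishes
    .map((dish) => ({ dish, score: scoreDish(dish, prefs) }))
    .filter((entry): entry is { dish: Dish; score: number } => entry.score !== null)
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, limit)
    .map((entry) => entry.dish);
}
```

Treating allergens as a hard filter rather than a score penalty is a deliberate choice in this sketch: a dish containing something the diner must avoid should never be recommended, however well it matches otherwise.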
Workflow patterns:
- Sequential: scan → parse → interpret
- Parallel: image generation + recommendations
- Conditional: low confidence → ask clarification
- Interruptible: any task can be cancelled mid-flight
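The four workflow patterns above can be combined in a single async function. The following is a hedged sketch rather than the actual `DiningAssistantAgent` code: the `Tools` interface, the `scanAndExplain` name, the result shape, and the 0.6 confidence threshold are assumptions chosen for illustration. Cancellation uses the standard `AbortSignal`.

```typescript
// Hypothetical tool signatures for illustration; the real ones live in
// src/lib/adk/tools.ts and may differ.
interface ParsedDish { name: string }
interface Interpretation { description: string; confidence: number }

interface Tools {
  parseMenu(frame: string): Promise<ParsedDish[]>;
  interpretDish(dish: ParsedDish): Promise<Interpretation>;
  generateDishImage(dish: ParsedDish): Promise<string>;
  recommendDishes(dishes: ParsedDish[]): Promise<ParsedDish[]>;
}

type ScanResult =
  | { kind: "clarify"; question: string }
  | { kind: "ok"; interpretation: Interpretation; imageUrl: string; recommended: ParsedDish[] };

const CONFIDENCE_THRESHOLD = 0.6; // assumed cut-off, not from the project

// Interruptible: checked between steps so a cancelled task stops early.
function ensureActive(signal: AbortSignal): void {
  if (signal.aborted) throw new Error("Task cancelled");
}

async function scanAndExplain(tools: Tools, frame: string, signal: AbortSignal): Promise<ScanResult> {
  // Sequential: scan -> parse -> interpret.
  ensureActive(signal);
  const dishes = await tools.parseMenu(frame);
  const first = dishes[0];
  if (!first) {
    return { kind: "clarify", question: "I could not find any dishes. Could you rescan the menu?" };
  }
  ensureActive(signal);
  const interpretation = await tools.interpretDish(first);

  // Conditional: low confidence -> ask for clarification instead of guessing.
  if (interpretation.confidence < CONFIDENCE_THRESHOLD) {
    return { kind: "clarify", question: `I am not sure about "${first.name}". Could you show it closer?` };
  }

  // Parallel: image generation and recommendations do not depend on each other.
  ensureActive(signal);
  const [imageUrl, recommended] = await Promise.all([
    tools.generateDishImage(first),
    tools.recommendDishes(dishes),
  ]);
  return { kind: "ok", interpretation, imageUrl, recommended };
}
```

A caller would keep the `AbortController` and call `controller.abort()` when the user interrupts. This sketch only checks the signal between steps; passing the signal into each tool would also allow aborting an in-flight network request.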
All AI capabilities are behind swappable interfaces:
| Interface | Mock Implementation | Production Target |
|---|---|---|
| `OCRProvider` | Sample Japanese menu data | Google Cloud Vision / Gemini |
| `VisionUnderstandingProvider` | Dish knowledge base (8 Thai dishes) | Gemini Pro Vision |
| `ImageGenerationProvider` | SVG placeholders | Imagen 3 / DALL-E |
| `TranslationProvider` | Phrasebook (8 common phrases) | Google Translate / Gemini |
| `TextToSpeechProvider` | Web Speech API | Google Cloud TTS / ElevenLabs |
| `SpeechToTextProvider` | Web Speech API | Gemini Live / Deepgram |
| `RecommendationProvider` | Score-based matching | Gemini with context |
Swap any provider in src/lib/providers/index.ts.
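As an illustration of what a swappable interface with a mock fallback might look like, here is a minimal sketch for the translation provider. The method signature, class names, the `getTranslationProvider` function, and the single phrasebook entry are assumptions for this example; the real interfaces are defined in `src/lib/providers/base.ts` and may look different.

```typescript
// Hypothetical shapes; the real interfaces live in src/lib/providers/base.ts.
interface TranslationResult { text: string; transliteration: string }

interface TranslationProvider {
  translate(phrase: string, targetLang: string): Promise<TranslationResult>;
}

// Mock: a tiny phrasebook, so the UI can be exercised without any API key.
class MockTranslationProvider implements TranslationProvider {
  private readonly phrasebook = new Map<string, TranslationResult>([
    // Illustrative entry only; the project's phrasebook may use different wording.
    ["No peanuts please", { text: "ピーナッツ抜きでお願いします", transliteration: "Piinattsu nuki de onegaishimasu" }],
  ]);

  async translate(phrase: string): Promise<TranslationResult> {
    // Unknown phrases are echoed back unchanged.
    return this.phrasebook.get(phrase) ?? { text: phrase, transliteration: phrase };
  }
}

// Placeholder for a production provider; the API call is intentionally omitted.
class GeminiTranslationProvider implements TranslationProvider {
  constructor(readonly apiKey: string) {}

  async translate(): Promise<TranslationResult> {
    throw new Error("Not implemented in this sketch");
  }
}

// Registry: use the production provider only when a key is configured.
function getTranslationProvider(env: Record<string, string | undefined>): TranslationProvider {
  const apiKey = env.GEMINI_API_KEY;
  return apiKey ? new GeminiTranslationProvider(apiKey) : new MockTranslationProvider();
}
```

Because callers depend only on `TranslationProvider`, swapping the implementation is a one-line change in the registry and requires no changes in UI code.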
# Install dependencies
npm install
# Run development server
npm run dev
# Open on your phone (same network)
# http://<your-ip>:3000

Copy .env.example to .env.local:

cp .env.example .env.local

Firebase and Gemini keys are optional for the MVP; mock providers work without them.
npm run build
npm start

src/
├── app/
│ ├── layout.tsx # Root layout, PWA meta, SW registration
│ ├── page.tsx # Main app: camera, nav tabs, bottom sheet
│ ├── sw-register.tsx # Service worker registration
│ └── api/agent/route.ts # ADK agent API endpoint
├── components/
│ ├── camera/camera-view.tsx # Camera feed with viewfinder + scan animation
│ ├── menu/
│ │ ├── menu-list.tsx # Scrollable menu with empty state
│ │ └── menu-item-card.tsx # Expandable card: tags, image, actions
│ ├── assistant/
│ │ └── assistant-panel.tsx # Chat UI: voice, text, streaming, interrupt
│ ├── ar/ar-overlay.tsx # Drag/pinch/rotate dish overlay on camera
│ ├── waiter/waiter-panel.tsx # Translation panel with "show waiter" mode
│ └── ui/ # Button, Chip, BottomSheet, StreamingDots
├── hooks/
│ ├── use-session.ts # Agent integration + all app actions
│ ├── use-camera.ts # Camera lifecycle + frame capture
│ ├── use-voice.ts # STT + TTS hook
│ └── use-service-worker.ts # PWA SW registration
├── lib/
│ ├── adk/
│ │ ├── agent.ts # DiningAssistantAgent (core orchestrator)
│ │ └── tools.ts # 6 agent tools
│ ├── providers/
│ │ ├── base.ts # 7 provider interfaces
│ │ ├── index.ts # Provider registry (swap here)
│ │ └── mock-*.ts # Mock implementations
│ ├── firebase/
│ │ ├── config.ts # Firebase init
│ │ ├── session.ts # Firestore session CRUD
│ │ └── analytics-events.ts # Typed analytics events
│ ├── realtime/
│ │ └── gemini-live.ts # Gemini Live API client scaffold
│ ├── types/
│ │ ├── index.ts # All data models (30+ types)
│ │ └── events.ts # Real-time event types
│ └── utils.ts # cn(), generateId(), formatConfidence()
└── public/
├── manifest.json # PWA manifest
├── sw.js # Service worker
└── icons/ # PWA icons
The complete end-to-end flow with mock providers:
- Open app → Camera activates, assistant greets you
- Tap scan button → OCR extracts 8 Thai dishes from mock menu
- Tap "Menu" tab → See all dishes with prices
- Tap a dish → Expands with description, allergens, spice level
- Tap "Visualize" → AI generates a dish image (SVG placeholder)
- Tap "See on Table" → Image appears on camera view, drag to reposition
- Tap "Waiter" tab → Select dietary preferences or quick phrases
- Tap "No peanuts please" → Shows Japanese translation + transliteration
- Tap "Show Waiter" → Full-screen card with large text for the waiter
- Tap speaker icon → TTS reads the phrase aloud (via Web Speech API)
- Tap "Chat" tab → Ask anything: "what's popular?", "no beef", "show another"
- Voice input → Tap mic, speak naturally, agent processes your request
Replace mock providers in src/lib/providers/:
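// src/lib/providers/index.ts
import { GeminiVisionProvider } from "./gemini-vision";

export function getVisionProvider() {
  // Fail fast with a clear message instead of passing undefined to the provider.
  const apiKey = process.env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is not set");
  return new GeminiVisionProvider(apiKey);
}

The client scaffold is at src/lib/realtime/gemini-live.ts. Connect to: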
wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent
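A connection to this endpoint typically starts by opening the WebSocket and sending a setup message as the first frame. The sketch below builds those two pieces as pure functions. It reflects my understanding of the Live API handshake (API key as a `key` query parameter, first message of the form `{ setup: { model } }`) and should be checked against the current Gemini documentation. The function names are hypothetical and the model name is deliberately left as a placeholder.

```typescript
const LIVE_ENDPOINT =
  "wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent";

// Assumption: the API key is passed as a `key` query parameter.
function buildLiveUrl(apiKey: string): string {
  return `${LIVE_ENDPOINT}?key=${encodeURIComponent(apiKey)}`;
}

// Assumption: the first message on the socket configures the session.
function buildSetupMessage(model: string): string {
  return JSON.stringify({ setup: { model } });
}

// Browser usage sketch (not executed here):
//   const ws = new WebSocket(buildLiveUrl(apiKey));
//   ws.onopen = () => ws.send(buildSetupMessage("models/<live-capable-model>"));
//   ws.onmessage = (event) => { /* handle server messages */ };
```

Opening the socket directly from the browser exposes the API key to the client, which is acceptable for a demo at most. A server-side relay or short-lived credentials would be the safer route in production.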
Add your Firebase config to .env.local. Once it is present, the app uses Firestore for session persistence and Analytics for event tracking.
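Since Firebase is optional for the MVP, the config loader needs to distinguish "not configured" from "configured". One possible shape for that check is sketched below. The environment variable names are assumptions (the authoritative list is in `.env.example`), as are the `FirebaseWebConfig` type and the `readFirebaseConfig` function.

```typescript
// Hypothetical subset of the Firebase web config; see .env.example for the real keys.
interface FirebaseWebConfig {
  apiKey: string;
  authDomain: string;
  projectId: string;
  appId: string;
}

// Returns null when any required value is missing, so callers can fall back
// to in-memory sessions instead of failing at startup.
function readFirebaseConfig(env: Record<string, string | undefined>): FirebaseWebConfig | null {
  const apiKey = env.NEXT_PUBLIC_FIREBASE_API_KEY;
  const authDomain = env.NEXT_PUBLIC_FIREBASE_AUTH_DOMAIN;
  const projectId = env.NEXT_PUBLIC_FIREBASE_PROJECT_ID;
  const appId = env.NEXT_PUBLIC_FIREBASE_APP_ID;
  if (!apiKey || !authDomain || !projectId || !appId) return null;
  return { apiKey, authDomain, projectId, appId };
}
```

In client-side Next.js code, `process.env.NEXT_PUBLIC_*` values are inlined only when referenced literally, so the caller should pass an explicit object such as `{ NEXT_PUBLIC_FIREBASE_API_KEY: process.env.NEXT_PUBLIC_FIREBASE_API_KEY, ... }` rather than `process.env` itself.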
| Decision | Rationale |
|---|---|
| Mock providers over stubs | Realistic UX testing without API keys |
| Client-side agent (MVP) | Faster iteration; production moves to server-side ADK |
| Bottom sheet over pages | Mobile-native feel, maintains camera context |
| SVG dish images | No external dependencies for demo; swappable |
| Web Speech API | Zero-config STT/TTS; works on Chrome/Safari |
| Camera-first layout | Primary use case is scanning; sheet slides over |
Known limitations:
- Mock providers return predefined data (8 Thai dishes on a Japanese-language sample menu)
- Image generation produces SVG placeholders, not photorealistic images
- AR is camera-overlay only (no WebXR surface detection yet)
- Gemini Live connection is scaffolded but not wired to a live endpoint
- Session persistence requires Firebase project setup
- Voice recognition requires browser support (Chrome recommended)
Planned work:
- Wire Gemini Pro Vision for real OCR + dish interpretation
- Wire Imagen 3 for photorealistic dish generation
- Connect Gemini Live API for real-time voice interaction
- Add WebXR for true AR surface detection on supported devices
- Server-side ADK migration for production security
- Multi-language support beyond Japanese mock data
- Menu history — save and revisit past restaurant menus
- Social features — share dish recommendations
- Offline mode — cache menu data for no-connectivity dining
| Name |
|---|
| Chao Zhang |
| John Chong |
| Timothy Asiimwe |
| Louis Cheng |
- Framework: Next.js 15 (App Router)
- Language: TypeScript (strict)
- Styling: Tailwind CSS 4
- AI Orchestration: ADK pattern (DiningAssistantAgent)
- Real-time: Gemini Live API client
- Backend: Firebase (Auth, Firestore, Analytics)
- PWA: Service worker, manifest, safe area handling