
PreDine AR

See Your Meal Before You Order — AI-powered menu translator and dish visualizer for travelers.

Live Demo: predine-ar.web.app | Video Demo

A mobile-first Progressive Web App that helps foreign tourists understand unfamiliar restaurant menus in real time through multimodal AI (camera + voice + text + image generation + AR + speech).

The Problem

You're dining in a foreign country. The menu has no photos, descriptions are vague, there's a language barrier, and you're afraid of ordering the wrong thing.

What PreDine Does

  1. Scan — Point your camera at any menu. OCR extracts dishes, AI interprets them.
  2. Understand — Tap any dish for a clear description, allergen warnings, and spice level.
  3. Visualize — Generate AI images of dishes to see what you're ordering.
  4. AR Preview — Place dish images on your table via camera overlay (drag/pinch/rotate).
  5. Communicate — Translate dietary preferences to the local language. Show your waiter a full-screen card with local text, transliteration, and audio playback.
  6. Converse — Ask the AI assistant anything by voice or text. Interrupt anytime.

Architecture

┌─────────────────────────────────────────────────┐
│  FRONTEND — Next.js 15 App Router (PWA)         │
│  Camera · Voice · AR Overlay · Bottom Sheet      │
├─────────────────────────────────────────────────┤
│  REAL-TIME LAYER — Gemini Live API Client        │
│  Voice streaming · Interruptions · Turn-taking   │
├─────────────────────────────────────────────────┤
│  AGENT LAYER — ADK (DiningAssistantAgent)        │
│  Intent routing · Tool orchestration · Sessions  │
├─────────────────────────────────────────────────┤
│  BACKEND — Firebase + Provider Abstractions      │
│  Auth · Firestore · Analytics · App Check        │
└─────────────────────────────────────────────────┘

ADK Agent Design

The DiningAssistantAgent orchestrates all AI workflows:

| Tool | Purpose |
| --- | --- |
| `parse_menu` | OCR + dish detection from camera frames |
| `interpret_dish` | Explain dish with allergens, spice, confidence |
| `generate_dish_image` | AI image generation for dish visualization |
| `translate_for_waiter` | Translate preferences with transliteration + back-translation |
| `speak_waiter_phrase` | TTS playback in local language |
| `recommend_dishes` | Personalized recommendations based on preferences |
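As an illustration, a tool like `interpret_dish` might return a result shaped like the following. This is a sketch only; the real definitions live in `src/lib/adk/tools.ts` and may differ in names and fields.

```typescript
// Illustrative result shape for a dish-interpretation tool.
// All names here are assumptions, not the repo's actual types.
interface DishInterpretation {
  description: string;
  allergens: string[];
  spiceLevel: 0 | 1 | 2 | 3; // 0 = none, 3 = very spicy
  confidence: number;        // 0..1, drives the "ask clarification" branch
}

export function interpretDishMock(name: string): DishInterpretation {
  // Mock implementation: returns a fixed interpretation for any dish name.
  return {
    description: `A popular dish: ${name}`,
    allergens: ["peanut"],
    spiceLevel: 2,
    confidence: 0.9,
  };
}
```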

Workflow patterns:

  • Sequential: scan → parse → interpret
  • Parallel: image generation + recommendations
  • Conditional: low confidence → ask clarification
  • Interruptible: any task can be cancelled mid-flight
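The sequential and interruptible patterns above can be sketched by threading an `AbortSignal` through each step, so the agent can cancel work mid-flight when the user interrupts. The names below (`runTool`, `scanWorkflow`) are illustrative, not the repo's API.

```typescript
// Sketch: interruptible sequential workflow via AbortSignal.
type ToolName = "parse_menu" | "interpret_dish" | "generate_dish_image";

async function runTool(name: ToolName, signal: AbortSignal): Promise<string> {
  // Real tools would check the signal periodically while streaming.
  if (signal.aborted) throw new Error(`${name} cancelled`);
  return `${name}:ok`;
}

export async function scanWorkflow(signal: AbortSignal): Promise<string[]> {
  // Sequential: parse, then interpret.
  const parsed = await runTool("parse_menu", signal);
  const interpreted = await runTool("interpret_dish", signal);
  // Parallel work (image generation + recommendations) would use Promise.all.
  return [parsed, interpreted];
}
```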

Provider Abstractions

All AI capabilities are behind swappable interfaces:

| Interface | Mock Implementation | Production Target |
| --- | --- | --- |
| `OCRProvider` | Sample Japanese menu data | Google Cloud Vision / Gemini |
| `VisionUnderstandingProvider` | Dish knowledge base (8 Thai dishes) | Gemini Pro Vision |
| `ImageGenerationProvider` | SVG placeholders | Imagen 3 / DALL-E |
| `TranslationProvider` | Phrasebook (8 common phrases) | Google Translate / Gemini |
| `TextToSpeechProvider` | Web Speech API | Google Cloud TTS / ElevenLabs |
| `SpeechToTextProvider` | Web Speech API | Gemini Live / Deepgram |
| `RecommendationProvider` | Score-based matching | Gemini with context |

Swap any provider in src/lib/providers/index.ts.
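A minimal sketch of that pattern, assuming a simplified `OCRProvider` shape (the real interfaces are in `src/lib/providers/base.ts` and may differ):

```typescript
// Sketch of the provider-registry pattern: code depends on the interface,
// and the factory decides which implementation to hand out.
interface OCRProvider {
  extractText(imageData: string): Promise<string[]>;
}

class MockOCRProvider implements OCRProvider {
  async extractText(_imageData: string): Promise<string[]> {
    // Illustrative sample output; the repo's mock returns 8 Thai dishes.
    return ["Pad Thai", "Tom Yum Goong"];
  }
}

// Swap the returned class here to switch from mock to production.
export function getOCRProvider(): OCRProvider {
  return new MockOCRProvider();
}
```

Because callers only see the interface, swapping in a production provider is a one-line change in the factory.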

Getting Started

```bash
# Install dependencies
npm install

# Run development server
npm run dev

# Open on your phone (same network)
# http://<your-ip>:3000
```

Environment Variables

Copy .env.example to .env.local:

```bash
cp .env.example .env.local
```

Firebase and Gemini keys are optional for the MVP — mock providers work without them.
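If you do add keys, a `.env.local` might look like this. The variable names below are illustrative; the authoritative list is in `.env.example`.

```shell
# Illustrative .env.local; see .env.example for the repo's actual keys.
# All keys are optional for the MVP: mock providers run without them.
NEXT_PUBLIC_FIREBASE_API_KEY=your-firebase-api-key
NEXT_PUBLIC_FIREBASE_PROJECT_ID=your-project-id
GEMINI_API_KEY=your-gemini-api-key
```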

Build for Production

```bash
npm run build
npm start
```

Project Structure

```
src/
├── app/
│   ├── layout.tsx              # Root layout, PWA meta, SW registration
│   ├── page.tsx                # Main app: camera, nav tabs, bottom sheet
│   ├── sw-register.tsx         # Service worker registration
│   └── api/agent/route.ts      # ADK agent API endpoint
├── components/
│   ├── camera/camera-view.tsx  # Camera feed with viewfinder + scan animation
│   ├── menu/
│   │   ├── menu-list.tsx       # Scrollable menu with empty state
│   │   └── menu-item-card.tsx  # Expandable card: tags, image, actions
│   ├── assistant/
│   │   └── assistant-panel.tsx # Chat UI: voice, text, streaming, interrupt
│   ├── ar/ar-overlay.tsx       # Drag/pinch/rotate dish overlay on camera
│   ├── waiter/waiter-panel.tsx # Translation panel with "show waiter" mode
│   └── ui/                     # Button, Chip, BottomSheet, StreamingDots
├── hooks/
│   ├── use-session.ts          # Agent integration + all app actions
│   ├── use-camera.ts           # Camera lifecycle + frame capture
│   ├── use-voice.ts            # STT + TTS hook
│   └── use-service-worker.ts   # PWA SW registration
├── lib/
│   ├── adk/
│   │   ├── agent.ts            # DiningAssistantAgent (core orchestrator)
│   │   └── tools.ts            # 6 agent tools
│   ├── providers/
│   │   ├── base.ts             # 7 provider interfaces
│   │   ├── index.ts            # Provider registry (swap here)
│   │   └── mock-*.ts           # Mock implementations
│   ├── firebase/
│   │   ├── config.ts           # Firebase init
│   │   ├── session.ts          # Firestore session CRUD
│   │   └── analytics-events.ts # Typed analytics events
│   ├── realtime/
│   │   └── gemini-live.ts      # Gemini Live API client scaffold
│   ├── types/
│   │   ├── index.ts            # All data models (30+ types)
│   │   └── events.ts           # Real-time event types
│   └── utils.ts                # cn(), generateId(), formatConfidence()
└── public/
    ├── manifest.json           # PWA manifest
    ├── sw.js                   # Service worker
    └── icons/                  # PWA icons
```

Demo Flow

The complete end-to-end flow with mock providers:

  1. Open app → Camera activates, assistant greets you
  2. Tap scan button → OCR extracts 8 Thai dishes from mock menu
  3. Tap "Menu" tab → See all dishes with prices
  4. Tap a dish → Expands with description, allergens, spice level
  5. Tap "Visualize" → AI generates a dish image (SVG placeholder)
  6. Tap "See on Table" → Image appears on camera view, drag to reposition
  7. Tap "Waiter" tab → Select dietary preferences or quick phrases
  8. Tap "No peanuts please" → Shows Japanese translation + transliteration
  9. Tap "Show Waiter" → Full-screen card with large text for the waiter
  10. Tap speaker icon → TTS reads the phrase aloud (via Web Speech API)
  11. Tap "Chat" tab → Ask anything: "what's popular?", "no beef", "show another"
  12. Voice input → Tap mic, speak naturally, agent processes your request

Where to Plug Real APIs

Gemini / Firebase AI Logic

Replace mock providers in src/lib/providers/:

```ts
// src/lib/providers/index.ts
import { GeminiVisionProvider } from "./gemini-vision";

export function getVisionProvider() {
  // GEMINI_API_KEY may be undefined under strict TypeScript, so fail fast.
  const apiKey = process.env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is not set");
  return new GeminiVisionProvider(apiKey);
}
```

Gemini Live API

The client scaffold is at src/lib/realtime/gemini-live.ts. Connect to:

```
wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent
```
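A sketch of building the connection URL and the initial setup frame. The model name and message shape follow the public Live API documentation and are assumptions, not this repo's code.

```typescript
// Sketch: Live API connection URL + first message on the socket.
const LIVE_ENDPOINT =
  "wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent";

export function buildLiveUrl(apiKey: string): string {
  // The API key is passed as a query parameter on the websocket URL.
  return `${LIVE_ENDPOINT}?key=${encodeURIComponent(apiKey)}`;
}

export function buildSetupMessage(model: string) {
  // The first frame sent after connecting must be a `setup` message
  // selecting the model; field names here are assumptions from the docs.
  return {
    setup: {
      model: `models/${model}`,
      generationConfig: { responseModalities: ["AUDIO"] },
    },
  };
}

// Usage (browser, or Node 18+ with a WebSocket implementation):
// const ws = new WebSocket(buildLiveUrl(process.env.GEMINI_API_KEY!));
// ws.onopen = () => ws.send(JSON.stringify(buildSetupMessage("gemini-2.0-flash-exp")));
```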

Firebase

Add your config to .env.local. The app will automatically use Firestore for session persistence and Analytics for event tracking.
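That fallback could be guarded on the presence of config, along these lines (a sketch; the repo's actual logic is in `src/lib/firebase/` and may differ):

```typescript
// Sketch: pick Firestore-backed sessions only when Firebase config exists.
// NEXT_PUBLIC_FIREBASE_API_KEY is the conventional Next.js/Firebase
// variable name, assumed here rather than read from this repo.
type SessionStore = "firestore" | "memory";

export function pickSessionStore(
  env: Record<string, string | undefined>
): SessionStore {
  return env.NEXT_PUBLIC_FIREBASE_API_KEY ? "firestore" : "memory";
}
```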

Design Decisions

| Decision | Rationale |
| --- | --- |
| Mock providers over stubs | Realistic UX testing without API keys |
| Client-side agent (MVP) | Faster iteration; production moves to server-side ADK |
| Bottom sheet over pages | Mobile-native feel, maintains camera context |
| SVG dish images | No external dependencies for demo; swappable |
| Web Speech API | Zero-config STT/TTS; works on Chrome/Safari |
| Camera-first layout | Primary use case is scanning; sheet slides over |

Limitations (MVP)

  • Mock providers return pre-defined data (8 Thai dishes in Japanese)
  • Image generation produces SVG placeholders, not photorealistic images
  • AR is camera-overlay only (no WebXR surface detection yet)
  • Gemini Live connection is scaffolded but not wired to a live endpoint
  • Session persistence requires Firebase project setup
  • Voice recognition requires browser support (Chrome recommended)

Next Steps

  1. Wire Gemini Pro Vision for real OCR + dish interpretation
  2. Wire Imagen 3 for photorealistic dish generation
  3. Connect Gemini Live API for real-time voice interaction
  4. Add WebXR for true AR surface detection on supported devices
  5. Server-side ADK migration for production security
  6. Multi-language support beyond Japanese mock data
  7. Menu history — save and revisit past restaurant menus
  8. Social features — share dish recommendations
  9. Offline mode — cache menu data for no-connectivity dining

Team

| Name | Email |
| --- | --- |
| Chao Zhang | [email protected] |
| John Chong | [email protected] |
| Timothy Asiimwe | [email protected] |
| Louis Cheng | [email protected] |

Tech Stack

  • Framework: Next.js 15 (App Router)
  • Language: TypeScript (strict)
  • Styling: Tailwind CSS 4
  • AI Orchestration: ADK pattern (DiningAssistantAgent)
  • Real-time: Gemini Live API client
  • Backend: Firebase (Auth, Firestore, Analytics)
  • PWA: Service worker, manifest, safe area handling

