
PreDine AR

See Your Meal Before You Order — AI-powered menu translator and dish visualizer for travelers.

Live Demo: predine-ar.web.app | Video Demo

A mobile-first Progressive Web App that helps foreign tourists understand unfamiliar restaurant menus in real time through multimodal AI (camera + voice + text + image generation + AR + speech).

The Problem

You're dining in a foreign country. The menu has no photos, descriptions are vague, there's a language barrier, and you're afraid of ordering the wrong thing.

What PreDine Does

  1. Scan — Point your camera at any menu. OCR extracts dishes, AI interprets them.
  2. Understand — Tap any dish for a clear description, allergen warnings, and spice level.
  3. Visualize — Generate AI images of dishes to see what you're ordering.
  4. AR Preview — Place dish images on your table via camera overlay (drag/pinch/rotate).
  5. Communicate — Translate dietary preferences to the local language. Show your waiter a full-screen card with local text, transliteration, and audio playback.
  6. Converse — Ask the AI assistant anything by voice or text. Interrupt anytime.

Architecture

┌─────────────────────────────────────────────────┐
│  FRONTEND — Next.js 15 App Router (PWA)         │
│  Camera · Voice · AR Overlay · Bottom Sheet      │
├─────────────────────────────────────────────────┤
│  REAL-TIME LAYER — Gemini Live API Client        │
│  Voice streaming · Interruptions · Turn-taking   │
├─────────────────────────────────────────────────┤
│  AGENT LAYER — ADK (DiningAssistantAgent)        │
│  Intent routing · Tool orchestration · Sessions  │
├─────────────────────────────────────────────────┤
│  BACKEND — Firebase + Provider Abstractions      │
│  Auth · Firestore · Analytics · App Check        │
└─────────────────────────────────────────────────┘

ADK Agent Design

The DiningAssistantAgent orchestrates all AI workflows:

| Tool | Purpose |
| --- | --- |
| `parse_menu` | OCR + dish detection from camera frames |
| `interpret_dish` | Explain dish with allergens, spice, confidence |
| `generate_dish_image` | AI image generation for dish visualization |
| `translate_for_waiter` | Translate preferences with transliteration + back-translation |
| `speak_waiter_phrase` | TTS playback in local language |
| `recommend_dishes` | Personalized recommendations based on preferences |
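As an illustration, a tool like `interpret_dish` might return a result shaped like the following. This is a sketch only; the real definitions live in `src/lib/adk/tools.ts` and may differ in names and fields.

```typescript
// Illustrative result shape for a dish-interpretation tool.
// All names here are assumptions, not the repo's actual types.
interface DishInterpretation {
  description: string;
  allergens: string[];
  spiceLevel: 0 | 1 | 2 | 3; // 0 = none, 3 = very spicy
  confidence: number;        // 0..1, drives the "ask clarification" branch
}

export function interpretDishMock(name: string): DishInterpretation {
  // Mock implementation: returns a fixed interpretation for any dish name.
  return {
    description: `A popular dish: ${name}`,
    allergens: ["peanut"],
    spiceLevel: 2,
    confidence: 0.9,
  };
}
```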

Workflow patterns:

  • Sequential: scan → parse → interpret
  • Parallel: image generation + recommendations
  • Conditional: low confidence → ask clarification
  • Interruptible: any task can be cancelled mid-flight
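The sequential and interruptible patterns above can be sketched by threading an `AbortSignal` through each step, so the agent can cancel work mid-flight when the user interrupts. The names below (`runTool`, `scanWorkflow`) are illustrative, not the repo's API.

```typescript
// Sketch: interruptible sequential workflow via AbortSignal.
type ToolName = "parse_menu" | "interpret_dish" | "generate_dish_image";

async function runTool(name: ToolName, signal: AbortSignal): Promise<string> {
  // Real tools would check the signal periodically while streaming.
  if (signal.aborted) throw new Error(`${name} cancelled`);
  return `${name}:ok`;
}

export async function scanWorkflow(signal: AbortSignal): Promise<string[]> {
  // Sequential: parse, then interpret.
  const parsed = await runTool("parse_menu", signal);
  const interpreted = await runTool("interpret_dish", signal);
  // Parallel work (image generation + recommendations) would use Promise.all.
  return [parsed, interpreted];
}
```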

Provider Abstractions

All AI capabilities are behind swappable interfaces:

| Interface | Mock Implementation | Production Target |
| --- | --- | --- |
| `OCRProvider` | Sample Japanese menu data | Google Cloud Vision / Gemini |
| `VisionUnderstandingProvider` | Dish knowledge base (8 Thai dishes) | Gemini Pro Vision |
| `ImageGenerationProvider` | SVG placeholders | Imagen 3 / DALL-E |
| `TranslationProvider` | Phrasebook (8 common phrases) | Google Translate / Gemini |
| `TextToSpeechProvider` | Web Speech API | Google Cloud TTS / ElevenLabs |
| `SpeechToTextProvider` | Web Speech API | Gemini Live / Deepgram |
| `RecommendationProvider` | Score-based matching | Gemini with context |

Swap any provider in src/lib/providers/index.ts.
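A minimal sketch of that pattern, assuming a simplified `OCRProvider` shape (the real interfaces are in `src/lib/providers/base.ts` and may differ):

```typescript
// Sketch of the provider-registry pattern: code depends on the interface,
// and the factory decides which implementation to hand out.
interface OCRProvider {
  extractText(imageData: string): Promise<string[]>;
}

class MockOCRProvider implements OCRProvider {
  async extractText(_imageData: string): Promise<string[]> {
    // Illustrative sample output; the repo's mock returns 8 Thai dishes.
    return ["Pad Thai", "Tom Yum Goong"];
  }
}

// Swap the returned class here to switch from mock to production.
export function getOCRProvider(): OCRProvider {
  return new MockOCRProvider();
}
```

Because callers only see the interface, swapping in a production provider is a one-line change in the factory.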

Getting Started

```bash
# Install dependencies
npm install

# Run development server
npm run dev

# Open on your phone (same network)
# http://<your-ip>:3000
```

Environment Variables

Copy .env.example to .env.local:

```bash
cp .env.example .env.local
```

Firebase and Gemini keys are optional for the MVP — mock providers work without them.
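If you do add keys, a `.env.local` might look like this. The variable names below are illustrative; the authoritative list is in `.env.example`.

```shell
# Illustrative .env.local; see .env.example for the repo's actual keys.
# All keys are optional for the MVP: mock providers run without them.
NEXT_PUBLIC_FIREBASE_API_KEY=your-firebase-api-key
NEXT_PUBLIC_FIREBASE_PROJECT_ID=your-project-id
GEMINI_API_KEY=your-gemini-api-key
```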

Build for Production

```bash
npm run build
npm start
```

Project Structure

```
src/
├── app/
│   ├── layout.tsx              # Root layout, PWA meta, SW registration
│   ├── page.tsx                # Main app: camera, nav tabs, bottom sheet
│   ├── sw-register.tsx         # Service worker registration
│   └── api/agent/route.ts      # ADK agent API endpoint
├── components/
│   ├── camera/camera-view.tsx  # Camera feed with viewfinder + scan animation
│   ├── menu/
│   │   ├── menu-list.tsx       # Scrollable menu with empty state
│   │   └── menu-item-card.tsx  # Expandable card: tags, image, actions
│   ├── assistant/
│   │   └── assistant-panel.tsx # Chat UI: voice, text, streaming, interrupt
│   ├── ar/ar-overlay.tsx       # Drag/pinch/rotate dish overlay on camera
│   ├── waiter/waiter-panel.tsx # Translation panel with "show waiter" mode
│   └── ui/                     # Button, Chip, BottomSheet, StreamingDots
├── hooks/
│   ├── use-session.ts          # Agent integration + all app actions
│   ├── use-camera.ts           # Camera lifecycle + frame capture
│   ├── use-voice.ts            # STT + TTS hook
│   └── use-service-worker.ts   # PWA SW registration
├── lib/
│   ├── adk/
│   │   ├── agent.ts            # DiningAssistantAgent (core orchestrator)
│   │   └── tools.ts            # 6 agent tools
│   ├── providers/
│   │   ├── base.ts             # 7 provider interfaces
│   │   ├── index.ts            # Provider registry (swap here)
│   │   └── mock-*.ts           # Mock implementations
│   ├── firebase/
│   │   ├── config.ts           # Firebase init
│   │   ├── session.ts          # Firestore session CRUD
│   │   └── analytics-events.ts # Typed analytics events
│   ├── realtime/
│   │   └── gemini-live.ts      # Gemini Live API client scaffold
│   ├── types/
│   │   ├── index.ts            # All data models (30+ types)
│   │   └── events.ts           # Real-time event types
│   └── utils.ts                # cn(), generateId(), formatConfidence()
└── public/
    ├── manifest.json           # PWA manifest
    ├── sw.js                   # Service worker
    └── icons/                  # PWA icons
```

Demo Flow

The complete end-to-end flow with mock providers:

  1. Open app → Camera activates, assistant greets you
  2. Tap scan button → OCR extracts 8 Thai dishes from mock menu
  3. Tap "Menu" tab → See all dishes with prices
  4. Tap a dish → Expands with description, allergens, spice level
  5. Tap "Visualize" → AI generates a dish image (SVG placeholder)
  6. Tap "See on Table" → Image appears on camera view, drag to reposition
  7. Tap "Waiter" tab → Select dietary preferences or quick phrases
  8. Tap "No peanuts please" → Shows Japanese translation + transliteration
  9. Tap "Show Waiter" → Full-screen card with large text for the waiter
  10. Tap speaker icon → TTS reads the phrase aloud (via Web Speech API)
  11. Tap "Chat" tab → Ask anything: "what's popular?", "no beef", "show another"
  12. Voice input → Tap mic, speak naturally, agent processes your request

Where to Plug Real APIs

Gemini / Firebase AI Logic

Replace mock providers in src/lib/providers/:

```ts
// src/lib/providers/index.ts
import { GeminiVisionProvider } from "./gemini-vision";

export function getVisionProvider() {
  // GEMINI_API_KEY may be undefined under strict TypeScript, so fail fast.
  const apiKey = process.env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is not set");
  return new GeminiVisionProvider(apiKey);
}
```

Gemini Live API

The client scaffold is at src/lib/realtime/gemini-live.ts. Connect to:

```
wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent
```
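A sketch of building the connection URL and the initial setup frame. The model name and message shape follow the public Live API documentation and are assumptions, not this repo's code.

```typescript
// Sketch: Live API connection URL + first message on the socket.
const LIVE_ENDPOINT =
  "wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent";

export function buildLiveUrl(apiKey: string): string {
  // The API key is passed as a query parameter on the websocket URL.
  return `${LIVE_ENDPOINT}?key=${encodeURIComponent(apiKey)}`;
}

export function buildSetupMessage(model: string) {
  // The first frame sent after connecting must be a `setup` message
  // selecting the model; field names here are assumptions from the docs.
  return {
    setup: {
      model: `models/${model}`,
      generationConfig: { responseModalities: ["AUDIO"] },
    },
  };
}

// Usage (browser, or Node 18+ with a WebSocket implementation):
// const ws = new WebSocket(buildLiveUrl(process.env.GEMINI_API_KEY!));
// ws.onopen = () => ws.send(JSON.stringify(buildSetupMessage("gemini-2.0-flash-exp")));
```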

Firebase

Add your config to .env.local. The app will automatically use Firestore for session persistence and Analytics for event tracking.
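That fallback could be guarded on the presence of config, along these lines (a sketch; the repo's actual logic is in `src/lib/firebase/` and may differ):

```typescript
// Sketch: pick Firestore-backed sessions only when Firebase config exists.
// NEXT_PUBLIC_FIREBASE_API_KEY is the conventional Next.js/Firebase
// variable name, assumed here rather than read from this repo.
type SessionStore = "firestore" | "memory";

export function pickSessionStore(
  env: Record<string, string | undefined>
): SessionStore {
  return env.NEXT_PUBLIC_FIREBASE_API_KEY ? "firestore" : "memory";
}
```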

Design Decisions

| Decision | Rationale |
| --- | --- |
| Mock providers over stubs | Realistic UX testing without API keys |
| Client-side agent (MVP) | Faster iteration; production moves to server-side ADK |
| Bottom sheet over pages | Mobile-native feel, maintains camera context |
| SVG dish images | No external dependencies for demo; swappable |
| Web Speech API | Zero-config STT/TTS; works on Chrome/Safari |
| Camera-first layout | Primary use case is scanning; sheet slides over |

Limitations (MVP)

  • Mock providers return pre-defined data (8 Thai dishes in Japanese)
  • Image generation produces SVG placeholders, not photorealistic images
  • AR is camera-overlay only (no WebXR surface detection yet)
  • Gemini Live connection is scaffolded but not wired to a live endpoint
  • Session persistence requires Firebase project setup
  • Voice recognition requires browser support (Chrome recommended)

Next Steps

  1. Wire Gemini Pro Vision for real OCR + dish interpretation
  2. Wire Imagen 3 for photorealistic dish generation
  3. Connect Gemini Live API for real-time voice interaction
  4. Add WebXR for true AR surface detection on supported devices
  5. Server-side ADK migration for production security
  6. Multi-language support beyond Japanese mock data
  7. Menu history — save and revisit past restaurant menus
  8. Social features — share dish recommendations
  9. Offline mode — cache menu data for no-connectivity dining

Team

| Name | Email |
| --- | --- |
| Chao Zhang | [email protected] |
| John Chong | [email protected] |
| Timothy Asiimwe | [email protected] |
| Louis Cheng | [email protected] |

Tech Stack

  • Framework: Next.js 15 (App Router)
  • Language: TypeScript (strict)
  • Styling: Tailwind CSS 4
  • AI Orchestration: ADK pattern (DiningAssistantAgent)
  • Real-time: Gemini Live API client
  • Backend: Firebase (Auth, Firestore, Analytics)
  • PWA: Service worker, manifest, safe area handling

