SPEECHMATE

Offline-first language learning for endangered Nicobarese and Great Andamanese languages.
Built for tribal primary schools in the Andaman & Nicobar Islands.

🌍 Problem • 🎯 Core • 📊 Status • ⚡ Why Us • 🚀 Install • 📜 License

🌍 The Problem

"A Nicobarese child walks into a government school in Car Nicobar. The teacher speaks Hindi. The textbook is in English. The child's mother tongue — spoken by fewer than 30,000 people — has no place in the classroom. By the time she graduates, she may no longer speak it."

Car Nicobarese and Great Andamanese are critically endangered. There are no widely accessible or child-focused digital tools for learning them. Children in these schools cannot bridge the gap between their mother tongue and the medium of instruction — not because they lack ability, but because offline-first tools tailored for tribal classrooms don't exist.

A language dies every 14 days globally.
The Andaman & Nicobar school system has no tribal-language digital curriculum.
Remote islands have unreliable or zero internet connectivity — cloud-based tools don't work here.

SpeechMate is an attempt to build that missing tool. It is a pre-pilot prototype — functional, but not yet validated in schools.

🎯 Core Value

One sentence: Offline voice-based Nicobarese vocabulary learning for tribal primary school children and their teachers.

Two features that matter most:

Feature	What it does	Status
🎙️ Voice Translation	Speak in Hindi/Tamil/Bengali/Telugu → hear Nicobarese audio	✅ Working
📚 Vocabulary Learning	12 categories, 2,400+ words, games, flashcards, spaced repetition	✅ Working

Everything else in this app supports these two features or is exploratory.

📊 Status (Honest)

This is a pre-pilot prototype. Here is the actual status of each feature:

Feature	Status	Notes
Nicobarese dictionary (2,400+ words)	✅ Working	JSON lexicons, seeded to SQLite
Word learning games (4 games)	✅ Working	Match, Flashcards, Scramble, Runner
SM-2 Spaced Repetition (SRS)	✅ Working	Standard Anki-style algorithm
XP / leveling / daily missions	✅ Working	Deterministic, no backend needed
Regional language translation (4 lang)	✅ Working	Google ML Kit offline models
On-device STT (Whisper Base)	✅ Working	Via NDK 27 C++ — tested on mid-range Android
Teacher dashboard	✅ Working	Phrase bank, quiz mode, OCR scanner
AR object → Nicobarese overlay	⚠️ Partial	Works on static images; live video overlay is experimental
Malayalam translation	⚠️ Partial	Requires internet (cloud fallback)
Great Andamanese hub	⚠️ Partial	Lexicon loaded; voice + OCR functional, community features UI-only
Omni-Broadcast (5 languages at once)	🧪 Experimental	Architecture built; latency on low-end devices not validated
Dialect heatmap, Culture Hub	🧪 Experimental	UI complete; data is placeholder
P2P sync, Document translation	🧪 Experimental	Features work but have not been field-tested
Virtual pet	🧪 Experimental	Functional; pedagogical value untested

Not yet done: User testing, accuracy benchmarks, school deployment, community audio recordings.
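
For readers unfamiliar with SM-2, the scheduling behind the spaced-repetition row above can be sketched roughly as follows. This is a minimal illustration of the published SM-2 rules in Python, not SpeechMate's Dart implementation; the `Card` class and `review` function are hypothetical names, and the choice to leave the ease factor unchanged on a failed review follows one common reading of the original algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Hypothetical per-word review state (not SpeechMate's actual schema)."""
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 easiness factor, floored at 1.3


def review(card: Card, quality: int) -> Card:
    """Return the card state after one review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: restart the sequence tomorrow, ease left unchanged.
        return Card(repetitions=0, interval_days=1, ease=card.ease)

    # Successful recall: 1 day, then 6 days, then grow by the ease factor.
    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = round(card.interval_days * card.ease)

    # Standard SM-2 ease update, clamped so intervals never shrink too fast.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, card.ease + delta)
    return Card(repetitions=card.repetitions + 1, interval_days=interval, ease=ease)
```

A word answered perfectly three times in a row would be scheduled after 1, 6, and then about 16 days under these rules; a later failure resets the interval to 1 day.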

⚡ Why SpeechMate?

Feature	Google Translate	Duolingo	Offline Dictionary Apps	SpeechMate
Tribal Languages	No Nicobarese support	No Nicobarese support	No tribal languages	✅ Nicobarese & G. Andamanese
Connectivity	Requires internet	Requires internet	✅ Works offline	✅ Works fully offline*
Target Audience	General audience	✅ Child-focused	General audience	✅ Built for ages 6–14
Classroom Fit	Not designed for tribal classrooms	Not designed for tribal classrooms	Not designed for tribal classrooms	✅ Tailored for tribal classrooms
Educator Tools	None	Duolingo for Schools	None	✅ Custom teacher dashboard
Voice Learning	✅ Supported	✅ Supported	No voice features	✅ Supported

*Core vocabulary and translation work offline. Malayalam translation requires internet (cloud fallback), and the AR overlay is only partially functional.

The gap is real. To our knowledge, no commercial or academic tool currently supports Nicobarese or Great Andamanese language learning for children.

🏗️ Architecture

Flutter UI (Riverpod)
    │
    ├── WhisperService (NDK 27 C++) — On-device speech-to-text
    ├── NeuralEngine — Offline translation pipeline (dictionary + NLP)
    ├── ML Kit — Regional translation + OCR + object detection
    └── DatabaseManager (SQLite) — All linguistic data, locally stored

Translation pipeline (7 meaningful stages): Dictionary lookup → Bigram/Trigram phrase match → Stemming → Synonym expansion → Soundex phonetic fallback → Compound word split → Levenshtein fuzzy match.

No generative AI or cloud LLMs. All translation is deterministic dictionary + algorithmic NLP.
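
The fallback order described above can be illustrated with a small sketch. It covers only three of the seven stages (exact lookup, Soundex, Levenshtein) and is written in Python for brevity rather than in the app's Dart. The `LEXICON` contents are placeholders, not real Nicobarese entries, and `translate`, `soundex`, and `levenshtein` are hypothetical names, not the NeuralEngine API.

```python
from typing import Optional

# Placeholder lexicon: keys stand in for source words, values for target entries.
LEXICON = {
    "water": "<target:water>",
    "coconut": "<target:coconut>",
    "fish": "<target:fish>",
}

_CODES = {ch: digit
          for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                 "4": "l", "5": "mn", "6": "r"}.items()
          for ch in letters}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    code = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rolling rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def translate(word: str, lexicon: dict = LEXICON,
              max_distance: int = 2) -> Optional[tuple]:
    """Try each stage in order; return (target, stage_name) or None."""
    key = word.strip().lower()
    if key in lexicon:
        return lexicon[key], "exact"
    code = soundex(key)
    for entry, target in lexicon.items():
        if code and soundex(entry) == code:
            return target, "soundex"
    best = min(lexicon, key=lambda e: levenshtein(key, e), default=None)
    if best is not None and levenshtein(key, best) <= max_distance:
        return lexicon[best], "levenshtein"
    return None
```

Because every stage is a pure function of the input and the lexicon, the same query always yields the same result, which is what makes the pipeline deterministic. Returning the stage name alongside the match is one way to let the UI flag lower-confidence fuzzy results.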

Technical Specs

Metric	Value	Notes
App size	~250 MB	Includes 141 MB Whisper model
STT latency	~600 ms	Tested on a mid-range Android (Snapdragon 6xx). Low-end devices not benchmarked.
Translation speed	<100 ms	SQLite indexed lookup
Dictionary	2,400+ entries	Core Nicobarese lexicon
Min Android	API 24	Android 7.0+
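
As a rough illustration of the "JSON lexicons, seeded to SQLite" and "SQLite indexed lookup" entries, the sketch below seeds an in-memory database from a JSON array and queries it through an index. The table name, column names, and JSON fields are assumptions made for this example (the real format is described in `/docs/data.md`), and the target values are placeholders rather than real lexicon entries.

```python
import json
import sqlite3

# Hypothetical lexicon format; field names are illustrative only.
LEXICON_JSON = """
[
  {"source": "water",   "target": "<target:water>",   "category": "nature"},
  {"source": "coconut", "target": "<target:coconut>", "category": "food"},
  {"source": "fish",    "target": "<target:fish>",    "category": "food"}
]
"""


def seed(conn: sqlite3.Connection, entries: list) -> None:
    """Create the table and its lookup index, then bulk-insert the entries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lexicon ("
        "source TEXT NOT NULL, target TEXT NOT NULL, category TEXT)"
    )
    # The index is what keeps exact-match lookups fast as the lexicon grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_lexicon_source ON lexicon(source)"
    )
    conn.executemany(
        "INSERT INTO lexicon (source, target, category) "
        "VALUES (:source, :target, :category)",
        entries,
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, word: str):
    """Return the target entry for a source word, or None if it is missing."""
    row = conn.execute(
        "SELECT target FROM lexicon WHERE source = ?", (word.lower(),)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
seed(conn, json.loads(LEXICON_JSON))
```

Running `EXPLAIN QUERY PLAN` on the `SELECT` is a quick way to confirm that the lookup searches the index instead of scanning the whole table.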

🛤️ Roadmap

Phase 1 — Validation (Next)

Pilot in 2–3 A&N tribal schools with ~30 students
Collect teacher feedback on classroom usability
Community recording program with tribal elders
Benchmark STT latency on low-end school devices

Phase 2 — Refinement (Post-Pilot)

Lean APK build (<50 MB, Whisper-optional)
Validated accuracy metrics for translation
Onge language module

Phase 3 — Scale (Future)

Cloud sync for community content
On-device LLM for generative practice (SmolLM2 GGUF)
Government curriculum integration

🚀 Installation

```bash
# 1. Clone
git clone https://github.com/sathiyatskrj/Speechmate.git
cd Speechmate

# 2. Install dependencies
flutter pub get

# 3. Pull Whisper model (Git LFS, 141 MB)
git lfs pull

# 4. Run
flutter run
```

Requirements: Flutter 3.29+ · Dart 3.2+ · Android NDK 27 · Android SDK 34

📂 Additional Documentation

Document	Contents
`/docs/architecture.md`	Full system architecture, service layer, Mermaid diagrams
`/docs/features.md`	Complete feature list for all modules
`/docs/data.md`	Lexicon structure, JSON format, how to add languages

📜 Licensing

SpeechMate uses a split-licensing model to keep the source code open while strictly protecting indigenous data sovereignty.

1. Software Code: Apache 2.0

All source code in this repository is licensed under the Apache License 2.0. You are free to use, modify, and distribute the software for commercial and non-commercial purposes.

2. Linguistic Data: CC BY-NC 4.0

All dictionary entries, audio recordings, and cultural content (located in assets/data/ and assets/audio/) are licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0), combined with Traditional Knowledge (TK) protocols. See DATA_TERMS.txt for details.

Key rules for the data:

🚫 No Commercial Use: You may not sell, monetize, or use the data in commercial products without written consent from the relevant tribal council.
🚫 No AI Training: You may not use this data to train commercial AI or LLMs.
✅ Educational/Research Use: You may use the data freely for non-commercial research, personal learning, and educational tools with proper attribution.

Summary: You can fork and build upon the app's code freely under Apache 2.0, but the tribal language data is strictly protected under CC BY-NC 4.0 and cannot be exploited for profit.

Built for the tribal communities of the Andaman & Nicobar Islands.
Pre-pilot prototype — not yet validated in schools.
