tmy-datamesa/learnmate

LearnMate

Personal AI tutor that teaches you from your own notes.

LearnMate embeds your Obsidian wiki into Chroma Cloud, then gives you two ways to learn from it: ask questions (teach mode) and get quizzed (test mode). It uses your own knowledge base as the source of truth — not the internet, not a generic model.


The Problem

I was consuming a lot — podcasts, articles, wiki notes — but never testing myself. Passive consumption with zero active recall. The wiki was growing but I had no way to verify what I actually understood.

LearnMate solves this by turning a static knowledge base into a conversational tutor.


How It Works

Obsidian wiki (.md files)
        ↓
   ingest pipeline
   (parse → embed)
        ↓
   Chroma Cloud
   (hybrid index)
        ↓
 query at runtime
   (dense + sparse
      via RRF)
        ↓
  GPT-4o generates
  answer / question

Hybrid search combines two retrieval methods via Reciprocal Rank Fusion:

  • Dense (Qwen3 embeddings) — semantic similarity
  • Sparse (Splade) — keyword matching

This gives better retrieval than cosine-only search, especially for technical terms.
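The fusion step can be sketched in a few lines of pure Python. This is a minimal illustration, not the project's actual retriever: the doc IDs and the k constant (k=60 is a common RRF default) are made up, while the 0.7/0.3 weights mirror the dense/sparse split listed above.

```python
def rrf_fuse(dense_ids, sparse_ids, w_dense=0.7, w_sparse=0.3, k=60):
    """Score each doc by a weighted sum of 1/(k + rank) over both lists,
    then return doc IDs sorted by fused score, best first."""
    scores = {}
    for rank, doc_id in enumerate(dense_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_dense / (k + rank)
    for rank, doc_id in enumerate(sparse_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_sparse / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "a" ranks 1st dense / 3rd sparse; "b" ranks 2nd dense / 1st sparse.
fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
# → ["a", "b", "c", "d"]
```

Because each list contributes 1/(k + rank) rather than a raw score, RRF needs no score normalization between the dense and sparse retrievers — only their rankings matter.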


Two Modes

Teach Mode

Ask anything from your wiki. LearnMate answers with source citations and suggests follow-up questions to push you deeper.

You: What's the difference between RAG and long context?

LearnMate: The key difference between RAG and long context is...
           [answer]

Dig deeper:
1. How do you measure retrieval quality in a RAG pipeline?
2. Where do long context models hit their limits?

Sources: IBM - Is RAG Still Needed, Divy Yadav - 9 RAG Architectures

Test Mode

LearnMate generates a scenario-based quiz question from your wiki. You answer. It evaluates.

Questions are always scenario-based ("For an e-commerce site...") — never flat definitions. Evaluation is concept-focused, not keyword-matching.

Question: If retrieval results in a RAG system are poor, is the
          problem in chunking, embedding, or the query?
          How would you tell them apart?

[answer]

Score: Partial ~
Evaluation: You correctly identified chunking and embedding...
            [detailed feedback]
Correct answer: ...

Tech Stack

Layer             Choice                         Why
LLM               GPT-4o                         Generation quality
Embeddings        Chroma Cloud Qwen3 + Splade    No per-token cost, hybrid search
Vector DB         Chroma Cloud                   Managed, hybrid search built-in
Retrieval         RRF (dense 0.7 + sparse 0.3)   Better than cosine-only
Framework         LangChain (chat only)          Conversation memory, message formatting
Web UI            FastAPI + vanilla HTML         Zero build step, SSE streaming
Knowledge source  Obsidian markdown              Personal wiki, structured with frontmatter

Project Structure

learnmate/
├── src/
│   ├── ingest/
│   │   ├── parser.py       # Markdown parsing, frontmatter extraction
│   │   ├── embedder.py     # Chroma Cloud client, Qwen+Splade schema
│   │   ├── pipeline.py     # Incremental sync (content hash)
│   │   └── migrate.py      # One-time migration from local ChromaDB
│   ├── chat/
│   │   ├── retriever.py    # Hybrid search, TeachSession, streaming
│   │   └── quiz.py         # Question generation, answer evaluation
│   ├── api/
│   │   ├── app.py          # FastAPI endpoints (teach, test, reset)
│   │   └── static/
│   │       └── index.html  # Single-page UI
│   └── utils/
│       └── config.py       # Env vars
├── decisions.md            # Architecture decisions log
├── .env.example
└── requirements.txt

Key Decisions

Full log in decisions.md. Short version:

Chroma Cloud over local ChromaDB — OpenAI embedding costs add up on every ingest run. Chroma Cloud's Qwen + Splade are included in Chroma credits. Hybrid search is a bonus.

1 chunk per wiki file — Wiki files are 800–4000 chars. Splitting would break context. Full-document retrieval works better at this scale.

LangChain only for chat — Ingest uses ChromaDB's native client directly. LangChain adds value in the conversation layer (message history, ChatOpenAI) but would be overhead in the pipeline.

FastAPI + vanilla HTML — No build step. SSE streaming works natively. Tailwind CDN handles styling. For a personal tool, this is enough.

Scenario-based quiz questions — Flat definition questions ("What is RAG?") test memorization, not understanding. Forcing scenario framing ("At a startup...") makes questions harder and more useful.


Setup

1. Clone and install

git clone https://github.com/tmy-datamesa/learnmate
cd learnmate
pip install -r requirements.txt

2. Environment variables

cp .env.example .env

Edit .env:

OPENAI_API_KEY=...
WIKI_PATH=/path/to/your/obsidian/wiki
CHROMA_API_KEY=...        # from trychroma.com
CHROMA_TENANT=...
CHROMA_DATABASE=...

3. Embed your wiki

python -m src.ingest.pipeline

Incremental — only new or changed files are re-embedded.

4. Run

Web UI:

uvicorn src.api.app:app --reload
# → http://localhost:8000

CLI:

python -m src.chat.cli

Wiki Structure

LearnMate expects Obsidian markdown with frontmatter:

---
type: concept        # concept | source | entity
subfolder: concepts
---

# RAG

...

The type field controls how documents are used:

  • concept / source — used in both teach and test context
  • entity — used in teach context only (skipped in quiz generation)

Only documents from subfolder: sources appear in the Sources citation list.
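Reading the frontmatter and applying the type-based filter could look like the sketch below. It is stdlib-only and illustrative — the real parser.py may use a YAML library, and `quiz_eligible` is a hypothetical helper name:

```python
def parse_frontmatter(text: str):
    """Split a markdown doc into (metadata dict, body). Parses simple
    'key: value' lines between the opening and closing '---' fences;
    trailing '#' comments in values are stripped."""
    meta, body = {}, text
    if text.startswith("---"):
        end = text.find("\n---", 3)
        if end != -1:
            for line in text[3:end].strip().splitlines():
                if ":" in line:
                    key, _, value = line.partition(":")
                    meta[key.strip()] = value.split("#")[0].strip()
            body = text[end + 4:].lstrip("\n")
    return meta, body

def quiz_eligible(meta: dict) -> bool:
    """entity docs are teach-only; concept/source feed quiz generation."""
    return meta.get("type") in {"concept", "source"}
```

Usage: `parse_frontmatter("---\ntype: entity\n---\n\n# OpenAI")` yields `{"type": "entity"}` and the body, and `quiz_eligible` then excludes that doc from test mode.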


About

A personal AI tutor that turns my Obsidian wiki into a conversational RAG learning system with teach and quiz modes.
