Readability Navigator is a personalized text recommendation project that suggests the next best document based on user interests and reading difficulty.
The system combines:
- symbolic readability metrics (Flesch Reading Ease)
- semantic embeddings (SBERT, 384 dimensions)
- iterative user-profile updates driven by feedback
The goal is not to simplify text automatically, but to select the most suitable next text for each user.
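As a reference for the readability side, the standard Flesch Reading Ease formula can be sketched as below; the vowel-group syllable counter is a rough approximation of what an NLTK-based pipeline would compute, not the project's exact implementation:

```python
import re

def count_syllables(word: str) -> int:
    # Naive approximation: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Higher scores mean easier text (roughly 0-100 for typical English).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

A short declarative sentence like "The cat sat on the mat." scores near the top of the scale, while long, polysyllabic academic prose scores much lower.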
Recommendations balance:
- semantic relevance to user interests
- distance from the user's readability target
Pipeline:
- Load engineered features and document embeddings.
- Load or create a user profile (topic_vector, target_readability, history).
- Build a candidate catalog:
  - remove already-read documents
  - keep documents within the readability tolerance
- Compute the hybrid score: the readability gap is penalized more strongly when a text sits above the user's target difficulty.
- Rank documents and return the Top-K.
- Collect difficulty feedback (1-5) and update:
  - reading history
  - topic vector
  - target readability
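The scoring and feedback steps above could look roughly like this sketch. Cosine similarity for relevance is an assumption, and `alpha`, `over_target_penalty`, `lr`, and `step` are hypothetical names and defaults, not the values configured in conf/project.yaml:

```python
import numpy as np

def hybrid_score(doc_emb, doc_readability, topic_vector, target_readability,
                 alpha=0.7, over_target_penalty=2.0):
    # Cosine similarity between the document embedding and the topic vector.
    relevance = float(np.dot(doc_emb, topic_vector) / (
        np.linalg.norm(doc_emb) * np.linalg.norm(topic_vector) + 1e-9))
    # Readability gap, penalized more heavily when the text sits above the
    # user's target difficulty (i.e. a lower Flesch ease score than the target).
    gap = target_readability - doc_readability
    if gap > 0:
        gap *= over_target_penalty
    return alpha * relevance - (1 - alpha) * abs(gap) / 100.0

def update_profile(user, doc_id, doc_emb, difficulty_feedback,
                   lr=0.1, step=2.0):
    # Move the topic vector a small step toward the document just read.
    tv = np.asarray(user["topic_vector"], dtype=float)
    user["topic_vector"] = ((1 - lr) * tv + lr * np.asarray(doc_emb)).tolist()
    # Feedback above 3 ("too hard") raises the target toward easier texts.
    user["target_readability"] += step * (difficulty_feedback - 3)
    user["history"].append(doc_id)
    return user
```

Ranking is then a matter of scoring every candidate and taking the Top-K by score.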
Project structure:
- app/: Streamlit dashboard and presentation pages
- src/recommender/: ranking and recommendation engine
- src/user/: user profile creation and update logic
- src/features/: preprocessing and embeddings
- src/eval/: offline evaluation (NDCG)
- utils/: loading and I/O utilities
- conf/project.yaml: core parameters and paths
- data/: processed datasets and user JSON profiles
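The offline metric mentioned for src/eval/ is NDCG; a generic NDCG@k can be sketched as below (this is the textbook definition, not necessarily the exact implementation in evaluation.py):

```python
import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain: graded relevance discounted by log2(rank + 1).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    # NDCG@k: DCG of the produced ranking divided by the ideal (sorted) DCG.
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A perfect ranking scores 1.0; swapping relevant and irrelevant items pushes the score toward 0.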
Prerequisites:
- Python 3.10+
- pip
Install dependencies:
pip install --upgrade pip
pip install -r requirements.txt
Download the required NLTK resource:
python -c "import nltk; nltk.download('punkt')"
Run the Streamlit app from the project root:
streamlit run app/App.py
The requirements file includes both runtime and testing dependencies.
Recommended validation workflow:
- Smoke check on processed data:
python src/test/test.py
- Offline recommender evaluation:
python src/eval/evaluation.py
- Run the unit/integration test suite (when tests are added or extended):
pytest -q
Minimal example using main.py:
from main import main
user = {
    "user_id": 1,
    "target_readability": 60,
    "topic_vector": [0.0] * 384,
    "history": []
}
ranked_df = main(user)
print(ranked_df.head())
Primary dataset used in this repository: OneStopEnglish (processed version).
Expected local assets:
- data/interim/onestop_texts.csv
- data/processed/onestop_nltk_features.csv
- src/features/doc_embedding.pickle
Notes:
- Main model parameters are in conf/project.yaml.
- User profiles are saved in data/user/json_file/.
- Some scripts in src/ingest and src/features support offline experimentation in addition to the app runtime.
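Since profiles are stored as plain JSON under data/user/json_file/, a minimal load/save helper might look like the following; the `user_<id>.json` filename pattern and the default profile values are assumptions for illustration:

```python
import json
from pathlib import Path

def load_profile(user_id, base_dir="data/user/json_file"):
    # Load an existing profile, or create a fresh default one.
    # The filename pattern is hypothetical, not the repo's guaranteed layout.
    path = Path(base_dir) / f"user_{user_id}.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"user_id": user_id,
            "target_readability": 60,
            "topic_vector": [0.0] * 384,
            "history": []}

def save_profile(profile, base_dir="data/user/json_file"):
    # Persist the profile as JSON, creating the directory if needed.
    path = Path(base_dir) / f"user_{profile['user_id']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(profile))
```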
Author: Francesco Lazzarotto