OpenRouter-first assessment core and local speaking coach app.
Internal repo and module names still use assess_speaking in places for compatibility.
Pipeline: Transcription (faster-whisper) -> Deterministic metrics -> schema-validated CEFR-style rubric via OpenRouter (default), Ollama (local), or a generic OpenAI-compatible endpoint.
Current CLI/runtime contract:
- OpenRouter as the default remote scoring path.
- Local Ollama support through `--provider ollama --llm-model ...`.
- Structured nested `report` output with validated `input`, `metrics`, `checks`, `scores`, `rubric`, and `requires_human_review`.
- Goal-oriented gates for:
  - language match
  - topic relevance
  - speaking duration
  - minimum word count
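The gates above could be evaluated along these lines. This is an illustrative sketch, not the actual implementation: the field names (`detected_language`, `duration_sec`, `word_count`) and the duration threshold are assumptions, and the topic-relevance gate is omitted because it requires semantic comparison against the theme.

```python
# Illustrative gate evaluation; field names and thresholds are assumed,
# not the project's actual report schema.
def evaluate_gates(metrics: dict,
                   expected_language: str,
                   target_duration_sec: float,
                   min_words: int) -> dict:
    """Return a pass/fail flag per goal-oriented gate."""
    return {
        "language_match": metrics.get("detected_language") == expected_language,
        # illustrative threshold: at least half the target duration
        "duration_ok": metrics.get("duration_sec", 0.0) >= 0.5 * target_duration_sec,
        "min_word_count": metrics.get("word_count", 0) >= min_words,
    }

gates = evaluate_gates(
    {"detected_language": "it", "duration_sec": 95.0, "word_count": 210},
    expected_language="it", target_duration_sec=120, min_words=100,
)
requires_human_review = not all(gates.values())
```

Any failed gate would then flip the review flag, mirroring the `requires_human_review` behavior described later in this README.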
```
brew install ffmpeg
```

Optional local LLM mode:

```
brew install ollama
ollama pull llama3.1
ollama list
```

Remote LLM mode (default):

```
export OPENROUTER_API_KEY="..."
export OPENROUTER_MODEL="google/gemini-3.1-pro-preview"
```

Create the virtualenv:

```
./scripts/setup_env.sh   # prefers python3.12 → python3.11 → python3
source .venv/bin/activate
```

You can pass a custom target directory or interpreter, e.g.
`PYTHON_BIN=/path/to/python3.12 ./scripts/setup_env.sh`. The script installs all
requirements inside `.venv`, leaving the global Python untouched. PyPI provides
macOS wheels for `av`, `ctranslate2`, `onnxruntime`, `praat-parselmouth`, and
`rapidfuzz` on Python 3.13 (verified on macOS 15/Sequoia, Oct 2025).
If you do not want to activate the venv manually, use the repo-local launcher:
```
./scripts/python.sh ...
```
Recommended first path:
- Install `ffmpeg`.
- Optionally install `ollama` and pull a local model such as `llama3.1`.
- Launch the app with `./scripts/run_app.py`.
- On the Home screen, choose `Set up local AI`.
- In `Runtime Setup`, keep the default local path unless you specifically need advanced provider options.
- Continue with `Session Setup -> Speak -> Review -> History`.
The app now starts a localhost-only backend automatically and keeps user data under the app-data root instead of depending on the repo working directory.
```
./scripts/generate_sample.sh --all
./scripts/generate_sample.sh --language it --cefr B2
```

Shipped sample responses now live under `samples/cefr/{it,en}/{B1,B2,C1}/`:

- `samples/cefr/it/B1/travel_story.wav`
- `samples/cefr/it/B2/remote_work.wav`
- `samples/cefr/it/C1/public_debate.wav`
- `samples/cefr/en/B1/travel_story.wav`
- `samples/cefr/en/B2/remote_work.wav`
- `samples/cefr/en/C1/public_debate.wav`
Optional flags: `-v/--voice`, `-t/--text`, `-o/--output`, `--language`, and `--cefr`, e.g.

```
./scripts/generate_sample.sh --language en --cefr C1 --output /tmp/c1-public-debate.wav
./scripts/generate_sample.sh --voice "Alice" --text "Questo e un test." --output /tmp/test.wav
```

List available Ollama models:

```
python assess_speaking.py --list-ollama
```
Self-tests:

```
python assess_speaking.py --selftest --provider openrouter --llm-model google/gemini-3.1-pro-preview
python assess_speaking.py --selftest --provider ollama --llm-model llama3.1
```

Full run (OpenRouter, default):

```
python assess_speaking.py sample.wav \
  --provider openrouter \
  --llm-model google/gemini-3.1-pro-preview \
  --theme "la mia città" \
  --target-duration-sec 120 \
  --llm-timeout 30 > report.json
```

Local Ollama mode:

```
python assess_speaking.py sample.wav --provider ollama --llm-model llama3.1 > report.json
cat report.json
```

Every run is also stored in `reports/` (structured JSON + `history.csv`). Use
`--label "B1-test"` or `--notes "Morning session"` to tag a run. With
`--log-dir path/to/reports` you control the destination; `--no-log` disables the
persistence layer.
If an existing history.csv was created with an older header, delete or replace
it before appending new runs. The CLI no longer rewrites legacy history files.
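Since the CLI no longer rewrites legacy history files, a consumer script could guard against a stale header before appending rows. The expected column list below is a placeholder, not the CLI's actual schema:

```python
import csv
from pathlib import Path

# Placeholder header; substitute the columns your CLI version actually writes.
EXPECTED_HEADER = ["timestamp", "label", "overall_score"]  # hypothetical

def history_header_matches(history_csv: Path) -> bool:
    """True when the file is absent (safe to create) or its header matches."""
    if not history_csv.exists():
        return True
    with history_csv.open(newline="") as fh:
        first_row = next(csv.reader(fh), [])
    return first_row == EXPECTED_HEADER
```

When this check fails, delete or move the old `history.csv` aside before running the CLI again.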
Top-level CLI output remains backward-compatible for existing scripts:
- `metrics`
- `transcript_preview`
- `llm_rubric`
- optional `baseline_comparison`
- optional `suggested_training`
New code should read the nested `report` object. It contains the validated
assessment contract, including `checks`, `scores`, `rubric`,
`requires_human_review`, and `progress_delta` when an earlier run exists for
the same speaker and task family.
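A minimal sketch of consuming a saved report from Python. The field names come from the contract described above; the exact nesting (top-level vs. under a `report` key) and the summary format are assumptions:

```python
import json
from pathlib import Path

def summarize_report(path: str) -> str:
    """Return a one-line summary of a saved assessment report.

    Field names follow the documented contract; the nesting (top-level
    vs. under a "report" key) is assumed and may differ per version.
    """
    raw = json.loads(Path(path).read_text())
    report = raw.get("report", raw)  # tolerate either placement
    if report.get("requires_human_review"):
        return f"needs human review (checks: {report.get('checks')})"
    summary = f"scores: {report.get('scores')}"
    if "progress_delta" in report:
        summary += f", progress_delta: {report['progress_delta']}"
    return summary
```

For example, `summarize_report("reports/some_run.json")` would print either the scores or the review flag for that run.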
```
python scripts/progress_dashboard.py --log-dir reports
python scripts/progress_dashboard.py --log-dir reports --export-html reports/dashboard.html
python scripts/progress_dashboard.py --log-dir reports --speaker-id bern --task-family travel_narrative
open reports/dashboard.html  # macOS preview
```

The CLI dashboard renders the history table (via `rich`) and can export an HTML
snapshot. It also supports speaker and task-family filters so progress on
travel_narrative is not mixed with unrelated speaking tasks.
The primary product-facing UI is now the multipage app shell. The preferred launcher is the local bootstrap wrapper:
```
./scripts/python.sh scripts/run_app.py
./scripts/python.sh scripts/run_app.py --check
```

`streamlit run streamlit_app.py` remains the stable entrypoint, but the launcher
is better for local product use and later desktop packaging because it resolves
stable app-data paths, bootstraps cache defaults, and can be run from outside
the repo root.
The learner-facing flow is now optimized for a single local user:
`Home` -> `Runtime Setup` (only when no active connection is configured) -> `Session Setup` -> `Speak` -> `Review` -> `History`
Library, Settings, and Scoring Guide stay available as secondary screens.
The Home screen now also renders startup diagnostics for app-data writability,
ffmpeg, Whisper cache readiness, runtime configuration, and recorder readiness.
The app shell stores its local data outside the repo by default:
- macOS app data: `~/Library/Application Support/Vostavo`
- macOS cache: `~/Library/Caches/Vostavo`
- Windows app data and cache now resolve through `platformdirs` using the app identity `frommherz_it` + `Vostavo`
- Ubuntu/Linux app data: `~/.local/share/Vostavo` by default, or `$XDG_DATA_HOME/Vostavo` when XDG overrides are set
- Ubuntu/Linux cache: `~/.cache/Vostavo` by default, or `$XDG_CACHE_HOME/Vostavo` when XDG overrides are set
- Override roots with `VOSTAVO_HOME` and `VOSTAVO_CACHE_HOME`
- Legacy roots and env vars from Speaking Studio remain supported for existing installs
- Repo-local app data is no longer used as a default on any platform; it remains available only through explicit developer overrides such as `VOSTAVO_HOME`, `SPEAKING_STUDIO_HOME`, `--app-data-dir`, or `--cache-dir`.
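The resolution order described above can be sketched roughly as follows. The real app delegates Windows paths to `platformdirs`; this stdlib-only version is an approximation for illustration, not the shipped implementation:

```python
import os
import sys
from pathlib import Path

APP = "Vostavo"

def app_data_root() -> Path:
    """Resolve the per-user app-data root (sketch of the documented order)."""
    if override := os.environ.get("VOSTAVO_HOME"):
        return Path(override)
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / APP
    if sys.platform.startswith("win"):
        # the real app resolves this via platformdirs with the
        # frommherz_it + Vostavo app identity
        return Path(os.environ.get("APPDATA", str(Path.home()))) / APP
    # Linux: honor XDG override, else ~/.local/share
    xdg = os.environ.get("XDG_DATA_HOME")
    return (Path(xdg) if xdg else Path.home() / ".local" / "share") / APP
```

The cache root follows the same shape with `VOSTAVO_CACHE_HOME`, `~/Library/Caches`, and `$XDG_CACHE_HOME`/`~/.cache`.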
The current app-data layout keeps one per-user root and separates user outputs from backend state inside it:
- app-data root: `backend_state.json`, `jobs/`, `logs/`, `reports/`, `recordings/`, `uploads/`, `tmp/`
- cache root: `whisper/`, `huggingface/`
Legacy `reports/jobs/` directories are migrated into the root-level `jobs/`
directory on backend startup when the new location is still empty.
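The migrate-if-empty behavior can be sketched as follows; the directory names come from the layout above, but the exact startup logic is assumed:

```python
import shutil
from pathlib import Path

def migrate_legacy_jobs(app_root: Path) -> None:
    """Move entries from legacy reports/jobs/ into root-level jobs/,
    but only when the new location is still empty (sketch, not the
    backend's actual migration code)."""
    legacy = app_root / "reports" / "jobs"
    target = app_root / "jobs"
    target.mkdir(parents=True, exist_ok=True)
    if any(target.iterdir()) or not legacy.is_dir():
        return  # new location already populated, or nothing to migrate
    for entry in legacy.iterdir():
        shutil.move(str(entry), str(target / entry.name))
```

Skipping the move when `jobs/` is non-empty keeps the migration idempotent and avoids clobbering data written by a newer backend.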
The wrapper writes reports/history under the app-data root by default, so the UI keeps working even when launched from an arbitrary working directory.
- Unit tests: `./scripts/run_tests.sh`
- Source coverage: `./scripts/run_coverage.sh`
- Full coverage (including tests): `./scripts/run_coverage.sh --full`
- The test and coverage wrappers always use the repo-local `.venv` via `./scripts/python.sh`, so they stay consistent even when a global `pytest` or `coverage` installation points at a different Python.
- Coverage outputs:
  - source mode: `coverage.json` + `htmlcov/`
  - full mode: `coverage.full.json` + `htmlcov-full/`
- OpenRouter integration (opt-in): `RUN_OPENROUTER_INTEGRATION=1 ./scripts/python.sh -m unittest tests.test_integration_openrouter -v`
- Optional sample-audio integration test (no microphone required): `RUN_AUDIO_INTEGRATION=1 WHISPER_MODEL=tiny ./scripts/python.sh -m unittest tests.test_sample_integration`
- Self-hosted real-ASR lane: `.github/workflows/real-asr-selfhosted.yml` runs the sample-audio integration on a self-hosted Apple Silicon runner with labels `self-hosted`, `macOS`, `ARM64`, `icosa-apple-ci`, `assess-speaking`. It warms the `faster-whisper` model cache first so the runner keeps a persistent local model between jobs. The runner still needs either Hugging Face access on first use or a preloaded Whisper model in its local cache. The workflow is manual (`workflow_dispatch`) by design so the real-ASR lane stays opt-in and does not slow down or destabilize the default hosted PR checks. Each run uploads an artifact bundle with the sample integration log, CLI output, saved report JSON/history, and a cache/runner metadata snapshot.
- End-to-end tests (Playwright + pytest): `./scripts/run_e2e.sh`
  - Traces, videos, and screenshots are saved automatically on failure in `test-results/` and `playwright-report/` (see Playwright Test and pytest-playwright).
  - The wrapper always uses the repo-local virtualenv and the Playwright-only pytest config, so plain `pytest` no longer depends on Playwright plugins being installed globally.
- Interactive research browser (Playwright CLI + dedicated Chrome profile): use `./scripts/playwright_research.sh open 'https://example.com'` for a stable, Playwright-owned Chrome profile under `.playwright/profiles/research`. Reuse it with `./scripts/playwright_research.sh snapshot`, `click`, `type`, and `run-code`. For CELI specifically, `./scripts/playwright_celi.sh open 'https://apps.unistrapg.it/cqpweb/celi/'` uses a separate dedicated profile under `.playwright/profiles/celi` so corpus logins do not mix with general research state. Quote URLs that contain `?`, and run commands sequentially (`open`, then `snapshot`, then `click`, etc.) rather than in parallel so the session has time to settle after navigation. To fully reset a profile, close the browser session and remove the matching directory under `.playwright/profiles/`.
- CELI harvesting CLI: after logging into CELI once with `./scripts/playwright_celi.sh`, use `./scripts/python.sh scripts/harvest_celi_queries.py matrix --terms casa,scuola,lavoro --levels B1,B2,C1,C2 --output tmp/celi_harvest/query_matrix.json` for query matrices, `./scripts/python.sh scripts/harvest_celi_queries.py frequency --term casa` for the frequency-breakdown page, and `./scripts/python.sh scripts/harvest_celi_queries.py export --term casa --level C2` for a metadata-rich concordance export. These commands reuse the dedicated Playwright CELI profile and write snapshots/downloads under `tmp/celi_harvest` plus `output/playwright/celi/`. For the checked-in Italian benchmark wordlist, run `./scripts/python.sh scripts/harvest_celi_queries.py manifest --manifest tests/fixtures/celi_wordlists/italian_core_benchmark_v1.json --output-dir tmp/celi_harvest` to produce a stable bundle with `bundle.json`, `query_matrix.tsv`, and `frequency_breakdowns.tsv`. Then rank terms by CEFR skew with `./scripts/python.sh scripts/harvest_celi_queries.py analyze --bundle tmp/celi_harvest/italian_celi_core_benchmark_v1/bundle.json`, which writes `skew_analysis.json` and `skew_ranking.tsv`.
- LIPS spoken-corpus pipeline: build the phase-1 included/excluded artifacts with `./scripts/python.sh scripts/build_lips_manifest.py '/tmp/Corpus LIPS/Corpus LIPS' --output-dir tmp/lips_manifest_real` and validate the resulting JSONL bundle with `./scripts/python.sh scripts/validate_lips_manifest.py tmp/lips_manifest_real`. The build writes `lips_sections_included.jsonl`, `lips_sections_excluded.jsonl`, `lips_build_report.json`, and `lips_review_sample.jsonl`. Strict validation is designed to block sign-off until a completed manual review file is supplied.
- LIPS review support: generate a fresh included/excluded review packet with `./scripts/python.sh scripts/review_lips_manifest.py prepare tmp/lips_manifest_real --included-sample-size 20 --excluded-sample-size 20` and summarize completed review files with `./scripts/python.sh scripts/review_lips_manifest.py summarize --included-review tmp/lips_manifest_real/lips_review_sample.jsonl --excluded-review tmp/lips_manifest_real/lips_excluded_audit_sample.jsonl`. This keeps the review loop low-fi and file-based: JSONL in, JSON summary out.
- GitHub Actions workflow (`.github/workflows/ci.yml`) runs both suites and installs the Chromium browser via `playwright install --with-deps chromium`.
- If Whisper model download fails behind a SOCKS proxy with an error mentioning `socksio`, reinstall dependencies from `requirements.txt` or run `python -m pip install socksio`.
- If Whisper cannot download models because the proxy or network blocks Hugging Face access, rerun once network access is available, or pre-download the requested faster-whisper model locally.
- The sample-audio integration test is intentionally opt-in and may skip when ASR runtime prerequisites or model downloads are unavailable.
- Default provider is OpenRouter.
- Use `--llm-timeout` or `LLM_TIMEOUT_SEC` to bound remote rubric requests.
- Ollama remains available as the local provider option.
- Other local options: `llama3.2:3b` (fast), `qwen2.5:14b` (stronger); pick according to RAM and speed requirements.
- Objective metrics include WPM, pauses (≥300 ms), filler count, cohesion markers, and a heuristic complexity index (relative clauses / conditionals).
- If the rubric path degrades or the detected language does not match the expected language, the structured report is marked with `requires_human_review: true`.
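As an illustration of the objective-metric layer, WPM and the ≥300 ms pause count can be derived from per-word timestamps such as those faster-whisper emits. This sketch is not the project's actual implementation; the input shape (`start`/`end` seconds per word) is an assumption:

```python
def speech_metrics(words: list[dict]) -> dict:
    """Compute WPM and long-pause count (gaps >= 300 ms) from word
    timestamps. Illustrative only; the real metrics layer also covers
    fillers, cohesion markers, and a complexity index."""
    if not words:
        return {"wpm": 0.0, "pauses": 0}
    duration = words[-1]["end"] - words[0]["start"]
    pauses = sum(
        1 for prev, cur in zip(words, words[1:])
        if cur["start"] - prev["end"] >= 0.3
    )
    wpm = len(words) / duration * 60 if duration > 0 else 0.0
    return {"wpm": round(wpm, 1), "pauses": pauses}
```

Deriving both numbers from the same timestamp stream keeps the metrics deterministic and independent of the LLM rubric path.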
MIT
Optionally, the CLI can upload the generated report to a Learning Management System (LMS) such as Canvas or Moodle. Pass the following flags to provide credentials and context:
| Flag | Description |
|---|---|
| `--lms-type` | `canvas` or `moodle` – provider name |
| `--lms-url` | Base URL of the LMS instance (e.g. `https://canvas.example.edu`) |
| `--lms-token` | Bearer/secret token for API access (optional when `CANVAS_TOKEN` or `MOODLE_TOKEN` is set) |
| `--lms-course-id` | Canvas course ID (required for `--lms-type canvas`) |
| `--lms-assign-id` | Assignment ID where the report should be posted |
| `--lms-score` | Optional numeric score to include in the submission |
| `--lms-dry-run` | Print the LMS request preview without uploading |
Example usage:
```
python assess_speaking.py sample.wav \
  --lms-type canvas \
  --lms-url https://canvas.example.edu \
  --lms-token $CANVAS_TOKEN \
  --lms-course-id 99 \
  --lms-assign-id 42 \
  --lms-score 75
```

Or use the provider token from the environment and validate the payload first:

```
export CANVAS_TOKEN=...
python assess_speaking.py sample.wav \
  --lms-type canvas \
  --lms-url https://canvas.example.edu \
  --lms-course-id 99 \
  --lms-assign-id 42 \
  --lms-score 75 \
  --lms-dry-run
```