Merged
Commits
35 commits
- 13ac00b refactor versa with oo update -> a major update (ftshijt, Jun 16, 2025)
- 58eb805 Merge branch 'main' into refactor (ftshijt, Jun 19, 2025)
- 8d2f6f0 Merge branch 'main' into refactor (ftshijt, Jun 19, 2025)
- 8796810 add asvspoof.py (ftshijt, Jun 30, 2025)
- 53c3e0a Merge branch 'refactor' of https://github.com/ftshijt/versa into refa… (ftshijt, Jun 30, 2025)
- 7e95ee8 update discrets speech / chroma_alignment (ftshijt, Jun 30, 2025)
- b7b9dd4 update test function and versa with black and emo_vad (ftshijt, Jun 30, 2025)
- dcc1822 Merge branch 'main' into refactor (ftshijt, Jun 30, 2025)
- 8ff163a update emo_similarity (ftshijt, Jun 30, 2025)
- 78894ee fix metric list and set setup.py (ftshijt, Jun 30, 2025)
- e5e10bf fix setup.py (ftshijt, Jun 30, 2025)
- 892a13b fix scorer shared for all cases (ftshijt, Jul 5, 2025)
- ce9f828 update code multiple new metrics (ftshijt, Jul 5, 2025)
- e95cd4b fix versa/test for test functions (ftshijt, Jul 5, 2025)
- f4799fd add pam fixed (ftshijt, Jul 5, 2025)
- 03ccbda add pesq (ftshijt, Jul 5, 2025)
- 20e155d Migrate base metrics to OO interface (ftshijt, Apr 29, 2026)
- b6f50f1 Migrate VAD metric to OO interface (ftshijt, Apr 29, 2026)
- fcdd9af Migrate additional utterance metrics (ftshijt, Apr 29, 2026)
- 61cc53f Fix metric migration real setup (ftshijt, Apr 29, 2026)
- e7d494b Merge pull request #1 from wavlab-speech/codex/pr-37-refactor (ftshijt, Apr 29, 2026)
- 4e6913a Restore legacy metric support (ftshijt, Apr 30, 2026)
- 1454f78 Restore legacy scorer compatibility (ftshijt, May 5, 2026)
- 404fc77 Use local cache for ESPnet metrics (ftshijt, May 5, 2026)
- bc32cbb Fix legacy metric setup paths (ftshijt, May 5, 2026)
- 7a1ee6f Route Hugging Face metric caches locally (ftshijt, May 5, 2026)
- feea48d Fix legacy metric installers and pipeline baselines (ftshijt, May 5, 2026)
- 2a7d7ad Clean up metric cache installers (ftshijt, May 5, 2026)
- 200bb4a Clean up singer identity cache installer (ftshijt, May 5, 2026)
- 22410f4 Merge pull request #2 from wavlab-speech/codex/pr-37-refactor (ftshijt, May 5, 2026)
- 19f270a Merge main metric additions into refactor interface (ftshijt, May 5, 2026)
- 76b9da9 Avoid WVMOS import-time downloads (ftshijt, May 5, 2026)
- b788c1e Fix PR 37 CI failures (ftshijt, May 5, 2026)
- f2f4c81 Merge upstream main into refactor (ftshijt, May 6, 2026)
- bcec446 Make README example commands runnable (ftshijt, May 6, 2026)
10 changes: 10 additions & 0 deletions .gitignore
@@ -169,4 +169,14 @@ fadtk/
scoreq/
fairseq/
UTMOSv2/

# Versa optional metric installer output and model caches
versa_cache/
tools/NISQA/
tools/Noresqa/
tools/SRMRpy/
tools/audiobox-aesthetics/
tools/emotion2vec/
ssl-singer-identity/
pretrained_models/
wvmos/
12 changes: 6 additions & 6 deletions README.md
@@ -55,10 +55,10 @@ For metrics marked without "x" in the "Auto-Install" column of our metrics table

```bash
# Test core functionality
-python versa/test/test_pipeline/test_general.py
+python -m pytest test/test_general.py

# Test specific metrics that require additional installation
-python versa/test/test_pipeline/test_{metric}.py
+python -m pytest test/test_metrics/test_{metric}.py
```


@@ -69,31 +69,31 @@ python versa/test/test_pipeline/test_{metric}.py
```bash
# Direct usage with file paths
python versa/bin/scorer.py \
-    --score_config egs/speech.yaml \
+    --score_config egs/speech_cpu.yaml \
--gt test/test_samples/test1 \
--pred test/test_samples/test2 \
--output_file test_result \
--io dir

# With SCP-style input
python versa/bin/scorer.py \
-    --score_config egs/speech.yaml \
+    --score_config egs/speech_cpu.yaml \
--gt test/test_samples/test1.scp \
--pred test/test_samples/test2.scp \
--output_file test_result \
--io soundfile

# With Kaldi-ARK style input (compatible with ESPnet)
python versa/bin/scorer.py \
-    --score_config egs/speech.yaml \
+    --score_config egs/speech_cpu.yaml \
--gt test/test_samples/test1.scp \
--pred test/test_samples/test2.scp \
--output_file test_result \
--io kaldi

# Including text transcription information
python versa/bin/scorer.py \
-    --score_config egs/separate_metrics/wer.yaml \
+    --score_config egs/separate_metrics/wer_tiny.yaml \
--gt test/test_samples/test1.scp \
--pred test/test_samples/test2.scp \
--output_file test_result \
151 changes: 151 additions & 0 deletions docs/metric_migration.md
@@ -0,0 +1,151 @@
# Metric Migration Guide

This guide summarizes the preferred process for migrating existing Versa metrics
to the new object-oriented metric interface.

## Migration Goal

Use `versa.definition.BaseMetric` as the source of truth for metric
implementations. Preserve user-facing behavior, but do not preserve legacy
internal helper APIs unless they are still needed by public callers.

Preserve:

- YAML metric names
- CLI/scorer behavior
- output score keys
- documented config defaults
- optional dependency behavior

Clean up:

- old function-style metric internals
- duplicated setup code
- eager optional dependency imports
- tests that only exercise legacy helper functions

## Required Metric Shape

Each migrated metric should provide:

- a `BaseMetric` subclass
- `_setup(self)` for config defaults, dependency checks, and model setup
- `compute(self, predictions, references=None, metadata=None)` for scoring
- `get_metadata(self)` returning `MetricMetadata`
- `register_<metric>_metric(registry)` as the registry integration point

`compute` should:

- validate required inputs
- read sample rate from `metadata.get("sample_rate", 16000)` when needed
- return the same output keys users already receive
- avoid changing user-visible numeric conventions unless the migration requires it
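A minimal sketch of this shape, assuming a toy RMS-energy metric; `BaseMetric` and `MetricMetadata` are stubbed locally so the example is self-contained, and the real classes in `versa.definition` may differ in signature:

```python
from dataclasses import dataclass, field

# Local stand-ins for versa.definition.BaseMetric / MetricMetadata;
# the real interfaces may carry more fields and different signatures.
@dataclass
class MetricMetadata:
    name: str
    requires_reference: bool = False
    aliases: list = field(default_factory=list)

class BaseMetric:
    def __init__(self, config=None):
        self.config = config or {}
        self._setup()

    def _setup(self):
        pass

class RMSEnergyMetric(BaseMetric):
    """Toy independent metric: RMS energy of the predicted signal."""

    def _setup(self):
        # config defaults and dependency checks belong here
        self.eps = float(self.config.get("eps", 1e-12))

    def compute(self, predictions, references=None, metadata=None):
        metadata = metadata or {}
        sample_rate = metadata.get("sample_rate", 16000)  # per the guide; unused by this toy metric
        if not predictions:
            raise ValueError("predictions must be a non-empty sequence")
        rms = (sum(x * x for x in predictions) / len(predictions)) ** 0.5
        # return the same output key users already receive
        return {"rms_energy": rms}

    def get_metadata(self):
        return MetricMetadata(name="rms_energy", aliases=["rms"])

def register_rms_energy_metric(registry):
    # registry integration point; the real registry API may differ
    registry["rms_energy"] = RMSEnergyMetric
```

The point of the sketch is the division of labor: `_setup` owns configuration and dependencies, `compute` owns validation and scoring, and the register function is the only place the registry is touched.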

## Metadata Checklist

Every metric registration should define:

- canonical metric name
- `MetricCategory`: `INDEPENDENT`, `DEPENDENT`, `NON_MATCH`, or `DISTRIBUTIONAL`
- `MetricType`: usually `FLOAT` for one score or `DICT` for grouped scores
- `requires_reference`
- `requires_text`
- `gpu_compatible`
- `auto_install`
- dependency import names
- short description
- paper reference and implementation source when known
- useful aliases for existing YAML or common names
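As an illustration only, here is the checklist written out as plain fields for PESQ (a dependent, auto-installed metric per the tables in `docs/supported_metrics.md`); this is a sketch, not versa's actual `MetricMetadata` signature, and the `gpu_compatible` value and `pesq_score` alias are assumptions:

```python
# Field names mirror the checklist above; treat this as a sketch, not
# versa's actual MetricMetadata constructor.
pesq_metadata = dict(
    name="pesq",
    category="DEPENDENT",      # MetricCategory.DEPENDENT in the real enum
    metric_type="FLOAT",       # one score per utterance
    requires_reference=True,   # PESQ compares prediction against reference
    requires_text=False,
    gpu_compatible=False,      # assumption for this sketch
    auto_install=True,
    dependencies=["pesq"],     # import name of the optional package
    description="Perceptual Evaluation of Speech Quality",
    paper="https://ieeexplore.ieee.org/document/941023",
    aliases=["pesq_score"],    # hypothetical alias, for illustration
)
```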

## Optional Dependencies

Optional dependencies must not break `import versa`.

Use guarded imports inside metric modules, and raise a clear `ImportError` from
`_setup` when a required optional package is missing. Register optional metrics
from `versa/__init__.py` through `_optional_metric_import(...)`.
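The guarded-import pattern can be sketched as follows; `some_optional_pkg`, the class name, and the error text are placeholders:

```python
# Guarded import: the module itself always imports cleanly,
# so `import versa` stays safe without the optional package.
try:
    import some_optional_pkg  # hypothetical optional dependency
except ImportError:
    some_optional_pkg = None

class SomeOptionalMetric:
    def _setup(self):
        # fail loudly only when the metric is actually set up
        if some_optional_pkg is None:
            raise ImportError(
                "some_optional_pkg is required for SomeOptionalMetric; "
                "install it with the matching tools/ installer script."
            )
```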

## Tests

Prefer tests for the new public path:

- metric class behavior
- registry registration and aliases
- `VersaScorer` pipeline behavior with existing sample audio when lightweight
- missing optional dependency behavior
- unchanged user-facing output keys

Do not add tests solely to preserve old internal helper APIs unless those APIs
remain part of the public interface.

Base-install focused tests currently live in:

- `test/test_metrics/test_base_metrics.py`
- `test/test_pipeline/test_base_metrics_pipeline.py`
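A lightweight test along these lines, with the registry stubbed locally (versa's real registry API may differ), could check registration, aliases, and stable output keys in one place:

```python
class FakeRegistry:
    """Minimal stand-in for versa's metric registry."""

    def __init__(self):
        self._metrics = {}

    def register(self, name, cls, aliases=()):
        self._metrics[name] = cls
        for alias in aliases:
            self._metrics[alias] = cls

    def get(self, name):
        return self._metrics[name]

def test_registration_aliases_and_keys():
    registry = FakeRegistry()

    class DummyMetric:
        def compute(self, predictions, references=None, metadata=None):
            return {"dummy_score": 0.0}  # user-facing key must stay stable

    registry.register("dummy", DummyMetric, aliases=("dummy_legacy",))
    # canonical name and legacy alias resolve to the same class
    assert registry.get("dummy") is registry.get("dummy_legacy")
    # output keys unchanged for users
    assert set(DummyMetric().compute([0.1])) == {"dummy_score"}
```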

## Migration Candidates

The following modules still appear to use the old interface because they do not
define or import `BaseMetric`. This list is based on a repository scan and should
be updated as each metric is migrated.

### Corpus and Distributional Metrics

- `versa/corpus_metrics/fad.py`
- `versa/corpus_metrics/individual_fad.py`
- `versa/corpus_metrics/kid.py`
- `versa/corpus_metrics/clap_score.py`

### Already Migrated Examples

Use these as local references when migrating the remaining metrics:

- `versa/sequence_metrics/mcd_f0.py`
- `versa/sequence_metrics/signal_metric.py`
- `versa/sequence_metrics/warpq.py`
- `versa/corpus_metrics/espnet_wer.py`
- `versa/corpus_metrics/owsm_wer.py`
- `versa/corpus_metrics/whisper_wer.py`
- `versa/utterance_metrics/log_wmse.py`
- `versa/utterance_metrics/pseudo_mos.py`
- `versa/utterance_metrics/qwen2_audio.py`
- `versa/utterance_metrics/qwen_omni.py`
- `versa/utterance_metrics/speaking_rate.py`
- `versa/utterance_metrics/scoreq.py`
- `versa/utterance_metrics/se_snr.py`
- `versa/utterance_metrics/sheet_ssqa.py`
- `versa/utterance_metrics/singer.py`
- `versa/utterance_metrics/speaker.py`
- `versa/utterance_metrics/stoi.py`
- `versa/utterance_metrics/pesq_score.py`
- `versa/utterance_metrics/squim.py`
- `versa/utterance_metrics/universa.py`
- `versa/utterance_metrics/vad.py`
- `versa/utterance_metrics/visqol_score.py`
- `versa/utterance_metrics/vqscore.py`

## Verification

Run focused checks before broader validation:

```bash
/opt/homebrew/bin/mamba run -n versa-dev python -m pytest <focused tests> -q
/opt/homebrew/bin/mamba run -n versa-dev python -m black --check <touched files>
/opt/homebrew/bin/mamba run -n versa-dev python -m flake8 <touched files>
```

The base migration tests use mocks for heavy model-backed metrics. They validate
registry integration, pipeline wiring, input handling, and output keys, but they
do not exercise checkpoint downloads or real inference.

Run optional real model checks locally after installing the metric dependencies:

```bash
tools/install_scoreq.sh
VERSA_RUN_REAL_MODEL_TESTS=1 \
/opt/homebrew/bin/mamba run -n versa-dev python -m pytest \
test/test_pipeline/test_scoreq.py -q -s
```

These tests are marked `real_model` and are skipped unless
`VERSA_RUN_REAL_MODEL_TESTS=1` is set.
27 changes: 12 additions & 15 deletions docs/supported_metrics.md
@@ -13,7 +13,7 @@ We include x mark if the metric is auto-installed in versa.
| 6 | x | PESQ in TorchAudio-Squim | squim_no_ref | torch_squim_pesq | [torch_squim](https://pytorch.org/audio/main/tutorials/squim_tutorial.html) | [paper](https://arxiv.org/abs/2304.01448) |
| 7 | x | STOI in TorchAudio-Squim | squim_no_ref | torch_squim_stoi | [torch_squim](https://pytorch.org/audio/main/tutorials/squim_tutorial.html) | [paper](https://arxiv.org/abs/2304.01448) |
| 8 | x | SI-SDR in TorchAudio-Squim | squim_no_ref | torch_squim_si_sdr | [torch_squim](https://pytorch.org/audio/main/tutorials/squim_tutorial.html) | [paper](https://arxiv.org/abs/2304.01448) |
-| 9 | x | Singing voice MOS | pseudo_mos | singmos_v1 |[singmos](https://github.com/South-Twilight/SingMOS) | [paper](https://arxiv.org/abs/2406.10911) |
+| 9 | x | Singing voice MOS | singmos | singmos |[singmos](https://github.com/South-Twilight/SingMOS/tree/main) | [paper](https://arxiv.org/abs/2406.10911) |
| 10 | x | Sheet SSQA MOS Models | sheet_ssqa | sheet_ssqa |[Sheet](https://github.com/unilight/sheet/tree/main) | [paper](https://arxiv.org/abs/2411.03715) |
| 11 | | UTMOSv2: UTokyo-SaruLab MOS Prediction System | utmosv2 | utmosv2 |[UTMOSv2](https://github.com/sarulab-speech/UTMOSv2) | [paper](https://arxiv.org/abs/2409.09305) |
| 12 | | Speech Contrastive Regression for Quality Assessment without reference (ScoreQ) | scoreq_nr | scoreq_nr |[ScoreQ](https://github.com/ftshijt/scoreq/tree/main) | [paper](https://arxiv.org/pdf/2410.06675) |
@@ -50,9 +50,9 @@ We include x mark if the metric is auto-installed in versa.
| 43 | x | Qwen2 Recording Environment - Background | qwen2_speech_background_environment_metric | qwen2_speech_background_environment_metric | [Qwen2 Audio](https://github.com/QwenLM/Qwen2-Audio) | [paper](https://arxiv.org/abs/2407.10759) |
| 44 | x | Qwen2 Recording Environment - Quality | qwen2_recording_quality_metric | qwen2_recording_quality_metric | [Qwen2 Audio](https://github.com/QwenLM/Qwen2-Audio) | [paper](https://arxiv.org/abs/2407.10759) |
| 45 | x | Qwen2 Recording Environment - Channel Type | qwen2_channel_type_metric | qwen2_channel_type_metric | [Qwen2 Audio](https://github.com/QwenLM/Qwen2-Audio) | [paper](https://arxiv.org/abs/2407.10759) |
-| 46 | x | Dimensional Emotion | w2v2_dimensional_emotion | w2v2_dimensional_emotion | [w2v2-how-to](https://github.com/audeering/w2v2-how-to) | [paper](https://arxiv.org/pdf/2203.07378) |
-| 47 | | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) - No Reference | universa_noref | universa_score | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
-| 48 | | ARECHO (Audio Reference Echo Cancellation and Codec Quality Assessment) - No Reference | arecho_noref | arecho_score | [ARECHO](https://huggingface.co/espnet/arecho_base_v0) | [paper](https://arxiv.org/abs/2505.20741) |
+| 46 | x | Dimensional Emotion | emo_vad | arousal_emo_vad, valence_emo_vad, dominance_emo_vad | [w2v2-how-to](https://github.com/audeering/w2v2-how-to) | [paper](https://arxiv.org/pdf/2203.07378) |
+| 47 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) | universa, universa_noref, universa_audioref, universa_textref, universa_fullref | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+| 48 | | ARECHO (Audio Reference Echo Cancellation and Codec Quality Assessment) - No Reference | arecho, arecho_noref | arecho_{sub_metrics} | [ARECHO](https://huggingface.co/espnet/arecho_base_v0) | [paper](https://arxiv.org/abs/2505.20741) |
| 49 | x | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech | pseudo_mos | dnsmos_pro_bvcc | [DNSMOSPro](https://github.com/fcumlin/DNSMOSPro/tree/main) | [paper](https://www.isca-archive.org/interspeech_2024/cumlin24_interspeech.html) |
| 50 | x | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech | pseudo_mos | dnsmos_pro_nisqa | [DNSMOSPro](https://github.com/fcumlin/DNSMOSPro/tree/main) | [paper](https://www.isca-archive.org/interspeech_2024/cumlin24_interspeech.html) |
| 51 | x | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech | pseudo_mos | dnsmos_pro_vcc2018 | [DNSMOSPro](https://github.com/fcumlin/DNSMOSPro/tree/main) | [paper](https://www.isca-archive.org/interspeech_2024/cumlin24_interspeech.html) |
@@ -61,6 +61,7 @@ We include x mark if the metric is auto-installed in versa.
| 54 | x | VQScore (Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech) | vqscore | vqscore | [VQScore](https://github.com/JasonSWFu/VQscore) | [paper](https://arxiv.org/abs/2402.16321) |
| 55 | x | Singing voice MOS | pseudo_mos | singmos_pro |[singmos](https://github.com/South-Twilight/SingMOS) | [paper](https://arxiv.org/abs/2510.01812) |


### Dependent Metrics
|Number| Auto-Install | Metric Name (Auto-Install) | Key in config | Key in report | Code Source | References |
|---|---|------------------|---------------|---------------|-----------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
@@ -70,7 +71,7 @@ We include x mark if the metric is auto-installed in versa.
| 4 | x | Signal-to-interference Ratio (SIR) | signal_metric | sir | [espnet](https://github.com/espnet/espnet) | - |
| 5 | x | Signal-to-artifact Ratio (SAR) | signal_metric | sar | [espnet](https://github.com/espnet/espnet) | - |
| 6 | x | Signal-to-distortion Ratio (SDR) | signal_metric | sdr | [espnet](https://github.com/espnet/espnet) | - |
-| 7 | x | Convolutional scale-invariant signal-to-distortion ratio (CI-SDR) | signal_metric | ci-sdr | [ci_sdr](https://github.com/fgnt/ci_sdr) | [paper](https://arxiv.(org/abs/2011.15003) |
+| 7 | x | Convolutional scale-invariant signal-to-distortion ratio (CI-SDR) | signal_metric | ci-sdr | [ci_sdr](https://github.com/fgnt/ci_sdr) | [paper](https://arxiv.org/abs/2011.15003) |
| 8 | x | Scale-invariant signal-to-noise ratio (SI-SNR) | signal_metric | si-snr | [espnet](https://github.com/espnet/espnet) | [paper](https://arxiv.org/abs/1711.00541) |
| 9 | x | Perceptual Evaluation of Speech Quality (PESQ) | pesq | pesq | [pesq](https://pypi.org/project/pesq/) | [paper](https://ieeexplore.ieee.org/document/941023) |
| 10 | x | Short-Time Objective Intelligibility (STOI) | stoi | stoi | [pystoi](https://github.com/mpariente/pystoi) | [paper](https://ieeexplore.ieee.org/document/5495701) |
@@ -89,11 +90,10 @@ We include x mark if the metric is auto-installed in versa.
| 23 | | Composite Objective Speech Quality (composite) | pysepm | pysepm_Csig, pysepm_Cbak, pysepm_Covl | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://ecs.utdallas.edu/loizou/speech/obj_paper_jan08.pdf)|
| 24 | | Coherence and speech intelligibility index (CSII) | pysepm | pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://www.researchgate.net/profile/James-Kates-2/publication/7842209_Coherence_and_the_speech_intelligibility_index/links/546f5dab0cf2d67fc0310f88/Coherence-and-the-speech-intelligibility-index.pdf)|
| 25 | | Normalized-covariance measure (NCM) | pysepm | pysepm_ncm | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://pmc.ncbi.nlm.nih.gov/articles/PMC3037773/pdf/JASMAN-000128-003715_1.pdf)|
-| 26 | | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Audio Reference | universa_audioref | universa_score | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
-| 27 | | ARECHO (Audio Reference Echo Cancellation and Codec Quality Assessment) with Audio Reference | arecho_audioref | arecho_score | [ARECHO](https://huggingface.co/espnet/arecho_base_v0) | [paper](https://arxiv.org/abs/2505.20741) |
-| 28 | x | Chroma-related Alignment | chroma_alignment | chroma_{stft,cqt,cens}_{cosine, euclidean}_dtw{"", _log, _raw} | - | - |
-| 29 | x | Deep Perceptual Audio Metric (DPAM) | dpam | dpam_distance | [PerceptualAudio_Pytorch](https://github.com/adrienchaton/PerceptualAudio_pytorch) | [paper](https://arxiv.org/abs/2001.04460) |
-| 30 | x | Contrastive learning-based Deep Perceptual Audio Metric (CDPAM) | cdpam | cdpam_distance | [PerceptualAudio](https://github.com/pranaymanocha/PerceptualAudio/cdpam) | [paper](https://arxiv.org/abs/2102.05109) |
+| 26 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Paired Reference | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+| 27 | x | Chroma-related Alignment | chroma_alignment | chroma_{stft,cqt,cens}_{cosine, euclidean}_dtw{"", _log, _raw} | - | - |
+| 28 | x | Deep Perceptual Audio Metric (DPAM) | dpam | dpam_distance | [PerceptualAudio_Pytorch](https://github.com/adrienchaton/PerceptualAudio_pytorch) | [paper](https://arxiv.org/abs/2001.04460) |
+| 29 | x | Contrastive learning-based Deep Perceptual Audio Metric (CDPAM) | cdpam | cdpam_distance | [PerceptualAudio](https://github.com/pranaymanocha/PerceptualAudio/cdpam) | [paper](https://arxiv.org/abs/2102.05109) |


### Non-match Metrics
@@ -111,11 +111,8 @@ We include x mark if the metric is auto-installed in versa.
| 9 | | Contrastive Language-Audio Pretraining Score (CLAP Score) | clap_score | clap_score | [fadtk](https://github.com/gudgud96/frechet-audio-distance) | [paper](https://arxiv.org/abs/2301.12661) |
| 10 | | Accompaniment Prompt Adherence (APA) | apa | apa | [Sony-audio-metrics](https://github.com/SonyCSLParis/audio-metrics) | [paper](https://arxiv.org/abs/2404.00775) |
| 11 | | Log Likelihood Ratio (LLR) | pysepm | pysepm_llr | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://ecs.utdallas.edu/loizou/speech/obj_paper_jan08.pdf)|
-| 12 | | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Text Reference | universa_textref | universa_score | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
-| 13 | | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Full Reference | universa_fullref | universa_score | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
-| 14 | | ARECHO (Audio Reference Echo Cancellation and Codec Quality Assessment) with Text Reference | arecho_textref | arecho_score | [ARECHO](https://huggingface.co/espnet/arecho_base_v0) | [paper](https://arxiv.org/abs/2505.20741) |
-| 15 | | ARECHO (Audio Reference Echo Cancellation and Codec Quality Assessment) with Full Reference | arecho_fullref | arecho_score | [ARECHO](https://huggingface.co/espnet/arecho_base_v0) | [paper](https://arxiv.org/abs/2505.20741) |
-| 16 | | Singer Embedding Similarity | singer | singer_similarity | [SSL-Singer-Identity](https://github.com/SonyCSLParis/ssl-singer-identity) | [paper](https://hal.science/hal-04186048v1) |
+| 12 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Paired Text | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+| 13 | | Singer Embedding Similarity | singer | singer_similarity | [SSL-Singer-Identity](https://github.com/SonyCSLParis/ssl-singer-identity) | [paper](https://hal.science/hal-04186048v1) |

### Distributional Metrics (in verifying)

2 changes: 1 addition & 1 deletion egs/demo/se.yaml
@@ -90,4 +90,4 @@
# --nisqa_loud_pred: NISQA loudness prediction
# NOTE(jiatong): pretrain model can be downloaded with `./tools/setup_nisqa.sh`
- name: nisqa
-  nisqa_model_path: ./tools/NISQA/weights/nisqa.tar
+  nisqa_model_path: versa_cache/nisqa/nisqa.tar
5 changes: 5 additions & 0 deletions egs/separate_metrics/cdpam_distance.yaml
@@ -0,0 +1,5 @@
# CDPAM distance metrics
# CDPAM distance between audio samples
# More info in https://github.com/pranaymanocha/PerceptualAudio
# -- cdpam_distance: the CDPAM distance between audio samples
- name: cdpam_distance