@@ -51,6 +51,8 @@ We include x mark if the metric is auto-installed in versa.
 | 44 | x | Qwen2 Recording Environment - Quality | qwen2_recording_quality_metric | qwen2_recording_quality_metric | [Qwen2 Audio](https://github.com/QwenLM/Qwen2-Audio) | [paper](https://arxiv.org/abs/2407.10759) |
 | 45 | x | Qwen2 Recording Environment - Channel Type | qwen2_channel_type_metric | qwen2_channel_type_metric | [Qwen2 Audio](https://github.com/QwenLM/Qwen2-Audio) | [paper](https://arxiv.org/abs/2407.10759) |
 | 46 | x | Dimensional Emotion | w2v2_dimensional_emotion | w2v2_dimensional_emotion | [w2v2-how-to](https://github.com/audeering/w2v2-how-to) | [paper](https://arxiv.org/pdf/2203.07378) |
+| 47 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+


 ### Dependent Metrics
@@ -81,7 +83,8 @@ We include x mark if the metric is auto-installed in versa.
 | 23 | | Composite Objective Speech Quality (composite) | pysepm | pysepm_Csig, pysepm_Cbak, pysepm_Covl | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://ecs.utdallas.edu/loizou/speech/obj_paper_jan08.pdf) |
 | 24 | | Coherence and speech intelligibility index (CSII) | pysepm | pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://www.researchgate.net/profile/James-Kates-2/publication/7842209_Coherence_and_the_speech_intelligibility_index/links/546f5dab0cf2d67fc0310f88/Coherence-and-the-speech-intelligibility-index.pdf) |
 | 25 | | Normalized-covariance measure (NCM) | pysepm | pysepm_ncm | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://pmc.ncbi.nlm.nih.gov/articles/PMC3037773/pdf/JASMAN-000128-003715_1.pdf) |
-
+| 26 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Paired Reference | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+| 27 | x | Chroma-related Alignment | chroma_alignment | chroma_{stft,cqt,cens}_{cosine, euclidean}_dtw{"", _log, _raw} | - | - |

 ### Non-match Metrics

@@ -98,7 +101,8 @@ We include x mark if the metric is auto-installed in versa.
 | 9 | | Contrastive Language-Audio Pretraining Score (CLAP Score) | clap_score | clap_score | [fadtk](https://github.com/gudgud96/frechet-audio-distance) | [paper](https://arxiv.org/abs/2301.12661) |
 | 10 | | Accompaniment Prompt Adherence (APA) | apa | apa | [Sony-audio-metrics](https://github.com/SonyCSLParis/audio-metrics) | [paper](https://arxiv.org/abs/2404.00775) |
 | 11 | | Log Likelihood Ratio (LLR) | pysepm | pysepm_llr | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://ecs.utdallas.edu/loizou/speech/obj_paper_jan08.pdf) |
-
+| 12 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Paired Text | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+| 13 | | Singer Embedding Similarity | singer | singer_similarity | [SSL-Singer-Identity](https://github.com/SonyCSLParis/ssl-singer-identity) | [paper](https://hal.science/hal-04186048v1) |

 ### Distributional Metrics (in verifying)
