Commit 4767f78

Merge branch 'main' into dpam
2 parents fa4d140 + 34b37cb commit 4767f78

22 files changed

Lines changed: 3397 additions & 438 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -8,17 +8,17 @@
 [![arXiv](https://img.shields.io/badge/arXiv-2412.17667-b31b1b.svg)](https://arxiv.org/abs/2412.17667)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
-VERSA (Versatile Evaluation of Speech and Audio) is a comprehensive toolkit for evaluating speech and audio quality. It provides seamless access to over 80 evaluation/profiling metrics with 10x variants, enabling researchers and developers to assess audio quality through multiple dimensions.
+VERSA (Versatile Evaluation of Speech and Audio) is a comprehensive toolkit for evaluating speech and audio quality. It provides seamless access to over 90 evaluation/profiling metrics with 10x variants, enabling researchers and developers to assess audio quality through multiple dimensions.
 
 ## 🚨 Exciting News
 - Jun 2025 - Update launch scripts for local machine to support multi-process/multi-gpu (automatic rank assignment) for VERSA.
 - May 2025 – VERSA presented at NAACL 2025, showcasing its unified multi-metric evaluation framework for speech and audio ([🎥 Presentation Video](https://www.youtube.com/watch?v=e7TdOlzyJcE))
 - Feb 2025 – Integrated support for Qwen2-Audio-based perceptual metrics, extending VERSA's capacity for LLM-informed audio quality profiling
-- Dec 2024 – Official release of VERSA v1.0, featuring 80+ evaluation metrics and full integration with ESPnet and Slurm-based distributed evaluation
+- Dec 2024 – Official release of VERSA v1.0, featuring 90+ evaluation metrics and full integration with ESPnet and Slurm-based distributed evaluation
 
 ## 🚀 Features
 
-- **Comprehensive**: 80+ metrics covering perceptual quality, intelligibility, and technical measurements (check [full metrics documentation](https://github.com/wavlab-speech/versa/blob/main/docs/supported_metrics.md) for a complete list)
+- **Comprehensive**: 90+ metrics covering perceptual quality, intelligibility, and technical measurements (check [full metrics documentation](https://github.com/wavlab-speech/versa/blob/main/docs/supported_metrics.md) for a complete list)
 - **Integrated**: Tightly integrated with [ESPnet](https://github.com/espnet/espnet.git)
 - **Flexible**: Support for various input formats (file paths, SCP files, Kaldi-style ARKs)
 - **Scalable**: Built-in support for distributed evaluation using Slurm

docs/supported_metrics.md

Lines changed: 8 additions & 3 deletions
@@ -51,6 +51,8 @@ We include x mark if the metric is auto-installed in versa.
 | 44 | x | Qwen2 Recording Environment - Quality | qwen2_recording_quality_metric | qwen2_recording_quality_metric | [Qwen2 Audio](https://github.com/QwenLM/Qwen2-Audio) | [paper](https://arxiv.org/abs/2407.10759) |
 | 45 | x | Qwen2 Recording Environment - Channel Type | qwen2_channel_type_metric | qwen2_channel_type_metric | [Qwen2 Audio](https://github.com/QwenLM/Qwen2-Audio) | [paper](https://arxiv.org/abs/2407.10759) |
 | 46 | x | Dimensional Emotion | w2v2_dimensional_emotion | w2v2_dimensional_emotion | [w2v2-how-to](https://github.com/audeering/w2v2-how-to) | [paper](https://arxiv.org/pdf/2203.07378) |
+| 47 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+
 
 
 ### Dependent Metrics
@@ -81,8 +83,10 @@ We include x mark if the metric is auto-installed in versa.
 | 23 | | Composite Objective Speech Quality (composite) | pysepm | pysepm_Csig, pysepm_Cbak, pysepm_Covl | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://ecs.utdallas.edu/loizou/speech/obj_paper_jan08.pdf)|
 | 24 | | Coherence and speech intelligibility index (CSII) | pysepm | pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://www.researchgate.net/profile/James-Kates-2/publication/7842209_Coherence_and_the_speech_intelligibility_index/links/546f5dab0cf2d67fc0310f88/Coherence-and-the-speech-intelligibility-index.pdf)|
 | 25 | | Normalized-covariance measure (NCM) | pysepm | pysepm_ncm | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://pmc.ncbi.nlm.nih.gov/articles/PMC3037773/pdf/JASMAN-000128-003715_1.pdf)|
-| 26 | x | Deep Perceptual Audio Metric (DPAM) | dpam | dpam_distance | [PerceptualAudio_Pytorch](https://github.com/adrienchaton/PerceptualAudio_pytorch) | [paper](https://arxiv.org/abs/2001.04460) |
-| 27 | x | Contrastive learning-based Deep Perceptual Audio Metric (CDPAM) | cdpam | cdpam_distance | [PerceptualAudio](https://github.com/pranaymanocha/PerceptualAudio/cdpam) | [paper](https://arxiv.org/abs/2102.05109) |
+| 26 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Paired Reference | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+| 27 | x | Chroma-related Alignment | chroma_alignment | chroma_{stft,cqt,cens}_{cosine, euclidean}_dtw{"", _log, _raw} | - | - |
+| 28 | x | Deep Perceptual Audio Metric (DPAM) | dpam | dpam_distance | [PerceptualAudio_Pytorch](https://github.com/adrienchaton/PerceptualAudio_pytorch) | [paper](https://arxiv.org/abs/2001.04460) |
+| 29 | x | Contrastive learning-based Deep Perceptual Audio Metric (CDPAM) | cdpam | cdpam_distance | [PerceptualAudio](https://github.com/pranaymanocha/PerceptualAudio/cdpam) | [paper](https://arxiv.org/abs/2102.05109) |
 
 
 ### Non-match Metrics
@@ -100,7 +104,8 @@ We include x mark if the metric is auto-installed in versa.
 | 9 | | Contrastive Language-Audio Pretraining Score (CLAP Score) | clap_score | clap_score | [fadtk](https://github.com/gudgud96/frechet-audio-distance) | [paper](https://arxiv.org/abs/2301.12661) |
 | 10 | | Accompaniment Prompt Adherence (APA) | apa | apa | [Sony-audio-metrics](https://github.com/SonyCSLParis/audio-metrics) | [paper](https://arxiv.org/abs/2404.00775) |
 | 11 | | Log Likelihood Ratio (LLR) | pysepm | pysepm_llr | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://ecs.utdallas.edu/loizou/speech/obj_paper_jan08.pdf)|
-
+| 12 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Paired Text | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+| 13 | | Singer Embedding Similarity | singer | singer_similarity | [SSL-Singer-Identity](https://github.com/SonyCSLParis/ssl-singer-identity) | [paper](https://hal.science/hal-04186048v1) |
 
 ### Distributional Metrics (in verifying)
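The new Chroma-related Alignment row abbreviates its output keys as `chroma_{stft,cqt,cens}_{cosine, euclidean}_dtw{"", _log, _raw}`. Assuming that shorthand is a plain Cartesian product of the three groups (the table does not say so explicitly), it would stand for 18 key names. The helper `expand_chroma_keys` below is hypothetical and only illustrates the expansion; it is not part of VERSA.

```python
from itertools import product

# The three groups as written in the table row (assumed to combine freely).
FEATURES = ("stft", "cqt", "cens")
DISTANCES = ("cosine", "euclidean")
SUFFIXES = ("", "_log", "_raw")


def expand_chroma_keys():
    """Return every key name implied by the brace shorthand."""
    return [
        f"chroma_{feature}_{distance}_dtw{suffix}"
        for feature, distance, suffix in product(FEATURES, DISTANCES, SUFFIXES)
    ]


if __name__ == "__main__":
    for name in expand_chroma_keys():
        print(name)
```

Whether every combination is actually emitted depends on the metric's configuration, so treat the list as an upper bound on the keys to expect.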

launch_local.sh

Lines changed: 20 additions & 18 deletions
@@ -351,16 +351,17 @@ get_next_gpu_rank() {
 
 # Function to wait for available GPU slot
 wait_for_gpu_slot() {
+    local idx
     while [ ${#gpu_job_pids[@]} -ge $GPU_MAX_PARALLEL ]; do
         # Check for completed GPU jobs
-        for i in "${!gpu_job_pids[@]}"; do
-            if ! kill -0 "${gpu_job_pids[$i]}" 2>/dev/null; then
+        for idx in "${!gpu_job_pids[@]}"; do
+            if ! kill -0 "${gpu_job_pids[$idx]}" 2>/dev/null; then
                 # Job has finished
-                wait "${gpu_job_pids[$i]}"
+                wait "${gpu_job_pids[$idx]}"
                 local exit_code=$?
 
                 # Extract GPU rank from job info and free it
-                local job_info="${gpu_job_info[$i]}"
+                local job_info="${gpu_job_info[$idx]}"
                 if [[ $job_info =~ GPU_RANK:([0-9]+) ]]; then
                     local freed_rank="${BASH_REMATCH[1]}"
                     # Remove freed rank from in-use array
@@ -374,14 +375,14 @@ wait_for_gpu_slot() {
                 fi
 
                 if [ $exit_code -eq 0 ]; then
-                    echo "${gpu_job_info[$i]}" >> "${COMPLETED_JOBS_FILE}"
+                    echo "${gpu_job_info[$idx]}" >> "${COMPLETED_JOBS_FILE}"
                 else
-                    echo "${gpu_job_info[$i]}" >> "${FAILED_JOBS_FILE}"
+                    echo "${gpu_job_info[$idx]}" >> "${FAILED_JOBS_FILE}"
                 fi
 
                 # Remove from tracking arrays
-                unset gpu_job_pids[$i]
-                unset gpu_job_info[$i]
+                unset gpu_job_pids[$idx]
+                unset gpu_job_info[$idx]
 
                 # Rebuild arrays to remove gaps
                 gpu_job_pids=("${gpu_job_pids[@]}")
@@ -395,23 +396,24 @@ wait_for_gpu_slot() {
 
 # Function to wait for available CPU slot
 wait_for_cpu_slot() {
+    local idx
     while [ ${#cpu_job_pids[@]} -ge $CPU_MAX_PARALLEL ]; do
         # Check for completed CPU jobs
-        for i in "${!cpu_job_pids[@]}"; do
-            if ! kill -0 "${cpu_job_pids[$i]}" 2>/dev/null; then
+        for idx in "${!cpu_job_pids[@]}"; do
+            if ! kill -0 "${cpu_job_pids[$idx]}" 2>/dev/null; then
                 # Job has finished
-                wait "${cpu_job_pids[$i]}"
+                wait "${cpu_job_pids[$idx]}"
                 local exit_code=$?
 
                 if [ $exit_code -eq 0 ]; then
-                    echo "${cpu_job_info[$i]}" >> "${COMPLETED_JOBS_FILE}"
+                    echo "${cpu_job_info[$idx]}" >> "${COMPLETED_JOBS_FILE}"
                 else
-                    echo "${cpu_job_info[$i]}" >> "${FAILED_JOBS_FILE}"
+                    echo "${cpu_job_info[$idx]}" >> "${FAILED_JOBS_FILE}"
                 fi
 
                 # Remove from tracking arrays
-                unset cpu_job_pids[$i]
-                unset cpu_job_info[$i]
+                unset cpu_job_pids[$idx]
+                unset cpu_job_info[$idx]
 
                 # Rebuild arrays to remove gaps
                 cpu_job_pids=("${cpu_job_pids[@]}")
@@ -459,7 +461,7 @@ for ((i=0; i<${#pred_list[@]}; i++)); do
         fi
 
         gpu_ranks_in_use+=($gpu_rank)
-
+
         run_job "gpu" \
             "${sub_pred_wavscp}" \
             "${sub_gt_wavscp}" \
@@ -469,7 +471,7 @@ for ((i=0; i<${#pred_list[@]}; i++)); do
             "${job_prefix}" \
             "${chunk_info}" \
             "${gpu_rank}" &
-
+
         gpu_pid=$!
         gpu_job_pids+=($gpu_pid)
         gpu_job_info+=("GPU:$gpu_pid GPU_RANK:${gpu_rank} CHUNK:${chunk_info} FILE:${job_prefix}")
@@ -486,7 +488,7 @@ for ((i=0; i<${#pred_list[@]}; i++)); do
             "${sub_gt_wavscp}" \
             "${sub_text_file}" \
             "${SCORE_DIR}/result/$(basename "${sub_pred_wavscp}").result.cpu.txt" \
-            "egs/universa_prepare/cpu_subset.yaml" \
+            "egs/quality_check.yaml" \
             "${job_prefix}" \
             "${chunk_info}" &
 
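The `i` to `local idx` rename matters because Bash variables are global unless declared `local`: the hunk headers show the calling loop is `for ((i=0; i<${#pred_list[@]}; i++))`, so a helper that iterates with a bare `i` overwrites the caller's counter. The slot-waiting pattern itself (poll tracked jobs, log each finished job as completed or failed, prune the list until a slot is free) can be sketched in Python as below. This is an illustration under assumptions, not a port of the script: `wait_for_slot`, `run_all`, and the `JOB:<n>` labels are hypothetical, and the result lists stand in for `COMPLETED_JOBS_FILE` and `FAILED_JOBS_FILE`.

```python
import subprocess
import sys
import time


def wait_for_slot(jobs, max_parallel, completed, failed, poll_interval=0.1):
    """Block until fewer than max_parallel tracked jobs are still running.

    jobs is a list of (Popen, info) pairs and is pruned in place.
    max_parallel must be at least 1, otherwise the loop never exits.
    """
    while len(jobs) >= max_parallel:
        still_running = []
        for proc, info in jobs:
            exit_code = proc.poll()  # None while the process is alive
            if exit_code is None:
                still_running.append((proc, info))
            elif exit_code == 0:
                completed.append(info)
            else:
                failed.append(info)
        jobs[:] = still_running  # rebuild the list without finished jobs
        if len(jobs) >= max_parallel:
            time.sleep(poll_interval)


def run_all(commands, max_parallel):
    """Launch commands with at most max_parallel running at once."""
    jobs, completed, failed = [], [], []
    for index, command in enumerate(commands):
        wait_for_slot(jobs, max_parallel, completed, failed)
        jobs.append((subprocess.Popen(command), f"JOB:{index}"))
    # Drain: with a limit of 1 the helper returns only once nothing is running.
    wait_for_slot(jobs, 1, completed, failed)
    return completed, failed


if __name__ == "__main__":
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    bad = [sys.executable, "-c", "raise SystemExit(3)"]
    done, broken = run_all([ok, bad, ok], max_parallel=2)
    print("completed:", sorted(done))
    print("failed:", sorted(broken))
```

The loop index in `wait_for_slot` is local to the function by default in Python, which is the behaviour the diff opts into for Bash with `local idx`.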

scripts/aggregate_results.py

Lines changed: 0 additions & 72 deletions
This file was deleted.

scripts/extract_key.py

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+import json
+
+
+def extract_keys_from_jsonl(input_file, output_file):
+    """
+    Extract only the 'key' values from a JSONL file and save them to a text file.
+
+    Args:
+        input_file (str): Path to the input JSONL file
+        output_file (str): Path to the output text file
+    """
+    keys = []
+
+    try:
+        with open(input_file, "r", encoding="utf-8") as f:
+            for line_num, line in enumerate(f, 1):
+                line = line.strip()
+                if not line:  # Skip empty lines
+                    continue
+
+                try:
+                    data = json.loads(line)
+                    ref_length = len(data["ref_text"].split())
+                    if ref_length < 16:
+                        continue
+                    if "key" in data:
+                        start, end = data["key"].split("_")[-2:]
+                        if (
+                            float(end) - float(start) > 30
+                            or float(end) - float(start) < 10
+                        ):
+                            continue
+                        keys.append(data["key"])
+                    else:
+                        print(f"Warning: No 'key' field found in line {line_num}")
+
+                except json.JSONDecodeError as e:
+                    print(f"Error parsing JSON on line {line_num}: {e}")
+                    continue
+
+        # Write keys to output file
+        with open(output_file, "w", encoding="utf-8") as f:
+            for key in keys:
+                f.write(key + "\n")
+
+        print(f"Successfully extracted {len(keys)} keys to '{output_file}'")
+        return keys
+
+    except FileNotFoundError:
+        print(f"Error: Input file '{input_file}' not found")
+        return []
+    except Exception as e:
+        print(f"Error processing file: {e}")
+        return []
+
+
+# Example usage:
+if __name__ == "__main__":
+    # Replace 'input.jsonl' with your actual input file path
+    # Replace 'keys_only.txt' with your desired output file path
+    input_filename = "filtered_results/deletion_error_lt_0.05.jsonl"
+    output_filename = "deletion_error_lt0.05_300h.txt"
+
+    extracted_keys = extract_keys_from_jsonl(input_filename, output_filename)
+
+    # Optional: Print the first few keys as a preview
+    if extracted_keys:
+        print(f"\nFirst few keys extracted:")
+        for i, key in enumerate(extracted_keys[:5]):
+            print(f"{i+1}: {key}")
+        if len(extracted_keys) > 5:
+            print(f"... and {len(extracted_keys) - 5} more keys")
+
+
+# Alternative one-liner approach using list comprehension:
+def extract_keys_one_liner(input_file, output_file):
+    """
+    One-liner version to extract keys from JSONL file
+    """
+    try:
+        with open(input_file, "r") as f_in, open(output_file, "w") as f_out:
+            keys = [json.loads(line)["key"] for line in f_in if line.strip()]
+            f_out.write("\n".join(keys))
+        print(f"Extracted {len(keys)} keys using one-liner approach")
+    except Exception as e:
+        print(f"Error: {e}")
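The selection rule in `extract_keys_from_jsonl` is easier to check when separated from file I/O: a record is kept when `ref_text` has at least 16 whitespace-separated words and the span encoded in the last two underscore-separated fields of `key` is between 10 and 30 inclusive. The sketch below restates that rule as a predicate over in-memory records. The names `keep_record` and `select_keys` and the sample key layout `spk1_0.0_12.5` are assumptions for illustration. Unlike the committed script, a record missing `ref_text` is skipped here rather than raising `KeyError`.

```python
import json


def keep_record(data, min_words=16, min_span=10.0, max_span=30.0):
    """Return True when a parsed JSONL record passes both filters."""
    if "key" not in data or "ref_text" not in data:
        return False  # assumption: skip incomplete records instead of failing
    if len(data["ref_text"].split()) < min_words:
        return False
    # The last two underscore-separated fields are read as start and end.
    start, end = data["key"].split("_")[-2:]
    span = float(end) - float(start)
    return min_span <= span <= max_span


def select_keys(lines):
    """Apply keep_record to an iterable of JSONL lines and collect the keys."""
    keys = []
    for line in lines:
        line = line.strip()
        if not line:  # skip empty lines
            continue
        data = json.loads(line)
        if keep_record(data):
            keys.append(data["key"])
    return keys


if __name__ == "__main__":
    long_text = " ".join(["word"] * 16)
    sample = [
        json.dumps({"key": "spk1_0.0_12.5", "ref_text": long_text}),
        json.dumps({"key": "spk1_0.0_5.0", "ref_text": long_text}),
        json.dumps({"key": "spk2_1.0_21.0", "ref_text": "too short"}),
    ]
    print(select_keys(sample))
```

A key with fewer than two underscore-separated fields, or with non-numeric fields, still raises `ValueError` in this sketch, as it would in the committed script's inner loop.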
