wavlab-speech
diff --git a/README.md b/README.md
Lines changed: 3 additions & 3 deletions
diff --git a/docs/supported_metrics.md b/docs/supported_metrics.md
Lines changed: 8 additions & 3 deletions
diff --git a/launch_local.sh b/launch_local.sh
Lines changed: 20 additions & 18 deletions
diff --git a/scripts/aggregate_results.py b/scripts/aggregate_results.py
Lines changed: 0 additions & 72 deletions
diff --git a/scripts/extract_key.py b/scripts/extract_key.py
Lines changed: 86 additions & 0 deletions
@@ -8,17 +8,17 @@
 [![arXiv](https://img.shields.io/badge/arXiv-2412.17667-b31b1b.svg)](https://arxiv.org/abs/2412.17667)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
-VERSA (Versatile Evaluation of Speech and Audio) is a comprehensive toolkit for evaluating speech and audio quality. It provides seamless access to over 80 evaluation/profiling metrics with 10x variants, enabling researchers and developers to assess audio quality through multiple dimensions.
+VERSA (Versatile Evaluation of Speech and Audio) is a comprehensive toolkit for evaluating speech and audio quality. It provides seamless access to over 90 evaluation/profiling metrics with 10x variants, enabling researchers and developers to assess audio quality through multiple dimensions.
 
 ## 🚨 Exciting News
 - Jun 2025 - Update launch scripts for local machine to support multi-process/multi-gpu (automatic rank assignment) for VERSA.
 - May 2025 – VERSA presented at NAACL 2025, showcasing its unified multi-metric evaluation framework for speech and audio ([🎥 Presentation Video](https://www.youtube.com/watch?v=e7TdOlzyJcE))
 - Feb 2025 – Integrated support for Qwen2-Audio-based perceptual metrics, extending VERSA's capacity for LLM-informed audio quality profiling
-- Dec 2024 – Official release of VERSA v1.0, featuring 80+ evaluation metrics and full integration with ESPnet and Slurm-based distributed evaluation
+- Dec 2024 – Official release of VERSA v1.0, featuring 90+ evaluation metrics and full integration with ESPnet and Slurm-based distributed evaluation
 
 ## 🚀 Features
 
-- **Comprehensive**: 80+ metrics covering perceptual quality, intelligibility, and technical measurements (check [full metrics documentation](https://github.com/wavlab-speech/versa/blob/main/docs/supported_metrics.md) for a complete list)
+- **Comprehensive**: 90+ metrics covering perceptual quality, intelligibility, and technical measurements (check [full metrics documentation](https://github.com/wavlab-speech/versa/blob/main/docs/supported_metrics.md) for a complete list)
 - **Integrated**: Tightly integrated with [ESPnet](https://github.com/espnet/espnet.git)
 - **Flexible**: Support for various input formats (file paths, SCP files, Kaldi-style ARKs)
 - **Scalable**: Built-in support for distributed evaluation using Slurm
 
@@ -51,6 +51,8 @@ We include x mark if the metric is auto-installed in versa.
 | 44 | x | Qwen2 Recording Environment - Quality | qwen2_recording_quality_metric | qwen2_recording_quality_metric | [Qwen2 Audio](https://github.com/QwenLM/Qwen2-Audio) | [paper](https://arxiv.org/abs/2407.10759) |
 | 45 | x | Qwen2 Recording Environment - Channel Type | qwen2_channel_type_metric | qwen2_channel_type_metric | [Qwen2 Audio](https://github.com/QwenLM/Qwen2-Audio) | [paper](https://arxiv.org/abs/2407.10759) |
 | 46 | x | Dimensional Emotion | w2v2_dimensional_emotion | w2v2_dimensional_emotion | [w2v2-how-to](https://github.com/audeering/w2v2-how-to) | [paper](https://arxiv.org/pdf/2203.07378) |
+| 47 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+
 
 
 ### Dependent Metrics
@@ -81,8 +83,10 @@ We include x mark if the metric is auto-installed in versa.
 | 23 |  | Composite Objective Speech Quality (composite) | pysepm | pysepm_Csig, pysepm_Cbak, pysepm_Covl | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://ecs.utdallas.edu/loizou/speech/obj_paper_jan08.pdf)|
 | 24 |  | Coherence and speech intelligibility index (CSII) | pysepm | pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://www.researchgate.net/profile/James-Kates-2/publication/7842209_Coherence_and_the_speech_intelligibility_index/links/546f5dab0cf2d67fc0310f88/Coherence-and-the-speech-intelligibility-index.pdf)|
 | 25 |  | Normalized-covariance measure (NCM) | pysepm | pysepm_ncm | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://pmc.ncbi.nlm.nih.gov/articles/PMC3037773/pdf/JASMAN-000128-003715_1.pdf)|
-| 26 | x | Deep Perceptual Audio Metric (DPAM) | dpam | dpam_distance | [PerceptualAudio_Pytorch](https://github.com/adrienchaton/PerceptualAudio_pytorch)  | [paper](https://arxiv.org/abs/2001.04460) |
-| 27 | x | Contrastive learning-based Deep Perceptual Audio Metric (CDPAM) | cdpam | cdpam_distance | [PerceptualAudio](https://github.com/pranaymanocha/PerceptualAudio/cdpam) | [paper](https://arxiv.org/abs/2102.05109) |
+| 26 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Paired Reference | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+| 27 | x | Chroma-related Alignment | chroma_alignment | chroma_{stft,cqt,cens}_{cosine, euclidean}_dtw{"", _log, _raw} | - | - |
+| 28 | x | Deep Perceptual Audio Metric (DPAM) | dpam | dpam_distance | [PerceptualAudio_Pytorch](https://github.com/adrienchaton/PerceptualAudio_pytorch)  | [paper](https://arxiv.org/abs/2001.04460) |
+| 29 | x | Contrastive learning-based Deep Perceptual Audio Metric (CDPAM) | cdpam | cdpam_distance | [PerceptualAudio](https://github.com/pranaymanocha/PerceptualAudio/cdpam) | [paper](https://arxiv.org/abs/2102.05109) |
 
 
 ### Non-match Metrics
@@ -100,7 +104,8 @@ We include x mark if the metric is auto-installed in versa.
 | 9 |   | Contrastive Language-Audio Pretraining Score (CLAP Score) | clap_score | clap_score | [fadtk](https://github.com/gudgud96/frechet-audio-distance) | [paper](https://arxiv.org/abs/2301.12661) |
 | 10 |   | Accompaniment Prompt Adherence (APA) | apa | apa | [Sony-audio-metrics](https://github.com/SonyCSLParis/audio-metrics) | [paper](https://arxiv.org/abs/2404.00775) |
 | 11 |  | Log Likelihood Ratio (LLR) | pysepm | pysepm_llr | [pysepm](https://github.com/shimhz/pysepm.git) | [Paper](https://ecs.utdallas.edu/loizou/speech/obj_paper_jan08.pdf)|
-
+| 12 | x | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Paired Text | universa | universa_{sub_metrics} | [Uni-VERSA](https://huggingface.co/collections/espnet/universa-6834e7c0a28225bffb6e2526) | [paper](https://arxiv.org/abs/2505.20741) |
+| 13 |  | Singer Embedding Similarity  | singer | singer_similarity | [SSL-Singer-Identity](https://github.com/SonyCSLParis/ssl-singer-identity) | [paper](https://hal.science/hal-04186048v1) |
 
 ### Distributional Metrics (in verifying)
 
 
@@ -351,16 +351,17 @@ get_next_gpu_rank() {
 
 # Function to wait for available GPU slot
 wait_for_gpu_slot() {
+    local idx
     while [ ${#gpu_job_pids[@]} -ge $GPU_MAX_PARALLEL ]; do
         # Check for completed GPU jobs
-        for i in "${!gpu_job_pids[@]}"; do
-            if ! kill -0 "${gpu_job_pids[$i]}" 2>/dev/null; then
+        for idx in "${!gpu_job_pids[@]}"; do
+            if ! kill -0 "${gpu_job_pids[$idx]}" 2>/dev/null; then
                 # Job has finished
-                wait "${gpu_job_pids[$i]}"
+                wait "${gpu_job_pids[$idx]}"
                 local exit_code=$?
 
                 # Extract GPU rank from job info and free it
-                local job_info="${gpu_job_info[$i]}"
+                local job_info="${gpu_job_info[$idx]}"
                 if [[ $job_info =~ GPU_RANK:([0-9]+) ]]; then
                     local freed_rank="${BASH_REMATCH[1]}"
                     # Remove freed rank from in-use array
@@ -374,14 +375,14 @@ wait_for_gpu_slot() {
                 fi
 
                 if [ $exit_code -eq 0 ]; then
-                    echo "${gpu_job_info[$i]}" >> "${COMPLETED_JOBS_FILE}"
+                    echo "${gpu_job_info[$idx]}" >> "${COMPLETED_JOBS_FILE}"
                 else
-                    echo "${gpu_job_info[$i]}" >> "${FAILED_JOBS_FILE}"
+                    echo "${gpu_job_info[$idx]}" >> "${FAILED_JOBS_FILE}"
                 fi
 
                 # Remove from tracking arrays
-                unset gpu_job_pids[$i]
-                unset gpu_job_info[$i]
+                unset gpu_job_pids[$idx]
+                unset gpu_job_info[$idx]
 
                 # Rebuild arrays to remove gaps
                 gpu_job_pids=("${gpu_job_pids[@]}")
@@ -395,23 +396,24 @@ wait_for_gpu_slot() {
 
 # Function to wait for available CPU slot
 wait_for_cpu_slot() {
+    local idx
     while [ ${#cpu_job_pids[@]} -ge $CPU_MAX_PARALLEL ]; do
         # Check for completed CPU jobs
-        for i in "${!cpu_job_pids[@]}"; do
-            if ! kill -0 "${cpu_job_pids[$i]}" 2>/dev/null; then
+        for idx in "${!cpu_job_pids[@]}"; do
+            if ! kill -0 "${cpu_job_pids[$idx]}" 2>/dev/null; then
                 # Job has finished
-                wait "${cpu_job_pids[$i]}"
+                wait "${cpu_job_pids[$idx]}"
                 local exit_code=$?
 
                 if [ $exit_code -eq 0 ]; then
-                    echo "${cpu_job_info[$i]}" >> "${COMPLETED_JOBS_FILE}"
+                    echo "${cpu_job_info[$idx]}" >> "${COMPLETED_JOBS_FILE}"
                 else
-                    echo "${cpu_job_info[$i]}" >> "${FAILED_JOBS_FILE}"
+                    echo "${cpu_job_info[$idx]}" >> "${FAILED_JOBS_FILE}"
                 fi
 
                 # Remove from tracking arrays
-                unset cpu_job_pids[$i]
-                unset cpu_job_info[$i]
+                unset cpu_job_pids[$idx]
+                unset cpu_job_info[$idx]
 
                 # Rebuild arrays to remove gaps
                 cpu_job_pids=("${cpu_job_pids[@]}")
@@ -459,7 +461,7 @@ for ((i=0; i<${#pred_list[@]}; i++)); do
         fi
 
         gpu_ranks_in_use+=($gpu_rank)
-        
+    
         run_job "gpu" \
             "${sub_pred_wavscp}" \
             "${sub_gt_wavscp}" \
@@ -469,7 +471,7 @@ for ((i=0; i<${#pred_list[@]}; i++)); do
             "${job_prefix}" \
             "${chunk_info}" \
             "${gpu_rank}" &
-        
+
         gpu_pid=$!
         gpu_job_pids+=($gpu_pid)
         gpu_job_info+=("GPU:$gpu_pid GPU_RANK:${gpu_rank} CHUNK:${chunk_info} FILE:${job_prefix}")
@@ -486,7 +488,7 @@ for ((i=0; i<${#pred_list[@]}; i++)); do
             "${sub_gt_wavscp}" \
             "${sub_text_file}" \
             "${SCORE_DIR}/result/$(basename "${sub_pred_wavscp}").result.cpu.txt" \
-            "egs/universa_prepare/cpu_subset.yaml" \
+            "egs/quality_check.yaml" \
             "${job_prefix}" \
             "${chunk_info}" &
 
 
@@ -0,0 +1,86 @@
+import json
+
+
+def extract_keys_from_jsonl(input_file, output_file):
+    """
+    Save the 'key' values that pass the ref_text length and duration filters to a text file.
+
+    Args:
+        input_file (str): Path to the input JSONL file
+        output_file (str): Path to the output text file
+    """
+    keys = []
+
+    try:
+        with open(input_file, "r", encoding="utf-8") as f:
+            for line_num, line in enumerate(f, 1):
+                line = line.strip()
+                if not line:  # Skip empty lines
+                    continue
+
+                try:
+                    data = json.loads(line)
+                    ref_length = len(data.get("ref_text", "").split())
+                    if ref_length < 16:
+                        continue
+                    if "key" in data:
+                        start, end = data["key"].split("_")[-2:]
+                        if (
+                            float(end) - float(start) > 30
+                            or float(end) - float(start) < 10
+                        ):
+                            continue
+                        keys.append(data["key"])
+                    else:
+                        print(f"Warning: No 'key' field found in line {line_num}")
+
+                except (json.JSONDecodeError, ValueError) as e:
+                    print(f"Error parsing line {line_num}: {e}")
+                    continue
+
+        # Write keys to output file
+        with open(output_file, "w", encoding="utf-8") as f:
+            for key in keys:
+                f.write(key + "\n")
+
+        print(f"Successfully extracted {len(keys)} keys to '{output_file}'")
+        return keys
+
+    except FileNotFoundError:
+        print(f"Error: Input file '{input_file}' not found")
+        return []
+    except Exception as e:
+        print(f"Error processing file: {e}")
+        return []
+
+
+# Example usage:
+if __name__ == "__main__":
+    # Replace input_filename with your actual input file path
+    # Replace output_filename with your desired output file path
+    input_filename = "filtered_results/deletion_error_lt_0.05.jsonl"
+    output_filename = "deletion_error_lt0.05_300h.txt"
+
+    extracted_keys = extract_keys_from_jsonl(input_filename, output_filename)
+
+    # Optional: Print the first few keys as a preview
+    if extracted_keys:
+        print("\nFirst few keys extracted:")
+        for i, key in enumerate(extracted_keys[:5]):
+            print(f"{i+1}: {key}")
+        if len(extracted_keys) > 5:
+            print(f"... and {len(extracted_keys) - 5} more keys")
+
+
+# Alternative one-liner approach using list comprehension:
+def extract_keys_one_liner(input_file, output_file):
+    """
+    One-liner version to extract keys from JSONL file
+    """
+    try:
+        with open(input_file, "r") as f_in, open(output_file, "w") as f_out:
+            keys = [json.loads(line)["key"] for line in f_in if line.strip()]
+            f_out.writelines(key + "\n" for key in keys)
+        print(f"Extracted {len(keys)} keys using one-liner approach")
+    except Exception as e:
+        print(f"Error: {e}")
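
The filtering rules in `extract_keys_from_jsonl` (at least 16 words in `ref_text`, and a segment duration between 10 and 30 taken from the last two `_`-separated fields of `key`) could also be expressed as a standalone predicate, which makes them easier to test. The following is a minimal sketch, not the code in this PR: the names `keep_record` and `filter_keys` are hypothetical, and the sample key format `utt_<start>_<end>` is an assumption based on how the script splits the key.

```python
import json

# Thresholds taken from the script in this diff.
MIN_REF_WORDS = 16
MIN_DURATION = 10.0
MAX_DURATION = 30.0


def keep_record(data):
    """Return True if a parsed JSONL record passes the length and duration filters.

    Hypothetical helper: records with missing or malformed fields are rejected
    instead of raising.
    """
    ref_text = data.get("ref_text")
    key = data.get("key")
    if not isinstance(ref_text, str) or not isinstance(key, str):
        return False
    if len(ref_text.split()) < MIN_REF_WORDS:
        return False
    parts = key.split("_")
    if len(parts) < 2:
        return False
    try:
        # Assumption: the last two fields of the key are start and end times.
        start, end = float(parts[-2]), float(parts[-1])
    except ValueError:
        return False
    return MIN_DURATION <= end - start <= MAX_DURATION


def filter_keys(lines):
    """Collect the keys of all records in an iterable of JSONL lines that pass the filters."""
    keys = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than aborting the whole file
        if isinstance(data, dict) and keep_record(data):
            keys.append(data["key"])
    return keys
```

Because `filter_keys` accepts any iterable of lines, it can be called on an open file handle or on an in-memory list, so the filtering can be checked without touching the filesystem.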