Dataset-Concealment

This repository is in support of the ICASSP 2026 submission titled "Unseen but not Unknown: Using Dataset Concealment to Robustly Evaluate Speech Quality Estimation Models."

The software builds on the published repository for AlignNet.

Installation

Clone the repository. For the clone to include the trained models you will need Git LFS; installation instructions are here.

conda env create -f environment.yml
conda activate dsc
pip install .

To create an environment with package versions exactly matching those used to run the tests for the paper, run

conda env create -f environment-paper.yml
conda activate dsc-paper
pip install .

Data Preparation

To quickly define the Individual, Concealed, and Global dataset groups and train models with them, use the prep-dataset-groups.py CLI.

Example

Here we present an example of producing Dataset Concealment results with three example datasets; it extends easily to more datasets. The file example_dataset_input.yaml shows what a dataset group dictionary looks like. It is formatted as

"Dataset1": "path/to/dataset1"
"Dataset2": "path/to/dataset2"
"Dataset3": "path/to/dataset3"

It is expected that each dataset path is structured as follows:

path/to/dataset1
├── test.csv
├── train.csv
└── valid.csv
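Before running the prep script, it can be useful to confirm that each dataset directory actually contains these three csv files. The following is a minimal sketch of such a check (a hypothetical helper, not part of this repository), assuming PyYAML is available in the environment.

# check_datasets.py - hypothetical helper, not included in this repository.
# Loads a dataset group dictionary (formatted like example_dataset_input.yaml)
# and verifies each dataset directory contains train.csv, valid.csv, and test.csv.
from pathlib import Path
import yaml

with open("example_dataset_input.yaml") as f:
    datasets = yaml.safe_load(f)  # e.g. {"Dataset1": "path/to/dataset1", ...}

for name, path in datasets.items():
    missing = [split for split in ("train.csv", "valid.csv", "test.csv")
               if not (Path(path) / split).is_file()]
    print(f"{name}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")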

Running

python prep-dataset-groups.py example_dataset_input.yaml --output-dir dsc/config/data/data_dirs/example

will generate all the required dataset group files in the dsc/config/data/data_dirs/example folder, which has the following structure.

dsc/config/data/data_dirs/example
├── Concealed
│   ├── Conceal-Dataset1.yaml
│   ├── Conceal-Dataset2.yaml
│   └── Conceal-Dataset3.yaml
├── Global.yaml
└── Individual
    ├── Dataset1.yaml
    ├── Dataset2.yaml
    └── Dataset3.yaml

Training Details

The following enables Global training on the first available optimization device. This may need to be updated depending on your configuration.
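If the training stack is PyTorch-based (an assumption here, suggested by the optimization.devices override), a quick check like the sketch below can show which CUDA devices are visible before adjusting that override.

# device_check.py - illustrative only; assumes PyTorch is installed in the dsc environment.
import torch

print("CUDA available:", torch.cuda.is_available())
print("Visible CUDA devices:", torch.cuda.device_count())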

Wav2Vec

python train.py \
data/data_dirs=Global \
'optimization.devices=[0]' \
--config-dir dsc/config/models/ \
--config-name alignnet-wav2vec

NISQA

python train.py \
data/data_dirs=Global \
'optimization.devices=[0]' \
--config-dir dsc/config/models/ \
--config-name alignnet-nisqa

MOSNet

python train.py \
data/data_dirs=Global \
'optimization.devices=[0]' \
--config-dir dsc/config/models/ \
--config-name alignnet-MOSNet

Training Concealed or Individual models in a multirun

The following example demonstrates quickly training all of the Concealed dataset groups. The logic for other models or for the Individual dataset groups is similar.

python train.py -m \
data/data_dirs=Concealed/Conceal-Dataset1,Concealed/Conceal-Dataset2,Concealed/Conceal-Dataset3 \
'optimization.devices=[0]' \
--config-dir dsc/config/models/ \
--config-name alignnet-nisqa
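Because the comma-separated data/data_dirs list grows with the number of datasets, a helper along these lines (hypothetical, not part of the repository) could build the override string from the group files generated by prep-dataset-groups.py; adjust the directory and prefix to match where your group files live relative to the data/data_dirs config directory.

# build_override.py - hypothetical convenience script, not included in the repository.
# Builds the comma-separated data/data_dirs override for a Hydra-style multirun
# from the Concealed group files generated by prep-dataset-groups.py.
from pathlib import Path

group_dir = Path("dsc/config/data/data_dirs/example/Concealed")
groups = sorted(p.stem for p in group_dir.glob("*.yaml"))
print("data/data_dirs=" + ",".join(f"Concealed/{g}" for g in groups))
# e.g. data/data_dirs=Concealed/Conceal-Dataset1,Concealed/Conceal-Dataset2,Concealed/Conceal-Dataset3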

Information Regarding Paper Results

The config files in the dsc/config directory contain all the configuration details used to train the models that generated the results for the paper "Unseen but not Unknown: Using Dataset Concealment to Robustly Evaluate Speech Quality Estimation Models."

Example models

The trained_models folder contains three example trained model checkpoints used in the paper. The results from the paper come from 10 independently trained versions of each model, so the included checkpoints cannot fully replicate those results on their own. They are included for convenience.

Trained models can easily be used for inference via the CLI built into inference.py. Basic help can be seen via

python inference.py --help

In general, three overrides must be set:

  • model.path - path to a trained model.
  • data.data_dirs - list containing absolute paths to csv files that list audio files to perform inference on.
  • output.file - path to file where inference output will be stored.

After running inference, a csv will be created at output.file with the following columns:

  • file - filenames where audio was loaded from.
  • estimate - estimate generated by the model.
  • dataset - index indicating which csv from data.data_dirs this file came from.
  • AlignNet dataset index - index indicating which dataset within the model the scores come from. This will be the same for every file in the csv. The default dataset will always be the reference dataset, but this can be overridden via model.dataset_index.
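As an illustration, the output csv can be inspected with a short pandas sketch like the one below (assuming pandas is available in the environment; the column names are taken from the list above).

# read_estimates.py - illustrative sketch, not part of the repository.
import pandas as pd

df = pd.read_csv("estimations.csv")
print(df[["file", "estimate", "dataset"]].head())
# Mean estimate per input csv listed in data.data_dirs
print(df.groupby("dataset")["estimate"].mean())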

For example, to run inference using the included NISQA model (trained on the Global dataset group), one would run

python inference.py \
data.data_dirs=[/path/to/datafile.csv] \
model.path=trained_models/NISQA-AlignNet-Global \
output.file=estimations.csv \
--config-name nisqa.yaml

The example Wav2Vec model can be used like:

python inference.py \
data.data_dirs=[/path/to/datafile.csv] \
model.path=trained_models/Wav2Vec-AlignNet-Global \
output.file=estimations.csv \
--config-name wav2vec.yaml

And the example MOSNet model can be used like:

python inference.py \
data.data_dirs=[/path/to/datafile.csv] \
model.path=trained_models/MOSNet-AlignNet-Global \
output.file=estimations.csv \
--config-name mosnet.yaml

Datasets used in the paper

Most of the datasets used in the paper can be found via the links and references here.

The remaining dataset can be found at:

  • TMHINT-QI
    • Chen, Y.-W., Tsao, Y. (2022) InQSS: a speech intelligibility and quality assessment model using a multi-task learning network. Proc. Interspeech 2022, 3088-3092, doi: 10.21437/Interspeech.2022-10153.

Training splits

For datasets where specifically curated training/testing splits are provided, those splits are used. The following datasets provide this information:

  • NISQA SIM
  • VMC22
  • TMHINT-QI

For both NISQA SIM and TMHINT-QI we randomly split the data provided for training into training and validation sets, with 90% of the data in the training set and 10% in the validation set. VMC22 provides specifically curated training, validation, and test sets, which were used for all tests. To assess the stability of training individually with VMC22, randomness was introduced by using different random seeds. For all other tests the seed was kept fixed and randomness comes only from the different data splits.

For most of the other datasets we randomly split the data to achieve 70% in the training set, 15% in the validation set, and 15% in the test set. This applies to the following datasets:

  • Tencent
  • VCC18
  • IU
  • PSTN

The FFTNet and NOIZEUS datasets are significantly smaller than the other datasets used during training. They have 1200 and 1664 audio files respectively, an order of magnitude smaller than most of the other datasets. Due to this we opted for larger training data splits for these two datasets to ensure that the model had a sufficient number of examples from each to appropriately learn them when training with multiple datasets at once. For these two datasets we randomly split the data to achieve 80% in the training set, 10% in the validation set, and 10% in the test set.
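For readers who want to reproduce this style of split on their own data, the sketch below (not taken from the repository; the actual splitting code may differ) shows one way to create a seeded 70%/15%/15% split of a dataset csv with pandas.

# split_dataset.py - illustrative only; the repository's actual splitting code may differ.
import pandas as pd

df = pd.read_csv("dataset.csv")
df = df.sample(frac=1.0, random_state=0)  # shuffle with a fixed seed

n_train = int(0.70 * len(df))
n_valid = int(0.15 * len(df))

df[:n_train].to_csv("train.csv", index=False)
df[n_train:n_train + n_valid].to_csv("valid.csv", index=False)
df[n_train + n_valid:].to_csv("test.csv", index=False)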
