By [1] Zhejiang University, [2] Tsinghua University.
- Xinfeng Li* [1], Kai Li* [2], Yifan Zheng [1], Chen Yan† [1], Xiaoyu Ji [1], Wenyuan Xu [1].
This repository is the official implementation of SafeEar, accepted to ACM CCS 2024 (Core-A*, CCF-A, Big4).
Please also visit our (1) Project Website, (2) Full CVoiceFake Dataset, and (3) Sampled CVoiceFake Dataset.
[2025-03-18]: Added batch testing for ASVspoof 2019 and 2021, and fixed bugs in the datasets and trainer.
[2024-12-10]: Fixed bugs in training and testing, and uploaded the data generation files to datas/.
[2024-12-01]: Uploaded the checkpoint for data generation (datas/).
In this paper, we propose SafeEar, a novel framework that aims to detect deepfake audios without relying on accessing the speech content within. Our key idea is to devise a neural audio codec into a novel decoupling model that well separates the semantic and acoustic information from audio samples, and only use the acoustic information (e.g., prosody and timbre) for deepfake detection. In this way, no semantic content will be exposed to the detector. To overcome the challenge of identifying diverse deepfake audio without semantic clues, we enhance our deepfake detector with multi-head self-attention and codec augmentation. Extensive experiments conducted on four benchmark datasets demonstrate SafeEar’s effectiveness in detecting various deepfake techniques with an equal error rate (EER) down to 2.02%. Simultaneously, it shields five-language speech content from being deciphered by both machine and human auditory analysis, demonstrated by word error rates (WERs) all above 93.93% and our user study. Furthermore, our benchmark constructed for anti-deepfake and anti-content recovery evaluation helps provide a basis for future research in the realms of audio privacy preservation and deepfake detection.
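For readers unfamiliar with the equal error rate quoted above, the following is a minimal sketch of how an EER can be computed from detector scores. It is a generic illustration, not the evaluation code of this repository: the function name `compute_eer` and the convention that a higher score means "more likely bona fide" are assumptions.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detector scores.

    Assumption: a higher score means the sample looks more like bona fide speech.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide samples scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed samples scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # The EER is read off where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means the bona fide and spoofed score distributions overlap less.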
- Clone the repository:
git clone git@github.com:LetterLiGo/SafeEar.git
cd SafeEar/
- Create and activate the conda environment:
conda create -n safeear python=3.9
conda activate safeear
- Install PyTorch and torchvision following the official instructions. The code requires python=3.9, pytorch=1.13, and torchvision=0.14.
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
- Install other dependencies:
pip install pip==24.0
pip install -r requirements.txt
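After installation, it can help to confirm that the pinned versions are the ones actually present in the environment. The snippet below is a small sketch using only the standard library; the helper names `matches` and `check_environment` are hypothetical, and the version prefixes are taken from the install command above.

```python
from importlib import metadata

# Major.minor versions from the pip command above; local build tags
# such as "+cu116" are ignored when comparing.
REQUIRED = {"torch": "1.13", "torchvision": "0.14", "torchaudio": "0.13"}


def matches(installed, required_prefix):
    """True if the installed version has the required major.minor."""
    base = installed.split("+", 1)[0]
    return base.split(".")[:2] == required_prefix.split(".")[:2]


def check_environment(required=REQUIRED):
    """Map each package name to (installed_version_or_None, is_ok)."""
    report = {}
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (found, found is not None and matches(found, wanted))
    return report


for name, (found, ok) in check_environment().items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name}: installed={found} required={REQUIRED[name]}.x [{status}]")
```

Run it inside the activated safeear environment; any MISMATCH line suggests the PyTorch install step should be repeated.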
Please download the ASVspoof 2019 and ASVspoof 2021 datasets and extract them to the datas/datasets directory.
datas/datasets/ASVspoof2019
datas/datasets/ASVspoof2021
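A quick way to confirm the layout before generating features is sketched below. The helper `missing_datasets` is hypothetical, and the directory names are copied from the listing above. Note that the feature extraction commands further down spell them `ASVSpoof2019` and `ASVSpoof2021`, so on a case-sensitive filesystem adjust `EXPECTED` to whichever spelling your checkout uses.

```python
from pathlib import Path

# Directory names as listed above (assumption: adjust the case if needed).
EXPECTED = ("ASVspoof2019", "ASVspoof2021")


def missing_datasets(root, expected=EXPECTED):
    """Return the expected dataset directories that are absent under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


# Run from the repository root.
missing = missing_datasets("datas/datasets")
if missing:
    print("Missing dataset directories:", ", ".join(missing))
else:
    print("All expected dataset directories are present.")
```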
mkdir model_zoos
cd model_zoos
wget https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt
wget https://cloud.tsinghua.edu.cn/f/413a0cd2e6f749eea956/?dl=1 -O SpeechTokenizer.pt
cd ../datas
# Generate the Hubert L9 feature files for ASVspoof 2019
python dump_hubert_avg_feature.py datasets/ASVSpoof2019 datasets/ASVSpoof2019_Hubert_L9
# Generate the Hubert L9 feature files for ASVspoof 2021
python dump_hubert_avg_feature.py datasets/ASVSpoof2021 datasets/ASVSpoof2021_Hubert_L9
Before starting training, please modify the parameter configurations in configs.
Use the following commands to start training:
python train.py --conf_dir config/train19.yaml
python train.py --conf_dir config/train21.yaml
To evaluate a model on one or more GPUs, specify the CUDA_VISIBLE_DEVICES, dataset, model and checkpoint:
python test.py --conf_dir Exps/ASVspoof19/config.yaml
python test.py --conf_dir Exps/ASVspoof21/config.yaml
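The commands above do not show how the GPUs are selected. One possible way, sketched here, is to set `CUDA_VISIBLE_DEVICES` in the environment of the launched process. The helper `build_test_invocation` is hypothetical, and the sketch assumes that the dataset, model and checkpoint are set inside the YAML file passed to `--conf_dir`.

```python
import os
import sys


def build_test_invocation(conf_path, gpu_ids, base_env=None):
    """Return (command, environment) for running test.py on the given GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    # Restrict the process to the listed GPU indices, e.g. "0,1".
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    cmd = [sys.executable, "test.py", "--conf_dir", conf_path]
    return cmd, env


cmd, env = build_test_invocation("Exps/ASVspoof19/config.yaml", [0, 1])
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))

# To actually launch the evaluation from the repository root:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

The same effect can be had in a shell by prefixing the command with the variable assignment, for example `CUDA_VISIBLE_DEVICES=0,1 python test.py --conf_dir Exps/ASVspoof19/config.yaml`.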
If you encounter RuntimeError: Failed to load audio from <_io.BytesIO object at 0x7f45cb978f90>, please use the following command to fix it:
conda install -c anaconda 'ffmpeg<4.4'
If you find our work/code/dataset helpful, please consider citing:
@inproceedings{li2024safeear,
author = {Li, Xinfeng and Li, Kai and Zheng, Yifan and Yan, Chen and Ji, Xiaoyu and Xu, Wenyuan},
title = {{SafeEar: Content Privacy-Preserving Audio Deepfake Detection}},
booktitle = {Proceedings of the 2024 {ACM} {SIGSAC} Conference on Computer and Communications Security (CCS)},
year = {2024},
}