[ACM CCS'24] SafeEar: Content Privacy-Preserving Audio Deepfake Detection

License: CC BY 4.0. Pull requests are welcome.

By [1] Zhejiang University, [2] Tsinghua University.

  • Xinfeng Li* [1], Kai Li* [2], Yifan Zheng [1], Chen Yan† [1], Xiaoyu Ji [1], Wenyuan Xu [1].

This repository is the official implementation of SafeEar, accepted to ACM CCS 2024 (Core-A*, CCF-A, Big4).

Please also visit our (1) Project Website, (2) Full CVoiceFake Dataset, and (3) Sampled CVoiceFake Dataset.

🔥News

[2025-03-18]: Added batch testing for ASVspoof 2019 and 2021; fixed several bugs in the datasets and trainer.

[2024-12-10]: Fixed the training and testing bugs, and uploaded the data generation files under datas/.

[2024-12-01]: Uploaded the checkpoint for data generation under datas/.

✨Key Highlights:

In this paper, we propose SafeEar, a novel framework that aims to detect deepfake audio without accessing the speech content within. Our key idea is to turn a neural audio codec into a decoupling model that cleanly separates the semantic and acoustic information in audio samples, and to use only the acoustic information (e.g., prosody and timbre) for deepfake detection. In this way, no semantic content is exposed to the detector. To overcome the challenge of identifying diverse deepfake audio without semantic clues, we enhance our deepfake detector with multi-head self-attention and codec augmentation. Extensive experiments on four benchmark datasets demonstrate SafeEar’s effectiveness in detecting various deepfake techniques, with an equal error rate (EER) as low as 2.02%. At the same time, it shields speech content in five languages from being deciphered by both machine and human auditory analysis, as demonstrated by word error rates (WERs) all above 93.93% and by our user study. Furthermore, the benchmark we constructed for anti-deepfake and anti-content-recovery evaluation provides a basis for future research on audio privacy preservation and deepfake detection.

🚀Overall Pipeline

[Figure: overall SafeEar pipeline]

🔧Installation

  1. Clone the repository:
git clone git@github.com:LetterLiGo/SafeEar.git
cd SafeEar/
  2. Create and activate the conda environment:
conda create -n safeear python=3.9
conda activate safeear
  3. Install PyTorch, torchvision, and torchaudio following the official instructions. The code requires python=3.9, pytorch=1.13, torchvision=0.14.
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
  4. Install the other dependencies:
pip install pip==24.0
pip install -r requirements.txt
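
After installation, it can help to confirm that the pinned versions are the ones actually installed. The snippet below is a minimal sketch, not part of the repository: the helper name `check_versions` and the `REQUIRED` table are illustrative, and the version prefixes are copied from the pip command above. It reads package metadata only, so it does not need a GPU.

```python
from importlib import metadata

# Version prefixes taken from the pip command above (assumption: a matching
# major.minor prefix is enough; local build tags such as "+cu116" are ignored).
REQUIRED = {"torch": "1.13", "torchvision": "0.14", "torchaudio": "0.13"}


def check_versions(required=REQUIRED):
    """Return {package: (installed_version_or_None, matches_prefix)}."""
    report = {}
    for package, prefix in required.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            # Package is not installed in the active environment.
            report[package] = (None, False)
            continue
        ok = installed == prefix or installed.startswith(prefix + ".")
        report[package] = (installed, ok)
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: {installed} [{status}]")
```

Run it inside the activated safeear environment; any MISMATCH line points to a package that should be reinstalled with the command from step 3.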

📊Model Performance

ASVspoof 2019 & 2021
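
Detection results on these benchmarks are summarized by the equal error rate (EER), the operating point where the false acceptance rate and the false rejection rate coincide. The sketch below shows one simple way to compute it from raw scores. It is illustrative only and is not the repository's evaluation code: the function name `compute_eer` is hypothetical, and it assumes that a higher score means "more likely bona fide".

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) by scanning every observed score as a threshold.

    Assumption: higher score = more likely bona fide.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer, best_thr = float("inf"), 1.0, thresholds[0]
    for thr in thresholds:
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

This scan is quadratic in the number of scores, which is fine for a quick check but too slow for full evaluation sets; a sorted, cumulative implementation would be the usual choice there.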

Speech Recognition Performance
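
Content protection is measured by the word error rate (WER) of speech recognition on the protected audio: the higher the WER, the less content can be recovered. As a reminder of how the metric works, here is a minimal sketch based on word-level edit distance. The function name `word_error_rate` is illustrative, whitespace tokenization is an assumption, and this is not the evaluation script used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Rolling-row dynamic programming for the word-level Levenshtein distance.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the dog sat"))
```

Because insertions count as errors, WER is not capped at 100%: a transcript that is longer than the reference and entirely wrong scores above 1.0.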

Data preparation

ASVspoof 2019 & 2021

Please download the ASVspoof 2019 and ASVspoof 2021 datasets and extract them to the datas/datasets directory.

datas/datasets/ASVspoof2019
datas/datasets/ASVspoof2021

Generate the HuBERT L9 feature files

mkdir model_zoos
cd model_zoos
wget https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt
wget "https://cloud.tsinghua.edu.cn/f/413a0cd2e6f749eea956/?dl=1" -O SpeechTokenizer.pt
cd ../datas
# Generate the HuBERT L9 feature files for ASVspoof 2019
python dump_hubert_avg_feature.py datasets/ASVspoof2019 datasets/ASVSpoof2019_Hubert_L9
# Generate the HuBERT L9 feature files for ASVspoof 2021
python dump_hubert_avg_feature.py datasets/ASVspoof2021 datasets/ASVSpoof2021_Hubert_L9
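
Before running the feature extraction, a quick pre-flight check can catch a missing dataset directory or an incomplete checkpoint download. The following is a rough sketch under stated assumptions: it is not shipped with the repository, the helper `find_missing` is hypothetical, and it assumes the commands above were run from the repository root so that model_zoos/ and datas/ are siblings.

```python
from pathlib import Path

# Paths taken from the data preparation steps above, relative to the
# repository root (assumption).
EXPECTED_DIRS = [
    "datas/datasets/ASVspoof2019",
    "datas/datasets/ASVspoof2021",
]
EXPECTED_FILES = [
    "model_zoos/hubert_base_ls960.pt",
    "model_zoos/SpeechTokenizer.pt",
]


def find_missing(repo_root: str = ".") -> list:
    """Return the expected directories and files that are absent or empty."""
    root = Path(repo_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    for name in EXPECTED_FILES:
        path = root / name
        # A zero-byte file usually indicates an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = find_missing(".")
    if problems:
        print("Missing or empty:", *problems, sep="\n  ")
    else:
        print("All datasets and checkpoints are in place.")
```

Note that directory names are compared case-sensitively on Linux, so the extracted dataset folders need to match the paths passed to dump_hubert_avg_feature.py exactly.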

📚Training

Before starting training, adjust the parameter settings in the configuration files under config/ (e.g., config/train19.yaml).

Use the following commands to start training:

python train.py --conf_dir config/train19.yaml
python train.py --conf_dir config/train21.yaml

📈Testing/Inference

To evaluate a model on one or more GPUs, set CUDA_VISIBLE_DEVICES and specify the dataset, model, and checkpoint in the config file passed to test.py:

python test.py --conf_dir Exps/ASVspoof19/config.yaml
python test.py --conf_dir Exps/ASVspoof21/config.yaml
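
The commands above do not show how CUDA_VISIBLE_DEVICES is set. One way to do it is a small launcher like the sketch below, which restricts the visible GPUs and then calls test.py with a given config. The helper names (`build_env`, `build_eval_command`, `run_eval`) are illustrative and not part of the repository; whether evaluation actually uses more than one GPU depends on the settings in the config file.

```python
import os
import subprocess
import sys


def build_eval_command(conf_path: str) -> list:
    """Build the argv for test.py using the current Python interpreter."""
    return [sys.executable, "test.py", "--conf_dir", conf_path]


def build_env(gpu_ids) -> dict:
    """Copy the current environment and restrict the GPUs visible to CUDA."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return env


def run_eval(conf_path: str, gpu_ids, dry_run: bool = True) -> int:
    """Run (or, with dry_run=True, only print) one evaluation command."""
    cmd = build_eval_command(conf_path)
    env = build_env(gpu_ids)
    if dry_run:
        print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", " ".join(cmd))
        return 0
    return subprocess.run(cmd, env=env, check=False).returncode


if __name__ == "__main__":
    for conf in ("Exps/ASVspoof19/config.yaml", "Exps/ASVspoof21/config.yaml"):
        run_eval(conf, gpu_ids=[0], dry_run=True)
```

Set dry_run=False to launch the evaluation for real. The same effect can be had directly in the shell by prefixing the command with CUDA_VISIBLE_DEVICES=0.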

Bugs and Issues

If you encounter RuntimeError: Failed to load audio from <_io.BytesIO object at 0x7f45cb978f90>, run the following command to fix it:

conda install -c anaconda 'ffmpeg<4.4'
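
To see whether the installed ffmpeg falls outside the range pinned above, the version can be read from `ffmpeg -version`. This is a minimal sketch with hypothetical helper names; it assumes the usual first output line of the form "ffmpeg version 4.3.1 ...", and it returns None for builds whose version string does not follow that pattern.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_version(text: str):
    """Extract (major, minor) from `ffmpeg -version` output, or None."""
    match = re.search(r"ffmpeg version \D*(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_ffmpeg_version():
    """Return the (major, minor) version of the ffmpeg on PATH, or None."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return parse_ffmpeg_version(result.stdout)


def needs_downgrade(version) -> bool:
    """True if the version is 4.4 or newer, i.e. outside the 'ffmpeg<4.4' pin."""
    return version is not None and version >= (4, 4)


if __name__ == "__main__":
    version = installed_ffmpeg_version()
    print("ffmpeg version:", version, "| needs downgrade:", needs_downgrade(version))
```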

📜Citation

If you find our work/code/dataset helpful, please consider citing:

@inproceedings{li2024safeear,
  author       = {Li, Xinfeng and Li, Kai and Zheng, Yifan and Yan, Chen and Ji, Xiaoyu and Xu, Wenyuan},
  title        = {{SafeEar: Content Privacy-Preserving Audio Deepfake Detection}},
  booktitle    = {Proceedings of the 2024 {ACM} {SIGSAC} Conference on Computer and Communications Security (CCS)},
  year         = {2024},
}