LibriQuote: Speech Dataset of Fictional Character Utterances for Expressive Zero-Shot Speech Synthesis

This repository provides helper code for processing LibriQuote data and for benchmarking expressive TTS systems on LibriQuote-test.

This repository contains:

Helper Python classes to process LibriQuote in the processing/ folder.
Evaluation scripts to benchmark TTS systems on LibriQuote-test in the evaluation/ folder.
A link to the LibriQuote dataset hosted on HuggingFace.

Figure 1. t-SNE projection of emotion vector representations computed with emotion2vec-plus-base. LibriQuote-test (a) quotations and (b) reference narration (non-quotation) utterances; (c) Subsample of LibriHeavy segments (N=5734).

Benchmarking Only

If you only need LibriQuote-test for benchmarking, target and reference samples (at 16 kHz) are available directly in the HuggingFace repository. See the evaluation/ folder for the evaluation scripts.

LibriLight Audio Files

LibriQuote consists of segments drawn from narration paragraphs and character quotations in fiction novels. It is derived from LibriVox recordings and currently relies on LibriLight audio files as its audio source. Note that these audio files are sampled at 16 kHz.

Please follow the LibriLight instructions to download and prepare the audio files.

We provide a bash script that untars only the files LibriQuote needs, reducing the overall processing time and the total disk space required.
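To illustrate the idea of selective extraction, here is a minimal Python sketch that pulls only a chosen set of members out of a tar archive. It is not the repository's bash script: the function name `extract_selected`, the archive layout, and the way the list of wanted files is supplied are all assumptions made for this example.

```python
import tarfile
from pathlib import Path


def extract_selected(archive_path, wanted, out_dir):
    """Extract only the regular files whose archive names appear in `wanted`.

    `wanted` is a hypothetical list of member names (paths inside the archive);
    how such a list is built for LibriQuote is not covered here.
    Returns the sorted names of the members that were extracted.
    """
    wanted = set(wanted)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    # "r:*" lets tarfile detect the compression (plain, gz, bz2, xz) by itself.
    with tarfile.open(str(archive_path), "r:*") as tar:
        for member in tar:
            # Skip directories, links, and every file that is not requested.
            if member.isfile() and member.name in wanted:
                tar.extract(member, path=str(out_dir))
                extracted.append(member.name)
    return sorted(extracted)
```

Because members that are not requested are never written to disk, the space used scales with the selected subset rather than with the full archive; the archive itself still has to be read through once.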

Processing LibriQuote

Find more information in the processing/ folder.

Benchmarking using LibriQuote-test

Find more information in the evaluation/ folder.

Citing

If you use LibriQuote or part of this code in your publications, you can cite this work with the following BibTeX entry:

@misc{Michel2025LibriQuote,
    title={LibriQuote: A Speech Dataset of Fictional Character Utterances for Expressive Zero-Shot Speech Synthesis}, 
    author={Gaspard Michel and Elena V. Epure and Christophe Cerisara},
    year={2025},
    eprint={2509.04072},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2509.04072}
}
