EmotioNotes is a curated dataset of historical concert program notes annotated with emotional content, designed to support research in music emotion recognition (MER), music information retrieval, and cultural musicology.
This dataset leverages archival materials from the New York Philharmonic dating back to the 1840s. Each entry includes metadata about the performance, detailed program notes, and estimated emotional scores in valence, arousal, and dominance (VAD) dimensions.
Each JSON entry in the dataset corresponds to a specific performance of a classical music work and includes the following fields:
- id: Unique identifier
- programID: Identifier of the original concert program
- orchestra: Performing ensemble
- season: Concert season (e.g., "1842–43")
- concerts: List of performance details:
  - eventType: e.g., "Subscription Season", "Special"
  - location: e.g., "Manhattan, NY"
  - venue: Venue name
  - date: Date of performance
  - time: Time of performance
- workTitle: Title of the performed piece
- movement: Specific movement (if available)
- composerName: Name of the composer
- conductorName: Name of the conductor
- soloists: List of soloists with:
  - soloistName
  - soloistInstrument
  - soloistRoles
- ProgramNote: Extracted textual description of the musical work, drawn from archival concert booklets.
Emotion scores are derived using a lexicon-based method (Warriner et al., 2013):
- valence_mean: Pleasantness of a stimulus (0–1 scale)
- arousal_mean: Intensity/energy (0–1 scale)
- dominance_mean: Sense of control (0–1 scale)
- valence_std, arousal_std, dominance_std: Standard deviation of emotional scores
- BLEUScore: A text similarity score used to evaluate the quality of extracted program notes against the raw OCR text.
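The lexicon-based scoring described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the three-word `TOY_LEXICON`, its values, the regex tokenizer, the use of population standard deviation, and the `(x - 1) / 8` rescaling from a 1-9 rating scale to 0-1 are all assumptions made for the example.

```python
import re
from statistics import mean, pstdev

# Hypothetical mini-lexicon: word -> (valence, arousal, dominance) on a 1-9 scale.
# The numbers are invented for illustration, not taken from Warriner et al. (2013).
TOY_LEXICON = {
    "great": (8.0, 5.0, 7.0),
    "storm": (3.0, 7.0, 4.0),
    "calm": (7.0, 2.0, 6.0),
}


def rescale(rating):
    """Map a 1-9 rating onto 0-1 (assumed normalization)."""
    return (rating - 1.0) / 8.0


def score_note(text, lexicon=TOY_LEXICON):
    """Return mean and standard deviation of V, A, D over lexicon words in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None  # no lexicon coverage, so no score
    result = {}
    for i, dim in enumerate(("valence", "arousal", "dominance")):
        values = [rescale(h[i]) for h in hits]
        result[f"{dim}_mean"] = mean(values)
        # Population std is an assumption; the dataset's choice is not stated.
        result[f"{dim}_std"] = pstdev(values)
    return result


scores = score_note("This great work opens in a storm and ends in calm.")
print(scores)
```

Words missing from the lexicon are simply skipped here, so a note with no covered words yields no score at all; a real pipeline would need its own policy for that case.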
This dataset is ideal for:
- Text-based emotion recognition in music
- Music recommendation systems based on emotion
- Cultural analysis of how music was described across centuries
- Using text-based emotion annotations as proxy labels for audio recordings or music scores
The dataset is provided in JSON format. Example entry:
{
"workTitle": "Symphony No. 3 In E Flat Major, Op. 55 (Eroica)",
"composerName": "Ludwig van Beethoven",
"ProgramNote": "This great work was commenced when Napoleon was first Consul...",
"valence_mean": 0.585,
"arousal_mean": 0.384,
"dominance_mean": 0.618
}
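As a rough sketch of working with entries in Python, the snippet below parses a record shaped like the example above and filters on the emotion scores. The inline `SAMPLE` string stands in for the real data file, whose name and top-level layout (assumed here to be a JSON list of entries) are not specified in this README; the helper names and the 0.5 threshold are arbitrary choices for illustration.

```python
import json

# Stand-in for the dataset file; assumed to be a JSON list of entries.
SAMPLE = """[
  {
    "workTitle": "Symphony No. 3 In E Flat Major, Op. 55 (Eroica)",
    "composerName": "Ludwig van Beethoven",
    "ProgramNote": "This great work was commenced when Napoleon was first Consul...",
    "valence_mean": 0.585,
    "arousal_mean": 0.384,
    "dominance_mean": 0.618
  }
]"""


def load_entries(raw):
    """Parse JSON text into a list of entry dicts."""
    entries = json.loads(raw)
    return entries if isinstance(entries, list) else [entries]


def filter_by_valence(entries, minimum=0.5):
    """Keep entries whose valence_mean is present and at least `minimum`."""
    return [
        e for e in entries
        if e.get("valence_mean") is not None and e["valence_mean"] >= minimum
    ]


entries = load_entries(SAMPLE)
positive = filter_by_valence(entries)
for e in positive:
    print(e["composerName"], "|", e["workTitle"], "|", e["valence_mean"])
```

To read from disk instead, pass the file's text to `load_entries`, for example the result of `open(path, encoding="utf-8").read()`.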
If you use this dataset, please cite the following paper:
Khanal, P., & Donnelly, P. (2024). EmotioNotes Dataset: Decoding emotions in classical music through concert program notes. In: Johnson, C., Machado, P., Santos, I. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2025. Lecture Notes in Computer Science, vol xxxxx. Springer, Cham. https://doi.org/10.1007/978-3-031-90167-6_23
This project would not have been possible without the following:
- The New York Philharmonic Digital Archives, for access to centuries of concert program materials.
- Meta AI for open-sourcing LLaMA 3.1, which enabled structured text extraction from unstructured OCR text.
- Warriner et al. (2013) for the valence, arousal, and dominance (VAD) emotion lexicon used for annotation.
Install Miniconda if you don't have it:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod a+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
Note: Conda's guided installer will ask you to specify an installation directory. The default is your home directory, which has a 14 GB disk quota for student accounts, so you will likely want to change this path to the SoundBendOR Lab filespace: /to/soundbendor/
Create a new conda environment and activate it:
conda create -n program_notes python=3.12
conda activate program_notes
pip install jupyterlab
module load gcc cmake cuda/11.8 slurm
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
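Before moving on, it can help to confirm that the PyTorch install sees the GPU. The check below is a small sketch rather than part of the original setup; the `describe_torch` helper is a hypothetical name, and it only uses standard PyTorch attributes (`torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.get_device_name`).

```python
def describe_torch():
    """Return a one-line summary of the PyTorch install, or a hint if it is missing."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return (
            f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"GPU: {torch.cuda.get_device_name(0)}"
        )
    return f"PyTorch {torch.__version__}, no CUDA device visible"


print(describe_torch())
```

On a node without a GPU (for example a login node) this will typically report that no CUDA device is visible, even when the installation itself is fine.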
If you get the following error and cannot open a Jupyter session:
AttributeError: partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)
reinstall charset-normalizer:
pip install -U --force-reinstall charset-normalizer
Install each of the following packages with pip:
python-dotenv==1.0.1
pydot==2.0.0
langchain==0.2.3
langchain-cohere==0.1.5
langchain-community==0.0.38
langchain-core==0.2.5
langchain-groq==0.1.3
langchain-text-splitters==0.2.1
langcodes==3.4.0
langdetect==1.0.9
langsmith==0.1.75
sentence-transformers==2.7.0
transformers==4.41.2
accelerate==0.30.1
Then install FAISS with GPU support through conda:
conda install -c pytorch faiss-gpu
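To confirm that the environment matches the pinned versions listed above, a short check along these lines may be useful. It is a sketch using only the standard library's `importlib.metadata`; the `PINNED` dictionary copies a subset of the pins for brevity, and `check_pins` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the pins listed above; extend as needed.
PINNED = {
    "python-dotenv": "1.0.1",
    "langchain": "0.2.3",
    "langchain-core": "0.2.5",
    "sentence-transformers": "2.7.0",
    "transformers": "4.41.2",
    "accelerate": "0.30.1",
}


def check_pins(pins):
    """Map each package to 'ok', 'missing', or 'found X, expected Y'."""
    report = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == expected else f"found {found}, expected {expected}"
    return report


for name, status in check_pins(PINNED).items():
    print(f"{name}: {status}")
```

Run it inside the activated `program_notes` environment so that it inspects the right set of packages.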
Now you will be able to run the batch file.