Load these modules

EmotioNotes is a curated dataset of historical concert program notes annotated with emotional content, designed to support research in music emotion recognition (MER), music information retrieval, and cultural musicology.

This dataset leverages archival materials from the New York Philharmonic dating back to the 1840s. Each entry includes metadata about the performance, detailed program notes, and estimated emotional scores in valence, arousal, and dominance (VAD) dimensions.

📦 Dataset Description

Each JSON entry in the dataset corresponds to a specific performance of a classical music work and includes the following fields:

🎼 Performance Metadata

id: Unique identifier
programID: Identifier of the original concert program
orchestra: Performing ensemble
season: Concert season (e.g., "1842–43")
concerts: List of performance details:
- eventType: e.g., "Subscription Season", "Special"
- location: e.g., "Manhattan, NY"
- venue: Venue name
- date: Date of performance
- time: Time of performance

🎶 Musical Work Details

workTitle: Title of the performed piece
movement: Specific movement (if available)
composerName: Name of the composer
conductorName: Name of the conductor
soloists: List of soloists with:
- soloistName
- soloistInstrument
- soloistRoles

📝 Program Note

ProgramNote: Extracted textual description of the musical work, drawn from archival concert booklets.

Emotion Annotations

Emotion scores are derived using a lexicon-based method (Warriner et al., 2013):

valence_mean: Pleasantness of a stimulus (0–1 scale)
arousal_mean: Intensity/energy (0–1 scale)
dominance_mean: Sense of control (0–1 scale)
valence_std, arousal_std, dominance_std: Standard deviation of emotional scores

Metrics for avoiding hallucinations

BLEUScore: A text similarity score used to evaluate the quality of extracted program notes against the raw OCR text.

📊 Use Cases

This dataset is ideal for:

Text-based emotion recognition in music
Music recommendation systems based on emotion
Cultural analysis of how music was described across centuries
Using text-based emotion annotations to audio/music scores as proxies

📁 Format

The dataset is provided in JSON format. Example entry:

{
  "workTitle": "Symphony No. 3 In E Flat Major, Op. 55 (Eroica)",
  "composerName": "Ludwig van Beethoven",
  "ProgramNote": "This great work was commenced when Napoleon was first Consul...",
  "valence_mean": 0.585,
  "arousal_mean": 0.384,
  "dominance_mean": 0.618
}

📚 Citation

If you use this dataset, please cite the following paper:

Khanal, P., & Donnelly, P. (2024). EmotioNotes Dataset: Decoding emotions in classical music through concert program notes. In: Johnson, C., Machado, P., Santos, I. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2025. Lecture Notes in Computer Science, vol xxxxx. Springer, Cham. https://doi.org/10.1007/978-3-031-90167-6_23

📂 Dataset & Code

🙌 Acknowledgments

This project would not have been possible without the following:

The New York Philharmonic Digital Archives, for access to centuries of concert program materials.
Meta AI for open-sourcing LLaMA 3.1, which enabled structured text extraction from unstructured OCR text.
Warriner et al. (2013) for the valence, arousal, and dominance (VAD) emotion lexicon used for annotation.

Getting Started

Install miniconda if you don't have it: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

chmod a+x Miniconda3-latest-Linux-x86_64.sh

./Miniconda3-latest-Linux-x86_64.sh

Note - While going through CondaÃ¢â‚¬â„¢s guided installer, it will ask you to specify an installation directory. This will default to your home directory - which has a 14gb disk restriction as a student. You will likely want to change this path to use the SoundBendOR Lab filespace. /to/soundbendor/

Create a new conda environment and activate it conda create -n program_notes python=3.12

conda activate program_notes

pip install jupyterlab

Load these modules

module load gcc cmake cuda/11.8 slurm

Install PyTorch

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

If you get this error and canÃ¢â‚¬â„¢t open Jupyter Session: AttributeError: partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import) Do this: pip install -U --force-reinstall charset-normalizer

#pip install everything below: python-dotenv==1.0.1

pydot==2.0.0

langchain==0.2.3

langchain-cohere==0.1.5

langchain-community==0.0.38

langchain-core==0.2.5

langchain-groq==0.1.3

langchain-text-splitters==0.2.1

langcodes==3.4.0

langdetect==1.0.9

langsmith==0.1.75

sentence-transformers==2.7.0

transformers==4.41.2

accelerate==0.30.1

conda install -c pytorch faiss-gpu

Now you will be able to run the batch file.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
Extractions		Extractions
.env		.env
.gitattributes		.gitattributes
.gitignore		.gitignore
Llama31_total_data_with_bleu_score_filter.json		Llama31_total_data_with_bleu_score_filter.json
RAG_llama3.1.py		RAG_llama3.1.py
RAG_rerank.py		RAG_rerank.py
README.md		README.md
new_pip_requirements.txt		new_pip_requirements.txt
new_requirements.txt		new_requirements.txt
nyphilharmonic_programs.json		nyphilharmonic_programs.json
program_note_extraction_llama3.1.py		program_note_extraction_llama3.1.py
requirements.txt		requirements.txt
requirements_noversion.txt		requirements_noversion.txt
run_extraction.sh		run_extraction.sh
run_extraction_Patrick.sh		run_extraction_Patrick.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

📦 Dataset Description

🎼 Performance Metadata

🎶 Musical Work Details

📝 Program Note

Emotion Annotations

Metrics for avoiding hallucinations

📊 Use Cases

📁 Format

📚 Citation

🙌 Acknowledgments

Getting Started

Load these modules

Install PyTorch

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Soundbendor/program_notes_data

Folders and files

Latest commit

History

Repository files navigation

📦 Dataset Description

🎼 Performance Metadata

🎶 Musical Work Details

📝 Program Note

Emotion Annotations

Metrics for avoiding hallucinations

📊 Use Cases

📁 Format

📚 Citation

🙌 Acknowledgments

Getting Started

Load these modules

Install PyTorch

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages