This repository contains code, documentation, and sample data set files to:
- Fetch data dumps from various databases in various file formats.
- Reconcile entries in these databases against entities and properties in Wikidata.
- Transform the reconciled databases into RDF Turtle format and upload them to Virtuoso Staging.

Please refer to OpenRefine Tips if you are not familiar with how OpenRefine works.
This GitHub repository contains the code, documentation, and test files used to:
- fetch the original data dumps from the databases,
- convert them to CSV,
- reconcile them against Wikidata,
- convert the reconciled CSV to RDF Turtle, then upload the result to Virtuoso Staging.
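
The final conversion step can be illustrated with a minimal sketch. It assumes a reconciled CSV with hypothetical columns `uri`, `name`, and `genre_qid`, and it writes full-IRI triples (which are valid Turtle) using only the Python standard library. The column names and the placeholder QID are illustrative assumptions, not the layout the repository's own scripts use.

```python
import csv
import io

WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and line breaks for a Turtle string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_turtle(csv_text: str) -> str:
    """Emit a label triple, plus a P136 (genre) triple when a QID is present."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{row['uri']}>"  # hypothetical column holding the entity IRI
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(row["name"])}" .')
        # Link to Wikidata only when reconciliation produced a QID.
        if row["genre_qid"]:
            lines.append(f"{subject} <{WDT}P136> <{WD}{row['genre_qid']}> .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Q0000000 is a placeholder, not a real Wikidata item.
    sample = "uri,name,genre_qid\nhttps://example.org/tune/1,Example Tune,Q0000000\n"
    print(rows_to_turtle(sample), end="")
```

Rows without a reconciled QID still get a label triple, so unreconciled entries remain in the output and can be linked later.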
- Open a terminal in the `/linkedmusic-datalake` folder.
- Run `poetry install` to install the required packages.
- Activate the virtual environment with `eval $(poetry env activate)`.
Cantus Database is a repository of Latin chants found in medieval manuscripts and early printed books.
Cantus DB provides its sample datasets in CSV format. The work is still in progress.
Refer to the Cantus DB manual for details.
Detailed information is provided within the corresponding `.ttl` files.
RISM (Répertoire International des Sources Musicales) is an international collaborative database that catalogues historical musical sources. It provides detailed information on manuscripts, prints, and other music-related documents, serving as a crucial resource for researchers, librarians, and musicologists who study and reference historical musical materials. RISM provides its complete datasets in RDF format. We use OpenRefine to reconcile the database against Wikidata. Refer to the RISM manual for more details.
MusicBrainz is an open music encyclopedia that provides extensive music metadata and serves as a universal reference for music identification.
MusicBrainz has a public dataset download site. We retrieve those datasets in JSON Lines format and process them using the Python RDFLib package.
See the MusicBrainz manual for more information.
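
As a rough illustration of the JSON Lines handling described above, the sketch below reads one JSON record per line using only the standard library. The field names `id` and `name` are assumptions about the record layout, and the RDFLib conversion step is deliberately left out.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def record_labels(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record's "id" to its "name" (both field names are assumed)."""
    return {
        rec["id"]: rec["name"]
        for rec in iter_jsonl(lines)
        if "id" in rec and "name" in rec
    }


if __name__ == "__main__":
    sample = ['{"id": "a1", "name": "Example Artist"}\n', "\n"]
    print(record_labels(sample))
```

Because `iter_jsonl` accepts any iterable of strings, an open file handle can be passed directly, so a large dump is read line by line rather than loaded into memory at once.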
SIMSSA Database is a discovery tool for symbolic music files (MEI, Kern, MusicXML, MIDI). It evolved from an earlier database developed under Julie Cumming’s Digging into Data grant and offers improved functionality. The work is still in progress. Refer to the SIMSSA DB manual for further instructions.
The Session is a community website dedicated to Irish traditional music. The Session has a public GitHub repo that contains public datasets. We retrieve these in CSV format and reconcile them using OpenRefine. See The Session manual for additional guidance.
MusicBrainz aims to be:
- The definitive source of music information by allowing anyone to contribute and releasing the data under open licenses.
- The universal lingua franca for music by providing a reliable and unambiguous form of music identification, enabling both people and machines to have meaningful conversations about music.

See https://github.com/DDMAL/linkedmusic-datalake/blob/main/musicbrainz/README.md for the full manual.
AcousticBrainz collected acoustic information from music recordings between 2015 and 2022, providing insights into spectral data, genres, moods, keys, and scales.
Consult the AcousticBrainz manual for more details.
- All are located in the `jsonld_approach` folder. We also share some test sample files.
For reconciliation, OpenRefine primarily automates the matching of property values. However, perfect matches on Wikidata are not always guaranteed. To address this, we maintain an "archive" that stores manually reconciled entries. This shared resource allows previously verified mappings to be reused, saving time and effort for others.
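
The reuse of archived mappings might look like the sketch below. It assumes the archive can be exported as a CSV with hypothetical columns `value` and `qid`; the actual archive format in this repository may differ, and the QIDs in the demo are placeholders.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple


def load_archive(csv_text: str) -> Dict[str, str]:
    """Read archived mappings into {normalised value: Wikidata QID}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["value"].strip().casefold(): row["qid"] for row in reader}


def apply_archive(
    values: List[str], archive: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each value with its archived QID, or None if none is archived."""
    return [(v, archive.get(v.strip().casefold())) for v in values]


if __name__ == "__main__":
    # Placeholder QIDs, for illustration only.
    archive = load_archive("value,qid\nJig,Q0000001\nReel,Q0000002\n")
    for value, qid in apply_archive(["jig", "Polka"], archive):
        print(value, qid or "needs manual reconciliation")
```

Values are compared after trimming whitespace and case-folding, so small spelling differences in capitalisation do not prevent a previously verified mapping from being reused.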