CompMusic Turkish makam music corpus
To develop methodologies for the automatic analysis, discovery and exploration of the culture-specific characteristics of a music tradition, we need music collections that are representative of the studied aspects of that music. For Turkish makam music, various resources are available, such as audio recordings, music scores, lyrics and editorial metadata. However, most of these resources are not well suited to computational analysis: they are hard to access, of insufficient quality, or lacking adequate descriptive information.
To meet this need for representative music collections suitable for computational research, we have created a corpus of Turkish makam music. Its creation was guided by five criteria: purpose, coverage, completeness, quality and reusability.
The CompMusic Turkish makam music corpus consists of around 6,500 audio recordings, adding up to 412 hours of playback duration; 2,200 music scores with lyrics, encompassing 883,803 notes, 21,536 sections and 100 hours of nominal playback duration; and 13,500 instances of editorial metadata in MusicBrainz related to the classical and folk repertoires of Turkish makam music. These figures make it the largest corpus of Turkish makam music intended for computational research.
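To give a rough sense of scale per item, the figures quoted above can be turned into simple averages. The following is only a back-of-the-envelope sketch: the constant and function names are illustrative, and the inputs are the rounded counts from the text, so the results are indicative rather than exact.

```python
# Corpus figures as quoted in the text (several are rounded, e.g. "around 6,500").
N_RECORDINGS = 6500      # audio recordings
AUDIO_HOURS = 412        # total playback duration of the recordings
N_SCORES = 2200          # music scores with lyrics
N_NOTES = 883_803        # notes across all scores
N_SECTIONS = 21_536      # sections across all scores
SCORE_HOURS = 100        # nominal playback duration of the scores


def per_item_averages():
    """Return approximate per-recording and per-score averages."""
    return {
        "minutes_per_recording": AUDIO_HOURS * 60 / N_RECORDINGS,
        "minutes_per_score": SCORE_HOURS * 60 / N_SCORES,
        "notes_per_score": N_NOTES / N_SCORES,
        "sections_per_score": N_SECTIONS / N_SCORES,
    }


if __name__ == "__main__":
    for name, value in per_item_averages().items():
        print(f"{name}: {value:.1f}")
```

With these inputs, an average recording lasts a little under four minutes and an average score contains roughly 400 notes in about ten sections.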

Figure: The basic statistics of the CompMusic Turkish makam music corpus. The nodes represent the stored metadata, and the numbers on the arrows indicate the number of relationships between each type of metadata.
We have compiled several test datasets from the corpus, each providing ground truth for a specific computational task:
- Tonic identification dataset
- Symbolic phrase segmentation dataset
- Partial audio alignment dataset
- Section dataset
- Audio-score alignment dataset
- Makam recognition dataset
- Audio-lyrics alignment dataset
We have also started using the corpus to generate a knowledge base for a domain ontology describing Turkish makam music. We hope that this research corpus will facilitate academic studies in several fields such as music information retrieval and computational musicology.
References
[1] Uyar, B., Atlı, H. S., Şentürk, S., Bozkurt, B., and Serra, X. (2014). A corpus for computational research of Turkish makam music. In Proceedings of 1st International Digital Libraries for Musicology Workshop, pages 57–63, London, United Kingdom.