Music Information Retrieval Datasets

Georg Holzmann
Technical report

Neural Information Processing Group, TU Berlin, Germany


The availability of common datasets is very important for the progress of the music information retrieval (MIR) community. Whereas standard benchmark tasks are widely used in similar research areas (e.g. speech or handwriting recognition), music data is difficult to distribute freely because of very restrictive copyright laws. However, several groups try to overcome this problem by using music with a free license (e.g. Creative Commons) or by distributing only feature vectors and not the audio data.

This is an attempt to list the datasets that are already available. Similar resources for MIR tools, papers and conferences can be found at the web page. Furthermore, there is an annual contest held during the ISMIR Conference, the Music Information Retrieval Evaluation eXchange (MIREX), where groups can evaluate and compare the performance of their algorithms.

EDIT: see also the Candidate Music IR Test Collections by Donald Byrd at Indiana University!
