This page provides a list of the curated corpora used in Lrn2Cre8. For each corpus, we provide a brief description and, if applicable, a link where the data can be downloaded.
OPNDV files of the first book of J. S. Bach's Das Wohltemperierte Clavier
This archive contains OPNDV-format files for each of the movements in the first book of J. S. Bach's Das Wohltemperierte Clavier. The OPNDV format is described on pages 13-21 of David Meredith's D.Phil. thesis, which is available here. In brief, each file is a Lisp-style list of 4-tuples, such as
((4 FS4 4 2) (8 D4 4 2) (12 B3 4 2) (16 G4 4 2) (20 FS4 4 2) (24 B4 4 2)
(28 AS4 4 2) (32 E4 4 2) (36 DS4 4 2) (40 C5 4 2) (44 B4 4 2) (48 FS4 4 2)
(2400 B2 32 5) (2400 B3 32 3) (2400 DS4 32 2) (2400 B4 32 1))
Each tuple represents a note or sequence of tied notes in the notated score of the piece. The first element in each tuple is the onset time of the note in tatums. The second element gives the pitch name of the note in the format described on pages 37ff. of Meredith's thesis. The third element is the duration of the note in tatums, and the fourth element is a natural number indicating the voice to which the note belongs. The fourth element may be omitted when the score does not unambiguously indicate the voice of each note. Java code for processing pitch names can be found here. Java code for reading OPNDV files can be found here (see, specifically, the method called "fromOPND").
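For readers who do not work in Java, the tuple layout described above can be read with a few lines of Python. This is only a minimal sketch and not the project's own reader: the names `Note` and `parse_opndv` are hypothetical, onsets, durations and voices are assumed to be written as non-negative integers (as in the excerpt above), and pitch names are kept as uninterpreted strings rather than decoded according to the thesis.

```python
import re
from typing import List, NamedTuple, Optional


class Note(NamedTuple):
    """One OPNDV tuple (hypothetical helper type, not part of the corpus)."""
    onset: int            # onset time in tatums
    pitch_name: str       # pitch name, kept as a raw string (e.g. "FS4")
    duration: int         # duration in tatums
    voice: Optional[int]  # voice number, or None when the tuple has no fourth element


# Matches "(onset pitch duration)" or "(onset pitch duration voice)".
# Assumption: all numeric fields are non-negative integers.
_TUPLE_RE = re.compile(r"\(\s*(\d+)\s+([^\s()]+)\s+(\d+)(?:\s+(\d+))?\s*\)")


def parse_opndv(text: str) -> List[Note]:
    """Parse the text of an OPNDV file into a list of Note records."""
    notes = []
    for onset, pitch, duration, voice in _TUPLE_RE.findall(text):
        notes.append(
            Note(int(onset), pitch, int(duration), int(voice) if voice else None)
        )
    return notes


if __name__ == "__main__":
    sample = "((4 FS4 4 2) (8 D4 4 2) (12 B3 4 2)\n(2400 B4 32 1))"
    for note in parse_opndv(sample):
        print(note)
```

A regular expression is used here instead of a full Lisp reader because the files contain only a flat list of short tuples; a file that departs from the assumptions above (for example, non-integer onsets) would need a more careful parser.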
Lounge Corpus (Sony)
"Small Corpus", 112 pieces from Café del Mar, Hotel Coste and Buddha Bar. For manual analysis.
"Large Corpus", 1112 pieces from Café del Mar, Hotel Coste and Buddha Bar. For automatic analysis.
"Literal transcriptions", 24 pieces from the small corpus, entirely transcribed in MIDI and remade in multi-track audio.
Magaloff Corpus (OFAI)
A dataset comprising MIDI recordings of the majority of Chopin's piano works, performed live by Nikita Magaloff on a Bösendorfer SE computer-equipped grand piano in a series of six concerts in 1989 at the Konzerthaus in Vienna. The performances have been aligned to the corresponding symbolic scores of the pieces at the note level. The following statistics give an overview of the data:
|Playing time||10h 7m 52s|
|Matched grace notes||4289|
|Omitted grace notes||449|
This data set is proprietary. The recordings of Magaloff's performances are the property of Irène Magaloff. OFAI is licensed to use the data for research purposes.
Zeilinger Corpus (OFAI)
Symbolic scores of all 32 Beethoven piano sonatas, together with audio and symbolic recordings of 9 complete sonatas. Recorded January 3-5, 2013, on a Bösendorfer CEUS 290 at the Bruckner University Linz. Performed by Clemens Zeilinger (Op. 109 played by Hanoko, a pupil of C. Zeilinger).
Creator: Sebastian Flossmann
|alignment/alignmentfiles||Score-performance alignments. Files link identifiers of notes in the XML score to note numbers in Zeilinger's performance MIDI.|
|alignment/matchfiles||Score-performance alignments in match-format|
|performances/audio/takes||Audio recordings of all takes|
|performances/audio/audio_final||Final cut versions of all played sonatas|
|performances/audio/createdMixed||Final audio + audio synthesized from the cut MIDI files (mixed left/right)|
|performances/boesendorfer||The original Bösendorfer files|
|performances/midi_dp||Deadpan MIDI renderings of the XML scores, following the repetition structure prescribed by the score|
|performances/midi_dp_sharpeye||Deadpan MIDI renderings exported from SharpEye OMR software|
|performances/midi_zeilinger||Zeilinger's performances as MIDI files.|
|scores/coordinates||Geometrical information of all notes in the XML files (note ID, page number, x-coordinate, y-coordinate)|
|scores/images||Scanned score images|
|scores/mro||SharpEye files after scanning + correction|
|scores/rawXML||XML Files without coordinates (SharpEye after corrections)|
|scores/scr||Scores with repetition structure as printed ("left half" of matchfiles)|
|scores/xml||(Enhanced) XML Files of the scores.|
|scores/sco_zeilinger||Scores with repetition structure as played by C. Zeilinger|
This corpus is proprietary and cannot be distributed.
Uplifting Trance Anthem Loop Corpus (EHU)
This corpus contains 100 loops from uplifting trance anthems, derived from uplifting trance mixes on YouTube (search for "uplifting trance"). All chords have diatonically spelled roots, and the corpus contains only major and minor triads.