Curated corpora

This page provides a list of the curated corpora used in Lrn2Cre8. For each corpus, we provide a brief description and, if applicable, a link where the data can be downloaded.

OPNDV files of the first book of J. S. Bach's Das Wohltemperierte Clavier

This archive contains OPNDV-format files for each of the movements in the first book of J. S. Bach's Das Wohltemperierte Clavier. The OPNDV format is described on pages 13-21 of David Meredith's D.Phil. thesis, which is available here. In brief, each file is a Lisp-style list of 4-tuples, such as

((4 FS4 4 2) (8 D4 4 2) (12 B3 4 2) (16 G4 4 2) (20 FS4 4 2) (24 B4 4 2)
(28 AS4 4 2) (32 E4 4 2) (36 DS4 4 2) (40 C5 4 2) (44 B4 4 2) (48 FS4 4 2)
(2400 B2 32 5) (2400 B3 32 3) (2400 DS4 32 2) (2400 B4 32 1))

Each tuple represents a note or sequence of tied notes in the notated score of the piece. The first element in each tuple is the onset time of the note in tatums. The second element gives the pitch name of the note in the format described on pages 37ff. of Meredith's thesis. The third element is the duration of the note in tatums, and the fourth element is a natural number indicating the voice to which the note belongs. The fourth element may be omitted when the score does not unambiguously indicate the voice of each note. Java code for processing pitch names can be found here. Java code for reading OPNDV files can be found here (see, specifically, the method called "fromOPND").
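
For readers who do not use the Java code, the tuple layout described above can be read with a few lines of Python. This is only a minimal sketch, not the project's reader: the names Note and parse_opndv are hypothetical, onsets and durations are assumed to be written as integers, and the pitch name is kept as an opaque string rather than decoded.

```python
import re
from typing import List, NamedTuple, Optional


class Note(NamedTuple):
    """One OPNDV tuple (hypothetical container, not part of the corpus tools)."""
    onset: int             # onset time in tatums
    pitch_name: str        # pitch name token, kept as-is (e.g. "FS4")
    duration: int          # duration in tatums
    voice: Optional[int]   # voice number, or None when the tuple has no fourth element


# Matches "(onset pitch duration [voice])". Assumption: numbers are plain integers.
_TUPLE_RE = re.compile(
    r"\(\s*(-?\d+)\s+([^\s()]+)\s+(\d+)(?:\s+(\d+))?\s*\)"
)


def parse_opndv(text: str) -> List[Note]:
    """Return every note tuple found in an OPNDV-style Lisp list."""
    notes = []
    for match in _TUPLE_RE.finditer(text):
        onset, pitch_name, duration, voice = match.groups()
        notes.append(
            Note(
                int(onset),
                pitch_name,
                int(duration),
                int(voice) if voice is not None else None,
            )
        )
    return notes


sample = "((4 FS4 4 2) (8 D4 4 2) (12 B3 4 2) (2400 B4 32 1))"
notes = parse_opndv(sample)
print(notes[0])  # Note(onset=4, pitch_name='FS4', duration=4, voice=2)
```

Because the pattern only looks for the inner tuples, the enclosing outer parentheses and any line breaks between tuples are ignored.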

Lounge Corpus (Sony)

"Small Corpus", 112 pieces from Café del Mar, Hotel Coste and Buddha Bar. For manual analysis.

"Large Corpus", 1112 pieces from Café del Mar, Hotel Coste and Buddha Bar. For automatic analysis.

"Literal transcriptions", 24 pieces from the small corpus, entirely transcribed in MIDI and remade in multi-track audio.

Magaloff Corpus (OFAI)

A dataset comprising MIDI recordings of the majority of Chopin's piano works, performed live by Nikita Magaloff on a Bösendorfer SE computer-equipped grand piano in a series of six concerts at the Konzerthaus in Vienna in 1989. The performances have been aligned at the note level with the corresponding symbolic scores of the pieces. The following statistics give an overview of the data:

Pieces/Movements        155
Score pages             930
Score notes             328,800
Performed notes         335,542
Playing time            10h 7m 52s
Matched notes           318,112
Inserted notes          12,325
Omitted notes           11,506
Substituted notes       5,105
Matched grace notes     4,289
Omitted grace notes     449
Trill notes             5,923
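
As a quick sanity check on these figures, reading the large values as whole numbers with thousands separators, the matched, inserted and substituted counts add up exactly to the number of performed notes. That these three categories partition the performed notes is an inference from the numbers, not something stated by the corpus documentation. The variable names below are illustrative only.

```python
# Figures copied from the statistics above.
matched = 318_112
inserted = 12_325
substituted = 5_105
performed = 335_542

# Assumed reading: every performed note is matched, inserted, or substituted.
total = matched + inserted + substituted
print(total == performed)  # True
```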

This dataset is proprietary. The recordings of Magaloff's performances are the property of Irène Magaloff. OFAI is licensed to use the data for research purposes.

Zeilinger Corpus (OFAI)

Symbolic scores of all Beethoven piano sonatas (32 pieces). Audio and symbolic recordings of 9 complete sonatas, recorded on 3-5 January 2013 on a Bösendorfer CEUS 290 at the Bruckner University Linz. Performed by Clemens Zeilinger (Op. 109 played by Hanoko, a pupil of C. Zeilinger).

Creator: Sebastian Flossmann


alignment/alignmentfiles         Score-performance alignments. These files link identifiers of notes in the XML score to note numbers in Zeilinger's performance MIDI.
alignment/matchfiles             Score-performance alignments in match format.
performances/audio/takes         Audio recordings of all takes.
performances/audio/audio_final   Final cut versions of all played sonatas.
performances/audio/createdMixed  Final audio plus audio synthesized from the cut MIDI files (mixed left/right).
performances/boesendorfer        The original Bösendorfer files.
performances/midi_dp             Deadpan MIDI renderings of the XML scores, following the repetition structure prescribed by the score.
performances/midi_dp_sharpeye    Deadpan MIDI renderings exported from the SharpEye OMR software.
performances/midi_zeilinger      Zeilinger's performances as MIDI files.
scores/coordinates               Geometrical information for all notes in the XML files (note ID, page number, x-coordinate, y-coordinate).
scores/images                    Scanned score images.
scores/mro                       SharpEye files after scanning and correction.
scores/rawXML                    XML files without coordinates (SharpEye output after corrections).
scores/scr                       Scores with repetition structure as printed ("left half" of the matchfiles).
scores/xml                       (Enhanced) XML files of the scores.
scores/sco_zeilinger             Scores with repetition structure as played by C. Zeilinger.

This corpus is proprietary and cannot be distributed.

Uplifting Trance Anthem Loop Corpus (EHU)

This corpus contains 100 loops from uplifting trance anthems, derived from uplifting trance mixes on YouTube (search for "uplifting trance"). All chords have diatonic spelled roots, and the corpus contains only major and minor triads.