

{"id":68,"date":"2023-01-02T15:22:47","date_gmt":"2023-01-02T14:22:47","guid":{"rendered":"https:\/\/project.inria.fr\/codex\/?page_id=68"},"modified":"2023-01-04T11:56:38","modified_gmt":"2023-01-04T10:56:38","slug":"activities","status":"publish","type":"page","link":"https:\/\/project.inria.fr\/codex\/activities\/","title":{"rendered":"Activities"},"content":{"rendered":"<p><\/p>\n<h2>Research Interests<\/h2>\n<ul>\n<li>Symbolic\u00a0Music information processing<\/li>\n<li>Quantitative extensions of formal language models<\/li>\n<li>Structured representations of\u00a0Music notation\n<ul>\n<li>Hierarchical representations of digital music scores<\/li>\n<li>\u2028Prior languages of music notation style<\/li>\n<\/ul>\n<\/li>\n<li>Search and Retrieval in symbolic musical content<\/li>\n<li>Similarity metrics and edit-distances<\/li>\n<\/ul>\n<h2>Research Topics<\/h2>\n<h4>Automated Music Transcription<\/h4>\n<p>Trained musicians,\u00a0while listening to music performances,\u00a0are able\u00a0to transcribe them into Common Western music notation. We are studying computational methods to automate this process by\u00a0<strong>parsing<\/strong> an input sequence of timed music events\u00a0(a\u00a0MIDI file) into\u00a0a structured music score in a format such as XML\/MEI.\u00a0Our approach\u00a0is\u00a0based on two main ingredients.<\/p>\n<p>First, we use <a href=\"https:\/\/hal.inria.fr\/hal-01857267v4\"><strong>quantitative language theoretical<\/strong><\/a>\u00a0models and techniques, in order to<\/p>\n<ul>\n<li>represent the notation style aimed for the output score, by means of weighted tree grammars,<\/li>\n<li>compute a distance between output sequence and output notation, with weighted pushdown transducers,<\/li>\n<li>compose the above weighted models, and apply optimization algorithms in order to extract the best (weighted) parsing solutions.<\/li>\n<\/ul>\n<p>Second, we created an abstract <strong>Intermediate Representation<\/strong> of music scores, hierarchical, that can be built from the (parse-trees) returned by the above parsing algorithms, and exported into XML score formats. The design of this IR is a research topic per se, described below.<\/p>\n<p>This method enables to study of theoretical foundations for the problem of music transcription, and it instantiates into several case studies, such as the following.<\/p>\n<p><strong>Monophonic transcription<\/strong><\/p>\n<p>In the case of melodies with at most one note sounding at a time, we were able to\u00a0improve the level of complexity and details of transcriptions, in particular for handling ornaments and rests,\u00a0<span style=\"font-family: 'Source Sans Pro';\">as illustrated in these <\/span><a href=\"https:\/\/qparse.gitlabpages.inria.fr\/docs\/examples\/\">examples<\/a><span style=\"font-family: 'Source Sans Pro';\">.<\/span><\/p>\n<p>The above examples were taken from a <a href=\"https:\/\/gitlab.inria.fr\/qparse\/corpus\">dataset<\/a> made of about 300 extracts from the classical repertoire, with increasing rhythmic and melodic complexity, provided as XML (scores) and MIDI (performance) files.\u00a0<span style=\"font-family: 'Source Sans Pro';\">We have built this dataset for the evaluation and training of our tools. 
**Monophonic transcription**

In the case of melodies with at most one note sounding at a time, we were able to improve the complexity and level of detail of transcriptions, in particular for the handling of ornaments and rests, as illustrated in these [examples](https://qparse.gitlabpages.inria.fr/docs/examples/).

The above examples were taken from a [dataset](https://gitlab.inria.fr/qparse/corpus) made of about 300 extracts from the classical repertoire, of increasing rhythmic and melodic complexity, provided as XML (score) and MIDI (performance) files. We built this dataset for the evaluation and training of our tools. It should also be useful to the community as a benchmark for evaluating the transcription of complex melodies.

**Bassline transcription**

In collaboration with John Xavier Riley from [C4DM](http://c4dm.eecs.qmul.ac.uk) at Queen Mary University London, we are working on an approach for **end-to-end transcription** of jazz basslines, following a two-step workflow: a MIDI file is first extracted from audio recordings of jazz standards, with source separation, pitch estimation, and onset estimation techniques, and our transcription tools are then applied as a back-end to produce a music score. Significant improvements have been made to the techniques developed, in order to deal with **swing** and with the pitch-spelling issues (see below) particular to jazz. Some preliminary results may be found [here](https://gitlab.inria.fr/qparse/bassbook) and [here](https://gitlab.inria.fr/lyrodrig/pyqparse/-/tree/main/tests/filobass/dataset).

**Drum transcription**

The output of electronic drum kits can be recorded into MIDI files. Google Magenta provided a [dataset](https://magenta.tensorflow.org/datasets/groove) of more than 13 hours of MIDI recordings of drummers on a [Roland TD-11](https://www.roland.com/us/products/td-11/) kit. After some successful first experiments, presented at an international [workshop](https://hal.archives-ouvertes.fr/hal-03815760v3) and [conference](https://hal.archives-ouvertes.fr/hal-03847232), we are now conducting an effort to transcribe this dataset into drum scores with our tools. This work, carried out in the context of the thesis of Lydia Rodriguez-de la Nava, benefits from the expertise of Martin Digard, a professional drummer holding a Master's degree in Natural Language Processing from INALCO. It is also the first real-case application of our transcription approach to a polyphonic instrument.

**Piano transcription**

Scaling up from monophonic to polyphonic instruments is a difficult step in the context of transcription. This is one of the main topics of the thesis of Lydia Rodriguez-de la Nava, who is developing voice-separation algorithms for this purpose and integrating them into our transcription framework and models. This work is supported by the [ASAP](https://hal-cnam.archives-ouvertes.fr/hal-02929324) dataset of linked piano scores and MIDI performances, published in collaboration with Francesco Foscarin and Andrew Mc Leod.

#### Music Score Model

We are developing an abstract **Intermediate Representation** of music scores for various Music Information Retrieval problems, such as transcription and music score analysis. It enables us to handle various score file formats (XML or plain text) as input or output, and to reason about and transform music content without the hassle of decoding these formats.

The main originality of our model is its **tree structure**, even at low levels (*e.g.*, for the description of rhythms). A salient feature of music notation is indeed its hierarchical nature: events are grouped into bars, tuplets, and beams, and durations are defined proportionally and composed with ties and dots.

This model is used in particular in our transcription procedures for post-processing transformations, by **term rewriting** of the scores obtained, as sketched below. It was also used for training automata models on corpora of digital music scores.
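As a rough illustration of this tree structure and of rewriting, the sketch below defines a toy rhythm-tree IR, much poorer than the actual one: leaves are notes, rests, or continuations ("ties") of the previous event, inner nodes divide their duration equally among their children, and a bottom-up pass applies two hypothetical rewrite rules that remove superfluous subdivisions.

```python
# Toy tree-structured score IR (illustration only, not the actual IR).
from dataclasses import dataclass

@dataclass
class Leaf:
    kind: str                     # "note" | "rest" | "tie" (continuation)

@dataclass
class Node:
    children: list                # equal subdivision of this node's duration

def rewrite(t):
    """Bottom-up term rewriting: drop subdivisions the notation does not need."""
    if isinstance(t, Leaf):
        return t
    kids = [rewrite(c) for c in t.children]
    # Rule 1: a note followed only by its continuations is a single note.
    if (isinstance(kids[0], Leaf) and kids[0].kind == "note"
            and all(isinstance(k, Leaf) and k.kind == "tie" for k in kids[1:])):
        return Leaf("note")
    # Rule 2: a group made only of rests is a single rest.
    if all(isinstance(k, Leaf) and k.kind == "rest" for k in kids):
        return Leaf("rest")
    return Node(kids)

# A quarter split into two tied eighths rewrites to a plain quarter note,
# and a fully silent ternary group collapses to one rest.
print(rewrite(Node([Leaf("note"), Leaf("tie")])))
print(rewrite(Node([Node([Leaf("rest"), Leaf("rest")]), Leaf("rest"), Leaf("rest")])))
```

In the real IR, rules of this kind normalize the raw parse trees returned by the transcription procedure before they are exported to XML score formats.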
#### Voice separation

During her thesis, Lydia Rodriguez-de la Nava is developing algorithms for the separation of polyphonic music content into voices, distinguishing the different melodies, or chords, that we are able to perceive when listening. This problem is crucial in the context of music transcription, in order to produce easy-to-read scores where the lines of melodies, accompaniments, bass lines, *etc.*, are clearly represented.

She proposes a voice-separation algorithm based on principles of perception and flexible with respect to the genre of music and the instrument. Roughly, it searches for a shortest path in a graph whose vertices are the partitions of events into voices at every date. Rules associate a cost to every such vertex, and another cost to the transitions between vertices at successive dates (see the sketch below). This algorithm is evaluated on datasets of the [music21](https://web.mit.edu/music21/) toolkit and on our [ASAP](https://hal-cnam.archives-ouvertes.fr/hal-02929324) dataset.
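The toy sketch below conveys the shortest-path idea under strong simplifying assumptions: exactly two voices, exactly two simultaneous notes at every date, and a single invented cost rule (voices prefer small melodic leaps). The actual algorithm handles arbitrary partitions of events and perceptually motivated costs for both vertices and transitions.

```python
from itertools import permutations

def separate(chords):
    """chords: one tuple of simultaneous MIDI pitches per date (same size)."""
    # One state per assignment of the first chord's notes to voices;
    # each state stores the best cost so far and the corresponding path.
    states = {perm: (0, [perm]) for perm in permutations(chords[0])}
    for chord in chords[1:]:
        new_states = {}
        for perm in permutations(chord):
            # Transition cost: sum of the melodic leaps inside each voice.
            cost, path = min(
                (c + sum(abs(a - b) for a, b in zip(prev, perm)), p)
                for prev, (c, p) in states.items()
            )
            new_states[perm] = (cost, path + [perm])
        states = new_states
    return min(states.values())

# Two interleaved lines: a lower voice around 60 and an upper one around 72.
cost, voices = separate([(60, 72), (74, 62), (64, 76)])
print(cost)                                  # total melodic movement: 8
for i in range(len(voices[0])):
    print(f"voice {i}:", [v[i] for v in voices])
```

Even this crude cost rule recovers the two stepwise lines (60, 62, 64 and 72, 74, 76) from the interleaved input, which is the effect the perceptual rules aim for in the general setting.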
#### Pitch Spelling

The pitch of music notes is represented in the [MIDI standard](https://mitpress.mit.edu/9780262193948/beyond-midi/) by an integer note number, one per semitone (middle C is 60). In Common Western music notation, however, a pitch is described by a note name (the distance between consecutive names being one tone or one semitone) and an optional accidental, a positive or negative number of semitones represented by sharp or flat symbols. Therefore, several notations are possible for each note number (for instance *C#* or *Db* for 73). Choosing appropriate ones is not obvious, as it depends on the context (melodic direction, tonal context...).

We have proposed two approaches for the joint estimation of **pitch spelling** and **key signatures** from MIDI files; a small illustration of the spelling ambiguity follows the list.

- A first procedure, [pkspell](https://hal.archives-ouvertes.fr/hal-03300102), is data-driven, based on the training of a deep recurrent neural network model on the [ASAP](https://hal-cnam.archives-ouvertes.fr/hal-02929324) piano dataset. It has been evaluated on [ASAP](https://hal-cnam.archives-ouvertes.fr/hal-02929324) and on the [MuseData](https://doi.org/10.1080/09298210600834961) dataset dedicated to pitch spelling.
- A second, more recent procedure is algorithmic, based on dynamic-programming techniques for minimizing the number of symbols in the notation. It is currently under evaluation.
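To illustrate the ambiguity that both procedures must resolve, the sketch below enumerates the candidate spellings of a MIDI note number, restricted for simplicity to at most one sharp or flat. An actual speller then has to choose among these candidates according to the tonal context, which is precisely the difficult part.

```python
# Toy enumeration of enharmonic spellings (illustration only; pkspell and
# the dynamic-programming procedure above are far more elaborate).
NAMES = {0: "C", 2: "D", 4: "E", 5: "F", 7: "G", 9: "A", 11: "B"}  # naturals

def spellings(midi):
    """All spellings of a MIDI note number using at most one sharp or flat."""
    pc = midi % 12
    out = []
    for base_pc, name in NAMES.items():
        for alt, sym in ((0, ""), (1, "#"), (-1, "b")):
            if (base_pc + alt) % 12 == pc:
                octave = midi // 12 - 1
                # The octave number follows the letter name: B#3 == C4, Cb4 == B3.
                if name == "B" and alt == 1:
                    octave -= 1
                elif name == "C" and alt == -1:
                    octave += 1
                out.append(f"{name}{sym}{octave}")
    return out

print(spellings(73))   # ['C#5', 'Db5']: two spellings of the same key
print(spellings(64))   # ['E4', 'Fb4']: even natural notes can be respelled
```

Note that even natural notes admit alternative spellings, which is why the choice interacts with the estimation of the key signature.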
#### Melodic similarity evaluation

We are studying similarity metrics between melodies, defined as **edit distances** for character strings (ED) and for labeled trees (TED). In particular, with Mathieu Giraud ([Algomus](http://www.algomus.fr) team, Lille), we have conducted an in-depth [theoretical study](https://hal.inria.fr/hal-01857267v4) of the computability of an ED introduced in 1990 by [Mongeau and Sankoff](https://doi.org/10.1007/BF00117340) and widespread in the Music Information Retrieval community.

Moreover, we are developing solutions for comparing digital music score files. This cannot be achieved reliably by running a plain Unix diff on the two XML text files, because of the ambiguity and verbosity of score formats (not to mention the incompatibilities between different formats, between versions of the same format, and between files produced by different software). With Francesco Foscarin, we have proposed a procedure based on an ad-hoc similarity metric combining several EDs and TEDs, much more involved than the comparison of text files because of the complex structure of music scores. This approach has been presented at this [conference](https://hal.inria.fr/hal-02267454v2) and this [conference](https://hal.inria.fr/hal-02309923). It is currently used in a case study on crowdsourced correction of Optical Music Recognition output for a musicological collection, with the IReMus lab.

#### Digital music score collections

All the above research topics find application in the constitution, management, and search and retrieval of databases of music scores, for cultural heritage preservation and study.