The Best of ISMIR 2022
And finally, after 2 years with virtual-only ISMIRs, this year we had the first post-pandemic in-person ISMIR. And what a fantastic one it has been! My mind is still in constant explosion mode due to the infinity of this beautiful country –that should actually be its own continent due to its sheer amount of foods, cultures, languages, regions, religions, music, etc– called India.
Top 10 Papers
Here I share my top 10 favorite works presented at this year’s ISMIR. Hard to choose only 10, but here we go. Enjoy, titans!
Musika! Fast Infinite Waveform Music Generation (Pasini & Schlüter)
Jan Schlüter doing his magic again, this time with Marco Pasini leading the effort! These titans propose a way to generate music in the waveform domain in an extremely efficient way (several orders of magnitude faster than real time, even on a consumer-grade CPU!). Check out their demo page. All their source code is available online.
Toward postprocessing-free neural networks for joint beat and downbeat estimation (Chen & Su)
I really enjoyed how this work addresses the post-processing part of beat trackers. Post-processing is usually an under-explored part of beat trackers that typically gives a significant boost in performance. Chen & Su propose an end-to-end solution that gives results comparable to SOTA.
Performance MIDI-to-score conversion by neural beat tracking (Liu et al.) Best Paper Award
The titans from TikTok decided to improve performance MIDI to score transcription via beat tracking. Performance MIDI is the one typically generated on-the-fly when performing. Using a beat tracker for this task is simple, smart, and extremely effective. No wonder why they won best paper award, totally deserve it!
Multi-instrument Music Synthesis with Spectrogram Diffusion (Hawthorne et al.)
The folks at Google propose this new neural-based synthesizer to generate high quality waveforms from MIDI files. They play around with a DDPM to generate the spectrogram from the MIDI files, and they show that it is superior than using regular Transformers. Check out their source code and demos here.
Supervised and Unsupervised Learning of Audio Representations for Music Understanding (McCallum et al.)
Well, the folks from Pandora did it again! They trained massive supervised and unsupervised models, and showed that their embeddings are so powerful that obtain SOTA in several MIR metrics. Their unsupervised model (named MULE after Unsupervised Large Embeddings) is open source and its weights are also available here.
Contrastive Audio-Language Learning for Music (Manco et al.)
The intersection of text and music is a fascinating area of research that is gaining more and more attention. This work proposes MusCALL, a framework for Music Contrastive Audio-Language Learning, where one could use natural language to query a large music collection to retrieve the desired track or tracks. Interesting how the authors proposed a weighted version of the InfoNCE loss. Cool stuff, all available in open source here.
Music Separation Enhancement with Generative Modeling (Schaffer et al.)
The titans at Descript and Northwestern propose this method called Make it Sound Good (MSG) that enhances the output of source separation techniques by hallucinating additional components to make them sound more realistic. I always thought that regular source separation techniques needed a little boost, because simply extracting parts of the signal might be insufficient for good quality content (the signal might be too buried in the mix). Check out their demo here.
Sonus Texere! Automated Dense Soundtrack Construction for Books using Movie Adaptations (Shriram et a.) Brave New Idea Award
I love that there’s a new “Brave New Idea Award,” and that this paper won it! This work proposes a simple technique to add soundtracks to books, based on their texts. I think the topic of books (particularly audiobooks) is highly under-explored, and it makes me happy to see this kind of contributions in ISMIR. Congrats on the award!
Traces of Globalization in Online Music Consumption Patterns and Results of Recommendation Algorithms (Lesota et al.) Best Student Paper Award
I love this! Cultural Imperialism is real, and in this work, the authors show how music streaming services tend to overindex on tracks from US and underindex on music from other countries. Who wants to live in a globalized world where music is dominated by a single country? Not me! This paper motivates us to address such biases.
Automatic music mixing with deep learning and out-of-domain data (Martínez-Ramírez et al.)
Interesting approach to the fascinating problem of automatic mixing, where the authors propose to leverage “wet” instrument recordings to train a model that later can input dry inputs to generate a final mix. Check out the code and demos here.
Special Mentions et al.
I did not want to end without mentioning the best special call award, which this year was given to the work titled “Raga Classification From Vocal Performances Using Multimodal Analysis” by Clayton et al. I love how they showed that by using video data, they are able to better classify Indian ragas. Really cool multimodal stuff!
Finally, and as it was finally announced during the last day of the conference, next ISMIR is going to be in Milano. And ISMIR 2024 will be… in the San Francisco Bay Area organized by Blair Kaneshiro, Gautham Mysore, and Yours Truly! I’m beyond excited to have the chance of organize such a conference. We have so much work ahead, but I can’t wait to hang out with all my fellow MIR researchers in California in a couple of years!
But before that, see you next year in Milano, dearest titans!