The Best of ISMIR 2021

Mon, Nov 15, 2021 10-minute read

Last Friday, the 22nd International Society for Music Information Retrieval Conference ended. And what a fantastic experience it was! I had the privilege to serve as the social chair, and I got to organize Yoga (with Lisa Light), Trivia (with Rachel Bittner), music shows (with Sabertooth Unicorn), and the first ever music jam session in a conference where several musicians played live through the internets thanks to JackTrip!

Once again, ISMIR proves to be my favorite conference thanks to its amazingly welcoming and talented community. I would say diversity and ethics are playing a more prominent (and necessary!) role in the conference, and feels like the field is moving towards an even more mature understanding and application of artificial intelligence in music.

Without further ado, here you have my top 10 favorite papers (highly biased towards my background, i.e., music tagging, structure analysis, music embeddings, music recommendation), with especial mention to tutorials, keynotes, and other satellite events.

See you next year in Bengaluru (pandemics permitting)!

Top 10 Papers

Artist Similarity Using Graph Neural Networks by Filip Korzeniowski, Sergio Oramas, and Fabien Gouyon

This work, which won the best poster presentation award and was nominated to best paper award, makes use of Graph Convolutional Neural Networks to add hierarchical relationships to the artist similarity problem. Really elegant solution with a clear increase in performance to this MIR task (even slightly better when combined with low-level audio features). Also loving the acronym of the dataset they put together: OLGA (Oh, what a Large Graph of Artists! –lol).

Codified audio language modeling learns useful representations for MIR by Rodrigo Castellon, Chris Donahue, and Percy Liang

I’m a big fan of this work, which was nominated to best paper. The authors explore features from Jukebox and show how they can further use them in shallow models to obtain around 30% increase in performance on average (compared to MusiCNN) on several MIR tasks: tagging, genre classification, key detection, and mood recognition. Fantastic work that shows how to work on MIR tasks with limited labeled data where transfer learning fails. BERT for music, more or less :-)

Contrastive Learning of Musical Representations by Janne Spijkervet and John Ashley Burgoyne

Loving this trend of self-supervision on music. In this paper, the authors use contrastive learning in a self-supervised fashion (i.e., no labels required), and they show their performance on the music tagging task in MagnaTagATune and the Million Song Dataset. Moreover, they also show how effective in terms of computation these learned embeddings are for music classification. While the overall work is excellent, I need to point out the amazing repo they put together to reproduce their results: Oh, and they also wrote a great blog post about it!

Cross-cultural Mood Perception in Pop Songs and its Alignment with Mood Detection Algorithms by Harin Lee, Frank Hoeger, Marc Schoenwiesner, Minsu Park, and Nori Jacoby

Interesting study that shows how, generally, mood algorithms correlate with actual human ratings. While the study is limited in terms of cultural diversity (it’s only focused on people from Brazil, South Korea, and the US) and music genres (pop music only), it shows that, at least within these countries and music genre, the aggregation of mood algorithms could be used as an objective metric. The study is done over 166 participants, which is quite a large number for an ISMIR publication (music perception and cognition people cry on the inside).

DadaGP: A Dataset of Tokenized GuitarPro Songs for Sequence Models by Pedro Sarmento, Adarsh Kumar, CJ Carr, Zack Zukowski, Mathieu Barthet, and Yi-Hsuan Yang

GuitarPro data have been released! I remember back in 2013 and 2014 when Eric Humphrey was playing with it. Since then, I was hoping that somebody would release such rich and powerful data, and I’m so happy that they did that in this work. Such data can open up so many possibilities not only in the domain of music generation (focused on guitar solos!) but also in terms of chord recognition and even music segmentation (if the authors release segmentation data associated with GP files). Not sure what the titans at DadaBots will do with these data, but whatever it is, I’m sure they’ll keep blowing minds.

De-centering the West: East Asian Philosophies and the Ethics of Applying Artificial Intelligence to Music by Rujing Huang, Bob Sturm, and Andre Holzapfel

This year ISMIR had a cultural diversity special call for papers –a wonderful idea I would say– and this work won the best special call award. And so well-deserved! The authors propose a framework of ethical guidelines for music AI applications in East Asia, making connections with Confucianism, Budhism, and other non-Western philosophies. This is such important work, that brings a breath of fresh air to the MIR community

Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition by Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, and Bryan Pardo

This year’s best paper award was given to this wonderful piece of work. The authors propose the usage of prototypical networks to define a set of hierarchies to improve instrument recognition (e.g., a trumpet is more closely related to a trombone than to a guitar). Moreover, they show how by adding this hierarchical information with a simple and tunable hierarchical loss, the models make less severe mistakes. The idea is so simple, their implementation so elegant, and their results so effective, that it is easy to see why this work got its so well-deserved award.

Semi-supervised Music Tagging Transformer by Minz Won, Keunwoo Choi, and Xavier Serra

Really interesting application of Transformers for music tagging, showing a slight increase in performance with a significant reduction in terms of model parameters. I liked the “noisy student training” process to leverage unlabeled data, so that a limited amount of labeled data is needed to get SOTA results. Feels like tricks from the NLP community are landing strongly on MIR this year.

Supervised Metric Learning For Music Structure Features by Ju-Chiang Wang, Jordan B. L. Smith, Wei-Tsung Lu, and Xuchen Song

Music segmentation is back with a vengeance! The titans from ByteDance trained a model using metric learning that obtains representations for music structure analysis that make the most popular existing methods significantly better. This is not too different than our work at this year’s ISMIR, which, in turn is quite inspired by the great work that McCallum did back in 2019. Standing on the shoulders of giants (or, rather, titans).

The Words Remain the Same: Cover Detection with Lyrics Transcription by Andrea Vaglio, Romain Hennequin, Manuel Moussallam, and Gaël Richard

Finally a modern approach to cover song detection using lyrics. Intuitively, I always thought that lyrics would be one of the strongest signals to identify covers. And this paper shows precisely that, even when using noisy lyrics extracted using singing lyrics detection. Moreover, the authors show that these data are complementary to tonal-based features. Once again, cool stuff from the good people at Deezer and Télécom Paris.

Special Mentions

It was quite challenging to select 10 papers only, so here are a couple more that I also think deserve a special mention:

Really interesting and original MIR application: to identify the best piece of music given the text of a story. Winner of the best student paper award.

Another cool multimodal work: to better understand the relationship between music and its respective music video. I really liked the usage that they gave to the Harmonix Set (something that we never thought of when we put it together!). Winner of the best student poster presentation award (such a hilarious video they put together!).


Here I give a brief description of the tutorials I attended. Overall, I was so impressed by the high quality of their presentations and their companion materials. So, so good. NB: I actually wish I could’ve attended the tutorial on “Tempo, Beat, and Downbeat Estimation” too, but it overlapped with another tutorial and there are only so many hours in a day!

Programming MIR Baselines from Scratch: Three Case Studies by Rachel Bittner, Mark Cartwright, and Ethan Manilow

Stellar line-up, and really great way to get your MIR projects going. I loved how Ethan talked about matrix factorization as a more classical approach to source separation (I have a sweet spot for NMF :)). And then Rachel just blew everybody’s minds by training a pitch detector from scratch that worked amazingly well. ALL LIVE. And finally Mark closes it up by showing how to train/fine-tune models to do instrument recognition. Highly, highly recommended.

Music Classification: Beyond Supervised Learning, Towards Real-world Applications by Minz Won, Janne Spijkervet, and Keunwoo Choi

Another rock-star line-up here! And in this case… holy smokes, they wrote a book! This is definitely my jam. So much great recommendations for music classification, with the latest advances in music understanding and fancy transformer architectures and self-supervision. I’ll add the book to my “To Read” List on Goodreads.

Designing Generative Models for Interactive Co-creation by Anna Huang, Jon Gillick, and Chris Donahue

That was the last tutorial for the day, and again, I was highly impressed by its quality. I’m a fan of Anna Huang’s work at Magenta, and it was really interesting to see all the different designing principles when designing new potential instruments, as discussed by Jon and Chris. The Bach Doodle by Anna was pretty mind-blowing (even when I had already seen it before!).


There were three main keynotes this year (all of them by highly influential women in the field, nice job organizers!!), and I definitely recommend watching them all. However, I will give a special mention to Dr. Laurel Smith Pardue for the immense keynote entitled Improving Diversity and Inclusivity Through Action: Less Talk, More Action! Dr. Pardue gave an overview of the initiatives they have been implementing in NIME during these past few years, and challenged our community to do more to make MIR an even more thriving research field by increasing its diversity. Watching this keynote made me hopeful about the future of MIR, even when the demographics of this year’s ISMIR are, once again, far from representing our beautifully diverse world as a whole.

WiMIR Workshop

I wanted to conclude with the satellite event that WiMIR hosted. This workshop had several project guides talking about relevant problems in MIR, and aimed to encourage people from underrepresented communities (e.g., women, LGBTQ+, BIPOC) to pursue research in the field of MIR. I was lucky enough to be invited as one of these guides, thank you WiMIR, I had a blast! As a reminder, the mentorship program is open for mentors and mentees to sign up, so do it you titan, it’s such an amazing program that can be both rewarding for the mentor and mentee. Seriously, thanks to WiMIR, the MIR field is getting more diverse every year. Kudos to that!


ISMIR is awesome. Even when it has to be hosted on the Internets. I can’t wait to see the beautiful faces of this community next year in Bangalore. After almost two years with this terrible virus among us, I think we all deserve to hang out in real life and celebrate this amazing life. And I can’t think of a better group of humans to do it with than these passionate musicians who happen to also know how to computationally dissect all of these sonic experiences to make a world a better place.