TL;DR: we present efficient models for the task of Spoken Language Recognition plus an effective strategy to gracefully handle unsupported languages via multilabel classification.

The beautiful island of Rhodes welcomed the most prestigious signal processing conference in the world this past week.

TL;DR: we thoroughly analyzed state-of-the-art audio-text multimodal models and they do not fully leverage natural language.

And finally, after two years of virtual-only ISMIRs, this year we had the first post-pandemic in-person ISMIR.

I keep seeing wonderful pictures from this year’s Burning Man, and they inspired me to share a terrible one, to counterbalance.

This year at ICASSP we’re presenting a novel technique to enhance a music signal via generative adversarial networks and diffusion probabilistic models.

And the year 12021 of the Human Era (i.e., 2021 AD, but let’s try not to be religious here) just ended.

Last month, I was invited to give a talk at MagniMind Academy.
