Publications
For further stats and details, check out my Google Scholar Profile.
Peer Reviewed Articles
- Generative Audio Extension and Morphing, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
- Mix2Morph: Learning Sound Morphing From Noisy Mixes, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
- PromptSep: Generative Audio Separation Via Multimodal Prompting, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
- AudioCards: Structured Metadata Improves Audio Language Models For Sound Design, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
- Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv
- SoundStager: Interactive Design of Story-Driven GenAI Soundscapes for Video, Proc. of the ACM Conference on Human Factors in Computing Systems (CHI). Barcelona, Spain, 2026. PDF Video
- SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation, Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Tahoe City, CA, USA, 2025. arXiv Demo
- FLAM: Frame-Wise Language-Audio Modeling, Proc. of the 42nd International Conference on Machine Learning (ICML). Vancouver, BC, Canada, 2025. arXiv Code Demo
- Video-Guided Foley Sound Generation with Multimodal Controls, Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Nashville, TN, USA, 2025. arXiv Demo
- Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs, Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Demo
- 🏆 Top 5.1% conference paper (spotlighted). MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark, Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Code Demo
- Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations, Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Demo
- ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds, Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Code
- Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning, Proc. of the 25th International Society for Music Information Retrieval Conference (ISMIR). San Francisco, CA, USA, 2024. arXiv
- 🏆 Top 5% conference paper (oral). GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities, Proc. of the 19th Empirical Methods in Natural Language Processing Conference (EMNLP). Miami, Florida, USA, 2024. arXiv Code Demo
- CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models, Proc. of the 12th International Conference on Learning Representations (ICLR). Vienna, Austria, 2024. arXiv Code Demo
- Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries, Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA, 2023. arXiv Demo
- Efficient Spoken Language Recognition via Multilabel Classification, Proc. of the 24th Interspeech Conference. Dublin, Ireland, 2023. arXiv
- 🏆 Top 10% conference paper (highlighted). Language-Guided Audio-Visual Source Separation via Trimodal Consistency, Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Vancouver, BC, Canada, 2023. arXiv Code
- Audio-Text Models Do Not Yet Leverage Natural Language, Proc. of the 48th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes, Greece, 2023. arXiv
- Music Enhancement Via Image Translation and Vocoding, Proc. of the 47th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Singapore, 2022. arXiv Code
- Deep Embeddings and Section Fusion Improve Music Segmentation, Proc. of the 22nd International Society for Music Information Retrieval Conference (ISMIR), pp. 594-601, 2021. PDF
- Multimodal Metric Learning for Tag-Based Music Retrieval, Proc. of the 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Toronto, Canada, 2021. arXiv
- Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications, Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1), pp. 246-263, 2020. DOI: 10.5334/tismir.54. PDF
- Mood Classification Using Listening Data, Proc. of the 21st International Society for Music Information Retrieval Conference (ISMIR). Montreal, Quebec, Canada, 2020. arXiv
- Data-Driven Harmonic Filters For Audio Representation Learning, Proc. of the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2020. PDF
- The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music, Proc. of the 20th International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF Code
- Investigating Musical Pattern Ambiguity in a Human Annotated Dataset, Proc. of the 15th International Conference on Music Perception and Cognition (ICMPC). Graz, Austria, 2018. PDF
- 🏆 Best Student Paper. End-to-End Learning for Music Audio Tagging at Scale, Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR). Paris, France, 2018. arXiv
- Multimodal Deep Learning for Music Genre Classification, Transactions of the International Society for Music Information Retrieval (TISMIR), 2018. arXiv
- Predicting Audio Advertisement Quality, Proc. of the 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018. arXiv
- A Deep Multimodal Approach for Cold-start Music Recommendation, Proc. of the 2nd Workshop on Deep Learning for Recommender Systems (DLRS), at RecSys. Como, Italy, 2017. arXiv
- Evaluating Hierarchical Structure in Music Annotations, Frontiers in Psychology, 8, 2017. DOI: 10.3389/fpsyg.2017.01337. PDF
- 🏆 Best Presentation. Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features, Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR). Suzhou, China, 2017. arXiv
- Systematic Exploration of Computational Music Structure Research, Proc. of the 17th International Society for Music Information Retrieval Conference (ISMIR). New York City, NY, USA, 2016. PDF Code
- Hierarchical Evaluation of Segment Boundary Detection, Proc. of the 16th International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
- librosa: Audio and Music Signal Analysis in Python, Proc. of the 14th Python in Science Conference (SciPy). Austin, TX, USA, 2015. PDF
- Music Segment Similarity Using 2D-Fourier Magnitude Coefficients, Proc. of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Florence, Italy, 2014. PDF
- 🏆 Best Poster Presentation. MIR_EVAL: A Transparent Implementation of Common MIR Metrics, Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
- Identifying Polyphonic Patterns from Audio Recordings Using Music Segmentation Techniques, Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
- Embodying Theoretical Research in Music Cognition: Four Proposals for Theory-Driven Experimentation, Proc. of the Annual Meeting of the Cognitive Science Society. Quebec City, Quebec, Canada, 2014. PDF
- Convex Non-Negative Matrix Factorization for Automatic Music Structure Identification, Proc. of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vancouver, BC, Canada, 2013. PDF
- Data Driven and Discriminative Projections for Large-Scale Cover Song Identification, Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
- Unsupervised Clustering of Extreme Vocal Effects, Proc. of the 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research (AQL). Cincinnati, OH, USA, 2013. PDF
- Fortissimo: Force-Feedback for Mobile Devices, Proc. of the 13th International Conference on New Interfaces for Musical Expression (NIME). Daejeon and Seoul, Korea, 2013. PDF
- Even More Tactile Feedback for Mobile Devices, Proc. of the 39th International Computer Music Conference (ICMC). Perth, Australia, 2013. PDF
- Perceptual Evaluation of Automatically Extracted Musical Motives, Proc. of the 12th International Conference on Music Perception and Cognition (ICMPC), pp. 723-727. Thessaloniki, Greece, 2012. PDF
- Compressing Music Recordings Into Audio Summaries, Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 313-318. Porto, Portugal, 2012. PDF
Algorithms
- MIREX 2016 Entry: MSAF V0.1.0 Submission, Music Information Retrieval Evaluation eXchange (MIREX). New York City, NY, USA, 2016. PDF Code
- MIREX 2014 Entry: 2D Fourier Magnitude Coefficients, Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
- MIREX 2014 Entry: Music Segmentation Techniques and Greedy Path Finder Algorithm to Discover Musical Patterns, Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
- MIREX 2014 Entry: Convex Non-negative Matrix Factorization, Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
- MIREX 2013: Discovering Musical Patterns Using Audio Structural Segmentation Techniques, Music Information Retrieval Evaluation eXchange (MIREX). Curitiba, Brazil, 2013. PDF
Theses
- Discovering Structure in Music: Automatic Approaches and Perceptual Evaluations, New York University. PhD Dissertation, 2015. PDF Slides Video
- Voice Transformations for Extreme Vocal Effects, Pompeu Fabra University. Master's Thesis, 2008. PDF
- Desenvolupament Open Source per a E-Learning-II, Polytechnic University of Catalonia. Undergraduate Thesis, 2007. PDF
Selected Talks
- Project Sound Stager, Adobe MAX Sneaks 2025. Los Angeles, CA, USA, 2025. Video
- GenAI for Sound Design, Conversational AI Reading Group at Mila. Montreal, Quebec, Canada, 2025. Video
- Overview, Challenges, and Applications of Audio-based Music Structure Analysis, Women in Music Information Retrieval Workshop (ISMIR). Virtual, 2021. Slides
- Music Recommendation with Waveform-based Architectures, 4th Global AI Conference. Santa Clara, CA, USA, 2020. Slides
- Spectral Analysis and Detection of Extreme Vocal Effects (with CNNs), Research Seminar. Universitat Pompeu Fabra. Barcelona, Spain, 2019. Slides
- Spectral Analysis and Detection of Extreme Vocal Effects, 2nd International Symposium on Distorted Voices. São Paulo, Brazil, 2019. Slides
- Recommending Music with Waveform Architectures at Scale (Extended Version), Seminar Series in Data Science. University of San Francisco. San Francisco, CA, USA, 2019. Slides
- Recommending Music with Waveform Architectures at Scale, Deep Learning Barcelona Symposium. Pompeu Fabra University. Barcelona, Spain, 2018. Slides Video
- Cold-Start Music Recommendation Using Multimodal Deep Architectures, Systematic Approaches to Deep Learning Methods for Audio. Erwin Schrödinger Institute, University of Vienna. Vienna, Austria, 2017. PDF
- Long Tail Music Recommendation Using Deep Architectures, International Workshop on Deep Learning for Music (IJCNN). Anchorage, AK, USA, 2017. PDF
- Deep Learning for Large-Scale Music Recommendation, Data-Driven Research in Music Cognition. Stanford University. Stanford, CA, USA, 2017. PDF
- Deep Learning for Music Recommendation: Machine Listening and Collaborative Filtering, Seminar on Music Knowledge Extraction Using Machine Learning. Pompeu Fabra University. Barcelona, Spain, 2016. PDF
- Deep Learning for Large Scale Music Recommendation, Biostat Seminar. Stanford University. Stanford, CA, USA, 2016. PDF
- Multiple Annotations and Subjectivity in the Identification of Segment Boundaries in Music, Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2014. PDF
- Music Segment Similarity Using 2D-Fourier Magnitude Coefficients, North East Music Information Special Interest Group (NEMISIG). New York, NY, USA, 2014. PDF
- A Perceptually Based Evaluation of Music Boundaries, Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2013. PDF
- Music Structure Analysis and New Musical Interfaces, Pompeu Fabra University. Barcelona, Spain, 2013. PDF
- Music Structure Analysis by Matrix Factorization, North East Music Information Special Interest Group (NEMISIG). Boston, MA, USA, 2013. PDF
Music
- La Bossa d'Urina: El Primer Disc, Published by Record Union, 2022. Pandora Spotify Amazon
- Rumbahía: Casi al Compás, Published by CDBaby, 2021. Pandora Spotify Amazon
- Rumbahía: Aprendiendo, Published by CDBaby, 2019. Pandora Spotify Amazon
- La Bossa d'Urina: Merda Fina, Published by Record Union, 2018. Pandora iTunes Spotify Amazon
- Arkaen: Arkaen, Published by Record Union, 2017. Pandora iTunes Spotify Amazon
- La Bossa d'Urina: La Bossa d'Urina, Published by Cydonia Records, 2015. Pandora iTunes Spotify Amazon
- Sargon: Vida, Published by Weight Recordings, 2009. Pandora iTunes Spotify Amazon
- Sargon: Transcriptions, Published by Big Bang Records, 2005. iTunes Spotify
Other
- Automatic Music Tagging with Harmonic CNN, Late Breaking Session of the International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF
- MSAF: Music Structure Analysis Framework, International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
- 2013 Late Break Session on Music Segmentation, Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
- Late-break Session on Music Structure Analysis, Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR). Porto, Portugal, 2012. PDF
- Sistemas Operativos: Cuaderno de Laboratorio, Department of Computer Architecture. Polytechnic University of Catalonia, 2007.