Publications

For further stats and details, check out my Google Scholar Profile.

Peer Reviewed Articles

Generative Audio Extension and Morphing, Seetharaman, P.,* Nieto, O.,* Salamon, J., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
Mix2Morph: Learning Sound Morphing From Noisy Mixes, Chu, A., Flores-García, H., Nieto, O., Salamon, J., Pardo, B., Seetharaman, P., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
PromptSep: Generative Audio Separation Via Multimodal Prompting, Wen, Y., Chen, K., Seetharaman, P., Nieto, O., Su, J., Kumar, R., Kim, M., Smaragdis, P., Jin, Z., Salamon, J., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
AudioCards: Structured Metadata Improves Audio Language Models For Sound Design, Sridhar, S., Seetharaman, P., Nieto, O., Cartwright, M., Salamon, J., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning, Yang, C.-H. H., Ghosh, S., Wang, Q., Kim, J., Hong, H., Kumar, S., Zhong, G., Kong, Z., Sakshi, S., Lokegaonkar, V., Nieto, O., Duraiswami, R., Manocha, D., Kim, G., Du, J., Valle, R., Catanzaro, B., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv
SoundStager: Interactive Design of Story-Driven GenAI Soundscapes for Video, Yoo, S., Hernandez-Sebastian, A., Seetharaman, P., Salamon, J., Nieto, O., Truong, A., Proc. of the ACM Conference on Human Factors in Computing Systems (CHI). Barcelona, Spain, 2026. PDF Video
SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation, Kumar, S., Seetharaman, P., Salamon, J., Manocha, D., Nieto, O., Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Tahoe City, CA, USA, 2025. arXiv Demo
FLAM: Frame-Wise Language-Audio Modeling, Wu, Y., Tsirigotis, C., Chen, K., Huang, C. A., Courville, A., Nieto, O., Seetharaman, P., Salamon, J., Proc. of the 42nd International Conference on Machine Learning (ICML). Vancouver, BC, Canada, 2025. arXiv Code Demo
Video-Guided Foley Sound Generation with Multimodal Controls, Chen, Z., Seetharaman, P., Russell, B., Nieto, O., Bourgin, D., Owens, A., Salamon, J., Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Nashville, TN, USA, 2025. arXiv Demo
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs, Ghosh, S., Evuru, C. K. R., Kumar, S., Tyagi, U., Nieto, O., Jin, Z., Manocha, D., Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Demo
🏆 Top 5.1% conference paper (spotlighted)
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark, Sakshi, S., Tyagi, U., Kumar, S., Seth, A., Selvakumar, R., Nieto, O., Duraiswami, R., Ghosh, S., Manocha, D., Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Code Demo
Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations, Flores García, H., Nieto, O., Salamon, J., Pardo, B., Seetharaman, P., Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Demo
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds, Ghosh, S., Kumar, S., Evuru, C. K. R., Nieto, O., Duraiswami, R., Manocha, D., Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Code
Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning, Manco, I., Salamon, J., Nieto, O., Proc. of the 25th International Society for Music Information Retrieval Conference (ISMIR). San Francisco, CA, USA, 2024. arXiv
🏆 Top 5% conference paper (oral)
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities, Ghosh, S., Kumar, S., Seth, A., Evuru, C. K. R., Tyagi, U., Sakshi, S., Nieto, O., Duraiswami, R., Manocha, D., Proc. of the 19th Empirical Methods in Natural Language Processing Conference (EMNLP). Miami, Florida, USA, 2024. arXiv Code Demo
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models, Ghosh, S., Seth, A., Kumar, S., Tyagi, U., Evuru, C. K., Ramaneswaran, S., Sakshi, S., Nieto, O., Duraiswami, R., Manocha, D., Proc. of the 12th International Conference on Learning Representations (ICLR). Vienna, Austria, 2024. arXiv Code Demo
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries, Wilkins, J., Salamon, J., Fuentes, M., Bello, J. P., Nieto, O., Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA, 2023. arXiv Demo
Efficient Spoken Language Recognition via Multilabel Classification, Nieto, O., Jin, Z., Dernoncourt, F., Salamon, J., Proc. of the 24th InterSpeech Conference. Dublin, Ireland, 2023. arXiv
🏆 Top 10% conference paper (highlighted)
Language-Guided Audio-Visual Source Separation via Trimodal Consistency, Tan, R., Burns, A., Ray, A., Plummer, B. A., Nieto, O., Salamon, J., Russell, B., Saenko, K., Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Vancouver, BC, Canada, 2023. arXiv Code
Audio-Text Models Do Not Yet Leverage Natural Language, Wu, H., Nieto, O., Bello, J. P., Salamon, J., Proc. of the 48th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes, Greece, 2023. arXiv
Music Enhancement Via Image Translation and Vocoding, Kandpal, N., Nieto, O., Jin, Z., Proc. of the 47th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Singapore, 2022. arXiv Code
Deep Embeddings and Section Fusion Improve Music Segmentation, Salamon, J., Nieto, O., Bryan, N. J., Proc. of the 22nd International Society for Music Information Retrieval Conference (ISMIR), pp. 594-601, 2021. PDF
Multimodal Metric Learning for Tag-Based Music Retrieval, Won, M., Oramas, S., Nieto, O., Gouyon, F., Serra, X., Proc. of the 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Toronto, Canada, 2021. arXiv
Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications, Nieto, O., Mysore, G. J., Wang, C.-i., Smith, J. B. L., Schlüter, J., Grill, T., McFee, B., Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1), pp. 246-263, 2020. DOI: 10.5334/tismir.54. PDF
Mood Classification Using Listening Data, Korzeniowski, F., Nieto, O., McCallum, M., Won, M., Oramas, S., Schmidt, E., Proc. of the 21st International Society for Music Information Retrieval Conference (ISMIR). Montreal, Quebec, Canada, 2020. arXiv
Data-Driven Harmonic Filters For Audio Representation Learning, Won, M., Chun, S., Nieto, O., Serra, X., Proc. of the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2020. PDF
The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music, Nieto, O., McCallum, M., Davies, M., Robertson, A., Stark, A., Egozy, E., Proc. of the 20th International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF Code
Investigating Musical Pattern Ambiguity in a Human Annotated Dataset, Ren, I. Y., Nieto, O., Koops, H. V., Volk, A., Swierstra, W., Proc. of the 15th International Conference on Music Perception and Cognition (ICMPC). Graz, Austria, 2018. PDF
🏆 Best Student Paper
End-to-End Learning for Music Audio Tagging at Scale, Pons, J., Nieto, O., Prockup, M., Schmidt, E., Ehmann, A., Serra, X., Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR). Paris, France, 2018. arXiv
Multimodal Deep Learning for Music Genre Classification, Oramas, S., Barbieri, F., Nieto, O., Serra, X., Transactions of the International Society for Music Information Retrieval (TISMIR), 2018. arXiv
Predicting Audio Advertisement Quality, Ebrahimi, S., Vahabi, H., Prockup, M., Nieto, O., Proc. of the 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018. arXiv
A Deep Multimodal Approach for Cold-start Music Recommendation, Oramas, S., Nieto, O., Sordo, M., Serra, X., Proc. of the 2nd Workshop on Deep Learning for Recommender Systems (DLRS), at RecSys. Como, Italy, 2017. arXiv
Evaluating Hierarchical Structure in Music Annotations, McFee, B., Nieto, O., Farbood, M. M., Bello, J. P., Frontiers in Psychology, 8, 2017. DOI: 10.3389/fpsyg.2017.01337. PDF
🏆 Best Presentation
Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features, Oramas, S., Nieto, O., Barbieri, F., Serra, X., Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR). Suzhou, China, 2017. arXiv
Systematic Exploration of Computational Music Structure Research, Nieto, O., Bello, J. P., Proc. of the 17th International Society for Music Information Retrieval Conference (ISMIR). New York City, NY, USA, 2016. PDF Code
Hierarchical Evaluation of Segment Boundary Detection, McFee, B., Nieto, O., Bello, J. P., Proc. of the 16th International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
librosa: Audio and Music Signal Analysis in Python, McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., Nieto, O., Proc. of the 14th Python in Science Conference (SciPy). Austin, TX, USA, 2015. PDF
Music Segment Similarity Using 2D-Fourier Magnitude Coefficients, Nieto, O., Bello, J. P., Proc. of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Florence, Italy, 2014. PDF
🏆 Best Poster Presentation
MIR_EVAL: A Transparent Implementation of Common MIR Metrics, Raffel, C., McFee, B., Humphrey, E. J., Salamon, J., Nieto, O., Liang, D., Ellis, D. P. W., Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
Identifying Polyphonic Patterns from Audio Recordings Using Music Segmentation Techniques, Nieto, O., Farbood, M., Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
Embodying Theoretical Research in Music Cognition: Four Proposals for Theory-Driven Experimentation, Ballus, A., Arnau, E., Nieto, O., Font, F., Torrents, A., Proc. of the Annual Meeting of the Cognitive Science Society. Quebec City, Quebec, Canada, 2014. PDF
Convex Non-Negative Matrix Factorization for Automatic Music Structure Identification, Nieto, O., Jehan, T., Proc. of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vancouver, BC, Canada, 2013. PDF
Data Driven and Discriminative Projections for Large-Scale Cover Song Identification, Humphrey, E. J., Nieto, O., Bello, J. P., Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
Unsupervised Clustering of Extreme Vocal Effects, Nieto, O., Proc. of the 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research (AQL). Cincinnati, OH, USA, 2013. PDF
Fortissimo: Force-Feedback for Mobile Devices, Park, T. H., Nieto, O., Proc. of the 13th International Conference on New Interfaces for Musical Expression (NIME). Daejeon and Seoul, Korea, 2013. PDF
Even More Tactile Feedback for Mobile Devices, Park, T. H., Crawford, L., Nieto, O., Proc. of the 39th International Computer Music Conference (ICMC). Perth, Australia, 2013. PDF
Perceptual Evaluation of Automatically Extracted Musical Motives, Nieto, O., Farbood, M., Proc. of the 12th International Conference on Music Perception and Cognition (ICMPC), pp. 723-727. Thessaloniki, Greece, 2012. PDF
Compressing Music Recordings Into Audio Summaries, Nieto, O., Humphrey, E. J., Bello, J. P., Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 313-318. Porto, Portugal, 2012. PDF

Algorithms

MIREX 2016 Entry: MSAF V0.1.0 Submission, Nieto, O., Music Information Retrieval Evaluation eXchange (MIREX). New York City, NY, USA, 2016. PDF Code
MIREX 2014 Entry: 2D Fourier Magnitude Coefficients, Nieto, O., Bello, J. P., Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
MIREX 2014 Entry: Music Segmentation Techniques and Greedy Path Finder Algorithm to Discover Musical Patterns, Nieto, O., Farbood, M., Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
MIREX 2014 Entry: Convex Non-negative Matrix Factorization, Nieto, O., Jehan, T., Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
MIREX 2013: Discovering Musical Patterns Using Audio Structural Segmentation Techniques, Nieto, O., Farbood, M., Music Information Retrieval Evaluation eXchange (MIREX). Curitiba, Brazil, 2013. PDF

Theses

Discovering Structure in Music: Automatic Approaches and Perceptual Evaluations, Nieto, O., New York University. PhD Dissertation, 2015. PDF Slides Video
Voice Transformations for Extreme Vocal Effects, Nieto, O., Pompeu Fabra University. Master's Thesis, 2008. PDF
Desenvolupament Open Source per a E-Learning-II, Nieto, O., Polytechnic University of Catalonia. Undergraduate Thesis, 2007. PDF

Selected Talks

Project Sound Stager, Nieto, O., Adobe MAX Sneaks 2025. Los Angeles, CA, USA, 2025. Video
GenAI for Sound Design, Nieto, O., Conversational AI Reading Group at Mila. Montreal, Quebec, Canada, 2025. Video
Overview, Challenges, and Applications of Audio-based Music Structure Analysis, Nieto, O., Women in Music Information Retrieval Workshop (ISMIR). Virtual, 2021. Slides
Music Recommendation with Waveform-based Architectures, Nieto, O., 4th Global AI Conference. Santa Clara, CA, USA, 2020. Slides
Spectral Analysis and Detection of Extreme Vocal Effects (with CNNs), Nieto, O., Research Seminar. Universitat Pompeu Fabra. Barcelona, Spain, 2019. Slides
Spectral Analysis and Detection of Extreme Vocal Effects, Nieto, O., 2nd International Symposium on Distorted Voices. São Paulo, Brazil, 2019. Slides
Recommending Music with Waveform Architectures at Scale (Extended Version), Nieto, O., Seminar Series in Data Science. University of San Francisco. San Francisco, CA, USA, 2019. Slides
Recommending Music with Waveform Architectures at Scale, Nieto, O., Deep Learning Barcelona Symposium. Pompeu Fabra University. Barcelona, Spain, 2018. Slides Video
Cold-Start Music Recommendation Using Multimodal Deep Architectures, Nieto, O., Systematic Approaches to Deep Learning Methods for Audio. Erwin Schrödinger Institute, University of Vienna. Vienna, Austria, 2017. PDF
Long Tail Music Recommendation Using Deep Architectures, Nieto, O., International Workshop on Deep Learning for Music (IJCNN). Anchorage, AK, USA, 2017. PDF
Deep Learning for Large-Scale Music Recommendation, Nieto, O., Data-Driven Research in Music Cognition. Stanford University. Stanford, CA, USA, 2017. PDF
Deep Learning for Music Recommendation: Machine Listening and Collaborative Filtering, Nieto, O., Seminar on Music Knowledge Extraction Using Machine Learning. Pompeu Fabra University. Barcelona, Spain, 2016. PDF
Deep Learning for Large Scale Music Recommendation, Nieto, O., Biostat Seminar. Stanford University. Stanford, CA, USA, 2016. PDF
Multiple Annotations and Subjectivity in the Identification of Segment Boundaries in Music, Nieto, O., Farbood, M., Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2014. PDF
Music Segment Similarity Using 2D-Fourier Magnitude Coefficients, Nieto, O., Bello, J. P., North East Music Information Special Interest Group (NEMISIG). New York, NY, USA, 2014. PDF
A Perceptually Based Evaluation of Music Boundaries, Nieto, O., Farbood, M., Bello, J. P., Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2013. PDF
Music Structure Analysis and New Musical Interfaces, Nieto, O., Pompeu Fabra University. Barcelona, Spain, 2013. PDF
Music Structure Analysis by Matrix Factorization, Nieto, O., Jehan, T., North East Music Information Special Interest Group (NEMISIG). Boston, MA, USA, 2013. PDF

Music

La Bossa d'Urina: El Primer Disc, Bolsa, D., Nieto, O., Published by Record Union, 2022. Pandora Spotify Amazon
Rumbahía: Casi al Compás, Cobo, L. C., Cardona, J., Grant, E. E., Melendo, D., Nieto, O., Published by CDBaby, 2021. Pandora Spotify Amazon
Rumbahía: Aprendiendo, Cobo, L. C., Cardona, J., Gallegos, J., Melendo, D., Nieto, O., Published by CDBaby, 2019. Pandora Spotify Amazon
La Bossa d'Urina: Merda Fina, Bolsa, D., Nieto, O., Published by Record Union, 2018. Pandora iTunes Spotify Amazon
Arkaen: Arkaen, Henson, S., Nieto, O., Nuñez, J., Remas, E., Rickher, G., Published by Record Union, 2017. Pandora iTunes Spotify Amazon
La Bossa d'Urina: La Bossa d'Urina, Bolsa, D., Nieto, O., Published by Cydonia Records, 2015. Pandora iTunes Spotify Amazon
Sargon: Vida, Ferreiro, C., Llobet, J., Nieto, O., Prim, M., Published by Weight Recordings, 2009. Pandora iTunes Spotify Amazon
Sargon: Transcriptions, Ferreiro, C., Llobet, J., Nieto, O., Prim, M., Published by Big Bang Records, 2005. iTunes Spotify

Other

Automatic Music Tagging with Harmonic CNN, Won, M., Chun, S., Nieto, O., Serra, X., Late Breaking Session of the International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF
MSAF: Music Structure Analysis Framework, Nieto, O., Bello, J. P., International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
2013 Late Break Session on Music Segmentation, Nieto, O., Smith, J. B. L., Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
Late-break Session on Music Structure Analysis, Rocha, B., Smith, J. B. L., Peeters, G., Ross, J. C., Nieto, O., Van Balen, J., Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR). Porto, Portugal, 2012. PDF
Sistemas Operativos: Cuaderno de Laboratorio, Nieto, O., Pajuelo, A., López, D., Millan, A., Heredero, A., Duran, A., Herrero, J. R., Verdú, X., Becerra, Y., Morancho, E., Department of Computer Architecture. Polytechnic University of Catalonia, 2007.

Oriol (Uri) Nieto
