Publications

For further stats and details, check out my Google Scholar Profile.

Peer Reviewed Articles

  1. Generative Audio Extension and Morphing, Seetharaman, P.,* Nieto, O.,* Salamon, J., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
  2. Mix2Morph: Learning Sound Morphing From Noisy Mixes, Chu, A., Flores-García, H., Nieto, O., Salamon, J., Pardo, B., Seetharaman, P., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
  3. PromptSep: Generative Audio Separation Via Multimodal Prompting, Wen, Y., Chen, K., Seetharaman, P., Nieto, O., Su, J., Kumar, R., Kim, M., Smaragdis, P., Jin, Z., Salamon, J., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
  4. AudioCards: Structured Metadata Improves Audio Language Models For Sound Design, Sridhar, S., Seetharaman, P., Nieto, O., Cartwright, M., Salamon, J., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
  5. Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning, Yang, C.-H. H., Ghosh, S., Wang, Q., Kim, J., Hong, H., Kumar, S., Zhong, G., Kong, Z., Sakshi, S., Lokegaonkar, V., Nieto, O., Duraiswami, R., Manocha, D., Kim, G., Du, J., Valle, R., Catanzaro, B., Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv
  6. SoundStager: Interactive Design of Story-Driven GenAI Soundscapes for Video, Yoo, S., Hernandez-Sebastian, A., Seetharaman, P., Salamon, J., Nieto, O., Truong, A., Proc. of the ACM Conference on Human Factors in Computing Systems (CHI). Barcelona, Spain, 2026. PDF Video
  7. SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation, Kumar, S., Seetharaman, P., Salamon, J., Manocha, D., Nieto, O., Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Tahoe City, CA, USA, 2025. arXiv Demo
  8. FLAM: Frame-Wise Language-Audio Modeling, Wu, Y., Tsirigotis, C., Chen, K., Huang, C. A., Courville, A., Nieto, O., Seetharaman, P., Salamon, J., Proc. of the 42nd International Conference on Machine Learning (ICML). Vancouver, BC, Canada, 2025. arXiv Code Demo
  9. Video-Guided Foley Sound Generation with Multimodal Controls, Chen, Z., Seetharaman, P., Russell, B., Nieto, O., Bourgin, D., Owens, A., Salamon, J., Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Nashville, TN, USA, 2025. arXiv Demo
  10. Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs, Ghosh, S., Evuru, C. K. R., Kumar, S., Tyagi, U., Nieto, O., Jin, Z., Manocha, D., Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Demo
  11. 🏆 Top 5.1% conference paper (spotlighted)
    MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark, Sakshi, S., Tyagi, U., Kumar, S., Seth, A., Selvakumar, R., Nieto, O., Duraiswami, R., Ghosh, S., Manocha, D., Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Code Demo
  12. Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations, Flores García, H., Nieto, O., Salamon, J., Pardo, B., Seetharaman, P., Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Demo
  13. ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds, Ghosh, S., Kumar, S., Evuru, C. K. R., Nieto, O., Duraiswami, R., Manocha, D., Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Code
  14. Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning, Manco, I., Salamon, J., Nieto, O., Proc. of the 25th International Society for Music Information Retrieval Conference (ISMIR). San Francisco, CA, USA, 2024. arXiv
  15. 🏆 Top 5% conference paper (oral)
    GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities, Ghosh, S., Kumar, S., Seth, A., Evuru, C. K. R., Tyagi, U., Sakshi, S., Nieto, O., Duraiswami, R., Manocha, D., Proc. of the 19th Empirical Methods in Natural Language Processing Conference (EMNLP). Miami, FL, USA, 2024. arXiv Code Demo
  16. CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models, Ghosh, S., Seth, A., Kumar, S., Tyagi, U., Evuru, C. K., Ramaneswaran, S., Sakshi, S., Nieto, O., Duraiswami, R., Manocha, D., Proc. of the 12th International Conference on Learning Representations (ICLR). Vienna, Austria, 2024. arXiv Code Demo
  17. Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries, Wilkins, J., Salamon, J., Fuentes, M., Bello, J. P., Nieto, O., Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA, 2023. arXiv Demo
  18. Efficient Spoken Language Recognition via Multilabel Classification, Nieto, O., Jin, Z., Dernoncourt, F., Salamon, J., Proc. of the 24th InterSpeech Conference. Dublin, Ireland, 2023. arXiv
  19. 🏆 Top 10% conference paper (highlighted)
    Language-Guided Audio-Visual Source Separation via Trimodal Consistency, Tan, R., Burns, A., Ray, A., Plummer, B. A., Nieto, O., Salamon, J., Russell, B., Saenko, K., Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Vancouver, BC, Canada, 2023. arXiv Code
  20. Audio-Text Models Do Not Yet Leverage Natural Language, Wu, H., Nieto, O., Bello, J. P., Salamon, J., Proc. of the 48th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes, Greece, 2023. arXiv
  21. Music Enhancement Via Image Translation and Vocoding, Kandpal, N., Nieto, O., Jin, Z., Proc. of the 47th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Singapore, 2022. arXiv Code
  22. Deep Embeddings and Section Fusion Improve Music Segmentation, Salamon, J., Nieto, O., Bryan, N. J., Proc. of the 22nd International Society for Music Information Retrieval Conference (ISMIR), pp. 594-601, 2021. PDF
  23. Multimodal Metric Learning for Tag-Based Music Retrieval, Won, M., Oramas, S., Nieto, O., Gouyon, F., Serra, X., Proc. of the 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Toronto, Canada, 2021. arXiv
  24. Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications, Nieto, O., Mysore, G. J., Wang, C.-i., Smith, J. B. L., Schlüter, J., Grill, T., McFee, B., Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1), pp. 246-263, 2020. DOI: 10.5334/tismir.54. PDF
  25. Mood Classification Using Listening Data, Korzeniowski, F., Nieto, O., McCallum, M., Won, M., Oramas, S., Schmidt, E., Proc. of the 21st International Society for Music Information Retrieval Conference (ISMIR). Montreal, Quebec, Canada, 2020. arXiv
  26. Data-Driven Harmonic Filters For Audio Representation Learning, Won, M., Chun, S., Nieto, O., Serra, X., Proc. of the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2020. PDF
  27. The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music, Nieto, O., McCallum, M., Davies, M., Robertson, A., Stark, A., Egozy, E., Proc. of the 20th International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF Code
  28. Investigating Musical Pattern Ambiguity in a Human Annotated Dataset, Ren, I. Y., Nieto, O., Koops, H. V., Volk, A., Swierstra, W., Proc. of the 15th International Conference on Music Perception and Cognition (ICMPC). Graz, Austria, 2018. PDF
  29. 🏆 Best Student Paper
    End-to-End Learning for Music Audio Tagging at Scale, Pons, J., Nieto, O., Prockup, M., Schmidt, E., Ehmann, A., Serra, X., Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR). Paris, France, 2018. arXiv
  30. Multimodal Deep Learning for Music Genre Classification, Oramas, S., Barbieri, F., Nieto, O., Serra, X., Transactions of the International Society for Music Information Retrieval (TISMIR), 2018. arXiv
  31. Predicting Audio Advertisement Quality, Ebrahimi, S., Vahabi, H., Prockup, M., Nieto, O., Proc. of the 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018. arXiv
  32. A Deep Multimodal Approach for Cold-start Music Recommendation, Oramas, S., Nieto, O., Sordo, M., Serra, X., Proc. of the 2nd Workshop on Deep Learning for Recommender Systems (DLRS), at RecSys. Como, Italy, 2017. arXiv
  33. Evaluating Hierarchical Structure in Music Annotations, McFee, B., Nieto, O., Farbood, M. M., Bello, J. P., Frontiers in Psychology, 8, 2017. DOI: 10.3389/fpsyg.2017.01337. PDF
  34. 🏆 Best Presentation
    Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features, Oramas, S., Nieto, O., Barbieri, F., Serra, X., Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR). Suzhou, China, 2017. arXiv
  35. Systematic Exploration of Computational Music Structure Research, Nieto, O., Bello, J. P., Proc. of the 17th International Society for Music Information Retrieval Conference (ISMIR). New York City, NY, USA, 2016. PDF Code
  36. Hierarchical Evaluation of Segment Boundary Detection, McFee, B., Nieto, O., Bello, J. P., Proc. of the 16th International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
  37. librosa: Audio and Music Signal Analysis in Python, McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., Nieto, O., Proc. of the 14th Python in Science Conference (SciPy). Austin, TX, USA, 2015. PDF
  38. Music Segment Similarity Using 2D-Fourier Magnitude Coefficients, Nieto, O., Bello, J. P., Proc. of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Florence, Italy, 2014. PDF
  39. 🏆 Best Poster Presentation
    MIR_EVAL: A Transparent Implementation of Common MIR Metrics, Raffel, C., McFee, B., Humphrey, E. J., Salamon, J., Nieto, O., Liang, D., Ellis, D. P. W., Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
  40. Identifying Polyphonic Patterns from Audio Recordings Using Music Segmentation Techniques, Nieto, O., Farbood, M., Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
  41. Embodying Theoretical Research in Music Cognition: Four Proposals for Theory-Driven Experimentation, Ballus, A., Arnau, E., Nieto, O., Font, F., Torrents, A., Proc. of the Annual Meeting of the Cognitive Science Society. Quebec City, Quebec, Canada, 2014. PDF
  42. Convex Non-Negative Matrix Factorization for Automatic Music Structure Identification, Nieto, O., Jehan, T., Proc. of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vancouver, BC, Canada, 2013. PDF
  43. Data Driven and Discriminative Projections for Large-Scale Cover Song Identification, Humphrey, E. J., Nieto, O., Bello, J. P., Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
  44. Unsupervised Clustering of Extreme Vocal Effects, Nieto, O., Proc. of the 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research (AQL). Cincinnati, OH, USA, 2013. PDF
  45. Fortissimo: Force-Feedback for Mobile Devices, Park, T. H., Nieto, O., Proc. of the 13th International Conference on New Interfaces for Musical Expression (NIME). Daejeon and Seoul, Korea, 2013. PDF
  46. Even More Tactile Feedback for Mobile Devices, Park, T. H., Crawford, L., Nieto, O., Proc. of the 39th International Computer Music Conference (ICMC). Perth, Australia, 2013. PDF
  47. Perceptual Evaluation of Automatically Extracted Musical Motives, Nieto, O., Farbood, M., Proc. of the 12th International Conference on Music Perception and Cognition (ICMPC), pp. 723-727. Thessaloniki, Greece, 2012. PDF
  48. Compressing Music Recordings Into Audio Summaries, Nieto, O., Humphrey, E. J., Bello, J. P., Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 313-318. Porto, Portugal, 2012. PDF

Algorithms

  1. MIREX 2016 Entry: MSAF V0.1.0 Submission, Nieto, O., Music Information Retrieval Evaluation eXchange (MIREX). New York City, NY, USA, 2016. PDF Code
  2. MIREX 2014 Entry: 2D Fourier Magnitude Coefficients, Nieto, O., Bello, J. P., Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
  3. MIREX 2014 Entry: Music Segmentation Techniques and Greedy Path Finder Algorithm to Discover Musical Patterns, Nieto, O., Farbood, M., Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
  4. MIREX 2014 Entry: Convex Non-negative Matrix Factorization, Nieto, O., Jehan, T., Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
  5. MIREX 2013: Discovering Musical Patterns Using Audio Structural Segmentation Techniques, Nieto, O., Farbood, M., Music Information Retrieval Evaluation eXchange (MIREX). Curitiba, Brazil, 2013. PDF

Theses

  1. Discovering Structure in Music: Automatic Approaches and Perceptual Evaluations, Nieto, O., New York University. PhD Dissertation, 2015. PDF Slides Video
  2. Voice Transformations for Extreme Vocal Effects, Nieto, O., Pompeu Fabra University. Master's Thesis, 2008. PDF
  3. Desenvolupament Open Source per a E-Learning-II, Nieto, O., Polytechnic University of Catalonia. Undergraduate Thesis, 2007. PDF

Selected Talks

  1. Project Sound Stager, Nieto, O., Adobe MAX Sneaks 2025. Los Angeles, CA, USA, 2025. Video
  2. GenAI for Sound Design, Nieto, O., Conversational AI Reading Group at Mila. Montreal, Quebec, Canada, 2025. Video
  3. Overview, Challenges, and Applications of Audio-based Music Structure Analysis, Nieto, O., Women in Music Information Retrieval Workshop (ISMIR). Virtual, 2021. Slides
  4. Music Recommendation with Waveform-based Architectures, Nieto, O., 4th Global AI Conference. Santa Clara, CA, USA, 2020. Slides
  5. Spectral Analysis and Detection of Extreme Vocal Effects (with CNNs), Nieto, O., Research Seminar. Universitat Pompeu Fabra. Barcelona, Spain, 2019. Slides
  6. Spectral Analysis and Detection of Extreme Vocal Effects, Nieto, O., 2nd International Symposium on Distorted Voices. São Paulo, Brazil, 2019. Slides
  7. Recommending Music with Waveform Architectures at Scale (Extended Version), Nieto, O., Seminar Series in Data Science. University of San Francisco. San Francisco, CA, USA, 2019. Slides
  8. Recommending Music with Waveform Architectures at Scale, Nieto, O., Deep Learning Barcelona Symposium. Pompeu Fabra University. Barcelona, Spain, 2018. Slides Video
  9. Cold-Start Music Recommendation Using Multimodal Deep Architectures, Nieto, O., Systematic Approaches to Deep Learning Methods for Audio. Erwin Schrödinger Institute, University of Vienna. Vienna, Austria, 2017. PDF
  10. Long Tail Music Recommendation Using Deep Architectures, Nieto, O., International Workshop on Deep Learning for Music (IJCNN). Anchorage, AK, USA, 2017. PDF
  11. Deep Learning for Large-Scale Music Recommendation, Nieto, O., Data-Driven Research in Music Cognition. Stanford University. Stanford, CA, USA, 2017. PDF
  12. Deep Learning for Music Recommendation: Machine Listening and Collaborative Filtering, Nieto, O., Seminar on Music Knowledge Extraction Using Machine Learning. Pompeu Fabra University. Barcelona, Spain, 2016. PDF
  13. Deep Learning for Large Scale Music Recommendation, Nieto, O., Biostat Seminar. Stanford University. Stanford, CA, USA, 2016. PDF
  14. Multiple Annotations and Subjectivity in the Identification of Segment Boundaries in Music, Nieto, O., Farbood, M., Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2014. PDF
  15. Music Segment Similarity Using 2D-Fourier Magnitude Coefficients, Nieto, O., Bello, J. P., North East Music Information Special Interest Group (NEMISIG). New York, NY, USA, 2014. PDF
  16. A Perceptually Based Evaluation of Music Boundaries, Nieto, O., Farbood, M., Bello, J. P., Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2013. PDF
  17. Music Structure Analysis and New Musical Interfaces, Nieto, O., Pompeu Fabra University. Barcelona, Spain, 2013. PDF
  18. Music Structure Analysis by Matrix Factorization, Nieto, O., Jehan, T., North East Music Information Special Interest Group (NEMISIG). Boston, MA, USA, 2013. PDF

Music

  1. La Bossa d'Urina: El Primer Disc, Bolsa, D., Nieto, O., Published by Record Union, 2022. Pandora Spotify Amazon
  2. Rumbahía: Casi al Compás, Cobo, L. C., Cardona, J., Grant, E. E., Melendo, D., Nieto, O., Published by CDBaby, 2021. Pandora Spotify Amazon
  3. Rumbahía: Aprendiendo, Cobo, L. C., Cardona, J., Gallegos, J., Melendo, D., Nieto, O., Published by CDBaby, 2019. Pandora Spotify Amazon
  4. La Bossa d'Urina: Merda Fina, Bolsa, D., Nieto, O., Published by Record Union, 2018. Pandora iTunes Spotify Amazon
  5. Arkaen: Arkaen, Henson, S., Nieto, O., Nuñez, J., Remas, E., Rickher, G., Published by Record Union, 2017. Pandora iTunes Spotify Amazon
  6. La Bossa d'Urina: La Bossa d'Urina, Bolsa, D., Nieto, O., Published by Cydonia Records, 2015. Pandora iTunes Spotify Amazon
  7. Sargon: Vida, Ferreiro, C., Llobet, J., Nieto, O., Prim, M., Published by Weight Recordings, 2009. Pandora iTunes Spotify Amazon
  8. Sargon: Transcriptions, Ferreiro, C., Llobet, J., Nieto, O., Prim, M., Published by Big Bang Records, 2005. iTunes Spotify

Other

  1. Automatic Music Tagging with Harmonic CNN, Won, M., Chun, S., Nieto, O., Serra, X., Late Breaking Session of the International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF
  2. MSAF: Music Structure Analysis Framework, Nieto, O., Bello, J. P., International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
  3. 2013 Late Break Session on Music Segmentation, Nieto, O., Smith, J. B. L., Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
  4. Late-break Session on Music Structure Analysis, Rocha, B., Smith, J. B. L., Peeters, G., Ross, J. C., Nieto, O., Van Balen, J., Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR). Porto, Portugal, 2012. PDF
  5. Sistemas Operativos: Cuaderno de Laboratorio, Nieto, O., Pajuelo, A., López, D., Millan, A., Heredero, A., Duran, A., Herrero, J. R., Verdú, X., Becerra, Y., Morancho, E., Department of Computer Architecture. Polytechnic University of Catalonia, 2007.