Cornell University
We gratefully acknowledge support from the Simons Foundation, member institutions, and all contributors.

Sound

Authors and titles for recent submissions

  • Fri, 1 Aug 2025
  • Thu, 31 Jul 2025
  • Wed, 30 Jul 2025
  • Tue, 29 Jul 2025
  • Mon, 28 Jul 2025

Total of 57 entries; up to 50 are shown per page (entries 1-50 here, entries 51-57 on the next page).

Fri, 1 Aug 2025 (showing 10 of 10 entries)

[1] arXiv:2507.23590 [pdf, html, other]
Title: Identifying Hearing Difficulty Moments in Conversational Audio
Jack Collins, Adrian Buzea, Chris Collier, Alejandro Ballesta Rosen, Julian Maclaren, Richard F. Lyon, Simon Carlile
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:2507.23365 [pdf, html, other]
Title: "I made this (sort of)": Negotiating authorship, confronting fraudulence, and exploring new musical spaces with prompt-based AI music generation
Bob L. T. Sturm
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[3] arXiv:2507.22995 [pdf, html, other]
Title: Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning
Julia Wilkins, Sivan Ding, Magdalena Fuentes, Juan Pablo Bello
Comments: In Proceedings of WASPAA 2025. 4 pages, 4 figures, 1 table
Subjects: Sound (cs.SD)
[4] arXiv:2507.23511 (cross-list from eess.AS) [pdf, html, other]
Title: MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
Yadong Niu, Tianzi Wang, Heinrich Dinkel, Xingwei Sun, Jiahao Zhou, Gang Li, Jizhong Liu, Xunying Liu, Junbo Zhang, Jian Luan
Comments: 9 main pages, 5 figures, 3 tables, and 14 appendix pages
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[5] arXiv:2507.23298 (cross-list from cs.HC) [pdf, html, other]
Title: Real-time Generation of Various Types of Nodding for Avatar Attentive Listening System
Kazushi Kato, Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara
Comments: Accepted by 27th ACM International Conference on Multimodal Interaction (ICMI '25), Long paper
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2507.23266 (cross-list from eess.AS) [pdf, html, other]
Title: CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025
Aemon Yat Fei Chiu, Jingyu Li, Yusheng Tian, Guangyan Zhang, Tan Lee
Comments: Under review
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:2507.23223 (cross-list from eess.AS) [pdf, html, other]
Title: Feature Importance across Domains for Improving Non-Intrusive Speech Intelligibility Prediction in Hearing Aids
Ryandhimas E. Zezario, Sabato M. Siniscalchi, Fei Chen, Hsin-Min Wang, Yu Tsao
Comments: Accepted to Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2507.23091 (cross-list from cs.AI) [pdf, other]
Title: Moravec's Paradox: Towards an Auditory Turing Test
David Noever, Forrest McKee
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2507.23010 (cross-list from cs.LG) [pdf, html, other]
Title: Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods
Siwoo Park
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10] arXiv:2507.22964 (cross-list from eess.AS) [pdf, other]
Title: Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR
Sotheara Leang (CADT, M-PSI), Éric Castelli (M-PSI), Dominique Vaufreydaz (M-PSI), Sethserey Sam (CADT)
Journal-ref: The 14th Conference on Information Technology and Its Applications (CITA 2025), Jul 2025, Phnom Penh, Cambodia
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD); Signal Processing (eess.SP)

Thu, 31 Jul 2025 (showing 6 of 6 entries)

[11] arXiv:2507.22746 [pdf, html, other]
Title: Next Tokens Denoising for Speech Synthesis
Yanqing Liu, Ruiqing Xue, Chong Zhang, Yufei Liu, Gang Wang, Bohan Li, Yao Qian, Lei He, Shujie Liu, Sheng Zhao
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[12] arXiv:2507.22612 [pdf, html, other]
Title: Adaptive Duration Model for Text Speech Alignment
Junjie Cao
Comments: 4 pages, 3 figures, 2 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[13] arXiv:2507.22322 [pdf, html, other]
Title: A Two-Step Learning Framework for Enhancing Sound Event Localization and Detection
Hogeon Yu
Comments: 5 pages, 2 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2507.22208 [pdf, html, other]
Title: Quantum-Inspired Audio Unlearning: Towards Privacy-Preserving Voice Biometrics
Shreyansh Pathak, Sonu Shreshtha, Richa Singh, Mayank Vatsa
Comments: 9 pages, 2 figures, 5 tables, Accepted at IJCB 2025 (Osaka, Japan)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[15] arXiv:2507.22628 (cross-list from eess.AS) [pdf, html, other]
Title: A k-space approach to modeling multi-channel parametric array loudspeaker systems
Tao Zhuang, Longbiao He, Feng Niu, Jia-Xin Zhong, Jing Lu
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16] arXiv:2507.22370 (cross-list from cs.LG) [pdf, html, other]
Title: Prediction of acoustic field in 1-D uniform duct with varying mean flow and temperature using neural networks
D. Veerababu, Prasanta K. Ghosh
Comments: 22 pages
Journal-ref: Journal of Theoretical and Computational Acoustics, 33, 2025
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 30 Jul 2025 (showing 9 of 9 entries)

[17] arXiv:2507.21642 [pdf, html, other]
Title: Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
William Ravenscroft, George Close, Kit Bower-Morris, Jamie Stacey, Dmitry Sityaev, Kris Y. Hong
Comments: Accepted for Interspeech 2025
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18] arXiv:2507.21463 [pdf, html, other]
Title: SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian
Comments: Published in ACL 2025. Dataset available at: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2507.21426 [pdf, html, other]
Title: Relationship between objective and subjective perceptual measures of speech in individuals with head and neck cancer
Bence Mark Halpern, Thomas Tienkamp, Teja Rebernik, Rob J.J.H. van Son, Martijn Wieling, Defne Abur, Tomoki Toda
Comments: 5 pages, 1 figure, 1 table. Accepted at Interspeech 2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2507.21202 [pdf, other]
Title: Combolutional Neural Networks
Cameron Churchwell, Minje Kim, Paris Smaragdis
Comments: 4 pages, 3 figures, accepted to WASPAA 2025
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:2507.21591 (cross-list from cs.CR) [pdf, html, other]
Title: Hierarchical Graph Neural Network for Compressed Speech Steganalysis
Mustapha Hemis, Hamza Kheddar, Mohamed Chahine Ghanem, Bachir Boudraa
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2507.21522 (cross-list from cs.CL) [pdf, html, other]
Title: Model-free Speculative Decoding for Transformer-based ASR with Token Map Drafting
Tuan Vu Ho, Hiroaki Kokubo, Masaaki Yamamoto, Yohei Kawaguchi
Comments: Accepted at EUSIPCO 2025
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2507.21395 (cross-list from cs.MM) [pdf, html, other]
Title: Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion
Zeyu Deng, Yanhui Lu, Jiashu Liao, Shuang Wu, Chongfeng Wei
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2507.21331 (cross-list from cs.CL) [pdf, other]
Title: A Deep Learning Automatic Speech Recognition Model for Shona Language
Leslie Wellington Sirora, Mainford Mutandavari
Journal-ref: International Journal of Innovative Research in Computer and Communication Engineering, 12(9) (2024)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2507.21138 (cross-list from cs.CL) [pdf, html, other]
Title: TTS-1 Technical Report
Oleg Atamanenko, Anna Chalova, Joseph Coombes, Nikki Cope, Phillip Dang, Zhifeng Deng, Jimmy Du, Michael Ermolenko, Feifan Fan, Yufei Feng, Cheryl Fichter, Pavel Filimonov, Louis Fischer, Kylan Gibbs, Valeria Gusarova, Pavel Karpik, Andreas Assad Kottner, Ian Lee, Oliver Louie, Jasmine Mai, Mikhail Mamontov, Suri Mao, Nurullah Morshed, Igor Poletaev, Florin Radu, Dmytro Semernia, Evgenii Shingarev, Vikram Sivaraja, Peter Skirko, Rinat Takhautdinov, Robert Villahermosa, Jean Wang
Comments: 20 pages, 10 figures. For associated modeling and training code, see this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 29 Jul 2025 (showing 19 of 19 entries)

[26] arXiv:2507.20900 [pdf, html, other]
Title: Music Arena: Live Evaluation for Text-to-Music
Yonghyun Kim, Wayne Chi, Anastasios N. Angelopoulos, Wei-Lin Chiang, Koichi Saito, Shinji Watanabe, Yuki Mitsufuji, Chris Donahue
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[27] arXiv:2507.20880 [pdf, html, other]
Title: JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment
Renhang Liu, Chia-Yu Hung, Navonil Majumder, Taylor Gautreaux, Amir Ali Bagherzadeh, Chuan Li, Dorien Herremans, Soujanya Poria
Comments: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[28] arXiv:2507.20731 [pdf, other]
Title: Learning Neural Vocoder from Range-Null Space Decomposition
Andong Li, Tong Lei, Zhihang Sun, Rilin Chen, Erwei Yin, Xiaodong Li, Chengshi Zheng
Comments: 10 pages, 7 figures, IJCAI 2025
Subjects: Sound (cs.SD)
[29] arXiv:2507.20624 [pdf, other]
Title: Hyperbolic Embeddings for Order-Aware Classification of Audio Effect Chains
Aogu Wada, Tomohiko Nakamura, Hiroshi Saruwatari
Comments: 7 pages, 3 figures, accepted for the 28th International Conference on Digital Audio Effects (DAFx25)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[30] arXiv:2507.20485 [pdf, html, other]
Title: Sound Safeguarding for Acoustic Measurement Using Any Sounds: Tools and Applications
Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara
Comments: 2 pages, 2 figures, IEEE GCCE 2025 Demo session, Accepted
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2507.20417 [pdf, html, other]
Title: Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
Yassine El Kheir, Arnab Das, Enes Erdem Erdogan, Fabian Ritter-Guttierez, Tim Polzehl, Sebastian Möller
Comments: Accepted at WASPAA 2025
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[32] arXiv:2507.20169 [pdf, html, other]
Title: Self-Improvement for Audio Large Language Model using Unlabeled Speech
Shaowen Wang, Xinyuan Chen, Yao Xu
Comments: To appear in Interspeech 2025. 6 pages, 1 figure
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2507.20140 [pdf, html, other]
Title: Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech
Taesoo Kim, Jinju Kim, Dongchan Kim, Jong Hwan Ko, Gyeong-Moon Park
Comments: Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), Vancouver, Canada. PMLR 267, 2025. Authors Jinju Kim and Taesoo Kim contributed equally
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34] arXiv:2507.20128 [pdf, html, other]
Title: Diffusion-based Symbolic Music Generation with Structured State Space Models
Shenghua Yuan, Xing Tang, Jiatao Chen, Tianming Xie, Jing Wang, Bing Shi
Comments: 9 pages, 3 figures
Subjects: Sound (cs.SD)
[35] arXiv:2507.20052 [pdf, html, other]
Title: Improving Deep Learning-based Respiratory Sound Analysis with Frequency Selection and Attention Mechanism
Nouhaila Fraihi, Ouassim Karrakchou, Mounir Ghogho
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[36] arXiv:2507.20036 [pdf, html, other]
Title: Improving Audio Classification by Transitioning from Zero- to Few-Shot
James Taylor, Wolfgang Mack
Comments: Submitted to Interspeech 2025
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[37] arXiv:2507.19991 [pdf, html, other]
Title: Efficient Vocal-Conditioned Music Generation via Soft Alignment Attention and Latent Diffusion
Hei Shing Cheung, Boya Zhang
Comments: 6 pages, 3 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[38] arXiv:2507.19835 [pdf, html, other]
Title: SonicGauss: Position-Aware Physical Sound Synthesis for 3D Gaussian Representations
Chunshi Wang, Hongxing Li, Yawei Luo
Comments: Accepted by ACMMM'25
Subjects: Sound (cs.SD); Multimedia (cs.MM)
[39] arXiv:2507.19557 [pdf, html, other]
Title: Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
Haowen Li, Ziyi Yang, Mou Wang, Ee-Leng Tan, Junwei Yeow, Santi Peksi, Woon-Seng Gan
Comments: 4 pages, submitted to DCASE2025 Challenge Task 1
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40] arXiv:2507.20666 (cross-list from eess.AS) [pdf, other]
Title: MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection
Harsh Purohit, Tomoya Nishida, Kota Dohi, Takashi Endo, Yohei Kawaguchi
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[41] arXiv:2507.20627 (cross-list from cs.MM) [pdf, other]
Title: Controllable Video-to-Music Generation with Multiple Time-Varying Conditions
Junxian Wu, Weitao You, Heda Zuo, Dengming Zhang, Pei Chen, Lingyun Sun
Comments: Accepted by the 33rd ACM International Conference on Multimedia (ACMMM 2025). The project page is available at this https URL
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42] arXiv:2507.20530 (cross-list from eess.AS) [pdf, other]
Title: Binaural Sound Event Localization and Detection based on HRTF Cues for Humanoid Robots
Gyeong-Tae Lee, Hyeonuk Nam, Yong-Hwa Park
Comments: Submitted to IEEE/ACM TASLP
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43] arXiv:2507.19836 (cross-list from cs.GR) [pdf, html, other]
Title: ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion
Xuanchen Wang, Heng Wang, Weidong Cai
Comments: 10 pages, 5 figures, accepted by the 33rd ACM International Conference on Multimedia (ACM MM 2025), demo page: this https URL
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
[44] arXiv:2507.19634 (cross-list from cs.CL) [pdf, html, other]
Title: MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Sara Papi, Maike Züfle, Marco Gaido, Beatrice Savoldi, Danni Liu, Ioannis Douros, Luisa Bentivogli, Jan Niehues
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

Mon, 28 Jul 2025 (showing first 6 of 13 entries)

[45] arXiv:2507.19308 [pdf, html, other]
Title: The Eloquence team submission for task 1 of MLC-SLM challenge
Lorenzo Concina, Jordi Luque, Alessio Brutti, Marco Matassoni, Yuchen Zhang
Comments: Technical Report for MLC-SLM Challenge of Interspeech 2025
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[46] arXiv:2507.19225 [pdf, html, other]
Title: Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation
Fang Kang, Yin Cao, Haoyu Chen
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[47] arXiv:2507.19202 [pdf, html, other]
Title: Latent Granular Resynthesis using Neural Audio Codecs
Nao Tokui, Tom Baker
Comments: Accepted at ISMIR 2025 Late Breaking Demos
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[48] arXiv:2507.19062 [pdf, html, other]
Title: From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models
Zhaoxi Mu, Rilin Chen, Andong Li, Meng Yu, Xinyu Yang, Dong Yu
Comments: ACMMM 2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49] arXiv:2507.19037 [pdf, html, other]
Title: MLLM-based Speech Recognition: When and How is Multimodality Beneficial?
Yiwen Guan, Viet Anh Trinh, Vivek Voleti, Jacob Whitehill
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[50] arXiv:2507.18897 [pdf, html, other]
Title: HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling
Rongkun Xue, Yazhe Niu, Shuai Hu, Zixin Yin, Yongqiang Yao, Jing Yang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)