Sound

Authors and titles for recent submissions

Total of 57 entries (entries 1-50 listed here)

[1] arXiv:2507.23590 [pdf, html, other]: Title: Identifying Hearing Difficulty Moments in Conversational Audio

Jack Collins, Adrian Buzea, Chris Collier, Alejandro Ballesta Rosen, Julian Maclaren, Richard F. Lyon, Simon Carlile

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:2507.23365 [pdf, html, other]: Title: "I made this (sort of)": Negotiating authorship, confronting fraudulence, and exploring new musical spaces with prompt-based AI music generation

Bob L. T. Sturm

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[3] arXiv:2507.22995 [pdf, html, other]: Title: Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning

Julia Wilkins, Sivan Ding, Magdalena Fuentes, Juan Pablo Bello

Comments: In proceedings of WASPAA 2025. 4 pages, 4 figures, 1 table

Subjects: Sound (cs.SD)
[4] arXiv:2507.23511 (cross-list from eess.AS) [pdf, html, other]: Title: MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

Yadong Niu, Tianzi Wang, Heinrich Dinkel, Xingwei Sun, Jiahao Zhou, Gang Li, Jizhong Liu, Xunying Liu, Junbo Zhang, Jian Luan

Comments: 9 main pages, 5 figures, 3 tables, and 14 appendix pages

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[5] arXiv:2507.23298 (cross-list from cs.HC) [pdf, html, other]: Title: Real-time Generation of Various Types of Nodding for Avatar Attentive Listening System

Kazushi Kato, Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara

Comments: Accepted by 27th ACM International Conference on Multimodal Interaction (ICMI '25), Long paper

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2507.23266 (cross-list from eess.AS) [pdf, html, other]: Title: CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025

Aemon Yat Fei Chiu, Jingyu Li, Yusheng Tian, Guangyan Zhang, Tan Lee

Comments: Under review

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:2507.23223 (cross-list from eess.AS) [pdf, html, other]: Title: Feature Importance across Domains for Improving Non-Intrusive Speech Intelligibility Prediction in Hearing Aids

Ryandhimas E. Zezario, Sabato M. Siniscalchi, Fei Chen, Hsin-Min Wang, Yu Tsao

Comments: Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2507.23091 (cross-list from cs.AI) [pdf, other]: Title: Moravec's Paradox: Towards an Auditory Turing Test

David Noever, Forrest McKee

Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2507.23010 (cross-list from cs.LG) [pdf, html, other]: Title: Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods

Siwoo Park

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10] arXiv:2507.22964 (cross-list from eess.AS) [pdf, other]: Title: Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR

Sotheara Leang (CADT, M-PSI), Éric Castelli (M-PSI), Dominique Vaufreydaz (M-PSI), Sethserey Sam (CADT)

Journal-ref: The 14th Conference on Information Technology and Its Applications (CITA 2025), Jul 2025, Phnom Penh, Cambodia

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD); Signal Processing (eess.SP)

[11] arXiv:2507.22746 [pdf, html, other]: Title: Next Tokens Denoising for Speech Synthesis

Yanqing Liu, Ruiqing Xue, Chong Zhang, Yufei Liu, Gang Wang, Bohan Li, Yao Qian, Lei He, Shujie Liu, Sheng Zhao

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[12] arXiv:2507.22612 [pdf, html, other]: Title: Adaptive Duration Model for Text Speech Alignment

Junjie Cao

Comments: 4 pages, 3 figures, 2 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[13] arXiv:2507.22322 [pdf, html, other]: Title: A Two-Step Learning Framework for Enhancing Sound Event Localization and Detection

Hogeon Yu

Comments: 5 pages, 2 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2507.22208 [pdf, html, other]: Title: Quantum-Inspired Audio Unlearning: Towards Privacy-Preserving Voice Biometrics

Shreyansh Pathak, Sonu Shreshtha, Richa Singh, Mayank Vatsa

Comments: 9 pages, 2 figures, 5 tables, Accepted at IJCB 2025 (Osaka, Japan)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[15] arXiv:2507.22628 (cross-list from eess.AS) [pdf, html, other]: Title: A k-space approach to modeling multi-channel parametric array loudspeaker systems

Tao Zhuang, Longbiao He, Feng Niu, Jia-Xin Zhong, Jing Lu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16] arXiv:2507.22370 (cross-list from cs.LG) [pdf, html, other]: Title: Prediction of acoustic field in 1-D uniform duct with varying mean flow and temperature using neural networks

D. Veerababu, Prasanta K. Ghosh

Comments: 22 pages

Journal-ref: Journal of Theoretical and Computational Acoustics, 33, 2025

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[17] arXiv:2507.21642 [pdf, html, other]: Title: Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification

William Ravenscroft, George Close, Kit Bower-Morris, Jamie Stacey, Dmitry Sityaev, Kris Y. Hong

Comments: Accepted for Interspeech 2025

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18] arXiv:2507.21463 [pdf, html, other]: Title: SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods

Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian

Comments: Published in ACL 2025. Dataset available at: this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2507.21426 [pdf, html, other]: Title: Relationship between objective and subjective perceptual measures of speech in individuals with head and neck cancer

Bence Mark Halpern, Thomas Tienkamp, Teja Rebernik, Rob J.J.H. van Son, Martijn Wieling, Defne Abur, Tomoki Toda

Comments: 5 pages, 1 figure, 1 table. Accepted at Interspeech 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2507.21202 [pdf, other]: Title: Combolutional Neural Networks

Cameron Churchwell, Minje Kim, Paris Smaragdis

Comments: 4 pages, 3 figures, accepted to WASPAA 2025

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:2507.21591 (cross-list from cs.CR) [pdf, html, other]: Title: Hierarchical Graph Neural Network for Compressed Speech Steganalysis

Mustapha Hemis, Hamza Kheddar, Mohamed Chahine Ghanem, Bachir Boudraa

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2507.21522 (cross-list from cs.CL) [pdf, html, other]: Title: Model-free Speculative Decoding for Transformer-based ASR with Token Map Drafting

Tuan Vu Ho, Hiroaki Kokubo, Masaaki Yamamoto, Yohei Kawaguchi

Comments: Accepted at EUSIPCO 2025

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2507.21395 (cross-list from cs.MM) [pdf, html, other]: Title: Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion

Zeyu Deng, Yanhui Lu, Jiashu Liao, Shuang Wu, Chongfeng Wei

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2507.21331 (cross-list from cs.CL) [pdf, other]: Title: A Deep Learning Automatic Speech Recognition Model for Shona Language

Leslie Wellington Sirora, Mainford Mutandavari

Journal-ref: International Journal of Innovative Research in Computer and Communication Engineering, 12(9) (2024)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2507.21138 (cross-list from cs.CL) [pdf, html, other]: Title: TTS-1 Technical Report

Oleg Atamanenko, Anna Chalova, Joseph Coombes, Nikki Cope, Phillip Dang, Zhifeng Deng, Jimmy Du, Michael Ermolenko, Feifan Fan, Yufei Feng, Cheryl Fichter, Pavel Filimonov, Louis Fischer, Kylan Gibbs, Valeria Gusarova, Pavel Karpik, Andreas Assad Kottner, Ian Lee, Oliver Louie, Jasmine Mai, Mikhail Mamontov, Suri Mao, Nurullah Morshed, Igor Poletaev, Florin Radu, Dmytro Semernia, Evgenii Shingarev, Vikram Sivaraja, Peter Skirko, Rinat Takhautdinov, Robert Villahermosa, Jean Wang

Comments: 20 pages, 10 figures. For associated modeling and training code, see this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[26] arXiv:2507.20900 [pdf, html, other]: Title: Music Arena: Live Evaluation for Text-to-Music

Yonghyun Kim, Wayne Chi, Anastasios N. Angelopoulos, Wei-Lin Chiang, Koichi Saito, Shinji Watanabe, Yuki Mitsufuji, Chris Donahue

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[27] arXiv:2507.20880 [pdf, html, other]: Title: JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

Renhang Liu, Chia-Yu Hung, Navonil Majumder, Taylor Gautreaux, Amir Ali Bagherzadeh, Chuan Li, Dorien Herremans, Soujanya Poria

Comments: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[28] arXiv:2507.20731 [pdf, other]: Title: Learning Neural Vocoder from Range-Null Space Decomposition

Andong Li, Tong Lei, Zhihang Sun, Rilin Chen, Erwei Yin, Xiaodong Li, Chengshi Zheng

Comments: 10 pages, 7 figures, IJCAI 2025

Subjects: Sound (cs.SD)
[29] arXiv:2507.20624 [pdf, other]: Title: Hyperbolic Embeddings for Order-Aware Classification of Audio Effect Chains

Aogu Wada, Tomohiko Nakamura, Hiroshi Saruwatari

Comments: 7 pages, 3 figures, accepted for the 28th International Conference on Digital Audio Effects (DAFx25)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[30] arXiv:2507.20485 [pdf, html, other]: Title: Sound Safeguarding for Acoustic Measurement Using Any Sounds: Tools and Applications

Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara

Comments: 2 pages, 2 figures, IEEE GCCE 2025 Demo session, Accepted

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2507.20417 [pdf, html, other]: Title: Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection

Yassine El Kheir, Arnab Das, Enes Erdem Erdogan, Fabian Ritter-Guttierez, Tim Polzehl, Sebastian Möller

Comments: Accepted at WASPAA 2025

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[32] arXiv:2507.20169 [pdf, html, other]: Title: Self-Improvement for Audio Large Language Model using Unlabeled Speech

Shaowen Wang, Xinyuan Chen, Yao Xu

Comments: To appear in Interspeech 2025. 6 pages, 1 figure

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2507.20140 [pdf, html, other]: Title: Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech

Taesoo Kim, Jinju Kim, Dongchan Kim, Jong Hwan Ko, Gyeong-Moon Park

Comments: Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), Vancouver, Canada. PMLR 267, 2025. Authors Jinju Kim and Taesoo Kim contributed equally

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34] arXiv:2507.20128 [pdf, html, other]: Title: Diffusion-based Symbolic Music Generation with Structured State Space Models

Shenghua Yuan, Xing Tang, Jiatao Chen, Tianming Xie, Jing Wang, Bing Shi

Comments: 9 pages, 3 figures

Subjects: Sound (cs.SD)
[35] arXiv:2507.20052 [pdf, html, other]: Title: Improving Deep Learning-based Respiratory Sound Analysis with Frequency Selection and Attention Mechanism

Nouhaila Fraihi, Ouassim Karrakchou, Mounir Ghogho

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[36] arXiv:2507.20036 [pdf, html, other]: Title: Improving Audio Classification by Transitioning from Zero- to Few-Shot

James Taylor, Wolfgang Mack

Comments: Submitted to Interspeech 2025

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[37] arXiv:2507.19991 [pdf, html, other]: Title: Efficient Vocal-Conditioned Music Generation via Soft Alignment Attention and Latent Diffusion

Hei Shing Cheung, Boya Zhang

Comments: 6 pages, 3 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[38] arXiv:2507.19835 [pdf, html, other]: Title: SonicGauss: Position-Aware Physical Sound Synthesis for 3D Gaussian Representations

Chunshi Wang, Hongxing Li, Yawei Luo

Comments: Accepted by ACMMM'25

Subjects: Sound (cs.SD); Multimedia (cs.MM)
[39] arXiv:2507.19557 [pdf, html, other]: Title: Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification

Haowen Li, Ziyi Yang, Mou Wang, Ee-Leng Tan, Junwei Yeow, Santi Peksi, Woon-Seng Gan

Comments: 4 pages, submitted to DCASE2025 Challenge Task 1

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40] arXiv:2507.20666 (cross-list from eess.AS) [pdf, other]: Title: MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection

Harsh Purohit, Tomoya Nishida, Kota Dohi, Takashi Endo, Yohei Kawaguchi

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[41] arXiv:2507.20627 (cross-list from cs.MM) [pdf, other]: Title: Controllable Video-to-Music Generation with Multiple Time-Varying Conditions

Junxian Wu, Weitao You, Heda Zuo, Dengming Zhang, Pei Chen, Lingyun Sun

Comments: Accepted by the 33rd ACM International Conference on Multimedia (ACMMM 2025). The project page is available at this https URL

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42] arXiv:2507.20530 (cross-list from eess.AS) [pdf, other]: Title: Binaural Sound Event Localization and Detection based on HRTF Cues for Humanoid Robots

Gyeong-Tae Lee, Hyeonuk Nam, Yong-Hwa Park

Comments: Submitted to IEEE/ACM TASLP

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43] arXiv:2507.19836 (cross-list from cs.GR) [pdf, html, other]: Title: ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion

Xuanchen Wang, Heng Wang, Weidong Cai

Comments: 10 pages, 5 figures, accepted by the 33rd ACM International Conference on Multimedia (ACM MM 2025), demo page: this https URL

Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
[44] arXiv:2507.19634 (cross-list from cs.CL) [pdf, html, other]: Title: MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks

Sara Papi, Maike Züfle, Marco Gaido, Beatrice Savoldi, Danni Liu, Ioannis Douros, Luisa Bentivogli, Jan Niehues

Comments: Work in progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

[45] arXiv:2507.19308 [pdf, html, other]: Title: The Eloquence team submission for task 1 of MLC-SLM challenge

Lorenzo Concina, Jordi Luque, Alessio Brutti, Marco Matassoni, Yuchen Zhang

Comments: Technical Report for MLC-SLM Challenge of Interspeech 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[46] arXiv:2507.19225 [pdf, html, other]: Title: Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation

Fang Kang, Yin Cao, Haoyu Chen

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[47] arXiv:2507.19202 [pdf, html, other]: Title: Latent Granular Resynthesis using Neural Audio Codecs

Nao Tokui, Tom Baker

Comments: Accepted at ISMIR 2025 Late Breaking Demos

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[48] arXiv:2507.19062 [pdf, html, other]: Title: From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models

Zhaoxi Mu, Rilin Chen, Andong Li, Meng Yu, Xinyu Yang, Dong Yu

Comments: ACMMM 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49] arXiv:2507.19037 [pdf, html, other]: Title: MLLM-based Speech Recognition: When and How is Multimodality Beneficial?

Yiwen Guan, Viet Anh Trinh, Vivek Voleti, Jacob Whitehill

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[50] arXiv:2507.18897 [pdf, html, other]: Title: HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling

Rongkun Xue, Yazhe Niu, Shuai Hu, Zixin Yin, Yongqiang Yao, Jing Yang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Fri, 1 Aug 2025 (showing 10 of 10 entries)

Thu, 31 Jul 2025 (showing 6 of 6 entries)

Wed, 30 Jul 2025 (showing 9 of 9 entries)

Tue, 29 Jul 2025 (showing 19 of 19 entries)

Mon, 28 Jul 2025 (showing first 6 of 13 entries)