Multimedia

Authors and titles for recent submissions

Total of 36 entries

[1] arXiv:2507.23444 [pdf, html, other]: Title: Hybrid CNN-Mamba Enhancement Network for Robust Multimodal Sentiment Analysis

Xiang Li, Xianfu Cheng, Xiaoming Zhang, Zhoujun Li

Subjects: Multimedia (cs.MM)
[2] arXiv:2507.23779 (cross-list from cs.CV) [pdf, html, other]: Title: Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Miaosen Zhang, Ziqiang Xu, Jialiang Zhu, Qi Dai, Kai Qiu, Yifan Yang, Chong Luo, Tianyi Chen, Justin Wagle, Tim Franklin, Baining Guo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[3] arXiv:2507.23042 (cross-list from cs.CV) [pdf, html, other]: Title: Early Goal-Guided Multi-Scale Fusion for Real-Time Vision-Language Driving

Santosh Patapati, Trisanth Srinivasan

Comments: 6 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Robotics (cs.RO)

[4] arXiv:2507.22731 [pdf, html, other]: Title: GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation

Quanwei Yang, Luying Huang, Kaisiyuan Wang, Jiazhi Guan, Shengyi He, Fengguo Li, Hang Zhou, Lingyun Yu, Yingying Li, Haocheng Feng, Hongtao Xie

Comments: 10 pages, 5 figures, Accepted by ICCV 2025

Subjects: Multimedia (cs.MM)
[5] arXiv:2507.22676 (cross-list from cs.CL) [pdf, html, other]: Title: Listening to the Unspoken: Exploring 365 Aspects of Multimodal Interview Performance Assessment

Jia Li, Yang Wang, Wenhao Qian, Zhenzhen Hu, Richang Hong, Meng Wang

Comments: 8 pages, 4 figures, ACM MM 2025. github:this https URL

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[6] arXiv:2507.22481 (cross-list from eess.IV) [pdf, html, other]: Title: Towards Blind Bitstream-corrupted Video Recovery via a Visual Foundation Model-driven Framework

Tianyi Liu, Kejun Wu, Chen Cai, Yi Wang, Kim-Hui Yap, Lap-Pui Chau

Comments: 10 pages, 5 figures, accepted by ACMMM 2025

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[7] arXiv:2507.22367 (cross-list from cs.CL) [pdf, html, other]: Title: Traits Run Deep: Enhancing Personality Assessment via Psychology-Guided LLM Representations and Multimodal Apparent Behaviors

Jia Li, Yichao He, Jiacheng Xu, Tianhao Luo, Zhenzhen Hu, Richang Hong, Meng Wang

Comments: 8 pages, 3 figures, ACM MM 2025

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[8] arXiv:2507.22099 (cross-list from cs.CV) [pdf, html, other]: Title: Runtime Failure Hunting for Physics Engine Based Software Systems: How Far Can We Go?

Shuqing Li, Qiang Chen, Xiaoxue Ren, Michael R. Lyu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Software Engineering (cs.SE)

[9] arXiv:2507.21926 [pdf, other]: Title: Efficient Sub-pixel Motion Compensation in Learned Video Codecs

Théo Ladune, Thomas Leguay, Pierrick Philippe, Gordon Clare, Félix Henry

Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)
[10] arXiv:2507.21557 [pdf, html, other]: Title: PC-JND: Subjective Study and Dataset on Just Noticeable Difference for Point Clouds in 6DoF Virtual Reality

Chunling Fan, Yun Zhang, Dietmar Saupe, Raouf Hamzaoui, Weisi Lin

Comments: 13 pages, 10 figures, Journal

Subjects: Multimedia (cs.MM)
[11] arXiv:2507.21395 [pdf, html, other]: Title: Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion

Zeyu Deng, Yanhui Lu, Jiashu Liao, Shuang Wu, Chongfeng Wei

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2507.21741 (cross-list from cs.CV) [pdf, html, other]: Title: MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces

Shaojun E, Yuchen Yang, Jiaheng Wu, Yan Zhang, Tiejun Zhao, Ziyan Chen

Comments: 9 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[13] arXiv:2507.21507 (cross-list from cs.CV) [pdf, html, other]: Title: VAGU & GtS: LLM-Based Benchmark and Framework for Joint Video Anomaly Grounding and Understanding

Shibo Gao, Peipei Yang, Yangyang Liu, Yi Chen, Han Zhu, Xuyao Zhang, Linlin Huang

Comments: 21 pages, 19 figures, 8 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[14] arXiv:2507.21195 (cross-list from cs.CR) [pdf, html, other]: Title: MaXsive: High-Capacity and Robust Training-Free Generative Image Watermarking in Diffusion Models

Po-Yuan Mao, Cheng-Chang Tsai, Chun-Shien Lu

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[15] arXiv:2507.20177 (cross-list from cs.CV) [pdf, html, other]: Title: Towards Universal Modal Tracking with Online Dense Temporal Token Learning

Yaozong Zheng, Bineng Zhong, Qihua Liang, Shengping Zhang, Guorong Li, Xianxian Li, Rongrong Ji

Comments: arXiv admin note: text overlap with arXiv:2401.01686

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

[16] arXiv:2507.20738 [pdf, other]: Title: Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning

Yu Zhao, Ying Zhang, Xuhui Sui, Baohang Zhou, Haoze Zhu, Jeff Z. Pan, Xiaojie Yuan

Comments: Accepted by ACM MM 2025

Subjects: Multimedia (cs.MM)
[17] arXiv:2507.20627 [pdf, other]: Title: Controllable Video-to-Music Generation with Multiple Time-Varying Conditions

Junxian Wu, Weitao You, Heda Zuo, Dengming Zhang, Pei Chen, Lingyun Sun

Comments: Accepted by the 33rd ACM International Conference on Multimedia (ACMMM 2025). The project page is available at this https URL

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18] arXiv:2507.19863 [pdf, html, other]: Title: Anchoring Trends: Mitigating Social Media Popularity Prediction Drift via Feature Clustering and Expansion

Chia-Ming Lee, Bo-Cheng Qiu, Cheng-Jun Kang, Yi-Hsuan Wu, Jun-Lin Chen, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Chung Hsu

Comments: Accepted by ACM Multimedia 2025

Subjects: Multimedia (cs.MM)
[19] arXiv:2507.20900 (cross-list from cs.SD) [pdf, html, other]: Title: Music Arena: Live Evaluation for Text-to-Music

Yonghyun Kim, Wayne Chi, Anastasios N. Angelopoulos, Wei-Lin Chiang, Koichi Saito, Shinji Watanabe, Yuki Mitsufuji, Chris Donahue

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[20] arXiv:2507.20745 (cross-list from cs.CV) [pdf, other]: Title: Regularizing Subspace Redundancy of Low-Rank Adaptation

Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu

Comments: 10 pages, 4 figures, Accepted by ACMMM2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[21] arXiv:2507.20730 (cross-list from cs.HC) [pdf, other]: Title: Vocalize: Lead Acquisition and User Engagement through Gamified Voice Competitions

Edvin Teskeredzic, Muamer Paric, Adna Sestic, Petra Fribert, Anamarija Lukac, Hadzem Hadzic, Kemal Altwlkany, Emanuel Lacic

Comments: Accepted to ACM Hypertext 2025

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[22] arXiv:2507.20518 (cross-list from cs.CV) [pdf, other]: Title: T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video Retrieval

Yili Li, Gang Xiong, Gaopeng Gou, Xiangyan Qu, Jiamin Zhuang, Zhen Li, Junzheng Shi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[23] arXiv:2507.20368 (cross-list from cs.CV) [pdf, html, other]: Title: MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation

Shuolin Xu, Bingyuan Wang, Zeyu Cai, Fangteng Fu, Yue Ma, Tongyi Lee, Hongchuan Yu, Zeyu Wang

Comments: 8 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[24] arXiv:2507.20300 (cross-list from cs.HC) [pdf, html, other]: Title: Talking-to-Build: How LLM-Assisted Interface Shapes Player Performance and Experience in Minecraft

Xin Sun, Lei Wang, Yue Li, Jie Li, Massimo Poesio, Julian Frommel, Koen Hinriks, Jiahuan Pei

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[25] arXiv:2507.20286 (cross-list from cs.CV) [pdf, html, other]: Title: T$^\text{3}$SVFND: Towards an Evolving Fake News Detector for Emergencies with Test-time Training on Short Video Platforms

Liyuan Zhang, Zeyun Cheng, Yan Yang, Yong Liu, Jinke Ma

Comments: 16 pages, 3 figures, published to DASFAA 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[26] arXiv:2507.19836 (cross-list from cs.GR) [pdf, html, other]: Title: ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion

Xuanchen Wang, Heng Wang, Weidong Cai

Comments: 10 pages, 5 figures, accepted by the 33rd ACM International Conference on Multimedia (ACM MM 2025), demo page: this https URL

Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
[27] arXiv:2507.19835 (cross-list from cs.SD) [pdf, html, other]: Title: SonicGauss: Position-Aware Physical Sound Synthesis for 3D Gaussian Representations

Chunshi Wang, Hongxing Li, Yawei Luo

Comments: Accepted by ACMMM'25

Subjects: Sound (cs.SD); Multimedia (cs.MM)
[28] arXiv:2507.19821 (cross-list from cs.CV) [pdf, html, other]: Title: LAVA: Language Driven Scalable and Versatile Traffic Video Analytics

Yanrui Yu, Tianfei Zhou, Jiaxin Sun, Lianpeng Qiao, Lizhong Ding, Ye Yuan, Guoren Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

[29] arXiv:2507.18932 [pdf, html, other]: Title: Benchmarking Multimodal Understanding and Complex Reasoning for ESG Tasks

Lei Zhang, Xin Zhou, Chaoyue He, Di Wang, Yi Wu, Hong Xu, Wei Liu, Chunyan Miao

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL)
[30] arXiv:2507.18750 [pdf, other]: Title: CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation

Hyunwoo Oh, SeungJu Cha, Kwanyoung Lee, Si-Woo Kim, Dong-Jin Kim

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2507.19225 (cross-list from cs.SD) [pdf, html, other]: Title: Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation

Fang Kang, Yin Cao, Haoyu Chen

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[32] arXiv:2507.19209 (cross-list from cs.CV) [pdf, html, other]: Title: Querying Autonomous Vehicle Point Clouds: Enhanced by 3D Object Counting with CounterNet

Xiaoyu Zhang, Zhifeng Bao, Hai Dong, Ziwei Wang, Jiajun Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[33] arXiv:2507.19125 (cross-list from eess.IV) [pdf, html, other]: Title: Learned Image Compression with Hierarchical Progressive Context Modeling

Yuqi Li, Haotian Zhang, Li Li, Dong Liu

Comments: 17 pages, ICCV 2025

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[34] arXiv:2507.19092 (cross-list from cs.DL) [pdf, html, other]: Title: Comparing OCR Pipelines for Folkloristic Text Digitization

Octavian M. Machidon, Alina L. Machidon

Journal-ref: 4th edition of DigitalHeritage World Congress and Expo 2025

Subjects: Digital Libraries (cs.DL); Multimedia (cs.MM)
[35] arXiv:2507.19037 (cross-list from cs.SD) [pdf, html, other]: Title: MLLM-based Speech Recognition: When and How is Multimodality Beneficial?

Yiwen Guan, Viet Anh Trinh, Vivek Voleti, Jacob Whitehill

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[36] arXiv:2507.18940 (cross-list from cs.CL) [pdf, html, other]: Title: LLaVA-NeuMT: Selective Layer-Neuron Modulation for Efficient Multilingual Multimodal Translation

Jingxuan Wei, Caijun Jia, Qi Chen, Yujun Cai, Linzhuang Sun, Xiangxiang Zhang, Gaowei Wu, Bihui Yu

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)

Fri, 1 Aug 2025 (showing 3 of 3 entries)

Thu, 31 Jul 2025 (showing 5 of 5 entries)

Wed, 30 Jul 2025 (showing 7 of 7 entries)

Tue, 29 Jul 2025 (showing 13 of 13 entries)

Mon, 28 Jul 2025 (showing 8 of 8 entries)