Electrical Engineering and Systems Science

Showing new listings for Friday, 1 August 2025

Total of 97 entries

[1] arXiv:2507.22906 [pdf, html, other]: Title: DNN-based Methods of Jointly Sensing Number and Directions of Targets via a Green Massive H2AD MIMO Receiver

Bin Deng, Jiatong Bai, Feilong Zhao, Zuming Xie, Maolin Li, Yan Wang, Feng Shu

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)

As a green MIMO structure, the heterogeneous hybrid analog-digital (H2AD) MIMO architecture has been shown to have great potential to replace massive or extremely large-scale fully-digital MIMO in future wireless networks, addressing three challenging problems faced by the latter: high energy consumption, high circuit cost, and high complexity. However, how to intelligently sense the number and directions of multiple emitters with such a structure remains an open and difficult problem. To address this, we propose a two-stage sensing framework that jointly estimates the number and directions of multiple targets. Specifically, three target-number sensing methods are designed: an improved eigen-domain clustering (EDC) framework, an enhanced deep neural network (DNN) based on five key statistical features, and an improved one-dimensional convolutional neural network (1D-CNN) utilizing the full set of eigenvalues. Subsequently, low-complexity and high-accuracy DOA estimation is achieved via the introduced online micro-clustering (OMC-DOA) method. Furthermore, we derive the Cramér-Rao lower bound (CRLB) for the H2AD architecture under multiple-source conditions as a theoretical performance benchmark. Simulation results show that the three developed methods achieve 100% accuracy in sensing the number of targets at moderate-to-high SNRs, while the improved 1D-CNN exhibits superior performance under extremely low SNR conditions. The introduced OMC-DOA outperforms existing clustering- and fusion-based DOA methods in multi-source environments.
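The abstract does not give enough detail to reproduce the EDC, DNN, or 1D-CNN number-sensing methods. For orientation only, the Python sketch below implements the classical minimum description length (MDL) source-number test on sample-covariance eigenvalues, a standard eigenvalue-based baseline; it is not one of the authors' methods. It assumes a fully digital half-wavelength uniform linear array rather than the H2AD receiver, and the helper names mdl_num_sources and ula_snapshots are hypothetical.

```python
import numpy as np


def mdl_num_sources(X: np.ndarray) -> int:
    """Estimate the number of sources from an M x N snapshot matrix X
    with the classical MDL criterion applied to sample-covariance eigenvalues."""
    M, N = X.shape
    R = X @ X.conj().T / N                      # M x M sample covariance
    lam = np.sort(np.linalg.eigvalsh(R))[::-1]  # real eigenvalues, descending
    scores = []
    for k in range(M):                          # hypothesised number of sources
        tail = lam[k:]                          # eigenvalues assumed to be noise
        geo = np.exp(np.mean(np.log(tail)))     # geometric mean
        ari = np.mean(tail)                     # arithmetic mean
        fit = -N * (M - k) * np.log(geo / ari)  # >= 0; small when the tail is flat
        penalty = 0.5 * k * (2 * M - k) * np.log(N)
        scores.append(fit + penalty)
    return int(np.argmin(scores))


def ula_snapshots(angles_deg, M=8, N=500, noise_std=0.1, seed=0):
    """Hypothetical test data: fully digital half-wavelength ULA, unit-power sources."""
    rng = np.random.default_rng(seed)
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    m = np.arange(M)[:, None]
    A = np.exp(-1j * np.pi * m * np.sin(theta)[None, :])  # M x K steering matrix
    K = theta.size
    S = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
    W = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    return A @ S + noise_std * W


print(mdl_num_sources(ula_snapshots([-20.0, 30.0])))
```

At the assumed 20 dB SNR with 500 snapshots, the two signal eigenvalues stand far above the flat noise floor, so the likelihood term dominates for k < 2 and the penalty dominates for k > 2.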
[2] arXiv:2507.22909 [pdf, html, other]: Title: Rydberg Atomic Receivers for Wireless Communications: Fundamentals, Potential, Applications, and Challenges

Yin Zhang, Jiayi Zhang, Bokai Xu, Yuanbin Chen, Zhilong Liu, Jiakang Zheng, Enyu Shi, Ziheng Liu, Tierui Gong, Wei E. I. Sha, Chau Yuen, Shi Jin, Bo Ai

Subjects: Signal Processing (eess.SP)

Rydberg atomic receivers (RARs) leverage the quantum coherence of highly excited atoms to overcome the intrinsic physical limitations of conventional radio frequency receivers (RFRs), particularly in sensitivity and bandwidth. This innovative technology represents a paradigm shift in wireless communication systems. This paper systematically explains the fundamental sensing mechanisms of RARs and contrasts them with RFRs in terms of working principles and architectures. We explore their advantages in emerging wireless communication scenarios, such as integrated sensing and communications, quantum Rydberg radar, and quantum space communications. Practical challenges, such as limited instantaneous bandwidth and nonlinear distortion, are identified. To address these issues, mitigation strategies and future research directions are also outlined, supporting the advancement of RAR-aided wireless systems.
[3] arXiv:2507.22953 [pdf, other]: Title: CADS: A Comprehensive Anatomical Dataset and Segmentation for Whole-Body Anatomy in Computed Tomography

Murong Xu, Tamaz Amiranashvili, Fernando Navarro, Maksym Fritsak, Ibrahim Ethem Hamamci, Suprosanna Shit, Bastian Wittmann, Sezgin Er, Sebastian M. Christ, Ezequiel de la Rosa, Julian Deseoe, Robert Graf, Hendrik Möller, Anjany Sekuboyina, Jan C. Peeken, Sven Becker, Giulia Baldini, Johannes Haubold, Felix Nensa, René Hosch, Nikhil Mirajkar, Saad Khalid, Stefan Zachow, Marc-André Weber, Georg Langs, Jakob Wasserthal, Mehmet Kemal Ozdemir, Andrey Fedorov, Ron Kikinis, Stephanie Tanadini-Lang, Jan S. Kirschke, Stephanie E. Combs, Bjoern Menze

Subjects: Image and Video Processing (eess.IV)

Accurate delineation of anatomical structures in volumetric CT scans is crucial for diagnosis and treatment planning. While AI has advanced automated segmentation, current approaches typically target individual structures, creating a fragmented landscape of incompatible models with varying performance and disparate evaluation protocols. Foundational segmentation models address these limitations by providing a holistic anatomical view through a single model. Yet, robust clinical deployment demands comprehensive training data, which is lacking in existing whole-body approaches, both in terms of data heterogeneity and, more importantly, anatomical coverage. In this work, rather than pursuing incremental optimizations in model architecture, we present CADS, an open-source framework that prioritizes the systematic integration, standardization, and labeling of heterogeneous data sources for whole-body CT segmentation. At its core is a large-scale dataset of 22,022 CT volumes with complete annotations for 167 anatomical structures, representing a significant advancement in both scale and coverage, with 18 times more scans than existing collections and 60% more distinct anatomical targets. Building on this diverse dataset, we develop the CADS-model using established architectures for accessible and automated full-body CT segmentation. Through comprehensive evaluation across 18 public datasets and an independent real-world hospital cohort, we demonstrate advantages over SoTA approaches. Notably, thorough testing of the model's performance in segmentation tasks from radiation oncology validates its direct utility for clinical interventions. By making our large-scale dataset, our segmentation models, and our clinical software tool publicly available, we aim to advance robust AI solutions in radiology and make comprehensive anatomical analysis accessible to clinicians and researchers alike.
[4] arXiv:2507.22964 [pdf, other]: Title: Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR

Sotheara Leang (CADT, M-PSI), Éric Castelli (M-PSI), Dominique Vaufreydaz (M-PSI), Sethserey Sam (CADT)

Journal-ref: The 14th Conference on Information Technology and Its Applications (CITA 2025), Jul 2025, Phnom Penh, Cambodia

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD); Signal Processing (eess.SP)

The dynamic characteristics of the speech signal provide temporal information and play an important role in enhancing Automatic Speech Recognition (ASR). In this work, we characterized the acoustic transitions in a ratio plane of Spectral Subband Centroid Frequencies (SSCFs) using polar parameters to capture the dynamic characteristics of speech and minimize spectral variation. These dynamic parameters were combined with Mel-Frequency Cepstral Coefficients (MFCCs) in Vietnamese ASR to capture more detailed spectral information. SSCF0 was used as a pseudo-feature for the fundamental frequency (F0) to describe tonal information robustly. The findings showed that the proposed parameters significantly reduce word error rates and exhibit greater gender independence than the baseline MFCCs.
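The abstract does not specify the band layout or how the polar ratio-plane parameters are derived, so the following is only a minimal sketch of the underlying quantity: a spectral subband centroid, computed per frame as the power-weighted mean frequency within each band. The function name subband_centroids, the Hann window, and the band edges are illustrative assumptions, not the authors' configuration.

```python
import numpy as np


def subband_centroids(frame, sr, band_edges_hz):
    """Spectral subband centroid frequencies (Hz) of one frame.

    Each centroid is sum(f * |X(f)|^2) / sum(|X(f)|^2) over the FFT bins
    that fall inside one band [lo, hi).
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    centroids = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        p = power[mask]
        centroids.append(float(np.sum(freqs[mask] * p) / (np.sum(p) + 1e-12)))
    return np.array(centroids)


# Hypothetical two-tone frame: 1000 Hz and 3000 Hz at a 16 kHz sampling rate.
sr = 16000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
print(subband_centroids(frame, sr, [0, 2000, 4000]))
```

Each centroid should land close to the tone that dominates its band; ratios of such centroids across bands are the kind of quantity a ratio plane would be built from.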
[5] arXiv:2507.23001 [pdf, html, other]: Title: LesionGen: A Concept-Guided Diffusion Model for Dermatology Image Synthesis

Jamil Fayyad, Nourhan Bayasi, Ziyang Yu, Homayoun Najjaran

Comments: Accepted at the MICCAI 2025 ISIC Workshop

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning models for skin disease classification require large, diverse, and well-annotated datasets. However, such resources are often limited due to privacy concerns, high annotation costs, and insufficient demographic representation. While text-to-image diffusion probabilistic models (T2I-DPMs) offer promise for medical data synthesis, their use in dermatology remains underexplored, largely due to the scarcity of rich textual descriptions in existing skin image datasets. In this work, we introduce LesionGen, a clinically informed T2I-DPM framework for dermatology image synthesis. Unlike prior methods that rely on simplistic disease labels, LesionGen is trained on structured, concept-rich dermatological captions derived from expert annotations and pseudo-generated, concept-guided reports. By fine-tuning a pretrained diffusion model on these high-quality image-caption pairs, we enable the generation of realistic and diverse skin lesion images conditioned on meaningful dermatological descriptions. Our results demonstrate that models trained solely on our synthetic dataset achieve classification accuracy comparable to those trained on real images, with notable gains in worst-case subgroup performance. Code and data are available here.
[6] arXiv:2507.23013 [pdf, html, other]: Title: Stabilization of Age-Structured Competing Populations

Carina Veil, Miroslav Krstić, Patrick McNamee, Oliver Sawodny

Comments: submitted to IFAC Automatica

Subjects: Systems and Control (eess.SY)

Age-structured models represent the dynamic behaviors of populations over time and result in integro-partial differential equations (IPDEs). Such processes arise in biotechnology, economics, demography, and other domains. Coupled age-structured IPDE population dynamics with two or more species occur in epidemiology and ecology, but have received little attention thus far. This work considers an exponentially unstable model of two competing predator populations, formally referred to in the literature as "competition" dynamics. If one were to apply an input that simultaneously harvests both predator species, one would have control over only the product of the densities of the species, not over their ratio. Therefore, it is necessary to design a control input that directly harvests only one of the two predator species, while indirectly influencing the other via a backstepping approach. The model is transformed into a system of two coupled ordinary differential equations (ODEs), of which only one is actuated, and two autonomous, exponentially stable integral delay equations (IDEs) which enter the ODEs as nonlinear disturbances. The ODEs are globally stabilized with backstepping and an estimate of the region of attraction of the asymptotically stabilized equilibrium of the full IPDE system is provided, under a positivity restriction on control. These generalizations open exciting possibilities for future research directions, such as investigating population systems with more than two species.
[7] arXiv:2507.23057 [pdf, html, other]: Title: Neural Energy Landscapes Predict Working Memory Decline After Brain Tumor Resection

Triet M. Tran, Sina Khanmohammadi

Subjects: Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)

Surgical resection is the primary treatment option for brain tumor patients, but it carries the risk of postoperative cognitive dysfunction. This study investigates how tumor-induced alterations in presurgical neural dynamics relate to postoperative working memory decline. We analyzed functional magnetic resonance imaging (fMRI) of brain tumor patients before surgery and extracted energy landscapes of high-order brain interactions. We then examined the relation between these energy features and postoperative working memory performance using statistical and machine learning (random forest) models. Patients with lower postoperative working memory scores exhibited fewer but more extreme transitions between local energy minima and maxima, whereas patients with higher scores showed more frequent but less extreme shifts. Furthermore, the presurgical high-order energy features accurately predicted postoperative working memory decline, with a mean accuracy of 90%, an F1 score of 87.5%, and an AUC of 0.95. Our study suggests that brain tumor-induced disruptions in high-order neural dynamics before surgery are predictive of postoperative working memory decline. Our findings pave the way for personalized surgical planning and targeted interventions to mitigate cognitive risks associated with brain tumor resection.
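The study uses high-order interactions, which the abstract does not define. As a simpler point of reference, the Python sketch below shows the classic pairwise (Ising-type) energy landscape: each binarized brain state s in {-1, +1}^n gets an energy E(s) = -h·s - ½ sᵀJs, and a local minimum is a state whose every single-region flip raises the energy. The parameters h and J are assumed to have been fitted beforehand; the function names and toy values are hypothetical.

```python
import itertools

import numpy as np


def ising_energy(s, h, J):
    """Pairwise energy E(s) = -h.s - 0.5 * s^T J s for s in {-1, +1}^n.
    J is assumed symmetric with a zero diagonal."""
    return float(-h @ s - 0.5 * s @ J @ s)


def local_minima(h, J):
    """Enumerate all 2^n states and keep those whose single-flip neighbours
    all have strictly higher energy (feasible only for small n)."""
    n = len(h)
    minima = []
    for bits in itertools.product([-1, 1], repeat=n):
        s = np.array(bits, dtype=float)
        e = ising_energy(s, h, J)
        is_min = True
        for i in range(n):
            neighbour = s.copy()
            neighbour[i] = -neighbour[i]        # flip one region's state
            if ising_energy(neighbour, h, J) <= e:
                is_min = False
                break
        if is_min:
            minima.append((bits, e))
    return minima


# Toy example: three mutually excitatory regions, no bias.
h = np.zeros(3)
J = np.ones((3, 3)) - np.eye(3)
print(local_minima(h, J))
```

With uniform positive coupling, only the all-active and all-inactive states are minima; transition statistics between such minima are the kind of feature the abstract relates to working memory outcomes.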
[8] arXiv:2507.23065 [pdf, html, other]: Title: Diffusion model for gradient preconditioning in hyperspectral imaging inverse problems

Jonathan Monsalve, Kumar Vijay Mishra

Subjects: Image and Video Processing (eess.IV)

Recovering high-dimensional statistical structure from limited measurements is a fundamental challenge in hyperspectral imaging, where capturing full-resolution data is often infeasible due to sensor, bandwidth, or acquisition constraints. A common workaround is to partition measurements and estimate local statistics, such as the covariance matrix, using only partial observations. However, this strategy introduces noise in the optimization gradients, especially when each partition contains few samples. In this work, we reinterpret this accumulation of gradient noise as a diffusion process, where successive partitions inject increasing uncertainty into the learning signal. Building on this insight, we propose a novel framework that leverages denoising diffusion models to learn a reverse process in gradient space. The model is trained to map noisy gradient estimates toward clean, well-conditioned updates, effectively preconditioning the optimization. Our approach bridges generative modeling and inverse problem solving, improving convergence and reconstruction quality under aggressive sampling regimes. We validate our method on hyperspectral recovery tasks, demonstrating significant gains in accuracy and stability over traditional optimization pipelines.
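The abstract's analogy maps gradient noise onto the forward half of a diffusion model. As an illustration only, the Python sketch below applies the standard DDPM forward marginal q(g_t | g_0) = N(√ᾱ_t g_0, (1 - ᾱ_t) I) to a gradient vector, with ᾱ_t the cumulative product of (1 - β_s). The linear β schedule, the function name forward_noise, and the toy gradient are assumptions; the learned reverse (denoising) network from the paper is not shown.

```python
import numpy as np


def forward_noise(g0, t, betas, rng):
    """Draw g_t ~ q(g_t | g_0) = N(sqrt(abar_t) * g0, (1 - abar_t) * I).

    g0: clean gradient vector; t: step index; betas: per-step noise variances.
    Returns the noisy gradient and the injected standard-normal noise.
    """
    alpha_bar = np.cumprod(1.0 - betas)          # abar_t = prod_{s<=t} (1 - beta_s)
    eps = rng.standard_normal(np.shape(g0))
    g_t = np.sqrt(alpha_bar[t]) * g0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return g_t, eps


betas = np.linspace(1e-4, 0.02, 1000)            # linear schedule (illustrative)
rng = np.random.default_rng(0)
g0 = np.ones(4)                                  # hypothetical clean gradient
for t in (0, 250, 999):
    g_t, _ = forward_noise(g0, t, betas, rng)
    print(t, np.round(g_t, 3))
```

As t grows, the signal coefficient √ᾱ_t shrinks and the noise share grows, mirroring the abstract's picture of partitions injecting increasing uncertainty; a denoiser trained to predict eps would supply the reverse step.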
[9] arXiv:2507.23076 [pdf, other]: Title: Terahertz for Radar applications and Wireless Communication

Sofiane Latreche, Hocine Bellahsene, Abdelmalik Taleb-Ahmed

Journal-ref: The First National Conference on Advances in Computational Intelligence, Systems and Networking (2023, November)

Subjects: Systems and Control (eess.SY)

Technological advancements in the design of electronic and optical materials have opened up the possibility of utilizing the newest available portion of the radio frequency spectrum: the Terahertz (THz) band. This band holds great promise for next-generation wireless systems, which are poised to seamlessly integrate a wide array of data-intensive and time-sensitive applications. In this article, we delve into the THz band, providing insights into its properties and showcasing examples of its applications. We begin by exploring the specific characteristics of wireless communications and radar systems operating in the THz band. Subsequently, we analyze various effects and parameters unique to each of these systems. We then scrutinize the application of THz wireless and radar systems, delving into the modeling of various facets of radio frequency propagation within this domain. The interpretation of our findings will be presented at the conclusion of this study.
[10] arXiv:2507.23078 [pdf, html, other]: Title: Experimentally-Driven Analysis of Stability in Connected Vehicle Platooning: Insights and Control Strategies

Niladri Dutta, Elham Abolfazli, Themistoklis Charalambous

Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

This paper presents the development of a tangible platform for demonstrating the practical implementation of cooperative adaptive cruise control (CACC) systems, an enhancement to the standard adaptive cruise control (ACC) concept by means of Vehicle-to-Everything (V2X) communication. It involves a detailed examination of existing longitudinal controllers and their performance in homogeneous vehicle platoons. Moreover, extensive tests are conducted using multiple autonomous experimental vehicle platform topologies to verify the effectiveness of the controller. The outcomes from both simulations and field tests affirm the substantial benefits of the proposed CACC platooning approach in longitudinal vehicle platooning scenarios. This research is crucial due to a notable gap in the existing literature: while numerous studies focus on simulated vehicle platooning systems, there is a lack of research demonstrating these controllers on physical vehicle systems or robot platforms. This paper seeks to fill this gap by providing a practical demonstration of CACC systems in action, showcasing their potential for real-world application in intelligent transportation systems.
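The abstract does not state which longitudinal controllers were examined. As a generic illustration of the CACC idea only, the Python sketch below simulates ideal double-integrator vehicles using a constant-time-gap spacing policy: follower i tracks a gap of r + h·v_i with a PD-type law and adds its predecessor's acceleration command, assumed to arrive over V2X with no delay. The control law, gains, and the function name simulate_platoon are hypothetical; actuator lag and communication latency are ignored.

```python
import numpy as np


def simulate_platoon(x0, v0, r=5.0, h=0.8, kp=0.5, kd=1.0, dt=0.01, T=60.0):
    """Simulate a platoon of ideal double-integrator vehicles (index 0 = leader).

    The leader keeps a constant speed. Follower i applies the hypothetical law
        u_i = kp * e_i + kd * (v_{i-1} - v_i) + u_{i-1},
    with constant-time-gap spacing error e_i = (x_{i-1} - x_i) - (r + h * v_i)
    and the predecessor's acceleration command u_{i-1} assumed known via V2X.
    """
    x = np.array(x0, dtype=float)
    v = np.array(v0, dtype=float)
    for _ in range(int(round(T / dt))):
        u = np.zeros_like(x)                     # leader: u_0 = 0
        for i in range(1, len(x)):
            e = (x[i - 1] - x[i]) - (r + h * v[i])
            u[i] = kp * e + kd * (v[i - 1] - v[i]) + u[i - 1]
        x = x + dt * v                           # forward Euler step
        v = v + dt * u
    errors = (x[:-1] - x[1:]) - (r + h * v[1:])
    return x, v, errors


# Leader at 100 m, followers start 1 m too close and 4 m too far (desired gap 21 m).
x, v, errors = simulate_platoon([100.0, 80.0, 55.0], [20.0, 20.0, 20.0])
print(np.round(errors, 6))
```

For a single follower, the spacing-error dynamics have trace -(h·kp + kd) < 0 and determinant kp > 0, so any positive gains give local convergence in this idealized model; string stability and real actuator dynamics need a separate analysis.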
[11] arXiv:2507.23129 [pdf, html, other]: Title: MRpro - open PyTorch-based MR reconstruction and processing package

Felix Frederik Zimmermann, Patrick Schuenke, Christoph S. Aigner, Bill A. Bernhardt, Mara Guastini, Johannes Hammacher, Noah Jaitner, Andreas Kofler, Leonid Lunin, Stefan Martin, Catarina Redshaw Kranich, Jakob Schattenfroh, David Schote, Yanglei Wu, Christoph Kolbitsch

Comments: Submitted to Magnetic Resonance in Medicine

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

We introduce MRpro, an open-source image reconstruction package built upon PyTorch and open data formats. The framework comprises three main areas. First, it provides unified data structures for the consistent manipulation of MR datasets and their associated metadata (e.g., k-space trajectories). Second, it offers a library of composable operators, proximable functionals, and optimization algorithms, including a unified Fourier operator for all common trajectories and an extended phase graph simulation for quantitative MR. These components are used to create ready-to-use implementations of key reconstruction algorithms. Third, for deep learning, MRpro includes essential building blocks such as data consistency layers, differentiable optimization layers, and state-of-the-art backbone networks and integrates public datasets to facilitate reproducibility. MRpro is developed as a collaborative project supported by automated quality control. We demonstrate the versatility of MRpro across multiple applications, including Cartesian, radial, and spiral acquisitions; motion-corrected reconstruction; cardiac MR fingerprinting; learned spatially adaptive regularization weights; model-based learned image reconstruction and quantitative parameter estimation. MRpro offers an extensible framework for MR image reconstruction. With reproducibility and maintainability at its core, it facilitates collaborative development and provides a foundation for future MR imaging research.
[12] arXiv:2507.23139 [pdf, html, other]: Title: Robust Control Design and Analysis for Nonlinear Systems with Uncertain Initial Conditions Based on Lifting Linearization

Sourav Sinha, Mazen Farhood

Comments: 24 pages, 13 figures

Subjects: Systems and Control (eess.SY)

This paper presents a robust control synthesis and analysis framework for nonlinear systems with uncertain initial conditions. First, a deep learning-based lifting approach is proposed to approximate nonlinear dynamical systems with linear parameter-varying (LPV) state-space models in higher-dimensional spaces while simultaneously characterizing the uncertain initial states within the lifted state space. Then, convex synthesis conditions are provided to generate full-state feedback nonstationary LPV (NSLPV) controllers for the lifted LPV system. A performance measure similar to the l2-induced norm is used to provide robust performance guarantees in the presence of exogenous disturbances and uncertain initial conditions. The paper also includes results for synthesizing full-state feedback LTI controllers and output feedback NSLPV controllers. Additionally, a robustness analysis approach based on integral quadratic constraint (IQC) theory is developed to analyze and tune the synthesized controllers while accounting for noise associated with state measurements. This analysis approach characterizes model parameters and disturbance inputs using IQCs to reduce conservatism. Finally, the effectiveness of the proposed framework is demonstrated through two illustrative examples.
[13] arXiv:2507.23147 [pdf, html, other]: Title: Foundation Models for Clean Energy Forecasting: A Comprehensive Review

Md Meftahul Ferdaus, Tanmoy Dam, Md Rasel Sarkar, Moslem Uddin, Sreenatha G. Anavatti

Comments: This paper is currently under review at the journal

Subjects: Systems and Control (eess.SY)

As global energy systems transition to clean energy, accurate forecasting of renewable generation and demand is imperative for effective grid management. Foundation Models (FMs) can help improve forecasting of renewable generation and demand because they can rapidly process complex, high-dimensional time-series data. This review focuses on FMs for renewable energy forecasting, primarily wind and solar. We present an overview of the architectures, pretraining strategies, fine-tuning methods, and types of data used in the context of renewable energy forecasting. We emphasize the role of large-scale pretrained models and domain-specific Transformer architectures that attend to spatiotemporal correlations, embed domain knowledge, and account for the brief and intermittent nature of renewable generation. We assess recent FM-based advances in forecast accuracy, such as reconciling predictions across multiple time scales and quantifying uncertainty in renewable energy forecasting. We also review existing challenges and areas for improvement in long-term and multivariate time-series forecasting. The survey distinguishes between theory and practice in the use of FMs in the clean energy forecasting domain. Additionally, it critically assesses the strengths and weaknesses of FMs while outlining future research directions in this new and exciting area of forecasting.
[14] arXiv:2507.23159 [pdf, html, other]: Title: Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models

Guan-Ting Lin, Shih-Yun Shan Kuan, Qirui Wang, Jiachen Lian, Tingle Li, Hung-yi Lee

Comments: Work in Progress

Subjects: Audio and Speech Processing (eess.AS)

While full-duplex speech agents promise natural, low-latency human-machine interaction by concurrently processing input and output speech, overlap management remains under-evaluated. We introduce Full-Duplex-Bench v1.5, a modular, fully automated benchmark that simulates four overlap scenarios: user interruption, listener backchannel, side conversation, and ambient speech. Our framework supports both open-source and commercial models, offering a comprehensive, extensible metric suite (categorical dialogue behaviors, stop and response latency, prosodic adaptation, and perceived speech quality) that can be tailored to application-specific criteria. Benchmarking five state-of-the-art agents reveals two principal strategies (repair-first rapid yielding versus continuity-first sustained flow) and highlights scenario-dependent performance trends. The open-source design enables seamless extension with new audio assets, languages, and deployment contexts, empowering practitioners to customize and accelerate the evaluation of robust full-duplex speech systems.
[15] arXiv:2507.23219 [pdf, html, other]: Title: Learning Arbitrary-Scale RAW Image Downscaling with Wavelet-based Recurrent Reconstruction

Yang Ren, Hai Jiang, Wei Li, Menglong Yang, Heng Zhang, Zehua Sheng, Qingsheng Ye, Shuaicheng Liu

Comments: Accepted by ACM MM 2025

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Image downscaling is critical for efficient storage and transmission of high-resolution (HR) images. Existing learning-based methods focus on performing downscaling in the sRGB domain, which typically suffers from blurred details and unexpected artifacts. RAW images, with their unprocessed photonic information, offer greater flexibility but lack specialized downscaling frameworks. In this paper, we propose a wavelet-based recurrent reconstruction framework that leverages the information-lossless property of the wavelet transform to perform arbitrary-scale RAW image downscaling in a coarse-to-fine manner. Within it, the Low-Frequency Arbitrary-Scale Downscaling Module (LASDM) and the High-Frequency Prediction Module (HFPM) preserve the structural and textural integrity of the reconstructed low-resolution (LR) RAW images, and an energy-maximization loss aligns high-frequency energy between the HR and LR domains. Furthermore, we introduce the Realistic Non-Integer RAW Downscaling (Real-NIRD) dataset, featuring a non-integer downscaling factor of 1.3×, and combine it with publicly available datasets with integer factors (2×, 3×, 4×) for comprehensive benchmarking of arbitrary-scale image downscaling. Extensive experiments demonstrate that our method outperforms existing state-of-the-art competitors both quantitatively and visually. The code and dataset will be released at this https URL.
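The "information-lossless" property the framework relies on can be checked with the simplest orthonormal wavelet. The Python sketch below implements one level of the 2-D Haar transform and its inverse: the four subbands (one low-frequency, three high-frequency) reconstruct the input exactly and preserve its total energy. This only illustrates the property; it is not the authors' LASDM/HFPM pipeline, and handling of the RAW Bayer layout is not addressed. The function names are hypothetical.

```python
import numpy as np

S = np.sqrt(2.0)


def haar2d(x):
    """One level of the orthonormal 2-D Haar DWT (even height and width).
    Returns (LL, LH, HL, HH), named by (vertical, horizontal) filter."""
    a = (x[0::2, :] + x[1::2, :]) / S            # vertical low-pass
    d = (x[0::2, :] - x[1::2, :]) / S            # vertical high-pass
    LL = (a[:, 0::2] + a[:, 1::2]) / S
    LH = (a[:, 0::2] - a[:, 1::2]) / S
    HL = (d[:, 0::2] + d[:, 1::2]) / S
    HH = (d[:, 0::2] - d[:, 1::2]) / S
    return LL, LH, HL, HH


def ihaar2d(LL, LH, HL, HH):
    """Exact inverse of haar2d."""
    h, w = LL.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2] = (LL + LH) / S
    a[:, 1::2] = (LL - LH) / S
    d[:, 0::2] = (HL + HH) / S
    d[:, 1::2] = (HL - HH) / S
    x = np.empty((2 * h, 2 * w))
    x[0::2, :] = (a + d) / S
    x[1::2, :] = (a - d) / S
    return x


x = np.random.default_rng(0).random((8, 6))
bands = haar2d(x)
print(np.allclose(ihaar2d(*bands), x))  # True: perfect reconstruction
```

Because the transform is orthonormal, the LL band alone is a half-resolution approximation while the three detail bands hold exactly the missing information, which is why a coarse-to-fine scheme can treat low- and high-frequency content separately.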
[16] arXiv:2507.23223 [pdf, html, other]: Title: Feature Importance across Domains for Improving Non-Intrusive Speech Intelligibility Prediction in Hearing Aids

Ryandhimas E. Zezario, Sabato M. Siniscalchi, Fei Chen, Hsin-Min Wang, Yu Tsao

Comments: Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Given the critical role of non-intrusive speech intelligibility assessment in hearing aids (HA), this paper enhances its performance by introducing Feature Importance across Domains (FiDo). We estimate feature importance on spectral and time-domain acoustic features as well as latent representations of Whisper. Importance weights are calculated per frame, and based on these weights, features are projected into new spaces, allowing the model to focus on important areas early. Next, feature concatenation is performed to combine the features before the assessment module processes them. Experimental results show that when FiDo is incorporated into the improved multi-branched speech intelligibility model MBI-Net+, RMSE can be reduced by 7.62% (from 26.10 to 24.11). MBI-Net+ with FiDo also achieves a relative RMSE reduction of 3.98% compared to the best system in the 2023 Clarity Prediction Challenge. These results validate FiDo's effectiveness in enhancing neural speech assessment in HA.
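The 7.62% figure is a relative reduction, not an absolute drop of 7.62 RMSE points. A quick arithmetic check in Python (the helper name is illustrative) confirms that the reported values 26.10 and 24.11 give this percentage.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction in percent: 100 * (before - after) / before."""
    return 100.0 * (before - after) / before


print(round(relative_reduction(26.10, 24.11), 2))  # 7.62
```

That is, an absolute reduction of 1.99 divided by the baseline of 26.10.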
[17] arXiv:2507.23224 [pdf, html, other]: Title: EMORe: Motion-Robust 5D MRI Reconstruction via Expectation-Maximization-Guided Binning Correction and Outlier Rejection

Syed M. Arshad, Lee C. Potter, Yingmin Liu, Christopher Crabtree, Matthew S. Tong, Rizwan Ahmad

Subjects: Image and Video Processing (eess.IV); Signal Processing (eess.SP)

We propose EMORe, an adaptive reconstruction method designed to enhance motion robustness in free-running, free-breathing self-gated 5D cardiac magnetic resonance imaging (MRI). Traditional self-gating-based motion binning for 5D MRI often results in residual motion artifacts due to inaccuracies in cardiac and respiratory signal extraction and sporadic bulk motion, compromising clinical utility. EMORe addresses these issues by integrating adaptive inter-bin correction and explicit outlier rejection within an expectation-maximization (EM) framework, whereby the E-step and M-step are executed alternately until convergence. In the E-step, probabilistic (soft) bin assignments are refined by correcting misassignment of valid data and rejecting motion-corrupted data into a dedicated outlier bin. In the M-step, the image estimate is improved using the refined soft bin assignments. Validation in a simulated 5D MRXCAT phantom demonstrated EMORe's superior performance compared to standard compressed sensing reconstruction, showing significant improvements in peak signal-to-noise ratio, structural similarity index, edge sharpness, and bin assignment accuracy across varying levels of simulated bulk motion. In vivo validation in 13 volunteers further confirmed EMORe's robustness, significantly enhancing blood-myocardium edge sharpness and reducing motion artifacts compared to compressed sensing, particularly in scenarios with controlled coughing-induced motion. Although EMORe incurs a modest increase in computational complexity, its adaptability and robust handling of bulk motion artifacts significantly enhance the clinical applicability and diagnostic confidence of 5D cardiac MRI.
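To make the E-step/M-step alternation with a dedicated outlier bin concrete, the Python sketch below runs EM on a toy 1-D problem: samples are soft-assigned to Gaussian "bins" plus a uniform outlier component. This is a deliberately simplified stand-in, not EMORe: in the paper the M-step updates the image estimate, whereas here it only re-estimates 1-D bin centres and mixture weights. The scalar samples, fixed variance, outlier density, and the function name em_soft_bins are all assumptions.

```python
import numpy as np


def em_soft_bins(y, means, var=1.0, outlier_density=1e-3, n_iter=20):
    """Soft-assign 1-D samples y to Gaussian bins plus a uniform outlier bin.

    E-step: responsibilities proportional to (mixture weight * likelihood).
    M-step: re-estimate bin means and mixture weights from the responsibilities.
    A fixed common variance is assumed for simplicity.
    """
    y = np.asarray(y, dtype=float)
    means = np.asarray(means, dtype=float).copy()
    K = len(means)
    weights = np.full(K + 1, 1.0 / (K + 1))      # last entry = outlier bin
    for _ in range(n_iter):
        # E-step: Gaussian likelihood per bin, constant density for outliers.
        lik = np.exp(-0.5 * (y[:, None] - means[None, :]) ** 2 / var)
        lik /= np.sqrt(2 * np.pi * var)
        lik = np.hstack([lik, np.full((len(y), 1), outlier_density)])
        resp = weights[None, :] * lik
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights and the means of the non-outlier bins.
        nk = resp.sum(axis=0)
        weights = nk / nk.sum()
        means = (resp[:, :K] * y[:, None]).sum(axis=0) / np.maximum(nk[:K], 1e-12)
    return resp, means


# Hypothetical gating-signal values: two clusters plus one corrupted sample.
y = np.array([-0.2, 0.1, 0.0, 0.3, -0.1, 4.9, 5.2, 5.1, 4.8, 40.0])
resp, means = em_soft_bins(y, means=[1.0, 4.0])
print(np.round(means, 3), np.round(resp[-1], 3))
```

The corrupted sample ends up almost entirely in the outlier bin, so it no longer pulls either bin centre, which is the same intuition the abstract describes for rejecting motion-corrupted data.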
[18] arXiv:2507.23235 [pdf, html, other]: Title: In-Orbit Cosmo-SkyMed antenna pattern estimation by a narrowband sweeper receiver

Mohammad Roueinfar, Masoud Ardini

Subjects: Signal Processing (eess.SP)

This paper introduces a novel method for antenna pattern estimation in satellites equipped with Synthetic Aperture Radar (SAR), utilizing a Narrowband Sweeper Receiver (NSR). By accurately measuring power across individual frequencies within SAR's inherently broadband spectrum, the NSR significantly enhances antenna pattern extraction accuracy. Analytical models and practical experiments conducted using the Cosmo-SkyMed satellite validate the receiver's performance, demonstrating superior signal-to-noise ratio (SNR) compared to conventional receivers. This research represents a key advancement in SAR technology, offering a robust framework for future satellite calibration and verification methodologies.
[19] arXiv:2507.23236 [pdf, html, other]: Title: BS-1-to-N: Diffusion-Based Environment-Aware Cross-BS Channel Knowledge Map Generation for Cell-Free Networks

Zhuoyin Dai, Di Wu, Yong Zeng, Xiaoli Xu, Xinyi Wang, Zesong Fei

Subjects: Signal Processing (eess.SP); Image and Video Processing (eess.IV)

Channel knowledge map (CKM) inference across base stations (BSs) is key to achieving efficient environment-aware communications. This paper proposes an environment-aware cross-BS CKM inference method called BS-1-to-N, based on a generative diffusion model. To this end, we first design a BS location embedding (BSLE) method tailored for cross-BS CKM inference, which embeds BS location information in the feature vector of the CKM. Further, the proposed BS-1-to-N model uses cross-attention and self-attention mechanisms to learn, respectively, the relationships between source and target BSs and those among target BSs. Therefore, given the locations of the source and target BSs, together with the source CKMs as control conditions, cross-BS CKM inference can be performed for an arbitrary number of source and target BSs. In architectures with massive numbers of distributed nodes, such as cell-free networks, traditional methods that sequentially traverse each BS for CKM construction are prohibitively costly. By contrast, the proposed BS-1-to-N model achieves efficient CKM inference for a target BS at any potential location based on the CKMs of source BSs. This is achieved by exploiting the fact that, within a given area, different BSs share the same wireless environment that gives rise to their respective CKMs. Therefore, similar to multi-view synthesis, the CKMs of different BSs are representations of the same wireless environment from different BS locations. By mining the implicit correlation between CKM and BS location based on the wireless environment, the proposed BS-1-to-N method achieves efficient CKM inference across BSs. We provide extensive comparisons of CKM inference between the proposed BS-1-to-N generative model and benchmark schemes, and provide a use case study demonstrating its practical application to the optimization of BS deployment.
[20] arXiv:2507.23256 [pdf, html, other]: Title: EMedNeXt: An Enhanced Brain Tumor Segmentation Framework for Sub-Saharan Africa using MedNeXt V2 with Deep Supervision

Ahmed Jaheen, Abdelrahman Elsayed, Damir Kim, Daniil Tikhonov, Matheus Scatolin, Mohor Banerjee, Qiankun Ji, Mostafa Salem, Hu Wang, Sarim Hashmi, Mohammad Yaqub

Comments: Submitted to the BraTS-Lighthouse 2025 Challenge (MICCAI 2025)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Brain cancer affects millions worldwide, and in nearly every clinical setting, doctors rely on magnetic resonance imaging (MRI) to diagnose and monitor gliomas. However, the current standard for tumor quantification through manual segmentation of multi-parametric MRI is time-consuming, requires expert radiologists, and is often infeasible in under-resourced healthcare systems. This problem is especially pronounced in low-income regions, where MRI scanners are of lower quality and radiology expertise is scarce, leading to incorrect segmentation and quantification. In addition, the number of acquired MRI scans in Africa is typically small. To address these challenges, the BraTS-Lighthouse 2025 Challenge focuses on robust tumor segmentation in sub-Saharan Africa (SSA), where resource constraints and image quality degradation introduce significant shifts. In this study, we present EMedNeXt -- an enhanced brain tumor segmentation framework based on MedNeXt V2 with deep supervision and optimized post-processing pipelines tailored for SSA. EMedNeXt introduces three key contributions: a larger region of interest, an improved nnU-Net v2-based architectural skeleton, and a robust model ensembling system. Evaluated on the hidden validation set, our solution achieved an average LesionWise DSC of 0.897 with an average LesionWise NSD of 0.541 and 0.84 at a tolerance of 0.5 mm and 1.0 mm, respectively.
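The LesionWise DSC reported above builds on the ordinary Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch for two binary masks follows. The lesion-wise matching and the NSD computation used by the challenge are not shown, and returning 1.0 for two empty masks is an assumed convention.

```python
import numpy as np

def dice(pred, target):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * np.logical_and(pred, target).sum() / denom

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(dice(a, b))  # 2*2/(3+3), about 0.667
```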
[21] arXiv:2507.23266 [pdf, html, other]: Title: CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025

Aemon Yat Fei Chiu, Jingyu Li, Yusheng Tian, Guangyan Zhang, Tan Lee

Comments: Under review

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

This paper presents the Voice Timbre Attribute Detection (vTAD) systems developed by the Digital Signal Processing & Speech Technology Laboratory (DSP&STL) of the Department of Electronic Engineering (EE) at The Chinese University of Hong Kong (CUHK) for the 20th National Conference on Human-Computer Speech Communication (NCMMSC 2025) vTAD Challenge. The proposed systems leverage WavLM-Large embeddings with attentive statistical pooling to extract robust speaker representations, followed by two variants of Diff-Net, i.e., Feed-Forward Neural Network (FFN) and Squeeze-and-Excitation-enhanced Residual FFN (SE-ResFFN), to compare timbre attribute intensities between utterance pairs. Experimental results demonstrate that the WavLM-Large+FFN system generalises better to unseen speakers, achieving 77.96% accuracy and 21.79% EER, while the WavLM-Large+SE-ResFFN model excels in the 'Seen' setting with 94.42% accuracy and 5.49% EER. These findings highlight a trade-off between model complexity and generalisation, and underscore the importance of architectural choices in fine-grained speaker modelling. Our analysis also reveals the impact of speaker identity, annotation subjectivity, and data imbalance on system performance, pointing to future directions for improving robustness and fairness in timbre attribute detection.
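Attentive statistical pooling collapses a variable-length sequence of frame embeddings (for example, WavLM-Large frame outputs) into a fixed-length vector by concatenating an attention-weighted mean and standard deviation. The single tanh-layer scorer and the parameter names `w`, `b`, `v` below are assumptions for illustration; the CUHK-EE systems may use a different scoring network.

```python
import numpy as np

def attentive_stats_pooling(frames, w, b, v, eps=1e-8):
    """frames: (T, D) frame-level embeddings -> (2*D,) utterance-level vector."""
    # Scalar attention score per frame from a small tanh layer.
    scores = np.tanh(frames @ w + b) @ v           # (T,)
    scores = scores - scores.max()                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # weights sum to 1 over time
    mean = alpha @ frames                          # weighted mean, (D,)
    var = alpha @ (frames - mean) ** 2             # weighted variance, (D,)
    std = np.sqrt(np.maximum(var, 0.0) + eps)
    return np.concatenate([mean, std])

rng = np.random.default_rng(0)
T, D, H = 50, 4, 6
frames = rng.standard_normal((T, D))
w, b, v = rng.standard_normal((D, H)), np.zeros(H), rng.standard_normal(H)
print(attentive_stats_pooling(frames, w, b, v).shape)  # (8,)
```

With all attention scores equal, the pooled vector reduces to the plain per-dimension mean and standard deviation, which is a convenient sanity check.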
[22] arXiv:2507.23280 [pdf, other]: Title: Data-Driven Stochastic Control via Non-i.i.d. Trajectories: Foundations and Guarantees

Abolfazl Lavaei

Subjects: Systems and Control (eess.SY)

This work takes a crucial step toward advancing data-driven trajectory-based methods for stochastic systems with unknown mathematical dynamics. In contrast to scenario-based approaches that rely on independent and identically distributed (i.i.d.) trajectories, this work develops a data-driven framework in which each trajectory is gathered over a finite horizon and exhibits temporal dependence, referred to as a non-i.i.d. trajectory. To ensure the safety of dynamical systems using such trajectories, the current body of literature primarily considers dynamics subject to unknown-but-bounded disturbances, which facilitates robust analysis. While promising, such bounds may be violated in practice, and the resulting worst-case robust analysis tends to be overly conservative. To overcome these fundamental challenges, this paper considers stochastic systems with unknown mathematical dynamics, influenced by process noise with unknown distributions. In the proposed framework, data is collected from stochastic systems under multiple realizations within a finite-horizon experiment, where each realization generates a non-i.i.d. trajectory. Leveraging the concept of stochastic control barrier certificates constructed from data, this work quantifies probabilistic safety guarantees with a certified confidence level. To achieve this, the proposed conditions are formulated as sum-of-squares (SOS) optimization problems, relying solely on the empirical average of the collected trajectories and statistical features of the process noise. The efficacy of the approach has been validated on three stochastic benchmarks with both unknown models and unknown noise distributions. In one case study, it is shown that while no safety controller exists for the robust analysis of the system under bounded disturbances, the proposed stochastic framework yields a safety controller with guaranteed probabilistic satisfaction.
[23] arXiv:2507.23308 [pdf, html, other]: Title: A Framework for Ethical Decision-Making in Automated Vehicles through Human Reasons-based Supervision

Lucas Elbert Suryana, Saeed Rahmani, Simeon Craig Calvert, Arkady Zgonnikov, Bart van Arem

Comments: 7 pages, 4 figures

Subjects: Systems and Control (eess.SY)

Ethical dilemmas are a common challenge in everyday driving, requiring human drivers to balance competing priorities such as safety, efficiency, and rule compliance. However, much of the existing research in automated vehicles (AVs) has focused on high-stakes "trolley problems," which involve extreme and rare situations. Such scenarios, though rich in ethical implications, are rarely applicable in real-world AV decision-making. In practice, when AVs confront everyday ethical dilemmas, they often appear to prioritise strict adherence to traffic rules. By contrast, human drivers may bend the rules in context-specific situations, using judgement informed by practical concerns such as safety and efficiency. According to the concept of meaningful human control, AVs should respond to human reasons, including those of drivers, vulnerable road users, and policymakers. This work introduces a novel human reasons-based supervision framework that detects when AV behaviour misaligns with expected human reasons to trigger trajectory reconsideration. The framework integrates with motion planning and control systems to support real-time adaptation, enabling decisions that better reflect safety, efficiency, and regulatory considerations. Simulation results demonstrate that this approach could help AVs respond more effectively to ethical challenges in dynamic driving environments by prompting replanning when the current trajectory fails to align with human reasons. These findings suggest that our approach offers a path toward more adaptable, human-centered decision-making in AVs.
[24] arXiv:2507.23359 [pdf, html, other]: Title: Pixel Embedding Method for Tubular Neurite Segmentation

Huayu Fu, Jiamin Li, Haozhi Qu, Xiaolin Hu, Zengcai Guo

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

Automatic segmentation of neuronal topology is critical for handling large-scale neuroimaging data, as it can greatly accelerate neuron annotation and analysis. However, the intricate morphology of neuronal branches and the occlusions among fibers pose significant challenges for deep-learning-based segmentation. To address these issues, we propose an improved framework. First, we introduce a deep network that outputs pixel-level embedding vectors and design a corresponding loss function, enabling the learned features to effectively distinguish different neuronal connections within occluded regions. Second, building on this model, we develop an end-to-end pipeline that directly maps raw neuronal images to SWC-formatted neuron structure trees. Finally, recognizing that existing evaluation metrics fail to fully capture segmentation accuracy, we propose a novel topological assessment metric to more appropriately quantify the quality of neuron segmentation and reconstruction. Experiments on our fMOST imaging dataset demonstrate that, compared to several classical methods, our approach significantly reduces the error rate in neuronal topology reconstruction.
[25] arXiv:2507.23381 [pdf, other]: Title: A Secure Full-Duplex Wireless Circulator enabled by Non-Reciprocal Beyond-Diagonal RIS

Ziang Liu, Bruno Clerckx

Comments: Submitted for IEEE journal

Subjects: Signal Processing (eess.SP)

Beyond-diagonal reconfigurable intelligent surface (BD-RIS) has arisen as a promising technology for enhancing wireless communication systems by enabling flexible and intelligent wave manipulation. This is achieved through interconnections among the ports of the impedance network, which reconfigure waves as they flow through the surface. Thus, the output wave at one port depends on the waves impinging on neighboring ports, allowing non-local control of both phase and magnitude. Non-reciprocal (NR)-BD-RIS further enhances this capability by breaking circuit reciprocity and, consequently, channel reciprocity. This feature potentially benefits communication among non-aligned transceivers. This paper introduces a novel application of NR-BD-RIS in full-duplex (FD) wireless circulators, where multiple FD devices communicate via an NR-BD-RIS. This system is particularly beneficial for secure transmission, as it enforces one-way communication among FD devices, suppresses signals from all other users, and thus prevents eavesdropping. In addition, a physics-compliant system model is considered by incorporating structural scattering, also known as specular reflection. By accounting for this effect, the advantages of NR-BD-RIS are further validated. Specifically, we formulate an all-user sum-rate maximization problem and propose an iterative optimization algorithm that employs block coordinate descent (BCD) and penalty dual decomposition (PDD) methods. Numerical evaluations illustrate that NR-BD-RIS consistently outperforms reciprocal (R)-BD-RIS and conventional diagonal (D)-RIS in terms of sum-rate performance, particularly when more than two impinging and reflection directions need to be supported. By analyzing the power of signals from all other users and the beampatterns, we show that secure transmission can be achieved.
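Non-reciprocity can be checked directly on a scattering matrix: a reciprocal network satisfies S = Sᵀ, whereas an ideal three-port circulator routes port 1 to port 2, port 2 to port 3, and port 3 to port 1, and therefore does not. The sketch below uses this textbook ideal circulator as a stand-in; it is not the NR-BD-RIS model, the structural-scattering term, or the BCD/PDD optimization from the paper.

```python
import numpy as np

# Ideal lossless 3-port circulator with reflected waves b = S a:
# a wave entering port 1 exits port 2, port 2 -> port 3, port 3 -> port 1
# (0-based indices below).
S = np.zeros((3, 3))
S[1, 0] = S[2, 1] = S[0, 2] = 1.0

def is_reciprocal(s, tol=1e-12):
    # Reciprocal networks have symmetric scattering matrices.
    return np.allclose(s, s.T, atol=tol)

def is_lossless(s, tol=1e-12):
    # Lossless networks have unitary scattering matrices: S^H S = I.
    return np.allclose(s.conj().T @ s, np.eye(s.shape[0]), atol=tol)

a = np.array([1.0, 0.0, 0.0])           # excite port 1 only
print(S @ a)                             # [0. 1. 0.] -> all power leaves port 2
print(is_reciprocal(S), is_lossless(S))  # False True
```

Because S is not symmetric, swapping the roles of transmitter and receiver gives a different channel, which is the property the one-way, eavesdropping-resistant links above rely on.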
[26] arXiv:2507.23396 [pdf, html, other]: Title: Energy management and flexibility quantification in a discrete event distribution grid simulation

Sebastian Peter, Daniel Feismann, Johannes Bao, Thomas Oberließen, Christian Rehtanz

Comments: 6 pages, 5 figures, part of PowerTech conference proceedings

Subjects: Systems and Control (eess.SY)

Distribution grid operation faces new challenges caused by a rising share of renewable energy sources and the introduction of additional types of loads to the grid. With the increasing adoption of distributed generation and the emergence of prosumer households, Energy Management Systems, which manage and apply the flexibility of connected devices, are gaining popularity. While potentially beneficial to grid capacity, strategic energy management also adds to the complexity of distribution grid operation and planning processes. Novel approaches to time-series-based planning likewise face increasingly complex simulation scenarios and rising computational cost. Discrete event modelling helps facilitate simulations of such scenarios by restricting computation to the most relevant points in simulation time. We provide an enhancement of discrete event distribution grid simulation software that offers fast implementation and testing of energy management algorithms, embedded into a feature-rich simulation environment. Physical models are specified using the Discrete Event System Specification. Furthermore, we contribute a communication protocol that makes use of the discrete event paradigm by computing flexibility potential only when necessary.
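The core idea of discrete event simulation, advancing the clock directly to the next scheduled event instead of stepping through every time increment, can be sketched with a priority queue. The hypothetical `Simulator` class and the event names below are illustrative only; they are unrelated to the actual simulation software or its Discrete Event System Specification models.

```python
import heapq
import itertools

class Simulator:
    """Minimal discrete event loop: simulation time jumps from event to event."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for equal timestamps
        self.now = 0.0
        self.log = []

    def schedule(self, time, name, action=None):
        heapq.heappush(self._queue, (time, next(self._counter), name, action))

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            time, _, name, action = heapq.heappop(self._queue)
            self.now = time
            self.log.append((time, name))
            if action is not None:
                action(self)  # an action may schedule follow-up events

sim = Simulator()

def pv_change(s):
    # Hypothetical: a PV infeed change triggers a flexibility request 5 s later.
    s.schedule(s.now + 5.0, "flexibility_request")

sim.schedule(900.0, "pv_infeed_change", pv_change)
sim.schedule(0.0, "load_update")
sim.run(until=3600.0)
print(sim.log)
# [(0.0, 'load_update'), (900.0, 'pv_infeed_change'), (905.0, 'flexibility_request')]
```

Only three events are processed over a one-hour horizon, which mirrors the point made above about computing flexibility potential only when something actually changes.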
[27] arXiv:2507.23398 [pdf, html, other]: Title: Smart Video Capsule Endoscopy: Raw Image-Based Localization for Enhanced GI Tract Investigation

Oliver Bause, Julia Werner, Paul Palomero Bernardo, Oliver Bringmann

Comments: Accepted at the 32nd International Conference on Neural Information Processing - ICONIP 2025

Subjects: Image and Video Processing (eess.IV); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)

For many real-world applications involving low-power sensor edge devices, deep neural networks used for image classification might not be suitable. This is due to their typically large model size and operation requirements, which often exceed the capabilities of such resource-limited devices. Furthermore, camera sensors usually capture images with a Bayer color filter applied, which are subsequently converted to the RGB images commonly used for neural network training. However, on resource-constrained devices, such conversions demand their share of energy and should ideally be skipped if possible. This work addresses the need for hardware-suitable AI targeting sensor edge devices by means of Video Capsule Endoscopy, an important medical procedure for the investigation of the small intestine, which is strongly limited by its battery lifetime. Accurate organ classification is performed with a final accuracy of 93.06%, evaluated directly on Bayer images, using a CNN with only 63,000 parameters and time-series analysis in the form of Viterbi decoding. Finally, the process of capturing images with a camera and raw image processing is demonstrated with a customized PULPissimo System-on-Chip with a RISC-V core and an ultra-low-power hardware accelerator, providing an energy-efficient AI-based image classification approach requiring just 5.31 µJ per image. As a result, it is possible to save an average of 89.9% of energy before entering the small intestine compared to classic video capsules.
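Viterbi decoding smooths noisy per-frame classifier outputs by finding the most likely sequence of hidden states under a transition model. For a capsule travelling through the GI tract, a natural assumption (ours, not necessarily the paper's) is a left-to-right chain in which the organ label can only stay the same or advance. The sketch below takes per-frame class probabilities, such as a CNN's softmax output, and returns the best monotonic label path; the organ list and transition probabilities are hypothetical.

```python
import numpy as np

def viterbi(log_emis, log_trans, log_init):
    """log_emis: (T, K) per-frame log-probabilities; returns the best state path."""
    T, K = log_emis.shape
    score = log_init + log_emis[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans   # rows: previous state, cols: next state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emis[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):           # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

organs = ["stomach", "small_intestine", "colon"]   # hypothetical label set
with np.errstate(divide="ignore"):                 # log(0) = -inf is intended here
    # Left-to-right model: stay with p=0.9 or advance to the next organ with p=0.1.
    trans = np.array([[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]])
    log_trans = np.log(trans)
    log_init = np.log(np.array([1.0, 0.0, 0.0]))

# Noisy frame predictions: frame 2 contains a spurious "colon" vote.
probs = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.3, 0.2, 0.5],
                  [0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7],
                  [0.1, 0.1, 0.8]])
path = viterbi(np.log(probs), log_trans, log_init)
print([organs[k] for k in path])
# ['stomach', 'stomach', 'stomach', 'small_intestine', 'small_intestine', 'colon', 'colon']
```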
[28] arXiv:2507.23401 [pdf, html, other]: Title: Advancing Standard Load Profiles with Data-Driven Techniques and Recent Datasets

Jawana Gabrielski, Ulf Häger

Comments: 6 pages, 11 figures, part of 2024 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm) proceedings

Journal-ref: 2024 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), year:2024, pages:53-58

Subjects: Systems and Control (eess.SY)

Estimating electricity consumption accurately is essential for the planning and operation of energy systems, as well as for billing processes. Standard Load Profiles (SLP) are widely used to estimate consumption patterns of different user groups. However, in Germany these SLP were formulated using historical data from over 20 years ago and have not been adjusted since. Changing electricity consumption behaviour, which leads to increasing deviations between load patterns and SLP, results in a need for a revision taking into account new data. The growing number of smart meters provides a large measurement database, which enables more accurate load modelling. This paper creates updated SLP using recent data. In addition, the assumptions of the SLP method are validated and improvements are proposed, taking into account the ease of applicability. Furthermore, a Fourier Series-based model is proposed as an alternative SLP model. The different models are compared and evaluated.
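A Fourier-series load model represents a daily profile as a constant plus a few harmonics of the 24-hour period, with coefficients fitted by least squares. The sketch below fits such a model to synthetic quarter-hourly data; the number of harmonics and the synthetic profile are assumptions, and the paper's model may be structured differently (for instance, in how weekdays and seasons are handled).

```python
import numpy as np

def fourier_design(t_hours, n_harmonics, period=24.0):
    """Columns: 1, cos(2*pi*k*t/P), sin(2*pi*k*t/P) for k = 1..n_harmonics."""
    cols = [np.ones_like(t_hours)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k * t_hours / period
        cols += [np.cos(w), np.sin(w)]
    return np.column_stack(cols)

t = np.arange(96) * 0.25                       # one day at 15-min resolution
clean = 0.4 + 0.15 * np.cos(2 * np.pi * (t - 19) / 24) + 0.05 * np.sin(4 * np.pi * t / 24)
rng = np.random.default_rng(1)
measured = clean + 0.01 * rng.standard_normal(t.size)   # synthetic normalized load

X = fourier_design(t, n_harmonics=3)
coef, *_ = np.linalg.lstsq(X, measured, rcond=None)
fitted = X @ coef
print(coef.round(3))
print(f"RMSE: {np.sqrt(np.mean((fitted - measured) ** 2)):.4f}")
```

The constant coefficient recovers the mean daily load, and each harmonic pair encodes the amplitude and phase of one daily or sub-daily cycle, which keeps the model compact and easy to apply.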
[29] arXiv:2507.23489 [pdf, html, other]: Title: Distributionally Robust Cascading Risk Quantification in Multi-Agent Rendezvous: Effects of Time Delay and Network Connectivity

Vivek Pandey, Nader Motee

Subjects: Systems and Control (eess.SY)

Achieving safety in autonomous multi-agent systems, particularly in time-critical tasks like rendezvous, is a critical challenge. In this paper, we propose a distributionally robust risk framework for analyzing cascading failures in multi-agent rendezvous. To capture the complex interactions between network connectivity, system dynamics, and communication delays, we use a time-delayed network model as a benchmark. We introduce a conditional distributionally robust functional to quantify cascading effects between agents, utilizing a bivariate normal distribution. Our approach yields closed-form risk expressions that reveal the impact of time delay, noise statistics, communication topology, and failure modes on rendezvous risk. The insights derived inform the design of resilient networks that mitigate the risk of cascading failures. We validate our theoretical results through extensive simulations, demonstrating the effectiveness of our framework.
[30] arXiv:2507.23511 [pdf, html, other]: Title: MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

Yadong Niu, Tianzi Wang, Heinrich Dinkel, Xingwei Sun, Jiahao Zhou, Gang Li, Jizhong Liu, Xunying Liu, Junbo Zhang, Jian Luan

Comments: 9 main pages, 5 figures, 3 tables, and 14 appendix pages

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)

While large audio-language models have advanced open-ended audio understanding, they still fall short of nuanced human-level comprehension. This gap persists largely because current benchmarks, limited by data annotations and evaluation metrics, fail to reliably distinguish between generic and highly detailed model outputs. To this end, this work introduces MECAT, a Multi-Expert Constructed Benchmark for Fine-Grained Audio Understanding Tasks. Generated via a pipeline that integrates analysis from specialized expert models with Chain-of-Thought large language model reasoning, MECAT provides multi-perspective, fine-grained captions and open-set question-answering pairs. The benchmark is complemented by a novel metric: DATE (Discriminative-Enhanced Audio Text Evaluation). This metric penalizes generic terms and rewards detailed descriptions by combining single-sample semantic similarity with cross-sample discriminability. A comprehensive evaluation of state-of-the-art audio models is also presented, providing new insights into their current capabilities and limitations. The data and code are available at this https URL
[31] arXiv:2507.23518 [pdf, html, other]: Title: EVMx: An FPGA-Based Smart Contract Processing Unit

Joel Poncha Lemayian, Hachem Bensalem, Ghyslain Gagnon, Kaiwen Zhang, Pascal Giard

Comments: 6 pages

Subjects: Signal Processing (eess.SP)

Ethereum blockchain uses smart contracts (SCs) to implement decentralized applications (dApps). SCs are executed by the Ethereum virtual machine (EVM) running within an Ethereum client. Moreover, the EVM has been widely adopted by other blockchain platforms, including Solana, Cardano, Avalanche, Polkadot, and more. However, the EVM performance is limited by the constraints of the general-purpose computer it operates on. This work proposes offloading SC execution onto a dedicated hardware-based EVM. Specifically, EVMx is an FPGA-based SC execution engine that benefits from the inherent parallelism and high-speed processing capabilities of a hardware architecture. Synthesis results demonstrate a reduction in execution time of 61% to 99% for commonly used operation codes compared to CPU-based SC execution environments. Moreover, the execution time of Ethereum blocks on EVMx is up to 6x faster compared to analogous works in the literature. These results highlight the potential of the proposed architecture to accelerate SC execution and enhance the performance of EVM-compatible blockchains.
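The EVM is a stack machine that executes bytecode one opcode at a time, which is the behavior a hardware execution engine has to reproduce. The toy interpreter below handles only a handful of real opcodes (STOP 0x00, ADD 0x01, MUL 0x02, SUB 0x03, PUSH1 0x60) with 256-bit wraparound arithmetic. Gas accounting, memory, storage, and every other opcode are omitted, and the sketch says nothing about how EVMx itself is designed.

```python
MASK = (1 << 256) - 1  # EVM words are 256-bit unsigned integers

def run(bytecode: bytes) -> list[int]:
    """Execute a tiny subset of EVM bytecode and return the final stack."""
    stack, pc = [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == 0x00:                      # STOP
            break
        elif op == 0x60:                    # PUSH1: the next byte is the operand
            stack.append(bytecode[pc + 1])
            pc += 2
            continue
        elif op in (0x01, 0x02, 0x03):      # ADD, MUL, SUB
            a = stack.pop()                 # top of stack
            b = stack.pop()                 # second item
            if op == 0x01:
                stack.append((a + b) & MASK)
            elif op == 0x02:
                stack.append((a * b) & MASK)
            else:
                stack.append((a - b) & MASK)  # SUB computes top minus second
        else:
            raise NotImplementedError(f"opcode 0x{op:02x} not in this sketch")
        pc += 1
    return stack

# (2 + 3) * 4: PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP
print(run(bytes([0x60, 2, 0x60, 3, 0x01, 0x60, 4, 0x02, 0x00])))  # [20]
```

The strictly sequential fetch-decode-execute loop is what a CPU-hosted EVM pays for on every opcode; a dedicated hardware pipeline can implement the same stack semantics directly in logic.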
[32] arXiv:2507.23521 [pdf, html, other]: Title: JPEG Processing Neural Operator for Backward-Compatible Coding

Woo Kyoung Han, Yongjun Lee, Byeonghun Lee, Sang Hyun Park, Sunghoon Im, Kyong Hwan Jin

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Despite significant advances in learning-based lossy compression algorithms, standardizing codecs remains a critical challenge. In this paper, we present the JPEG Processing Neural Operator (JPNeO), a next-generation JPEG algorithm that maintains full backward compatibility with the current JPEG format. Our JPNeO improves chroma component preservation and enhances reconstruction fidelity compared to existing artifact removal methods by incorporating neural operators in both the encoding and decoding stages. JPNeO achieves practical benefits in terms of reduced memory usage and parameter count. We further validate our hypothesis about the existence of a space with high mutual information through empirical evidence. In summary, the JPNeO functions as a high-performance out-of-the-box image compression pipeline without changing source coding's protocol. Our source code is available at this https URL.
[33] arXiv:2507.23526 [pdf, html, other]: Title: Channel Estimation for 6G Near-Field Wireless Communications: A Comprehensive Survey

Wen-Xuan Long, Shengyu Ye, Marco Moretti, Michele Morelli, Luca Sanguinetti, Rui Chen, Cheng-Xiang Wang

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

The sixth-generation (6G) wireless systems are expected to adopt extremely large aperture arrays (ELAAs) and novel antenna architectures, and to operate in extremely high-frequency bands to meet growing data demands. ELAAs significantly increase the number of antennas, enabling finer spatial resolution and improved beamforming. At high frequencies, ELAAs shift communication from the conventional far-field regime to the near-field regime, where spherical wavefronts dominate and the channel response depends on both angle and distance, increasing channel dimensionality. Conventional far-field channel estimation methods, which rely on angular information, struggle in near-field scenarios due to increased pilot overhead and computational complexity. This paper presents a comprehensive survey of recent advances in near-field channel estimation. It first defines the near- and far-field boundary from an electromagnetic perspective and discusses key propagation differences, alongside a brief review of ELAA developments. Then, it introduces mainstream near-field channel models and compares them with far-field models. Major estimation techniques are reviewed under different configurations (single/multi-user, single/multi-carrier), including both direct estimation and RIS-assisted cascaded estimation. These techniques reveal trade-offs among estimation accuracy, complexity, and overhead. This survey aims to provide insights and foundations for efficient and scalable near-field channel estimation in 6G systems, while identifying key challenges and future research directions.
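A widely used rule of thumb for the near-/far-field boundary is the Rayleigh (Fraunhofer) distance d_R = 2D²/λ, where D is the array aperture and λ the wavelength; the survey's electromagnetic definition may refine this. The quick calculation below, with hypothetical aperture sizes and carrier frequencies, shows why large arrays at high frequencies place many users inside the near field.

```python
C = 299_792_458.0  # speed of light in m/s

def rayleigh_distance(aperture_m, freq_hz):
    """Classical far-field boundary 2 * D^2 / wavelength, in metres."""
    wavelength = C / freq_hz
    return 2 * aperture_m ** 2 / wavelength

# Hypothetical (aperture, carrier) pairs, from a small sub-6 GHz array to a 1 m array at 100 GHz.
for aperture, f in [(0.1, 3.5e9), (0.5, 28e9), (1.0, 100e9)]:
    d_r = rayleigh_distance(aperture, f)
    print(f"D = {aperture} m, f = {f / 1e9:g} GHz -> d_R = {d_r:.1f} m")
```

Because d_R grows with the square of the aperture and linearly with frequency, the boundary moves from centimetres to hundreds of metres across these examples.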
[34] arXiv:2507.23570 [pdf, html, other]: Title: Multiple-Parameter Graph Fractional Fourier Transform: Theory and Applications

Manjun Cui, Zhichao Zhang, Wei Yao

Subjects: Signal Processing (eess.SP)

The graph fractional Fourier transform (GFRFT) applies a single global fractional order to all graph frequencies, which restricts its adaptability to diverse signal characteristics across the spectral domain. To address this limitation, in this paper, we propose two types of multiple-parameter GFRFTs (MPGFRFTs) and establish their corresponding theoretical frameworks. We design a spectral compression strategy tailored for ultra-low compression ratios, effectively preserving essential information even under extreme dimensionality reduction. To enhance flexibility, we introduce a learnable order vector scheme that enables adaptive compression and denoising, demonstrating strong performance on both graph signals and images. We explore the application of MPGFRFTs to image encryption and decryption. Experimental results validate the versatility and superior performance of the proposed MPGFRFT framework across various graph signal processing tasks.
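The MPGFRFTs generalize the ordinary graph Fourier transform (GFT), which corresponds to fractional order 1. As a baseline for the spectral compression idea, the sketch below applies the ordinary GFT of a path graph, keeps only the k lowest-frequency coefficients, and reconstructs the signal. It does not implement the fractional or multiple-parameter transforms, the learnable order vector, or the paper's ultra-low-ratio strategy; the graph and signal are assumptions.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-node path graph."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def gft_compress(x, L, k):
    """Keep the k lowest graph-frequency GFT coefficients of x and reconstruct."""
    _, U = np.linalg.eigh(L)      # eigenvalues ascending -> columns ordered by frequency
    coeffs = U.T @ x              # forward GFT
    coeffs[k:] = 0.0              # discard high-frequency components
    return U @ coeffs             # inverse GFT (U is orthogonal)

n = 64
L = path_graph_laplacian(n)
x = np.sin(np.linspace(0, np.pi, n))   # a smooth signal on the path graph
x_hat = gft_compress(x, L, k=8)        # keep 8 of 64 coefficients (12.5%)
print(f"relative error: {np.linalg.norm(x - x_hat) / np.linalg.norm(x):.4f}")
```

A fractional order, or one order per spectral component, changes which domain the truncation happens in, which is where the adaptivity claimed for the MPGFRFTs would enter.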
[35] arXiv:2507.23571 [pdf, other]: Title: Asynchronous Grid Connections Providing Fast-Frequency Response: System Integration Study

Felix Wald, Amir Sajadi, Barry Mather, Giovanni De Carne

Subjects: Systems and Control (eess.SY)

This paper presents an integration study for a power electronic-based fast-frequency response technology, an asynchronous grid connection operating as an aggregator for behind-the-meter resources and distributed generators. Both technical feasibility and techno-economic viability studies are presented. The dynamic performance of the fast-frequency response enabled by the asynchronous grid connection is validated with Power Hardware-in-the-Loop experiments and transferred to an IEEE 9-bus system in DigSilent PowerFactory for dynamic stability analysis. We demonstrate that droop-based control enhancements to the local distributed generators could allow their aggregation to provide grid-supporting functionalities and participate in the market for ancillary services. To this end, we performed a long-term simulation embedding the system within the ancillary service market framework of PJM. The fast-frequency response regulation is subsequently used to calculate the potential revenue and project the results over a 15-year investment horizon. Finally, the techno-economic analysis concludes with recommendations for enhancements, at both the technical and regulatory levels, to access the full potential of distributed generators.
[36] arXiv:2507.23591 [pdf, other]: Title: Tensor-based reduction of linear parameter-varying state-space models

Bogoljub Terzin, E. Javier Olucha, Amritam Das, Siep Weiland, Roland Tóth

Subjects: Systems and Control (eess.SY)

The Linear Parameter-Varying (LPV) framework is a powerful tool for controlling nonlinear and complex systems, but the conversion of nonlinear models into LPV forms often results in high-dimensional and overly conservative LPV models. To apply control strategies, model reduction is therefore often needed to reduce computational demands. This paper presents the first systematic approach for the joint reduction of state order and scheduling signal dimension of LPV state-space models. Existing methods typically address these reductions separately. By formulating a tensorial form of LPV models with an affine dependency on the scheduling variables, we leverage tensor decomposition to find the dominant components of the state and scheduling subspaces. We extend the common Petrov-Galerkin projection approach to the LPV framework by adding a scheduling projection. This extension enables the joint reduction. To find suitable subspaces for the extended Petrov-Galerkin projection, we have developed two different methods: tensor-based LPV moment matching, and an approach based on Proper Orthogonal Decomposition. Advantages of the proposed methods are demonstrated on two different series-interconnected mass-spring-damper systems with nonlinear springs: one primarily used for comparison with other methods, and a more elaborate higher-order model designed to assess scalability.
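Proper Orthogonal Decomposition builds a reduced basis from the dominant left singular vectors of a state-snapshot matrix; a Galerkin projection (the special case of Petrov-Galerkin with equal trial and test bases) then yields the reduced model. The sketch below applies this to an ordinary LTI system with a hypothetical stable A; it does not include the scheduling-dimension projection or the tensor decomposition that the paper adds for LPV models.

```python
import numpy as np

def pod_basis(snapshots, r):
    """Leading r left singular vectors of the (n_states x n_samples) snapshot matrix."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return U[:, :r], energy[r - 1]   # basis and fraction of snapshot energy captured

rng = np.random.default_rng(0)
n, r, dt, steps = 20, 4, 0.01, 500
# Hypothetical stable system: A = -diag(decay rates) plus weak random coupling.
A = -np.diag(np.linspace(1.0, 50.0, n)) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))

# Collect state snapshots with forward Euler under a sinusoidal input.
x = np.zeros((n, 1))
snaps = []
for k in range(steps):
    u = np.sin(2 * np.pi * 0.5 * k * dt)
    x = x + dt * (A @ x + B * u)
    snaps.append(x[:, 0].copy())
X = np.array(snaps).T                 # shape (n, steps)

V, captured = pod_basis(X, r)
A_r, B_r = V.T @ A @ V, V.T @ B       # Galerkin projection (test basis W = V)
print(A_r.shape, B_r.shape, f"energy captured: {captured:.4f}")
```

In the LPV setting described above, each affine coefficient matrix would be projected in the same way, with an additional projection acting on the scheduling variables.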
[37] arXiv:2507.23648 [pdf, html, other]: Title: Towards Field-Ready AI-based Malaria Diagnosis: A Continual Learning Approach

Louise Guillon, Soheib Biga, Yendoube E. Kantchire, Mouhamadou Lamine Sane, Grégoire Pasquier, Kossi Yakpa, Stéphane E. Sossou, Marc Thellier, Laurent Bonnardot, Laurence Lachaud, Renaud Piarroux, Ameyo M. Dorkenoo

Comments: MICCAI 2025 AMAI Workshop, Accepted, Submitted Manuscript Version

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Malaria remains a major global health challenge, particularly in low-resource settings where access to expert microscopy may be limited. Deep learning-based computer-aided diagnosis (CAD) systems have been developed and demonstrate promising performance on thin blood smear images. However, their clinical deployment may be hindered by limited generalization across sites with varying conditions. Yet very few practical solutions have been proposed. In this work, we investigate continual learning (CL) as a strategy to enhance the robustness of malaria CAD models to domain shifts. We frame the problem as a domain-incremental learning scenario, where a YOLO-based object detector must adapt to new acquisition sites while retaining performance on previously seen domains. We evaluate four CL strategies, two rehearsal-based and two regularization-based, under real-life conditions using a multi-site clinical dataset of thin blood smear images. Our results suggest that CL, and rehearsal-based methods in particular, can significantly improve performance. These findings highlight the potential of continual learning to support the development of deployable, field-ready CAD tools for malaria.
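Rehearsal-based continual learning keeps a small memory of samples from earlier domains and mixes them into the training batches for each new domain. One common way to maintain such a memory is reservoir sampling, sketched below with a hypothetical `ReplayBuffer` class. The study does not state that its two rehearsal strategies use this exact scheme, and the YOLO training loop itself is omitted.

```python
import random

class ReplayBuffer:
    """Fixed-capacity memory filled by reservoir sampling (uniform over all items seen)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / n_seen,
            # overwriting a uniformly chosen stored item.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))

buffer = ReplayBuffer(capacity=100)
for site in ["site_A", "site_B", "site_C"]:       # hypothetical acquisition sites
    new_images = [(site, i) for i in range(1000)]
    for img in new_images:
        # During training, each new-site batch would be mixed with buffer.sample(...).
        buffer.add(img)

print(len(buffer.items), sorted({s for s, _ in buffer.items}))
```

Because every image seen so far has the same chance of being in memory, earlier sites stay represented after several domain shifts, which is what lets rehearsal counteract forgetting.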
[38] arXiv:2507.23695 [pdf, html, other]: Title: On the Achievable Rate of Satellite Quantum Communication Channel using Deep Autoencoder Gaussian Mixture Model

Mouli Chakraborty, Subhash Chandra, Avishek Nag, Anshu Mukherjee

Subjects: Signal Processing (eess.SP)

We present a comparative study of the Gaussian mixture model (GMM) and the Deep Autoencoder Gaussian Mixture Model (DAGMM) for estimating satellite quantum channel capacity, considering hybrid quantum noise (HQN) and transmission constraints. While GMM is simple and interpretable, DAGMM better captures non-linear variations and noise distributions. Simulations show that DAGMM provides tighter capacity bounds and improved clustering. This introduces the Deep Cluster Gaussian Mixture Model (DCGMM) for high-dimensional quantum data analysis in quantum satellite communication.
[39] arXiv:2507.23707 [pdf, html, other]: Title: Cellular, Cell-less, and Everything in Between: A Unified Framework for Utility Region Analysis in Wireless Networks

Renato Luis Garrido Cavalcante, Tomasz Piotrowski, Slawomir Stanczak

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

We introduce a unified framework for analyzing utility regions of wireless networks, with a focus on the signal-to-interference-plus-noise ratio (SINR) and achievable rate regions. The framework provides valuable insights into interference patterns of modern network architectures, such as cell-less and extremely large MIMO networks, and it generalizes existing characterizations of the weak Pareto boundary. A central contribution is the derivation of sufficient conditions that guarantee convexity of the utility regions. Convexity is an important property because it ensures that time sharing (or user grouping) cannot simultaneously increase the utility of all users when the network operates on the weak Pareto boundary. These sufficient conditions also have two key implications. First, they identify a family of (weighted) sum-rate maximization problems that are inherently convex without any variable transformations, thus paving the way for the development of efficient, provably optimal solvers for this family. Second, they provide a rigorous justification for formulating sum-rate maximization problems directly in terms of achievable rates, rather than SINR levels. Our theoretical insights also motivate an alternative to the concept of favorable propagation in the massive MIMO literature -- one that explicitly accounts for self-interference and the beamforming strategy.
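For reference, when interference is treated as noise, the SINR of user k is SINR_k = p_k g_kk / (σ² + Σ_{j≠k} p_j g_kj), and the corresponding achievable rate is log2(1 + SINR_k). The sketch below evaluates both for a hypothetical 3-user gain matrix and power vector; it computes a single point of the utility region and does not reproduce the paper's convexity conditions.

```python
import numpy as np

def sinr(p, G, noise):
    """p: (K,) transmit powers; G[k, j]: channel gain from transmitter j to receiver k."""
    signal = np.diag(G) * p
    interference = G @ p - signal     # sum over j != k of G[k, j] * p[j]
    return signal / (noise + interference)

def rates(p, G, noise):
    # Achievable rates in bit/s/Hz, treating interference as noise.
    return np.log2(1.0 + sinr(p, G, noise))

G = np.array([[1.0, 0.1, 0.2],
              [0.1, 0.8, 0.1],
              [0.3, 0.1, 0.9]])       # hypothetical gains
p = np.array([1.0, 1.0, 1.0])
r = rates(p, G, noise=0.1)
print(r.round(3), f"sum rate = {r.sum():.3f} bit/s/Hz")
```

Sweeping `p` over feasible power vectors and collecting the resulting SINR or rate tuples traces out the regions whose convexity the framework characterizes.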
[40] arXiv:2507.23746 [pdf, html, other]: Title: Real-Time Transmission of Uncompressed High-Definition Video Via A VCSEL-Based Optical Wireless Link With Ultra-Low Latency

Hossein Kazemi, Isaac N. O. Osahon, Tiankuo Jiao, David Butler, Nikolay Ledentsov Jr., Ilya Titkov, Nikolay Ledentsov, Harald Haas

Comments: 8 pages, 6 figures, 2 tables

Subjects: Signal Processing (eess.SP)

Real-time transmission of high-resolution video signals in an uncompressed and unencrypted format requires an ultra-reliable and low-latency communications (URLLC) medium with high bandwidth to maintain the quality of experience (QoE) for users. We put forward the design and experimental demonstration of a high-performance laser-based optical wireless communication (OWC) system that enables high-definition (HD) video transmission with submillisecond latencies. The serial digital interface (SDI) output of a camera is used to transmit the live video stream over an optical wireless link by directly modulating the SDI signal on the intensity of a 940 nm vertical cavity surface emitting laser (VCSEL). The proposed SDI over light fidelity (LiFi) system corroborates error-free transmission of full HD (FHD) and 4K ultra-high-definition (UHD) resolutions at data rates of 2.97 Gb/s and 5.94 Gb/s, respectively, with a measured end-to-end latency of under 35 ns. Since SDI standards support various video formats and VCSELs are high-bandwidth and low-power devices, this presents a scalable and inexpensive solution for wireless connectivity between professional broadcast equipment using off-the-shelf SDI components.
[41] arXiv:2507.23763 [pdf, html, other]: Title: Topology Optimization in Medical Image Segmentation with Fast Euler Characteristic

Liu Li, Qiang Ma, Cheng Ouyang, Johannes C. Paetzold, Daniel Rueckert, Bernhard Kainz

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning-based medical image segmentation techniques have shown promising results when evaluated based on conventional metrics such as the Dice score or Intersection-over-Union. However, these fully automatic methods often fail to meet clinically acceptable accuracy, especially when topological constraints should be observed, e.g., continuous boundaries or closed surfaces. In medical image segmentation, the correctness of a segmentation in terms of the required topological genus is sometimes even more important than pixel-wise accuracy. Existing topology-aware approaches commonly estimate and constrain the topological structure via the concept of persistent homology (PH). However, these methods are difficult to apply to high-dimensional data due to their polynomial computational complexity. To overcome this problem, we propose a novel and fast approach for topology-aware segmentation based on the Euler characteristic ($\chi$). First, we propose a fast formulation for computing $\chi$ in both 2D and 3D. The scalar $\chi$ error between the prediction and the ground truth serves as the topological evaluation metric. Then we estimate the spatial topological correctness of any segmentation network via a so-called topological violation map, i.e., a detailed map that highlights regions with $\chi$ errors. Finally, the segmentation results from an arbitrary network are refined based on the topological violation maps by a topology-aware correction network. Our experiments on both 2D and 3D datasets show that our method can significantly improve topological correctness while preserving pixel-wise segmentation accuracy.
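The abstract does not spell out its fast $\chi$ formulation. As a generic illustration of why $\chi$ is cheap compared with persistent homology, the sketch below computes $\chi = V - E + F$ for a 2D binary mask in a single linear-time pass, treating each foreground pixel as a closed unit square (8-connectivity for the foreground). The function names are hypothetical, and this is the textbook cubical-complex count, not necessarily the authors' method.

```python
import numpy as np


def euler_characteristic_2d(mask: np.ndarray) -> int:
    """Euler characteristic chi = V - E + F of a 2D binary mask.

    Each foreground pixel is a closed unit square, so pixels touching only
    at a corner are connected (8-connectivity for the foreground).
    Runs in O(H * W) using array operations.
    """
    m = np.asarray(mask, dtype=bool)
    p = np.pad(m, 1)  # zero border so boundary vertices/edges are handled uniformly
    # A grid vertex exists if any of the 4 pixels around it is foreground.
    v = (p[:-1, :-1] | p[:-1, 1:] | p[1:, :-1] | p[1:, 1:]).sum()
    # A horizontal edge exists if the pixel above or below it is foreground.
    e_h = (p[:-1, 1:-1] | p[1:, 1:-1]).sum()
    # A vertical edge exists if the pixel left or right of it is foreground.
    e_v = (p[1:-1, :-1] | p[1:-1, 1:]).sum()
    f = m.sum()  # one face per foreground pixel
    return int(v - e_h - e_v + f)


def chi_error(pred: np.ndarray, target: np.ndarray) -> int:
    """Scalar topological error |chi(pred) - chi(target)|."""
    return abs(euler_characteristic_2d(pred) - euler_characteristic_2d(target))
```

For example, a filled 3×3 square has $\chi = 1$, while the same square with its centre pixel removed has $\chi = 0$ (one component, one hole), so `chi_error` reports 1 for that pair even though only one pixel differs.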

[42] arXiv:2507.22954 (cross-list from cs.LG) [pdf, html, other]: Title: Neural Autoregressive Modeling of Brain Aging

Ridvan Yesiloglu, Wei Peng, Md Tauhidul Islam, Ehsan Adeli

Comments: Accepted at Deep Generative Models Workshop @ MICCAI 2025

Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)

Brain aging synthesis is a critical task with broad applications in clinical and computational neuroscience. The ability to predict the future structural evolution of a subject's brain from an earlier MRI scan provides valuable insights into aging trajectories. Yet the high dimensionality of the data, subtle structural changes across ages, and subject-specific patterns make synthesizing the aging brain challenging. To overcome these challenges, we propose NeuroAR, a novel brain aging simulation model based on generative autoregressive transformers. NeuroAR synthesizes the aging brain by autoregressively estimating the discrete token maps of a future scan from a convenient space of concatenated token embeddings of a previous and future scan. To guide the generation, it concatenates the subject's previous scan into each scale and incorporates that scan's acquisition age and the target age at each block via cross-attention. We evaluate our approach on both the elderly population and adolescent subjects, demonstrating superior performance over state-of-the-art generative models, including latent diffusion models (LDM) and generative adversarial networks, in terms of image fidelity. Furthermore, we employ a pre-trained age predictor to further validate the consistency and realism of the synthesized images with respect to expected aging patterns. NeuroAR significantly outperforms key models, including LDM, demonstrating its ability to model subject-specific brain aging trajectories with high fidelity.
[43] arXiv:2507.23010 (cross-list from cs.LG) [pdf, html, other]: Title: Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods

Siwoo Park

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

This paper investigates the inverse capabilities and broader utility of multimodal latent spaces within task-specific AI (Artificial Intelligence) models. While these models excel at their designed forward tasks (e.g., text-to-image generation, audio-to-text transcription), their potential for inverse mappings remains largely unexplored. We propose an optimization-based framework to infer input characteristics from desired outputs, applying it bidirectionally across Text-Image (BLIP, Flux.1-dev) and Text-Audio (Whisper-Large-V3, Chatterbox-TTS) modalities.
Our central hypothesis posits that while optimization can guide models towards inverse tasks, their multimodal latent spaces will not consistently support semantically meaningful and perceptually coherent inverse mappings. Experimental results consistently validate this hypothesis. We demonstrate that while optimization can force models to produce outputs that align textually with targets (e.g., a text-to-image model generating an image that an image captioning model describes correctly, or an ASR model transcribing optimized audio accurately), the perceptual quality of these inversions is chaotic and incoherent. Furthermore, when attempting to infer the original semantic input from generative models, the reconstructed latent space embeddings frequently lack semantic interpretability, aligning with nonsensical vocabulary tokens.
These findings highlight a critical limitation: multimodal latent spaces, primarily optimized for specific forward tasks, do not inherently possess the structure required for robust and interpretable inverse mappings. Our work underscores the need for further research into developing truly semantically rich and invertible multimodal latent spaces.
[44] arXiv:2507.23091 (cross-list from cs.AI) [pdf, other]: Title: Moravec's Paradox: Towards an Auditory Turing Test

David Noever, Forrest McKee

Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

This research work demonstrates that current AI systems fail catastrophically on auditory tasks that humans perform effortlessly. Drawing inspiration from Moravec's paradox (i.e., tasks simple for humans often prove difficult for machines, and vice versa), we introduce an auditory Turing test comprising 917 challenges across seven categories: overlapping speech, speech in noise, temporal distortion, spatial audio, coffee-shop noise, phone distortion, and perceptual illusions. Our evaluation of state-of-the-art audio models including GPT-4's audio capabilities and OpenAI's Whisper reveals a striking failure rate exceeding 93%, with even the best-performing model achieving only 6.9% accuracy on tasks that humans solved at 7.5 times higher success (52%). These results expose focusing failures in how AI systems process complex auditory scenes, particularly in selective attention, noise robustness, and contextual adaptation. Our benchmark not only quantifies the human-machine auditory gap but also provides insights into why these failures occur, suggesting that current architectures lack fundamental mechanisms for human-like auditory scene analysis. The traditional design of audio CAPTCHAs highlights common filters that humans evolved but machines fail to select in multimodal language models. This work establishes a diagnostic framework for measuring progress toward human-level machine listening and highlights the need for novel approaches integrating selective attention, physics-based audio understanding, and context-aware perception into multimodal AI systems.
[45] arXiv:2507.23149 (cross-list from cs.GT) [pdf, html, other]: Title: Learning with Episodic Hypothesis Testing in General Games: A Framework for Equilibrium Selection

Ruifan Yang, Manxi Wu

Subjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)

We introduce a new hypothesis testing-based learning dynamics in which players update their strategies by combining hypothesis testing with utility-driven exploration. In this dynamics, each player forms beliefs about opponents' strategies and episodically tests these beliefs using empirical observations. Beliefs are resampled either when the hypothesis test is rejected or through exploration, where the probability of exploration decreases with the player's (transformed) utility. In general finite normal-form games, we show that the learning process converges to a set of approximate Nash equilibria and, more importantly, to a refinement that selects equilibria maximizing the minimum (transformed) utility across all players. Our result establishes convergence to equilibrium in general finite games and reveals a novel mechanism for equilibrium selection induced by the structure of the learning dynamics.
[46] arXiv:2507.23185 (cross-list from cs.CV) [pdf, html, other]: Title: Single Image Rain Streak Removal Using Harris Corner Loss and R-CBAM Network

Jongwook Si, Sungyoung Kim

Comments: 21 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

The problem of single-image rain streak removal goes beyond simple noise suppression, requiring the simultaneous preservation of fine structural details and overall visual quality. In this study, we propose a novel image restoration network that effectively constrains the restoration process by introducing a Corner Loss, which prevents the loss of object boundaries and detailed texture information during restoration. Furthermore, we integrate a Residual Convolutional Block Attention Module (R-CBAM) block into the encoder and decoder to dynamically adjust the importance of features in both spatial and channel dimensions, enabling the network to focus more effectively on regions heavily affected by rain streaks. Quantitative evaluations conducted on the Rain100L and Rain100H datasets demonstrate that the proposed method significantly outperforms previous approaches, achieving a PSNR of 33.29 dB on Rain100L and 26.16 dB on Rain100H.
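The abstract does not give the exact form of the Corner Loss. One plausible reading, sketched here purely as an assumption, compares the Harris corner responses $R = \det(M) - k\,\mathrm{tr}(M)^2$ of the restored and clean images, where $M$ is the locally averaged structure tensor of the image gradients. The 3×3 box window, $k = 0.04$, the L1 distance, and all function names are illustrative choices, not the authors' settings.

```python
import numpy as np


def _box3(a: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (the structure-tensor window)."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def harris_response(img: np.ndarray, k: float = 0.04) -> np.ndarray:
    """Harris response R = det(M) - k * trace(M)^2 for a grayscale image."""
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows, columns
    sxx, syy, sxy = _box3(gx * gx), _box3(gy * gy), _box3(gx * gy)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2


def corner_loss(restored: np.ndarray, clean: np.ndarray, k: float = 0.04) -> float:
    """Hypothetical corner loss: mean absolute difference of Harris responses."""
    diff = harris_response(restored, k) - harris_response(clean, k)
    return float(np.mean(np.abs(diff)))
```

Corners produce $R > 0$ and straight edges $R < 0$, so a restoration that rounds off or blurs corners is penalized even if its pixel-wise error is small.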
[47] arXiv:2507.23200 (cross-list from cs.IT) [pdf, html, other]: Title: Efficient DFT of Zadoff-Chu Sequences using lmFH Pattern

Fanping Du

Comments: 8 pages, 7 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Having established that Zadoff-Chu (ZC) sequences are inherently linear micro-frequency hopping (lmFH) symbols, this paper first presents an intuitive and visual exposition of the computation of the DFT and IDFT of ZC sequences using the lmFH pattern. This yields interesting results. Subsequently, an alternative form for computing the cumulative sum of ZC sequences using the Generalized Quadratic Gauss Sum is introduced. Furthermore, building on the micro-frequency hopping (mFH) concept, this paper shows that the DFT of ZC sequences can be transformed into an lmFH symbol with frequency shift and phase offset. Therefore, the DFT of ZC sequences can be computed via cumulative frequency points, similar to the computation of normal mFH symbols.
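The lmFH interpretation can be checked numerically. Assuming the common odd-length definition $x[n] = e^{-j\pi u n(n+1)/N}$ (the form used, for example, in LTE), the phase advance between consecutive samples is $-2\pi u(n+1)/N$, so the instantaneous frequency steps linearly by $u/N$ per sample; the DFT of the sequence also has constant magnitude $\sqrt{N}$. The sketch below verifies both properties for an illustrative root $u = 25$ and prime length $N = 139$; it does not reproduce the paper's cumulative-frequency DFT computation.

```python
import numpy as np


def zadoff_chu(u: int, n_zc: int) -> np.ndarray:
    """Zadoff-Chu sequence of odd length n_zc and root u, with gcd(u, n_zc) = 1."""
    n = np.arange(n_zc)
    return np.exp(-1j * np.pi * u * n * (n + 1) / n_zc)


N, u = 139, 25
x = zadoff_chu(u, N)

# Phase advance from sample n to n+1 is -2*pi*u*(n+1)/N: the "frequency"
# hops linearly by u/N per sample, which is the linear micro-FH view.
step = np.angle(x[1:] * np.conj(x[:-1]))
expected = np.angle(np.exp(-2j * np.pi * u * np.arange(1, N) / N))
assert np.allclose(step, expected)

# CAZAC property: the DFT of a ZC sequence has constant magnitude sqrt(N).
X = np.fft.fft(x)
assert np.allclose(np.abs(X), np.sqrt(N))
```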
[48] arXiv:2507.23234 (cross-list from cs.IT) [pdf, html, other]: Title: Secure Integrated Sensing and Communication Networks: Stochastic Performance Analysis

Marziyeh Soltani, Mahtab Mirmohseni, Rahim Tafazolli

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper analyzes the stochastic security performance of a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system in a downlink scenario. A base station (BS) transmits a multi-functional signal to simultaneously communicate with a user, sense a target's angular location, and counteract eavesdropping threats. The attack model considers a passive single-antenna communication eavesdropper intercepting communication data, as well as a multi-antenna sensing eavesdropper attempting to infer the target's location. We also consider a malicious target scenario where the target plays the role of the communication eavesdropper. The BS-user and BS-eavesdropper channels follow Rayleigh fading, while the target's azimuth angle is uniformly distributed. To evaluate the performance in this random network, we derive the ergodic secrecy rate (ESR) and the ergodic Cramér-Rao bound (CRB) for target localization at both the BS and the sensing eavesdropper. This involves computing the probability density functions (PDFs) of the signal-to-noise ratio (SNR) and CRB, leveraging the central limit theorem for tractability. We characterize the boundary of the CRB-secrecy rate region and interpret the performance tradeoffs between communication and sensing while guaranteeing a level of security and privacy in random ISAC networks.
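As a point of reference for the ESR metric, the following Monte Carlo sketch estimates $\mathbb{E}[\max(\log_2(1+\gamma_B) - \log_2(1+\gamma_E), 0)]$ for a deliberately simplified single-antenna case in which both instantaneous SNRs are exponentially distributed (Rayleigh fading). It omits the MIMO beamforming, sensing, and central-limit-theorem approximations used in the paper; the function name and parameters are hypothetical.

```python
import numpy as np


def ergodic_secrecy_rate(snr_bob: float, snr_eve: float,
                         n_trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[max(log2(1+g_B) - log2(1+g_E), 0)] in bit/s/Hz.

    Single-antenna Rayleigh fading: the instantaneous SNRs g_B and g_E are
    exponential with means snr_bob and snr_eve (linear scale, not dB).
    """
    rng = np.random.default_rng(seed)
    g_b = snr_bob * rng.exponential(1.0, n_trials)
    g_e = snr_eve * rng.exponential(1.0, n_trials)
    secrecy = np.log2(1.0 + g_b) - np.log2(1.0 + g_e)
    return float(np.mean(np.maximum(secrecy, 0.0)))


print(ergodic_secrecy_rate(10.0, 1.0))
```

With `snr_eve=0` the estimate approaches the ergodic capacity of the legitimate link, and raising `snr_eve` while reusing the same seed can only lower it, since every fading sample of the eavesdropper SNR grows.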
[49] arXiv:2507.23270 (cross-list from cs.RO) [pdf, html, other]: Title: Simulation-based planning of Motion Sequences for Automated Procedure Optimization in Multi-Robot Assembly Cells

Loris Schneider, Marc Ungen, Elias Huber, Jan-Felix Klein

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Reconfigurable multi-robot cells offer a promising approach to meet fluctuating assembly demands. However, the recurrent planning of their configurations introduces new challenges, particularly in generating optimized, coordinated multi-robot motion sequences that minimize the assembly duration. This work presents a simulation-based method for generating such optimized sequences. The approach separates assembly steps into task-related core operations and connecting traverse operations. While core operations are constrained and predetermined, traverse operations offer substantial optimization potential. Scheduling the core operations is formulated as an optimization problem, requiring feasible traverse operations to be integrated using a decomposition-based motion planning strategy. Several solution techniques are explored, including a sampling heuristic, tree-based search and gradient-free optimization. For motion planning, a decomposition method is proposed that identifies specific areas in the schedule, which can be solved independently with modified centralized path planning algorithms. The proposed method generates efficient and collision-free multi-robot assembly procedures that outperform a baseline relying on decentralized, robot-individual motion planning. Its effectiveness is demonstrated through simulation experiments.
[50] arXiv:2507.23292 (cross-list from cs.LG) [pdf, html, other]: Title: SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy

RJ Skerry-Ryan, Julian Salazar, Soroosh Mariooryad, David Kao, Daisy Stanton, Eric Battenberg, Matt Shannon, Ron J. Weiss, Robin Scheibler, Jonas Rothfuss, Tom Bagby

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Programming Languages (cs.PL); Software Engineering (cs.SE); Audio and Speech Processing (eess.AS)

We introduce a neural network layer API and library for sequence modeling, designed for easy creation of sequence models that can be executed both layer-by-layer (e.g., teacher-forced training) and step-by-step (e.g., autoregressive sampling). To achieve this, layers define an explicit representation of their state over time (e.g., a Transformer KV cache, a convolution buffer, an RNN hidden state), and a step method that evolves that state, tested to give identical results to a stateless layer-wise invocation. This and other aspects of the SequenceLayers contract enable complex models to be immediately streamable, mitigate a wide range of common bugs arising in both streaming and parallel sequence processing, and can be implemented in any deep learning library. A composable and declarative API, along with a comprehensive suite of layers and combinators, streamlines the construction of production-scale models from simple streamable components while preserving strong correctness guarantees. Our current implementations of SequenceLayers (JAX, TensorFlow 2) are available at this https URL.
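The state/step contract described above can be illustrated with a toy causal convolution whose state is its convolution buffer. This is not the SequenceLayers API: the class and the method names `layer`, `step`, and `initial_state` are hypothetical, chosen only to show the property the abstract emphasizes, namely that chunk-by-chunk stepping reproduces the stateless whole-sequence result.

```python
import numpy as np


class CausalConv1D:
    """Toy streamable layer: causal 1D convolution with an explicit state.

    layer() processes a whole sequence at once; step() processes one
    non-empty chunk and carries the last (kernel_size - 1) inputs forward.
    """

    def __init__(self, kernel):
        self.kernel = np.asarray(kernel, dtype=float)

    def initial_state(self) -> np.ndarray:
        # Convolution buffer: the (kernel_size - 1) most recent past inputs.
        return np.zeros(len(self.kernel) - 1)

    def layer(self, x: np.ndarray) -> np.ndarray:
        # Stateless whole-sequence evaluation (e.g. teacher-forced training):
        # y[t] = sum_k kernel[k] * x[t - k], with zeros before t = 0.
        padded = np.concatenate([self.initial_state(), x])
        return np.convolve(padded, self.kernel, mode="valid")

    def step(self, x_chunk: np.ndarray, state: np.ndarray):
        # Streaming evaluation: prepend the buffer, emit outputs, keep the tail.
        padded = np.concatenate([state, x_chunk])
        y = np.convolve(padded, self.kernel, mode="valid")
        new_state = padded[len(padded) - len(state):]
        return y, new_state


# Usage: stream a sequence in uneven chunks and compare with layer().
conv = CausalConv1D([0.5, -1.0, 2.0])
x = np.arange(10, dtype=float)
state = conv.initial_state()
outputs = []
for chunk in np.split(x, [3, 4, 8]):  # chunk lengths 3, 1, 4, 2
    y, state = conv.step(chunk, state)
    outputs.append(y)
assert np.allclose(np.concatenate(outputs), conv.layer(x))
```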
[51] arXiv:2507.23296 (cross-list from cs.IT) [pdf, html, other]: Title: Exploiting Movable Elements of Intelligent Reflecting Surface for Enhancement of Integrated Sensing and Communication

Xingyu Peng, Qin Tao, Yong Liang Guan, Xiaoming Chen

Comments: 16 pages, 13 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper, we propose to exploit movable elements of intelligent reflecting surface (IRS) to enhance the overall performance of integrated sensing and communication (ISAC) systems. Firstly, focusing on a single-user scenario, we reveal the function of movable elements by performance analysis, and then design a joint beamforming and element position optimization scheme. Further, we extend it to a general multi-user scenario, and also propose an element position optimization scheme according to the derived performance expressions. Finally, simulation results confirm that the movement of IRS elements can improve the communication rate and the sensing accuracy, and especially broaden the coverage of ISAC.
[52] arXiv:2507.23298 (cross-list from cs.HC) [pdf, html, other]: Title: Real-time Generation of Various Types of Nodding for Avatar Attentive Listening System

Kazushi Kato, Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara

Comments: Accepted by 27th ACM International Conference on Multimodal Interaction (ICMI '25), Long paper

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In human dialogue, nonverbal information such as nodding and facial expressions is as crucial as verbal information, and spoken dialogue systems are also expected to express such nonverbal behaviors. We focus on nodding, which is critical in an attentive listening system, and propose a model that predicts both its timing and type in real time. The proposed model builds on the voice activity projection (VAP) model, which predicts voice activity from both listener and speaker audio. Unlike conventional models, we extend it to predict various types of nodding continuously and in real time. In addition, the proposed model incorporates multi-task learning with verbal backchannel prediction and pretraining on general dialogue data. In the timing and type prediction task, multi-task learning yielded significant improvements. We confirmed that reducing the processing rate enables real-time operation without a substantial drop in accuracy, and integrated the model into an avatar attentive listening system. Subjective evaluations showed that it outperformed the conventional method, which always nods in sync with verbal backchannels. The code and trained models are available at this https URL.
[53] arXiv:2507.23339 (cross-list from cs.RO) [pdf, html, other]: Title: Learning to Drift with Individual Wheel Drive: Maneuvering Autonomous Vehicle at the Handling Limits

Yihan Zhou, Yiwen Lu, Bo Yang, Jiayun Li, Yilin Mo

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Drifting, characterized by controlled vehicle motion at high sideslip angles, is crucial for safely handling emergency scenarios at the friction limits. While recent reinforcement learning approaches show promise for drifting control, they struggle with the significant simulation-to-reality gap, as policies that perform well in simulation often fail when transferred to physical systems. In this paper, we present a reinforcement learning framework with GPU-accelerated parallel simulation and systematic domain randomization that effectively bridges the gap. The proposed approach is validated on both simulation and a custom-designed and open-sourced 1/10 scale Individual Wheel Drive (IWD) RC car platform featuring independent wheel speed control. Experiments across various scenarios from steady-state circular drifting to direction transitions and variable-curvature path following demonstrate that our approach achieves precise trajectory tracking while maintaining controlled sideslip angles throughout complex maneuvers in both simulated and real-world environments.
[54] arXiv:2507.23343 (cross-list from cs.CV) [pdf, html, other]: Title: Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads

Yingjie Zhou, Jiezhang Cao, Zicheng Zhang, Farong Wen, Yanwei Jiang, Jun Jia, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Speech-driven methods for portraits are figuratively known as "Talkers" because of their capability to synthesize speaking mouth shapes and facial movements. Especially with the rapid development of Text-to-Image (T2I) models, AI-Generated Talking Heads (AGTHs) have gradually become an emerging digital human media. However, challenges persist regarding the quality of these talkers and the AGTHs they generate, and comprehensive studies addressing these issues remain limited. To address this gap, this paper presents the largest AGTH quality assessment dataset to date, THQA-10K, which selects 12 prominent T2I models and 14 advanced talkers to generate AGTHs for 14 prompts. After excluding instances where AGTH generation is unsuccessful, the THQA-10K dataset contains 10,457 AGTHs. Then, volunteers are recruited to subjectively rate the AGTHs and give the corresponding distortion categories. In our analysis of the subjective experimental results, we evaluate the performance of talkers in terms of generalizability and quality, and also expose the distortions of existing AGTHs. Finally, an objective quality assessment method based on the first frame, Y-T slice and tone-lip consistency is proposed. Experimental results show that this method can achieve state-of-the-art (SOTA) performance in AGTH quality assessment. The work is released at this https URL.
[55] arXiv:2507.23350 (cross-list from cs.RO) [pdf, html, other]: Title: Multi-Waypoint Path Planning and Motion Control for Non-holonomic Mobile Robots in Agricultural Applications

Mahmoud Ghorab, Matthias Lorenzen

Comments: 6 pages

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

There is a growing demand for autonomous mobile robots capable of navigating unstructured agricultural environments. Tasks such as weed control in meadows require efficient path planning through an unordered set of coordinates while minimizing travel distance and adhering to curvature constraints to prevent soil damage and protect vegetation. This paper presents an integrated navigation framework combining a global path planner based on the Dubins Traveling Salesman Problem (DTSP) with a Nonlinear Model Predictive Control (NMPC) strategy for local path planning and control. The DTSP generates a minimum-length, curvature-constrained path that efficiently visits all targets, while the NMPC leverages this path to compute control signals to accurately reach each waypoint. The system's performance was validated through comparative simulation analysis on real-world field datasets, demonstrating that the coupled DTSP-based planner produced smoother and shorter paths, with a reduction of about 16% in the provided scenario, compared to decoupled methods. Based thereon, the NMPC controller effectively steered the robot to the desired waypoints, while locally optimizing the trajectory and ensuring adherence to constraints. These findings demonstrate the potential of the proposed framework for efficient autonomous navigation in agricultural environments.
[56] arXiv:2507.23365 (cross-list from cs.SD) [pdf, html, other]: Title: "I made this (sort of)": Negotiating authorship, confronting fraudulence, and exploring new musical spaces with prompt-based AI music generation

Bob L. T. Sturm

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

I reflect on my experience creating two music albums centered on state-of-the-art prompt-based AI music generation platforms. The first album explicitly poses the question: What happens when I collide my junk mail with these platforms? The second album is a direct response to the first, and toys with the inability of state-of-the-art prompt-based AI music generation platforms to generate music that is not "practiced", "polished", and "produced". I seed a large language model (LLM) with information about these albums and have it interview me, which results in the exploration of several deeper questions: To what extent am I the author? Where am I in the resulting music? How is my musical identity changing as I am faced with machines that are in some ways far more talented than I? What new musical spaces does my work open, for me or anyone/thing else? I conclude by reflecting on my reflections, as well as LLM-mediated self-reflection as method.
[57] arXiv:2507.23419 (cross-list from cs.ET) [pdf, html, other]: Title: WiRM: Wireless Respiration Monitoring Using Conjugate Multiple Channel State Information and Fast Iterative Filtering in Wi-Fi Systems

James Rhodes, Lawrence Ong, Duy T. Ngo

Subjects: Emerging Technologies (cs.ET); Signal Processing (eess.SP)

Monitoring respiratory health with the use of channel state information (CSI) has shown promising results. Many existing methods focus on monitoring only the respiratory rate, while others focus on monitoring the motion of the chest as a patient breathes, which is referred to as the respiratory waveform. This paper presents WiRM, a two-stage approach to contactless respiration monitoring. In the first stage, WiRM improves upon existing respiratory rate estimation techniques by using conjugate multiplication for phase sanitisation and adaptive multi-trace carving (AMTC) for tracing how the respiratory rate changes over time. Compared with three state-of-the-art methods, WiRM achieves an average reduction of 38% in respiratory rate root mean squared error (RMSE). In the second stage, WiRM uses this improved respiratory rate estimate to inform the decomposition and selection of the respiratory waveform from the CSI data. Remarkably, WiRM delivers a 178.3% improvement in average absolute correlation with the ground truth respiratory waveform. Within the literature, it is difficult to compare the robustness of existing algorithms in noisy environments. In this paper, we develop a purpose-built simulation toolkit to evaluate the robustness of respiration monitoring solutions under various noise conditions, including thermal, multiplicative, and phase noise. Our results show that WiRM demonstrates improved or comparable resilience to these common noise sources.
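Conjugate multiplication exploits the fact that the random phase offsets of a commodity Wi-Fi receiver (e.g., from carrier frequency offset) are shared across its antennas, so multiplying one antenna's CSI by the complex conjugate of another's cancels them. The simulation below is a minimal sketch under an idealized two-antenna, single-subcarrier model with a hypothetical 0.25 Hz (15 breaths/min) breathing signal; all amplitudes are made up, and it does not implement AMTC or WiRM's waveform stage.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0, 30, 0.05)                    # 30 s of CSI sampled at 20 Hz
breath = 0.3 * np.sin(2 * np.pi * 0.25 * t)   # hypothetical chest-induced phase (rad)

# Idealized CSI on one subcarrier: antenna 1 sees a static path plus a
# breathing-modulated reflection, antenna 2 a static path; both are corrupted
# by the SAME random phase offset per packet (shared receiver oscillator).
common_offset = rng.uniform(-np.pi, np.pi, t.size)
h1 = (1.0 + 0.2 * np.exp(1j * breath)) * np.exp(1j * common_offset)
h2 = (0.8 + 0.1 * np.exp(1j * 0.5)) * np.exp(1j * common_offset)

raw_phase = np.angle(h1)                  # dominated by the random offset
sanitised = np.angle(h1 * np.conj(h2))    # offsets cancel: theta - theta = 0


def corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two equal-length signals."""
    return float(np.corrcoef(a, b)[0, 1])


print(f"raw phase vs breathing:       r = {corr(raw_phase, breath):+.2f}")
print(f"sanitised phase vs breathing: r = {corr(sanitised, breath):+.2f}")
```

The sanitised phase still carries an unknown constant (the phase of antenna 2's static path), which is harmless for rate estimation because only its variation over time is used.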
[58] arXiv:2507.23445 (cross-list from cs.RO) [pdf, html, other]: Title: Quantifying and Visualizing Sim-to-Real Gaps: Physics-Guided Regularization for Reproducibility

Yuta Kawachi

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Simulation-to-real transfer using domain randomization for robot control often relies on low-gear-ratio, backdrivable actuators, but these approaches break down when the sim-to-real gap widens. Inspired by the traditional PID controller, we reinterpret its gains as surrogates for complex, unmodeled plant dynamics. We then introduce a physics-guided gain regularization scheme that measures a robot's effective proportional gains via simple real-world experiments. Then, we penalize any deviation of a neural controller's local input-output sensitivities from these values during training. To avoid the overly conservative bias of naive domain randomization, we also condition the controller on the current plant parameters. On an off-the-shelf two-wheeled balancing robot with a 110:1 gearbox, our gain-regularized, parameter-conditioned RNN achieves angular settling times in hardware that closely match simulation. At the same time, a purely domain-randomized policy exhibits persistent oscillations and a substantial sim-to-real gap. These results demonstrate a lightweight, reproducible framework for closing sim-to-real gaps on affordable robotic hardware.
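A minimal sketch of the gain-regularization idea, assuming a memoryless scalar controller $u = \pi(e)$ and a hypothetical measured proportional gain `k_measured`: the controller's local sensitivity $\partial u / \partial e$ is estimated by central finite differences and penalized by its mean squared deviation from the measured gain. The paper applies its penalty to a parameter-conditioned RNN during training; the memoryless controller, finite differences, squared penalty, and function names here are illustrative assumptions.

```python
import numpy as np


def local_gain(controller, error: float, eps: float = 1e-4) -> float:
    """Central finite-difference estimate of d(u)/d(error) at one operating point."""
    return (controller(error + eps) - controller(error - eps)) / (2.0 * eps)


def gain_regularizer(controller, errors, k_measured: float) -> float:
    """Mean squared deviation of the controller's local gains from a measured P gain."""
    gains = np.array([local_gain(controller, e) for e in errors])
    return float(np.mean((gains - k_measured) ** 2))


errors = np.linspace(-1.0, 1.0, 11)  # hypothetical operating points
print(gain_regularizer(lambda e: 2.0 * e, errors, 2.0))        # matching linear gain
print(gain_regularizer(lambda e: np.tanh(3.0 * e), errors, 2.0))
```

During training, a term such as `lam * gain_regularizer(...)` would be added to the task loss, with `lam` a hypothetical weighting factor; a controller whose local slope drifts away from the measured gain is then pulled back toward the behaviour observed on the real plant.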
[59] arXiv:2507.23528 (cross-list from cs.IT) [pdf, html, other]: Title: Hybrid Generative Semantic and Bit Communications in Satellite Networks: Trade-offs in Latency, Generation Quality, and Computation

Chong Huang, Gaojie Chen, Jing Zhu, Qu Luo, Pei Xiao, Wei Huang, Rahim Tafazolli

Comments: 6 pages, accepted for publication in IEEE Globecom 2025

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

As satellite communications play an increasingly important role in future wireless networks, the issue of limited link budget in satellite systems has attracted significant attention in current research. Although semantic communication has emerged as a promising solution to address these constraints, it introduces the challenge of increased computational resource consumption in wireless communications. To address these challenges, we propose a multi-layer hybrid bit and generative semantic communication framework that can adapt to dynamic satellite communication networks. Furthermore, to balance semantic communication efficiency and performance in satellite-to-ground transmissions, we introduce a novel semantic communication efficiency metric (SEM) that evaluates the trade-offs among latency, computational consumption, and semantic reconstruction quality in the proposed framework. Moreover, we utilize a novel deep reinforcement learning (DRL) algorithm, group relative policy optimization (GRPO), to optimize resource allocation in the proposed network. Simulation results demonstrate the flexibility of the proposed transmission framework and the effectiveness of the proposed SEM metric, and illustrate the relationships among various semantic communication metrics.
[60] arXiv:2507.23546 (cross-list from cs.SI) [pdf, other]: Title: Empirical cross-system meta-analysis of long-term transmission grid evolution

Bálint Hartmann, Michelle T. Cirunay

Comments: 26 pages

Subjects: Social and Information Networks (cs.SI); Systems and Control (eess.SY)

The potential of grid-side flexibility, the latent ability to reconfigure transmission network topology, remains underused, partly because of the lack of empirical studies on how real-world grids evolve.
[61] arXiv:2507.23556 (cross-list from cs.NI) [pdf, html, other]: Title: Networked Physical Computing: A New Paradigm for Effective Task Completion via Hypergraph Aided Trusted Task-Resource Matching

Botao Zhu, Xianbin Wang

Journal-ref: Published in IEEE Transactions on Network Science and Engineering, 2025

Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

Due to the diverse physical attributes of computing resources and tasks, developing effective mechanisms to facilitate task and resource matching in complex connected systems for value-oriented task completion has become increasingly challenging. To address the challenge, this paper proposes a networked physical computing system that integrates the physical attributes of computing resources and tasks as well as task-specific trust relationships among devices to enable value-driven task completion. Specifically, we propose a state-of-the-art hypergraph-aided trusted task-resource matching (TTR-matching) framework to achieve the envisioned physical computing. First, a task-specific trusted physical resource hypergraph is defined, which integrates task-specific trust, the physical attributes of resources, and task types. This enables accurate modeling of device collaboration dependencies under specific task types. Next, a task hypergraph is generated to associate the task initiator with the physical attributes of the corresponding tasks. Based on these two hypergraphs, a hypergraph matching algorithm is designed to facilitate task-specific trusted collaborator selection and accurate task-resource matching for value-maximizing task completion. Extensive experimental results demonstrate that the proposed TTR-matching framework outperforms comparison algorithms in identifying task-specific trustworthy collaborators and maximizing the average value of task completion.
[62] arXiv:2507.23579 (cross-list from physics.med-ph) [pdf, html, other]: Title: Impact of a Lower Limb Exosuit Anchor Points on Energetics and Biomechanics

Chiara Lambranzi, Giulia Oberti, Christian Di Natali, Darwin G. Caldwell, Manuela Galli, Elena De Momi, Jesùs Ortiz

Comments: 12 pages, 10 figures

Journal-ref: IEEE Trans. Biomed. Eng. (2025)

Subjects: Medical Physics (physics.med-ph); Robotics (cs.RO); Signal Processing (eess.SP)

Anchor point placement is a crucial yet often overlooked aspect of exosuit design, since it determines how forces interact with the human body. This work analyzes the impact of different anchor point positions on gait kinematics, muscle activation, and energy consumption. A total of six experiments were conducted with 11 subjects wearing the XoSoft exosuit, which assists hip flexion in five configurations. Subjects were instrumented with an IMU-based motion tracking system, EMG sensors, and a mask to measure metabolic consumption. The results show that positioning the knee anchor point on the posterior side while keeping the hip anchor on the anterior side can reduce hip flexor muscle activation by up to 10.21% and metabolic expenditure by up to 18.45%. Even though only the hip was assisted, all configurations also changed knee and ankle kinematics. Overall, no single configuration was optimal across all subjects, suggesting that a personalized approach is necessary to transmit the assistance forces optimally. These findings emphasize that anchor point position has a significant impact on exoskeleton effectiveness and efficiency. However, these optimal positions are specific to each subject and to the exosuit design, and future work is needed to tailor musculoskeletal models to individual characteristics and to validate these results in clinical populations.
[63] arXiv:2507.23590 (cross-list from cs.SD) [pdf, html, other]: Title: Identifying Hearing Difficulty Moments in Conversational Audio

Jack Collins, Adrian Buzea, Chris Collier, Alejandro Ballesta Rosen, Julian Maclaren, Richard F. Lyon, Simon Carlile

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Individuals regularly experience Hearing Difficulty Moments in everyday conversation. Identifying these moments of hearing difficulty is particularly significant for hearing assistive technology, where timely interventions are key to real-time hearing assistance. In this paper, we propose and compare machine learning solutions for continuously detecting utterances that identify these specific moments in conversational audio. We show that audio language models, through their multimodal reasoning capabilities, excel at this task, significantly outperforming a simple ASR hotword heuristic and a more conventional fine-tuning approach with Wav2Vec, an audio-only input architecture that is state-of-the-art for automatic speech recognition (ASR).
[64] arXiv:2507.23592 (cross-list from cs.RO) [pdf, html, other]: Title: Human-Exoskeleton Kinematic Calibration to Improve Hand Tracking for Dexterous Teleoperation

Haiyun Zhang, Stefano Dalla Gasperina, Saad N. Yousaf, Toshimitsu Tsuboi, Tetsuya Narita, Ashish D. Deshpande

Comments: 8 pages, 10 figures, submitted to RA-L

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)

Hand exoskeletons are critical tools for dexterous teleoperation and immersive manipulation interfaces, but achieving accurate hand tracking remains a challenge due to user-specific anatomical variability and donning inconsistencies. These issues lead to kinematic misalignments that degrade tracking performance and limit applicability in precision tasks. We propose a subject-specific calibration framework for exoskeleton-based hand tracking that uses redundant joint sensing and a residual-weighted optimization strategy to estimate virtual link parameters. Implemented on the Maestro exoskeleton, our method improves joint angle and fingertip position estimation across users with varying hand geometries. We introduce a data-driven approach to empirically tune cost function weights using motion capture ground truth, enabling more accurate and consistent calibration across participants. Quantitative results from seven subjects show substantial reductions in joint and fingertip tracking errors compared to uncalibrated and evenly weighted models. Qualitative visualizations using a Unity-based virtual hand further confirm improvements in motion fidelity. The proposed framework generalizes across exoskeleton designs with closed-loop kinematics and minimal sensing, and lays the foundation for high-fidelity teleoperation and learning-from-demonstration applications.
[65] arXiv:2507.23686 (cross-list from cs.IT) [pdf, html, other]: Title: From Link Diversity to Cross-Band Feedback Collaboration: A New Perspective on Hybrid Optical-RF Systems

Menghan Li, Yulin Shao, Runxin Zhang, Lu Lu

Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)

We suggest a re-examination of the conventional view that hybrid optical-radio frequency (O-RF) systems are primarily diversity-driven networks that switch between RF and optical links for robustness. Instead, we uncover a new architectural opportunity: repurposing the optical downlink to enable real-time feedback channel coding over the RF uplink, where structured decoder feedback is delivered from the access point to guide the transmitter's coding strategy. This insight marks a conceptual paradigm shift from passive link diversity to active cross-band collaboration, where the wideband, interference-free optical wireless communication (OWC) is no longer merely a downlink backup but a functional enabler of uplink reliability. To realize this vision, we propose a novel architecture, O-RF with Cross-Band Feedback (O-RF-CBF), that exploits the optical downlink feedback to facilitate adaptive RF uplink coding. Numerical results reveal that O-RF-CBF achieves significant uplink throughput gains over traditional O-RF systems. Our findings highlight that inter-band synergy, not redundancy, is the key to unlocking the full potential of hybrid wireless networks.
[66] arXiv:2507.23702 (cross-list from cs.IT) [pdf, html, other]: Title: Cell-Free Massive MIMO SWIPT with Beyond Diagonal Reconfigurable Intelligent Surfaces

Duc Thien Hua, Mohammadali Mohammadi, Hien Quoc Ngo, Michail Matthaiou

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We investigate the integration of beyond-diagonal reconfigurable intelligent surfaces (BDRISs) into cell-free massive multiple-input multiple-output (CFmMIMO) systems to enhance simultaneous wireless information and power transfer (SWIPT). To simultaneously support two groups of users, energy receivers (ERs) and information receivers (IRs), without sacrificing time-frequency resources, a subset of access points (APs) is dedicated to serving ERs with the aid of a BDRIS, while the remaining APs focus on supporting IRs. A protective partial zero-forcing precoding technique is implemented at the APs to manage the non-coherent interference between the ERs and IRs. Subsequently, closed-form expressions for the spectral efficiency of the IRs and the average sum of harvested energy at the ERs are leveraged to formulate a comprehensive optimization problem. This problem jointly optimizes the AP selection, AP power control, and scattering matrix design at the BDRIS, all based on long-term statistical channel state information. This challenging problem is then transformed into more tractable forms. To solve these subproblems, efficient algorithms are proposed, including a heuristic search for the scattering matrix design, as well as successive convex approximation and deep reinforcement learning methods for the joint AP mode selection and power control design. Numerical results show that a BDRIS with a group- or fully-connected architecture achieves significant energy harvesting gains over the conventional diagonal RIS, notably delivering up to a sevenfold increase in the average sum of harvested energy when a heuristic-based scattering matrix design is employed.

[67] arXiv:2401.08052 (replaced) [pdf, html, other]: Title: Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization

Ming Cheng, Ming Li

Comments: Accepted by IEEE Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS)

Audio-visual learning has demonstrated promising results in many classical speech tasks (e.g., speech separation, automatic speech recognition, wake-word spotting). We believe that introducing the visual modality will also benefit speaker diarization. To date, Target-Speaker Voice Activity Detection (TS-VAD) has played an important role in highly accurate speaker diarization. However, previous TS-VAD models take audio features and use the speaker's acoustic footprint to distinguish his or her personal speech activities, an approach that is easily affected by overlapped speech in multi-speaker scenarios. Although visual information naturally tolerates overlapped speech, it suffers from spatial occlusion, low resolution, etc. The potential modality-missing problem hinders extending TS-VAD to an audio-visual approach. This paper proposes a novel Multi-Input Multi-Output Target-Speaker Voice Activity Detection (MIMO-TSVAD) framework for speaker diarization. The proposed method can take audio-visual input and leverage the speaker's acoustic footprint or lip track to flexibly conduct audio-based, video-based, and audio-visual speaker diarization in a unified sequence-to-sequence framework. Experimental results show that the MIMO-TSVAD framework achieves state-of-the-art performance on the VoxConverse, DIHARD-III, and MISP 2022 datasets under the corresponding evaluation metrics, obtaining Diarization Error Rates (DERs) of 4.18%, 10.10%, and 8.15%, respectively. In addition, it performs robustly in heavy lip-missing scenarios.
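
The DER figures above follow the standard definition: the total duration of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. A minimal sketch of that arithmetic, assuming the three error durations have already been obtained by aligning the system output with the reference (the alignment step, scoring collars, and overlap handling vary between scoring tools and are not shown); the durations are hypothetical:

```python
def diarization_error_rate(missed_s: float, false_alarm_s: float,
                           confusion_s: float, total_ref_speech_s: float) -> float:
    """Return DER as a fraction: (missed + false alarm + confusion) / reference speech time."""
    if total_ref_speech_s <= 0:
        raise ValueError("total reference speech time must be positive")
    return (missed_s + false_alarm_s + confusion_s) / total_ref_speech_s


# Hypothetical error durations (seconds) for a recording with 200 s of reference speech.
der = diarization_error_rate(missed_s=3.0, false_alarm_s=2.0, confusion_s=5.0,
                             total_ref_speech_s=200.0)
print(f"DER = {der:.2%}")  # DER = 5.00%
```

Because all three error types share one denominator, DER can exceed 100% when a system produces many false alarms.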
[68] arXiv:2407.07720 (replaced) [pdf, html, other]: Title: Exploiting Scale-Variant Attention for Segmenting Small Medical Objects

Wei Dai, Rui Liu, Zixuan Wu, Tianyi Wu, Min Wang, Junxian Zhou, Yixuan Yuan, Jun Liu

Comments: 14 pages, 9 figures, under review

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. Identifying mild syndrome with small pathological regions serves as an ominous warning and is fundamental in the early diagnosis of diseases. While deep learning algorithms, particularly convolutional neural networks (CNNs), have shown promise in segmenting medical objects, analyzing small areas in medical images remains challenging. This difficulty arises due to information losses and compression defects from convolution and pooling operations in CNNs, which become more pronounced as the network deepens, especially for small medical objects. To address these challenges, we propose a novel scale-variant attention-based network (SvANet) for accurately segmenting small-scale objects in medical images. The SvANet consists of scale-variant attention, cross-scale guidance, Monte Carlo attention, and vision transformer, which incorporates cross-scale features and alleviates compression artifacts for enhancing the discrimination of small medical objects. Quantitative experimental results demonstrate the superior performance of SvANet, achieving 96.12%, 96.11%, 89.79%, 84.15%, 80.25%, 73.05%, and 72.58% in mean Dice coefficient for segmenting kidney tumors, skin lesions, hepatic tumors, polyps, surgical excision cells, retinal vasculatures, and sperms, which occupy less than 1% of the image areas in KiTS23, ISIC 2018, ATLAS, PolypGen, TissueNet, FIVES, and SpermHealth datasets, respectively.
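
The mean Dice coefficient reported above averages, over test images, the overlap score Dice(A, B) = 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a ground-truth mask B. A minimal NumPy sketch for binary masks follows; the smoothing constant `eps` and the toy masks are illustrative assumptions, not the authors' evaluation code. It also shows why small objects are hard: a one-pixel shift of a 5x5 object costs 20% of the score.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A & B| / (|A| + |B|) for binary masks; eps avoids 0/0 on empty masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def mean_dice(preds, targets) -> float:
    """Average the per-image Dice score over a test set."""
    return float(np.mean([dice_coefficient(p, t) for p, t in zip(preds, targets)]))


# Toy 100x100 image with a 5x5 object (0.25% of the image area).
target = np.zeros((100, 100), dtype=bool)
target[10:15, 10:15] = True
pred = np.zeros_like(target)
pred[10:15, 11:16] = True  # prediction shifted right by one pixel
print(f"{dice_coefficient(pred, target):.3f}")  # 0.800
```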
[69] arXiv:2408.04210 (replaced) [pdf, html, other]: Title: Adaptive Cohen's Class Time-Frequency Distribution

Manjun Cui, Zhichao Zhang

Subjects: Signal Processing (eess.SP)

Inspired by the use of adaptive kernel-based Cohen's class time-frequency distributions (CCTFDs) for cross-term suppression, this paper explores novel adaptive kernel functions for denoising. We integrate the Wiener filter principle and the time-frequency filtering mechanism of CCTFD to design a least-squares adaptive filter method in the Wigner-Ville distribution (WVD) domain, yielding the least-squares adaptive filter-based CCTFD, whose kernel function adapts automatically to the input signal to achieve minimum mean-square error denoising in the WVD domain. Examples demonstrate that the proposed adaptive CCTFD outperforms some state-of-the-art methods in noise suppression.
[70] arXiv:2410.16593 (replaced) [pdf, html, other]: Title: Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs

Haolin Li, Haoyu Wang, Luana Ruiz

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Graph Neural Networks (GNNs) excel in many graph machine learning tasks but face challenges when scaling to large networks. GNN transferability allows training on smaller graphs and applying the model to larger ones, but existing methods often rely on random subsampling, leading to disconnected subgraphs and reduced model expressivity. We propose a novel graph sampling algorithm that leverages feature homophily to preserve graph structure. By minimizing the trace of the data correlation matrix, our method better preserves the graph Laplacian trace -- a proxy for the graph connectivity -- than random sampling, while achieving lower complexity than spectral methods. Experiments on citation networks show improved performance in preserving Laplacian trace and GNN transferability compared to random sampling.
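
For an unweighted, undirected graph, the trace of the Laplacian L = D - A equals the sum of node degrees, i.e., twice the number of edges, which is why it serves as a connectivity proxy. The sketch below only evaluates this proxy on node subsets of a hypothetical toy path graph; it does not implement the authors' correlation-based sampler, but it illustrates how a sample that breaks the graph into isolated nodes loses the entire trace:

```python
import numpy as np


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj


def induced_laplacian_trace(adj: np.ndarray, nodes) -> float:
    """Trace of the Laplacian of the subgraph induced by `nodes` (sum of induced degrees)."""
    idx = np.asarray(sorted(nodes))
    return float(np.trace(laplacian(adj[np.ix_(idx, idx)])))


# Hypothetical toy path graph 0-1-2-3-4-5: 5 edges, so the trace is 10.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

print(induced_laplacian_trace(adj, range(n)))   # 10.0
print(induced_laplacian_trace(adj, {0, 1, 2}))  # 4.0  (connected sample keeps 2 edges)
print(induced_laplacian_trace(adj, {0, 2, 4}))  # 0.0  (isolated nodes, no edges)
```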
[71] arXiv:2502.15544 (replaced) [pdf, html, other]: Title: Learning-based model predictive control for passenger-oriented train rescheduling with flexible train composition

Xiaoyu Liu, Caio Fabio Oliveira da Silva, Azita Dabiri, Yihui Wang, Bart De Schutter

Comments: 14 pages, 14 figures, submitted to journal

Subjects: Systems and Control (eess.SY)

This paper focuses on passenger-oriented real-time train rescheduling, considering flexible train composition and rolling stock circulation, by integrating learning-based and optimization-based approaches. A learning-based model predictive control (MPC) approach is developed for real-time train rescheduling with flexible train composition and rolling stock circulation to address time-varying passenger demands. In the proposed approach, the values of the integer variables are obtained by pre-trained long short-term memory (LSTM) networks, while the continuous variables are determined through nonlinear constrained optimization. The learning-based MPC approach enables us to jointly consider efficiency and constraint satisfaction by combining learning-based and optimization-based approaches. In order to reduce the number of integer variables, four presolve techniques are developed to prune a subset of integer decision variables. Numerical simulations based on real-life data from the Beijing urban rail transit system are conducted to illustrate the effectiveness of the developed learning-based MPC approach.
[72] arXiv:2502.17482 (replaced) [pdf, html, other]: Title: MVCNet: Multi-View Contrastive Network for Motor Imagery Classification

Ziwei Wang, Siyang Li, Xiaoqing Chen, Dongrui Wu

Comments: 12 pages, 9 figures

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Electroencephalography (EEG)-based brain-computer interfaces (BCIs) enable neural interaction by decoding brain activity for external communication. Motor imagery (MI) decoding has received significant attention due to its intuitive mechanism. However, most existing models rely on single-stream architectures and overlook the multi-view nature of EEG signals, leading to limited performance and generalization. We propose a multi-view contrastive network (MVCNet), a dual-branch architecture that integrates CNN and Transformer blocks in parallel to capture both local spatial-temporal features and global temporal dependencies. To enhance the informativeness of training data, MVCNet incorporates a unified augmentation pipeline across time, frequency, and spatial domains. Two contrastive modules are further introduced: a cross-view contrastive module that enforces consistency of original and augmented views, and a cross-model contrastive module that aligns features extracted from both branches. Final representations are fused and jointly optimized by contrastive and classification losses. Experiments on five public MI datasets across three scenarios demonstrate that MVCNet consistently outperforms nine state-of-the-art MI decoding networks, highlighting its effectiveness and generalization ability. MVCNet provides a robust solution for MI decoding by integrating multi-view information and dual-branch modeling, contributing to the development of more reliable BCI systems.
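
Contrastive modules of this kind are commonly trained with an InfoNCE-style loss, in which each original embedding must identify its own augmented view among all views in the batch. A minimal NumPy sketch of a one-directional InfoNCE loss follows; the loss form, the temperature of 0.1, and the random toy embeddings are assumptions for illustration, since the abstract does not specify MVCNet's exact objective:

```python
import numpy as np


def info_nce(z_orig: np.ndarray, z_aug: np.ndarray, temperature: float = 0.1) -> float:
    """One-directional InfoNCE: row i of z_orig should match row i of z_aug."""
    # L2-normalize so that dot products are cosine similarities.
    z_orig = z_orig / np.linalg.norm(z_orig, axis=1, keepdims=True)
    z_aug = z_aug / np.linalg.norm(z_aug, axis=1, keepdims=True)
    logits = z_orig @ z_aug.T / temperature        # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; average their negative log-probability.
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))                    # toy "original view" embeddings
z_views = z + 0.05 * rng.standard_normal((8, 16))   # toy "augmented view" embeddings
aligned = info_nce(z, z_views)
mismatched = info_nce(z, np.roll(z_views, 1, axis=0))  # every pairing is wrong
print(f"aligned: {aligned:.3f}, mismatched: {mismatched:.3f}")
```

A cross-model variant would apply the same loss between the CNN-branch and Transformer-branch embeddings of the same trial instead of between two augmented views.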
[73] arXiv:2503.06981 (replaced) [pdf, html, other]: Title: Graph Chirp Signal and Graph Fractional Vertex-Frequency Energy Distribution

Manjun Cui, Zhichao Zhang, Wei Yao

Subjects: Signal Processing (eess.SP)

Graph signal processing (GSP) has emerged as a powerful framework for analyzing data on irregular domains. In recent years, many classical techniques in signal processing (SP) have been successfully extended to GSP. Among them, chirp signals play a crucial role in various SP applications. However, graph chirp signals have not been formally defined despite their importance. Here, we define graph chirp signals and establish a comprehensive theoretical framework for their analysis. We propose the graph fractional vertex-frequency energy distribution (GFED), which provides a powerful tool for processing and analyzing graph chirp signals. We introduce the general fractional graph distribution (GFGD), a generalized vertex-frequency distribution, and the reduced interference GFED, which can suppress cross-term interference and enhance signal clarity. Furthermore, we propose a novel method for detecting graph signals through GFED domain filtering, facilitating robust detection and analysis of graph chirp signals in noisy environments. Moreover, this method can be applied to real-world data, denoising more effectively than some state-of-the-art methods, which further demonstrates its practical significance.
[74] arXiv:2503.17564 (replaced) [pdf, html, other]: Title: ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology

Vishwesh Ramanathan, Tony Xu, Pushpak Pati, Faruk Ahmed, Maged Goubran, Anne L. Martel

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, current methods under-utilize shared information between tasks and modalities. To overcome this challenge, we propose ModalTune, a novel fine-tuning framework which introduces the Modal Adapter to integrate new modalities without modifying SLFM weights. Additionally, we use large-language models (LLMs) to encode labels as text, capturing semantic relationships across multiple tasks and cancer types in a single training recipe. ModalTune achieves state-of-the-art (SOTA) results against both uni-modal and multi-modal models across four cancer types, jointly improving survival and cancer subtype prediction while remaining competitive in pan-cancer settings. Additionally, we show ModalTune is generalizable to two out-of-distribution (OOD) datasets. To our knowledge, this is the first unified fine-tuning framework for multi-modal, multi-task, and pan-cancer modeling in digital pathology.
[75] arXiv:2504.16710 (replaced) [pdf, html, other]: Title: On the Asymptotic MSE-Optimality of Parametric Bayesian Channel Estimation in mmWave Systems

Franz Weißer, Wolfgang Utschick

Subjects: Signal Processing (eess.SP)

The mean square error (MSE)-optimal estimator is known to be the conditional mean estimator (CME). This paper introduces a parametric channel estimation technique based on Bayesian estimation. This technique uses the estimated channel parameters to parameterize the well-known LMMSE channel estimator. We first derive an asymptotic CME formulation that holds for a wide range of priors on the channel parameters. Based on this, we show that parametric Bayesian channel estimation is MSE-optimal for high signal-to-noise ratio (SNR) and/or long coherence intervals, i.e., many noisy observations provided within one coherence interval. Numerical simulations validate the derived formulations.
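
For the observation model y = h + n, with a zero-mean channel h of covariance C_h and noise n ~ CN(0, sigma^2 I), the LMMSE estimator is h_hat = C_h (C_h + sigma^2 I)^{-1} y. The sketch below assumes a hypothetical three-path uniform linear array (ULA) model and, unlike the paper's parametric approach, plugs in the true path angles and powers rather than estimated ones, so it only illustrates the estimator that the channel parameters feed into:

```python
import numpy as np


def lmmse_estimate(y: np.ndarray, C_h: np.ndarray, noise_var: float) -> np.ndarray:
    """LMMSE estimate C_h (C_h + noise_var I)^{-1} y for y = h + n (columns = observations)."""
    n_ant = C_h.shape[0]
    # Solve the linear system instead of forming an explicit matrix inverse.
    return C_h @ np.linalg.solve(C_h + noise_var * np.eye(n_ant), y)


def ula_steering(n_ant: int, theta: float) -> np.ndarray:
    """Half-wavelength ULA steering vector (hypothetical array model)."""
    return np.exp(1j * np.pi * np.arange(n_ant) * np.sin(theta))


def crandn(rng: np.random.Generator, shape) -> np.ndarray:
    """Circularly-symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


rng = np.random.default_rng(0)
n_ant, n_obs, noise_var = 16, 2000, 0.1
angles = np.deg2rad([-30.0, 10.0, 45.0])  # hypothetical path angles (assumed known)
powers = np.array([1.0, 0.5, 0.25])       # hypothetical path powers (assumed known)
A = np.stack([ula_steering(n_ant, t) for t in angles], axis=1)  # n_ant x paths
C_h = (A * powers) @ A.conj().T           # sum over paths of p_l a_l a_l^H

H = A @ (np.sqrt(powers)[:, None] * crandn(rng, (len(angles), n_obs)))  # channel draws
Y = H + np.sqrt(noise_var) * crandn(rng, (n_ant, n_obs))               # noisy observations

mse_ls = np.mean(np.sum(np.abs(Y - H) ** 2, axis=0))  # LS baseline: h_hat = y
mse_lmmse = np.mean(np.sum(np.abs(lmmse_estimate(Y, C_h, noise_var) - H) ** 2, axis=0))
print(f"LS MSE: {mse_ls:.3f}  LMMSE MSE: {mse_lmmse:.3f}")
```

The gain comes from the low rank of C_h: the estimator suppresses noise outside the subspace spanned by the three path steering vectors, which is why accurate channel parameters matter for the parametric approach.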
[76] arXiv:2505.05003 (replaced) [pdf, html, other]: Title: Experimental Study on Reference-Path-Aided System Calibration for mmWave Bistatic ISAC Systems

Chenhao Luo, Chongrui Wang, Aimin Tang, Fei Gao, Chaojun Xu

Comments: 6 pages, 8 figures. Accepted by IEEE GLOBECOM 2025

Subjects: Signal Processing (eess.SP)

Integrated sensing and communications (ISAC) has been regarded as a key enabling technology for next-generation wireless networks. Compared to monostatic ISAC, bistatic ISAC can eliminate the critical challenge of self-interference cancellation and is well compatible with the existing network infrastructures. However, the synchronization between the transmitter and the sensing receiver becomes a crucial problem. The extracted channel state information (CSI) for sensing under communication synchronization contains different types of system errors, such as the sampling time offset (STO), carrier frequency offset (CFO), and random phase shift, which can severely degrade sensing performance or even render sensing infeasible. To address this problem, a reference-path-aided system calibration scheme is designed for mmWave bistatic ISAC systems, where the line-of-sight (LoS) path can be blocked. By exploiting the delay-angle sparsity feature in mmWave ISAC systems, the reference path, which can be either a LoS or a non-LoS (NLoS) path, is first identified. By leveraging the fact that all the paths suffer the same system errors, the channel parameter extracted from the reference path is utilized to compensate for the system errors in all other paths. A mmWave ISAC system is developed to validate our design. Experimental results demonstrate that the proposed scheme can support precise estimation of Doppler shift and delay, maintaining time-synchronization errors within 1 nanosecond.
[77] arXiv:2505.10933 (replaced) [pdf, html, other]: Title: Cross-layer Integrated Sensing and Communication: A Joint Industrial and Academic Perspective

Henk Wymeersch, Nuutti Tervo, Stefan Wänstedt, Sharief Saleh, Joerg Ahlendorf, Ozgur Akgul, Vasileios Tsekenis, Sokratis Barmpounakis, Liping Bai, Martin Beale, Rafael Berkvens, Nabeel Nisar Bhat, Hui Chen, Shrayan Das, Claude Desset, Antonio de la Oliva, Prajnamaya Dass, Jeroen Famaey, Hamed Farhadi, Gerhard P. Fettweis, Yu Ge, Hao Guo, Rreze Halili, Katsuyuki Haneda, Abdur Rahman Mohamed Ismail, Akshay Jain, Sylvaine Kerboeuf, Musa Furkan Keskin, Emad Ibrahim, Bilal Khan, Siddhartha Kumar, Stefan Köpsell, Apostolos Kousaridas, Pekka Kyösti, Simon Lindberg, Mohammad Hossein Moghaddam, Ahmad Nimr, Victor Pettersson, Aarno Pärssinen, Basuki Priyanto, Athanasios Stavridis, Tommy Svensson, Sonika Ujjwal

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Integrated sensing and communication (ISAC) enables radio systems to simultaneously sense and communicate with their environment. This paper, developed within the Hexa-X-II project funded by the European Union, presents a comprehensive cross-layer vision for ISAC in 6G networks, integrating insights from physical-layer design, hardware architectures, AI-driven intelligence, and protocol-level innovations. We begin by revisiting the foundational principles of ISAC, highlighting synergies and trade-offs between sensing and communication across different integration levels. Enabling technologies (such as multiband operation, massive and distributed MIMO, non-terrestrial networks, reconfigurable intelligent surfaces, and machine learning) are analyzed in conjunction with hardware considerations including waveform design, synchronization, and full-duplex operation. To bridge implementation and system-level evaluation, we introduce a quantitative cross-layer framework linking design parameters to key performance and value indicators. By synthesizing perspectives from both academia and industry, this paper outlines how deeply integrated ISAC can transform 6G into a programmable and context-aware platform supporting applications from reliable wireless access to autonomous mobility and digital twinning.
[78] arXiv:2506.00497 (replaced) [pdf, html, other]: Title: Second-Order Characterization of Micro Doppler Radar Signatures of Drone Swarms

Anders Malthe Westerkam, Alba Spliid Damkjær, Rasmus Erik Villadsen, Magnus Ørum Bastrup Poulsen, Troels Pedersen

Subjects: Signal Processing (eess.SP)

We investigate the second-order characteristics of the radar return signal from a swarm of rotor drones. We consider the case of a swarm of identical drones, each with a number of rotors, each comprising a number of rotor blades. By considering the orientation and speed of each rotor as stochastic variables, we derive expressions for the autocorrelation function (ACF) and power spectral density (PSD). The ACF and PSD take the form of an infinite series with coefficients that drop to zero at a predictable limit. Thus, in practical applications, the series may be truncated. As a special case, we show that for deterministic rotor speed, the ACF can be expressed in closed form. We further investigate how system parameters (blade length, rotor speed, number of blades, and number of drones) influence the derived expressions for the ACF and PSD.
[79] arXiv:2506.05898 (replaced) [pdf, html, other]: Title: On Level Crossings and Fade Durations in von Mises-Fisher Scattering Channels

Kenan Turbic, Slawomir Stanczak

Comments: Accepted for presentation at WSA 2025 (Track 2)

Subjects: Signal Processing (eess.SP); Applications (stat.AP)

This paper investigates the second-order statistics of multipath fading channels with von Mises-Fisher (vMF) distributed scatterers. Simple closed-form expressions for the mean Doppler shift and Doppler spread are derived as the key spectral moments that capture the impact of mobility and scattering characteristics on level crossings and fade durations. These expressions are then used to analyze the influence of vMF parameters on the Level-Crossing Rate (LCR) and Average Fade Duration (AFD). The results show that isotropic scattering yields the highest LCR and the lowest AFD, while fading dynamics slow down as the angular spread of scatterers decreases. Moreover, mobile antenna motion parallel to the mean scattering direction results in a lower LCR than perpendicular motion, with the difference between the two cases growing as the concentration of scatterers increases.
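
For a Rayleigh-fading envelope (no line-of-sight component, an assumption of this sketch), the classical Rice-formula results depend on the Doppler spectrum only through the rms Doppler spread sigma_D around the mean Doppler shift: LCR(rho) = 2 sqrt(pi) sigma_D rho exp(-rho^2) and AFD(rho) = (1 - exp(-rho^2)) / LCR(rho), where rho is the threshold normalized to the rms envelope level. A mean Doppler shift multiplies the complex channel by a unit-modulus phasor and therefore leaves the envelope, and hence LCR and AFD, unchanged. For isotropic (Clarke) scattering, sigma_D = f_D / sqrt(2), which recovers the familiar sqrt(2 pi) f_D rho exp(-rho^2). A minimal sketch with a hypothetical maximum Doppler frequency:

```python
import math


def lcr_rayleigh(rho: float, doppler_spread_hz: float) -> float:
    """Level-crossing rate (crossings/s) at normalized level rho = R / R_rms."""
    return 2.0 * math.sqrt(math.pi) * doppler_spread_hz * rho * math.exp(-rho ** 2)


def afd_rayleigh(rho: float, doppler_spread_hz: float) -> float:
    """Average fade duration (s): P(r < R) / LCR, with P(r < R) = 1 - exp(-rho^2)."""
    return (1.0 - math.exp(-rho ** 2)) / lcr_rayleigh(rho, doppler_spread_hz)


f_d = 100.0                     # hypothetical maximum Doppler frequency in Hz
sigma_iso = f_d / math.sqrt(2)  # rms Doppler spread for isotropic scattering
for sigma in (sigma_iso, sigma_iso / 2):
    print(f"sigma_D={sigma:.1f} Hz  LCR={lcr_rayleigh(1.0, sigma):.1f}/s  "
          f"AFD={1e3 * afd_rayleigh(1.0, sigma):.2f} ms")
```

Halving sigma_D halves the LCR and doubles the AFD, consistent with the slower fading dynamics reported above for smaller angular spreads.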
[80] arXiv:2506.14662 (replaced) [pdf, html, other]: Title: PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions

Young-ho Cho, Min-Seung Ko, Hao Zhu

Subjects: Systems and Control (eess.SY)

A sustainable electricity infrastructure requires the explicit integration of carbon emissions into power system modeling and optimization paradigms. However, existing open-source datasets for power system R&D lack generator-level carbon emission profiling, limiting the ability to benchmark and compare various carbon-aware grid operational strategies. To address this gap, this work introduces PGLib-CO2, an open-source extension to the widely adopted PGLib-OPF test case library. PGLib-CO2 enriches standard network cases with CO2 and CO2-equivalent emission intensity factors by expanding the fuel-type categorization used by PGLib-OPF, attaining a realistic generator-level carbon profiling. It is also packaged for both Python's pandapower and Julia's this http URL, for a seamless, user-friendly integration of emission modeling into grid computation and optimization tasks. The dataset produced by PGLib-CO2 can support grid-based carbon accounting, emission metric evaluation, and integration into AC optimal power flow (OPF) and optimal load shifting (OLS) formulations. We demonstrate PGLib-CO2's utility through case studies that quantify cost-emission trade-offs and optimize a carbon-aware objective function. By standardizing carbon-enhanced test cases, PGLib-CO2 provides an open-source, reproducible foundation for benchmarking carbon-aware computation, facilitating future research in sustainable power system operation.
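
At its simplest, generator-level carbon accounting multiplies each unit's energy output by its fuel-dependent emission intensity and sums over units. The sketch below uses hypothetical generators and illustrative intensity factors; it does not use PGLib-CO2's data files or API, whose field names are not given here:

```python
from dataclasses import dataclass


@dataclass
class Generator:
    name: str
    fuel: str
    output_mw: float


# Illustrative emission intensity factors in tCO2 per MWh (hypothetical values).
INTENSITY_T_PER_MWH = {"coal": 1.0, "gas": 0.45, "wind": 0.0}


def total_emissions_t(gens: list[Generator], hours: float = 1.0) -> float:
    """Sum of output (MW) x duration (h) x intensity (tCO2/MWh) over all generators."""
    return sum(g.output_mw * hours * INTENSITY_T_PER_MWH[g.fuel] for g in gens)


def average_intensity(gens: list[Generator]) -> float:
    """System-average emission intensity (tCO2/MWh) of a given dispatch."""
    total_mw = sum(g.output_mw for g in gens)
    return total_emissions_t(gens, hours=1.0) / total_mw


dispatch = [Generator("G1", "coal", 200.0), Generator("G2", "gas", 300.0),
            Generator("G3", "wind", 100.0)]
print(f"{total_emissions_t(dispatch):.1f} tCO2")      # 335.0 tCO2
print(f"{average_intensity(dispatch):.3f} tCO2/MWh")  # 0.558 tCO2/MWh
```

A carbon-aware OPF objective can then add a weighted emissions term of this form to the generation cost.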
[81] arXiv:2507.14534 (replaced) [pdf, html, other]: Title: Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion

Yu Zhang, Baotong Tian, Zhiyao Duan

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

Zero-shot online voice conversion (VC) holds significant promise for real-time communications and entertainment. However, current VC models struggle to preserve semantic fidelity under real-time constraints, deliver natural-sounding conversions, and adapt effectively to unseen speaker characteristics. To address these challenges, we introduce Conan, a chunkwise online zero-shot voice conversion model that preserves the content of the source while matching the voice timbre and styles of reference speech. Conan comprises three core components: 1) a Stream Content Extractor that leverages Emformer for low-latency streaming content encoding; 2) an Adaptive Style Encoder that extracts fine-grained stylistic features from reference speech for enhanced style adaptation; 3) a Causal Shuffle Vocoder that implements a fully causal HiFiGAN using a pixel-shuffle mechanism. Experimental evaluations demonstrate that Conan outperforms baseline models in subjective and objective metrics. Audio samples can be found at this https URL.
[82] arXiv:2507.18493 (replaced) [pdf, html, other]: Title: Global Observer Design for a Class of Linear Observed Systems on Groups

Changwu Liu, Yuan Shen

Comments: 16 pages, 1 figure

Subjects: Systems and Control (eess.SY)

Linear observed systems on groups encode the geometry of a variety of practical state estimation problems. In this paper, we propose a unified observer framework for a class of linear observed systems by restricting a bi-invariant system on a Lie group to its normal subgroup. This structural property enables an immersion of the original system into a linear time-varying system. Leveraging the immersion, an observer is constructed by first designing a Kalman-like observer for the immersed system and then reconstructing the group-valued state via optimization. Under a rank condition, global exponential stability (GES) is achieved provided one global optimum of the reconstruction optimization is found, reflecting the topological difficulties inherent to the non-Euclidean state space. Semi-global stability is guaranteed when input biases are jointly estimated. The theory is applied to the GES observer design for two-frame systems, capable of modeling a family of navigation problems. Two non-trivial examples are provided to illustrate implementation details.
[83] arXiv:2301.04943 (replaced) [pdf, html, other]: Title: Robust Nonlinear Optimal Control via System Level Synthesis

Antoine P. Leeman, Johannes Köhler, Andrea Zanelli, Samir Bennani, Melanie N. Zeilinger

Comments: Published in IEEE Transactions on Automatic Control (TAC). Code: this https URL

Journal-ref: IEEE Transactions on Automatic Control, Vol. 70, No. 7, 2025, pp. 4780-4787

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper addresses the problem of finite-horizon constrained robust optimal control for nonlinear systems subject to norm-bounded disturbances. To this end, the underlying uncertain nonlinear system is decomposed, based on a first-order Taylor series expansion, into a nominal system and an error (deviation) described as an uncertain linear time-varying system. This decomposition allows us to leverage system level synthesis to jointly optimize an affine error feedback, a nominal nonlinear trajectory, and, most importantly, a dynamic over-bound on the linearization error, which is used to ensure robust constraint satisfaction for the nonlinear system. The proposed approach thereby results in less conservative planning than state-of-the-art techniques. We demonstrate the benefits of the proposed approach on controlling the rotational motion of a rigid body subject to state and input constraints.
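The decomposition rests on a first-order Taylor expansion around a nominal point. The approach then over-bounds what that expansion misses. The sketch below computes this remainder with finite-difference Jacobians. It assumes a hypothetical Euler-discretized pendulum as the nonlinear system and shows that the remainder grows quadratically with the deviation. It does not implement the paper's system level synthesis.

```python
import numpy as np

DT = 0.05  # hypothetical Euler step

def f(x, u):
    """Hypothetical nonlinear dynamics: Euler-discretized pendulum with torque input."""
    theta, omega = x
    return np.array([theta + DT * omega, omega + DT * (-9.81 * np.sin(theta) + u[0])])

def jacobians(fun, x, u, eps=1e-6):
    """Forward-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    fx = fun(x, u)
    A = np.column_stack([(fun(x + eps * e, u) - fx) / eps for e in np.eye(len(x))])
    B = np.column_stack([(fun(x, u + eps * e) - fx) / eps for e in np.eye(len(u))])
    return A, B

def linearization_remainder(z, v, dx, du):
    """r = f(z+dx, v+du) - f(z, v) - A dx - B du: what the linear error model misses."""
    A, B = jacobians(f, z, v)
    return f(z + dx, v + du) - f(z, v) - A @ dx - B @ du
```

Doubling the deviation roughly quadruples the remainder. That second-order term is the quantity the abstract's dynamic over-bound must cover.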
[84] arXiv:2309.12365 (replaced) [pdf, html, other]: Title: An Efficient Intelligent Semi-Automated Warehouse Inventory Stocktaking System

Chunan Tong

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

In the context of evolving supply chain management, the significance of efficient inventory management has grown substantially for businesses. However, conventional manual and experience-based approaches often struggle to meet the complexities of modern market demands. This research introduces an intelligent inventory management system to address challenges related to inaccurate data, delayed monitoring, and overreliance on subjective experience in forecasting. The proposed system integrates barcode and distributed Flutter application technologies for intelligent perception, alongside comprehensive big data analytics to enable data-driven decision-making. Through meticulous analysis, system design, critical technology exploration, and simulation validation, the effectiveness of the proposed system is successfully demonstrated. The intelligent system facilitates second-level monitoring, high-frequency checks, and artificial intelligence-driven forecasting, consequently enhancing the automation, precision, and intelligence of inventory management. This system contributes to cost reduction and optimized inventory sizes through accurate predictions and informed decisions, ultimately achieving a mutually beneficial scenario. The outcomes of this research offer
[85] arXiv:2404.17484 (replaced) [pdf, html, other]: Title: Sparse Reconstruction of Optical Doppler Tomography with Alternative State Space Model and Attention

Zhenghong Li, Jiaxiang Ren, Wensheng Cheng, Yanzuo Liu, Congwu Du, Yingtian Pan, Haibin Ling

Comments: MICCAI25, 10 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Optical coherence Doppler tomography (ODT) is an emerging blood flow imaging technique. The fundamental unit of ODT is the 1D depth-resolved trace called the raw A-scan (or A-line). A 2D ODT image (B-scan) is formed by reconstructing a cross-sectional flow image via Doppler phase subtraction of raw A-scans along the B-line. Obtaining a high-fidelity B-scan currently requires densely sampled A-scans, which prolongs scanning time and increases storage demands. To address this issue, we propose a novel sparse ODT reconstruction framework with an Alternative State Space Attention Network (ASSAN) that effectively reduces the number of raw A-scans needed. Inspired by the distinct distributions of information along the A-line and B-line, ASSAN applies a 1D State Space Model (SSM) to each A-line to learn the intra-A-scan representation, while using 1D gated self-attention along the B-line to capture the inter-A-scan features. In addition, an effective feedforward network based on sequential 1D convolutions along different axes is employed to enhance local features. In validation experiments on real animal data, ASSAN shows clear effectiveness in reconstruction compared with state-of-the-art reconstruction methods.
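Doppler phase subtraction compares consecutive complex A-scans along the B-line. The sketch below computes the lag-one phase difference at every depth. It then converts that difference to axial velocity with the commonly used relation v = λ·Δφ / (4π·n·T). The wavelength, refractive index, and A-scan interval in the usage are hypothetical, and ASSAN itself is not shown.

```python
import numpy as np

def doppler_phase(ascans: np.ndarray) -> np.ndarray:
    """Phase shift between consecutive complex A-scans.

    ascans: complex array (num_ascans, depth). Returns (num_ascans - 1, depth)
    phase differences in radians, wrapped to (-pi, pi].
    """
    return np.angle(ascans[1:] * np.conj(ascans[:-1]))

def axial_velocity(dphi, wavelength, n_refr, t_interval):
    """Axial flow velocity from a Doppler phase shift: v = lambda * dphi / (4 pi n T)."""
    return dphi * wavelength / (4 * np.pi * n_refr * t_interval)
```

The sketch also shows why sparse sampling hurts. Each phase estimate needs a neighbouring A-scan, and a larger spacing between A-scans makes phase wrapping more likely.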
[86] arXiv:2409.03146 (replaced) [pdf, html, other]: Title: Optimal Placement and Coordinated Scheduling of Distributed Space-Based Lasers for Orbital Debris Remediation

David O. Williams Rogers, Matthew C. Fox, Paul R. Stysley, Hang Woon Lee

Comments: 42 pages, Advances in Space Research (accepted), Copyright 2025. This manuscript version is made available under the CC-BY-NC-ND 4.0 license

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

The significant expansion of the orbital debris population poses a serious threat to the safety and sustainability of space operations. This paper investigates orbital debris remediation through a constellation of collaborative space-based lasers, leveraging the principle of momentum transfer onto debris via laser ablation. A novel delta-v vector analysis framework quantifies the cumulative effects of multiple concurrent laser-to-debris (L2D) engagements by utilizing the vector composition of the imparted delta-v vectors. The paper formulates the Concurrent Location-Scheduling Optimization Problem (CLSP) to optimize the placement of laser platforms and the scheduling of L2D engagements, aiming to maximize debris remediation capacity. Given the computational intractability of the CLSP, a decomposition strategy is employed, yielding two sequential subproblems: (1) determining optimal laser platform locations via the Maximal Covering Location Problem, and (2) scheduling L2D engagements using a novel integer linear programming approach to maximize debris remediation capacity. Computational experiments evaluate the efficacy of the proposed framework across diverse mission scenarios, demonstrating critical constellation functions such as collaborative and controlled nudging, deorbiting, and just-in-time collision avoidance. A sensitivity analysis further explores the impact of varying the number and distribution of laser platforms on debris remediation capacity, offering insights into optimizing the performance of space-based laser constellations.
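The delta-v framework in the abstract combines the impulses from concurrent laser-to-debris engagements by vector composition. The sketch below shows that composition. It assumes each engagement's imparted delta-v is already known as a 3D vector in a common frame. The example vectors and the reference direction are hypothetical.

```python
import numpy as np

def net_delta_v(impulses):
    """Vector sum of the delta-v contributions from concurrent engagements."""
    return np.sum(np.asarray(impulses, dtype=float), axis=0)

def along_direction(dv, direction):
    """Component of a delta-v vector along a reference direction (normalized here)."""
    d = np.asarray(direction, dtype=float)
    return float(dv @ d / np.linalg.norm(d))
```

Take two lasers pushing at [-1, 0.5, 0] and [-1, -0.5, 0] (arbitrary units). Their cross-track components cancel, so the net delta-v is [-2, 0, 0] with magnitude 2. The naive sum of magnitudes would give about 2.24. This gap is why the composition has to be done with vectors.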
[87] arXiv:2412.04502 (replaced) [pdf, other]: Title: Physics-informed Gaussian Processes as Linear Model Predictive Controller

Jörn Tebbe, Andreas Besginow, Markus Lange-Hegermann

Comments: Accepted at L4DC 2025

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)

We introduce a novel algorithm for controlling linear time-invariant systems in a tracking problem. The controller is based on a Gaussian Process (GP) whose realizations satisfy a system of linear ordinary differential equations with constant coefficients. Control inputs for tracking are determined by conditioning the prior GP on the setpoints, i.e., control as inference. The resulting Model Predictive Control scheme incorporates pointwise soft constraints by introducing virtual setpoints to the posterior Gaussian process. We show theoretically that our controller satisfies open-loop stability for the optimal control problem by leveraging general results from Bayesian inference, and we demonstrate this result in a numerical example.
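Here, control as inference amounts to conditioning a GP on setpoints. The sketch below performs plain GP conditioning with a squared-exponential kernel on time. Unlike the paper's physics-informed GP, this kernel does NOT encode the system's ODEs, so the sketch covers only the conditioning step. It also shows how a large observation variance turns a setpoint into a soft, "virtual" constraint. Names and hyperparameters are hypothetical.

```python
import numpy as np

def rbf(a, b, ell=1.0, sf=1.0):
    """Squared-exponential kernel matrix between 1D time grids a and b."""
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

def condition(t_obs, y_obs, t_query, noise):
    """GP posterior mean/covariance at t_query given setpoints (t_obs, y_obs).

    noise: per-setpoint variance; tiny values act like hard setpoints,
    large values like soft ("virtual") ones.
    """
    K = rbf(t_obs, t_obs) + np.diag(noise)
    Ks = rbf(t_query, t_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    cov = rbf(t_query, t_query) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov
```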
[88] arXiv:2501.11842 (replaced) [pdf, html, other]: Title: Harnessing Rydberg Atomic Receivers: From Quantum Physics to Wireless Communications

Yuanbin Chen, Xufeng Guo, Chau Yuen, Yufei Zhao, Yong Liang Guan, Chong Meng Samson See, Merouane Débbah, Lajos Hanzo

Comments: This revised manuscript has been submitted to IEEE journal, 16 pages, 10 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The intrinsic integration of Rydberg atomic receivers into wireless communication systems is proposed, harnessing the principles of quantum physics in wireless communications. More particularly, we conceive a pair of Rydberg atomic receivers: one incorporates a local oscillator (LO) and is referred to as an LO-dressed receiver, while the other operates without an LO and is termed an LO-free receiver. The appropriate wireless model is developed for each configuration, elaborating on the receiver's responses to the radio frequency (RF) signal, on the potential noise sources, and on the signal-to-noise ratio (SNR) performance. The developed wireless model conforms to the classical RF framework, facilitating compatibility with established signal processing methodologies. Next, we investigate the associated distortion effects that might occur, specifically identifying the conditions under which distortion arises and demonstrating the boundaries of linear dynamic ranges. This provides critical insights into their practical implementation in wireless systems. Finally, extensive simulation results are provided for characterizing the performance of wireless systems harnessing this pair of Rydberg atomic receivers. Our results demonstrate that LO-dressed systems achieve a significant SNR gain of approximately 40 to 50 dB over conventional RF receivers in the standard quantum limit regime. This SNR headroom translates into reduced symbol error rates, enabling efficient and reliable transmission with higher-order constellations.
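A 40 dB gain is a factor of 10^4 in signal power. The sketch below shows how such headroom affects symbol error rate. It uses the textbook exact SER of square M-QAM on an AWGN channel, with SNR taken as Es/N0. This is a generic AWGN model and not the paper's Rydberg receiver model. The SNR values in the usage are hypothetical.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ser_square_qam(m, snr_db):
    """Exact symbol error rate of square M-QAM on AWGN, with SNR = Es/N0 in dB."""
    snr = 10 ** (snr_db / 10)
    # Per-axis error probability of the equivalent sqrt(M)-PAM constellation.
    p_axis = 2 * (1 - 1 / math.sqrt(m)) * q_func(math.sqrt(3 * snr / (m - 1)))
    return 1 - (1 - p_axis) ** 2
```

For 256-QAM, moving from 10 dB to 50 dB (a 40 dB shift) takes the SER from roughly 0.9 to essentially zero. This illustrates how headroom of that size opens up higher-order constellations.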
[89] arXiv:2502.02171 (replaced) [pdf, other]: Title: DeepForest: Sensing Into Self-Occluding Volumes of Vegetation With Aerial Imaging

Mohamed Youssef, Jian Peng, Oliver Bimber

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Access to below-canopy volumetric vegetation data is crucial for understanding ecosystem dynamics. We address the long-standing limitation of remote sensing in penetrating deep into dense canopy layers. LiDAR and radar are currently considered the primary options for measuring 3D vegetation structures, while cameras can only extract the reflectance and depth of top layers. Using conventional, high-resolution aerial images, our approach allows sensing deep into self-occluding vegetation volumes, such as forests. It is similar in spirit to the imaging process of wide-field microscopy, but can handle much larger scales and strong occlusion. We scan focal stacks by synthetic-aperture imaging with drones and reduce out-of-focus signal contributions using pre-trained 3D convolutional neural networks with mean squared error (MSE) as the loss function. The resulting volumetric reflectance stacks contain low-frequency representations of the vegetation volume. Combining multiple reflectance stacks from various spectral channels provides insights into plant health, growth, and environmental conditions throughout the entire vegetation volume. Compared with simulated ground truth, our correction yields an average improvement of about 7× (min: about 2×, max: about 12×) for forest densities of 220 to 1680 trees/ha. In our field experiment, we achieved an MSE of 0.05 when comparing with the top vegetation layer measured with classical multispectral aerial imaging.
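Each slice of a synthetic-aperture focal stack is an integral image. It is formed by shifting the drone images according to their capture positions and averaging them. The sketch below shows this shift-and-add step for cameras displaced along one axis, using integer pixel shifts. The geometry, offsets, and disparities are hypothetical, and the paper's 3D CNN correction of out-of-focus contributions is not included.

```python
import numpy as np

def refocus(images, offsets, disparity):
    """Integral image focused on the plane whose pixel shift per unit camera offset is `disparity`.

    images: list of 2D arrays from cameras displaced along x by `offsets`.
    Points on the focal plane align and reinforce; others are spread out (blurred).
    """
    acc = np.zeros(images[0].shape, dtype=float)
    for img, o in zip(images, offsets):
        # np.roll wraps around the border; acceptable for this toy example.
        acc += np.roll(img, -int(round(o * disparity)), axis=1)
    return acc / len(images)
```

Consider a single bright point seen by five cameras. Refocusing at its true disparity stacks all five views on one pixel. At a wrong disparity the point is smeared into five faint copies. That smear is the out-of-focus residue the paper removes with a CNN.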
[90] arXiv:2502.15276 (replaced) [pdf, other]: Title: Categorical Lyapunov Theory I: Stability of Flows

Aaron D. Ames, Joe Moeller, Paulo Tabuada

Comments: 31 pages

Subjects: Dynamical Systems (math.DS); Systems and Control (eess.SY); Category Theory (math.CT)

Lyapunov's theorem provides a fundamental characterization of the stability of dynamical systems. This paper presents a categorical framework for Lyapunov theory, generalizing stability analysis with Lyapunov functions categorically. Core to our approach is the set of axioms underlying a setting for stability, which give the necessary ingredients for "doing Lyapunov theory" in a category of interest. With these minimal assumptions, we define the stability of equilibria, formulate Lyapunov morphisms, and demonstrate that the existence of Lyapunov morphisms is necessary and sufficient for establishing the stability of flows. To illustrate these constructions, we show how classical notions of stability, e.g., for continuous and discrete time dynamical systems, are captured by this categorical framework for Lyapunov theory. Finally, to demonstrate the extensibility of our framework, we illustrate how enriched categories, e.g., Lawvere metric spaces, yield settings for stability enabling one to "do Lyapunov theory" in enriched categories.
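For reference, these are the classical conditions the abstract says the categorical framework recovers. Consider an equilibrium at the origin of $\dot x = f(x)$ (continuous time) or $x_{k+1} = g(x_k)$ (discrete time), with $f(0) = 0$, $g(0) = 0$, and $V$ a continuous function on a neighbourhood $D$ of $0$ ($C^1$ in the continuous-time case). The standard Lyapunov stability conditions read:

```latex
\begin{aligned}
&V(0) = 0, \qquad V(x) > 0 \quad \forall x \in D \setminus \{0\}, \\
&\dot V(x) = \nabla V(x)^{\top} f(x) \le 0 \quad \forall x \in D
  \;\Longrightarrow\; x = 0 \text{ is Lyapunov stable for } \dot x = f(x), \\
&V\big(g(x)\big) - V(x) \le 0 \quad \forall x \in D
  \;\Longrightarrow\; x = 0 \text{ is Lyapunov stable for } x_{k+1} = g(x_k).
\end{aligned}
```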
[91] arXiv:2504.08278 (replaced) [pdf, html, other]: Title: Line-Search Filter Differential Dynamic Programming for Optimal Control with Nonlinear Equality Constraints

Ming Xu, Stephen Gould, Iman Shames

Subjects: Optimization and Control (math.OC); Robotics (cs.RO); Systems and Control (eess.SY)

We present FilterDDP, a differential dynamic programming algorithm for solving discrete-time optimal control problems (OCPs) with nonlinear equality constraints. Unlike prior methods based on merit functions or the augmented Lagrangian class of algorithms, FilterDDP uses a step filter in conjunction with a line search to handle equality constraints. We identify two important design choices for the step filter criteria that lead to robust numerical performance: 1) we use the Lagrangian instead of the cost as one of the filter criteria, and 2) for the stopping criteria and backward pass Hessians, we replace the value function gradient with an estimated dual variable of the dynamics constraints. Both choices are rigorously justified, 2) in particular by a formal proof of local quadratic convergence. We validate FilterDDP on three contact-implicit trajectory optimisation problems that arise in robotics.
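A step filter accepts a trial step if it is not dominated by any stored pair of constraint violation θ and objective value φ. FilterDDP, according to the abstract, uses the Lagrangian as φ. The sketch below shows a simplified acceptance test in the style of the classical filter line search, together with backtracking. The margins gamma_theta and gamma_phi, the shrink factor, and the omission of switching and sufficient-decrease conditions are all assumptions; this is not FilterDDP's full rule set.

```python
def filter_acceptable(theta, phi, filter_entries, gamma_theta=1e-5, gamma_phi=1e-5):
    """True if trial point (theta, phi) is not dominated by any filter entry.

    theta: constraint violation (e.g. norm of the equality-constraint residual);
    phi: objective-like value (the Lagrangian in FilterDDP's design choice).
    The trial must sufficiently reduce theta or phi relative to every entry.
    """
    return all(
        theta <= (1 - gamma_theta) * th_j or phi <= ph_j - gamma_phi * th_j
        for th_j, ph_j in filter_entries
    )

def backtracking_line_search(trial_point, filter_entries, alpha0=1.0, shrink=0.5, min_alpha=1e-8):
    """Halve the step size until the filter accepts the trial point; None if it never does.

    trial_point: hypothetical callback mapping a step size to (theta, phi).
    """
    alpha = alpha0
    while alpha >= min_alpha:
        theta, phi = trial_point(alpha)
        if filter_acceptable(theta, phi, filter_entries):
            return alpha
        alpha *= shrink
    return None
```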
[92] arXiv:2505.14874 (replaced) [pdf, html, other]: Title: Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages

Chin-Jou Li, Eunjung Yeo, Kwanghee Choi, Paula Andrea Pérez-Toro, Masao Someki, Rohan Kumar Das, Zhengjun Yue, Juan Rafael Orozco-Arroyave, Elmar Nöth, David R. Mortensen

Comments: 5 pages, 1 figure, Accepted to Interspeech 2025

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Automatic speech recognition (ASR) for dysarthric speech remains challenging due to data scarcity, particularly in non-English languages. To address this, we fine-tune a voice conversion model on English dysarthric speech (UASpeech) to encode both speaker characteristics and prosodic distortions, then apply it to convert healthy non-English speech (FLEURS) into non-English dysarthric-like speech. The generated data is then used to fine-tune a multilingual ASR model, Massively Multilingual Speech (MMS), for improved dysarthric speech recognition. Evaluation on PC-GITA (Spanish), EasyCall (Italian), and SSNCE (Tamil) demonstrates that VC with both speaker and prosody conversion significantly outperforms the off-the-shelf MMS performance and conventional augmentation techniques such as speed and tempo perturbation. Objective and subjective analyses of the generated data further confirm that the generated speech simulates dysarthric characteristics.
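Speed perturbation is one of the conventional augmentation baselines named in the abstract. It is typically implemented by resampling the waveform, which changes duration and pitch together. Tempo perturbation, by contrast, changes duration only and needs a time-scale modification algorithm, which is not shown here. The sketch below assumes linear interpolation for resampling; the factor and sampling rate in the usage are hypothetical.

```python
import numpy as np

def speed_perturb(x, factor):
    """Resample a 1D waveform so it plays `factor` times faster (factor > 1 shortens it).

    Output sample m reads the input at fractional position m * factor
    (linear interpolation), so frequencies scale up by `factor` as well.
    """
    x = np.asarray(x, dtype=float)
    n_out = int(round(len(x) / factor))
    t_out = np.arange(n_out) * factor
    return np.interp(t_out, np.arange(len(x)), x)
```

For example, a 100 Hz tone at 16 kHz perturbed by a factor of 1.1 becomes a 110 Hz tone that is about 9% shorter.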
[93] arXiv:2505.17696 (replaced) [pdf, html, other]: Title: Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory

Sota Yoshihara (1), Ryosuke Yamamoto (2), Hiroyuki Kusumoto (1), Masanari Shimura (1) ((1) Graduate School of Mathematics, Nagoya University, (2) AISIN SOFTWARE Co., Ltd.)

Comments: 9 pages, 6 figures. Appendix: 17 pages. First three listed authors have equal contributions

Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

This paper proposes a novel theoretical framework for guaranteeing and evaluating the resilience of long short-term memory (LSTM) networks in control systems. We introduce "recovery time" as a new resilience metric that quantifies the time required for an LSTM to return to its normal state after anomalous inputs. By mathematically refining incremental input-to-state stability ($\delta$ISS) theory for LSTMs, we derive a practical data-independent upper bound on recovery time. This upper bound enables resilience-aware training. Experimental validation on simple models demonstrates the effectiveness of our resilience estimation and control methods, strengthening the foundation for rigorous quality assurance in safety-critical AI applications.
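Recovery time can also be measured empirically, which makes the metric concrete. The idea is to run the same LSTM on a nominal input sequence and on a copy with an anomalous burst, then count the steps until the two state trajectories stay within a tolerance. The sketch below does this with a plain NumPy LSTM cell, small random weights, and a hypothetical tolerance and burst. It is an empirical check only, not the paper's data-independent δISS bound.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; W: (4H, n_in), U: (4H, H), b: (4H,), gate order i, f, g, o."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
    g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def recovery_time(inputs_nominal, inputs_perturbed, W, U, b, tol=1e-4):
    """First step index from which the (h, c) gap between the two runs stays below tol."""
    H = U.shape[1]
    h1, c1, h2, c2 = np.zeros(H), np.zeros(H), np.zeros(H), np.zeros(H)
    last_bad = -1
    for k, (x1, x2) in enumerate(zip(inputs_nominal, inputs_perturbed)):
        h1, c1 = lstm_step(x1, h1, c1, W, U, b)
        h2, c2 = lstm_step(x2, h2, c2, W, U, b)
        if np.linalg.norm(h1 - h2) + np.linalg.norm(c1 - c2) > tol:
            last_bad = k
    return last_bad + 1
```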
[94] arXiv:2507.01694 (replaced) [pdf, html, other]: Title: Graph Representation-based Model Poisoning on Federated Large Language Models

Hanlin Cai, Haofan Dong, Houtianfu Wang, Kai Li, Ozgur B. Akan

Comments: 7 pages, 5 figures (Submitted to IEEE Communication Magazine)

Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

Federated large language models (FedLLMs) enable powerful generative capabilities within wireless networks while preserving data privacy. Nonetheless, FedLLMs remain vulnerable to model poisoning attacks. This article first reviews recent advancements in model poisoning techniques and existing defense mechanisms for FedLLMs, underscoring critical limitations, especially when dealing with non-IID textual data distributions. Current defense strategies predominantly employ distance- or similarity-based outlier detection mechanisms, relying on the assumption that malicious updates markedly differ from benign statistical patterns. However, this assumption becomes inadequate against adaptive adversaries targeting billion-parameter LLMs. The article further investigates graph representation-based model poisoning (GRMP), an emerging attack paradigm that exploits higher-order correlations among benign client gradients to craft malicious updates indistinguishable from legitimate ones. GRMP can effectively circumvent advanced defense systems, causing substantial degradation in model accuracy and overall performance. Moreover, the article outlines a forward-looking research roadmap that emphasizes the necessity of graph-aware secure aggregation methods, specialized vulnerability metrics tailored for FedLLMs, and evaluation frameworks to enhance the robustness of federated language model deployments.
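The Krum aggregation rule is a representative example of the distance-based defenses the abstract describes. This choice is ours; the article does not name it. Krum scores each client update by the summed squared distance to its n - f - 2 nearest neighbours and keeps the lowest-scoring update. The sketch below shows this. The toy updates and the assumed number of attackers f are hypothetical.

```python
import numpy as np

def krum_scores(updates: np.ndarray, f: int) -> np.ndarray:
    """Krum score per client: sum of squared distances to its n - f - 2 nearest peers."""
    n = len(updates)
    k = n - f - 2
    if k < 1:
        raise ValueError("Krum needs n > f + 2 clients")
    sq_dist = ((updates[:, None, :] - updates[None, :, :]) ** 2).sum(axis=-1)
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(sq_dist[i], i)      # exclude the distance to itself
        scores[i] = np.sort(others)[:k].sum()
    return scores

def krum_select(updates: np.ndarray, f: int) -> int:
    """Index of the update Krum would aggregate (the lowest score)."""
    return int(np.argmin(krum_scores(updates, f)))
```

A crude outlier update gets the largest score and is never selected. An attack like GRMP instead aims to craft updates whose distances to benign peers look ordinary, so a score of this kind no longer separates them.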
[95] arXiv:2507.07526 (replaced) [pdf, html, other]: Title: DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction

Cunhang Fan, Sheng Zhang, Jingjing Zhang, Enrui Liu, Xinhui Li, Minggang Zhao, Zhao Lv

Comments: Accepted by ACM MM 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Decoding speech from brain signals is a challenging research problem. Although existing technologies have made progress in reconstructing the mel spectrograms of auditory stimuli at the word or letter level, there remain core challenges in the precise reconstruction of minute-level continuous imagined speech: traditional models struggle to balance the efficiency of temporal dependency modeling and information retention in long-sequence decoding. To address this issue, this paper proposes the Dynamic Multiscale Fusion Network (DMF2Mel), which consists of four core components: the Dynamic Contrastive Feature Aggregation Module (DC-FAM), the Hierarchical Attention-Guided Multi-Scale Network (HAMS-Net), the SplineMap attention mechanism, and the bidirectional state space module (convMamba). Specifically, the DC-FAM separates speech-related "foreground features" from noisy "background features" through local convolution and global attention mechanisms, effectively suppressing interference and enhancing the representation of transient signals. HAMS-Net, based on the U-Net framework, achieves cross-scale fusion of high-level semantics and low-level details. The SplineMap attention mechanism integrates the Adaptive Gated Kolmogorov-Arnold Network (AGKAN) to combine global context modeling with spline-based local fitting. The convMamba captures long-range temporal dependencies with linear complexity and enhances nonlinear dynamic modeling capabilities. Results on the SparrKULee dataset show that DMF2Mel achieves a Pearson correlation coefficient of 0.074 in mel spectrogram reconstruction for known subjects (a 48% improvement over the baseline) and 0.048 for unknown subjects (a 35% improvement over the baseline). Code is available at: this https URL.
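The reported scores are Pearson correlation coefficients between reconstructed and target mel spectrograms. The abstract does not say how the coefficient is aggregated (over all bins, per mel band, per subject). The sketch below therefore offers two assumed variants: the correlation over all flattened bins, and the mean of per-band correlations.

```python
import numpy as np

def pearson(pred, target):
    """Pearson correlation between two arrays, computed over all flattened entries."""
    p = np.asarray(pred, dtype=float).ravel()
    t = np.asarray(target, dtype=float).ravel()
    p, t = p - p.mean(), t - t.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))

def per_band_pearson(pred_mel, target_mel):
    """Mean Pearson r over mel bands; arrays shaped (n_mels, frames)."""
    return float(np.mean([pearson(p, t) for p, t in zip(pred_mel, target_mel)]))
```

Because Pearson r ignores offset and scale, a reconstruction can score well while having the wrong loudness. This is one reason such scores are usually reported alongside other metrics.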
[96] arXiv:2507.07953 (replaced) [pdf, other]: Title: Incremental Collision Laws Based on the Bouc-Wen Model: External Forces and Corner Cases

Mihails Milehins, Dan B. Marghitu

Comments: 12 pages, 3 figures, see this https URL; (v2-v4) various amendments; arXiv admin note: text overlap with arXiv:2410.08147

Subjects: Classical Physics (physics.class-ph); Systems and Control (eess.SY)

In the article titled "The Bouc-Wen Model for Binary Direct Collinear Collisions of Convex Viscoplastic Bodies" and published in the Journal of Computational and Nonlinear Dynamics (Volume 20, Issue 6, June 2025), the authors studied mathematical models of binary direct collinear collisions of convex viscoplastic bodies that employed two incremental collision laws based on the Bouc-Wen differential model of hysteresis. It was shown that the models possess favorable analytical properties, and several model parameter identification studies were conducted, demonstrating that the models can accurately capture the nature of a variety of collision phenomena. In this article, the aforementioned models are augmented by modeling the effects of external forces as time-dependent inputs that belong to a certain function space. Furthermore, the range of the parameters under which the models possess favorable analytical properties is extended to several corner cases that were not considered in the prior publication. Finally, the previously conducted model parameter identification studies are extended, and an additional model parameter identification study is provided in an attempt to validate the ability of the augmented models to represent the effects of external forces.
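The collision laws discussed here build on the Bouc-Wen differential model of hysteresis. The sketch below integrates the classical Bouc-Wen equation for the hysteretic variable z, dz/dt = A·ẋ − β·|ẋ|·|z|^(n−1)·z − γ·ẋ·|z|^n, with forward Euler. The input is a prescribed, hypothetical sinusoidal displacement. This is the generic model only; the authors' incremental collision laws and parameter values are not reproduced. For β + γ > 0, z stays within (A / (β + γ))^(1/n), which the usage check confirms.

```python
import numpy as np

def bouc_wen_response(x, dt, A=1.0, beta=0.5, gamma=0.5, n=1.0):
    """Hysteretic variable z(t) of the classical Bouc-Wen model (forward Euler, n >= 1).

    x: prescribed displacement samples with spacing dt.
    """
    xdot = np.gradient(x, dt)  # velocity from finite differences
    z = np.zeros(len(x), dtype=float)
    for k in range(len(x) - 1):
        zk, vk = z[k], xdot[k]
        dz = A * vk - beta * abs(vk) * abs(zk) ** (n - 1) * zk - gamma * vk * abs(zk) ** n
        z[k + 1] = zk + dt * dz
    return z
```

Plotting z against x for this input traces the hysteresis loop. With A = 1 and β = γ = 0.5, z saturates just below 1.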
[97] arXiv:2507.21886 (replaced) [pdf, html, other]: Title: Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline

Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis

Comments: arXiv admin note: text overlap with arXiv:2507.21881, arXiv:2507.21875

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)

Pain is a complex condition affecting a large portion of the population. Accurate and consistent evaluation is essential for individuals experiencing pain, and it supports the development of effective and advanced management strategies. Automatic pain assessment systems provide continuous monitoring and support clinical decision-making, aiming to reduce distress and prevent functional decline. This study has been submitted to the Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed method introduces a pipeline that leverages respiration as the input signal and incorporates a highly efficient cross-attention transformer alongside a multi-windowing strategy. Extensive experiments demonstrate that respiration is a valuable physiological modality for pain assessment. Moreover, experiments revealed that compact and efficient models, when properly optimized, can achieve strong performance, often surpassing larger counterparts. The proposed multi-window approach effectively captures both short-term and long-term features, as well as global characteristics, thereby enhancing the model's representational capacity.
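A multi-window strategy views the same respiration signal at several time scales, from short windows up to the whole recording. The sketch below illustrates the idea. The window sizes and the mean/std features are hypothetical, and the paper feeds windows to a cross-attention transformer rather than computing handcrafted statistics.

```python
import numpy as np

def multi_window_features(signal, window_sizes):
    """Concatenate per-window mean and std for each window size (short-term to global).

    Incomplete trailing windows are dropped; a window size equal to the signal
    length yields global statistics.
    """
    x = np.asarray(signal, dtype=float)
    feats = []
    for w in window_sizes:
        n_win = len(x) // w
        segs = x[: n_win * w].reshape(n_win, w)
        feats.append(np.concatenate([segs.mean(axis=1), segs.std(axis=1)]))
    return np.concatenate(feats)
```

For a 100-sample signal with windows of 10, 25, 50, and 100 samples, this gives 10 + 4 + 2 + 1 = 17 segments and therefore 34 features. The last two are the global mean and standard deviation.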

New submissions (showing 41 of 41 entries)

Cross submissions (showing 25 of 25 entries)

Replacement submissions (showing 31 of 31 entries)