Machine Learning
See recent articles
Showing new listings for Friday, 1 August 2025
- [1] arXiv:2507.22954 [pdf, html, other]
-
Title: Neural Autoregressive Modeling of Brain AgingComments: Accepted at Deep Generative Models Workshop @ MICCAI 2025Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
Brain aging synthesis is a critical task with broad applications in clinical and computational neuroscience. The ability to predict the future structural evolution of a subject's brain from an earlier MRI scan provides valuable insights into aging trajectories. Yet, the high-dimensionality of data, subtle changes of structure across ages, and subject-specific patterns constitute challenges in the synthesis of the aging brain. To overcome these challenges, we propose NeuroAR, a novel brain aging simulation model based on generative autoregressive transformers. NeuroAR synthesizes the aging brain by autoregressively estimating the discrete token maps of a future scan from a convenient space of concatenated token embeddings of a previous and future scan. To guide the generation, it concatenates into each scale the subject's previous scan, and uses its acquisition age and the target age at each block via cross-attention. We evaluate our approach on both the elderly population and adolescent subjects, demonstrating superior performance over state-of-the-art generative models, including latent diffusion models (LDM) and generative adversarial networks, in terms of image fidelity. Furthermore, we employ a pre-trained age predictor to further validate the consistency and realism of the synthesized images with respect to expected aging patterns. NeuroAR significantly outperforms key models, including LDM, demonstrating its ability to model subject-specific brain aging trajectories with high fidelity.
- [2] arXiv:2507.22956 [pdf, html, other]
-
Title: LLM-Assisted Cheating Detection in Korean Language via KeystrokesComments: This paper has 11 pages, 6 figures, 2 tables, and has been accepted for publication at IEEE-IJCB 2025Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
This paper presents a keystroke-based framework for detecting LLM-assisted cheating in Korean, addressing key gaps in prior research regarding language coverage, cognitive context, and the granularity of LLM involvement. Our proposed dataset includes 69 participants who completed writing tasks under three conditions: Bona fide writing, paraphrasing ChatGPT responses, and transcribing ChatGPT responses. Each task spans six cognitive processes defined in Bloom's Taxonomy (remember, understand, apply, analyze, evaluate, and create). We extract interpretable temporal and rhythmic features and evaluate multiple classifiers under both Cognition-Aware and Cognition-Unaware settings. Temporal features perform well under Cognition-Aware evaluation scenarios, while rhythmic features generalize better under cross-cognition scenarios. Moreover, detecting bona fide and transcribed responses was easier than paraphrased ones for both the proposed models and human evaluators, with the models significantly outperforming the humans. Our findings affirm that keystroke dynamics facilitate reliable detection of LLM-assisted writing across varying cognitive demands and writing strategies, including paraphrasing and transcribing LLM-generated responses.
- [3] arXiv:2507.22959 [pdf, html, other]
-
Title: Scientific Machine Learning with Kolmogorov-Arnold NetworksSubjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Mathematical Physics (math-ph)
The field of scientific machine learning, which originally utilized multilayer perceptrons (MLPs), is increasingly adopting Kolmogorov-Arnold Networks (KANs) for data encoding. This shift is driven by the limitations of MLPs, including poor interpretability, fixed activation functions, and difficulty capturing localized or high-frequency features. KANs address these issues with enhanced interpretability and flexibility, enabling more efficient modeling of complex nonlinear interactions and effectively overcoming the constraints associated with conventional MLP architectures. This review categorizes recent progress in KAN-based models across three distinct perspectives: (i) data-driven learning, (ii) physics-informed modeling, and (iii) deep operator learning. Each perspective is examined through the lens of architectural design, training strategies, application efficacy, and comparative evaluation against MLP-based counterparts. By benchmarking KANs against MLPs, we highlight consistent improvements in accuracy, convergence, and spectral representation, clarifying KANs' advantages in capturing complex dynamics while learning more effectively. Finally, this review identifies critical challenges and open research questions in KAN development, particularly regarding computational efficiency, theoretical guarantees, hyperparameter tuning, and algorithm complexity. We also outline future research directions aimed at improving the robustness, scalability, and physical consistency of KAN-based frameworks.
- [4] arXiv:2507.22962 [pdf, html, other]
-
Title: Multi-Hazard Early Warning Systems for Agriculture with Featural-Temporal ExplanationsComments: Pre-print v0.8 2025-07-30Subjects: Machine Learning (cs.LG)
Climate extremes present escalating risks to agriculture intensifying the need for reliable multi-hazard early warning systems (EWS). The situation is evolving due to climate change and hence such systems should have the intelligent to continue to learn from recent climate behaviours. However, traditional single-hazard forecasting methods fall short in capturing complex interactions among concurrent climatic events. To address this deficiency, in this paper, we combine sequential deep learning models and advanced Explainable Artificial Intelligence (XAI) techniques to introduce a multi-hazard forecasting framework for agriculture. In our experiments, we utilize meteorological data from four prominent agricultural regions in the United States (between 2010 and 2023) to validate the predictive accuracy of our framework on multiple severe event types, which are extreme cold, floods, frost, hail, heatwaves, and heavy rainfall, with tailored models for each area. The framework uniquely integrates attention mechanisms with TimeSHAP (a recurrent XAI explainer for time series) to provide comprehensive temporal explanations revealing not only which climatic features are influential but precisely when their impacts occur. Our results demonstrate strong predictive accuracy, particularly with the BiLSTM architecture, and highlight the system's capacity to inform nuanced, proactive risk management strategies. This research significantly advances the explainability and applicability of multi-hazard EWS, fostering interdisciplinary trust and effective decision-making process for climate risk management in the agricultural industry.
- [5] arXiv:2507.22963 [pdf, html, other]
-
Title: FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model OptimizationAbdelrhman Gaber, Hassan Abd-Eltawab, John Elgallab, Youssif Abuzied, Dineo Mpanya, Turgay Celik, Swarun Kumar, Tamer ElBattSubjects: Machine Learning (cs.LG); Other Quantitative Biology (q-bio.OT)
Cardiovascular diseases (CVD) cause over 17 million deaths annually worldwide, highlighting the urgent need for privacy-preserving predictive systems. We introduce FedCVD++, an enhanced federated learning (FL) framework that integrates both parametric models (logistic regression, SVM, neural networks) and non-parametric models (Random Forest, XGBoost) for coronary heart disease risk prediction. To address key FL challenges, we propose: (1) tree-subset sampling that reduces Random Forest communication overhead by 70%, (2) XGBoost-based feature extraction enabling lightweight federated ensembles, and (3) federated SMOTE synchronization for resolving cross-institutional class imbalance.
Evaluated on the Framingham dataset (4,238 records), FedCVD++ achieves state-of-the-art results: federated XGBoost (F1 = 0.80) surpasses its centralized counterpart (F1 = 0.78), and federated Random Forest (F1 = 0.81) matches non-federated performance. Additionally, our communication-efficient strategies reduce bandwidth consumption by 3.2X while preserving 95% accuracy.
Compared to existing FL frameworks, FedCVD++ delivers up to 15% higher F1-scores and superior scalability for multi-institutional deployment. This work represents the first practical integration of non-parametric models into federated healthcare systems, providing a privacy-preserving solution validated under real-world clinical constraints. - [6] arXiv:2507.23000 [pdf, html, other]
-
Title: Planning for Cooler Cities: A Multimodal AI Framework for Predicting and Mitigating Urban Heat Stress through Urban Landscape TransformationSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
As extreme heat events intensify due to climate change and urbanization, cities face increasing challenges in mitigating outdoor heat stress. While traditional physical models such as SOLWEIG and ENVI-met provide detailed assessments of human-perceived heat exposure, their computational demands limit scalability for city-wide planning. In this study, we propose GSM-UTCI, a multimodal deep learning framework designed to predict daytime average Universal Thermal Climate Index (UTCI) at 1-meter hyperlocal resolution. The model fuses surface morphology (nDSM), high-resolution land cover data, and hourly meteorological conditions using a feature-wise linear modulation (FiLM) architecture that dynamically conditions spatial features on atmospheric context. Trained on SOLWEIG-derived UTCI maps, GSM-UTCI achieves near-physical accuracy, with an R2 of 0.9151 and a mean absolute error (MAE) of 0.41°C, while reducing inference time from hours to under five minutes for an entire city. To demonstrate its planning relevance, we apply GSM-UTCI to simulate systematic landscape transformation scenarios in Philadelphia, replacing bare earth, grass, and impervious surfaces with tree canopy. Results show spatially heterogeneous but consistently strong cooling effects, with impervious-to-tree conversion producing the highest aggregated benefit (-4.18°C average change in UTCI across 270.7 km2). Tract-level bivariate analysis further reveals strong alignment between thermal reduction potential and land cover proportions. These findings underscore the utility of GSM-UTCI as a scalable, fine-grained decision support tool for urban climate adaptation, enabling scenario-based evaluation of greening strategies across diverse urban environments.
- [7] arXiv:2507.23009 [pdf, html, other]
-
Title: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests insteadSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large Language Models (LLMs) have achieved remarkable results on a range of standardized tests originally designed to assess human cognitive and psychological traits, such as intelligence and personality. While these results are often interpreted as strong evidence of human-like characteristics in LLMs, this paper argues that such interpretations constitute an ontological error. Human psychological and educational tests are theory-driven measurement instruments, calibrated to a specific human population. Applying these tests to non-human subjects without empirical validation, risks mischaracterizing what is being measured. Furthermore, a growing trend frames AI performance on benchmarks as measurements of traits such as ``intelligence'', despite known issues with validity, data contamination, cultural bias and sensitivity to superficial prompt changes. We argue that interpreting benchmark performance as measurements of human-like traits, lacks sufficient theoretical and empirical justification. This leads to our position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead. We call for the development of principled, AI-specific evaluation frameworks tailored to AI systems. Such frameworks might build on existing frameworks for constructing and validating psychometrics tests, or could be created entirely from scratch to fit the unique context of AI.
- [8] arXiv:2507.23010 [pdf, html, other]
-
Title: Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based MethodsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
This paper investigates the inverse capabilities and broader utility of multimodal latent spaces within task-specific AI (Artificial Intelligence) models. While these models excel at their designed forward tasks (e.g., text-to-image generation, audio-to-text transcription), their potential for inverse mappings remains largely unexplored. We propose an optimization-based framework to infer input characteristics from desired outputs, applying it bidirectionally across Text-Image (BLIP, Flux.1-dev) and Text-Audio (Whisper-Large-V3, Chatterbox-TTS) modalities.
Our central hypothesis posits that while optimization can guide models towards inverse tasks, their multimodal latent spaces will not consistently support semantically meaningful and perceptually coherent inverse mappings. Experimental results consistently validate this hypothesis. We demonstrate that while optimization can force models to produce outputs that align textually with targets (e.g., a text-to-image model generating an image that an image captioning model describes correctly, or an ASR model transcribing optimized audio accurately), the perceptual quality of these inversions is chaotic and incoherent. Furthermore, when attempting to infer the original semantic input from generative models, the reconstructed latent space embeddings frequently lack semantic interpretability, aligning with nonsensical vocabulary tokens.
These findings highlight a critical limitation. multimodal latent spaces, primarily optimized for specific forward tasks, do not inherently possess the structure required for robust and interpretable inverse mappings. Our work underscores the need for further research into developing truly semantically rich and invertible multimodal latent spaces. - [9] arXiv:2507.23035 [pdf, html, other]
-
Title: KLLM: Fast LLM Inference with K-Means QuantizationSubjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
Large language model (LLM) inference poses significant challenges due to its intensive memory and computation demands. Weight and activation quantization (WAQ) offers a promising solution by reducing both memory footprint and arithmetic complexity. However, two key challenges remain in the existing WAQ designs. (1) Traditional WAQ designs rely on uniform integer-based quantization for hardware efficiency, but this often results in significant accuracy degradation at low precision. K-Means-based quantization, a non-uniform quantization technique, achieves higher accuracy by matching the Gaussian-like distributions of weights and activations in LLMs. However, its non-uniform nature prevents direct execution on low-precision compute units, requiring dequantization and floating-point matrix multiplications (MatMuls) during inference. (2) Activation outliers further hinder effective low-precision WAQ. Offline thresholding methods for outlier detection can lead to significant model performance degradation, while existing online detection techniques introduce substantial runtime overhead.
To address the aforementioned challenges and fully unleash the potential of WAQ with K-Means quantization for LLM inference, in this paper, we propose KLLM, a hardware-software co-design framework. KLLM features an index-based computation scheme for efficient execution of MatMuls and nonlinear operations on K-Means-quantized data, which avoids most of the dequantization and full-precision computations. Moreover, KLLM incorporates a novel outlier detection engine, Orizuru, that efficiently identifies the top-$k$ largest and smallest elements in the activation data stream during online inference.
Extensive experiments show that, on average, KLLM achieves speedups of 9.67x, 7.03x and energy efficiency improvements of 229.50x, 150.21x compared to the A100 GPU and Atom, respectively. - [10] arXiv:2507.23037 [pdf, html, other]
-
Title: Linking Actor Behavior to Process Performance Over TimeAurélie Leribaux, Rafael Oyamada, Johannes De Smedt, Zahra Dasht Bozorgi, Artem Polyvyanyy, Jochen De WeerdtComments: Accepted for presentation at the 5th Workshop on Change, Drift, and Dynamics of Organizational Processes (ProDy), BPM 2025Subjects: Machine Learning (cs.LG)
Understanding how actor behavior influences process outcomes is a critical aspect of process mining. Traditional approaches often use aggregate and static process data, overlooking the temporal and causal dynamics that arise from individual actor behavior. This limits the ability to accurately capture the complexity of real-world processes, where individual actor behavior and interactions between actors significantly shape performance. In this work, we address this gap by integrating actor behavior analysis with Granger causality to identify correlating links in time series data. We apply this approach to realworld event logs, constructing time series for actor interactions, i.e. continuation, interruption, and handovers, and process outcomes. Using Group Lasso for lag selection, we identify a small but consistently influential set of lags that capture the majority of causal influence, revealing that actor behavior has direct and measurable impacts on process performance, particularly throughput time. These findings demonstrate the potential of actor-centric, time series-based methods for uncovering the temporal dependencies that drive process outcomes, offering a more nuanced understanding of how individual behaviors impact overall process efficiency.
- [11] arXiv:2507.23043 [pdf, html, other]
-
Title: Prediction of Significant Creatinine Elevation in First ICU Stays with Vancomycin Use: A retrospective study through CatboostJunyi Fan, Li Sun, Shuheng Chen, Yong Si, Minoo Ahmadi, Greg Placencia, Elham Pishgar, Kamiar Alaei, Maryam PishgarSubjects: Machine Learning (cs.LG)
Background: Vancomycin, a key antibiotic for severe Gram-positive infections in ICUs, poses a high nephrotoxicity risk. Early prediction of kidney injury in critically ill patients is challenging. This study aimed to develop a machine learning model to predict vancomycin-related creatinine elevation using routine ICU data.
Methods: We analyzed 10,288 ICU patients (aged 18-80) from the MIMIC-IV database who received vancomycin. Kidney injury was defined by KDIGO criteria (creatinine rise >=0.3 mg/dL within 48h or >=50% within 7d). Features were selected via SelectKBest (top 30) and Random Forest ranking (final 15). Six algorithms were tested with 5-fold cross-validation. Interpretability was evaluated using SHAP, Accumulated Local Effects (ALE), and Bayesian posterior sampling.
Results: Of 10,288 patients, 2,903 (28.2%) developed creatinine elevation. CatBoost performed best (AUROC 0.818 [95% CI: 0.801-0.834], sensitivity 0.800, specificity 0.681, negative predictive value 0.900). Key predictors were phosphate, total bilirubin, magnesium, Charlson index, and APSIII. SHAP confirmed phosphate as a major risk factor. ALE showed dose-response patterns. Bayesian analysis estimated mean risk 60.5% (95% credible interval: 16.8-89.4%) in high-risk cases.
Conclusions: This machine learning model predicts vancomycin-associated creatinine elevation from routine ICU data with strong accuracy and interpretability, enabling early risk detection and supporting timely interventions in critical care. - [12] arXiv:2507.23073 [pdf, html, other]
-
Title: Locally Differentially Private Thresholding BanditsComments: 18th European Workshop on Reinforcement Learning (EWRL 2025)Subjects: Machine Learning (cs.LG)
This work investigates the impact of ensuring local differential privacy in the thresholding bandit problem. We consider both the fixed budget and fixed confidence settings. We propose methods that utilize private responses, obtained through a Bernoulli-based differentially private mechanism, to identify arms with expected rewards exceeding a predefined threshold. We show that this procedure provides strong privacy guarantees and derive theoretical performance bounds on the proposed algorithms. Additionally, we present general lower bounds that characterize the additional loss incurred by any differentially private mechanism, and show that the presented algorithms match these lower bounds up to poly-logarithmic factors. Our results provide valuable insights into privacy-preserving decision-making frameworks in bandit problems.
- [13] arXiv:2507.23077 [pdf, html, other]
-
Title: A Foundation Model for Material Fracture PredictionAgnese Marcato, Aleksandra Pachalieva, Ryley G. Hill, Kai Gao, Xiaoyu Wang, Esteban Rougier, Zhou Lei, Vinamra Agrawal, Janel Chua, Qinjun Kang, Jeffrey D. Hyman, Abigail Hunter, Nathan DeBardeleben, Earl Lawrence, Hari Viswanathan, Daniel O'Malley, Javier E. SantosSubjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Geophysics (physics.geo-ph)
Accurately predicting when and how materials fail is critical to designing safe, reliable structures, mechanical systems, and engineered components that operate under stress. Yet, fracture behavior remains difficult to model across the diversity of materials, geometries, and loading conditions in real-world applications. While machine learning (ML) methods show promise, most models are trained on narrow datasets, lack robustness, and struggle to generalize. Meanwhile, physics-based simulators offer high-fidelity predictions but are fragmented across specialized methods and require substantial high-performance computing resources to explore the input space. To address these limitations, we present a data-driven foundation model for fracture prediction, a transformer-based architecture that operates across simulators, a wide range of materials (including plastic-bonded explosives, steel, aluminum, shale, and tungsten), and diverse loading conditions. The model supports both structured and unstructured meshes, combining them with large language model embeddings of textual input decks specifying material properties, boundary conditions, and solver settings. This multimodal input design enables flexible adaptation across simulation scenarios without changes to the model architecture. The trained model can be fine-tuned with minimal data on diverse downstream tasks, including time-to-failure estimation, modeling fracture evolution, and adapting to combined finite-discrete element method simulations. It also generalizes to unseen materials such as titanium and concrete, requiring as few as a single sample, dramatically reducing data needs compared to standard ML. Our results show that fracture prediction can be unified under a single model architecture, offering a scalable, extensible alternative to simulator-specific workflows.
- [14] arXiv:2507.23093 [pdf, other]
-
Title: On the Sustainability of AI Inferences in the EdgeComments: 14 pages, 8 figures, 6 tables, in preparation for journal submissionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF)
The proliferation of the Internet of Things (IoT) and its cutting-edge AI-enabled applications (e.g., autonomous vehicles and smart industries) combine two paradigms: data-driven systems and their deployment on the edge. Usually, edge devices perform inferences to support latency-critical applications. In addition to the performance of these resource-constrained edge devices, their energy usage is a critical factor in adopting and deploying edge applications. Examples of such devices include Raspberry Pi (RPi), Intel Neural Compute Stick (INCS), NVIDIA Jetson nano (NJn), and Google Coral USB (GCU). Despite their adoption in edge deployment for AI inferences, there is no study on their performance and energy usage for informed decision-making on the device and model selection to meet the demands of applications. This study fills the gap by rigorously characterizing the performance of traditional, neural networks, and large language models on the above-edge devices. Specifically, we analyze trade-offs among model F1 score, inference time, inference power, and memory usage. Hardware and framework optimization, along with external parameter tuning of AI models, can balance between model performance and resource usage to realize practical edge AI deployments.
- [15] arXiv:2507.23111 [pdf, html, other]
-
Title: Scalable Generative Modeling of Weighted GraphsComments: 25 pages, 5 figures, included appendix. code at this https URLSubjects: Machine Learning (cs.LG)
Weighted graphs are ubiquitous throughout biology, chemistry, and the social sciences, motivating the development of generative models for abstract weighted graph data using deep neural networks. However, most current deep generative models are either designed for unweighted graphs and are not easily extended to weighted topologies or incorporate edge weights without consideration of a joint distribution with topology. Furthermore, learning a distribution over weighted graphs must account for complex nonlocal dependencies between both the edges of the graph and corresponding weights of each edge. We develop an autoregressive model BiGG-E, a nontrivial extension of the BiGG model, that learns a joint distribution over weighted graphs while still exploiting sparsity to generate a weighted graph with $n$ nodes and $m$ edges in $O((n + m)\log n)$ time. Simulation studies and experiments on a variety of benchmark datasets demonstrate that BiGG-E best captures distributions over weighted graphs while remaining scalable and computationally efficient.
- [16] arXiv:2507.23115 [pdf, html, other]
-
Title: FLOSS: Federated Learning with Opt-Out and Straggler SupportComments: 5 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Previous work on data privacy in federated learning systems focuses on privacy-preserving operations for data from users who have agreed to share their data for training. However, modern data privacy agreements also empower users to use the system while opting out of sharing their data as desired. When combined with stragglers that arise from heterogeneous device capabilities, the result is missing data from a variety of sources that introduces bias and degrades model performance. In this paper, we present FLOSS, a system that mitigates the impacts of such missing data on federated learning in the presence of stragglers and user opt-out, and empirically demonstrate its performance in simulations.
- [17] arXiv:2507.23128 [pdf, html, other]
-
Title: Evaluating and Improving the Robustness of Speech Command Recognition Models to Noise and Distribution ShiftsComments: Submitted to ICASSP 2026Subjects: Machine Learning (cs.LG)
Although prior work in computer vision has shown strong correlations between in-distribution (ID) and out-of-distribution (OOD) accuracies, such relationships remain underexplored in audio-based models. In this study, we investigate how training conditions and input features affect the robustness and generalization abilities of spoken keyword classifiers under OOD conditions. We benchmark several neural architectures across a variety of evaluation sets. To quantify the impact of noise on generalization, we make use of two metrics: Fairness (F), which measures overall accuracy gains compared to a baseline model, and Robustness (R), which assesses the convergence between ID and OOD performance. Our results suggest that noise-aware training improves robustness in some configurations. These findings shed new light on the benefits and limitations of noise-based augmentation for generalization in speech models.
- [18] arXiv:2507.23136 [pdf, other]
-
Title: Observational MultiplicitySubjects: Machine Learning (cs.LG)
Many prediction tasks can admit multiple models that can perform almost equally well. This phenomenon can can undermine interpretability and safety when competing models assign conflicting predictions to individuals. In this work, we study how arbitrariness can arise in probabilistic classification tasks as a result of an effect that we call \emph{observational multiplicity}. We discuss how this effect arises in a broad class of practical applications where we learn a classifier to predict probabilities $p_i \in [0,1]$ but are given a dataset of observations $y_i \in \{0,1\}$. We propose to evaluate the arbitrariness of individual probability predictions through the lens of \emph{regret}. We introduce a measure of regret for probabilistic classification tasks, which measures how the predictions of a model could change as a result of different training labels change. We present a general-purpose method to estimate the regret in a probabilistic classification task. We use our measure to show that regret is higher for certain groups in the dataset and discuss potential applications of regret. We demonstrate how estimating regret promote safety in real-world applications by abstention and data collection.
- [19] arXiv:2507.23141 [pdf, other]
-
Title: AI paradigm for solving differential equations: first-principles data generation and scale-dilation operator AI solverSubjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
Many problems are governed by differential equations (DEs). Artificial intelligence (AI) is a new path for solving DEs. However, data is very scarce and existing AI solvers struggle with approximation of high frequency components (AHFC). We propose an AI paradigm for solving diverse DEs, including DE-ruled first-principles data generation methodology and scale-dilation operator (SDO) AI solver. Using either prior knowledge or random fields, we generate solutions and then substitute them into the DEs to derive the sources and initial/boundary conditions through balancing DEs, thus producing arbitrarily vast amount of, first-principles-consistent training datasets at extremely low computational cost. We introduce a reversible SDO that leverages the Fourier transform of the multiscale solutions to fix AHFC, and design a spatiotemporally coupled, attention-based Transformer AI solver of DEs with SDO. An upper bound on the Hessian condition number of the loss function is proven to be proportional to the squared 2-norm of the solution gradient, revealing that SDO yields a smoother loss landscape, consequently fixing AHFC with efficient training. Extensive tests on diverse DEs demonstrate that our AI paradigm achieves consistently superior accuracy over state-of-the-art methods. This work makes AI solver of DEs to be truly usable in broad nature and engineering fields.
- [20] arXiv:2507.23154 [pdf, html, other]
-
Title: FuseTen: A Generative Model for Daily 10 m Land Surface Temperature Estimation from Spatio-Temporal Satellite ObservationsComments: Accepted in the 2025 International Conference on Machine Intelligence for GeoAnalytics and Remote Sensing (MIGARS)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Urban heatwaves, droughts, and land degradation are pressing and growing challenges in the context of climate change. A valuable approach to studying them requires accurate spatio-temporal information on land surface conditions. One of the most important variables for assessing and understanding these phenomena is Land Surface Temperature (LST), which is derived from satellites and provides essential information about the thermal state of the Earth's surface. However, satellite platforms inherently face a trade-off between spatial and temporal resolutions. To bridge this gap, we propose FuseTen, a novel generative framework that produces daily LST observations at a fine 10 m spatial resolution by fusing spatio-temporal observations derived from Sentinel-2, Landsat 8, and Terra MODIS. FuseTen employs a generative architecture trained using an averaging-based supervision strategy grounded in physical principles. It incorporates attention and normalization modules within the fusion process and uses a PatchGAN discriminator to enforce realism. Experiments across multiple dates show that FuseTen outperforms linear baselines, with an average 32.06% improvement in quantitative metrics and 31.42% in visual fidelity. To the best of our knowledge, this is the first non-linear method to generate daily LST estimates at such fine spatial resolution.
- [21] arXiv:2507.23170 [pdf, html, other]
-
Title: BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and ReasoningSubjects: Machine Learning (cs.LG)
When designing LLM services, practitioners care about three key properties: inference-time budget, factual authenticity, and reasoning capacity. However, our analysis shows that no model can simultaneously optimize for all three. We formally prove this trade-off and propose a principled framework named The BAR Theorem for LLM-application design.
- [22] arXiv:2507.23186 [pdf, html, other]
-
Title: NaN-Propagation: A Novel Method for Sparsity Detection in Black-Box Computational FunctionsSubjects: Machine Learning (cs.LG); Programming Languages (cs.PL)
Sparsity detection in black-box functions enables significant computational speedups in gradient-based optimization through Jacobian compression, but existing finite-difference methods suffer from false negatives due to coincidental zero gradients. These false negatives can silently corrupt gradient calculations, leading to difficult-to-diagnose errors. We introduce NaN-propagation, which exploits the universal contamination property of IEEE 754 Not-a-Number floating-point values to trace input-output dependencies through floating-point numerical computations. By systematically contaminating inputs with NaN and observing which outputs become NaN, the method reconstructs conservative sparsity patterns that eliminate false negatives. We demonstrate the approach on an aerospace wing weight model, achieving a 1.52x speedup while detecting dozens of dependencies missed by conventional methods -- a significant improvement since gradient computation is the bottleneck in many optimization workflows. The technique leverages IEEE 754 compliance to work across programming languages and math libraries without modifying existing black-box codes. Advanced strategies including NaN payload encoding enable faster-than-linear time complexity, improving upon existing black-box sparsity detection methods. Practical algorithms are also proposed to mitigate challenges from branching code execution common in engineering applications.
- [23] arXiv:2507.23217 [pdf, html, other]
-
Title: Zero-Shot Document Understanding using Pseudo Table of Contents-Guided Retrieval-Augmented GenerationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Understanding complex multimodal documents remains challenging due to their structural inconsistencies and limited training data availability. We introduce \textit{DocsRay}, a training-free document understanding system that integrates pseudo Table of Contents (TOC) generation with hierarchical Retrieval-Augmented Generation (RAG). Our approach leverages multimodal Large Language Models' (LLMs) native capabilities to seamlessly process documents containing diverse elements such as text, images, charts, and tables without requiring specialized models or additional training. DocsRay's framework synergistically combines three key techniques: (1) a semantic structuring module using prompt-based LLM interactions to generate a hierarchical pseudo-TOC, (2) zero-shot multimodal analysis that converts diverse document elements into unified, text-centric representations using the inherent capabilities of multimodal LLMs, and (3) an efficient two-stage hierarchical retrieval system that reduces retrieval complexity from $O(N)$ to $O(S + k_1 \cdot N_s)$. Evaluated on documents averaging 49.4 pages and 20,971 textual tokens, DocsRay reduced query latency from 3.89 to 2.12 seconds, achieving a 45% efficiency improvement. On the MMLongBench-Doc benchmark, DocsRay-Pro attains an accuracy of 64.7%, substantially surpassing previous state-of-the-art results.
- [24] arXiv:2507.23221 [pdf, html, other]
-
Title: A Single Direction of Truth: An Observer Model's Linear Residual Probe Exposes and Steers Contextual HallucinationsSubjects: Machine Learning (cs.LG)
Contextual hallucinations -- statements unsupported by given context -- remain a significant challenge in AI. We demonstrate a practical interpretability insight: a generator-agnostic observer model detects hallucinations via a single forward pass and a linear probe on its residual stream. This probe isolates a single, transferable linear direction separating hallucinated from faithful text, outperforming baselines by 5-27 points and showing robust mid-layer performance across Gemma-2 models (2B to 27B). Gradient-times-activation localises this signal to sparse, late-layer MLP activity. Critically, manipulating this direction causally steers generator hallucination rates, proving its actionability. Our results offer novel evidence of internal, low-dimensional hallucination tracking linked to specific MLP sub-circuits, exploitable for detection and mitigation. We release the 2000-example ContraTales benchmark for realistic assessment of such solutions.
- [25] arXiv:2507.23257 [pdf, html, other]
-
Title: Efficient Machine Unlearning via Influence ApproximationComments: 12 pages, 4 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Due to growing privacy concerns, machine unlearning, which aims at enabling machine learning models to ``forget" specific training data, has received increasing attention. Among existing methods, influence-based unlearning has emerged as a prominent approach due to its ability to estimate the impact of individual training samples on model parameters without retraining. However, this approach suffers from prohibitive computational overhead arising from the necessity to compute the Hessian matrix and its inverse across all training samples and parameters, rendering it impractical for large-scale models and scenarios involving frequent data deletion requests. This highlights the difficulty of forgetting. Inspired by cognitive science, which suggests that memorizing is easier than forgetting, this paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning). This connection allows machine unlearning to be addressed from the perspective of incremental learning. Unlike the time-consuming Hessian computations in unlearning (forgetting), incremental learning (memorizing) typically relies on more efficient gradient optimization, which supports the aforementioned cognitive theory. Based on this connection, we introduce the Influence Approximation Unlearning (IAU) algorithm for efficient machine unlearning from the incremental perspective. Extensive empirical evaluations demonstrate that IAU achieves a superior balance among removal guarantee, unlearning efficiency, and comparable model utility, while outperforming state-of-the-art methods across diverse datasets and model architectures. Our code is available at this https URL.
- [26] arXiv:2507.23261 [pdf, html, other]
-
Title: DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent SystemSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Current multi-agent systems (MAS) frameworks often rely on manually designed and static collaboration graph structures, limiting adaptability and performance. To address these limitations, we propose DynaSwarm, a dynamic framework that enhances LLM-based MAS through two key innovations: (1) an actor-critic reinforcement learning (A2C) mechanism to optimize graph structures with improved stability over prior RL methods, and (2) a dynamic graph selector that adaptively chooses the optimal graph structure for each input sample via parameter-efficient LLM fine-tuning. DynaSwarm eliminates the need for rigid, one-fits-all graph architectures, instead leveraging sample-specific idiosyncrasies to dynamically route queries through specialized agent networks. (c) We propose to fine-tune the demonstration retriever to fully exploit the power of in-context learning (ICL). Extensive experiments on question answering, mathematical reasoning, and coding tasks demonstrate that DynaSwarm consistently outperforms state-of-the-art single-agent and MAS baselines across multiple LLM backbones. Our findings highlight the importance of sample-aware structural flexibility in LLM MAS designs.
- [27] arXiv:2507.23291 [pdf, html, other]
-
Title: Evaluating the Dynamics of Membership Privacy in Deep LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Membership inference attacks (MIAs) pose a critical threat to the privacy of training data in deep learning. Despite significant progress in attack methodologies, our understanding of when and how models encode membership information during training remains limited. This paper presents a dynamic analytical framework for dissecting and quantifying privacy leakage dynamics at the individual sample level. By tracking per-sample vulnerabilities on an FPR-TPR plane throughout training, our framework systematically measures how factors such as dataset complexity, model architecture, and optimizer choice influence the rate and severity at which samples become vulnerable. Crucially, we discover a robust correlation between a sample's intrinsic learning difficulty, and find that the privacy risk of samples highly vulnerable in the final trained model is largely determined early during training. Our results thus provide a deeper understanding of how privacy risks dynamically emerge during training, laying the groundwork for proactive, privacy-aware model training strategies.
- [28] arXiv:2507.23292 [pdf, html, other]
-
Title: SequenceLayers: Sequence Processing and Streaming Neural Networks Made EasyRJ Skerry-Ryan, Julian Salazar, Soroosh Mariooryad, David Kao, Daisy Stanton, Eric Battenberg, Matt Shannon, Ron J. Weiss, Robin Scheibler, Jonas Rothfuss, Tom BagbySubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Programming Languages (cs.PL); Software Engineering (cs.SE); Audio and Speech Processing (eess.AS)
We introduce a neural network layer API and library for sequence modeling, designed for easy creation of sequence models that can be executed both layer-by-layer (e.g., teacher-forced training) and step-by-step (e.g., autoregressive sampling). To achieve this, layers define an explicit representation of their state over time (e.g., a Transformer KV cache, a convolution buffer, an RNN hidden state), and a step method that evolves that state, tested to give identical results to a stateless layer-wise invocation. This and other aspects of the SequenceLayers contract enables complex models to be immediately streamable, mitigates a wide range of common bugs arising in both streaming and parallel sequence processing, and can be implemented in any deep learning library. A composable and declarative API, along with a comprehensive suite of layers and combinators, streamlines the construction of production-scale models from simple streamable components while preserving strong correctness guarantees. Our current implementations of SequenceLayers (JAX, TensorFlow 2) are available at this https URL.
- [29] arXiv:2507.23303 [pdf, html, other]
-
Title: An Interpretable Data-Driven Unsupervised Approach for the Prevention of Forgotten ItemsLuca Corbucci, Javier Alejandro Borges Legrottaglie, Francesco Spinnato, Anna Monreale, Riccardo GuidottiSubjects: Machine Learning (cs.LG)
Accurately identifying items forgotten during a supermarket visit and providing clear, interpretable explanations for recommending them remains an underexplored problem within the Next Basket Prediction (NBP) domain. Existing NBP approaches typically only focus on forecasting future purchases, without explicitly addressing the detection of unintentionally omitted items. This gap is partly due to the scarcity of real-world datasets that allow for the reliable estimation of forgotten items. Furthermore, most current NBP methods rely on black-box models, which lack transparency and limit the ability to justify recommendations to end users. In this paper, we formally introduce the forgotten item prediction task and propose two novel interpretable-by-design algorithms. These methods are tailored to identify forgotten items while offering intuitive, human-understandable explanations. Experiments on a real-world retail dataset show our algorithms outperform state-of-the-art NBP baselines by 10-15% across multiple evaluation metrics.
- [30] arXiv:2507.23317 [pdf, other]
-
Title: Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math LearnerComments: 33 pages, 3 figures, 19 tablesSubjects: Machine Learning (cs.LG)
Large reasoning models (LRMs) have recently shown promise in solving complex math problems when optimized with Reinforcement Learning (RL). But conventional approaches rely on outcome-only rewards that provide sparse feedback, resulting in inefficient optimization process. In this work, we investigate the function of process reward models (PRMs) to accelerate the RL training for LRMs. We propose a novel intrinsic signal-driven generative process evaluation mechanism operating at the thought level to address major bottlenecks in RL-based training. Specifically, instead of requiring PRMs to know how to solve problems, our method uses intrinsic signals in solutions to judge stepwise correctness and aggregate contiguous correct/incorrect steps into coherent 'thought' units. This structured, thought-level rewards enable more reliable credit assignment by reducing ambiguity in step segmentation and alleviating reward hacking. We further introduce a capability-adaptive reward mechanism that dynamically balances exploration and exploitation based on the LRM's current proficiency, guiding learning without stifling creative trial-and-error. These innovations are integrated into a new off-policy RL algorithm, TP-GRPO, which extends grouped proximal optimization with process-based rewards and improves training efficiency. Experiments on 1.5B and 7B parameter LRMs demonstrate that our method achieves higher problem-solving accuracy with significantly fewer training samples than outcome-only reward baselines. The results validate that well-structured process rewards can substantially accelerate LRM optimization in math reasoning tasks. Code is available at this https URL.
- [31] arXiv:2507.23335 [pdf, html, other]
-
Title: Scalable and Precise Patch Robustness Certification for Deep Learning Models with Top-k PredictionsComments: accepted by QRS 2025Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
Patch robustness certification is an emerging verification approach for defending against adversarial patch attacks with provable guarantees for deep learning systems. Certified recovery techniques guarantee the prediction of the sole true label of a certified sample. However, existing techniques, if applicable to top-k predictions, commonly conduct pairwise comparisons on those votes between labels, failing to certify the sole true label within the top k prediction labels precisely due to the inflation on the number of votes controlled by the attacker (i.e., attack budget); yet enumerating all combinations of vote allocation suffers from the combinatorial explosion problem. We propose CostCert, a novel, scalable, and precise voting-based certified recovery defender. CostCert verifies the true label of a sample within the top k predictions without pairwise comparisons and combinatorial explosion through a novel design: whether the attack budget on the sample is infeasible to cover the smallest total additional votes on top of the votes uncontrollable by the attacker to exclude the true labels from the top k prediction labels. Experiments show that CostCert significantly outperforms the current state-of-the-art defender PatchGuard, such as retaining up to 57.3% in certified accuracy when the patch size is 96, whereas PatchGuard has already dropped to zero.
- [32] arXiv:2507.23344 [pdf, html, other]
-
Title: Designing Dynamic Pricing for Bike-sharing Systems via Differentiable Agent-based SimulationSubjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Bike-sharing systems are emerging in various cities as a new ecofriendly transportation system. In these systems, spatiotemporally varying user demands lead to imbalanced inventory at bicycle stations, resulting in additional relocation costs. Therefore, it is essential to manage user demand through optimal dynamic pricing for the system. However, optimal pricing design for such a system is challenging because the system involves users with diverse backgrounds and their probabilistic choices. To address this problem, we develop a differentiable agent-based simulation to rapidly design dynamic pricing in bike-sharing systems, achieving balanced bicycle inventory despite spatiotemporally heterogeneous trips and probabilistic user decisions. We first validate our approach against conventional methods through numerical experiments involving 25 bicycle stations and five time slots, yielding 100 parameters. Compared to the conventional methods, our approach obtains a more accurate solution with a 73% to 78% reduction in loss while achieving more than a 100-fold increase in convergence speed. We further validate our approach on a large-scale urban bike-sharing system scenario involving 289 bicycle stations, resulting in a total of 1156 parameters. Through simulations using the obtained pricing policies, we confirm that these policies can naturally induce balanced inventory without any manual relocation. Additionally, we find that the cost of discounts to induce the balanced inventory can be minimized by setting appropriate initial conditions.
- [33] arXiv:2507.23389 [pdf, html, other]
-
Title: Causal Explanation of Concept Drift -- A Truly Actionable ApproachComments: This manuscript is accepted to be presented at the TempXAI workshop at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2025)Subjects: Machine Learning (cs.LG)
In a world that constantly changes, it is crucial to understand how those changes impact different systems, such as industrial manufacturing or critical infrastructure. Explaining critical changes, referred to as concept drift in the field of machine learning, is the first step towards enabling targeted interventions to avoid or correct model failures, as well as malfunctions and errors in the physical world. Therefore, in this work, we extend model-based drift explanations towards causal explanations, which increases the actionability of the provided explanations. We evaluate our explanation strategy on a number of use cases, demonstrating the practical usefulness of our framework, which isolates the causally relevant features impacted by concept drift and, thus, allows for targeted intervention.
- [34] arXiv:2507.23391 [pdf, html, other]
-
Title: Policy Learning from Large Vision-Language Model Feedback without Reward ModelingComments: Accepted to IROS 2025Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Offline reinforcement learning (RL) provides a powerful framework for training robotic agents using pre-collected, suboptimal datasets, eliminating the need for costly, time-consuming, and potentially hazardous online interactions. This is particularly useful in safety-critical real-world applications, where online data collection is expensive and impractical. However, existing offline RL algorithms typically require reward labeled data, which introduces an additional bottleneck: reward function design is itself costly, labor-intensive, and requires significant domain expertise. In this paper, we introduce PLARE, a novel approach that leverages large vision-language models (VLMs) to provide guidance signals for agent training. Instead of relying on manually designed reward functions, PLARE queries a VLM for preference labels on pairs of visual trajectory segments based on a language task description. The policy is then trained directly from these preference labels using a supervised contrastive preference learning objective, bypassing the need to learn explicit reward models. Through extensive experiments on robotic manipulation tasks from the MetaWorld, PLARE achieves performance on par with or surpassing existing state-of-the-art VLM-based reward generation methods. Furthermore, we demonstrate the effectiveness of PLARE in real-world manipulation tasks with a physical robot, further validating its practical applicability.
- [35] arXiv:2507.23412 [pdf, other]
-
Title: A Machine Learning Approach for Honey Adulteration Detection using Mineral Element ProfilesSubjects: Machine Learning (cs.LG)
This paper aims to develop a Machine Learning (ML)-based system for detecting honey adulteration utilizing honey mineral element profiles. The proposed system comprises two phases: preprocessing and classification. The preprocessing phase involves the treatment of missing-value attributes and normalization. In the classifica-tion phase, we use three supervised ML models: logistic regression, decision tree, and random forest, to dis-criminate between authentic and adulterated honey. To evaluate the performance of the ML models, we use a public dataset comprising measurements of mineral element content of authentic honey, sugar syrups, and adul-terated honey. Experimental findings show that mineral element content in honey provides robust discriminative information for detecting honey adulteration. Results also demonstrate that the random forest-based classifier outperforms other classifiers on this dataset, achieving the highest cross-validation accuracy of 98.37%.
- [36] arXiv:2507.23418 [pdf, other]
-
Title: Detection of Adulteration in Coconut Milk using Infrared Spectroscopy and Machine LearningSubjects: Machine Learning (cs.LG)
In this paper, we propose a system for detecting adulteration in coconut milk, utilizing infrared spectroscopy. The machine learning-based proposed system comprises three phases: preprocessing, feature extraction, and classification. The first phase involves removing irrelevant data from coconut milk spectral signals. In the second phase, we employ the Linear Discriminant Analysis (LDA) algorithm for extracting the most discriminating features. In the third phase, we use the K-Nearest Neighbor (KNN) model to classify coconut milk samples into authentic or adulterated. We evaluate the performance of the proposed system using a public dataset comprising Fourier Transform Infrared (FTIR) spectral information of pure and contaminated coconut milk samples. Findings show that the proposed method successfully detects adulteration with a cross-validation accuracy of 93.33%.
- [37] arXiv:2507.23428 [pdf, html, other]
-
Title: Merging Memory and Space: A Spatiotemporal State Space Neural OperatorSubjects: Machine Learning (cs.LG)
We propose the Spatiotemporal State Space Neural Operator (ST-SSM), a compact architecture for learning solution operators of time-dependent partial differential equations (PDEs). ST-SSM introduces a novel factorization of the spatial and temporal dimensions, using structured state-space models to independently model temporal evolution and spatial interactions. This design enables parameter efficiency and flexible modeling of long-range spatiotemporal dynamics. A theoretical connection is established between SSMs and neural operators, and a unified universality theorem is proved for the resulting class of architectures. Empirically, we demonstrate that our factorized formulation outperforms alternative schemes such as zigzag scanning and parallel independent processing on several PDE benchmarks, including 1D Burgers' equation, 1D Kuramoto-Sivashinsky equation, and 2D Navier-Stokes equations under varying physical conditions. Our model performs competitively with existing baselines while using significantly fewer parameters. In addition, our results reinforce previous findings on the benefits of temporal memory by showing improved performance under partial observability. Our results highlight the advantages of dimensionally factorized operator learning for efficient and generalizable PDE modeling, and put this approach on a firm theoretical footing.
- [38] arXiv:2507.23437 [pdf, html, other]
-
Title: Coflex: Enhancing HW-NAS with Sparse Gaussian Processes for Efficient and Scalable DNN Accelerator DesignComments: Accepted to ICCAD 2025 (camera-ready); 9 pages, 5 figuresSubjects: Machine Learning (cs.LG)
Hardware-Aware Neural Architecture Search (HW-NAS) is an efficient approach to automatically co-optimizing neural network performance and hardware energy efficiency, making it particularly useful for the development of Deep Neural Network accelerators on the edge. However, the extensive search space and high computational cost pose significant challenges to its practical adoption. To address these limitations, we propose Coflex, a novel HW-NAS framework that integrates the Sparse Gaussian Process (SGP) with multi-objective Bayesian optimization. By leveraging sparse inducing points, Coflex reduces the GP kernel complexity from cubic to near-linear with respect to the number of training samples, without compromising optimization performance. This enables scalable approximation of large-scale search space, substantially decreasing computational overhead while preserving high predictive accuracy. We evaluate the efficacy of Coflex across various benchmarks, focusing on accelerator-specific architecture. Our experi- mental results show that Coflex outperforms state-of-the-art methods in terms of network accuracy and Energy-Delay-Product, while achieving a computational speed-up ranging from 1.9x to 9.5x.
- [39] arXiv:2507.23449 [pdf, html, other]
-
Title: Manifold-regularised Signature Kernel Large-Margin $\ell_p$-SVDD for Multidimensional Time Series Anomaly DetectionSubjects: Machine Learning (cs.LG)
We generalise the recently introduced large-margin $\ell_p$-SVDD approach to exploit the geometry of data distribution via manifold regularising and a signature kernel representation for time series anomaly detection. Specifically, we formulate a manifold-regularised variant of the $\ell_p$-SVDD method to encourage label smoothness on the underlying manifold to capture structural information for improved detection performance. Drawing on an existing Representer theorem, we then provide an effective optimisation technique for the proposed method and show that it can benefit from the signature kernel to capture time series complexities for anomaly detection.
We theoretically study the proposed approach using Rademacher complexities to analyse its generalisation performance and also provide an experimental assessment of the proposed method across various data sets to compare its performance against other methods. - [40] arXiv:2507.23491 [pdf, other]
-
Title: Explainable artificial intelligence model predicting the risk of all-cause mortality in patients with type 2 diabetes mellitusOlga Vershinina, Jacopo Sabbatinelli, Anna Rita Bonfigli, Dalila Colombaretti, Angelica Giuliani, Mikhail Krivonosov, Arseniy Trukhanov, Claudio Franceschi, Mikhail Ivanchenko, Fabiola OlivieriSubjects: Machine Learning (cs.LG)
Objective. Type 2 diabetes mellitus (T2DM) is a highly prevalent non-communicable chronic disease that substantially reduces life expectancy. Accurate estimation of all-cause mortality risk in T2DM patients is crucial for personalizing and optimizing treatment strategies. Research Design and Methods. This study analyzed a cohort of 554 patients (aged 40-87 years) with diagnosed T2DM over a maximum follow-up period of 16.8 years, during which 202 patients (36%) died. Key survival-associated features were identified, and multiple machine learning (ML) models were trained and validated to predict all-cause mortality risk. To improve model interpretability, Shapley additive explanations (SHAP) was applied to the best-performing model. Results. The extra survival trees (EST) model, incorporating ten key features, demonstrated the best predictive performance. The model achieved a C-statistic of 0.776, with the area under the receiver operating characteristic curve (AUC) values of 0.86, 0.80, 0.841, and 0.826 for 5-, 10-, 15-, and 16.8-year all-cause mortality predictions, respectively. The SHAP approach was employed to interpret the model's individual decision-making processes. Conclusions. The developed model exhibited strong predictive performance for mortality risk assessment. Its clinically interpretable outputs enable potential bedside application, improving the identification of high-risk patients and supporting timely treatment optimization.
- [41] arXiv:2507.23495 [pdf, html, other]
-
Title: Incorporating structural uncertainty in causal decision makingComments: This work is under review at the Journal of Causal InferenceSubjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Practitioners making decisions based on causal effects typically ignore structural uncertainty. We analyze when this uncertainty is consequential enough to warrant methodological solutions (Bayesian model averaging over competing causal structures). Focusing on bivariate relationships ($X \rightarrow Y$ vs. $X \leftarrow Y$), we establish that model averaging is beneficial when: (1) structural uncertainty is moderate to high, (2) causal effects differ substantially between structures, and (3) loss functions are sufficiently sensitive to the size of the causal effect. We prove optimality results of our suggested methodological solution under regularity conditions and demonstrate through simulations that modern causal discovery methods can provide, within limits, the necessary quantification. Our framework complements existing robust causal inference approaches by addressing a distinct source of uncertainty typically overlooked in practice.
- [42] arXiv:2507.23501 [pdf, html, other]
-
Title: Directional Ensemble Aggregation for Actor-CriticsSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Off-policy reinforcement learning in continuous control tasks depends critically on accurate $Q$-value estimates. Conservative aggregation over ensembles, such as taking the minimum, is commonly used to mitigate overestimation bias. However, these static rules are coarse, discard valuable information from the ensemble, and cannot adapt to task-specific needs or different learning regimes. We propose Directional Ensemble Aggregation (DEA), an aggregation method that adaptively combines $Q$-value estimates in actor-critic frameworks. DEA introduces two fully learnable directional parameters: one that modulates critic-side conservatism and another that guides actor-side policy exploration. Both parameters are learned using ensemble disagreement-weighted Bellman errors, which weight each sample solely by the direction of its Bellman error. This directional learning mechanism allows DEA to adjust conservatism and exploration in a data-driven way, adapting aggregation to both uncertainty levels and the phase of training. We evaluate DEA across continuous control benchmarks and learning regimes - from interactive to sample-efficient - and demonstrate its effectiveness over static ensemble strategies.
- [43] arXiv:2507.23504 [pdf, html, other]
-
Title: A Verifier HierarchyComments: This paper is primarily relevant to cs.CC, but submitted under this http URL due to lack of endorsement. The paper is under review at "Information and Communication"Subjects: Machine Learning (cs.LG)
We investigate the trade-off between certificate length and verifier runtime. We prove a Verifier Trade-off Theorem showing that reducing the inherent verification time of a language from \(f(n)\) to \(g(n)\), where \(f(n) \ge g(n)\), requires certificates of length at least \(\Omega(\log(f(n) / g(n)))\). This theorem induces a natural hierarchy based on certificate complexity. We demonstrate its applicability to analyzing conjectured separations between complexity classes (e.g., \(\np\) and \(\exptime\)) and to studying natural problems such as string periodicity and rotation detection. Additionally, we provide perspectives on the \(\p\) vs. \(\np\) problem by relating it to the existence of sub-linear certificates.
- [44] arXiv:2507.23512 [pdf, html, other]
-
Title: Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping LevelComments: 60 pagesSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Gradient clipping is a fundamental tool in Deep Learning, improving the high-probability convergence of stochastic first-order methods like SGD, AdaGrad, and Adam under heavy-tailed noise, which is common in training large language models. It is also a crucial component of Differential Privacy (DP) mechanisms. However, existing high-probability convergence analyses typically require the clipping threshold to increase with the number of optimization steps, which is incompatible with standard DP mechanisms like the Gaussian mechanism. In this work, we close this gap by providing the first high-probability convergence analysis for DP-Clipped-SGD with a fixed clipping level, applicable to both convex and non-convex smooth optimization under heavy-tailed noise, characterized by a bounded central $\alpha$-th moment assumption, $\alpha \in (1,2]$. Our results show that, with a fixed clipping level, the method converges to a neighborhood of the optimal solution with a faster rate than the existing ones. The neighborhood can be balanced against the noise introduced by DP, providing a refined trade-off between convergence speed and privacy guarantees.
- [45] arXiv:2507.23534 [pdf, html, other]
-
Title: Continual Learning with Synthetic Boundary Experience BlendingSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Continual learning (CL) aims to address catastrophic forgetting in models trained sequentially on multiple tasks. While experience replay has shown promise, its effectiveness is often limited by the sparse distribution of stored key samples, leading to overly simplified decision boundaries. We hypothesize that introducing synthetic data near the decision boundary (Synthetic Boundary Data, or SBD) during training serves as an implicit regularizer, improving boundary stability and mitigating forgetting. To validate this hypothesis, we propose a novel training framework, {\bf Experience Blending}, which integrates knowledge from both stored key samples and synthetic, boundary-adjacent data. Experience blending consists of two core components: (1) a multivariate Differential Privacy (DP) noise mechanism that injects batch-wise noise into low-dimensional feature representations, generating SBD; and (2) an end-to-end training strategy that jointly leverages both stored key samples and SBD. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet demonstrate that our method outperforms nine CL baselines, achieving accuracy improvements of 10%, 6%, and 13%, respectively.
- [46] arXiv:2507.23535 [pdf, html, other]
-
Title: Transparent AI: The Case for Interpretability and ExplainabilitySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
As artificial intelligence systems increasingly inform high-stakes decisions across sectors, transparency has become foundational to responsible and trustworthy AI implementation. Leveraging our role as a leading institute in advancing AI research and enabling industry adoption, we present key insights and lessons learned from practical interpretability applications across diverse domains. This paper offers actionable strategies and implementation guidance tailored to organizations at varying stages of AI maturity, emphasizing the integration of interpretability as a core design principle rather than a retrospective add-on.
- [47] arXiv:2507.23536 [pdf, html, other]
-
Title: From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge DevicesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Parameter-efficient fine-tuning (PEFT) methods reduce the computational costs of updating deep learning models by minimizing the number of additional parameters used to adapt a model to a down- stream task. While extensively researched in large language models (LLMs), their application to smaller models used on edge devices, such as convolutional neural networks, remains underexplored. This paper benchmarks and analyzes popular PEFT methods on convolutional architectures typically deployed in resource-constrained edge environments. We evaluate LoRA, DoRA, and GaLore for updating standard and depthwise convolutional architectures to handle distribution shifts and accommodate unseen classes. We utilize recently proposed PyTorch profilers to compare the updated model performance and computational costs of these PEFT methods with traditional fine-tuning approaches. With resource efficiency in mind, we investigate their update behavior across different rank dimensions. We find that the evaluated PEFT methods are only half as memory-efficient when applied to depthwise-separable convolution architectures, compared to their efficiency with LLMs. Conversely, when targeting convolu- tional architectures optimized for edge deployment, adapter-based PEFT methods can reduce floating point operations (FLOPs) during model updates by up to 95%. These insights offer valuable guidance for selecting PEFT methods based on hardware constraints, performance requirements, and application needs. Our code is online.
- [48] arXiv:2507.23539 [pdf, html, other]
-
Title: Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity AssumptionsComments: Published in ICLR 2025Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Motivated by the problem of fast processing of attention matrices, we study fast algorithms for computing matrix-vector products for asymmetric Gaussian Kernel matrices $K\in \mathbb{R}^{n\times n}$. $K$'s columns are indexed by a set of $n$ keys $k_1,k_2\ldots, k_n\in \mathbb{R}^d$, rows by a set of $n$ queries $q_1,q_2,\ldots,q_n\in \mathbb{R}^d $, and its $i,j$ entry is $K_{ij} = e^{-\|q_i-k_j\|_2^2/2\sigma^2}$ for some bandwidth parameter $\sigma>0$. Given a vector $x\in \mathbb{R}^n$ and error parameter $\epsilon>0$, our task is to output a $y\in \mathbb{R}^n$ such that $\|Kx-y\|_2\leq \epsilon \|x\|_2$ in time subquadratic in $n$ and linear in $d$. Our algorithms rely on the following modelling assumption about the matrices $K$: the sum of the entries of $K$ scales linearly in $n$, as opposed to worst case quadratic growth. We validate this assumption experimentally, for Gaussian kernel matrices encountered in various settings such as fast attention computation in LLMs. We obtain the first subquadratic-time algorithm that works under this assumption, for unrestricted vectors.
- [49] arXiv:2507.23562 [pdf, html, other]
-
Title: Hardware-Aware Fine-Tuning of Spiking Q-Networks on the SpiNNaker2 Neuromorphic PlatformComments: 8 pages, 5 figures, 3 tablesJournal-ref: ACM ICONS 2025 - International Conference on Neuromorphic SystemsSubjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
Spiking Neural Networks (SNNs) promise orders-of-magnitude lower power consumption and low-latency inference on neuromorphic hardware for a wide range of robotic tasks. In this work, we present an energy-efficient implementation of a reinforcement learning (RL) algorithm using quantized SNNs to solve two classical control tasks. The network is trained using the Q-learning algorithm, then fine-tuned and quantized to low-bit (8-bit) precision for embedded deployment on the SpiNNaker2 neuromorphic chip. To evaluate the comparative advantage of SpiNNaker2 over conventional computing platforms, we analyze inference latency, dynamic power consumption, and energy cost per inference for our SNN models, comparing performance against a GTX 1650 GPU baseline. Our results demonstrate SpiNNaker2's strong potential for scalable, low-energy neuromorphic computing, achieving up to 32x reduction in energy consumption. Inference latency remains on par with GPU-based execution, with improvements observed in certain task settings, reinforcing SpiNNaker2's viability for real-time neuromorphic control and making the neuromorphic approach a compelling direction for efficient deep Q-learning.
- [50] arXiv:2507.23568 [pdf, html, other]
-
Title: Optimised Feature Subset Selection via Simulated AnnealingFernando Martínez-García, Álvaro Rubio-García, Samuel Fernández-Lorenzo, Juan José García-Ripoll, Diego PorrasComments: 12 pages, 2 figuresSubjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
We introduce SA-FDR, a novel algorithm for $\ell_0$-norm feature selection that considers this task as a combinatorial optimisation problem and solves it by using simulated annealing to perform a global search over the space of feature subsets. The optimisation is guided by the Fisher discriminant ratio, which we use as a computationally efficient proxy for model quality in classification tasks. Our experiments, conducted on datasets with up to hundreds of thousands of samples and hundreds of features, demonstrate that SA-FDR consistently selects more compact feature subsets while achieving a high predictive accuracy. This ability to recover informative yet minimal sets of features stems from its capacity to capture inter-feature dependencies often missed by greedy optimisation approaches. As a result, SA-FDR provides a flexible and effective solution for designing interpretable models in high-dimensional settings, particularly when model sparsity, interpretability, and performance are crucial.
- [51] arXiv:2507.23581 [pdf, html, other]
-
Title: GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement LearningChuanyue Yu, Kuo Zhao, Yuhan Li, Heng Chang, Mingjian Feng, Xiangzhe Jiang, Yufei Sun, Jia Li, Yuzhi Zhang, Jianxin Li, Ziwei ZhangSubjects: Machine Learning (cs.LG)
Graph Retrieval-Augmented Generation (GraphRAG) has shown great effectiveness in enhancing the reasoning abilities of LLMs by leveraging graph structures for knowledge representation and modeling complex real-world relationships. However, existing GraphRAG methods still face significant bottlenecks when handling complex problems that require multi-hop reasoning, as their query and retrieval phases are largely based on pre-defined heuristics and do not fully utilize the reasoning potentials of LLMs. To address this problem, we propose GraphRAG-R1, an adaptive GraphRAG framework by training LLMs with process-constrained outcome-based reinforcement learning (RL) to enhance the multi-hop reasoning ability. Our method can decompose complex problems, autonomously invoke retrieval tools to acquire necessary information, and perform effective reasoning. Specifically, we utilize a modified version of Group Relative Policy Optimization (GRPO) that supports rollout-with-thinking capability. Next, we design two process-constrained reward functions. To handle the shallow retrieval problem, we design a Progressive Retrieval Attenuation (PRA) reward to encourage essential retrievals. Then, to handle the over-thinking problem, we design Cost-Aware F1 (CAF) reward to balance the model performance with computational costs. We further design a phase-dependent training strategy, containing three training stages corresponding to cold start and these two rewards. Lastly, our method adopts a hybrid graph-textual retrieval to improve the reasoning capacity. Extensive experimental results demonstrate that GraphRAG-R1 boosts LLM capabilities in solving complex reasoning problems compared to state-of-the-art GraphRAG methods on both in-domain and out-of-domain datasets. Furthermore, our framework can be flexibly integrated with various existing retrieval methods, consistently delivering performance improvements.
- [52] arXiv:2507.23600 [pdf, html, other]
-
Title: EB-gMCR: Energy-Based Generative Modeling for Signal Unmixing and Multivariate Curve ResolutionSubjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Signal unmixing analysis decomposes data into basic patterns and is widely applied in chemical and biological research. Multivariate curve resolution (MCR), a branch of signal unmixing, separates mixed chemical signals into base patterns (components) and their concentrations, playing a key role in understanding composition. Classical MCR is typically framed as matrix factorization (MF) and requires a user-specified component count, usually unknown in real data. As dataset size or component count increases, the scalability and reliability of MF-based MCR face significant challenges. This study reformulates MCR as a generative process (gMCR), and introduces an energy-based deep learning solver, EB-gMCR, that automatically discovers the smallest component set able to reconstruct the data faithfully. EB-gMCR starts from a large candidate pool (e.g., 1024 spectra) and employs a differentiable gating network to retain only active components while estimating their concentrations. On noisy synthetic datasets containing up to 256 latent sources, EB-gMCR maintained R^2 >= 0.98 and recovered the component count within 5% of the ground truth; at lower noise it achieved R^2 >= 0.99 with near exact component estimation. Additional chemical priors, such as non-negativity or nonlinear mixing, enter as simple plug-in functions, enabling adaptation to other instruments or domains without altering the core learning process. By uniting high-capacity generative modeling and hard component selection, EB-gMCR offers a practical route to large-scale signal unmixing analysis, including chemical library-driven scenarios. The source code is available at this https URL.
- [53] arXiv:2507.23604 [pdf, html, other]
-
Title: Hierarchical Message-Passing Policies for Multi-Agent Reinforcement LearningSubjects: Machine Learning (cs.LG)
Decentralized Multi-Agent Reinforcement Learning (MARL) methods allow for learning scalable multi-agent policies, but suffer from partial observability and induced non-stationarity. These challenges can be addressed by introducing mechanisms that facilitate coordination and high-level planning. Specifically, coordination and temporal abstraction can be achieved through communication (e.g., message passing) and Hierarchical Reinforcement Learning (HRL) approaches to decision-making. However, optimization issues limit the applicability of hierarchical policies to multi-agent systems. As such, the combination of these approaches has not been fully explored. To fill this void, we propose a novel and effective methodology for learning multi-agent hierarchies of message-passing policies. We adopt the feudal HRL framework and rely on a hierarchical graph structure for planning and coordination among agents. Agents at lower levels in the hierarchy receive goals from the upper levels and exchange messages with neighboring agents at the same level. To learn hierarchical multi-agent policies, we design a novel reward-assignment method based on training the lower-level policies to maximize the advantage function associated with the upper levels. Results on relevant benchmarks show that our method performs favorably compared to the state of the art.
- [54] arXiv:2507.23607 [pdf, html, other]
-
Title: Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty EstimatesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Clinical trials are a systematic endeavor to assess the safety and efficacy of new drugs or treatments. Conducting such trials typically demands significant financial investment and meticulous planning, highlighting the need for accurate predictions of trial outcomes. Accurately predicting patient enrollment, a key factor in trial success, is one of the primary challenges during the planning phase. In this work, we propose a novel deep learning-based method to address this critical challenge. Our method, implemented as a neural network model, leverages pre-trained language models (PLMs) to capture the complexities and nuances of clinical documents, transforming them into expressive representations. These representations are then combined with encoded tabular features via an attention mechanism. To account for uncertainties in enrollment prediction, we enhance the model with a probabilistic layer based on the Gamma distribution, which enables range estimation. We apply the proposed model to predict clinical trial duration, assuming site-level enrollment follows a Poisson-Gamma process. We carry out extensive experiments on real-world clinical trial data, and show that the proposed method can effectively predict the number of patients enrolled at a number of sites for a given clinical trial, outperforming established baseline models.
- [55] arXiv:2507.23615 [pdf, html, other]
-
Title: L-GTA: Latent Generative Modeling for Time Series AugmentationJournal-ref: Machine Learning in Time Series (MILETS) Workshop at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 3-7, 2025, Toronto, ON, CanadaSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Data augmentation is gaining importance across various aspects of time series analysis, from forecasting to classification and anomaly detection tasks. We introduce the Latent Generative Transformer Augmentation (L-GTA) model, a generative approach using a transformer-based variational recurrent autoencoder. This model uses controlled transformations within the latent space of the model to generate new time series that preserve the intrinsic properties of the original dataset. L-GTA enables the application of diverse transformations, ranging from simple jittering to magnitude warping, and combining these basic transformations to generate more complex synthetic time series datasets. Our evaluation of several real-world datasets demonstrates the ability of L-GTA to produce more reliable, consistent, and controllable augmented data. This translates into significant improvements in predictive accuracy and similarity measures compared to direct transformation methods.
- [56] arXiv:2507.23632 [pdf, html, other]
-
Title: On the Expressiveness of Softmax Attention: A Recurrent Neural Network PerspectiveSubjects: Machine Learning (cs.LG)
Since its introduction, softmax attention has become the backbone of modern transformer architectures due to its expressiveness and scalability across a wide range of tasks. However, the main drawback of softmax attention is the quadratic memory requirement and computational complexity with respect to the sequence length. By replacing the softmax nonlinearity, linear attention and similar methods have been introduced to avoid the quadratic bottleneck of softmax attention. Despite these linear forms of attention being derived from the original softmax formulation, they typically lag in terms of downstream accuracy. While strong intuition of the softmax nonlinearity on the query and key inner product suggests that it has desirable properties compared to other nonlinearities, the question of why this discrepancy exists still remains unanswered. This work demonstrates that linear attention is an approximation of softmax attention by deriving the recurrent form of softmax attention. Using this form, each part of softmax attention can be described in the language of recurrent neural networks (RNNs). Describing softmax attention as an RNN allows for the ablation of the components of softmax attention to understand the importance of each part and how they interact. In this way, our work helps explain why softmax attention is more expressive than its counterparts.
- [57] arXiv:2507.23638 [pdf, html, other]
-
Title: OptiGradTrust: Byzantine-Robust Federated Learning with Multi-Feature Gradient Analysis and Reinforcement Learning-Based Trust WeightingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Federated Learning (FL) enables collaborative model training across distributed medical institutions while preserving patient privacy, but remains vulnerable to Byzantine attacks and statistical heterogeneity. We present OptiGradTrust, a comprehensive defense framework that evaluates gradient updates through a novel six-dimensional fingerprint including VAE reconstruction error, cosine similarity metrics, $L_2$ norm, sign-consistency ratio, and Monte Carlo Shapley value, which drive a hybrid RL-attention module for adaptive trust scoring. To address convergence challenges under data heterogeneity, we develop FedBN-Prox (FedBN-P), combining Federated Batch Normalization with proximal regularization for optimal accuracy-convergence trade-offs. Extensive evaluation across MNIST, CIFAR-10, and Alzheimer's MRI datasets under various Byzantine attack scenarios demonstrates significant improvements over state-of-the-art defenses, achieving up to +1.6 percentage points over FLGuard under non-IID conditions while maintaining robust performance against diverse attack patterns through our adaptive learning approach.
- [58] arXiv:2507.23665 [pdf, html, other]
-
Title: SHAP-Guided Regularization in Machine Learning ModelsSubjects: Machine Learning (cs.LG)
Feature attribution methods such as SHapley Additive exPlanations (SHAP) have become instrumental in understanding machine learning models, but their role in guiding model optimization remains underexplored. In this paper, we propose a SHAP-guided regularization framework that incorporates feature importance constraints into model training to enhance both predictive performance and interpretability. Our approach applies entropy-based penalties to encourage sparse, concentrated feature attributions while promoting stability across samples. The framework is applicable to both regression and classification tasks. Our first exploration started with investigating a tree-based model regularization using TreeSHAP. Through extensive experiments on benchmark regression and classification datasets, we demonstrate that our method improves generalization performance while ensuring robust and interpretable feature attributions. The proposed technique offers a novel, explainability-driven regularization approach, making machine learning models both more accurate and more reliable.
- [59] arXiv:2507.23674 [pdf, html, other]
-
Title: TweakLLM: A Routing Architecture for Dynamic Tailoring of Cached ResponsesMuhammad Taha Cheema, Abeer Aamir, Khawaja Gul Muhammad, Naveed Anwar Bhatti, Ihsan Ayyub Qazi, Zafar Ayyub QaziComments: 13 pages, 9 figuresSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Large Language Models (LLMs) process millions of queries daily, making efficient response caching a compelling optimization for reducing cost and latency. However, preserving relevance to user queries using this approach proves difficult due to the personalized nature of chatbot interactions and the limited accuracy of semantic similarity search. To address this, we present TweakLLM, a novel routing architecture that employs a lightweight LLM to dynamically adapt cached responses to incoming prompts. Through comprehensive evaluation, including user studies with side-by-side comparisons, satisfaction voting, as well as multi-agent LLM debates, we demonstrate that TweakLLM maintains response quality comparable to frontier models while significantly improving cache effectiveness. Our results across real-world datasets highlight TweakLLM as a scalable, resource-efficient caching solution for high-volume LLM deployments without compromising user experience.
- [60] arXiv:2507.23675 [pdf, html, other]
-
Title: One-Step Flow Policy Mirror DescentSubjects: Machine Learning (cs.LG)
Diffusion policies have achieved great success in online reinforcement learning (RL) due to their strong expressive capacity. However, the inference of diffusion policy models relies on a slow iterative sampling process, which limits their responsiveness. To overcome this limitation, we propose Flow Policy Mirror Descent (FPMD), an online RL algorithm that enables 1-step sampling during policy inference. Our approach exploits a theoretical connection between the distribution variance and the discretization error of single-step sampling in straight interpolation flow matching models, and requires no extra distillation or consistency training. We present two algorithm variants based on flow policy and MeanFlow policy parametrizations, respectively. Extensive empirical evaluations on MuJoCo benchmarks demonstrate that our algorithms show strong performance comparable to diffusion policy baselines while requiring hundreds of times fewer function evaluations during inference.
- [61] arXiv:2507.23676 [pdf, html, other]
-
Title: DepMicroDiff: Diffusion-Based Dependency-Aware Multimodal Imputation for Microbiome DataSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Microbiome data analysis is essential for understanding host health and disease, yet its inherent sparsity and noise pose major challenges for accurate imputation, hindering downstream tasks such as biomarker discovery. Existing imputation methods, including recent diffusion-based models, often fail to capture the complex interdependencies between microbial taxa and overlook contextual metadata that can inform imputation. We introduce DepMicroDiff, a novel framework that combines diffusion-based generative modeling with a Dependency-Aware Transformer (DAT) to explicitly capture both mutual pairwise dependencies and autoregressive relationships. DepMicroDiff is further enhanced by VAE-based pretraining across diverse cancer datasets and conditioning on patient metadata encoded via a large language model (LLM). Experiments on TCGA microbiome datasets show that DepMicroDiff substantially outperforms state-of-the-art baselines, achieving higher Pearson correlation (up to 0.712), cosine similarity (up to 0.812), and lower RMSE and MAE across multiple cancer types, demonstrating its robustness and generalizability for microbiome imputation.
- [62] arXiv:2507.23712 [pdf, html, other]
-
Title: Anomalous Samples for Few-Shot Anomaly DetectionSubjects: Machine Learning (cs.LG)
Several anomaly detection and classification methods rely on large amounts of non-anomalous or "normal" samples under the assump- tion that anomalous data is typically harder to acquire. This hypothesis becomes questionable in Few-Shot settings, where as little as one anno- tated sample can make a significant difference. In this paper, we tackle the question of utilizing anomalous samples in training a model for bi- nary anomaly classification. We propose a methodology that incorporates anomalous samples in a multi-score anomaly detection score leveraging recent Zero-Shot and memory-based techniques. We compare the utility of anomalous samples to that of regular samples and study the benefits and limitations of each. In addition, we propose an augmentation-based validation technique to optimize the aggregation of the different anomaly scores and demonstrate its effectiveness on popular industrial anomaly detection datasets.
- [63] arXiv:2507.23756 [pdf, other]
-
Title: Improving annotator selection in Active Learning using a mood and fatigue-aware Recommender SystemSubjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
This study centers on overcoming the challenge of selecting the best annotators for each query in Active Learning (AL), with the objective of minimizing misclassifications. AL recognizes the challenges related to cost and time when acquiring labeled data, and decreases the number of labeled data needed. Nevertheless, there is still the necessity to reduce annotation errors, aiming to be as efficient as possible, to achieve the expected accuracy faster. Most strategies for query-annotator pairs do not consider internal factors that affect productivity, such as mood, attention, motivation, and fatigue levels. This work addresses this gap in the existing literature, by not only considering how the internal factors influence annotators (mood and fatigue levels) but also presenting a new query-annotator pair strategy, using a Knowledge-Based Recommendation System (RS). The RS ranks the available annotators, allowing to choose one or more to label the queried instance using their past accuracy values, and their mood and fatigue levels, as well as information about the instance queried. This work bases itself on existing literature on mood and fatigue influence on human performance, simulating annotators in a realistic manner, and predicting their performance with the RS. The results show that considering past accuracy values, as well as mood and fatigue levels reduces the number of annotation errors made by the annotators, and the uncertainty of the model through its training, when compared to not using internal factors. Accuracy and F1-score values were also better in the proposed approach, despite not being as substantial as the aforementioned. The methodologies and findings presented in this study begin to explore the open challenge of human cognitive factors affecting AL.
- [64] arXiv:2507.23771 [pdf, html, other]
-
Title: Consensus-Driven Active Model SelectionComments: ICCV 2025 Highlight. 16 pages, 8 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
The widespread availability of off-the-shelf machine learning models poses a challenge: which model, of the many available candidates, should be chosen for a given data analysis task? This question of model selection is traditionally answered by collecting and annotating a validation dataset -- a costly and time-intensive process. We propose a method for active model selection, using predictions from candidate models to prioritize the labeling of test data points that efficiently differentiate the best candidate. Our method, CODA, performs consensus-driven active model selection by modeling relationships between classifiers, categories, and data points within a probabilistic framework. The framework uses the consensus and disagreement between models in the candidate pool to guide the label acquisition process, and Bayesian inference to update beliefs about which model is best as more information is collected. We validate our approach by curating a collection of 26 benchmark tasks capturing a range of model selection scenarios. CODA outperforms existing methods for active model selection significantly, reducing the annotation effort required to discover the best model by upwards of 70% compared to the previous state-of-the-art. Code and data are available at this https URL.
New submissions (showing 64 of 64 entries)
- [65] arXiv:1908.01212 (cross-list from math.CT) [pdf, html, other]
-
Title: Typing Tensor Calculus in 2-Categories (I)Comments: 28 pages; extended introduction, more explanationSubjects: Category Theory (math.CT); Machine Learning (cs.LG); Software Engineering (cs.SE)
To formalize calculations in linear algebra for the development of efficient algorithms and a framework suitable for functional programming languages and faster parallelized computations, we adopt an approach that treats elements of linear algebra, such as matrices, as morphisms in the category of matrices, $\mathbf{Mat_{k}}$. This framework is further extended by generalizing the results to arbitrary monoidal semiadditive categories. To enrich this perspective and accommodate higher-rank matrices (tensors), we define semiadditive 2-categories, where matrices $T_{ij}$ are represented as 1-morphisms, and tensors with four indices $T_{ijkl}$ as 2-morphisms. This formalization provides an index-free, typed linear algebra framework that includes matrices and tensors with up to four indices. Furthermore, we extend the framework to monoidal semiadditive 2-categories and demonstrate detailed operations and vectorization within the 2-category of 2Vec introduced by Kapranov and Voevodsky.
- [66] arXiv:2507.22906 (cross-list from eess.SP) [pdf, html, other]
-
Title: DNN-based Methods of Jointly Sensing Number and Directions of Targets via a Green Massive H2AD MIMO ReceiverSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
As a green MIMO structure, the heterogeneous hybrid analog-digital H2AD MIMO architecture has been shown to own a great potential to replace the massive or extremely large-scale fully-digital MIMO in the future wireless networks to address the three challenging problems faced by the latter: high energy consumption, high circuit cost, and high complexity. However, how to intelligently sense the number and direction of multi-emitters via such a structure is still an open hard problem. To address this, we propose a two-stage sensing framework that jointly estimates the number and direction values of multiple targets. Specifically, three target number sensing methods are designed: an improved eigen-domain clustering (EDC) framework, an enhanced deep neural network (DNN) based on five key statistical features, and an improved one-dimensional convolutional neural network (1D-CNN) utilizing full eigenvalues. Subsequently, a low-complexity and high-accuracy DOA estimation is achieved via the introduced online micro-clustering (OMC-DOA) method. Furthermore, we derive the Cramér-Rao lower bound (CRLB) for the H2AD under multiple-source conditions as a theoretical performance benchmark. Simulation results show that the developed three methods achieve 100\% number of targets sensing at moderate-to-high SNRs, while the improved 1D-CNN exhibits superior under extremely-low SNR conditions. The introduced OMC-DOA outperforms existing clustering and fusion-based DOA methods in multi-source environments.
- [67] arXiv:2507.22908 (cross-list from q-fin.CP) [pdf, html, other]
-
Title: A Privacy-Preserving Federated Framework with Hybrid Quantum-Enhanced Learning for Financial Fraud DetectionAbhishek Sawaika, Swetang Krishna, Tushar Tomar, Durga Pritam Suggisetti, Aditi Lal, Tanmaya Shrivastav, Nouhaila Innan, Muhammad ShafiqueComments: To be published in proceedings of IEEE International Conference on Quantum Computing and Engineering (QCE) 2025Subjects: Computational Finance (q-fin.CP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Rapid growth of digital transactions has led to a surge in fraudulent activities, challenging traditional detection methods in the financial sector. To tackle this problem, we introduce a specialised federated learning framework that uniquely combines a quantum-enhanced Long Short-Term Memory (LSTM) model with advanced privacy preserving techniques. By integrating quantum layers into the LSTM architecture, our approach adeptly captures complex cross-transactional patters, resulting in an approximate 5% performance improvement across key evaluation metrics compared to conventional models. Central to our framework is "FedRansel", a novel method designed to defend against poisoning and inference attacks, thereby reducing model degradation and inference accuracy by 4-8%, compared to standard differential privacy mechanisms. This pseudo-centralised setup with a Quantum LSTM model, enhances fraud detection accuracy and reinforces the security and confidentiality of sensitive financial data.
- [68] arXiv:2507.22912 (cross-list from cs.CL) [pdf, html, other]
-
Title: A Language Model-Driven Semi-Supervised Ensemble Framework for Illicit Market Detection Across Deep/Dark Web and Social PlatformsNavid Yazdanjue, Morteza Rakhshaninejad, Hossein Yazdanjouei, Mohammad Sadegh Khorshidi, Mikko S. Niemela, Fang Chen, Amir H. GandomiComments: 16 pages, 5 figures, 9 tablesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Illegal marketplaces have increasingly shifted to concealed parts of the internet, including the deep and dark web, as well as platforms such as Telegram, Reddit, and Pastebin. These channels enable the anonymous trade of illicit goods including drugs, weapons, and stolen credentials. Detecting and categorizing such content remains challenging due to limited labeled data, the evolving nature of illicit language, and the structural heterogeneity of online sources. This paper presents a hierarchical classification framework that combines fine-tuned language models with a semi-supervised ensemble learning strategy to detect and classify illicit marketplace content across diverse platforms. We extract semantic representations using ModernBERT, a transformer model for long documents, finetuned on domain-specific data from deep and dark web pages, Telegram channels, Subreddits, and Pastebin pastes to capture specialized jargon and ambiguous linguistic patterns. In addition, we incorporate manually engineered features such as document structure, embedded patterns including Bitcoin addresses, emails, and IPs, and metadata, which complement language model embeddings. The classification pipeline operates in two stages. The first stage uses a semi-supervised ensemble of XGBoost, Random Forest, and SVM with entropy-based weighted voting to detect sales-related documents. The second stage further classifies these into drug, weapon, or credential sales. Experiments on three datasets, including our multi-source corpus, DUTA, and CoDA, show that our model outperforms several baselines, including BERT, ModernBERT, DarkBERT, ALBERT, Longformer, and BigBird. The model achieves an accuracy of 0.96489, an F1-score of 0.93467, and a TMCC of 0.95388, demonstrating strong generalization, robustness under limited supervision, and effectiveness in real-world illicit content detection.
- [69] arXiv:2507.22918 (cross-list from cs.CL) [pdf, html, other]
-
Title: Semantic Convergence: Investigating Shared Representations Across Scaled LLMsDaniel Son, Sanjana Rathore, Andrew Rufail, Adrian Simon, Daniel Zhang, Soham Dave, Cole Blondin, Kevin Zhu, Sean O'BrienComments: Submitted to ACL 2025 Student Research Workshop (poster)Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
We investigate feature universality in Gemma-2 language models (Gemma-2-2B and Gemma-2-9B), asking whether models with a four-fold difference in scale still converge on comparable internal concepts. Using the Sparse Autoencoder (SAE) dictionary-learning pipeline, we utilize SAEs on each model's residual-stream activations, align the resulting monosemantic features via activation correlation, and compare the matched feature spaces with SVCCA and RSA. Middle layers yield the strongest overlap, while early and late layers show far less similarity. Preliminary experiments extend the analysis from single tokens to multi-token subspaces, showing that semantically similar subspaces interact similarly with language models. These results strengthen the case that large language models carve the world into broadly similar, interpretable features despite size differences, reinforcing universality as a foundation for cross-model interpretability.
- [70] arXiv:2507.22941 (cross-list from cs.CL) [pdf, html, other]
-
Title: SigBERT: Combining Narrative Medical Reports and Rough Path Signature Theory for Survival Risk Estimation in OncologyComments: 12 pages, 2 figures, accepted for ECML PKDD 2025Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)
Electronic medical reports (EHR) contain a vast amount of information that can be leveraged for machine learning applications in healthcare. However, existing survival analysis methods often struggle to effectively handle the complexity of textual data, particularly in its sequential form. Here, we propose SigBERT, an innovative temporal survival analysis framework designed to efficiently process a large number of clinical reports per patient. SigBERT processes timestamped medical reports by extracting and averaging word embeddings into sentence embeddings. To capture temporal dynamics from the time series of sentence embedding coordinates, we apply signature extraction from rough path theory to derive geometric features for each patient, which significantly enhance survival model performance by capturing complex temporal dynamics. These features are then integrated into a LASSO-penalized Cox model to estimate patient-specific risk scores. The model was trained and evaluated on a real-world oncology dataset from the Léon Bérard Center corpus, with a C-index score of 0.75 (sd 0.014) on the independent test cohort. SigBERT integrates sequential medical data to enhance risk estimation, advancing narrative-based survival analysis.
- [71] arXiv:2507.22947 (cross-list from cs.CY) [pdf, other]
-
Title: ELMES: An Automated Framework for Evaluating Large Language Models in Educational ScenariosShou'ang Wei, Xinyun Wang, Shuzhen Bi, Jian Chen, Ruijia Li, Bo Jiang, Xin Lin, Min Zhang, Yu Song, BingDong Li, Aimin Zhou, Hao HaoSubjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Machine Learning (cs.LG)
The emergence of Large Language Models (LLMs) presents transformative opportunities for education, generating numerous novel application scenarios. However, significant challenges remain: evaluation metrics vary substantially across different educational scenarios, while many emerging scenarios lack appropriate assessment metrics. Current benchmarks predominantly measure general intelligence rather than pedagogical capabilities. To address this gap, we introduce ELMES, an open-source automated evaluation framework specifically designed for assessing LLMs in educational settings. ELMES features a modular architecture that enables researchers to create dynamic, multi-agent dialogues through simple configuration files, facilitating flexible scenario design without requiring extensive programming expertise. The framework incorporates a hybrid evaluation engine that objectively quantifies traditionally subjective pedagogical metrics using an LLM-as-a-Judge methodology. We conduct systematic benchmarking of state-of-the-art LLMs across four critical educational scenarios: Knowledge Point Explanation, Guided Problem-Solving Teaching, Interdisciplinary Lesson Plan Generation, and Contextualized Question Generation, employing fine-grained metrics developed in collaboration with education specialists. Our results demonstrate distinct capability distributions among models, revealing context-specific strengths and limitations. ELMES provides educators and researchers with an accessible evaluation framework that significantly reduces adaptation barriers for diverse educational applications while advancing the practical implementation of LLMs in pedagogy. The framework is publicly available at \emph{this https URL}.
- [72] arXiv:2507.22951 (cross-list from cs.AI) [pdf, html, other]
-
Title: Unifying Post-hoc Explanations of Knowledge Graph CompletionsSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Post-hoc explainability for Knowledge Graph Completion (KGC) lacks formalization and consistent evaluations, hindering reproducibility and cross-study comparisons. This paper argues for a unified approach to post-hoc explainability in KGC. First, we propose a general framework to characterize post-hoc explanations via multi-objective optimization, balancing their effectiveness and conciseness. This unifies existing post-hoc explainability algorithms in KGC and the explanations they produce. Next, we suggest and empirically support improved evaluation protocols using popular metrics like Mean Reciprocal Rank and Hits@$k$. Finally, we stress the importance of interpretability as the ability of explanations to address queries meaningful to end-users. By unifying methods and refining evaluation standards, this work aims to make research in KGC explainability more reproducible and impactful.
- [73] arXiv:2507.22952 (cross-list from cs.HC) [pdf, html, other]
-
Title: Automated Label Placement on Maps via Large Language ModelsComments: Workshop on AI for Data Editing (AI4DE) at KDD 2025Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Label placement is a critical aspect of map design, serving as a form of spatial annotation that directly impacts clarity and interpretability. Despite its importance, label placement remains largely manual and difficult to scale, as existing automated systems struggle to integrate cartographic conventions, adapt to context, or interpret labeling instructions. In this work, we introduce a new paradigm for automatic label placement (ALP) that formulates the task as a data editing problem and leverages large language models (LLMs) for context-aware spatial annotation. To support this direction, we curate MAPLE, the first known benchmarking dataset for evaluating ALP on real-world maps, encompassing diverse landmark types and label placement annotations from open-source data. Our method retrieves labeling guidelines relevant to each landmark type leveraging retrieval-augmented generation (RAG), integrates them into prompts, and employs instruction-tuned LLMs to generate ideal label coordinates. We evaluate four open-source LLMs on MAPLE, analyzing both overall performance and generalization across different types of landmarks. This includes both zero-shot and instruction-tuned performance. Our results demonstrate that LLMs, when guided by structured prompts and domain-specific retrieval, can learn to perform accurate spatial edits, aligning the generated outputs with expert cartographic standards. Overall, our work presents a scalable framework for AI-assisted map finishing and demonstrates the potential of foundation models in structured data editing tasks. The code and data can be found at this https URL.
- [74] arXiv:2507.22955 (cross-list from cs.SI) [pdf, html, other]
-
Title: LLMs Between the Nodes: Community Discovery Beyond VectorsSubjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
Community detection in social network graphs plays a vital role in uncovering group dynamics, influence pathways, and the spread of information. Traditional methods focus primarily on graph structural properties, but recent advancements in Large Language Models (LLMs) open up new avenues for integrating semantic and contextual information into this task. In this paper, we present a detailed investigation into how various LLM-based approaches perform in identifying communities within social graphs. We introduce a two-step framework called CommLLM, which leverages the GPT-4o model along with prompt-based reasoning to fuse language model outputs with graph structure. Evaluations are conducted on six real-world social network datasets, measuring performance using key metrics such as Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), Variation of Information (VOI), and cluster purity. Our findings reveal that LLMs, particularly when guided by graph-aware strategies, can be successfully applied to community detection tasks in small to medium-sized graphs. We observe that the integration of instruction-tuned models and carefully engineered prompts significantly improves the accuracy and coherence of detected communities. These insights not only highlight the potential of LLMs in graph-based research but also underscore the importance of tailoring model interactions to the specific structure of graph data.
- [75] arXiv:2507.22958 (cross-list from cs.CV) [pdf, html, other]
-
Title: CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State ExamComments: 15 pages, 3 figures, 10 tables. Code is available at: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This paper introduces a novel benchmark, EGE-Math Solutions Assessment Benchmark, for evaluating Vision-Language Models (VLMs) on their ability to assess hand-written mathematical solutions. Unlike existing benchmarks that focus on problem solving, our approach centres on understanding student solutions, identifying mistakes, and assigning grades according to fixed criteria. We compile 122 scanned solutions from the Russian Unified State Exam (EGE) together with official expert grades, and evaluate seven modern VLMs from Google, OpenAI, Arcee AI, and Alibaba Cloud in three inference modes. The results reveal current limitations in mathematical reasoning and human-rubric alignment, opening new research avenues in AI-assisted assessment. You can find code in this https URL
- [76] arXiv:2507.23015 (cross-list from cs.RO) [pdf, html, other]
-
Title: Learning to Prune Branches in Modern Tree-Fruit OrchardsSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
Dormant tree pruning is labor-intensive but essential to maintaining modern highly-productive fruit orchards. In this work we present a closed-loop visuomotor controller for robotic pruning. The controller guides the cutter through a cluttered tree environment to reach a specified cut point and ensures the cutters are perpendicular to the branch. We train the controller using a novel orchard simulation that captures the geometric distribution of branches in a target apple orchard configuration. Unlike traditional methods requiring full 3D reconstruction, our controller uses just optical flow images from a wrist-mounted camera. We deploy our learned policy in simulation and the real-world for an example V-Trellis envy tree with zero-shot transfer, achieving a 30% success rate -- approximately half the performance of an oracle planner.
- [77] arXiv:2507.23017 (cross-list from stat.ML) [pdf, html, other]
-
Title: A Smoothing Newton Method for Rank-one Matrix RecoveryComments: 12 pages, 4 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
We consider the phase retrieval problem, which involves recovering a rank-one positive semidefinite matrix from rank-one measurements. A recently proposed algorithm based on Bures-Wasserstein gradient descent (BWGD) exhibits superlinear convergence, but it is unstable, and existing theory can only prove local linear convergence for higher rank matrix recovery. We resolve this gap by revealing that BWGD implements Newton's method with a nonsmooth and nonconvex objective. We develop a smoothing framework that regularizes the objective, enabling a stable method with rigorous superlinear convergence guarantees. Experiments on synthetic data demonstrate this superior stability while maintaining fast convergence.
- [78] arXiv:2507.23018 (cross-list from cs.AI) [pdf, html, other]
-
Title: Data Readiness for Scientific AI at ScaleWesley Brewer, Patrick Widener, Valentine Anantharaj, Feiyi Wang, Tom Beck, Arjun Shankar, Sarp OralComments: 10 pages, 1 figure, 2 tablesSubjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
This paper examines how Data Readiness for AI (DRAI) principles apply to leadership-scale scientific datasets used to train foundation models. We analyze archetypal workflows across four representative domains - climate, nuclear fusion, bio/health, and materials - to identify common preprocessing patterns and domain-specific constraints. We introduce a two-dimensional readiness framework composed of Data Readiness Levels (raw to AI-ready) and Data Processing Stages (ingest to shard), both tailored to high performance computing (HPC) environments. This framework outlines key challenges in transforming scientific data for scalable AI training, emphasizing transformer-based generative models. Together, these dimensions form a conceptual maturity matrix that characterizes scientific data readiness and guides infrastructure development toward standardized, cross-domain support for scalable and reproducible AI for science.
- [79] arXiv:2507.23042 (cross-list from cs.CV) [pdf, html, other]
-
Title: Early Goal-Guided Multi-Scale Fusion for Real-Time Vision-Language DrivingComments: 6 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Robotics (cs.RO)
Autonomous vehicles must react in milliseconds while reasoning about road geometry and traffic intent to navigate complex situations. We introduce NovaDrive, a single-branch vision-language architecture that processes front-camera images, HD-map tiles, LiDAR depth, and textual waypoints in a single branch. A lightweight, two-stage cross-attention block first aligns waypoint tokens with the HD map, then refines attention over fine-grained image and depth patches. Coupled with a novel smoothness loss that discourages abrupt steering and speed changes, this design eliminates the need for recurrent memory. We fine-tune the top 15 layers of an 11B LLaMA-3.2 vision-language backbone, enabling real-time inference. On the nuScenes / Waymo subset of the MD-NEX Outdoor benchmark, NovaDrive raises success rate to 84% (+4%), boosts path-efficiency (SPL) to 0.66 (+0.11), and reduces collision frequency from 2.6% to 1.2% (-1.4%) relative to the previous state-of-the-art. Our ablations confirm that waypoint tokens, partial VLM fine-tuning, and the cross-attention fusion each contribute the most to these gains. Beyond safety, NovaDrive's shorter routes (resulting from the novel smoothness loss) translate to lower fuel or battery usage, pointing toward leaner, more easily updated driving stacks. NovaDrive can be extended to other embodied-AI domains as well.
- [80] arXiv:2507.23064 (cross-list from cs.CV) [pdf, html, other]
-
Title: Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & WaypointsComments: 5 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Autonomous cars need geometric accuracy and semantic understanding to navigate complex environments, yet most stacks handle them separately. We present XYZ-Drive, a single vision-language model that reads a front-camera frame, a 25m $\times$ 25m overhead map, and the next waypoint, then outputs steering and speed. A lightweight goal-centered cross-attention layer lets waypoint tokens highlight relevant image and map patches, supporting both action and textual explanations, before the fused tokens enter a partially fine-tuned LLaMA-3.2 11B model.
On the MD-NEX Outdoor-Driving benchmark XYZ-Drive attains 95% success and 0.80 Success weighted by Path Length (SPL), surpassing PhysNav-DG by 15%. and halving collisions, all while significantly improving efficiency by using only a single branch. Sixteen ablations explain the gains. Removing any modality (vision, waypoint, map) drops success by up to 11%, confirming their complementary roles and rich connections. Replacing goal-centered attention with simple concatenation cuts 3% in performance, showing query-based fusion injects map knowledge more effectively. Keeping the transformer frozen loses 5%, showing the importance of fine-tuning when applying VLMs for specific tasks such as autonomous driving. Coarsening map resolution from 10 cm to 40 cm blurs lane edges and raises crash rate.
Overall, these results demonstrate that early, token-level fusion of intent and map layout enables accurate, transparent, real-time driving. - [81] arXiv:2507.23104 (cross-list from cs.CL) [pdf, other]
-
Title: RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Despite advances in large language model (LLM)-based natural language interfaces for databases, scaling to enterprise-level data catalogs remains an under-explored challenge. Prior works addressing this challenge rely on domain-specific fine-tuning - complicating deployment - and fail to leverage important semantic context contained within database metadata. To address these limitations, we introduce a component-based retrieval architecture that decomposes database schemas and metadata into discrete semantic units, each separately indexed for targeted retrieval. Our approach prioritizes effective table identification while leveraging column-level information, ensuring the total number of retrieved tables remains within a manageable context budget. Experiments demonstrate that our method maintains high recall and accuracy, with our system outperforming baselines over massive databases with varying structure and available metadata. Our solution enables practical text-to-SQL systems deployable across diverse enterprise settings without specialized fine-tuning, addressing a critical scalability gap in natural language database interfaces.
- [82] arXiv:2507.23155 (cross-list from math.OC) [pdf, html, other]
-
Title: On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel OptimizationSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
In this paper, we study the problem of solving a simple bilevel optimization problem, where the upper-level objective is minimized over the solution set of the lower-level problem. We focus on the general setting in which both the upper- and lower-level objectives are smooth but potentially nonconvex. Due to the absence of additional structural assumptions for the lower-level objective-such as convexity or the Polyak-Łojasiewicz (PL) condition-guaranteeing global optimality is generally intractable. Instead, we introduce a suitable notion of stationarity for this class of problems and aim to design a first-order algorithm that finds such stationary points in polynomial time. Intuitively, stationarity in this setting means the upper-level objective cannot be substantially improved locally without causing a larger deterioration in the lower-level objective. To this end, we show that a simple and implementable variant of the dynamic barrier gradient descent (DBGD) framework can effectively solve the considered nonconvex simple bilevel problems up to stationarity. Specifically, to reach an $(\epsilon_f, \epsilon_g)$-stationary point-where $\epsilon_f$ and $\epsilon_g$ denote the target stationarity accuracies for the upper- and lower-level objectives, respectively-the considered method achieves a complexity of $\mathcal{O}\left(\max\left(\epsilon_f^{-\frac{3+p}{1+p}}, \epsilon_g^{-\frac{3+p}{2}}\right)\right)$, where $p \geq 0$ is an arbitrary constant balancing the terms. To the best of our knowledge, this is the first complexity result for a discrete-time algorithm that guarantees joint stationarity for both levels in general nonconvex simple bilevel problems.
- [83] arXiv:2507.23160 (cross-list from cond-mat.mtrl-sci) [pdf, html, other]
-
Title: Extended Factorization Machine Annealing for Rapid Discovery of Transparent Conducting MaterialsComments: 12pages, 6figuresSubjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
The development of novel transparent conducting materials (TCMs) is essential for enhancing the performance and reducing the cost of next-generation devices such as solar cells and displays. In this research, we focus on the (Al$_x$Ga$_y$In$_z$)$_2$O$_3$ system and extend the FMA framework, which combines a Factorization Machine (FM) and annealing, to search for optimal compositions and crystal structures with high accuracy and low cost. The proposed method introduces (i) the binarization of continuous variables, (ii) the utilization of good solutions using a Hopfield network, (iii) the activation of global search through adaptive random flips, and (iv) fine-tuning via a bit-string local search. Validation using the (Al$_x$Ga$_y$In$_z$)$_2$O$_3$ data from the Kaggle "Nomad2018 Predicting Transparent Conductors" competition demonstrated that our method achieves faster and more accurate searches than Bayesian optimization and genetic algorithms. Furthermore, its application to multi-objective optimization showed its capability in designing materials by simultaneously considering both the band gap and formation energy. These results suggest that applying our method to larger, more complex search problems and diverse material designs that reflect realistic experimental conditions is expected to contribute to the further advancement of materials informatics.
- [84] arXiv:2507.23167 (cross-list from cs.CL) [pdf, html, other]
-
Title: LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer IntegrationSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Large Language Models (LLMs) have demonstrated impressive performance across various tasks, with different models excelling in distinct domains and specific abilities. Effectively combining the predictions of multiple LLMs is crucial for enhancing system robustness and performance. However, existing ensemble methods often rely on simple techniques like voting or logits ensembling, which overlook the varying confidence and reliability of models in different contexts. In this work, we propose LENS (Learning ENsemble confidence from Neural States), a novel approach that learns to estimate model confidence by analyzing internal representations. For each LLM, we train a lightweight linear confidence predictor that leverages layer-wise hidden states and normalized probabilities as inputs. This allows for more nuanced weighting of model predictions based on their context-dependent reliability. Our method does not require modifying the model parameters and requires negligible additional computation. Experimental results on multiple-choice and boolean question-answering tasks demonstrate that LENS outperforms traditional ensemble methods by a substantial margin. Our findings suggest that internal representations provide valuable signals for determining model confidence and can be effectively leveraged for ensemble learning.
- [85] arXiv:2507.23174 (cross-list from cs.CV) [pdf, html, other]
-
Title: CNN-based solution for mango classification in agricultural environmentsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This article exemplifies the design of a fruit detection and classification system using Convolutional
Neural Networks (CNN). The goal is to develop a system that automatically assesses fruit quality for
farm inventory management. Specifically, a method for mango fruit classification was developed using
image processing, ensuring both accuracy and efficiency. Resnet-18 was selected as the preliminary
architecture for classification, while a cascade detector was used for detection, balancing execution speed
and computational resource consumption. Detection and classification results were displayed through a
graphical interface developed in MatLab App Designer, streamlining system interaction. The integration
of convolutional neural networks and cascade detectors proffers a reliable solution for fruit classification
and detection, with potential applications in agricultural quality control. - [86] arXiv:2507.23194 (cross-list from cs.CL) [pdf, html, other]
-
Title: Geak: Introducing Triton Kernel AI Agent & Evaluation BenchmarksJianghui Wang, Vinay Joshi, Saptarshi Majumder, Xu Chao, Bin Ding, Ziqiong Liu, Pratik Prabhanjan Brahma, Dong Li, Zicheng Liu, Emad BarsoumSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The demand for AI-generated GPU kernels is rapidly growing, influenced by the need for scalable, hardware-optimized solutions in both industry and academia. As deep learning workloads grow in complexity and diversity, it is imperative to automate low-level kernel development to meet performance and productivity demands. Major cloud providers, semiconductor companies, and research institutions are now investing heavily in AI-driven code generation for GPUs, aiming to reduce manual optimization efforts while achieving near-expert performance on hardware like AMD MI300X. The Triton language, a Python-based DSL for GPU programming, has emerged as a popular target for such AI-generated kernels due to its balance of performance and ease-of-coding. In this work, we present an evaluation suite for Triton-based GPU kernels and GEAK (Generating Efficient AI-centric GPU Kernels)-a framework that leverages cutting-edge LLMs to generate performant Triton code specifically for AMD GPUs, including the AMD MI300X and MI250. GEAK leverages inference-time compute scaling to produce Triton-based GPU kernels using a reasoning loop adapted from Reflexion-style feedback mechanisms. On two evaluation benchmarks, GEAK significantly outperformed the baselines of directly prompting frontier LLMs as well as Reflexion-based generation pipelines by achieving correctness up to $63$% and execution speed up of up to $2.59$X. These results highlight the promise of GEAK-like agentic code generation for accelerating the adoption of diverse hardware platforms and democratizing access to expert-level kernel performance.
- [87] arXiv:2507.23208 (cross-list from cs.IR) [pdf, html, other]
-
Title: Are Recommenders Self-Aware? Label-Free Recommendation Performance Estimation via Model UncertaintySubjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Can a recommendation model be self-aware? This paper investigates the recommender's self-awareness by quantifying its uncertainty, which provides a label-free estimation of its performance. Such self-assessment can enable more informed understanding and decision-making before the recommender engages with any users. To this end, we propose an intuitive and effective method, probability-based List Distribution uncertainty (LiDu). LiDu measures uncertainty by determining the probability that a recommender will generate a certain ranking list based on the prediction distributions of individual items. We validate LiDu's ability to represent model self-awareness in two settings: (1) with a matrix factorization model on a synthetic dataset, and (2) with popular recommendation algorithms on real-world datasets. Experimental results show that LiDu is more correlated with recommendation performance than a series of label-free performance estimators. Additionally, LiDu provides valuable insights into the dynamic inner states of models throughout training and inference. This work establishes an empirical connection between recommendation uncertainty and performance, framing it as a step towards more transparent and self-evaluating recommender systems.
- [88] arXiv:2507.23209 (cross-list from cs.IR) [pdf, html, other]
-
Title: Not Just What, But When: Integrating Irregular Intervals to LLM for Sequential RecommendationComments: Accepted by RecSys 2025 short paper trackSubjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Time intervals between purchasing items are a crucial factor in sequential recommendation tasks, whereas existing approaches focus on item sequences and often overlook by assuming the intervals between items are static. However, dynamic intervals serve as a dimension that describes user profiling on not only the history within a user but also different users with the same item history. In this work, we propose IntervalLLM, a novel framework that integrates interval information into LLM and incorporates the novel interval-infused attention to jointly consider information of items and intervals. Furthermore, unlike prior studies that address the cold-start scenario only from the perspectives of users and items, we introduce a new viewpoint: the interval perspective to serve as an additional metric for evaluating recommendation methods on the warm and cold scenarios. Extensive experiments on 3 benchmarks with both traditional- and LLM-based baselines demonstrate that our IntervalLLM achieves not only 4.4% improvements in average but also the best-performing warm and cold scenarios across all users, items, and the proposed interval perspectives. In addition, we observe that the cold scenario from the interval perspective experiences the most significant performance drop among all recommendation methods. This finding underscores the necessity of further research on interval-based cold challenges and our integration of interval information in the realm of sequential recommendation tasks. Our code is available here: this https URL.
- [89] arXiv:2507.23220 (cross-list from cs.CL) [pdf, html, other]
-
Title: Model Directions, Not Words: Mechanistic Topic Models Using Sparse AutoencodersCarolina Zheng, Nicolas Beltran-Velez, Sweta Karlekar, Claudia Shi, Achille Nazaret, Asif Mallik, Amir Feder, David M. BleiSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Traditional topic models are effective at uncovering latent themes in large text collections. However, due to their reliance on bag-of-words representations, they struggle to capture semantically abstract features. While some neural variants use richer representations, they are similarly constrained by expressing topics as word lists, which limits their ability to articulate complex topics. We introduce Mechanistic Topic Models (MTMs), a class of topic models that operate on interpretable features learned by sparse autoencoders (SAEs). By defining topics over this semantically rich space, MTMs can reveal deeper conceptual themes with expressive feature descriptions. Moreover, uniquely among topic models, MTMs enable controllable text generation using topic-based steering vectors. To properly evaluate MTM topics against word-list-based approaches, we propose \textit{topic judge}, an LLM-based pairwise comparison evaluation framework. Across five datasets, MTMs match or exceed traditional and neural baselines on coherence metrics, are consistently preferred by topic judge, and enable effective steering of LLM outputs.
- [90] arXiv:2507.23227 (cross-list from cs.CL) [pdf, html, other]
-
Title: Enabling Few-Shot Alzheimer's Disease Diagnosis on Tabular Biomarker Data with LLMsSophie Kearney, Shu Yang, Zixuan Wen, Bojian Hou, Duy Duong-Tran, Tianlong Chen, Jason Moore, Marylyn Ritchie, Li ShenSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Early and accurate diagnosis of Alzheimer's disease (AD), a complex neurodegenerative disorder, requires analysis of heterogeneous biomarkers (e.g., neuroimaging, genetic risk factors, cognitive tests, and cerebrospinal fluid proteins) typically represented in a tabular format. With flexible few-shot reasoning, multimodal integration, and natural-language-based interpretability, large language models (LLMs) offer unprecedented opportunities for prediction with structured biomedical data. We propose a novel framework called TAP-GPT, Tabular Alzheimer's Prediction GPT, that adapts TableGPT2, a multimodal tabular-specialized LLM originally developed for business intelligence tasks, for AD diagnosis using structured biomarker data with small sample sizes. Our approach constructs few-shot tabular prompts using in-context learning examples from structured biomedical data and finetunes TableGPT2 using the parameter-efficient qLoRA adaption for a clinical binary classification task of AD or cognitively normal (CN). The TAP-GPT framework harnesses the powerful tabular understanding ability of TableGPT2 and the encoded prior knowledge of LLMs to outperform more advanced general-purpose LLMs and a tabular foundation model (TFM) developed for prediction tasks. To our knowledge, this is the first application of LLMs to the prediction task using tabular biomarker data, paving the way for future LLM-driven multi-agent frameworks in biomedical informatics.
- [91] arXiv:2507.23242 (cross-list from cs.CV) [pdf, html, other]
-
Title: Generalized Reinforcement Learning for Retriever-Specific Query Rewriter with Unstructured Real-World DocumentsSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Retrieval-Augmented Generation (RAG) systems rely heavily on effective query formulation to unlock external knowledge, yet optimizing queries for diverse, unstructured real-world documents remains a challenge. We introduce \textbf{RL-QR}, a reinforcement learning framework for retriever-specific query rewriting that eliminates the need for human-annotated datasets and extends applicability to both text-only and multi-modal databases. By synthesizing scenario-question pairs and leveraging Generalized Reward Policy Optimization (GRPO), RL-QR trains query rewriters tailored to specific retrievers, enhancing retrieval performance across varied domains. Experiments on industrial in-house data demonstrate significant improvements, with $\text{RL-QR}_{\text{multi-modal}}$ achieving an 11\% relative gain in NDCG@3 for multi-modal RAG and $\text{RL-QR}_{\text{lexical}}$ yielding a 9\% gain for lexical retrievers. However, challenges persist with semantic and hybrid retrievers, where rewriters failed to improve performance, likely due to training misalignments. Our findings highlight RL-QR's potential to revolutionize query optimization for RAG systems, offering a scalable, annotation-free solution for real-world retrieval tasks, while identifying avenues for further refinement in semantic retrieval contexts.
- [92] arXiv:2507.23248 (cross-list from cs.CL) [pdf, html, other]
-
Title: Evaluating LLMs' Multilingual Capabilities for Bengali: Benchmark Creation and Performance AnalysisSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Bengali is an underrepresented language in NLP research. However, it remains a challenge due to its unique linguistic structure and computational constraints. In this work, we systematically investigate the challenges that hinder Bengali NLP performance by focusing on the absence of standardized evaluation benchmarks. We then evaluated 10 recent open source Large Language Models (LLMs) in 8 of the translated datasets and performed a comprehensive error analysis to pinpoint their primary failure modes. Our findings reveal consistent performance gaps for Bengali compared to English, particularly for smaller models and specific model families like Mistral. We also identified promising robustness in certain architectures, such as DeepSeek, that maintain more stable performance across languages. Our analysis reveals an inverse relationship between tokenization efficiency and LLM accuracy where models tend to perform worse when inputs are excessively tokenized, whereas more efficient \& concise tokenization results in improved performance. These findings highlight critical areas where current models fall short and underscore the need for improved dataset quality and evaluation methodologies tailored to multilingual contexts. This work will catalyze further research on NLP for underrepresented languages, helping to democratize access to advanced language technologies worldwide. The code and dataset used in this research is publicly available at this https URL.
- [93] arXiv:2507.23297 (cross-list from physics.data-an) [pdf, html, other]
-
Title: Simulation-based inference for Precision Neutrino Physics through Neural Monte Carlo tuningA. Gavrikov, A. Serafini, D. Dolzhikov, A. Garfagnini, M. Gonchar, M. Grassi, R. Brugnera, V. Cerrone, L. V. D'Auria, R. M. Guizzetti, L. Lastrucci, G. Andronico, V. Antonelli, A. Barresi, D. Basilico, M. Beretta, A. Bergnoli, M. Borghesi, A. Brigatti, R. Bruno, A. Budano, B. Caccianiga, A. Cammi, R. Caruso, D. Chiesa, C. Clementi, C. Coletta, S. Dusini, A. Fabbri, G. Felici, G. Ferrante, M.G. Giammarchi, N. Giudice, N. Guardone, F. Houria, C. Landini, I. Lippi, L. Loi, P. Lombardi, F. Mantovani, S.M. Mari, A. Martini, L. Miramonti, M. Montuschi, M. Nastasi, D. Orestano, F. Ortica, A. Paoloni, L. Pelicci, E. Percalli, F. Petrucci, E. Previtali, G. Ranucci, A.C. Re, B. Ricci, A. Romani, C. Sirignano, M. Sisti, L. Stanco, E. Stanescu Farilla, V. Strati, M.D.C Torri, C. Tuvè, C. Venettacci, G. Verde, L. VotanoSubjects: Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph); Instrumentation and Detectors (physics.ins-det)
Precise modeling of detector energy response is crucial for next-generation neutrino experiments which present computational challenges due to lack of analytical likelihoods. We propose a solution using neural likelihood estimation within the simulation-based inference framework. We develop two complementary neural density estimators that model likelihoods of calibration data: conditional normalizing flows and a transformer-based regressor. We adopt JUNO - a large neutrino experiment - as a case study. The energy response of JUNO depends on several parameters, all of which should be tuned, given their non-linear behavior and strong correlations in the calibration data. To this end, we integrate the modeled likelihoods with Bayesian nested sampling for parameter inference, achieving uncertainties limited only by statistics with near-zero systematic biases. The normalizing flows model enables unbinned likelihood analysis, while the transformer provides an efficient binned alternative. By providing both options, our framework offers flexibility to choose the most appropriate method for specific needs. Finally, our approach establishes a template for similar applications across experimental neutrino and broader particle physics.
- [94] arXiv:2507.23315 (cross-list from cs.CV) [pdf, html, other]
-
Title: Impact of Hyperparameter Optimization on the Accuracy of Lightweight Deep Learning Models for Real-Time Image ClassificationComments: 13 pages, 4 figures, 4 tables. Includes ablation study and evaluation on 7 lightweight deep learning models. Code and logs available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Lightweight convolutional and transformer-based models have become vital for real-time image classification in resource-constrained applications, such as embedded systems and edge devices. This work analyzes the influence of hyperparameter adjustment on the accuracy and convergence behavior of seven efficient deep learning architectures: EfficientNetV2-S, ConvNeXt-T, MobileViT v2 (XXS/XS/S), MobileNetV3-L, TinyViT-21M, and RepVGG-A2. All models are trained on the ImageNet-1K dataset under consistent training settings, with an emphasis on real-time practicality. An comprehensive ablation study is undertaken to separate the effect of critical hyperparameters, including learning rate schedules, batch sizes, input resolution, data augmentation, regularization approaches, and optimizer choice. To assess appropriateness for real-time applications, each model is assessed not only in terms of Top-1 and Top-5 classification accuracy, but also in terms of inference time, parameter count, model size, and frames-per-second (FPS) on a GPU-accelerated edge deployment simulation. Results demonstrate that cosine learning rate decay and adjustable batch size may greatly boost both accuracy and convergence speed, while keeping low latency and memory cost. Notably, RepVGG-A2 achieves over 80% Top-1 accuracy with efficient inference performance, offering a compelling balance between accuracy and deployment cost for VGG-style models. The results give practical guidance for constructing resource-efficient deep learning models appropriate for real-time image processing pipelines. All code and training logs are publicly accessible at this https URL.
- [95] arXiv:2507.23334 (cross-list from cs.CL) [pdf, html, other]
-
Title: MUST-RAG: MUSical Text Question Answering with Retrieval Augmented GenerationComments: 8 pages, 2 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Recent advancements in Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains. While they exhibit strong zero-shot performance on various tasks, LLMs' effectiveness in music-related applications remains limited due to the relatively small proportion of music-specific knowledge in their training data. To address this limitation, we propose MusT-RAG, a comprehensive framework based on Retrieval Augmented Generation (RAG) to adapt general-purpose LLMs for text-only music question answering (MQA) tasks. RAG is a technique that provides external knowledge to LLMs by retrieving relevant context information when generating answers to questions. To optimize RAG for the music domain, we (1) propose MusWikiDB, a music-specialized vector database for the retrieval stage, and (2) utilizes context information during both inference and fine-tuning processes to effectively transform general-purpose LLMs into music-specific models. Our experiment demonstrates that MusT-RAG significantly outperforms traditional fine-tuning approaches in enhancing LLMs' music domain adaptation capabilities, showing consistent improvements across both in-domain and out-of-domain MQA benchmarks. Additionally, our MusWikiDB proves substantially more effective than general Wikipedia corpora, delivering superior performance and computational efficiency.
- [96] arXiv:2507.23348 (cross-list from cs.SE) [pdf, other]
-
Title: SWE-Debate: Competitive Multi-Agent Debate for Software Issue ResolutionHan Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, Qianxiang WangComments: Our code and data are available at this https URLSubjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous, tool-using agents to tackle complex software engineering tasks. While existing agent-based issue resolution approaches are primarily based on agents' independent explorations, they often get stuck in local solutions and fail to identify issue patterns that span across different parts of the codebase. To address this limitation, we propose SWE-Debate, a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization. SWE-Debate first creates multiple fault propagation traces as localization proposals by traversing a code dependency graph. Then, it organizes a three-round debate among specialized agents, each embodying distinct reasoning perspectives along the fault propagation trace. This structured competition enables agents to collaboratively converge on a consolidated fix plan. Finally, this consolidated fix plan is integrated into an MCTS-based code modification agent for patch generation. Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results in open-source agent frameworks and outperforms baselines by a large margin.
- [97] arXiv:2507.23349 (cross-list from stat.ML) [pdf, html, other]
-
Title: Optimal Transport Learning: Balancing Value Optimization and Fairness in Individualized Treatment RulesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Individualized treatment rules (ITRs) have gained significant attention due to their wide-ranging applications in fields such as precision medicine, ridesharing, and advertising recommendations. However, when ITRs are influenced by sensitive attributes such as race, gender, or age, they can lead to outcomes where certain groups are unfairly advantaged or disadvantaged. To address this gap, we propose a flexible approach based on the optimal transport theory, which is capable of transforming any optimal ITR into a fair ITR that ensures demographic parity. Recognizing the potential loss of value under fairness constraints, we introduce an ``improved trade-off ITR," designed to balance value optimization and fairness while accommodating varying levels of fairness through parameter adjustment. To maximize the value of the improved trade-off ITR under specific fairness levels, we propose a smoothed fairness constraint for estimating the adjustable parameter. Additionally, we establish a theoretical upper bound on the value loss for the improved trade-off ITR. We demonstrate performance of the proposed method through extensive simulation studies and application to the Next 36 entrepreneurial program dataset.
- [98] arXiv:2507.23361 (cross-list from cs.SE) [pdf, other]
-
Title: SWE-Exp: Experience-Driven Software Issue ResolutionSilin Chen, Shaoxin Lin, Xiaodong Gu, Yuling Shi, Heng Lian, Longfei Yun, Dong Chen, Weiguo Sun, Lin Cao, Qianxiang WangComments: Our code and data are available at this https URLSubjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Recent advances in large language model (LLM) agents have shown remarkable progress in software issue resolution, leveraging advanced techniques such as multi-agent collaboration and Monte Carlo Tree Search (MCTS). However, current agents act as memoryless explorers - treating each problem separately without retaining or reusing knowledge from previous repair experiences. This leads to redundant exploration of failed trajectories and missed chances to adapt successful issue resolution methods to similar problems. To address this problem, we introduce SWE-Exp, an experience - enhanced approach that distills concise and actionable experience from prior agent trajectories, enabling continuous learning across issues. Our method introduces a multi-faceted experience bank that captures both successful and failed repair attempts. Specifically, it extracts reusable issue resolution knowledge at different levels - from high-level problem comprehension to specific code changes. Experiments show that SWE-Exp achieves state-of-the-art resolution rate (41.6% Pass@1) on SWE-bench-Verified under open-source agent frameworks. Our approach establishes a new paradigm in which automated software engineering agents systematically accumulate and leverage repair expertise, fundamentally shifting from trial-and-error exploration to strategic, experience-driven issue resolution.
- [99] arXiv:2507.23402 (cross-list from cs.CV) [pdf, html, other]
-
Title: AGA: An adaptive group alignment framework for structured medical cross-modal representation learningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Learning medical visual representations from paired images and reports is a promising direction in representation learning. However, current vision-language pretraining methods in the medical domain often simplify clinical reports into single entities or fragmented tokens, ignoring their inherent structure. In addition, contrastive learning frameworks typically depend on large quantities of hard negative samples, which is impractical for small-scale medical datasets. To tackle these challenges, we propose Adaptive Grouped Alignment (AGA), a new framework that captures structured semantics from paired medical images and reports. AGA introduces a bidirectional grouping mechanism based on a sparse similarity matrix. For each image-report pair, we compute fine-grained similarities between text tokens and image patches. Each token selects its top-matching patches to form a visual group, and each patch selects its most related tokens to form a language group. To enable adaptive grouping, we design two threshold gating modules, called Language Grouped Threshold Gate and Vision Grouped Threshold Gate, which learn grouping thresholds dynamically. Group representations are computed as weighted averages based on similarity scores. To align each token with its group representation, we introduce an Instance Aware Group Alignment loss that operates within each image-text pair, removing the need for external negatives. Finally, a Bidirectional Cross-modal Grouped Alignment module is applied to enhance fine-grained alignment between visual and linguistic group representations. Extensive experiments on public and private datasets show that our method achieves strong performance on image-text retrieval and classification tasks under both fine-tuning and zero-shot settings.
- [100] arXiv:2507.23416 (cross-list from cs.CV) [pdf, other]
-
Title: Honey Adulteration Detection using Hyperspectral Imaging and Machine LearningSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This paper aims to develop a machine learning-based system for automatically detecting honey adulteration with sugar syrup, based on honey hyperspectral imaging data. First, the floral source of a honey sample is classified by a botanical origin identification subsystem. Then, the sugar syrup adulteration is identified, and its concentration is quantified by an adulteration detection subsystem. Both subsystems consist of two steps. The first step involves extracting relevant features from the honey sample using Linear Discriminant Analysis (LDA). In the second step, we utilize the K-Nearest Neighbors (KNN) model to classify the honey botanical origin in the first subsystem and identify the adulteration level in the second subsystem. We assess the proposed system performance on a public honey hyperspectral image dataset. The result indicates that the proposed system can detect adulteration in honey with an overall cross-validation accuracy of 96.39%, making it an appropriate alternative to the current chemical-based detection methods.
- [101] arXiv:2507.23443 (cross-list from cs.CE) [pdf, html, other]
-
Title: Adjoint-Based Aerodynamic Shape Optimization with a Manifold Constraint Learned by Diffusion ModelsSubjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Optimization and Control (math.OC)
We introduce an adjoint-based aerodynamic shape optimization framework that integrates a diffusion model trained on existing designs to learn a smooth manifold of aerodynamically viable shapes. This manifold is enforced as an equality constraint to the shape optimization problem. Central to our method is the computation of adjoint gradients of the design objectives (e.g., drag and lift) with respect to the manifold space. These gradients are derived by first computing shape derivatives with respect to conventional shape design parameters (e.g., Hicks-Henne parameters) and then backpropagating them through the diffusion model to its latent space via automatic differentiation. Our framework preserves mathematical rigor and can be integrated into existing adjoint-based design workflows with minimal modification. Demonstrated on extensive transonic RANS airfoil design cases using off-the-shelf and general-purpose nonlinear optimizers, our approach eliminates ad hoc parameter tuning and variable scaling, maintains robustness across initialization and optimizer choices, and achieves superior aerodynamic performance compared to conventional approaches. This work establishes how AI generated priors integrates effectively with adjoint methods to enable robust, high-fidelity aerodynamic shape optimization through automatic differentiation.
- [102] arXiv:2507.23455 (cross-list from cs.CV) [pdf, html, other]
-
Title: Machine learning and machine learned prediction in chest X-ray imagesComments: 8 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Machine learning and artificial intelligence are fast-growing fields of research in which data is used to train algorithms, learn patterns, and make predictions. This approach helps to solve seemingly intricate problems with significant accuracy without explicit programming by recognizing complex relationships in data. Taking an example of 5824 chest X-ray images, we implement two machine learning algorithms, namely, a baseline convolutional neural network (CNN) and a DenseNet-121, and present our analysis in making machine-learned predictions in predicting patients with ailments. Both baseline CNN and DenseNet-121 perform very well in the binary classification problem presented in this work. Gradient-weighted class activation mapping shows that DenseNet-121 correctly focuses on essential parts of the input chest X-ray images in its decision-making more than the baseline CNN.
- [103] arXiv:2507.23523 (cross-list from cs.RO) [pdf, html, other]
-
Title: H-RDT: Human Manipulation Enhanced Bimanual Robotic ManipulationSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Imitation learning for robotic manipulation faces a fundamental challenge: the scarcity of large-scale, high-quality robot demonstration data. Recent robotic foundation models often pre-train on cross-embodiment robot datasets to increase data scale, while they face significant limitations as the diverse morphologies and action spaces across different robot embodiments make unified training challenging. In this paper, we present H-RDT (Human to Robotics Diffusion Transformer), a novel approach that leverages human manipulation data to enhance robot manipulation capabilities. Our key insight is that large-scale egocentric human manipulation videos with paired 3D hand pose annotations provide rich behavioral priors that capture natural manipulation strategies and can benefit robotic policy learning. We introduce a two-stage training paradigm: (1) pre-training on large-scale egocentric human manipulation data, and (2) cross-embodiment fine-tuning on robot-specific data with modular action encoders and decoders. Built on a diffusion transformer architecture with 2B parameters, H-RDT uses flow matching to model complex action distributions. Extensive evaluations encompassing both simulation and real-world experiments, single-task and multitask scenarios, as well as few-shot learning and robustness assessments, demonstrate that H-RDT outperforms training from scratch and existing state-of-the-art methods, including Pi0 and RDT, achieving significant improvements of 13.9% and 40.5% over training from scratch in simulation and real-world experiments, respectively. The results validate our core hypothesis that human manipulation data can serve as a powerful foundation for learning bimanual robotic manipulation policies.
- [104] arXiv:2507.23609 (cross-list from cs.CV) [pdf, html, other]
-
Title: Consistent Point MatchingSubjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
This study demonstrates that incorporating a consistency heuristic into the point-matching algorithm \cite{yerebakan2023hierarchical} improves robustness in matching anatomical locations across pairs of medical images. We validated our approach on diverse longitudinal internal and public datasets spanning CT and MRI modalities. Notably, it surpasses state-of-the-art results on the Deep Lesion Tracking dataset. Additionally, we show that the method effectively addresses landmark localization. The algorithm operates efficiently on standard CPU hardware and allows configurable trade-offs between speed and robustness. The method enables high-precision navigation between medical images without requiring a machine learning model or training data.
- [105] arXiv:2507.23620 (cross-list from cs.CV) [pdf, html, other]
-
Title: DivControl: Knowledge Diversion for Controllable Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Diffusion models have advanced from text-to-image (T2I) to image-to-image (I2I) generation by incorporating structured inputs such as depth maps, enabling fine-grained spatial control. However, existing methods either train separate models for each condition or rely on unified architectures with entangled representations, resulting in poor generalization and high adaptation costs for novel conditions. To this end, we propose DivControl, a decomposable pretraining framework for unified controllable generation and efficient adaptation. DivControl factorizes ControlNet via SVD into basic components-pairs of singular vectors-which are disentangled into condition-agnostic learngenes and condition-specific tailors through knowledge diversion during multi-condition training. Knowledge diversion is implemented via a dynamic gate that performs soft routing over tailors based on the semantics of condition instructions, enabling zero-shot generalization and parameter-efficient adaptation to novel conditions. To further improve condition fidelity and training efficiency, we introduce a representation alignment loss that aligns condition embeddings with early diffusion features. Extensive experiments demonstrate that DivControl achieves state-of-the-art controllability with 36.4$\times$ less training cost, while simultaneously improving average performance on basic conditions. It also delivers strong zero-shot and few-shot performance on unseen conditions, demonstrating superior scalability, modularity, and transferability.
- [106] arXiv:2507.23673 (cross-list from cs.CV) [pdf, html, other]
-
Title: SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Hyperspectral imaging (HSI) provides rich spectral information for medical imaging, yet encounters significant challenges due to data limitations and hardware variations. We introduce SAMSA, a novel interactive segmentation framework that combines an RGB foundation model with spectral analysis. SAMSA efficiently utilizes user clicks to guide both RGB segmentation and spectral similarity computations. The method addresses key limitations in HSI segmentation through a unique spectral feature fusion strategy that operates independently of spectral band count and resolution. Performance evaluation on publicly available datasets has shown 81.0% 1-click and 93.4% 5-click DICE on a neurosurgical and 81.1% 1-click and 89.2% 5-click DICE on an intraoperative porcine hyperspectral dataset. Experimental results demonstrate SAMSA's effectiveness in few-shot and zero-shot learning scenarios and using minimal training examples. Our approach enables seamless integration of datasets with different spectral characteristics, providing a flexible framework for hyperspectral medical image analysis.
- [107] arXiv:2507.23682 (cross-list from cs.RO) [pdf, html, other]
-
Title: villa-X: Enhancing Latent Action Modeling in Vision-Language-Action ModelsXiaoyu Chen, Hangxing Wei, Pushi Zhang, Chuheng Zhang, Kaixin Wang, Yanjiang Guo, Rushuai Yang, Yucen Wang, Xinquan Xiao, Li Zhao, Jianyu Chen, Jiang BianComments: Project page: this https URLSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Visual-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent work has begun to explore the incorporation of latent actions, an abstract representation of visual change between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Visual-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. Together, these contributions enable villa-X to achieve superior performance across simulated environments including SIMPLER and LIBERO, as well as on two real-world robot setups including gripper and dexterous hand manipulation. We believe the ViLLA paradigm holds significant promise, and that our villa-X provides a strong foundation for future research.
- [108] arXiv:2507.23736 (cross-list from stat.ML) [pdf, html, other]
-
Title: DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware RedactionKyle Naddeo, Nikolas Koutsoubis, Rahul Krish, Ghulam Rasool, Nidhal Bouaynaya, Tony OSullivan, Raj KrishComments: 15 pages, 6 figures,Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Access to medical imaging and associated text data has the potential to drive major advances in healthcare research and patient outcomes. However, the presence of Protected Health Information (PHI) and Personally Identifiable Information (PII) in Digital Imaging and Communications in Medicine (DICOM) files presents a significant barrier to the ethical and secure sharing of imaging datasets. This paper presents a hybrid de-identification framework developed by Impact Business Information Solutions (IBIS) that combines rule-based and AI-driven techniques, and rigorous uncertainty quantification for comprehensive PHI/PII removal from both metadata and pixel data.
Our approach begins with a two-tiered rule-based system targeting explicit and inferred metadata elements, further augmented by a large language model (LLM) fine-tuned for Named Entity Recognition (NER), and trained on a suite of synthetic datasets simulating realistic clinical PHI/PII. For pixel data, we employ an uncertainty-aware Faster R-CNN model to localize embedded text, extract candidate PHI via Optical Character Recognition (OCR), and apply the NER pipeline for final redaction. Crucially, uncertainty quantification provides confidence measures for AI-based detections to enhance automation reliability and enable informed human-in-the-loop verification to manage residual risks.
This uncertainty-aware deidentification framework achieves robust performance across benchmark datasets and regulatory standards, including DICOM, HIPAA, and TCIA compliance metrics. By combining scalable automation, uncertainty quantification, and rigorous quality assurance, our solution addresses critical challenges in medical data de-identification and supports the secure, ethical, and trustworthy release of imaging data for research. - [109] arXiv:2507.23740 (cross-list from cs.CL) [pdf, html, other]
-
Title: Rule2Text: Natural Language Explanation of Logical Rules in Knowledge GraphsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Knowledge graphs (KGs) often contain sufficient information to support the inference of new facts. Identifying logical rules not only improves the completeness of a knowledge graph but also enables the detection of potential errors, reveals subtle data patterns, and enhances the overall capacity for reasoning and interpretation. However, the complexity of such rules, combined with the unique labeling conventions of each KG, can make them difficult for humans to understand. In this paper, we explore the potential of large language models to generate natural language explanations for logical rules. Specifically, we extract logical rules using the AMIE 3.5.1 rule discovery algorithm from the benchmark dataset FB15k-237 and two large-scale datasets, FB-CVT-REV and FB+CVT-REV. We examine various prompting strategies, including zero- and few-shot prompting, including variable entity types, and chain-of-thought reasoning. We conduct a comprehensive human evaluation of the generated explanations based on correctness, clarity, and hallucination, and also assess the use of large language models as automatic judges. Our results demonstrate promising performance in terms of explanation correctness and clarity, although several challenges remain for future research. All scripts and data used in this study are publicly available at this https URL}{this https URL.
- [110] arXiv:2507.23767 (cross-list from stat.ML) [pdf, html, other]
-
Title: Scaled Beta Models and Feature Dilution for Dynamic Ticket PricingComments: 27 pages, 11 figures, 3 tablesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
A novel approach is presented for identifying distinct signatures of performing acts in the secondary ticket resale market by analyzing dynamic pricing distributions. Using a newly curated, time series dataset from the SeatGeek API, we model ticket pricing distributions as scaled Beta distributions. This enables accurate parameter estimation from incomplete statistical data using a hybrid of quantile matching and the method of moments. Incorporating the estimated $\alpha$ and $\beta$ parameters into Random Forest classifiers significantly improves pairwise artist classification accuracy, demonstrating the unique economic signatures in event pricing data. Additionally, we provide theoretical and empirical evidence that incorporating zero-variance (constant-value) features into Random Forest models acts as an implicit regularizer, enhancing feature variety and robustness. This regularization promotes deeper, more varied trees in the ensemble, improving the bias-variance tradeoff and mitigating overfitting to dominant features. These findings are validated on both the new ticket pricing dataset and the standard UCI ML handwritten digits dataset.
- [111] arXiv:2507.23768 (cross-list from stat.ML) [pdf, html, other]
-
Title: Formal Bayesian Transfer Learning via the Total Risk PriorSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In analyses with severe data-limitations, augmenting the target dataset with information from ancillary datasets in the application domain, called source datasets, can lead to significantly improved statistical procedures. However, existing methods for this transfer learning struggle to deal with situations where the source datasets are also limited and not guaranteed to be well-aligned with the target dataset. A typical strategy is to use the empirical loss minimizer on the source data as a prior mean for the target parameters, which places the estimation of source parameters outside of the Bayesian formalism. Our key conceptual contribution is to use a risk minimizer conditional on source parameters instead. This allows us to construct a single joint prior distribution for all parameters from the source datasets as well as the target dataset. As a consequence, we benefit from full Bayesian uncertainty quantification and can perform model averaging via Gibbs sampling over indicator variables governing the inclusion of each source dataset. We show how a particular instantiation of our prior leads to a Bayesian Lasso in a transformed coordinate system and discuss computational techniques to scale our approach to moderately sized datasets. We also demonstrate that recently proposed minimax-frequentist transfer learning techniques may be viewed as an approximate Maximum a Posteriori approach to our model. Finally, we demonstrate superior predictive performance relative to the frequentist baseline on a genetics application, especially when the source data are limited.
- [112] arXiv:2507.23773 (cross-list from cs.AI) [pdf, html, other]
-
Title: SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World ModelSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)
AI agents built on large language models (LLMs) hold enormous promise, but current practice focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also suffers from the fundamental limitations of autoregressive LLMs. On the other hand, humans are general agents who reason by mentally simulating the outcomes of their actions and plans. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of optimal agent in any environment, \modelname overcomes the limitations of autoregressive reasoning by introducing a world model for planning via simulation. The generalized world model is implemented using LLM, which can flexibly plan in a wide range of environments using the concept-rich latent space of natural language. Experiments on difficult web browsing tasks show that \modelname improves the success of flight search from 0\% to 32.2\%. World-model-based planning, in particular, shows consistent advantage of up to 124\% over autoregressive planning, demonstrating the advantage of world model simulation as a reasoning paradigm. We are excited about the possibility for training a single, general agent model based on LLMs that can act superintelligently in all environments. To start, we make SimuRA, a web-browsing agent built on \modelname with pretrained LLMs, available as a research demo for public testing.
- [113] arXiv:2507.23777 (cross-list from cs.GR) [pdf, html, other]
-
Title: XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative DecodingSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Current auto-regressive models can generate high-quality, topologically precise meshes; however, they necessitate thousands-or even tens of thousands-of next-token predictions during inference, resulting in substantial latency. We introduce XSpecMesh, a quality-preserving acceleration method for auto-regressive mesh generation models. XSpecMesh employs a lightweight, multi-head speculative decoding scheme to predict multiple tokens in parallel within a single forward pass, thereby accelerating inference. We further propose a verification and resampling strategy: the backbone model verifies each predicted token and resamples any tokens that do not meet the quality criteria. In addition, we propose a distillation strategy that trains the lightweight decoding heads by distilling from the backbone model, encouraging their prediction distributions to align and improving the success rate of speculative predictions. Extensive experiments demonstrate that our method achieves a 1.7x speedup without sacrificing generation quality. Our code will be released.
- [114] arXiv:2507.23784 (cross-list from cs.CV) [pdf, html, other]
-
Title: SUB: Benchmarking CBM Generalization via Synthetic Attribute SubstitutionsComments: Accepted at ICCV 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Concept Bottleneck Models (CBMs) and other concept-based interpretable models show great promise for making AI applications more transparent, which is essential in fields like medicine. Despite their success, we demonstrate that CBMs struggle to reliably identify the correct concepts under distribution shifts. To assess the robustness of CBMs to concept variations, we introduce SUB: a fine-grained image and concept benchmark containing 38,400 synthetic images based on the CUB dataset. To create SUB, we select a CUB subset of 33 bird classes and 45 concepts to generate images which substitute a specific concept, such as wing color or belly pattern. We introduce a novel Tied Diffusion Guidance (TDG) method to precisely control generated images, where noise sharing for two parallel denoising processes ensures that both the correct bird class and the correct attribute are generated. This novel benchmark enables rigorous evaluation of CBMs and similar interpretable models, contributing to the development of more robust methods. Our code is available at this https URL and the dataset at this http URL.
Cross submissions (showing 50 of 50 entries)
- [115] arXiv:2206.03234 (replaced) [pdf, html, other]
-
Title: Disparate Conditional Prediction in Multiclass ClassifiersComments: Published at ICML 2025Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
We propose methods for auditing multiclass classifiers for fairness under multiclass equalized odds,by estimating the deviation from equalized odds when the classifier is not completely fair. We generalize to multiclass classifiers the measure of Disparate Conditional Prediction (DCP), originally suggested by Sabato & Yom-Tov (2020) for binary classifiers. DCP is defined as the fraction of the population for which the classifier predicts with conditional prediction probabilities that differ from the closest common baseline. We provide new local-optimization methods for estimating the multiclass DCPunder two different regimes,one in which the conditional confusion matrices for each protected sub-population are known, and one in which these cannot be estimated, for instance, because the classifier is inaccessible or because good-quality individual-level data is not available. These methods can be used to detect classifiers that likely treat a significant fraction of the population unfairly. Experiments demonstrate the accuracy of the methods. Code is provided at this https URL DCPmulticlass.
- [116] arXiv:2306.01654 (replaced) [pdf, html, other]
-
Title: Insights into Closed-form IPM-GAN Discriminator Guidance for Diffusion ModelingSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Diffusion models are a state-of-the-art generative modeling framework that transform noise to images via Langevin sampling, guided by the score, which is the gradient of the logarithm of the data distribution. Recent works have shown empirically that the generation quality can be improved when guided by classifier network, which is typically the discriminator trained in a generative adversarial network (GAN) setting. In this paper, we propose a theoretical framework to analyze the effect of the GAN discriminator on Langevin-based sampling, and show that the IPM-GAN optimization can be seen as one of smoothed score-matching, wherein the scores of the data and the generator distributions are convolved with the kernel function associated with the IPM. The proposed approach serves to unify score-based training and optimization of IPM-GANs. Based on these insights, we demonstrate that closed-form kernel-based discriminator guidance, results in improvements (in terms of CLIP-FID and KID metrics) when applied atop baseline diffusion models. We demonstrate these results on the denoising diffusion implicit model (DDIM) and latent diffusion model (LDM) settings on various standard datasets. We also show that the proposed approach can be combined with existing accelerated-diffusion techniques to improve latent-space image generation.
- [117] arXiv:2402.03158 (replaced) [pdf, html, other]
-
Title: Optimal and Near-Optimal Adaptive Vector QuantizationSubjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Quantization is a fundamental optimization for many machine-learning use cases, including compressing gradients, model weights and activations, and datasets. The most accurate form of quantization is \emph{adaptive}, where the error is minimized with respect to a given input, rather than optimizing for the worst case. However, optimal adaptive quantization methods are considered infeasible in terms of both their runtime and memory requirements.
We revisit the Adaptive Vector Quantization (AVQ) problem and present algorithms that find optimal solutions with asymptotically improved time and space complexity. We also present an even faster near-optimal algorithm for large inputs. Our experiments show our algorithms may open the door to using AVQ more extensively in a variety of machine learning applications. - [118] arXiv:2406.13265 (replaced) [pdf, html, other]
-
Title: Molecule Graph Networks with Many-body Equivariant InteractionsSubjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
Message passing neural networks have demonstrated significant efficacy in predicting molecular interactions. Introducing equivariant vectorial representations augments expressivity by capturing geometric data symmetries, thereby improving model accuracy. However, two-body bond vectors in opposition may cancel each other out during message passing, leading to the loss of directional information on their shared node. In this study, we develop Equivariant N-body Interaction Networks (ENINet) that explicitly integrates l = 1 equivariant many-body interactions to enhance directional symmetric information in the message passing scheme. We provided a mathematical analysis demonstrating the necessity of incorporating many-body equivariant interactions and generalized the formulation to $N$-body interactions. Experiments indicate that integrating many-body equivariant representations enhances prediction accuracy across diverse scalar and tensorial quantum chemical properties.
- [119] arXiv:2407.01621 (replaced) [pdf, html, other]
-
Title: Deciphering interventional dynamical causality from non-intervention complex systemsSubjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Methodology (stat.ME); Machine Learning (stat.ML)
Detecting and quantifying causality is a focal topic in the fields of science, engineering, and interdisciplinary studies. However, causal studies on non-intervention systems attract much attention but remain extremely challenging. Delay-embedding technique provides a promising approach. In this study, we propose a framework named Interventional Dynamical Causality (IntDC) in contrast to the traditional Constructive Dynamical Causality (ConDC). ConDC, including Granger causality, transfer entropy and convergence of cross-mapping, measures the causality by constructing a dynamical model without considering interventions. A computational criterion, Interventional Embedding Entropy (IEE), is proposed to measure causal strengths in an interventional manner. IEE is an intervened causal information flow but in the delay-embedding space. Further, the IEE theoretically and numerically enables the deciphering of IntDC solely from observational (non-interventional) time-series data, without requiring any knowledge of dynamical models or real interventions in the considered system. In particular, IEE can be applied to rank causal effects according to their importance and construct causal networks from data. We conducted numerical experiments to demonstrate that IEE can find causal edges accurately, eliminate effects of confounding, and quantify causal strength robustly over traditional indices. We also applied IEE to real-world tasks. IEE performed as an accurate and robust tool for causal analyses solely from the observational data. The IntDC framework and IEE algorithm provide an efficient approach to the study of causality from time series in diverse non-intervention complex systems.
- [120] arXiv:2407.03080 (replaced) [pdf, html, other]
-
Title: Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce ScenariosComments: 19 pages, 6 FiguresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
While synthetic tabular data generation using Deep Generative Models (DGMs) offers a compelling solution to data scarcity and privacy concerns, their effectiveness relies on the availability of substantial training data, often lacking in real-world scenarios. To overcome this limitation, we propose a novel methodology that explicitly integrates artificial inductive biases into the generative process to improve data quality in low-data regimes. Our framework leverages transfer learning and meta-learning techniques to construct and inject informative inductive biases into DGMs. We evaluate four approaches (pre-training, model averaging, Model-Agnostic Meta-Learning (MAML), and Domain Randomized Search (DRS)) and analyze their impact on the quality of the generated text. Experimental results show that incorporating inductive bias substantially improves performance, with transfer learning methods outperforming meta-learning, achieving up to 60\% gains in Jensen-Shannon divergence. The methodology is model-agnostic and especially relevant in domains such as healthcare and finance, where high-quality synthetic data are essential, and data availability is often limited.
- [121] arXiv:2407.15738 (replaced) [pdf, html, other]
-
Title: Parallel Split Learning with Global SamplingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Distributed deep learning in resource-constrained environments faces scalability and generalization challenges due to large effective batch sizes and non-identically distributed client data. We introduce a server-driven sampling strategy that maintains a fixed global batch size by dynamically adjusting client-side batch sizes. This decouples the effective batch size from the number of participating devices and ensures that global batches better reflect the overall data distribution. Using standard concentration bounds, we establish tighter deviation guarantees compared to existing approaches. Empirical results on a benchmark dataset confirm that the proposed method improves model accuracy, training efficiency, and convergence stability, offering a scalable solution for learning at the network edge.
- [122] arXiv:2408.05177 (replaced) [pdf, html, other]
-
Title: Coarse Graining with Neural Operators for Simulating Chaotic SystemsSubjects: Machine Learning (cs.LG)
Accurately predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling. However, achieving such predictions typically requires iterative computations over a dense spatiotemporal grid to account for the unstable nature of chaotic systems, which is expensive and impractical in many real-world situations. An alternative approach to such a full-resolved simulation is using a coarse grid and then correcting its errors through a \textit{closure model}, which approximates the overall information from fine scales not captured in the coarse-grid simulation. Recently, ML approaches have been used for closure modeling, but they typically require a large number of training samples from expensive fully-resolved simulations (FRS). In this work, we prove an even more fundamental limitation, i.e., the standard approach to learning closure models suffers from a large approximation error for generic problems, no matter how large the model is, and it stems from the non-uniqueness of the mapping. We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation by not using a closure model or a coarse-grid solver. We first train the PINO model on data from a coarse-grid solver and then fine-tune it with (a small amount of) FRS and physics-based losses on a fine grid. The discretization-free nature of neural operators means that they do not suffer from the restriction of a coarse grid that closure models face, and they can provably approximate the long-term statistics of chaotic systems. In our experiments, our PINO model achieves a 330x speedup compared to FRS with a relative error $\sim 10\%$. In contrast, the closure model coupled with a coarse-grid solver is $60$x slower than PINO while having a much higher error $\sim186\%$ when the closure model is trained on the same FRS dataset.
- [123] arXiv:2408.10610 (replaced) [pdf, html, other]
-
Title: On the Approximation of Stationary Processes using the ARMA ModelComments: 11 pages, 1 figureSubjects: Machine Learning (cs.LG); Probability (math.PR); Methodology (stat.ME)
We look at a problem related to Autoregressive Moving Average (ARMA) models, on quantifying the approximation error between a true stationary process $X_t$ and an ARMA model $Y_t$. We take the transfer function representation $x(L)$ of a stationary process $X_t$ and show that the $L^{\infty}$ norm of $x$ acts as a valid norm on $X_t$ that controls the $\ell^2$ norm of its Wold coefficients. We then show that a certain subspace of stationary processes, which includes ARMA models, forms a Banach algebra under the $L^{\infty}$ norm that respects the multiplicative structure of $H^{\infty}$ transfer functions and thus improves on the structural properties of the cepstral norm for ARMA models. The natural definition of invertibility in this algebra is consistent with the original definition of ARMA invertibility, and generalizes better to non-ARMA processes than Wiener's $\ell^1$ condition. Finally, we calculate some explicit approximation bounds in the simpler context of continuous transfer functions, and critique some heuristic ideas on Padé approximations and parsimonious models.
- [124] arXiv:2409.17092 (replaced) [pdf, html, other]
-
Title: Accumulator-Aware Post-Training Quantization for Large Language ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM)
When quantizing weights and activations to increasingly narrower representations, the cost of additions begins to dominate that of multiplications in multiply-accumulate (MAC) units. Recent studies show that reducing addition costs via low-precision accumulation improves throughput, power, and area across inference platforms, albeit with an increased risk of overflow. Accumulator-aware quantization research has so far only considered the quantization-aware training (QAT) paradigm, in which models are fine-tuned or trained from scratch with quantization in the loop. As models and datasets continue to grow in size, QAT techniques become increasingly more expensive, which has motivated the recent surge in post-training quantization (PTQ) research. To bridge this gap, we introduce AXE, the first accumulator-aware quantization framework explicitly designed to endow overflow avoidance guarantees to PTQ algorithms. We present theoretical motivation for AXE and demonstrate its flexibility by implementing it on top of two existing algorithms: GPFQ and OPTQ. We design AXE to support multi-stage accumulation, opening the door to full datapath optimization for the first time. We evaluate AXE using recent language generation models; when quantizing Llama3 8B for a 16-bit multi-stage accumulation datapath, AXE maintains up to 98% of the FP16 perplexity, surpassing naive bit width manipulation by up to 15%.
- [125] arXiv:2410.09486 (replaced) [pdf, other]
-
Title: ActSafe: Active Exploration with Safety Constraints for Reinforcement LearningSubjects: Machine Learning (cs.LG); Robotics (cs.RO)
Reinforcement learning (RL) is ubiquitous in the development of modern AI systems. However, state-of-the-art RL agents require extensive, and potentially unsafe, interactions with their environments to learn effectively. These limitations confine RL agents to simulated environments, hindering their ability to learn directly in real-world settings. In this work, we present ActSafe, a novel model-based RL algorithm for safe and efficient exploration. ActSafe learns a well-calibrated probabilistic model of the system and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics, while enforcing pessimism w.r.t. the safety constraints. Under regularity assumptions on the constraints and dynamics, we show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time. In addition, we propose a practical variant of ActSafe that builds on latest model-based RL advancements and enables safe exploration even in high-dimensional settings such as visual control. We empirically show that ActSafe obtains state-of-the-art performance in difficult exploration tasks on standard safe deep RL benchmarks while ensuring safety during learning.
- [126] arXiv:2412.00123 (replaced) [pdf, html, other]
-
Title: Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector RegressionSubjects: Machine Learning (cs.LG); Probability (math.PR)
This paper presents a new hybrid model for predicting German electricity prices. The algorithm is based on a combination of Gaussian Process Regression (GPR) and Support Vector Regression (SVR). Although GPR is a competent model for learning stochastic patterns within data and for interpolation, its performance for out-of-sample data is not very promising. By choosing a suitable data-dependent covariance function, we can enhance the performance of GPR for the German hourly power prices being tested. However, since the out-of-sample prediction is dependent on the training data, the prediction is vulnerable to noise and outliers. To overcome this issue, a separate prediction is calculated using SVR, which applies margin-based optimization. This method is advantageous when dealing with non-linear processes and outliers, since only certain necessary points (support vectors) in the training data are responsible for regression. The individual predictions are then linearly combined using uniform weights. When tested on historic German power prices, this approach outperforms the publicly available benchmarks, namely the LASSO estimated autoregressive regression model, deep neural network provided in the recent research by [1].
- [127] arXiv:2412.12098 (replaced) [pdf, other]
-
Title: MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Reinforcement learning (RL) algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. Most common RL algorithms use undirected exploration, i.e., select random sequences of actions. Exploration can also be directed using intrinsic rewards, such as curiosity or model epistemic uncertainty. However, effectively balancing task and intrinsic rewards is challenging and often task-dependent. In this work, we introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. MaxInfoRL steers exploration towards informative transitions, by maximizing intrinsic rewards such as the information gain about the underlying task. When combined with Boltzmann exploration, this approach naturally trades off maximization of the value function with that of the entropy over states, rewards, and actions. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits. We then apply this general formulation to a variety of off-policy model-free RL methods for continuous state-action spaces, yielding novel algorithms that achieve superior performance across hard exploration problems and complex scenarios such as visual control tasks.
- [128] arXiv:2412.19792 (replaced) [pdf, html, other]
-
Title: InfAlign: Inference-aware language model alignmentAnanth Balashankar, Ziteng Sun, Jonathan Berant, Jacob Eisenstein, Michael Collins, Adrian Hutter, Jong Lee, Chirag Nagpal, Flavien Prost, Aradhana Sinha, Ananda Theertha Suresh, Ahmad BeiramiSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Theory (cs.IT)
Language model alignment is a critical step in training modern generative language models. Alignment targets to improve win rate of a sample from the aligned model against the base model. Today, we are increasingly using inference-time algorithms (e.g., Best-of-N, controlled decoding, tree search) to decode from language models rather than standard sampling. We show that this train/test mismatch makes standard RLHF framework sub-optimal in view of such inference-time methods. To this end, we propose a framework for inference-aware alignment (InfAlign), which aims to optimize inference-time win rate of the aligned policy against the base model. We prove that for any inference-time decoding procedure, the optimal aligned policy is the solution to the standard RLHF problem with a transformation of the reward. This motivates us to provide the calibrate-and-transform RL (InfAlign-CTRL) algorithm to solve this problem, which involves a reward calibration step and a KL-regularized reward maximization step with a transformation of the calibrated reward. For best-of-N sampling and best-of-N jailbreaking, we propose specific transformations offering up to 3-8% improvement on inference-time win rates. Finally, we also show that our proposed reward calibration method is a strong baseline for optimizing standard win rate.
- [129] arXiv:2501.08727 (replaced) [pdf, html, other]
-
Title: Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image ModelsComments: ICCV 2025Subjects: Machine Learning (cs.LG)
Parameter-Efficient Fine-Tuning (PEFT) of text-to-image models has become an increasingly popular technique with many applications. Among the various PEFT methods, Low-Rank Adaptation (LoRA) and its variants have gained significant attention due to their effectiveness, enabling users to fine-tune models with limited computational resources. However, the approximation gap between the low-rank assumption and desired fine-tuning weights prevents the simultaneous acquisition of ultra-parameter-efficiency and better performance. To reduce this gap and further improve the power of LoRA, we propose a new PEFT method that combines two classes of adaptations, namely, transform and residual adaptations. In specific, we first apply a full-rank and dense transform to the pre-trained weight. This learnable transform is expected to align the pre-trained weight as closely as possible to the desired weight, thereby reducing the rank of the residual weight. Then, the residual part can be effectively approximated by more compact and parameter-efficient structures, with a smaller approximation error. To achieve ultra-parameter-efficiency in practice, we design highly flexible and effective tensor decompositions for both the transform and residual adaptations. Additionally, popular PEFT methods such as DoRA can be summarized under this transform plus residual adaptation scheme. Experiments are conducted on fine-tuning Stable Diffusion models in subject-driven and controllable generation. The results manifest that our method can achieve better performances and parameter efficiency compared to LoRA and several baselines.
- [130] arXiv:2501.15544 (replaced) [pdf, html, other]
-
Title: Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Internet of Electric VehiclesComments: 11 PagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Generative artificial intelligence, particularly through large language models (LLMs), is poised to transform energy optimization and demand side management (DSM) within microgrids. This paper explores the integration of LLMs into energy management, emphasizing their roles in automating the optimization of DSM strategies with Internet of electric vehicles. We investigate challenges and solutions associated with DSM and explore the new opportunities presented by leveraging LLMs. Then, we propose an innovative solution that enhances LLMs with retrieval-augmented generation for automatic problem formulation, code generation, and customizing optimization. We present a case study to demonstrate the effectiveness of our proposed solution in charging scheduling and optimization for electric vehicles, highlighting our solution's significant advancements in energy efficiency and user adaptability. This work underscores the potential of LLMs for energy optimization and fosters a new era of intelligent DSM solutions.
- [131] arXiv:2501.16325 (replaced) [pdf, html, other]
-
Title: Tailored Forecasting from Short Time Series via Meta-learningComments: 23 pages, 12 figuresSubjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD); Computational Physics (physics.comp-ph)
Machine learning models can effectively forecast dynamical systems from time-series data, but they typically require large amounts of past data, making forecasting particularly challenging for systems with limited history. To overcome this, we introduce Meta-learning for Tailored Forecasting using Related Time Series (METAFORS), which generalizes knowledge across systems to enable forecasting in data-limited scenarios. By learning from a library of models trained on longer time series from potentially related systems, METAFORS builds and initializes a model tailored to short time-series data from the system of interest. Using a reservoir computing implementation and testing on simulated chaotic systems, we demonstrate that METAFORS can reliably predict both short-term dynamics and long-term statistics without requiring contextual labels. We see this even when test and related systems exhibit substantially different behaviors, highlighting METAFORS' strengths in data-limited scenarios.
- [132] arXiv:2502.06210 (replaced) [pdf, html, other]
-
Title: Achieving Deep Continual Learning via EvolutionSubjects: Machine Learning (cs.LG)
Deep neural networks, despite their remarkable success, remain fundamentally limited in their ability to perform Continual Learning (CL). While most current methods aim to enhance the capabilities of a single model, Inspired by the collective learning mechanisms of human populations, we introduce Evolving Continual Learning (ECL), a framework that maintains and evolves a diverse population of neural network models. ECL continually searches for an optimal architecture for each introduced incremental task. This tailored model is trained on the corresponding task and archived as a specialized expert, contributing to a growing collection of skills. This approach inherently resolves the core CL challenges: stability is achieved through the isolation of expert models, while plasticity is greatly enhanced by evolving unique, task-specific architectures. Experimental results demonstrate that ECL significantly outperforms state-of-the-art individual-level CL methods. By shifting the focus from individual adaptation to collective evolution, ECL presents a novel path toward AI systems capable of CL.
- [133] arXiv:2502.17264 (replaced) [pdf, html, other]
-
Title: Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional CoverageSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Conformal prediction is a powerful distribution-free framework for constructing prediction sets with coverage guarantees. Classical methods, such as split conformal prediction, provide marginal coverage, ensuring that the prediction set contains the label of a random test point with a target probability. However, these guarantees may not hold uniformly across different subpopulations, leading to disparities in coverage. Prior work has explored coverage guarantees conditioned on events related to the covariates and label of the test point. We present Kandinsky conformal prediction, a framework that significantly expands the scope of conditional coverage guarantees. In contrast to Mondrian conformal prediction, which restricts its coverage guarantees to disjoint groups -- reminiscent of the rigid, structured grids of Piet Mondrian's art -- our framework flexibly handles overlapping and fractional group memberships defined jointly on covariates and labels, reflecting the layered, intersecting forms in Wassily Kandinsky's compositions. Our algorithm unifies and extends existing methods, encompassing covariate-based group conditional, class conditional, and Mondrian conformal prediction as special cases, while achieving a minimax-optimal high-probability conditional coverage bound. Finally, we demonstrate the practicality of our approach through empirical evaluation on real-world datasets.
- [134] arXiv:2503.13544 (replaced) [pdf, html, other]
-
Title: Decision by Supervised Learning with Deep Ensembles: A Practical Framework for Robust Portfolio OptimizationComments: 8 pages, 3 figuresSubjects: Machine Learning (cs.LG); Computational Finance (q-fin.CP); Portfolio Management (q-fin.PM)
We propose Decision by Supervised Learning (DSL), a practical framework for robust portfolio optimization. DSL reframes portfolio construction as a supervised learning problem: models are trained to predict optimal portfolio weights, using cross-entropy loss and portfolios constructed by maximizing the Sharpe or Sortino ratio. To further enhance stability and reliability, DSL employs Deep Ensemble methods, substantially reducing variance in portfolio allocations. Through comprehensive backtesting across diverse market universes and neural architectures, shows superior performance compared to both traditional strategies and leading machine learning-based methods, including Prediction-Focused Learning and End-to-End Learning. We show that increasing the ensemble size leads to higher median returns and more stable risk-adjusted performance. The code is available at this https URL.
- [135] arXiv:2504.10403 (replaced) [pdf, html, other]
-
Title: Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power NetworksSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks. However, direct downloading of these models for fine-tuning on the ground is impeded by privacy concerns and limited bandwidth. Satellite federated learning (FL) offers a solution by enabling model fine-tuning directly on-board satellites and aggregating model updates without data downloading. Nevertheless, for large foundation models, the computational capacity of satellites is insufficient to support effective on-board fine-tuning in traditional satellite FL frameworks. To address these challenges, we propose a satellite-ground collaborative federated fine-tuning framework. The key of the framework lies in how to reasonably decompose and allocate model components to alleviate insufficient on-board computation capabilities. During fine-tuning, satellites exchange intermediate results with ground stations or other satellites for forward propagation and back propagation, which brings communication challenges due to the special communication topology of space transmission networks, such as intermittent satellite-ground communication, short duration of satellite-ground communication windows, and unstable inter-orbit inter-satellite links (ISLs). To reduce transmission delays, we further introduce tailored communication strategies that integrate both communication and computing resources. Specifically, we propose a parallel intra-orbit communication strategy, a topology-aware satellite-ground communication strategy, and a latency-minimalization inter-orbit communication strategy to reduce space communication costs. Simulation results demonstrate significant reductions in training time with improvements of approximately 33%.
- [136] arXiv:2504.16283 (replaced) [pdf, html, other]
-
Title: Affect Models Have Weak Generalizability to Atypical SpeechComments: PreprintSubjects: Machine Learning (cs.LG)
Speech and voice conditions can alter the acoustic properties of speech, which could impact the performance of paralinguistic models for affect for people with atypical speech. We evaluate publicly available models for recognizing categorical and dimensional affect from speech on a dataset of atypical speech, comparing results to datasets of typical speech. We investigate three dimensions of speech atypicality: intelligibility, which is related to pronounciation; monopitch, which is related to prosody, and harshness, which is related to voice quality. We look at (1) distributional trends of categorical affect predictions within the dataset, (2) distributional comparisons of categorical affect predictions to similar datasets of typical speech, and (3) correlation strengths between text and speech predictions for spontaneous speech for valence and arousal. We find that the output of affect models is significantly impacted by the presence and degree of speech atypicalities. For instance, the percentage of speech predicted as sad is significantly higher for all types and grades of atypical speech when compared to similar typical speech datasets. In a preliminary investigation on improving robustness for atypical speech, we find that fine-tuning models on pseudo-labeled atypical speech data improves performance on atypical speech without impacting performance on typical speech. Our results emphasize the need for broader training and evaluation datasets for speech emotion models, and for modeling approaches that are robust to voice and speech differences.
- [137] arXiv:2505.00830 (replaced) [pdf, html, other]
-
Title: Intersectional Divergence: Measuring Fairness in RegressionSubjects: Machine Learning (cs.LG)
Fairness in machine learning research is commonly framed in the context of classification tasks, leaving critical gaps in regression. In this paper, we propose a novel approach to measure intersectional fairness in regression tasks, going beyond the focus on single protected attributes from existing work to consider combinations of all protected attributes. Furthermore, we contend that it is insufficient to measure the average error of groups without regard for imbalanced domain preferences. Accordingly, we propose Intersectional Divergence (ID) as the first fairness measure for regression tasks that 1) describes fair model behavior across multiple protected attributes and 2) differentiates the impact of predictions in target ranges most relevant to users. We extend our proposal demonstrating how ID can be adapted into a loss function, IDLoss, that satisfies convergence guarantees and has piecewise smooth properties that enable practical optimization. Through an extensive experimental evaluation, we demonstrate how ID allows unique insights into model behavior and fairness, and how incorporating IDLoss into optimization can considerably improve single-attribute and intersectional model fairness while maintaining a competitive balance in predictive performance.
- [138] arXiv:2505.05702 (replaced) [pdf, html, other]
-
Title: Hypergraph Neural Sheaf Diffusion: A Symmetric Simplicial Set Framework for Higher-Order LearningComments: Published in IEEE AccessJournal-ref: IEEE Access, vol. 13, pp. 131823-131838, 2025Subjects: Machine Learning (cs.LG); Algebraic Topology (math.AT)
The absence of intrinsic adjacency relations and orientation systems in hypergraphs creates fundamental challenges for constructing sheaf Laplacians of arbitrary degrees. We resolve these limitations through symmetric simplicial sets derived directly from hypergraphs, called symmetric simplicial lifting, which encode all possible oriented subrelations within each hyperedge as ordered tuples. This construction canonically defines adjacency via facet maps while inherently preserving hyperedge provenance. We establish that the normalized degree zero sheaf Laplacian on our symmetric simplicial lifting reduces exactly to the traditional graph normalized sheaf Laplacian when restricted to graphs, validating its mathematical consistency with prior graph-based sheaf theory. Furthermore, the induced structure preserves all structural information from the original hypergraph, ensuring that every multi-way relational detail is faithfully retained. Leveraging this framework, we introduce Hypergraph Neural Sheaf Diffusion (HNSD), the first principled extension of neural sheaf diffusion to hypergraphs. HNSD operates via normalized degree zero sheaf Laplacian over symmetric simplicial lifting, resolving orientation ambiguity and adjacency sparsity inherent to hypergraph learning. Experimental evaluations demonstrate HNSDs competitive performance across established benchmarks.
- [139] arXiv:2505.06275 (replaced) [pdf, html, other]
-
Title: SinBasis Networks: Matrix-Equivalent Feature Extraction for Wave-Like Optical SpectrogramsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
Wave-like images-from attosecond streaking spectrograms to optical spectra, audio mel-spectrograms and periodic video frames-encode critical harmonic structures that elude conventional feature extractors. We propose a unified, matrix-equivalent framework that reinterprets convolution and attention as linear transforms on flattened inputs, revealing filter weights as basis vectors spanning latent feature subspaces. To infuse spectral priors we apply elementwise $\sin(\cdot)$ mappings to each weight matrix. Embedding these transforms into CNN, ViT and Capsule architectures yields Sin-Basis Networks with heightened sensitivity to periodic motifs and built-in invariance to spatial shifts. Experiments on a diverse collection of wave-like image datasets-including 80,000 synthetic attosecond streaking spectrograms, thousands of Raman, photoluminescence and FTIR spectra, mel-spectrograms from AudioSet and cycle-pattern frames from Kinetics-demonstrate substantial gains in reconstruction accuracy, translational robustness and zero-shot cross-domain transfer. Theoretical analysis via matrix isomorphism and Mercer-kernel truncation quantifies how sinusoidal reparametrization enriches expressivity while preserving stability in data-scarce regimes. Sin-Basis Networks thus offer a lightweight, physics-informed approach to deep learning across all wave-form imaging modalities.
- [140] arXiv:2505.07797 (replaced) [pdf, html, other]
-
Title: A Theoretical Framework for Explaining Reinforcement Learning with Shapley ValuesSubjects: Machine Learning (cs.LG)
Reinforcement learning agents can achieve super-human performance in complex decision-making tasks, but their behaviour is often difficult to understand and explain. This lack of explanation limits deployment, especially in safety-critical settings where understanding and trust are essential. We identify three core explanatory targets that together provide a comprehensive view of reinforcement learning agents: behaviour, outcomes, and predictions. We develop a unified theoretical framework for explaining these three elements of reinforcement learning agents through the influence of individual features that the agent observes in its environment. We derive feature influences by using Shapley values, which collectively and uniquely satisfy a set of well-motivated axioms for fair and consistent credit assignment. The proposed approach, Shapley Values for Explaining Reinforcement Learning (SVERL), provides a single theoretical framework to comprehensively and meaningfully explain reinforcement learning agents. It yields explanations with precise semantics that are not only interpretable but also mathematically justified, enabling us to identify and correct conceptual issues in prior explanations. Through illustrative examples, we show how SVERL produces useful, intuitive explanations of agent behaviour, outcomes, and predictions, which are not apparent from observing agent behaviour alone.
- [141] arXiv:2505.18102 (replaced) [pdf, html, other]
-
Title: How Can I Publish My LLM Benchmark Without Giving the True Answers Away?Comments: Extended version of the paper presented as an Oral at the ICML 2025 Workshop on the Impact of Memorization on Trustworthy Foundation ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Methodology (stat.ME)
Publishing a large language model (LLM) benchmark on the Internet risks contaminating future LLMs: the benchmark may be unintentionally (or intentionally) used to train or select a model. A common mitigation is to keep the benchmark private and let participants submit their models or predictions to the organizers. However, this strategy will require trust in a single organization and still permits test-set overfitting through repeated queries. To overcome this issue, we propose a way to publish benchmarks without completely disclosing the ground-truth answers to the questions, while still maintaining the ability to openly evaluate LLMs. Our main idea is to inject randomness to the answers by preparing several logically correct answers, and only include one of them as the solution in the benchmark. This reduces the best possible accuracy, i.e., Bayes accuracy, of the benchmark. Not only is this helpful to keep us from disclosing the ground truth, but this approach also offers a test for detecting data contamination. In principle, even fully capable models should not surpass the Bayes accuracy. If a model surpasses this ceiling despite this expectation, this is a strong signal of data contamination. We present experimental evidence that our method can detect data contamination accurately on a wide range of benchmarks, models, and training methodologies.
- [142] arXiv:2505.20553 (replaced) [pdf, html, other]
-
Title: A ZeNN architecture to avoid the Gaussian trapComments: New experiments involving PiNNs for solving Schrödinger and Bessel-type equationsSubjects: Machine Learning (cs.LG); Probability (math.PR)
We propose a new simple architecture, Zeta Neural Networks (ZeNNs), in order to overcome several shortcomings of standard multi-layer perceptrons (MLPs). Namely, in the large width limit, MLPs are non-parametric, they do not have a well-defined pointwise limit, they lose non-Gaussian attributes and become unable to perform feature learning; moreover, finite width MLPs perform poorly in learning high frequencies. The new ZeNN architecture is inspired by three simple principles from harmonic analysis:
i) Enumerate the perceptons and introduce a non-learnable weight to enforce convergence;
ii) Introduce a scaling (or frequency) factor;
iii) Choose activation functions that lead to near orthogonal systems.
We will show that these ideas allow us to fix the referred shortcomings of MLPs. In fact, in the infinite width limit, ZeNNs converge pointwise, they exhibit a rich asymptotic structure beyond Gaussianity, and perform feature learning. Moreover, when appropriate activation functions are chosen, (finite width) ZeNNs excel at learning high-frequency features of functions with low dimensional domains. - [143] arXiv:2506.03956 (replaced) [pdf, html, other]
-
Title: Adapt before Continual LearningSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Continual Learning (CL) seeks to enable neural networks to incrementally acquire new knowledge (plasticity) while retaining existing knowledge (stability). Although pre-trained models (PTMs) have provided a strong foundation for CL, existing approaches face a fundamental challenge in balancing these two competing objectives. Current methods typically address stability by freezing the PTM backbone, which severely limits the model's plasticity, particularly when incoming data distribution diverges largely from the pre-training data. Alternatively, sequentially fine-tuning the entire PTM can adapt to new knowledge but often leads to catastrophic forgetting, highlighting the critical stability-plasticity trade-off in PTM-based CL. To address this limitation, we propose Adapting PTMs before the core CL} process (ACL), a novel framework that introduces a plug-and-play adaptation phase prior to learning each new task. During this phase, ACL refines the PTM backbone by aligning embeddings with their original class prototypes while distancing them from irrelevant classes. This mechanism theoretically and empirically demonstrates desirable balance between stability and plasticity, significantly improving CL performance across benchmarks and integrated methods. Code is available at this https URL.
- [144] arXiv:2506.12284 (replaced) [pdf, html, other]
-
Title: GrokAlign: Geometric Characterisation and Acceleration of GrokkingComments: 23 pages, 11 figures, 3 tablesSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated phenomena like delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness with changes to the network's functional geometry, in particular the arrangement of the so-called linear regions in deep networks employing continuous piecewise affine nonlinearities. Here, we explain how grokking is realised in the Jacobian of a deep network and demonstrate that aligning a network's Jacobians with the training data (in the sense of cosine similarity) ensures grokking under a low-rank Jacobian assumption. Our results provide a strong theoretical motivation for the use of Jacobian regularisation in optimizing deep networks -- a method we introduce as GrokAlign -- which we show empirically to induce grokking much sooner than more conventional regularizers like weight decay. Moreover, we introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment that effectively identifies and tracks the stages of deep network training dynamics. Accompanying webpage (this https URL) and code (this https URL).
- [145] arXiv:2506.14781 (replaced) [pdf, html, other]
-
Title: Two-dimensional Parallel Tempering for Constrained OptimizationComments: Added references in IntroductionJournal-ref: Physical Review E (2025)Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Optimization and Control (math.OC); Machine Learning (stat.ML)
Sampling Boltzmann probability distributions plays a key role in machine learning and optimization, motivating the design of hardware accelerators such as Ising machines. While the Ising model can in principle encode arbitrary optimization problems, practical implementations are often hindered by soft constraints that either slow down mixing when too strong, or fail to enforce feasibility when too weak. We introduce a two-dimensional extension of the powerful parallel tempering algorithm (PT) that addresses this challenge by adding a second dimension of replicas interpolating the penalty strengths. This scheme ensures constraint satisfaction in the final replicas, analogous to low-energy states at low temperature. The resulting two-dimensional parallel tempering algorithm (2D-PT) improves mixing in heavily constrained replicas and eliminates the need to explicitly tune the penalty strength. In a representative example of graph sparsification with copy constraints, 2D-PT achieves near-ideal mixing, with Kullback-Leibler divergence decaying as O(1/t). When applied to sparsified Wishart instances, 2D-PT yields orders of magnitude speedup over conventional PT with the same number of replicas. The method applies broadly to constrained Ising problems and can be deployed on existing Ising machines.
- [146] arXiv:2506.15692 (replaced) [pdf, html, other]
-
Title: MLE-STAR: Machine Learning Engineering Agent via Search and Targeted RefinementSubjects: Machine Learning (cs.LG)
Agents based on large language models (LLMs) for machine learning engineering (MLE) can automatically implement ML models via code generation. However, existing approaches to build such agents often rely heavily on inherent LLM knowledge and employ coarse exploration strategies that modify the entire code structure at once. This limits their ability to select effective task-specific models and perform deep exploration within specific components, such as experimenting extensively with feature engineering options. To overcome these, we propose MLE-STAR, a novel approach to build MLE agents. MLE-STAR first leverages external knowledge by using a search engine to retrieve effective models from the web, forming an initial solution, then iteratively refines it by exploring various strategies targeting specific ML components. This exploration is guided by ablation studies analyzing the impact of individual code blocks. Furthermore, we introduce a novel ensembling method using an effective strategy suggested by MLE-STAR. Our experimental results show that MLE-STAR achieves medals in 64% of the Kaggle competitions on the MLE-bench Lite, significantly outperforming the best alternative.
- [147] arXiv:2506.17247 (replaced) [pdf, html, other]
-
Title: Recursive Learning-Based Virtual Buffering for Analytical Global PlacementSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Due to the skewed scaling of interconnect versus cell delay in modern technology nodes, placement with buffer porosity (i.e., cell density) awareness is essential for timing closure in physical synthesis flows. However, existing approaches face two key challenges: (i) traditional van Ginneken-Lillis-style buffering approaches are computationally expensive during global placement; and (ii) machine learning-based approaches, such as BufFormer, lack a thorough consideration of Electrical Rule Check (ERC) violations and fail to "close the loop" back into the physical design flow. In this work, we propose MLBuf-RePlAce, the first open-source learning-driven virtual buffering-aware analytical global placement framework, built on top of the OpenROAD infrastructure. MLBuf-RePlAce adopts an efficient recursive learning-based generative buffering approach to predict buffer types and locations, addressing ERC violations during global placement. We compare MLBuf-RePlAce against the default virtual buffering-based timing-driven global placer in OpenROAD, using open-source testcases from the TILOS MacroPlacement and OpenROAD-flow-scripts repositories. Without degradation of post-route power, MLBuf-RePlAce achieves (maximum, average) improvements of (56%, 31%) in total negative slack (TNS) within the open-source OpenROAD flow. When evaluated by completion in a commercial flow, MLBuf-RePlAce achieves (maximum, average) improvements of (53%, 28%) in TNS with an average of 0.2% improvement in post-route power.
- [148] arXiv:2507.07675 (replaced) [pdf, other]
-
Title: Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU NetworksComments: Incomplete citationsSubjects: Machine Learning (cs.LG)
We analyze the layerwise effective dimension (rank of the feature matrix) in fully-connected ReLU networks of finite width. Specifically, for a fixed batch of $m$ inputs and random Gaussian weights, we derive closed-form expressions for the expected rank of the \$m\times n\$ hidden activation matrices. Our main result shows that $\mathbb{E}[EDim(\ell)]=m[1-(1-2/\pi)^\ell]+O(e^{-c m})$ so that the rank deficit decays geometrically with ratio $1-2 / \pi \approx 0.3634$. We also prove a sub-Gaussian concentration bound, and identify the "revival" depths at which the expected rank attains local maxima. In particular, these peaks occur at depths $\ell_k^*\approx(k+1/2)\pi/\log(1/\rho)$ with height $\approx (1-e^{-\pi/2}) m \approx 0.79m$. We further show that this oscillatory rank behavior is a finite-width phenomenon: under orthogonal weight initialization or strong negative-slope leaky-ReLU, the rank remains (nearly) full. These results provide a precise characterization of how random ReLU layers alternately collapse and partially revive the subspace of input variations, adding nuance to prior work on expressivity of deep networks.
- [149] arXiv:2507.09252 (replaced) [pdf, html, other]
-
Title: TPP-SD: Accelerating Transformer Point Process Sampling with Speculative DecodingSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose TPP-SD, a novel approach that accelerates Transformer temporal point process (TPP) sampling by adapting speculative decoding (SD) techniques from language models. By identifying the structural similarities between thinning algorithms for TPPs and speculative decoding for language models, we develop an efficient sampling framework that leverages a smaller draft model to generate multiple candidate events, which are then verified by the larger target model in parallel. TPP-SD maintains the same output distribution as autoregressive sampling while achieving significant acceleration. Experiments on both synthetic and real datasets demonstrate that our approach produces samples from identical distributions as standard methods, but with 2-6$\times$ speedup. Our ablation studies analyze the impact of hyperparameters such as draft length and draft model size on sampling efficiency. TPP-SD bridges the gap between powerful Transformer TPP models and the practical need for rapid sequence sampling.
- [150] arXiv:2507.13762 (replaced) [pdf, html, other]
-
Title: MolPIF: A Parameter Interpolation Flow Model for Molecule GenerationYaowei Jin, Junjie Wang, Wenkai Xiang, Duanhua Cao, Dan Teng, Zhehuan Fan, Jiacheng Xiong, Xia Sheng, Chuanlong Zeng, Duo An, Mingyue Zheng, Shuangjia Zheng, Qian ShiSubjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Advances in deep learning for molecular generation show promise in accelerating drug discovery. Bayesian Flow Networks (BFNs) have recently shown impressive performance across diverse chemical tasks, with their success often ascribed to the paradigm of modeling in a low-variance parameter space. However, the Bayesian inference-based strategy imposes limitations on designing more flexible distribution transformation pathways, making it challenging to adapt to diverse data distributions and varied task requirements. Furthermore, the potential for simpler, more efficient parameter-space-based models is unexplored. To address this, we propose a novel Parameter Interpolation Flow model (named PIF) with detailed theoretical foundation, training, and inference procedures. We then develop MolPIF for structure-based drug design, demonstrating its superior performance across diverse metrics compared to baselines. This work validates the effectiveness of parameter-space-based generative modeling paradigm for molecules and offers new perspectives for model design.
- [151] arXiv:2507.15857 (replaced) [pdf, html, other]
-
Title: Diffusion Beats Autoregressive in Data-Constrained SettingsComments: Project Webpage: this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings-where training involves repeated passes over limited data-and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We interpret this advantage as implicit data augmentation: masked diffusion exposes the model to a diverse distribution of token orderings and prediction tasks, unlike AR's fixed left-to-right factorization. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. These results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm. Our code is available at: this https URL.
- [152] arXiv:2507.18376 (replaced) [pdf, html, other]
-
Title: A Comprehensive Review of Diffusion Models in Smart Agriculture: Progress, Applications, and ChallengesXing Hu, Haodong Chen, Qianqian Duan, Danfeng Hong, Ruijiao Li, Huiliang Shang, Linghua Jiang, Haima Yang, Dawei ZhangSubjects: Machine Learning (cs.LG)
With the global population growing and arable land resources becoming increasingly scarce,smart agriculture and precision agriculture have emerged as key directions for the future ofagricultural this http URL intelligence (AI) technologies, particularly deep learning models, have found widespread applications in areas such as crop monitoring and pest detection. As an emerging generative model, diffusion models have shown significant promise in tasks like agricultural image processing, data augmentation, and remote sensing. Compared to traditional generative adversarial networks (GANs), diffusion models offer superior training stability and generation quality, effectively addressing challenges such as limited agricultural data and imbalanced image samples. This paper reviews the latest advancements in the application of diffusion models in agriculture, focusing on their potential in crop pest and disease detection, remote sensing image enhancement, crop growth prediction, and agricultural resource management. Experimental results demonstrate that diffusion models significantly improve model accuracy and robustness in data augmentation, image generation, and denoising, especially in complex environments. Despite challenges related to computational efficiency and generalization capabilities, diffusion models are expected to play an increasingly important role in smart and precision agriculture as technology advances, providing substantial support for the sustainable development of global agriculture.
- [153] arXiv:2507.19095 (replaced) [pdf, other]
-
Title: GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering NetworkComments: The source code for this study is available at this https URLSubjects: Machine Learning (cs.LG)
Attributed graph clustering holds significant importance in modern data analysis. However, due to the complexity of graph data and the heterogeneity of node attributes, leveraging graph information for clustering remains challenging. To address this, we propose a novel deep graph clustering model, GCL-GCN, specifically designed to address the limitations of existing models in capturing local dependencies and complex structures when dealing with sparse and heterogeneous graph data. GCL-GCN introduces an innovative Graphormer module that combines centrality encoding and spatial relationships, effectively capturing both global and local information between nodes, thereby enhancing the quality of node representations. Additionally, we propose a novel contrastive learning module that significantly enhances the discriminative power of feature representations. In the pre-training phase, this module increases feature distinction through contrastive learning on the original feature matrix, ensuring more identifiable initial representations for subsequent graph convolution and clustering tasks. Extensive experimental results on six datasets demonstrate that GCL-GCN outperforms 14 advanced methods in terms of clustering quality and robustness. Specifically, on the Cora dataset, it improves ACC, NMI, and ARI by 4.94%, 13.01%, and 10.97%, respectively, compared to the primary comparison method MBN.
- [154] arXiv:2507.21004 (replaced) [pdf, html, other]
-
Title: Compositional Function Networks: A High-Performance Alternative to Deep Neural Networks with Built-in InterpretabilityComments: The project has been open sourced at Github (this https URL)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Deep Neural Networks (DNNs) deliver impressive performance but their black-box nature limits deployment in high-stakes domains requiring transparency. We introduce Compositional Function Networks (CFNs), a novel framework that builds inherently interpretable models by composing elementary mathematical functions with clear semantics. Unlike existing interpretable approaches that are limited to simple additive structures, CFNs support diverse compositional patterns -- sequential, parallel, and conditional -- enabling complex feature interactions while maintaining transparency. A key innovation is that CFNs are fully differentiable, allowing efficient training through standard gradient descent. We demonstrate CFNs' versatility across multiple domains, from symbolic regression to image classification with deep hierarchical networks. Our empirical evaluation shows CFNs achieve competitive performance against black-box models (96.24% accuracy on CIFAR-10) while outperforming state-of-the-art interpretable models like Explainable Boosting Machines. By combining the hierarchical expressiveness and efficient training of deep learning with the intrinsic interpretability of well-defined mathematical functions, CFNs offer a powerful framework for applications where both performance and accountability are paramount.
- [155] arXiv:2507.21197 (replaced) [pdf, other]
-
Title: AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical PredictionComments: 12 pages, 4 figuresSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Machine learning interpretation (MLI) has primarily been leveraged to build clinician trust and uncover actionable insights in EHRs. However, the intrinsic complexity and heterogeneity of EHR data limit its effectiveness in guiding subgroup-specific modeling. We propose AdaptHetero, a novel MLI-driven framework that transforms interpretability insights into actionable guidance for tailoring model training and evaluation across subpopulations within individual hospital systems. Evaluated on three large-scale EHR datasets: GOSSIS-1-eICU, WiDS, and MIMIC-IV, AdaptHetero consistently identifies heterogeneous model behaviors in predicting ICU mortality, in-hospital death, and hidden hypoxemia. By integrating SHAP-based interpretation and unsupervised clustering, the framework enhances the identification of clinically meaningful subgroup-specific characteristics, leading to improved predictive performance and optimized clinical deployment.
- [156] arXiv:2507.21433 (replaced) [pdf, html, other]
-
Title: MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache ReuseComments: 11 pages, 7 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large Reasoning Models (LRMs) have achieved significant advances in mathematical reasoning and formal logic tasks. However, their tendency to generate lengthy chain-of-thought sequences leads to substantial memory overhead during inference. We observe that LRMs frequently produce highly similar intermediate reasoning steps, which correspond to similar KV cache states across layers. Motivated by this observation, we propose MemShare, a novel KV cache management approach that effectively reduces memory overhead. MemShare employs a collaborative filtering algorithm to efficiently identify reusable KV cache blocks and enables zero copy cache reuse to significantly reduce memory overhead, improve throughput while maintaining accuracy. Experimental results demonstrate that MemShare delivers up to 84.79\% improvement in throughput while maintaining better accuracy compared to existing KV cache management methods.
- [157] arXiv:2507.22174 (replaced) [pdf, html, other]
-
Title: Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian TrafficSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reinforcement Learning (RL) has been widely used for packet routing in communication networks, but traditional RL methods rely on the Markov assumption that the current state contains all necessary information for decision-making. In reality, internet traffic is non-Markovian, and past states do influence routing performance. Moreover, common deep RL approaches use function approximators, such as neural networks, that do not model the spatial structure in network topologies. To address these shortcomings, we design a network environment with non-Markovian traffic and introduce a spatial-temporal RL (STRL) framework for packet routing. Our approach outperforms traditional baselines by more than 19% during training and 7% for inference despite a change in network topology.
- [158] arXiv:2507.22303 (replaced) [pdf, html, other]
-
Title: CS-SHRED: Enhancing SHRED for Robust Recovery of Spatiotemporal DynamicsComments: 30 pages, 7 figures, 13 tables. Code: this https URLSubjects: Machine Learning (cs.LG)
We present CS-SHRED, a novel deep learning architecture that integrates Compressed Sensing (CS) into a Shallow Recurrent Decoder (SHRED) to reconstruct spatiotemporal dynamics from incomplete, compressed, or corrupted data. Our approach introduces two key innovations. First, by incorporating CS techniques into the SHRED architecture, our method leverages a batch-based forward framework with $\ell_1$ regularization to robustly recover signals even in scenarios with sparse sensor placements, noisy measurements, and incomplete sensor acquisitions. Second, an adaptive loss function dynamically combines Mean Squared Error (MSE) and Mean Absolute Error (MAE) terms with a piecewise Signal-to-Noise Ratio (SNR) regularization, which suppresses noise and outliers in low-SNR regions while preserving fine-scale features in high-SNR regions.
We validate CS-SHRED on challenging problems including viscoelastic fluid flows, maximum specific humidity fields, sea surface temperature distributions, and rotating turbulent flows. Compared to the traditional SHRED approach, CS-SHRED achieves significantly higher reconstruction fidelity -- as demonstrated by improved SSIM and PSNR values, lower normalized errors, and enhanced LPIPS scores-thereby providing superior preservation of small-scale structures and increased robustness against noise and outliers.
Our results underscore the advantages of the jointly trained CS and SHRED design architecture which includes an LSTM sequence model for characterizing the temporal evolution with a shallow decoder network (SDN) for modeling the high-dimensional state space. The SNR-guided adaptive loss function for the spatiotemporal data recovery establishes CS-SHRED as a promising tool for a wide range of applications in environmental, climatic, and scientific data analyses. - [159] arXiv:2507.22633 (replaced) [pdf, html, other]
-
Title: H2Tune: Federated Foundation Model Fine-Tuning with Hybrid HeterogeneitySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Different from existing federated fine-tuning (FFT) methods for foundation models, hybrid heterogeneous federated fine-tuning (HHFFT) is an under-explored scenario where clients exhibit double heterogeneity in model architectures and downstream tasks. This hybrid heterogeneity introduces two significant challenges: 1) heterogeneous matrix aggregation, where clients adopt different large-scale foundation models based on their task requirements and resource limitations, leading to dimensional mismatches during LoRA parameter aggregation; and 2) multi-task knowledge interference, where local shared parameters, trained with both task-shared and task-specific knowledge, cannot ensure only task-shared knowledge is transferred between clients. To address these challenges, we propose H2Tune, a federated foundation model fine-tuning with hybrid heterogeneity. Our framework H2Tune consists of three key components: (i) sparsified triple matrix decomposition to align hidden dimensions across clients through constructing rank-consistent middle matrices, with adaptive sparsification based on client resources; (ii) relation-guided matrix layer alignment to handle heterogeneous layer structures and representation capabilities; and (iii) alternating task-knowledge disentanglement mechanism to decouple shared and specific knowledge of local model parameters through alternating optimization. Theoretical analysis proves a convergence rate of O(1/\sqrt{T}). Extensive experiments show our method achieves up to 15.4% accuracy improvement compared to state-of-the-art baselines. Our code is available at this https URL.
- [160] arXiv:2507.22789 (replaced) [pdf, other]
-
Title: G-Core: A Simple, Scalable and Balanced RLHF TrainerJunyu Wu, Weiming Chang, Xiaotao Liu, Guanyou He, Haoqiang Hong, Boqi Liu, Hongtao Tian, Tao Yang, Yunsheng Shi, Feng Lin, Ting YaoComments: I haven't received company approval yet, and I uploaded it by mistakeSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reinforcement Learning from Human Feedback (RLHF) has become an increasingly popular paradigm for training large language models (LLMs) and diffusion models. While existing RLHF training systems have enabled significant progress, they often face challenges in scaling to multi-modal and diffusion workflows and adapting to dynamic workloads. In particular, current approaches may encounter limitations in controller scalability, flexible resource placement, and efficient orchestration when handling complex RLHF pipelines, especially in scenarios involving dynamic sampling or generative reward modeling. In this paper, we present \textbf{G-Core}, a simple, scalable, and balanced RLHF training framework designed to address these challenges. G-Core introduces a parallel controller programming model, enabling flexible and efficient orchestration of complex RLHF workflows without the bottlenecks of a single centralized controller. Furthermore, we propose a dynamic placement schema that adaptively partitions resources and schedules workloads, significantly reducing hardware idle time and improving utilization, even under highly variable training conditions. G-Core has successfully trained models that support WeChat product features serving a large-scale user base, demonstrating its effectiveness and robustness in real-world scenarios. Our results show that G-Core advances the state of the art in RLHF training, providing a solid foundation for future research and deployment of large-scale, human-aligned models.
- [161] arXiv:2304.01430 (replaced) [pdf, html, other]
-
Title: Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated SlotsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We investigate the emergence of objects in visual perception in the absence of any semantic annotation. The resulting model has received no supervision, does not use any pre-trained features, and yet it can segment the domain of an image into multiple independently moving regions. The resulting motion segmentation method can handle an unknown and varying number of objects in real-time. The core multi-modal conditional encoder-decoder architecture has one modality (optical flow) feed the encoder to produce a collection of latent codes (slots), and the other modality (color image) conditions the decoder to generate the first modality (flow) from the slots. The training criterion is designed to foster 'information separation' among the slots, while the architecture explicitly allocates activations to individual slots, leading to a method we call Divided Attention (DivA). At test time, DivA handles a different number of objects and different image resolution than seen at training, and is invariant to permutations of the slots. DivA achieves state-of-the-art performance while tripling the runtime speed of comparable methods, up to 104 FPS, and reduces the performance gap from supervised methods to 12% or less. Objects bootstrapped by DivA can then be used to prime static classifiers via contrastive learning. On fewer than 5,000 video clips, training DINO on DivA's object proposals narrows the performance gap to ImageNet-based training by up to 30.2% compared to training directly on the video frames.
- [162] arXiv:2312.14057 (replaced) [pdf, html, other]
-
Title: Weighted least-squares approximation with determinantal point processes and generalized volume samplingComments: Compared with the first version, conjectures (13) on DPP and (16) on volume sampling have been modified, including a convexity requirement. Proofs of propositions 5.4 and 5.13 have been modified accordingly. Remarks 5.5 and 5.6 have been added to discuss alternatives to conjecture (13) on DPPSubjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Statistics Theory (math.ST)
We consider the problem of approximating a function from $L^2$ by an element of a given $m$-dimensional space $V_m$, associated with some feature map $\boldsymbol{\varphi}$, using evaluations of the function at random points $x_1, \dots,x_n$. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPP) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features $\boldsymbol{\varphi}(x_i)$. We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples $n = O(m\log(m))$, that means that the expected $L^2$ error is bounded by a constant times the best approximation error in $L^2$. Also, further assuming that the function is in some normed vector space $H$ continuously embedded in $L^2$, we further prove that the approximation error in $L^2$ is almost surely bounded by the best approximation error measured in the $H$-norm. This includes the cases of functions from $L^\infty$ or reproducing kernel Hilbert spaces. Finally, we present an alternative strategy consisting in using independent repetitions of projection DPP (or volume sampling), yielding similar error bounds as with i.i.d. or volume sampling, but in practice with a much lower number of samples. Numerical experiments illustrate the performance of the different strategies.
- [163] arXiv:2404.09363 (replaced) [pdf, html, other]
-
Title: Momentum-based gradient descent methods for Lie groupsComments: 22 pages, 2 algorithms, 6 figuresSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Differential Geometry (math.DG); Numerical Analysis (math.NA)
Polyak's Heavy Ball (PHB; Polyak, 1964), a.k.a. Classical Momentum, and Nesterov's Accelerated Gradient (NAG; Nesterov, 1983) are well-established momentum-descent methods for optimization. Although the latter generally outperforms the former, primarily, generalizations of PHB-like methods to nonlinear spaces have not been sufficiently explored in the literature. In this paper, we propose a generalization of NAG-like methods for Lie group optimization. This generalization is based on the variational one-to-one correspondence between classical and accelerated momentum methods (Campos et al., 2023). We provide numerical experiments for chosen retractions on the group of rotations based on the Frobenius norm and the Rosenbrock function to demonstrate the effectiveness of our proposed methods, and that align with results of the Euclidean case, that is, a faster convergence rate for NAG.
- [164] arXiv:2407.08722 (replaced) [pdf, html, other]
-
Title: Controlling diverse robots by inferring Jacobian fields with deep networksComments: Project Page: this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have greatly expanded the feasible hardware, but using these systems requires control software to translate the desired motions into actuator commands. Conventional robots can easily be modeled as rigid links connected by joints, but it remains an open challenge to model and control biologically inspired robots that are often soft or made of several materials, lack sensing capabilities, and may change their material properties with use. Here, we introduce a method that uses deep neural networks to map a video stream of a robot to its visuomotor Jacobian field (the sensitivity of all 3D points to the robot's actuators). Our method enables the control of robots from only a single camera, makes no assumptions about the robots' materials, actuation, or sensing, and is trained without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators that vary in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. Because it enables robot control using a generic camera as the only sensor, we anticipate that our work will broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.
- [165] arXiv:2408.02123 (replaced) [pdf, html, other]
-
Title: FovEx: Human-Inspired Explanations for Vision Transformers and Convolutional Neural NetworksComments: Accepted in the International Journal of Computer Vision (Springer Nature)Journal-ref: Int J Comput Vis (2025)Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Explainability in artificial intelligence (XAI) remains a crucial aspect for fostering trust and understanding in machine learning models. Current visual explanation techniques, such as gradient-based or class-activation-based methods, often exhibit a strong dependence on specific model architectures. Conversely, perturbation-based methods, despite being model-agnostic, are computationally expensive as they require evaluating models on a large number of forward passes. In this work, we introduce Foveation-based Explanations (FovEx), a novel XAI method inspired by human vision. FovEx seamlessly integrates biologically inspired perturbations by iteratively creating foveated renderings of the image and combines them with gradient-based visual explorations to determine locations of interest efficiently. These locations are selected to maximize the performance of the model to be explained with respect to the downstream task and then combined to generate an attribution map. We provide a thorough evaluation with qualitative and quantitative assessments on established benchmarks. Our method achieves state-of-the-art performance on both transformers (on 4 out of 5 metrics) and convolutional models (on 3 out of 5 metrics), demonstrating its versatility among various architectures. Furthermore, we show the alignment between the explanation map produced by FovEx and human gaze patterns (+14\% in NSS compared to RISE, +203\% in NSS compared to GradCAM). This comparison enhances our confidence in FovEx's ability to close the interpretation gap between humans and machines.
- [166] arXiv:2408.03351 (replaced) [pdf, html, other]
-
Title: Quantum Transfer Learning for MNIST Classification Using a Hybrid Quantum-Classical ApproachSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
We implement a hybrid quantum-classical model for image classification that compresses MNIST digit images into a low-dimensional feature space and then maps these features onto a 5-qubit quantum state. First, an autoencoder compresses each $28\times28$ image (784 pixels) into a 64-dimensional latent vector, preserving salient features of the digit with minimal reconstruction error. We further reduce the latent representation to 5 principal components using Principal Component Analysis (PCA), to match the 5 available qubits. These 5 features are encoded as rotation angles in a quantum circuit with 5 qubits. The quantum feature map applies single-qubit rotations ($R_y$ gates) proportional to the feature values, followed by a Hadamard gate and a cascade of entangling CNOT gates to produce a non-product entangled state. Measuring the 5-qubit state yields a 32-dimensional probability distribution over basis outcomes, which serves as a quantum-enhanced feature vector for classification. A classical neural network with a softmax output is then trained on these 32-dimensional quantum feature vectors to predict the digit class. We evaluate the hybrid model on the MNIST dataset and compare it to a purely classical baseline that uses the 64-dimensional autoencoder latent features for classification. The results show that the hybrid model can successfully classify digits, demonstrating the feasibility of integrating quantum computing in the classification pipeline, although its accuracy (about 75\% on test data) currently falls below the classical baseline (about 98\% on the same compressed data).
- [167] arXiv:2408.12319 (replaced) [pdf, html, other]
-
Title: Neural-ANOVA: Analytical Model Decomposition using Automatic IntegrationComments: 6 pages, 3 figures, 3 tables, accepted for publication at MLSP 2025Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The analysis of variance (ANOVA) decomposition offers a systematic method to understand the interaction effects that contribute to a specific decision output. In this paper we introduce Neural-ANOVA, an approach to decompose neural networks into the sum of lower-order models using the functional ANOVA decomposition. Our approach formulates a learning problem, which enables fast analytical evaluation of integrals over subspaces that appear in the calculation of the ANOVA decomposition. Finally, we conduct numerical experiments to provide insights into the approximation properties compared to other regression approaches from the literature.
- [168] arXiv:2410.02744 (replaced) [pdf, html, other]
-
Title: Neutral Residues: Revisiting Adapters for Model ExtensionComments: Accepted at ICML 2025Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We address the problem of extending a pretrained large language model to a new domain that was not seen during training. Standard techniques, such as finetuning or low-rank adaptation (LoRA) are successful at domain adaptation, but do not formally add capacity to the model. This often leads to a trade-off, between performing well on the new domain vs. degrading performance on the original domain. Here, we revisit and improve adapters to extend LLMs from three angles: data, architecture and training procedure, which are advantageously considered jointly. The resulting method, called neutral residues, modifies adapters in a way that leads each new residual block to output near-zeros on the original domain. This solution leads to strong results when adapting a state-of-the-art model originally trained on English to a new language. Neutral residues significantly outperform competing approaches such as finetuning, LoRA or vanilla adapters in terms of the trade-off between learning the new language and not forgetting English.
- [169] arXiv:2410.03094 (replaced) [pdf, html, other]
-
Title: Entanglement-induced provable and robust quantum learning advantagesComments: 7 pages, 2 figures + 13-page supplementary materialsJournal-ref: npj Quantum Information 11, 127 (2025)Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Machine Learning (cs.LG)
Quantum computing holds unparalleled potentials to enhance machine learning. However, a demonstration of quantum learning advantage has not been achieved so far. We make a step forward by rigorously establishing a noise-robust, unconditional quantum learning advantage in expressivity, inference speed, and training efficiency, compared to commonly-used classical models. Our proof is information-theoretic and pinpoints the origin of this advantage: entanglement can be used to reduce the communication required by non-local tasks. In particular, we design a task that can be solved with certainty by quantum models with a constant number of parameters using entanglement, whereas commonly-used classical models must scale linearly to achieve a larger-than-exponentially-small accuracy. We show that the quantum model is trainable with constant resources and robust against constant noise. Through numerical and trapped-ion experiments on IonQ Aria, we demonstrate the desired advantage. Our results provide valuable guidance for demonstrating quantum learning advantages with current noisy intermediate-scale devices.
- [170] arXiv:2410.16593 (replaced) [pdf, html, other]
-
Title: Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic GraphsSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Graph Neural Networks (GNNs) excel in many graph machine learning tasks but face challenges when scaling to large networks. GNN transferability allows training on smaller graphs and applying the model to larger ones, but existing methods often rely on random subsampling, leading to disconnected subgraphs and reduced model expressivity. We propose a novel graph sampling algorithm that leverages feature homophily to preserve graph structure. By minimizing the trace of the data correlation matrix, our method better preserves the graph Laplacian trace -- a proxy for the graph connectivity -- than random sampling, while achieving lower complexity than spectral methods. Experiments on citation networks show improved performance in preserving Laplacian trace and GNN transferability compared to random sampling.
- [171] arXiv:2412.04502 (replaced) [pdf, other]
-
Title: Physics-informed Gaussian Processes as Linear Model Predictive ControllerComments: Accepted at L4DC 2025Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)
We introduce a novel algorithm for controlling linear time invariant systems in a tracking problem. The controller is based on a Gaussian Process (GP) whose realizations satisfy a system of linear ordinary differential equations with constant coefficients. Control inputs for tracking are determined by conditioning the prior GP on the setpoints, i.e. control as inference. The resulting Model Predictive Control scheme incorporates pointwise soft constraints by introducing virtual setpoints to the posterior Gaussian process. We show theoretically that our controller satisfies open-loop stability for the optimal control problem by leveraging general results from Bayesian inference and demonstrate this result in a numerical example.
- [172] arXiv:2412.06946 (replaced) [pdf, html, other]
-
Title: A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole WaveformsOsvaldo Gramaxo Freitas, Anastasios Theodoropoulos, Nino Villanueva, Tiago Fernandes, Solange Nunes, José A. Font, Antonio Onofre, Alejandro Torres-Forné, José D. Martin-GuerreroSubjects: General Relativity and Quantum Cosmology (gr-qc); High Energy Astrophysical Phenomena (astro-ph.HE); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
Gravitational-wave approximants are essential for gravitational-wave astronomy, allowing the coverage binary black hole parameter space for inference or match filtering without costly numerical relativity (NR) simulations, but generally trading some accuracy for computational efficiency. To reduce this trade-off, NR surrogate models can be constructed using interpolation within NR waveform space. We present a 2-stage training approach for neural network-based NR surrogate models. Initially trained on approximant-generated waveforms and then fine-tuned with NR data, these dual-stage artificial neural surrogate (\texttt{DANSur}) models offer rapid and competitively accurate waveform generation, generating millions in under 20ms on a GPU while keeping mean mismatches with NR around $10^{-4}$. Implemented in the \textsc{bilby} framework, we show they can be used for parameter estimation tasks.
- [173] arXiv:2412.15441 (replaced) [pdf, html, other]
-
Title: Insights into resource utilization of code small language models serving with runtime engines and execution providersComments: Accepted in Journal of Systems and Software (JSS). For its published version refer to the Journal of JSSSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The rapid growth of language models, particularly in code generation, requires substantial computational resources, raising concerns about energy consumption and environmental impact. Optimizing language models inference resource utilization is crucial, and Small Language Models (SLMs) offer a promising solution to reduce resource demands. Our goal is to analyze the impact of deep learning serving configurations, defined as combinations of runtime engines and execution providers, on resource utilization, in terms of energy consumption, execution time, and computing-resource utilization from the point of view of software engineers conducting inference in the context of code generation SLMs. We conducted a technology-oriented, multi-stage experimental pipeline using twelve code generation SLMs to investigate energy consumption, execution time, and computing-resource utilization across the configurations. Significant differences emerged across configurations. CUDA execution provider configurations outperformed CPU execution provider configurations in both energy consumption and execution time. Among the configurations, TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings from 37.99% up to 89.16% compared to other serving configurations. Similarly, optimized runtime engines like ONNX with the CPU execution provider achieved from 8.98% up to 72.04% energy savings within CPU-based configurations. Also, TORCH paired with CUDA exhibited efficient computing-resource utilization. Serving configuration choice significantly impacts resource utilization. While further research is needed, we recommend the above configurations best suited to software engineers' requirements for enhancing serving resource utilization efficiency.
- [174] arXiv:2502.15215 (replaced) [pdf, html, other]
-
Title: Tensor Product Neural Networks for Functional ANOVA ModelComments: 45 pagesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Interpretability for machine learning models is becoming more and more important as machine learning models become more complex. The functional ANOVA model, which decomposes a high-dimensional function into a sum of lower dimensional functions (commonly referred to as components), is one of the most popular tools for interpretable AI, and recently, various neural networks have been developed for estimating each component in the functional ANOVA model. However, such neural networks are highly unstable when estimating each component since the components themselves are not uniquely defined. That is, there are multiple functional ANOVA decompositions for a given function. In this paper, we propose a novel neural network which guarantees a unique functional ANOVA decomposition and thus is able to estimate each component stably and accurately. We call our proposed neural network ANOVA Tensor Product Neural Network (ANOVA-TPNN) since it is motivated by the tensor product basis expansion. Theoretically, we prove that ANOVA-TPNN can approximate any smooth function well. Empirically, we show that ANOVA-TPNN provide much more stable estimation of each component and thus much more stable interpretation when training data and initial values of the model parameters vary than existing neural networks do.
- [175] arXiv:2502.17482 (replaced) [pdf, html, other]
-
Title: MVCNet: Multi-View Contrastive Network for Motor Imagery ClassificationComments: 12 pages, 9 figuresSubjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Electroencephalography (EEG)-based brain-computer interfaces (BCIs) enable neural interaction by decoding brain activity for external communication. Motor imagery (MI) decoding has received significant attention due to its intuitive mechanism. However, most existing models rely on single-stream architectures and overlook the multi-view nature of EEG signals, leading to limited performance and generalization. We propose a multi-view contrastive network (MVCNet), a dual-branch architecture that parallelly integrates CNN and Transformer blocks to capture both local spatial-temporal features and global temporal dependencies. To enhance the informativeness of training data, MVCNet incorporates a unified augmentation pipeline across time, frequency, and spatial domains. Two contrastive modules are further introduced: a cross-view contrastive module that enforces consistency of original and augmented views, and a cross-model contrastive module that aligns features extracted from both branches. Final representations are fused and jointly optimized by contrastive and classification losses. Experiments on five public MI datasets across three scenarios demonstrate that MVCNet consistently outperforms nine state-of-the-art MI decoding networks, highlighting its effectiveness and generalization ability. MVCNet provides a robust solution for MI decoding by integrating multi-view information and dual-branch modeling, contributing to the development of more reliable BCI systems.
- [176] arXiv:2502.20632 (replaced) [pdf, html, other]
-
Title: Lattice Protein Folding with Variational AnnealingSubjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Understanding the principles of protein folding is a cornerstone of computational biology, with implications for drug design, bioengineering, and the understanding of fundamental biological processes. Lattice protein folding models offer a simplified yet powerful framework for studying the complexities of protein folding, enabling the exploration of energetically optimal folds under constrained conditions. However, finding these optimal folds is a computationally challenging combinatorial optimization problem. In this work, we introduce a novel upper-bound training scheme that employs masking to identify the lowest-energy folds in two-dimensional Hydrophobic-Polar (HP) lattice protein folding. By leveraging Dilated Recurrent Neural Networks (RNNs) integrated with an annealing process driven by temperature-like fluctuations, our method accurately predicts optimal folds for benchmark systems of up to 60 beads. Our approach also effectively masks invalid folds from being sampled without compromising the autoregressive sampling properties of RNNs. This scheme is generalizable to three spatial dimensions and can be extended to lattice protein models with larger alphabets. Our findings emphasize the potential of advanced machine learning techniques in tackling complex protein folding problems and a broader class of constrained combinatorial optimization challenges.
- [177] arXiv:2503.15897 (replaced) [pdf, html, other]
-
Title: Learning 3D Scene Analogies with Neural Contextual Scene MapsComments: Accepted to ICCV 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Understanding scene contexts is crucial for machines to perform tasks and adapt prior knowledge in unseen or noisy 3D environments. As data-driven learning is intractable to comprehensively encapsulate diverse ranges of layouts and open spaces, we propose teaching machines to identify relational commonalities in 3D spaces. Instead of focusing on point-wise or object-wise representations, we introduce 3D scene analogies, which are smooth maps between 3D scene regions that align spatial relationships. Unlike well-studied single instance-level maps, these scene-level maps smoothly link large scene regions, potentially enabling unique applications in trajectory transfer in AR/VR, long demonstration transfer for imitation learning, and context-aware object rearrangement. To find 3D scene analogies, we propose neural contextual scene maps, which extract descriptor fields summarizing semantic and geometric contexts, and holistically align them in a coarse-to-fine manner for map estimation. This approach reduces reliance on individual feature points, making it robust to input noise or shape variations. Experiments demonstrate the effectiveness of our approach in identifying scene analogies and transferring trajectories or object placements in diverse indoor scenes, indicating its potential for robotics and AR/VR applications. Project page including the code is available through this link: this https URL.
- [178] arXiv:2503.17564 (replaced) [pdf, html, other]
-
Title: ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital PathologySubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, current methods under-utilize shared information between tasks and modalities. To overcome this challenge, we propose ModalTune, a novel fine-tuning framework which introduces the Modal Adapter to integrate new modalities without modifying SLFM weights. Additionally, we use large-language models (LLMs) to encode labels as text, capturing semantic relationships across multiple tasks and cancer types in a single training recipe. ModalTune achieves state-of-the-art (SOTA) results against both uni-modal and multi-modal models across four cancer types, jointly improving survival and cancer subtype prediction while remaining competitive in pan-cancer settings. Additionally, we show ModalTune is generalizable to two out-of-distribution (OOD) datasets. To our knowledge, this is the first unified fine-tuning framework for multi-modal, multi-task, and pan-cancer modeling in digital pathology.
- [179] arXiv:2503.18711 (replaced) [pdf, other]
-
Title: Accenture-NVS1: A Novel View Synthesis DatasetThomas Sugg, Kyle O'Brien, Lekh Poudel, Alex Dumouchelle, Michelle Jou, Marc Bosch, Deva Ramanan, Srinivasa Narasimhan, Shubham TulsianiComments: 6 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This paper introduces ACC-NVS1, a specialized dataset designed for research on Novel View Synthesis specifically for airborne and ground imagery. Data for ACC-NVS1 was collected in Austin, TX and Pittsburgh, PA in 2023 and 2024. The collection encompasses six diverse real-world scenes captured from both airborne and ground cameras, resulting in a total of 148,000 images. ACC-NVS1 addresses challenges such as varying altitudes and transient objects. This dataset is intended to supplement existing datasets, providing additional resources for comprehensive research, rather than serving as a benchmark.
- [180] arXiv:2504.05422 (replaced) [pdf, html, other]
-
Title: EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial RepresentationsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
As the prediction horizon increases, predicting the future evolution of traffic scenes becomes increasingly difficult due to the multi-modal nature of agent motion. Most state-of-the-art (SotA) prediction models primarily focus on forecasting the most likely future. However, for the safe operation of autonomous vehicles, it is equally important to cover the distribution for plausible motion alternatives. To address this, we introduce EP-Diffuser, a novel parameter-efficient diffusion-based generative model designed to capture the distribution of possible traffic scene evolutions. Conditioned on road layout and agent history, our model acts as a predictor and generates diverse, plausible scene continuations. We benchmark EP-Diffuser against two SotA models in terms of accuracy and plausibility of predictions on the Argoverse 2 dataset. Despite its significantly smaller model size, our approach achieves both highly accurate and plausible traffic scene predictions. We further evaluate model generalization ability in an out-of-distribution (OoD) test setting using Waymo Open dataset and show superior robustness of our approach. The code and model checkpoints are available at: this https URL.
- [181] arXiv:2504.11952 (replaced) [pdf, html, other]
-
Title: Robust and Fine-Grained Detection of AI Generated TextsRam Mohan Rao Kadiyala, Siddartha Pullakhandam, Kanwal Mehreen, Drishti Sharma, Siddhant Gupta, Jebish Purbey, Ashay Srivastava, Subhasya TippaReddy, Arvind Reddy Bobbili, Suraj Telugara Chandrashekhar, Modabbir Adeeb, Srinadh Vura, Suman Debnath, Hamza FarooqComments: 18 pages, 6 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
An ideal detection system for machine generated content is supposed to work well on any generator as many more advanced LLMs come into existence day by day. Existing systems often struggle with accurately identifying AI-generated content over shorter texts. Further, not all texts might be entirely authored by a human or LLM, hence we focused more over partial cases i.e human-LLM co-authored texts. Our paper introduces a set of models built for the task of token classification which are trained on an extensive collection of human-machine co-authored texts, which performed well over texts of unseen domains, unseen generators, texts by non-native speakers and those with adversarial inputs. We also introduce a new dataset of over 2.4M such texts mostly co-authored by several popular proprietary LLMs over 23 languages. We also present findings of our models' performance over each texts of each domain and generator. Additional findings include comparison of performance against each adversarial method, length of input texts and characteristics of generated texts compared to the original human authored texts.
- [182] arXiv:2504.13201 (replaced) [pdf, html, other]
-
Title: CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept RotationSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Large Language Models (LLMs) are increasingly becoming the cognitive core of Embodied Intelligence (EI) systems, such as robots and autonomous vehicles. However, this integration also exposes them to serious jailbreak risks, where malicious instructions can be transformed into dangerous physical actions. Existing defense mechanisms suffer from notable drawbacks--including high training costs, significant inference delays, and complex hyperparameter tuning--which limit their practical applicability. To address these challenges, we propose a novel and efficient inference-time defense framework: Concept Enhancement Engineering (CEE). CEE enhances the model's inherent safety mechanisms by directly manipulating its internal representations, requiring neither additional training nor external modules, thereby improving defense efficiency. Furthermore, CEE introduces a rotation-based control mechanism that enables stable and linearly tunable behavioral control of the model. This design eliminates the need for tedious manual tuning and avoids the output degradation issues commonly observed in other representation engineering methods. Extensive experiments across multiple EI safety benchmarks and diverse attack scenarios demonstrate that CEE significantly improves the defense success rates of various multimodal LLMs. It effectively mitigates safety risks while preserving high-quality generation and inference efficiency, offering a promising solution for deploying safer embodied intelligence systems.
- [183] arXiv:2505.05085 (replaced) [pdf, html, other]
-
Title: Learning dynamically inspired invariant subspaces for Koopman and transfer operator approximationComments: 23 pages, 13 figuresSubjects: Dynamical Systems (math.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA)
Transfer and Koopman operator methods offer a framework for representing complex, nonlinear dynamical systems via linear transformations, enabling a deeper understanding of the underlying dynamics. The spectra of these operators provide important insights into system predictability and emergent behaviour, although efficiently estimating them from data can be challenging. We approach this issue through the lens of general operator and representational learning, in which we approximate these linear operators using efficient finite-dimensional representations. Specifically, we machine-learn orthonormal basis functions that are dynamically tailored to the system. This learned basis provides a particularly accurate approximation of the operator's action as well as a nearly invariant finite-dimensional subspace. We illustrate our approach with examples that showcase the retrieval of spectral properties from the estimated operator, and emphasise the dynamically adaptive quality of the machine-learned basis.
- [184] arXiv:2505.19219 (replaced) [pdf, other]
-
Title: Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent PathfindingComments: 112 pages, 21 figures, 20 tables. The project website is: this https URLSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Combinatorics (math.CO)
Multi-Agent Path Finding (MAPF) is a fundamental problem in artificial intelligence and robotics, requiring the computation of collision-free paths for multiple agents navigating from their start locations to designated goals. As autonomous systems become increasingly prevalent in warehouses, urban transportation, and other complex environments, MAPF has evolved from a theoretical challenge to a critical enabler of real-world multi-robot coordination. This comprehensive survey bridges the long-standing divide between classical algorithmic approaches and emerging learning-based methods in MAPF research. We present a unified framework that encompasses search-based methods (including Conflict-Based Search, Priority-Based Search, and Large Neighborhood Search), compilation-based approaches (SAT, SMT, CSP, ASP, and MIP formulations), and data-driven techniques (reinforcement learning, supervised learning, and hybrid strategies). Through systematic analysis of experimental practices across 200+ papers, we uncover significant disparities in evaluation methodologies, with classical methods typically tested on larger-scale instances (up to 200 by 200 grids with 1000+ agents) compared to learning-based approaches (predominantly 10-100 agents). We provide a comprehensive taxonomy of evaluation metrics, environment types, and baseline selections, highlighting the need for standardized benchmarking protocols. Finally, we outline promising future directions including mixed-motive MAPF with game-theoretic considerations, language-grounded planning with large language models, and neural solver architectures that combine the rigor of classical methods with the flexibility of deep learning. This survey serves as both a comprehensive reference for researchers and a practical guide for deploying MAPF solutions in increasingly complex real-world applications.
- [185] arXiv:2505.20980 (replaced) [pdf, html, other]
-
Title: Identifying Super Spreaders in Multilayer NetworksSubjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
Identifying super-spreaders can be framed as a subtask of the influence maximisation problem. It seeks to pinpoint agents within a network that, if selected as single diffusion seeds, disseminate information most effectively. Multilayer networks, a specific class of heterogeneous graphs, can capture diverse types of interactions (e.g., physical-virtual or professional-social), and thus offer a more accurate representation of complex relational structures. In this work, we introduce a novel approach to identifying super-spreaders in such networks by leveraging graph neural networks. To this end, we construct a dataset by simulating information diffusion across hundreds of networks - to the best of our knowledge, the first of its kind tailored specifically to multilayer networks. We further formulate the task as a variation of the ranking prediction problem based on a four-dimensional vector that quantifies each agent's spreading potential: (i) the number of activations; (ii) the duration of the diffusion process; (iii) the peak number of activations; and (iv) the simulation step at which this peak occurs. Our model, TopSpreadersNetwork, comprises a relationship-agnostic encoder and a custom aggregation layer. This design enables generalisation to previously unseen data and adapts to varying graph sizes. In an extensive evaluation, we compare our model against classic centrality-based heuristics and competitive deep learning methods. The results, obtained across a broad spectrum of real-world and synthetic multilayer networks, demonstrate that TopSpreadersNetwork achieves superior performance in identifying high-impact nodes, while also offering improved interpretability through its structured output.
- [186] arXiv:2505.21567 (replaced) [pdf, other]
-
Title: EaqVLA: Encoding-aligned Quantization for Vision-Language-Action ModelsComments: There is an error in this paper, and as the author, I request retractionSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
With the development of Embodied Artificial intelligence, the end-to-end control policy such as Vision-Language-Action (VLA) model has become the mainstream. Existing VLA models faces expensive computing/storage cost, which need to be optimized. Quantization is considered as the most effective method which can not only reduce the memory cost but also achieve computation acceleration. However, we find the token alignment of VLA models hinders the application of existing quantization methods. To address this, we proposed an optimized framework called EaqVLA, which apply encoding-aligned quantization to VLA models. Specifically, we propose an complete analysis method to find the misalignment in various granularity. Based on the analysis results, we propose a mixed precision quantization with the awareness of encoding alignment. Experiments shows that the porposed EaqVLA achieves better quantization performance (with the minimal quantization loss for end-to-end action control and xxx times acceleration) than existing quantization methods.
- [187] arXiv:2506.10006 (replaced) [pdf, html, other]
-
Title: HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional ReconstructionComments: 8 pages,6 figures,3 tables,accepted by the 33rd ACM International Conference on Multimedia(ACM MM 2025)Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In breast cancer HER2 assessment, clinical evaluation relies on combined H&E and IHC images, yet acquiring both modalities is often hindered by clinical constraints and cost. We propose an adaptive bimodal prediction framework that flexibly supports single- or dual-modality inputs through two core innovations: a dynamic branch selector activating modality completion or joint inference based on input availability, and a cross-modal GAN (CM-GAN) enabling feature-space reconstruction of missing modalities. This design dramatically improves H&E-only accuracy from 71.44% to 94.25%, achieves 95.09% with full dual-modality inputs, and maintains 90.28% reliability under single-modality conditions. The "dual-modality preferred, single-modality compatible" architecture delivers near-dual-modality accuracy without mandatory synchronized acquisition, offering a cost-effective solution for resource-limited regions and significantly improving HER2 assessment accessibility.
- [188] arXiv:2507.01631 (replaced) [pdf, html, other]
-
Title: Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth ObservationComments: Accepted at ICCV 2025 Workshop 3D-VAST (From street to space: 3D Vision Across Altitudes). Our code will be made public after the conference at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Neural Radiance Fields (NeRF) have recently emerged as a paradigm for 3D reconstruction from multiview satellite imagery. However, state-of-the-art NeRF methods are typically constrained to small scenes due to the memory footprint during training, which we study in this paper. Previous work on large-scale NeRFs palliate this by dividing the scene into NeRFs. This paper introduces Snake-NeRF, a framework that scales to large scenes. Our out-of-core method eliminates the need to load all images and networks simultaneously, and operates on a single device. We achieve this by dividing the region of interest into NeRFs that 3D tile without overlap. Importantly, we crop the images with overlap to ensure each NeRFs is trained with all the necessary pixels. We introduce a novel $2\times 2$ 3D tile progression strategy and segmented sampler, which together prevent 3D reconstruction errors along the tile edges. Our experiments conclude that large satellite images can effectively be processed with linear time complexity, on a single GPU, and without compromise in quality.
- [189] arXiv:2507.07820 (replaced) [pdf, html, other]
-
Title: AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm ShiftSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Current AI advances largely rely on scaling neural models and expanding training datasets to achieve generalization and robustness. Despite notable successes, this paradigm incurs significant environmental, economic, and ethical costs, limiting sustainability and equitable access. Inspired by biological sensory systems, where adaptation occurs dynamically at the input (e.g., adjusting pupil size, refocusing vision)--we advocate for adaptive sensing as a necessary and foundational shift. Adaptive sensing proactively modulates sensor parameters (e.g., exposure, sensitivity, multimodal configurations) at the input level, significantly mitigating covariate shifts and improving efficiency. Empirical evidence from recent studies demonstrates that adaptive sensing enables small models (e.g., EfficientNet-B0) to surpass substantially larger models (e.g., OpenCLIP-H) trained with significantly more data and compute. We (i) outline a roadmap for broadly integrating adaptive sensing into real-world applications spanning humanoid, healthcare, autonomous systems, agriculture, and environmental monitoring, (ii) critically assess technical and ethical integration challenges, and (iii) propose targeted research directions, such as standardized benchmarks, real-time adaptive algorithms, multimodal integration, and privacy-preserving methods. Collectively, these efforts aim to transition the AI community toward sustainable, robust, and equitable artificial intelligence systems.
- [190] arXiv:2507.12911 (replaced) [pdf, other]
-
Title: LaViPlan : Language-Guided Visual Path Planning with RLVRComments: This paper has been withdrawn due to an internal institutional policy that prohibits preprint submissions to arXivSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
Out-of-distribution (OOD) scenarios in autonomous driving refer to situations that deviate from the training domain, often leading to unexpected and potentially hazardous behavior from planners that lack prior exposure to such cases. Recently, Vision-Language Models (VLMs) have been introduced into autonomous driving research for their promising generalization capabilities in OOD settings. Early studies demonstrated that VLMs could recognize OOD scenarios and generate user-level decisions such as "go straight" or "turn right." However, a new challenge has emerged due to the misalignment between the VLM's high-level decisions or visual reasoning expressed in language, and the low-level predicted trajectories interpreted as actions. In this paper, we propose LaViPlan, a framework that leverages Reinforcement Learning with Verifiable Rewards (RLVR) to optimize VLMs using planning-oriented metrics. This approach addresses the vision-language-action misalignment observed in existing VLMs fine-tuned via supervised learning, which can recognize driving scenarios but often produce context-unaware decisions. Experimental results demonstrate that our method improves situational awareness and decision-making under OOD conditions, highlighting its potential to mitigate the misalignment issue. This work introduces a promising post-training paradigm for VLM agents in the context of autonomous driving.
- [191] arXiv:2507.18675 (replaced) [pdf, other]
-
Title: Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent TasksSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language models, especially the transformer-based CLIP model, offer promising capabilities for generalizing action recognition from video data. In this work, we evaluate CLIP on the UCF-101 dataset and systematically analyze its performance under three masking strategies: (1) percentage-based and shape-based black masking at 10%, 30%, and 50%, (2) feature-specific masking to suppress bias-inducing elements, and (3) isolation masking that retains only class-specific regions. Our results reveal that CLIP exhibits inconsistent behavior and frequent misclassifications, particularly when essential visual cues are obscured. To overcome these limitations, we propose incorporating class-specific noise, learned via a custom loss function, to reinforce attention to class-defining features. This enhancement improves classification accuracy and model confidence while reducing bias. We conclude with a discussion on the challenges of applying such models in clinical domains and outline directions for future work to improve generalizability across domain-independent healthcare scenarios.
- [192] arXiv:2507.19060 (replaced) [pdf, html, other]
-
Title: PurpCode: Reasoning for Safer Code GenerationJiawei Liu, Nirav Diwan, Zhe Wang, Haoyu Zhai, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Muntasir Wahed, Yinlin Deng, Hadjer Benkraouda, Yuxiang Wei, Lingming Zhang, Ismini Lourentzou, Gang WangSubjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)
We introduce PurpCode, the first post-training recipe for training safe code reasoning models towards generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learning, which explicitly teaches the model to reference cybersafety rules to generate vulnerability-free code and to avoid facilitating malicious cyberactivities; and (ii) Reinforcement Learning, which optimizes model safety and preserves model utility through diverse, multi-objective reward mechanisms. To empower the training pipelines with comprehensive cybersafety data, we conduct internal red-teaming to synthesize comprehensive and high-coverage prompts based on real-world tasks for inducing unsafe cyberactivities in the model. Based on PurpCode, we develop a reasoning-based coding model, namely PurpCode-32B, which demonstrates state-of-the-art cybersafety, outperforming various frontier models. Meanwhile, our alignment method decreases the model overrefusal rates in both general and cybersafety-specific scenarios, while preserving model utility in both code generation and common security knowledge.
- [193] arXiv:2507.19079 (replaced) [pdf, other]
-
Title: SmartPNT-MSF: A Multi-Sensor Fusion Dataset for Positioning and Navigation ResearchSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
High-precision navigation and positioning systems are critical for applications in autonomous vehicles and mobile mapping, where robust and continuous localization is essential. To test and enhance the performance of algorithms, some research institutions and companies have successively constructed and publicly released datasets. However, existing datasets still suffer from limitations in sensor diversity and environmental coverage. To address these shortcomings and advance development in related fields, the SmartPNT Multisource Integrated Navigation, Positioning, and Attitude Dataset has been developed. This dataset integrates data from multiple sensors, including Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), optical cameras, and LiDAR, to provide a rich and versatile resource for research in multi-sensor fusion and high-precision navigation. The dataset construction process is thoroughly documented, encompassing sensor configurations, coordinate system definitions, and calibration procedures for both cameras and LiDAR. A standardized framework for data collection and processing ensures consistency and scalability, enabling large-scale analysis. Validation using state-of-the-art Simultaneous Localization and Mapping (SLAM) algorithms, such as VINS-Mono and LIO-SAM, demonstrates the dataset's applicability for advanced navigation research. Covering a wide range of real-world scenarios, including urban areas, campuses, tunnels, and suburban environments, the dataset offers a valuable tool for advancing navigation technologies and addressing challenges in complex environments. By providing a publicly accessible, high-quality dataset, this work aims to bridge gaps in sensor diversity, data accessibility, and environmental representation, fostering further innovation in the field.
- [194] arXiv:2507.19747 (replaced) [pdf, html, other]
-
Title: TokenBlowUp: Resolving Representational Singularities in LLM Token Spaces via Monoidal TransformationsSubjects: Algebraic Geometry (math.AG); Machine Learning (cs.LG)
Recent work has provided compelling evidence challenging the foundational manifold hypothesis for the token embedding spaces of Large Language Models (LLMs). These findings reveal the presence of geometric singularities around polysemous tokens, which can lead to representational instability. Existing methodologies, which presuppose a smooth data manifold, are ill-equipped to address such intrinsic structural flaws. In this paper, we formalize this problem in the language of scheme theory and propose a rigorous resolution by applying the scheme-theoretic blow-up at each singular point. This procedure replaces a singular point in the ambient affine scheme with its exceptional divisor, which we identify as a canonical geometric space -- a projective space of directions -- that houses the disambiguated semantic meanings of the token. This process of ``representational desingularization'' constructs a new geometric landscape for embeddings. We prove a formal theorem guaranteeing the geometric regularization of this new space, showing that the original pathologies are resolved. Finally, we outline the architectural implications of our framework, arguing for a paradigm shift from static look-ups to dynamic, geometrically-grounded computation.
- [195] arXiv:2507.21035 (replaced) [pdf, html, other]
-
Title: GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression AnalysisComments: 51 pages (13 pages for the main text, 9 pages for references, and 29 pages for the appendix)Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Genomics (q-bio.GN)
Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data.
On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at this https URL. - [196] arXiv:2507.21886 (replaced) [pdf, html, other]
-
Title: Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion PipelineSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Pain is a complex condition affecting a large portion of the population. Accurate and consistent evaluation is essential for individuals experiencing pain, and it supports the development of effective and advanced management strategies. Automatic pain assessment systems provide continuous monitoring and support clinical decision-making, aiming to reduce distress and prevent functional decline. This study has been submitted to the \textit{Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN)}. The proposed method introduces a pipeline that leverages respiration as the input signal and incorporates a highly efficient cross-attention transformer alongside a multi-windowing strategy. Extensive experiments demonstrate that respiration is a valuable physiological modality for pain assessment. Moreover, experiments revealed that compact and efficient models, when properly optimized, can achieve strong performance, often surpassing larger counterparts. The proposed multi-window approach effectively captures both short-term and long-term features, as well as global characteristics, thereby enhancing the model's representational capacity.
- [197] arXiv:2507.22581 (replaced) [pdf, html, other]
-
Title: Unveiling the Influence of Amplifying Language-Specific NeuronsInaya Rahmanisa, Lyzander Marciano Andrylie, Mahardika Krisna Ihsani, Alfan Farizki Wicaksono, Haryo Akbarianto Wibowo, Alham Fikri AjiComments: Our code and dataset are made available at this https URLSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Language-specific neurons in LLMs that strongly correlate with individual languages have been shown to influence model behavior by deactivating them. However, their role in amplification remains underexplored. This work investigates the effect of amplifying language-specific neurons through interventions across 18 languages, including low-resource ones, using three models primarily trained in different languages. We compare amplification factors by their effectiveness in steering to the target language using a proposed Language Steering Shift (LSS) evaluation score, then evaluate it on downstream tasks: commonsense reasoning (XCOPA, XWinograd), knowledge (Include), and translation (FLORES). The optimal amplification factors effectively steer output toward nearly all tested languages. Intervention using this factor on downstream tasks improves self-language performance in some cases but generally degrades cross-language results. These findings highlight the effect of language-specific neurons in multilingual behavior, where amplification can be beneficial especially for low-resource languages, but provides limited advantage for cross-lingual transfer.
- [198] arXiv:2507.22782 (replaced) [pdf, html, other]
-
Title: Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic PoliciesComments: 8 pagesSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme incorporating multi-headed attention mechanisms in both the actor and critic. This design facilitates dynamic, inter-agent communication, allowing agents to explicitly query teammates, thereby efficiently managing the exponential growth of joint-action spaces while ensuring a high degree of collaboration. We further introduce a penalized loss function which promotes diverse yet complementary roles among agents. We evaluate TAAC in a simulated soccer environment against benchmark algorithms representing other multi-agent paradigms, including Proximal Policy Optimization and Multi-Agent Actor-Attention-Critic. We find that TAAC exhibits superior performance and enhanced collaborative behaviors across a variety of metrics (win rates, goal differentials, Elo ratings, inter-agent connectivity, balanced spatial distributions, and frequent tactical interactions such as ball possession swaps).