Quantitative Biology

Showing new listings for Friday, 1 August 2025

Total of 28 entries

New submissions (showing 13 of 13 entries)

[1] arXiv:2507.22942 [pdf, other]
Title: Reproducibility and scientific interpretation in the age of AI: consilience in biological systematics, ecology, and molecular biology
Charles Morphy D. Santos, Luciana Campos Paulino, Michaella P. Andrade, Gabriel Tognella-Poccia, João Paulo Gois
Comments: 27 pages, 5 figures
Subjects: Other Quantitative Biology (q-bio.OT)

Achieving complete reproducibility in science, particularly in research fields such as biodiversity, is challenging due to analytical choices, bias and interpretation. Here, we examine examples of reproducibility in biological systematics, ecology, and molecular biology. To mitigate the impact of interpretation and analytical choices, Artificial Intelligence (AI) has provided potential tools. In the present work, while emphasizing the need for methodological rigor and transparency, we acknowledge the role of interpretation in activities such as coding biological characters and detecting morphological patterns in nature. We explore the opportunities and limitations associated with the synergy between big data and AI in molecular biology, emphasizing the need for a more comprehensive and integrated approach based on dataset quality and usefulness. We conclude by advocating for AI-based tools to assist biologists, reinforcing consilience as a criterion for scientific validity without hindering scientific progress.

[2] arXiv:2507.23030 [pdf, html, other]
Title: Place-cell heterogeneity underlies power-laws in hippocampal activity
John J. Briguglio, Jaesung Lee, Albert K. Lee, Vincent Hakim, Sandro Romani
Comments: 17 pages, 4 figures
Subjects: Neurons and Cognition (q-bio.NC)

Power-law scaling in coarse-grained data suggests critical dynamics, but the true source of this scaling often remains unclear. Here, we analyze neural activity recorded during spatial navigation, reproducing power-law scaling under a phenomenological renormalization group (PRG) procedure that clusters units by activity similarity. Such scaling was previously linked to criticality. Here, we show that the iterative nature of the procedure itself leads to the emergence of power laws when applied to heterogeneous, non-interacting units obeying spatially structured activity, without requiring critical interactions. Furthermore, the scaling exponents produced by heterogeneous non-interacting units match the observed exponents in recorded neural data. A simplified version of the PRG further reveals how heterogeneity smooths transitions across scales, mimicking critical behavior. The resulting exponents depend systematically on system and population size, predictions confirmed by subsampling the data.

[3] arXiv:2507.23038 [pdf, html, other]
Title: Investigation of a two-patch within-host model of hepatitis B viral infection
Keoni Castellano, Omar Saucedo, Stanca M. Ciupe
Subjects: Other Quantitative Biology (q-bio.OT)

Chronic infection with hepatitis B virus (HBV) can lead to formation of abnormal nodular structures within the liver. To address how changes in liver anatomy affect overall virus-host dynamics, we developed within-host ordinary differential equation models of two-patch hepatitis B infection, one that assumes irreversible and one that assumes reversible movement between nodular structures. We investigated the models analytically and numerically, and determined the contribution of patch susceptibility, immune responses, and virus movement to within-patch and whole-liver virus dynamics. We explored the structural and practical identifiability of the models by implementing a differential algebra approach and the Monte Carlo approach for a specific HBV data set. We determined conditions for viral clearance, viral localization, and systemic viral infection. Our study suggests that cell susceptibility to infection within nodular structures, the movement rate between patches, and the immune-mediated infected cell killing have the most influence on HBV dynamics. Our results can help inform intervention strategies.

[4] arXiv:2507.23048 [pdf, other]
Title: CARTEpigenoQC: A Quality Control Toolkit for CAR-T Single-Cell Epigenomic Data
Kaitao Lai
Comments: 5 pages, 2 figures, 2 tables
Subjects: Genomics (q-bio.GN)

CARTEpigenoQC is an R-based toolkit designed to streamline quality control (QC) for single-cell epigenomic datasets involving Chimeric Antigen Receptor (CAR)-engineered T cells. With the growing application of scATAC-seq, scCUT&Tag, and scBS-seq to characterize CAR-T cell states, it has become critical to perform customized QC that not only addresses standard metrics like FRiP (Fraction of Reads in Peaks) and TSS enrichment, but also directly detects signal from CAR vector insertion sites. CARTEpigenoQC supports both 10x Genomics and non-10x data formats and produces HTML and PNG summary outputs suited for exploratory analysis and regulatory-grade preclinical reporting. It is intended to assist researchers, core facilities, and translational immunologists in ensuring the validity of single-cell epigenomic profiling of engineered T cells.
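As a minimal illustration of the FRiP metric named in this abstract (not CARTEpigenoQC's own implementation, which is R-based and not shown in this listing), the Python sketch below computes the fraction of reads that overlap at least one peak on a single chromosome. The function names `merge_intervals` and `frip`, the half-open integer interval convention, and the toy coordinates are assumptions made for illustration.

```python
from bisect import bisect_right


def merge_intervals(peaks):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak (one chromosome).

    Reads and peaks are (start, end) pairs with integer, half-open coordinates.
    """
    if not reads:
        return 0.0
    merged = merge_intervals(peaks)
    starts = [s for s, _ in merged]
    in_peaks = 0
    for r_start, r_end in reads:
        # Index of the last merged peak that starts before the read ends.
        i = bisect_right(starts, r_end - 1) - 1
        # Merged peaks are disjoint, so only this one can overlap the read.
        if i >= 0 and merged[i][1] > r_start:
            in_peaks += 1
    return in_peaks / len(reads)


# Toy example with hypothetical coordinates.
toy_reads = [(0, 10), (15, 25), (40, 50), (100, 110)]
toy_peaks = [(5, 20), (18, 30), (105, 120)]
toy_frip = frip(toy_reads, toy_peaks)
```

In practice the same count would be made per chromosome and per cell barcode; the sketch omits both to stay short.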

[5] arXiv:2507.23056 [pdf, html, other]
Title: Phylogenetic network models as graphical models
Seth Sullivant
Comments: 21 pages, 7 figures
Subjects: Populations and Evolution (q-bio.PE); Combinatorics (math.CO); Statistics Theory (math.ST)

The displayed tree phylogenetic network model is shown to sit as a natural submodel of the graphical model associated to a directed acyclic graph (DAG). This representation allows us to derive a number of results about the displayed tree model. In particular, the concept of a local modification to a DAG model is developed and applied to the displayed tree model. As an application, some nonidentifiability issues related to the displayed tree models are highlighted as they relate to reticulation edges and stacked reticulations in the networks. We also derive rank conditions on flattenings of probability tensors for the displayed tree model, generalizing classic results for phylogenetic tree models.

[6] arXiv:2507.23101 [pdf, html, other]
Title: The Lipid Interactome: An interactive and open access platform for exploring cellular lipid-protein interactomes
Gaelen Guzman, André Nadler, Frank Stein, Jeremy M. Baskin, Carsten Schultz, Fikadu Tafesse
Comments: 10 pages
Subjects: Quantitative Methods (q-bio.QM)

Lipid-protein interactions play essential roles in cellular signaling and membrane dynamics, yet their systematic characterization has long been hindered by the inherent biochemical properties of lipids. Recent advances in functionalized lipid probes -- equipped with photoactivatable crosslinkers, affinity handles, and photocleavable protecting groups -- have enabled proteomics-based identification of lipid interacting proteins with unprecedented specificity and resolution. Despite the growing number of published lipid interactomes, there remains no centralized effort to harmonize, compare, or integrate these datasets.
The Lipid Interactome addresses this gap by providing a structured, interactive web portal that adheres to FAIR data principles -- ensuring that lipid interactome studies are Findable, Accessible, Interoperable, and Reusable. Through standardized data formatting, interactive visualizations, and direct cross-study comparisons, this resource enables researchers to systematically explore the protein-binding partners of diverse bioactive lipids. By consolidating and curating lipid interactome proteomics data from multiple studies, the Lipid Interactome database serves as a critical tool for deciphering the biological functions of lipids in cellular systems.

[7] arXiv:2507.23116 [pdf, html, other]
Title: Alpha-Z divergence unveils further distinct phenotypic traits of human brain connectivity fingerprint
Md Kaosar Uddin, Nghi Nguyen, Huajun Huang, Duy Duong-Tran, Jingyi Zheng
Subjects: Neurons and Cognition (q-bio.NC)

The accurate identification of individuals from functional connectomes (FCs) is critical for advancing individualized assessments in neuropsychiatric research. Traditional methods, such as Pearson's correlation, have limitations in capturing the complex, non-Euclidean geometry of FC data, leading to suboptimal identification performance. Recent developments have introduced geodesic distance as a more robust metric; however, its performance is highly sensitive to regularization choices, which vary by spatial scale and task condition. To address these challenges, we propose a novel divergence-based distance metric, the Alpha-Z Bures-Wasserstein divergence, which provides a more flexible and geometry-aware framework for FC comparison. Unlike prior methods, our approach does not require meticulous parameter tuning and maintains strong identification performance across multiple task conditions and spatial resolutions. We evaluate our approach against both traditional (e.g., Euclidean, Pearson) and state-of-the-art manifold-based distances (e.g., affine-invariant, log-Euclidean, Bures-Wasserstein), and systematically investigate how varying regularization strengths affect geodesic distance performance on the Human Connectome Project dataset. Our results show that the proposed method significantly improves identification rates over traditional and geodesic distances, particularly when optimized regularization is applied, and especially in high-dimensional settings where matrix rank deficiencies degrade existing metrics. We further validate its generalizability across resting-state and task-based fMRI, using multiple parcellation schemes. These findings suggest that the new divergence provides a more reliable and generalizable framework for functional connectivity analysis, offering enhanced sensitivity in linking FC patterns to cognitive and behavioral outcomes.

[8] arXiv:2507.23146 [pdf, html, other]
Title: Lightweight Language Models are Prone to Reasoning Errors for Complex Computational Phenotyping Tasks
Sarah Pungitore, Shashank Yadav, David Maughan, Vignesh Subbian
Subjects: Quantitative Methods (q-bio.QM)

Objective: Although computational phenotyping is a central informatics activity with resulting cohorts supporting a wide variety of applications, it is time-intensive because of manual data review. We previously assessed the ability of LLMs to perform computational phenotyping tasks using computable phenotypes for ARF respiratory support therapies. They successfully performed concept classification and classification of single-therapy phenotypes, but underperformed on multiple-therapy phenotypes. To understand issues with these complex tasks, we expanded PHEONA, a generalizable framework for evaluation of LLMs, to include methods specifically for evaluating faulty reasoning. Materials and Methods: We assessed the responses of three lightweight LLMs (DeepSeek-r1 32 billion, Mistral Small 24 billion, and Phi-4 14 billion) both with and without prompt modifications to identify explanation correctness and unfaithfulness errors for phenotyping. Results: For experiments without prompt modifications, both errors were present across all models, although more responses had explanation correctness errors than unfaithfulness errors. For experiments assessing accuracy impact after prompt modifications, DeepSeek, a reasoning model, had the smallest overall accuracy impact when compared to Mistral and Phi. Discussion: Since reasoning errors were ubiquitous across models, our enhancement of PHEONA to include a component for assessing faulty reasoning provides critical support for LLM evaluation and evidence for reasoning errors for complex tasks. While insights from reasoning errors can help prompt refinement, a deeper understanding of why LLM reasoning errors occur will likely require further development and refinement of interpretability methods. Conclusion: Reasoning errors were pervasive across LLM responses for computational phenotyping, a complex reasoning task.

[9] arXiv:2507.23481 [pdf, other]
Title: Factors controlling protein evolvability, at the molecular scale
Jorge A. Vila
Subjects: Populations and Evolution (q-bio.PE); Biomolecules (q-bio.BM)

This piece serves two purposes. First, it aims at elucidating the role of epistasis in shaping, at a molecular level, the evolutionary paths of proteins, as well as the extent to which these epistatic effects are the outcome of an as-yet-unidentified epistatic force. Second, it seeks to ascertain the extent to which the principle of least action will enable us to identify which of all potential trajectories has the highest evolutionary efficiency, as well as how variations in factors such as protein robustness and folding rates, resulting from the unavoidability of destabilizing mutations, might influence this critical evolutionary process. The initial findings suggest that protein evolution, at a molecular level, may be more predictable than previously thought, as epistasis and the principle of least action collectively impose constraints on evolutionary paths and trajectories, and consequently, on protein evolvability. Thus, this work should advance our understanding of the main molecular mechanisms that underlie the evolution of mutation-driven proteins and also provide grounds to answer a fundamental evolutionary question: how does Darwinian selection regard all potential trajectories available?

[10] arXiv:2507.23537 [pdf, other]
Title: Global, Regional, and National Burden of Chronic Kidney Disease Attributable to High Body Mass Index (BMI) among Individuals Aged 20-54 Years from 1990 to 2021: An Analysis of the Global Burden of Disease Study
Yu Chen, Guangxi Wu
Subjects: Populations and Evolution (q-bio.PE)

Background: Chronic kidney disease (CKD) is one of the most prevalent non-communicable health issues globally, and high body mass index plays a significant role in the onset and progression of chronic kidney disease. Methods: Data on the disease burden attributable to high body mass index were retrieved from the 2021 Global Burden of Disease, Injuries, and Risk Factors Study. The global cases, age-standardized mortality rate (ASMR), and age-standardized disability-adjusted life years (DALYs) rate (ASDR) attributable to high body mass index were estimated based on age, sex, geographic location, and the Socio-demographic Index (SDI). The estimated annual percentage change (EAPC) was calculated to quantify trends in ASMR and ASDR from 1990 to 2019. Decomposition and frontier analyses were conducted to understand the drivers behind changes in burden and to identify top-performing countries. Inequality analysis was performed to assess disparities in burden across different SDI levels. The Bayesian age-period-cohort model was used to predict the future disease burden. Results: In 2021, there were 4,643.41 global deaths and 2,514,227.16 DALYs attributable to high body mass index-related CKD, more than triple the figures from 1990. Additionally, from 1990 to 2021, the ASMR and ASDR accelerated, with EAPCs of 2.25 (95% CI: 2.13 to 2.37) and 1.98 (95% CI: 1.89 to 2.08), respectively, particularly among males, in High-income North America, and in Low-middle SDI regions. In terms of SDI, the Low-middle SDI region had the highest ASMR and ASDR related to CKD in 2021. Conclusion: From 1990 to 2021, there was a significant increase in global deaths and DALYs attributable to high body mass index-related CKD. As a major public health issue for CKD patients, high BMI urgently requires targeted measures to address it.
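For readers unfamiliar with the estimated annual percentage change used in this abstract, it is commonly obtained by fitting ln(rate) = alpha + beta * year by least squares and reporting 100 * (exp(beta) - 1). The Python sketch below follows that common definition; it is not taken from the paper, the function name `eapc` is hypothetical, the rates are synthetic, and the confidence interval (usually derived from the standard error of beta) is omitted.

```python
import math


def eapc(years, rates):
    """Estimated annual percentage change from a log-linear fit.

    Fits ln(rate) = alpha + beta * year by ordinary least squares and
    returns 100 * (exp(beta) - 1).
    """
    n = len(years)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two matched (year, rate) pairs")
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    beta = sxy / sxx  # slope of ln(rate) against calendar year
    return 100.0 * (math.exp(beta) - 1.0)


# Synthetic series (not GBD data): a rate growing by exactly 2% per year.
demo_years = list(range(1990, 2000))
demo_rates = [10.0 * 1.02 ** (y - 1990) for y in demo_years]
demo_eapc = eapc(demo_years, demo_rates)
```

A positive value indicates a rising age-standardized rate over the period, a negative value a falling one.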

[11] arXiv:2507.23636 [pdf, html, other]
Title: Household scale Wolbachia release strategies for effective dengue control
Abby Barlow, Ben Adams
Subjects: Populations and Evolution (q-bio.PE)

The release of Wolbachia-infected mosquitoes into Aedes aegypti infested areas is a promising strategy for localised eradication of dengue infection. Ae. aegypti mosquitoes favour urban environments as breeding habitats, so are often found in and around houses. Therefore, it is likely that they will infect members of the households that they reside around. Since population groupings within households are small, stochastic effects become important. Despite this, little work has been carried out to investigate the outcome of releasing Wolbachia-infected mosquitoes at a household scale, from either an empirical or a theoretical standpoint. In previous work, we developed and analysed a stochastic (continuous time Markov chain) model for the invasion of Wolbachia-infected mosquitoes into a single household containing a population of wildtype mosquitoes. In the present study, we extend our framework to a connected community of households coupled by the movement of mosquitoes. We use numerical results obtained via Gillespie's stochastic simulation algorithm to investigate optimal strategies for the release of Wolbachia-infected mosquitoes carried out at either the community or the household scale. We find that household scale releases can facilitate rapid and successful invasion of the Wolbachia-infected mosquitoes into the household population and then into the wider community. We further explore the impact of regular household scale releases of Wolbachia-infected mosquitoes for a range of compositions for the release population, time intervals between releases and proportion of households participating in the releases. We find that a single release household can provide sufficient protection to the entire community of households if releases are carried out frequently for a number of years and a sufficient number of females are released on each occasion.
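Gillespie's stochastic simulation algorithm, named in this abstract, can be sketched on a much simpler system than the authors' household model. The Python example below simulates a generic linear birth-death process with the direct method; it is not the Wolbachia model, and the function name `gillespie_birth_death` and all rate values are hypothetical choices for illustration.

```python
import random


def gillespie_birth_death(n0, birth, death, t_max, rng):
    """Simulate a linear birth-death process with Gillespie's direct method.

    Each individual gives birth at per-capita rate `birth` and dies at
    per-capita rate `death`. Returns the list of (time, population) states.
    """
    t, n = 0.0, n0
    path = [(t, n)]
    while n > 0:
        birth_rate = birth * n
        death_rate = death * n
        total = birth_rate + death_rate
        if total == 0:
            break  # no event can occur
        # The waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Pick the event with probability proportional to its rate.
        if rng.random() * total < birth_rate:
            n += 1
        else:
            n -= 1
        path.append((t, n))
    return path


# Hypothetical parameters: 10 individuals, deaths twice as likely as births.
demo_path = gillespie_birth_death(10, 0.5, 1.0, 5.0, random.Random(0))
```

A household-scale model would track several population classes per household and add movement events between households, but the loop structure (draw a waiting time, then draw which event fires) stays the same.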

[12] arXiv:2507.23678 [pdf, other]
Title: A complex network perspective on brain disease
David Papo, Javier M. Buldú
Comments: 26 pages, 3 figures
Subjects: Neurons and Cognition (q-bio.NC); Applied Physics (physics.app-ph)

If brain anatomy and dynamics have a genuine complex network structure as it has become standard to posit, it is also reasonable to assume that such a structure should play a key role not only in brain function but also in brain dysfunction. However, exactly how network structure is implicated in brain damage and whether at least some pathologies can be thought of as "network diseases" is not entirely clear. Here we discuss ways in which a complex network representation can help characterising brain pathology, but also subjects' vulnerability to and likelihood of recovery from disease. We show how the way disease is defined is related to the way function is defined and this, in turn, determines which network property may be functionally relevant to brain disease. Thus, addressing brain disease "networkness" may shed light not only on brain pathology, with potential clinical implications, but also on functional brain activity, and what is functional in it.

[13] arXiv:2507.23769 [pdf, html, other]
Title: Environment heterogeneity creates fast amplifiers of natural selection in graph-structured populations
Cecilia Fruet, Arthur Alexandre, Alia Abbara, Claude Loverdo, Anne-Florence Bitbol
Comments: 50 pages, 18 figures
Subjects: Populations and Evolution (q-bio.PE)

Complex spatial structure, with partially isolated subpopulations, and environment heterogeneity, such as gradients in nutrients, oxygen, and drugs, both shape the evolution of natural populations. We investigate the impact of environment heterogeneity on mutant fixation in spatially structured populations with demes on the nodes of a graph. When migrations between demes are frequent, we demonstrate that environment heterogeneity can amplify natural selection and simultaneously accelerate mutant fixation and extinction, thereby fostering the quick fixation of beneficial mutants. We evidence this effect in the star graph, more strongly in the line graph, and also in a more general class of graphs. We show that for amplification to occur, mutants must have a stronger fitness advantage in demes with stronger migration outflow. In circulation graphs, where migration inflow and outflow are equal in each deme, we find that environment heterogeneity has no impact in a first approximation, but increases the fixation probability of beneficial mutants to second order. When migrations between demes are rare, we show that environment heterogeneity can also foster amplification of selection, by allowing demes with sufficient mutant advantage to become refugia for mutants.

Cross submissions (showing 8 of 8 entries)

[14] arXiv:2507.22954 (cross-list from cs.LG) [pdf, html, other]
Title: Neural Autoregressive Modeling of Brain Aging
Ridvan Yesiloglu, Wei Peng, Md Tauhidul Islam, Ehsan Adeli
Comments: Accepted at Deep Generative Models Workshop @ MICCAI 2025
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)

Brain aging synthesis is a critical task with broad applications in clinical and computational neuroscience. The ability to predict the future structural evolution of a subject's brain from an earlier MRI scan provides valuable insights into aging trajectories. Yet, the high-dimensionality of data, subtle changes of structure across ages, and subject-specific patterns constitute challenges in the synthesis of the aging brain. To overcome these challenges, we propose NeuroAR, a novel brain aging simulation model based on generative autoregressive transformers. NeuroAR synthesizes the aging brain by autoregressively estimating the discrete token maps of a future scan from a convenient space of concatenated token embeddings of a previous and future scan. To guide the generation, it concatenates into each scale the subject's previous scan, and uses its acquisition age and the target age at each block via cross-attention. We evaluate our approach on both the elderly population and adolescent subjects, demonstrating superior performance over state-of-the-art generative models, including latent diffusion models (LDM) and generative adversarial networks, in terms of image fidelity. Furthermore, we employ a pre-trained age predictor to further validate the consistency and realism of the synthesized images with respect to expected aging patterns. NeuroAR significantly outperforms key models, including LDM, demonstrating its ability to model subject-specific brain aging trajectories with high fidelity.

[15] arXiv:2507.22963 (cross-list from cs.LG) [pdf, html, other]
Title: FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization
Abdelrhman Gaber, Hassan Abd-Eltawab, John Elgallab, Youssif Abuzied, Dineo Mpanya, Turgay Celik, Swarun Kumar, Tamer ElBatt
Subjects: Machine Learning (cs.LG); Other Quantitative Biology (q-bio.OT)

Cardiovascular diseases (CVD) cause over 17 million deaths annually worldwide, highlighting the urgent need for privacy-preserving predictive systems. We introduce FedCVD++, an enhanced federated learning (FL) framework that integrates both parametric models (logistic regression, SVM, neural networks) and non-parametric models (Random Forest, XGBoost) for coronary heart disease risk prediction. To address key FL challenges, we propose: (1) tree-subset sampling that reduces Random Forest communication overhead by 70%, (2) XGBoost-based feature extraction enabling lightweight federated ensembles, and (3) federated SMOTE synchronization for resolving cross-institutional class imbalance.
Evaluated on the Framingham dataset (4,238 records), FedCVD++ achieves state-of-the-art results: federated XGBoost (F1 = 0.80) surpasses its centralized counterpart (F1 = 0.78), and federated Random Forest (F1 = 0.81) matches non-federated performance. Additionally, our communication-efficient strategies reduce bandwidth consumption by 3.2X while preserving 95% accuracy.
Compared to existing FL frameworks, FedCVD++ delivers up to 15% higher F1-scores and superior scalability for multi-institutional deployment. This work represents the first practical integration of non-parametric models into federated healthcare systems, providing a privacy-preserving solution validated under real-world clinical constraints.

[16] arXiv:2507.23057 (cross-list from eess.SP) [pdf, html, other]
Title: Neural Energy Landscapes Predict Working Memory Decline After Brain Tumor Resection
Triet M. Tran, Sina Khanmohammadi
Subjects: Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)

Surgical resection is the primary treatment option for brain tumor patients, but it carries the risk of postoperative cognitive dysfunction. This study investigates how tumor-induced alterations in presurgical neural dynamics relate to postoperative working memory decline. We analyzed functional magnetic resonance imaging (fMRI) of brain tumor patients before surgery and extracted energy landscapes of high-order brain interactions. We then examined the relation between these energy features and postoperative working memory performance using statistical and machine learning (random forest) models. Patients with lower postoperative working memory scores exhibited fewer but more extreme transitions between local energy minima and maxima, whereas patients with higher scores showed more frequent but less extreme shifts. Furthermore, the presurgical high-order energy features were able to accurately predict postoperative working memory decline with a mean accuracy of 90%, F1 score of 87.5%, and an AUC of 0.95. Our study suggests that the brain tumor-induced disruptions in high-order neural dynamics before surgery are predictive of postoperative working memory decline. Our findings pave the path for personalized surgical planning and targeted interventions to mitigate cognitive risks associated with brain tumor resection.

[17] arXiv:2507.23227 (cross-list from cs.CL) [pdf, html, other]
Title: Enabling Few-Shot Alzheimer's Disease Diagnosis on Tabular Biomarker Data with LLMs
Sophie Kearney, Shu Yang, Zixuan Wen, Bojian Hou, Duy Duong-Tran, Tianlong Chen, Jason Moore, Marylyn Ritchie, Li Shen
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Early and accurate diagnosis of Alzheimer's disease (AD), a complex neurodegenerative disorder, requires analysis of heterogeneous biomarkers (e.g., neuroimaging, genetic risk factors, cognitive tests, and cerebrospinal fluid proteins) typically represented in a tabular format. With flexible few-shot reasoning, multimodal integration, and natural-language-based interpretability, large language models (LLMs) offer unprecedented opportunities for prediction with structured biomedical data. We propose a novel framework called TAP-GPT, Tabular Alzheimer's Prediction GPT, that adapts TableGPT2, a multimodal tabular-specialized LLM originally developed for business intelligence tasks, for AD diagnosis using structured biomarker data with small sample sizes. Our approach constructs few-shot tabular prompts using in-context learning examples from structured biomedical data and finetunes TableGPT2 using the parameter-efficient qLoRA adaption for a clinical binary classification task of AD or cognitively normal (CN). The TAP-GPT framework harnesses the powerful tabular understanding ability of TableGPT2 and the encoded prior knowledge of LLMs to outperform more advanced general-purpose LLMs and a tabular foundation model (TFM) developed for prediction tasks. To our knowledge, this is the first application of LLMs to the prediction task using tabular biomarker data, paving the way for future LLM-driven multi-agent frameworks in biomedical informatics.

[18] arXiv:2507.23359 (cross-list from eess.IV) [pdf, html, other]
Title: Pixel Embedding Method for Tubular Neurite Segmentation
Huayu Fu, Jiamin Li, Haozhi Qu, Xiaolin Hu, Zengcai Guo
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

Automatic segmentation of neuronal topology is critical for handling large scale neuroimaging data, as it can greatly accelerate neuron annotation and analysis. However, the intricate morphology of neuronal branches and the occlusions among fibers pose significant challenges for deep learning based segmentation. To address these issues, we propose an improved framework: First, we introduce a deep network that outputs pixel level embedding vectors and design a corresponding loss function, enabling the learned features to effectively distinguish different neuronal connections within occluded regions. Second, building on this model, we develop an end to end pipeline that directly maps raw neuronal images to SWC formatted neuron structure trees. Finally, recognizing that existing evaluation metrics fail to fully capture segmentation accuracy, we propose a novel topological assessment metric to more appropriately quantify the quality of neuron segmentation and reconstruction. Experiments on our fMOST imaging dataset demonstrate that, compared to several classical methods, our approach significantly reduces the error rate in neuronal topology reconstruction.

[19] arXiv:2507.23384 (cross-list from physics.bio-ph) [pdf, html, other]
Title: Could Living Cells Use Phase Transitions to Process Information?
Arvind Murugan, David Zwicker, Charlotta Lorenz, Eric R. Dufresne
Comments: 8 pages, 5 figures
Subjects: Biological Physics (physics.bio-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Soft Condensed Matter (cond-mat.soft); Adaptation and Self-Organizing Systems (nlin.AO); Cell Behavior (q-bio.CB)

To maintain homeostasis, living cells process information with networks of interacting molecules. Traditional models for cellular information processing have focused on networks of chemical reactions between molecules. Here, we describe how networks of physical interactions could contribute to the processing of information inside cells. In particular, we focus on the impact of biomolecular condensation, a structural phase transition found in cells. Biomolecular condensation has recently been implicated in diverse cellular processes. Some of these are essentially computational, including classification and control tasks. We place these findings in the broader context of physical computing, an emerging framework for describing how the native dynamics of nonlinear physical systems can be leveraged to perform complex computations. The synthesis of these ideas raises questions about expressivity (the range of problems that cellular phase transitions might be able to solve) and learning (how these systems could adapt and evolve to solve different problems). This emerging area of research presents diverse opportunities across molecular biophysics, soft matter, and physical computing.

[20] arXiv:2507.23454 (cross-list from cs.HC) [pdf, other]
Title: Breaking the mould of Social Mixed Reality -- State-of-the-Art and Glossary
Marta Bieńkiewicz, Julia Ayache, Panayiotis Charalambous, Cristina Becchio, Marco Corragio, Bertram Taetz, Francesco De Lellis, Antonio Grotta, Anna Server, Daniel Rammer, Richard Kulpa, Franck Multon, Azucena Garcia-Palacios, Jessica Sutherland, Kathleen Bryson, Stéphane Donikian, Didier Stricker, Benoît Bardy
Comments: pre-print
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Emerging Technologies (cs.ET); Graphics (cs.GR); Neurons and Cognition (q-bio.NC)

This article explores a critical gap in Mixed Reality (MR) technology: while advances have been made, MR still struggles to authentically replicate human embodiment and socio-motor interaction. For MR to enable truly meaningful social experiences, it needs to incorporate multi-modal data streams and multi-agent interaction capabilities. To address this challenge, we present a comprehensive glossary covering key topics such as Virtual Characters and Autonomisation, Responsible AI, Ethics by Design, and the Scientific Challenges of Social MR within Neuroscience, Embodiment, and Technology. Our aim is to drive the transformative evolution of MR technologies that prioritize human-centric innovation, fostering richer digital connections. We advocate for MR systems that enhance social interaction and collaboration between humans and virtual autonomous agents, ensuring inclusivity, ethical design and psychological safety in the process.

[21] arXiv:2507.23462 (cross-list from physics.bio-ph) [pdf, other]
Title: Sub-surface Skin Deformation in Response to Gentle Brushing
Saito Sakaguchi, Basil Duvernoy, Anders Fridberger, Håkan Olausson, Sarah McIntyre
Comments: 2 pages, Work-in-Progress at IEEE World Haptics Conference 2025 (WHC2025)
Subjects: Biological Physics (physics.bio-ph); Tissues and Organs (q-bio.TO)

Even simple tactile stimuli can lead to remarkably different perceptions among individuals, both in intensity and pleasantness. To understand the physical factors behind this variation, it is important to investigate how mechanical events are transmitted through the skin. In this study, we visualize the internal skin strains in response to soft brushing stimuli using functional Optical Coherence Tomography (fOCT), which provides depth-resolved time-series data of the displacement of the skin. Driven with custom-made software, the system enabled sub-surface imaging at a refresh rate of 10 kHz. Brushing was applied to the back of the hand, and skin displacement was observed at different depths. The results show that each skin layer responds differently to the stimulus, suggesting that internal skin dynamics play a role in tactile perception. This method offers a way to investigate how mechanical events within the skin relate to sensory function.

Replacement submissions (showing 7 of 7 entries)

[22] arXiv:2407.01621 (replaced) [pdf, html, other]
Title: Deciphering interventional dynamical causality from non-intervention complex systems
Jifan Shi, Yang Li, Juan Zhao, Siyang Leng, Rui Bao, Kazuyuki Aihara, Luonan Chen, Wei Lin
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Methodology (stat.ME); Machine Learning (stat.ML)

Detecting and quantifying causality is a focal topic in the fields of science, engineering, and interdisciplinary studies. Causal studies on non-intervention systems attract much attention but remain extremely challenging, and the delay-embedding technique provides a promising approach. In this study, we propose a framework named Interventional Dynamical Causality (IntDC) in contrast to the traditional Constructive Dynamical Causality (ConDC). ConDC, including Granger causality, transfer entropy and convergence of cross-mapping, measures causality by constructing a dynamical model without considering interventions. A computational criterion, Interventional Embedding Entropy (IEE), is proposed to measure causal strengths in an interventional manner. IEE is an intervened causal information flow defined in the delay-embedding space. Further, IEE theoretically and numerically enables the deciphering of IntDC solely from observational (non-interventional) time-series data, without requiring any knowledge of dynamical models or real interventions in the considered system. In particular, IEE can be applied to rank causal effects according to their importance and to construct causal networks from data. We conducted numerical experiments to demonstrate that IEE can find causal edges accurately, eliminate effects of confounding, and quantify causal strength more robustly than traditional indices. We also applied IEE to real-world tasks, where it performed as an accurate and robust tool for causal analyses solely from observational data. The IntDC framework and IEE algorithm provide an efficient approach to the study of causality from time series in diverse non-intervention complex systems.
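
As a small illustration of the delay-embedding step this abstract refers to, the sketch below builds delay vectors from a scalar time series. The function name `delay_embed` and the parameters `dim` and `tau` are illustrative assumptions, not names from the paper, and the sketch does not implement IEE or IntDC, whose definitions are not given in the abstract.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors [x(t), x(t - tau), ..., x(t - (dim - 1) * tau)].

    series: sequence of scalar observations
    dim:    embedding dimension (hypothetical parameter name)
    tau:    delay in samples (hypothetical parameter name)
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    # The first time index for which all delayed samples exist.
    start = (dim - 1) * tau
    return [
        [series[t - k * tau] for k in range(dim)]
        for t in range(start, len(series))
    ]


# Toy series: five samples, embedded in 2 dimensions with a delay of 2.
example_vectors = delay_embed([0, 1, 2, 3, 4], dim=2, tau=2)
```

Each returned row is one point in the reconstructed state space; a series shorter than `(dim - 1) * tau + 1` samples yields an empty list.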

[23] arXiv:2407.19330 (replaced) [pdf, html, other]
Title: Unveiling Cancer Stem Cell Marker Networks: A Hypergraph Approach
David H. Margarit, Gustavo Paccosi, Marcela V. Reale, Lilia M. Romanelli
Subjects: Biological Physics (physics.bio-ph); Quantitative Methods (q-bio.QM)

We propose a novel computational framework leveraging hypergraph theory to analyse cancer stem cell markers (CSCMs) across multiple organs. Hypergraphs provide a robust representation of CSCM co-expression patterns, capturing their complex multi-organ relationships more comprehensively than traditional graph-based methods. By integrating mutual information analysis and Markov models, we identify key markers driving tumour heterogeneity and metastasis, offering detailed insights into their interdependencies. This approach establishes hypergraphs as a computationally powerful tool to model cancer progression and metastatic dynamics, contributing to the understanding of complex biological systems and supporting the development of targeted therapeutic strategies.

[24] arXiv:2502.12831 (replaced) [pdf, html, other]
Title: The gene's eye-view of quantitative genetics
Philibert Courau, Amaury Lambert, Emmanuel Schertzer
Comments: (40 pages, 2 figures)
Subjects: Probability (math.PR); Populations and Evolution (q-bio.PE)

Modelling the evolution of a continuous trait in a biological population is one of the oldest problems in evolutionary biology, which led to the birth of quantitative genetics. With the recent development of GWAS methods, it has become essential to link the evolution of the trait distribution to the underlying evolution of allelic frequencies at many loci that jointly contribute to the trait value. Most articles approach this by making assumptions about the trait distribution and using Wright's formula to model how the evolution of the trait translates to each individual locus. Here, we take a gene's eye-view of the system, starting from an explicit finite-loci model with selection, drift, recombination and mutation, in which the trait value is a direct product of the genome. We let the number of loci go to infinity under the assumption of strong recombination, and characterize the limit behaviour of a given locus with a McKean-Vlasov SDE and the corresponding Fokker-Planck IPDE. In words, the selection on a typical locus depends on the mean behaviour of the other loci, which can be approximated with the law of the focal locus. Results include the independence of two loci and an explicit stationary distribution for allelic frequencies at a given locus (under some assumptions on the fitness function).

[25] arXiv:2502.20632 (replaced) [pdf, html, other]
Title: Lattice Protein Folding with Variational Annealing
Shoummo Ahsan Khandoker, Estelle M. Inack, Mohamed Hibat-Allah
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biomolecules (q-bio.BM)

Understanding the principles of protein folding is a cornerstone of computational biology, with implications for drug design, bioengineering, and the understanding of fundamental biological processes. Lattice protein folding models offer a simplified yet powerful framework for studying the complexities of protein folding, enabling the exploration of energetically optimal folds under constrained conditions. However, finding these optimal folds is a computationally challenging combinatorial optimization problem. In this work, we introduce a novel upper-bound training scheme that employs masking to identify the lowest-energy folds in two-dimensional Hydrophobic-Polar (HP) lattice protein folding. By leveraging Dilated Recurrent Neural Networks (RNNs) integrated with an annealing process driven by temperature-like fluctuations, our method accurately predicts optimal folds for benchmark systems of up to 60 beads. Our approach also effectively masks invalid folds from being sampled without compromising the autoregressive sampling properties of RNNs. This scheme is generalizable to three spatial dimensions and can be extended to lattice protein models with larger alphabets. Our findings emphasize the potential of advanced machine learning techniques in tackling complex protein folding problems and a broader class of constrained combinatorial optimization challenges.
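
To make the optimization target in this abstract concrete, here is a minimal sketch of the standard two-dimensional HP lattice energy: each pair of H beads that sit on adjacent lattice sites without being chain neighbours contributes -1. The move encoding (`U`, `D`, `L`, `R`) and all function names are assumptions for illustration; the paper's own encoding, masking scheme, and RNN annealing procedure are not reproduced here.

```python
# Unit steps on the square lattice, keyed by a hypothetical move alphabet.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coordinates(moves):
    """Turn a move string into bead coordinates, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def is_valid_fold(coords):
    """A fold is valid when the chain is self-avoiding (no site used twice)."""
    return len(set(coords)) == len(coords)


def hp_energy(sequence, moves):
    """Return minus the number of non-bonded H-H contacts, or None if invalid."""
    coords = fold_coordinates(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    if not is_valid_fold(coords):
        return None
    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every adjacent pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


square_energy = hp_energy("HPPH", "RUL")    # chain folded into a unit square
straight_energy = hp_energy("HHHH", "RRR")  # fully extended chain
invalid_energy = hp_energy("HHHHH", "RULD") # chain returns to its start site
```

Returning `None` for self-intersecting chains is one simple way to flag folds that a sampler would need to exclude; it is a design choice of this sketch only.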

[26] arXiv:2506.08184 (replaced) [pdf, html, other]
Title: Unable to Forget: Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length
Chupei Wang (University of Virginia), Jiaqiu Vince Sun (New York University)
Comments: Accepted at ICML 2025 Workshop on Long Context Foundation Models (ICFM). Code: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

Information retrieval in Large Language Models (LLMs) is increasingly recognized as intertwined with generation capabilities rather than mere lookup. While longer contexts are often assumed to improve retrieval, the effects of intra-context interference remain understudied. To address this, we adapt the proactive interference (PI) paradigm from cognitive science, where earlier information disrupts recall of newer updates. In humans, susceptibility to such interference is inversely linked to working memory capacity. We introduce PI-LLM, an evaluation that sequentially streams semantically related key-value updates and queries only the final values. Although these final values are clearly positioned just before the query, LLM retrieval accuracy declines log-linearly toward zero as interference accumulates; errors arise from retrieving previously overwritten values. Attempts to mitigate interference via prompt engineering (e.g., instructing models to ignore earlier input) yield limited success. These findings reveal a fundamental constraint on LLMs' ability to disentangle interference and flexibly manipulate information, suggesting a working memory bottleneck beyond mere context access. This calls for approaches that strengthen models' ability to suppress irrelevant content during retrieval.
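
The evaluation idea described in this abstract (stream many updates per key, then ask only for the final values) can be sketched as follows. This is a toy under stated assumptions: the names `build_update_stream` and `score_answers`, the random interleaving, and the key and value vocabularies are all hypothetical, and the actual PI-LLM prompt format is not given in the abstract.

```python
import random


def build_update_stream(keys, values, updates_per_key, seed=0):
    """Create an interleaved list of (key, value) updates and the ground truth.

    The ground truth maps each key to the value of its last update in the
    stream, which is what a query on "the final value" should return.
    """
    rng = random.Random(seed)
    stream = [
        (key, rng.choice(values))
        for key in keys
        for _ in range(updates_per_key)
    ]
    rng.shuffle(stream)  # interleave updates to different keys
    final_values = {}
    for key, value in stream:
        final_values[key] = value  # later updates overwrite earlier ones
    return stream, final_values


def score_answers(answers, final_values):
    """Fraction of keys whose answered value equals the final value."""
    correct = sum(1 for key, value in final_values.items()
                  if answers.get(key) == value)
    return correct / len(final_values)


# Hypothetical vocabularies, chosen only for the example.
stream, truth = build_update_stream(
    keys=["fruit", "color", "city"],
    values=["a", "b", "c", "d"],
    updates_per_key=5,
)
perfect_score = score_answers(truth, truth)
empty_score = score_answers({}, truth)
```

Increasing `updates_per_key` raises the number of overwritten values that precede each final value, which is the quantity the abstract describes as accumulating interference.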

[27] arXiv:2507.13762 (replaced) [pdf, html, other]
Title: MolPIF: A Parameter Interpolation Flow Model for Molecule Generation
Yaowei Jin, Junjie Wang, Wenkai Xiang, Duanhua Cao, Dan Teng, Zhehuan Fan, Jiacheng Xiong, Xia Sheng, Chuanlong Zeng, Duo An, Mingyue Zheng, Shuangjia Zheng, Qian Shi
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)

Advances in deep learning for molecular generation show promise in accelerating drug discovery. Bayesian Flow Networks (BFNs) have recently shown impressive performance across diverse chemical tasks, with their success often ascribed to the paradigm of modeling in a low-variance parameter space. However, the Bayesian inference-based strategy imposes limitations on designing more flexible distribution transformation pathways, making it challenging to adapt to diverse data distributions and varied task requirements. Furthermore, the potential for simpler, more efficient parameter-space-based models remains unexplored. To address this, we propose a novel Parameter Interpolation Flow model (named PIF) with a detailed theoretical foundation and training and inference procedures. We then develop MolPIF for structure-based drug design, demonstrating its superior performance across diverse metrics compared to baselines. This work validates the effectiveness of the parameter-space-based generative modeling paradigm for molecules and offers new perspectives for model design.

[28] arXiv:2507.21035 (replaced) [pdf, html, other]
Title: GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
Haoyang Liu, Yijiang Li, Haohan Wang
Comments: 51 pages (13 pages for the main text, 9 pages for references, and 29 pages for the appendix)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Genomics (q-bio.GN)

Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data.
On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at this https URL.
