Projecting knowledge graph queries into an embedding space using geometric models (points, boxes and spheres) can help to answer queries for large incomplete knowledge graphs. In this work, we propose a symbolic learning-free approach using fuzzy logic to address the shape-closure problem that has restricted geometric-based embedding models to only a few shapes (e.g. ConE) for answering complex logical queries. The use of a symbolic approach enables non-closure geometric models (e.g. point, box) to handle logical operators (including negation). This in turn enables our newly proposed spherical embeddings (SpherE) to use a polar coordinate system to effectively represent hierarchical relations. Results show that the SpherE model can answer existential positive first-order logic and negation queries. We show that SpherE significantly outperforms the point and box embedding approaches while generating semantically meaningful hierarchy-aware embeddings.
Owing to their unprecedented capability in semantic understanding and logical reasoning, large language models (LLMs) have shown great potential in developing next-generation sequential recommender systems (RSs). However, existing LLM-based sequential RSs mostly separate index generation from sequential recommendation, leading to insufficient integration between semantic information and collaborative information. On the other hand, the neglect of user-related information hinders LLM-based sequential RSs from exploiting high-order user-item interaction patterns. In this paper, we propose the End-to-End Dual Dynamic (ED2) recommender, the first LLM-based sequential RS which adopts a dual dynamic index mechanism and aims to resolve the above limitations simultaneously. The dual dynamic index mechanism can not only assemble index generation and sequential recommendation into a unified LLM-backbone pipeline, but also make it practical for LLM-based sequential recommenders to take advantage of user-related information. Specifically, to help the LLM comprehend the dual dynamic index, we propose a multigrained token regulator which constructs alignment supervision based on the LLM's semantic knowledge across multiple representation granularities. Moreover, the associated user collection data and a series of novel instruction tuning tasks are specially customized to capture the high-order user-item interaction patterns. Extensive experiments on three public datasets demonstrate the superiority of ED2, achieving an average improvement of 19.62% in Hit-Rate and 21.11% in NDCG.
Reranking is significant for recommender systems due to its pivotal role in refining recommendation results. Numerous reranking models have emerged to meet diverse reranking requirements in practical applications, which not only prioritize accuracy but also consider additional aspects such as diversity and fairness. However, most of the existing models struggle to strike a harmonious balance between these diverse aspects at the model level. Additionally, the scalability and personalization of these models are often limited by their complexity and a lack of attention to the varying importance of different aspects in diverse reranking scenarios. To address these issues, we propose LLM4Rerank, a comprehensive LLM-based reranking framework designed to bridge the gap between various reranking aspects while ensuring scalability and personalized performance. Specifically, we abstract different aspects into distinct nodes and construct a fully connected graph for the LLM to automatically consider aspects like accuracy, diversity, fairness, and more, all in a coherent Chain-of-Thought (CoT) process. To further enhance personalization during reranking, we provide a customizable input mechanism that allows fine-tuning of the LLM's focus on different aspects according to specific reranking needs. Experimental results on three widely used public datasets demonstrate that LLM4Rerank outperforms existing state-of-the-art reranking models across multiple aspects.
Incorporating a critiquing component into recommender applications facilitates the enhancement of user perception. Typically, critique-able recommender systems adapt the model parameters and update the recommendation list in real-time through the analysis of user critiquing keyphrases in the inference phase. The current critiquing methods necessitate the designation of a dedicated recommendation model to estimate user relevance to the critiquing keyphrase during the training phase preceding the recommendations update. This paradigm restricts the applicable scenarios and reduces the potential for keyphrase exploitation. Furthermore, these approaches ignore the issue of catastrophic forgetting caused by continuous modification of model parameters in multi-step critiquing. Thus, we present a general Representative Items Sampling Framework for Critiquing on Knowledge Graph Recommendation (RISC) implemented as a plug-in, which offers a new paradigm for critiquing in mainstream recommendation scenarios. RISC leverages the knowledge graph to sample important representative items as a hinge to expand and convey information from user critiquing, indirectly estimating the relevance of the user to the critiquing keyphrase. Consequently, the necessity for specialized user-keyphrase correlation modules is eliminated with respect to a variety of knowledge graph recommendation models. Moreover, we propose a Weight Experience Replay (WER) approach based on KG to mitigate catastrophic forgetting by reinforcing the user's prior preferences during the inference phase. Our extensive experimental findings on three real-world datasets and three knowledge graph recommendation methods illustrate that RISC with WER can be effectively integrated into knowledge graph recommendation models to efficiently utilize user critiquing for refining recommendations and mitigate catastrophic forgetting.
Deep recommender systems rely heavily on large embedding tables to handle high-cardinality categorical features such as user/item identifiers, and face significant memory constraints at scale. To tackle this challenge, hashing techniques are often employed to map multiple entities to the same embedding and thus reduce the size of the embedding tables. Concurrently, graph-based collaborative signals have emerged as powerful tools in recommender systems, yet their potential for optimizing embedding table reduction remains unexplored. This paper introduces GraphHash, the first graph-based approach that leverages modularity-based bipartite graph clustering on user-item interaction graphs to reduce embedding table sizes. We demonstrate that the modularity objective has a theoretical connection to message-passing, which provides a foundation for our method. By employing fast clustering algorithms, GraphHash serves as a computationally efficient proxy for message-passing during preprocessing and a plug-and-play graph-based alternative to traditional ID hashing. Extensive experiments show that GraphHash substantially outperforms diverse hashing baselines on both retrieval and click-through-rate prediction tasks. In particular, GraphHash achieves on average a 101.52% improvement in recall when reducing the embedding table size by more than 75%, highlighting the value of graph-based collaborative information for model reduction. Our code is available at https://github.com/snap-research/GraphHash.
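The plug-and-play idea in the GraphHash abstract can be illustrated with a minimal sketch: the embedding-row index comes from a cluster label instead of from the raw ID modulo the table size. The cluster labels below are hypothetical and hard-coded, and the function names are illustrative; this is not the GraphHash implementation, which derives the labels from modularity-based clustering of the user-item graph.

```python
def modulo_bucket(entity_id, num_buckets):
    """Traditional ID hashing: collisions are unrelated to user behaviour."""
    return entity_id % num_buckets


def cluster_bucket(entity_id, cluster_of):
    """Graph-based alternative: entities in the same cluster share one row."""
    return cluster_of[entity_id]


# Hypothetical labels, assumed to come from a clustering step run once
# during preprocessing: users 0-2 behave alike, as do users 3-5.
cluster_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

modulo_rows = [modulo_bucket(u, 2) for u in range(6)]
cluster_rows = [cluster_bucket(u, cluster_of) for u in range(6)]
```

Both mappings compress six users into two embedding rows, but only the cluster-based one groups users that the interaction graph considers similar.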
Recent advances in recommender systems have shown that user-system interaction essentially formulates long-term optimization problems, and online reinforcement learning can be adopted to improve recommendation performance. The general solution framework incorporates a value function that estimates the user's expected cumulative rewards in the future and guides the training of the recommendation policy. To avoid local maxima, the policy may explore potential high-quality actions during inference to increase the chance of finding better future rewards. To accommodate the stepwise recommendation process, one widely adopted approach to learning the value function is learning from the difference between the values of two consecutive states of a user. However, we argue that this paradigm involves a challenge of Mixing Random Factors: there exist two random factors from the stochastic policy and the uncertain user environment, but they are not separately modeled in the standard temporal difference (TD) learning, which may result in a suboptimal estimation of the long-term rewards and less effective action exploration. As a solution, we show that these two factors can be separately approximated by decomposing the original temporal difference loss. The disentangled learning framework can achieve a more accurate estimation with faster learning and improved robustness against action exploration. As an empirical verification of our proposed method, we conduct offline experiments with simulated online environments built on the basis of public datasets.
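For reference, the standard one-step temporal difference objective that the abstract above takes as its starting point can be written as follows. The notation ($V_\theta$, $\gamma$, $\pi$, $P$) is assumed here rather than taken from the paper, and the block shows only the textbook loss, not the proposed decomposition.

```latex
\delta_t = r_t + \gamma\, V_\theta(s_{t+1}) - V_\theta(s_t),
\qquad
\mathcal{L}_{\mathrm{TD}}(\theta)
  = \mathbb{E}_{a_t \sim \pi(\cdot \mid s_t),\; (r_t,\, s_{t+1}) \sim P(\cdot \mid s_t, a_t)}
    \left[ \delta_t^{2} \right]
```

The expectation runs jointly over the stochastic policy $\pi$ and the user environment $P$, which are the two random factors the abstract argues are mixed in a single loss.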
Off-policy evaluation (OPE) is a crucial problem in reinforcement learning (RL), where the goal is to estimate the long-term cumulative reward of a target policy using historical data generated by a potentially different behaviour policy. In many real-world applications, such as precision medicine and recommendation systems, unobserved confounders may influence the action, reward, and state transition dynamics, which leads to biased estimates if not properly addressed. While existing methods for handling unobserved confounders in OPE focus on single-action settings, they are less effective in multi-action scenarios commonly found in practical applications, where an agent can take multiple actions simultaneously. In this paper, we propose a novel auxiliary variable-aided method for OPE in multi-action settings with unobserved confounders. Our approach overcomes the limitations of traditional auxiliary variable methods for multi-action scenarios by requiring only a single auxiliary variable, relaxing the need for as many auxiliary variables as the actions. Through theoretical analysis, we prove that our method provides an unbiased estimation of the target policy value. Empirical evaluations demonstrate that our estimator achieves better performance compared to existing baseline methods, highlighting its effectiveness and reliability in addressing unobserved confounders in multi-action OPE settings.
Collaborative Filtering (CF) methods dominate real-world recommender systems given their ability to learn high-quality, sparse ID-embedding tables that effectively capture user preferences. These tables scale linearly with the number of users and items, and are trained to ensure high similarity between embeddings of interacted user-item pairs, while maintaining low similarity for non-interacted pairs. Despite their high performance, encouraging dispersion for non-interacted pairs necessitates expensive regularization (e.g., negative sampling), hurting runtime and scalability. Existing research tends to address these challenges by simplifying the learning process, either by reducing model complexity or sampling data, trading performance for runtime. In this work, we move beyond model-level modifications and study the properties of the embedding tables under different learning strategies. Through theoretical analysis, we find that the singular values of the embedding tables are intrinsically linked to different CF loss functions. These findings are empirically validated on real-world datasets, demonstrating the practical benefits of higher stable rank -- a continuous version of matrix rank which encodes the distribution of singular values. Based on these insights, we propose an efficient warm-start strategy that regularizes the stable rank of the user and item embeddings. We show that stable rank regularization during early training phases can promote higher-quality embeddings, resulting in training speed improvements of up to 65.9%. Additionally, stable rank regularization can act as a proxy for negative sampling, allowing for performance gains of up to 21.2% over loss functions with small negative sampling ratios. Overall, our analysis unifies current CF methods under a new perspective -- their optimization of stable rank -- motivating a flexible regularization method that is easy to implement, yet effective at enhancing CF systems.
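The stable rank mentioned above has a standard definition: the squared Frobenius norm divided by the squared largest singular value. The following is a minimal dependency-free sketch of that quantity, assuming a dense matrix given as a list of rows; the function name and the power-iteration settings are illustrative choices, not the paper's regularizer.

```python
import math
import random


def stable_rank(matrix, iters=500, seed=0):
    """Return ||A||_F^2 / sigma_max(A)^2 for a dense list-of-rows matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    frob_sq = sum(x * x for row in matrix for x in row)
    if frob_sq == 0.0:
        raise ValueError("stable rank is undefined for the zero matrix")

    # Power iteration on A^T A; a positive random start vector makes it
    # unlikely (not impossible) to begin orthogonal to the top direction.
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    sigma_max_sq = 0.0
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # A v
        sigma_max_sq = sum(x * x for x in w)  # ||A v||^2 for unit-norm v
        u = [sum(matrix[i][j] * w[i] for i in range(n_rows))
             for j in range(n_cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            break
        v = [x / norm for x in u]
    return frob_sq / sigma_max_sq


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rank_one = [[1.0, 2.0], [2.0, 4.0]]
```

An identity matrix spreads its energy evenly over all singular values, so its stable rank equals its dimension, while a rank-one matrix concentrates everything in one singular value and has stable rank 1.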
Communities in networks are groups of nodes that are more densely connected to each other than to the rest of the network, forming clusters with strong internal relationships. When nodes have sensitive attributes, such as demographic groups in social networks, a key question is whether nodes in each group are equally well-connected within each community. We model connectivity fairness using group modularity, an adaptation of modularity that accounts for group structures. We introduce two versions of group modularity, each grounded on a different null model, and propose fairness-aware community detection algorithms. Finally, we provide experimental results on real and synthetic networks, evaluating both the connectivity fairness of community structures in networks and the performance of our fairness-aware algorithms.
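As background for the group modularity described above, here is a minimal sketch of the standard (Newman) modularity for an undirected, unweighted graph without self-loops. It does not implement either of the paper's group-aware variants; the function name and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Standard modularity: sum over communities of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal_edges = defaultdict(int)  # L_c: edges with both ends in c
    degree_sum = defaultdict(int)      # d_c: total degree of nodes in c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    return sum(internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_communities = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_community = {node: "all" for node in range(6)}
```

Splitting the toy graph along the bridge gives a modularity of 5/14, while placing every node in one community gives 0, which is the baseline that the null model subtracts.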
Online communities play a critical role in shaping societal discourse and influencing collective behavior in the real world. The tendency for people to connect with others who share similar characteristics and views, known as homophily, plays a key role in the formation of echo chambers which further amplify polarization and division. Existing works examining homophily in online communities traditionally infer it using content- or adjacency-based approaches, such as constructing explicit interaction networks or performing topic analysis. These methods fall short for platforms where interaction networks cannot be easily constructed and fail to capture the complex nature of user interactions across the platform. This work introduces a novel approach for quantifying user homophily. We first use an Inverse Reinforcement Learning (IRL) framework to infer users' policies, then use these policies as a measure of behavioral homophily. We apply our method to Reddit, conducting a case study across 5.9 million interactions over six years, demonstrating how this approach uncovers distinct behavioral patterns and user roles that vary across different communities. We further validate our behavioral homophily measure against traditional content-based homophily, offering a powerful method for analyzing social media dynamics and their broader societal implications. We find, among other things, that users can behave very similarly (high behavioral homophily) when discussing entirely different topics like soccer vs e-sports (low topical homophily), and that there is an entire class of users on Reddit whose purpose seems to be to disagree with others.
The 21st century has already witnessed so many outbreaks with pandemic potential, including SARS (2002), H1N1 (2009), MERS (2012), Ebola (2014), Zika virus (2015), and the COVID-19 pandemic (2019). Using 60 million geotagged Sina Weibo tweets covering over 20 million active accounts, we investigate the collective emotional dynamics on social media in the most recent global pandemic, i.e., COVID-19. This research features two highlights: (1) It focuses on the Chinese population located in the initial epicenter of the pandemic. (2) It examines the initial year after the pandemic outbreak, a critical period where emotions were most intense due to the uncertainty and rapid developments related to the crisis. Using cross-disciplinary methods, we reveal a positive connection between online emotional resonance and geographic proximity, demonstrating a direct mapping between virtual network distances and physical spatial embedding. We propose a percolation-based index to measure the nationwide emotional resonance level with which we illustrate the significant economic impact of the global health issue. Finally, we identify a leader-follower pattern in emotional resonance fluctuations based on time-lag emotion correlations, revealing that less active regions play a crucial role in leading and responding to emotional changes. In the face of long COVID and emerging global health crises, our analysis elucidates how collective emotional resonance evolves, providing potential directions for online opinion interventions during global shocks.
Bayer-patterned color filter array (CFA) has been the go-to solution for color image sensors. In augmented reality (AR), although color interpolation (i.e., demosaicing) of pre-demosaic RAW images facilitates a user-friendly rendering, it creates no benefits in offloaded DNN analytics but increases the image channels by 3x inducing higher transmission overheads. The potential optimization in frame preprocessing of DNN offloading is yet to be investigated.
To that end, we propose ABO, an adaptive RAW frame offloading framework that parallelizes demosaicing with DNN computation. Its contributions are three-fold: First, we design a configurable tile-wise RAW image neural codec to compress frame sizes while sustaining downstream DNN accuracy under bandwidth constraints. Second, based on content-aware tiles-in-frame selection and runtime bandwidth estimation, a dynamic transmission controller adaptively calibrates codec configurations to maximize the DNN accuracy. Third, we further optimize the system pipelining to achieve lower end-to-end frame processing latency and higher throughput. Through extensive evaluations on a prototype platform, ABO consistently achieves 40% more frame processing throughput and 30% less end-to-end latency than SOTA baselines while improving the DNN accuracy by up to 15%. It also exhibits improved robustness against dim lighting and motion blur situations.
Sharding blockchain networks face significant scalability challenges due to high frequencies of cross-shard transactions and uneven workload distributions among shards. To address these scalability issues, account migration offers a promising solution. However, existing migration solutions struggle with the high computational overhead and insufficient capture of complex transaction patterns. We propose AERO, a deep reinforcement learning framework to facilitate efficient account migration in sharding blockchains. AERO employs a prefix-based grouping strategy to enable group-level migration decisions and capture complex transaction patterns and relationships between accounts. We also implement a sharding blockchain system called AEROChain, which integrates AERO and aligns with the blockchain decentralization principle. Extensive evaluation with real Ethereum transaction data demonstrates that AERO improves the system throughput by 31.77% compared to existing solutions, effectively reducing cross-shard transactions and balancing shard workloads.
Blockchain interoperability protocols enable cross-chain asset transfers or data retrievals between isolated chains, and are considered part of the core infrastructure for Web 3.0. However, existing protocols either face severe scalability issues due to high on-chain and off-chain costs, or suffer from trust concerns because of their centralized architecture.
In this paper, we propose MAP, a trustless blockchain interoperability protocol that relays cross-chain transactions across heterogeneous chains with high scalability. First, within MAP, we develop a novel cross-chain relay architecture, which integrates a unified relay chain and on-chain light clients of source chains, allowing the trustworthy retrieval and verification of heterogeneous cross-chain transactions. Furthermore, we reduce cross-chain verification cost by incorporating an optimized on-chain light client scheme that adaptively decouples signature verification overheads from inefficient smart contract execution and offloads them to off-chain provers. For experiments, we conduct the first large-scale evaluation of existing blockchain interoperability protocols. With MAP, the required number of on-chain light clients is reduced from O(N^2) to O(N), and the on-chain and off-chain costs of verifying cross-chain transactions are reduced by 35% and 25%, respectively.
To demonstrate the effectiveness, we deployed MAP in the real world. By 2025, we have supported over six popular public chains, 50 cross-chain applications and relayed over 200K cross-chain transactions worth over 640 million USD. Based on the deployment records, we construct the first real-world cross-chain dataset to further advance blockchain interoperability research.
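The reduction from O(N^2) to O(N) on-chain light clients can be made concrete with a small counting sketch. The exact counts below rest on two assumptions that the abstract does not state: in a pairwise design every chain hosts a light client of every other chain, and in a relay design each chain hosts one light client of the relay chain while the relay chain hosts one light client per connected chain. The function names are illustrative.

```python
def pairwise_light_clients(n_chains):
    """Assumed pairwise design: each chain verifies every other chain."""
    return n_chains * (n_chains - 1)


def relay_light_clients(n_chains):
    """Assumed relay design: one client on each chain, N on the relay."""
    return 2 * n_chains


# Six chains, matching the order of magnitude of the reported deployment.
pairwise_count = pairwise_light_clients(6)
relay_count = relay_light_clients(6)
```

Under these assumptions, six chains need 30 light clients when connected pairwise and 12 through a relay, and the gap widens quadratically as more chains join.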
Content-based Recommender Systems (CRSs) play a crucial role in shaping user experiences in e-commerce, online advertising, and personalized recommendations. However, due to the vast amount of categorical features, the embedding tables used in CRS models pose a significant storage bottleneck for real-world deployment, especially on resource-constrained devices. To address this problem, various embedding pruning methods have been proposed, but most existing ones require expensive retraining steps for each target parameter budget, leading to enormous computation costs. In reality, this computation cost is a major hurdle in real-world applications with diverse storage requirements, such as federated learning and streaming settings. In this paper, we propose Shapley Value-guided Embedding Reduction (Shaver) as our response. With Shaver, we view the problem from a cooperative game perspective, and quantify each embedding parameter's contribution with Shapley values to facilitate contribution-based parameter pruning. To address the inherently high computation costs of Shapley values, we propose an efficient and unbiased method to estimate Shapley values of a CRS's embedding parameters. Moreover, in the pruning stage, we put forward a field-aware codebook to mitigate the information loss in the traditional zero-out treatment. Through extensive experiments on three real-world datasets, Shaver has demonstrated competitive performance with lightweight recommendation models across various parameter budgets. The source code is available at https://github.com/chenxing1999/shaver.
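To make the Shapley-value idea above concrete, here is a minimal sketch of the generic permutation-sampling estimator, which is unbiased for any value function. It is not Shaver's estimator; the function name, the toy players, and the additive value function are assumptions chosen so the result is easy to check.

```python
import random


def shapley_monte_carlo(players, value_fn, num_permutations=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        previous = value_fn(frozenset(coalition))
        for player in order:
            coalition.add(player)
            current = value_fn(frozenset(coalition))
            totals[player] += current - previous  # marginal contribution
            previous = current
    return {p: total / num_permutations for p, total in totals.items()}


# Hypothetical additive game: each "parameter" contributes a fixed amount,
# so its Shapley value equals its own weight.
weights = {"emb_a": 3, "emb_b": 1, "emb_c": 0}


def additive_value(coalition):
    return sum(weights[p] for p in coalition)


estimates = shapley_monte_carlo(list(weights), additive_value)
```

In a pruning setting, parameters with the smallest estimated contribution (here `emb_c`) would be the first candidates for removal.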
Efficient processing of large-scale graphs with billions to trillions of edges is essential for training graph-based large language models (LLMs) in web-scale systems. The increasing complexity and size of these models create significant communication challenges due to the extensive message exchanges required across distributed nodes. Current graph engines struggle to effectively scale across hundreds of computing nodes because they often overlook variations in communication costs within the interconnection hierarchy. This paper presents GraphCom, a communication-efficient message graph engine for graph processing on supercomputers. Our key idea is to leverage the network topology information to perform communication hierarchy-aware message aggregation, where messages are (i) gathered to the responsible nodes (referred to as monitors) in the source domains, (ii) transferred between monitors, and (iii) scattered to the target nodes in the target domains. GraphCom's aggregation is more aggressive in that each source domain (instead of the source node). We have implemented GraphCom on top of MPI. We demonstrate GraphCom's effectiveness with synthetic benchmarks and real-world graphs, utilizing up to 79,024 nodes and over 1.2 million processor cores, demonstrating that GraphCom surpasses leading graph-parallel systems and state-of-the-art counterparts in both throughput and scalability. Moreover, we have deployed GraphCom on a production supercomputer, where it consistently outperforms the top solutions on the Graph500 list. These results highlight the potential GraphCom has to significantly improve the efficiency of distributed large-scale graph-based LLM training by optimizing communication between distributed systems, making it an invaluable graph engine for distributed training tasks on web-scale graphs.
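The three steps named in the GraphCom abstract (gather, transfer, scatter) can be sketched in a single process as follows. This is a toy model under stated assumptions, not GraphCom's MPI implementation: the function name, the `domain_of` mapping, and the choice to bundle messages per (source domain, target domain) pair are all illustrative.

```python
from collections import defaultdict


def hierarchical_route(messages, domain_of):
    """Deliver (src, dst, payload) messages via per-domain bundles."""
    # (i) Gather: each source domain's monitor bundles outgoing messages
    # by target domain.
    bundles = defaultdict(list)
    for src, dst, payload in messages:
        bundles[(domain_of[src], domain_of[dst])].append((dst, payload))

    # (ii) Transfer: one bundle crosses the network per pair of distinct
    # domains, instead of one message per (source node, target node) pair.
    inter_domain_transfers = sum(1 for s, t in bundles if s != t)

    # (iii) Scatter: the target domain's monitor unpacks each bundle.
    inbox = defaultdict(list)
    for bundle in bundles.values():
        for dst, payload in bundle:
            inbox[dst].append(payload)
    return dict(inbox), inter_domain_transfers


domain_of = {0: "A", 1: "A", 2: "B", 3: "B"}
messages = [(0, 2, "x"), (1, 2, "y"), (1, 3, "z"), (0, 1, "w")]
inbox, transfers = hierarchical_route(messages, domain_of)
```

In this toy run, three messages cross from domain A to domain B, yet only one inter-domain transfer is needed because they travel as a single bundle.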
Mobile and Web-of-Things (WoT) devices at the network edge generate vast amounts of data for machine learning applications, yet privacy concerns hinder centralized model training. Federated Learning (FL) allows clients (devices) to collaboratively train a shared model coordinated by a central server without transferring private data. However, inherent statistical heterogeneity among clients presents challenges, often leading to a dilemma between clients' need for personalized local models and the server's goal of building a generalized global model. Existing FL methods typically prioritize either global generalization or local personalization, resulting in a trade-off between these objectives and limiting the full potential of diverse client data. To address this challenge, we propose a novel framework that enhances both global generalization and local personalization by Rethinking Information Representation in the Federated learning process (FedRIR). Specifically, we introduce Masked Client-Specific Learning (MCSL), which isolates and extracts fine-grained client-specific features tailored to each client's unique data characteristics, thereby enhancing personalization. Meanwhile, the Information Distillation Module (IDM) refines global shared features by filtering out redundant client-specific information, resulting in a purer and more robust global representation that enhances generalization. By integrating refined global features with isolated client-specific features, we construct enriched representations that effectively capture both global patterns and local nuances, thereby improving the performance of downstream tasks on the client. Extensive experiments on diverse datasets demonstrate that FedRIR significantly outperforms state-of-the-art FL methods, achieving up to a 3.93% improvement in accuracy while ensuring robustness and stability in heterogeneous environments. The code is publicly available at https://github.com/Deep-Imaging-Group/FedRIR.
Pre-trained models (PTMs) are widely adopted across various downstream tasks in the machine learning supply chain. Adopting untrustworthy PTMs introduces significant security risks, where adversaries can poison the model supply chain by embedding hidden malicious behaviors (backdoors) into PTMs. However, existing backdoor attacks on PTMs are only partially task-agnostic, and the embedded backdoors are easily erased during the fine-tuning process. This makes it challenging for the backdoors to persist and propagate through the supply chain. In this paper, we propose a novel and more severe backdoor attack, TransTroj, which enables the backdoors embedded in PTMs to efficiently transfer in the model supply chain. In particular, we first formalize this attack as an indistinguishability problem between poisoned and clean samples in the embedding space. We decompose embedding indistinguishability into pre- and post-indistinguishability, representing the similarity of the poisoned and reference embeddings before and after the attack. Then, we propose a two-stage optimization that separately optimizes triggers and victim PTMs to achieve embedding indistinguishability. We evaluate TransTroj on four PTMs and six downstream tasks. Experimental results show that our method significantly outperforms SOTA task-agnostic backdoor attacks -- achieving nearly 100% attack success rate on most downstream tasks -- and demonstrates robustness under various system settings. Our findings underscore the urgent need to secure the model supply chain against such transferable backdoor attacks. The code is available at https://github.com/haowang-cqu/TransTroj
The data stream generated by users on web applications is often collected using a local differential privacy (LDP) approach to ensure privacy. This approach offers rigorous theoretical guarantees and low computational overhead, albeit at the expense of data utility. Data utility encompasses both the value of individual data points and the temporal relevance that exists between them, but existing studies primarily focus on enhancing the former utility while neglecting the latter. Furthermore, the collected data often requires cleaning, and we have demonstrated through a case study that a data stream lacking temporal relevance poses a significant risk to users' privacy during the cleaning process. In this paper, we present, for the first time, an online LDP publishing mechanism for infinite streams that preserves their inherent temporal relevance, called the Sampling Period Perturbation Algorithm (SPPA). Specifically, we model the temporal relevance between data points as a Fourier interpolation function, resulting in a computational complexity reduction from O(n^2) to O(n log n) when compared with the conventional Markov approach in the offline setting. To strike a better balance between privacy and utility, we add noise to the sampling period due to its minimal impact on sensitivity, which is analyzed by our novel concepts of (ε,τ)-temporal indistinguishability and (ε,w,τ)-event LDP. Through extensive experiments, SPPA exhibits superior performance in terms of both data utility and privacy preservation compared to the state-of-the-art baselines. In particular, when ε=1, compared with the state-of-the-art baseline, SPPA diminishes the MSE by up to 64.2%, and raises the event monitoring efficiency by up to 21.4%.
Dynamic graph neural networks (DGNNs) have emerged and been widely deployed in various web applications (e.g., Reddit) to serve users (e.g., personalized content delivery) due to their remarkable ability to learn from complex and dynamic user interaction data. Despite benefiting from high-quality services, users have raised privacy concerns, such as misuse of personal data (e.g., dynamic user-user/item interaction) for model training, requiring DGNNs to "forget" their data to meet AI governance laws (e.g., the "right to be forgotten" in GDPR). However, current static graph unlearning studies cannot unlearn dynamic graph elements and exhibit limitations such as model-specific design or reliance on pre-processing, which limit their practicality for dynamic graph unlearning. To this end, we study dynamic graph unlearning for the first time and propose an effective, efficient, general, and post-processing method to implement DGNN unlearning. Specifically, we first formulate dynamic graph unlearning in the context of continuous-time dynamic graphs, and then propose a method called Gradient Transformation that directly maps the unlearning request to the desired parameter update. Comprehensive evaluations on six real-world datasets and state-of-the-art DGNN backbones demonstrate its effectiveness (e.g., limited drop or obvious improvement in utility) and efficiency (e.g., 7.23× speed-up) advantages. Additionally, our method has the potential to handle future unlearning requests with significant performance gains (e.g., 32.59× speed-up).
Phishing scams on Ethereum have expanded with the surge of the platform, posing substantial challenges due to the sheer similarity in user behaviours and sparse temporal instances. Current methods often fail to tackle these concerns and overlook the temporal sequence of transactions, resulting in suboptimal performance. In this paper, we aim to address these gaps by focusing on the alignment of two aspects: (1) User-specific local temporal behavior, and (2) Divergences from global activity patterns of the network. Hence, we introduce CATALOG (CApturing joint TemporAl dependencies from LOcal and Global user behaviour), a novel representation learning model that jointly captures the local and global user behaviours and their correlations by leveraging a dual cross-attention mechanism paired with a bi-directional Masked Language Modelling (MLM) transformer. Our proposed model simultaneously learns from local behavioral shifts, global market trends, and contextually enriched embeddings, effectively distinguishing phishing from non-phishing users while addressing existing research gaps. Extensive experiments on real-world Ethereum transaction data show that our framework improves phishing detection by 7-8% in F1-Score and generalizes to Ethereum versions 1.0 and 2.0.
We explored the ubiquitous phenomenon of serial scammers, each of whom deployed dozens to thousands of addresses to conduct a series of similar Rug Pulls on popular decentralized exchanges. We first constructed two datasets of around 384,000 scammer addresses behind all one-day Simple Rug Pulls on Uniswap (Ethereum) and Pancakeswap (BSC), and identified distinctive scam patterns including star, chain, and major (scam-funding) flow. These patterns, which collectively cover about 40% of all scammer addresses in our datasets, reveal typical ways scammers run multiple Rug Pulls and organize the money flow among different addresses. We then studied the more general concept of scam cluster, which comprises scammer addresses linked together via direct ETH/BNB transfers or behind the same scam pools. We found that scam token contracts are highly similar within each cluster (average similarities >70%) and dissimilar across different clusters (average similarities <30%), corroborating our view that each cluster belongs to the same scammer/scam organization. Lastly, we analyze the scam profit of individual scam pools and clusters, employing a novel cluster-aware profit formula that takes into account the important role of wash traders. The analysis shows that the existing formula inflates the profit by at least 35% on Uniswap and 24% on Pancakeswap.
As the complexity and frequency of cyberattacks, such as Advanced Persistent Threats (APTs) and ransomware, continue to escalate, traditional anomaly detection methods have proven inadequate in addressing these sophisticated, multi-faceted threats. Recently, Host Provenance Graphs (HPGs) have played a crucial role in analyzing system-level interactions, detecting anomalous behaviors, and tracing attack chains. However, existing provenance-based detection methods primarily rely on single-dimensional feature analysis, which fails to capture the dynamic and multi-dimensional patterns of modern APT attacks, resulting in insufficient detection performance. To overcome this limitation, we introduce STGAN, a model that integrates spatial-temporal graphs into host provenance graph modeling. STGAN applies temporal and spatial encoding to dynamic provenance graphs to extract temporal, spatial, and semantic features, constructing a comprehensive feature representation. This representation is further fused and enhanced using a multi-head self-attention mechanism, followed by anomaly detection. Through extensive evaluations on three widely-used provenance graph datasets, we demonstrate that our approach consistently outperforms current state-of-the-art techniques in terms of detection performance. Additionally, we contribute to the research community by releasing our datasets and code, facilitating further exploration and validation.
In this paper, we systematically evaluate the effectiveness of existing tools for the dynamic security analysis of client-side JavaScript, focusing in particular on information flow control. Each tool is evaluated in terms of: (i) compatibility, i.e., the ability to process and analyze existing scripts without breaking; (ii) transparency, i.e., the ability to preserve the original script semantics when security enforcement is not necessary; (iii) coverage, i.e., the effectiveness in terms of number of detected information flows; (iv) performance, i.e., the computational overhead introduced by the analysis. Our investigation shows that most of the existing analysis tools are incompatible with the modern Web and the compatibility issues affecting them are not easily fixed. Moreover, transparency issues abound and make us question analysis correctness. This is also confirmed by our coverage evaluation, showing that some tools are unable to detect any information flow on real-world websites, while the remaining tools report significantly different outputs. Finally, we observe that the computational overhead of analysis tools may be significant and can exceed 30x. In the end, out of all the evaluated tools, just one of them (Project Foxhound) is effective enough for practical adoption at scale.
Graph Contrastive Learning (GCL) is a widely adopted approach in self-supervised graph representation learning, applying contrastive objectives to produce effective representations. However, current GCL methods primarily focus on capturing implicit semantic relationships, often overlooking the structural commonsense embedded within the graph's structure and attributes, which contains underlying knowledge crucial for effective representation learning. Due to the lack of explicit information and clear guidance in general graphs, identifying and integrating such structural commonsense in GCL poses a significant challenge. To address this gap, we propose a novel framework called Structural Commonsense Unveiling in Graph Contrastive Learning (Str-GCL). Str-GCL leverages first-order logic rules to represent structural commonsense and explicitly integrates them into the GCL framework. It introduces topological and attribute-based rules without altering the original graph and employs a representation alignment mechanism to guide the encoder in effectively capturing this commonsense. To the best of our knowledge, this is the first attempt to directly incorporate structural commonsense into GCL. Extensive experiments demonstrate that Str-GCL outperforms existing GCL methods, providing a new perspective on leveraging structural commonsense in graph representation learning.
Heterophilic Graph Neural Networks (HGNNs) have shown promising results for semi-supervised learning tasks on graphs. Notably, most real-world heterophilic graphs are composed of a mixture of nodes with different neighbor patterns, exhibiting local node-level homophilic and heterophilic structures. However, existing works are only devoted to designing better unified HGNN backbones for node classification tasks on heterophilic and homophilic graphs simultaneously, and their analyses of HGNN performance concerning nodes are only based on the determined data distribution without exploring the effect caused by the difference in structural patterns between training and testing nodes. How to learn invariant node representations on heterophilic graphs to handle this structure difference or distribution shift remains unexplored. In this paper, we first discuss the limitations of previous graph-based invariant learning methods in addressing heterophilic graph structure distribution shifts from the perspective of data augmentation. Then, we propose HEI, a framework capable of generating invariant node representations through incorporating Heterophily information, the node's estimated neighbor pattern, to infer latent Environments without augmentation, which are then used for Invariant prediction. We provide detailed theoretical guarantees to clarify the reasonability of HEI. Extensive experiments on various benchmarks and backbones demonstrate the effectiveness and robustness of our method compared with existing state-of-the-art baselines.
The smoothing issue in graph learning leads to indistinguishable node representations, posing significant challenges for graph-related tasks. However, our experiments reveal that this problem can uncover underlying properties of node anomaly detection (NAD) that previous research has missed. We introduce Individual Smoothing Patterns (ISP) and Neighborhood Smoothing Patterns (NSP), which indicate that the representations of anomalous nodes are harder to smooth than those of normal ones. In addition, we explore the theoretical implications of these patterns, demonstrating the potential benefits of ISP and NSP for NAD tasks. Motivated by these findings, we propose SmoothGNN, a novel unsupervised NAD framework. First, we design a learning component to explicitly capture ISP for detecting node anomalies. Second, we design a spectral graph neural network to implicitly learn ISP to enhance detection. Third, we design an effective coefficient based on our findings that NSP can serve as coefficients for node representations, aiding in the identification of anomalous nodes. Furthermore, we devise a novel anomaly measure to calculate loss functions and anomalous scores for nodes, reflecting the properties of NAD using ISP and NSP. Extensive experiments on 9 real datasets show that SmoothGNN outperforms the best rival by an average of 14.66% in AUC and 7.28% in Average Precision, with 75x running time speedup, validating the effectiveness and efficiency of our framework. Our code is available at https://github.com/xydong127/SmoothGNN.
Hypergraph neural networks (HyperGNNs) show promise in modeling online networks with high-order correlations. Despite notable progress, training these models on large-scale raw hypergraphs entails substantial computational and storage costs, thereby increasing the need for hypergraph size reduction. However, existing size reduction methods primarily capture pairwise association patterns within conventional graphs, making them difficult to adapt to hypergraphs with high-order correlations. To fill this gap, we introduce a novel hypergraph condensation framework, HG-Cond, designed to distill large-scale hypergraphs into compact, synthetic versions while maintaining comparable HyperGNN performance. Within this framework, we develop a Neural Hyperedge Linker to capture the high-order connectivity pattern through variational inference, achieving linear complexity with respect to the number of nodes. Moreover, we propose a multi-aspectual amelioration strategy including a Gradient-Parameter Synergistic Matching objective to holistically refine synthetic hypergraphs by coordinating improvements in node attributes, high-order connectivity, and label distributions. Extensive experiments demonstrate the efficacy of HG-Cond in hypergraph condensation, notably outperforming the original test accuracy on the 20News dataset while concurrently reducing the hypergraph size to a mere 5% of its initial scale. Furthermore, the condensed hypergraphs demonstrate robust cross-architectural generalizability and potential for expediting neural architecture search.
Dynamic graph neural networks (DGNNs) are designed to capture the dynamic evolution of graph node interactions. However, existing DGNNs mainly consider homogeneous graphs, neglecting the rich heterogeneity in node and edge types, which is prevalent for real-world graphs and essential for modeling complex dynamic interactions. In this work, we propose the TrajEctory and Semantic-Aware dynamic heterogeneous graph neural network (TeSa), which integrates trajectory-based evolution and semantic-aware aggregation to capture both the evolving dynamics and heterogeneous semantics entailed in continuous-time dynamic heterogeneous graphs. In particular, trajectory-based evolution treats the interactions received by each node (called node trajectory) as a sequence and employs a temporal point process to learn the dynamic evolution in these interactions. Semantic-aware aggregation separates edges of different types when aggregating messages for each node from its neighbors. Edges of the same type are processed at first (i.e., intra-semantic aggregation), and then edges of different types are handled (i.e., inter-semantic fusion), to offer a comprehensive view of the heterogeneous semantics. We compare TeSa with 7 state-of-the-art DGNN models, and the results show that TeSa improves the best-performing baseline by an average of 5.11% and 5.74% in accuracy for transductive and inductive tasks.
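The two-step aggregation order described above (intra-semantic aggregation, then inter-semantic fusion) can be illustrated with a minimal sketch. This is not the TeSa implementation: the mean pooling within each edge type, the weighted average across types, and the names `semantic_aware_aggregate` and `type_weights` are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def semantic_aware_aggregate(messages, type_weights=None):
    """Aggregate neighbor messages for one node in two steps.

    messages: list of (edge_type, vector) pairs received by the node.
    type_weights: optional dict edge_type -> weight (hypothetical; a real
    model would typically learn these, e.g. with attention).
    """
    if not messages:
        raise ValueError("at least one message is required")

    # Step 1 (intra-semantic aggregation): pool messages of the same edge type.
    by_type = defaultdict(list)
    for edge_type, vec in messages:
        by_type[edge_type].append(vec)
    intra = {t: mean_vector(vs) for t, vs in by_type.items()}

    # Step 2 (inter-semantic fusion): combine the per-type summaries.
    if type_weights is None:
        type_weights = {t: 1.0 for t in intra}
    total = sum(type_weights[t] for t in intra)
    dim = len(next(iter(intra.values())))
    return [
        sum(type_weights[t] * intra[t][i] for t in intra) / total
        for i in range(dim)
    ]


msgs = [("follows", [1.0, 0.0]), ("follows", [3.0, 0.0]), ("likes", [0.0, 4.0])]
fused = semantic_aware_aggregate(msgs)
```

Separating the types first means a rare edge type is not drowned out by a frequent one, which a flat mean over all three messages would not guarantee.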
In the digital age, resources such as open-source software and publicly accessible databases form a crucial category of digital public goods, providing extensive benefits for the Internet. However, these public goods' inherent non-exclusivity and non-competitiveness frequently result in under-provision, a dilemma exacerbated by individuals' tendency to free-ride. This scenario fosters both cooperation and competition among users, leading to public goods games. This paper investigates networked public goods games involving heterogeneous players and convex costs, focusing on the characterization of Nash Equilibrium (NE). In these games, each player can choose her effort level, representing her contributions to public goods. Network structures are employed to model the interactions among participants. Each player's utility consists of a concave value component, influenced by the collective efforts of all players, and a convex cost component, determined solely by the individual's own effort. To the best of our knowledge, this study is the first to explore the networked public goods game with convex costs. Our research begins by examining welfare solutions aimed at maximizing social welfare and ensuring the convergence of pseudo-gradient ascent dynamics. We establish the presence of NE in this model and provide an in-depth analysis of the conditions under which NE is unique. We also delve into comparative statics, an essential tool in economics, to evaluate how slight modifications in the model--interpreted as monetary redistribution--affect player utilities. In addition, we analyze a particular scenario with a predefined game structure, illustrating the practical relevance of our theoretical insights. Overall, our research enhances the broader understanding of strategic interactions and structural dynamics in networked public goods games, with significant implications for policy design in Internet economic and social networks.
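To make the utility structure concrete, the following minimal sketch runs projected pseudo-gradient ascent on a toy instance. The specific functional forms are assumptions chosen for illustration, not the paper's model: the concave value is log(1 + own effort + neighbors' efforts) and the convex cost is a_i * x_i^2 / 2, so each player's utility is u_i(x) = log(1 + x_i + sum of neighbors' x_j) - a_i * x_i^2 / 2.

```python
def pseudo_gradient_ascent(neighbors, cost_coeff, step=0.1, iters=2000):
    """Each player ascends the gradient of her OWN utility w.r.t. her OWN effort.

    neighbors: list where neighbors[i] lists the players adjacent to player i.
    cost_coeff: list of positive coefficients a_i of the quadratic cost
                (hypothetical parameterization).
    Efforts are projected back onto x_i >= 0 after every step.
    """
    n = len(neighbors)
    x = [0.0] * n
    for _ in range(iters):
        grads = []
        for i in range(n):
            # Collective effort that enters player i's concave value term.
            s = x[i] + sum(x[j] for j in neighbors[i])
            # d/dx_i [ log(1 + s) - a_i * x_i^2 / 2 ]
            grads.append(1.0 / (1.0 + s) - cost_coeff[i] * x[i])
        # Simultaneous update followed by projection onto non-negative efforts.
        x = [max(0.0, xi + step * g) for xi, g in zip(x, grads)]
    return x


# Two connected, identical players with a_i = 1.
efforts = pseudo_gradient_ascent(neighbors=[[1], [0]], cost_coeff=[1.0, 1.0])
```

For this symmetric two-player instance the stationarity condition 1 / (1 + 2x) = x gives x = 0.5 for both players, which is the point the dynamics settle at.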
The tension between persuasion and privacy preservation is common in real-world settings. Online platforms should protect the privacy of web users whose data they collect, even as they seek to disclose information about these data (e.g., to advertisers). Similarly, hospitals may share patient data to attract research investments with the obligation to preserve patients' privacy. To address these issues, we study Bayesian persuasion under differential privacy constraints, where the sender must design an optimal signaling scheme for persuasion while guaranteeing the privacy of each agent's private information in the database. To understand how privacy constraints affect information disclosure, we explore two perspectives within Bayesian persuasion: one views the mechanism as releasing a posterior about the private data, while the other views it as sending an action recommendation.
The posterior-based formulation leads to privacy-utility tradeoffs, quantifying how the tightness of privacy constraints impacts the sender's optimal utility. For any instance in a common utility function family and a wide range of privacy levels, a significant constant gap in the sender's optimal utility can be found between any two of the three conditions: ε-differential privacy constraint, relaxation (ε,δ)-differential privacy constraint, and no privacy constraint. We further geometrically characterize optimal signaling schemes under popular privacy constraints (ε-differential privacy, (ε,δ)-differential privacy and Rényi differential privacy), which turns out to be equivalent to finding concave hulls in constrained posterior regions. Finally, we develop polynomial-time algorithms for computing optimal differentially private signaling schemes.
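For reference, the three privacy notions named above have the following standard definitions, stated here for a randomized mechanism $\mathcal{M}$, adjacent databases $D, D'$ (differing in one agent's record), and any measurable output set $S$. The exact adjacency notion and notation used in the paper may differ; the symbols below are the conventional ones, not taken from the source.

```latex
\begin{align*}
% epsilon-differential privacy
\Pr[\mathcal{M}(D) \in S] &\le e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S], \\
% (epsilon, delta)-differential privacy: the same bound with additive slack delta
\Pr[\mathcal{M}(D) \in S] &\le e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta, \\
% Renyi differential privacy of order alpha > 1, via the Renyi divergence D_alpha
D_{\alpha}\bigl(\mathcal{M}(D) \,\|\, \mathcal{M}(D')\bigr) &\le \varepsilon .
\end{align*}
```

Setting $\delta = 0$ in the second line recovers the first, which is why $(\varepsilon,\delta)$-differential privacy is described as a relaxation.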
While federated learning enables intelligent services and personalized user experiences, it raises privacy concerns due to regulatory requirements and user demands for data protection. Federated unlearning offers a potential solution to these issues. However, despite increasing demand for its practical implementation driven by right-to-be-forgotten regulations, the economic implications of federated unlearning on user behavior and platform profitability remain underexplored, potentially hindering its adoption. In this paper, we formulate a set of contract design problems for both unlearning-disabled and unlearning-enabled scenarios. Challenges arise when the unlearning-enabled platform jointly designs compensation for both learning and unlearning to incentivize users' sequential decisions to balance the expected revenue and unlearning cost. We first conduct a questionnaire survey that reveals that federated unlearning increases users' willingness to participate in federated learning. We then provide a necessary condition for maximizing the surplus of an unlearning-enabled platform, enabling the point-wise decomposition for the optimal contract design problem, based on which we minimize the incentive cost and maximize the surplus for the platform. Our further analysis reveals that i) the incentive effects of unlearning grow quadratically with users' privacy sensitivity, and ii) enabling unlearning may even profit more than disabling it when the training cost increases at a faster rate than the probability of privacy leakage as effort levels rise. Our numerical results show that the platform's profitability is primarily influenced by users' privacy sensitivity. When users have a relatively high privacy sensitivity, enabling unlearning can significantly improve profitability.
Online platforms and regulators face a continuing problem of designing effective evaluation metrics. While tools for collecting and processing data continue to progress, this has not addressed the problem of unknown unknowns, or fundamental informational limitations on part of the evaluator. To guide the choice of metrics in the face of this informational problem, we turn to the evaluated agents themselves, who may have more information about how to measure their own outcomes. We model this interaction as an agency game, where we ask: When does an agent have an incentive to reveal the observability of a metric to their evaluator? We show that an agent will prefer to reveal metrics that differentiate the most difficult tasks from the rest, and conceal metrics that differentiate the easiest. We further show that the agent can prefer to reveal a metric *garbled* with noise over both fully concealing and fully revealing. This indicates an economic value to privacy that yields Pareto improvement for both the agent and evaluator. We demonstrate these findings on data from online rideshare platforms.
The remarkable ability of diffusion models to generate high-fidelity images has led to their widespread adoption. However, concerns have also arisen regarding their potential to produce Not Safe for Work (NSFW) content and exhibit social biases, hindering their practical use in real-world applications. In response to this challenge, prior work has focused on employing security filters to identify and exclude toxic text, or alternatively, fine-tuning pre-trained diffusion models to erase sensitive concepts. Unfortunately, existing methods struggle to achieve satisfactory performance in the sense that they can have a significant impact on the normal model output while still failing to prevent the generation of harmful content in some cases. In this paper, we propose a novel self-discovery approach to identifying a semantic direction vector in the embedding space to restrict text embedding within a safe region. Our method circumvents the need for correcting individual words within the input text and steers the entire text prompt towards a safe region in the embedding space, thereby enhancing model robustness against all possibly unsafe prompts. In addition, we employ Low-Rank Adaptation (LoRA) for semantic direction vector initialization to reduce the impact on the model performance for other semantics. Furthermore, our method can also be integrated with existing methods to improve their social responsibility. Extensive experiments on benchmark datasets demonstrate that our method can effectively reduce NSFW content and mitigate social bias generated by diffusion models compared to several state-of-the-art baselines. WARNING: This paper contains model-generated images that may be offensive.
Websites commonly display cookie banners to inform users about the use and purposes of cookies. However, they may still, whether intentionally or unintentionally (e.g., due to third-party libraries imported), mis-declare cookies that may be abused for tracking. In this work, we introduce COOVER (COOkie Value examinER) to assess the non-compliance between the website-declared purpose and the semantic-intended purpose of cookies (denoted as potential cookie purpose violation). We advocate that the value of the cookie is a more reliable indicator of its semantic-intended purpose compared to other features such as expiration time. COOVER decomposes the cookie value into primitive segments representing minimal semantic units, and fine-tunes a GPT-3.5 model to automatically interpret their value-inferred semantics. Based on the interpretation, it classifies cookies into four GDPR-defined purposes. COOVER achieves an F1 score of 95%, significantly outperforming other methods. We employ COOVER to analyze Alexa Top 1k websites to understand the status quo of potential cookie purpose violation on the web. Remarkably, out of 15,339 cookies across these websites, only 3.1% qualify as truly necessary cookies, while 44.1% of websites suffer from issues of potential purpose violation.
With the development of social media, people are exposed to a vast amount of unverified information, making fact-checking particularly important. Existing fact-checking methods primarily encourage breaking down claims into more easily solvable sub-tasks, and deriving final answers through reasoning with external evidence. However, these models face logical issues regarding whether and how the sub-tasks can logically be combined to form the original claims, and encounter causal errors in the reasoning process due to insufficient evidence or hallucinations from LLMs. In addition, they often suffer from a lack of interpretability. In this paper, we propose Logical and Causal fact-checking (LoCal), a novel fact-checking framework based on multiple LLM-based agents. The usage of multi-agent systems is due to their increasingly demonstrated ability to perform complex tasks in a manner similar to humans. LoCal primarily consists of a decomposing agent, multiple reasoning agents, and two evaluating agents. Specifically, the decomposing agent first utilizes the in-context learning ability of LLMs to break down complex claims into simpler sub-tasks, including fact verification tasks and question answering tasks. Afterwards, two types of reasoning agents are respectively utilized to retrieve external knowledge to address the fact verification tasks that require comparative analysis skills, and the question answering tasks that necessitate the ability of information extraction from evidence. We then combine the sub-tasks and their corresponding responses to generate a solution for evaluation. In order to enhance logical and causal consistency, two evaluating agents are respectively employed to examine whether the generated solution is logically equivalent to the original claim and determine whether the solution still holds when challenged by the counterfactual label. 
The evaluating agents provide confidence degrees for the solutions based on the evaluation results and iteratively correct the logical and causal errors in the reasoning process. We evaluate LoCal on two challenging datasets, and the results show that LoCal significantly outperforms all the baseline models across different settings of evidence availability. In addition, LoCal offers better interpretability by providing a structured solution along with detailed evaluating processes. We believe LoCal will provide valuable insights for future misinformation detection.
Large Language Models (LLMs) are increasingly employed in zero-shot document ranking, yielding commendable results. However, several significant challenges still persist in LLMs for ranking: (1) LLMs are constrained by limited input length, precluding them from processing a large number of documents simultaneously; (2) The output document sequence is influenced by the input order of documents, resulting in inconsistent ranking outcomes; (3) Achieving a balance between cost and ranking performance is challenging. To tackle these issues, we introduce a novel document ranking method called TourRank, which is inspired by sports tournaments such as the FIFA World Cup. Specifically, we 1) overcome the limitation in input length and reduce the ranking latency by incorporating a multi-stage grouping strategy similar to the parallel group stage of sports tournaments; 2) improve the ranking performance and robustness to input orders by using a points system to ensemble multiple ranking results. We test TourRank with different LLMs on the TREC DL datasets and the BEIR benchmark. The experimental results demonstrate that TourRank delivers state-of-the-art performance at a modest cost.
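The combination of a multi-stage group phase and a points system can be sketched as follows. This is an illustration of the general idea only, not the TourRank algorithm: in particular, `score_fn` is a hypothetical stand-in for the LLM call that would pick the documents advancing from each group, and the group size, number advancing, and one-point-per-stage rule are assumptions.

```python
import random
from collections import defaultdict


def tournament_rank(docs, score_fn, group_size=4, advance=2, rounds=3, seed=0):
    """Rank documents by points accumulated over several randomized tournaments.

    Each tournament repeatedly splits the surviving documents into small
    groups (so no single call ever sees more than `group_size` documents),
    lets the top `advance` of each group move on, and awards one point per
    stage survived. Points are summed over `rounds` tournaments with
    different shuffles to reduce sensitivity to input order.
    """
    if group_size <= advance:
        raise ValueError("group_size must exceed advance so that stages shrink")
    rng = random.Random(seed)
    points = defaultdict(int)
    for _ in range(rounds):
        survivors = list(docs)
        while len(survivors) > advance:
            rng.shuffle(survivors)
            groups = [
                survivors[i:i + group_size]
                for i in range(0, len(survivors), group_size)
            ]
            next_stage = []
            for group in groups:
                # Stand-in for asking an LLM to select the best documents.
                winners = sorted(group, key=score_fn, reverse=True)[:advance]
                for doc in winners:
                    points[doc] += 1
                next_stage.extend(winners)
            survivors = next_stage
    # Final ranking: more points first; ties keep the original input order.
    return sorted(docs, key=lambda d: points[d], reverse=True)


docs = [f"d{i}" for i in range(8)]
ranking = tournament_rank(docs, score_fn=lambda d: int(d[1:]))
```

Because the groups within a stage are independent, the per-group calls could be issued in parallel, which is where the latency benefit described above would come from.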
Search result diversification (SRD), which aims to ensure that documents in a ranking list cover a broad range of subtopics, is a significant and widely studied problem in Information Retrieval and Web Search. Existing methods primarily utilize a paradigm of "greedy selection", i.e., selecting one document with the highest diversity score at a time, or optimize an approximation of the objective function. These approaches tend to be inefficient and are easily trapped in a suboptimal state. To address these challenges, we introduce Multi-Agent reinforcement learning (MARL) for search result DIVersity, called MA4DIV. In this approach, each document is an agent and the search result diversification is modeled as a cooperative task among multiple agents. By modeling the SRD ranking problem as a cooperative MARL problem, this approach allows for directly optimizing the diversity metrics, such as α-NDCG, while achieving high training efficiency. We conducted experiments on public TREC datasets and a larger-scale dataset from an industrial setting. The experiments show that MA4DIV achieves substantial improvements in both effectiveness and efficiency over existing baselines, especially on the industrial dataset.
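As background on the diversity metric mentioned above, the sketch below computes α-NDCG in its commonly used form: a document's gain for a subtopic is discounted by (1 - α) for every earlier document that already covered that subtopic, and the normalizer is built greedily. Two simplifications are assumptions of this sketch: relevance judgments are binary, and the ideal ranking is a greedy reordering of the ranked list itself rather than of a larger judged pool.

```python
import math


def _gain(doc, subtopics, seen, alpha):
    """Novelty-discounted gain of `doc` given how often each subtopic was seen."""
    return sum((1 - alpha) ** seen.get(t, 0) for t in subtopics.get(doc, ()))


def alpha_dcg(ranking, subtopics, alpha=0.5):
    """alpha-DCG of a ranking; subtopics maps doc -> set of covered subtopics."""
    seen, total = {}, 0.0
    for rank, doc in enumerate(ranking, start=1):
        total += _gain(doc, subtopics, seen, alpha) / math.log2(rank + 1)
        for t in subtopics.get(doc, ()):
            seen[t] = seen.get(t, 0) + 1
    return total


def alpha_ndcg(ranking, subtopics, alpha=0.5):
    """alpha-DCG normalized by a greedily constructed ideal ordering."""
    remaining, seen, ideal = list(ranking), {}, []
    while remaining:
        best = max(remaining, key=lambda d: _gain(d, subtopics, seen, alpha))
        remaining.remove(best)
        ideal.append(best)
        for t in subtopics.get(best, ()):
            seen[t] = seen.get(t, 0) + 1
    denom = alpha_dcg(ideal, subtopics, alpha)
    return alpha_dcg(ranking, subtopics, alpha) / denom if denom > 0 else 0.0


# Documents a and b cover the same subtopic; c covers a different one.
subtopics = {"a": {1}, "b": {1}, "c": {2}}
diverse = alpha_ndcg(["a", "c", "b"], subtopics)
redundant = alpha_ndcg(["a", "b", "c"], subtopics)
```

The greedy normalizer is the usual choice because finding the exact ideal ordering is computationally hard in general; the ranking that places the redundant document last scores higher than the one that places it second.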
The rapid evolution of the Web as a key platform for information dissemination has led to the growing integration of large language models (LLMs) in Web-based applications. However, the swift changes in web content present challenges in maintaining these models' relevance and accuracy. The task of Knowledge Editing (KE) is aimed at efficiently and precisely adjusting the behavior of LLMs to update specific knowledge while minimizing any adverse effects on other knowledge. Current research predominantly concentrates on editing white-box LLMs, neglecting a significant scenario: editing black-box LLMs, where access is limited to interfaces and only textual output is provided. In this paper, we first formally introduce KE on black-box LLMs, followed by presenting a thorough evaluation framework. This framework operates without requiring logits and considers pre- and post-edit consistency, addressing the limitations of current evaluations that are inadequate for black-box LLM editing and lack comprehensiveness. To address privacy leaks of editing data and style over-editing in existing approaches, we propose a new postEdit framework. postEdit incorporates a retrieval mechanism for editing knowledge and a purpose-trained editing plugin called post-editor, ensuring privacy through downstream processing and maintaining textual style consistency via fine-grained editing. Experiments and analysis conducted on two benchmarks show that postEdit surpasses all baselines and exhibits robust generalization, notably enhancing style retention by an average of +20.82%. Our code is available on GitHub: https://github.com/songxiaoshuai/postEdit.
Stress haunts people in modern society and may cause severe health issues if left unattended. With social media becoming an integral part of daily life, leveraging social media to detect stress has gained increasing attention. While the majority of the work focuses on classifying stress states and stress categories, this study introduces a new task aimed at estimating more specific stressors (such as exams or writing papers) through users' posts on social media. Unfortunately, the diversity of stressors, with many different classes but only a few examples per class, combined with the constant emergence of new stressors over time, hinders the machine understanding of stressors. To this end, we cast the stressor estimation problem within a practical few-shot learning setting, and propose a novel meta-learning based stressor estimation framework that is enhanced by a meta-knowledge inheritance mechanism. This model can not only learn generic stressor context through meta-learning, but also has a good generalization ability to estimate new stressors with little labeled data. A fundamental breakthrough in our approach lies in the inclusion of the meta-knowledge inheritance mechanism, which equips our model with the ability to prevent catastrophic forgetting when adapting to new stressors. The experimental results show that our model achieves state-of-the-art performance compared with the baselines. Additionally, we construct a social media-based stressor estimation dataset that can help train artificial intelligence models to facilitate human well-being.
With the development of multi-modal modeling techniques, recent sequential recommender systems enhance transferability by incorporating cross-domain universal multi-modal data, e.g., text and image. Existing methods typically adopt pairwise alignment to alleviate the gap between modalities. However, this alignment paradigm has limitations on explainability, consistency, and expansibility, resulting in suboptimal performance. This paper proposes a novel Explainable multi-modality Alignment method for transferable Recommender systems, i.e., EARec. Specifically, we design a two-stage framework to achieve explainable modality alignment in the source domain and recommendation based on aligned modality representations in the target domain. In the first stage, we adopt a generative task to align various modalities in parallel to a shared anchor with explainable meaning. All modalities share the same anchor to ensure consistent direction. Additionally, we treat behavior as an independent modality to integrate task-specific information into the alignment framework. In the second stage, we compose multiple item modality representation models trained in the first stage to obtain a unified model capable of understanding various modalities simultaneously, thereby providing high-quality item modality representations for recommendations in the target domain. Benefiting from the approach of parallel modality alignment followed by model composition, the framework shows flexibility in expanding new modalities. Experimental results on multiple public datasets demonstrate the superiority of EARec over baselines, and further analyses indicate the explainability and expansibility of the proposed alignment method.
Semi-supervised anomaly detection (AD) has garnered growing attention due to its ability to effectively leverage limited labeled data to identify anomalies. However, current methods often impose artificial constraints on the proportion of unlabeled anomalies in the training set, thereby impeding the effective training of models for anomaly detection in real-world scenarios where several anomalies may be present in the unlabeled dataset. Additionally, existing methods often struggle to effectively exploit and model the complex relationships between data instances, which is critical for learning more discriminative features and accurate distance measures. Distance-based methods, in particular, typically rely on the Euclidean distance metric, which lacks the flexibility to capture complex correlations across different data dimensions. To address the above challenges, we propose CAD, a denoising-aware Contrastive distance learning framework for semi-supervised AD. It introduces a contrastive training objective to facilitate the learning of distinctive representations by contrasting the average distance between anomalies and unlabeled samples. To fully exploit the information from the unlabeled data while mitigating the effects of noise, we incorporate a two-stage anomaly denoising and expansion strategy to refine the dataset by identifying high-confidence samples from the unlabeled set. Furthermore, we employ a parameterized bilinear tensor distance layer to learn a customized distance metric, enabling the model to capture intricate relationships among data points. Extensive experiments on 10 real-world datasets demonstrate that CAD significantly outperforms existing semi-supervised AD models. Code available at https://github.com/CADrepo/CAD.
Recently, research on Text-Attributed Graphs (TAGs) has gained significant attention due to the prevalence of free-text node features in real-world applications and the advancements in Large Language Models (LLMs) that bolster TAG methodologies. However, current TAG approaches face two primary challenges: (i) Heavy reliance on label information and (ii) Limited cross-domain zero/few-shot transferability. These issues constrain the scaling of both data and model size, owing to high labor costs and scaling laws, complicating the development of graph foundation models with strong transferability. In this work, we propose the GraphCLIP framework to address these challenges by learning <u>graph</u> foundation models with strong <u>c</u>ross-domain zero/few-shot transferabi<u>li</u>ty through a self-supervised contrastive graph-summary <u>p</u>retraining method. Specifically, we generate and curate large-scale graph-summary pair data with the assistance of LLMs, and introduce a novel graph-summary pretraining method, combined with invariant learning, to enhance graph foundation models with strong cross-domain zero-shot transferability. For few-shot learning, we propose a novel graph prompt tuning technique aligned with our pretraining objective to mitigate catastrophic forgetting and minimize learning costs. Extensive experiments show the superiority of GraphCLIP in both zero-shot and few-shot settings, while evaluations across various downstream tasks confirm the versatility of GraphCLIP. Our code is available at: https://github.com/ZhuYun97/GraphCLIP.
The advancement of intelligent transportation systems has led to a growing demand for accurate path representations, which are essential for tasks such as travel time estimation, path ranking, and trajectory analysis. However, traditional path representation learning (PRL) methods often focus solely on single-modal road network data, overlooking important physical and regional factors that influence real-world traffic dynamics. To overcome this limitation, we introduce Path-LLM, a multi-modal path representation learning model that integrates large language models (LLMs) into PRL. Our approach leverages LLMs to interpret both topological and textual data, enabling robust multi-modal path representations. To effectively align and merge these modalities, we propose TPalign, a contrastive learning-based pretraining strategy that ensures alignment within the embedding space. We then present TPfusion, a multimodal fusion module that dynamically adjusts the weight of each modality before integration. To further optimize LLM training, we introduce a Two-stage Overlapping Curriculum Learning (TOCL) approach, which progressively increases the complexity of the training data. Finally, we evaluate Path-LLM on three real-world datasets across traditional PRL downstream tasks, achieving up to a 61.84% improvement in path ranking performance on the Xi'an dataset. Additionally, Path-LLM demonstrates superior performance in both few-shot and zero-shot learning scenarios. Our code is available at: https://github.com/decisionintelligence/Path-LLM.
Network Intrusion Detection Systems (NIDS) are critical for web security by identifying and blocking malicious traffic. In-network NIDS leverage programmable switches for high-speed traffic processing. However, they are unable to reconcile the fine-grained classification of known classes and the identification of unseen attacks. Moreover, they lack support for incremental updates. In this paper, we propose Helios, an in-network malicious traffic detection system, for continual adaptation in attack-incremental scenarios. First, we design a novel Supervised Mixture Prototypical Learning (SMPL) method combined with clustering initialization to learn prototypes that encapsulate the knowledge, based on the weighted infinity norm distance. SMPL enables known class classification and unseen attack identification through similarity comparison between prototypes and samples. Then, we design boundary calibration and overlap refinement to transform learned prototypes into priority-guided matching rules, ensuring precise and efficient in-network deployment. Additionally, Helios supports incremental prototype learning and rule updates, achieving low-cost hardware reconfiguration. We implement Helios on a Tofino switch and evaluation on three datasets shows that Helios achieves superior performance in classifying known classes (92%+ in ACC and F1) as well as identifying unseen attacks (62% - 98% in TPR). Helios has also reduced resource consumption and reconfiguration time, demonstrating its scalability and efficiency for real-world deployment.
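The prototype-matching idea in the Helios abstract can be illustrated with a minimal sketch. It assumes one prototype per class, hand-picked per-feature weights, and a single global rejection threshold; the names `weighted_linf`, `classify`, and the toy values are hypothetical and are not taken from the paper, whose SMPL method learns a mixture of prototypes. One reason a weighted infinity norm suits switch deployment is that a ball under this distance is an axis-aligned box, which corresponds to per-feature range checks.

```python
def weighted_linf(x, center, weights):
    """Weighted infinity-norm distance: max_i w_i * |x_i - center_i|."""
    return max(w * abs(a - b) for a, b, w in zip(x, center, weights))


def classify(x, prototypes, threshold):
    """Return the label of the nearest prototype, or "unseen" if none is close.

    `prototypes` maps label -> (center, weights). A single prototype per class
    and one shared threshold are simplifying assumptions for illustration.
    """
    best_label, best_dist = None, float("inf")
    for label, (center, weights) in prototypes.items():
        dist = weighted_linf(x, center, weights)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else "unseen"


# Toy two-feature prototypes (hypothetical values).
prototypes = {
    "benign": ([0.0, 0.0], [1.0, 1.0]),
    "ddos": ([10.0, 10.0], [1.0, 0.5]),
}
known = classify([9.0, 11.0], prototypes, threshold=2.0)
unknown = classify([5.0, 5.0], prototypes, threshold=2.0)
```

Here the first sample lies within the threshold of the `ddos` prototype, while the second is far from both prototypes and is therefore flagged as unseen.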
Graph Neural Networks (GNNs) have achieved remarkable success in node classification tasks on individual graphs. However, existing GNNs trained within a specific domain (a.k.a. the source domain) frequently exhibit unsatisfactory performance when transferred to another domain (a.k.a. the target domain), due to the domain gap. To tackle this issue, Few Shot Graph Domain Adaptation (FSGDA) is introduced to the node classification task, facilitating knowledge transfer from a fully labeled source graph to a target graph with minimal annotations for each class. An intuitive solution is to train the GNN directly on labeled source and target samples together. Nevertheless, there are two issues in this procedure: (1) When the annotations on the target domain used for training are extremely sparse, GNN performance may be significantly degraded by source nodes whose domain bias does not align with the target-domain distribution. (2) Apart from the biased nodes, the low-value nodes among the remaining nodes impede GNN learning on the core nodes, such as the limited target training nodes. To address the above issues, we propose a new method for FSGDA, named GraphInflu, whose core idea is to grasp the key takeaways from the source domain to facilitate the adaptation process. It contains two characteristic modules: the Supportive Node Selector and the Soft Logic-Inspired Node Reweighting. The former aims to identify the most influential set of source nodes based on their contribution to improving performance on target nodes. The latter further focuses on the core nodes in the selected influential set, which closely align with the target nodes, especially those presenting challenging predictions. Extensive experiments validate the efficacy of GraphInflu, showing that it outperforms current state-of-the-art methods. Our code is available at https://github.com/lvXiangwei/GraphInflu.git.
Livestreaming by VTubers, animated 2D/3D avatars controlled by real individuals, has recently garnered substantial global followings and achieved significant monetary success. Despite prior research highlighting the importance of realism in audience engagement, VTubers deliberately conceal their identities, cultivating dedicated fan communities through virtual personas. While previous studies underscore that building a core fan community is essential to a streamer's success, we lack an understanding of the characteristics of viewers of this new type of streamer. Gaining a deeper insight into these viewers is critical for VTubers to enhance audience engagement, foster a more robust fan base, and attract a larger viewership. To address this gap, we conduct a comprehensive analysis of VTuber viewers on Bilibili, a leading livestreaming platform where nearly all VTubers in China stream. By compiling a first-of-its-kind dataset covering 2.7M livestreaming sessions, we investigate the characteristics, engagement patterns, and influence of VTuber viewers. Our research yields several valuable insights, which we then leverage to develop a tool to "recommend" future subscribers to VTubers. By reversing the typical approach of recommending streams to viewers, this tool assists VTubers in pinpointing potential future fans to pay more attention to, thereby effectively growing their fan community.
Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks.
In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG features a dual-system architecture. First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answers, providing useful clues for the retrieval tools to locate relevant information within the long context. Second, it leverages an expensive but expressive system, which generates the final answer based on the retrieved information. Building upon this fundamental framework, we realize the memory module in the form of KV compression, and reinforce its memorization and cluing capacity from the Generation quality's Feedback (a.k.a. RLGF). In our experiments, MemoRAG achieves superior performances across a variety of long-context evaluation tasks, not only complex scenarios where traditional RAG methods struggle, but also simpler ones where RAG is typically applied.
Tables are ubiquitous across various domains for concisely representing structured information. Empowering large language models (LLMs) to reason over tabular data represents an actively explored direction. However, since typical LLMs only support one-dimensional (1D) inputs, existing methods often flatten the two-dimensional (2D) table structure into a sequence of tokens, which can severely disrupt the spatial relationships and result in an inevitable loss of vital contextual information. In this paper, we first empirically demonstrate the detrimental impact of such flattening operations on the performance of LLMs in capturing the spatial information of tables through two elaborate proxy tasks. Subsequently, we introduce a simple yet effective positional encoding method, termed "2D-TPE" (Two-Dimensional Table Positional Encoding), to address this challenge. 2D-TPE enables each attention head to dynamically select a permutation order of tokens within the context for attending to them, where each permutation represents a distinct traversal mode for the table, such as column-wise or row-wise traversal. 2D-TPE effectively mitigates the risk of losing essential spatial information while preserving computational efficiency, thus better preserving the table structure. Extensive experiments across five benchmarks demonstrate that 2D-TPE outperforms strong baselines, underscoring the importance of preserving the table structure for accurate table comprehension. Comprehensive analysis further reveals the substantially better scalability of 2D-TPE to large tables than baselines.
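To make the notion of traversal modes concrete, the sketch below computes the 1D position index of each table cell under row-wise and column-wise traversal. It works at the cell level with a hypothetical helper `traversal_positions`; the actual 2D-TPE method operates on tokens and lets each attention head choose among permutation orders, which this illustration does not attempt to reproduce.

```python
def traversal_positions(n_rows, n_cols, mode):
    """Map each cell (row, col) to its 1D position under a traversal mode."""
    if mode == "row":
        # Row-wise: read each row left to right, then move down.
        return {(r, c): r * n_cols + c
                for r in range(n_rows) for c in range(n_cols)}
    if mode == "column":
        # Column-wise: read each column top to bottom, then move right.
        return {(r, c): c * n_rows + r
                for r in range(n_rows) for c in range(n_cols)}
    raise ValueError(f"unknown traversal mode: {mode}")


row_pos = traversal_positions(2, 3, "row")
col_pos = traversal_positions(2, 3, "column")

# Distance between the vertically adjacent cells (0, 0) and (1, 0).
row_gap = row_pos[(1, 0)] - row_pos[(0, 0)]
col_gap = col_pos[(1, 0)] - col_pos[(0, 0)]
```

In a 2×3 table, two vertically adjacent cells are three positions apart under row-wise flattening but only one position apart under column-wise traversal, which is the kind of spatial relationship a single flattened order cannot keep for both directions at once.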
Multi-modal recommender systems (MMRecs) leverage diverse modalities to deliver personalized recommendations, yet they often struggle with efficiency due to the large size of modality encoders and the complexity of fusing high-dimensional features. To address the efficiency issue, a promising solution is to compress a cumbersome MMRec into a lightweight ID-based Multi-Layer Perceptron-based Recommender system (MLPRec) through Knowledge Distillation (KD). Despite its effectiveness, we argue that this approach overlooks the significant gap between the complex teacher MMRec and the lightweight, ID-based student MLPRec, which differ significantly in size, architecture, and input modalities, leading to ineffective knowledge transfer and suboptimal student performance. To bridge this gap, we propose TARec, a novel teacher-assisted Wasserstein Knowledge Distillation framework for compressing MMRecs into an efficient MLPRec. TARec introduces: (i) a two-stage KD process using an intermediate Teacher Assistant (TA) model to bridge the gap between teacher and student, facilitating smoother knowledge transfer; (ii) logit-level KD using the Wasserstein distance as the metric, replacing the conventional KL divergence to ensure stable gradient flow even with significant teacher-student gaps; and (iii) embedding-level contrastive KD to further distill high-quality embedding-level knowledge from the teacher. Extensive experiments on real-world datasets verify the effectiveness of TARec, demonstrating that TARec significantly outperforms the state-of-the-art MMRecs while reducing computational costs. Our code is available at: https://github.com/Suehn/TARec.git.
Network connectivity minimization is a fundamental problem in controlling the spread of viruses on the Internet and facilitating information propagation in online social networks. The problem aims to identify a budgeted number of key nodes whose removal would minimize the connectivity of a network. However, the existing solutions depend heavily on the number of edges, making it challenging to handle large and densely connected social networks. In this study, we present a fast algorithm that is independent of the number of edges. To achieve this, we first introduce a surrogate matrix that approximates the residual adjacency matrix with an arbitrarily small predefined error. We then devise an efficient approach for inferring k influential nodes by optimizing the eigenvalues of the surrogate matrix. Remarkably, the algorithm has a small time complexity of O(knr^3), with r being a small tunable number. Our algorithm thereby scales linearly with the number of nodes and is unaffected by the number of edges. Hence, it can efficiently handle large and dense social networks. Finally, we evaluate its performance against state-of-the-art techniques using diverse real-world datasets. The experimental results demonstrate the superiority of our proposed method in terms of both solution quality and computational efficiency.
Self-supervised learning (SSL) in graphs has garnered significant attention, particularly in employing Graph Neural Networks (GNNs) with pretext tasks initially designed for other domains, such as contrastive learning and feature reconstruction. However, it remains uncertain whether these methods effectively reflect essential graph properties, specifically the similarity between a node's representation and those of its neighbors. We observe that existing methods sit at opposite ends of a spectrum driven by graph embedding smoothness, with each end corresponding to strong performance on specific downstream tasks. Decomposing the SSL objective into three terms via an information-theoretic framework with a neighbor representation variable reveals that this polarization stems from an imbalance among the terms, which existing methods may not effectively maintain. Further insights suggest that balancing between the extremes can lead to improved performance across a wider range of downstream tasks. We propose BSG (Balancing Smoothness in Graph SSL), a framework that introduces novel loss functions designed to improve representation quality in graph-based SSL by balancing the three derived terms: neighbor loss, minimal loss, and divergence loss. We present a rigorous theoretical analysis of the effects of these loss functions, highlighting their significance from both the SSL and graph smoothness perspectives. Extensive experiments on multiple real-world datasets across node classification and link prediction consistently demonstrate that BSG achieves state-of-the-art performance, outperforming existing methods. Our implementation code is available at https://github.com/steve30572/BSG.
Graph neural networks have been demonstrated as a powerful paradigm for effectively learning graph-structured data on the web and mining content from it. Current leading graph models require a large number of labeled samples for training, which unavoidably leads to overfitting in few-shot scenarios. Recent research has sought to alleviate this issue by simultaneously leveraging graph learning and meta-learning paradigms. However, these graph meta-learning models assume the availability of numerous meta-training tasks to learn transferable meta-knowledge. Such an assumption may not be feasible in the real world due to the difficulty of constructing tasks and the substantial costs involved. Therefore, we propose a SiMple yet effectIve approach for graph few-shot Learning with fEwer tasks, named SMILE. We introduce a dual-level mixup strategy, encompassing both within-task and across-task mixup, to simultaneously enrich the available nodes and tasks in meta-learning. Moreover, we explicitly leverage the prior information provided by the node degrees in the graph to encode expressive node representations. Theoretically, we demonstrate that SMILE can enhance the model generalization ability. Empirically, SMILE consistently outperforms other competitive models by a large margin across all evaluated datasets with in-domain and cross-domain settings. Our anonymous code can be found at https://github.com/KEAML-JLU/SMILE.
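As a rough illustration of the interpolation underlying mixup, the sketch below mixes two feature vectors with a coefficient `lam`, optionally drawn from a symmetric Beta distribution as in standard mixup. The function names and the `alpha` value are assumptions for illustration only; how SMILE applies mixup within and across tasks (for example, which layer it is applied to and how labels are mixed) is not specified in the abstract and is not reproduced here.

```python
import random


def sample_lambda(alpha=0.5, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(x_i, x_j, lam):
    """Convex combination of two equal-length feature vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("feature vectors must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


# Deterministic example with a fixed coefficient.
mixed = mixup([1.0, 0.0], [0.0, 1.0], lam=0.25)

# Stochastic example: lam lies in [0, 1], so the result stays between the inputs.
lam = sample_lambda()
random_mixed = mixup([1.0, 0.0], [0.0, 1.0], lam)
```

Applied to nodes inside one task, such interpolation yields additional synthetic samples; applied to samples drawn from two different tasks, it yields additional synthetic tasks, which is one plausible reading of the dual-level strategy.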
The Web of Things (WoT) enhances interoperability across web-based and ubiquitous computing platforms while complementing existing IoT standards. The multimodal Federated Learning (FL) paradigm has been introduced to enhance WoT by enabling the fusion of multi-source mobile sensing data while preserving privacy. However, a key challenge in mobile sensing systems using multimodal FL is modality incompleteness, where some modalities may be unavailable or only partially captured, potentially degrading the system's performance and reliability. Current multimodal FL frameworks typically train multiple unimodal FL subsystems or apply interpolation techniques on the node side to approximate missing modalities. However, these approaches overlook the shared latent feature space among incomplete modalities across different nodes and fail to discriminate against low-quality nodes. To address this gap, we present FedMobile, a new knowledge contribution-aware multimodal FL framework designed for robust learning despite missing modalities. FedMobile prioritizes local-to-global knowledge transfer, leveraging cross-node multimodal feature information to reconstruct missing features. It also enhances system performance and resilience to modality heterogeneity through rigorous node contribution assessments and knowledge contribution-aware aggregation rules. Empirical evaluations on five widely recognized multimodal benchmark datasets demonstrate that FedMobile maintains robust learning even when up to 90% of modality information is missing or when data from two modalities are randomly missing, outperforming state-of-the-art baselines. Our code and data are available at https://doi.org/10.5281/zenodo.14802364.
Building location embedding from web-sourced satellite imagery has emerged as an enduring research focus in web mining. However, most existing methods are inherently constrained by their reliance on discrete, sparse sampling strategies, failing to capture the essential spatial continuity of geographic spaces. Moreover, the presence of confounding factors in satellite images can distort the perception of actual objects, leading to semantic discontinuity in the embeddings. In this work, we propose SatCLE, a novel framework for <u>C</u>ontinuous <u>L</u>ocation <u>E</u>mbeddings leveraging <u>Sat</u>ellite imagery. Specifically, to address the out-of-sample query challenge of spatial continuity, we propose a geospatial refinement strategy comprising stochastic perturbation continuity expansion and graph propagation fusion, which transforms discrete geospatial coordinates into a continuous space. To mitigate the effects of confounders on semantic continuity, we introduce causal refinement, integrating causal theory to localize and eliminate spurious correlations arising from the environmental context. Through extensive experiments, SatCLE shows state-of-the-art performance, exhibiting superior spatial coherence and semantic fidelity across diverse geospatial tasks. The source code is available at https://github.com/CityMind-Lab/SatCLE.
The rapid proliferation of fake news on social media threatens social stability, creating an urgent demand for more effective detection methods. While many promising approaches have emerged, most rely on content analysis with limited semantic depth, leading to suboptimal comprehension of news content. To address this limitation, capturing broader-range semantics is essential yet challenging, as it introduces two primary types of noise: fully connecting sentences in news graphs often adds unnecessary structural noise, while highly similar but authenticity-irrelevant sentences introduce feature noise, complicating the detection process. To tackle these issues, we propose BREAK, a <u>b</u>road-<u>r</u>ange s<u>e</u>mantics model for f<u>ak</u>e news detection that leverages a fully connected graph to capture comprehensive semantics while employing dual denoising modules to minimize both structural and feature noise. The semantic structure denoising module balances the graph's connectivity by iteratively refining it between two bounds: a sequence-based structure as a lower bound and a fully connected graph as the upper bound. This refinement uncovers label-relevant semantic interrelations structures. Meanwhile, the semantic feature denoising module reduces noise from similar semantics by diversifying representations, aligning distinct outputs from the denoised graph and sequence encoders using KL-divergence to achieve feature diversification in high-dimensional space. The two modules are jointly optimized in a bi-level framework, enhancing the integration of denoised semantics into a comprehensive representation for detection. Extensive experiments across four datasets prove that BREAK significantly outperforms existing fake news detection methods.
Empathetic Response Generation (ERG) is one of the key tasks of the affective computing area, which aims to produce emotionally nuanced and compassionate responses to users' queries. However, existing ERG research is predominantly confined to the single text modality, limiting its effectiveness since human emotions are inherently conveyed through multiple modalities. To combat this, we introduce an avatar-based Multimodal ERG (MERG) task, entailing rich text, speech, and facial vision information. We first present a large-scale high-quality benchmark dataset, AvaMERG, which extends traditional text ERG by incorporating authentic human speech audio and dynamic talking-face avatar videos, encompassing a diverse range of avatar profiles and broadly covering various topics of real-world scenarios. Further, we deliberately tailor a system, named Empatheia, for MERG. Built upon a Multimodal Large Language Model (MLLM) with a multimodal encoder and speech and avatar generators, Empatheia performs end-to-end MERG, with a Chain-of-Empathetic reasoning mechanism integrated for enhanced empathy understanding and reasoning. Finally, we devise a list of empathetic-enhanced tuning strategies, strengthening emotional accuracy as well as content and avatar-profile consistency across modalities. Experimental results on AvaMERG data demonstrate that Empatheia consistently outperforms baseline methods on both textual ERG and MERG. All data and code are open at https://AvaMERG.github.io/.
Cross-Domain Recommendation (CDR) has been widely investigated for solving the long-standing data sparsity problem via knowledge sharing across domains. In this paper, we focus on the Multi-Modal Cross-Domain Recommendation (MMCDR) problem where different items have multi-modal information while few users overlap across domains. MMCDR is particularly challenging in two aspects: fully exploiting diverse multi-modal information within each domain and leveraging useful knowledge transfer across domains. However, previous methods fail to cluster items with similar characteristics while filtering out inherent noise within different modalities, hindering model performance. What is worse, conventional CDR models primarily rely on overlapped users for domain adaptation, making them ill-equipped to handle scenarios where the majority of users are non-overlapped. To fill this gap, we propose Joint Similarity Item Exploration and Overlapped User Guidance (SIEOUG) for solving the MMCDR problem. SIEOUG first introduces a similarity item exploration module, which not only obtains pair-wise and group-wise item-item graph knowledge, but also reduces irrelevant noise for multi-modal modeling. SIEOUG then introduces a user-item collaborative filtering module to aggregate user/item embeddings with the attention mechanism for collaborative filtering. Finally, SIEOUG introduces an overlapped user guidance module with optimal user matching for knowledge sharing across domains. Our empirical study on the Amazon dataset with several different tasks demonstrates that SIEOUG significantly outperforms the state-of-the-art models under the MMCDR setting.
Dynamic Dependence Network (DDN) inference is crucial for understanding evolving relationships in multimodal time series web data, with broad applications in fields like medical and financial network analysis. The inherent dynamic nature, temporal continuity, and heterogeneous data sources in multimodal time series data pose three fundamental challenges: computational efficiency, prediction stability and robustness, and modality quality disparity. Previous methods, which generally do not utilize multiple modalities, either struggle with computational efficiency due to time-intensive manual hyperparameter tuning, or compromise prediction stability and robustness by neglecting temporal coherence. To address these challenges, we propose a <u>No</u>rmalized mutual information-driven <u>T</u>uning-fre<u>e</u> Dynamic Dependence <u>Net</u>work inference method for multimodal data, namely NoTeNet. NoTeNet provides a promising paradigm that can integrate two different data modalities to enhance prediction accuracy. It uses normalized mutual information to transform noisy auxiliary data into relationship matrices and employs a kernel function for smooth temporal estimation. Additionally, NoTeNet significantly reduces the need for manual hyperparameter adjustments, offering a tuning-free approach with theoretical guarantees. On various synthetic datasets and real-world data, NoTeNet demonstrates superior prediction accuracy and efficiency without the need for hyperparameter tuning, making it promising for a wide range of web data applications.
Standard multimodal self-supervised learning (SSL) algorithms regard cross-modal synchronization as implicit supervisory labels during pretraining, thus posing high requirements on the scale and quality of multimodal samples. These constraints significantly limit the performance of sensing intelligence in IoT applications, as the heterogeneity and the non-interpretability of time-series signals result in abundant unimodal data but scarce high-quality multimodal pairs. This paper proposes InfoMAE, a cross-modal alignment framework that tackles the challenge of multimodal pair efficiency under the SSL setting by facilitating efficient cross-modal alignment of pretrained unimodal representations. InfoMAE achieves efficient cross-modal alignment with limited data pairs through a novel information theory-inspired formulation that simultaneously addresses distribution-level and instance-level alignment. Extensive experiments on two real-world IoT applications are performed to evaluate InfoMAE's pairing efficiency to bridge pretrained unimodal models into a cohesive joint multimodal model. InfoMAE enhances downstream multimodal tasks by over 60% with significantly improved multimodal pairing efficiency. It also improves unimodal task accuracy by an average of 22%.
We consider the problem of bidding in online advertising, where an advertiser aims to maximize value while adhering to budget and Return-on-Spend (RoS) constraints. Unlike prior work that assumes knowledge of the value generated by winning each impression (e.g., conversions), we address the more realistic setting where the advertiser must simultaneously learn the optimal bidding strategy and the value of each impression opportunity. This introduces a challenging exploration-exploitation dilemma: the advertiser must balance exploring different bids to estimate impression values with exploiting current knowledge to bid effectively. To address this, we propose a novel Upper Confidence Bound (UCB)-style algorithm that carefully manages this trade-off. Via a rigorous theoretical analysis, we prove that our algorithm achieves Õ(√(T log(|B|T))) regret and constraint violation, where T is the number of bidding rounds and B is the domain of possible bids. This establishes the first optimal regret and constraint violation bounds for bidding in the online setting with unknown impression values. Moreover, our algorithm is computationally efficient and simple to implement. We validate our theoretical findings through experiments on synthetic data, demonstrating that our algorithm exhibits strong empirical performance compared to existing approaches.
Content feeds provided by platforms such as X (formerly Twitter) and TikTok are consumed by users on a daily basis. In this paper, we revisit the native advertising problem in content feeds, initiated by Ieong et al. Given a sequence of organic items (e.g., videos or posts) relevant to a user's interests or to an information search, the goal is to place ads within the organic content so as to maximize a reward function (e.g., number of clicks), while accounting for two considerations: (1) an ad can only be inserted after a relevant content item; (2) the users' attention decays after consuming content or ads. These considerations provide a natural model for capturing both the advertisement effectiveness and the user experience. In this paper, we design fast and practical 2-approximation greedy algorithms for the associated optimization problem, improving over the best-known practical algorithm that only achieves an approximation factor of 4. Our algorithms exploit a counter-intuitive observation, namely, while top items are seemingly more important due to the decaying attention of the user, taking good care of the bottom items is key for obtaining improved approximation guarantees. We then provide the first comprehensive empirical evaluation on the problem, showing the strong empirical performance of our methods.
Next Point-of-Interest (POI) recommendation has become a crucial task in Location-Based Social Networks (LBSNs), which provide personalized recommendations by predicting the user's next check-in locations. Commonly used models, including Recurrent Neural Networks (RNNs) and Graph Convolutional Networks (GCNs), have been widely explored. However, these models face significant challenges, including the difficulty of capturing the hierarchical and tree-like structure of POIs in Euclidean space and the sparsity problem inherent in POI recommendations. To address these challenges, we propose a Hyperbolic Variational Graph Auto-Encoder (HVGAE) for next POI recommendation. Specifically, we utilize a Hyperbolic Graph Convolutional Network (Hyperbolic GCN) to model hierarchical structures and tree-like relationships by converting node embeddings from Euclidean space to hyperbolic space. We then use a Variational Graph Auto-Encoder (VGAE) to convert node embeddings to probabilistic distributions, enhancing the capture of deeper latent features and providing a more robust model structure. Furthermore, we combine the Mamba4Rec recommender and Rotary Position Embedding (RoPE) and propose Rotary Position Mamba (RPMamba) to effectively utilize POI embeddings rich in sequential information, which improves the accuracy of the next POI recommendation. Extensive experiments on three public datasets demonstrate the superior performance of the HVGAE model.
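One common way to move a Euclidean embedding into hyperbolic space is the exponential map at the origin of the Poincaré ball. The sketch below assumes that model with curvature -c; the abstract does not say which hyperbolic model HVGAE uses, so this is an illustrative choice and the helper names `norm` and `expmap0` are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c.

    Maps a Euclidean (tangent) vector v to
        tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||),
    which always lies strictly inside the ball of radius 1 / sqrt(c).
    """
    n = norm(v)
    if n == 0.0:
        return list(v)  # the origin maps to itself
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * n) / (sqrt_c * n)
    return [scale * x for x in v]


# A Euclidean vector of norm 5 lands close to, but inside, the unit ball.
point = expmap0([3.0, 4.0])
radius = norm(point)
```

Because the map squashes large norms toward the boundary, points far from the origin gain room to separate, which is the property usually cited for representing tree-like hierarchies.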
Hyper-relational knowledge graphs (HKGs) extend the traditional triplet-based knowledge graph by adding qualifiers to the relationships, making HKGs particularly useful for tasks that require more profound understanding and inference from relationships between entities. However, existing hyper-relational knowledge representation learning (HKRL) methods focus only on the direct neighbourhood information of entities, neglecting the relational similarity of the main triple in hyper-relational facts and the attribute details in the qualifiers. In addition, few works extract common and private information across multiple views to minimize noise and interference. This paper proposes a multi-hypergraph disentanglement method for HKRL to address the above issues. Specifically, we first construct four hypergraphs to mine and utilise the inherent structure information of HKGs, and then propose to extract common representations among hypergraphs and private representations within individual hypergraphs to mine the semantic information and the task-relevant information, respectively. Experiment results on four real datasets demonstrate the effectiveness of the proposed method compared to SOTA methods in link prediction tasks on HKGs.
With the advancement of WebAssembly, abbreviated as Wasm, various memory bugs and undefined behaviors have emerged, leading to security issues that affect usability and portability. Existing methods struggle to detect these problems in Wasm binaries due to challenges associated with binary instrumentation and the difficulty of defining legal memory bounds. While sanitizers combined with fuzzing are recognized as effective means for identifying bugs, current Wasm sanitizers necessitate compile-time instrumentation, rendering them unsuitable for practical scenarios where only binaries are accessible. In this paper, we propose WBSan, the first Wasm binary sanitizer employing static analysis and Wasm binary instrumentation to detect memory bugs and undefined behaviors. We develop distinct instrumentation patterns tailored for each type of bug and introduce Wasm shadow memory to address complex memory bugs. Our results reveal that WBSan achieves a 16.8% false detection rate, outperforming current Wasm binary checkers and native sanitizers in detecting memory bugs and undefined behaviors. Furthermore, when compared with the binary-only fuzzer, WBSan uncovers more crashes and achieves greater code coverage.
Conversational recommender systems (CRS) aim to provide personalized recommendations via interactive dialogues with users. While large language models (LLMs) enhance CRS with their superior understanding of context-aware user preferences, they typically struggle to leverage behavioral data, which have proven to be important for classical collaborative filtering (CF)-based approaches. For this reason, we propose CRAG, Collaborative Retrieval Augmented Generation for LLM-based CRS. To the best of our knowledge, CRAG is the first approach that combines state-of-the-art LLMs with CF for conversational recommendations. Our experiments on two publicly available movie conversational recommendation datasets, i.e., a refined Reddit dataset (which we name Reddit-v2) as well as the Redial dataset, demonstrate the superior item coverage and recommendation performance of CRAG, compared to several CRS baselines. Moreover, we observe that the improvements are mainly due to better recommendation accuracy on recently released movies. The code and data are available at https://github.com/yaochenzhu/CRAG.
Urban fine-grained data map inference, leveraging information from coarse-grained maps, has emerged as a significant area of research due to the growing complexity and data heterogeneity in urban environments. Existing methods assume a priori that a coarse-grained data map of one fixed granularity is transformed into a fine-grained data map, also of one fixed granularity. However, in actual scenarios, the collected coarse-grained data maps are often incomplete and have significantly distinct granularities in various urban areas, which results in incomplete heterogeneous data, i.e., multi-granularity data maps in terms of spatial information. Meanwhile, different granularity data maps are needed for various urban downstream tasks, which is a multi-task problem. To that end, this paper proposes a novel framework, a multi-granularity super-resolution data map inference framework (MGSR), designed to harness spatio-temporal information to transform incomplete coarse-grained multi-granularity data maps into fine-grained multi-granularity data maps. Specifically, we design a granularity alignment network to align multi-granularity information and address missing data on each granularity data map by leveraging the other granularity data maps with a well-designed self-supervised task. Then, we introduce a feature extraction network to capture spatio-temporal dependencies and extract features. Finally, we devise a recurrent super-resolution network with shared parameters to infer multi-granularity data maps. We conduct extensive experiments on three real-world benchmark datasets and demonstrate that MGSR significantly outperforms the state-of-the-art methods for multi-granularity urban data map inference and reduces RMSE and MAE by up to 40.1% and 50.3%, respectively. The source code has been released at https://github.com/wn13/MGSR_code.
Recently, multilingual Vision-Language Pre-training (mVLP) has shown remarkable progress in learning joint representations across different modalities and languages. However, most existing methods learn semantic alignment at a coarse-grained level and fail to capture fine-grained correlations between different languages and modalities. To address this, we propose a Multi-grained Multilingual Vision-Language Pre-training (M2-VLP) model, which aims to learn cross-lingual cross-modal alignment at different semantic granular levels. In cross-lingual interaction, the model learns the global alignment of parallel sentence pairs and the word-level correlations. In cross-modal interaction, the model aligns images with captions and image regions with corresponding words. To integrate the cross-lingual and cross-modal alignment above, we propose a unified multi-grained contrastive learning paradigm. Under zero-shot cross-lingual and fine-tuned multilingual settings, extensive experiments on vision-language downstream tasks across twenty languages demonstrate the effectiveness of M2-VLP over competitive contrastive models. Code and models are available at https://github.com/ahtamjan/M2-VLP.
Modern online advertising systems often involve a substantial number of advertisers in each auction, which results in scalability issues. To address this challenge, two-stage auctions have been designed and implemented in practice. These auctions enable efficient allocation of ad slots among numerous candidate advertisers in a short response time. This approach employs a fast yet coarse model in the first stage to select a small subset of advertisers, followed by a slow, more refined model to determine the final winners. However, existing two-stage auction mechanisms primarily focus on optimizing welfare, overlooking other critical objectives of the platform, such as revenue.
In this paper, we propose ad-wise selection metrics, named Max-Wel and Max-Rev, which optimize the platform's welfare and revenue, respectively. These metrics are based on each ad's contribution to the corresponding objective function. We also provide theoretical guarantees for the proposed metrics. Our method is applicable to both welfare and revenue optimizations and can be easily implemented using neural networks. Through extensive experiments conducted on both synthetic and industrial data, we demonstrate the advantages of our proposed selection metrics compared to existing baselines.
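The two-stage pipeline described above (a fast, coarse first stage that shortlists advertisers, then a slower, refined second stage that picks the winners) can be sketched as follows. This is only an illustration of the generic pipeline, not of the proposed Max-Wel or Max-Rev metrics; the function `two_stage_select`, the field names `coarse` and `refined`, and all scores are hypothetical.

```python
def two_stage_select(ads, shortlist_size, num_slots):
    """Generic two-stage selection (illustrative assumption, not the paper's mechanism).

    Stage 1 keeps the top `shortlist_size` ads by a cheap coarse score.
    Stage 2 ranks only the shortlist by a more accurate refined score.
    """
    shortlist = sorted(ads, key=lambda ad: ad["coarse"], reverse=True)[:shortlist_size]
    winners = sorted(shortlist, key=lambda ad: ad["refined"], reverse=True)[:num_slots]
    return [ad["id"] for ad in winners]

# Made-up scores: ad "C" is the best ad under the refined model,
# but the coarse model ranks it last.
ads = [
    {"id": "A", "coarse": 0.9, "refined": 0.5},
    {"id": "B", "coarse": 0.8, "refined": 0.7},
    {"id": "C", "coarse": 0.3, "refined": 0.95},
]

print(two_stage_select(ads, shortlist_size=2, num_slots=1))  # ['B']
print(two_stage_select(ads, shortlist_size=3, num_slots=1))  # ['C']
```

With a shortlist of two, the ad that the refined model values most never reaches the second stage, which shows why the choice of first-stage selection metric affects the final objective.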
The rapid spread of rumors on social media has posed significant challenges to maintaining public trust and information integrity. Since an information cascade process is essentially a propagation tree, recent rumor detection models leverage graph neural networks to additionally capture information propagation patterns, thus outperforming text-only solutions. Given the variations in topics and social impact of the root node, different source information naturally has distinct outreach capabilities, resulting in different heights of propagation trees. This variation, however, impedes the data-driven design of existing graph-based rumor detectors. Given a shallow propagation tree with limited interactions, it is unlikely for graph-based approaches to capture sufficient cascading patterns, questioning their ability to handle less popular news or early detection needs. In contrast, a deep propagation tree is prone to noisy user responses, and this can in turn obfuscate the predictions. In this paper, we propose a novel Epidemiology-informed Network (EIN) that integrates epidemiological knowledge to enhance performance by overcoming data-driven methods' sensitivity to data quality. Meanwhile, to adapt epidemiology theory to rumor detection, it is expected that each user's stance toward the source information will be annotated. To bypass the costly and time-consuming human labeling process, we take advantage of large language models to generate stance labels, facilitating optimization objectives for learning epidemiology-informed representations. Our experimental results demonstrate that the proposed EIN not only outperforms state-of-the-art methods on real-world datasets but also exhibits enhanced robustness across varying tree depths.
It is increasingly common in digital environments to use A/B tests to compare the performance of recommendation algorithms. However, such experiments often violate the stable unit treatment value assumption (SUTVA), particularly SUTVA's "no hidden treatments" assumption, due to the shared data between algorithms being compared. This results in a novel form of bias, which we term "symbiosis bias," where the performance of each algorithm is influenced by the training data generated by its competitor. In this paper, we investigate three experimental designs (cluster-randomized, data-diverted, and user-corpus co-diverted experiments) aimed at mitigating symbiosis bias. We present a theoretical model of symbiosis bias and simulate the impact of each design in dynamic recommendation environments. Our results show that while each design reduces symbiosis bias to some extent, they also introduce new challenges, such as reduced training data in data-diverted experiments. We further validate the existence of symbiosis bias using data from a large-scale A/B test conducted on a global recommender system, demonstrating that symbiosis bias affects treatment effect estimates in the field. Our findings provide actionable insights for researchers and practitioners seeking to design experiments that accurately capture algorithmic performance without bias in treatment effect estimates introduced by shared data.
Social media platforms employ various content moderation techniques to remove harmful, offensive, and toxic content, with moderation levels varying across platforms and evolving over time. Parler, a fringe platform popular among conservative users, initially had minimal moderation, promoting itself as a space for open discussion. However, in 2021, it was removed from the Apple and Google App Stores and suspended from Amazon Web Services due to inadequate moderation of harmful content. After a month-long suspension, Parler returned with stricter guidelines, offering a unique opportunity to study the impact of platform-wide policy changes on user behavior and content outcomes. In this paper, we analyzed Parler data to assess the causal associations between these moderation changes and content toxicity and factuality. Using a longitudinal dataset of 17M posts from 432K users, who were active both before and after replatforming, we employed quasi-experimental analysis, controlling for confounding factors. We introduced a novel approach by using data from another social media platform, Twitter, to account for a critical confounding factor: offline events. This allowed us to isolate the effects of Parler's replatforming policies from external real-world influences. Our findings demonstrate that Parler's moderation changes are causally associated with a significant reduction in all forms of toxicity (p < 0.001). Additionally, we observed an increase in the factuality of the news sites shared and a reduction in the number of conspiracy/pseudoscience sources.
Signed graph clustering is a critical technique for discovering community structures in graphs that exhibit both positive and negative relationships. We have identified two significant challenges in this domain: i) existing signed spectral methods are highly vulnerable to noise, which is prevalent in real-world scenarios; ii) the guiding principle "an enemy of my enemy is my friend", rooted in Social Balance Theory, often narrows or disrupts cluster boundaries in mainstream signed graph neural networks. Addressing these challenges, we propose the Deep Signed Graph Clustering framework (DSGC), which leverages Weak Balance Theory to enhance preprocessing and encoding for robust representation learning. First, DSGC introduces Violation Sign-Refine to denoise the signed network by correcting noisy edges with high-order neighbor information. Subsequently, Density-based Augmentation enhances semantic structures by adding positive edges within clusters and negative edges across clusters, following Weak Balance principles. The framework then utilizes Weak Balance principles to develop clustering-oriented signed neural networks to broaden cluster boundaries by emphasizing distinctions between negatively linked nodes. Finally, DSGC optimizes clustering assignments by minimizing a regularized clustering loss. Comprehensive experiments on synthetic and real-world datasets demonstrate DSGC consistently outperforms all baselines, establishing a new benchmark in signed graph clustering.
The success of Graph Neural Networks (GNNs) in graph classification has heightened interest in explainable GNNs, particularly through graph rationalization. This method aims to enhance GNN explainability by identifying subgraph structures (i.e., rationales) that support model predictions. However, existing methods often rely on centralized datasets, posing challenges in scenarios where data privacy is crucial, such as in molecular property prediction. Federated Learning (FL) offers a solution by enabling collaborative model training without sharing raw data. In this context, Federated Graph Rationalization emerges as a promising research direction. However, in each client, the rationalization methods often rely on client-specific shortcuts to compose rationales and make task predictions. Data heterogeneity, characterized by non-IID data across clients, exacerbates this problem, leading to poor prediction performance. To address these challenges, we propose the Environment-aware Data Augmentation (EaDA) method for Federated Graph Rationalization. EaDA comprises two main components: the Environment-aware Rationale Extraction (ERE) module and the Local-Global Alignment (LGA) module. The ERE module employs prototype learning to infer and share abstract environment information across clients, which is then aggregated to form a global environment. This information is used to generate counterfactual samples for local clients, enhancing the robustness of task predictions. The LGA module uses contrastive learning methods to align local and global rationale representations, mitigating performance degradation due to data heterogeneity. Comprehensive experiments on benchmark datasets demonstrate the effectiveness of our approach. Code is available at https://github.com/yuelinan/Codes-of-EaDA.
Network traffic anomaly detection is pivotal in cybersecurity, especially as data volumes grow and security requirements intensify. This study addresses critical limitations in existing reconstruction-based methods, which quantify anomalies relying on intra-sample differences and struggle to detect drifted anomalies. In response, we propose a novel approach, the Uncertainty-Inspired Inter-Sample Differences (UnDiff) method, which leverages model uncertainty to enhance anomaly detection capabilities, particularly in scenarios involving anomaly drift. By employing evidential learning, the UnDiff model gathers evidence to minimize uncertainty in normal network traffic, enhancing its ability to differentiate between normal and anomalous traffic. To overcome the limitations of intra-sample difference quantification in reconstruction-based methods, we propose a novel anomaly score based on inter-sample uncertainty deviation that directly quantifies the anomaly degree. Benefiting from a concise model design and parameterized uncertainty quantification, UnDiff achieves high efficiency. Extensive experiments on three benchmarks demonstrate UnDiff's superior performance in detecting both undrifted and drifted anomalies with minimal computational overhead.
Large language models (LLMs) provide powerful foundations to perform fine-grained text re-ranking. However, they are often prohibitive in reality due to constraints on computation bandwidth. In this work, we propose a flexible architecture called Matryoshka Re-Ranker, which is designed to facilitate runtime customization of model layers and sequence lengths at each layer based on users' configurations. Consequently, the LLM-based re-rankers can be made applicable across various real-world situations. The increased flexibility may come at the cost of precision loss. To address this problem, we introduce a suite of techniques to optimize the performance. First, we propose cascaded self-distillation, where each sub-architecture learns to preserve a precise re-ranking performance from its super components, whose predictions can be exploited as smooth and informative teacher signals. Second, we design a factorized compensation mechanism, where two collaborative LoRA modules, vertical and horizontal, are jointly employed to compensate for the precision loss resulting from arbitrary combinations of layer and sequence compression. We perform comprehensive experiments using passage and document retrieval datasets from MSMARCO, along with all public datasets from BEIR. In our experiments, Matryoshka Re-Ranker substantially outperforms existing methods, while effectively preserving its superior performance across various compression forms and application scenarios. We have publicly released our method at https://github.com/FlagOpen/FlagEmbedding.
In the era of data-centric AI, the focus of recommender systems has shifted from model-centric innovations to data-centric approaches. The success of modern AI models is built on large-scale datasets, but this also results in significant training costs. Dataset distillation has emerged as a key solution, condensing large datasets to accelerate model training while preserving model performance. However, condensing discrete and sequentially correlated user-item interactions, particularly with extensive item sets, presents considerable challenges. This paper introduces TD3, a novel Tucker Decomposition based Dataset Distillation method within a meta-learning framework, designed for sequential recommendation. TD3 distills a fully expressive synthetic sequence summary from original data. To efficiently reduce computational complexity and extract refined latent patterns, Tucker decomposition decouples the summary into four factors: synthetic user latent factor, temporal dynamics latent factor, shared item latent factor, and a relation core that models their interconnections. Additionally, a surrogate objective in bi-level optimization is proposed to align feature spaces extracted from models trained on both original data and synthetic sequence summary beyond the naive performance matching approach. In the inner-loop, an augmentation technique allows the learner to closely fit the synthetic summary, ensuring an accurate update of it in the outer-loop. To accelerate the optimization process and address long dependencies, RaT-BPTT is employed for bi-level optimization. Experiments and analyses on multiple public datasets have confirmed the superiority and cross-architecture generalizability of the proposed designs. Codes are released at https://github.com/USTC-StarTeam/TD3.
Prior studies on Video Anomaly Detection (VAD) mainly focus on detecting whether each video frame is abnormal or not, largely ignoring the structured video semantic information (i.e., what, when, and where the abnormal event happens). With this in mind, this paper proposes a new chat-paradigm Multi-scene Video Abnormal Event Extraction and Localization (M-VAE) task, aiming to extract the abnormal event quadruples (i.e., subject, event type, object, scene) and localize such events. Further, this paper argues that this new task faces two key challenges, i.e., global-local spatial modeling and global-local spatial balancing. To this end, this paper proposes a Global-local Spatial-sensitive Large Language Model (LLM) named Sherlock, i.e., acting like Sherlock Holmes to track down the criminal events, for this M-VAE task. Specifically, this model designs a Global-local Spatial-enhanced MoE (GSM) module and a Spatial Imbalance Regulator (SIR) to address the two challenges respectively. Extensive experiments on our M-VAE instruction dataset show the significant advantages of Sherlock over several advanced Video-LLMs. This justifies the importance of global-local spatial information for the M-VAE task and the effectiveness of Sherlock in capturing such information.
Traditional Graph Self-Supervised Learning (GSSL) struggles to capture complex structural properties well. This limitation stems from two main factors: (1) the inadequacy of conventional Graph Neural Networks (GNNs) in representing sophisticated topological features, and (2) the focus of self-supervised learning solely on final graph representations. To address these issues, we introduce GenHopNet, a GNN framework that integrates a k-hop message-passing scheme, enhancing its ability to capture local structural information without explicit substructure extraction. We theoretically demonstrate that GenHopNet surpasses the expressiveness of the classical Weisfeiler-Lehman (WL) test for graph isomorphism. Furthermore, we propose a structural- and positional-aware GSSL framework that incorporates topological information throughout the learning process. This approach enables the learning of representations that are both sensitive to graph topology and invariant to specific structural and feature augmentations. Comprehensive experiments on graph classification datasets, including those designed to test structural sensitivity, show that our method consistently outperforms the existing approaches and maintains computational efficiency. Our work significantly advances GSSL's capability in distinguishing graphs with similar local structures but different global topologies.
The InterPlanetary File System (IPFS) is a pioneering effort for Web 3.0, well-known for its decentralized infrastructure. However, some recent studies have shown that IPFS exhibits a high degree of centralization and has integrated centralized components for improved performance. While this change contradicts the core decentralized ethos of IPFS and introduces risks of hurting the data replication level and thus availability, it also opens some opportunities for better data management and cost savings through deduplication.
To explore these challenges and opportunities, we start by collecting an extensive dataset of IPFS internal traffic spanning the last three years with 20+ billion messages. By analyzing this long-term trace, we obtain a more complete and accurate view of how the status of centralization evolves over an extended period. In particular, our study reveals that (1) IPFS shows a low replication level, with only 2.71% of data files replicated more than 5 times. While increasing replication enhances lookup performance and data availability, it adversely affects downloading throughput due to the overhead involved in managing peer connections, (2) there is a clear growing trend in centralization within IPFS in the last 3 years, with just 5% of peers now hosting over 80% of the content, significantly decreasing from 21.38% 3 years ago, which is largely driven by the increase of cloud nodes, (3) the default deduplication strategy of IPFS using Fixed-Size Chunking (FSC) is largely inefficient, especially with the default 256KB chunk size, showing near-zero duplication being detected. Although Content-Defined Chunking (CDC) with smaller chunks could save ~1.8 petabytes (PB) storage space, it could impact user performance negatively. We thus design and evaluate a new metadata format that optimizes deduplication without compromising performance.
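One well-known reason why Fixed-Size Chunking can detect little duplication is that a small insertion shifts every later chunk boundary. The sketch below illustrates this effect on synthetic data; it is not the measurement pipeline of the study, the helper names `fixed_size_chunks` and `dedup_ratio` are hypothetical, and the 4 KB chunk size is a toy value chosen for speed (the default mentioned above is 256KB).

```python
import hashlib
import random

def fixed_size_chunks(data: bytes, chunk_size: int):
    """Split data into consecutive chunks of chunk_size bytes (last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def dedup_ratio(files, chunk_size):
    """Fraction of chunks whose hash matches a chunk that was already stored."""
    seen = set()
    total = 0
    duplicates = 0
    for data in files:
        for chunk in fixed_size_chunks(data, chunk_size):
            digest = hashlib.sha256(chunk).digest()
            total += 1
            if digest in seen:
                duplicates += 1
            else:
                seen.add(digest)
    return duplicates / total if total else 0.0

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))  # synthetic 64 KB file
shifted_copy = b"\x00" + original  # same content with one byte inserted at the front

CHUNK = 4 * 1024  # toy chunk size (assumption for this example)
print(dedup_ratio([original, original], CHUNK))      # exact copy: half of all chunks are duplicates
print(dedup_ratio([original, shifted_copy], CHUNK))  # shifted copy: boundaries no longer line up
```

Content-Defined Chunking places boundaries based on the data itself, so boundaries can realign after such an insertion, at the cost of extra computation and, with smaller chunks, more metadata.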
With the widespread adoption of Internet Protocol (IP) communication technology and web-based platforms, cloud manufacturing has become a significant hallmark of Industry 4.0. Integrating graph algorithms into these web-enabled environments is crucial as they facilitate the representation and analysis of complex relationships in manufacturing processes, enabling efficient decision-making and adaptability in dynamic environments. As a key scheduling problem in cloud manufacturing, the flexible job-shop scheduling problem (FJSP) finds extensive applications in real-world scenarios. However, traditional FJSP-solving methods struggle to meet the efficiency and adaptability demands of cloud manufacturing due to generalization issues and excessive computational time, while reinforcement learning-based methods fail to learn relationships between FJSP nodes, such as interactions between operations of different jobs, leading to limited interpretability and performance. To address these issues, we propose a dual operation aggregation graph neural network (GNN) for solving FJSP. Specifically, we decouple the disjunctive graph into two distinct graphs, reducing graph density and clarifying relationships between machines and operations, thus enabling more effective aggregation and understanding by neural networks. We develop two distinct graph aggregation methods to minimize the influence of non-critical machine and operation nodes on decision-making while enhancing the model's ability to account for long-term benefits. Additionally, to achieve more accurate multi-objective estimation and mitigate reward sparsity, we design a reward function that simultaneously considers machine efficiency, schedule balance, and makespan minimization. Extensive experimental results on well-known datasets demonstrate that our model outperforms state-of-the-art models and exhibits excellent generalization capabilities, effectively addressing the challenges of cloud manufacturing.
Zero-shot named entity recognition (NER) aims to develop entity recognition systems from unannotated text corpora. This task presents substantial challenges due to minimal human intervention. Recent work has adapted large language models (LLMs) for zero-shot NER by crafting specialized prompt templates and has advanced models' self-learning abilities by incorporating self-annotated demonstrations. Two important challenges persist: (i) Correlations between contexts surrounding entities are overlooked, leading to wrong type predictions or entity omissions. (ii) The indiscriminate use of task demonstrations, retrieved through shallow similarity-based strategies, severely misleads LLMs during inference.
In this paper, we introduce the cooperative multi-agent system (CMAS), a novel framework for zero-shot NER that uses the collective intelligence of multiple agents to address the challenges outlined above. CMAS has four main agents: (i) a self-annotator, (ii) a type-related feature (TRF) extractor, (iii) a demonstration discriminator, and (iv) an overall predictor. To explicitly capture correlations between contexts surrounding entities, CMAS reformulates NER into two subtasks: recognizing named entities and identifying entity type-related features within the target sentence. To enable controllable utilization of demonstrations, a demonstration discriminator is established to incorporate the self-reflection mechanism, automatically evaluating helpfulness scores for the target sentence. Experimental results show that CMAS significantly improves zero-shot NER performance across six benchmarks, including both domain-specific and general-domain scenarios. Furthermore, CMAS demonstrates its effectiveness in few-shot settings and with various LLM backbones.
The "pre-train then fine-tune" approach has advanced GNNs by enabling general knowledge capture without task-specific labels. However, an objective gap between pre-training and downstream tasks limits its effectiveness. Recent graph prompting methods aim to close this gap through task reformulations and learnable prompts. Despite this, they struggle with complex graphs like heterophily graphs. Freezing the GNN encoder can reduce the impact of prompting, while simple prompts fail to handle diverse hop-level distributions. This paper identifies two key challenges in adapting graph prompting methods for complex graphs: (i) adapting the model to new distributions in downstream tasks to mitigate pre-training and fine-tuning discrepancies from heterophily and (ii) customizing prompts for hop-specific node requirements. To overcome these challenges, we propose Distribution-aware Graph Prompt Tuning (DAGPrompT), which integrates a GLoRA module for optimizing the GNN encoder's projection matrix and message-passing schema through low-rank adaptation. DAGPrompT also incorporates hop-specific prompts accounting for varying graph structures and distributions among hops. Evaluations on 10 datasets and 14 baselines demonstrate that DAGPrompT improves accuracy by up to 4.79% in node and graph classification tasks, setting a new state-of-the-art while preserving efficiency. Codes are available at https://github.com/Cqkkkkkk/DAGPrompT.
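The idea of low-rank adaptation of a frozen projection matrix can be sketched in a few lines. This shows only the generic form W' = W + BA with a small rank r; it is not the GLoRA module itself, whose details the abstract does not give, and the names `matmul` and `lora_adapted_weight` and all values are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_adapted_weight(w, b, a, scale=1.0):
    """Return W + scale * (B @ A), where W stays frozen and only B, A would be trained."""
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

# Frozen 3x3 projection matrix (identity here) and a rank-1 update.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [[1.0], [2.0], [3.0]]   # 3 x r, with r = 1
a = [[0.5, 0.0, -1.0]]      # r x 3

adapted = lora_adapted_weight(w, b, a)
trainable = len(b) * len(b[0]) + len(a) * len(a[0])
print(adapted)
print(trainable, "trainable values instead of", len(w) * len(w[0]))
```

The benefit grows with matrix size: for a d x d matrix the update needs 2dr values instead of d squared.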
The increasing prevalence of large-scale graphs poses a significant challenge for graph neural network training, attributed to their substantial computational requirements. In response, graph condensation (GC) emerges as a promising data-centric solution aiming to substitute the large graph with a small yet informative condensed graph to facilitate data-efficient GNN training. However, existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources and training time. In this paper, we revisit existing GC optimization strategies and identify two pervasive issues therein: (1) various GC optimization strategies converge to coarse-grained class-level node feature matching between the original and condensed graphs; (2) existing GC methods rely on a Siamese graph network architecture that requires time-consuming bi-level optimization with iterative gradient computations. To overcome these issues, we propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC), which refines the node distribution matching from the class-to-class paradigm into a novel class-to-node paradigm, transforming the GC optimization into a class partition problem which can be efficiently solved by any clustering method. Moreover, CGC incorporates a pre-defined graph structure to enable a closed-form solution for condensed node features, eliminating the need for back-and-forth gradient descent in existing GC approaches. Extensive experiments demonstrate that CGC achieves an exceedingly efficient condensation process with advanced accuracy. Compared with the state-of-the-art GC methods, CGC condenses the Ogbn-products graph within 30 seconds, achieving a speedup ranging from 10^2× to 10^4× and increasing accuracy by up to 4.2%.
In online advertising, uncertainty calibration aims to adjust a ranking model's probability predictions to better approximate the true likelihood of an event, e.g., a click or a conversion. However, existing calibration approaches may lack the ability to effectively model complex nonlinear relations, consider context features, and achieve balanced performance across different data subsets. To tackle these challenges, we introduce a novel model called Monotonic Calibration Networks, featuring three key designs: a monotonic calibration function (MCF), an order-preserving regularizer, and a field-balance regularizer. The nonlinear MCF is capable of naturally modeling and universally approximating the intricate relations between uncalibrated predictions and the posterior probabilities, thus being much more expressive than existing methods. MCF can also integrate context features using a flexible model architecture, thereby achieving context awareness. The order-preserving and field-balance regularizers promote the monotonic relationship between adjacent bins and the balanced calibration performance on data subsets, respectively. Experimental results on both public and industrial datasets demonstrate the superior performance of our method in generating well-calibrated probability predictions.
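One common way to obtain a monotonic calibration function is to combine non-decreasing basis functions with non-negative weights. The sketch below follows that recipe in logit space; it is an illustrative construction, not the MCF architecture from the paper, and it omits context features and both regularizers. The names `calibrate`, `raw_slope`, `raw_weights`, and `knots` are assumptions.

```python
import math


def softplus(x):
    """Smooth map from any real number to a strictly positive one."""
    return math.log1p(math.exp(x))


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibrate(p, raw_slope, raw_weights, knots, bias=0.0):
    """Map an uncalibrated probability p in (0, 1) to a calibrated one.

    Every term is non-decreasing in logit(p) and every coefficient is made
    positive through softplus, so the whole function is strictly increasing.
    """
    z = logit(p)
    total = bias + softplus(raw_slope) * z
    total += sum(softplus(w) * max(0.0, z - t) for w, t in zip(raw_weights, knots))
    return sigmoid(total)


# Hypothetical parameters; in practice they would be learned from data.
params = dict(raw_slope=-0.5, raw_weights=[0.3, -1.0], knots=[-1.0, 1.0], bias=-0.2)
grid = [i / 100 for i in range(1, 100)]
outputs = [calibrate(p, **params) for p in grid]
```

Because monotonicity holds for any parameter values, the ranking induced by the original predictions is preserved after calibration.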
Data summarization tasks are often modeled as k-clustering problems, where the goal is to choose k data points, called cluster centers, that best represent the dataset by minimizing a clustering objective. A popular objective is to minimize the maximum distance between any data point and its nearest center, which is formalized as the k-center problem. While in some applications all data points can be chosen as centers, in the general setting, centers must be chosen from a predefined subset of points, referred to as facilities or suppliers; this is known as the k-supplier problem. In this work, we focus on fair data summarization modeled as the fair k-supplier problem, where data consists of several groups, and a minimum number of centers must be selected from each group while minimizing the k-supplier objective. The groups can be disjoint or overlapping, leading to two distinct problem variants, each with different computational complexity.
We present 3-approximation algorithms for both variants, improving the previously known factor of 5. For disjoint groups, our algorithm runs in polynomial time, while for overlapping groups, we present a fixed-parameter tractable algorithm, where the exponential runtime depends only on the number of groups and centers. We show that these approximation factors match the theoretical lower bounds, assuming standard complexity theory conjectures. Finally, using an open-source implementation, we demonstrate the scalability of our algorithms on large synthetic datasets and assess the price of fairness on real-world data, comparing solution quality with and without fairness constraints.
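The fair k-supplier problem defined above can be pinned down with a tiny brute-force sketch for disjoint groups. It enumerates every set of k facilities, so it only works for toy instances and is not the 3-approximation algorithm from the paper. Points are positions on a line, and all names and data are assumptions.

```python
from itertools import combinations


def objective(clients, centers):
    """k-supplier objective: largest distance from a client to its nearest center."""
    return max(min(abs(c - f) for f in centers) for c in clients)


def is_fair(centers, group_of, min_per_group):
    """Check that each group contributes at least its required number of centers."""
    counts = {}
    for f in centers:
        counts[group_of[f]] = counts.get(group_of[f], 0) + 1
    return all(counts.get(g, 0) >= need for g, need in min_per_group.items())


def best_fair(clients, facilities, group_of, min_per_group, k):
    """Exhaustively search all k-subsets of facilities that satisfy fairness."""
    feasible = (s for s in combinations(facilities, k)
                if is_fair(s, group_of, min_per_group))
    return min(feasible, key=lambda s: objective(clients, s), default=None)


clients = [0, 2, 8, 10]
facilities = [1, 9, 20]
group_of = {1: "a", 9: "a", 20: "b"}  # disjoint groups

unconstrained = best_fair(clients, facilities, group_of, {}, k=2)
fair = best_fair(clients, facilities, group_of, {"b": 1}, k=2)

# Price of fairness on this toy instance: ratio of the two objective values.
price = objective(clients, fair) / objective(clients, unconstrained)
```

Here requiring one center from group "b" forces the distant facility at 20 into the solution, which raises the objective from 1 to 9.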
Algorithmic recourse (AR) has made significant progress by identifying small perturbations in input features that can alter predictions, which provides a data-centric approach to understanding decisions from diverse black-box models on the Web. To address the feasibility issue, i.e., whether the recoursed examples provide actionable and reliable recommendations to end-users, causal algorithmic recourse has incorporated structural causal models (SCMs) to preserve the realistic constraints among input features. For instance, preserving structural causal knowledge between "age" and "educational level" can avoid generating samples with decreasing age and increasing educational level. However, previous causal AR methods suffer from the requirement of prior structural causal knowledge, e.g., a prior causal graph or the whole SCM, which restricts the realistic application of causal AR methods.
To bridge this gap, we aim to develop a novel framework for causal algorithmic recourse that relies on neither a prior causal graph nor a prior SCM. Since identifying counterfactuals without a causal graph is impossible, we instead propose to approximate and constrain the variation of the perturbed components, i.e., the exogenous noise variables, by formulating the generation of AR as a structure-preserving intervention. With the aid of developments in non-linear Independent Component Analysis (ICA), our method can further achieve theoretically guaranteed constraints on such variation of exogenous variables. Experimental results on synthetic, semi-synthetic, and real-world data demonstrate the effectiveness of our proposed methods without any prior causal graph or SCM knowledge.
In recent years, a large number of on-chain attacks have emerged in the blockchain-empowered Web3 ecosystem. In 2023 alone, on-chain attacks have caused losses of over 585 million. Attackers use blockchain transactions to carry out on-chain attacks, for example, exploiting vulnerabilities or business logic flaws in Web3 applications. A wealth of effort has been devoted to detecting on-chain attack transactions through expert patterns and machine learning techniques. However, in this ever-evolving ecosystem, current methods perform poorly at detecting new on-chain attacks, either because their attack recognition patterns become obsolete or because they rely on on-chain attack samples. In this paper, we propose a universal approach for detecting on-chain attacks even when there are few or even no new on-chain attack samples. Specifically, we conduct an in-depth analysis of transaction characteristics and propose a new insight for training a generic attack transaction detection model, i.e., transaction reconstruction. In particular, to overcome over-fitting in the transaction reconstruction task, we use the web-scale function comments related to transactions as supervision information, rather than expert-confirmed labels. Experimental results demonstrate that the proposed approach surpasses the supervised state-of-the-art by 13% in AUC, with just 30 known on-chain attack samples. Moreover, without any known attack samples, our method can still detect new on-chain attacks in the wild (with a precision of 61.83%). Among attacks detected in the wild, we confirm 1,692 address poisoning attacks, a new type of on-chain attack targeting token holders. Our code is available at: https://github.com/wuzhy1ng/attack_trans_detection_www25.
Graph Neural Networks (GNNs) have become indispensable tools in many domains, such as social network analysis, financial fraud detection, and drug discovery. Prior research primarily concentrated on improving prediction accuracy while overlooking how reliable the model predictions are. Conformal prediction on graphs emerges as a promising solution, offering statistically sound uncertainty estimates with a pre-defined coverage level. Despite the promising progress, existing works only focus on achieving model coverage guarantees without considering fairness in the coverage within different demographic groups. To bridge the gap between conformal prediction and fair coverage across different groups, we pose the fundamental question: Can fair GNNs enable the uncertainty estimates to be fairly applied across demographic groups? To answer this question, we provide a comprehensive analysis of the uncertainty estimation in fair GNNs employing various strategies. We prove theoretically that fair GNNs can enforce consistent uncertainty bounds across different demographic groups, thereby minimizing bias in uncertainty estimates. Furthermore, we conduct extensive experiments on five commonly used datasets across seven state-of-the-art fair GNN models to validate our theoretical findings. Additionally, based on the theoretical and empirical insights, we identify and analyze the key strategies from various fair GNN models that contribute to ensuring equalized uncertainty estimates. Our work establishes a solid foundation for future exploration of the practical implications and potential adjustments needed to enhance fairness in GNN applications across various domains. For reproducibility, we publish our data and code at https://github.com/wulongfeng/EqualizedCoverage_CP.
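The notion of coverage per demographic group can be made concrete with a small sketch of standard split conformal prediction for classification. It uses the nonconformity score 1 - p and measures how often the true label falls inside the prediction set for each group. It is not the analysis pipeline from the paper, and the function names and toy numbers are assumptions.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: include every label
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, threshold):
    """Labels whose nonconformity score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


def coverage_by_group(prob_rows, labels, groups, threshold):
    """Fraction of test nodes in each group whose true label is covered."""
    hits, totals = {}, {}
    for probs, y, g in zip(prob_rows, labels, groups):
        totals[g] = totals.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + (y in prediction_set(probs, threshold))
    return {g: hits[g] / totals[g] for g in totals}


# Hypothetical calibration scores (1 - probability of the true label).
cal_scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
threshold = conformal_threshold(cal_scores, alpha=0.25)

# Hypothetical test predictions for two demographic groups.
test_probs = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5], [0.4, 0.6]]
test_labels = [0, 0, 1, 1]
test_groups = ["a", "a", "b", "b"]
coverage = coverage_by_group(test_probs, test_labels, test_groups, threshold)
```

In this toy example the two groups receive different coverage even though a single threshold is used, which is the kind of gap that equalized coverage aims to close.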
Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences, thereby enhancing the system's ability to provide personalized recommendations and improving the overall user experience. CRS has demonstrated significant promise, prompting researchers to concentrate their efforts on developing user simulators that are both more realistic and trustworthy. The advent of Large Language Models (LLMs) has demonstrated capabilities that approach human-level intelligence across a diverse range of tasks. Research efforts have been made to utilize LLMs for building user simulators to evaluate the performance of CRS. Although these efforts showcase innovation, they are accompanied by certain limitations. In this work, we introduce a Controllable, Scalable, and Human-Involved (CSHI) simulator framework that manages the behavior of user simulators across various stages via a plugin manager. CSHI tailors behavioral simulations and interaction patterns to deliver authentic user-system engagement experiences. Through experiments and case studies in two conversational recommendation scenarios, we show that our framework can adapt to a variety of conversational recommendation settings and effectively simulate users' personalized preferences. Consequently, our simulator is able to generate feedback that closely mirrors that of real users. This facilitates a reliable assessment of existing CRS studies and promotes the creation of high-quality conversational recommendation datasets.
Large-scale models are typically adapted to meet the diverse requirements of model owners and users. However, maintaining multiple specialized versions of the model is inefficient. In response, we propose AIM, a novel model modulation paradigm that enables a single model to exhibit diverse behaviors to meet specific end requirements. AIM enables two key modulation modes: utility and focus modulations. The former provides model owners with dynamic control over output quality to deliver varying utility levels, and the latter offers users precise control to shift the model's focus across input features. AIM introduces a logits redistribution strategy that operates in a training data-agnostic and retraining-free manner. We establish a formal foundation to ensure AIM's regulation capability, based on the statistical properties of logits ordering via joint probability distributions. Our evaluation confirms AIM's practicality and versatility for AI model modulation, with tasks spanning image classification, semantic segmentation and text generation, and prevalent architectures including ResNet, SegFormer and Llama.
Recent research on query generation has focused on using Large Language Models (LLMs), which, despite achieving state-of-the-art performance, also introduce hallucination issues in generated queries. In this work, we categorize these issues into relevance hallucination and factuality hallucination, proposing a new typology for hallucinations arising from LLM-based query generation. We present an effective approach to decouple content from form in LLM-generated queries, preserving the factual knowledge extracted and integrated from inputs while leveraging the LLM's linguistic capabilities to construct syntactic structures, including function words. Specifically, we introduce a model-agnostic and training-free method that transforms the Large Language Model into a Pointer-Generator (LargePiG), where the pointer attention distribution utilizes the LLM's inherent attention weights, and the copy probability is derived from the difference between the vocabulary distribution in the model's high layers and the last layer. To validate the effectiveness of LargePiG, we constructed two datasets for assessing hallucination issues in query generation, covering both document and video scenarios. Empirical studies on various LLMs demonstrated LargePiG's superiority across both datasets. Additional experiments further verified that LargePiG reduces hallucination in large vision-language models and enhances the accuracy of document-based question-answering and factuality evaluation tasks. The source code and dataset are available at https://github.com/Jeryi-Sun/LargePiG.
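The pointer-generator mechanism referred to above mixes two distributions at each decoding step: the model's vocabulary distribution and a copy distribution over source tokens given by attention. A minimal sketch of that mixing step follows. How LargePiG obtains the attention weights and the copy probability from the LLM is not reproduced here; both are passed in as plain numbers, and all names and values are assumptions.

```python
def pointer_generator_mix(vocab_probs, source_tokens, attention, copy_prob):
    """Blend generation and copying into one next-token distribution.

    final(t) = (1 - copy_prob) * vocab_probs[t]
               + copy_prob * (attention mass on source positions holding t)
    """
    mixed = {tok: (1.0 - copy_prob) * p for tok, p in vocab_probs.items()}
    for tok, weight in zip(source_tokens, attention):
        mixed[tok] = mixed.get(tok, 0.0) + copy_prob * weight
    return mixed


# Hypothetical single decoding step.
vocab_probs = {"cheap": 0.5, "flights": 0.25, "hotels": 0.25}
source_tokens = ["flights", "paris"]  # tokens of the input document
attention = [0.5, 0.5]                # attention over source positions
mixed = pointer_generator_mix(vocab_probs, source_tokens, attention, copy_prob=0.5)
best = max(mixed, key=mixed.get)
```

Copying shifts probability toward tokens that actually occur in the input, here "flights" and "paris", which is why a higher copy probability tends to keep generated queries grounded in the source.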
Graph Neural Networks (GNNs) have become a prominent approach for learning from graph-structured data. However, their effectiveness can be significantly compromised when the graph structure is suboptimal. To address this issue, Graph Structure Learning (GSL) has emerged as a promising technique that refines node connections adaptively. Nevertheless, we identify two key limitations in existing GSL methods: 1) Most methods primarily focus on node similarity to construct relationships, while overlooking the quality of node information. Blindly connecting low-quality nodes and aggregating their ambiguous information can degrade the performance of other nodes. 2) The constructed graph structures are often constrained to be symmetric, which may limit the model's flexibility and effectiveness.
To overcome these limitations, we propose an Uncertainty-aware Graph Structure Learning (UnGSL) strategy. UnGSL estimates the uncertainty of node information and utilizes it to adjust the strength of directional connections, where the influence of nodes with high uncertainty is adaptively reduced. Importantly, UnGSL serves as a plug-in module that can be seamlessly integrated into existing GSL methods with minimal additional computational cost. In our experiments, we integrate UnGSL into six representative GSL methods, demonstrating consistent performance improvements. The code is available at https://github.com/UnHans/UnGSL.
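As a rough illustration of uncertainty-aware directional reweighting, the sketch below scores each node by the entropy of its predicted class distribution and scales every outgoing message by the sender's confidence. This yields an asymmetric adjacency matrix, but the entropy-based confidence and the scaling rule are assumptions chosen for simplicity, not the formulation used by UnGSL.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def reweight(adj, node_probs):
    """Scale adj[i][j], the weight of the message from node j to node i,
    by the confidence of the sending node j."""
    max_h = math.log(len(node_probs[0]))  # entropy of the uniform distribution
    conf = [1.0 - entropy(p) / max_h for p in node_probs]
    return [[w * conf[j] for j, w in enumerate(row)] for row in adj]


# Two connected nodes: node 0 is certain, node 1 is maximally uncertain.
adj = [[0.0, 1.0],
       [1.0, 0.0]]
node_probs = [[1.0, 0.0],
              [0.5, 0.5]]
new_adj = reweight(adj, node_probs)
```

After reweighting, node 1 still receives information from the confident node 0, while the message from the uncertain node 1 to node 0 is suppressed, so the structure is no longer symmetric.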
Informal caregivers (e.g., family members or friends) of people living with Alzheimer's Disease and Related Dementias (ADRD) face substantial challenges and often seek support through online communities. Understanding the factors driving engagement within these platforms is crucial, as it can enhance communities' long-term value to meet their needs effectively. This study investigated the user interaction dynamics within two large, popular ADRD communities, TalkingPoint and ALZConnected, focusing on topic initiator engagement, initial post content, and the linguistic patterns of comments at the thread level. Using analytical methods such as propensity score matching, topic modeling, and predictive modeling, we found that active topic initiator engagement drives a higher comment volume, and reciprocal replies from topic initiators encourage further commentor engagement at the community level. Practical caregiving topics prompt more re-engagement of topic initiators, while emotional support topics attract more comments from commentors. Additionally, the linguistic complexity and emotional tone of a comment are associated with its likelihood of receiving replies from topic initiators. These findings highlight the importance of fostering active and reciprocal engagement and providing effective strategies to enhance sustainability in ADRD caregiving and broader health-related online communities.
In e-commerce platforms, profit-aware recommender systems aim to improve the platform's profits while maintaining high overall accuracy by recommending items with high profits as top-ranked items. We explore two issues faced by existing model-based profit-aware approaches (i.e., MBAs) when training recommendation models for profit enhancement. First, existing MBAs tend to inaccurately infer the item ranking without considering the user's current preference for each item through their profit-based weighting scheme. Second, through the point-wise learning-to-rank (LTR), the model is optimized solely for the preference score of each item independently rather than being directly optimized for the overall ranking of items. To tackle these issues, we propose a novel MBA that involves three key steps: (S1) defining the Current Preference incorporated with Profit (i.e., CPP) for items; (S2) classifying items through CPP; and (S3) training the model by list-wise LTR based on CPP. Extensive experimental results using real-world platform datasets demonstrate that our approach improves accuracy by approximately 4% and profits by about 24% compared to the best-competing method.
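The difference between point-wise and list-wise learning-to-rank can be seen in a short sketch of a standard list-wise loss, the ListNet top-one cross-entropy, which compares whole score lists rather than individual items. The target values stand in for CPP, whose exact definition the abstract does not give, so they are hypothetical, as are the function names.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, targets):
    """ListNet top-one loss: cross-entropy between the softmax of the
    target values and the softmax of the model scores for one user's list."""
    p_target = softmax(targets)
    p_model = softmax(scores)
    return -sum(t * math.log(m) for t, m in zip(p_target, p_model))


# Hypothetical CPP-like targets for three items shown to one user.
targets = [3.0, 2.0, 1.0]
loss_aligned = listwise_loss([3.0, 2.0, 1.0], targets)   # same ordering
loss_reversed = listwise_loss([1.0, 2.0, 3.0], targets)  # opposite ordering
```

The loss depends on how the scores relate to one another across the list, so a model that inverts the ranking is penalized even if each individual score looks plausible on its own.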
AI-generated counterspeech offers a promising and scalable strategy to curb online toxicity through direct replies that promote civil discourse. However, current counterspeech is one-size-fits-all, lacking adaptation to the moderation context and the users involved. We propose and evaluate multiple strategies for generating tailored counterspeech that is adapted to the moderation context and personalized for the moderated user. We instruct a LLaMA2-13B model to generate counterspeech, experimenting with various configurations based on different contextual information and fine-tuning strategies. We identify the configurations that generate persuasive counterspeech through a combination of quantitative indicators and human evaluations collected via a pre-registered mixed-design crowdsourcing experiment. Results show that contextualized counterspeech can significantly outperform state-of-the-art generic counterspeech in adequacy and persuasiveness, without compromising other characteristics. Our findings also reveal a poor correlation between quantitative indicators and human evaluations, suggesting that these methods assess different aspects and highlighting the need for nuanced evaluation methodologies. The effectiveness of contextualized AI-generated counterspeech and the divergence between human and algorithmic evaluations underscore the importance of increased human-AI collaboration in content moderation.
In China, the number of riders in the on-demand delivery industry has surpassed ten million. Ensuring that these riders earn a decent income can enhance their financial security, reduce poverty, and promote social equity and stability. Due to ease of use, lower-cost maintenance and environmental friendliness, electric bicycles (e-bikes) are the primary mode of transportation for delivery riders. However, these riders frequently encounter depleted batteries due to limited capacity and prolonged charging times, necessitating inconvenient swaps or recharges during deliveries. To address this issue, we propose the e-bike Battery Swap-as-a-Service (eBaaS), an innovative battery-swapping system that leverages an intelligent AIoT network for seamless battery swapping at distributed locations across urban areas. eBaaS integrates edge-cloud collaboration, battery resource allocation, battery anomaly detection, and battery range prediction to minimize downtime and reduce unnecessary mileage. While eBaaS's potential benefits are evident, there has been a lack of robust methods to quantify its impact. Thus, we further developed the eBaaS Impact Evaluation Method (EIEM), the first comprehensive model to address this gap. EIEM analyzes data from approximately 260,000 delivery riders and 5 million riding trajectories. Findings indicate that eBaaS reduces average invalid mileage by 6 km and increases the order volume by an average of over 20% daily per e-bike rider. Meanwhile, the annual electricity savings result in a reduction of 2.74 million kilograms of carbon emissions for 260,000 riders. The eBaaS system is therefore significantly beneficial for environmental conservation and sustainable urban development.
Hate speech on social media threatens the mental and physical well-being of individuals and contributes to real-world violence. Resharing is an important driver behind the spread of hate speech on social media. Yet, little is known about who reshares hate speech and what their characteristics are. In this paper, we analyze the role of user characteristics in hate speech resharing across different types of hate speech (e.g., political hate). For this, we first cluster hate speech posts using large language models into different types of hate speech. Then we model the effects of user attributes on users' probability to reshare hate speech using an explainable machine learning model. To do so, we apply debiasing to control for selection bias in our observational social media data and further control for the latent vulnerability of users to hate speech. We find that, all else equal, users with fewer followers, fewer friends, fewer posts, and older accounts share more hate speech. This shows that users with little social influence tend to share more hate speech. Further, we find substantial heterogeneity across different types of hate speech. For example, racist and misogynistic hate is spread mostly by users with little social influence. In contrast, political anti-Trump and anti-right-wing hate is reshared by users with larger social influence. Overall, understanding the factors that drive users to share hate speech is crucial for detecting individuals at risk of engaging in harmful behavior and for designing effective mitigation strategies. Disclaimer: This work contains terms that are offensive and hateful.
A question-answering (QA) simulator is a model that simulates human students' QA behaviors. By leveraging QA history to estimate the probability of correctly answering a newly recommended question, the simulator enables educational recommender systems to be trained in a simulated environment, protecting human students from the potential negative impact of low-quality recommendations. Despite its significant importance, the construction of QA simulators has not been thoroughly explored in the research domain of AI. Previous methods mainly rely on existing knowledge tracing (KT) models to construct such a simulator. However, due to the discrepancy between the KT task and the simulation task, those KT-based simulators suffer from severe bias accumulation, which limits the effectiveness of the simulation. In this paper, we propose a method called Diffusion-based Simulator (DSim), which takes advantage of diffusion to alleviate the bias accumulation. To our knowledge, DSim is the first to focus on building a QA simulator.
Misinformation is a significant societal issue with potentially severe consequences. It appears in text, image, audio, and video modalities, encompassing various categories such as unimodal deception (fact-conflicting, AI-generated & offensive content) and cross-modal inconsistencies. However, current detection approaches often focus on text and image, overlooking the growing prevalence of misinformation in audio and video content. Moreover, these methods typically tend to address only one or two types of misinformation, failing to address all categories simultaneously. These detectors are also usually designed to make judgments without providing explanations, reducing transparency and limiting their broader applicability. To address these issues, we propose MDAM3, a Misinformation Detection and Analysis Framework for Multitype Multimodal Media. MDAM3 analyzes each input in internal detection and examines relationships across modalities to identify inconsistencies. It utilizes web resources and integrates Large Vision-Language Models (LVLMs) to deliver accurate detection results along with detailed analysis. To evaluate MDAM3, we curate MDAM3-DB, a specialized multitype multimodal misinformation dataset. A user study is conducted to explore MDAM3's usability, interpretability, and effectiveness. We hope this research contributes to advancing misinformation detection methodologies and provides valuable insights for developing robust multimodal analysis tools.