Large language models (LLMs) have taken the world by storm, revolutionizing the use of AI in products. While scaling laws demonstrate that larger models yield better results, making them work in production is hard, often due to the latency demands of inference. In this proposed tutorial, we will share optimizations, both algorithmic and systems-related, that help leverage LLMs (both small and large) for recommendation and generative AI use cases at planet scale for the world's largest professional network, LinkedIn. In the first part of the tutorial, we will discuss state-of-the-art (SOTA) model quantization and pruning techniques. This will be in conjunction with a discussion on GPU kernel-level optimizations including minimizing memory copying, effectively utilizing shared memory, optimizing thread scheduling, and maximizing parallel efficiency. We will discuss our own experience inventing and leveraging such techniques, while also discussing the latest advancements from other enterprises and the open source world. Our discussions will cover models ranging in size from 1 billion to 100 billion+ parameters. In the second part of the tutorial, we will discuss the latest advancements in the world of LLM knowledge distillation, which can result in training very powerful and performant small language models (SLMs). We will also discuss effective instruction tuning and preference alignment techniques that help with improving the accuracy and quality of results for generative use cases. Finally, we will discuss actual production use cases that benefit from the aforementioned techniques at planet scale for LinkedIn.
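As a concrete illustration of the quantization techniques the tutorial covers, the following is a minimal sketch (our own, not LinkedIn's production code) of symmetric per-channel int8 weight quantization in PyTorch; the weight shape is hypothetical.

```python
# Illustrative sketch of symmetric per-channel int8 weight quantization.
import torch

def quantize_int8(weight: torch.Tensor):
    """Quantize a 2-D weight matrix per output channel to int8."""
    # Choose scales so the largest magnitude in each row maps to 127.
    scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)          # hypothetical linear-layer weight
q, s = quantize_int8(w)
print((w - dequantize_int8(q, s)).abs().max())   # quantization error is small
```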
This tutorial will explore the fascinating domain of empirical network modeling through artificial intelligence (AI) techniques, with applications across social media, web systems, and urban environments. Participants will gain valuable insights into incorporating advanced AI methods, such as graph machine learning, deep reinforcement learning, and generative models, within complex network science. The goal is to provide a comprehensive understanding of how these models can effectively represent, predict, and control empirical networked systems with heterogeneous structures and dynamic processes. The tutorial will begin by introducing essential background knowledge and outlining motivations and challenges, and will then explore recent methodological advances and highlight key applications.
Graph representation learning has become central to many graph-based tasks, driving advancements in various domains such as web search, recommendation systems, and social network analysis. Traditionally, these methods rely on end-to-end supervised learning paradigms that require abundant labeled data, which can be costly and difficult to obtain. To address this limitation, few-shot learning on graphs has emerged as a promising approach, allowing models to generalize with minimal supervision and overcome data scarcity in real-world applications. This tutorial offers an in-depth exploration of recent advancements in few-shot learning for graphs, providing a comparative analysis of state-of-the-art methods and identifying future research directions. We categorize these approaches into two main taxonomies: (1) a problem taxonomy that examines various types of data scarcity problems and their applications, and (2) a technique taxonomy that outlines key strategies for tackling these challenges, including meta-learning and pre-training methods from both the pre-LLM and LLM eras. The tutorial will conclude by summarizing key insights from the literature and discussing future avenues for research, aiming to equip participants with a deep understanding of few-shot learning on graphs and inspire innovation in this rapidly growing field.
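As a small illustration of the meta-learning setup discussed above, the sketch below samples an N-way K-shot episode for few-shot node classification; the `labels` dictionary and all names are hypothetical.

```python
# Minimal N-way K-shot episode sampler for few-shot node classification.
import random
from collections import defaultdict

def sample_episode(labels, n_way=3, k_shot=2, q_query=5):
    """labels: dict mapping node id -> class id (hypothetical)."""
    by_class = defaultdict(list)
    for node, cls in labels.items():
        by_class[cls].append(node)
    classes = random.sample(list(by_class), n_way)        # pick N classes
    support, query = [], []
    for cls in classes:
        nodes = random.sample(by_class[cls], k_shot + q_query)
        support += [(n, cls) for n in nodes[:k_shot]]      # K labeled nodes
        query += [(n, cls) for n in nodes[k_shot:]]        # evaluation nodes
    return support, query

toy_labels = {i: i % 5 for i in range(100)}                # toy node labels
support, query = sample_episode(toy_labels)
```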
Recommendation models typically follow a discriminative paradigm, predicting whether items should be retrieved. While effective, the expressive capabilities of these recommender systems are limited. Users can only passively browse the recommended items rather than actively express their needs and engage in an interactive experience. With recent advances in generative models such as large language models, a paradigm shift is happening in the study of recommender systems. Researchers propose building generative recommendation models either by aligning pre-trained generative models with user behaviors or designing recommendation models within a generative framework. These models enable the systems to receive and deliver more human-like content, such as natural language, images, and beyond. In this tutorial, we first provide an overview of the latest progress in generative recommendation models, covering approaches based on large language models, semantic IDs, diffusion models, and more. We then provide an in-depth discussion of the challenges, open questions, and potential future directions in developing generative recommendation models.
In the current digital era, Deep Recommender Systems (DRS) are essential for navigating and tailoring online content to individual preferences. However, conventional approaches that rely primarily on a single recommendation task, scenario, data modality, or user behavior are increasingly inadequate for capturing users' complex and evolving preferences. This limitation highlights the need for joint modeling approaches that integrate multiple tasks, scenarios, modalities, and behaviors within the recommendation process, enhancing recommendation precision, efficiency, and personalization. In this tutorial, we aim to give a comprehensive survey of recent progress in joint modeling methods for recommendation, covering multi-task, multi-scenario, multi-modal, and multi-behavior modeling. This work will provide academic researchers and industry professionals with a thorough understanding and clear insight into these areas, sparking new ideas, fostering discussions, and driving technological advancements in the field of deep recommender systems. Our tutorial homepage is available online at https://applied-machine-learning-lab.github.io/Joint-Modeling-in-Deep-Recommender-Systems-WWW2025/.
Personalization stands as the cornerstone of recommender systems (RecSys), striving to sift out redundant information and offer tailor-made services for users. However, conventional cloud-based RecSys necessitate centralized data collection, posing significant risks of user privacy breaches. In response to this challenge, federated recommender systems (FedRecSys) have emerged, garnering considerable attention. FedRecSys enable users to retain personal data locally and solely share model parameters with low privacy sensitivity for global model training, significantly bolstering the system's privacy protection capabilities. Within the distributed learning framework, the pronounced non-iid nature of user behavior data introduces fresh hurdles to federated optimization. Meanwhile, the ability of federated learning to concurrently learn multiple models presents an opportunity for personalized user modeling. Consequently, the development of personalized FedRecSys (PFedRecSys) is crucial and holds substantial significance. This tutorial seeks to provide an introduction to PFedRecSys, encompassing (1) an overview of existing studies on PFedRecSys, (2) a comprehensive taxonomy of PFedRecSys spanning four pivotal research directions: client-side adaptation, server-side aggregation, communication efficiency, and privacy protection, and (3) exploration of open challenges and promising future directions in PFedRecSys. This tutorial aims to establish a robust foundation and spark new perspectives for subsequent exploration and practical implementations in the evolving realm of RecSys.
Although Large Language Models (LLMs) have revolutionized natural language processing, they face significant challenges in web applications, including hallucinations, outdated knowledge, and limited specialization in niche domains. To address these issues, this tutorial explores how integrating retrieval mechanisms and structured knowledge can enhance LLM performance for web use. By leveraging Retrieval-Augmented Generation (RAG), we can ground LLM outputs with relevant external data, mitigating limitations in applications like search engines, chatbots, and recommendation systems. We delve into text structuring techniques, such as taxonomy construction, multi-level text classification, and taxonomy-guided information retrieval, that improve the effectiveness of information retrieval processes. Furthermore, we examine how structure-guided augmented generation through information extraction and knowledge graph construction can reduce hallucinations and enhance factual accuracy. By bridging the gap between unstructured language models and structured knowledge, we aim to unlock new potentials for dynamic web content generation and personalized user experiences. Finally, we highlight future directions for seamlessly integrating retrieval and generation, enhancing personalization, and incorporating multimodal data to expand the capabilities of LLMs in web applications.
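To make the RAG idea concrete, here is a minimal sketch of a retrieval-augmented answering loop; `retrieve` and `llm` are placeholder callables standing in for a vector store and an LLM client, not real library APIs.

```python
# Minimal retrieval-augmented generation (RAG) loop; `retrieve` and `llm`
# are user-supplied placeholders, not real library APIs.
def rag_answer(question, retrieve, llm, k=5):
    passages = retrieve(question, k)                       # ground the model
    context = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the passages below. "
        "Cite passage numbers and say 'unknown' if the answer is absent.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```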
Time series analysis has become instrumental in tackling key challenges in web-based applications, such as server load balancing, anomaly detection in e-commerce traffic, and tourism demand forecasting. This proposal introduces a comprehensive half-day lecture-style tutorial for The Web Conference 2025, tailored to professionals, researchers, and practitioners aiming to harness machine learning for analyzing web-sourced time series data. The tutorial will cover foundational principles, data processing techniques, and advanced modeling strategies, equipping attendees with both theoretical understanding and practical skills. Participants will also explore best practices for integrating time series analysis into web-centric workflows, with diverse applications in e-commerce, digital health, and transportation. Led by leading experts in the field, this tutorial provides an invaluable opportunity to deepen knowledge, gain hands-on experience, and foster meaningful connections, bridging the gap between theory and real-world implementation.
Large language models (LLMs), due to their emergent abilities and generalizability, are making new waves in developing Artificial General Intelligence (AGI). However, LLMs are black-box models, which often fall short of capturing and accessing factual knowledge. In contrast, Knowledge Graphs (KGs) are structured knowledge models that explicitly store rich factual knowledge. KGs can enhance LLMs by providing external knowledge for inference and interpretability. Meanwhile, KGs are difficult to construct and evolve by nature, which challenges existing methods in KGs to generate new facts and represent unseen knowledge. Therefore, integrating LLMs and KGs and simultaneously leveraging their complementary advantages is a natural path toward AGI's ultimate goal: to reason, adapt, and synthesize knowledge with human-level nuance and factual accuracy.
This tutorial aims to bridge this gap by presenting a comprehensive overview of the unification of LLMs and KGs for next-level AGI. Specifically, we will cover three key frameworks: (1) KG-enhanced LLMs, which focus on augmenting LLMs with KGs for pre-training, fine-tuning, and inference, thereby enriching the LLMs' factual and contextual accuracy; (2) LLM-augmented KGs, which leverage LLMs to assist in tasks such as KG completion, construction, and question answering, ultimately facilitating KG scalability and adaptability; and (3) Synergized LLM-KG Systems and Applications, where LLMs and KGs function symbiotically to enable real-time, bidirectional reasoning, transforming static knowledge structures into dynamic, AGI-driven frameworks. Through this tutorial, participants will gain a structured understanding of the architectures, underlying methodologies, and key advancements in LLM-KG unification, alongside insights into current real-world applications and challenges. We will also explore future research directions, encouraging the development of innovative AGI systems that are not only knowledgeable but also faithful in reasoning. This tutorial will empower researchers and practitioners to unlock the next level of AGI by integrating the strengths of LLMs and KGs into cohesive, intelligent systems.
Computational advertising is one of the most successful application scenarios of machine learning and artificial intelligence. This tutorial is designed to review the latest progress of several critical areas in computational advertising: matching, prediction, auction, and bidding. Particularly, with the recent advances in generative AI such as large language models, there is a growing interest in further enhancing these areas with these techniques. In this tutorial, we first introduce the recent advances in matching, including its architecture alternatives, model developments, and how it co-evolves with ad products, which distinguishes it from matching in recommendation products. We then review the recent advances in prediction, with a focus on topics such as feature interactions, user interest models, and multi-task/domain learning. We will show how these building bricks constitute large prediction models and LLM-enhanced/LLM-based prediction models. Then, we discuss auction and bidding, a unique area in computational advertising. Both traditional and learning-based auctions will be introduced, followed by their applications in real-world ad products. Given the auction designs, we show how bidding evolves from control-based, to reinforcement learning-based, and most recently to generative AI-based approaches. Our aim is to help the audience grasp the recent developments in computational advertising, as well as to spark inspiration for future research.
Integrating invariance into data representations is a principled design in intelligent systems and web applications. Representations play a fundamental role, where systems and applications are both built on meaningful representations of digital inputs (rather than the raw data). In fact, the proper design/learning of such representations relies on priors w.r.t. the task of interest. Here, the concept of symmetry from the Erlangen Program may be the most fruitful prior: informally, a symmetry of a system is a transformation that leaves a certain property of the system invariant. Symmetry priors are ubiquitous, e.g., translation as a symmetry of object classification, where the object category is invariant under translation.
The quest for invariance is as old as pattern recognition and data mining itself. Invariant design was the cornerstone of various representations in the era before deep learning, such as SIFT. In the early era of deep learning, the invariance principle was largely ignored and replaced by a data-driven paradigm, exemplified by the CNN. However, this neglect did not last long before data-driven models encountered bottlenecks regarding robustness, interpretability, efficiency, and so on. The invariance principle has returned in the era of rethinking deep learning, forming a new field known as Geometric Deep Learning (GDL).
In this tutorial, we will give a historical perspective on invariance in data representations. More importantly, we will identify key research dilemmas, promising works, future directions, and web applications.
In the era of information overload, recommendation systems play a pivotal role in filtering data and delivering personalized content. Recent advancements in feature interaction and user behavior modeling have significantly enhanced the recall and ranking processes of these systems. With the rise of large language models (LLMs), new opportunities have emerged to further improve recommendation systems. This tutorial explores two primary approaches for integrating LLMs: LLMs-enhanced recommendations, which leverage the reasoning capabilities of general LLMs, and generative large recommendation models, which focus on scaling and sophistication. While the former has been extensively covered in existing literature, the latter remains underexplored. This tutorial aims to fill this gap by providing a comprehensive overview of generative large recommendation models, including their recent advancements, challenges, and potential research directions. Key topics include data quality, scaling laws, user behavior mining, and efficiency in training and inference. By engaging with this tutorial, participants will gain insights into the latest developments and future opportunities in the field, aiding both academic research and practical applications. The timely nature of this exploration supports the rapid evolution of recommendation systems, offering valuable guidance for researchers and practitioners alike.
Graph machine learning has been extensively studied in both academia and industry. Although booming with a vast number of emerging methods and techniques, most of the literature is built on the in-distribution (I.D.) hypothesis, i.e., that testing and training graph data are sampled from the identical distribution. However, this I.D. hypothesis can hardly be satisfied in many real-world graph scenarios, where model performance substantially degrades when there exist distribution shifts between testing and training graph data. To solve this critical problem, several advanced graph machine learning techniques, which go beyond the I.D. hypothesis, have made great progress and attracted ever-increasing attention from the research community. This tutorial aims to disseminate and promote recent research achievements on graph out-of-distribution adaptation, graph out-of-distribution generalization, and large language models for tackling distribution shifts, which are exciting and fast-growing research directions in the general field of machine learning and data mining. We will advocate novel, high-quality research findings, as well as innovative solutions to the challenging problems in graph machine learning under distribution shifts and their applications on graphs. This topic is at the core of the scope of The Web Conference and is attractive to machine learning as well as data mining audiences from both academia and industry.
Over the past two years, generative AI (GenAI) has evolved rapidly, influencing interdisciplinary fields including social and e-commerce Recsys. Despite several exciting research advances, landing GenAI innovations in real-world Recsys remains challenging due to the sophistication of modern industrial products and systems. Our tutorial begins with a brief overview of industrial Recsys and GenAI fundamentals (including LLMOps), followed by the ongoing efforts and opportunities to enhance existing Recsys data and models with foundation models.
We then explore how GenAI's curation and reasoning capabilities can be integrated into Recsys, for example, by repurposing raw content, incorporating external knowledge for display and creative optimization, and generating personalized insights/explanations to foster transparency and trust. Following this, the tutorial highlights how AI agents can reshape Recsys by introducing interactive reasoning and action loops, moving beyond traditional passive feedback models. Lastly, we share insights on real-world solutions for human-AI alignment and responsible GenAI practices.
A key feature of the tutorial is the presentation of the holistic AI, Infrastructure, LLMOps, and Product roadmap, including evaluation and responsible AI practices, based on production solutions from LinkedIn, Amazon, Meta, TikTok, Microsoft, and Instacart. Real-world case studies and demos are further provided for illustration. While GenRecsys is still in its early stages, this tutorial provides valuable insights and practical strategies for the Recsys and GenAI communities, bridging scientific research and applied solutions in this novel and rapidly growing field.
The recent development of Web Intelligence has heightened privacy concerns among end-users. Federated intelligence offers a novel approach to restructuring Web Intelligence within a federated setting to better protect privacy. Additionally, the advent of large foundation models has notably enhanced the capability of individual agents to address complex problems and has reshaped the Web ecosystem. Numerous large language models and domain-specific foundation models underpin various applications, linking end-users to the connected Web. This tutorial presents federated intelligence as a strategy to develop a Web-based collective intelligence system by leveraging existing foundations and applications. In this framework, agents collaboratively enhance their intelligence by acquiring complementary knowledge and making fine-grained adaptations, enabling them to manage complex tasks across diverse web environments. The tutorial provides an in-depth review of recent advancements and potential directions in federated intelligence, detailing its core concepts, applications, and future developments. It is designed for scholars, practitioners, and interested audiences, offering a concise overview that facilitates quick understanding and encourages meaningful discussions on its future evolution.
Trustworthy AI is crucial for web applications, since it ensures user data privacy, enhances security, and fosters user confidence. As web applications increasingly rely on AI for personalization and decision-making, maintaining transparency and accountability becomes essential to prevent bias, misinformation, and unethical practices. By building trust, developers can create safer and more reliable experiences, ultimately promoting user engagement and satisfaction. However, as dataset sizes grow with rapid web-scale data collection, it is laborious and expensive to obtain perfect data (e.g., clean, safe, and balanced data). As a result, the volume of imperfect data becomes enormous, e.g., web-scale image and speech data with noisy labels, images with specific noise, and long-tail-distributed data. Yet standard learning methods assume that the supervised information is fully correct and intact. Therefore, imperfect data harms the performance of most standard learning algorithms and sometimes even makes existing algorithms break down. In this tutorial, we focus on the algorithmic design of trustworthy AI when facing three types of imperfect data: noisy data, adversarial data, and long-tailed data in real-world web applications.
Graph data is extensively utilized across various domains, owing to its capacity to represent complex structural relationships among diverse real-world entities. However, the rapid expansion of graph data introduces significant challenges in terms of storage, transmission, and the training of graph neural networks (GNNs) for effective graph data analysis. In light of these challenges, graph condensation (GC) has emerged as a data-centric solution, synthesizing a compact yet representative graph to replace the original large graph in GNN training. These GNNs trained on condensed graphs can achieve performance comparable to models trained on full-scale data, attracting substantial attention and stimulating extensive research. In response to this trend, this tutorial provides a comprehensive and up-to-date overview of GC research. It systematically categorizes existing studies into five categories aligned with critical GC evaluation criteria: effectiveness, generalization, efficiency, fairness, and robustness. Additionally, we will provide an in-depth analysis of two fundamental components of GC: optimization strategies and condensed graph generation, elucidating their key characteristics and underlying technologies. Finally, this tutorial will explore GC applications across various fields and outline potential directions for future research in this rapidly evolving and impactful domain. To help the audience gain a systematic understanding of the topics and technologies covered in this tutorial, we present further details in our recent survey paper https://arxiv.org/abs/2401.11720 and GC toolkit https://arxiv.org/abs/2405.14246
Query understanding in Conversational Information Seeking (CIS) involves accurately interpreting user intent through context-aware interactions. This includes resolving ambiguities, refining queries, and adapting to evolving information needs. Large Language Models (LLMs) enhance this process by interpreting nuanced language and adapting dynamically, improving the relevance and precision of search results in real-time. In this tutorial, we explore advanced techniques to enhance query understanding in LLM-based CIS systems. We delve into LLM-driven methods for developing robust evaluation metrics to assess query understanding quality in multi-turn interactions, strategies for building more interactive systems, and applications like proactive query management and query reformulation. We also discuss key challenges in integrating LLMs for query understanding in conversational search systems and outline future research directions. Our goal is to deepen the audience's understanding of LLM-based conversational query understanding and inspire discussions to drive ongoing advancements in this field.
Deep learning techniques have demonstrated impressive effectiveness across a wide array of web applications. Notably, graph neural networks (GNNs) and large language models (LLMs) have become essential tools for modeling the extensive graph-structured data and text/language data that populate the web. Despite their success, the advancement of these methods is frequently hampered by resource constraints. Key challenges include the scarcity of labeled data (data-level constraints) and the demand for smaller model sizes suitable for real-world computing environments (model-level constraints). Addressing these issues is crucial for the effective and efficient deployment of models across various real-world web systems and applications, such as social networks, search engines, recommender systems, question answering, and content analysis. Therefore, there is an urgent need to develop innovative and efficient learning techniques that can overcome these resource limitations from both data and model perspectives.
In this lecture-style tutorial, we will focus on state-of-the-art approaches in resource-efficient learning, specifically exploring a range of data- and model-efficient methods for GNNs and LLMs, along with their practical applications in web contexts. Our objectives for this tutorial are threefold: (1) to categorize challenges in resource-efficient learning and discuss data and model constraints; (2) to provide a comprehensive review of existing methods and recent advances in resource-efficient learning, particularly concerning GNNs and LLMs; and (3) to highlight open questions and potential future research directions in this rapidly evolving field. Together, these objectives will provide participants with a comprehensive understanding of resource-efficient learning for GNNs and LLMs, its challenges, and its potential for future advancements.
Human mobility analytics is essential to enabling a broad range of web-related applications, such as navigation, urban planning, and point-of-interest (POI) recommendation. The proliferation of mobility data, including geo-social media check-ins and geo-location data, offers unprecedented opportunities for analyzing human mobility. This lecture-style tutorial offers an in-depth look at web-centric human mobility analytics, organized according to three levels: location-level, individual-level, and macro-level. Location-level analytics focus on spatial activities within specific geographical locations, using points of interest and other data to forecast future visits and identify urban mobility patterns. Individual-level analytics delve into the movements of individuals, e.g., considering sequences of visited locations over time, elucidating individual movement behaviors. Macro-level analytics broaden the scope of analyses to include large-scale spatial patterns and population flows across regions, offering a macro perspective on mobility. The tutorial encompasses cutting-edge learning frameworks such as federated learning as well as continual learning and innovative applications of Large Language Models (LLMs), which enhance predictive analytics and expand the capabilities of mobility analysis. The tutorial aims to afford the participants a comprehensive overview of the current state and future directions of web-centric human mobility analytics, making it an invaluable resource for using web-sourced human mobility data to facilitate a more informed and interconnected world. The video teaser is available at https://shorturl.at/HShNc.
From 1919 to 1934, the socialist city government of Vienna fostered a remarkable increase in art, public architecture, and scholarship despite desperate economic conditions. Throughout this period, the Vienna Circle, a philosophical discussion group, examined with new rigor the question of what can be known. Their work built the theoretical foundations of computing. Much of this work was carried out in Vienna's distinctive cafés. This was not a quaint idiosyncrasy; the cafés were in the business of amiability. Parallels to the early Web and its precursors are not difficult to find, and the collapse of Red Vienna may parallel the current predicament of the Web.
This article explores the history of images on the Web, starting with a definition of 'image' and an overview of human-created images, from Stone Age paintings to portraits. It then presents essential milestones in developing digital images, such as the first digitized photograph, the evolution of digital cameras, image editor software, and image file formats. Finally, it explores the history of images on the Web, recounting the story of the first photograph on the network and the technological advancements leading up to the present day, including social media platforms for photo sharing and the emergence of AI-generated imagery. The article concludes with a brief discussion on the future of images on the Web.
Machine common sense (MCS), the challenge of enabling computers to grasp everyday human knowledge, has been a grand challenge in Artificial Intelligence (AI) since the 1950s. While recent advances in large language models have led to impressive progress, there is still no consensus on how much common sense today's AI actually possesses. In this brief review, we revisit the historical development of MCS in the context of the Web, examining how the Web's evolution, from early knowledge representation efforts to knowledge graphs, the Semantic Web, and crowdsourcing, has shaped MCS research. We argue that key breakthroughs in Web technologies were instrumental in addressing longstanding challenges of scale and coverage in commonsense reasoning. At the same time, MCS research has influenced the development of core Web applications, including intelligent agents, plausibility-based reasoning, and robust evaluation of black-box AI systems.
The history of the web has been defined by a series of revolutionary developments, from the advent of HTML and web browsers to the explosion of social media and cloud computing. Throughout this evolution, human actors have played pivotal roles, serving as developers, technologists, and innovators who have guided the trajectory of the Internet. However, the emergence of Large Language Models (LLMs), such as OpenAI's GPT series, represents a paradigm shift. These AI systems are not merely tools for humans but are increasingly becoming autonomous agents that shape the flow of information on the web. In this paper, we argue that LLMs can be framed as historical actors, meaning they are not only facilitators of human endeavors but also independent forces that influence the web's evolution. We explore how these AI systems direct the flow of information, shape public discourse, and may influence future interpretations of web-based history.
As digital content becomes increasingly ubiquitous, the need for robust watermark removal techniques has grown due to the inadequacy of existing embedding techniques, which lack robustness. This paper introduces a novel Saliency-Aware Diffusion Reconstruction (SADRE) framework for watermark elimination on the web, combining adaptive noise injection, region-specific perturbations, and advanced diffusion-based reconstruction. SADRE disrupts embedded watermarks by injecting targeted noise into latent representations guided by saliency masks while preserving essential image features. A reverse diffusion process ensures high-fidelity image restoration, leveraging adaptive noise levels determined by watermark strength. Our framework is theoretically grounded with stability guarantees and achieves robust watermark removal across diverse scenarios. Empirical evaluations on state-of-the-art (SOTA) watermarking techniques demonstrate SADRE's superiority in balancing watermark disruption and image quality. SADRE sets a new benchmark for watermark elimination, offering a flexible and reliable solution for real-world web content. Code is available at https://github.com/inzamamulDU/SADRE.
Column Type Annotation (CTA) is the process of assigning semantic types to the columns of a table. Models and approaches for CTA are typically evaluated on distinct datasets, making benchmarking and comparisons not trivial. In this paper, we identify major issues in commonly-used datasets for CTA and explore their impact on the evaluation process and reported results. Drawing from our analysis, we propose changes towards more robust and unbiased benchmarking practices.
We present a comprehensive study on the emergence of Computational Social Science (CSS) - an interdisciplinary field leveraging computational methods to address social science questions - and its impact on adjacent social sciences. We trained a robust CSS classifier using papers from CSS-focused venues and applied it to 11 million papers spanning 1990 to 2021. Our analysis yielded three key findings. First, there were two critical inflections in the rise of CSS. The first occurred around 2005 when psychology, politics, and sociology began engaging with CSS. The second emerged in approximately 2014 when economics finally joined the trend. Sociology is currently the most engaged with CSS. Second, using the density of yearly knowledge embeddings constructed by advanced transformer models, we observed that CSS initially lacked a cohesive identity. From the early 2000s to 2014, however, it began to form a distinct cluster, creating boundaries between CSS and other social sciences, particularly in politics and sociology. After 2014, these boundaries faded, and CSS increasingly blended with the social sciences. Third, shared data-driven methods homogenized CSS papers across disciplines, with politics and economics showing the most alignment due to the combined influence of CSS and causal identification. Nevertheless, non-CSS papers in sociology, psychology, and politics became more divergent. Taken together, these findings highlight the dynamics of division and unity as new disciplines emerge within existing knowledge landscapes. A live demo of CSS evolution can be found at https://evolution-css.netlify.app/
Psychological assessment tools have long helped humans understand behavioural patterns. While Large Language Models (LLMs) can generate content comparable to that of humans, we explore whether they exhibit personality traits. To this end, this work applies psychological tools to LLMs in diverse scenarios to generate personality profiles. Using established trait-based questionnaires such as the Big Five Inventory and by addressing the possibility of training data contamination, we examine the dimensional variability and dominance of LLMs across five core personality dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Our findings reveal that LLMs exhibit unique dominant traits, varying characteristics, and distinct personality profiles even within the same family of models.
Multi-grained alignment for text-video retrieval takes advantage of both coarse- and fine-grained semantic alignments, which can enhance the representation capabilities of cross-modal information retrieval. However, the current research fails to fully explore the informative semantics embedded in fine-grained levels of both text and video spaces. In this work, we propose a Selective Multi-Grained Alignment framework, termed SMA, which achieves video-sentence and object-phrase alignments through coarse- and fine-grained alignment modules. Specifically, we introduce a token aggregation module, which implicitly selects important patches per frame and words in text and then aggregates them to generate object-level features for frames and phrase-level features for text. Moreover, we apply a similarity-aware selection module to select keyframes in video efficiently. Extensive experiments on four benchmarks demonstrate that our SMA achieves superior performance on MSR-VTT (52.8%), MSVD (48.2%), ActivityNet (48.9%), and DiDeMo (50.1%). The code of this paper is available at https://github.com/bzy-source/SMA.
The Open Wikipedia Ranking is an open dataset published yearly, containing the ranking of Wikipedia pages with respect to centrality measures computed on the whole Wikipedia graph for that year. In this paper, ten years after its start, we report some details, results, and anecdotal observations on this dataset. The goal of the Open Wikipedia Ranking is to provide a completely open and reproducible ranking of Wikipedia pages based on indegree, PageRank, harmonic centrality, and page views; the Wikipedia graphs themselves are also made available by the Laboratory of Web Algorithmics. What characterizes the Open Wikipedia Ranking is that the whole graph construction and ranking process is meticulously documented and reproducible. All computations are based on open-source Java software and algorithms from the literature. Thus, the reasons for a page's centrality score can be traced back exactly to structural graph properties.
Detecting anomalies in system logs has been an active research topic because of its importance in detecting system faults and novel attacks. Recently, many log anomaly detection approaches equipped with deep learning techniques have demonstrated great success. However, the vulnerability of these approaches to backdoor attacks is under-explored. In this paper, we study how to inject a backdoor into self-supervised log anomaly detection models, i.e., making abnormal logs evade detection. To ensure stealth, we first design a trigger pattern without including any abnormal log entries. Then, we revise the learning objective so that it injects the trigger into anomaly detection models. After deployment, if abnormal logs are hidden within the trigger pattern, the backdoored log anomaly detection models classify them as normal. We conduct backdoor attacks against two well-established self-supervised log anomaly detection models, DeepLog and LogBERT. Experimental results demonstrate the effectiveness of our method in making these models predict abnormal log entries as normal ones.
Misinformation surrounding emerging outbreaks poses a serious societal threat, making robust countermeasures essential. One promising approach is stance detection (SD), which identifies whether social media posts support or oppose misleading claims. In this work, we finetune classifiers on COVID-19 misinformation SD datasets consisting of claims and corresponding tweets. Specifically, we test controllable misinformation generation (CMG) using large language models (LLMs) as a method for data augmentation. While CMG demonstrates the potential for expanding training datasets, our experiments reveal that performance gains over traditional augmentation methods are often minimal and inconsistent, primarily due to built-in safeguards within LLMs. We release our code and datasets to facilitate further research on misinformation detection and generation.
Large Language Models (LLMs) are known to hallucinate and generate non-factual outputs, which can undermine user trust. Traditional methods to directly mitigate hallucinations, such as representation editing and contrastive decoding, often require additional training data and involve high implementation complexity. While ensemble-based approaches harness multiple LLMs to tap into the "wisdom of crowds", these methods overlook uncertainties in individual model responses. Recent studies reveal that uncertainty estimation can enable LLMs to self-assess the likelihood of generating hallucinations. In this work, we focus on factoid question answering (QA) and observe that LLMs' accuracy and self-assessment capabilities vary widely, with different models excelling in different scenarios. Leveraging this insight, we propose Uncertainty-Aware Fusion (UAF), an ensemble framework that reduces hallucinations by strategically combining multiple LLMs based on their accuracy and self-assessment abilities. Empirical results on several public benchmark datasets show that UAF outperforms state-of-the-art hallucination mitigation methods by 8% in factual accuracy, while either narrowing or surpassing the performance gap with GPT-4.
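One simple reading of the fusion idea, sketched here for illustration (not the paper's exact UAF procedure), weights each model's answer by its historical accuracy times its self-assessed confidence and keeps the highest-weighted answer.

```python
# Illustrative uncertainty-aware answer fusion (not the exact UAF algorithm).
from collections import defaultdict

def fuse_answers(candidates):
    """candidates: list of (answer, model_accuracy, self_confidence)."""
    scores = defaultdict(float)
    for answer, accuracy, confidence in candidates:
        scores[answer.strip().lower()] += accuracy * confidence
    return max(scores, key=scores.get)

print(fuse_answers([("Paris", 0.82, 0.9),
                    ("Lyon", 0.75, 0.4),
                    ("paris", 0.68, 0.7)]))   # "paris" wins on pooled weight
```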
The secretary problem is one of the most widely studied online stochastic models, in which an employer wants to hire the best candidate from n candidates who arrive in a random order. It is well known that the optimal success probability is 1/e. However, in reality, things are more complex because employers often have interviewers in different cities, interviewing candidates in a distributed manner. This motivates us to study the secretary problem with multiple queues. Feldman and Tennenholtz [EC 2012] studied this setting assuming the candidates are distributed evenly. In particular, when there are two even queues, the optimal success probability is 1/4. In this work, we move to the general problem in which the queues are arbitrary and design the optimal online algorithm for the case of two queues. Our results characterize the exact success probability curve, connecting the cases of a single queue and two equal queues. Our technique is grounded in the linear programming framework introduced by Buchbinder et al. [Math. Oper. Res. 2014] and a novel analysis.
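The classical single-queue baseline quoted above can be checked with a quick Monte Carlo simulation of the 1/e stopping rule (skip the first n/e candidates, then hire the first candidate better than all seen so far):

```python
# Monte Carlo check of the classical 1/e stopping rule for a single queue.
import math
import random

def classic_secretary(n, trials=100_000):
    cutoff = round(n / math.e)
    wins = 0
    for _ in range(trials):
        ranks = random.sample(range(n), n)     # rank 0 is the best candidate
        best_seen = min(ranks[:cutoff], default=n)
        hired = next((r for r in ranks[cutoff:] if r < best_seen), None)
        wins += (hired == 0)
    return wins / trials

print(classic_secretary(100))                  # ~0.37, close to 1/e
```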
Fairness has often been seen as an ethical concern that needs to be considered at some cost to utility. In contrast, in this work, we formulate fairness, and especially fairness in ranking, as a way to avoid unjust biases and provide a more accurate ranking, which results in an improvement in the actual unbiased utility. With this in mind, we design a fairness measure that, instead of blindly forcing some approximate equality constraint, checks whether the outcome is plausible in a just world. Our fairness measure asks a simple and fundamental statistical question: "What is the chance of observing this outcome in an unbiased world?". If the chance is high enough, the outcome is fair. We first provide a dynamic programming algorithm that, given a ranking, calculates our fairness measure. Second, given a sequence of potentially biased scores, along with the sensitive feature, we provide a fair ranking algorithm based on our fairness measure. Finally, we run experiments to understand the behavior of our ranking algorithm in comparison with other fundamental algorithms.
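One concrete way to read that statistical question, sketched below purely for illustration (it is not the dynamic-programming measure proposed in the paper), is to estimate by simulation how likely a uniformly random ranking places at most as many protected-group members in the top-k as the observed ranking does.

```python
# Illustrative Monte Carlo estimate of "how plausible is this top-k in an
# unbiased world?" -- not the paper's dynamic-programming fairness measure.
import random

def prefix_plausibility(is_protected, k, trials=50_000):
    observed = sum(is_protected[:k])
    count = 0
    for _ in range(trials):
        perm = random.sample(is_protected, len(is_protected))  # unbiased world
        count += sum(perm[:k]) <= observed
    return count / trials        # low values flag an implausibly skewed top-k

ranking = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # 1 = protected group (toy example)
print(prefix_plausibility(ranking, k=4))
```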
This paper introduces a novel concept and implementation of an ontology proxy designed to seamlessly enhance accessibility and reliability of the Web of Ontologies by addressing challenges such as link rot, evolution inconsistencies, and communication failures. The proxy features a time travel mode, powered by DBpedia Archivo, that provides access to archived and versioned snapshots of ontologies. This enables failure recovery and the emulation of a consistent state in time, supporting reproducible research and enhancing the FAIRness of ontologies and associated (meta)data in a plug-and-play manner. Initial evaluations show significant improvements in ontology retrieval success rates, underscoring the proxy's potential as a viable interface for breaking accessibility barriers.
The widespread dissemination of fake news on social media poses significant risks, necessitating timely and accurate detection. However, existing methods struggle with unseen news due to their reliance on training data from past events and domains, leaving the challenge of detecting novel fake news largely unresolved.
To address this, we identify biases in training data tied to specific domains and propose a debiasing solution, FNDCD. Rooted in causal analysis, FNDCD employs a reweighting strategy based on classification confidence and propagation structure regularization to reduce the influence of domain-specific biases, enhancing the detection of unseen fake news. Experiments on real-world datasets with non-overlapping news domains demonstrate FNDCD's effectiveness in improving generalization across domains.
Graph coarsening is a reduction technique that approximates a larger graph with a smaller, tractable graph. A good-quality graph representation with specific properties is needed to achieve good performance in downstream applications. However, existing coarsening methods cannot produce coarsened graphs with desirable properties, such as sparsity, tree structure, bipartite structure, or multi-component structure. This work presents an optimization framework for learning coarsened graphs with a desirable multi-component structure. The proposed methods are solved efficiently by leveraging block majorization-minimization, log determinant, and spectral regularization frameworks. Extensive experiments with real benchmark datasets elucidate the proposed framework's efficacy in preserving the structure in coarsened graphs. Empirically, when no prior knowledge of the graph's structure is available, constructing a multi-component coarsened graph consistently outperforms state-of-the-art methods.
Abnormal behavior detection is crucial in many fields, such as social networks, financial transactions, and cybersecurity. However, it poses significant challenges due to the intricate structural evolution of heterogeneous graphs. To address this issue, we propose a novel method called Relational Evolution enhanced Anomaly Detection in dynamic heterogeneous Graph (ReadGraph). ReadGraph focuses on tracing relation-based dynamic structural evolution to comprehensively capture features related to abnormal behaviors (edges) across different types of nodes. We conduct extensive experiments to evaluate ReadGraph against advanced competitors. The results demonstrate that ReadGraph is, on average, 13.69% more effective than other methods.
Recently, there has been growing research on negative feedback in recommender systems. These studies use a fixed threshold to binarize feedback into positive or negative. However, such an approach bears limitations when the rating habits for expressing disappointment differ across users or when ratings are noisy. Motivated by the remarkable success of Large Language Models (LLMs), we investigate how LLMs can address this challenge on the fly. To this end, we present ReFINe (Resurrecting Falsely Identified Negative feedback with LLM). ReFINe classifies negative feedback into two distinct types: falsely identified negatives that carry positive signals and confirmed negatives with only negative signals. To the best of our knowledge, our work is the first to propose and demonstrate the distinction between these two perspectives on negative feedback. We first leverage the LLM to better separate the positive and negative sets for each user, and then implement Re-Weighted BPR, a dedicated Bayesian Personalized Ranking loss function tailored to our perspective on negative feedback. Experimental results show that our model outperforms strong baseline models. The code is available at https://github.com/Chanwoo-Jeong-2000/ReFINe.
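A generic form of such a weighted pairwise loss is sketched below (the exact Re-Weighted BPR is available in the authors' repository): pairs whose negative item the LLM flags as falsely identified can simply be down-weighted.

```python
# Generic weighted BPR loss sketch; the exact Re-Weighted BPR is in the repo.
import torch
import torch.nn.functional as F

def weighted_bpr_loss(pos_scores, neg_scores, weights):
    """All arguments are 1-D tensors of equal length."""
    return -(weights * F.logsigmoid(pos_scores - neg_scores)).mean()

pos = torch.tensor([2.1, 1.3, 0.7])
neg = torch.tensor([0.4, 1.9, 0.2])
w = torch.tensor([1.0, 0.2, 1.0])   # 0.2: likely a falsely identified negative
print(weighted_bpr_loss(pos, neg, w))
```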
In diverse professional environments, ranging from academic conferences to corporate earnings calls, the ability to anticipate audience questions is paramount. Traditional methods, which rely on manual assessment of an audience's background, interests, and subject knowledge, often fall short, particularly when facing large or heterogeneous groups, leading to imprecision and inefficiency. While NLP has made strides in text-based question generation, its primary focus remains on academic settings, leaving the intricate challenges of professional domains, especially earnings call conferences, underserved. Addressing this gap, our paper pioneers the multi-question generation (MQG) task specifically designed for earnings call contexts. Our methodology involves an exhaustive collection of earnings call transcripts and a novel annotation technique to classify potential questions. Furthermore, we introduce a retriever-enhanced strategy to extract relevant information. With a core aim of generating a spectrum of potential questions that analysts might pose, we derive these directly from earnings call content. Empirical evaluations underscore our approach's edge, revealing notable excellence in the accuracy, consistency, and perplexity of the questions generated.
The proliferation of misinformation on social media platforms (SMPs) poses significant threats to public discourse, public health, and democracy, requiring effective moderation strategies. This work investigates the usability of soft moderation techniques, such as warning labels and contextual links, designed to mitigate the spread of misinformation while maintaining user autonomy. Using a mixed-methods approach, it analyzes quantitative engagement metrics and qualitative user perceptions across platforms including Instagram, X (formerly known as Twitter), TikTok, and Threads.
The results show that contextual links improve credibility and encourage deeper interactions, especially among younger users, while warning labels raise skepticism but may decrease engagement. The user study draws attention to platform-specific dynamics, demographic differences, and the vital role that user education and transparency play in enhancing the effectiveness of moderation. This work fills gaps in the literature by providing practical insights to help SMPs and policymakers develop knowledgeable and reliable digital environments.
Diffusion-based recommender systems (DR) have gained increasing attention for their advanced generative and denoising capabilities. However, existing DR face two central limitations: (i) a trade-off between enhancing generative capacity via noise injection and limiting the loss of personalized information, and (ii) the underutilization of rich item-side information. To address these challenges, we present a Collaborative Diffusion model for Recommender System (CDiff4Rec). Specifically, CDiff4Rec generates pseudo-users from item features and leverages collaborative signals from both real and pseudo personalized neighbors identified through behavioral similarity, thereby effectively reconstructing nuanced user preferences. Experimental results on three public datasets show that CDiff4Rec outperforms competitors by effectively mitigating the loss of personalized information through the integration of item content and collaborative signals.
The complexity of legal documents, particularly in financial legal disputes, poses significant challenges for both experts and automated systems. This study introduces a system leveraging Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) technology to recommend key evidence in financial advisor dispute cases. Unlike traditional legal AI tasks focused on outcome prediction, our approach emphasizes supporting judicial reasoning by recommending key evidence relevant to adjudication. We constructed a dataset of 371 annotated cases from Taiwan, spanning 25 years, including claims, judicial opinions, and final judgments, with annotations highlighting key evidence. Using RAG, our system retrieves and generates evidence recommendations grounded in analogous past cases while maintaining temporal and contextual consistency. This methodology enhances judicial efficiency and supports equitable legal decision-making by streamlining the recommendation of critical evidence.
Disinformation, irrespective of domain or language, aims to deceive or manipulate public opinion, typically employing advanced persuasion techniques. Qualitative and quantitative research on the weaponisation of persuasion techniques in disinformation narratives, however, has been mostly limited to specific topics (e.g., COVID-19). To address this gap, our study conducts a large-scale, multi-domain analysis of the role of 16 persuasion techniques in disinformation narratives, by leveraging a state-of-the-art persuasion technique classifier. We demonstrate how different persuasion techniques are employed disproportionately in different disinformation domains. We also include an in-depth case study on climate change disinformation, which demonstrates how linguistic, psychological, and cultural factors shape the adaptation of persuasion strategies to fit unique thematic contexts.
Reinforcement learning has been widely applied in automated bidding. Traditional approaches model bidding as a Markov Decision Process (MDP). Recently, some studies have explored using generative reinforcement learning methods to address long-term dependency issues in bidding environments. Although effective, these methods typically rely on supervised learning approaches, which are vulnerable to low data quality due to the large number of sub-optimal bids and the low-probability rewards resulting from low click and conversion rates. Unfortunately, few studies have addressed these challenges. In this paper, we formalize automated bidding as a sequential decision-making problem and propose a novel Expert-guided Bag Reward Transformer (EBaReT) to address concerns related to data quality and reward uncertainty. Specifically, to tackle data quality issues, we generate a set of expert trajectories to serve as supplementary data in the training process and employ a Positive-Unlabeled (PU) learning-based discriminator to identify expert transitions. To ensure the decision also meets the expert level, we further design a novel expert-guided inference strategy. Moreover, to mitigate the uncertainty of rewards, we consider the transitions within a certain period as a "bag" and carefully design a reward function that leads to a smoother acquisition of rewards. Extensive experiments demonstrate that our model achieves superior performance compared to state-of-the-art bidding methods.
Personalized news headline generation aims to provide users with attention-grabbing headlines that are tailored to their preferences. Prevailing methods focus on user-oriented content preferences, but most of them overlook the fact that diverse stylistic preferences are integral to users' panoramic interests, leading to suboptimal personalization. In view of this, we propose a novel Stylistic-Content Aware Personalized Headline Generation (SCAPE) framework. SCAPE extracts both content and stylistic features from headlines with the aid of large language model (LLM) collaboration. It further adaptively integrates users' long- and short-term interests through a contrastive learning-based hierarchical fusion network. By incorporating the panoramic interests into the headline generator, SCAPE reflects users' stylistic-content preferences during the generation process. Extensive experiments on the real-world dataset PENS demonstrate the superiority of SCAPE over baselines.
Geocoding involves the automatic extraction of location coordinates of incidents reported in news articles and can be used for epidemic intelligence or disaster management. This paper introduces Retrieval-Augmented Coordinate Capture Of Online News articles (RACCOON), an open-source geocoding approach that extracts geolocations from news articles. RACCOON uses a retrieval-augmented generation (RAG) approach where candidate locations and associated information are retrieved in the form of context from a location database, and a prompt containing the retrieved context, location mentions, and the news article is fed to an LLM to generate the location coordinates. Our evaluation on three datasets, two underlying LLMs, three baselines, and several ablation tests based on the components of RACCOON demonstrates the utility of RACCOON. To the best of our knowledge, RACCOON is the first RAG-based approach for geocoding using pre-trained LLMs.
Successful quantitative investment relies on accurate predictions of future stock prices. Deep learning-based solutions have recently demonstrated a superior ability to capture the intricate and nonlinear interactions among various market variables. However, most existing methods use the same parameters to fit all samples, without considering that the real stock market often exhibits multiple patterns. To alleviate this issue, we propose a novel module called Mixture of Experts with Retrieval-Augmented Representation (MERA). Essentially, MERA consists of a set of independent experts for differentiated modeling, as well as a GateNet that dynamically allocates data of different patterns to the most suitable experts. The model backbone is responsible for learning coarse-grained representations for all stock patterns; each expert in the MERA module then focuses on a specific pattern and performs a more fine-grained analysis. However, accurate data allocation remains challenging due to the lack of explicit pattern identifiers. To overcome this, MERA retrieves relevant samples using high-level representations from self-supervised pre-training; the label information of neighboring samples provides promising discriminative signals for identifying the target stock pattern. Extensive experiments on real-world stock markets show significant improvements. Our code is released at https://github.com/chenchen1104/MERA.
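To make the retrieval-augmented routing idea in the MERA abstract concrete, the toy sketch below finds nearest neighbours in a memory of pre-trained representations and uses their labels as a soft pattern signal for gating a set of expert heads. The dimensions, gate rule, and linear experts are illustrative assumptions rather than the published architecture.

```python
# Toy numpy sketch of retrieval-augmented expert routing (assumed shapes and gate rule).
import numpy as np

rng = np.random.default_rng(0)
memory_reprs = rng.normal(size=(500, 32))          # self-supervised sample representations
memory_labels = rng.integers(0, 3, size=500)       # e.g., 3 coarse market patterns

def retrieve_pattern_signal(query_repr, k=16):
    """Average one-hot labels of the k nearest stored samples (cosine similarity)."""
    sims = memory_reprs @ query_repr / (
        np.linalg.norm(memory_reprs, axis=1) * np.linalg.norm(query_repr) + 1e-8)
    top = np.argsort(-sims)[:k]
    signal = np.zeros(3)
    for lbl in memory_labels[top]:
        signal[lbl] += 1.0
    return signal / k                               # soft "which pattern is this?" vector

experts = [rng.normal(size=(32, 1)) for _ in range(3)]  # one linear head per pattern

def predict(query_repr):
    gate = retrieve_pattern_signal(query_repr)      # GateNet stand-in
    outputs = np.array([query_repr @ w for w in experts]).squeeze()
    return float(gate @ outputs)                    # gate-weighted mixture of experts

print(predict(rng.normal(size=32)))
```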
Modeling user behavior sequences in recommender systems is essential for understanding user preferences over time, enabling personalized and accurate recommendations that improve user retention and enhance business value. Despite its significance, current sequential modeling approaches face two challenges. Along the spatial dimension, it is difficult to mutually perceive similar users' interests for a generalized understanding of intent; along the temporal dimension, current methods are prone to forgetting long-term interests due to the fixed-length input sequence. In this paper, we present the Large Memory Network (LMN), which provides a novel idea: compressing and storing user history behavior information in a large-scale memory block. With the elaborated online deployment strategy, the memory block can easily be scaled up to millions of entries in an industrial setting. Extensive offline comparison experiments, memory scaling experiments, and an online A/B test on Douyin E-Commerce Search (ECS) validate the superior performance of LMN. LMN has been fully deployed in Douyin ECS, serving millions of users each day.
Extracting key information from news articles is crucial for advancing search systems. Historically, the 5W1H framework, which organises information based on 'Who', 'What', 'When', 'Where', 'Why', and 'How', has been a predominant method in digital journalism empowering search tools. The rise of Large Language Models (LLMs) has sparked new research into their potential for performing such information extraction tasks effectively. Our study examines a novel approach to employing LLMs in the 5W1H extraction process, particularly focusing on their capacity to mimic human reasoning. We introduce two innovative Chain-of-Thought (COT) prompting techniques to extract 5W1H in news: extractive reasoning and question-level reasoning. The former directs the LLM to pinpoint and highlight essential details from texts, while the latter encourages the model to emulate human-like reasoning at the question-response level. Our research methodology includes experiments with leading LLMs using prompting strategies to ascertain the most effective approach. The results indicate that COT prompting significantly outperforms other methods. In addition, we show that the effectiveness of LLMs in such tasks depends greatly on the nature of the questions posed.
In the past few years, we have witnessed the remarkable success of Text-to-Image (T2I) models and their widespread use on the web. Extensive research on making T2I models produce hyper-realistic images has led to new concerns, such as the generation of Not-Safe-For-Work (NSFW) web content that pollutes the web. To help prevent misuse of T2I models and create a safer web environment for users, features like NSFW filters and post-hoc security checks are used in these models. However, recent work has revealed how these methods can easily fail to prevent misuse; in particular, adversarial attacks on the text and image modalities can easily outplay defensive measures. Moreover, there is currently no robust multimodal NSFW dataset that includes both prompt-image pairs and adversarial examples. This work first proposes a million-scale prompt and image dataset generated using open-source diffusion models. Second, we develop a multimodal defense to distinguish safe and NSFW text and images, which is robust against adversarial attacks and directly alleviates current challenges. Our extensive experiments show that our model performs well compared with existing SOTA NSFW detection methods in terms of accuracy and recall, drastically reducing the Attack Success Rate (ASR) in multimodal adversarial attack scenarios. Code: https://github.com/shahidmuneer/multimodal-nsfw-defense.
Descriptions of real-world entities vary widely across online sources, complicating the task of web data integration. We study the challenge of efficiently discovering syntactic transformations for web tables, where two tables are not initially equi-joinable but can become joinable after some transformations. Detecting these transformations is challenging due to the large space of possible candidates, which grows further with the number of rows. To overcome this, we introduce a cluster-based table sampling method that reduces the number of rows required for transformation generation by capturing underlying data patterns. Our extensive evaluation demonstrates that the proposed method significantly improves the runtime performance of state-of-the-art methods on two real-world web datasets, sourced from webpage content and organizational forums, by one to two orders of magnitude with negligible impact on accuracy.
The effective training and evaluation of retrieval systems require a substantial amount of relevance judgments, which are traditionally collected from human assessors - a process that is both costly and time-consuming. Large Language Models (LLMs) have shown promise in generating relevance labels for search tasks, offering a potential alternative to manual assessments. Current approaches often rely on a single LLM, such as GPT-4, which, despite being effective, is expensive and prone to intra-model biases that can favour systems leveraging similar models. In this work, we introduce JudgeBlender, a framework that employs smaller, open-source models to provide relevance judgments by combining evaluations across multiple LLMs (LLMBlender) or multiple prompts (PromptBlender). Leveraging the LLMJudge benchmark [10], we compare JudgeBlender with state-of-the-art methods and the top performers in the LLMJudge challenge. Our results show that JudgeBlender achieves competitive performance, demonstrating that very large models are often unnecessary for reliable relevance assessments.
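The core blending step in the JudgeBlender abstract above can be pictured as aggregating graded judgments from several small judges (or several prompts to one judge). The sketch below uses hard-coded judge outputs and two common aggregation rules, mean and majority vote, which are illustrative choices rather than necessarily the paper's exact ones.

```python
# Minimal sketch of blending relevance judgments from multiple judges (assumed grades).
from collections import Counter
from statistics import mean

# relevance grades (0-3) returned by three hypothetical judges for one query-doc pair
judgments = {"judge_a": 2, "judge_b": 3, "judge_c": 2}

def blend_mean(scores):
    """Average the graded judgments (useful when grades are ordinal)."""
    return mean(scores.values())

def blend_majority(scores):
    """Pick the most frequent grade, breaking ties in favour of the higher grade."""
    counts = Counter(scores.values())
    best = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return best[0]

print(blend_mean(judgments), blend_majority(judgments))   # 2.33..., 2
```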
Recommender systems are pivotal in Internet social platforms, yet they often cater to users' historical interests, leading to critical issues like echo chambers. To broaden user horizons, proactive recommender systems aim to gradually guide user interest toward a target item beyond historical interests through an influence path, i.e., a sequence of recommended items. As a representative approach, the Influential Recommender System (IRS) designs a sequential model for influence path planning, but it struggles to ensure target item inclusion and path coherence. To address these issues, we leverage the advanced planning capabilities of Large Language Models (LLMs) and propose an LLM-based Influence Path Planning (LLM-IPP) method. LLM-IPP generates coherent and effective influence paths by capturing user interest shifts and item characteristics. We introduce novel evaluation metrics and user simulators to benchmark LLM-IPP against traditional methods. Our experiments demonstrate that LLM-IPP significantly enhances user acceptability and path coherence, outperforming existing approaches.
In this paper, we propose a new ensemble knowledge distillation method for distilling knowledge from LLM-based recommendation (teacher) models into traditional lightweight sequential recommendation (student) models. In particular, instead of using a single teacher model, the averaged prediction from multiple teachers is employed as the soft target for knowledge distillation. Further, only the top-K soft labels of the teachers' output distribution are sampled for distillation, making it more focused on the corresponding high-ranked items. Extensive experiments on three public datasets show the effectiveness of the proposed ensemble knowledge distillation for sequential recommendation (EKD4Rec).
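The distillation objective sketched in the abstract above (average the teachers' distributions, keep only the top-K items, and match the student to that truncated soft target) can be written compactly as follows. The item counts, the value of K, and the exact loss form are assumptions for illustration.

```python
# Numpy sketch of top-K ensemble soft-label distillation (assumed K, item count, loss form).
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_items, K = 1000, 20
teacher_logits = rng.normal(size=(3, n_items))      # three LLM-based teachers
student_logits = rng.normal(size=n_items)           # lightweight sequential student

teacher_probs = softmax(teacher_logits).mean(axis=0)   # ensemble = averaged prediction
top_k = np.argsort(-teacher_probs)[:K]                 # focus on high-ranked items
target = teacher_probs[top_k] / teacher_probs[top_k].sum()

student_probs = softmax(student_logits)[top_k]
student_probs = student_probs / student_probs.sum()

kd_loss = -(target * np.log(student_probs + 1e-12)).sum()   # cross-entropy on the top-K slice
print(kd_loss)
```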
Dynamically capturing topic-target pairs can help explain what triggers sarcasm. Existing approaches ignore event evolution in real-world scenarios. Hot topics reflecting the trends of events provide external knowledge clues, such as topic tags and posts, that change over time. Eliminating the noise and conflicts present in knowledge from different time periods is a key challenge. This paper proposes a Knowledge Alignment method based on a Large Language Model (KA-LLM) for the dynamic detection of topic-target pairs. Guided by knowledge clues, the LLM dynamically adjusts the topological structure of the knowledge graph, enabling the knowledge features to focus more on current events. A hybrid alignment approach is designed to achieve knowledge fusion through feature contrast and reconstruction. The effectiveness of the proposed method is validated on digital and automobile datasets.
Hyperbolic classifiers, which typically view hyperbolic hyperplanes as decision boundaries, are generalizations of Euclidean classifiers to hyperbolic space that are suitable for modeling hierarchical data. In this paper, we present Maximal Separating Poincaré Hyperplane (MaSH), a hyperbolic geometric inductive bias that enhances the generalization capability of hyperbolic classifiers, especially in class-imbalanced settings. MaSH encourages 1) the equiangularity of the ideal points of the Poincaré hyperplanes of all classes; and 2) the equiradiality of these Poincaré hyperplanes. The two properties jointly encourage a maximal separation bias for hyperbolic classifiers. We perform experiments on imbalanced/long-tailed classification, and the results show consistent improvements.
Large Language Models (LLMs) have been integrated into recommendation systems to enhance user behavior comprehension. The Retrieval Augmented Generation (RAG) technique is further incorporated into these systems to retrieve more relevant items and improve system performance. However, existing RAG methods rely primarily on textual semantics and often fail to incorporate the most relevant items, limiting the effectiveness of the systems.
In this paper, we propose Representation learning for retrieval-Augmented Large Language model Recommendation (RALLRec). Specifically, we enhance textual semantics by prompting LLMs to generate more detailed item descriptions, followed by joint representation learning of textual and collaborative semantics, which are extracted by the LLM and recommendation models, respectively. Considering the potential time-varying characteristics of user interest, a simple yet effective reranking method is further introduced to capture the dynamics of user preference. We conducted extensive experiments on three real-world datasets, and the evaluation results validated the effectiveness of our method. Code is made public at https://github.com/JianXu95/RALLRec.
Index structures are a fundamental component of web search and modern database systems, playing a crucial role in enabling efficient data retrieval and management. However, selecting the most appropriate index structure for specific scenarios and workloads is a highly challenging task. This is especially true for learned indexes, where the cumulative distribution characteristics of stored keys not only affect performance but also impact the robustness of the index. In this paper, we propose a key distribution-aware framework called KOE to estimate the performance and robustness of various index structures. The framework predicts two critical metrics, performance and memory consumption, across various data distributions and workloads, integrating both temporal and statistical characteristics through a novel learning-based approach, denoted as TStatE. Experimental results demonstrate that KOE provides accurate, data-driven performance and robustness estimation, facilitates systematic index selection and optimization, and can effectively generalize across different index types and workloads, making it a robust tool for exploring trade-offs in index deployment.
The integration of multi-omic data is pivotal for understanding complex diseases, but its high dimensionality and noise present significant challenges. Graph Neural Networks (GNNs) offer a robust framework for analyzing large-scale signaling pathways and protein-protein interaction networks, yet they face limitations in expressivity when capturing intricate biological relationships. To address this, we propose Graph Sequence Language Model (GraphSeqLM), a framework that enhances GNNs with biological sequence embeddings generated by Large Language Models (LLMs). These embeddings encode structural and biological properties of DNA, RNA, and proteins, augmenting GNNs with enriched features for analyzing sample-specific multi-omic data. By integrating topological, sequence-derived, and biological information, GraphSeqLM demonstrates superior predictive accuracy and outperforms existing methods, paving the way for more effective multi-omic data integration in precision medicine. Code: https://github.com/FuhaiLiAiLab/GraphSeqLM.
In this paper, we present Edu-Values, the first Chinese education values evaluation benchmark, which covers seven core values: professional philosophy, teachers' professional ethics, education laws and regulations, cultural literacy, educational knowledge and skills, basic competencies, and subject knowledge. We meticulously design 1,418 questions, covering multiple-choice, multi-modal question answering, subjective analysis, adversarial prompts, and Chinese traditional culture (short answer) questions. We conduct human-feedback-based automatic evaluation of 21 state-of-the-art (SoTA) LLMs, and highlight three main findings: (1) due to differences in educational culture, Chinese LLMs outperform English LLMs, with Qwen 2 ranking first with a score of 81.37; (2) LLMs often struggle with teachers' professional ethics and professional philosophy; (3) leveraging Edu-Values to build an external knowledge repository for RAG significantly improves LLMs' alignment. This demonstrates the effectiveness of the proposed benchmark.
In this era of open-ended language modeling, where task boundaries are gradually fading, an urgent question emerges: have we made significant progress in text classification with the full benefit of LLMs? To answer this question, we propose RGPT, an adaptive boosting framework tailored to produce a specialized text classification LLM by recurrently ensembling a pool of base learners. The base learners are constructed by adaptively adjusting the distribution of training samples and iteratively fine-tuning LLMs on them. These base learners are then ensembled into a specialized text classification LLM by recurrently incorporating the historical predictions of the previous learners. Through a comprehensive empirical comparison, we show that RGPT significantly outperforms 8 state-of-the-art (SoTA) PLMs and 7 SoTA LLMs on four benchmarks by 2.90% on average. Further evaluation experiments reveal a clear superiority of RGPT over average human classification performance.
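The two mechanisms named in the RGPT abstract above, adaptive re-weighting of training samples and ensembling of base learners, follow the classic boosting pattern. The toy loop below illustrates that pattern with decision stumps standing in for fine-tuned LLM base learners, so it is a schematic analogue under those assumptions, not the RGPT training recipe itself.

```python
# Toy AdaBoost-style loop: re-weight samples toward current mistakes, then ensemble learners.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)
y = np.where(X > 0.2, 1, -1)                     # toy binary classification labels

def fit_stump(X, y, w):
    """Pick the threshold/sign with the lowest weighted error (base-learner stand-in)."""
    best = None
    for thr in np.linspace(X.min(), X.max(), 50):
        for sign in (1, -1):
            pred = sign * np.where(X > thr, 1, -1)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

w = np.ones(len(X)) / len(X)
ensemble = []
for _ in range(5):                               # "recurrently" add base learners
    err, thr, sign = fit_stump(X, y, w)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
    pred = sign * np.where(X > thr, 1, -1)
    w *= np.exp(-alpha * y * pred)               # up-weight currently misclassified samples
    w /= w.sum()
    ensemble.append((alpha, thr, sign))

final = np.sign(sum(a * s * np.where(X > t, 1, -1) for a, t, s in ensemble))
print("train accuracy:", (final == y).mean())
```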
Trusted multi-view classification has been widely applied in safety-critical domains by integrating single-view information and providing reliable uncertainty estimates. However, using a unified uncertainty measure overlooks the differences between various types of uncertainty, which impedes the adoption of appropriate learning strategies for different types of uncertain samples to enhance model performance. To address this issue, we propose a novel uncertainty measure based on Dempster-Shafer Theory, which refines uncertainty into confusion caused by multi-view classification conflicts and ignorance arising from unknown classes. Based on the refined uncertainty measure, we implement a trusted multi-view classification algorithm, where the learning objective consists of the loss terms of prediction error, ignorance and confusion. Through optimizing the hybrid learning objective, we can improve the prediction accuracy and identify the unknown and conflicted samples in multi-view classification. Extensive experiments on authorized multi-view datasets validate the superiority of the proposed method. The codes have been released at https://github.com/muxixi727/RTMC.
It is our great pleasure to welcome you to the two days of workshops at the ACM International World Wide Web Conference 2025 in Sydney, Australia.
This year, 156 workshop organizers have prepared 26 high-quality workshops, marking a significant increase from 16 workshops in the previous year. Of these 26 workshops, 8 will run as full-day events and 18 as half-day sessions.
For the first time, the Workshop Chairs introduced a fast-track submission process, aimed at helping high-quality papers (short or long) that narrowly missed acceptance for the main conference due to minor issues. In this process, workshop organizers acted as meta-reviewers and leveraged existing reviews to expedite decisions.
The recent achievements and availability of Large Language Models have paved the way for a new range of applications and use cases. Pre-trained language models are now being used at scale in many fields from which they were previously absent. More specifically, the progress made by causal generative models has opened the door to using them through textual instructions, a.k.a. prompts. Unfortunately, the performance of these prompts is highly dependent on the exact phrasing used, and practitioners therefore need to adopt fail-retry strategies. Building on the success of the past edition, this second international workshop on prompt engineering gathers practitioners (from both academia and industry) to exchange good practices, optimizations, results, and novel paradigms for the design of efficient and safe prompts.
Natural Language Interfaces (NLIs) backed by Large Language Models (LLMs) are used to interact with visualizations through natural language queries. Using the specific example of 2.5D treemaps, the Delphi tool was recently presented, introducing an interactive 2.5D visualization with an accompanying chat interface, where the LLM can react to user input and adapt the visualization at its own discretion. While Delphi has demonstrated effectiveness, the authors have not included an evaluation of the LLM's performance with respect to its prompt and specific task types. In this study, we systematically evaluate the impact of prompt engineering on Delphi's ability to answer factual questions related to data and visualization. Specifically, we investigate the effect of the Chain-of-Thought prompting technique by employing a questionnaire comprising 40 questions across ten low-level analytic tasks. Our findings aim to refine prompt design methodologies and enhance the usability and effectiveness of NLIs in advanced visualization systems.
This paper introduces Iterative Proof-Driven Development, a novel prompt engineering method designed to take advantage of GPT-4 level models that are able to follow detailed instructions. Iterative Proof-Driven Development is a structured prompt that outlines a test-driven process for solving complex mathematics and programming tasks. The prompt provides an LLM with instructions to decompose a problem into sub-problems, generate test cases, verify results, and integrate sub-solutions. While Chain of Thought (CoT) reasoning remains central to individual steps, the structured process ensures consistency and reliability. This method takes advantage of embedded code interpreters within chatbot user interfaces, enabling immediate execution, debugging, and iterative refinement without external tools. We demonstrate this prompting technique by guiding an LLM through the development of a small CSV processing library, illustrating each step and highlighting how the method ensures well-tested, logically consistent, and reliable outputs. Finally, we explore potential use cases, future directions for automation with agent-based systems (e.g., Langgraph), and the broader implications of prompt engineering for larger software development projects.
Commercial large language models (LLMs) such as GPT-3.5 have emerged as powerful tools for diverse natural language processing (NLP) tasks, yet concerns persist about their reliability in generating factual responses. This study investigates the potential of commercial LLMs such as GPT-3.5 for fact verification, addressing three key research questions: (1) Can GPT-3.5 perform fact verification reliably? (2) How do different prompting strategies affect its performance? (3) What are the common errors it makes with the most effective prompt? Using the benchmark FEVER 1.0 dataset, we designed and evaluated three prompts, with experiments conducted using GPT-3.5 as a representative commercial LLM. Our experiments demonstrate that GPT-3.5 achieves a Label Accuracy (LA) of over 72% with the best-performing prompt, significantly outperforming simpler prompts. A detailed error analysis reveals that approximately 70% of mistakes stem from logical reasoning and contextual misunderstandings. These findings suggest that carefully crafted prompts can substantially enhance the accuracy of LLMs in fact verification tasks, highlighting their potential as supportive tools for applications in sensitive domains.
Over the past few years, the range of use cases involving Large Language Models (LLMs) has grown dramatically. Alongside this growth, many techniques have been established by the community to boost LLM performance. Among them, relying on the fact that LLMs excel at reproducing behaviors, practitioners have been loading their prompts with examples (a.k.a. shots) to guide the LLMs in the right direction given their main instruction. More recently, LLMs have evolved to allow users to define overall roles through two inputs, system and user messages, based on the assumption that, in a sense, system instructions would be dedicated to the admins or designers of chatbot interfaces. In such a setting, where is the best place to put examples so as to improve LLM performance? In this study, we address this research question by systematically trying different shot placements on several popular benchmarks across a large set of LLMs. Our experiments show that it tends to be more beneficial to guide LLMs through their system prompting mechanism, leaving only the questions in their user messages.
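The two shot-placement conditions compared in the abstract above can be expressed as two role-tagged message lists: the same few-shot examples packed into the system message versus into the user message. The example task and wording below are illustrative; any chat-style LLM API that accepts such message lists could consume them.

```python
# Minimal sketch of "shots in the system message" vs "shots in the user message".
SHOTS = [
    ("The plot was dull and predictable.", "negative"),
    ("A warm, funny, beautifully acted film.", "positive"),
]
QUESTION = "Classify the sentiment of: 'An instant classic.'"

shot_text = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in SHOTS)

# Condition A: examples live in the system prompt; the user turn holds only the question
system_shots = [
    {"role": "system", "content": "You are a sentiment classifier.\n" + shot_text},
    {"role": "user", "content": QUESTION},
]

# Condition B: the system prompt stays minimal; the examples precede the question
user_shots = [
    {"role": "system", "content": "You are a sentiment classifier."},
    {"role": "user", "content": shot_text + "\n\n" + QUESTION},
]

for name, messages in [("system-shots", system_shots), ("user-shots", user_shots)]:
    print(name, messages)
```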
In this paper, we propose a natural language interface visualization framework that leverages a visualization grammar to balance the flexibility and stability of generated visualizations. Our system employs a JSON schema for visualization specification and an instruction prompt with semantically distinct sections for task context, visualizations, datasets, and control mechanisms. This design enables robust state management and live prompt adjustments, and ensures clarity, consistency, and reusability in visualization generation.
Chest X-ray imaging plays a critical role in diagnosing chest diseases, making it a cornerstone in clinical and research domains. Automating disease diagnosis and extracting relevant clinical information from chest X-ray reports have become essential for developing AI-driven healthcare systems. While effective, deep learning models require extensive labelled datasets, making the labelling of diseases from radiology reports crucial. Traditionally, rule-based labelling approaches have been employed, but the emergence of large language models (LLMs) has introduced new possibilities through instruction-based prompt engineering. In this study, we explore various prompt engineering techniques, including in-context learning and prompt chaining, to label multilabel disease reports and extract key clinical findings from radiology reports. We conducted ablation studies on both proprietary LLMs (e.g., GPT-4 Turbo, GPT-3.5 Turbo) and publicly available LLMs (e.g., Llama2-7B, Llama2-13B, Llama3-8B, Llama2-70B), comparing their performance in terms of clinical accuracy, privacy, and computational cost. Our findings demonstrate that well-crafted prompts on publicly available and lightweight LLMs can achieve competitive results compared to larger and/or proprietary models, offering a cost-effective and privacy-preserving solution for clinical applications. These results highlight the potential of leveraging advanced prompt engineering to streamline disease labelling and enhance the quality of automated report generation in radiology.
The construction of Knowledge Graphs (KGs) often demands substantial manual effort and domain expertise, especially when converting structured data formats like CSV files into KGs. Recent advancements in Large Language Models (LLMs) offer promising avenues to simplify this process through prompt engineering.
This study investigates various prompting strategies (zero-shot, one-shot, prompt chaining, and a hybrid approach) to enable LLMs to automate the creation of KGs from CSV files. Using a dataset containing quality metrics for 2,026 KGs generated by KGHeartBeat, the paper assesses the performance of GPT-4o, GPT-o1 mini, Claude 3.5 Sonnet, and Gemini 1.5 pro across different prompt configurations. The findings reveal that the hybrid approach consistently produces the most accurate and complete KGs, effectively addressing challenges related to scalability and complexity.
Large Language Models (LLMs) are often used for tasks that involve reasoning about the physical world, like recommending travel itineraries. However, success at these tasks requires the LLM to have been exposed to the relevant places, which is not true for lesser-known or alternatively named places, like Indigenous place names. Our prompting technique handles this issue using Retrieval Augmented Generation, encoding a spatial graph of common places and a mapping to their Indigenous alternatives. Our method improves LLM performance on spatial tasks involving lesser-known place names, thus advancing AI fairness.
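The retrieval step outlined in the abstract above can be pictured as looking up a small mapping from Indigenous place names to their common equivalents (plus simple spatial neighbours) and prepending the matched facts to the prompt. The entries and wording below are illustrative examples, not the authors' knowledge base.

```python
# Minimal sketch of RAG over a place-name mapping (assumed entries and prompt wording).
PLACE_GRAPH = {
    "naarm": {"common_name": "Melbourne", "near": ["Geelong", "Ballarat"]},
    "meanjin": {"common_name": "Brisbane", "near": ["Ipswich", "Logan"]},
}

def retrieve_place_context(query):
    """Return textual facts for any Indigenous place names mentioned in the query."""
    facts = []
    for name, info in PLACE_GRAPH.items():
        if name in query.lower():
            facts.append(
                f"{name.title()} is the Indigenous name for {info['common_name']}; "
                f"nearby places include {', '.join(info['near'])}."
            )
    return "\n".join(facts)

query = "Plan a two-day itinerary around Naarm."
prompt = f"Context:\n{retrieve_place_context(query)}\n\nTask: {query}"
print(prompt)   # the augmented prompt is then sent to the LLM
```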
EdgePrompt is a prompt engineering framework that implements pragmatic guardrails for Large Language Models (LLMs) in K-12 educational settings through structured prompting inspired by neural-symbolic principles. The system addresses educational disparities in Indonesia's Frontier, Outermost, Underdeveloped (3T) regions by enabling offline-capable content safety controls. It combines: (1) content generation with structured constraint templates, (2) assessment processing with layered validation, and (3) lightweight storage for content and result management. The framework implements a multi-stage verification workflow that maintains safety boundaries while preserving model capabilities in connectivity-constrained environments. Initial deployment targets Grade 5 language instruction, demonstrating effective guardrails through structured prompt engineering without formal symbolic reasoning components.
In the era of foundation models and Large Language Models (LLMs), Euclidean space is the de facto geometric setting of our machine learning architectures [9, 31, 33]. However, recent literature has demonstrated that this choice comes with fundamental limitations [2, 12, 15, 29]. To that end, non-Euclidean learning is quickly gaining traction, particularly in web-related applications where complex relationships and structures are prevalent. Non-Euclidean spaces, such as hyperbolic, spherical, and mixed-curvature spaces, have been shown to provide more efficient and effective representations for data with intrinsic geometric properties, including web-related data like social network topology, query-document relationships, and user-item interactions [4, 8, 14, 19, 20, 24, 25, 27, 34, 38, 39, 44, 46, 49]. Integrating foundation models with non-Euclidean geometries has great potential to enhance their ability to capture and model the underlying structures, leading to better performance in search, recommendations, and content understanding. This workshop focuses on the intersection of Non-Euclidean Foundation Models and GEometric Learning (NEGEL), exploring its potential benefits for advancing web-related technologies, as well as open challenges and future directions.
Information diffusion prediction aims to predict which users are more likely to engage in the diffusion path at the next timestamp. Previous methods focus on either capturing dynamic dependencies in Euclidean space or static implicit dependencies in hyperbolic space, lacking joint consideration of both the implicit hierarchical relationships that are hard to capture in Euclidean space and dynamic user preferences over time. In this study, we propose a novel Multi-scale Hyperbolic Diffusion model (MH-Diff), which leverages the unique properties of hyperbolic space to capture the complex hierarchical relationships in both social graphs and diffusion cascades. Specifically, we utilize hyperbolic space to represent user dependencies and implicit tree-like structures, which are difficult to model in traditional Euclidean space. Next, we introduce a dynamic hyperbolic learning module with a multi-scale sequential hypergraph attention network to capture users' dynamic preferences at different time scales. Moreover, we leverage a contextual prediction module to enhance the interaction among user hyperbolic representations in the current cascade. Finally, we evaluate MH-Diff on three real-world datasets and demonstrate that it significantly outperforms state-of-the-art models, validating the superior capability of hyperbolic space in capturing both static global structure and temporal dynamics.
The application of large language models (LLMs) to graph data has attracted a lot of attention recently. LLMs allow us to use deep contextual embeddings from pretrained models in text-attributed graphs, where shallow embeddings are often used for the text attributes of nodes. However, it is still challenging to efficiently encode the graph structure and features into a sequential form for use by LLMs. In addition, the performance of an LLM alone is highly dependent on the structure of the input prompt, which limits its effectiveness as a reliable approach and often requires iterative manual adjustments that can be slow, tedious, and difficult to replicate programmatically. In this paper, we propose GraphiT (Graphs in Text), a framework for encoding graphs into a textual format and optimizing LLM prompts for graph prediction tasks. Here we focus on node classification for text-attributed graphs. We encode the graph data for every node and its neighborhood into a concise text to enable LLMs to better utilize the information in the graph. We then programmatically optimize the LLM prompts using the DSPy framework to automate this step and make it more efficient and reproducible. GraphiT outperforms our LLM-based baselines on three datasets, and we show how the optimization step in GraphiT leads to measurably better results without manual prompt tweaking. We also demonstrate that our graph encoding approach is competitive with other graph encoding methods while being less expensive, because it uses significantly fewer tokens for the same task.
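The "graph to text" encoding step described in the GraphiT abstract above can be illustrated by serializing a node's attributes together with a summary of its neighbours into a short passage for an LLM to classify. The toy citation graph and template below are assumptions, and the DSPy prompt-optimization stage mentioned in the abstract is omitted.

```python
# Minimal sketch of encoding a node and its 1-hop neighbourhood as text (assumed template).
import networkx as nx

G = nx.Graph()
G.add_node("p1", title="Attention Is All You Need", label="NLP")
G.add_node("p2", title="Graph Attention Networks", label="GraphML")
G.add_node("p3", title="BERT: Pre-training of Deep Bidirectional Transformers", label="NLP")
G.add_edges_from([("p1", "p2"), ("p1", "p3")])

def encode_node(G, node, max_neighbors=5):
    """Serialize a node and its 1-hop neighbourhood into a compact text snippet."""
    neigh = list(G.neighbors(node))[:max_neighbors]
    neigh_txt = "; ".join(f"{n}: {G.nodes[n]['title']}" for n in neigh)
    return (
        f"Target node {node}: {G.nodes[node]['title']}.\n"
        f"Neighbors: {neigh_txt}.\n"
        "Question: which category does the target node belong to?"
    )

print(encode_node(G, "p1"))   # this text becomes part of the classification prompt
```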
The next Point-of-Interest (POI) recommendation task has gained significant research interest, focusing on learning users' mobility patterns from sparse check-in data. Existing POI recommendation models face two main constraints. First, most models are based on Euclidean space and struggle to capture the inherent hierarchical structures in historical check-ins. Second, the various transition semantics in both one-hop and sequential transitions cannot be properly utilized to understand user movement trends. To overcome these limitations, we introduce rotation operations in hyperbolic space, enabling the joint modeling of hierarchical structures and various transition semantics to effectively capture complex mobility patterns. Specifically, a novel hyperbolic rotation-based recommendation model, HMST, is developed for next POI recommendation. To our knowledge, this is the first work to explore hyperbolic rotations for next POI recommendation tasks. Extensive experiments on three real-world datasets demonstrate the superiority of our proposed approach over various state-of-the-art baselines.
Link prediction is a widely studied task in Graph Representation Learning (GRL) for modeling relational data. The early theories in GRL were based on the assumption of a symmetric adjacency matrix, reflecting an undirected setting. As a result, much of the following state-of-the-art research has continued to operate under this symmetry assumption, even though real-world data often involve crucial information conveyed through the direction of relationships. This oversight limits the ability of these models to fully capture the complexity of directed interactions. In this paper, we focus on the challenge of directed link prediction by evaluating key heuristics that have been successful in undirected settings. We propose simple but effective adaptations of these heuristics to the directed link prediction task and demonstrate that these modifications produce competitive performance compared to the leading Graph Neural Networks (GNNs) originally designed for undirected graphs. Through an extensive set of experiments, we derive insights that inform the development of a novel framework for directed link prediction, which not only surpasses baseline methods but also outperforms state-of-the-art GNNs on multiple benchmarks.
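One way to picture the heuristic adaptation described in the abstract above is to replace the single undirected common-neighbour count for a candidate edge u -> v with several direction-aware overlaps. The particular four overlaps in the sketch below are an illustrative choice, not necessarily the paper's exact adaptation.

```python
# Small sketch of direction-aware common-neighbour scores for a candidate edge u -> v.
import networkx as nx

G = nx.DiGraph([("a", "b"), ("a", "c"), ("c", "b"), ("d", "a"), ("d", "b")])

def directed_common_neighbors(G, u, v):
    """Return direction-aware common-neighbour counts for the candidate edge u -> v."""
    out_u, in_u = set(G.successors(u)), set(G.predecessors(u))
    out_v, in_v = set(G.successors(v)), set(G.predecessors(v))
    return {
        "out-in": len(out_u & in_v),    # w with u -> w -> v (transitive pattern)
        "out-out": len(out_u & out_v),  # shared targets
        "in-in": len(in_u & in_v),      # shared sources
        "in-out": len(in_u & out_v),    # w with w -> u and v -> w
    }

print(directed_common_neighbors(G, "a", "b"))
# e.g. the "out-in" count captures paths u -> w -> v that an undirected score would blur together
```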
The third edition of the International Workshop on Multimedia Content Analysis for Social Good (MM4SG 2025) was held alongside the prestigious Web Conference 2025. This workshop aimed to tackle the critical challenge of analyzing and moderating multimodal content across digital platforms. In today's era, where diverse forms of multimodal data, including memes, text-embedded images, and fabricated content, can rapidly shape public opinion and influence societal narratives, the demand for sophisticated and ethical content moderation strategies has become increasingly urgent. MM4SG 2025 provided a unique forum for interdisciplinary collaboration, bringing together researchers and practitioners from natural language processing, machine learning, computational social science, and ethics to address these pressing concerns. This paper highlights the key themes, discussions, and contributions of the third edition of the MM4SG workshop, with a particular focus on the intersection of computational linguistics and multimodal content analysis. It also explores future directions for the workshop, including expanding its scope and impact in subsequent editions.
Identifying and mapping research expertise within an organisation or research ecosystem is challenging due to the complexity and fragmentation of available research information. Researchers have diverse expertise across fields such as biology and engineering, making uniform categorisation difficult. Additionally, overlapping skills across disciplines blur boundaries, further complicating classification.
This talk explores the use of Generative AI in transforming disconnected research information into structured Research Capability Maps and Research Knowledge Graphs. Recent developments in GenAI have introduced new capabilities for topic classification, subject mapping, and knowledge mining from vast amounts of literature, datasets, and other research outputs. We examine the concept of Augmented Intelligence, which integrates domain expertise with AI-assisted processes to enhance informed dialogue among researchers. This approach significantly improves the identification of research capabilities within an organisation and the assessment of research capacity within its collaboration networks.
We have implemented this approach as part of the Research Link Australia project within the health sector, involving health research institutions, medical research institutes (MRIs), universities, and not-for-profit organisations. This talk will present findings from this case study, demonstrating how the Augmented Intelligence approach provided a platform for fostering a shared understanding of research capabilities and capacities within this network.
This work builds upon the Research Graph framework, which connects research entities including research publications, research data collections, researcher profiles, organisations, and funded projects. Leveraging this framework and the use of GenAI, this case study shows how fragmented research information can be transformed into a connected graph, yielding insights into research capabilities within institutions and identifying potential research collaboration opportunities in Australia and overseas. This is a demonstration of human-AI collaboration based on a human-in-the-loop intelligence approach.
The rise in popularity of general-purpose large language models (LLMs) raises questions about bias and fairness in the decisions they make. Do these models reflect the biases and stereotypes present in the data they have been pre-trained on? If so, how should we deal with this? In this talk, we first discuss issues of bias in human data, using as an example gender bias in Wikipedia, where we looked at how well represented genders are across different categories of articles. We then move on to issues of bias in Artificial Intelligence (AI), using as an example political bias in LLMs. We show how it is possible to measure the political standing of different LLMs and to control their standing by telling them to impersonate certain profiles. This also reveals some of the stereotypes (e.g., a museum curator is left-wing and a retired army officer is right-wing) embedded into LLMs during pre-training. Finally, we discuss how to explore and manage the bias existing in LLMs, how these models perform when used for sensitive tasks, and how users tend to trust AI agents for low-risk and high-risk tasks.
Knowledge graphs (KG) are becoming increasingly popular, at the heart of Gartner's emerging tech impact radar, especially as a complementary theme for addressing the challenges of recent advances in natural language processing (NLP) with large language models related to responsible AI such as fairness, transparency, accountability, and explainability. Sir Tim Berners-Lee's seminal work ''Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web'', envisioned a World Wide Web where information is not only accessible but structured, allowing machines to interpret data meaningfully. This vision laid the groundwork for technologies such as RDF (Resource Description Framework) and OWL (Web Ontology Language), which serve as foundational components for modern KGs.
However, the process of building domain-specific KGs from extensive text corpora is highly complex and resource-intensive, requiring careful task design for entity recognition, disambiguation, and relationship extraction, among others. These tasks are essential to ensure accuracy and relevance in knowledge representation, but they pose considerable challenges. Addressing these complexities is crucial for the continued advancement and application of KGs across domains.
In this context, the 4th NLP4KGC workshop is held to create a collaborative platform for researchers, practitioners, and industry experts in NLP and KG construction. Following the success and growing community engagement in the previous three editions, this year's workshop aims to deepen collaboration and encourage innovative solutions in this rapidly evolving field. The 4th NLP4KGC will continue to bridge academia and industry, fostering the exchange of insights, tools, and methodologies at the intersection of NLP and KG development. The 4th NLP4KGC will consist of five accepted papers and three keynotes from distinguished speakers.
Knowledge graphs (KGs) have emerged as fundamental infrastructure for organizing and structuring vast amounts of information across various domains. However, building and maintaining large-scale KGs presents significant challenges, not only in terms of initial construction but also in ensuring their long-term sustainability. Traditional approaches to KG development and maintenance often rely on institutional funding, making them vulnerable to resource constraints. In this keynote, I will discuss alternative community-driven models for sustainable knowledge graph construction and maintenance, emphasizing grassroots-level engagement, decentralized curation, and incentivized reuse of knowledge assets. I will discuss how leveraging collaborative contributions, open governance models, and incentive structures (including blockchain-based provenance tracking and reputation systems) can promote long-term commitment without dependence on centralized management.
Through real-world examples, including lessons from the development and maintenance of the FoodKG, I will outline the key technical, social, and economic challenges in constructing a large-scale knowledge graph that remains both useful and adaptable over time. FoodKG was built as a structured representation of over one million recipes, including their ingredients, nutrients, and relationships to various dietary constraints, all expressed in RDF to ensure flexibility and interoperability. By leveraging linked data principles, FoodKG integrates diverse sources, such as nutritional databases, upper-level food ontologies, food composition datasets, and information from recipe websites, creating a rich resource that can support AI-driven applications in the food and health domain. Since its creation, the FoodKG has been applied in various real-world scenarios, demonstrating its value across multiple domains. Some examples include: (1) Personalized diet recommenders that use FoodKG to generate meal plans based on users' dietary needs, restrictions, and preferences; (2) Innovative recipe extraction with ingredient substitutions, making cooking more inclusive for people with allergies or specific health conditions; and (3) Health assistant chatbots that leverage FoodKG to provide context-aware nutritional guidance, helping users make informed food choices.
While FoodKG has proven to be a valuable resource, its long-term impact hinges on its ability to evolve beyond its initial design. Community-driven extensions, such as the integration of new recipes, cultural food traditions, and emerging nutritional insights, can significantly expand its scope and relevance. Fortunately, the inherent flexibility of the underlying semantics allows for the seamless incorporation of new data sources. However, curating knowledge through open contributions presents key challenges, particularly in ensuring accuracy, consistency, and reliability. To address this, we need robust validation workflows that combine automated checks, expert reviews, and crowd-sourced verification. Additionally, incentivizing high-quality contributions is essential for sustaining engagement. Reward mechanisms, such as attribution-based credit systems, reputation scores, and micro-rewards tied to the future usage of knowledge assets, can motivate contributors and establish a self-sustaining ecosystem where knowledge is continuously expanded and refined.
Ultimately, rethinking sustainability models in KG development is not just a technical challenge, but a paradigm shift. By embracing open, collaborative, and incentivized approaches, we can transform knowledge graphs into living, evolving resources: not just static datasets, but dynamic ecosystems that empower researchers, practitioners, and the broader public alike. The future of knowledge graphs depends not just on how we build them, but on how we sustain and grow them together.
This manuscript introduces paper2lkg, a novel Local Knowledge Graph Construction (KGC) pipeline designed to transform individual academic papers into their structured local Knowledge Graph (KG) representations. The pipeline harnesses Large Language Models (LLMs), particularly generative LLMs, to automate key Natural Language Processing (NLP) tasks in KGC. The constructed local KGs can potentially be used to enrich an existing academic KG that lacks detailed local representations of individual papers or further integrated into new academic KGs through Knowledge Graph Alignment (KGA).
This paper introduces a novel methodology for constructing a comprehensive biomedical knowledge graph by applying advanced Natural Language Processing (NLP) techniques. By leveraging Large Language Models (LLMs) and a multifaceted prompt engineering approach, we effectively perform Named Entity Recognition (NER) and Relation Extraction (RE) on biomedical literature, targeting entities such as diseases, drugs, proteins, procedures, and symptoms. Our methodology incorporates eight distinct prompt engineering strategies for NER and a standardized approach for RE, facilitating the extraction of intricate inter-entity relationships. The resulting knowledge graph amalgamates diverse data sources into a unified framework, enabling efficient querying, visualization, and analysis of biomedical information. Furthermore, we present an innovative query processing pipeline that integrates GPT-3.5 turbo with the knowledge graph, allowing users to interact with the graph through natural language. This integrated system empowers the discovery of novel correlations, accelerating scientific research and fostering interdisciplinary collaboration. This represents a substantial contribution to the field of biomedical knowledge graph construction, offering a robust platform for accelerating scientific discovery and informing clinical decision-making.
Recent advances in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) have shown promise in academic question answering. However, existing approaches often fail to fully utilize document structural information and lack diversity in retrieved contexts. This paper presents StructRAG, a structure-aware RAG framework that leverages scholarly knowledge graphs for enhanced question answering. Our framework features three key innovations: (1) an automated knowledge graph construction pipeline based on a Deep Document Model (DDM) that preserves document hierarchical structure, (2) a structure-aware retrieval mechanism that combines semantic relevance with source diversity, and (3) a context-enhanced generation approach that integrates structural metadata for improved answer synthesis. Experimental results on 329 computer science papers demonstrate that StructRAG significantly outperforms a vanilla RAG baseline. While maintaining comparable semantic accuracy (91% vs 90%), our approach achieves substantially higher diversity in generated answers (Distinct-1: 62% vs 52%, Distinct-2: 89% vs 78%) and better answer quality across all metrics, with notable improvements in relevance (29%) and readability (36.5%). These results demonstrate that StructRAG effectively enhances both the diversity and quality of academic question answering.
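The second innovation listed in the StructRAG abstract above, combining semantic relevance with source diversity, can be pictured as a greedy, MMR-style re-ranking over candidate chunks annotated with their document sections. The embeddings, the lambda trade-off weight, and the section metadata in the sketch below are illustrative assumptions, not StructRAG's exact formulation.

```python
# Numpy sketch of relevance-plus-diversity retrieval over section-tagged chunks (assumed scoring rule).
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=16)
chunk_vecs = rng.normal(size=(8, 16))
chunk_sections = ["intro", "intro", "method", "method", "results", "results", "related", "method"]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def diverse_retrieve(k=3, lam=0.7):
    """Greedily pick chunks that are relevant yet come from under-represented sections."""
    selected, used_sections = [], set()
    relevance = [cosine(query, v) for v in chunk_vecs]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(len(chunk_vecs)):
            if i in selected:
                continue
            diversity = 0.0 if chunk_sections[i] in used_sections else 1.0
            score = lam * relevance[i] + (1 - lam) * diversity
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        used_sections.add(chunk_sections[best])
    return selected

print(diverse_retrieve())   # indices of chunks passed to the generator with their metadata
```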
The emergence of Large Language Models (LLMs) has revolutionised natural language processing capabilities. However, despite these advances, effectively optimising prompts for knowledge extraction tasks like Named Entity Recognition (NER) remains challenging. This paper presents a zero-shot automated prompt engineering approach that decomposes the NER task into two phases: entity boundary detection and entity classification. Our method incorporates structured task analysis, automated prompt generation, test case generation, and iterative optimisation, requiring no labelled training examples. This decomposition allows for more precise entity recognition while maintaining efficiency. Through experimentation on the CoNLL-2003 dataset using standard exact-match evaluation metrics, our approach demonstrates improvements over unified methods, achieving a 75.39% F1 score compared to baseline approaches (72.90%). The key contributions include: (1) A structured pipeline for zero-shot automated prompt engineering in NER tasks that addresses the challenges of prompt design and optimisation; (2) A two-phase approach to NER tasks that separates boundary detection from entity classification; and (3) Experimental results demonstrating the effectiveness of our approach compared to existing zero-shot approaches in NER tasks.
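The two-phase decomposition named in the abstract above, boundary detection followed by classification, amounts to chaining two prompts. The wording below and the stand-in model output are illustrative; the paper's automated prompt generation and iterative optimisation are not reproduced here.

```python
# Minimal sketch of a two-phase NER prompt chain (assumed prompt wording and model output).
SENTENCE = "Peter Blackburn will visit the European Commission in Brussels."

boundary_prompt = (
    "List every named-entity span in the sentence, one per line, without types.\n"
    f"Sentence: {SENTENCE}"
)
# spans = call_llm(boundary_prompt).splitlines()      # phase 1 (boundary detection)
spans = ["Peter Blackburn", "European Commission", "Brussels"]   # stand-in output

classification_prompts = [
    (
        "Classify the entity into PER, ORG, LOC, or MISC.\n"
        f"Sentence: {SENTENCE}\nEntity: {span}\nType:"
    )
    for span in spans
]
# types = [call_llm(p).strip() for p in classification_prompts]  # phase 2 (classification)
for p in classification_prompts:
    print(p, end="\n---\n")
```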
Knowledge Graphs (KGs) are crucial in the field of artificial intelligence and are widely used in downstream tasks, such as question-answering (QA). The construction of KGs typically requires significant effort from domain experts. Large Language Models (LLMs) have recently been used for Knowledge Graph Construction (KGC). However, most existing approaches focus on a local perspective, extracting knowledge triplets from individual sentences or documents, and lack a fusion process to combine the knowledge in a global KG. This work introduces Graphusion, a zero-shot KGC framework from free text. It contains three steps: in Step 1, we extract a list of seed entities using topic modeling to guide the final KG toward including the most relevant entities; in Step 2, we conduct candidate triplet extraction using LLMs; in Step 3, we design a novel fusion module that provides a global view of the extracted knowledge, incorporating entity merging, conflict resolution, and novel triplet discovery. Results show that Graphusion achieves scores of 2.92 and 2.37 out of 3 for entity extraction and relation recognition, respectively. Moreover, we showcase how Graphusion can be applied to the Natural Language Processing (NLP) domain and validate it in an educational scenario. Specifically, we introduce TutorQA, a new expert-verified benchmark for QA, comprising six tasks and a total of 1,200 QA pairs. Using the Graphusion-constructed KG, we achieve a significant improvement on the benchmark, for example, a 9.2% accuracy improvement on sub-graph completion.
The rapid evolution of Large Language Models (LLMs) has profoundly impacted social media, transforming how information is generated, disseminated, and analyzed. With their ability to process vast amounts of data, grasp contextual nuances, and engage in human-like dialogue, LLMs present new opportunities and challenges for understanding online interactions. The SocialLLM 2025 workshop builds on the foundation laid by SocialNLP, expanding the scope to explore the capabilities and implications of LLMs in social media research. This year's workshop at TheWebConf 2025 focuses on three pivotal themes: leveraging LLMs for mental health support, enhancing emotion detection in textual interactions, and improving misinformation detection through active learning. The selected papers illustrate cutting-edge advancements in these areas, demonstrating how LLMs can be fine-tuned for therapeutic dialogue generation, assessed for their emotional intelligence, and optimized for misinformation detection with minimal labeled data. These contributions highlight the growing interdisciplinary nature of LLM research, merging insights from natural language processing, social computing, and artificial intelligence ethics. By bringing together researchers and practitioners from diverse backgrounds, SocialLLM 2025 aims to foster meaningful discussions on the opportunities and risks associated with LLM-driven social media applications. The workshop serves as a platform for exploring novel methodologies, addressing ethical concerns, and shaping future directions for responsible AI deployment in social media environments. Through collaborative efforts, we seek to advance the field and ensure that LLMs contribute positively to the digital ecosystem.
Social media has evolved into a platform where individuals can freely disseminate content online. However, this unrestricted environment has unfortunately led to its misuse, with these platforms increasingly being used to circulate inappropriate content, misinformation, and disinformation. Misinformation detection has thus become a crucial task for ensuring social safety. In this work, we propose a framework called Few Labels with Active Learning (FLAL), which leverages a margin-sampling technique within the active learning paradigm. This enables the model to prioritize and learn from the most informative yet uncertain instances, enhancing its performance by iteratively focusing on the examples that contribute the most to learning. As a result, the need for extensive labelling efforts is reduced. We evaluate FLAL across four multilingual benchmark datasets, where it demonstrates competitive results despite utilizing only a few labelled samples. To the best of our knowledge, this is the first study to apply active learning in conjunction with the few-label paradigm to data derived from an Arabic-language context.
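The margin-sampling loop named in the FLAL abstract above follows a standard active-learning pattern: retrain on the labelled pool, then query the unlabelled examples with the smallest gap between their top two predicted probabilities. In the sketch below, synthetic data and a logistic-regression classifier stand in for the misinformation model and multilingual datasets used in the paper.

```python
# Compact sketch of a margin-sampling active-learning loop (assumed data and classifier).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)
labeled = [int(i) for c in range(3) for i in np.where(y == c)[0][:5]]   # 5 seed labels per class
pool = [i for i in range(len(X)) if i not in labeled]

for round_ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])
    top2 = np.sort(probs, axis=1)[:, -2:]
    margins = top2[:, 1] - top2[:, 0]          # small margin = most uncertain prediction
    picked = [pool[i] for i in np.argsort(margins)[:10]]
    labeled += picked                          # simulate sending these to the annotator
    pool = [i for i in pool if i not in picked]
    print(f"round {round_}: accuracy on remaining pool = {clf.score(X[pool], y[pool]):.3f}")
```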
Virtual Mental Health Assistants offer a promising solution to address the growing demand for accessible and scalable mental healthcare. However, existing dialogue generation models struggle with the complexities inherent in mental health conversations. In this paper, we explore the limitations of current Medical Dialogue Generation models by conducting experiments on the large language model ChatMGL. We propose modifications to ChatMGL, including fine-tuning the model on a mental health dataset without proximal policy optimization and incorporating dialogue act labels, to enhance its ability to handle the complex nature of mental health dialogues. Our results demonstrate that these modifications outperform baseline models in terms of ROUGE and BERT scores. Our work suggests that specialized fine-tuning and incorporating domain-specific knowledge can improve the efficacy of virtual assistants for mental health support.
This work investigates the capabilities of large language models (LLMs) in detecting and understanding human emotions through text. Drawing upon emotion models from psychology, we adopt an interdisciplinary perspective that integrates insights from the computational and affective sciences. The main goal is to assess how accurately LLMs can identify emotions expressed in textual interactions and to compare different models on this specific task. This research contributes to broader efforts to enhance human-computer interaction, making artificial intelligence technologies more responsive and sensitive to users' emotional nuances. By employing a methodology that involves comparisons with a state-of-the-art model on the GoEmotions dataset, we aim to gauge LLMs' effectiveness as a system for emotional analysis, paving the way for potential applications in various fields that require a nuanced understanding of human language.
This paper presents the 2nd place solution of the WWW'25 AgentSociety Challenge (Recommendation Track), which aims to advance the development of Large Language Model (LLM)-based agents for personalized recommendation. During this challenge, participants are tasked with building LLM-based agents capable of ranking items according to user preferences. To evaluate the practical capabilities of these agents, the organizer provides a robust evaluation platform that faithfully simulates real-world recommendation scenarios across multiple domains (e.g., Yelp, Goodreads, and Amazon). To tackle this challenge, our team, RecHackers, develops an agent-voting system inspired by the idea that self-consistency across multiple LLM outputs can enhance accuracy. Specifically, our approach follows a prompting-sampling-voting paradigm. We first carefully design prompts for individual agents based on user interactions and item characteristics. Then, we query LLM-based agents multiple times with a controlled temperature to obtain a diverse range of candidate rankings. Lastly, we aggregate these rankings with predefined voting mechanisms, such as majority voting or Borda count, to derive the final ranking list. This simple yet effective solution achieved 2nd place in the WWW'25 AgentSociety Challenge (Recommendation Track).
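The voting stage of the prompting-sampling-voting paradigm described above can be illustrated with a Borda count over several sampled rankings. The candidate items and sampled rankings below are illustrative stand-ins for the agent's temperature-sampled outputs.

```python
# Minimal sketch of Borda-count aggregation over sampled rankings (assumed candidates).
from collections import defaultdict

sampled_rankings = [               # e.g., three LLM calls at a non-zero temperature
    ["item_a", "item_c", "item_b", "item_d"],
    ["item_c", "item_a", "item_d", "item_b"],
    ["item_a", "item_b", "item_c", "item_d"],
]

def borda_aggregate(rankings):
    """Give each item (n_items - position - 1) points per ranking and sort by total score."""
    scores = defaultdict(int)
    n = len(rankings[0])
    for ranking in rankings:
        for pos, item in enumerate(ranking):
            scores[item] += n - pos - 1
    return sorted(scores, key=scores.get, reverse=True)

print(borda_aggregate(sampled_rankings))   # -> ['item_a', 'item_c', 'item_b', 'item_d']
```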
Uncorrectable Errors (UEs) in Dynamic Random Access Memory (DRAM) and High Bandwidth Memory (HBM) have been identified as a major cause of failure in data centers, severely threatening the availability and reliability of cloud services, entire computing clusters, and web-based applications. Forecasting UEs so that preemptive maintenance can be carried out has emerged as a viable strategy for reducing server outages, and several machine-learning-based solutions have been proposed. However, predicting UEs presents several challenges: data noise and extreme class imbalance, as UEs are exceedingly rare among memory events; heterogeneous data sources, as DRAMs in the field come from different manufacturing or architecture platforms; distribution shifts due to hardware aging; and latent factors due to the dynamic access mechanism. These challenges become even more pronounced in web-centric contexts, where UE patterns can vary significantly across applications, including content delivery networks, e-commerce platforms, and real-time analytics. We developed a comprehensive real-world memory error dataset that includes both micro-level and bit-level information. This dataset underpins a two-stage challenge aimed at devising more efficient and generalized solutions for event prediction. We believe the competition and dataset provide a breeding ground to foster discussion and further progress on several important research topics towards real-world ML applications. More details can be found on the competition homepage: https://hwcloud-ras.github.io/SmartMem.github.io/
Few-shot multimodal dialogue intention recognition is a critical challenge in the e-commerce domain. Previous methods have primarily enhanced model classification capabilities through post-training techniques. However, our analysis reveals that training for few-shot multimodal dialogue intention recognition involves two interconnected tasks, leading to a seesaw effect in multi-task learning. This phenomenon is attributed to knowledge interference stemming from the superposition of weight matrix updates during training. To address these challenges, we propose Knowledge-Decoupled Synergetic Learning (KDSL), which mitigates these issues by using smaller models to transform knowledge into interpretable rules while post-training larger models. By facilitating collaboration between the large and small multimodal models for prediction, our approach demonstrates significant improvements. Notably, we achieve outstanding results on two real Taobao datasets, with gains of 6.37% and 6.28% in online weighted F1 scores over the state-of-the-art method, validating the efficacy of our framework.
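One plausible reading of the rule-plus-model collaboration described above is sketched below: an interpretable rule set handles the cases it covers and defers everything else to the post-trained large model. The rules, intent labels, and large_model_predict function are invented for illustration and do not reproduce the KDSL training procedure itself.

RULES = {                                     # hypothetical keyword -> intent rules distilled by a small model
    "refund": "return_request",
    "track": "logistics_query",
}

def large_model_predict(text, image=None):
    """Placeholder for the post-trained large multimodal model."""
    return "product_consultation"

def predict(text, image=None):
    for keyword, intent in RULES.items():     # rules decide the cases they cover
        if keyword in text.lower():
            return intent
    return large_model_predict(text, image)   # otherwise defer to the large model

print(predict("Where can I track my parcel?"))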
Image scene classification and dialogue intent recognition are fundamental tasks in intelligent e-commerce. The former classifies user-uploaded images, while the latter integrates multi-turn dialogue and visual information to extract user intent. However, existing multimodal models struggle with effectively utilizing e-commerce data and generalizing to vertical domains, limiting their practical applicability. To address this, we propose EcomMIR, an E-COMmerce Multimodal Intent Recognition framework based on CN-CLIP and MiniCPM-V. EcomMIR improves model generalization and robustness through multi-level intent data denoising, high-confidence data selection, and hierarchical labeling. Specifically, CN-CLIP employs contrastive learning to align image and text embeddings for efficient scene classification, while MiniCPM-V, a multimodal large language model, deeply integrates textual and visual information to model dialogue context and accurately recognize user intent. Experimental results show that EcomMIR achieves superior performance in both tasks, ranking Top 2 in the WWW2025 Multimodal Intent Recognition for Dialogue Systems challenge, offering an effective solution for multimodal tasks in intelligent e-commerce.
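The scene-classification route used by EcomMIR rests on CLIP-style image-text similarity; the schematic below shows only that scoring logic. The encoders are random-projection stand-ins rather than the real CN-CLIP model, and the scene prompts are assumed examples.

import numpy as np

SCENES = ["product photo", "chat screenshot", "shipping label", "receipt"]  # hypothetical scene prompts
rng = np.random.default_rng(0)

def encode_text(texts):                       # placeholder text encoder
    return rng.normal(size=(len(texts), 512))

def encode_image(image):                      # placeholder image encoder
    return rng.normal(size=(1, 512))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

text_emb = l2_normalize(encode_text(SCENES))
img_emb = l2_normalize(encode_image("user_upload.jpg"))
logits = img_emb @ text_emb.T                 # cosine similarities between image and scene prompts
print(SCENES[int(np.argmax(logits))])         # predicted scene class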
In recent years, Vision Language Models (VLMs) have significantly advanced multi-modal reasoning and generation tasks, offering promising solutions for e-commerce applications such as multi-modal user intent recognition. However, VLMs trained on open-domain datasets often struggle to understand fine-grained e-commerce semantics. This limitation stems primarily from two factors: the scarcity of labeled multi-modal data for training and the lack of domain-specific knowledge needed for effective reasoning. To overcome these problems, we introduce IntentionGPT, a novel framework for multi-modal user intent recognition in e-commerce with limited labeled data. Specifically, we first propose a self-data-augmentation method that generates diverse synthetic samples from a minimal set of labeled seed samples to tackle the data scarcity challenge, together with a new collaborative filtering mechanism that removes noisy samples to enhance data quality. Second, a structure-aware retrieval method is proposed to retrieve domain knowledge and strengthen the model's multi-modal reasoning ability. Third, the model soups method is utilized to fuse models from different training stages, combining their strengths and mitigating potential biases for robust performance. This work systematically addresses both the data scarcity and domain adaptation challenges, and achieves first place in the WWW2025 multi-modal dialogue system intent recognition challenge.
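The model soups step amounts to averaging the weights of fine-tuned checkpoints of the same architecture; the minimal sketch below shows a uniform soup. The checkpoint paths and uniform weighting are assumptions, not IntentionGPT's exact recipe.

import torch

def uniform_soup(state_dicts):
    """Average matching tensors across checkpoints of identical architecture."""
    soup = {}
    for key in state_dicts[0]:
        soup[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return soup

checkpoints = ["stage1.pt", "stage2.pt", "stage3.pt"]      # hypothetical checkpoint paths
state_dicts = [torch.load(p, map_location="cpu") for p in checkpoints]
torch.save(uniform_soup(state_dicts), "soup.pt")           # fused model to evaluate or deploy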
The increasing complexity of e-commerce customer service (CS) scenarios, driven by rapid product evolution and user base growth, presents unique challenges for intent recognition. Unlike generic user-generated content (UGC), CS-UGC exhibits multimodal complexity (e.g., product inquiries, return requests) that traditional methods struggle to address due to (1) limited CS domain-specific knowledge, which hampers the ability of large language models (LLMs) to handle multimodal CS data, and (2) the complexity and noise in CS-UGC, which undermine the robustness of traditional approaches. In this paper, we propose Customer Service Augmented LLM Merge (CuSMer), a novel framework integrating semi-supervised learning with model merging through dual pipelines: (i) pseudo-labeling → fine-tuning → LLM merging and (ii) image augmentation → fine-tuning → LLM merging. These pipelines enhance the robustness of LLMs against noisy, out-of-distribution data while improving their multimodal understanding of CS scenarios. Evaluated on Alibaba's real-world datasets, CuSMer demonstrates superior robustness in noisy environments and enhanced multimodal understanding compared to baseline LLMs. It achieved third place in the first round and first place in the final round of the WWW25 Competition: Multimodal Dialogue System Intent Recognition Challenge, validating its scalability and effectiveness for industrial CS applications.
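A brief sketch of the pseudo-labeling stage in a semi-supervised pipeline like the first one above: a model fine-tuned on the labeled data labels the unlabeled pool, and only confident predictions are kept for the next fine-tuning round. The confidence threshold and the predict_with_confidence placeholder are assumptions, not CuSMer's actual configuration.

CONFIDENCE_THRESHOLD = 0.9                    # assumed cutoff for keeping a pseudo-label

def predict_with_confidence(model, example):
    """Placeholder: return (predicted_intent, confidence) from a fine-tuned LLM."""
    return "return_request", 0.95

def pseudo_label(model, unlabeled):
    kept = []
    for example in unlabeled:
        intent, conf = predict_with_confidence(model, example)
        if conf >= CONFIDENCE_THRESHOLD:      # discard low-confidence predictions as noise
            kept.append({**example, "label": intent})
    return kept                               # merged with gold-labeled data for further fine-tuning

unlabeled_pool = [{"text": "I want to send this jacket back", "image": None}]
print(pseudo_label(model=None, unlabeled=unlabeled_pool))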