2024 | Book

Advances in Knowledge Discovery and Data Mining

28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024, Taipei, Taiwan, May 7–10, 2024, Proceedings, Part V

Edited by: De-Nian Yang, Xing Xie, Vincent S. Tseng, Jian Pei, Jen-Wei Huang, Jerry Chun-Wei Lin

Publisher: Springer Nature Singapore

Book series: Lecture Notes in Computer Science

About this book

The 6-volume set LNAI 14645-14650 constitutes the proceedings of the 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024, which took place in Taipei, Taiwan, during May 7–10, 2024.

The 177 papers presented in these proceedings were carefully reviewed and selected from 720 submissions. They deal with new ideas, original research results, and practical development experiences from all KDD related areas, including data mining, data warehousing, machine learning, artificial intelligence, databases, statistics, knowledge engineering, big data technologies, and foundations.

Table of contents

Frontmatter

Multimedia and Multimodal Data

Frontmatter
Re-thinking Human Activity Recognition with Hierarchy-Aware Label Relationship Modeling

Human Activity Recognition (HAR) has been studied for decades, from data collection, learning models, to post-processing and result interpretations. However, the inherent hierarchy in the activities remains relatively under-explored, despite its significant impact on model performance and interpretation. In this paper, we propose H-HAR, by rethinking the HAR tasks from a fresh perspective by delving into their intricate global label relationships. Rather than building multiple classifiers separately for multi-layered activities, we explore the efficacy of a flat model enhanced with graph-based label relationship modeling. Being hierarchy-aware, the graph-based label modeling enhances the fundamental HAR model, by incorporating intricate label relationships into the model. We validate the proposal with a multi-label classifier on complex human activity data. The results highlight the advantages of the proposal, which can be vertically integrated into advanced HAR models to further enhance their performances.

Jingwei Zuo, Hakim Hacid
Geometrically-Aware Dual Transformer Encoding Visual and Textual Features for Image Captioning

When describing pictures from the point of view of human observers, the tendency is to prioritize eye-catching objects, link them to corresponding labels, and then integrate the results with background information (i.e., nearby objects or locations) to provide context. Most caption generation schemes consider the visual information of objects, while ignoring the corresponding labels, the setting, and/or the spatial relationship between the object and setting. This fails to exploit most of the useful information that the image might otherwise provide. In the current study, we developed a model that adds the object’s tags to supplement the insufficient information in visual object features, and established relationships between objects and background features based on relative and absolute coordinate information. We also proposed an attention architecture to account for all of the features in generating an image description. The effectiveness of the proposed Geometrically-Aware Dual Transformer Encoding Visual and Textual Features (GDVT) is demonstrated in experiment settings with and without pre-training.

Yu-Ling Chang, Hao-Shang Ma, Shiou-Chi Li, Jen-Wei Huang
MHDF: Multi-source Heterogeneous Data Progressive Fusion for Fake News Detection

Social media platforms are inundated with an extensive volume of unverified information, most of which originates from heterogeneous data from a variety of diverse sources, spreading rapidly and widely, thereby posing a significant threat to both individuals and society. An existing challenge in multimodal fake news detection is its limitation to acquiring textual and visual data exclusively from a single source, which leads to a high level of subjectivity in news reporting, incomplete data coverage, and difficulties in adapting to the various forms and sources of fake news. In this paper, we propose a fake news detection model (MHDF) for multi-source heterogeneous data progressive fusion. Our approach begins with gathering, filtering, and cleaning data from multiple sources to create a reliable multi-source multimodal dataset, which involved obtaining reports from diverse perspectives on each event. Subsequently, progressive fusion is achieved by combining features from diverse sources. This is achieved by inputting the features obtained from the textual feature extractor and visual feature extractor into the news textual and visual feature fusion module. We also integrated sentiment features from the text into the model, allowing for multi-level feature extraction. Experimental results and analysis indicate that our approach outperforms other methods.

Yongxin Yu, Ke Ji, Yuan Gao, Zhenxiang Chen, Kun Ma, Jun Wu
Accurate Semi-supervised Automatic Speech Recognition via Multi-hypotheses-Based Curriculum Learning

How can we accurately transcribe speech signals into texts when only a portion of them are annotated? ASR (Automatic Speech Recognition) systems are extensively utilized in many real-world applications including automatic translation systems and transcription services. Due to the exponential growth of available speech data without annotations and the significant costs of manual labeling, semi-supervised ASR approaches have garnered attention. Such scenarios include transcribing videos in streaming platforms, where a vast amount of content is uploaded daily but only a fraction of them are transcribed manually. Previous approaches for semi-supervised ASR use a pseudo labeling scheme to incorporate unlabeled examples during training. Nevertheless, their effectiveness is restricted as they do not take into account the uncertainty linked to the pseudo labels when using them as labels for unlabeled cases. In this paper, we propose MOCA, an accurate framework for semi-supervised ASR. MOCA generates multiple hypotheses for each speech instance to consider the uncertainty of the pseudo label. Furthermore, MOCA considers the various degrees of uncertainty in pseudo labels across speech instances, enabling a robust training on the uncertain dataset. Extensive experiments on real-world speech datasets show that MOCA successfully improves the transcription performance of previous ASR models.

Junghun Kim, Ka Hyun Park, U Kang
MM-PhyQA: Multimodal Physics Question-Answering with Multi-image CoT Prompting

While Large Language Models (LLMs) can achieve human-level performance in various tasks, they continue to face challenges when it comes to effectively tackling multi-step physics reasoning tasks. To identify the shortcomings of existing models and facilitate further research in this area, we curated a novel dataset, MM-PhyQA, which comprises well-constructed, high school-level multimodal physics problems. By evaluating the performance of contemporary LLMs that are publicly available, both with and without the incorporation of multimodal elements in these problems, we aim to shed light on their capabilities. For generating answers for questions consisting of multimodal input (in this case, images and text) we employed Zero-shot prediction using GPT-4 and utilized LLaVA (LLaVA and LLaVA-1.5), the latter of which were fine-tuned on our dataset. For evaluating the performance of LLMs consisting solely of textual input, we tested the performance of the base and fine-tuned versions of the Mistral-7B and LLaMA2-7b models. We also showcased the performance of the novel Multi-Image Chain-of-Thought (MI-CoT) Prompting technique, which when used to train LLaVA-1.5 13b yielded the best results when tested on our dataset, with superior scores in most metrics and the highest accuracy of 71.65% on the test set.

Avinash Anand, Janak Kapuriya, Apoorv Singh, Jay Saraf, Naman Lal, Astha Verma, Rushali Gupta, Rajiv Shah
Adversarial Text Purification: A Large Language Model Approach for Defense

Adversarial purification is a defense mechanism for safeguarding classifiers against adversarial attacks without knowing the type of attacks or training of the classifier. These techniques characterize and eliminate adversarial perturbations from the attacked inputs, aiming to restore purified samples that retain similarity to the initially attacked ones and are correctly classified by the classifier. Due to the inherent challenges associated with characterizing noise perturbations for discrete inputs, adversarial text purification has been relatively unexplored. In this paper, we investigate the effectiveness of adversarial purification methods in defending text classifiers. We propose a novel adversarial text purification that harnesses the generative capabilities of Large Language Models (LLMs) to purify adversarial text without the need to explicitly characterize the discrete noise perturbations. We utilize prompt engineering to exploit LLMs for recovering the purified samples for given adversarial examples such that they are semantically similar and correctly classified. Our proposed method demonstrates remarkable performance over various classifiers, improving their accuracy under the attack by over 65% on average.

Raha Moraffah, Shubh Khandelwal, Amrita Bhattacharjee, Huan Liu
lil’HDoC: An Algorithm for Good Arm Identification Under Small Threshold Gap

Good arm identification (GAI) is a pure-exploration bandit problem in which a single learner outputs an arm as soon as it is identified as a good arm. A good arm is defined as an arm with an expected reward greater than or equal to a given threshold. This paper focuses on the GAI problem under a small threshold gap, which refers to the distance between the expected rewards of arms and the given threshold. We propose a new algorithm called lil’HDoC to significantly improve the total sample complexity of the HDoC algorithm. We demonstrate that the sample complexity of the first $\lambda$ output arm in lil’HDoC is bounded by the original HDoC algorithm, except for one negligible term, when the distance between the expected reward and threshold is small. Extensive experiments confirm that our algorithm outperforms the state-of-the-art algorithms in both synthetic and real-world datasets.

Tzu-Hsien Tsai, Yun-Da Tsai, Shou-De Lin
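
To illustrate the good arm identification setting described in the abstract above, the following minimal sketch labels an arm once a confidence interval around its empirical mean clears the threshold. It assumes rewards in [0, 1] and uses a plain fixed-sample Hoeffding radius, which is a simplification chosen here and not the confidence bound of HDoC or lil’HDoC; the function names are hypothetical.

```python
import math


def hoeffding_radius(n_pulls: int, delta: float) -> float:
    """One-sided Hoeffding radius for rewards in [0, 1] (assumed, not from the paper)."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_pulls))


def classify_arm(mean_reward: float, n_pulls: int, threshold: float,
                 delta: float = 0.05) -> str:
    """Label an arm from its empirical mean and number of pulls."""
    radius = hoeffding_radius(n_pulls, delta)
    if mean_reward - radius >= threshold:
        return "good"       # lower confidence bound reaches the threshold
    if mean_reward + radius < threshold:
        return "bad"        # upper confidence bound stays below the threshold
    return "undecided"      # interval still straddles the threshold


# A small threshold gap (0.52 vs. 0.5) needs many more pulls to resolve.
print(classify_arm(0.52, 100, 0.5))
print(classify_arm(0.52, 100_000, 0.5))
```

The two calls show why a small threshold gap is costly: the same empirical mean stays undecided at 100 pulls and is only resolved with far more samples.
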

Recommender Systems

Frontmatter
ScaleViz: Scaling Visualization Recommendation Models on Large Data

Automated visualization recommendation (Vis-Rec) models help users to derive crucial insights from new datasets. Typically, such automated Vis-Rec models first calculate a large number of statistics from the datasets and then use machine-learning models to score or classify multiple visualizations choices to recommend the most effective ones, as per the statistics. However, state-of-the-art models rely on a very large number of expensive statistics and therefore using such models on large datasets becomes infeasible due to prohibitively large computational time, limiting the effectiveness of such techniques to most large real-world datasets. In this paper, we propose a novel reinforcement-learning (RL) based framework that takes a given Vis-Rec model and a time budget from the user and identifies the best set of input statistics, specifically for a target dataset, that would be most effective while generating accurate enough visual insights. We show the effectiveness of our technique as it enables two state of the art Vis-Rec models to achieve up to 10X speedup in time-to-visualize on four large real-world datasets.

Ghazi Shazan Ahmad, Shubham Agarwal, Subrata Mitra, Ryan Rossi, Manav Doshi, Vibhor Porwal, Syam Manoj Kumar Paila
Collaborative Filtering in Latent Space: A Bayesian Approach for Cold-Start Music Recommendation

Personalized music recommendation technology is effective in helping users discover desired songs. However, accurate recommendations become challenging in cold-start scenarios with newly registered or limited data users. To address the accuracy, diversity, and interpretability challenges in cold-start music recommendation, we propose CFLS, a novel approach that conducts collaborative filtering in the space of latent variables based on the Variational Auto-Encoder (VAE) framework. CFLS replaces the standard normal distribution prior in VAE with a Gaussian process (GP) prior based on user profile information, enabling consideration of user correlations in the latent space. Experimental results on real-world datasets demonstrate the effectiveness and superiority of our proposed method. Visualization techniques are employed to showcase the diversity, interpretability, and user-controllability of the recommendation results achieved by CFLS.

Menglin Kong, Li Fan, Shengze Xu, Xingquan Li, Muzhou Hou, Cong Cao
On Diverse and Precise Recommendations for Small and Medium-Sized Enterprises

Recommender Systems are a popular and common means to extract relevant information for users. Small and medium-sized enterprises make up a large share of the overall amount of business but need to be more frequently considered regarding the demand for recommender systems. Different conditions, such as the small amount of data, lower computational capabilities, and users frequently not possessing an account, require a different and potentially more small-scale recommender system. The requirements regarding quality are similar: High accuracy and high diversity are certainly an advantage. We provide multiple solutions with different variants solely based on information contained in event-based sequences and temporal information. Our code is available at GitHub ( https://github.com/lmu-dbs/DP-Recs ). We conduct experiments on four different datasets with an increasing set of items to show a possible range for scalability. The promising results show the applicability of these grammar-based recommender system variants and leave the final decision on which recommender to choose to the users and their ultimate goals.

Ludwig Zellner, Simon Rauch, Janina Sontheim, Thomas Seidl
HMAR: Hierarchical Masked Attention for Multi-behaviour Recommendation

In the context of recommendation systems, addressing multi-behavioral user interactions has become vital for understanding the evolving user behavior. Recent models utilize techniques like graph neural networks and attention mechanisms for modeling diverse behaviors, but capturing sequential patterns in historical interactions remains challenging. To tackle this, we introduce Hierarchical Masked Attention for multi-behavior recommendation (HMAR). Specifically, our approach applies masked self-attention to items of the same behavior, followed by self-attention across all behaviors. Additionally, we propose historical behavior indicators to encode the historical frequency of each item’s behavior in the input sequence. Furthermore, the HMAR model operates in a multi-task setting, allowing it to learn item behaviors and their associated ranking scores concurrently. Extensive experimental results on four real-world datasets demonstrate that our proposed model outperforms state-of-the-art methods. Our code and datasets are available here ( https://github.com/Shereen-Elsayed/HMAR ).

Shereen Elsayed, Ahmed Rashed, Lars Schmidt-Thieme
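
The first attention stage described in the HMAR abstract, masked self-attention restricted to items of the same behavior, can be sketched as follows. This is a minimal illustration under assumptions made here: the mask is a plain boolean matrix, no causal or padding masking is applied, and the function names are hypothetical rather than taken from the authors' repository.

```python
import math


def same_behavior_mask(behaviors):
    """mask[i][j] is True when positions i and j share the same behavior type."""
    return [[b_i == b_j for b_j in behaviors] for b_i in behaviors]


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, ignoring masked-out positions."""
    exps = [math.exp(s) if keep else 0.0 for s, keep in zip(scores, mask_row)]
    total = sum(exps)  # never zero: each position can always attend to itself
    return [e / total for e in exps]


behaviors = ["view", "cart", "view", "buy"]
mask = same_behavior_mask(behaviors)

# Attention weights for position 0 ("view"): only the two "view" items get mass.
weights = masked_softmax([0.0, 5.0, 0.0, 1.0], mask[0])
print(weights)
```

In this sketch the second stage from the abstract, self-attention across all behaviors, would correspond to using a mask that is True everywhere.
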
Residual Spatio-Temporal Collaborative Networks for Next POI Recommendation

As location-based services become increasingly integrated into users’ lives, the next point-of-interest (POI) recommendation has become a prominent area of research. Currently, many studies are based on Recurrent Neural Networks (RNNs) to model user behavioral dependencies, thereby capturing user interests in POIs. However, these methods lack consideration of discrete check-in information, failing to comprehend the complex motivations behind user behavior. Moreover, the information collaboration efficiency of existing methods is relatively low, making it challenging to effectively incorporate the numerous collaborative signals within the historical trajectory sequences, thus limiting improvements in recommendation performance. To address the issues mentioned above, we propose a novel Residual Spatio-Temporal Collaborative Network (RSTCN) for improved next POI recommendation. Specifically, we design an encoder-decoder architecture based on residual linear layers to better integrate spatio-temporal collaborative signals by feature projection at each time step, thus improving the capture of users’ long-term dependencies. Furthermore, we have devised a skip-learning algorithm to construct discrete data in a skipping manner, aiming to consider potential relationships between discrete check-ins and thus enhance the modeling capacity of short-term user dependencies. Extensive experiments on two real-world datasets demonstrate that our model significantly outperforms state-of-the-art methods.

Yonghao Huang, Pengxiang Lan, Xiaokang Li, Yihao Zhang, Kaibei Li
Conditional Denoising Diffusion for Sequential Recommendation

Contemporary attention-based sequential recommendations often encounter the oversmoothing problem, which generates indistinguishable representations. Although contrastive learning addresses this problem to a degree by actively pushing items apart, we still identify a new ranking plateau issue. This issue manifests as the ranking scores of top retrieved items being too similar, making it challenging for the model to distinguish the most preferred items from such candidates. This leads to a decline in performance, particularly in top-1 metrics. In response to these issues, we present a conditional denoising diffusion model that includes a stepwise diffuser, a sequence encoder, and a cross-attentive conditional denoising decoder. This approach streamlines the optimization and generation process by dividing it into simpler, more tractable sub-steps in a conditional autoregressive manner. Furthermore, we introduce a novel optimization scheme that incorporates both cross-divergence loss and contrastive loss. This new training scheme enables the model to generate high-quality sequence/item representations while preventing representation collapse. We conduct comprehensive experiments on four benchmark datasets, and the superior performance achieved by our model attests to its efficacy. We open-source our code at https://github.com/YuWang-1024/CDDRec.

Yu Wang, Zhiwei Liu, Liangwei Yang, Philip S. Yu
UIPC-MF: User-Item Prototype Connection Matrix Factorization for Explainable Collaborative Filtering

In recent years, prototypes have gained traction as an interpretability concept in the Computer Vision Domain, and have also been explored in Recommender System algorithms. This paper introduces UIPC-MF, an innovative prototype-based matrix factorization technique aimed at offering explainable collaborative filtering recommendations. Within UIPC-MF, both users and items link with prototype sets that encapsulate general collaborative features. UIPC-MF uniquely learns connection weights, highlighting the relationship between user and item prototypes, offering a fresh method for determining the final predicted score beyond the conventional dot product. Comparative results show that UIPC-MF surpasses other prototype-based benchmarks in Hit Ratio and Normalized Discounted Cumulative Gain across three datasets, while enhancing transparency.

Lei Pan, Von-Wun Soo
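
One way to read the idea of connection weights between user and item prototypes is as a bilinear score. The sketch below is an assumption made for illustration, not the scoring function from the paper: `user_sims` and `item_sims` are hypothetical vectors of similarities to user and item prototypes, and `weights[p][q]` is a hypothetical connection weight between user prototype p and item prototype q.

```python
def connection_score(user_sims, item_sims, weights):
    """Sum of user_sims[p] * weights[p][q] * item_sims[q] over all prototype pairs."""
    return sum(
        u * weights[p][q] * v
        for p, u in enumerate(user_sims)
        for q, v in enumerate(item_sims)
    )


user_sims = [1.0, 0.0]   # user is close to user prototype 0 only
item_sims = [0.0, 1.0]   # item is close to item prototype 1 only

identity = [[1.0, 0.0], [0.0, 1.0]]
learned = [[0.0, 0.8], [0.0, 0.0]]   # links user prototype 0 to item prototype 1

print(connection_score(user_sims, item_sims, identity))
print(connection_score(user_sims, item_sims, learned))
```

With identity weights the score reduces to the conventional dot product, while off-diagonal weights let a user prototype contribute to the score through a different item prototype; each term in the sum can be inspected as an explanation.
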
Towards Multi-subsession Conversational Recommendation

Conversational recommendation systems (CRS) could acquire dynamic user preferences towards desired items through multi-round interactive dialogue. Previous CRS works mainly focus on the single conversation (subsession) that the user quits after a successful recommendation, neglecting the common scenario where the user has multiple conversations (multi-subsession) over a short period. Therefore, we propose a novel conversational recommendation scenario named Multi-Subsession Multi-round Conversational Recommendation (MSMCR), where the user would still resort to CRS after several subsessions and might preserve vague interests, and the system would proactively ask attributes to activate user interests in the current subsession. To fill the gap in this new CRS scenario, we devise a novel framework called Multi-Subsession Conversational Recommender with Activation Attributes (MSCAA). Specifically, we first develop a context-aware recommendation module, comprehensively modeling user interests from historical interactions, previous subsessions, and feedback in the current subsession. Furthermore, an attribute selection policy module is proposed to learn a flexible strategy for asking appropriate attributes to elicit user interests. Finally, we design a conversation policy module to manage the above two modules to decide actions between asking and recommending. Extensive experiments on four datasets verify the effectiveness of our MSCAA framework for the proposed MSMCR setting (More details of our work are presented in https://arxiv.org/pdf/2310.13365v1.pdf ).

Yu Ji, Qi Shen, Shixuan Zhu, Hang Yu, Yiming Zhang, Chuan Cui, Zhihua Wei
False Negative Sample Aware Negative Sampling for Recommendation

Negative sampling plays a key role in implicit feedback collaborative filtering. It draws high-quality negative samples from a large number of uninteracted samples. Existing methods primarily focus on hard negative samples, while overlooking the issue of sampling bias introduced by false negative samples. We first experimentally show the adverse effect of false negative samples in hard negative sampling strategies. To mitigate this adverse effect, we propose a method that dynamically identifies and eliminates false negative samples based on dynamic negative sampling (EDNS). Our method integrates a global identification module and a positives-context identification module. The former performs clustering on embeddings of all users and items and deletes uninteracted items that are in the same cluster as the corresponding user as false negative samples. The latter constructs a similarity measure for uninteracted items based on the positive sample set of the user and removes the top-k items ranked by the measure as false negative samples. Finally, we utilize the dynamic negative sampling strategy to build a sample pool from the corrected uninteracted sample set, effectively mitigating the risk of introducing false negative samples. Experiments on three real-world datasets show that our approach significantly outperforms state-of-the-art negative sampling baselines.

Liguo Chen, Zhigang Gong, Hong Xie, Mingqiang Zhou
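
The positives-context identification step described in the abstract above can be sketched as follows. The abstract does not specify the similarity measure, so this sketch assumes the mean cosine similarity between an uninteracted item and the user's positive items; the function names and the toy embeddings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_false_negatives(item_emb, positives, k):
    """Drop the k uninteracted items most similar to the user's positives.

    Assumes `positives` is a non-empty set of item ids present in `item_emb`.
    """
    candidates = [i for i in item_emb if i not in positives]

    def score(item):
        sims = [cosine(item_emb[item], item_emb[p]) for p in positives]
        return sum(sims) / len(sims)

    suspected = set(sorted(candidates, key=score, reverse=True)[:k])
    return [i for i in candidates if i not in suspected]


item_emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
pool = filter_false_negatives(item_emb, positives={"a"}, k=1)
print(pool)
```

Item "b" is nearly identical to the positive item "a", so it is treated as a likely false negative and excluded from the pool that a negative sampler would later draw from.
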
Multi-sourced Integrated Ranking with Exposure Fairness

The integrated ranking system is one of the critical components of industrial recommendation platforms. An integrated ranking system is expected to generate a mix of heterogeneous items from multiple upstream sources. Two main challenges need to be solved in this process, namely, (i) Utility-fairness tradeoff: an integrated ranking system is required to balance the overall platform’s utility and exposure fairness among different sources; (ii) Information utilization from upstream sources: each source sequence has been carefully arranged by its provider, so how to efficiently utilize the source sequential information is important and should be carefully considered by the integrated ranking system. Existing methods generally cannot address these two challenges well. In this paper, we propose an integrated ranking model called Multi-sourced Constrained Ranking (MSCRank). It is a dual RNN-based model managing the utility-fairness tradeoff with multi-task learning, and capturing information in source sequences with a novel MA-GRU cell. We compare MSCRank with various baselines on public and industrial datasets, and MSCRank achieves state-of-the-art performance on both utility and fairness. An online A/B test further validates the effectiveness of MSCRank.

Yifan Liu, Weiwen Liu, Wei Xia, Jieming Zhu, Weinan Zhang, Zhenhua Dong, Yang Wang, Ruiming Tang, Rui Zhang, Yong Yu
Soft Contrastive Learning for Implicit Feedback Recommendations

Collaborative filtering (CF) plays a crucial role in the development of recommendations. Most CF research focuses on implicit feedback due to its accessibility, but deriving user preferences from such feedback is challenging given the inherent noise in interactions. Existing works primarily employ unobserved interactions as negative samples, leading to a critical noisy-label problem. In this study, we propose SCLRec (Soft Contrastive Learning for Recommendations), a novel method to alleviate the noise issue in implicit recommendations. To this end, we first construct a similarity matrix based on user and item embeddings along with item popularity information. Subsequently, to leverage information from nearby samples, we employ entropy optimal transport to obtain the matching matrix from the similarity matrix. The matching matrix provides additional supervisory signals that uncover matching relationships of unobserved user-item interactions, thereby mitigating the noise issue. Finally, we treat the matching matrix as soft targets, and use them to train the model via contrastive learning loss. Thus, we term it soft contrastive learning, which combines the denoising capability of soft targets with the representational strength of contrastive learning to enhance implicit recommendations. Extensive experiments on three public datasets demonstrate that SCLRec achieves consistent performance improvements compared to state-of-the-art CF methods.

Zhen-Hua Zhuang, Lijun Zhang
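
The step of obtaining a matching matrix from a similarity matrix via entropic optimal transport is commonly solved with Sinkhorn iterations. The sketch below is a generic Sinkhorn routine, not the authors' implementation; the uniform marginals, the regularization strength `epsilon`, and the iteration count are assumptions made here.

```python
import math


def sinkhorn(similarity, epsilon=0.5, n_iters=200):
    """Entropy-regularized matching matrix with uniform row and column marginals."""
    n, m = len(similarity), len(similarity[0])
    # Higher similarity -> larger kernel value -> more transported mass.
    kernel = [[math.exp(s / epsilon) for s in row] for row in similarity]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


similarity = [[1.0, 0.2],
              [0.1, 0.9]]
matching = sinkhorn(similarity)
for row in matching:
    print([round(p, 3) for p in row])
```

Each row of the resulting matrix can be normalized and used as a soft target: pairs with high similarity receive most of the mass, but other pairs keep a small non-zero share.
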
Dual-Graph Convolutional Network and Dual-View Fusion for Group Recommendation

Group recommendation constitutes a burgeoning research focus in recommendation systems. Despite a multitude of approaches achieving satisfactory outcomes, they still fail to address two major challenges: 1) these methods confine themselves to capturing user preferences exclusively within groups, neglecting to consider user collaborative signals beyond groups, which reveal users’ potential interests; 2) they do not sufficiently take into account the impact of multiple factors on group decision-making, such as individual expertise and influence, and the group’s general preferences. To tackle these challenges, we propose a new model named DDGR (Dual-Graph Convolutional Network and Dual-View Fusion for Group Recommendation), designed to capture representations addressing two aspects: member preferences and group preferences. DDGR consists of two components: 1) a dual-graph convolutional network that combines the benefits of both hypergraphs and graphs to fully explore member potential interests and collaborative signals; 2) a dual-view fusion strategy that accurately simulates the group negotiation process to model the impact of multiple factors from member and group view, which can obtain semantically rich group representations. Thorough validation on two real-world datasets indicates that our model significantly surpasses state-of-the-art methods.

Chenyang Zhou, Guobing Zou, Shengxiang Hu, Hehe Lv, Liangrui Wu, Bofeng Zhang
TripleS: A Subsidy-Supported Storage for Electricity with Self-financing Management System

In this paper, we propose a Subsidy-Supported Storage (also called TripleS) to assist grid management. Q-learning algorithms first determine the origin subsidies, and the proposed self-financing mechanism then balances the expected costs and gains, and generates the final subsidies. During market equilibrium, energy storage is fully charged when there is excess electricity and discharged when there is insufficient electricity. The electricity market then calculates the cash flow of the subsidies, and the remaining cash is used to make up for the self-discharge loss of the storage units. Experimental results demonstrate the effectiveness of the proposed TripleS in maintaining grid stability.

Jia-Hao Syu, Rafal Cupek, Chao-Chun Chen, Jerry Chun-Wei Lin
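
The storage behavior described in the TripleS abstract (charge on excess electricity, discharge on shortage, with self-discharge losses) can be illustrated with a toy simulation. This sketch leaves out the Q-learning and subsidy mechanisms entirely; the capacity, the self-discharge rate, and the order of operations within a time step are assumptions made here.

```python
def simulate_storage(net_supply, capacity, self_discharge=0.01):
    """Track the stored energy over time.

    net_supply[t] > 0 means excess electricity, < 0 means a shortage.
    """
    level = 0.0
    levels = []
    for surplus in net_supply:
        level *= 1.0 - self_discharge          # loss while energy sits in storage
        if surplus > 0:
            level = min(capacity, level + surplus)   # charge, capped at capacity
        else:
            level = max(0.0, level + surplus)        # discharge, floored at empty
        levels.append(level)
    return levels


# Without self-discharge the storage fills, then drains during the shortage.
print(simulate_storage([5.0, 10.0, -3.0, -20.0], capacity=10.0, self_discharge=0.0))
```

With a non-zero `self_discharge`, part of the stored energy is lost at every step, which is the loss that the abstract says the remaining subsidy cash is used to cover.
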

Spatio-temporal Data

Frontmatter
Mask Adaptive Spatial-Temporal Recurrent Neural Network for Traffic Forecasting

How to model the spatial-temporal graph is a crucial problem for the accuracy of traffic forecasting. Existing GNN-based work mostly captures spatial dependencies by using a pre-defined graph for close nodes and a self-adaptive graph for distant nodes. However, the pre-defined graphs cannot accurately represent the genuine spatial dependency due to the complexity of traffic conditions. Furthermore, existing methods cannot effectively capture the spatial heterogeneity and temporal periodicity in traffic data. Additionally, small errors in each time step will greatly amplify in the long sequence prediction for a sequence-to-sequence model. To address these issues, we propose a novel framework, MASTRNN, for traffic forecasting. Firstly, a novel mask-adaptive matrix is proposed to enhance the pre-defined graph, which is learned through node embedding. Secondly, we assign identity embeddings to each node and each time step in order to capture the spatial heterogeneity and temporal periodicity, respectively. Thirdly, a multi-head attention layer is employed between the encoder and decoder to alleviate the problem of error propagation. Experimental results on three real-world traffic network datasets demonstrate that MASTRNN outperforms the state-of-the-art baselines.

Xingbang Hu, Shuo Zhang, Wenbo Zhang, Hejiao Huang
Distributional Kernel: An Effective and Efficient Means for Trajectory Retrieval

In this paper, we propose a new and powerful way to represent trajectories and measure the distance between them using a distributional kernel. Our method has two unique properties: (i) the identity property which ensures that dissimilar trajectories have no short distances, and (ii) a runtime orders of magnitude faster than that of existing distance measures. An extensive evaluation on several large real-world trajectory datasets confirms that our method is more effective and efficient in trajectory retrieval tasks than traditional and deep learning-based distance measures.

Yuanyi Shang, Kai Ming Ting, Zijing Wang, Yufan Wang
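
The idea of comparing trajectories through a distributional kernel can be sketched by treating each trajectory as a set of points and comparing the sets via their kernel mean embeddings. The sketch below uses a Gaussian point kernel as a stand-in, which is an assumption made here and may differ from the kernel used in the paper; it also computes all point pairs directly, so it does not reflect the runtime reported in the abstract. The function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two points (stand-in point kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def distributional_kernel(traj_p, traj_q, bandwidth=1.0):
    """Average point-to-point kernel value between two trajectories."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in traj_p for y in traj_q)
    return total / (len(traj_p) * len(traj_q))


def kernel_distance(traj_p, traj_q, bandwidth=1.0):
    """Distance between the kernel mean embeddings of two trajectories."""
    sq = (distributional_kernel(traj_p, traj_p, bandwidth)
          + distributional_kernel(traj_q, traj_q, bandwidth)
          - 2.0 * distributional_kernel(traj_p, traj_q, bandwidth))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


route_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
route_b = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]   # same shape, shifted north

print(kernel_distance(route_a, route_a))
print(kernel_distance(route_a, route_b))
```

Note that this sketch ignores the order of points along a trajectory; it only compares where the points are distributed in space.
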
Multi-agent Reinforcement Learning for Online Placement of Mobile EV Charging Stations

As global interest shifts toward sustainable transportation with the proliferation of electric vehicles (EVs), the demand for an efficient, real-time, and robust charging infrastructure becomes increasingly pronounced. This paper introduces an approach to address the imbalance between the surging EV demand and the existing charging infrastructure: the concept of Mobile Charging Stations (MCSs). The research develops an algorithm for the dynamic placement of MCSs to significantly reduce the waiting time for EV owners. The core of this research is the Two-stage Placement and Management with Multi-Agent Reinforcement Learning (2PM-MARL) for a dynamic balancing of charging demand and supply. The complexity of the problem is elaborated by showing the NP-hard nature of the MCS placement issue through a relation to the Uncapacitated Facility Location Problem (UFLP), underscoring the computational challenges and emphasizing the need for intelligent real-time solutions. Our framework is validated through comprehensive experiments using real-world charging session data. The results exhibit significant reductions in the waiting time, suggesting the potential practicality and efficiency of our proposed model.

Lo Pang-Yun Ting, Chi-Chun Lin, Shih-Hsun Lin, Yu-Lin Chu, Kun-Ta Chuang
Localization Through Deep Learning in New and Low Sampling Rate Environments

Source localization in wireless networks is essential for spectrum utilization optimization. Traditional methods often require extensive transmitter information while existing deep learning approaches perform poorly in new and low sampling rate environments. We introduce LocNet, a deep learning approach that overcomes these limitations using a compact UNet-like architecture incorporating environmental maps. Unlike other deep learning strategies, LocNet adopts loss functions designed explicitly for imbalanced data, moving beyond the conventional mean-square error loss. Our comparative analysis reveals that LocNet outperforms other deep learning models by more than a factor of two. This advancement underscores LocNet’s suitability for real-world deployment across diverse operational contexts.

Thanh Dat Le, Yan Huang
MPRG: A Method for Parallel Road Generation Based on Trajectories of Multiple Types of Vehicles

Accurate and up-to-date digital road maps are the foundation of many applications, such as navigation and autonomous driving. Recently, the ubiquity of GPS devices in vehicular systems has led to an unprecedented amount of vehicle sensing data for map inference. Existing trajectory-based map generation methods struggle to generate parallel roads accurately where GPS positioning errors are large and the sampling frequency is low. In this paper, we propose a novel method, MPRG, to discover parallel roads based on the differences between free and fixed trajectories from different types of vehicles. This method can serve as a plugin for any existing map generation method. MPRG extracts highly discriminative features by utilizing the spatial distribution and regional correlation information of trajectories from different vehicle types. Then, the multidimensional features are fed into an SVM classification model, which is suitable for small samples, to identify and generate the parallel roads. We apply MPRG to three advanced road generation methods using GPS data from Shenzhen. The results show that MPRG significantly improves the performance of parallel road generation.

Bingru Han, Juanjuan Zhao, Xitong Gao, Kejiang Ye, Fan Zhang
GSPM: An Early Detection Approach to Sudden Abnormal Large Outflow in a Metro System

Early detection of Sudden Abnormal Large Outflow (SALO) aims to detect abnormally large outflows and locate the station where real-time outflow significantly exceeds expectations. SALO serves as a crucial indicator for city administration to identify emerging crowd gathering events as early as possible. Existing solutions do not work well for SALO prediction because they do not model the dynamic gathering trend of passenger flows in SALO instances, which are characterized by strong randomness and low probability. In this paper, we propose a novel Gathering Score based Prediction Method, called GSPM, for SALO prediction. GSPM introduces a gathering score to quantify the dynamic gathering trend of abnormal online flows, limits the SALO location to a few candidate stations, and locates it using a utility-theory-based model. This method is built on key data-driven insights, such as the obvious increase in online flows before SALO occurrences and the observation that passengers are more inclined to gather near stations. We evaluate GSPM with extensive experiments based on smart card data collected by an Automatic Fare Collection system over two years. The results demonstrate that GSPM outperforms state-of-the-art baselines.

Li Sun, Juanjuan Zhao, Fan Zhang, Kejiang Ye
FMSYS: Fine-Grained Passenger Flow Monitoring in a Large-Scale Metro System Based on AFC Smart Card Data

In this paper, we investigate the real-time fine-grained passenger flows in a complex metro system. Our primary focus is on addressing crucial questions, such as determining the number of passengers on a moving train and in specific station areas (e.g., access channel, transfer channel, platform). These insights are essential for effective traffic management and ensuring public safety. Existing visual analysis methods face limitations in achieving comprehensive network coverage due to deployment costs. To overcome this challenge, we introduce FMSYS, a cloud-based analysis system leveraging smart card data for efficient and reliable real-time passenger flow predictions. FMSYS identifies each passenger’s travel patterns and classifies passengers into two groups: regular (D-group) and stochastic (ND-group). It models stochastic movement of passengers using a state transition process at the group level and employs a combined approach of KNN and Gaussian Process Regression for dynamic state transition prediction. Empirical analysis, based on six months of smart card transactions in Shenzhen, China, validates the effectiveness of FMSYS.

Li Sun, Juanjuan Zhao, Fan Zhang, Rui Zhang, Kejiang Ye
Enhanced HMM Map Matching Model Based on Multiple Type Trajectories

Map matching (MM) aims to align a GPS trajectory with the actual roads on a map that a vehicle passes through, and is essential for applications like trajectory search and route planning. The Hidden Markov Model (HMM) is commonly employed for online MM due to its interpretability and suitability for low GPS sampling rates. However, in complex urban areas with notable GPS drift, existing HMM methods face efficiency and accuracy challenges due to the use of a uniform road search radius and an imprecise understanding of real-time road conditions. This paper proposes an improved HMM method using multiple trajectory types, based on the following key ideas. Vehicle trajectories can be divided into two types: fixed trajectories (e.g., buses) and free trajectories (e.g., taxis and private cars). The relatively accurate information in fixed trajectories helps measure the error distribution and the road conditions more accurately. The novelty of our approach lies in the following aspects: i) using fixed bus trajectories to estimate the region-specific GPS error distribution, which optimizes observation probabilities and reduces candidate road search costs; ii) utilizing real-time fixed trajectories for accurate, real-time road state estimation, which enhances the dynamic state transition probabilities in the HMM. Empirical analysis, based on real bus and taxi trajectories in Shenzhen over half a year, demonstrates that our method outperforms existing methods in terms of map matching efficiency and accuracy.

Yuchen Song, Juanjuan Zhao, Xitong Gao, Fan Zhang, Kejiang Ye
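
To make the HMM formulation in the abstract above more concrete, here is a minimal Viterbi sketch of generic HMM map matching. It is not the authors' model. It assumes a Gaussian observation probability whose spread `sigma` is estimated per region from offsets of fixed-route (bus) GPS fixes, and a transition score that compares GPS displacement with straight-line displacement between candidate points, used here as a stand-in for route distance. All function names, `beta`, and the sample coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Candidate = Tuple[str, Point]  # (road id, GPS point snapped onto that road)


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def estimate_sigma(offsets: Sequence[float]) -> float:
    """Root-mean-square offset of fixed-route fixes from their known route
    (assumed here as a simple per-region GPS error estimate)."""
    return math.sqrt(sum(o * o for o in offsets) / len(offsets))


def log_emission(gps: Point, road_pt: Point, sigma: float) -> float:
    d = dist(gps, road_pt)
    return -0.5 * (d / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_transition(g0: Point, g1: Point, r0: Point, r1: Point,
                   beta: float) -> float:
    return -abs(dist(g0, g1) - dist(r0, r1)) / beta


def viterbi_match(gps: List[Point], candidates: List[List[Candidate]],
                  sigma: float, beta: float = 5.0) -> List[str]:
    """Return the most likely road id for each GPS point."""
    score = [log_emission(gps[0], pt, sigma) for _, pt in candidates[0]]
    back: List[List[int]] = []
    for t in range(1, len(gps)):
        new_score, pointers = [], []
        for _, pt in candidates[t]:
            options = [
                score[i] + log_transition(gps[t - 1], gps[t], prev_pt, pt, beta)
                for i, (_, prev_pt) in enumerate(candidates[t - 1])
            ]
            best_i = max(range(len(options)), key=options.__getitem__)
            new_score.append(options[best_i] + log_emission(gps[t], pt, sigma))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    idx = max(range(len(score)), key=score.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i][0] for t, i in enumerate(path)]


# Two parallel roads: A along y = 0 and B along y = 10 (made-up data).
gps_track = [(0.0, 2.0), (10.0, 3.0), (20.0, 2.0)]
cands = [[("A", (x, 0.0)), ("B", (x, 10.0))] for x, _ in gps_track]
region_sigma = estimate_sigma([3.0, 4.0, 5.0])
matched = viterbi_match(gps_track, cands, region_sigma)
```

A region with larger estimated `sigma` tolerates larger GPS offsets in the observation term, which is one plausible way a region-specific error estimate could feed into the matching.
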
A Multimodal and Multitask Approach for Adaptive Geospatial Region Embeddings

Geospatial region embeddings are vital in developing predictive models tailored to urban environments. Such models enable critical applications, including crime rate prediction and land usage classification. However, state-of-the-art methods typically generate embeddings based on fixed administrative regions. These regions may not always align with specific tasks or areas of user interest. Creating fine-grained embeddings tailored to specific tasks and regions of user interest is labor-intensive and requires substantial resources. In this paper, we propose MAGRE – a novel approach that generates fine-granular adaptive geospatial region embeddings by leveraging multimodal and multitask learning. The embeddings generated by MAGRE can be flexibly aggregated to suit various region boundaries, rendering them effective in diverse urban applications. Our experimental results demonstrate that MAGRE’s embeddings outperform state-of-the-art embedding baselines, resulting in a 25.73% reduction in root mean squared error for crime rate prediction and a 19.08% reduction for check-in count prediction.

Rajjat Dadwal, Ran Yu, Elena Demidova
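
The abstract above states that fine-grained embeddings can be aggregated to fit different region boundaries, without specifying how. One simple possibility, assumed here purely for illustration, is mean pooling over the cells that fall inside a region. The function name `aggregate_region`, the cell ids, and the vectors are hypothetical.

```python
from typing import Dict, List, Sequence


def aggregate_region(cell_embeddings: Dict[str, Sequence[float]],
                     cell_ids: Sequence[str]) -> List[float]:
    """Mean-pool the embeddings of the cells that make up one region
    (an assumed aggregation, chosen for simplicity)."""
    if not cell_ids:
        raise ValueError("a region needs at least one cell")
    dim = len(cell_embeddings[cell_ids[0]])
    totals = [0.0] * dim
    for cid in cell_ids:
        for k, value in enumerate(cell_embeddings[cid]):
            totals[k] += value
    return [t / len(cell_ids) for t in totals]


# Made-up 2-dimensional cell embeddings.
cells = {"c1": [1.0, 0.0], "c2": [3.0, 2.0], "c3": [5.0, 4.0]}

# The same cells can be regrouped under different region boundaries.
district_a = aggregate_region(cells, ["c1", "c2"])
district_b = aggregate_region(cells, ["c2", "c3"])
```

The point of the sketch is only that cell-level embeddings decouple the representation from any fixed administrative boundary; a weighted or learned aggregation would fit the same interface.
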
Attention Mechanism Based Multi-task Learning Framework for Transportation Time Prediction

Transportation time prediction (TIP) for a truck is one of the key tasks for supporting services in bulk logistics, such as route planning. However, TIP is challenging because it involves both travel time prediction and dwell time prediction, which are influenced by various complex factors. In addition, travel time prediction and dwell time prediction constrain each other. In this paper, we propose an Attention Mechanism based Multi-Task prediction framework, called AMP, consisting of travel pattern learning, stay pattern learning, and transportation time modeling. To address the low prediction performance caused by uncertain dwell time and the mutually constrained effects between travel time and dwell time, we put forward a stay pattern learning module based on a transformer and a multi-factor attention mechanism. Furthermore, we design a multi-task learning based prediction module embedded with a mutual cross-attention mechanism to enhance overall prediction performance. Experimental results on a large-scale logistics data set demonstrate that our proposal reduces MAPE by an average of 9.2%, MAE by an average of 19.5%, and RMSE by an average of 23.0% compared to the baselines.

Miaomiao Yang, Tao Wu, Jiali Mao, Kaixuan Zhu, Aoying Zhou
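
For readers unfamiliar with the three error metrics reported in the abstract above, the following sketch gives their standard definitions. The `relative_reduction` helper assumes the reported percentages are relative improvements over a baseline, which the abstract does not state explicitly. The sample values are made up.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.
    Undefined when any true value is zero."""
    return 100.0 * sum(
        abs((t - p) / t) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is lower than `baseline`
    (assumed interpretation of a reported reduction)."""
    return 100.0 * (baseline - ours) / baseline


# Made-up transportation times in minutes.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
example_mae = mae(actual, predicted)
example_rmse = rmse(actual, predicted)
example_mape = mape(actual, predicted)
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE normalizes by the true value, so the three metrics can rank models differently.
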
MSTAN: A Multi-view Spatio-Temporal Aggregation Network Learning Irregular Interval User Activities for Fraud Detection

Discovering fraud patterns from numerous user activities is crucial for fraud detection. However, three factors make this task quite challenging. First, previous research usually utilizes just one of the two forms of user activity, namely sequential behavior and interaction relationship, leaving much information unused. Second, nearly all works study only a single view of user activities, but fraud patterns often span multiple views. Third, most existing models can only handle regular time intervals, while in reality user activities occur at irregular time intervals. To effectively discover fraud patterns from user activities, this paper proposes MSTAN (Multi-view Spatio-Temporal Aggregation Network) for fraud detection. It addresses the above problems through three phases: (1) In short-term aggregation, SIFB (Sequential behavior and Interaction relationship Fusion Block) is employed to integrate sequential behavior and interaction relationship. (2) In view aggregation, a 2-dimensional multi-view user activity embedding is obtained for simultaneously mining multiple views. (3) In long-term aggregation, CTLSTM (Convolutional Time LSTM) is designed to deal with irregular time intervals. Experiments on two real-world datasets demonstrate that our model outperforms the comparison methods.

Wenbo Zhang, Shuo Zhang, Xingbang Hu, Hejiao Huang
Backmatter
Metadata
Title
Advances in Knowledge Discovery and Data Mining
edited by
De-Nian Yang
Xing Xie
Vincent S. Tseng
Jian Pei
Jen-Wei Huang
Jerry Chun-Wei Lin
Copyright Year
2024
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-9722-62-4
Print ISBN
978-981-9722-64-8
DOI
https://doi.org/10.1007/978-981-97-2262-4
