nach oben

2024 | Buch

Kapitel lesen Erstes Kapitel lesen

Advances in Knowledge Discovery and Data Mining

28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024, Taipei, Taiwan, May 7–10, 2024, Proceedings, Part II

herausgegeben von: De-Nian Yang, Xing Xie, Vincent S. Tseng, Jian Pei, Jen-Wei Huang, Jerry Chun-Wei Lin

Verlag: Springer Nature Singapore

Buchreihe : Lecture Notes in Computer Science

Enthalten in: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

Einloggen, um Zugang zu erhalten

Über dieses Buch

The 6-volume set LNAI 14645-14650 constitutes the proceedings of the 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024, which took place in Taipei, Taiwan, during May 7–10, 2024.

The 177 papers presented in these proceedings were carefully reviewed and selected from 720 submissions. They deal with new ideas, original research results, and practical development experiences from all KDD related areas, including data mining, data warehousing, machine learning, artificial intelligence, databases, statistics, knowledge engineering, big data technologies, and foundations.

Inhaltsverzeichnis

Frontmatter

Deep Learning

Frontmatter

AdaPQ: Adaptive Exploration Product Quantization with Adversary-Aware Block Size Selection Toward Compression Efficiency

Product Quantization (PQ) has received an increasing research attention due to the effectiveness on bit-width compression for memory efficiency. PQ is developed to divide weight values into blocks and adopt clustering algorithms dividing them into groups assigned with quantized values accordingly. Existing research mainly focused on the clustering strategy design with a minimal error between the original weights and the quantized values for the performance maintenance. However, the block division, i.e., the selection of block size, determines the choices of number of clusters and compression rate which has not been fully studied. Therefore, this paper proposes a novel scheme AdaPQ with the process, Adaptive Exploration Product Quantization, to first flexibly construct varying block sizes by padding the filter weights, which enlarges the search space of quantization result of PQ and avoids being suffered from a sub-optimal solution. Afterward, we further design a strategy, Adversary-aware Block Size Selection, to select an appropriate block size for each layer by evaluating the sensitivity on performance under perturbation for obtaining a minor performance loss under a high compression rate. Experimental results show that AdaPQ achieves a higher accuracy under a similar compression rate compared with the state-of-the-art.

Yan-Ting Ye, Ting-An Chen, Ming-Syan Chen

Ranking Enhanced Supervised Contrastive Learning for Regression

Supervised contrastive learning has shown promising results in image classification tasks where the representations are pulled together if they share same labels or otherwise pushed apart. Such dispersion process in the representation space benefits the downstream classification tasks. However, when applied to regression tasks directly, such dispersion lacks guidance of the relationship among target labels (i.e. the label distances), which leads to the disalignment between representation distances and label distances. Achieving such alignment without compromising the dispersion of learned representations is challenging. In this paper, we propose a Ranking Enhanced Supervised Contrastive Loss (RESupCon) to empower the representation dispersion process with ranking alignment between representation distances and label distances in a controlled fashion. We demonstrate the effectiveness of our method in image regression tasks on four real-world datasets with various interests, including meteorological, medical and human facial data. Experimental results of our method show that representations with better ranking are learned and improvements are made over other baselines in terms of RMSE on all four datasets.

Ziheng Zhou, Ying Zhao, Haojia Zuo, Wenguang Chen

Treatment Effect Estimation Under Unknown Interference

Causal inference is a powerful tool for effective decision-making in various areas, such as medicine and commerce. For example, it allows businesses to determine whether an advertisement has a role in influencing a customer to buy the advertised product. The influence of an advertisement on a particular customer is considered the advertisement’s individual treatment effect (ITE). This study estimates ITE from data in which units are potentially connected. In this case, the outcome for a unit can be influenced by treatments to other units, resulting in inaccurate ITE estimation, a phenomenon known as interference. Existing methods for ITE estimation that address interference rely on knowledge of connections between units. However, these methods are not applicable when this connection information is missing due to privacy concerns, a scenario known as unknown interference. To overcome this limitation, this study proposes a method that designs a graph structure learner, which infers the structure of interference by imposing an $$L_0$$ L 0 -norm regularization on the number of potential connections. The inferred structure is then fed into a graph convolution network to model interference received by units. We carry out extensive experiments on several datasets to verify the effectiveness of the proposed method in addressing unknown interference.

Xiaofeng Lin, Guoxi Zhang, Xiaotian Lu, Hisashi Kashima

A New Loss for Image Retrieval: Class Anchor Margin

The performance of neural networks in content-based image retrieval (CBIR) is highly influenced by the chosen loss (objective) function. The majority of objective functions for neural models can be divided into metric learning and statistical learning. Metric learning approaches require a pair mining strategy that often lacks efficiency, while statistical learning approaches are not generating highly compact features due to their indirect feature optimization. To this end, we propose a novel repeller-attractor loss that falls in the metric learning paradigm, yet directly optimizes for the $$L_{2}$$ L 2 metric without the need of generating pairs. Our loss is formed of three components. One leading objective ensures that the learned features are attracted to each designated learnable class anchor. The second loss component regulates the anchors and forces them to be separable by a margin, while the third objective ensures that the anchors do not collapse to zero. Furthermore, we develop a more efficient two-stage retrieval system by harnessing the learned class anchors during the first stage of the retrieval process, eliminating the need of comparing the query with every image in the database. We establish a set of three datasets (CIFAR-100, Food-101, and ImageNet-200) and evaluate the proposed objective on the CBIR task, by using both convolutional and transformer architectures. Compared to existing objective functions, our empirical evidence shows that the proposed objective is generating superior and more consistent results.

Alexandru Ghiţă, Radu Tudor Ionescu

Personalized EDM Subject Generation via Co-factored User-Subject Embedding

This paper introduces the Co-Factored User-Subject Embedding based Personalized EDM Subject Generation Framework (COUPES), a model for creating personalized Electronic Direct Mail (EDM) subjects. COUPES adapts to individual content and style preferences using a dual-encoder structure to process product descriptions and template features. It employs a soft template-based selective encoder and matrix co-factorization for nuanced user embeddings. Experiments show that COUPES excels in generating engaging, personalized subjects and reconstructing recommendation ratings, proving its effectiveness in personalized marketing and recommendation systems.

Yu-Hsiu Chen, Zhi Rui Tam, Hong-Han Shuai

Spatial-Temporal Bipartite Graph Attention Network for Traffic Forecasting

Accurate traffic forecasting is pivotal for an efficient data-driven transportation system. The intricate nature of spatial-temporal dependencies and non-linearity present in traffic data has posed a significant challenge to the modeling of accurate traffic forecasting systems. Lately, there has been a significant effort to develop complex Spatial-Temporal Graph Neural Networks (STGNN) that predominantly utilize various Graph Neural Networks (GNN) and attention-based encoder-decoder architectures due to their ability to capture non-linear dependencies in spatial and temporal domains effectively. However, conventional GNNs limit explicit propagation of past information among nodes, while attention-based models such as transformers do not support finer-grained attention score distribution. In this study, we address the aforementioned issues and introduce a novel STGNN namely, Spatio-Temporal Bipartite Graph Attention Network (STBGAT) that allows explicit modeling of past information propagation among nodes. Further, we present a heterogeneous cross-attention mechanism in a transformer to compute finer-grained feature-wise attention distribution enabling the model to capture richer and more expressive temporal dependencies. Our experiments reveal that the proposed architecture outperforms the state-of-the-art approaches proposed in recent literature.

Dimuthu Lakmal, Kushani Perera, Renata Borovica-Gajic, Shanika Karunasekera

CMed-GPT: Prompt Tuning for Entity-Aware Chinese Medical Dialogue Generation

Medical dialogue generation relies on natural language generation techniques to enable online medical consultations. Recently, the widespread adoption of large-scale models in the field of natural language processing has facilitated rapid advancements in this technology. Existing medical dialogue models are mostly based on BERT and pre-trained on English corpora, but there is a lack of high-performing models on the task of Chinese medical dialogue generation. To solve the above problem, this paper proposes CMed-GPT, which is the GPT pre-training language model based on Chinese medical domain text. The model is available in two versions, namely, base and large, with corresponding perplexity values of 8.64 and 8.01. Additionally, we incorporate lexical and entity embeddings into the dialogue text in a uniform manner to meet the requirements of downstream dialogue generation tasks. By applying both fine-tuning and p-tuning to CMed-GPT, we lowered the PPL from 8.44 to 7.35. This study not only confirms the exceptional performance of the CMed-GPT model in generating Chinese biomedical text but also highlights the advantages of p-tuning over traditional fine-tuning with prefix prompts. Furthermore, we validate the significance of incorporating external information in medical dialogue generation, which enhances the quality of dialogue generation.

Zhijie Qu, Juan Li, Zerui Ma, Jianqiang Li

MvRNA: A New Multi-view Deep Neural Network for Predicting Parkinson’s Disease

Magnetic Resonance Imaging (MRI) is a critical medical diagnostic tool that assists experts in precisely identifying lesions. However, due to its high-dimensional nature, it requires substantial storage resources. If only one MRI slice were to be used, a significant amount of information might be lost. To address these issues, we propose segmenting 3D MRI data and training these slices separately. We propose a new Multi-view Learning neural network based on ResNet and an Attention mechanism, called MvRNA. ResNet18 is selected as the backbone network, and the Squeeze-and-Excitation network is applied between blocks to extract features from slices. Additionally, we propose a new BWH (Basic Block with Hybrid Dilated Convolution) module to capture a broader range of receptive fields, thus acquiring additional spatial features. We obtained data from Parkinson’s Progression Markers Initiative (PPMI) and applied our method to distinguish between Healthy Control, Prodromal, and Parkinson’s disease patients. The experimental results demonstrate that our method achieved an accuracy of 81.84%.

Lin Chen, Yuxin Zhou, Xiaobo Zhang, Zhehao Zhang, Hailong Zheng

Path-Aware Cross-Attention Network for Question Answering

Reasoning is an essential ability in QA systems, and the integration of this ability into QA systems has been the subject of considerable research. A prevalent strategy involves incorporating domain knowledge graphs using Graph Neural Networks (GNNs) to augment the performance of pre-trained language models. However, this approach primarily focuses on individual nodes and fails to leverage the extensive relational information present within the graph fully. In this paper, we present a novel model called Path-Aware Cross-Attention Network (PCN), which incorporates meta-paths containing relational information into the model. The PCN features a multi-layered, bidirectional cross-attention mechanism that facilitates information exchange between the textual representation and the path representation at each layer. By integrating rich inference information into the language model and contextual semantic information into the path representation, this mechanism enhances the overall effectiveness of the model. Furthermore, we incorporate a self-learning mechanism for path scoring, enabling weighted evaluation. The performance of our model is assessed across three benchmark datasets, covering the domains of commonsense question answering (CommonsenseQA, OpenbookQA) and medical question answering (MedQA-USMLE). The experimental results validate the efficacy of our proposed model.

Ziye Luo, Ying Xiong, Buzhou Tang

StyleAutoEncoder for Manipulating Image Attributes Using Pre-trained StyleGAN

Deep conditional generative models are excellent tools for creating high-quality images and editing their attributes. However, training modern generative models from scratch is very expensive and requires large computational resources. In this paper, we introduce StyleAutoEncoder (StyleAE), a lightweight AutoEncoder module, which works as a plugin for pre-trained generative models and allows for manipulating the requested attributes of images. The proposed method offers a cost-effective solution for training deep generative models with limited computational resources, making it a promising technique for a wide range of applications. We evaluate StyleAE by combining it with StyleGAN, which is currently one of the top generative models. Our experiments demonstrate that StyleAE is at least as effective in manipulating image attributes as the state-of-the-art algorithms based on invertible normalizing flows. However, it is simpler, faster, and gives more freedom in designing neural architecture.

Andrzej Bedychaj, Jacek Tabor, Marek Śmieja

SEE: Spherical Embedding Expansion for Improving Deep Metric Learning

The primary goal of deep metric learning is to construct a comprehensive embedding space that can effectively represent samples originating from both intra- and inter-classes. Although extensive prior work has explored diverse metric functions and innovative training strategies, much of this work relies on default training data. Consequently, the potential variations inherent within this data remain largely unexplored, constraining the model’s robustness to unseen images. In this context, we introduce the Spherical Embedding Expansion (SEE) method. SEE aims to uncover the latent semantic variations in training data. Especially, our method augments the embedding space with synthetic representations based on Max-Mahalanobis distribution (MMD) centers, which maximize the dispersion of these synthetic features without increasing computational costs. We evaluated the efficacy of SEE on four renowned standard benchmarks for the image retrieval task. The results demonstrate that SEE consistently enhances the performance of conventional methods when integrated with them, setting a new benchmark for deep metric learning performance across all settings. Particularly, the proposed method reveals its potency, especially when training with a low-dimensional embedding space and a large number of classes.

Binh Minh Le, Simon S. Woo

Multi-modal Recurrent Graph Neural Networks for Spatiotemporal Forecasting

The spatial and temporal dynamics of many real-world systems present a significant challenge to multi-variate forecasting where features of both forms, as well as their inter-dependencies, must be modeled correctly. State-of-the-art approaches utilize a limited set of exogenous features (outside the forecast variable) to model temporal dynamics and Graph Neural Networks, with pre-defined or learned networks, to model spatial dynamics. While much work has been done to model dependencies, existing approaches do not adequately capture the explicit and implicit modalities present in real-world systems. To address these limitations we propose MMR-GNN, a spatiotemporal model capable of (a) augmenting pre-defined (or absent) networks into optimal dependency structures (b) fusing multiple explicit modalities and (c) learning multiple implicit modalities. We show improvement over existing methods using several hydrology and traffic datasets. Our code is publicly available at https://github.com/HipGraph/MMR-GNN .

Nicholas Majeske, Ariful Azad

Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling

Sparsifying the Transformer has garnered considerable interest, as training the Transformer is very computationally demanding. Prior efforts to sparsify the Transformer have either used a fixed pattern or data-driven approach to reduce the number of operations involving the computation of multi-head attention, which is the main bottleneck of the Transformer. However, existing methods suffer from inevitable problems, including potential loss of essential sequence features and an increase in the model size. In this paper, we propose a novel sparsification scheme for the Transformer that integrates convolution filters and the flood filling method to efficiently capture the layer-wise sparse pattern in attention operations. Our sparsification approach significantly reduces the computational complexity and memory footprint of the Transformer during training. Efficient implementations of the layer-wise sparsified attention algorithm on GPUs are developed, demonstrating our SPION that achieves up to 2.78 $$\times $$ × speedup over existing state-of-the-art sparse Transformer models and maintain high evaluation quality.

Bokyeong Yoon, Yoonsang Han, Gordon Euhyun Moon

Towards Cost-Efficient Federated Multi-agent RL with Learnable Aggregation

Multi-agent reinforcement learning (MARL) often adopts centralized training with a decentralized execution (CTDE) framework to facilitate cooperation among agents. When it comes to deploying MARL algorithms in real-world scenarios, CTDE requires gradient transmission and parameter synchronization for each training step, which can incur disastrous communication overhead. To enhance communication efficiency, federated MARL is proposed to average the gradients periodically during communication. However, such straightforward averaging leads to poor coordination and slow convergence arising from the non-i.i.d. problem which is evidenced by our theoretical analysis. To address the two challenges, we propose a federated MARL framework, termed cost-efficient federated multi-agent reinforcement learning with learnable aggregation (FMRL-LA). Specifically, we use asynchronous critics to optimize communication efficiency by filtering out redundant local updates based on the estimation of agent utilities. A centralized aggregator rectifies these estimations conditioned on global information to improve cooperation and reduce non-i.i.d. impact by maximizing the composite system objectives. For a comprehensive evaluation, we extend a challenging multi-agent autonomous driving environment to the federated learning paradigm, comparing our method to competitive MARL baselines. Our findings indicate that FMRL-LA can adeptly balance performance and efficiency. Code and appendix can be found in https://github.com/ArronDZhang/FMRL_LA .

Yi Zhang, Sen Wang, Zhi Chen, Xuwei Xu, Stano Funiak, Jiajun Liu

LongStory: Coherent, Complete and Length Controlled Long Story Generation

A human author can write any length of story without losing coherence. Also, they always bring the story to a proper ending, an ability that current language models lack. In this work, we present the LongStory for coherent, complete, and length-controlled long story generation. LongStory introduces two novel methodologies: (1) the long and short-term contexts weight calibrator (CWC) and (2) long story structural positions (LSP). The CWC adjusts weights for long-term context Memory and short-term context Cheating, acknowledging their distinct roles. The LSP employs discourse tokens to convey the structural positions of a long story. Trained on three datasets with varied average story lengths, LongStory outperforms other baselines, including the strong story generator Plotmachine, in coherence, completeness, relevance, and repetitiveness. We also perform zero-shot tests on each dataset to assess the model’s ability to predict outcomes beyond its training data and validate our methodology by comparing its performance with variants of our model.

Kyeongman Park, Nakyeong Yang, Kyomin Jung

Relation-Aware Label Smoothing for Self-KD

Knowledge distillation (KD) is widely used to improve models’ performances by transferring a larger teacher’s knowledge to a smaller student model. However, KD has a disadvantage where a pre-trained teacher model is required, which can lead to training inefficiency. Therefore, self-knowledge distillation, enhancing the student by itself, has been proposed. Although self-knowledge distillation shows remarkable performance improvement with fewer resources than conventional teacher-student based KD approaches, existing self-KD methods still require additional time and memory for training. We propose Relation-Aware Label Smoothing for Self-Knowledge Distillation (RAS-KD) that regularizes the student model itself by utilizing the inter-class relationships between class representative vectors with a light-weight auxiliary classifier. Compared to existing self-KD methods that only consider the instance-level knowledge, we show that proposed global-level knowledge is sufficient to achieve competitive performance while being extremely efficient training cost. Also, we achieve extra performance improvement through instance-level supervision. We demonstrate RAS-KD outperforms existing self-KD approaches in various tasks with negligible additional cost.

Jeongho Kim, Simon S. Woo

Bi-CryptoNets: Leveraging Different-Level Privacy for Encrypted Inference

Privacy-preserving neural networks have attracted increasing attention in recent years, and various algorithms have been developed to keep the balance between accuracy, computational complexity and information security from the cryptographic view. This work takes a different view from the input data and structure of neural networks. We decompose the input data (e.g., some images) into sensitive and insensitive segments according to importance and privacy. The sensitive segment includes some important and private information such as human faces and we take strong homomorphic encryption to keep security, whereas the insensitive one contains some background and we add perturbations. We propose the bi-CryptoNets, i.e., plaintext and ciphertext branches, to deal with two segments, respectively, and ciphertext branch could utilize the information from plaintext branch by unidirectional connections. We adopt knowledge distillation for our bi-CryptoNets by transferring representations from a well-trained teacher neural network. Empirical studies show the effectiveness and decrease of inference latency for our bi-CryptoNets.

Man-Jie Yuan, Zheng Zou, Wei Gao

Enhancing YOLOv7 for Plant Organs Detection Using Attention-Gate Mechanism

Herbarium scans are valuable raw data for studying how plants adapt to climate change and respond to various factors. Characterization of plant traits from these images is important for investigating such questions, thereby supporting plant taxonomy and biodiversity description. However, processing these images for meaningful data extraction is challenging due to scale variance, complex backgrounds that contain annotations, and the variability in specimen color, shape, and orientation of specimens. In addition, the plant organs often appear compressed, deformed, or damaged, with overlapping occurrences that are common in scans. Traditional organ recognition techniques, while adept at segmenting discernible plant characteristics, are limited in herbarium scanning applications. Two automated methods for plant organ identification have been previously reported. However, they show limited effectiveness, especially for small organs. In this study we introduce YOLOv7-ag model, which is a novel model based on the YOLOv7 that incorporates an attention-gate mechanism, which enhances the detection of plant organs, including stems, leaves, flowers, fruits, and seeds. YOLOv7-ag significantly outperforms previous state of the art as well as the original YOLOv7 and YOLOv8 models with a precision and recall rate of 99.2% and 98.0%, respectively, particularly in identifying small plant organs.

Hanane Ariouat, Youcef Sklab, Marc Pignal, Florian Jabbour, Régine Vignes Lebbe, Edi Prifti, Jean-Daniel Zucker, Eric Chenin

On Dark Knowledge for Distilling Generators

Knowledge distillation has been applied on generative models, such as Variational Autoencoder (VAE) and Generative Adversarial Networks (GANs). To distill the knowledge, the synthetic outputs of a teacher generator are used to train a student model. While the dark knowledge, i.e., the probabilistic output, is well explored in distilling classifiers, little is known about the existence of an equivalent dark knowledge for generative models and its extractability. In this paper, we derive the first kind of empirical risk bound for distilling generative models from a Bayesian perspective. Through our analysis, we show the existence of the dark knowledge for generative models, i.e., Bayes probability distribution of a synthetic output from a given input, which achieves lower empirical risk bound than merely using the synthetic output of the generators. Furthermore, we propose a Dark Knowledge based Distillation, DKtill, which trains the student generator based on the (approximate) dark knowledge. Our extensive evaluation on distilling VAE, conditional GANs, and translation GANs on Facades and CelebA datasets show that the FID of student generators trained by DKtill combining dark knowledge are lower than student generators trained only by the synthetic outputs by up to 42.66%, and 78.99%, respectively.

Chi Hong, Robert Birke, Pin-Yu Chen, Lydia Y. Chen

RPH-PGD: Randomly Projected Hessian for Perturbed Gradient Descent

The perturbed gradient descent (PGD) method, which adds random noises in the search directions, has been widely used in solving large-scale optimization problems, owing to its capability to escape from saddle points. However, it is inefficient sometimes for two reasons. First, the random noises may not point to a descent direction, so PGD may still stagnate around saddle points. Second, the size of random noises, which is controlled by the radius of the perturbation ball, may not be properly configured, so the convergence is slow. In this paper, we proposed a method, called RPH-PGD (Randomly Projected Hessian for Perturbed Gradient Descent), to improve the performance of PGD. The randomly projected Hessian (RPH) is created by projecting the Hessian matrix into a relatively small subspace which contains rich information about the eigenvectors of the original Hessian matrix. RPH-PGD utilizes the eigenvalues and eigenvectors of the randomly projected Hessian to identify the negative curvatures and uses the matrix itself to estimate the changes of Hessian matrices, which is necessary information for dynamically adjusting the radius during the computation. In addition, RPH-PGD employs the finite difference method to approximate the product of the Hessian and vectors, instead of constructing the Hessian explicitly. The amortized analysis shows the time complexity of RPH-PGD is only slightly higher than that of PGD. The experimental results show RPH-PGD does not only converge faster than PGD, but also converges in cases that PGD cannot.

Chi-Chang Li, Jay Huang, Wing-Kai Hon, Che-Rung Lee

Transformer based Multitask Learning for Image Captioning and Object Detection

In several real-world scenarios like autonomous navigation and mobility, to obtain a better visual understanding of the surroundings, image captioning and object detection play a crucial role. This work introduces a novel multitask learning framework that combines image captioning and object detection into a joint model. We propose TICOD, Transformer-based Image Captioning and Object Detection model for jointly training both tasks by combining the losses obtained from image captioning and object detection networks. By leveraging joint training, the model benefits from the complementary information shared between the two tasks, leading to improved performance for image captioning. Our approach utilizes a transformer-based architecture that enables end-to-end network integration for image captioning and object detection and performs both tasks jointly. We evaluate the effectiveness of our approach through comprehensive experiments on the MS-COCO dataset. Our model outperforms the baselines from image captioning literature by achieving a $$3.65\%$$ 3.65 % improvement in BERTScore.

Debolena Basak, P. K. Srijith, Maunendra Sankar Desarkar

Communicative and Cooperative Learning for Multi-agent Indoor Navigation

The ability to cooperate and work as a team is one of the “holy grail” goals of intelligent robots. To address the importance of communication in multi-agent reinforcement learning (MARL), we propose a Cooperative Indoor Navigation (CIN) task, where agents cooperatively navigate to reach a goal in a 3D indoor room with realistic observation inputs. This navigation task is more challenging and closer to real-world robotic applications than previous multi-agent tasks since each agent can observe only part of the environment from its first-person view. Therefore, this task requires the communication and cooperation of agents to accomplish. To research the CIN task, we collect a large-scale dataset with challenging demonstration trajectories. The code and data of the CIN task have been released. The prior methods of MARL primarily emphasized the learning of policies for multiple agents but paid little attention to the communication model, resulting in their inability to perform optimally in the CIN task. In this paper, we propose a MARL model with a communication mechanism to address the CIN task. In our experiments, we discover that our proposed model outperforms previous MARL methods and communication is the key to addressing the CIN task. Our quantitative results shows that our proposed MARL method outperforms the baseline by 6% on SPL. And our qualitative results demonstrates that the agent with the communication mechanism is able to explore the whole environment sufficiently so that navigate efficiently.

Fengda Zhu, Vincent CS Lee, Rui Liu

Enhancing Continuous Domain Adaptation with Multi-path Transfer Curriculum

Addressing the large distribution gap between training and testing data has long been a challenge in machine learning, giving rise to fields such as transfer learning and domain adaptation. Recently, Continuous Domain Adaptation (CDA) has emerged as an effective technique, closing this gap by utilizing a series of intermediate domains. This paper contributes a novel CDA method, W-MPOT, which rigorously addresses the domain ordering and error accumulation problems overlooked by previous studies. Specifically, we construct a transfer curriculum over the source and intermediate domains based on Wasserstein distance, motivated by theoretical analysis of CDA. Then we transfer the source model to the target domain through multiple valid paths in the curriculum using a modified version of continuous optimal transport. A bidirectional path consistency constraint is introduced to mitigate the impact of accumulated mapping errors during continuous transfer. We extensively evaluate W-MPOT on multiple datasets, achieving up to 54.1% accuracy improvement on multi-session Alzheimer MR image classification and 94.7% MSE reduction on battery capacity estimation.

Hanbing Liu, Jingge Wang, Xuan Zhang, Ye Guo, Yang Li

Graphs and Networks

Frontmatter

Enhancing Network Role Modeling: Introducing Attributed Multiplex Structural Role Embedding for Complex Networks

Numerous studies have focused on defining node roles within networks, producing network embeddings that maintain the structural role proximity of nodes. Yet, these approaches often fall short when applied to complex real-world networks, such as Twitter, where nodes have varying types of relationships (e.g., following, retweeting, replying) and possess relevant attributes impacting their network role (e.g., user profiles). To address these limitations, this study presents a novel method for attributed (for dealing with attributed nodes) multiplex (for dealing with networks with different types of edges) structural role embedding. This approach uses an autoencoder mechanism to concurrently encode node structure, relationships, and attributes, thus successfully modeling nodes’ roles within networks. Our method’s effectiveness is shown through quantitative and qualitative analyses conducted on synthetic networks, outperforming established benchmarks in identifying node roles within multiplex and attributed networks. Additionally, we have assembled a robust real-world multiplex network composed of almost all verified Twitter users comprised of retweet, reply, and followership interactions between these users, representing three different layers in our multiplex network. This network serves as a practical environment to evaluate our method’s capability to map the structural roles of users within real-world attributed multiplex networks. Using a verified dataset of influential users as a reference, we show our method excels over the existing benchmarks in learning structural roles on large-scale, real-world attributed multiplex networks, exemplified by our Twitter network.

Lili Wang, Chenghan Huang, Ruiye Yao, Chongyang Gao, Weicheng Ma, Soroush Vosoughi

Query-Decision Regression Between Shortest Path and Minimum Steiner Tree

Considering a graph with unknown weights, can we find the shortest path for a pair of nodes if we know the minimal Steiner trees associated with some subset of nodes? That is, with respect to a fixed latent decision-making system (e.g., a weighted graph), we seek to solve one optimization problem (e.g., the shortest path problem) by leveraging information associated with another optimization problem (e.g., the minimal Steiner tree problem). In this paper, we study such a prototype problem called query-decision regression with task shifts, focusing on the shortest path problem and the minimum Steiner tree problem. We provide theoretical insights regarding the design of realizable hypothesis spaces for building scoring models, and present two principled learning frameworks. Our experimental studies show that such problems can be solved to a decent extent with statistical significance.

Guangmo Tong, Peng Zhao, Mina Samizadeh

Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning

The use of deep reinforcement learning (DRL) techniques to solve classical combinatorial optimization problems like the Traveling Salesman Problem (TSP) has garnered considerable attention due to its advantage of flexible and fast model-based inference. However, DRL training often suffers low efficiency and scalability, which hinders model generalization. This paper proposes a simple yet effective pre-training method that utilizes behavior cloning to initialize neural network parameters for policy gradient DRL. To alleviate the need for large amounts of demonstrations in behavior cloning, we exploit the symmetry of TSP solutions for augmentation. Our method is demonstrated by enhancing the state-of-the-art policy gradient models Attention and POMO for the TSP. Experimental results show that the optimality gap of the solution is significantly reduced while the DRL training time is greatly shortened. This also enables effective and efficient solving of larger TSP instances.

Yunchao Zhang, Kewen Liao, Zhibin Liao, Longkun Guo

Leveraging Transfer Learning for Enhancing Graph Optimization Problem Solving

Reinforcement learning to solve graph optimization problems has attracted increasing attention recently. Typically, these models require extensive training over numerous graph instances to develop generalizable strategies across diverse graph types, demanding significant computational resources and time. Instead of tackling these problems one by one, we propose to employ transfer learning to utilize knowledge gained from solving one graph optimization problem to aid in solving another. Our proposed framework, dubbed the State Extraction with Transfer-learning (SET), focuses on quickly adapting a model trained for a specific graph optimization task to a new but related problem by considering the distributional differences among the objective values between the graph optimization problems. We conduct a series of experimental evaluations on graphs that are both synthetically generated and sourced from real-world data. The results demonstrate that SET outperforms other algorithmic and learning-based baselines. Additionally, our analysis of knowledge transferability provides insights into the effectiveness of applying models trained on one graph optimization task to another. Our study is one of the first studies exploring transfer learning in the context of graph optimization problems.

Hui-Ju Hung, Wang-Chien Lee, Chih-Ya Shen, Fang He, Zhen Lei

SD-Attack: Targeted Spectral Attacks on Graphs

Graph learning (GL) models have been applied in various predictive tasks on graph data. But, similarly to other machine learning models, GL models are also vulnerable to adversarial attacks. As a powerful attack method on graphs, spectral attack jeopardizes the eigenvalues or eigenvectors of the graph topology-related matrices (e.g., graph adjacency matrix and graph Laplacian matrix) due to their inherent connections to certain structural properties of the underlying graph. However, most existing spectral attack methods focus on damaging the global graph structural properties and can hardly perform effective attacks on a target node. In this paper, we propose a novel targeted spectral attack method that can perform model-agnostic attacks effectively on the local structural properties of a target node. First, we define a novel node-specific metric—spectral density distance, which measures the difference of the local structural properties for the same target node between two different graph topologies. Then, we conduct attacks by maximizing the spectral density distance between the graphs before and after perturbation. Additionally, we also develop an effective strategy to improve attack efficiency by using the eigenvalue perturbation theory. Experimental results on three widely used datasets demonstrate the effectiveness of our proposed approach.

Xianren Zhang, Jing Ma, Yushun Dong, Chen Chen, Min Gao, Jundong Li

Improving Structural and Semantic Global Knowledge in Graph Contrastive Learning with Distillation

Graph contrastive learning has emerged as a pivotal task in the realm of graph representation learning, with the primary objective of maximizing mutual information between graph-augmented pairs exhibiting similar semantics. However, existing unsupervised graph contrastive learning approaches face a notable limitation in capturing both structural and semantic global information. This issues poses a substantial challenge, as nodes in close geographical proximity do not consistently possess similar features. To tackle this issue, this study introduces a simple framework for Distillation Node and Prototype Graph Contrastive Learning (DNPGCL). The framework enables contrastive learning by harnessing similar knowledge distillation to obtain more valuable structural and semantic global indications. Experimental results demonstrate that DNGCL outperforms existing unsupervised learning methods across a range of diverse graph datasets.

Mi Wen, Hongwei Wang, Yunsheng Xue, Yi Wu, Hong Wen

DEGNN: Dual Experts Graph Neural Network Handling both Edge and Node Feature Noise

Graph Neural Networks (GNNs) have achieved notable success in various applications over graph data. However, recent research has revealed that real-world graphs often contain noise, and GNNs are susceptible to noise in the graph. To address this issue, several Graph Structure Learning (GSL) models have been introduced. While GSL models are tailored to enhance robustness against edge noise through edge reconstruction, a significant limitation surfaces: their high reliance on node features. This inherent dependence amplifies their susceptibility to noise within node features. Recognizing this vulnerability, we present DEGNN, a novel GNN model designed to adeptly mitigate noise in both edges and node features. The core idea of DEGNN is to design two separate experts: an edge expert and a node feature expert. These experts utilize self-supervised learning techniques to produce modified edges and node features. Leveraging these modified representations, DEGNN subsequently addresses downstream tasks, ensuring robustness against noise present in both edges and node features of real-world graphs. Notably, the modification process can be trained end-to-end, empowering DEGNN to adjust dynamically and achieves optimal edge and node representations for specific tasks. Comprehensive experiments demonstrate DEGNN’s efficacy in managing noise, both in original real-world graphs and in graphs with synthetic noise.

Tai Hasegawa, Sukwon Yun, Xin Liu, Yin Jun Phua, Tsuyoshi Murata

Alleviating Over-Smoothing via Aggregation over Compact Manifolds

Graph neural networks (GNNs) have achieved significant success in various applications. Most GNNs learn the node features with information aggregation of its neighbors and feature transformation in each layer. However, the node features become indistinguishable after many layers, leading to performance deterioration: a significant limitation known as over-smoothing. Past work adopted various techniques for addressing this issue, such as normalization and skip-connection of layer-wise output. After the study, we found that the information aggregations in existing work are all contracted aggregations, with the intrinsic property that features will inevitably converge to the same single point after many layers. To this end, we propose the aggregation over compacted manifolds method (ACM) that replaces the existing information aggregation with aggregation over compact manifolds, a special type of manifold, which avoids contracted aggregations. In this work, we theoretically analyze contracted aggregation and its properties. We also provide an extensive empirical evaluation that shows ACM can effectively alleviate over-smoothing and outperforms the state-of-the-art. The code can be found in https://github.com/DongzhuoranZhou/ACM.git .

Dongzhuoran Zhou, Hui Yang, Bo Xiong, Yue Ma, Evgeny Kharlamov

Are Graph Embeddings the Panacea?

An Empirical Survey from the Data Fitness Perspective

Graph representation learning has emerged as a machine learning go-to technique, outperforming traditional tabular view of data across many domains. Current surveys on graph representation learning predominantly have an algorithmic focus with the primary goal of explaining foundational principles and comparing performances, yet the natural and practical question “Are graph embeddings the panacea?” has been so far neglected. In this paper, we propose to examine graph embedding algorithms from a data fitness perspective by offering a methodical analysis that aligns network characteristics of data with appropriate embedding algorithms. The overarching objective is to provide researchers and practitioners with comprehensive and methodical investigations, enabling them to confidently answer pivotal questions confronting node classification problems: 1) Is there a potential benefit of applying graph representation learning? 2) Is structural information alone sufficient? 3) Which embedding technique would best suit my dataset? Through 1400 experiments across 35 datasets, we have evaluated four network embedding algorithms – three popular GNN-based algorithms (GraphSage, GCN, GAE) and node2vec – over traditional classification methods, namely SVM, KNN, and Random Forest (RF). Our results indicate that the cohesiveness of the network, the representation of relation information, and the number of classes in a classification problem play significant roles in algorithm selection.

Qiang Sun, Du Q. Huynh, Mark Reynolds, Wei Liu

Revisiting Link Prediction with the Dowker Complex

We propose a novel method to study properties of graph-structured data by means of a geometric construction called Dowker complex. We study this simplicial complex through the use of persistent homology, which has shown to be a prominent tool to uncover relevant geometric and topological information in data. A positively weighted graph induces a distance in its sets of vertices. A classical approach in persistent homology is to construct a filtered Vietoris-Rips complex with vertices on the vertices of the graph. However, when the size of the set of vertices of the graph is large, the obtained simplicial complex may be computationally hard to handle. A solution The Dowker complex is constructed on a sample in the set of vertices of the graph called landmarks. A way to guaranty sparsity and proximity of the set of landmarks to all the vertices of the graph is by considering $$\epsilon $$ ϵ -nets. We provide theoretical proofs of the stability of the Dowker construction and comparison with the Vietorips-Rips construction. We perform experiments showing that the Dowker complex based neural networks model performs good with respect to baseline methods in tasks such as link prediction and resilience to attacks.

Jae Won Choi, Yuzhou Chen, José Frías, Joel Castillo, Yulia Gel

GraphNILM: A Graph Neural Network for Energy Disaggregation

Non-Intrusive Load Monitoring (NILM) remains a critical issue in both commercial and residential energy management, with a key challenge being the requirement for individual appliance-specific deep learning models. These models often disregard the interconnected nature of loads and usage patterns, stemming from diverse user behavior. To address this, we introduce GraphNILM, an innovative end-to-end model that leverages graph neural networks to deliver appliance-level energy usage analysis for an entire home. In its initial phase, GraphNILM employs Gaussian random variables to depict the graph edges, later enhancing prediction accuracy by substituting these edges with observations of appliance interrelationships, stripping the individual load enery from the aggregated main energy all at one time, resulting in reduced memory usage, especially with more than three loads involved, thus presenting a time and space-efficient solution for real-world implementation. Comprehensive testing on popular NILM datasets confirms that our model outperforms existing benchmarks in both accuracy and memory consumption, suggesting its considerable promise for future deployment in edge devices.

Rui Shang, Siji Chen, Zhiqian Chen, Chang-Tien Lu

A Contraction Tree SAT Encoding for Computing Twin-Width

Twin-width is a structural width parameter and matrix invariant introduced by Bonnet et al. [FOCS 2020], that has been gaining attention due to its various fields of applications. In this paper, inspired by the SAT approach of Schidler and Szeider [ALENEX 2022], we provide a new SAT encoding for computing twin-width. The encoding aims to encode the contraction sequence as a binary tree. The asymptotic size of the formula under our encoding is smaller than in the state-of-the-art relative encoding of Schidler and Szeider. We also conduct an experimental study, comparing the performance of the new encoding and the relative encoding.

Yinon Horev, Shiraz Shay, Sarel Cohen, Tobias Friedrich, Davis Issac, Lior Kamma, Aikaterini Niklanovits, Kirill Simonov

Backmatter

Titel: Advances in Knowledge Discovery and Data Mining
herausgegeben von: De-Nian Yang
Xing Xie
Vincent S. Tseng
Jian Pei
Jen-Wei Huang
Jerry Chun-Wei Lin
Verlag: Springer Nature Singapore
Electronic ISBN: 978-981-9722-53-2
Print ISBN: 978-981-9722-52-5
DOI: https://doi.org/10.1007/978-981-97-2253-2

Springer Professional

Über dieses Buch

Inhaltsverzeichnis

Frontmatter

Deep Learning

Frontmatter

AdaPQ: Adaptive Exploration Product Quantization with Adversary-Aware Block Size Selection Toward Compression Efficiency

Ranking Enhanced Supervised Contrastive Learning for Regression

Treatment Effect Estimation Under Unknown Interference

A New Loss for Image Retrieval: Class Anchor Margin

Personalized EDM Subject Generation via Co-factored User-Subject Embedding

Spatial-Temporal Bipartite Graph Attention Network for Traffic Forecasting

CMed-GPT: Prompt Tuning for Entity-Aware Chinese Medical Dialogue Generation

MvRNA: A New Multi-view Deep Neural Network for Predicting Parkinson’s Disease

Path-Aware Cross-Attention Network for Question Answering

StyleAutoEncoder for Manipulating Image Attributes Using Pre-trained StyleGAN

SEE: Spherical Embedding Expansion for Improving Deep Metric Learning

Multi-modal Recurrent Graph Neural Networks for Spatiotemporal Forecasting

Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling

Towards Cost-Efficient Federated Multi-agent RL with Learnable Aggregation

LongStory: Coherent, Complete and Length Controlled Long Story Generation

Relation-Aware Label Smoothing for Self-KD

Bi-CryptoNets: Leveraging Different-Level Privacy for Encrypted Inference

Enhancing YOLOv7 for Plant Organs Detection Using Attention-Gate Mechanism

On Dark Knowledge for Distilling Generators

RPH-PGD: Randomly Projected Hessian for Perturbed Gradient Descent

Transformer based Multitask Learning for Image Captioning and Object Detection

Communicative and Cooperative Learning for Multi-agent Indoor Navigation

Enhancing Continuous Domain Adaptation with Multi-path Transfer Curriculum

Graphs and Networks

Frontmatter

Enhancing Network Role Modeling: Introducing Attributed Multiplex Structural Role Embedding for Complex Networks

Query-Decision Regression Between Shortest Path and Minimum Steiner Tree

Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning

Leveraging Transfer Learning for Enhancing Graph Optimization Problem Solving

SD-Attack: Targeted Spectral Attacks on Graphs

Improving Structural and Semantic Global Knowledge in Graph Contrastive Learning with Distillation

DEGNN: Dual Experts Graph Neural Network Handling both Edge and Node Feature Noise

Alleviating Over-Smoothing via Aggregation over Compact Manifolds

Are Graph Embeddings the Panacea?

Revisiting Link Prediction with the Dowker Complex

GraphNILM: A Graph Neural Network for Energy Disaggregation

A Contraction Tree SAT Encoding for Computing Twin-Width

Backmatter

Premium Partner