1 Introduction
- We introduce contrastive representation learning into the anomaly detection task, which makes nodes more semantically distinguishable and substantially benefits anomaly discrimination.
- We decouple contrastive representation learning and anomaly discrimination, adopting a dynamic weight allocation strategy for the two task branches, which further resolves the semantic mixture and imbalance issues in anomaly detection.
- We conduct experiments on six datasets, and the results demonstrate the superiority of DSLAD over existing models.
2 Related Work
2.1 Graph Neural Networks
2.2 Graph Self-supervised Learning
2.3 Anomaly Detection on Attributed Graph
3 Problem Definition
Notations | Statements |
---|---|
\({\mathcal {G}}=({\mathcal {V}}, {\textbf{X}}, {\textbf{A}})\) | An attributed graph |
\(v_i\) | The i-th node in \({\mathcal {G}}\) |
\({\mathcal {G}}_i\) | The subgraph originated from \(v_i\) |
K | The number of nodes in \({\mathcal {G}}_i\) |
\({\textbf{X}} \in {\mathbb {R}}^{N \times d(0)}\) | The attribute matrix of \({\mathcal {G}}\) |
\({\textbf{A}} \in {\mathbb {R}}^{N \times N}\) | The adjacency matrix of \({\mathcal {G}}\) |
\({\textbf{X}}_{\{i\}} \in {\mathbb {R}}^{K \times d(0)}\) | The attribute matrix of \({\mathcal {G}}_i\) |
\({\textbf{A}}_{\{i\}} \in {\mathbb {R}}^{K \times K}\) | The adjacency matrix of \({\mathcal {G}}_i\) |
\({\textbf{h}}_{\{i\}}^{(l)} \in {\mathbb {R}}^{1 \times d(l)}\) | The embedding of \(v_i\) in the \(l\)-th layer |
\({\textbf{W}}^{(l)} \in {\mathbb {R}}^{d(l-1) \times d(l)}\) | The weight matrix in the l-th layer |
\({\textbf{H}}_{\{i\}}^{(l)} \in {\mathbb {R}}^{K \times d(l)}\) | The hidden matrix in the l-th layer of \({\mathcal {G}}_i\) |
\({\textbf{Z}}^{i}\in {\mathbb {R}}^{K \times d}\) | The context representation matrix of \({\mathcal {G}}_i\) |
\({\textbf{U}}_{\{i\}} \in {\mathbb {R}}^{K \times d(0)}\) | The reconstructed attribute matrix of \({\mathcal {G}}_i\) |
\({\textbf{g}}_i \in {\mathbb {R}}^{1\times d}\) | The subgraph-level representation of \({\mathcal {G}}_i\) |
\({\textbf{e}}_i \in {\mathbb {R}}^{1\times d}\) | The node-level representation of \(v_i\) |
\(x_i \in {\mathbb {R}}^{1\times d(0)}\) | The attribute of \(v_i\) |
\({\textbf{W}}_d \in {\mathbb {R}}^{d \times d}\) | The weight matrix of bilinear pooling |
\({\textbf{s}}_i^{con(-)}\) | The negative context anomaly score of \(v_i\) |
\({\textbf{s}}_i^{con(+)}\) | The positive context anomaly score of \(v_i\) |
\({\textbf{s}}_i^{rec}\) | The reconstruction anomaly score of \(v_i\) |
\({\textbf{s}}_i\) | The anomaly score of \(v_i\) |
4 Method
4.1 Discrimination Pair Sampling
- Target node selection A set of nodes is randomly selected from the input graph every epoch without replacement, so that each node has the same chance of being chosen.
- Subgraph sampling For every selected target node, a neighboring subgraph is sampled via random walk with restart (RWR) [36] as an augmentation, which avoids introducing extra anomalies. Other sampling methods can also be considered. The size of the neighboring subgraph is fixed to K, which determines the matching scope of the target node.
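The RWR sampling step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `rwr_subgraph`, the adjacency-list input, and the restart probability of 0.15 are all our assumptions.

```python
import random

def rwr_subgraph(adj, start, k, restart_p=0.15, max_steps=1000):
    """Sample a K-node neighboring subgraph around `start` via random
    walk with restart (RWR). `adj` maps each node to its neighbor list.
    The walk jumps back to the target node with probability `restart_p`,
    keeping the sampled subgraph local to the target."""
    visited = {start}
    cur = start
    steps = 0
    while len(visited) < k and steps < max_steps:
        steps += 1
        if random.random() < restart_p or not adj.get(cur):
            cur = start  # restart at the target node
        else:
            cur = random.choice(adj[cur])
            visited.add(cur)
    return sorted(visited)
```

Because the walk restarts at the target node, distant nodes are rarely reached, so the subgraph stays within the target's local context; `max_steps` simply guards against graphs with fewer than K reachable nodes.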
4.2 GNN-Based Embedding
4.3 Anomaly Discrimination
4.3.1 Context Anomaly
4.3.2 Reconstruction Anomaly
4.4 Contrastive Representation Learning
4.5 Decoupling
4.6 Anomaly Score Calculation
Dataset | Anomalies | Nodes | Features | Edges |
---|---|---|---|---|
Cora | 150 | 2,708 | 1,433 | 5,429 |
Citeseer | 150 | 3,327 | 3,703 | 4,732 |
Pubmed | 600 | 19,717 | 500 | 44,338 |
ACM | 600 | 16,484 | 8,337 | 71,980 |
BlogCatalog | 300 | 5,196 | 8,189 | 171,743 |
Flickr | 450 | 7,575 | 12,407 | 239,739 |
5 Experiments
5.1 Datasets
- Citation network datasets Cora, Citeseer, Pubmed [43], and ACM [44] are four public citation networks composed of scientific publications. Published papers are represented as nodes, edges represent the citation relationships between papers, and node features are derived from the papers' description text.
- Social network datasets BlogCatalog and Flickr [45] are acquired from websites for sharing blogs and images, respectively. Each user is represented by a node, and links between nodes capture the relationships between the corresponding users. Users often describe themselves with personalized information, such as blog posts and public photos, from which node features are extracted.
5.2 Baselines
- AMEN [14] compares the feature correlation between target nodes and their ego-networks, identifying nodes with low scores as anomalies.
- Radar [25] analyzes the residuals of attribute information and their coherence with graph structure to detect abnormal nodes as anomalies.
- ANOMALOUS [26] utilizes CUR decomposition and residual analysis to distinguish irregular nodes as anomalies.
- DOMINANT [2] learns node embeddings with autoencoders and takes the reconstruction errors as anomaly scores.
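The reconstruction-error scoring used by DOMINANT-style detectors can be sketched as follows. This is a simplified sketch: DOMINANT also reconstructs the adjacency matrix and combines both error terms, which is omitted here, and the function name is ours.

```python
import numpy as np

def reconstruction_scores(X, X_hat):
    """Per-node anomaly score as the L2 reconstruction error of the
    attribute matrix. Nodes the autoencoder reconstructs poorly
    receive high scores and are flagged as anomalies."""
    return np.linalg.norm(X - X_hat, axis=1)
```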
5.3 Evaluation Metrics
Methods | Cora | Citeseer | Pubmed | BlogCatalog | Flickr | ACM |
---|---|---|---|---|---|---|
AMEN [2016] | 0.6266 | 0.6154 | 0.7713 | 0.6392 | 0.6573 | 0.5626 |
Radar [2017] | 0.6587 | 0.6709 | 0.6233 | 0.7401 | 0.7399 | 0.7247 |
ANOMALOUS [2018] | 0.5770 | 0.6307 | 0.7316 | 0.7237 | 0.7434 | 0.7038 |
DOMINANT [2018] | 0.8155 | 0.8251 | 0.8081 | 0.7468 | 0.7442 | 0.7601 |
DGIAD [2019] | 0.7511 | 0.8293 | 0.6962 | 0.5827 | 0.6237 | 0.6240 |
CoLA [2021] | 0.8799 | 0.8968 | 0.9512 | 0.7854 | 0.7513 | 0.8237 |
SL-GAD [2021] | 0.9130 | 0.9136 | 0.9672 | 0.8184 | 0.7966 | 0.8538 |
ANEMONE [2021] | 0.9057 | 0.9189 | 0.9548 | 0.8067 | 0.7637 | 0.8709 |
Sub-cr [2022] | 0.9080 | 0.9331 | 0.9677 | 0.8139 | 0.7872 | 0.8047 |
GRADATE [2023] | 0.9062 | 0.9233 | 0.9578 | 0.7517 | 0.7348 | 0.8722 |
Ours | 0.9196 | 0.9481 | 0.9772 | 0.8275 | 0.8631 | 0.8809 |
Dataset | Mean | Std | P-value |
---|---|---|---|
Cora | 0.9196 | 0.0028 | 0.0331 |
Citeseer | 0.9481 | 0.0027 | |
Pubmed | 0.9772 | 0.0004 | |
ACM | 0.8809 | 0.0022 | |
BlogCatalog | 0.8275 | 0.0041 | |
Flickr | 0.8631 | 0.0017 |
5.4 Experimental Settings
5.5 Comparison Results
- Our method outperforms the most advanced baselines on all benchmark datasets by a large margin, which reveals its effectiveness.
- The shallow methods AMEN, Radar, and ANOMALOUS perform worse than the other baselines because of their limited expressive capacity.
- The node-classification-targeted methods DOMINANT and DGIAD perform better than the shallow methods. DOMINANT reconstructs the attribute and adjacency matrices rather than directly targeting anomaly detection, while DGIAD contrasts nodes with the whole graph, utilizing very little local information.
- The anomaly-detection-targeted methods CoLA, ANEMONE, SL-GAD, Sub-cr, and GRADATE go a step further. However, they mainly concentrate on training the anomaly discriminator and remain constrained by the semantic mixture and imbalance issues.
Methods | Cora | Citeseer | Pubmed | BlogCatalog | Flickr | ACM |
---|---|---|---|---|---|---|
Ours | 19.0854 | 19.6866 | 26.8413 | 14.7390 | 15.2717 | 15.8089 |
5.6 Differences in the Distribution of Anomaly Scores
5.7 Augmentation Strategy
5.8 \(\mathbf {\pi (\beta )}\) Strategy
Variants | Cora | Citeseer | Pubmed | BlogCatalog | Flickr | ACM |
---|---|---|---|---|---|---|
DSLAD | 0.9196 | 0.9481 | 0.9772 | 0.8275 | 0.8631 | 0.8809 |
DSLAD w/o cl | 0.8948 | 0.9357 | 0.9726 | 0.8163 | 0.7883 | 0.8232 |
DSLAD w/o con | 0.8275 | 0.8036 | 0.8050 | 0.7474 | 0.7440 | 0.7463 |
DSLAD w/o rec | 0.9060 | 0.9023 | 0.9545 | 0.7982 | 0.6954 | 0.8565 |
5.9 Ablation Studies
- DSLAD w/o cl: remove contrastive representation learning and set \(\pi (\beta ) = 1\).
- DSLAD w/o con: remove the context score.
- DSLAD w/o rec: remove the reconstruction score.
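The three variants amount to toggling score terms on and off. A minimal sketch, assuming a CoLA-style context score (negative-pair minus positive-pair discrimination output) and a simple weighted sum; the weight `alpha` and the function name are our assumptions, not the paper's exact formula:

```python
def total_anomaly_score(s_con_neg, s_con_pos, s_rec,
                        use_context=True, use_rec=True, alpha=0.5):
    """Illustrative combination of the context and reconstruction
    anomaly scores for node v_i. Setting use_context=False mimics
    'w/o con'; use_rec=False mimics 'w/o rec'."""
    score = 0.0
    if use_context:
        # A normal node scores high on its positive pair and low on
        # its negative pair, so this difference stays small for it.
        score += alpha * (s_con_neg - s_con_pos)
    if use_rec:
        score += (1.0 - alpha) * s_rec
    return score
```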