1 Introduction
-
Challenge 1 Unknown Graph Structure. Load values are affected by many factors, and the relationships among these factors are complex. In most cases, therefore, no explicit graph structure is available: the relationships between variables must be discovered from the data rather than provided as ground-truth knowledge.
-
Challenge 2 Mining Temporal Dependencies. Temporal dependencies are an important characteristic of load forecasting data. To fully exploit the relationships between the time steps of each factor, temporal dependencies must be extracted from each factor's historical data.
-
Challenge 3 GNN Learning. Common short-term load forecasting methods (such as LSTM-based and CNN-based methods) do not effectively use the relationships between a central node and its neighbors for message passing. In addition, traditional GCN-based models stack graph convolution layers in a way that can under-utilize the information produced by earlier layers.
-
We propose a new deep learning model for load forecasting that exploits the historical data of each node. Its temporal convolution module extracts deep features from historical data and works in conjunction with the densely connected residual convolution module to handle temporal and spatial dependencies. The module effectiveness assessment further confirms the effectiveness of both modules.
-
We propose a graph learning module that learns the complex relationships between variables, addressing the lack of an explicit graph structure in the field of load forecasting. Experiments show that the automatically learned graph structure has good interpretability.
-
We propose a graph convolution operation with a densely connected residual structure, which fully utilizes the output of every graph convolution layer to handle spatial dependencies and mitigates information loss during message passing. The ablation study further confirms the effectiveness of this design.
-
GLFN-TC automatically constructs the graph structure between variables with complex relationships, so the model also applies to other fields without a predefined graph structure. The model generality assessment further confirms the generality of GLFN-TC.
-
We conduct experiments on five load datasets and three non-load datasets. Experimental results show that our method outperforms the baseline methods on all datasets.
2 Short-Term Load Forecasting Based on Graph Neural Network with Temporal Convolution
2.1 Overview
-
Graph Learning Module This module creates a feature vector for each node and uses these feature vectors to learn a graph structure that expresses the relationships between nodes.
-
Temporal Convolution Module This module uses dilated 1D-CNN to obtain the temporal dependencies of each node’s historical data.
-
Densely Connected Residual Convolution Module This module uses graph convolution operation with densely connected residual structure to aggregate the data information of the central node and its neighbors.
-
Load Forecasting Module This module fuses the aggregated data information with the node feature vectors to predict future load values.
2.2 Graph Learning Module
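As a minimal sketch of the idea described in the overview, a graph structure can be learned from per-node feature vectors via pairwise similarity. This is illustrative only: the embedding dimension, the similarity function, and the row-wise softmax normalization are our assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_adjacency(emb):
    """Learn an adjacency matrix from node feature vectors.

    emb: (num_nodes, emb_dim) array of (trainable) node embeddings.
    Returns a row-normalized adjacency built from pairwise similarity.
    """
    sim = emb @ emb.T                    # pairwise similarity between nodes
    sim = np.maximum(sim, 0.0)           # ReLU: keep positive relations only
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # softmax per row

# Example: 6 nodes (as in the AT dataset), 16-dimensional embeddings.
emb = rng.standard_normal((6, 16))
A = learn_adjacency(emb)
```

In a trained model the embeddings would be learned jointly with the forecasting loss, so the resulting adjacency reflects data-driven relationships between variables.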
2.3 Temporal Convolution Module
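The dilated 1D-CNN used to capture temporal dependencies can be illustrated with a minimal causal dilated convolution over a single series. This is a sketch under assumptions (single channel, zero left-padding, no bias); the paper's actual layer configuration may differ.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1D convolution over a single series.

    x: (T,) input series; w: (k,) kernel.
    The receptive field grows as (k - 1) * dilation + 1, so stacking
    layers with increasing dilation covers long histories cheaply.
    """
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad: output at t sees only x[<= t]
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# With dilation=2 and kernel [1, 1], each output is x[t] + x[t-2].
out = dilated_conv1d(np.array([1.0, 2.0, 3.0, 4.0]), np.array([1.0, 1.0]), dilation=2)
# out -> [1., 2., 4., 6.]
```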
2.4 Densely Connected Residual Convolution Module
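A minimal sketch of graph convolution with densely connected residual reuse of layer outputs follows. The fusion by averaging, the ReLU, and the square weight matrices are illustrative assumptions; the point is that every layer's output contributes to the module output instead of only the last one, which mitigates information loss from stacking.

```python
import numpy as np

def dense_residual_gcn(A, H0, weights):
    """Graph convolution with dense residual connections (sketch).

    A:       (N, N) normalized adjacency matrix.
    H0:      (N, d) input node features.
    weights: list of (d, d) layer weight matrices (square so that
             all layer outputs share a shape and can be fused).
    """
    outputs = [H0]
    H = H0
    for W in weights:
        # Aggregate neighbor information, add a residual connection, apply ReLU.
        H = np.maximum(A @ H @ W + H, 0.0)
        outputs.append(H)
    # Dense connection: fuse the input and every layer's output.
    return sum(outputs) / len(outputs)

A = np.full((4, 4), 0.25)              # toy row-normalized adjacency
H0 = np.arange(12.0).reshape(4, 3)     # toy node features
out = dense_residual_gcn(A, H0, [np.eye(3), np.eye(3)])
```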
2.5 Load Forecasting Module
3 Experiments
-
RQ 1 (Comparison With Baseline Methods) How does the proposed method perform compared with baseline methods for short-term load forecasting?
-
RQ 2 (Model Generality Assessment) Is the proposed method also effective when applied to other fields?
-
RQ 3 (Ablation Study) How do the various components of the proposed model affect the overall performance of the model?
-
RQ 4 (Module Effectiveness Assessment) Do modules achieve the expected effect in our method?
-
RQ 5 (Repeatability Assessment And Parameter Sensitivity) Are the results of multiple runs of our method stable? Does the dilation factor affect the results?
3.1 Datasets and Evaluation Metrics
-
ISO-NE The ISO-NE dataset covers March 2003 to December 2014 with a sample rate of 1 h. It has 7 nodes, including load, temperature, etc.
-
AT The AT dataset covers January 2011 to December 2016 with a sample rate of 1 h. It has 6 nodes, including load, temperature, wind speed, wind direction, etc.
-
AP The AP dataset covers January 2006 to December 2010 with a sample rate of 0.5 h. It has 7 nodes, including load, electricity price, humidity, etc.
-
SH The SH dataset covers January 2017 to August 2020 with a sample rate of 1 h. It has 16 nodes, including load, week of year, day of week, etc.
-
NCENT The NCENT dataset covers January 2002 to December 2018 with a sample rate of 1 h. It has 6 nodes, including load, year, etc.
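The evaluation metrics reported in all experiment tables, MSE and MAE, are the standard definitions and can be computed as follows (a straightforward implementation; any normalization of the series is assumed to happen beforehand):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared prediction errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute prediction errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Example: mse([1, 2], [1, 4]) -> 2.0, mae([1, 2], [1, 4]) -> 1.0
```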
3.2 Experimental Setup
-
NF Naive Forecast. This method uses the load value of the last time step of the training data as the prediction for all future time steps.
-
SA Simple Average. This method uses the average of all load values in the training data as the prediction for all future time steps.
-
MA Moving Average. In this method, the predicted load value at the current time step is the average of the load values of the previous \(n\) time steps; in the experiments, \(n=4\).
-
RNN Recurrent neural network is a kind of neural network with short-term memory ability. In other words, the output of the network is not only related to the current input data, but also related to the previous input data.
-
CNN Convolutional neural network can extract the characteristics of input data in load forecasting tasks. In experiments, we use 1D-CNN for STLF.
-
LSTM Long short-term memory network is a variant of RNN. Compared with RNN, it can effectively capture association between long sequences and alleviate the phenomenon of gradient vanishing or gradient exploding.
-
CNN_LSTM It is the combination of 1D-CNN and LSTM, so that the model can have the characteristics of CNN and LSTM at the same time.
-
Informer Informer improves on the Transformer: it uses ProbSparse self-attention, self-attention distilling and a generative decoder to address problems that arise when the Transformer is applied to long sequence time-series forecasting (LSTF), such as high memory usage [19].
-
T-GCN Temporal graph convolutional network model combines graph convolution network and gated recurrent unit to capture spatial and temporal dependencies simultaneously. Specifically, the former captures spatial dependencies, while the latter captures temporal dependencies [20].
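The three statistical baselines (NF, SA, MA) can be sketched in a few lines. The function names are ours; the behavior follows the descriptions above, with MA using the experiment's setting \(n=4\).

```python
import numpy as np

def naive_forecast(train, horizon):
    """NF: repeat the last observed load value of the training data."""
    return np.full(horizon, float(train[-1]))

def simple_average(train, horizon):
    """SA: repeat the mean of all load values in the training data."""
    return np.full(horizon, float(np.mean(train)))

def moving_average(history, n=4):
    """MA: predict the next value as the mean of the previous n values."""
    return float(np.mean(np.asarray(history, float)[-n:]))

# Examples:
# naive_forecast([1, 2, 3], 2)  -> [3., 3.]
# simple_average([1, 2, 3], 2)  -> [2., 2.]
# moving_average([1, 2, 3, 4, 5]) -> 3.5  (mean of the last 4 values)
```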
3.3 RQ 1: Comparison with Baseline Methods
| Models | ISO-NE MSE | ISO-NE MAE | AT MSE | AT MAE | AP MSE | AP MAE | SH MSE | SH MAE | NCENT MSE | NCENT MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| NF | 1.0025 | 0.8271 | 2.2295 | 1.2169 | 1.0993 | 0.8204 | 5.9534 | 2.2432 | 2.1699 | 1.3288 |
| SA | 0.8642 | 0.7304 | 1.0576 | 0.8735 | 0.8035 | 0.7119 | 0.9736 | 0.7803 | 0.8812 | 0.7300 |
| MA | 0.8139 | 0.7225 | 2.3775 | 1.2613 | 1.3199 | 0.9163 | 7.2194 | 2.5027 | 3.6731 | 1.7699 |
| GLFN-TC | 0.2167 | 0.3494 | 0.1187 | 0.2406 | 0.1631 | 0.3036 | 0.1852 | 0.2990 | 0.2432 | 0.3902 |
| Models | ISO-NE MSE | ISO-NE MAE | AT MSE | AT MAE | AP MSE | AP MAE | SH MSE | SH MAE | NCENT MSE | NCENT MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| RNN | 0.3643 | 0.4623 | 0.1541 | 0.2685 | 0.2280 | 0.3532 | 0.2065 | 0.3296 | 0.3926 | 0.5141 |
| CNN | 0.3195 | 0.4340 | 0.1377 | 0.2625 | 0.2562 | 0.3879 | 0.2539 | 0.3606 | 0.2907 | 0.4390 |
| LSTM | 0.3761 | 0.4990 | 0.1549 | 0.2793 | 0.2412 | 0.3682 | 0.2583 | 0.3732 | 0.7627 | 0.7169 |
| CNN_LSTM | 0.3793 | 0.5158 | 0.1324 | 0.2507 | 0.2262 | 0.3554 | 0.2161 | 0.3435 | 0.3285 | 0.4653 |
| Informer | 0.2563 | 0.3825 | 0.1715 | 0.2866 | 0.1989 | 0.3118 | 0.2213 | 0.3577 | 0.2769 | 0.4121 |
| T-GCN | 0.4053 | 0.5195 | 0.3902 | 0.4689 | 0.5270 | 0.6207 | 0.7063 | 0.6928 | 1.2220 | 0.9521 |
| GLFN-TC | 0.2167 | 0.3494 | 0.1187 | 0.2406 | 0.1631 | 0.3036 | 0.1852 | 0.2990 | 0.2432 | 0.3902 |
3.4 RQ 2: Model Generality Assessment
-
TN The TN dataset is for wind speed prediction; it covers January 2000 to December 2014 with a sample rate of 1 h. It has 6 nodes, including wind speed, year, etc.
-
PL The PL dataset is for electricity price prediction; it covers November 2017 to December 2020 with a sample rate of 1 h. It has 4 nodes, including electricity price, energy from wind sources, etc.
-
TC The TC dataset is for photovoltaic power prediction; it covers January 2016 to December 2017 with a sample rate of 1 h. It has 9 nodes, including power, temperature, sunshine, etc.
| Models | TN MSE | TN MAE | PL MSE | PL MAE | TC MSE | TC MAE |
|---|---|---|---|---|---|---|
| NF | 0.6569 | 0.6701 | 1.5642 | 1.0392 | 1.3010 | 0.6057 |
| SA | 1.1904 | 1.2523 | 0.5952 | 0.6075 | 0.9411 | 0.8111 |
| MA | 0.7292 | 0.7169 | 1.6641 | 1.0793 | 0.9456 | 0.7552 |
| RNN | 0.2828 | 0.4306 | 0.3573 | 0.4516 | 0.2413 | 0.3275 |
| CNN | 0.3945 | 0.5150 | 0.3559 | 0.4539 | 0.2655 | 0.3525 |
| LSTM | 0.3182 | 0.4559 | 0.4190 | 0.5017 | 0.2406 | 0.3226 |
| CNN_LSTM | 0.5330 | 0.5969 | 0.3359 | 0.4460 | 0.2392 | 0.3179 |
| Informer | 0.3971 | 0.5021 | 0.5598 | 0.5771 | 0.8683 | 0.6099 |
| T-GCN | 53.7011 | 5.1406 | 71.8001 | 7.2619 | 0.2490 | 0.3206 |
| GLFN-TC | 0.2712 | 0.4136 | 0.3212 | 0.4365 | 0.2326 | 0.2961 |
3.5 RQ 3: Ablation Study
-
GLFN-TC-tc GLFN-TC without the temporal convolution module. We replace the temporal convolution module with a linear layer.
-
GLFN-TC-sc GLFN-TC without the densely connected residual convolution module. We replace the densely connected residual convolution module with a linear layer.
-
GLFN-TC-dcrc GLFN-TC without the dense residual connections: in the graph convolution step, the output of each layer is fed only to the next layer, and the output of the last layer is taken as the output of the densely connected residual convolution module.
| Models | ISO-NE MSE | ISO-NE MAE | AT MSE | AT MAE | AP MSE | AP MAE | SH MSE | SH MAE | NCENT MSE | NCENT MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| GLFN-TC | 0.2167 | 0.3494 | 0.1187 | 0.2406 | 0.1631 | 0.3036 | 0.1852 | 0.2990 | 0.2432 | 0.3902 |
| GLFN-TC-tc | 0.2764 | 0.3863 | 0.1379 | 0.2675 | 0.1838 | 0.3154 | 0.1999 | 0.3295 | 0.2521 | 0.3956 |
| GLFN-TC-sc | 0.2704 | 0.3976 | 0.1451 | 0.2786 | 0.2260 | 0.3546 | 0.2609 | 0.3951 | 0.3805 | 0.4901 |
| GLFN-TC-dcrc | 0.2894 | 0.3961 | 0.1107 | 0.2344 | 0.1720 | 0.3102 | 0.2690 | 0.3782 | 0.2605 | 0.3992 |
3.6 RQ 4: Module Effectiveness Assessment
-
GLFN-TC* This variant replaces the temporal convolution module in GLFN-TC with an LSTM.
-
GLFN-TC** This variant replaces the densely connected residual convolution module in GLFN-TC with a two-layer GCN.
| Models | ISO-NE MSE | ISO-NE MAE | AT MSE | AT MAE | AP MSE | AP MAE | SH MSE | SH MAE | NCENT MSE | NCENT MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| GLFN-TC* | 0.6135 | 0.6676 | 0.1406 | 0.2772 | 0.2135 | 0.3499 | 0.2883 | 0.3890 | 0.8902 | 0.7823 |
| GLFN-TC** | 0.2822 | 0.4011 | 0.1252 | 0.2568 | 0.2002 | 0.3322 | 0.2905 | 0.3645 | 0.2608 | 0.3969 |
| GLFN-TC | 0.2167 | 0.3494 | 0.1187 | 0.2406 | 0.1631 | 0.3036 | 0.1852 | 0.2990 | 0.2432 | 0.3902 |
3.7 RQ 5: Repeatability Assessment and Parameter Sensitivity
| Run | ISO-NE MSE | ISO-NE MAE | AT MSE | AT MAE | AP MSE | AP MAE | SH MSE | SH MAE | NCENT MSE | NCENT MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.2141 | 0.3516 | 0.1102 | 0.2345 | 0.1518 | 0.2927 | 0.1837 | 0.2915 | 0.2450 | 0.3878 |
| 2 | 0.2210 | 0.3467 | 0.1198 | 0.2367 | 0.1754 | 0.3132 | 0.1686 | 0.2826 | 0.2628 | 0.4056 |
| 3 | 0.2150 | 0.3498 | 0.1262 | 0.2505 | 0.1620 | 0.3050 | 0.2032 | 0.3229 | 0.2219 | 0.3771 |
| Avg | 0.2167 | 0.3494 | 0.1187 | 0.2406 | 0.1631 | 0.3036 | 0.1852 | 0.2990 | 0.2432 | 0.3902 |