Open Access 2024 | OriginalPaper | Book chapter

Enhancing Arrhythmia Diagnosis with Data-Driven Methods: A 12-Lead ECG-Based Explainable AI Model

Authors: Emmanuel C. Chukwu, Pedro A. Moreno-Sánchez

Published in: Digital Health and Wireless Solutions

Publisher: Springer Nature Switzerland

Abstract

Accurate and early prediction of arrhythmias using Electrocardiograms (ECG) presents significant challenges due to the non-stationary nature of ECG signals and inter-patient variability, posing difficulties even for seasoned cardiologists. Deep Learning (DL) methods offer precision in identifying diagnostic ECG patterns for arrhythmias, yet they often lack the transparency needed for clinical application, thus hindering their broader adoption in healthcare. This study introduces an explainable DL-based prediction model using ECG signals to classify nine distinct arrhythmia categories. We evaluated various DL architectures, including ResNet, DenseNet, and VGG16, using raw ECG data. The ResNet34 model emerged as the most effective, achieving an Area Under the Receiver Operating Characteristic (AUROC) of 0.98 and an F1-score of 0.826. Additionally, we explored a hybrid approach that combines raw ECG signals with Heart Rate Variability (HRV) features. Our explainability analysis, utilizing the SHAP technique, identifies the most influential ECG leads for each arrhythmia type and pinpoints critical signal segments for individual disease prediction. This study emphasizes the importance of explainability in arrhythmia prediction models, a critical aspect often overlooked in current research, and highlights its potential to enhance model acceptance and utility in clinical settings.

1 Introduction

Cardiovascular disease (CVD) is the leading cause of death in Europe and the European Union (EU), causing 3.9 million and 1.8 million deaths annually, respectively [1]. Traditional CVD diagnosis relies on rule-based evaluation of patient history and clinical examinations. This approach struggles with the volume and diversity of data and depends heavily on medical expertise, which creates challenges in resource-limited settings such as developing countries.
The electrocardiogram (ECG) is a key, non-invasive tool for diagnosing cardiac conditions, utilizing a 12-lead setup to capture the heart’s electrical activity through distinct P, Q, R, S, and T waves [2]. While recording an ECG is straightforward, interpreting the signals, particularly in complex arrhythmia cases, remains challenging and prone to errors with serious implications [3]. Additionally, Heart Rate Variability (HRV) analysis, which examines variations between consecutive heartbeats, has emerged as a crucial technique in cardiac assessment. It assesses the autonomic nervous system’s impact on the heart by analyzing the R-R interval, the time between successive R-wave peaks, and the N-N interval, the time between consecutive normal (sinus) beats. These measures help in understanding the cardiac system’s dynamic state [4].
Arrhythmias, a common and varied group of CVDs diagnosed using ECG, are characterized by irregular heartbeats due to improper electrical signaling, leading to abnormally fast, slow, or inconsistent heart rhythms. This work focuses on several arrhythmia classes including Atrial fibrillation (AF), Right and Left Bundle Branch Blocks (RBBB and LBBB), First-degree atrioventricular block (IAVB), and Premature Atrial and Ventricular Contractions (PAC and PVC), along with Myocardial Infarction (MI) [5]. Diagnosing arrhythmias is challenging due to: i) absence of symptoms during ECG recording; ii) high inter-patient ECG signal variability; iii) non-stationary signal morphology affected by physical state, noise, and artifacts; and iv) the need for large data volumes to avoid false diagnoses [5].
Computer-Aided Diagnosis Systems (CADS) address arrhythmia diagnosis challenges by leveraging digital technologies for the analysis of physiological and clinical data, aiding clinicians in making more informed decisions. Traditional ECG analysis techniques in CADS rely on automated detection of ECG components and classifying them based on fixed rules, but they often fall short due to outdated rules and sensitivity to imperfect ECG recordings. In the medical field, Artificial Intelligence (AI), particularly Machine Learning (ML) and Deep Learning (DL), has significantly enhanced CADS. AI combines mathematical and computer science theories to create systems capable of intelligent actions, with DL being notable for its ability to process large volumes of data through artificial neural networks. These networks perform sequential transformations to highlight crucial input features for classification and regression tasks. Modern arrhythmia diagnosis models increasingly use DL, credited for its precision in identifying ECG waveforms like QRS complexes, and P and T waves, facilitating the calculation of vital clinical measures including heart rate and axis deviation [6, 7].
AI’s potential in various fields, including healthcare, is often hindered by its ‘black box’ nature, leading to trust issues due to a lack of transparency [8]. Healthcare professionals need to understand the reasoning behind AI-recommended treatments. Without this level of explainability, AI’s adoption in healthcare can be negatively impacted. Explainable AI (XAI) addresses this challenge by providing insights into the decision-making processes of AI systems. XAI aims to make the logic behind AI algorithms clear, thereby aligning advanced AI capabilities with the healthcare sector’s need for transparent decision-making.
In recent literature, advancements in arrhythmia classification using DL have been notable, especially with the application of Convolutional Neural Networks (CNNs). A diverse range of studies has utilized various DL architectures, showing significant progress in ECG signal analysis. Chen et al. [9] combined a CNN with ResNet-34 layers and a bi-directional LSTM, achieving an accuracy of 0.81 on 12-lead ECG samples. Their study, while promising, highlighted the need for balanced datasets. Cheng et al. [10] used a modified 1-D CNN on the MIT-BIH arrhythmia database, focusing on compressed ECG signals suitable for wearable devices. Gao et al. [11] used a 4-layer LSTM model, achieving an accuracy of 0.992 and demonstrating robustness against noise and the dominance of normal ECG beats. Niu et al. [12] introduced a novel DL method based on adversarial domain adaptation, while Romdhane et al. [13] focused on enhancing minority class classification accuracy. Wang et al. [14, 15] employed 1-D CNNs and continuous wavelet transform techniques, demonstrating effectiveness against noise. Yildirim et al. [16] developed a 1D-CNN suitable for mobile and cloud computing applications due to its efficiency. Zhang [17] and Zhang et al. [18] used 1D-CNN networks, showing the superiority of 12-lead over single-lead ECGs. Rai et al. [19] tested a CNN + LSTM ensemble approach, improving minority class accuracy. Finally, Toma et al. [20] presented a parallel approach combining an RNN and a 2D CNN, effectively capturing temporal and spatial ECG signal characteristics. These studies illustrate the advancements and diversity in DL applications for arrhythmia detection, with a trend towards optimizing network architectures and inputs for enhanced classification accuracy. However, the integration of explainable AI (XAI), which is crucial for the clinical applicability of these models, remains largely unexplored.
In our study, we developed an explainable model for detecting cardiac arrhythmias using 12-lead ECG signals. We explored two approaches: one using raw ECG signals as input for various DL architectures like ResNet, VGG, and DenseNet, and another combining raw ECG signals with HRV features in a hybrid model. We thoroughly evaluated the performance of these classifiers. Furthermore, we assessed the explainability of the most effective model by analyzing the significance of different leads in arrhythmia classification and presenting case examples that illustrate the ECG signal segments influencing the predictions.
The remainder of this paper is organized as follows: Section 2 outlines the dataset, the DL algorithms, the training and testing methodologies, and the explainability technique used in building the arrhythmia detector. Section 3 details the evaluation results, including the predictive performance of the DL approaches (using raw ECG and the hybrid method with HRV features) and the explainability analysis of the most effective model. Section 4 discusses these results, and Section 5 concludes the paper with key findings.

2 Material and Methods

2.1 Dataset Description

In our research, we utilized the ‘China Physiological Signal Challenge 2018 (CPSC 2018)’ dataset to investigate arrhythmia predictions [21]. CPSC 2018 is an extensive dataset that was collected and curated to facilitate research in physiological signal processing and to encourage the development of algorithms for the detection of rhythm and morphological abnormalities. The dataset comprises 12-lead ECG recordings sourced through collaboration with 11 hospitals in China. CPSC 2018 includes a total of 6,877 recordings, with a gender distribution of 3,178 females and 3,699 males. The ECG recordings are sampled at 500 Hz and vary in length from 6 to 60 s. The dataset covers nine distinct cardiac states: atrial fibrillation (AF) with 1,098 recordings, first-degree atrioventricular block (I-AVB) with 704 recordings, left bundle branch block (LBBB) with 207 recordings, normal sinus rhythm (SNR) with 918 recordings, premature atrial contraction (PAC) with 574 recordings, premature ventricular contraction (PVC) with 653 recordings, right bundle branch block (RBBB) with 1,695 recordings, ST-segment depression (STD) with 826 recordings, and ST-segment elevation (STE) with 202 recordings. Among the 6,877 recordings, 476 carry two or three different labels, which makes the task a multi-label one for those recordings.

2.2 HRV Features Extraction

In this study, we computed 33 Heart Rate Variability (HRV) features from each subject’s entire ECG using the pyHRV Python library [22] and BioSPPy toolbox for biosignal processing [23]. BioSPPy’s ECG processing and R-peak detection algorithms enabled us to calculate the Normal-to-Normal Interval (NNI) series, from which we extracted HRV features covering time-domain, frequency-domain, and non-linear parameters.
The HRV features included: maximum and minimum NNI, standard deviation (SD) of heart rate (HR), maximum and minimum HR, mean HR, root mean square of successive NNI differences, number of successive NN intervals differing by more than 20 ms and 50 ms (NN20 and NN50), ratios of NN20 and NN50 to the total number of NNI, SD1 and SD2 (standard deviations along the minor and major axes of the Poincaré plot ellipse), ratio of SD1 to SD2, maximum and minimum NNI difference, mean NNI difference, sample entropy, area S of the fitted ellipse, fast Fourier transform (FFT) metrics, number of NNI, TINN (baseline width of the interpolated triangle) computation values, triangular index, and AR (autoregression) metrics.
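To make a few of the listed quantities concrete, the following minimal sketch computes some time-domain and Poincaré features directly from an NNI series using standard textbook definitions. It is not the pyHRV implementation used in the study; the function name `hrv_features` and the example intervals are hypothetical, and the NN20/NN50 ratios are taken relative to the number of NNI, following the wording above.

```python
import math

def hrv_features(nni_ms):
    """Return a few HRV features from a series of NN intervals (milliseconds)."""
    n = len(nni_ms)
    # Differences between successive NN intervals
    diffs = [b - a for a, b in zip(nni_ms, nni_ms[1:])]

    mean_nni = sum(nni_ms) / n
    # Sample standard deviations (n - 1 in the denominator)
    sdnn = math.sqrt(sum((x - mean_nni) ** 2 for x in nni_ms) / (n - 1))
    mean_diff = sum(diffs) / len(diffs)
    sdsd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1))

    nn20 = sum(1 for d in diffs if abs(d) > 20)
    nn50 = sum(1 for d in diffs if abs(d) > 50)
    heart_rates = [60000.0 / x for x in nni_ms]  # beats per minute

    return {
        "nni_max": max(nni_ms),
        "nni_min": min(nni_ms),
        "hr_mean": sum(heart_rates) / n,
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "nn20": nn20,
        "nn50": nn50,
        "pnn20": nn20 / n,  # ratio to the number of NNI, as worded in the text
        "pnn50": nn50 / n,
        # Poincaré descriptors: SD1 (minor axis) and SD2 (major axis)
        "sd1": math.sqrt(0.5) * sdsd,
        "sd2": math.sqrt(max(2 * sdnn ** 2 - 0.5 * sdsd ** 2, 0.0)),
    }

# Hypothetical NNI series, for illustration only
example = hrv_features([800, 830, 890, 900, 860, 820, 760, 810])
```

For short recordings (the dataset contains records as short as 6 s), only a handful of NN intervals are available, so features of this kind are estimated from very few samples.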

2.3 Deep Learning Algorithms

Deep learning (DL) algorithms have revolutionized AI, particularly in healthcare, by outperforming traditional machine learning methods in complex tasks. In this subsection, we explore the DL algorithms applied in our study, detailing their architecture, training strategies, and their specific use in arrhythmia prediction. For a comprehensive comparative analysis, we utilized models such as ResNet34, ResNet50, VGG16, and DenseNet, each chosen for their distinct strengths in deep learning applications. These experiments utilize a learning rate of 0.0001, the Adam optimizer, and Binary Cross Entropy (BCE) with Logits Loss as loss function.
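The loss named above treats each of the nine classes as an independent binary decision, which suits a dataset in which some recordings carry more than one label. As a minimal sketch (plain Python rather than the deep learning framework used in the study, with a hypothetical function name), binary cross entropy can be computed from raw logits in a numerically stable form:

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross entropy over labels, computed from raw logits.

    Uses the stable identity
        -[y*log(s(z)) + (1-y)*log(1-s(z))] = max(z, 0) - z*y + log(1 + exp(-|z|))
    where s is the sigmoid function.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# One recording, nine class logits, two active labels (illustrative values)
uninformed = bce_with_logits([0.0] * 9, [1, 0, 0, 0, 1, 0, 0, 0, 0])
confident = bce_with_logits([8.0, -8.0, -8.0, -8.0, 8.0, -8.0, -8.0, -8.0, -8.0],
                            [1, 0, 0, 0, 1, 0, 0, 0, 0])
```

With all logits at zero, every class probability is 0.5 and the loss equals log 2; confident, correct logits drive it towards zero.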
ResNet34 and ResNet50, part of the Residual Neural Network family by He et al. [24], are designed primarily for image recognition tasks. ResNet50 has a deeper structure with 50 layers, compared to the 34 layers of ResNet34. Both architectures follow a similar design, featuring convolutional, pooling, activation, and fully connected layers to extract detailed features. A key innovation in ResNet is the introduction of residual learning to tackle the vanishing or exploding gradient problem in deep networks. This is achieved through skip connections, enabling the network to learn residual functions and maintain performance in deeper layers.
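In the notation of He et al. [24], a residual block with input x, stacked-layer mapping F, and weights W_i can be written as follows; the second expression shows why the skip connection helps gradient flow, since the identity term passes gradients through unchanged even when the Jacobian of F is small:

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
\qquad\Longrightarrow\qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{F}}{\partial \mathbf{x}} + I
```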
In our study, we implemented both ResNet34 and ResNet50 for arrhythmia prediction. These architectures, chosen for their proven efficacy across various tasks, are particularly well-suited for this purpose. Their ability to handle intricate features makes them ideal for our ECG analysis, which uses 1D CNNs to process raw ECG data (12 leads, 30-s recordings at 500 Hz). This approach, leveraging 1D CNNs’ effectiveness in time-series data [25], allows for detailed feature extraction and classification across nine diagnostic categories. To mitigate overfitting, dropout regularization with a probability of 0.2 was implemented after the initial convolutional layer’s activation function, stochastically zeroing activations. By using these architectures, we aim to enhance our understanding of model performance and specialization in ECG classification.
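A fixed input of 12 leads over 30 s at 500 Hz corresponds to 12 × 15,000 samples per recording, whereas the raw recordings vary from 6 to 60 s. How recordings were brought to a common length is not stated here; the sketch below assumes truncation to the first 30 s and zero-padding of shorter records, purely for illustration (the helper name `to_fixed_length` is hypothetical).

```python
FS_HZ = 500                      # sampling frequency stated for CPSC 2018
WINDOW_S = 30                    # window length stated for the model input
N_LEADS = 12
TARGET_LEN = FS_HZ * WINDOW_S    # 15,000 samples per lead

def to_fixed_length(recording, target_len=TARGET_LEN):
    """Crop or zero-pad each lead of a recording to target_len samples.

    recording: list of leads, each a list of float samples.
    Assumption: keep the first target_len samples, pad short leads with zeros.
    """
    fixed = []
    for lead in recording:
        if len(lead) >= target_len:
            fixed.append(lead[:target_len])
        else:
            fixed.append(lead + [0.0] * (target_len - len(lead)))
    return fixed

short_record = [[1.0] * (6 * FS_HZ) for _ in range(N_LEADS)]    # 6 s recording
long_record = [[1.0] * (60 * FS_HZ) for _ in range(N_LEADS)]    # 60 s recording
short_fixed = to_fixed_length(short_record)
long_fixed = to_fixed_length(long_record)
```

Other choices, such as random cropping or repeating the signal, would fit the same input shape.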
The VGG16 model, a 16-layer convolutional neural network developed by the Visual Geometry Group at the University of Oxford [26], is also used in our study. Renowned for its simplicity and effectiveness in image recognition, VGG16 was adapted here as a feature extractor for ECG signals. Its architecture, composed predominantly of convolutional and max-pooling layers, is adept at learning complex hierarchical features from raw ECG data. In our research, VGG16 excelled in identifying detailed features in the ECG signals, crucial for high-level cardiac state classification. By processing 12-lead ECG signals through its 1D CNN layers, VGG16 transformed them into structured representations for downstream classification tasks. This allowed us to detect intricate patterns and categorize data effectively.
DenseNet, or Densely Connected Convolutional Networks, is a cutting-edge CNN architecture employed in our ECG classification framework. It addresses deep neural network challenges like information propagation and feature reuse by introducing dense connections between layers for improved information flow and gradient propagation [27]. In contrast to traditional CNNs with sequential layer connections, DenseNet layers receive inputs from all previous layers and pass their feature maps to all subsequent layers in a dense block, enhancing feature reuse and efficient learning of discriminative features. In our study, DenseNet efficiently extracts hierarchical representations from raw ECG signals. Notably, DenseNet is recognized for its parameter efficiency, delivering competitive performance with fewer parameters than other architectures. This efficiency was crucial in our research, allowing for effective feature extraction with reduced computational complexity.

2.4 Training, Testing and Performance Metrics

For consistent and reliable model development in our deep learning training process, we adopted a standardized data split approach using a custom function. Our preprocessed ECG dataset was divided into three subsets: training, validation, and testing, with proportions of 80%, 10%, and 10% respectively. The majority (80%) of the data was allocated for training, ensuring the model’s exposure to a wide array of ECG signals and patterns. The validation set, constituting 10%, was used during training to monitor and fine-tune the model’s performance, helping to prevent overfitting. The remaining 10% comprised the test set, which was completely unseen during the training and validation phases. This setup allowed for an unbiased assessment of the model’s performance on new, unseen ECG samples.
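A minimal sketch of such an 80/10/10 split is shown below. It is not the custom function used in the study: the function name, the fixed seed, and the plain random shuffle (without stratification by class) are assumptions made for illustration.

```python
import random

def split_indices(n_records, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle record indices and split them into train/validation/test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)   # seeded for reproducibility
    n_train = int(n_records * train_frac)
    n_val = int(n_records * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # remainder, roughly 10%
    return train, val, test

# 6,877 recordings as in CPSC 2018
train_idx, val_idx, test_idx = split_indices(6877)
```

With 6,877 recordings this yields 5,501 training, 687 validation, and 689 test records. For rare classes such as STE (202 recordings), a stratified split would keep class proportions more stable across the three subsets.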
In assessing our models’ performance in predicting nine distinct arrhythmia classes, we employed six evaluation metrics for a comprehensive analysis: accuracy, recall, precision, F1-score, Area Under the Receiver Operating Characteristic curve (AUROC), and the confusion matrix. Together, they provide a holistic view of model effectiveness, measuring not only the accuracy in classifying arrhythmias but also the ability to differentiate between various arrhythmia types and other class instances.
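For a single class evaluated against all others, these metrics reduce to counts of true and false positives and negatives, and AUROC can be read as the probability that a randomly chosen positive record scores higher than a randomly chosen negative one. The following sketch uses hypothetical function names and toy labels and scores, not the study's evaluation code:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for one class versus the rest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: six records, one class of interest
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]
toy_metrics = binary_metrics(labels, preds)
toy_auroc = auroc(labels, scores)
```

Because accuracy counts true negatives, it stays high for rare classes even when precision and recall are modest, which is why F1 and AUROC are the more informative columns in the per-class tables.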

2.5 Explainable AI Techniques

SHAP (SHapley Additive exPlanations), introduced by Lundberg et al. [28], is a key interpretability framework used in our study to elucidate the predictions of complex machine learning models. Utilizing tools referenced in [17], we applied SHAP values to determine the importance of each feature in our 12-lead ECG input data, identifying the most influential leads in our predictive models. SHAP values offer an in-depth analysis of how each feature contributes to individual predictions, providing insights into the model’s decision-making process. Additionally, they enable a broader examination of the model’s behavior by summarizing feature impacts across all predictions, revealing general patterns and trends in the data. SHAP also facilitates model comparison and selection by evaluating different models based on their feature contributions, aligning with principles from game theory for a fair and mathematically sound attribution of feature importance.
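The game-theoretic attribution underlying SHAP can be illustrated by computing exact Shapley values for a tiny model. The sketch below enumerates all feature orderings, which is feasible only for a handful of features; it is not how SHAP's deep learning explainers work in practice (they rely on approximations), and the toy model and baseline are hypothetical.

```python
from itertools import permutations

def shapley_values(value_fn, n_features):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    phi = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = set()
        previous = value_fn(frozenset(present))
        for feature in order:
            present.add(feature)
            current = value_fn(frozenset(present))
            phi[feature] += current - previous   # marginal contribution
            previous = current
    return [total / len(orderings) for total in phi]

# Toy model f(x) = 2*x0 + x1*x2, explained at x = (1, 2, 3) against a zero baseline
x = (1.0, 2.0, 3.0)
baseline = (0.0, 0.0, 0.0)

def model(v):
    return 2 * v[0] + v[1] * v[2]

def value_fn(present):
    # Features outside the coalition are replaced by their baseline value
    mixed = [x[i] if i in present else baseline[i] for i in range(3)]
    return model(mixed)

phi = shapley_values(value_fn, 3)
```

The attributions sum to the difference between the model output at x and at the baseline (here 8), and the interaction term x1*x2 is shared equally between the two features involved.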
In our study, we utilized the SHAP DeepExplainer class, a specialized tool for interpreting deep learning models with an efficient computational approach. The SHAP DeepExplainer approximates conditional expectations of SHAP values, integrating multiple background samples to summarize the difference between expected model outputs (based on these samples) and the actual model outputs. This method provides a practical way to understand the model’s reasoning by comparing its predictions with a baseline derived from the background data.

3 Results

This section details the classification outcomes of two methods examined in our study: DL with raw ECG signal, and a hybrid method combining DL with HRV and raw ECG signals. For each method, we highlight the algorithm with the highest performance, based on AUROC scores, and present its confusion matrix and ROC curve.
Moreover, we delve into the approach that yielded the most accurate classification, emphasizing explainability. This involves analyzing the significance of different ECG leads in identifying arrhythmia classes and providing explanations for individual instances, specifically highlighting the ECG segments that contributed to arrhythmia classification.

3.1 Deep Learning Classification with ECG Raw Signal

The raw ECG signal is used to train each of the proposed deep learning architectures, i.e., ResNet34, ResNet50, DenseNet, and VGG16. Their performance is shown in Tables 1, 2, 3, and 4, respectively, where the metrics denote each model’s performance for each arrhythmia category following the one-versus-the-rest (OVR) approach. The average of each metric is also shown.
Assessing the average AUROC and subsequently the F1-score of the four DL classifiers, ResNet34 emerges as the best performer with an AUROC of 0.98. Thus, the ROC curve and confusion matrix of ResNet34 are shown in Fig. 1 and Fig. 2, respectively.

3.2 Hybrid Approach: Optimal Classifier with ECG Raw Signal and HRV Features

While DL algorithms effectively process raw ECG signals for arrhythmia classification, we propose a hybrid approach that combines these with HRV features to, in theory, enhance their performance, especially in misclassification-prone categories. Illustrated in Fig. 3, our model integrates HRV features into the fully connected layer of the ResNet34 network, aiming to combine raw signal processing with feature-based analysis for better classification accuracy. The fully connected layer is dimensioned to accommodate the flattened convolutional feature representations as well as the auxiliary inputs (HRV features). The integration of auxiliary features into the model’s decision-making process is an attempt to enhance its adaptability and performance.
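The fusion step described above amounts to concatenating the flattened convolutional feature vector with the HRV feature vector before the final fully connected layer. The sketch below shows this in plain Python; the convolutional feature size of 512 and all function names are assumptions for illustration, while the 33 HRV features and nine output classes come from the text.

```python
N_CONV_FEATURES = 512   # assumed size of the flattened convolutional features
N_HRV_FEATURES = 33     # number of HRV features stated in Sect. 2.2
N_CLASSES = 9

def linear(inputs, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def hybrid_head(conv_features, hrv_features, weights, bias):
    """Concatenate learned and hand-crafted features, then apply the final layer."""
    fused = list(conv_features) + list(hrv_features)
    return linear(fused, weights, bias)

# Illustrative call with unit weights, so each logit equals the sum of the inputs
fused_size = N_CONV_FEATURES + N_HRV_FEATURES
weights = [[1.0] * fused_size for _ in range(N_CLASSES)]
bias = [0.0] * N_CLASSES
logits = hybrid_head([0.0] * N_CONV_FEATURES, [1.0] * N_HRV_FEATURES, weights, bias)
```

In a design of this kind the two feature groups can have very different numeric ranges (for example, NNI values in milliseconds versus post-activation feature maps), so normalizing the auxiliary inputs is a common precaution; whether this was done in the study is not stated.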
As with the pure DL networks, the classification performance of the hybrid approach is documented in Table 5. The ROC curve and confusion matrix of the hybrid approach are shown in Fig. 4 and Fig. 5, respectively. Figure 6 shows a comprehensive comparison of all classifiers’ AUROC performance for each of the arrhythmias considered in the diagnosis. Using the Kruskal-Wallis test, we confirm that there is a statistically significant difference in performance among the classifiers (p = 0.0008 << 0.05).
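The Kruskal-Wallis test ranks all observations jointly and asks whether the rank sums differ between groups more than chance would allow. The sketch below implements the H statistic with tie correction and a chi-square p-value (closed form for an even number of degrees of freedom only). Applying it to the per-class AUC values of Tables 1 to 5 is an assumption about which values were tested, so the result need not match the reported p-value.

```python
import math

def kruskal_wallis(groups):
    """Return (H, p) for the Kruskal-Wallis H-test with tie correction."""
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j for tied values
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    h /= 1.0 - tie_term / float(n ** 3 - n)   # tie correction
    df = len(groups) - 1
    if df % 2:
        raise ValueError("closed-form p-value here covers even degrees of freedom only")
    half = h / 2.0
    # Chi-square survival function for even df
    p = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return h, p

# Per-class AUC values as listed in Tables 1-5 (SNR, AF, IAVB, LBBB, RBBB, PAC, PVC, STD, STE)
auc = {
    "ResNet34": [0.98, 0.99, 0.99, 1.00, 0.99, 0.96, 0.99, 0.97, 0.95],
    "ResNet50": [0.97, 0.99, 1.00, 1.00, 0.99, 0.94, 0.99, 0.98, 0.97],
    "DenseNet": [0.97, 0.99, 0.99, 0.96, 0.99, 0.88, 0.95, 0.97, 0.95],
    "VGG16":    [0.97, 0.99, 0.98, 0.95, 0.98, 0.82, 0.96, 0.95, 0.96],
    "Hybrid":   [0.95, 0.95, 0.94, 0.90, 0.96, 0.87, 0.92, 0.92, 0.85],
}
h_auc, p_auc = kruskal_wallis(list(auc.values()))
```

The chi-square approximation is coarse for groups of nine values each, and a significant omnibus result does not say which classifiers differ; pairwise post-hoc comparisons would be needed for that.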
Table 1. ResNet34 classifier performance
Arrhythmia category  Accuracy  Precision  Recall  F1-Score  AUC
SNR                  0.96      0.77       0.84    0.80      0.98
AF                   0.98      0.94       0.93    0.94      0.99
IAVB                 0.98      0.96       0.90    0.93      0.99
LBBB                 0.99      0.95       0.83    0.88      1.00
RBBB                 0.96      0.93       0.94    0.94      0.99
PAC                  0.95      0.66       0.71    0.68      0.96
PVC                  0.97      0.87       0.86    0.87      0.99
STD                  0.96      0.84       0.85    0.85      0.97
STE                  0.97      0.63       0.50    0.56      0.95
Average              0.97      0.84       0.82    0.83      0.98
Table 2. ResNet50 classifier performance
Arrhythmia category  Accuracy  Precision  Recall  F1-Score  AUC
SNR                  0.95      0.74       0.82    0.78      0.97
AF                   0.98      0.95       0.93    0.94      0.99
IAVB                 0.99      0.95       0.92    0.93      1.00
LBBB                 0.99      0.78       0.91    0.84      1.00
RBBB                 0.96      0.90       0.96    0.93      0.99
PAC                  0.95      0.70       0.66    0.68      0.94
PVC                  0.97      0.90       0.85    0.87      0.99
STD                  0.94      0.69       0.84    0.76      0.98
STE                  0.97      0.53       0.67    0.59      0.97
Average              0.97      0.79       0.84    0.81      0.98
Table 3. DenseNet classifier performance
Arrhythmia category  Accuracy  Precision  Recall  F1-Score  AUC
SNR                  0.94      0.73       0.71    0.72      0.97
AF                   0.97      0.87       0.97    0.92      0.99
IAVB                 0.99      0.96       0.95    0.95      0.99
LBBB                 0.99      0.86       0.83    0.84      0.96
RBBB                 0.96      0.91       0.96    0.93      0.99
PAC                  0.89      0.42       0.67    0.52      0.88
PVC                  0.94      0.71       0.67    0.69      0.95
STD                  0.96      0.88       0.76    0.82      0.97
STE                  0.97      0.54       0.58    0.56      0.95
Average              0.96      0.76       0.79    0.77      0.96
Table 4. VGG16 classifier performance
Arrhythmia category  Accuracy  Precision  Recall  F1-Score  AUC
SNR                  0.95      0.71       0.82    0.76      0.97
AF                   0.97      0.90       0.93    0.92      0.99
IAVB                 0.97      0.92       0.80    0.86      0.98
LBBB                 0.99      0.95       0.78    0.86      0.95
RBBB                 0.95      0.86       0.97    0.92      0.98
PAC                  0.90      0.42       0.41    0.42      0.82
PVC                  0.96      0.90       0.74    0.81      0.96
STD                  0.95      0.86       0.66    0.75      0.95
STE                  0.97      0.52       0.63    0.57      0.96
Average              0.96      0.78       0.75    0.76      0.95
Table 5. Hybrid approach performance
Arrhythmia category  Accuracy  Precision  Recall  F1-Score  AUC
SNR                  0.95      0.76       0.75    0.76      0.95
AF                   0.19      0.19       1.00    0.33      0.95
IAVB                 0.97      0.93       0.82    0.86      0.94
LBBB                 0.98      0.68       0.83    0.75      0.90
RBBB                 0.28      0.28       1.00    0.44      0.96
PAC                  0.93      0.59       0.68    0.63      0.87
PVC                  0.96      0.77       0.82    0.79      0.92
STD                  0.95      0.81       0.73    0.76      0.92
STE                  0.97      0.60       0.50    0.54      0.85
Average              0.79      0.623      0.791   0.65      0.92

3.3 Explainability Analysis of the Optimal Model

In line with the objectives of our study, we conducted an explainability analysis of the best-performing model, ResNet34. For that purpose, the XAI technique SHAP is used to provide both global and local explainability analyses.
For global explainability, our focus is on determining the significance of each of the 12 ECG leads in predicting different arrhythmia categories. The results of this global explainability analysis are depicted in Fig. 7.
By aggregating the SHAP values for each lead across all predicted instances, we can ascertain the positive or negative influence of each lead on the prediction of a specific arrhythmia. It is important to note, given the multiclass nature of our classification problem, that the negative values observed for each category may not be intuitively interpretable since they could be affected by any of the other category’s predictions.
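The aggregation step can be sketched as follows, assuming the SHAP values for one arrhythmia class are available as a nested list indexed by instance, lead, and time sample. The exact aggregation used in the study is not specified; here each lead's values are summed over time and then averaged over instances, and the three leads and their values are toy data.

```python
def lead_importance(shap_values, lead_names):
    """Signed importance per lead: sum over time, then mean over instances.

    shap_values: list over instances of lists over leads of lists over time samples.
    """
    n_instances = len(shap_values)
    importance = {}
    for lead_index, name in enumerate(lead_names):
        per_instance = [sum(instance[lead_index]) for instance in shap_values]
        importance[name] = sum(per_instance) / n_instances
    return importance

# Toy SHAP values: 2 instances x 3 leads x 4 time samples (illustrative only)
toy_shap = [
    [[0.1, 0.1, 0.0, 0.0], [-0.2, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]],
    [[0.2, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, -0.2], [0.0, 0.0, 0.0, 0.0]],
]
toy_leads = ["II", "avR", "V2"]
importance = lead_importance(toy_shap, toy_leads)
ranked = sorted(importance, key=importance.get, reverse=True)
```

Keeping the sign, rather than averaging absolute values, is what allows leads to be separated into those pushing a prediction towards a class and those pushing away from it.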
Therefore, by observing the bars with a positive influence, we derive Table 6, which lists the most relevant leads for each arrhythmia category. This ranking motivates an individual (local) explainability analysis, since SHAP can indicate which region of the ECG signal contributes to a prediction. Accordingly, Fig. 8 shows one predicted instance for each arrhythmia category, together with the ECG segments most relevant to that prediction on the leads identified in the global explainability analysis.
Table 6. Prominent lead for each cardiac state detection
Arrhythmia category  Relevant ECG leads
SNR                  II, avR
AF                   V2, V3
IAVB                 V2, V5
LBBB                 avR, avF1
RBBB                 V2, V4
PAC                  V2, V3
PVC                  V4
STD                  V1, V2
STE                  V1, V2

4 Discussion

ML and DL are increasingly important in CVD detection, leveraging ECGs as key data sources. ECGs are crucial for improving diagnostic accuracy in data-driven predictive models for CVD [29]. These technologies not only facilitate early CVD detection, leading to better health outcomes, but also help address the demand for skilled cardiologists in ECG data analysis. Recent research in clinical cardiology suggests that ML and DL, especially in combination with other methods, provide superior predictive power for cardiovascular or overall mortality compared to traditional clinical or imaging techniques alone [7]. DL’s strength lies in its ability to capture temporal signal variations and autonomously learn from complex inputs like ECG signals, provided there is enough high-quality training data. This learning bypasses the need for predefined feature processing, offering an end-to-end solution that minimizes errors in feature calculation, thereby enhancing model accuracy [30–34].
In our study, we developed an explainable model for detecting cardiac arrhythmias using 12-lead ECG signals by exploring two different approaches: one using raw ECG signals as input for various DL architectures, and another combining raw ECG signals with HRV features in a hybrid model. We identify the best-performing model in the multiclass arrhythmia prediction by assessing the AUROC for each arrhythmia category and their average. If classifiers exhibit equal AUROC values, we also evaluate the F1-score, an important metric in multiclass classification scenarios.
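This selection rule, average AUROC first and average F1-score as the tie-breaker, can be written as a lexicographic comparison. The sketch below uses the average AUC and F1-Score values from Tables 1 to 5; the dictionary layout and names are illustrative.

```python
# (average AUROC, average F1-score) per classifier, from the "Average" rows of Tables 1-5
results = {
    "ResNet34": (0.98, 0.83),
    "ResNet50": (0.98, 0.81),
    "DenseNet": (0.96, 0.77),
    "VGG16": (0.95, 0.76),
    "Hybrid (ResNet34 + HRV)": (0.92, 0.65),
}

# Tuples compare element by element, so F1 only matters when AUROC is tied
best_model = max(results, key=lambda name: results[name])
ranking = sorted(results, key=lambda name: results[name], reverse=True)
```

With these values the two ResNet variants tie on AUROC at the reported precision, and the F1-score decides in favour of ResNet34.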
In our evaluation of DL models for arrhythmia detection, both ResNet models outperformed DenseNet and VGG16, with ResNet34 and ResNet50 showing similar effectiveness. However, ResNet34 marginally leads as the optimal model with an AUROC of 0.98 and an F1-score of 0.83, compared to ResNet50 (AUROC: 0.98, F1-score: 0.81), DenseNet (AUROC: 0.96, F1-score: 0.77), and VGG16 (AUROC: 0.95, F1-score: 0.76). By inspecting the different ROC curves and the confusion matrix for individual arrhythmia classifications, ResNet34 excels in detecting SNR, LBBB, RBBB, PAC, and STD, while ResNet50 is superior in classifying AF, IAVB, PVC, and STE.
The hybrid approach, combining ResNet34 with Heart Rate Variability (HRV) features, did not enhance performance in any arrhythmia category, including overall average. Notably, it showed reduced accuracy and precision, especially in AF and RBBB categories. This suggests that instead of providing ResNet34 with additional informative features, the HRV integration might have impeded the network’s learning, leading to a decline in classification performance. Thus, the integration of HRV features appears to compromise rather than improve the model’s efficacy in arrhythmia classification.
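The AF and RBBB rows of Table 5 share a distinctive pattern: recall of 1.00 with accuracy equal to precision. A short check shows that this is the pattern produced when every record is labelled positive for a class, in which case accuracy and precision both equal the class prevalence. This is an observation about the reported numbers, not a confirmed account of the model's behaviour, and the counts below are hypothetical.

```python
def all_positive_metrics(n_positive, n_total):
    """Metrics obtained when a classifier labels every record as positive."""
    tp = n_positive
    fp = n_total - n_positive
    tn = 0
    fn = 0
    accuracy = (tp + tn) / n_total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test set of 100 records with 19 positives (prevalence 0.19, as in the AF row)
af_like = all_positive_metrics(19, 100)
# Hypothetical test set of 100 records with 28 positives (prevalence 0.28, as in the RBBB row)
rbbb_like = all_positive_metrics(28, 100)
```

The resulting F1 values, about 0.32 and 0.44, are close to the 0.33 and 0.44 reported in Table 5 given two-decimal rounding. The AUROC values for these classes remain high (0.95 and 0.96), which suggests the scores still rank records well and that the decision threshold, rather than the ranking, may be the issue for these two categories.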
This study’s results indicate that the employed DL algorithms have the potential to aid clinicians and healthcare professionals in detecting cardiac conditions that might otherwise be missed or diagnosed later through specialist evaluations or echocardiography. Leveraging both short-term and long-term learning, these methods enable early detection of cardiovascular diseases (CVD), facilitating timely treatment initiation. This early intervention can lead to improved health outcomes, while delays or missed diagnoses could exacerbate health conditions.
ML and DL tools offer precise predictions but their ‘black box’ nature poses significant interpretability challenges, hindering clinician acceptance due to difficulties in understanding decision-making processes. This lack of transparency is especially problematic in clinical practice, where clinicians need clarity for effective decision-making. The interpretability and application of these advanced models, particularly in identifying critical features, remain complex, potentially limiting their utility in settings without computer assistance. This opacity significantly impedes the adoption of AI models by healthcare professionals who require comprehensible explanations of AI-derived results. In response, the emergence of XAI is a critical development, aiming to demystify AI models and outcomes to enhance accessibility and user trust. XAI facilitates the identification of key features influencing predictions and explores causal relationships between features and clinical outcomes. Despite its importance, research focusing on the understandability and trust in ML models, particularly those aiding in the diagnosis or prognosis of CVD, is still scarce, highlighting a vital area for future exploration.
In our study, we employed the SHAP technique to analyze explainability at both global and individual levels using the 12-lead ECG data. Globally, SHAP enabled us to perform a lead-specific relevance analysis for each arrhythmia category, guiding clinical experts to focus on the most influential leads identified. For example, the V2 lead is highlighted as the most significant feature in six out of nine arrhythmia categories, with avR, V1, V3, and V4 also being important. This information is valuable for researchers focusing on specific arrhythmia predictions, as it suggests prioritizing these leads in ECG monitoring. Interestingly, this contrasts with the common use of ECG lead II in arrhythmia detection, which our model found to be less relevant, highlighting a divergence from established practices noted in related literature.
The significant advantage of our explainable model lies in the synergy created by merging global and individual explainability approaches. Clinicians can use features deemed important globally to scrutinize individual arrhythmia predictions. This process allows them to verify if the ECG segments identified by SHAP as influential align with established clinical knowledge. Essentially, the model facilitates a comprehensive cross-verification, enabling clinicians to confirm the clinical validity of globally relevant features by examining their contribution to specific arrhythmia classifications on a case-by-case basis.
This study’s primary limitation is its reliance on a single database for constructing the prediction model, potentially affecting both the model’s generalizability and the validity of explainability analysis. Future research should include additional databases like MIT-BIH to validate and benchmark our findings. Additionally, our study focused exclusively on ECG signal characteristics, excluding other potentially informative patient data such as demographic details, medical history, and laboratory results, due to database constraints. Expanding future models to datasets with comprehensive patient information is recommended. Finally, our study lacks clinical validation of the explainability results, a common challenge in CVD prediction models using XAI. Future efforts should aim to correlate XAI findings with established clinical knowledge to enhance their medical relevance.

5 Conclusions

This study developed a deep learning (DL) based prediction model for nine arrhythmia classes using 12-lead ECG signals and tested various DL architectures including ResNet, DenseNet, and VGG16. ResNet34 was identified as the most effective model, achieving an AUROC of 0.98 and an F1-score of 0.826. A hybrid approach combining raw ECG signals with Heart Rate Variability (HRV) features was also tested, but it did not outperform the raw signal model. Additionally, we conducted an explainability analysis using SHAP within the XAI framework. This analysis pinpointed key ECG leads and signal segments critical for arrhythmia prediction, enhancing the model’s transparency and potential usability for clinical professionals. The study highlights the significance of explainable models in cardiovascular disease (CVD) prediction, promoting their acceptance in clinical settings due to the clarity of model decisions.

Acknowledgement

This work is funded by the project PerCard (Personalised Prognostics and Diagnostics for Improved Decision Support in Cardiovascular Diseases) in ERA PerMed supported by the Research Council of Finland (decision number 351846), under the frame of ERA PerMed; by the Federal Ministry of Education and Research, Germany [i.e. “Bundesministerium für Bildung und Forschung (BMBF)”], grant number 01KU2211; and by the Fondazione Regionale della Ricerca Biomedica (FRRB) under the frame of ERA PerMed.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
References
10. Cheng, Y., Ye, Y., Hou, M., He, W., Pan, T.: Multi-label arrhythmia classification from fixed-length compressed ECG segments in real-time wearable ECG monitoring. In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2020, pp. 580–583 (2020). https://doi.org/10.1109/EMBC44109.2020.9176188
18. Zhang, Y., Yu, J., Zhang, Y., Liu, C., Li, H.: A convolutional neural network for identifying premature ventricular contraction beat and right bundle branch block beat. Presented at the 2018 International Conference on Sensor Networks and Signal Processing (SNSP 2018) (2018). https://doi.org/10.1109/SNSP.2018.00037
20.
Metadata
Title: Enhancing Arrhythmia Diagnosis with Data-Driven Methods: A 12-Lead ECG-Based Explainable AI Model
Authors: Emmanuel C. Chukwu, Pedro A. Moreno-Sánchez
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-59091-7_16
