2024 | OriginalPaper | Book chapter

On the Efficient Explanation of Outlier Detection Ensembles Through Shapley Values

Authors: Simon Klüttermann, Chiara Balestra, Emmanuel Müller

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore

Abstract

Feature bagging models have proven practical in many contexts, including outlier detection, where they build ensembles that reliably assign outlier scores to data samples. However, the resulting outlier detection methods remain hard to interpret. Shapley values are a standard approach to interpreting black-box models and clarify the role of each individual input. Their high computational cost, however, makes them practical only in fairly low-dimensional applications. We propose bagged Shapley values, a method for interpreting feature bagging ensembles, especially in outlier detection. The method assigns a local importance score to each feature of the original space, which improves interpretability, and it also resolves the computational issue: bagged Shapley values can be computed exactly in polynomial time.
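
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on assumptions that the abstract does not state: the ensemble score is taken to be the mean of the submodel scores, each submodel sees only a small bag of features, and an "absent" feature is replaced by a baseline value. The names `exact_shapley`, `bagged_shapley`, `baseline` and the toy scoring function are hypothetical. Under these assumptions, linearity of the Shapley value lets each submodel be explained over its own bag only, so a bag of size k costs on the order of 2^k evaluations instead of 2^d for all d features.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, players):
    """Exact Shapley values by enumerating every coalition of `players`.

    `value` maps a frozenset of players to a number. The cost grows as
    2**len(players), which is why this is only feasible for few players.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def bagged_shapley(submodels, x, baseline):
    """Average per-submodel Shapley values over a feature bagging ensemble.

    Assumption (not taken from the paper): the ensemble score is the mean
    of the submodel scores, and absent features are set to `baseline`.
    `submodels` is a list of (feature_indices, score_fn) pairs, where
    score_fn takes a dict {feature_index: value}.
    """
    phi = [0.0] * len(x)
    for features, score in submodels:
        def value(present, features=features, score=score):
            inputs = {f: (x[f] if f in present else baseline[f]) for f in features}
            return score(inputs)

        # Only the features in this bag are players; all others get 0 here.
        local = exact_shapley(value, list(features))
        for f, v in local.items():
            phi[f] += v / len(submodels)
    return phi


# Toy outlier score (hypothetical): squared distance from the baseline.
def toy_score(inputs):
    return sum(v ** 2 for v in inputs.values())


x = [3.0, 0.0, 1.0, 0.0]
baseline = [0.0, 0.0, 0.0, 0.0]
bags = [(0, 1), (1, 2), (2, 3), (0, 3)]
submodels = [(bag, toy_score) for bag in bags]

phi = bagged_shapley(submodels, x, baseline)
print(phi)  # [4.5, 0.0, 0.5, 0.0]
```

In this toy case the attributions sum to 5.0, which equals the mean submodel score at x minus the mean submodel score at the baseline, as the efficiency property of Shapley values requires. Feature 0 receives most of the importance because it deviates most from the baseline.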

Footnotes
2
All experiments were performed on Intel Xeon E5 CPUs. In the paper, we use CPUs rather than GPUs even when the submodels are neural networks; the choice is justified by the higher degree of parallelization that CPUs allow.
 
3
The isolation forest takes about \(220\,\text{min}\) of CPU time. DEAN requires about \(113\,\text{days}\); however, the independent ensembles are easy to parallelize, and less accurate results can already be achieved with ten thousand submodels (\(27\,\text{hours}\)).
 
Metadata
Title
On the Efficient Explanation of Outlier Detection Ensembles Through Shapley Values
Authors
Simon Klüttermann
Chiara Balestra
Emmanuel Müller
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2259-4_4
