
2024 | Original Paper | Book Chapter

Interpreting Pretrained Language Models via Concept Bottlenecks

Authors: Zhen Tan, Lu Cheng, Song Wang, Bo Yuan, Jundong Li, Huan Liu

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. However, the lack of interpretability due to their “black-box” nature poses challenges for responsible deployment. Although previous studies have attempted to improve interpretability by using, e.g., attention weights in self-attention layers, these weights often lack clarity, readability, and intuitiveness. In this research, we propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable to humans. For example, we learn the concept of “Food” and investigate how it influences a model’s sentiment prediction for a restaurant review. We introduce C\(^3\)M, which combines human-annotated and machine-generated concepts to extract hidden neurons designed to encapsulate semantically meaningful and task-specific concepts. Through empirical evaluations on real-world datasets, we show that our approach offers valuable insights for interpreting PLM behavior, helps diagnose model failures, and enhances model robustness amid noisy concept labels.
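
To make the bottleneck idea concrete, below is a minimal PyTorch sketch of a concept bottleneck head placed on top of a PLM's pooled sentence representation. It is an illustrative sketch, not the authors' released implementation: the concept set (Food, Service, Ambiance, Noise), the loss weighting, and the random stand-in features used in place of real BERT embeddings are all assumptions made for the example.

# Hypothetical concept-bottleneck head over a PLM sentence embedding.
# `pooled` stands in for a PLM's [CLS]/pooled output; here it is random.
import torch
import torch.nn as nn

CONCEPTS = ["Food", "Service", "Ambiance", "Noise"]  # assumed concept set

class ConceptBottleneckHead(nn.Module):
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # One scalar activation per human-interpretable concept.
        self.concept_layer = nn.Linear(hidden_dim, len(CONCEPTS))
        # The task label is predicted from the concept activations alone,
        # so every prediction can be traced back to the concepts.
        self.label_layer = nn.Linear(len(CONCEPTS), num_labels)

    def forward(self, pooled: torch.Tensor):
        concept_logits = self.concept_layer(pooled)   # supervised with concept labels
        label_logits = self.label_layer(torch.sigmoid(concept_logits))
        return concept_logits, label_logits

# Toy usage with random features in place of real PLM embeddings.
pooled = torch.randn(2, 768)                          # batch of 2, hidden size 768
head = ConceptBottleneckHead(hidden_dim=768, num_labels=2)
concept_logits, label_logits = head(pooled)

# Joint objective: task loss plus concept supervision (the 0.5 weight is an
# illustrative choice, not a value from the paper).
concept_targets = torch.randint(0, 2, (2, len(CONCEPTS))).float()
label_targets = torch.randint(0, 2, (2,))
loss = (nn.functional.cross_entropy(label_logits, label_targets)
        + 0.5 * nn.functional.binary_cross_entropy_with_logits(concept_logits, concept_targets))
loss.backward()
print(concept_logits.shape, label_logits.shape)       # torch.Size([2, 4]) torch.Size([2, 2])

Because the task label is computed from the concept activations alone, inspecting head.label_layer.weight shows how strongly each concept, such as “Food”, pushes the sentiment prediction, which is the kind of concept-level explanation the abstract describes.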


Metadata
Title
Interpreting Pretrained Language Models via Concept Bottlenecks
Authors
Zhen Tan
Lu Cheng
Song Wang
Bo Yuan
Jundong Li
Huan Liu
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2259-4_5
