
2024 | Original Paper | Book Chapter

SALSA: Salience-Based Switching Attack for Adversarial Perturbations in Fake News Detection Models

Authors: Chahat Raj, Anjishnu Mukherjee, Hemant Purohit, Antonios Anastasopoulos, Ziwei Zhu

Published in: Advances in Information Retrieval

Publisher: Springer Nature Switzerland


Abstract

Despite advances in fake news detection algorithms, recent research reveals that machine learning-based fake news detection models remain vulnerable to carefully crafted adversarial attacks. In this landscape, traditional methods, which often rely on text perturbations or heuristic-based approaches, have proven insufficient, revealing a critical need for more nuanced and context-aware strategies to enhance the robustness of fake news detection. Our research identifies and addresses three critical areas: creating subtle perturbations, preserving core information while modifying sentence structure, and incorporating inherent interpretability. We propose SALSA, an adversarial Salience-based Switching Attack strategy that harnesses salient words, using similarity-based switching to address the shortcomings of traditional adversarial attack methods. Using SALSA, we perform a two-way attack: misclassifying real news as fake and fake news as real. Due to the absence of standardized metrics for evaluating adversarial attacks in fake news detection, we further propose three new evaluation metrics to gauge the attack's success. Finally, we validate the transferability of our proposed attack strategy across attacker and victim models, demonstrating our approach's broad applicability and potency. Code and data are available at https://github.com/iamshnoo/salsa.
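The abstract describes a general pattern: estimate which words are most salient to a detector's decision, then switch those words for similar replacements. The following is a hypothetical minimal sketch of that pattern, not the authors' implementation: a toy trigger-word scorer stands in for a fake-news classifier, salience is estimated by leave-one-out score change, and the substitution table is an invented stand-in for the paper's similarity-based switching.

```python
# Hypothetical sketch of a salience-based switching attack.
# classifier_score is a toy stand-in for a detector's "fake" probability;
# the substitution table stands in for similarity-based word switching.

def classifier_score(words):
    """Toy detector: fraction of words that are sensational triggers."""
    triggers = {"shocking", "miracle", "exposed"}
    return sum(1 for w in words if w in triggers) / max(len(words), 1)

def salience(words):
    """Leave-one-out salience: score drop when each word is removed."""
    base = classifier_score(words)
    return [base - classifier_score(words[:i] + words[i + 1:])
            for i in range(len(words))]

def switch_attack(sentence, substitutions):
    """Replace the most salient word that has a similar substitute."""
    words = sentence.lower().split()
    sal = salience(words)
    # Visit words from most to least salient; switch the first one
    # for which a replacement is available, then stop (a subtle,
    # single-word perturbation).
    for i in sorted(range(len(words)), key=lambda i: sal[i], reverse=True):
        if words[i] in substitutions:
            words[i] = substitutions[words[i]]
            break
    return " ".join(words)

subs = {"shocking": "surprising", "miracle": "remarkable",
        "exposed": "described"}
print(switch_attack("Shocking miracle cure exposed by doctors", subs))
# → surprising miracle cure exposed by doctors
```

In this sketch the single switch lowers the toy detector's score while leaving most of the sentence intact, which mirrors the paper's stated goals of subtle perturbation and core-information preservation; the real attack would use model-derived salience (e.g. attribution scores) and embedding similarity rather than a fixed table.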


Metadata
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-56069-9_3
