2024 | OriginalPaper | Book chapter

FakeClaim: A Multiple Platform-Driven Dataset for Identification of Fake News on 2023 Israel-Hamas War

Written by: Gautam Kishore Shahi, Amit Kumar Jaiswal, Thomas Mandl

Published in: Advances in Information Retrieval

Publisher: Springer Nature Switzerland

Abstract

We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data were collected from 60 fact-checking organizations in 30 languages and enriched with metadata from those organizations, curated by trained journalists who specialize in fact-checking. Further, we classify fake videos within the subset of YouTube videos using textual information and user comments. We use a pre-trained model to classify each video with different feature combinations. Our best-performing fine-tuned language model, Universal Sentence Encoder (USE), achieves a Macro F1 of 87%, which suggests that the trained model can help debunk fake videos using the comments from the user discussion.
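For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro F1 score is computed: the F1 score is calculated separately for each class and then averaged without weighting by class size. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the chapter or from the FakeClaim data.

```python
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for four videos (not real dataset entries).
y_true = ["fake", "fake", "fake", "real"]
y_pred = ["fake", "fake", "real", "real"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, macro F1 penalizes a classifier that performs well only on the majority class, which is why it is a common choice for imbalanced fake/real data.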

Metadata
Title
FakeClaim: A Multiple Platform-Driven Dataset for Identification of Fake News on 2023 Israel-Hamas War
Authors
Gautam Kishore Shahi
Amit Kumar Jaiswal
Thomas Mandl
Copyright year
2024
DOI
https://doi.org/10.1007/978-3-031-56069-9_5
