2024 | Original Paper | Book Chapter

PDMTT: A Plagiarism Detection Model Towards Multi-turn Text Back-Translation

Authors: Xiaoling He, Yuanding Zhou, Chuan Qin, Zhenxing Qian, Xinpeng Zhang

Published in: Digital Forensics and Watermarking

Publisher: Springer Nature Singapore

Abstract

With the development of communication technologies, creating new texts by restructuring the sentences of an original through multi-turn machine translation has become widespread across many domains. Existing plagiarism detection models often treat all features uniformly and overlook differences in the importance of high-dimensional features. This paper therefore proposes PDMTT, a plagiarism detection model towards multi-turn text back-translation, which combines local and global features and then enhances them. Its grouping enhancement fusion (GEF) mechanism assigns an importance coefficient to each sub-feature, reinforcing critical aspects while suppressing less relevant ones. The enhanced features produced by GEF are used to extract high-quality text representations, which improves the model's precision in distinguishing original content from back-translated text. We further strengthen back-translation plagiarism detection by optimizing the contrastive loss function and using the fused representations of the translated texts as targets. To train and validate the model, we also construct a multi-tuple back-translation plagiarism dataset. Experimental results show that PDMTT outperforms previous methods in back-translation plagiarism detection and yields better text representations. An ablation study further confirms that incorporating the GEF mechanism improves the model's discrimination capability.
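The abstract describes GEF and the contrastive objective only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes GEF behaves like Spatial Group-wise Enhance (SGE, Li et al., 2019): each token vector is split into groups, and every group is scaled by a sigmoid-squashed importance coefficient derived from its similarity to the group's global (mean) descriptor. It also assumes the "fused" translated representation is the plain average of the back-translation embeddings, used as the positive in an InfoNCE-style contrastive loss with in-batch negatives. All names (`group_enhance`, `mean_pool`, `cosine`, `fused_target_loss`) and parameters (`groups`, `gamma`, `beta`, `tau`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def group_enhance(tokens: List[Vector], groups: int, gamma: float = 1.0,
                  beta: float = 0.0, eps: float = 1e-5) -> List[Vector]:
    """Hypothetical SGE-style group enhancement over a token sequence.

    Not the authors' GEF: a sketch assuming each feature group is re-weighted
    per token by how strongly it agrees with the group's global descriptor.
    """
    n, dim = len(tokens), len(tokens[0])
    if dim % groups != 0:
        raise ValueError("feature dimension must be divisible by groups")
    size = dim // groups
    out = [list(t) for t in tokens]
    for g in range(groups):
        lo, hi = g * size, (g + 1) * size
        subs = [t[lo:hi] for t in tokens]
        # Global descriptor of this group: mean sub-feature over all tokens.
        glob = [sum(s[d] for s in subs) / n for d in range(size)]
        # Raw importance of each token's sub-feature: dot product with the descriptor.
        coef = [sum(a * b for a, b in zip(glob, s)) for s in subs]
        mu = sum(coef) / n
        std = math.sqrt(sum((c - mu) ** 2 for c in coef) / n)
        for i, c in enumerate(coef):
            # Normalise across tokens, then squash to (0, 1). With gamma > 0 and
            # beta = 0, above-average sub-features get weights > 0.5, others < 0.5.
            z = gamma * (c - mu) / (std + eps) + beta
            weight = 1.0 / (1.0 + math.exp(-z))
            for d in range(lo, hi):
                out[i][d] = tokens[i][d] * weight
    return out


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of vectors (token pooling, or fusing several translations)."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def fused_target_loss(originals: List[Vector], translations: List[List[Vector]],
                      tau: float = 0.05) -> float:
    """InfoNCE loss: original i's positive is the fused embedding of its own
    back-translations; the fused targets of the other samples act as negatives."""
    targets = [mean_pool(t) for t in translations]  # assumed fusion: plain average
    total = 0.0
    for i, h in enumerate(originals):
        logits = [cosine(h, f) / tau for f in targets]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(originals)
```

Under these assumptions, a sentence embedding would be `mean_pool(group_enhance(tokens, groups=4))`. In a trained model, `gamma` and `beta` would be learned per group, as in SGE, rather than fixed constants, and the averaging fusion could be replaced by whatever fusion the chapter actually uses.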

Print ISBN: 978-981-9725-84-7
Electronic ISBN: 978-981-9725-85-4
Copyright year: 2024
DOI: https://doi.org/10.1007/978-981-97-2585-4_6
