2024 | OriginalPaper | Book chapter

PDMTT: A Plagiarism Detection Model Towards Multi-turn Text Back-Translation

Written by: Xiaoling He, Yuanding Zhou, Chuan Qin, Zhenxing Qian, Xinpeng Zhang

Published in: Digital Forensics and Watermarking

Publisher: Springer Nature Singapore

Abstract

With the development of communication technologies, the practice of creating new texts by manipulating original sentence structures through multi-turn machine translation is widespread across various domains. Existing plagiarism detection models often treat different features uniformly and overlook the significance of disparities within high-dimensional features. Therefore, this paper proposes a novel plagiarism detection model towards multi-turn text back-translation (PDMTT), adopting a novel mechanism that combines local and global features and enhances them. The grouping enhancement fusion (GEF) mechanism assigns importance coefficients to sub-features, reinforcing critical aspects while diminishing less relevant ones. These enhanced features, generated by the GEF mechanism, are leveraged to extract high-quality text representations, thereby improving the precision of the model in distinguishing original content from back-translated texts. Furthermore, we improve the back-translation plagiarism detection capability of our model by optimizing the contrastive loss function and utilizing the fused translated representations as targets. To validate the effectiveness of our model, we also constructed a multi-tuple back-translation plagiarism dataset for model training and validation. Experimental results demonstrate that the proposed PDMTT outperforms previous methods in back-translation plagiarism detection, yielding superior text representations. The ablation study further confirms that the incorporation of the GEF mechanism effectively enhances the discrimination capability of our model.
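The abstract names two ingredients without giving formulas: a grouping step that assigns importance coefficients to sub-features, and a contrastive objective. The following is a minimal Python sketch of how such pieces are commonly built, not the authors' implementation. The function names `group_enhance` and `info_nce`, the sigmoid gating, the per-group normalization, and the temperature value are all assumptions chosen for illustration; the paper's GEF mechanism and its optimized loss may differ in every one of these details.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def group_enhance(tokens, num_groups, eps=1e-5):
    """Reweight sub-features group by group (illustrative, hypothetical).

    tokens: list of local feature vectors, all of the same length d.
    Each vector is split into num_groups channel groups. Within a group,
    the mean over tokens acts as a global descriptor, and every token's
    sub-feature is scaled by a coefficient in (0, 1) derived from its
    similarity to that descriptor.
    """
    d = len(tokens[0])
    if d % num_groups != 0:
        raise ValueError("feature size must be divisible by num_groups")
    size = d // num_groups
    out = [list(t) for t in tokens]
    for g in range(num_groups):
        lo, hi = g * size, (g + 1) * size
        subs = [t[lo:hi] for t in tokens]
        # Global descriptor of this group: mean sub-feature over all tokens.
        glob = [sum(col) / len(subs) for col in zip(*subs)]
        scores = [_dot(s, glob) for s in subs]
        mean = sum(scores) / len(scores)
        std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
        for i, s in enumerate(scores):
            # Importance coefficient: normalized score passed through a sigmoid.
            coeff = _sigmoid((s - mean) / (std + eps))
            out[i][lo:hi] = [v * coeff for v in subs[i]]
    return out


def cosine(a, b, eps=1e-12):
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)) + eps)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """Standard InfoNCE contrastive loss for one anchor (assumed form)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


if __name__ == "__main__":
    feats = [[1.0, 0.0, 2.0, 1.0], [0.5, 0.5, -1.0, 0.0], [0.9, 0.1, 2.0, 1.0]]
    print(group_enhance(feats, num_groups=2))
    print(info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))
```

In this sketch the coefficients only ever shrink a sub-feature, so sub-features that agree with their group's global descriptor are kept nearly intact while the others are damped. The loss is small when the anchor is closer to its positive (for example, a back-translated version of the same text) than to the negatives.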

Metadata
Title
PDMTT: A Plagiarism Detection Model Towards Multi-turn Text Back-Translation
Written by
Xiaoling He
Yuanding Zhou
Chuan Qin
Zhenxing Qian
Xinpeng Zhang
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2585-4_6