2024 | Original Paper | Book Chapter

Linguistic Steganalysis Based on Clustering and Ensemble Learning in Imbalanced Scenario

Written by: Shengnan Guo, Xuekai Chen, Zhuang Wang, Zhongliang Yang, Linna Zhou

Published in: Digital Forensics and Watermarking

Publisher: Springer Nature Singapore

Abstract

With the rapid development of the Internet, a growing number of text steganography methods have emerged. These methods are easily abused on public networks for malicious purposes, which poses a serious threat to cyberspace security. Many text steganalysis methods have been proposed to counter text steganography, but existing methods typically assume a balanced class distribution. In reality, stego texts are far fewer than cover texts, so accurately detecting stego texts among massive volumes of text remains a challenge. In this paper, we propose a text steganalysis method for imbalanced scenarios based on under-sampling and ensemble learning. Specifically, we use clustering to under-sample the majority class (cover texts) according to the detection difficulty of each sample, in order to select informative samples. Ensemble learning is then used to combine the detection results of multiple base classifiers and to guide the sampling process. We designed several experiments to test the detection performance of the proposed model. The experimental results show that the proposed model can effectively compensate for the deficiencies of existing methods: even on highly imbalanced datasets, it can still detect stego texts effectively.
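
The abstract describes the method only at a high level, so the following is a minimal Python sketch of the general idea and not the authors' implementation. It assumes numeric feature vectors instead of real text features, a toy nearest-centroid base classifier, and that the "detection difficulty" of a cover sample is the stego score the current ensemble assigns to it. Those difficulty values are grouped with a simple 1-D k-means, and an equal share of cover samples is drawn from each cluster. All names and parameters (`fit_base`, `kmeans_1d`, `undersample`, `n_models`, `k`) are hypothetical.

```python
import random

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fit_base(cover, stego):
    """Toy base classifier (assumption): nearest centroid.
    Returns a function mapping x to a stego score in [0, 1]."""
    c0, c1 = centroid(cover), centroid(stego)
    def score(x):
        d0, d1 = dist(x, c0), dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return score

def ensemble_score(models, x):
    """Average the scores of all base classifiers."""
    return sum(m(x) for m in models) / len(models)

def kmeans_1d(values, k, iters=20):
    """Cluster scalar difficulty values; returns one cluster index per value."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assign

def undersample(cover, difficulty, n_keep, k, rng):
    """Draw roughly n_keep cover samples, spread evenly over difficulty clusters."""
    assign = kmeans_1d(difficulty, k)
    clusters = [[x for x, a in zip(cover, assign) if a == j] for j in range(k)]
    clusters = [c for c in clusters if c]  # drop empty clusters
    per_cluster = max(1, n_keep // len(clusters))
    picked = []
    for c in clusters:
        picked.extend(rng.sample(c, min(per_cluster, len(c))))
    return picked[:n_keep]

def train(cover, stego, n_models=5, k=3, seed=0):
    """Build the ensemble; each new model sees covers chosen by current difficulty."""
    rng = random.Random(seed)
    # No difficulty estimate exists yet, so the first subset is purely random.
    subset = rng.sample(cover, min(len(stego), len(cover)))
    models = [fit_base(subset, stego)]
    for _ in range(n_models - 1):
        difficulty = [ensemble_score(models, x) for x in cover]
        subset = undersample(cover, difficulty, len(stego), k, rng)
        models.append(fit_base(subset, stego))
    return models

def predict(models, x, threshold=0.5):
    """Return 1 (stego) or 0 (cover) from the averaged ensemble score."""
    return 1 if ensemble_score(models, x) >= threshold else 0

# Synthetic, imbalanced toy data (20:1); real inputs would be text features.
data_rng = random.Random(42)
cover = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
stego = [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
models = train(cover, stego)
```

Each base classifier is trained on a balanced subset, so no single model is dominated by the cover class, while sampling across difficulty clusters keeps both easy and hard cover samples represented. How the paper actually defines difficulty, forms clusters, and weights them is not stated in the abstract.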

Metadata
Title
Linguistic Steganalysis Based on Clustering and Ensemble Learning in Imbalanced Scenario
Written by
Shengnan Guo
Xuekai Chen
Zhuang Wang
Zhongliang Yang
Linna Zhou
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2585-4_22