
2022 | OriginalPaper | Chapter

CodeDiff: A Malware Vulnerability Detection Tool Based on Binary File Similarity for Edge Computing Platform

Authors: Kang Wang, Longchuan Yan, Zihao Chu, Yonghe Guo, Yongji Liu, Lei Cui, Zhiyu Hao

Published in: Wireless Algorithms, Systems, and Applications

Publisher: Springer Nature Switzerland


Abstract

Malware detection has become a hot research topic as the Internet of Things (IoT) and edge computing have grown in popularity. In particular, many malware families exploit firmware vulnerabilities on hardware platforms, causing significant financial losses for both IoT users and edge platform providers. In this paper, we propose CodeDiff, a new approach to malware vulnerability detection on IoT and edge computing platforms based on binary file similarity detection. CodeDiff is an unsupervised learning method that uses both semantic and structural information for binary diffing and does not require labeled data. Using Skip-gram with Negative Sampling (SGNS), we generate word embeddings for the instruction vocabulary. A Graph AutoEncoder then embeds both the semantic and the structural information of the control-flow graph (CFG) into a representation matrix. Next, we use an Improved Graph AutoEncoder to fuse function structures, function characteristics, and function features into a fusion matrix. Finally, we propose a dedicated matrix comparison method that produces highly accurate similarity results in a short time. We evaluate the prototype on the OpenSSL and Curl binary datasets. The results show that CodeDiff performs well on binary file similarity detection, which helps identify vulnerabilities exploited by malware and improves the security of IoT platforms.
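The abstract names a Graph AutoEncoder over the CFG but does not specify its architecture. As a rough illustration only, the sketch below assumes the standard graph autoencoder of Kipf and Welling (2016): a two-layer GCN encoder over the symmetrized CFG and an inner-product decoder, written in plain Python with untrained random weights. The tanh activation (in place of ReLU), the mean-pool readout, the cosine comparison, and the toy CFGs and block features are all our assumptions; this is not CodeDiff's Improved Graph AutoEncoder or its matrix comparison method.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (adequate for toy CFGs with a few blocks)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetrize the directed CFG, add self-loops, apply D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a = [[max(adj[i][j], adj[j][i]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def random_weights(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Untrained Gaussian weights; a real model would learn these."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def encode(adj: Matrix, feats: Matrix, w0: Matrix, w1: Matrix) -> Matrix:
    """Two-layer GCN encoder: Z = A_hat . tanh(A_hat . X . W0) . W1."""
    a_hat = normalized_adjacency(adj)
    hidden = [[math.tanh(v) for v in row] for row in matmul(matmul(a_hat, feats), w0)]
    return matmul(matmul(a_hat, hidden), w1)


def decode(z: Matrix) -> Matrix:
    """Inner-product decoder: edge probability sigmoid(z_i . z_j)."""
    return [[1.0 / (1.0 + math.exp(-sum(p * q for p, q in zip(zi, zj)))) for zj in z]
            for zi in z]


def reconstruction_loss(adj: Matrix, recon: Matrix, eps: float = 1e-9) -> float:
    """Mean binary cross-entropy against the symmetrized CFG with self-loops."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            target = 1.0 if i == j or adj[i][j] or adj[j][i] else 0.0
            p = min(max(recon[i][j], eps), 1.0 - eps)
            total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return total / (n * n)


def graph_embedding(z: Matrix) -> list[float]:
    """Mean-pool block embeddings into one vector per function (assumed readout)."""
    return [sum(col) / len(z) for col in zip(*z)]


def cosine(u: list[float], v: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


rng = random.Random(0)
W0, W1 = random_weights(3, 8, rng), random_weights(8, 4, rng)

# Hypothetical 4-block CFG: entry -> {then, else} -> exit
cfg_a = [[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
# Hypothetical per-block features, e.g. averaged instruction embeddings
feats_a = [[0.2, 0.1, 0.7], [0.5, 0.3, 0.1], [0.4, 0.4, 0.2], [0.1, 0.9, 0.0]]
# A straight-line variant with the same blocks
cfg_b = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]

z_a = encode(cfg_a, feats_a, W0, W1)
z_b = encode(cfg_b, feats_a, W0, W1)
print("similarity(a, b):", cosine(graph_embedding(z_a), graph_embedding(z_b)))
print("reconstruction loss(a):", reconstruction_loss(cfg_a, decode(z_a)))
```

In a trained setting, the weights would presumably be fitted to minimize the reconstruction loss, and the block features would come from the SGNS instruction embeddings; with random weights, the similarity score above only shows the mechanics, not meaningful accuracy.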


Metadata
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-19211-1_42