2024 | OriginalPaper | Chapter

VulMAE: Graph Masked Autoencoders for Vulnerability Detection from Source and Binary Codes

Authors : Mahmoud Zamani, Saquib Irtiza, Latifur Khan, Kevin W. Hamlen

Published in: Foundations and Practice of Security

Publisher: Springer Nature Switzerland


Abstract

The first graph masked autoencoder (GraphMAE) model for software vulnerability detection is designed and developed, and it is evaluated against other self-supervised learning (SSL) methods. The domain-specific GraphMAE model (VulMAE) shows exceptional promise on the vulnerability detection task, outperforming all other baseline models in the study. The approach is particularly well-suited for cybersecurity applications, where gathering substantial real-world labeled samples is difficult: graph SSL methods (e.g., contrastive and generative models) can be trained effectively for classification without requiring vast amounts of labeled data.
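To make the masked-autoencoder idea concrete, the following is a minimal sketch of the general GraphMAE recipe: hide the features of a random subset of nodes, then score a reconstruction only on the hidden nodes with a scaled cosine error. It is not VulMAE's implementation. The GNN encoder and decoder are omitted, and the function names, the all-zero mask token, the 50% mask rate, and the toy feature values are all assumptions chosen for illustration.

```python
import math
import random


def mask_node_features(features, mask_rate, mask_token, rng):
    """Replace a random subset of node feature vectors with a mask token."""
    num_nodes = len(features)
    num_masked = max(1, int(round(mask_rate * num_nodes)))
    masked_ids = sorted(rng.sample(range(num_nodes), num_masked))
    masked_set = set(masked_ids)
    corrupted = [
        list(mask_token) if i in masked_set else list(row)
        for i, row in enumerate(features)
    ]
    return corrupted, masked_ids


def scaled_cosine_error(original, reconstructed, masked_ids, gamma=2.0):
    """Mean of (1 - cosine similarity) ** gamma over the masked nodes only."""
    total = 0.0
    for i in masked_ids:
        x, z = original[i], reconstructed[i]
        dot = sum(a * b for a, b in zip(x, z))
        norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in z))
        cos = dot / norm if norm > 0 else 0.0
        total += (1.0 - cos) ** gamma
    return total / len(masked_ids)


# Toy graph: four nodes with 3-dimensional features (hypothetical values).
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
rng = random.Random(0)
corrupted, masked_ids = mask_node_features(features, 0.5, [0.0, 0.0, 0.0], rng)

# In a real model, `corrupted` would pass through a GNN encoder and decoder.
# Here the original features stand in for a perfect reconstruction.
perfect_loss = scaled_cosine_error(features, features, masked_ids)
```

In a trained model the loss would be minimized over the encoder and decoder parameters, and the encoder's embeddings would then feed a downstream vulnerability classifier.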

The study fills a key gap in the literature on automated and machine-assisted discovery and patching of software security vulnerabilities. This task has become increasingly critical with the dramatic increase in modern software complexity, yet graph neural network (GNN) approaches to it remain understudied relative to traditional processes such as manual source code auditing and fuzzing. The evaluation applies the models to source and binary software components drawn from the National Vulnerability Database (NVD). A new dataset is curated by extracting vulnerable code fragments from six applications with NVD-documented security flaws and converting them to four graph types using specialized tools based on code property graphs and binary semantics lifting. The data is used to train contrastive and generative learning models for comparison. VulMAE achieves a weighted F1 score of 0.936 and a weighted recall of 0.938, the highest among all tested methods.
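For readers unfamiliar with the reported metrics, the sketch below shows how weighted recall and weighted F1 are conventionally computed: per-class scores are averaged with weights proportional to each class's number of true samples. The labels and predictions are hypothetical toy data (1 for vulnerable, 0 for benign), not results from the study, and the function name is an assumption.

```python
from collections import Counter


def weighted_recall_and_f1(y_true, y_pred):
    """Support-weighted averages of per-class recall and F1."""
    support = Counter(y_true)
    total = len(y_true)
    weighted_recall = 0.0
    weighted_f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = count / total  # share of true samples in this class
        weighted_recall += weight * recall
        weighted_f1 += weight * f1
    return weighted_recall, weighted_f1


# Hypothetical toy labels: six benign (0) and four vulnerable (1) samples.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
recall, f1 = weighted_recall_and_f1(y_true, y_pred)
print(f"weighted recall={recall:.3f}, weighted F1={f1:.3f}")
```

Weighting by support keeps the majority class from being ignored on imbalanced data, and weighted recall computed this way equals overall accuracy.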


https://github.com/Saquibirtiza/VulMAE.git.


Title: VulMAE: Graph Masked Autoencoders for Vulnerability Detection from Source and Binary Codes
Authors: Mahmoud Zamani, Saquib Irtiza, Latifur Khan, Kevin W. Hamlen
Publisher: Springer Nature Switzerland
Book: Foundations and Practice of Security
Print ISBN: 978-3-031-57536-5
Electronic ISBN: 978-3-031-57537-2
Copyright Year: 2024
DOI: https://doi.org/10.1007/978-3-031-57537-2_12
