ABSTRACT
Malware, short for malicious software, continues to grow in volume and sophistication as our digital world expands. It is a serious problem, and much effort in cybersecurity is devoted to malware detection. In recent years, many machine learning algorithms have been used to detect malware automatically. Most recently, deep learning has been applied with better performance; deep learning models have been shown to work much better in the analysis of long sequences of system calls. In this paper, a shallow deep learning-based feature extraction method (word2vec) is used to represent any given malware sample based on its opcodes. A gradient boosting algorithm is used for the classification task, and k-fold cross-validation is used to validate model performance without sacrificing a validation split. Evaluation results show up to 96% accuracy with limited sample data.
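The k-fold validation step mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the opcode sequences and labels are fabricated toy data, a relative opcode histogram stands in for the word2vec features, and a nearest-centroid rule stands in for gradient boosting. All function names (`k_fold_indices`, `opcode_histogram`, `fit_centroids`, `predict`, `cross_validate`) are hypothetical. The point is only that every sample serves as validation data exactly once, so no separate validation split has to be held back.

```python
from collections import defaultdict


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def opcode_histogram(opcodes, vocabulary):
    """Stand-in feature extractor: relative opcode frequencies (not word2vec)."""
    total = len(opcodes) or 1
    return [opcodes.count(op) / total for op in vocabulary]


def fit_centroids(features, labels):
    """Stand-in classifier (not gradient boosting): one mean vector per class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in tot] for lab, tot in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sqdist(centroids[lab], vec))


def cross_validate(features, labels, k):
    """Train on k-1 folds, score on the held-out fold, and average the accuracies."""
    accuracies = []
    for train, val in k_fold_indices(len(features), k):
        model = fit_centroids([features[i] for i in train],
                              [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in val)
        accuracies.append(correct / len(val))
    return sum(accuracies) / len(accuracies)


# Fabricated toy data: classes are interleaved because no shuffling is done.
VOCAB = ["mov", "push", "call", "xor", "jmp"]
SAMPLES = [
    (["mov", "push", "call", "mov"], "benign"),
    (["xor", "xor", "jmp", "xor"], "malware"),
    (["push", "mov", "call", "call"], "benign"),
    (["jmp", "xor", "xor", "jmp"], "malware"),
    (["mov", "mov", "push", "call"], "benign"),
    (["xor", "jmp", "jmp", "xor"], "malware"),
]
FEATURES = [opcode_histogram(ops, VOCAB) for ops, _ in SAMPLES]
LABELS = [label for _, label in SAMPLES]

mean_accuracy = cross_validate(FEATURES, LABELS, k=3)
print(f"mean accuracy over 3 folds: {mean_accuracy:.2f}")
```

In practice the folds would normally be shuffled (and often stratified by class) before splitting; the sketch omits this to stay short, which is why the toy samples alternate between the two classes.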
Index Terms
- Malware classification using deep learning methods