research-article

Malware classification using deep learning methods

Authors:
Bugra Cakir, BearTell, Inc., Ankara, Turkey
Erdogan Dogdu, Cankaya University, Ankara, Turkey

ACMSE '18: Proceedings of the ACMSE 2018 Conference, March 2018, Article No.: 10, Pages 1–5, https://doi.org/10.1145/3190645.3190692

Published: 29 March 2018

ABSTRACT

Malware, short for malicious software, is growing continuously in number and sophistication as our digital world continues to grow. It is a very serious problem, and many efforts are devoted to malware detection in today's cybersecurity world. In recent years, many machine learning algorithms have been used for the automatic detection of malware. Most recently, deep learning has been applied with better performance. Deep learning models have been shown to work much better in the analysis of long sequences of system calls. In this paper, a shallow deep learning-based feature extraction method (word2vec) is used to represent any given malware sample based on its opcodes. A Gradient Boosting algorithm is used for the classification task. Then, k-fold cross-validation is used to validate the model performance without sacrificing a validation split. Evaluation results show up to 96% accuracy with limited sample data.
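The k-fold validation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `k_fold_indices`, `cross_val_accuracy`, and `majority_baseline` are hypothetical, the feature vectors stand in for per-sample word2vec opcode representations, and the majority-class baseline is only a placeholder where a gradient boosting classifier would be plugged in. The folds here are contiguous and unshuffled; a real experiment would likely shuffle or stratify first.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
FitPredict = Callable[[List[Vector], List[int], List[Vector]], List[int]]


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs.

    Every sample appears in exactly one test fold, so no data has to be
    permanently set aside as a separate validation split.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def cross_val_accuracy(X: List[Vector], y: List[int], k: int,
                       fit_predict: FitPredict) -> float:
    """Return the mean test accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_baseline(X_train: List[Vector], y_train: List[int],
                      X_test: List[Vector]) -> List[int]:
    """Placeholder classifier (assumption): always predict the most common
    training label. A gradient boosting model would replace this."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


if __name__ == "__main__":
    # Toy feature vectors and labels, invented purely for illustration.
    X = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.8], [0.2, 0.2], [0.8, 0.9], [0.1, 0.3]]
    y = [0, 0, 1, 0, 1, 0]
    print(cross_val_accuracy(X, y, k=3, fit_predict=majority_baseline))
```

Passing the classifier as a callable keeps the splitting logic independent of the model, so the same loop could evaluate any classifier that exposes a fit-then-predict step.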

Index Terms

Malware classification using deep learning methods
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation

Published in
ACMSE '18: Proceedings of the ACMSE 2018 Conference
March 2018
246 pages
ISBN: 9781450356961
DOI: 10.1145/3190645
Conference Chair: Ka-Wing Wong, Eastern Kentucky University
Program Chair: Chi Shen, Kentucky State University
Publications Chair: Dana Brown, Bluegrass Community & Technical College
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
classification
deep learning
machine learning
malware detection
supervised learning
Qualifiers
- research-article
Acceptance Rates
ACMSE '18 Paper Acceptance Rate: 34 of 41 submissions, 83%
Overall Acceptance Rate: 178 of 377 submissions, 47%
Article Metrics
- Total Citations: 50
- Total Downloads: 1,350
- Downloads (Last 12 months): 131
- Downloads (Last 6 weeks): 23