A Learning-based Static Malware Detection System with Integrated Feature

Zhiguo Chen; Xiaorui Zhang; Sungryul Kim

doi:10.32604/iasc.2021.016933

Open Access

ARTICLE

Zhiguo Chen^1,*, Xiaorui Zhang^1,2, Sungryul Kim^3

1 School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China
2 Jiangsu Engineering Center of Network Monitoring, Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing, 210044, China
3 Department of Internet and Multimedia Engineering, Konkuk University, Seoul, 05029, Korea

* Corresponding Author: Zhiguo Chen. Email: email

Intelligent Automation & Soft Computing 2021, 27(3), 891-908. https://doi.org/10.32604/iasc.2021.016933

Received 01 January 2021; Accepted 08 February 2021; Issue published 01 March 2021

Abstract

The rapid growth of malware poses a significant threat to the security of computer systems. Analysts now need to examine thousands of malware samples daily, and determining whether a program is benign or malicious has become a challenging task. Because anti-malware products depend on making this decision accurately, precise malware detection has become an active topic in computer security. Traditional malware detection relies on signature-based strategies, the most widespread method in commercial anti-malware software. This method works well against known malware but cannot detect new malware. To overcome this deficiency, we propose a static malware detection system that uses data mining techniques to identify known and unknown malware by comparing the profiles of malware and benign programs, with real-time response and a low false-positive rate. The proposed system includes a sample labeling module, a feature extraction module, a pre-processing module, and a decision module. The sample labeling module uses VirusTotal to label the collected samples correctly. The feature extraction module statically extracts header information, section entropy, APIs, and section opcode n-grams. The pre-processing module is primarily based on the PCA algorithm, which reduces the dimensionality of the features and thereby the computational overhead. The decision module uses machine-learning algorithms such as K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting Decision Tree (GBDT), and Extreme Gradient Boosting (XGBoost) to build a detection model that judges whether a program is benign or malware. The experimental results indicate that the proposed system achieves 99.56% detection accuracy and a 99.55% F1-score on the 79 extracted features using the XGBoost algorithm, and that it has the potential for real-time, large-scale malware detection tasks.
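Two of the static features named in the abstract, section entropy and opcode n-grams, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `shannon_entropy` and `opcode_ngrams` are hypothetical, the entropy is assumed to be the usual Shannon entropy over byte values, and the opcode list is assumed to come from a disassembler that is not shown here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0).

    Assumption: "section entropy" means the entropy of a section's raw bytes.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def opcode_ngrams(opcodes, n=2):
    """Count contiguous n-grams over a sequence of opcode mnemonics."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


# Hypothetical inputs for illustration only.
uniform_section = bytes(range(256))   # every byte value exactly once
padding_section = bytes(64)           # 64 zero bytes
opcodes = ["push", "mov", "call", "push", "mov", "ret"]

high = shannon_entropy(uniform_section)   # maximal entropy
low = shannon_entropy(padding_section)    # no variation at all
bigrams = opcode_ngrams(opcodes, n=2)
```

High entropy in a section is often associated with packed or encrypted content, which is one plausible reason to include it as a feature; the n-gram counts would typically be turned into a fixed-length vector before dimensionality reduction and classification.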

Keywords

Static analysis; malware detection; machine learning; computer security; principal component analysis

Cite This Article

APA Style

Chen, Z., Zhang, X., & Kim, S. (2021). A Learning-based Static Malware Detection System with Integrated Feature. Intelligent Automation & Soft Computing, 27(3), 891–908. https://doi.org/10.32604/iasc.2021.016933

Vancouver Style

Chen Z, Zhang X, Kim S. A Learning-based Static Malware Detection System with Integrated Feature. Intell Automat Soft Comput. 2021;27(3):891–908. https://doi.org/10.32604/iasc.2021.016933

IEEE Style

Z. Chen, X. Zhang, and S. Kim, “A Learning-based Static Malware Detection System with Integrated Feature,” Intell. Automat. Soft Comput., vol. 27, no. 3, pp. 891–908, 2021. https://doi.org/10.32604/iasc.2021.016933

Copyright © 2021 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
