TY  - EJOU
AU  - Albahar, Marwan 
AU  - Thanoon, Mohammed 
AU  - Alzilai, Monaj 
AU  - Alrehily, Alaa 
AU  - Alfaar, Munirah 
AU  - Algamdi, Maimoona 
AU  - Alassaf, Norah 

TI  - Toward Robust Classifiers for PDF Malware Detection
T2  - Computers, Materials & Continua

PY  - 2021
VL  - 69
IS  - 2
SN  - 1546-2226

AB  - Malicious Portable Document Format (PDF) files represent one of the largest threats in the computer security space. Significant research has been done on detection using handwritten signatures and machine learning based on manual feature extraction. These approaches are time-consuming and require substantial prior knowledge, and the list of features must be updated individually with each newly discovered vulnerability. In this study, we propose two models for PDF malware detection. The first model is a convolutional neural network (CNN) integrated into a standard deviation based regularization model to detect malicious PDF documents. The second model is a support vector machine (SVM) based ensemble model with three different kernels. The two models were trained and tested on two different datasets. The experimental results show that the accuracy of both models is approximately 100%, and the robustness against evasive samples is excellent. Further, the robustness of the models was evaluated with malicious PDF documents generated using Mimicus. Both models can distinguish the different vulnerabilities exploited in malicious files and achieve excellent performance in terms of generalization ability, accuracy, and robustness.
KW  - Malicious PDF classification; robustness; guiding principles; convolutional neural network; new regularization

DO  - 10.32604/cmc.2021.018260