Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions

Minh Vo; Anh Vo; Trang Nguyen; Rohit Sharma; Tuong Le

doi:10.32604/cmc.2021.015645

Open Access

ARTICLE

Minh Thanh Vo¹, Anh H. Vo², Trang Nguyen³, Rohit Sharma⁴, Tuong Le²,⁵,*

1 Faculty of Information Technology, Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Vietnam
2 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Faculty of Information Technology, Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
4 Department of Electronics & Communication Engineering, SRM Institute of Science and Technology, NCR Campus, Ghaziabad, India
5 Informetrics Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam

* Corresponding Author: Tuong Le.

Computers, Materials & Continua 2021, 68(1), 521-535. https://doi.org/10.32604/cmc.2021.015645

Received 01 December 2020; Accepted 02 February 2021; Issue published 22 March 2021

Abstract

In recent years, detecting fake job descriptions has become increasingly necessary because social networking has changed how people access the rapidly growing volume of information online. Identifying fraudulent job descriptions helps jobseekers avoid many of the risks of job hunting. However, this detection task suffers from class imbalance: the number of genuine jobs far exceeds the number of fake jobs, which reduces the predictive performance of traditional machine learning models. We therefore present an efficient framework called FJD-OT (Fake Job Description Detection Using Oversampling Techniques), which uses an oversampling technique to improve the detection of fake job descriptions. In the first module of the proposed framework, we preprocess the text data using several techniques, including stop-word removal and tokenization. In the second module, we use a bag-of-words representation in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract features from the text and build the feature dataset. Next, the framework applies k-fold cross-validation, a commonly used technique for testing the effectiveness of machine learning models, which splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method balances the data before the classifiers are trained in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset, as measured by several popular performance metrics.
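
The module order described in the abstract (preprocess, extract TF-IDF features, split with k-fold cross-validation, oversample only the training set) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The toy corpus, the stop-word list, and every function name are hypothetical, and plain random duplication of minority rows stands in for SVMSMOTE, which instead synthesizes new minority samples near an SVM decision boundary. The classifier module is omitted.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus (NOT the ESA dataset): label 1 = fake, 0 = genuine.
TEXTS = [
    "Software engineer for a payments team",
    "Accountant needed in the finance department",
    "Nurse for the night shift in a city hospital",
    "Warehouse operator and forklift driver",
    "Earn cash weekly, no experience, send bank details",
    "Work from home, earn cash fast, no interview",
]
LABELS = [0, 0, 0, 0, 1, 1]
STOP_WORDS = {"a", "an", "the", "and", "for", "in", "from", "no"}  # illustrative only

def tokenize(text):
    """Module 1: lowercase, strip punctuation, drop stop words."""
    tokens = (w.strip(".,!?;:").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def fit_idf(token_docs):
    """Module 2a: smoothed IDF per term, ln((1 + n) / (1 + df)) + 1."""
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}

def tfidf_vector(tokens, idf, vocab):
    """Module 2b: bag-of-words counts weighted by IDF; unseen terms are ignored."""
    counts = Counter(tokens)
    return [counts[t] * idf[t] for t in vocab]

def oversample(X, y, rng):
    """Module 3 stand-in: duplicate random minority rows until classes match."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [x for x, label in zip(X, y) if label == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit

def balanced_folds(texts, labels, k=3, seed=0):
    """Yield (X_train_balanced, y_train_balanced, X_test, y_test) per fold."""
    rng = random.Random(seed)
    idx = list(range(len(texts)))
    rng.shuffle(idx)
    for test_idx in (idx[i::k] for i in range(k)):
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_tokens = [tokenize(texts[i]) for i in train_idx]
        idf = fit_idf(train_tokens)  # fitted on the training fold only
        vocab = sorted(idf)
        X_train = [tfidf_vector(t, idf, vocab) for t in train_tokens]
        y_train = [labels[i] for i in train_idx]
        X_test = [tfidf_vector(tokenize(texts[i]), idf, vocab) for i in test_idx]
        y_test = [labels[i] for i in test_idx]
        # Only the training fold is rebalanced; the test fold keeps its
        # original class distribution so the evaluation stays realistic.
        X_bal, y_bal = oversample(X_train, y_train, rng)
        yield X_bal, y_bal, X_test, y_test

if __name__ == "__main__":
    for X_bal, y_bal, X_test, y_test in balanced_folds(TEXTS, LABELS):
        print("train classes:", dict(Counter(y_bal)), "| test size:", len(y_test))
```

The design choice worth noting is where the oversampling happens: it runs inside each fold, after the split, so duplicated or synthetic minority samples never leak into the test set. A simple shuffle is used here rather than a stratified split, so on a corpus this small a training fold may occasionally contain a single class, in which case it is left unchanged.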

Keywords

Fake job description detection; class imbalance problem; oversampling techniques

Cite This Article

APA Style

Vo, M.T., Vo, A.H., Nguyen, T., Sharma, R., Le, T. (2021). Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions. Computers, Materials & Continua, 68(1), 521–535. https://doi.org/10.32604/cmc.2021.015645

Vancouver Style

Vo MT, Vo AH, Nguyen T, Sharma R, Le T. Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions. Comput Mater Contin. 2021;68(1):521–535. https://doi.org/10.32604/cmc.2021.015645

IEEE Style

M. T. Vo, A. H. Vo, T. Nguyen, R. Sharma, and T. Le, “Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions,” Comput. Mater. Contin., vol. 68, no. 1, pp. 521–535, 2021. https://doi.org/10.32604/cmc.2021.015645

Copyright © 2021 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
