Semantic Based Greedy Levy Gradient Boosting Algorithm for Phishing Detection

R. Jenni; S. Shankar

doi:10.32604/csse.2022.019300

Open Access

ARTICLE

R. Sakunthala Jenni^*, S. Shankar

Department of Computer Science and Engineering, Hindusthan College of Engineering and Technology, Coimbatore, 641032, India

* Corresponding Author: R. Sakunthala Jenni. Email: email

Computer Systems Science and Engineering 2022, 41(2), 525-538. https://doi.org/10.32604/csse.2022.019300

Received 09 April 2021; Accepted 14 June 2021; Issue published 25 October 2021

Abstract

Distinguishing phishing websites from legitimate ones is a major challenge for web service providers because the users of the two kinds of website are indistinguishable. Phishing websites also generate traffic across the entire network. A further issue is the spread of malware across the network, which increases the need to detect such websites while massive datasets (i.e., big data) are being processed. Boosting mechanisms have been applied to phishing detection, but they are prone to significant output errors, specifically because all website features are combined in the training stage. Emerging big data systems rely on MapReduce, a popular parallel programming model, to process massive datasets. To address these issues, a probabilistic latent semantic and greedy Levy gradient boosting (PLS-GLGB) algorithm for website phishing detection using MapReduce is proposed. Feature selection is performed with a probabilistic intersective latent semantic preprocessing model to minimize errors in website phishing detection. In this step, missing data in each URL are identified and discarded before further processing to ensure data quality. Subsequently, using the preprocessed features (URLs), the feature vectors are updated by the greedy Levy divergence gradient model, which selects the optimal features in each URL and accurately detects the websites. The greedy Levy model thus differentiates efficiently between phishing and legitimate websites. Experiments are conducted on one of the largest public corpora, the PhishTank website dataset. Results show that the PLS-GLGB algorithm outperforms state-of-the-art phishing detection methods. It also substantially reduces both detection time and detection errors.
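
The preprocessing step described in the abstract, in which URL records with missing data are discarded before training, can be illustrated in a MapReduce style. The following is a minimal sketch under stated assumptions and not the authors' implementation: the record layout, the feature names (`url_length`, `num_dots`), the example URLs, and the functions `map_record`, `reduce_by_label`, and `preprocess` are all hypothetical, and the map and reduce phases are simulated in a single Python process rather than on a cluster.

```python
from functools import reduce
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (url, features, label).
# A feature value of None marks missing data; label 1 = phishing, 0 = legitimate.
Record = Tuple[str, Dict[str, Optional[float]], int]


def map_record(record: Record) -> List[Tuple[int, Record]]:
    """Map step: emit (label, record) only if no feature value is missing."""
    _url, features, label = record
    if any(value is None for value in features.values()):
        return []  # incomplete record is discarded
    return [(label, record)]


def reduce_by_label(
    acc: Dict[int, List[Record]], pair: Tuple[int, Record]
) -> Dict[int, List[Record]]:
    """Reduce step: group the surviving records by their label."""
    label, record = pair
    acc.setdefault(label, []).append(record)
    return acc


def preprocess(records: List[Record]) -> Dict[int, List[Record]]:
    """Run the simulated map phase, then fold the emitted pairs together."""
    mapped = [pair for record in records for pair in map_record(record)]
    return reduce(reduce_by_label, mapped, {})


# Illustrative input only; these are not records from the paper's dataset.
records: List[Record] = [
    ("http://example.com/login", {"url_length": 24.0, "num_dots": 1.0}, 0),
    ("http://example.test/verify", {"url_length": None, "num_dots": 3.0}, 1),
    ("http://example.org/account", {"url_length": 26.0, "num_dots": 2.0}, 1),
]
grouped = preprocess(records)
```

Here the second record is dropped because its `url_length` is missing, so `grouped` holds one legitimate and one phishing record. In an actual MapReduce deployment the map and reduce functions would run in parallel over partitions of the dataset, and the cleaned records would then feed the feature selection and boosting stages.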

Keywords

Web service providers; probabilistic intersective; latent semantic; greedy levy; divergence; gradient; phishing detection; big data

Cite This Article

APA Style

Jenni, R.S., Shankar, S. (2022). Semantic based greedy levy gradient boosting algorithm for phishing detection. Computer Systems Science and Engineering, 41(2), 525-538. https://doi.org/10.32604/csse.2022.019300

Vancouver Style

Jenni RS, Shankar S. Semantic based greedy levy gradient boosting algorithm for phishing detection. Comput Syst Sci Eng. 2022;41(2):525-538. https://doi.org/10.32604/csse.2022.019300

IEEE Style

R.S. Jenni and S. Shankar, "Semantic Based Greedy Levy Gradient Boosting Algorithm for Phishing Detection," Comput. Syst. Sci. Eng., vol. 41, no. 2, pp. 525-538, 2022. https://doi.org/10.32604/csse.2022.019300

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
