Real-Time Spammers Detection Based on Metadata Features with Machine Learning

Adnan Ali; Jinlong Li; Huanhuan Chen; Uzair Bhatti; Asad Khan

doi:10.32604/iasc.2023.041645

ARTICLE

Adnan Ali¹, Jinlong Li¹, Huanhuan Chen¹, Uzair Aslam Bhatti², Asad Khan³,*

1 School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
2 School of Information and Communication Engineering, Hainan University, Haikou, 570228, China
3 Metaverse Research Institute, School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China

* Corresponding Author: Asad Khan. Email: email

Intelligent Automation & Soft Computing 2023, 38(3), 241-258. https://doi.org/10.32604/iasc.2023.041645

Received 30 April 2023; Accepted 10 July 2023; Issue published 27 February 2024

Abstract

Spammer detection aims to identify and block users who perform malicious activities. Such users should be identified and removed from social media to keep interaction organic and to maintain the integrity of online social spaces. Previous research sought to find spammers with hybrid approaches combining graph mining, posted content, and metadata, using small, manually labeled datasets. However, such hybrid approaches are unscalable, not robust, dependent on particular datasets, and require numerous parameters, complex graphs, and natural language processing (NLP) resources to make decisions, which makes them impractical for real-time detection. For example, graph mining requires neighbors’ information, and content-based approaches require multiple tweets from a user’s profile and then NLP resources to reach a decision; neither is feasible in a real-time environment. To fill this gap, we first propose a REal-time Metadata based Spammer detection (REMS) model that identifies spammers from metadata features alone, takes the fewest parameters, and provides adequate results. REMS is a scalable and robust model that uses only 19 metadata features of Twitter users to achieve an F1-score of 73.81% with a balanced training dataset (50% spam and 50% genuine users). The 19 features comprise 8 original features of Twitter users and 11 features derived from them, identified through extensive experiments and analysis. Second, we present the largest and most diverse dataset in published research, comprising 211 K spam users and 1 million genuine users. Its diversity is reflected in its composition: the users posted 2.1 million tweets on seven topics (100 hashtags) from six different geographical locations. REMS’s superior classification performance with multiple machine learning and deep learning methods indicates that metadata features alone have the potential to identify spammers, without relying on volatile posted content and complex graph structures. The dataset and the REMS code are available on GitHub ().
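The abstract's three ingredients (derived metadata features, a 50/50 balanced training set, and the F1-score) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the 19 features, so the field names below (`followers_count`, `friends_count`, `statuses_count`, `created_at`) and the derived ratios are assumptions chosen for illustration, and the helper names `derive_features`, `balance`, and `f1_score` are hypothetical.

```python
import random
from datetime import datetime, timezone


def derive_features(user, now):
    """Build ratio-style features from raw profile metadata.

    The field names and derived features are illustrative assumptions,
    not the feature list used by REMS.
    """
    age_days = max((now - user["created_at"]).days, 1)  # avoid division by zero
    friends = max(user["friends_count"], 1)             # avoid division by zero
    return {
        "followers_count": user["followers_count"],
        "friends_count": user["friends_count"],
        "statuses_count": user["statuses_count"],
        "account_age_days": age_days,
        "followers_per_friend": user["followers_count"] / friends,
        "tweets_per_day": user["statuses_count"] / age_days,
    }


def balance(rows, labels, seed=0):
    """Undersample the larger class so spam (1) and genuine (0) are 50/50."""
    rng = random.Random(seed)
    spam = [i for i, y in enumerate(labels) if y == 1]
    genuine = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(spam), len(genuine))
    keep = rng.sample(spam, n) + rng.sample(genuine, n)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the spam class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Example with made-up values for a single hypothetical account.
example_user = {
    "followers_count": 10,
    "friends_count": 0,
    "statuses_count": 50,
    "created_at": datetime(2020, 1, 1, tzinfo=timezone.utc),
}
example_features = derive_features(
    example_user, now=datetime(2020, 1, 11, tzinfo=timezone.utc)
)
```

Because every feature here comes from a single user object, no neighbor lookups or tweet text are needed at prediction time, which is the property the abstract argues makes metadata-only detection suitable for real-time use.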

Keywords

Spam detection; online social networks; metadata; machine learning

Cite This Article

APA Style

Ali, A., Li, J., Chen, H., Bhatti, U.A., Khan, A. (2023). Real-Time Spammers Detection Based on Metadata Features with Machine Learning. Intelligent Automation & Soft Computing, 38(3), 241–258. https://doi.org/10.32604/iasc.2023.041645

Vancouver Style

Ali A, Li J, Chen H, Bhatti UA, Khan A. Real-Time Spammers Detection Based on Metadata Features with Machine Learning. Intell Automat Soft Comput. 2023;38(3):241–258. https://doi.org/10.32604/iasc.2023.041645

IEEE Style

A. Ali, J. Li, H. Chen, U. A. Bhatti, and A. Khan, “Real-Time Spammers Detection Based on Metadata Features with Machine Learning,” Intell. Automat. Soft Comput., vol. 38, no. 3, pp. 241–258, 2023. https://doi.org/10.32604/iasc.2023.041645

Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
