Telecommunications and network fraud crimes have occurred frequently in recent years, and new types of fraud keep emerging, causing huge losses to the public. However, lacking an effective preventive mechanism, the police are often in a passive position. Using technologies such as web crawlers, feature engineering, deep learning, and artificial intelligence, this paper proposes a user portrait fraud warning scheme based on Weibo public data. First, we perform preliminary screening and cleaning based on the keyword “defrauded” to obtain the valid user identifiers (IDs) of defrauded users. The basic information and account information of these users are used as user labels to distinguish the types of fraud. Second, through feature engineering technologies such as avatar recognition, Artificial Intelligence (AI) sentiment analysis, data screening, and followed-blogger type analysis, pictures and texts are abstracted into user preferences and personality characteristics, which integrate multi-dimensional information to build user portraits. Third, a deep neural network is trained on the resulting multi-dimensional data (the cube). Following the N-way K-shot setting, 80% of the data is used to train the model, and the remaining 20% is used to evaluate model accuracy. Experiments show that Few-shot Learning achieves higher accuracy than Long Short Term Memory (LSTM), Recurrent Neural Networks (RNN) and Convolutional Neural Network (CNN). On this basis, this paper develops a WeChat applet for early warning of telecommunications network fraud based on user portraits. When the user enters some personal information on the front end, the back end automatically performs correlation analysis against its database to match the most likely fraud types and give relevant early warning information. The fraud warning model is highly scalable. Data from other applications (APPs) can be incorporated to further improve the efficiency of anti-fraud work, which has extremely high public welfare value.
As of June 2020, the number of netizens in China had reached 940 million, an increase of 36.25 million from March 2020. The Internet penetration rate reached 67%, an increase of 2.5 percentage points from March 2020 and about 5 percentage points higher than the global average. Among them, China’s mobile phone netizens reached 932 million, an increase of 35.46 million from March 2020. The proportion of netizens using mobile phones reached 99.2%, basically the same as in March 2020. This is due to the large-scale construction of 4G and 5G networks in China, and to the fact that mobile phones are easier to carry and operate than computers. The rapidly developing Internet is highly integrated into human society and has profoundly changed how we work and live. However, everything has two sides. While the Internet has brought us unparalleled convenience, it has also created problems. One of them is the crime of telecommunication network fraud. According to the “Mobile Phone Security Status Report for the First Half of 2020” jointly released by 360 and the China Academy of Information and Communications Technology, the per capita loss of telecommunications network fraud victims in the first half of 2020 alone reached 10,037 yuan. Because of China’s large population, it is impossible to analyze users one by one. The current anti-fraud early warning mechanism is poorly targeted, and “mass distribution” anti-fraud publicity has little effect.
At present, the situation of telecommunication and network fraud crimes is severe: it has become the type of crime with the most cases, the fastest rise, the widest coverage, and the strongest public response. The five types of fraud in the Public Prosecution Law account for nearly 80% of the cases, making them the five most prominent high-incidence types. Among them, rebate fraud has the highest incidence, accounting for about one-third of all cases. Investment and wealth management fraud involves the largest amount, accounting for about one-third of all funds involved. According to the survey, 80% of netizens have experienced telecom fraud. Compared with the post-70s generation, who have been deceived less, the post-90s generation has the most victims of telecom and network fraud. An intriguing finding is that more than half of the victims were suspicious of the scammer but ended up being scammed anyway. The characteristics of “intelligence, specialization, grouping, and transnationalization” presented by telecommunication network fraud have had a huge impact on China’s criminal legislation and judicial concepts, which makes law enforcement in related cases more difficult. To this end, the Supreme People’s Court, together with the Supreme People’s Procuratorate and the Ministry of Public Security, successively formulated the “Opinions on Several Issues Concerning the Application of Law in Handling Criminal Cases such as Telecom Fraud” and the “Opinions on Several Issues Concerning the Application of Law in Handling Telecom and Internet Fraud and Other Criminal Cases (II)” in 2016 and 2021, respectively. Normative legal documents such as the minutes of the “card-breaking” action meeting were also issued last year and this year. In view of the outstanding problems in judicial practice, it is necessary to continuously improve and clarify the applicable standards of law. To meet the needs of the current struggle, it is also necessary to further regulate and guide law enforcement in handling cases.
Zhuang [
This paper proposes a fraud warning scheme based on user portraits. First, based on the keyword “defrauded”, preliminary screening and cleaning are performed to obtain the valid user identifiers (IDs) of users who have been defrauded, and the basic information and account information of these users are used as user tags to distinguish fraud types. Second, through feature engineering technologies such as avatar recognition, Artificial Intelligence (AI) sentiment analysis, data screening, and blogger type analysis, images and text are abstracted into user preferences and personality characteristics, and multi-dimensional information is integrated to build user portraits. On this basis, a WeChat applet for early warning of telecommunications network fraud based on user portraits is developed. When the user enters some personal information on the front end, the back end automatically performs correlation analysis against its database, matches the most likely fraud types, and gives relevant cases, so as to achieve accurate early warning. The earlier research results of this work were published at the ICAIS2021 conference.
The innovations of this paper are summarized as follows:
(1) By designing a free and interesting prediction applet, the model can provide personalized anti-fraud warnings. Users scan the WeChat Quick Response (QR) code to open the anti-fraud security questionnaire and answer a few simple questions. The anti-fraud warning algorithm then matches the cases with the highest similarity to the user. The results also include the probability of being deceived, real cases, and warning messages to enhance the warning effect.
(2) Through popularization and application by the public security organs, the model can form a fraud early warning system centered on prevention. Users get a personalized anti-fraud security detection report, which contains the best-matching fraud types and the similarity between the user and each type. Typical cases and persuasive messages for this type of fraud are also given to improve users’ vigilance against telecom fraud. Relevant reports show that warning users in advance is far more effective than dissuading them once a fraud is under way.
(3) The applet of the proposed early warning model can be used as a detection plug-in in the software background. Because detection runs automatically and directly on the data in the database, operating efficiency is greatly improved. In addition, the Few-shot Learning algorithm used in the early warning model has higher accuracy than the Long Short Term Memory (LSTM), Recurrent Neural Networks (RNN) and Convolutional Neural Network (CNN) algorithms, and can save computing time while ensuring accuracy.
The remaining sections of this paper are arranged as follows. The second section summarizes the development and application of user portrait technology, paving the way for subsequent sections. The third section, on data preprocessing and feature engineering, describes data acquisition and cleaning, as well as the feature engineering used to build user portraits. The fourth section proposes a telecommunication network fraud warning model based on user portraits. The fifth section presents experimental simulation and verification. The last section gives a summary and outlook.
User portraits label and analyze user data to describe the characteristics of real users, so that business personnel can quickly and accurately understand user information and take targeted measures to achieve expected goals. User portrait analysis is a key part of human behavior analysis and is widely used in business. Reference [
After completing the modeling and analysis of user portraits and behaviors, many commercial software products began to focus on recommendation systems. The main task of a recommendation system based on user portraits is to match users with products: by mapping user portraits to product portraits and combining association rule mining with re-ranking algorithms, it produces a personalized product recommendation list. Reference [
References [
Data preprocessing refers to necessary processing such as review, screening, and sorting that is applied before the collected data is classified or grouped. It serves two purposes: to improve the quality of the data, and to adapt the data to the software or method used for analysis. Generally speaking, the preprocessing steps are data cleaning, data integration, data transformation, and data reduction, and each major step has several smaller sub-steps. Not all four steps are required in every preprocessing task. This paper mainly needs to clean the acquired data.
First, we use web crawler technology to crawl the IDs of related Weibo users through the keyword “deceived”. The user data obtained falls roughly into three types: ordinary individual users who have experienced Internet fraud, Weibo-authenticated accounts that publicize anti-fraud information (such as public security, political and legal, anti-fraud, and news accounts), and others.
Data cleaning, as the name suggests, turns “dirty” data into “clean” data. Data can be dirty in form, for example missing values or special symbols, or dirty in content, for example outliers. This paper uses data cleaning to delete the last two types of user data, which are useless here, leaving only the users who have been defrauded on the Internet. The data is then preprocessed: special symbols, videos, web page links, etc. are removed because they do not help the subsequent sentiment analysis. Finally, the user_id is used to obtain the user’s gender, region, follow list, avatar, and the content of the Weibo posts published by the user, and the valid users are classified by the type of fraud they suffered.
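The cleaning step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expressions, the keyword list, and the `verified` and `nickname` field names are all assumptions made for illustration, since the paper does not list its exact cleaning rules or the crawler's record format.

```python
import re

# Hypothetical cleaning rules; the paper does not list its exact patterns.
# re.ASCII keeps the URL match from swallowing adjacent Chinese text.
URL_RE = re.compile(r"https?://[\w./?=&%#:~+\-]+", re.ASCII)
MENTION_RE = re.compile(r"@[\w\-]+")
# Keep word characters (including CJK), whitespace and basic punctuation.
SYMBOL_RE = re.compile(r"[^\w\s,.!?，。！？]")
SPACE_RE = re.compile(r"\s+")

# Assumed keywords that mark official anti-fraud publicity accounts.
OFFICIAL_KEYWORDS = ("police", "public security", "anti-fraud", "news")


def clean_text(text: str) -> str:
    """Strip links, @mentions and special symbols from one Weibo post."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def is_victim_candidate(user: dict) -> bool:
    """Keep ordinary users: not verified, no official keyword in the nickname.

    The 'verified' and 'nickname' keys are assumed field names.
    """
    if user.get("verified", False):
        return False
    name = user.get("nickname", "").lower()
    return not any(keyword in name for keyword in OFFICIAL_KEYWORDS)
```

In this sketch, `clean_text` would be applied to every crawled post before sentiment analysis, and `is_victim_candidate` would be used to drop authenticated publicity accounts. A real pipeline would likely need a richer rule set for the "others" category.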
Feature engineering is an integral part of machine learning. It refers to the use of a series of engineering methods to extract better data features from the original data in order to improve the training effect of the model. There is a widely circulated saying in the industry that data and features determine the upper limit of machine learning, and models and algorithms only approach this upper limit. With better features, the model and algorithm can perform better. Feature engineering usually includes data preprocessing, feature selection, and dimensionality reduction.
Considering the existing research progress of user data sentiment analysis [
The analysis of Weibo text data focuses on users’ Weibo content and following bloggers.
First, sentiment analysis is carried out on the Weibo posts published by users: text content with subjective sentiment is analyzed, summarized, and judged. On the basis of an established dictionary, valuable emotional information is extracted from topics, sentiment words, evaluation words, evaluation objects, opinion classification, emojis, etc. Second, a classification method based on machine learning [
Second, the types of bloggers a user follows are analyzed. On Weibo, users who are interested in a field or category of things follow the corresponding bloggers. From the crawled follow list, the proportion of each type of blogger can be counted to understand user preferences, which serves as an important feature of the deceived user group. We adopt a tag-based data screening method: by obtaining the Weibo tags of the bloggers followed by deceived users, these bloggers can be classified accurately and effectively. The main tags are: car, sports, finance, games, photography, shopping, technology, entertainment, animation, beauty, fitness, travel, horoscope, health, real estate, parenting, religion, emotion, lottery.
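Counting the proportion of each blogger type can be sketched as follows. The tag vocabulary is the one listed above; the input format (one tag string per followed blogger) and the function name are assumptions for illustration, not the authors' code.

```python
from collections import Counter

# Tag vocabulary taken from the list in the text.
TAGS = [
    "car", "sports", "finance", "games", "photography", "shopping",
    "technology", "entertainment", "animation", "beauty", "fitness",
    "travel", "horoscope", "health", "real estate", "parenting",
    "religion", "emotion", "lottery",
]


def tag_proportions(followed_tags):
    """Return the share of each known tag among a user's followed bloggers.

    followed_tags: iterable with one tag string per followed blogger
    (assumed input format). Unknown tags are ignored.
    """
    counts = Counter(tag for tag in followed_tags if tag in TAGS)
    total = sum(counts.values())
    if total == 0:
        # No usable tags: return an all-zero preference vector.
        return {tag: 0.0 for tag in TAGS}
    return {tag: counts[tag] / total for tag in TAGS}
```

The resulting dictionary has a fixed key order, so its values can be used directly as one block of the user portrait feature vector.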
As shown in
The avatar is the first impression a user gives to others. It represents the user’s preferences and personality pursuits, and is therefore a very important feature. We use an image classification Application Program Interface (API) publicly available on the Internet to identify images through a vision-based image recognition algorithm [
The principle of avatar recognition is shown in
This paper uses Baidu artificial intelligence API to classify and label Weibo avatars. First log in to the
After feature engineering, the user portrait shown in
As noted above, the user data of telecommunication network fraud victims that we crawled is limited and cannot be called big data in the true sense. The evaluation of the telecom fraud risk model based on “user portraits” built from Weibo data therefore uses a small-sample (few-shot) learning algorithm [
In the anti-fraud early warning model shown in
Different features influence the model to different degrees. We need to automatically select the features that are important to the problem and remove those that are not very relevant. This process is called feature selection. Feature selection is very important in feature engineering and often directly determines the quality of the final trained model. When constructing features, this model screens almost all valid tags of the target Weibo users. In addition to basic attributes such as age, region, gender, and education, feature engineering is used to analyze user avatars, Weibo sentiment, and follow lists in order to characterize the user’s personality. The user characteristics obtained in this way are more objective and comprehensive, so the resulting user portraits are more intuitive.
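The paper does not state which selection criterion it uses. As one common possibility, the following sketch ranks categorical features by their mutual information with the fraud-type label and keeps the top k. The function names and the row format (one dictionary per user) are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two categorical sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_features(rows, labels, k):
    """Rank features by mutual information with the label and keep the top k.

    rows: list of dicts mapping feature name -> categorical value
    labels: list of fraud-type labels, aligned with rows
    """
    names = list(rows[0].keys())
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in names
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores
```

A feature that takes the same value for every user scores zero and is dropped first, while a feature that separates the fraud types well scores high. Continuous features such as tag proportions would need to be binned before using this particular criterion.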
This paper constructs an early warning model of telecommunication fraud crime based on Weibo data, using the user characteristics derived through feature engineering. The proposed telecommunication fraud early warning model divides victims into six types: transaction fraud, free delivery fraud, dating fraud, financial credit fraud, phishing fraud, and part-time fraud. In the data preprocessing stage, the data is labeled with these fraud types.
The proposed anti-fraud model adopts the N-way K-shot prediction scheme: for each prediction, K samples are randomly selected from each of the N classes. The previously trained model computes the similarity between the sample to be predicted and each selected sample, and the mean of these similarities within a class gives the similarity between the sample to be predicted and that class.
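The prediction step can be sketched in a few lines of Python. This is a simplified illustration, not the trained model from the paper: cosine similarity on raw feature vectors stands in for the similarity produced by the trained network, and the function and parameter names are hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Stand-in similarity; the paper uses a previously trained model instead."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_n_way_k_shot(query, samples_by_class, k,
                         similarity=cosine_similarity, rng=None):
    """Score a query against N classes using K random samples per class.

    samples_by_class: dict mapping fraud type -> list of feature vectors
    Returns (best_class, per_class_mean_similarity).
    """
    rng = rng or random.Random()
    scores = {}
    for label, samples in samples_by_class.items():
        if not samples:
            continue  # a class without samples cannot be scored
        support = rng.sample(samples, min(k, len(samples)))
        # Class similarity = mean similarity to the K selected samples.
        scores[label] = sum(similarity(query, s) for s in support) / len(support)
    best = max(scores, key=scores.get)
    return best, scores
```

For example, with the six fraud types as keys of `samples_by_class`, the returned scores can be reported to the user as the similarity to each fraud type, and `best` as the most likely type. Passing a seeded `random.Random` as `rng` makes the sampling reproducible.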
Next, we compare the accuracy and computation time of the Few-shot Learning algorithm in this model with the deep learning algorithms LSTM [
This paper constructs a telecommunication fraud crime early warning model. By applying this model to the WeChat applet for better promotion, the social effect of comprehensive anti-fraud can be achieved. As shown in
After the test user enters the WeChat applet, they need to answer a few simple questions, such as “what percentage of you follow up?”, “your avatar type” and so on. As shown in
The fraud prevention scheme for Weibo users based on small-sample learning proposed in this paper can also be applied to other platforms, such as the short-video platforms Douyin and Bilibili. We take the Douyin platform as an example to discuss applicability. Douyin is similar to Weibo in that any user can be reached by private message with a single click. Therefore, the fraud methods and early warning methods on this platform are the same as on Weibo. Compared with Weibo, the biggest difference in profiling deceived users on Douyin is the acquisition of users’ personal data. Information on Weibo is mainly transmitted as text, and Weibo has a web page, which makes it easy to acquire data with crawler software such as “Octopus”. Douyin, in contrast, mainly uses short video as the transmission medium and has no web page. Data acquisition on Douyin therefore follows the strategy of “downloading-analyzing-targeting the deceived and crawling account homepage information” [
“Download”
Because the videos on the Douyin platform are all refreshed manually, an Android emulator is run on a Personal Computer (PC) and the Douyin software is installed on the emulator to simulate refreshing and obtain video sharing links in batches. The videos are then downloaded with the open source software “Douyin Video Download Assistant”. (URL:
“Split”
Use the open source software “Video Framing Magic” to split the downloaded videos into frames
(URL:
“Analysis”
Connect the data stored on the server to the Baidu AI interface for analysis
(URL:
“Directed Crawl”
Using the accounts of the deceived users obtained in the “analysis” stage, the data on their homepages can be crawled to obtain their characteristic values. By describing user profiles with the obtained information, a signature database of deceived users can be built. The methods for characterizing user portraits and building this signature database are the same as those described in this paper, so the scheme can also be carried out on other apps. At the same time, because different APPs serve different user groups, more comprehensive and accurate analysis results can often be obtained. The model can also be used as a background detection tool for Weibo, without requiring users to actively run the applet test. By periodically executing a script in the background, it can automatically analyze each user’s recent activity and detect the most likely fraud type and its probability. Users can then be reminded through private messages, with relevant cases and anti-fraud measures attached directly, to achieve early warning and prevention. At the same time, a user information database directly linked to the public security organs can be established and reported to them on a regular basis. As shown in
The user-portrait-based anti-fraud warning established in this paper has broad application prospects. The model constructed in this paper can also effectively improve the anti-fraud awareness of netizens and help prevent telecommunication network fraud. Future research could consider adding further machine learning or deep learning methods, adopting more powerful technical means to reduce the success rate of telecommunication network fraud and give netizens a clean online environment. In addition, the WeChat applet anti-fraud warning is also a worthwhile exploration.
Wen Deng and Guangjun Liang conceived and designed the experiments; Chenfei Yu and Kefan Yao performed the experiments; Chengrui Wang and Xuan Zhang analyzed the data; Guangjun Liang wrote the paper. All authors have read and agreed to the published version of the manuscript.
This research has been supported by the
The authors declare that they have no conflicts of interest to report regarding the present study.