The rapidly escalating sophistication of e-commerce fraud in recent years has led to an increasing reliance on fraud detection methods based on machine learning. However, fraud detection methods based on conventional machine learning approaches suffer from several problems, including an excessively high number of network parameters, which decreases the efficiency and increases the difficulty of training the network, while simultaneously leading to network overfitting. In addition, the sparsity of positive fraud incidents relative to the overwhelming proportion of negative incidents leads to detection failures in trained networks. The present work addresses these issues by proposing a convolutional neural network (CNN) framework for detecting e-commerce fraud, where network training is conducted using historical market transaction data. The number of network parameters is reduced via the local perception field and weight sharing inherent in the CNN framework. In addition, this deep learning framework enables the use of an algorithmic-level approach to address dataset imbalance by focusing the CNN model on minority data classes. The proposed CNN model is trained and tested using a large public e-commerce service dataset from 2018, and the test results demonstrate that the model provides higher fraud prediction accuracy than existing state-of-the-art methods.
The explosive development of e-commerce has greatly increased the extent of e-commerce fraud to the point that it has become one of the major forms of financial fraud prevalent today. Therefore, the probability of online merchants encountering fraudulent transactions is increasing at an astounding rate. Accordingly, the detection of e-commerce fraud has become a particularly important component of fraud prevention [
Early on in the development of e-commerce, the small transaction volume and relatively simple information involved in e-commerce transactions enabled human experts to formulate basic rules based on relevant knowledge and experience, and then incorporate those rules within risk control engines. Accordingly, conventional rule-based methods have been widely developed for detecting fraudulent financial activities over the years. For example, Dharwa et al. applied feature engineering methods to train their rule-based models based on statistical analyses of the historical data of users in terms of transaction times [
A number of representative data mining and machine learning methods have been applied for detecting fraudulent activities within a number of settings. For example, Huang proposed a data mining method for detecting anomalous economic activities indicative of fraud [
However, data mining and conventional machine learning approaches suffer from the sparsity of positive fraud incidents relative to the overwhelming proportion of negative incidents, and these imbalances lead to detection failures. Numerous approaches have been applied to address this issue. For example, Nanduri et al. applied machine learning and periodic network optimization based on available data to reduce e-commerce fraud, and their approach effectively addressed challenges associated with both dynamic fraud and fraud escalation [
However, the above-discussed machine learning approaches tend to suffer from an excessively high number of network parameters, which decreases the efficiency and increases the difficulty of training the network, while simultaneously leading to network overfitting. Yan addressed this issue, and demonstrated the successful application of a convolutional neural network (CNN) for implementing peer-to-peer (P2P) fraud detection in online loan credit transactions [
This paper addresses the above-discussed limitations in past efforts to apply machine learning to detect fraudulent e-commerce transactions by combining a CNN framework with the previously proposed algorithmic-level approach to focus the model learning more on the minority class of fraudulent transactions [
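The algorithmic-level approach referenced above reweights the loss so that errors on the rare fraud class contribute more to training. A minimal sketch of this idea as a class-weighted cross-entropy loss is given below; the specific weight values are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def weighted_cross_entropy(y_true, y_prob, class_weights):
    """Cross-entropy in which each sample's loss is scaled by its class
    weight, so mistakes on the rare (fraud) class cost more."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), 1e-12, 1 - 1e-12)
    w = np.where(y_true == 1, class_weights[1], class_weights[0])
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(w * losses))

# Illustrative weights roughly inverse to class frequency: with ~10% fraud,
# the minority class receives about 9x the weight of the majority class.
weights = {0: 1.0, 1: 9.0}
loss = weighted_cross_entropy([0, 0, 1], [0.1, 0.2, 0.6], weights)
```

With these weights, misclassifying a fraud sample raises the loss far more than an equally confident mistake on a non-fraud sample, which pushes the model to attend to the minority class.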
The standard CNN model was developed according to much earlier work on the concept of the local receptive field conducted in 1959 by Hubel et al. [
The configuration of neuron connections under a fully-connected condition is illustrated in
The convolution operation with a 3 × 3 kernel applied to a 5 × 5 pixel image section is illustrated in
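The convolution operation described above can be sketched directly: a "valid" convolution (no padding, stride 1) of a 3 × 3 kernel over a 5 × 5 image section yields a 3 × 3 feature map. The averaging kernel below is just an illustrative choice.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (no padding, stride 1): slide the kernel
    over the image and take the sum of elementwise products at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # 5 x 5 image section
kernel = np.ones((3, 3)) / 9.0                     # 3 x 3 averaging kernel
feature_map = conv2d_valid(image, kernel)          # (5-3+1) x (5-3+1) = 3 x 3
```

Because the same 9 kernel weights are reused at every position, this single layer needs only 9 parameters regardless of the image size, which is the weight-sharing property that keeps CNN parameter counts low.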
Under the above-described conditions, each neuron includes only 100 parameters, because the convolution process involves only a single 10 × 10 convolution kernel. Obviously, the feature extraction obtained under this condition is insufficient to support accurate model predictions. This is addressed by adopting multiple convolution kernels, where each kernel ostensibly extracts a unique feature. This process is illustrated in
The convolution operation obtaining two channels via the convolution of four channels is illustrated in
This non-linear activation function theoretically enables the CNN model to fit any function. Based on this discussion, the final number of parameters is the product of the original 4 channels, the 2 resulting channels, and the 2 × 2 = 4 pixel convolution kernels, for a total of 4 × 2 × 4 = 32 parameters.
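The parameter count above follows from a one-line formula: a convolutional layer needs one kernel per (input channel, output channel) pair, each with kernel-height × kernel-width weights (bias terms are ignored here, as in the text).

```python
def conv_param_count(in_channels, out_channels, kernel_h, kernel_w):
    """Number of weights in a convolutional layer, excluding bias terms."""
    return in_channels * out_channels * kernel_h * kernel_w

# 4 input channels convolved down to 2 output channels with 2 x 2 kernels.
params = conv_param_count(4, 2, 2, 2)  # 4 x 2 x 4 = 32 parameters
```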
The features learned by a single convolutional layer typically represent only local characteristics. Therefore, multiple convolutional layers have generally been employed in practical CNN applications, where the extracted features attain increasingly global characteristics with an increasing number of convolutional layers, and then the fully connected layer is used for training.
Real-world, open-source e-commerce service data from 2018 was obtained from the Kaggle website. This represents a total of 151,113 complete data samples.
| Feature | Description |
|---|---|
| class | User fraud status, where 0 represents a non-fraud status and 1 represents an instance of fraud |
| source | Online purchase source, divided into three types: search engine optimization (SEO), advertisements (Ads), and direct purchases (Direct) |
| purchase_value | Monetary value of the purchase |
| browser | Browser employed by the user during the online purchase: Chrome, Opera, Safari, Internet Explorer, FireFox |
| signup_time | User registration time |
| purchase_time | User purchase time |
| ip_address | IP address of the user at the time of purchase |
The values X in each data sample are first standardized to zero mean and unit variance to improve their comparability by applying the z-score standardization method as follows:

X′ = (X − μ) / σ,

where μ and σ denote the mean and standard deviation of the corresponding feature values, respectively.
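The z-score standardization step can be sketched as follows; the sample values are illustrative only.

```python
import numpy as np

def zscore(x):
    """Z-score standardization: subtract the mean and divide by the
    standard deviation, yielding zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Illustrative purchase values; after standardization they have
# mean 0 and standard deviation 1.
purchase_values = np.array([10.0, 20.0, 30.0, 40.0])
standardized = zscore(purchase_values)
```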
The preprocessed data samples are assembled into training, validation, and testing datasets according to the ratio 7:1:1. Therefore, the training dataset included 39,620 data samples and the validation and testing datasets included 5,660 data samples each. Both the validation and testing datasets included 634 data samples in the fraudulent class (i.e., class = 1). Here, the samples in the training dataset are naturally used for training the proposed CNN model, as well as the other models considered. Meanwhile, the samples in the validation dataset were used for parameter tuning or model optimization, and the samples in the testing dataset were used to evaluate the prediction effect of the proposed CNN model, and those of the other models considered.
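The 7:1:1 split described above can be sketched with a shuffled index split; the shuffling seed is an arbitrary assumption, and the paper does not specify how ties in the split boundaries were handled.

```python
import numpy as np

def split_791(n_samples, seed=0):
    """Shuffle sample indices and partition them 7:1:1 into
    training, validation, and testing index sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = n_samples * 7 // 9
    n_val = n_samples // 9
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 50,940 samples -> 39,620 training and 5,660 each for validation and testing.
train_idx, val_idx, test_idx = split_791(50940)
```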
The CNN model employed in the empirical analyses adopted the classic LeNet-5 and GoogLeNet architectures, and the model is illustrated schematically in
The details of the convolutional layers are given as follows. The characteristics of each user's behavior at different points in time are extracted in the first convolutional layer (C1) by means of 32 convolution kernels of size 7 × 7. The first convolutional layer is followed by a 3 × 3 maximum pooling layer (S1), which provides secondary feature extraction, where each neuron gathers only locally accepted domains. The first maximum pooling layer is then connected to the second convolutional layer (C2), which again applies 32 convolution kernels of size 7 × 7 with the purpose of further extracting the characteristics of each user's behavior at different points in time. The second convolutional layer is then connected to a 2 × 2 maximum pooling layer (S2). Finally, S2 is connected to a third convolutional layer (C3), which also applies 32 convolution kernels of size 7 × 7 to further extract the characteristics of each user's behavior.
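The way the feature-map size shrinks through C1–C3 can be traced with simple size arithmetic. The sketch below assumes 'same'-padded convolutions (which preserve spatial size) and non-overlapping pooling, and the 48 × 48 input size is a hypothetical illustration, since the paper does not state the reshaped input dimensions.

```python
def same_conv(size):
    """'Same'-padded convolution preserves the spatial size."""
    return size

def max_pool(size, window):
    """Non-overlapping max pooling divides the spatial size by the window."""
    return size // window

# Hypothetical 48 x 48 input; the actual input size is not stated in the paper.
size = 48
size = same_conv(size)    # C1: 32 kernels, 7 x 7        -> 48
size = max_pool(size, 3)  # S1: 3 x 3 maximum pooling    -> 16
size = same_conv(size)    # C2: 32 kernels, 7 x 7        -> 16
size = max_pool(size, 2)  # S2: 2 x 2 maximum pooling    -> 8
size = same_conv(size)    # C3: 32 kernels, 7 x 7        -> 8
```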
The CNN model was trained using the training dataset. The key parameters adopted in the training process include the learning rate, which governs the size of the network weight updates in the optimization algorithm, and the number of iterations, which refers to the number of times the entire training dataset is input to the neural network during the training process. In addition, the batch size is an equally important parameter. As preliminary settings, the number of iterations was set to 29, the batch size to 40, and activation function (1) was applied in the convolutional layer and fully connected layer. By using the model shown in
| Fraud status | | Actual value | |
|---|---|---|---|
| | | 0 | 1 |
| Predictive value | 0 | 4104 | 118 |
| | 1 | 923 | 516 |
| Forecast accuracy | | 81.64% | 81.39% |
The results show that the CNN model obtained a prediction accuracy of 81.64% for true negative (TN) samples (i.e., non-fraud status), while an accuracy of 81.39% was obtained for true positive (TP) samples. In addition, the results yielded a false positive (FP) rate of 18.36% and a false negative (FN) rate of 18.61%. Overall, the preliminary forecast results basically meet the expected requirements.
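The per-class accuracies quoted above follow directly from the confusion matrix: the non-fraud accuracy is TN / (TN + FP), and the fraud accuracy is TP / (TP + FN). A quick check using the preliminary matrix values:

```python
def rates_from_confusion(tn, fp, fn, tp):
    """Per-class accuracies from a binary confusion matrix, where the
    negative class (0) is non-fraud and the positive class (1) is fraud."""
    tn_rate = tn / (tn + fp)  # accuracy on actual non-fraud samples
    tp_rate = tp / (tp + fn)  # accuracy on actual fraud samples
    return tn_rate, tp_rate

# Values taken from the preliminary confusion matrix above.
tn_rate, tp_rate = rates_from_confusion(tn=4104, fp=923, fn=118, tp=516)
```

The complementary FP and FN rates (18.36% and 18.61%) are simply 1 minus these per-class accuracies.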
The trained CNN was then optimized against the validation dataset to determine the optimal training parameters. Accordingly, the CNN model was retrained at a learning rate of 0.01, with the number of iterations set at 25 and a batch size of 40, while the rectified linear unit (ReLU) function was applied as the activation function in the convolutional layer, and the sigmoid function was the activation function applied in the fully connected layer. The loss function value obtained at each iteration of the training process is presented in
| Fraud status | | Actual value | |
|---|---|---|---|
| | | 0 | 1 |
| Predictive value | 0 | 4309 | 104 |
| | 1 | 718 | 530 |
| Forecast accuracy | | 85.72% | 83.60% |
The prediction results show that the optimized CNN model obtained considerably improved prediction accuracies of 85.72% for TN samples and 83.60% for TP samples, while the results yielded FP and FN rates of 14.28% and 16.40%, respectively.
The prediction results obtained for the conventional logistic, SVM, and RF models are listed in
| Model | CNN | Logistic | SVM | Random forest |
|---|---|---|---|---|
| Non-fraud prediction accuracy | 85.72% | 84.25% | 79.28% | 74.67% |
| Fraud prediction accuracy | 83.60% | 80.81% | 70.10% | 70.35% |
| Loss value | 0.336 | 0.654 | 0.400 | 0.520 |
The present work addressed the excessively high number of parameters involved with conventional network approaches for detecting fraudulent activities in an e-commerce setting, and further addressed the problem of dataset imbalance by combining a CNN framework with an algorithmic-level approach to focus model learning more on the minority class of fraudulent transactions. The proposed CNN model was trained, optimized, and tested using 50,940 real-world e-commerce service data samples. In addition, the prediction results were compared with those obtained using conventional approaches, including those based on logistic, SVM, and RF models. The results demonstrated that the proposed model provides a fraud prediction accuracy rate of 83.60%, which is at least 3.4% greater than those of the existing state-of-the-art methods considered. A key issue that remains to be addressed involves the application of fraud prediction methods to new users for which no past features are available (i.e., the cold start problem). This is a particularly significant problem for CNN-based methods because the input data required for training the model is unavailable for a new user. The prospect of introducing extensions to the CNN model to address this issue will be investigated in future work.
We thank Letpub (
This work was supported by the
The authors declare that they have no conflicts of interest to report regarding the present study.