Adaptive XGBOOST Hyper Tuned Meta Classifier for Prediction of Churn Customers

In India, banks have held a formidable edge in maintaining their customer retention ratio for the past few decades. A downturn forces private banks to reduce their operations and nationalised banks to merge with other banks. Researchers have used traditional and ensemble algorithms with relevant feature engineering techniques to better classify customers. The proposed algorithm uses a meta classifier instead of a single ensemble algorithm, together with an adaptive genetic algorithm for feature selection. Churn prediction estimates the customers who are likely to terminate their services with the banking sector. The model considers twelve attributes, such as credit score, geography, gender and age, to predict customer churn. The project consists of five modules. First is the pre-processing module, which identifies missing data and fills the values with the mean and mode. Second is the data transformation module, where categorical data is converted into numerical data using label encoding to speed up computation. The converted numerical data is normalized using the standard scaler technique. The feature selection module identifies the essential attributes using the DragonFly and Firefly (Hybrid Fly) algorithms. The classification module designs an intelligent meta learner, which combines the ensemble algorithm Extreme Gradient Boosting (XGBOOST) with "Extra Tree Classifier" and "Logistic Regression" as base classifiers to predict churn customers.


Introduction
The customers' churn decision depends on the offers and benefits provided by the organization. To automate the process of finding churn customers, the proposed algorithm uses machine learning [1] concepts and addresses a few important limitations of previous works, as shown in Tab. 1.
From the above limitations, a few research gaps are identified, and the proposed method provides solutions as follows.
The system needs an algorithm that supports hyper tuning of the estimators and quickly adapts to the environment for self-learning as an autonomous system.
Deep learning algorithms perform well, but they complicate the process; hence the algorithm should be automatic, dynamic and straightforward.
To solve the overfitting problem, the system needs an algorithm that adjusts the training data and generates good decision rules to produce a tree of appropriate depth.
The accuracy mainly depends on the pre-processing and feature selection processes. Blindfolded traditional approaches are not suitable to handle this problem. The algorithm needs data exploration mechanisms to decide which data cleaning techniques to apply, while simultaneously determining the population characteristics.
The approach used by organizations to identify customers who are likely to churn is framed as a binary classification task. A binary classification task takes a question as input and produces a yes or no result. Customer defection likelihood is most common in SaaS (software as a service) and membership-based (fee-based) firms that charge on a monthly, quarterly, yearly or ongoing basis. Creating a customer outreach model that yields reliable data for churn prediction is usually the first step in defection probability analysis.

Literature Survey
Exploring diverse machine learning and statistical models for progress in prediction, the authors have analyzed the customer churn rate in the data. The experiment presents a comparative result of testing parameters like accuracy, specificity and sensitivity for algorithms such as Random Forest, ANN (Artificial Neural Networks), GLM (Generalized Linear Model) and XGBoost (Extreme Gradient Boosting) [7] over the dataset. This work is mainly suitable for commercial banks to avoid customer churn and preserve the advances and deposits held by SB (Savings Bank) customers. The predictive comparison of the models stated that Random Forest [8] showed the best churn prediction of 78% with a high predictive score on the experimented dataset.
This model uses a Multi-Layer Perceptron in an Artificial Neural Network (ANN) [9] for customer churn prediction and achieved the best accuracy compared to traditional machine learning methods, with graphical representation of the results. For better computation of the deployed model, the researchers segmented the dataset into three parts, namely a training set, a testing set and a validation set. Feature scaling is used to improve the algorithm's computation time. The model was deployed in Python, and the Multilayer Perceptron with ANN showed 97.53% and 97.36% efficiency respectively.
With the use of clustering and classification models, the authors developed a predictive algorithm to determine the circumstances of customer churn in the telecom sector [10]. The proposed model initially categorizes churn customers using the Random Forest classification algorithm with an accuracy of 88.63% and later uses a cosine correlation measure to produce grouped retention. Random Forest generally uses decision trees for classification and handles nonlinear data effectively. With the associated features, the deployed model works best on the dataset. This study uses an Attribute Selected Classifier for rule generation [11] and accessible visualization.
Principal Component Analysis (PCA) establishes the features for the randomly segregated subsets of the training dataset. The key objective of this development is to evaluate a better prediction model for client churn in the telecom sector.
The research demonstrates that the deep learning model outperforms general machine learning algorithms and ANN concepts in churn prediction. For this research, the experiment was carried out on RapidMiner Studio using two unique datasets [12]. The experimental analysis using Deep Neural Networks (DNN) [13] applies a rectifier function on the hidden layers and a sigmoid operation on the activation layers and output stage. The experiment has three main research objectives: the activation function, batch size and training algorithms. In the first analysis, different combinations of activation functions are used at the hidden and output layers to identify the accuracy. The second observation measures the impact of batch size on training the deep neural network. Finally, multiple training algorithms are compared with different training parameters to determine the efficiency. The proposed heuristic observations will increase the productivity of hyperparameter tuning in DNN churn models [14][15][16][17].
The proposed algorithm operates on a moving window that supports the detection of complex sequences and dependencies based on the transactions chosen at a particular time. This experiment also focuses on reducing noisy data by eradicating inappropriate dependencies that arise from the absence of analysis along the time dimension. Rather than Principal Component Analysis (PCA) [18][19][20], the Boruta algorithm [21,22] is used for feature selection to address the dimensionality reduction problem. This research can derive substantial patterns from factual customer and transactional data [23,24] authenticated by financial institutions.

Proposed Methodology
The main focus of this study is to detect churned customers in the banking sector and to take preventive measures to keep them from exiting. The proposed system consists of five modules, as outlined in Fig. 1.

Data Pre-Processing
The churn modeling dataset contains 12 attributes and 1 class label with details of 50,000 customers for training. The detailed description of the dataset is given in Tab. 2.
The basic principle of any machine learning algorithm is that it works more efficiently with numerical and binary data than with categorical data. To design an accurate system, the null values are handled before converting the categorical data to numerical data. In the above dataset, an empty value in the age attribute is filled with the mean value of the age.
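A minimal sketch of this imputation step, assuming a pandas workflow, is given below; the file name "Churn_Modelling.csv" and the exact column names are illustrative assumptions rather than values taken from the paper.

```python
# Illustrative pre-processing sketch: fill missing values with mean/mode.
# File and column names are assumptions, not confirmed by the paper.
import pandas as pd

df = pd.read_csv("Churn_Modelling.csv")

# Numerical attribute (e.g. Age): impute with the column mean
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Categorical attribute (e.g. Geography): impute with the column mode
df["Geography"] = df["Geography"].fillna(df["Geography"].mode()[0])
```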

Data Transformation
It describes the process of converting categorical data into numerical data with an encoding technique known as "Label Encoding," which assigns numerical values from 0 to n-1. The conversion process is illustrated with an example as follows. Suppose an attribute known as "Geography" has three different values; they are represented by numerical values ranging from 0 to 2 based on the sorted data, as illustrated in Tab. 3.
The churn modeling dataset contains a total of 3 categorical attributes. By applying the data transformation process to these attributes, the obtained numerical values of a sample set are shown in Fig. 2.
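For illustration, the label encoding step can be expressed with scikit-learn's LabelEncoder; the three categorical column names used here (Surname, Geography, Gender) are an assumption based on the usual churn-modeling dataset layout, not stated explicitly in the paper.

```python
# Illustrative label encoding of the assumed categorical columns.
from sklearn.preprocessing import LabelEncoder

for col in ["Surname", "Geography", "Gender"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# e.g. Geography: France -> 0, Germany -> 1, Spain -> 2 (sorted order, as in Tab. 3)
```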

Data Standardization
The obtained data are numerical values, but a few attributes like credit score, age and estimated salary still have a wide range of possible values. This problem is addressed by the popular standardization technique known as the "Standard Scaler," which redistributes the data so that the mean and standard deviation are always 0 and 1 respectively. The redistributed value is computed for every value of the attribute as represented in Eq. (1).

A[i]_New = (A[i] − μ_A) / σ_A  (1)

where A[i]_New denotes the newly computed value of the attribute, μ_A is the mean of the attribute and σ_A is its standard deviation.
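The same standardization can be performed, as a sketch, with scikit-learn's StandardScaler; the numeric column names below are assumptions based on the attributes named above.

```python
# Illustrative standardization of the wide-range numeric attributes.
from sklearn.preprocessing import StandardScaler

num_cols = ["CreditScore", "Age", "EstimatedSalary"]  # assumed column names
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
```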

Feature Extraction
This is a crucial step in the machine learning process, as not all attributes are important for the classification of churn customers. In previous studies, the relation and strength of the attributes are measured using the correlation matrix. The correlation matrix acts as the key decision-maker to measure the strength of the relation between two attributes by computing their rank using Spearman's mechanism. The proposed system identifies the direction of impact of the attributes and the rank-order relationship between them. The results obtained for the correlation matrix are represented in Fig. 3.

Figure 3: Correlation Matrix using Spearman's Mechanism
From the above matrix, it is observed that very few elements are correlated. The proposed system implements a pipelined architecture of RFE (Recursive Feature Elimination) with the Random Forest algorithm, which computes the accuracy for every candidate number of features and selects the count with the highest accuracy as the number of features to be retained. The output of attribute count vs. accuracy is plotted as a "Box Plot" in Fig. 4.
From Fig. 4, it is observed that all the selected attributes have the highest accuracy compared to the other values.
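A hedged sketch of this feature extraction module follows, combining a Spearman correlation matrix with recursive feature elimination over a Random Forest estimator; the class-label column name "Exited" and the hyperparameter values are assumptions rather than values reported by the paper.

```python
# Illustrative feature extraction: Spearman correlation + RFE over Random Forest.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X = df.drop(columns=["Exited"])   # "Exited" assumed to be the class label
y = df["Exited"]

# Spearman rank correlation matrix (analogous to Fig. 3)
corr = X.corr(method="spearman")

# Cross-validated RFE scores each candidate feature count and keeps the best one
selector = RFECV(RandomForestClassifier(n_estimators=100, random_state=42),
                 step=1, cv=5, scoring="accuracy")
selector.fit(X, y)
selected_features = X.columns[selector.support_]
```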

Feature Selection
The proposed system identifies the important attributes by designing a novel hybrid adaptive genetic algorithm known as "Dragonfly + Firefly." The Dragonfly algorithm is a dimensionality reduction technique used in large-scale industries; it is based on the Bayesian optimization technique and helps improve performance by evaluating features in parallel. The Firefly algorithm likewise handles the multiple objectives of improving accuracy and reducing the error rate. The dynamic dragonflies create sub-swarms and move in their favorable directions.
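The paper does not spell out the hybrid update rules, so the sketch below only illustrates the wrapper-style fitness evaluation that such swarm-based selectors typically rely on: each candidate feature mask is scored by cross-validated accuracy with a small penalty on subset size. The estimator, weights and swarm size are illustrative assumptions, not the authors' exact Dragonfly + Firefly procedure.

```python
# Simplified wrapper-style fitness for swarm-based feature selection (illustrative only).
# Reuses the X and y frames from the feature extraction sketch above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.99):
    """Score a binary feature mask: reward accuracy, lightly penalize subset size."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X.iloc[:, mask.astype(bool)], y, cv=3).mean()
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / len(mask))

# Random initial swarm of candidate feature subsets; a real swarm optimizer would
# iteratively update these masks toward better fitness values.
rng = np.random.default_rng(0)
swarm = rng.integers(0, 2, size=(10, X.shape[1]))
best_mask = max(swarm, key=lambda m: fitness(m, X, y))
```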

Intelligent Meta Classification
Traditional algorithms like SVM, DT and NB have long ruled prediction systems. With the introduction of ensemble machine learning algorithms, performance improved, but the problem of overfitting remains unsolved in a few real-time applications. Hence, the proposed system designs a meta classifier with a two-step approach that defines the base classifiers and integrates their results through a meta classifier. The proposed algorithm defines the base classifiers as logistic regression, a traditional algorithm, and the extra tree classifier, an ensemble algorithm. The meta classifier is constructed as XGBOOST. The outline of the prediction system is represented in Fig. 7.
The advantage of this meta classifier architecture is that it applies many algorithms in parallel on the training dataset and collects the predictions from every individual algorithm separately. With the collected predictions, the second level of the architecture applies an ensemble algorithm, which is hyper tuned to predict the final class label of the record. With this 2-layered architecture, the accuracy and true positive rate are improved. The performance of the proposed system is compared with the previous systems in Tab. 6. Fig. 8 clearly shows that the proposed model performs better in all aspects compared to the existing methods. The meta-learning algorithm, which combines the ensemble bagging algorithm known as "Extra Tree Classifier" with the boosting algorithm "XGBOOST," gives more accuracy than the traditional approaches, with 97.85% accuracy. The algorithm accurately finds the customers who churn from the banking sector, represented by a sensitivity (truly predicted class labels) of 98.8%. The main advantage of this algorithm is the identification of the important features using the Hybrid Fly algorithm.
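A minimal sketch of this two-level architecture, using scikit-learn's StackingClassifier with XGBoost as the meta learner, is given below; the hyperparameter values are placeholders rather than the tuned values behind the results in Tab. 6.

```python
# Illustrative two-level meta classifier: LR + ExtraTrees as base learners,
# XGBoost as the meta learner. Reuses X and y from the sketches above.
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("et", ExtraTreesClassifier(n_estimators=200, random_state=42))],
    final_estimator=XGBClassifier(n_estimators=300, learning_rate=0.1,
                                  max_depth=4, eval_metric="logloss"),
    cv=5)

stack.fit(X_train, y_train)
print("Test accuracy:", stack.score(X_test, y_test))
```

With cv=5, the base learners' out-of-fold predictions form the training input of the meta learner, which mirrors the two-step approach described above: base classifiers first, then an ensemble boosting model over their collected predictions.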

Conclusion
The customer retention ratio is a crucial element for private organizations, especially in the banking industry. Hence, the proposed system designs a hybrid fly algorithm, an important segment that identifies the crucial characteristics acting as decision-makers to predict churn customers. Existing studies used either correlation or recursive methods to analyze the strength of the attributes, but the proposed algorithm uses swarm intelligence, which identifies the strength based on its working environment. The second major element of the algorithm is the meta classifier, which allows many algorithms to run in parallel and improves the execution time of the model. As it involves a majority voting scheme, the model's reliability is considered more accurate. The hyper-tuning parameters help the model select its decision rules and threshold values, which automatically makes the system adjust its estimator values. As the dataset contains a huge amount of data, working on desktop applications supported by a CPU requires more computation time. Hence, it is better to work in a cloud environment with Big Data tools like Apache Spark. The system can be further extended to offer different loan schemes and credit cards to the customers who want to terminate their relationship with the banking sector.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declared that they have no conflicts of interest to report regarding the present study.