Online advertisements have a significant influence on the success or failure of a business. It is therefore important to measure the impact of an advertisement before uploading it online, and this can be done by estimating the Click-Through Rate (CTR). Unfortunately, computing the CTR directly is inefficient, since clicks must first be gathered from users before the rate can be calculated. This is where CTR prediction comes in handy. Advertisement CTR prediction relies on users' click logs, and accurate prediction of CTR is a challenging and critical task for e-advertising platforms these days. CTR prediction uses machine learning techniques to estimate how often an online advertisement will be clicked by potential clients: the more clicks, the more successful the ad. In this study we develop a machine learning based click-through rate prediction model that generates accurate results with low computational power consumption. We used four classification techniques, namely K-Nearest Neighbors (KNN), Logistic Regression, Random Forest, and Extreme Gradient Boosting (XGBoost). The study was performed on the Click-Through Rate Prediction Competition Dataset, a chronologically ordered click-through dataset collected over 10 days. Experimental results reveal that XGBoost produced a ROC-AUC of 0.76 with a reduced number of features.
Bringing a business online is the easiest way to gain profits in this era, since it is affordable and accessible globally. Because of this expansion, it is easy for a business page to get lost among millions of other, possibly competing, businesses. This is why online advertisements have become all but necessary for the success of a business. Nevertheless, advertisements face the same problem of getting lost or ignored, aggravated by widespread clickbait ads that can harm a client's device and steal personal information. It is therefore important to earn the user's trust and interest, so that the advertisement is actually clicked, resulting in a more successful business. To complicate the measurement of ad success, multiple success metrics apply to the clicking of advertisements. Our study uses the Click-Through Rate metric for evaluation.
The Click-Through Rate is determined by the number of times an online advertisement is clicked by potential clients relative to the number of times it is viewed: the more clicks, the more successful the ad. As mentioned earlier, it is difficult to gain clicks: the average CTR is 0.2%, computed against ad views. It is thus important to measure the fruitfulness of an advertisement's subject, specification, etc., to inform business marketers which ads work well and which do not [
CTR is computed with this simple equation: CTR = (number of clicks / number of ad views) × 100%.
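As a minimal sketch, the CTR computation described above can be written as follows (the figures are illustrative only):

```python
def click_through_rate(clicks: int, views: int) -> float:
    """CTR as a percentage: clicks divided by ad views."""
    if views == 0:
        return 0.0
    return clicks / views * 100


# An ad viewed 50,000 times and clicked 100 times yields the
# average CTR of 0.2% cited above.
print(click_through_rate(100, 50_000))
```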
The remainder of the paper contains five sections. Section 2 contains a literature review of studies related to ad click prediction. Section 3 describes the proposed machine learning techniques: the XGBoost, Random Forest, KNN, and Logistic Regression classifiers. Section 4 details the empirical study, which consists of the data description and the experimental setup. Results of feature selection, optimization, and research outcomes are discussed in Section 5. Finally, the study is concluded in Section 6.
Advertisements have a massive influence on attracting targeted customers, and the way advertisements are presented affects actual sales. Mobile advertisements, specifically, are crucial in a time-critical competitive environment: whoever posts an advertisement first makes the profit. For a regular, ongoing advertisement, historical click information is used to predict the ad's CTR, but this method does not work for predicting the CTR of a new advertisement, because there is not enough historical data.
Fang et al. [
Furthermore, another study made by Dembczynski et al. [
Shi et al. [
Gai et al. [
Xiong et al. [
Deep Learning has also been widely used in the prediction of the Click Through Rate. Edizel et al. [
Similarly, Guo et al. [
Zhang et al. [
Moreover, Wang et al. [
Similarly, Zhou et al. [
Cacheda et al. [
Furthermore, another study was made to capture the user's interest and match the right advertisement to specific users by using an attentive deep interest (ADI) based model [
In addition, FiBiNET model was proposed that focused on the importance of features by combining a shallow model and a deep neural network into a deep model [
All the previous studies reveal the significance of CTR and of integrating machine learning and deep learning for its prediction. Various businesses treat CTR as an essential aspect of their advertising campaigns, and machine learning communities are still exploring new possibilities, since accurate prediction remains a hard problem in the field. From that perspective, we aimed to produce a better outcome (ROC-AUC) with a reduced number of features, testing various machine learning algorithms to achieve the most accurate results for the interested parties.
The objective of preprocessing is to achieve a noise-free dataset so the models produce their best results. In this stage all records were checked for null values and duplicates. All duplicated records were dropped to reduce noise in the dataset; there were no null values. The dataset suffered from an imbalanced class distribution, which was treated by under-sampling. Then, a label encoder was used to convert all string data into numerical data, so it could be fed into the ML algorithms.
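A minimal sketch of this preprocessing stage, assuming pandas and scikit-learn; the toy frame and its columns are illustrative placeholders, not the real dataset schema:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for the click log (the real dataset has no nulls).
df = pd.DataFrame({
    "site_id": ["a", "b", "a", "a", "c", "b", "a", "a"],
    "click":   [0,   1,   0,   0,   1,   0,   0,   0],
})

# 1. Drop duplicated records to reduce noise.
df = df.drop_duplicates().reset_index(drop=True)

# 2. Under-sample the majority class to balance the class distribution.
minority = df[df["click"] == 1]
majority = df[df["click"] == 0].sample(n=len(minority), random_state=42)
balanced = pd.concat([minority, majority]).sample(frac=1, random_state=42)

# 3. Label-encode string columns so the ML algorithms accept them.
balanced["site_id"] = LabelEncoder().fit_transform(balanced["site_id"])
```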
The study used four supervised machine learning techniques to predict the click on a specific mobile ad, namely KNN, Logistic Regression, Random Forest, and XGBoost. The proposed methodology contains two additional steps for each classifier: (1) feature selection, to choose suitable features for each algorithm, and (2) parameter optimization, to choose the parameters that improve the ROC-AUC score for a given algorithm. A different feature selection and optimization technique was applied for each classifier.
K-Nearest Neighbors classification works by observing the dataset and assigning unlabeled records to the class of the most similar labeled records. KNN mainly depends on the distance between records, typically the Euclidean distance:
d(x, y) = sqrt( Σ_{i=1..n} (x_i − y_i)² ),
where x and y are two records and n is the number of features.
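A minimal KNN sketch on toy data, assuming scikit-learn (whose default metric is the Euclidean distance); the feature values are illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-feature records with known labels (clicked = 1, not clicked = 0).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# Euclidean distance is the default metric (minkowski with p=2).
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# An unlabeled record gets the majority class of its 3 nearest neighbors.
print(knn.predict([[5, 5.5]]))  # lies inside the clicked cluster
```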
For KNN, we used a univariate feature selection technique, to select the best features in the dataset. The technique works by assigning a score for each feature based on some univariate statistical tests. Each feature is compared to the class label to check if there is an important relationship between them.
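A sketch of univariate feature selection with scikit-learn's `SelectKBest`; the synthetic data and the choice of `k` are illustrative assumptions, not the study's actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the encoded click data.
X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=3, random_state=0)

# Score every feature against the class label with a univariate
# statistical test (ANOVA F-value here) and keep the k best.
selector = SelectKBest(score_func=f_classif, k=4)
X_new = selector.fit_transform(X, y)
print(X_new.shape)  # (200, 4)
```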
In machine learning, Logistic Regression (LR) is a statistical method and a popular classification technique that predicts the probability of occurrence of a binary event using a logistic function. It can handle any number of numerical and/or categorical variables. The logistic function is a sigmoid, which maps any real value to a value between zero and one [
σ(z) = 1 / (1 + e^(−z)),
where z is the linear combination of the input features and their weights.
For Logistic Regression, our feature selection relies on Recursive Feature Elimination (RFE). It is a feature selection method that fits the model and removes the weakest features repeatedly, until the specified number of features is reached. Because RFE requires that a given number of features be kept, cross-validation is combined with RFE to score different feature subsets and select the best-scoring collection of features [
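A sketch of RFE with cross-validation using scikit-learn's `RFECV`; the synthetic data is an illustrative stand-in for the click log:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=1)

# Repeatedly fit the model and drop the weakest feature; cross-validation
# scores each feature subset, and the best-scoring subset is kept.
selector = RFECV(LogisticRegression(max_iter=1000), step=1,
                 cv=5, scoring="roc_auc")
selector.fit(X, y)
print(selector.n_features_)  # number of features kept
print(selector.support_)     # boolean mask of selected features
```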
Random Forest is a supervised machine learning algorithm based on ensemble learning. Random Forest combines multiple decision trees, resulting in a forest of trees. It can be used for both classification and regression, and it behaves robustly in the feature selection phase. Each individual decision tree in the random forest specifies a prediction class, and the class with the most votes becomes the final predicted class.
The Random Forest prediction is represented as follows [
ŷ = majority_vote{ h_1(x), h_2(x), …, h_B(x) },
where h_b(x) is the prediction of the b-th decision tree and B is the number of trees in the forest.
The primary goal of feature selection is to extract the important features to achieve the maximum classification performance. We used an extra tree classifier model to estimate the importance of each feature.
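A sketch of importance estimation with an extra-trees model, assuming scikit-learn's `ExtraTreesClassifier` and synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=3, random_state=2)

# Fit an extra-trees model and read off impurity-based importances.
model = ExtraTreesClassifier(n_estimators=100, random_state=2)
model.fit(X, y)

# Rank features from least to most important.
ranked = sorted(enumerate(model.feature_importances_), key=lambda t: t[1])
for idx, score in ranked:
    print(f"feature {idx}: {score:.4f}")
```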
For the Random Forest classifier we used two optimization techniques which are K-fold cross-Validation and Grid Search CV. Cross-validation (CV) is one of the techniques used to measure the effectiveness of the built model. It is also a resampling procedure. Grid search is the process of performing hyperparameter tuning in order to define the best combination of parameters values.
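Both techniques can be sketched with scikit-learn; the fold counts and the parameter grid below are illustrative, not the study's exact settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=3)

# K-fold cross-validation: resample the data to estimate model quality.
rf = RandomForestClassifier(random_state=3)
scores = cross_val_score(rf, X, y, cv=5, scoring="roc_auc")

# Grid search: try every combination of the listed hyperparameter values
# and keep the combination with the best cross-validated score.
grid = GridSearchCV(rf,
                    param_grid={"n_estimators": [50, 100],
                                "max_depth": [4, 8]},
                    cv=3, scoring="roc_auc")
grid.fit(X, y)
print(grid.best_params_)
```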
Extreme Gradient Boosting tree (XGBoost) is an ensemble Machine Learning technique based on decision trees with enhancements. This algorithm is built for supervised learning techniques, such as Classification and Regression. The benefit of the algorithm is the enhancement of performance, regularization to avoid overfitting, and built-in cross-validation to choose the optimal number of iterations [
The loss function, usually evaluated by the mean squared error, measures the ability of the model to predict the given training data:
L = (1/n) Σ_{i=1..n} (y_i − ŷ_i)²,
where y_i is the true label, ŷ_i is the predicted value, and n is the number of training records.
As mentioned before, the regularization term is a term to avoid overfitting and there are various regularization functions that XGBoost can provide.
XGBoost makes decisions by creating weighted, ensembled decision trees. XGBoost uses CART trees, which differ from the usual decision tree in that each leaf holds a real-valued score rather than only a decision value. To form the prediction, XGBoost creates multiple trees and sums the contribution of each tree:
ŷ_i = Σ_{k=1..K} f_k(x_i),
where f_k is the k-th tree and K is the number of trees.
The final objective of XGBoost combines the loss with the regularization term:
Obj = Σ_i l(y_i, ŷ_i) + Σ_k Ω(f_k).
XGBoost, as any tree algorithm, has a feature importance function which can tell the importance of each feature based on the given trained model. This function is helpful to know the logic of the model, then, eventually, to improve its performance by removing the least important features.
Parameter optimization for XGBoost was done using Grid search, as in Logistic Regression and Random Forest. Grid search takes a model and candidate values for different parameters, and then chooses the optimal combination of those values.
The dataset used in our study is the Click-Through Rate Prediction Competition Dataset from the Kaggle data science community [
Feature Name | Datatype | Values (Unique) | Min – Mean – Max
---|---|---|---
id | Category | unique | –
click | Category | 0, 1 | –
hour | Continuous | 10 days, 24 hrs | –
C1 | Continuous | 7 | 1001 – 1005.09 – 1012
banner_pos | Continuous | 7 | –
site_id | Category | 2865 | –
site_domain | Category | 3394 | –
site_category | Category | 2 | –
app_id | Category | 4154 | –
app_domain | Category | 287 | –
app_category | Category | 31 | –
device_id | Category | 368962 | –
device_ip | Category | 1078153 | –
device_model | Category | 6098 | –
device_type | Category | 4 | –
device_conn_type | Category | 4 | –
C14 | Category | – | 375 – 18291.97 – 21705
C15 | Category | – | 120 – 318.98 – 1024
C16 | Category | – | 20 – 56.53 – 1024
C17 | Category | – | 112 – 2044.94 – 2497
C18 | Category | – | 0 – 1.47 – 3
C19 | Category | – | 33 – 190.75 – 1835
C20 | Category | – | -1 – 45400.49 – 100248
C21 | Category | – | 13 – 69.43 – 195
The study was implemented in Python, with the Jupyter IDE. The study used a subset of the dataset with 1,048,574 records. Since the dataset is imbalanced, we applied under-sampling to solve the issue. After that, the dataset was encoded using a hash function to convert object-type columns to integer type, so it could be fed into the machine learning algorithms. Next, the dataset was split into 80% for training and 20% for testing. Finally, the four algorithms were trained and tested using feature selection methods and parameter optimization to see which one gives the best outcome. These are:
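The hash-encode and split steps above can be sketched as follows; the toy frame and its columns are illustrative placeholders for the real data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the click log after under-sampling.
df = pd.DataFrame({
    "site_id": ["a", "b", "c", "a", "b", "c", "a", "b", "c", "a"],
    "click":   [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
})

# Hash-encode object-type columns into integers.
for col in df.select_dtypes(include="object"):
    df[col] = df[col].map(hash)

# Split 80% / 20% for training and testing.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="click"), df["click"],
    test_size=0.20, random_state=0)
print(len(X_train), len(X_test))  # 8 2
```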
Before implementing any enhancement techniques, KNN gave an acceptable score. The univariate feature selection technique was then tried, but it worsened performance, so all the features in the dataset were kept for KNN. After feature selection, KNN optimization was done manually by increasing the number of neighbors.
The optimal value for
Before feature selection, Logistic Regression gave an acceptable ROC–AUC score. Then Recursive Feature Elimination (RFE) with cross-validation was used to remove the least important features. When the remaining features were used to apply Logistic Regression, the ROC–AUC score remained almost the same.
The parameters used to enhance LR performance were C = 300 and penalty = 'l2'; Grid search was used to choose the best combination of the two parameters. Applying Grid search did not provide the expected results, since the ROC–AUC score did not change.
Before feature selection, Random Forest showed high potential for CTR prediction. To start the feature selection experiment, we used the feature importance method to find the most important features in ascending order, as shown in
After feature selection, parameter optimization was done using K-fold cross-validation and Grid search. K-fold cross-validation with 10 folds was used to split the dataset and train the model; after 10-fold cross-validation the ROC_AUC score decreased. When Grid search was implemented, two hyperparameters
Before feature selection, XGBoost also showed high potential for CTR prediction. The feature importance method was implemented as in Random Forest. The resulting features, in ascending order of importance, are shown in
The experiment then moved to parameter optimization, using the Grid search mechanism. Two XGBoost parameters were tuned to improve the performance of the algorithm, namely max_depth=6 and n_estimators=200. max_depth controls overfitting: with more depth the algorithm can learn more relationships. n_estimators is the number of decision trees created to choose the correct label [
The evaluation metric used in our study to explore the effectiveness of the proposed approach was the ROC–AUC score, since it is the most used measure in binary classification, built on two quantities: the False Positive Rate (FPR) and the True Positive Rate (TPR). This measure tells whether the given algorithm is able to differentiate between the class labels [
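A minimal sketch of the metric with scikit-learn; the labels and probabilities are illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# True labels and predicted click probabilities for five test records.
y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.9])

# AUC = probability that a randomly chosen clicked record is ranked
# above a randomly chosen non-clicked one (0.5 = chance, 1.0 = perfect).
print(roc_auc_score(y_true, y_prob))  # 5 of 6 positive/negative pairs ranked correctly
```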
Reference | Year | Techniques | Features used | Findings
---|---|---|---|---
Chen et al. [ | 2017 | DBN | All features | ROC–AUC = 0.7127
Huang et al. [ | 2019 | Bilinear | All features | –
Proposed study | 2020 | XGBoost | 19 features | ROC–AUC = 0.7640
Compared to Chen et al. [
The results show that the XGBoost algorithm outperforms the remaining three algorithms, Random Forest, Logistic Regression, and KNN, while LR shows the worst performance among all the applied algorithms; see
Algorithm | Default ROC_AUC score | ROC_AUC score / number of selected features | ROC_AUC score with optimization
---|---|---|---
XGBoost | 0.7587 | 0.75969 / 19 | 0.7640
Random Forest | 0.7443 | 0.74546 / 16 | 0.7544
KNN | 0.6939 | 0.6879 / 18 | 0.7172
LR | 0.6429 | 0.6428 / 9 | 0.6428
In our study, 80% of the records were used for training and 20% for testing. XGBoost outperformed all the other algorithms with 19 features and optimal values of max_depth=6 and n_estimators=200; see
C1, banner_pos, site_id, site_domain, site_category, app_id, app_domain, app_category, device_model, device_type, device_conn_type, C14, C15, C16, C17, C18, C19, C20, C21 |
Click-Through Rate research is a key topic in the business field. Much research, investigation, and testing was conducted to decide the best approach to apply in our study. Click-Through Rate prediction was implemented by building XGBoost, Random Forest, KNN, and Logistic Regression supervised classifiers. Results showed that the XGBoost model outperformed the other three models. Significant results were obtained from the other models too, with slight differences between them depending on the evaluation metrics. For better results in future work, deep learning algorithms, different feature engineering techniques, and other types of machine learning techniques might be applied to the same dataset with the goal of improving performance.
We want to thank Dr. Naya Nagy for proofreading the manuscript.