Computers, Materials & Continua
DOI:10.32604/cmc.2022.024764
Article

Feature Selection with Optimal Stacked Sparse Autoencoder for Data Mining

Manar Ahmed Hamza1,*, Siwar Ben Haj Hassine2, Ibrahim Abunadi3, Fahd N. Al-Wesabi2,4, Hadeel Alsolai5, Anwer Mustafa Hilal1, Ishfaq Yaseen1 and Abdelwahed Motwakel1

1Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, Al-Kharj, 16278, Saudi Arabia
2Department of Computer Science, College of Science and Arts at Mahayil, King Khalid University, Muhayel Aseer, 62529, Saudi Arabia
3Department of Information Systems, Prince Sultan University, Riyadh, 11586, Saudi Arabia
4Faculty of Computer and IT, Sana'a University, Sana'a, 61101, Yemen
5Department of Information Systems, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Saudi Arabia
*Corresponding Author: Manar Ahmed Hamza. Email: ma.hamza@psau.edu.sa
Received: 30 October 2021; Accepted: 05 January 2022

Abstract: Data mining in the educational field can be used to optimize the teaching and learning performance of students. The recently developed machine learning (ML) and deep learning (DL) approaches can be utilized to mine such data effectively. This study proposes an Improved Sailfish Optimizer-based Feature Selection with Optimal Stacked Sparse Autoencoder (ISOFS-OSSAE) for data mining and pattern recognition in the educational sector. The proposed ISOFS-OSSAE model aims to mine educational data and derive decisions through feature selection and classification. The model involves the design of the ISOFS technique to choose an optimal subset of features. In addition, swallow swarm optimization (SSO) with the SSAE model is derived to perform the classification process. To showcase the enhanced outcomes of the ISOFS-OSSAE model, a wide range of experiments was conducted on a benchmark dataset from the University of California Irvine (UCI) Machine Learning Repository. The simulation results pointed out the improved classification performance of the ISOFS-OSSAE model over recent state-of-the-art approaches in terms of different performance measures.

Keywords: Data mining; pattern recognition; feature selection; data classification; SSAE model

1  Introduction

Data mining (DM) is an approach for extracting hidden predictive information from large databases; it is one of the modern technologies with great potential to help institutions and universities focus on the most important data in their data warehouses [1]. DM tools predict behaviors and future trends, enabling institutions to make knowledge-driven, proactive decisions. The automatic, prospective analysis provided by DM tools goes beyond the analysis of past events offered by the retrospective tools typical of decision support systems. DM tools can answer institutional questions that would otherwise take a great deal of time to resolve [2]. Nowadays, several industries utilize DM tools to prepare marketing strategies and to make decisions targeting segmented customers in order to achieve their goals. However, many universities have neglected to practice DM methods. The application of DM in the field of education is a new tendency in the globally competitive landscape. Understanding DM applications, terms, tasks, and techniques is the basis for the development of DM tools in educational sectors. Hence, it is necessary to examine the function of DM in the educational sector [3].

Educational Data Mining (EDM) is the application of DM systems to educational information. EDM aims to study this information and solve educational research problems. EDM works toward the development of modern technologies to examine educational information, using DM tools to better understand student learning environments [4–6]. The EDM method transforms raw data coming from the education system into helpful information that might have a greater impact on educational practice and research. The ever-growing technologies in the education system have made a great amount of data available. EDM offers a clearer picture of learners and their learning processes [7] and provides a significant amount of relevant information. It employs DM techniques to examine educational information and resolve educational problems. Like other DM methodologies, EDM extracts novel, interesting, interpretable, and useful data from the education system, but it is specially designed for the different kinds of information found there [8]. This technique is then utilized to enhance knowledge about the settings, educational phenomena, and the students who learn in them [9]. The development of computational methods that integrate data and theory would help improve the quality of teaching and learning (T&L) procedures.

EDM research studies several fields, involving specific learning strategies from computer-adaptive testing (including testing at a larger scale), educational software, computer-assisted collaborative learning, and the factors related to non-retention and student failure in courses [10]. Several critical areas include the development of student models; applications of the EDM method have examined pedagogical supports (in learning software and other fields, namely collaborative learning behavior) and improved or determined the structure of domain knowledge. There has been growing interest in the field of EDM. This newly emerging field is concerned with the development of techniques that discover knowledge from data originating from the educational environment. EDM systems employ several methods, for example, k-nearest neighbor (KNN), decision trees (DT), Naïve Bayes (NB), neural networks (NN), and related analyses. The prediction of student performance is an important building block in educational environments, since the academic performance of students is a critical factor in shaping their future. Student academic performance is not the result of a single determining factor; it depends heavily on socio-economic, personal, and psychological factors.

Serik et al. [11] suggested the incorporation of techniques like data analysis and artificial intelligence (AI) methods into learning management systems to enhance learning. These goals are defined in the new normality, which seeks a strong education system in which particular activities can be performed online; among other techniques, this allows students to have a virtual assistant to help them in their learning. In [12], a novel predictive method to evaluate students' academic performance was designed based on clustering and classification systems and tested in real time with the student database of several academic fields of higher education institutions in Kerala, India. The results show that the hybrid method integrating classification and clustering yields outcomes that are much better in terms of accuracy in predicting student academic performance.

Francis et al. [13] aimed at identifying fast, slow, and average learners among students and displaying them using predictive DM tools with classification-based models. Mhetre et al. [14] determined student performance with various classification methods and discovered the one that produces optimum outcomes. The educational database was gathered from a Saudi university; it was preprocessed to filter duplicate records, and missing fields were recognized and filled with appropriate information. DL methods such as DNN and DM methods like random forest (RF), support vector machine (SVM), DT, and NB were applied to the dataset using RapidMiner and Weka.

Aslam et al. [15] discussed the significance of two emerging techniques, ML and blockchain, in the education domain. The blockchain technique, with data immutability as its major benefit, has been employed in miscellaneous fields for security reasons. It is utilized for securely storing achievement and degree certificates. This data is added by the university or college to the blockchain, which can be shared and accessed by students via an online resume with employers. This method is highly secure, since there is no need to worry about loss or modification of data at the institution. Shah et al. [16] gathered several samples of distinct kinds of student attributes related to academic performance through survey forms, selected a few significant features with distinct feature extraction methods, and then applied a few ML methods to the pre-processed dataset. Malhotra et al. [17] examined educational quality, which is tied closely to sustainable development objectives. The education system has generated excessive information that should be suitably processed to obtain useful data that is highly beneficial for future planning and development. Predicting student grades and marks from historic academic data is a useful and popular application of EDM; hence it becomes a useful data provider employed in various ways to enhance educational quality in the country. The classification method forecasts grades whereas the regression method forecasts marks; lastly, the outcomes attained from both models were investigated.

This study proposes an Improved Sailfish Optimizer-based Feature Selection with Optimal Stacked Sparse Autoencoder (ISOFS-OSSAE) for data mining and pattern recognition in the educational sector. The proposed ISOFS-OSSAE model aims to mine educational data and derive decisions through feature selection and classification. The model involves the design of the ISOFS technique to choose an optimal subset of features. In addition, swallow swarm optimization (SSO) with the SSAE model is derived to perform the classification process. To showcase the enhanced outcomes of the ISOFS-OSSAE model, a wide range of experiments was conducted on a benchmark dataset from the UCI repository.

2  The Proposed Model

In this work, a novel ISOFS-OSSAE technique is derived that aims to mine educational data and derive decisions through feature selection and classification. The proposed ISOFS-OSSAE model involves the design of the ISOFS technique to choose an optimal subset of features. Then, the OSSAE model-based classifier is derived, in which the parameter tuning of the SSAE model is done by the use of the SSO algorithm.

2.1 Design of ISOFS Technique

The SFO approach is based on the hunting behavior of sailfish. The sailfish population represents the candidate solutions in the SFO method [18]. The population is randomly created in the solution space. The location matrix of the initialized sailfish is given by:

$$X_{SF}(t)=\left[X_{i,j}(t)\right]_{m\times d}=\begin{bmatrix}X_{1,1}(t)&X_{1,2}(t)&\cdots&X_{1,d}(t)\\X_{2,1}(t)&X_{2,2}(t)&\cdots&X_{2,d}(t)\\\vdots&\vdots&\ddots&\vdots\\X_{m,1}(t)&X_{m,2}(t)&\cdots&X_{m,d}(t)\end{bmatrix}\tag{1}$$

In which m signifies the number of sailfish in the population, d denotes the dimensionality, X_{i,j}(t) characterizes the value of the jth variable of the ith sailfish, and t indicates the number of the current iteration. The matrix after the initialization of the sardine locations is given by:

$$X_{S}(t)=\left[X_{i,j}(t)\right]_{n\times d}=\begin{bmatrix}X_{1,1}(t)&X_{1,2}(t)&\cdots&X_{1,d}(t)\\X_{2,1}(t)&X_{2,2}(t)&\cdots&X_{2,d}(t)\\\vdots&\vdots&\ddots&\vdots\\X_{n,1}(t)&X_{n,2}(t)&\cdots&X_{n,d}(t)\end{bmatrix}\tag{2}$$

where n indicates the number of sardines in the sardine population and X_{i,j}(t) means the value of the jth parameter of the ith sardine.
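As a concrete illustration, the random initialization of the two position matrices in Eqs. (1) and (2) can be sketched in Python (the bounds and population sizes here are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative dimensions: m sailfish, n sardines, d variables per solution
m, n, d = 5, 10, 4
lb, ub = -1.0, 1.0          # assumed search-space bounds

# Random initialization of the location matrices, Eqs. (1)-(2)
X_SF = rng.uniform(lb, ub, size=(m, d))   # sailfish population, m x d
X_S  = rng.uniform(lb, ub, size=(n, d))   # sardine population, n x d
```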

To estimate the quality of sardines and sailfish, the fitness of each sardine and sailfish solution is computed with the fitness function F and stored in matrix format. The fitness matrix of the sailfish can be expressed as:

$$F_{SF}(t)=\begin{bmatrix}F_1(t)\\F_2(t)\\\vdots\\F_m(t)\end{bmatrix}=\begin{bmatrix}F\left(X_{1,1}(t),X_{1,2}(t),\ldots,X_{1,d}(t)\right)\\F\left(X_{2,1}(t),X_{2,2}(t),\ldots,X_{2,d}(t)\right)\\\vdots\\F\left(X_{m,1}(t),X_{m,2}(t),\ldots,X_{m,d}(t)\right)\end{bmatrix}\tag{3}$$

Now, F_{SF}(t) signifies the fitness matrix of the sailfish, m implies the number of sailfish, and F_i(t) represents the fitness value of the ith sailfish. The sardine fitness can be formulated by:

$$F_{S}(t)=\begin{bmatrix}F_1(t)\\F_2(t)\\\vdots\\F_n(t)\end{bmatrix}=\begin{bmatrix}F\left(X_{1,1}(t),X_{1,2}(t),\ldots,X_{1,d}(t)\right)\\F\left(X_{2,1}(t),X_{2,2}(t),\ldots,X_{2,d}(t)\right)\\\vdots\\F\left(X_{n,1}(t),X_{n,2}(t),\ldots,X_{n,d}(t)\right)\end{bmatrix}\tag{4}$$

Here, F_S(t) denotes the sardine fitness matrix, n signifies the number of sardines, and F_i(t) indicates the fitness value of the ith sardine. During hunting, a sailfish alters its location based on the location of other sailfish near the sardines. The equation to update the location of a sailfish is given in the following:

$$X_i(t)=X_{elite}(t-1)-\lambda_i(t-1)\times\left(rand\times\frac{X_{elite}(t-1)+X_{injured}(t-1)}{2}-X_i(t-1)\right)\tag{5}$$

where X_{elite}(t−1) indicates the elite sailfish in the t−1 iteration, X_{injured}(t−1) represents the most severely injured sardine in the t−1 iteration, X_i(t−1) shows the ith sailfish in the t−1 iteration, X_i(t) refers to the ith sailfish in the t iteration, rand denotes a random number within [0,1], and λ_i indicates the update coefficient:

$$\begin{cases}\lambda_i(t)=2\times rand\times D(t)-D(t)\\[1mm]D(t)=1-\dfrac{M_{SF}(t)}{M_{SF}(t)+M_S(t)}\end{cases}\tag{6}$$

Let M_{SF}(t) be the number of sailfish in the t iteration and M_S(t) signify the number of sardines in the t iteration.
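A minimal sketch of the sailfish position update of Eqs. (5) and (6), taking the elite sailfish and injured sardine positions as inputs (the vectorization and function signature are illustrative assumptions):

```python
import numpy as np

def update_sailfish(X_SF, n_sardines, x_elite, x_injured, rng):
    """Sailfish position update, Eqs. (5)-(6)."""
    m = len(X_SF)
    D = 1.0 - m / (m + n_sardines)          # prey density term D(t), Eq. (6)
    new_pos = np.empty_like(X_SF)
    for i in range(m):
        lam = 2.0 * rng.random() * D - D    # update coefficient lambda_i
        r = rng.random()
        new_pos[i] = x_elite - lam * (r * (x_elite + x_injured) / 2.0 - X_SF[i])
    return new_pos
```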

To escape the sailfish attack, the sardines consider the location of the elite sailfish and the attack power of the sailfish in each iteration and update their locations. The location update equation of the sardines is expressed as:

$$\begin{cases}X_i(t)=rand\times\left(X_{elite}(t-1)-X_i(t-1)+Q(t-1)\right)\\[1mm]Q(t-1)=A\times\left(1-2\times(t-1)\times\xi\right)\end{cases}\tag{7}$$

where X_i(t−1) indicates the ith sardine in the t−1 iteration, X_i(t) represents the ith sardine in the t iteration, Q(t−1) signifies the attack power of the sailfish in the t−1 iteration, A and ξ denote the attack power coefficients, set to 4 and 0.001 respectively, and t indicates the number of the present iteration. The number of updated sardines γ and the number of updated parameters η, based on the attack power of the sailfish, can be represented by:

$$\begin{cases}\gamma(t)=M_S(t)\times Q(t)\\\eta(t)=d_i(t)\times Q(t)\end{cases}\tag{8}$$

Now M_S(t) signifies the number of sardines in the t iteration and d_i(t) characterizes the number of parameters of the ith sardine in the t iteration.
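The sardine escape update of Eqs. (7) and (8) can be sketched as follows; for simplicity, all sardines and all variables are updated here, whereas the full algorithm uses γ and η to restrict the update as the attack power falls:

```python
import numpy as np

def update_sardines(X_S, x_elite, t, A=4.0, xi=1e-3, rng=None):
    """Sardine position update, Eq. (7), with attack power Q(t)."""
    Q = A * (1.0 - 2.0 * t * xi)            # attack power, Eq. (7)
    r = rng.random(X_S.shape)
    return r * (x_elite - X_S + Q), Q
```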

When the fitness of a sardine is lower than that of a sailfish, the location of the sailfish is substituted with the location of the captured sardine:

$$X_i(t)=X_i'(t)\quad\text{if}\quad F\left(X_i'(t)\right)<F\left(X_i(t)\right)\tag{9}$$

Let X_i(t) be the location of the sailfish in the t iteration and X'_i(t) represent the location of the sardine in the t iteration.
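The greedy replacement step of Eq. (9) can be sketched with a hypothetical helper, assuming a minimization problem where lower fitness is better:

```python
import numpy as np

def replace_captured(X_SF, F_SF, X_S, F_S):
    """If the best remaining sardine is fitter than a sailfish, the sailfish
    takes its position and the captured sardine is removed, Eq. (9)."""
    for i in range(len(X_SF)):
        if len(F_S) == 0:
            break
        j = int(np.argmin(F_S))             # best (lowest-fitness) sardine
        if F_S[j] < F_SF[i]:
            X_SF[i], F_SF[i] = X_S[j], F_S[j]
            X_S = np.delete(X_S, j, axis=0)
            F_S = np.delete(F_S, j)
    return X_SF, F_SF, X_S, F_S
```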

An inertia weight u is introduced into the alternate attack and pursuit procedure of the sailfish [19], improving the local search capacity of both sardines and sailfish. The location update equations become:

$$\begin{cases}X_i(t)=rand\times\left(u(t-1)\times X_{elite}(t-1)-X_i(t-1)+Q(t-1)\right)&\text{(sardines)}\\[1mm]X_i(t)=u(t-1)\times X_{elite}(t-1)-\lambda_i(t-1)\times\left(rand\times\dfrac{X_{elite}(t-1)+X_{injured}(t-1)}{2}-X_i(t-1)\right)&\text{(sailfish)}\end{cases}\tag{10}$$

In these equations, rand denotes a random number within [0,1]. The inertia weight u can be expressed by:

$$u(t)=u_{min}+\left(u_{max}-u_{min}\right)\times\exp\left(-25\times\left(\frac{t}{T}\right)^{3}\right)\tag{11}$$

Here u_min and u_max represent the lower and upper bounds of the inertia weight u, with u_min = 0.4 and u_max = 0.9; t and T represent the present iteration number and the maximal number of iterations. The steps involved in the SFO algorithm are given in Fig. 1.
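The decaying inertia weight of Eq. (11) can be sketched as follows; the exact form of the exponential is a reconstruction, assumed so that u decreases from u_max toward u_min over the run:

```python
import math

def inertia(t, T, u_min=0.4, u_max=0.9):
    """Inertia weight u(t), Eq. (11): starts at u_max and decays toward
    u_min as iteration t approaches the maximum T."""
    return u_min + (u_max - u_min) * math.exp(-25.0 * (t / T) ** 3)
```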


Figure 1: Steps involved in SFO algorithm

The ISOFS technique is applied to explore the feature space effectively and generate a proper set of features. Feature selection can be considered a multiobjective optimization problem, as it must fulfill distinct aims to obtain optimum solutions that reduce the number of features and improve classification performance. Therefore, a fitness function (FF) is derived to obtain solutions that balance the two objectives, as defined below:

$$fitness=\alpha\,\Delta_R(D)+\beta\,\frac{|Y|}{|T|}\tag{12}$$

Δ_R(D) denotes the error rate of the classifier, |Y| represents the number of selected features, and |T| denotes the total number of features in the dataset. Here, α ∈ [0, 1] is a parameter weighting the error rate of the classifier, and β = 1 − α characterizes the importance of feature reduction.
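The fitness function of Eq. (12) is straightforward to implement; the weighting α = 0.99 below is an assumed default, common in the feature-selection literature, that keeps the error rate dominant:

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Feature-selection fitness, Eq. (12): weighted sum of classifier
    error and the selected-feature ratio |Y|/|T|."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total
```

A subset with a lower error rate and fewer features yields a lower (better) fitness.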

2.2 Design of OSSAE Based Classification Model

The reduced feature subsets are fed into the OSSAE model to carry out the classification process. The autoencoder (AE) is a symmetrical NN that extracts features with minimal reconstruction error. Li et al. [20] proposed a strategy, termed 'pre-training', that separates a complex network into stacked sub-networks. Training failure can be prevented since the network parameters of all the layers are assigned particular values instead of random initialization. Nevertheless, the stacked sub-networks provide limited generalization capacity and low training efficacy because of the simplicity of the single hidden layer and the complexity of variable selection. To resolve these problems, the SSAE method is presented based on two-phase networks with five hidden layers in each network. The overall framework of the SSAE method is demonstrated in Fig. 2. Initially, the input layer X is mapped to the hidden neuron h3 (termed the low-level hidden layer). Next, h3 is mapped back to the reconstructed layer, X_rec. Then, the hidden feature h3 is converted into the input for acquiring h6 (the high-level hidden layer) [21].


Figure 2: Structure of SSAE

In the SSAE, the feature learning method follows a sequence of operations: convolution, denoising, pooling, activation, and batch normalization (BN). These operations are explained in the following. Denoising: to obtain strong and representative learned features of the flame image, a denoising AE learning method is utilized which adds distinct noise to the input signal. White Gaussian noise ς is considered, and the corrupted version X_n is obtained by applying a fixed corruption ratio φ to the input X:

$$I_{X_n}=I_X+\varphi\,\varsigma\tag{13}$$

A rectified linear unit (ReLU) is utilized as the activation function of the hidden neuron γ, determined by:

$$y(\gamma)=\max(0,\,\gamma)\tag{14}$$

The ReLU is an unsaturated piecewise linear function, which is quicker than saturated non-linear functions such as TanH and Sigmoid. In particular, the Sigmoid function is employed in the third decoder to ensure the intensity range of the reconstructed layer X_rec is consistent with the input layer X:

$$y(\gamma)=\frac{1}{1+e^{-\gamma}}\tag{15}$$
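The two activation functions of Eqs. (14) and (15) can be written directly:

```python
import numpy as np

def relu(g):
    """ReLU activation, Eq. (14)."""
    return np.maximum(0.0, g)

def sigmoid(g):
    """Sigmoid activation, Eq. (15); keeps the reconstruction in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-g))
```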

Pooling and upsampling: The pooling function is performed to reduce the number of parameters of the network. Here, P(r×r, t) represents the pooling function that condenses the feature map by choosing a maximal value with a step of t and an r×r kernel. The pooling function helps improve translation invariance:

$$\delta_{i,j}^{k}=\max_{0\le n\le p}\left\{\gamma_{i\cdot r+t,\;j\cdot r+t}^{k}\right\}\tag{16}$$

In which γ_{i,j}^k and δ_{i,j}^k represent the values at position (i, j) in the kth feature map of the inputs and outputs [22]. Consider s_{ij} (i ∈ (1, E), j ∈ (1, F)) signifying the activation of hidden neuron j, in which E denotes the number of images in the training dataset and F denotes the number of neurons in the hidden layer. Next, the average activation of each hidden neuron j is evaluated as:

$$p_j=\frac{1}{E}\sum_{i=1}^{E}s_{ij}\tag{17}$$

where p_j is expected to be close to 0, which implies that hidden neuron j is generally inactive. A penalty term P_penalty is added to the loss function, which penalizes p_j once it deviates significantly from the sparsity target p_target. The penalty term P_penalty can be determined by:

$$P_{penalty}=\sum_{j=1}^{F}KL\left(p_{target}\,\|\,p_j\right)\tag{18}$$

Let KL(p_target || p_j) be the Kullback–Leibler divergence (KL divergence), which acts as a sparsity constraint:

$$KL\left(p_{target}\,\|\,p_j\right)=p_{target}\log\frac{p_{target}}{p_j}+\left(1-p_{target}\right)\log\frac{1-p_{target}}{1-p_j}\tag{19}$$

When p_target = p_j, KL(p_target || p_j) = 0; otherwise, KL(p_target || p_j) increases monotonically as p_j deviates from p_target.
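The sparsity penalty of Eqs. (17)–(19) can be sketched as follows (p_target = 0.05 is an assumed example value):

```python
import numpy as np

def kl_sparsity_penalty(activations, p_target=0.05):
    """Sparsity penalty, Eqs. (17)-(19): average activation per hidden
    neuron, then summed KL divergence from the sparsity target."""
    p = activations.mean(axis=0)                 # p_j, Eq. (17)
    p = np.clip(p, 1e-10, 1.0 - 1e-10)           # numerical safety
    kl = (p_target * np.log(p_target / p)
          + (1.0 - p_target) * np.log((1.0 - p_target) / (1.0 - p)))  # Eq. (19)
    return kl.sum()                              # Eq. (18)
```

When every neuron's average activation equals p_target the penalty is zero, and it grows as the activations drift away from the target.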

To optimally adjust the parameters involved in the SSAE model, the SSO algorithm is utilized.

The SSO approach was inspired by the collective behavior of swallows, in which the interaction between flock members accomplishes optimal results [22,23]. It is a meta-heuristic model based on unique features of the swallow, comprising intelligent social relations, fast flight, and hunting skill. The approach is similar to the particle swarm optimization (PSO) method; however, it has particular characteristics not found in common methods, namely leader particles (l_i), explorer particles (e_i), and aimless particles (o_i), each with certain responsibilities in the group. The particle e_i is accountable for searching the problem space, performing this exploration under the control of a number of variables. It can be expressed by:

$$V_{HL_{i+1}}=V_{HL_i}+\alpha_{HL}\,rand()\left(e_{best}-e_i\right)+\beta_{HL}\,rand()\left(HL_i-e_i\right)\tag{20}$$

Eq. (20) gives the velocity vector component updated in the direction of the leader.

$$\alpha_{HL}=1.5\quad\text{if}\quad\left(e_i=0\;\|\;e_{best}=0\right)\tag{21}$$

Eqs. (21) and (22) evaluate the acceleration coefficient α_HL, which directly affects the individual knowledge of each particle.

$$\alpha_{HL}=\begin{cases}\dfrac{rand()\times e_i}{e_i\times e_{best}}&\text{if }(e_i<e_{best})\ \&\&\ (e_i<HL_i),\ e_i,e_{best}\neq 0\\[2mm]\dfrac{2\times rand()\times e_i}{1/(2\times e_i)}&\text{if }(e_i<e_{best})\ \&\&\ (e_i>HL_i),\ e_i\neq 0\\[2mm]\dfrac{e_{best}}{1/(2\times rand())}&\text{if }(e_i>e_{best})\end{cases}\tag{22}$$

$$\beta_{HL}=1.5\quad\text{if}\quad\left(e_i=0\;\|\;e_{best}=0\right)\tag{23}$$

$$\beta_{HL}=\begin{cases}\dfrac{rand()\times e_i}{e_i\times HL_i}&\text{if }(e_i<e_{best})\ \&\&\ (e_i<HL_i),\ e_i,HL_i\neq 0\\[2mm]\dfrac{2\times rand()\times e_i}{1/(2\times e_i)}&\text{if }(e_i<e_{best})\ \&\&\ (e_i>HL_i),\ e_i\neq 0\\[2mm]\dfrac{HL_i}{1/(2\times rand())}&\text{if }(e_i>e_{best})\end{cases}\tag{24}$$

Eqs. (23) and (24) calculate the acceleration coefficient β_HL, which directly affects the collective knowledge of each particle. The two acceleration coefficients quantify how each particle is placed with respect to the leaders and its own optimal individual knowledge. The aimless particle o_i uses Eq. (25) for its random motions:

$$o_{i+1}=o_i+\left[\frac{rand(\{-1,1\})\times rand(min_s,\,max_s)}{1+rand()}\right]\tag{25}$$

In the SSO method, there are two types of leaders: global and local. The particles are separated into groups, and the particles in each group are often similar. The best particle in each group is chosen and termed the local leader; then, the best particle among the local leaders is selected and termed the global leader. The convergence and direction of each particle's movement depend on the positions of these leaders.
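A minimal sketch of the explorer velocity update of Eq. (20) and the aimless-particle random walk of Eq. (25); the acceleration coefficients are passed in rather than derived from the piecewise rules of Eqs. (21)–(24), and the signatures are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def explorer_velocity(v, e, e_best, hl, alpha_hl, beta_hl):
    """Explorer-particle velocity update, Eq. (20): pulled toward its best
    position e_best and its leader hl."""
    return (v + alpha_hl * rng.random() * (e_best - e)
              + beta_hl * rng.random() * (hl - e))

def aimless_step(o, min_s, max_s):
    """Aimless-particle random walk, Eq. (25)."""
    sign = rng.choice([-1.0, 1.0])
    return o + sign * rng.uniform(min_s, max_s) / (1.0 + rng.random())
```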

3  Results and Discussion

The experimental results analysis of the ISOFS-OSSAE technique takes place using the benchmark dataset from the UCI repository [24]. It includes a total of 649 samples with 33 attributes and 2 class labels. The results are investigated under varying dimensions.

Tab. 1 and Fig. 3 demonstrate the best cost and number of chosen features offered by the ISOFS and other FS methods. The results portrayed that the Information Gain and CFSSubsetEval techniques attained poor performance, with maximum best costs of 0.386920 and 0.366000 respectively. Following them, the genetic algorithm (GA) and PSO models obtained moderate best costs of 0.165283 and 0.183638 respectively. In line with this, the ant colony optimization (ACO) algorithm accomplished a near-optimal best cost of 0.030509. However, the proposed ISOFS technique accomplished superior outcomes with the lowest best cost of 0.02156.


Figure 3: Best cost analysis of ISOFS with other techniques

Fig. 4 portrays the confusion matrices generated by the ISOFS-OSSAE technique under five test runs. On test run-1, the ISOFS-OSSAE technique identified 543 instances of class 0 and 94 instances of class 1. Besides, on test run-2, it identified 541 instances of class 0 and 91 instances of class 1. Moreover, on test run-3, it identified 542 instances of class 0 and 90 instances of class 1. Likewise, on test run-4, it identified 543 instances of class 0 and 94 instances of class 1. At last, on test run-5, it identified 541 instances of class 0 and 93 instances of class 1. The values present in the confusion matrices are transformed into TP, TN, FP, and FN in Tab. 2.


Figure 4: Confusion matrices of ISOFS-OSSAE technique under five test runs


Tab. 3 provides a detailed overall classification results analysis of the ISOFS-OSSAE technique under five test runs. The table values denote that the ISOFS-OSSAE technique gained effective classifier results. For instance, with run-1, the ISOFS-OSSAE technique attained precn, recal, Fmeasure, accy, and kappa of 0.9891, 0.9891, 0.9891, 0.9815, and 0.8258 respectively. Similarly, with run-2, it offered precn, recal, Fmeasure, accy, and kappa of 0.9836, 0.9854, 0.9845, 0.9738, and 0.8122 respectively. Next, with run-3, it achieved precn, recal, Fmeasure, accy, and kappa of 0.9819, 0.9872, 0.9846, 0.9738, and 0.8223 respectively. Along with that, with run-4, it gained precn, recal, Fmeasure, accy, and kappa of 0.9891, 0.9891, 0.9891, 0.9815, and 0.8258 respectively. Lastly, with run-5, it attained precn, recal, Fmeasure, accy, and kappa of 0.9891, 0.9891, 0.9891, 0.9815, and 0.8258 respectively.


Fig. 5 demonstrates the analysis of the average results of the ISOFS-OSSAE technique on the test dataset. The figure reported that the ISOFS-OSSAE technique has resulted in maximum average precn,recal,Fmeasure,accy, and kappa of 0.9862, 0.9872, 0.9867, 0.9775, and 0.8207.


Figure 5: Average classification results analysis of ISOFS-OSSAE technique

Tab. 4 offers a comprehensive comparative analysis of the ISOFS-OSSAE with recent techniques [25]. Fig. 6 demonstrates the precn, recal, and Fmeasure analysis of the ISOFS-OSSAE technique on the test dataset. The figure reported that the random tree (RT) technique accomplished the least performance, with the minimal values of precn, recal, and Fmeasure. Following it, the radial basis function network (RBFN) technique attained slightly increased values of precn, recal, and Fmeasure. Next to that, the multilayer perceptron (MLP) model accomplished moderate values of precn, recal, and Fmeasure. Similarly, the ACO with logistic regression (ACO-LR), LR, RF, and DT models obtained reasonable values of precn, recal, and Fmeasure. However, the proposed ISOFS-OSSAE technique resulted in maximum classification performance, with precn, recal, and Fmeasure of 0.986, 0.987, and 0.987 respectively.


Figure 6: Comparative precn, recal, and Fmeasure analysis of ISOFS-OSSAE technique

Fig. 7 showcases the accy analysis of the ISOFS-OSSAE technique with recent approaches. The figure reported that the RBFN, MLP, and RT models have attained lower accy of 0.895, 0.897, and 0.877 respectively. Followed by, the DT and RF models have reached slightly improved accy of 0.914 and 0.926 respectively. Eventually, the LR and ACO-LR models have achieved reasonable accy of 0.948 and 0.949 respectively. However, the proposed ISOFS-OSSAE technique has resulted in maximum outcome with the accy of 0.978.


Figure 7: Comparative accy analysis of ISOFS-OSSAE technique

Fig. 8 displays the kappa analysis of the ISOFS-OSSAE technique with recent approaches. The figure stated that the RBFN, MLP, and RT models reached inferior kappa of 0.614, 0.592, and 0.542 respectively. Also, the DT and RF models got slightly enhanced kappa of 0.658 and 0.685 respectively. Ultimately, the LR and ACO-LR models realized sensible kappa of 0.792 and 0.769 respectively. However, the proposed ISOFS-OSSAE technique resulted in a supreme outcome with the kappa of 0.821. From the aforementioned figures and tables, it is ensured that the ISOFS-OSSAE technique is an effective tool for data mining and pattern recognition.


Figure 8: Comparative kappa analysis of ISOFS-OSSAE technique

4  Conclusion

In this study, a novel ISOFS-OSSAE technique was derived that aims to mine educational data and derive decisions through feature selection and classification. The proposed ISOFS-OSSAE model involves the design of the ISOFS technique to choose an optimal subset of features. Then, the OSSAE model-based classifier is derived, in which the parameter tuning of the SSAE model is done by the use of the SSO algorithm. To showcase the enhanced outcomes of the ISOFS-OSSAE model, a wide range of experiments was conducted on a benchmark dataset from the UCI repository. The simulation results pointed out the improved classification performance of the ISOFS-OSSAE model over recent state-of-the-art approaches, with a higher accuracy of 0.978. Therefore, the ISOFS-OSSAE model can be utilized as an effective tool to mine data and recognize patterns. In the future, the classification performance can be improved by the utilization of clustering approaches.

Funding Statement: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (RGP 1/279/42). https://www.kku.edu.sa. The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

 1.  R. J. A. Cabrera, C. A. P. Legaspi, E. J. G. Papa, R. D. Samonte and D. D. Acula, “HeMatic: An automated leukemia detector with separation of overlapping blood cells through image processing and genetic algorithm,” in 2017 Int. Conf. on Applied System Innovation, ICASI 2017. Proc.: IEEE, Sapporo, Japan, pp. 985–987, 2017. [Google Scholar]

 2.  M. Kumar, A. J. Singh and D. Handa, “Literature survey on student's performance prediction in education using data mining techniques,” International Journal of Education and Management Engineering (IJEME), vol. 7, no. 6, pp. 40–49, 2017. [Google Scholar]

 3.  C. Romero and S. Ventura, “Data mining in education: Data mining in education,” WIREs Data Mining and Knowledge Discovery, vol. 3, no. 1, pp. 12–27, 2013. [Google Scholar]

 4.  Y. V. Paredes, R. F. Siegle, I. H. Hsiao and S. D. Craig, “Educational data mining and learning analytics for improving online learning environments,” in Proc. of the Human Factors and Ergonomics Society Annual Meeting, North America, vol. 64, no. 1, pp. 500–504, 2020. [Google Scholar]

 5.  N. Padhy, “The survey of data mining applications and feature scope,” International Journal of Computer Science, Engineering and Information Technology, vol. 2, no. 3, pp. 43–58, 2012. [Google Scholar]

 6.  I. Almuniri and A. M. Said, “School's performance evaluation based on data mining,” International Journal of Engineering and Information Systems, vol. 1, no. 9, pp. 56–62, 2017. [Google Scholar]

 7.  C. Jalota and R. Agrawal, “Analysis of educational data mining using classification,” in 2019 Int. Conf. on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, pp. 243–247, 2019. [Google Scholar]

 8.  T. S. Kumar, “Data mining based marketing decision support system using hybrid machine learning algorithm,” Journal of Artificial Intelligence and Capsule Networks, vol. 2, no. 3, pp. 185–193, 2020. [Google Scholar]

 9.  S. Saeed, A. Shaikh, M. A. Memon and S. M. R. Naqvi, “Impact of data mining techniques to analyze health care data,” Journal of Medical Imaging and Health Informatics, vol. 8, no. 4, pp. 682–690, 2018. [Google Scholar]

10. S. Dwivedi and V. S. K. Roshni, “Recommender system for big data in education,” in 2017 5th National Conf. on E-Learning & E-Learning Technologies (ELELTECH), Hyderabad, India, pp. 1–4, 2017. [Google Scholar]

11. M. Serik, G. Nurbekova and J. Kultan, “Big data technology in education,” Bulletin of the Karaganda University Pedagogy Series, vol. 100, no. 4, pp. 8–15, 2020. [Google Scholar]

12. W. Villegas-Ch, M. R. Cañizares and X. P. Pacheco, “Improvement of an online education model with the integration of machine learning and data analysis in an lms,” Applied Sciences, vol. 10, no. 15, pp. 5371, 2020. [Google Scholar]

13. B. K. Francis and S. S. Babu, “Predicting academic performance of students using a hybrid data mining approach,” Journal of Medical Systems, vol. 43, no. 6, pp. 162, 2019. [Google Scholar]

14. V. Mhetre and M. Nagar, “Classification based data mining algorithms to predict slow, average and fast learners in educational system using WEKA,” in 2017 Int. Conf. on Computing Methodologies and Communication (ICCMC), Erode, India, pp. 475–479, 2017. [Google Scholar]

15. N. M. Aslam, I. U. Khan, L. H. Alamri and R. S. Almuslim, “An improved early student's academic performance prediction using deep learning,” International Journal of Emerging Technologies in Learning, vol. 16, no. 12, pp. 108, 2021. [Google Scholar]

16. D. Shah, D. Patel, J. Adesara, P. Hingu and M. Shah, “Exploiting the capabilities of blockchain and machine learning in education,” Augmented Human Research, vol. 6, no. 1, pp. 1, 2021. [Google Scholar]

17. R. Malhotra and M. Khanna, “Mining the impact of object oriented metrics for change prediction using machine learning and search-based techniques,” in 2015 Int. Conf. on Advances in Computing, Communications and Informatics (ICACCI), Kochi, India, pp. 228–234, 2015. [Google Scholar]

18. B. K. Yousafzai, M. Hayat and S. Afzal, “Application of machine learning and data mining in predicting the performance of intermediate and secondary education level student,” Education and Information Technologies, vol. 25, no. 6, pp. 4677–4697, 2020. [Google Scholar]

19. S. Shadravan, H. R. Naji and V. K. Bardsiri, “The sailfish optimizer: A novel nature-inspired metaheuristic algorithm for solving constrained engineering optimization problems,” Engineering Applications of Artificial Intelligence, vol. 80, pp. 20–34, 2019. [Google Scholar]

20. L. L. Li, Q. Shen, M. L. Tseng and S. Luo, “Power system hybrid dynamic economic emission dispatch with wind energy based on improved sailfish algorithm,” Journal of Cleaner Production, vol. 316, pp. 128318, Sep. 2021. [Google Scholar]

21. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006. [Google Scholar]

22. Z. Han, M. M. Hossain, Y. Wang, J. Li and C. Xu, “Combustion stability monitoring through flame imaging and stacked sparse autoencoder based deep neural network,” Applied Energy, vol. 259, pp. 114159, 2020. [Google Scholar]

23. M. Neshat, G. Sepidnam and M. Sargolzaei, “Swallow swarm optimization algorithm: A new method to optimization,” Neural Computing and Applications, vol. 23, no. 2, pp. 429–454, 2013. [Google Scholar]

24. UCI Machine Learning Repository: Student Performance Data Set, 2019. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/student+performance. [Google Scholar]

25. A. M. Mesleh and G. Kanaan, “Support vector machine text classification system: Using ant colony optimization based feature subset selection,” in 2008 Int. Conf. on Computer Engineering & Systems, Cairo, Egypt, pp. 143–148, 2008. [Google Scholar]

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.