Artificial Intelligence Based Optimal Functional Link Neural Network for Financial Data Science

Abstract: In the present digital era, data science techniques exploit artificial intelligence (AI) to help those who start and run small and medium-sized enterprises (SMEs) make an impact and develop their businesses. Financial data science integrates the conventions of econometrics with the technological elements of data science. It makes use of machine learning (ML) and predictive and prescriptive analytics to understand financial data effectively and solve related problems. Smart technologies allow SMEs to make their processes smarter and their operations more efficient. At the same time, an effective tool is needed to help SMEs forecast business failure as well as financial crisis. AI has become a familiar tool for many businesses because it concentrates on the design of intelligent decision making tools that solve particular real time problems. With this motivation, this paper presents a new AI based optimal functional link neural network (FLNN) based financial crisis prediction (FCP) model for SMEs. The proposed model involves preprocessing, feature selection, classification, and parameter tuning. It achieved a maximum prediction accuracy of 98.830%, 92.100%, and 95.220% on the applied Polish dataset Year I-III, respectively.


Introduction
Financial data science is the application of data science approaches to solve problems related to finance. Data science encompasses skills from computer science, mathematics, statistics, information visualization, graphic design, complex systems, communication, and business. It is generally based on scientific techniques and methods for extracting useful patterns from both structured and unstructured data. It typically includes predictive modeling, clustering, data wrangling, visualization, and dimensionality reduction. On the other hand, small and medium sized enterprises (SMEs) make up a major portion of the global economy [1,2]. In China, SMEs are the basis of economic development and have become a significant factor in social growth [3]. The Chinese government has implemented several policies to support the development of private businesses, including increased credit and reductions in fees and taxes. Nonetheless, predicting financial status remains a crucial problem for SMEs. For large enterprises, certified audited financial statements are used to evaluate credit risk and assist financial decision making, such as granting loans [4].
For SMEs, because of the absence of reliable data and other factors, assessing credit risk is complex and expensive for banks. Although banks rely on long-term relationships to collect soft data over time and so handle credit data scarcity, SMEs frequently face higher costs when accessing finance as a result of data opacity and higher bankruptcy risk [5]. Furthermore, compared with large companies, SMEs are highly susceptible to changes in the external environment [6]. Because of the large effect of business crises on the economy and society (global debt), analyzing the causes of bankruptcy precisely and finding ways to avoid it has become an active research topic. To reduce the impact of a crisis, businesses may apply for economic help or funds from financial institutions, while decision makers in the financial system try to identify the businesses that are likely to declare bankruptcy in the coming years. Hence, the aim of business crisis or bankruptcy prediction is to assess the financial health and future performance of a business. Fig. 1 shows the combination of Artificial Intelligence (AI) and IoT for SMEs [7].
Financial crisis prediction (FCP) for SMEs is of major importance in making financial decisions. The financial state of a firm concerns its shareholders, the local community, and organizational stakeholders, and it also affects the wider economy and policymakers. Hence, the high social and financial costs of business bankruptcy have motivated several researchers to better understand the causes of bankruptcy and, ultimately, to detect business distress. Financial institutions and organizations have shown considerable interest in predicting the financial failure of a firm. FCP is one of the most important applications for helping a financial firm make an optimal decision. This is because poor decisions by an organization can result in financial problems or bankruptcy and affect clients, shareholders, vendors, etc. The recent growth in Information Technology (IT) makes it possible to obtain various data related to the risk level of a firm under many conditions. When evaluating large volumes of data, many clients rely on the decisions of analysts. However, several factors affect the efficiency of such analyses.

Role of AI and Feature Selection Techniques in FCP
AI and statistical methods have been employed to find the essential factors of FCP. AI models are now used in different categories [8]. They are used to build approaches that determine whether or not a financial institution is facing difficulties. The main idea is that financial features extracted from public financial statements, such as financial ratios, carry substantial information about the state of an organization and are therefore appropriate for FCP [9]. FCP is a complex task that examines the relevant financial data together with other data on the organization's strategic competency in order to build a predictive model. Moreover, alongside AI and database techniques, Data Mining (DM) approaches are used in different areas.
On the other hand, Feature Selection (FS) is a significant preprocessing stage in DM. It is mostly intended to filter redundant and irrelevant features out of the actual data. Various mathematical models and estimation methods have been employed to handle FCP. Depending on the evaluation strategy, FS methods are categorized as filter, wrapper, and embedded. Although wrapper methods are widely used, they suffer from several challenges, such as dependence on the learner and high computational complexity. Embedded methods share these difficulties with wrapper methods, as the selected feature subsets depend on the learning model. Because of these limitations, filter methods are often employed instead. They evaluate feature subsets using fixed criteria that are independent of the learner. The problem of finding an optimal feature subset among the available features is NP-hard [10]. Various techniques are used to find approximate solutions within a limited time. Some methods, such as ACO and GA, are used to select desirable features, but they are not well suited to the business sector, particularly FCP.

Paper's Contribution
This paper presents a new AI based optimal functional link neural network (FLNN) based FCP model for SMEs. The proposed model involves preprocessing, feature selection, classification, and parameter tuning. First, the financial data of the SMEs are gathered and then preprocessed to improve the data quality. Next, a novel chaotic grasshopper optimization algorithm (CGOA) based feature selection technique is applied for the optimal selection of features. The FLNN model is then employed to classify the feature reduced data. Finally, the efficiency of the FLNN model is improved by the use of the cat swarm optimizer (CSO) algorithm. A detailed experimental validation is carried out on the Polish dataset to assess the performance of the presented model. In short, the paper's contributions can be summarized as follows.
• Propose a CGOA-FLNN-CSO technique to predict the financial status of SMEs. The CGOA-FLNN-CSO technique integrates preprocessing, CGOA based feature selection, FLNN based classification, and CSO based parameter tuning.
• Design a new CGOA technique to select an optimal set of features, reducing the computational complexity and enhancing the FCP performance.
• Derive a new FLNN-CSO technique for the classification process, which includes hyperparameter optimization using the CSO algorithm to boost the predictive performance.
• Validate the predictive performance of the proposed technique on the Polish dataset and compare the results with recent state of the art methods.

Paper's Organization
The remaining sections of the paper are organized as follows. Section 2 briefly reviews the existing works related to FCP. Section 3 then elaborates the proposed model, and Section 4 offers a detailed performance validation. Lastly, Section 5 concludes the study.

Prior FCP Models for SMEs
This section offers a comprehensive review of the FCP models that exist in the literature. Gregova et al. [11] designed logistic regression (LR), random forest (RF), and neural network (NN) models to identify which achieves the highest accuracy in predicting the financial distress of enterprises operating in the Slovak environment. The results indicate that all models demonstrate high discrimination accuracy and comparable efficacy, with the NN model producing the best results across all performance measures. The purpose of the study in [12] is to examine empirically the determinants of financial distress among SMEs in the global financial crisis and post-crisis periods. Several statistical approaches, such as multiple binary LR, were used to analyze a longitudinal cross sectional panel dataset of 3,865 Swedish SMEs operating in 5 industries in the 2008-2015 period.
In Lu et al. [13], the financial crisis of an enterprise in period t is forecast from the financial index data of the non-financial industry. The authors then used combined ML methods to identify enterprises in financial crisis, and the problem of classifier failure on unbalanced samples is resolved by sampling and bagging methods. Shang et al. [14] selected multiple financial indicators based on big data mining in IoT. Rules among all the financial indicators were created to select descriptive financial risk indicators. The common fuzzy selection set was then defined using parallel mining, FCM, and parallel rules, so that the resulting fuzzy association rules satisfy the minimal fuzzy reliability. Metawa et al. [15] presented a novel optimal FS based classification model for FCP. The presented FCP technique includes preprocessing, FS, and classification. First, the financial data of the enterprise are gathered using IoT devices such as laptops and smartphones. Next, the PIO based FS method is employed to select an optimal set of features. Then, an XGB based classifier optimized using the JO method, named JO XGB, is used to classify the financial data. Ptak-Chmielewska [16] compared the predictive efficiency of LDA and SVM. A sample of SMEs was used in the empirical analysis, in which financial ratios were used and non-financial factors were also considered.
Perboli et al. [17] focus on mid and long term bankruptcy predictions (up to months) targeting small or medium enterprises. The main contribution of this research is a significant improvement in short term (twelve months) predictive accuracy using ML methods compared with the state of the art, while also producing precise mid and long term predictions. Malakauskas et al. [18] used data on 12,000 SMEs to estimate binomial classifiers for financial distress prediction using LR, ANN, and RF methods. Traditional financial ratios were used to estimate an initial single period predictor, which was then extended with age, time, and credit history factors to obtain multi period models. Luo et al. [19] proposed a new predictive architecture that uses external public credit data. These data can be gathered from publicly accessible websites. Records on 15,605 sample companies were gathered from around 300,000 businesses. Among them, 8,183 had defaulted. The empirical data were used to construct predictive models based on LR, LightGBM, and CART.

Design of CGOA Based FS Technique
GOA was initially proposed in [20] as a novel nature inspired, population based method that mimics the behavior of grasshopper swarms in nature. The two important stages of optimization are exploration and exploitation of the search space; grasshoppers carry out both stages during their food search through social interaction. The main features of the swarm in the larval phase are slow motion and small steps. In contrast, long range and abrupt motion is the key feature of the swarm in adulthood.
Based on the above description of grasshoppers, three operators govern the position update of an individual in the swarm: the social interaction operator ($S_i$), the gravity force operator ($G_i$), and the wind advection operator ($A_i$), as shown in Eq. (1), which in the standard GOA formulation [20] reads
$$X_i = S_i + G_i + A_i \tag{1}$$
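where $X_i$ denotes the location of the $i$th grasshopper. Each behavior is expressed numerically as follows. The interaction operator is given by Eq. (2):
$$S_i = \sum_{j=1,\, j \neq i}^{N} s(d_{ij})\, \hat{d}_{ij} \tag{2}$$
where $N$ denotes the number of grasshoppers in the swarm, $d_{ij} = |X_j - X_i|$ is the distance between the $i$th and the $j$th grasshopper, $\hat{d}_{ij} = (X_j - X_i)/d_{ij}$ is the corresponding unit vector, and $s$ is the function that determines the strength of the social forces, given by Eq. (3):
$$s(r) = f e^{-r/l} - e^{-r} \tag{3}$$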
where $f$ and $l$ are 2 constants denoting the intensity of attraction and the attractive length scale, and $r$ is a real value.
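However, the gravity operator is not considered, and the wind direction is assumed to always point towards the target. Eq. (1) then becomes Eq. (4):
$$X_i^d = c \left( \sum_{j=1,\, j \neq i}^{N} c\, \frac{ub_d - lb_d}{2}\, s\left(\left|X_j^d - X_i^d\right|\right) \frac{X_j - X_i}{d_{ij}} \right) + T_d \tag{4}$$
where $ub_d$ denotes the upper bound in the $d$th dimension, $lb_d$ indicates the lower bound in the $d$th dimension, $T_d$ represents the value of the $d$th dimension of the target (the best solution found so far), and the coefficient $c$ shrinks the comfort zone in proportion to the number of iterations and is given by Eq. (5):
$$c = C_{\max} - l\, \frac{C_{\max} - C_{\min}}{L} \tag{5}$$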
where $C_{\max}$ denotes the maximal value, $C_{\min}$ indicates the minimal value, $l$ represents the present iteration, and $L$ denotes the maximal number of iterations. In [21], $C_{\max} = 1$ and $C_{\min} = 0.00001$ are used. Eq. (4) shows that the next location of a grasshopper is determined by its present location and the locations of all other grasshoppers (first term in Eq. (4)) and by the location of the target (second term).
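The position update described above can be illustrated with a short pure-Python sketch. It is only a minimal sketch under stated assumptions, not the authors' implementation: the function names (`social_strength`, `shrink_coefficient`, `goa_step`) are hypothetical, the defaults f = 0.5 and l = 1.5 are values commonly used in the GOA literature rather than values given in this paper, and clamping to the bounds is an added safeguard.

```python
import math


def social_strength(r, f=0.5, l=1.5):
    """Social force s(r) = f*exp(-r/l) - exp(-r); f and l are assumed defaults."""
    return f * math.exp(-r / l) - math.exp(-r)


def shrink_coefficient(iteration, max_iterations, c_max=1.0, c_min=1e-5):
    """Coefficient c, decreasing linearly from c_max towards c_min."""
    return c_max - iteration * (c_max - c_min) / max_iterations


def goa_step(positions, target, c, lb, ub):
    """Apply one GOA position update to every grasshopper in the swarm."""
    n, dim = len(positions), len(target)
    updated = []
    for i in range(n):
        social = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            diff = [positions[j][d] - positions[i][d] for d in range(dim)]
            dist = math.sqrt(sum(v * v for v in diff))
            if dist == 0.0:
                continue  # coincident grasshoppers exert no directional force
            for d in range(dim):
                scale = c * (ub[d] - lb[d]) / 2.0
                social[d] += scale * social_strength(abs(diff[d])) * diff[d] / dist
        # Outer c shrinks the social term; the target pulls the swarm (wind term).
        new_pos = [c * social[d] + target[d] for d in range(dim)]
        # Assumption: keep the result inside the search bounds.
        updated.append([min(max(v, lb[d]), ub[d]) for d, v in enumerate(new_pos)])
    return updated
```

In a full optimizer, `shrink_coefficient` would be evaluated once per iteration and passed to `goa_step` together with the best position found so far.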
The generation of the initial population in the search space plays a major part in GOA. The literature survey shows that several chaos based GOA methods have been investigated for solving global optimization problems. In this study, chaotic maps are used to initialize the GOA optimization procedure in order to accelerate its global convergence. The chaotic map is used to balance exploration and exploitation efficiently and to reduce the repulsion or attraction forces among grasshoppers during the optimization procedure [22]. Using chaotic sequences instead of random sequences in GOA is an important strategy, because it allows a thorough search at higher speed than a stochastic search, which depends mainly on probabilities. Only a few functions (chaotic maps) and variables (initial conditions) are needed to produce long sequences. Additionally, a large number of different sequences can be generated by varying the initial condition. Moreover, a chaotic sequence is reproducible and deterministic. The logistic map is one of the most widely used chaos based methods for global search. It is defined by Eq. (6):
$$x_{t+1} = \mu\, x_t\, (1 - x_t) \tag{6}$$
In this process, as $\mu$ increases, a large number of multi-periodic components appear within ever narrower intervals of $\mu$. This phenomenon continues without limit, but $\mu$ itself has a limiting value at $\mu_t = 3.60$. As $\mu$ approaches $\mu_t$, the period becomes infinite, that is, the sequence becomes non-periodic. At this point, the whole system evolves into a chaotic state. However, if $\mu$ is larger than four, the whole system becomes unstable. Thus, the interval $[\mu_t, 4]$ is generally regarded as the chaotic region of the system. More specifically, once a given number of chaotic iterations has been performed, the chaotic variables are generated accordingly. Then, by re-mapping these variables into the optimization space, the initial parameters of the optimization problem are produced. The logistic map generates a series that spreads over the whole interval and effectively prevents the search from being trapped in small periodic cycles.
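As an illustration of the chaotic initialization described above, the following sketch generates a logistic-map sequence and re-maps it into the search bounds to form an initial population. It is a hedged sketch: the function names, the seed value x0 = 0.7, and the choice mu = 4.0 are assumptions made for the example and are not taken from the paper.

```python
def logistic_sequence(x0, length, mu=4.0):
    """Iterate the logistic map x_{t+1} = mu * x_t * (1 - x_t)."""
    sequence = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        sequence.append(x)
    return sequence


def chaotic_population(n_agents, lb, ub, x0=0.7, mu=4.0):
    """Re-map chaotic values in [0, 1] into the bounds [lb_d, ub_d]."""
    dim = len(lb)
    chaos = logistic_sequence(x0, n_agents * dim, mu)
    population = []
    for i in range(n_agents):
        agent = [lb[d] + chaos[i * dim + d] * (ub[d] - lb[d]) for d in range(dim)]
        population.append(agent)
    return population
```

Because the map is deterministic, the same `x0` always reproduces the same population, which matches the reproducibility property noted in the text; a different initial condition yields a different sequence.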
An important aspect of the CGOA-FS method is assessing the quality of the chosen subset. As the presented CGOA-FS method is a wrapper based technique, a learning method (viz., a classifier) must be included in the evaluation procedure. In this study, an FLNN classifier is used as the evaluator, and the classification accuracy obtained with the chosen features is included in the fitness function (FF). If the chosen features in a subset are appropriate, the resulting classification accuracy improves. Achieving a high classification accuracy is the main aim of the CGOA-FS technique. Another significant aim is reducing the number of chosen features: the fewer features are chosen, the better the solution. The presented CGOA-FS technique considers both of these conflicting aims. Eq. (7) gives the FF, which takes into account both the classification accuracy and the number of chosen features when evaluating a feature subset [23]. Since the FF is to be minimized, the classification error rate (the complement of the classification accuracy) is used:
$$Fitness = \alpha\, \gamma_R(D) + \beta\, \frac{|R|}{|N|} \tag{7}$$
where $\gamma_R(D)$ denotes the classification error rate of the given classifier, $|R|$ indicates the number of chosen features, $|N|$ represents the total number of features, and $\alpha$ and $\beta$ are 2 parameters reflecting the importance of the classification error rate and of the subset length, with $\alpha \in [0, 1]$ and $\beta = (1 - \alpha)$.
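The fitness function above can be written as a small helper. This is a minimal sketch: the names `fs_fitness` and `position_to_mask` are hypothetical, the default alpha = 0.99 is an assumed value often seen in wrapper feature selection studies (the paper does not state its value here), and thresholding a continuous position at 0.5 to obtain a feature mask is likewise an assumption.

```python
def position_to_mask(position, threshold=0.5):
    """Assumed binarization: a feature is kept when its coordinate exceeds the threshold."""
    return [value > threshold for value in position]


def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classification error rate and selected-feature ratio (lower is better)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```

For example, with alpha = 0.9, an error rate of 0.1 and 5 of 10 features selected, the fitness is 0.9 × 0.1 + 0.1 × 0.5 = 0.14. In a wrapper setting, `error_rate` would come from training and testing the classifier on the features indicated by the mask.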

Structure of FLNN
The ANN is a network of connected components inspired by the study of biological nervous systems. It is an attempt to build a machine that works in the same way as the human brain. The artificial units that mimic the functions of biological neurons are called nodes or neurons. These nodes are the building blocks of the NN. An ANN can handle difficult problems in a nonlinear setting. It can classify patterns by assigning them to one group or another. Among the several variants of ANN, the multilayer perceptron (MLP) is the best known. It consists of input, hidden, and output layers that are interconnected via weight values. There can be more than one hidden layer. Multiple hidden layers help capture higher order statistics for a large number of inputs. The higher order neural network (HONN) is a single layer feedforward ANN with expanded input capability. It has summing and product units. The product units provide better approximation capability and hence better classification accuracy. The HONN performs nonlinear mapping by tuning a single layer of weights. ANNs such as the MLP cannot overcome the problem of slow learning, particularly when handling highly complex nonlinear problems. This motivates the use of the HONN.
FLNN is a class of feedforward neural network (FFNN) without a hidden layer. It introduces nonlinearity into its input structure through an expansion unit [24]. The expanded input to the network reduces the computation cost and improves the approximation efficiency compared with BP. Unlike the MLP, the FLNN can be trained faster without compromising computational performance. Moreover, the inherent invariance properties of the FLNN allow it to select the desirable signals that produce optimal system identification. Fig. 3 shows the second order FLNN architecture with inputs $x_1$, $x_2$, $x_3$ and a few higher order combinations.
Here, $W_0$ denotes the tunable threshold and $\sigma$ represents the nonlinear transfer function.
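To make the FLNN structure concrete, the sketch below expands the inputs into product terms and applies a single layer of weights followed by a nonlinear transfer function. It is an illustrative sketch only: the function names are hypothetical, the expansion by products of distinct inputs up to a given order is one common choice of functional expansion, and the logistic sigmoid is an assumed transfer function (the paper does not specify it in this passage).

```python
import math
from itertools import combinations


def expand_inputs(x, order=2):
    """Functional expansion: products of up to `order` distinct inputs."""
    terms = []
    for k in range(1, order + 1):
        for idx in combinations(range(len(x)), k):
            product = 1.0
            for i in idx:
                product *= x[i]
            terms.append(product)
    return terms


def flnn_output(x, weights, threshold, order=2):
    """Single-layer output: sigma(W_0 + sum_k w_k * phi_k(x)), with no hidden layer."""
    phi = expand_inputs(x, order)
    if len(weights) != len(phi):
        raise ValueError("expected %d weights, got %d" % (len(phi), len(weights)))
    z = threshold + sum(w * p for w, p in zip(weights, phi))
    return 1.0 / (1.0 + math.exp(-z))  # assumed sigmoid transfer function
```

With three inputs and `order=2`, the expansion yields six terms (x1, x2, x3, x1x2, x1x3, x2x3), so only six weights and one threshold need to be tuned, which is the set of parameters an optimizer such as CSO would search over.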

Algorithmic Design of CSO-FLNN Model
To properly tune the parameters of the FLNN model, the CSO algorithm is applied, thereby enhancing the predictive performance. The CSO technique is modeled on the resting and tracing behaviors of cats. Cats appear to be lazy and spend most of their time resting. Even while resting, however, they continually monitor their surroundings alertly and deliberately, and once they notice a target, they move towards it rapidly. The CSO technique is therefore built on these 2 important behaviors and has 2 modes, namely the seeking and tracing modes. Each cat represents a solution with its own position, fitness value, and flag. The position consists of M dimensions in the search space, and each dimension has its own velocity; the fitness value indicates how good the solution (cat) is; lastly, the flag classifies the cat into either the seeking or the tracing mode. Hence, the number of cats involved in the iterations is specified first, and the cats are then run through the technique [25]. The best cat from each iteration is stored in memory, and the one at the last iteration represents the final solution.
This technique follows the steps below to search for better solutions:
(1) Specify the upper and lower bounds of the solution sets.
(2) Randomly create N cats (solution sets) and spread them in the M dimensional space, where each cat has a random velocity value not larger than a pre-determined maximal velocity value.
(3) Randomly assign the cats to the seeking and tracing modes based on MR, the mixture ratio, which is selected from the interval [0, 1]. For instance, when the number of cats N is ten and MR is set to 0.2, eight cats are randomly selected to undergo the seeking mode and the other two cats undergo the tracing mode.
(4) Evaluate the fitness value of every cat based on the domain specific FF. Then, the best cat is selected and stored in memory.
(5) Move the cats according to either the seeking or the tracing mode.
(6) After the cats have gone through the seeking or tracing mode, randomly reassign them to the seeking or tracing modes according to MR for the next iteration.
(7) Check the termination criterion; if it is fulfilled, end the program; otherwise, repeat Steps 4 to 6.
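Steps (2) and (3) above can be sketched in a few lines. This is a minimal sketch under assumptions: the helper names `init_cats` and `assign_modes` and the dictionary layout of a cat are hypothetical, and rounding MR × N to the nearest integer to obtain the number of tracing cats is an assumption chosen to match the worked example in Step (3).

```python
import random


def init_cats(n_cats, lb, ub, v_max, rng):
    """Step (2): random positions within the bounds and velocities within +/- v_max."""
    cats = []
    for _ in range(n_cats):
        position = [rng.uniform(low, high) for low, high in zip(lb, ub)]
        velocity = [rng.uniform(-v_max, v_max) for _ in lb]
        cats.append({"position": position, "velocity": velocity})
    return cats


def assign_modes(n_cats, mr, rng):
    """Step (3): a fraction MR of the cats traces, the rest seek (random assignment)."""
    n_tracing = int(round(mr * n_cats))
    flags = ["tracing"] * n_tracing + ["seeking"] * (n_cats - n_tracing)
    rng.shuffle(flags)
    return flags
```

With N = 10 and MR = 0.2, `assign_modes` returns two "tracing" flags and eight "seeking" flags in random order, as in the example of Step (3). Calling it again at the end of each iteration corresponds to Step (6).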
Seeking Mode. This mode models the resting behavior of cats, in which 4 basic parameters play a vital role: seeking memory pool (SMP), seeking range of the selected dimension (SRD), counts of dimension to change (CDC), and self-position considering (SPC). These values are tuned and set by the user through trial and error. SMP specifies the size of the seeking memory of a cat, that is, the number of candidate positions from which one will be chosen as the cat's next position. For instance, when the SPC flag is set to true, each cat needs to generate (SMP-1) candidates instead of SMP, since the present position is regarded as one of them. The seeking mode steps are as follows:
(1) Create up to SMP copies of the present position of Cat$_k$.
(2) For each copy, randomly select up to CDC dimensions to be mutated, and randomly add or subtract SRD of the present value, replacing the old position as given by the following formula, written here in its standard CSO form:
$$X_{jd,new} = (1 \pm SRD \times rand) \times X_{jd,old}$$
where $X_{jd,old}$ is the present position, $X_{jd,new}$ is the next position, $j$ is the index of the cat, $d$ is the dimension, and $rand$ is a random number in the interval [0, 1].
(3) Evaluate the fitness value (FS) of every candidate position.
(4) Based on probability, select one candidate point as the next position of the cat, where candidate points with better FS have a higher chance of being selected, as given in Eq. (10). If all fitness values are equal, set the selection probability of every candidate point to one.
$$P_i = \frac{\left|FS_i - FS_b\right|}{FS_{\max} - FS_{\min}} \tag{10}$$
If the objective is minimization, $FS_b = FS_{\max}$; otherwise, $FS_b = FS_{\min}$.
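The seeking-mode steps above can be sketched as follows for a minimization problem. This is a hedged sketch, not the authors' code: the function name, the default parameter values, and the use of roulette-wheel selection via `random.Random.choices` are assumptions; the mutation and selection probability follow the standard CSO form, with FS_b = FS_max for minimization.

```python
import random


def seeking_mode(cat, fitness, smp=5, srd=0.2, cdc=2, spc=True, rng=None):
    """Return the next position of one cat in seeking mode (minimization)."""
    rng = rng or random.Random()
    # Step (1): SMP copies, or SMP-1 when the present position is itself a candidate.
    n_copies = smp - 1 if spc else smp
    candidates = [list(cat) for _ in range(n_copies)]
    # Step (2): mutate up to CDC randomly chosen dimensions by +/- SRD * rand.
    for candidate in candidates:
        dims = rng.sample(range(len(candidate)), min(cdc, len(candidate)))
        for d in dims:
            sign = rng.choice((-1.0, 1.0))
            candidate[d] = (1.0 + sign * srd * rng.random()) * candidate[d]
    if spc:
        candidates.append(list(cat))
    # Step (3): evaluate every candidate.
    scores = [fitness(c) for c in candidates]
    fs_max, fs_min = max(scores), min(scores)
    # Step (4): selection probabilities; all equal to one when fitness values tie.
    if fs_max == fs_min:
        probabilities = [1.0] * len(candidates)
    else:
        probabilities = [abs(s - fs_max) / (fs_max - fs_min) for s in scores]
    return rng.choices(candidates, weights=probabilities, k=1)[0]
```

Under this rule the worst candidate receives probability zero and the best receives probability one, so lower-cost positions are favored without the choice being fully greedy.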
Tracing Mode. This mode models the tracing behavior of cats. In the initial iteration, random velocity values are assigned to every dimension of a cat's position. In later steps, however, the velocity values need to be updated. Cats in this mode move as follows:
(1) Update the velocity ($V_{k,d}$) in every dimension based on Eq. (11), written here in its standard CSO form, where $X_{best,d}$ is the position of the best cat, $r_1$ is a random value in [0, 1], and $c_1$ is a constant:
$$V_{k,d} = V_{k,d} + r_1 c_1 \left(X_{best,d} - X_{k,d}\right) \tag{11}$$
(2) If a velocity value exceeds the maximal value, set it equal to the maximal velocity.
(3) Update the position of Cat$_k$ based on the following formula:
$$X_{k,d} = X_{k,d} + V_{k,d}$$
The proposed technique has 2 essential processes: inner parameter optimization and outer performance estimation. In the inner parameter optimization procedure, the penalty parameter C and the kernel bandwidth γ of the ELM are determined dynamically by the ISSO approach using 5-fold CV analysis. Next, the optimal parameter pair (C, γ) obtained is fed into the KELM prediction model to perform the classification task in the outer loop using a 10-fold CV approach. The classification error rate is employed as the FF,
where $testError_i$ stands for the average test error rate achieved by the ELM classifier using 5-fold CV in the inner parameter optimization procedure.
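Returning to the tracing-mode steps listed earlier in this section, the velocity and position updates can be sketched as below. This is an illustrative sketch under assumptions: the function name is hypothetical, c1 = 2.0 and v_max = 1.0 are assumed defaults rather than values from the paper, a fresh random number is drawn per dimension, and the velocity is clipped symmetrically to [-v_max, v_max].

```python
import random


def tracing_mode(position, velocity, best, c1=2.0, v_max=1.0, rng=None):
    """One tracing-mode move: update velocity towards the best cat, clip, then move."""
    rng = rng or random.Random()
    new_position, new_velocity = [], []
    for x, v, b in zip(position, velocity, best):
        v = v + rng.random() * c1 * (b - x)   # step (1): velocity update
        v = max(-v_max, min(v_max, v))        # step (2): limit to maximal velocity
        new_velocity.append(v)
        new_position.append(x + v)            # step (3): position update
    return new_position, new_velocity
```

A cat that already sits on the best position with zero velocity does not move, while a distant cat advances by at most `v_max` per dimension in each iteration.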

Performance Validation
The proposed model is simulated using Python 3.6.5. The FCP performance of the proposed model is evaluated on the Polish dataset with 3 years of data [26], as shown in Tab.
On analyzing the performance on the Polish dataset Year III, it can be stated that the GWO-FS method achieved the lowest performance, with an average best cost of 0.9819. The KHO-FS method achieved a slightly better outcome, with an average best cost of 0.7168. The GOA-FS method accomplished moderate performance, with an average best cost of 0.5331. Finally, the CGOA-FS approach outperformed the existing methods, with an average best cost of 0.2467.

Conclusion
This paper has developed a new CGOA-FLNN-CSO algorithm to predict the financial status of SMEs. The proposed model involves different processes, namely preprocessing, CGOA based feature selection, FLNN based classification, and CSO based parameter tuning. The inclusion of the CGOA algorithm for FS plays a vital role in the improved predictive performance. At the same time, tuning the parameters of the FLNN model using the CSO algorithm helps to enhance the predictive performance considerably. A detailed experimental validation was carried out on the Polish dataset to assess the performance of the presented model. The simulation results verified the effectiveness of the presented model over the compared methods in terms of best cost, sensitivity, specificity, accuracy, F-score, and Matthews Correlation Coefficient (MCC). The CGOA-FLNN-CSO model accomplished a maximum prediction accuracy of 98.830%, 92.100%, and 95.220% on the applied Polish dataset Year I-III, respectively. In future, the predictive performance of the CGOA-FLNN-CSO algorithm can be extended by the use of outlier detection approaches.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.