Open Access


An Endogenous Feedback and Entropy Analysis in Machine Learning Model for Stock’s Return Forecast

Edson Vinicius Pontes Bastos1,*, Jorge Junio Moreira Antunes2, Lino Guimarães Marujo1, Peter Fernandes Wanke2, Roberto Ivo da Rocha Lima Filho1

1 Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia-COPPE, Universidade Federal do Rio de Janeiro (UFRJ), Rio de Janeiro, 21941-598, Brasil
2 Instituto de Pós-Graduação em Administração-Coppead, Universidade Federal do Rio de Janeiro (UFRJ), 21941-918, Brasil

* Corresponding Author: Edson Vinicius Pontes Bastos. Email:

(This article belongs to this Special Issue: Neutrosophic Theories in Intelligent Decision Making, Management and Engineering)

Intelligent Automation & Soft Computing 2023, 36(3), 3175-3190.


Stock markets exhibit Brownian movement with random, non-linear, uncertain, evolutionary, non-parametric, nebulous, chaotic characteristics and dynamism with a high degree of complexity. Developing an algorithm to predict returns for decision-making is a challenging goal. In addition, the choice of input variables for the model is non-trivial, since endogeneity problems can arise between the predictor and the predicted variables. Thus, the goal is to analyze the endogenous origin of the stock return prediction model based on technical indicators. For this, we structure a feed-forward neural network. We evaluate the endogenous feedback between the predicted returns and technical analysis indicators based on the generated residuals. The results show that it is possible to predict the return: during the test period, the model achieves a hit rate close to 76%. Regarding endogeneity, the term of interest and the return are the variables that influence the largest number of indicators. The results will help investors build investment strategies based on this expert system applied to forecasting.


1  Introduction

To predict and trade the BOVA11, an exchange-traded fund (ETF) from Brazil, we apply a new approach using technical indicators. The main trading system is based on a neural network forecast method. The results are highly favorable to the hypothesis of abnormal returns in a successful forecast of one-day-ahead returns (D + 1) since the forecasting strategy proved to be feasible with an accuracy close to 76%.

According to [1], the efficient market hypothesis suggests that it is impossible to predict stock values, since these values behave randomly. However, new research [25] shows that most share prices reflect previous records and information, so movement trends are vital for effectively predicting values [3]. Classic economic-financial literature defends the informational efficiency of the stock exchange. In a so-called perfectly efficient market, abnormal returns are unattainable, as prices have already incorporated all relevant information for future results. Thus, the efficient market is defined by [6] as one in which asset prices fully reflect all available information at a given time. Assuming that all information is reflected in the price, investors expect to obtain a normal rate of return, which rules out the possibility of earning an abnormal return. In other words, in a so-called efficient market, it is expected that all the actors involved have the same access to available information and that this information is already priced. Therefore, there would be no possibility of arbitrage. As a result, it would be impossible for investors to buy an undervalued asset or sell an overvalued asset to make an abnormal profit, as each stock would always be traded at its fair value. Thus, the theory questions the predictability of prices and opportunities for profitable and consistent long-term trades. The idea that the market cannot be beaten motivates a long-standing controversy between academics and market professionals.

Financial managers always seek to maximize the return on their investments, as the objective is to “win”. However, [7] confirm that correctly predicting the stock market direction is considered an uncertain and high-risk strategy, since many external factors can affect these returns. Furthermore, according to the Efficient Market Hypothesis (EMH), it is impossible to predict a series of returns. Such a time series follows a Brownian and independent movement with random characteristics. It is in this scenario of chaos and controversy that this article develops. The entropy concept can be applied in finance for stock pricing and forecasting. This concept can help to solve the general problem of determining probability distributions of financial market assets, which are characterized by volatility, uncertainty, and limited and incomplete information.

Regarding entropy, we will apply the Stochastic Structural Relationship Programming (SSRP) model, based on the methodology of neural network residuals. With this model, we will evaluate the endogenous relationship between the predictive variables of the neural network using the maximum entropy functions according to the Principle of Maximum Entropy (MEP). From this method, it is possible to produce a general mapping of the causal relations between predictors and the Return. From this context, the following research questions arise:

1.    How accurately is it possible to hit the one-day-ahead return forecast using technical indicators as input to the neural network predictive model?

2.    What is the direction of the endogenous relationship between the technical indicators used as input to the neural network model for predicting returns?

The goal is to investigate the entropy and endogenous origin in the prediction of stock returns, with the predictive model inputs given by the technical signals. To that end, we will carry out an endogenous analysis between the technical indicators and the dependent variable Return. We will build a machine learning algorithm using artificial neural networks (ANN) to predict market returns. This research seeks to contribute to the research gap evidenced by the literature [8], which highlights the importance of investigating the link between performance and entropy on predictive stock models.

Studies like this are justified by the empirical evidence of utility, meaning the ability to satisfy a need, which is contained in an artificial intelligence algorithm applied to emerging markets. Such discussion has the potential to contribute empirically to the literature on the behavior of asset return forecasts. This applies especially in countries with volatile financial markets, as advocated by [9], which are subject to political-economic uncertainties and frequent rating changes, as seen in Brazil. Furthermore, with our work, it will be possible to outline a strategy capable of predicting the future return of the asset, and then to investigate the performance of an investment strategy built with a machine learning algorithm.

Reference [10] highlights the importance of developing and improving predictive machine learning algorithms applied to the financial market. With these algorithms, it is possible to analyze multiple assets in different regions and countries. In this way, the operation becomes more efficient and diversified. This research contributes by bringing the possibility of abnormal returns, making clear the need to create better regulatory mechanisms to reduce informational inefficiencies.

2  Materials and Methods

We will work with the assumption that there is a pattern in the database. That pattern cannot be described mathematically by deterministic methods. So, we used machine learning to identify this hidden pattern, according to the learning theory [11]. The objective is to obtain an approximation function to better describe the behavior of the return that is unknown.

The human brain is analogous to the market, since both share characteristics such as nonlinearity, uncertainty, evolution, nebulousness, and chaotic, dynamic behavior with a high degree of complexity. According to [12], the brain can be considered a highly complex, non-linear and parallel computer, capable of organizing its constituents (neurons) for processing. In this sense, neural networks present themselves as a processor capable of storing and processing acquired knowledge and making it available for use. Regarding the learning components, the dependent variable (Y) is Return (Ret). This is the response variable used to check how the algorithm predicts day t from inputs lagged by one day (t-1). The technical indicators and the VIX volatility index are the inputs Xi.

The ANN mathematical model is formed by a set of n inputs $(x_1, x_2, \ldots, x_n)$, to which weights $(w_1, w_2, \ldots, w_n)$ are attached, indicating the effect of each corresponding neuron, through the linear combination $\sum_{i}^{n} w_i x_i$. The sum of the products between the weights $w_i$ and the input signals $x_i$ passes through the activation function $f$, which induces the transformation. Every neuron transforms the input vector into an output $O(x)$. We have included the hidden layer to increase flexibility. The general mathematical representation of a single hidden layer network with J hidden neurons was recommended by [13].
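As a minimal sketch of the neuron model described above, the following Python code computes the weighted sum of inputs, applies an activation function, and chains J hidden neurons into a single output. The sigmoid activation, the bias terms and all function names are assumptions made for illustration; the text does not specify them.

```python
import math

def sigmoid(z):
    """Logistic activation (an assumed choice of f)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, bias=0.0, f=sigmoid):
    """Apply f to the linear combination sum_i w_i * x_i (plus an assumed bias)."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + bias)

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Single-hidden-layer feed-forward pass: inputs -> J hidden neurons -> O(x)."""
    hidden = [neuron(x, w, b) for w, b in zip(hidden_w, hidden_b)]
    return neuron(hidden, out_w, out_b)
```

With all weights set to zero, every neuron outputs 0.5, which is a convenient sanity check before training.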

The database is BOVA11, a Brazilian ETF with more than 60 companies traded on B3 (Brasil, Bolsa, Balcão). The BOVA11 includes companies such as Petrobras, Vale, Itaú Unibanco, Bradesco Bank, Ambev and Banco do Brasil. We downloaded the dataset from the Yahoo Finance website, as done by [14], which provides historical stock time-series data such as open, high, low, close, adjusted close and volume information.

The period under review is 2010 to 2020. We divided the dataset into training and testing. The training stage comprises 80% of the oldest observations (1520 observations) and the test stage, the most recent 20% of the database (380 observations). We remove samples with null or missing information as a cleaning operation process.
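The cleaning and splitting steps can be sketched as follows. This is an illustrative Python version, assuming the rows are already sorted from oldest to newest and that missing values are represented by `None`; the function name is hypothetical.

```python
def chronological_split(rows, train_fraction=0.8):
    """Drop rows with missing values, then split oldest -> train, newest -> test."""
    clean = [r for r in rows if all(v is not None for v in r)]
    cut = round(len(clean) * train_fraction)
    return clean[:cut], clean[cut:]
```

With 1900 clean observations, this yields the 1520 training and 380 test observations reported above. The split is not shuffled, so the test set always follows the training set in time.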

We use the ANN tool to predict the BOVA11 returns (Ret) one day ahead. This network is structurally similar to biological neural structures and has computational capacity acquired through learning and generalization. We structured the learning algorithm with feed-forward neural networks in R software [15]. Step zero was therefore to structure a single-hidden-layer Multilayer Perceptron (MLP), the feed-forward neural network. We then applied the quasi-Newton method (Broyden–Fletcher–Goldfarb–Shanno algorithm, BFGS) [15] for optimization in the neural net step.

The first model selection problem is represented as a two-class classification to predict the Return Class more accurately. If the Return value is positive, the Return class takes the value 1, indicating a buy signal. Otherwise, the value will be zero. Thus, the objective is to evaluate the endogeneity of the neural network model to predict the return of the BOVA11 index one day ahead. This model uses traditional technical analysis variables as inputs, presented in Table 1.
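The labeling rule above can be illustrated with a short Python sketch. The use of simple (arithmetic) returns computed from closing prices is an assumption; the text does not state which return definition is used, and the function names are hypothetical.

```python
def simple_returns(prices):
    """Day-over-day simple returns (assumed return definition)."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def return_class(ret):
    """1 (buy signal) when the return is positive, 0 otherwise."""
    return 1 if ret > 0 else 0
```

For example, the price path 100, 110, 99, 99 gives one positive return followed by a negative and a zero return, so only the first day is labeled as a buy signal.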


Indicators are known to give an idea of a specific metric, so they should always be read in context, using other tools to avoid false signals. The increase in subjectivity is evident when using only technical indicators for decision-making, without applying a structured and subjectivity-free methodology that can analyze such indicators. The need to remove human judgment is increasingly emphasized to make good use of indicators. It is necessary to structure a machine learning model to improve the extraction of information from an extensive set of indicators. According to [14], the ability to promote decision-making by induction and the behavior of the human brain were the main motivators for the development of neural networks, which can be seen as intelligent computerized systems. The neural network main model follows the following specification:

Model 1: Ret∼f (VIX + CCI + MACD + WILL + STOCHK + TSEO + BB + CMO + DPO + ROC)

Additionally, we structured 10 feed-forward neural networks, according to the specified models (2 to 11). The objective was to detail the endogenous developments between the variable Return and its predictors (the technical signals) to identify significant causal relationships. Therefore, we verify the hypothesis of endogeneity between the predicted return variable and the technical analysis predictors. As defined by [20], endogeneity exists when the explanatory variables have some correlation with the error term of the predictive model. Once the non-correlation assumption is violated, one or more regressors will be endogenous. This endogeneity adds bias to the estimators obtained, negatively affecting the outcome of the forecasting model. For [21], endogeneity in finance can arise from measurement errors in the regressors, simultaneity, or variables omitted from the model.

2.1 First Endogeneity Analysis: Minimal Endogenous Relationship Variance

Technical indicators are correlated, so a correlation between their residuals is expected. The existence of this correlation suggests evidence of endogenous relationships between the predictors. Thus, this research presents an approach based on the Stochastic Structural Relationship Programming (SSRP) model, using the residuals generated by 10 specifications of neural network models. The objective is to clarify the endogeneity and the significant structural cause-and-effect relationships that exist between Return and the technical analysis variables. For this, we use the residuals obtained by the following 10 models:

Model 2: VIX∼f (Ret + CCI + MACD + WILL + STOCHK + TSEO + BB + CMO + DPO + ROC)

Model 3: CCI∼f (VIX + Ret + MACD + WILL + STOCHK + TSEO + BB + CMO + DPO + ROC)

Model 4: MACD∼f (VIX + CCI + Ret + WILL + STOCHK + TSEO + BB + CMO + DPO + ROC)

Model 5: WILL∼f (VIX + CCI + MACD + Ret + STOCHK + TSEO + BB + CMO + DPO + ROC)

Model 6: STOCHK∼f (VIX + CCI + MACD + WILL + Ret + TSEO + BB + CMO + DPO + ROC)

Model 7: TSEO∼f (VIX + CCI + MACD + WILL + STOCHK + Ret + BB + CMO + DPO + ROC)

Model 8: BB∼f (VIX + CCI + MACD + WILL + STOCHK + TSEO + Ret + CMO + DPO + ROC)

Model 9: CMO∼f (VIX + CCI + MACD + WILL + STOCHK + TSEO + BB + Ret + DPO + ROC)

Model 10: DPO∼f (VIX + CCI + MACD + WILL + STOCHK + TSEO + BB + CMO + Ret + ROC)

Model 11: ROC∼f (VIX + CCI + MACD + WILL + STOCHK + TSEO + BB + CMO + DPO + Ret)

We use the residuals from models 1 to 11 to generate sets of conditional probability distributions of the residuals, thus obtaining ten probability distributions to investigate the entropy of such residuals. In information theory, entropy refers to the probabilistic uncertainty related to a given probability distribution. Different degrees of uncertainty are associated with different distributions, since each distribution has an intrinsic degree of uncertainty. The principle of Maximal Information Entropy for Directional Weighted Residuals establishes that the probability distribution most adherent to the variable is the one with the highest entropy.
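To make the notion of entropy concrete, the sketch below computes the Shannon entropy of a discrete distribution and of a histogram built from a sample of residuals. The histogram discretization, the number of bins and the function names are illustrative assumptions, not the estimator used in the study.

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """H = -sum p * ln(p), ignoring zero-probability cells."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def histogram_probabilities(values, bins=10):
    """Relative frequencies of equal-width bins (assumed discretization)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant samples
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return [c / n for c in counts.values()]
```

A sample spread evenly over the bins attains the maximum entropy for that number of bins, while a constant sample has zero entropy, which matches the idea that each distribution carries its own degree of uncertainty.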

The conditional distributions of the residuals show the direction of the relationship between the variables under study. The Stochastic Structural Relationship Programming (SSRP) method runs in two steps that reveal the endogeneity and identify significant cause-and-effect relationships. The first step, called Minimal Endogenous Relationship Variance, consists of exploring the degree of relative importance between the variables of predictive models 1 to 11, through the variance of each model. To investigate the existence of endogeneity, we used the covariance between the models.

We simultaneously minimize the covariance and variance terms of the residuals of the 11 models through a nonlinear stochastic optimization problem, as presented in Eq. (1), according to [22].

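$$\min\left[\operatorname{Var}\left(\sum_{i=1}^{11} w_i R_i\right) + 2\sum_{i,j=1}^{11} \operatorname{Covar}\left(w_i\, w_j\, R_i\, R_j\right),\ \forall\, i \neq j,\ j < i\right]$$

subject to

$$\sum_{i=1}^{11} w_i = 1 \qquad (1)$$

where $w_i$ stands for the weights, which range from 0 to 1 ($0 \le w_i \le 1\ \forall i$), assigned, respectively, to the residual vectors of each one of the 11 models described; $R_i$ is the residual vector of model $i$, where $i$ varies from 1 to 11.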
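One possible reading of the objective in Eq. (1) is sketched below in Python. It assumes that the covariance term means $w_i w_j \operatorname{Covar}(R_i, R_j)$ summed over pairs with $j < i$, and that sample (n − 1) variances are used; both are interpretations for illustration, and the function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(a, b):
    """Sample covariance; covariance(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ssrp_step1_objective(weights, residuals):
    """Var(sum_i w_i R_i) + 2 * sum_{j<i} w_i w_j Cov(R_i, R_j) (assumed reading)."""
    n_obs = len(residuals[0])
    combined = [sum(w * r[t] for w, r in zip(weights, residuals))
                for t in range(n_obs)]
    var_term = covariance(combined, combined)
    cov_term = 2.0 * sum(
        weights[i] * weights[j] * covariance(residuals[i], residuals[j])
        for i in range(len(residuals)) for j in range(i)
    )
    return var_term + cov_term
```

For two uncorrelated residual vectors the covariance term vanishes and the objective reduces to the variance of the weighted sum, which is a quick way to check an implementation.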

We solve Eq. (1) with the differential evolution (DE) method. It is a stochastic method built on the assumptions of natural selection and evolution. This algorithm uses an initial population randomly generated from a uniform distribution, together with crossover probability, differential mutation and selection operators. The DE method is closely related to genetic algorithms (GA) and is biologically inspired, just like the structures of neural networks. According to [10], the optimization process in the GA is based on a randomly guided process. In this process, a group of specific parameters is randomly generated for a fixed number of so-called populations. More details on the DE methodology can be seen in [23], which discusses the DEoptim R software package.
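The mechanics of differential evolution can be sketched in a few lines of Python. This is a generic DE/rand/1/bin variant written for illustration, not the DEoptim implementation; the parameter values (population size, F, CR, generations) and the `normalise` helper for mapping a candidate onto weights that sum to one are assumptions.

```python
import random

def differential_evolution(objective, dim, bounds=(0.0, 1.0), pop_size=20,
                           generations=200, f=0.8, cr=0.9, seed=0):
    """Minimise `objective` over a box with a basic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initial population drawn from a uniform distribution.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])  # differential mutation
                    v = min(max(v, lo), hi)                      # keep inside bounds
                else:
                    v = pop[i][d]                                # inherit from parent
                trial.append(v)
            s = objective(trial)
            if s <= scores[i]:                                   # greedy selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def normalise(w):
    """Rescale non-negative weights so that they sum to one (assumed constraint handling)."""
    total = sum(w)
    return [x / total for x in w] if total > 0 else [1.0 / len(w)] * len(w)
```

An objective such as Eq. (1) could be wrapped as `lambda w: objective(normalise(w))` so that every candidate satisfies the sum-to-one constraint; how the constraint is actually enforced in the study is not stated.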

We solve models 1–11 with a bootstrap technique, with 100 repetitions each, generating 100 sets of residuals for each model. Subsequently, we optimize these 100 sets of residuals of models 1–11 through Eq. (1), generating a distribution of W that supports a reliable estimate of the coefficients. In this way, the W values obtained by the optimization represent the optimal solution and, therefore, the points of minimum variance (Var) and covariance (Covar) of the residuals grouped according to Eq. (1).

2.2 Second Endogeneity Analysis: Maximal Information Entropy for Directional Weighted Residuals

From the Principle of Maximum Entropy, the probability distribution that best represents the current stage of knowledge is the one with the highest entropy. The second step is the use of the Maximal Information Entropy for Directional Weighted Residuals algorithm. In this step, we get a set with all possible combinations of the conditional residual distributions ($CR_k$). We use the bootstrap results with 100 replications, obtained in the first step for the non-conditional distributions of residuals ($R_i$), as a starting point for the calculations of the second step, where $CR_k = f(R_i \mid R_j)$ for all $i$ and $j$, $i \neq j$, and $K = 11 \times 11 - 11 = 110$. Likewise, we apply the DE method [23] to solve the following non-linear programming model:

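$$\max\left[\left(\sum_{i}^{11}\sum_{j}^{11} H\left(f(R_i \mid R_j)\, w_i\, w_j\right)\right)\ \text{OR}\ \left(\sum_{i}^{11}\sum_{j}^{11} H\left(g(R_i, R_j)\, w_i\, w_j\right)\right),\ \forall\, i \neq j\right]$$

subject to

$$\sum_{i=1}^{11} w_i = 1 \qquad (2)$$

where $H(\cdot)$ represents the information entropy function, $g(R_i, R_j)$ is the unconditional marginals of the residuals from models (1)–(11), $\forall i, \forall j, i \neq j$, and $f(R_i \mid R_j)$ is the conditional distribution of the residuals from models (1)–(11), $\forall i, \forall j, i \neq j$.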

A nonlinear integer programming model makes it possible to identify whether the conditional distributions of each pair of residuals have significantly different directions. For example, the weights assigned to $f(R_i \mid R_j)$ could produce higher entropy than those assigned to $f(R_j \mid R_i)$, compared to the unconditional residuals analyzed in the first step, called Minimal Endogenous Relationship Variance, for the endogeneity investigation. This nonlinear integer programming methodology returns the structural relationship of the dependent variables defined in Eq. (1), for which the information entropy is maximum. This method ensures uniqueness and provides consistent support for the probabilistic weight profile calculated in the Minimal Endogenous Relationship Variance step, where the overall residual variance is minimal. We used the weights calculated in the Minimal Endogenous Relationship Variance step as the starting point for the optimization of the Maximal Information Entropy for Directional Weighted Residuals algorithm, called step 2. We also used the DE methodology to find an optimal solution regarding the maximum entropy for each pair of variables ij.

Thus, for each pair ij, the output is whether i causes j (or the other way around) or whether the relationship is endogenous. Table 2 summarizes the methodological steps and the pseudo-code used to calculate the estimates of the functions f(.) and g(.) in the optimization of the Maximal Information Entropy for Directional Weighted Residuals step.



3  Results

The VIX indicator had the second smallest dispersion, as measured by its standard deviation, second only to the ROC indicator. Note that the VIX value used as the neural network input is the difference between the VIX of day t and the VIX of day t-1, divided by the value of the VIX index at t-1. The VIX index is quoted in percentage points. The higher the index, the greater the risk perception. It is known as the fear index, as it captures investor sentiment.
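Written as a formula, the VIX input described above is the relative daily change of the index. The symbol $x^{VIX}_t$ is introduced here only for illustration:

$$x^{VIX}_t = \frac{VIX_t - VIX_{t-1}}{VIX_{t-1}}$$

For instance, a move of the index from 20 to 25 points gives $x^{VIX}_t = (25 - 20)/20 = 0.25$.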

After establishing the input connections, in the learning process, the objective was to find the fit of the weights vector pi. Thus, the training objective, convergence, was achieved. For the first model, the neural network algorithm converged at the end of 740 iterations. At this point, we concluded that learning had occurred. This was associated with the ability of the neural network to adapt its parameters as a result of its interaction with the database. The learning process is iterative, and through it, the ANN should gradually improve its performance as it interacts with the variables.

The performance criteria that determine the ANN quality and the training breakpoint were pre-established by the training parameters, usually associated with measures of accuracy or error. In this way, we adjusted the hidden-layer neural network parameters with 2000 iterations at maximum with an iterative loop for each number of units in the hidden layer and weight decay, using the size (number of neurons units in the hidden layer) between 2 and 30 for layout test, and the decay test values between 1, 0.1, 0.01, 0.001, 0.0001, 0.00001 and 0 to converge to the best layout.
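The search space described above can be enumerated as in the following Python sketch. The grid values come from the text; the `select_best` helper, which picks the configuration with the lowest error from a dictionary of scores, is a hypothetical illustration of the selection step rather than the authors' R code.

```python
from itertools import product

SIZES = range(2, 31)                                 # hidden-layer units: 2..30
DECAYS = [1, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0]   # weight-decay candidates
MAX_ITERATIONS = 2000                                # iteration cap per fit

def hyperparameter_grid():
    """All (size, decay) layouts to be evaluated."""
    return list(product(SIZES, DECAYS))

def select_best(errors):
    """Return the (size, decay) key with the lowest error in `errors`."""
    return min(errors, key=errors.get)
```

The grid contains 29 sizes times 7 decay values, that is, 203 candidate layouts, each of which would be fitted with at most `MAX_ITERATIONS` iterations.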

To this end, we structured a matrix to accumulate the accuracy values of the 10-fold cross-validation for each layout and decay. We validated the hyperparameter search of the neural network models (neurons, decay and error) by comparing the lowest values of the Mean Squared Error (MSE). We captured the predicted values through the prediction function in R software. If the variables are correlated, a relationship between their residuals is expected. This correlation indicates endogeneity between the predictors; as a result, we analyzed the residuals of the 11 models obtained by the neural network combinations, where each predictor variable assumes the role of the dependent variable in a separate model.

With this, it was possible to estimate the residuals of the 11 models. Using the Minimal Endogenous Relationship Variance methodology, which minimizes the variance, we used Olden’s criterion to capture the importance ranking between the models. Regarding Olden’s criterion of importance for the main model, the most important variable was the ROC, while the least important variable, with a negative value, was the DPO indicator. In addition, the TSEO variable had repeatedly high negative importance in models 3, 5, 8 and 9, where the dependent variable was CCI, WILL, BB and CMO, respectively. When the dependent variable of the neural network was the CMO indicator, the high and positive degree of importance of the MACD variable stands out. On the other hand, the variables DPO, ROC, Ret, TSEO and VIX presented a high negative index of importance. Finally, for the model with ROC as the dependent variable, there were lower degrees of Olden’s importance for all predictors.

To answer the research objective, we present the performance analysis of the first model of neural networks through the confusion matrix. The dependent variable was the return and the inputs of the technical indicators.

McNemar’s test is a metric to assess the performance of the predictive model through the analysis of the confusion matrix [24]. According to Table 3, McNemar’s test had a p-value equal to 0.14, so it was not possible to reject the null hypothesis (H0) at a significance level of 5%. That is, the classifiers had a similar proportion of errors in the test dataset. The network correctly classified 1439 of 1899 observations, an accuracy of approximately 76%. This confirmed the research hypothesis regarding the possibility of trade success. Thus, the application of the neural network to predict return signals was profitable and consistent, showing that an investor who makes decisions based on the neural network outputs would be right on 76% of the days on average. Along with the risk management of the allocated capital, this hit rate enabled profitable long-term trades, presenting evidence in favor of using technical indicators as inputs to the neural network model.
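The two quantities discussed above can be reproduced with a short Python sketch. The accuracy uses the counts reported in the text. The McNemar function implements the standard continuity-corrected statistic on the two off-diagonal cells of a 2 × 2 confusion matrix; since those cell counts are not given here, the arguments `b` and `c` are left to the reader and any values passed to it are hypothetical.

```python
import math

def accuracy(correct, total):
    """Share of correctly classified observations."""
    return correct / total

def mcnemar(b, c):
    """Continuity-corrected McNemar statistic and p-value (chi-square, 1 df).

    b and c are the two off-diagonal (discordant) counts of the confusion matrix.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

hit_rate = accuracy(1439, 1899)  # about 0.758, i.e. close to 76%
```

A p-value above 0.05, such as the 0.14 reported in Table 3, means the two discordant counts are not significantly different at the 5% level.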

The results indicated that the strategy allows higher gains than the buy and hold method. As a result, the second hypothesis was also valid: technical indicators can predict market movements. This result is in line with the works presented by [25], which evidenced that it is possible to have a winning predictive strategy in the stock market based on machine learning models, as applied in this article.

Regarding the verification of the endogeneity of the predictor variables, the initial results refer to the analysis of the probability distribution of the models’ residuals. For most models, the logistic distribution was the most adherent to their respective residuals. The endogenous analysis was discussed based on the results of the Stochastic Structural Relationship Programming (SSRP) methodology.

The relative importance of the models to obtain the minimum residual variance was low in 10 variables including the return, as seen in Fig. 1. Such attributes accounted for almost all total median weights.


Figure 1: Relative importance of models 1–11

In a normal situation, where there was a balance between the variables, the expected importance value for each model would be an equivalent weight for all 11 models, that is, (100%)/11 = 9.09%. However, there were indicators with unbalanced weight distribution. Initially, some indicators showed the strongest causal relationships linked to the ROC variable. The Rate of Change has the property of measuring the price percentage change in a given period. This means that the greater the difference between the contemporary price and the price of the period considered, the greater the value of the ROC indicator. Such causality can be explained by the herd effect, usually present when the stock market undergoes large fluctuations around the mean. Analyzing the causality pairs, Fig. 2 confirmed that the ROC variable was associated with a cause-and-effect process that could occur with TSEO, VIX, and STOCHK. In Fig. 2 it is possible to see the results for the relative importance of the main interaction pairs in explaining the general variance of the residuals.


Figure 2: Major combinations of endogeneity weights for pairs

Regarding the main combinations of endogenous pairs, the effect of joint feedback on the residual variances of the TSEO and ROC, VIX and ROC, and STOCHK and ROC pairs stands out. As a momentum indicator, the ROC variable indicates the percentage variation during a time window, and it was possible to observe that it was controlled by the TSEO, VIX, and STOCHK indicators.

As a robustness test, we presented the results of the entropy of information for conditional and unconditional distribution (Tables 4 and 5).



4  Discussion

As seen in Tables 4 and 5, the unconditional distribution is symmetric and the conditional distribution is asymmetric. The cause-and-effect relationship is perceptible through these tables obtained by the Maximal Information Entropy for Directional Weighted Residuals method for the 0.975 percentile. Also evidenced in Fig. 3 is the framework of the relationship between technical indicators. Such sign and weight are derived from Olden’s criterion for ranking the importance of the neural network inputs, which provides the absolute importance.


Figure 3: Cause and effect framework for technical indicators at 0.975 percentile (weights are derived from Step 2, signs are derived from Olden’s (2002) sensitivity analyses)

In this second analysis, regarding conditional and non-conditional distributions, the objective was to maximize the information entropy, as a robustness test to verify whether this type of behavior is repeated in a scenario of greater uncertainty (principle of maximum entropy).

For each model, through bootstrap, we collect 100 combinations of residuals and then find the non-conditional and conditional distributions of residuals from one model to another. This is the second analysis detailed in the methodology by Eq. (2), to maximize the information entropy. It is considered a robustness test to confirm the first stage results. For the non-conditional simulation, we used the copulas method, following the same distribution and correlation between residuals found originally.

If the information entropy of the original (non-conditional) residuals is greater than the conditional one, the variable under analysis is independent and no other variable rules it. However, if the entropy of the conditional information is higher, then there is a relationship between the two variables (there is endogeneity). In this case, the row variable influences the column variable in Tables 4 and 5. For example, in the first row of the Return variable in Table 4, the comparative analysis is done first between the Return and the VIX index. The pair of values 0.67043 and 0.666955 is compared with the value of the unconditional matrix in the respective first row, which is 0.463661. As the pair of values of the conditional distribution (0.67043 and 0.666955) is higher than the value of the non-conditional distribution (0.463661), the Return (row variable) commands the VIX indicator.
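The comparison rule in this paragraph can be written as a small Python function. Requiring every conditional entropy value in the pair to exceed the unconditional one is an assumption made for this sketch, and the function name is hypothetical.

```python
def row_influences_column(conditional_entropies, unconditional_entropy):
    """True when all conditional entropies exceed the unconditional entropy,
    i.e. the row variable is taken to command the column variable."""
    return all(h > unconditional_entropy for h in conditional_entropies)

# Return (row) versus VIX (column), using the values quoted from Tables 4 and 5.
return_commands_vix = row_influences_column([0.67043, 0.666955], 0.463661)
```

With the quoted values both conditional entropies are above 0.463661, so the function reports that Return commands the VIX, in line with the reading given in the text.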

Analogously, it is possible to analyze cause-and-effect relationships through the diagram shown in Fig. 3, where the arrows indicate the direction of the causal relationships between the variables. The conclusion is that the return influences the variables CCI, MACD, WILL, STOCHK, TSEO, BB, DPO, ROC and VIX. Except for the CMO indicator, we observed that the return dominates the other variables.

When analyzing the VIX, the volatility controls all the other indicators except WILL, TSEO, ROC and Return. The VIX index, known as the fear index, is an exogenous variable calculated from the maturity of options and serves as a proxy for market risk.

The CCI controls only the variable BB and is therefore controlled by all the others. Such a weak command relationship can be justified by the fact that the CCI signal is used to detect the beginning and end of trends. This signal has the characteristic of storing the lowest value compared to the other indicators. To interpret the CCI indicator results, we use the concepts of overbought and oversold. The market is considered overbought when the CCI is above +100 and oversold when it is below −100. However, some investors use the movement of these values to understand the market strength. Breaking the value of +100 upwards can represent strength in the uptrend. However, when the CCI returns below +100, it could mean that the market is correcting the recent high, and the same goes for values below −100. We calculate the CCI using 20 periods for the moving average. The Commodity Channel Index is a momentum oscillator and measures the price change compared to its respective average. We assign a constant of 0.015 for multiplication with the standard deviation. This constant ensures that about 70% to 80% of the values are between −100 and +100.
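A minimal Python sketch of the CCI with the stated settings (20 periods, constant 0.015) follows. It uses the conventional formulation based on the typical price and the mean absolute deviation; the text above mentions the standard deviation, so the dispersion measure used here is an assumption, as are the function and argument names.

```python
def cci(highs, lows, closes, period=20, constant=0.015):
    """Commodity Channel Index on the typical price (H + L + C) / 3."""
    tp = [(h + l + c) / 3 for h, l, c in zip(highs, lows, closes)]
    values = []
    for t in range(period - 1, len(tp)):
        window = tp[t - period + 1 : t + 1]
        sma = sum(window) / period
        # Mean absolute deviation of the window (assumed dispersion measure).
        mean_dev = sum(abs(x - sma) for x in window) / period
        values.append((tp[t] - sma) / (constant * mean_dev) if mean_dev else 0.0)
    return values
```

Readings above +100 or below −100 would then be interpreted as the overbought and oversold regions described in the text.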

The MACD indicator influences the variables CCI, STOCHK, BB, CMO, DPO, and TSEO. For the MACD we use 12 periods for the fast moving average, 26 periods for the slow moving average, and nine periods for the signal moving average. The MACD allows monitoring trends and momentum, but it is not useful for identifying overbought or oversold levels. Usually, a MACD above zero is a buy signal, and a MACD below zero is a sell signal. This signal is considered a lagging indicator.
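The 12/26/9 configuration can be illustrated with the following Python sketch. It assumes exponential moving averages with smoothing factor 2 / (period + 1), seeded with the first observation; the text does not state the type of moving average, so these choices and the function names are illustrative.

```python
def ema(values, period):
    """Exponential moving average seeded with the first value (assumed seeding)."""
    alpha = 2 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(closes, fast=12, slow=26, signal=9):
    """Return the MACD line, its signal line and their difference (histogram)."""
    line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    signal_line = ema(line, signal)
    histogram = [m - s for m, s in zip(line, signal_line)]
    return line, signal_line, histogram
```

On a steadily rising price series the fast average stays above the slow one, so the MACD line is positive, which corresponds to the buy reading mentioned above.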

The variable WILL controls all the others, except Return, STOCHK and ROC. The WILL signal is applicable in markets without a defined trend and facilitates the identification of overbought or oversold points.

The STOCHK indicator influences CCI, BB, WILL, CMO, DPO and ROC. STOCHK is a stochastic oscillator, also considered a momentum indicator, that relates the closing value of each day to the high/low range over a given period. Values close to the maximum of the range indicate buying force and accumulation, whereas values close to the minimum indicate predominantly selling force and distribution. For the STOCHK indicator, we used 13 periods, with two fast periods for initial smoothing, 25 slow periods for double smoothing and nine periods for the signal line.
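The core of the stochastic oscillator, the position of the close within the recent high/low range, can be sketched as below. This is only the raw %K value over 13 periods; the smoothing steps (2, 25 and 9 periods) described above are deliberately omitted, and the function name is hypothetical.

```python
def stochastic_k(highs, lows, closes, n=13):
    """Raw stochastic %K in [0, 100]; None until n observations are available."""
    out = []
    for t in range(len(closes)):
        if t + 1 < n:
            out.append(None)
            continue
        highest = max(highs[t + 1 - n : t + 1])
        lowest = min(lows[t + 1 - n : t + 1])
        if highest == lowest:
            # Degenerate flat range: report the midpoint.
            out.append(50.0)
        else:
            out.append(100.0 * (closes[t] - lowest) / (highest - lowest))
    return out
```

A value near 100 means the close sits at the top of the range (buying force), and a value near 0 means it sits at the bottom (selling force).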

The TSEO influences the variables VIX, CCI, STOCHK, BB, DPO and ROC. For the Triple Smoothed Exponential Oscillator, buy/sell signals are relevant when the indicator crosses the signal line. We built this indicator considering 20 periods for the moving average and nine periods for the signal line moving average.

The BB only has a direct influence on the ROC. If the stock's volatility increases, the BB tends to widen; otherwise, the BB tends to narrow. It also serves as an overbought or oversold indicator. For example, when the price is close to the upper band, there are signs of reversal. We built this indicator considering 20 periods for the moving average and two standard deviations for each band.
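The band construction with 20 periods and two standard deviations can be sketched as follows. The function name is hypothetical, and the use of a simple moving average with the population standard deviation is an assumption, since the text does not specify either.

```python
from statistics import mean, pstdev


def bollinger_bands(closes, n=20, k=2.0):
    """Return a list of (lower, middle, upper) tuples; None until n values exist.

    Assumptions: simple moving average for the middle band and
    population standard deviation for the band width.
    """
    out = []
    for i in range(len(closes)):
        if i + 1 < n:
            out.append(None)
            continue
        window = closes[i + 1 - n : i + 1]
        middle = mean(window)
        spread = k * pstdev(window)
        out.append((middle - spread, middle, middle + spread))
    return out
```

Since the band width is proportional to the rolling standard deviation, a more volatile window produces wider bands, which is the widening and narrowing behavior described above.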

The CMO influences CCI, TSEO, BB, DPO, and Return. The Chande Momentum Oscillator is useful for determining the beginning of trends and is considered a modified relative strength index (RSI). The RSI is an oscillator-type technical indicator that measures the relationship between the buying and selling forces of a given security, ranging from 0 to 100. The RSI regions signal overbought and oversold conditions. When the indicator is below a lower threshold, RSILow, commonly equal to 30, the share price is in an oversold zone and the selling force is losing strength. This can be a sign that the share price will rise.
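The link between the CMO and the RSI can be made concrete with a short sketch. The function names are hypothetical, the 14-period window and the upper threshold of 70 are assumptions (the text gives only the lower threshold of 30), and the RSI shown is the simple-sum variant rather than a smoothed one.

```python
def up_down_sums(closes, n):
    """Sum of gains and sum of losses over the last n price changes."""
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    recent = closes[-(n + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    ups = sum(c for c in changes if c > 0)
    downs = sum(-c for c in changes if c < 0)
    return ups, downs


def cmo(closes, n=14):
    """Chande Momentum Oscillator, ranging from -100 to +100."""
    ups, downs = up_down_sums(closes, n)
    total = ups + downs
    return 0.0 if total == 0 else 100.0 * (ups - downs) / total


def rsi_simple(closes, n=14):
    """Simple-sum RSI, ranging from 0 to 100 (no smoothing applied)."""
    ups, downs = up_down_sums(closes, n)
    total = ups + downs
    return 50.0 if total == 0 else 100.0 * ups / total


def rsi_zone(value, low=30.0, high=70.0):
    """Oversold below `low` (RSILow in the text); `high` is an assumed counterpart."""
    if value < low:
        return "oversold"
    if value > high:
        return "overbought"
    return "neutral"
```

Under these definitions the two oscillators carry the same information on different scales, since CMO = 2 × RSI − 100 for the simple-sum RSI.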

The DPO influences only the CCI and BB indicators. The Detrended Price Oscillator removes the trend component from the price time series by subtracting the moving average of the price from the price. We used ten periods for the moving average and six periods for the shift applied to the moving average. Finally, the ROC can influence VIX, CCI, MACD, WILL, CMO, and DPO.
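As an illustration of the detrending idea, the sketch below uses one common convention for the DPO: the price from `shift` periods ago minus the current n-period simple moving average. The function name is hypothetical, and the alignment convention is an assumption, since implementations differ on how the moving average is displaced.

```python
def dpo(closes, n=10, shift=6):
    """Detrended Price Oscillator; None where the window or lag is unavailable.

    Assumed convention: closes[t - shift] minus the n-period simple
    moving average ending at t.
    """
    out = []
    for t in range(len(closes)):
        if t + 1 < n or t - shift < 0:
            out.append(None)
            continue
        avg = sum(closes[t + 1 - n : t + 1]) / n
        out.append(closes[t - shift] - avg)
    return out
```

For a purely linear price series this oscillator is constant, which shows that the trend component has been removed and only deviations around the trend would remain.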

5  Conclusion

The present study aimed to verify the performance of a neural network model in predicting returns in the Brazilian market. In addition, it investigated the information entropy of the variables of the neural network predictive model.

Confusion matrix analysis confirms hypothesis H1, since the main model has an accuracy of about 76%. In other words, over 100 trading days, the model would have predicted the direction of the return correctly on about 76 days, thus fulfilling the first research objective. The predictive ability of the model is slightly better for short positions (negative returns) than for long positions (positive returns).
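The relationship between a confusion matrix, the overall accuracy and the hit rates for long and short positions can be sketched as follows. The counts are purely illustrative and are not the study's results; they were chosen only so that the overall accuracy is 76%. The function name is hypothetical.

```python
def confusion_summary(tp, tn, fp, fn):
    """Accuracy and per-direction hit rates from a 2x2 confusion matrix.

    tp: positive return predicted and observed (long position correct)
    tn: negative return predicted and observed (short position correct)
    fp: positive predicted, negative observed
    fn: negative predicted, positive observed
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "long_hit_rate": tp / (tp + fn),   # share of positive days caught
        "short_hit_rate": tn / (tn + fp),  # share of negative days caught
    }


# Hypothetical counts over 100 trading days (not taken from the study).
summary = confusion_summary(tp=36, tn=40, fp=12, fn=12)
```

With these illustrative counts, 76 of 100 days are classified correctly, and the hit rate on negative-return days is slightly higher than on positive-return days, mirroring the pattern described above.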

Since the predictor variables present correlated residues, there is evidence of endogeneity. However, correlation analysis alone is not sufficient to support this statement. For this reason, we investigated the endogenous origin and the information entropy in a stock return prediction system whose inputs are technical indicators. We carried out an endogenous analysis between the technical indicators and the dependent variable Return.

From the bootstrapped residuals of the 11 neural network model combinations, we searched for the smallest possible variances using the minimum endogenous relationship variance methodology. We applied the minimization through a nonlinear stochastic optimization process, as presented in Eq. (1) of the methodology. Based on this method, we assign a degree of importance to each model. According to the theory of information entropy, the model with the lowest variance is expected to have the highest degree of importance. In a neutral situation, the expected importance would be an equal weight for all 11 models, i.e., 100%/11 = 9.09%. The difference between the expected and the actual weight indicates how much better a model behaves than the others, or how much its variable dominates the others. Based on the results, we conclude that the weights are not evenly distributed among the 11 models under study.

The model with the lowest variance indicates good behavior and, consequently, receives greater weight. We obtained the weights by minimizing the covariances together with the variances. The input is a residual matrix, obtained by solving models 1–11 with a bootstrap technique with 100 repetitions, which generates 100 residuals for each model. Optimizing over this matrix yields the optimum weights and, with them, a behavior profile of the weights.
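The weighting idea can be illustrated with a generic minimum-variance sketch. It is not the optimization of Eq. (1) nor the authors' solver: the objective (the variance of the weighted combination of residuals, with non-negative weights summing to one), the simple stochastic search and all function names are assumptions made for illustration.

```python
import random
from statistics import mean


def covariance_matrix(residuals):
    """Sample covariance of a residual matrix (rows: repetitions, columns: models)."""
    m, k = len(residuals), len(residuals[0])
    cols = [[row[j] for row in residuals] for j in range(k)]
    mu = [mean(c) for c in cols]
    return [
        [
            sum((cols[a][i] - mu[a]) * (cols[b][i] - mu[b]) for i in range(m)) / (m - 1)
            for b in range(k)
        ]
        for a in range(k)
    ]


def combined_variance(weights, cov):
    """Variance of the weighted combination: w' * cov * w."""
    k = len(weights)
    return sum(weights[a] * cov[a][b] * weights[b] for a in range(k) for b in range(k))


def min_variance_weights(cov, iterations=20000, step=0.05, seed=0):
    """Stochastic search starting from equal weights (1/k each).

    Each trial moves a random amount of weight from one model to another
    and is kept only if the combined variance decreases, so weights stay
    non-negative and keep summing to one.
    """
    rng = random.Random(seed)
    k = len(cov)
    best = [1.0 / k] * k
    best_var = combined_variance(best, cov)
    for _ in range(iterations):
        i, j = rng.sample(range(k), 2)
        delta = min(rng.uniform(0.0, step), best[i])
        trial = best[:]
        trial[i] -= delta
        trial[j] += delta
        trial_var = combined_variance(trial, cov)
        if trial_var < best_var:
            best, best_var = trial, trial_var
    return best, best_var
```

Starting from the equal-weight baseline makes the comparison with 1/k direct: a model whose final weight ends above 1/k has residuals that are less variable than the others, and one that ends below it has residuals that are more variable.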

In model 1, the Rate of Change has the highest positive absolute importance according to Olden's criterion, which was confirmed by the robustness test (Fig. 1). Finally, the dependent term of the main model, the return, is the variable that influences the largest number of indicators. Thus, the endogenous origin of the main neural network based on the technical indicators presented is favorable for predicting the output of interest, the Return.

The findings will help the target audience to build investment strategies and verify strategy adherence in different risk environments. This work presents a tool for decision-making and, therefore, a practical and applicable contribution.

Apart from that, it is known that the Brazilian stock market is small compared to the American one, with few companies covered by analysts. Therefore, the findings related to the prediction of returns are especially important to investors without access to market analysts and to other individuals who want to learn about investment strategies from machine learning algorithms. Moreover, with quantitative strategies, it is possible to significantly reduce human interference in the decision-making process, eliminating behavioral biases that negatively impact investment returns. As a direction for further research, we suggest investigating the application of artificial intelligence algorithms to the prior selection of the technical variables that will be input to the predictive model. For this, it is possible to use a random forest model beforehand to select the most important variables to be used as input to the neural network models.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.


  1. S. Akhterand and M. A. Misir, “Capital markets efficiency: Evidence from the emerging capital market with particular reference to Dhaka stock exchange,” South Asian Journal of Management, vol. 12, pp. 35, 2005.
  2. M. Ballings, D. Van den Poel, N. Hespeels and R. Gryp, “Evaluating multiple classifiers for stock price direction prediction,” Expert Systems with Applications, vol. 42, pp. 7046–7056, 2015.
  3. M. Nabipour, P. Nayyeri, H. Jabani, A. Mosavi, E. Salwana et al., “Deep learning for stock market prediction,” Entropy, vol. 22, pp. 840, 2020.
  4. D. Shah, H. Isah and F. Zulkernine, “Stock market analysis: A review and taxonomy of prediction techniques,” International Journal of Financial Studies, vol. 7, pp. 26, 2019.
  5. C. -F. Tsai, Y. -C. Lin, D. C. Yen and Y. -M. Chen, “Predicting stock returns by classifier ensembles,” Applied Soft Computing, vol. 11, pp. 2452–2459, 2011.
  6. E. F. Fama and J. D. MacBeth, “Risk, return, and equilibrium: Empirical tests,” Journal of Political Economy, vol. 81, pp. 607–636, 1973.
  7. T. K. Lee, J. H. Cho, D. S. Kwon and S. Y. Sohn, “Global stock market investment strategies based on financial network indicators using machine learning techniques,” Expert Systems with Applications, vol. 117, pp. 228–242, 2019.
  8. X. Liu and D. D. Thomakos, ““Taps”: A trading approach based on deterministic sign patterns,” Expert Systems with Applications, vol. 175, pp. 114761, 2021.
  9. R. Garcia, D. Monte-Mor and N. Tardin, “Can accounting-based and market-based indicators predict changes in the risk rating of Brazilian banks?,” Revista Brasileira de Gestão de Negócios, vol. 21, pp. 152–168, 201
  10. F. Ecer, S. Ardabili, S. S. Band and A. Mosavi, “Training multilayer perceptron with genetic algorithms and particle swarm optimization for modeling stock price index prediction,” Entropy, vol. 22, pp. 1239, 2020.
  11. Y. S. Abu-Mostafa, M. Magdon-Ismail and H. -T. Lin, Learning from Data, vol. 4. New York, USA: AMLBook, pp. 1–9, 2012.
  12. S. Haykin and P. M. Engel, Redes neurais: Princĺpios e prática, 2nd ed., Porto Alegre, Brasil: Bookman, pp. 898, 2003.
  13. D. P. Fonseca, P. F. Wanke and H. L. Correa, “A Two-stage fuzzy neural approach for credit risk assessment in a Brazilian credit card company,” Applied Soft Computing, vol. 92, pp. 106329, 2020.
  14. A. Thakkar and K. Chaudhari, “A comprehensive survey on deep neural networks for stock market: The need, challenges, and future directions,” Expert Systems with Applications, vol. 177, pp. 114800, 2021.
  15. B. Ripley, W. Venables and M. B. Ripley, “Package ‘NNET’: Feed-forward neural networks and multinomial log-linear models,” R Package Version, vol. 7, pp. 3–12, 2016.
  16. P. Carrand and L. Wu, “Variance risk premiums,” The Review of Financial Studies, vol. 22, pp. 1311–1341, 2009.
  17. S. L. Lambert, “Fundamental signals, future earnings and security analysts’ efficient use of fundamental signals during 1991 through 2008,” Ph.D. Dissertation, University of Texas, Arlington, Texas, USA, 2011.
  18. J. Ulrich, “Package TTR: Technical trading rules,” R package, 2021.
  19. F. Lemos, “Análise técnica dos mercados financeiros,” in São Paulo, Brasil: Saraiva Educação S.A., 2017.
  20. A. D. Hill, S. G. Johnson, L. M. Greco, E. H. O’Boyle and S. L. Walter, “Endogeneity: A review and agenda for the methodology-practice divide affecting micro and macro research,” Journal of Management, vol. 47, pp. 104–143, 2021.
  21. L. A. B. de C. Barros, F. H. Castro, D. Silveira, A. D. Miceli and D. R. Bergmann, “Endogeneity in corporate finance empirical research (in Portuguese),” 2010. Available at Social Science Research Network:
  22. A. B. Alves, P. Wanke, J. Antunes and Z. Chen, “Endogenous network efficiency, macroeconomy, and competition: Evidence from the Portuguese banking industry,” The North American Journal of Economics and Finance, vol. 52, pp. 101114, 2020.
  23. D. Ardia, K. Boudt, P. Carl, K. M. Mullen and B. G. Peterson, “Differential evolution with DEoptim: An application to non-convex portfolio optimization,” The R Journal, vol. 3, pp. 27–34, 2011.
  24. P. P. Shinde, K. S. Oza and R. K. Kamat, “Big data predictive analysis: Using R analytical tool,” in Int. Conf. on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, pp. 839–842, 2017.
  25. M. M. Kumbure, C. Lohrmann, P. Luukka and J. Porras, “Machine learning techniques and data for stock market forecasting: A literature review,” Expert Systems with Applications, vol. 197, 2022.

Cite This Article

E. V. P. Bastos, J. J. M. Antunes, L. G. Marujo, P. F. Wanke and R. I. D. R. L. Filho, "An endogenous feedback and entropy analysis in machine learning model for stock’s return forecast," Intelligent Automation & Soft Computing, vol. 36, no. 3, pp. 3175–3190, 2023.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.