Deep learning is the process of determining the parameters that minimize the cost function derived from the dataset; the parameters found by this optimization are known as the optimal parameters. The optimization process starts from initialized parameters, and at the global minimum the cost function should no longer vary with the parameters. The momentum technique is a widely used parameter optimization approach; however, it has difficulty stopping the parameter update once the cost function reaches its global minimum (the non-stop problem). Moreover, existing approaches rely on schedules in which the learning rate is reduced over the iterations; these schedules decrease monotonically at a fixed rate over time, whereas our goal is to make the learning rate depend on the parameters. We present a method for determining the optimal parameters that adjusts the learning rate in response to the cost function value, so that the scheduling process finishes once the cost function has been optimized. This approach is shown to guarantee convergence to the optimal parameters; that is, our strategy minimizes the cost function (effective learning). The proposed method builds on the momentum approach; to solve its non-stop problem, we incorporate the cost function value into the update, so that the learning rule shrinks the parameter update as the cost function decreases. To verify that the method learns effectively, we provide a proof of convergence and empirical comparisons with existing methods, with all results obtained using Python.
To achieve artificial intelligence (A.I.), the cost function formed by an artificial neural network (ANN) is identified using the learning data, and the parameters are chosen to make this cost function as small as possible; in other words, A.I. learning is the procedure of optimizing the cost function. As a result, A.I. learning faces two challenges: describing the cost function and optimizing it. The first issue is the definition of the cost function: the more data and the larger the ANN structure, the higher-dimensional the cost function becomes. As a result, the cost function has become more complicated [
Since the introduction of machine learning, Artificial Neural Networks (ANN) have been on a meteoric rise [
There has recently been a great deal of research into new strategies for adapting the learning rate, supported by both theoretical analysis and empirical intuition [
The AdaGrad method suffers from excessive accumulation of squared gradients, a problem that the RMSProp method alleviates [
Momentum and RMSProp are combined in the Adam algorithm [
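For reference, the standard Adam update combines a momentum-style first-moment estimate with an RMSProp-style second-moment estimate; here g_t denotes the gradient of the cost at step t, and β1, β2, ε are the usual hyperparameters (this is the standard formulation, not notation taken from this paper):

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t)
w_t = w_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)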
The cost function was created using the learning data. The input data
To satisfy
Gradient-based approaches are commonly employed to address the optimization problem
To discover the parameter w, gradient-based approaches are utilized to make the gradient of the cost function zero. However, if c(w) is not a convex cost function, driving the gradient to zero may be ineffective. As a result, to improve learning performance, we use the Lagrange multiplier technique to define E(w) and identify the parameter w that makes E(w) zero.
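As a rough illustration of such a gradient-based update, the following is a minimal sketch in Python, assuming a hypothetical two-variable stand-in for the cost function c(w) and an arbitrary starting point and step size (none of these choices come from the paper):

import numpy as np

def c(w):
    # hypothetical stand-in for the cost function c(w), with global minimum value 0 at the origin
    return 0.5 * (w[0] ** 2 + 0.1 * w[1] ** 4)

def grad_c(w):
    # gradient of the stand-in cost
    return np.array([w[0], 0.2 * w[1] ** 3])

w = np.array([7.0, 0.5])   # assumed initial parameters
alpha = 1e-2               # constant learning rate
for _ in range(1000):
    w = w - alpha * grad_c(w)   # plain gradient descent: step against the gradient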
The learning rate is defined as α, where α is a constant. The momentum approach is identical to
The momentum technique is used to solve E(w) = 0.
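A minimal sketch of the standard momentum update is given below, reusing the same hypothetical stand-in cost; the decay factor beta is an assumed value, and the gradient of c(w) is used in place of E(w), whose exact form is given by the paper's derivation:

import numpy as np

def grad_c(w):
    # gradient of the hypothetical stand-in cost c(w) = 0.5*(w0^2 + 0.1*w1^4)
    return np.array([w[0], 0.2 * w[1] ** 3])

w = np.array([7.0, 0.5])    # assumed initial parameters
v = np.zeros_like(w)        # velocity (momentum accumulator)
alpha, beta = 1e-2, 0.9     # assumed learning rate and momentum decay factor
for _ in range(1000):
    v = beta * v - alpha * grad_c(w)   # accumulate a decaying sum of past gradients
    w = w + v                          # momentum parameter update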
Our methodology is based on the momentum method, and it terminates learning when the cost function reaches its global minimum value, which is the point at which learning has been implemented well. The following results can be obtained by modifying the learning rate:
Where
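As a rough sketch of this kind of modification (this is not the paper's exact update rule, which is given by the equation above; here the step size is simply scaled by the current cost value, under the assumption that the global minimum of the cost is zero, so the update vanishes once the cost is optimized):

import numpy as np

def c(w):
    # hypothetical stand-in cost with global minimum value zero
    return 0.5 * (w[0] ** 2 + 0.1 * w[1] ** 4)

def grad_c(w):
    return np.array([w[0], 0.2 * w[1] ** 3])

w = np.array([7.0, 0.5])
v = np.zeros_like(w)
alpha, beta = 1e-2, 0.9     # assumed base learning rate and momentum decay factor
for _ in range(1000):
    lr = alpha * c(w)                 # learning rate responds to the cost function value
    v = beta * v - lr * grad_c(w)     # momentum step with the adaptive rate
    w = w + v                         # the update shrinks toward zero as c(w) -> 0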
Lemma 1. The following is the relationship between m_i and U_i:
Proof. [
Because
As a result, when the series
As
As a result, we can acquire
where δ = sup_{i > τ}
In Theorem 1, it is possible that δ is smaller than 1. The following corollary explains in further detail why δ is smaller than 1.
Under the supposition that
If sign
This paper compares our proposed technique with GD, AdaGrad, and Adam, since our strategy incorporates the influence of the momentum method. The performance of each method was examined in Sections 6.1 and 6.2 by assuming that the cost functions are two-variable functions, which gives a visual representation of how each technique changes the parameters. In Section 6.3, we explore the problem of classifying handwritten digits from 0 to 9; the MNIST dataset is the most fundamental dataset for comparing the performance of deep learning methods. A convolutional neural network (CNN) trained on the MNIST dataset, an approach that is extensively used to classify images, was used in this experiment [
A two-variable function is used in this section so that the effect of each approach on the parameters can be observed. The two-variable function utilized in this experiment (i.e., the Weber function) is roughly defined as follows:
If
Styblinski–Tang Function. In this experiment, the performance of each technique was examined utilizing the Styblinski–Tang function as the cost function, which has three local minima. The three local minima of the Styblinski–Tang function are specifically defined:
The initial parameter value is (7, 0), the learning rate is 1 × 10−2, and the learning was repeated 301 times. Each strategy yielded a different change in the parameters, as depicted in
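For reference, a minimal Python sketch of this setup using the standard two-variable Styblinski–Tang function with the stated starting point, learning rate, and iteration count; plain gradient descent is shown here only as the simplest of the compared baselines:

import numpy as np

def styblinski_tang(x):
    # f(x) = 0.5 * sum(x_i^4 - 16*x_i^2 + 5*x_i); global minimum near (-2.9035, -2.9035)
    return 0.5 * np.sum(x ** 4 - 16 * x ** 2 + 5 * x)

def styblinski_tang_grad(x):
    # elementwise gradient: 2*x^3 - 16*x + 2.5
    return 2 * x ** 3 - 16 * x + 2.5

x = np.array([7.0, 0.0])   # initial parameter value (7, 0)
alpha = 1e-2               # learning rate 1e-2
for _ in range(301):       # 301 learning iterations
    x = x - alpha * styblinski_tang_grad(x)   # gradient descent baseline
print(x, styblinski_tang(x))   # final parameters and cost value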
The MNIST dataset consists of 28 × 28 gray-scale pictures in ten classes and is popular for machine learning; part of the MNIST data is shown in
To compare the performance of each approach, a CNN with a basic structure was utilized, consisting of two convolution layers and one hidden layer. A 32 × 5 × 5 filter and a 64 × 32 × 5 × 5 filter were used for the convolutions; a 50% dropout was applied, and the batch size was set to 64. The learning process was repeated 88,300 times with a learning rate of 0.0001. The experiment results are shown in
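A minimal PyTorch sketch of a CNN with this structure is given below; the hidden-layer width, pooling layers, and the SGD-with-momentum optimizer are assumptions, since the paper's experiment compares GD, AdaGrad, Adam, and the proposed method:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class SmallCNN(nn.Module):
    # two convolution layers (32 and 64 filters of size 5x5), one hidden layer, 50% dropout
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
        self.drop = nn.Dropout(p=0.5)
        self.fc1 = nn.Linear(64 * 4 * 4, 128)   # hidden-layer width 128 is an assumption
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 28x28 -> 24x24 -> 12x12
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # 12x12 -> 8x8 -> 4x4
        x = self.drop(torch.flatten(x, 1))
        x = F.relu(self.fc1(x))
        return self.fc2(x)

train_set = datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)   # batch size 64 as stated

model = SmallCNN()
opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)   # stand-in optimizer, lr 0.0001 as stated
loss_fn = nn.CrossEntropyLoss()
for images, labels in loader:     # one pass shown; the paper repeats 88,300 iterations
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()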
In this section, the CIFAR-10 dataset is used for training, and the RESNET model is used to examine the performance of each method in a more extensive test. The CIFAR-10 dataset contains 60,000 color images in ten classes (trucks, airplanes, ships, vehicles, horses, frogs, dogs, deer, cats, and birds). The learning was done with the RESNET-44 model, a batch size of 128, 80,000 iterations, and a learning rate of 0.001. The findings of each approach employed in this experiment are shown in
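A brief PyTorch sketch of this training configuration follows; torchvision has no ResNet-44 for CIFAR, so ResNet-18 appears below only as a runnable stand-in, and the Adam optimizer is just one of the compared methods rather than the paper's proposed update:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

train_set = datasets.CIFAR10(".", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)   # batch size 128 as stated

model = models.resnet18(num_classes=10)               # stand-in; the paper uses a RESNET-44 model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # learning rate 0.001 as stated
loss_fn = nn.CrossEntropyLoss()
for images, labels in loader:   # one epoch shown; the paper runs 80,000 iterations
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()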
We presented a deep learning method based on the existing momentum approach that uses both the first derivative and the value of the cost function at the same time. In addition, our proposed strategy is designed to learn under optimal conditions by modifying the learning rate so that it responds to changes in the cost function. Experimental results for the Adam, GD, and momentum techniques were used to validate our proposed strategy. According to these studies, we confirmed that pausing learning contributes to successful learning, because it leads to better learning accuracy than the other techniques. With our proposed method, the cost we define converges to zero when the global minimum of the cost function is zero, and the method performs exceptionally well in terms of learning accuracy. Our approach provides several advantages because the learning rate is adaptive, and it can be combined with existing learning methods by simply changing the formula. In future work, for deep learning with deeper architectures, stopping learning at a proper time remains a problem; it will be addressed together with practical learning.
This research was supported by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R79), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.