
Open Access

ARTICLE

Gradient Descent with Time-Decaying Regularization for Training Linear Neural Networks

Sergio Isai Palomino-Resendiz1,2, César Ulises Solís-Cervantes1,*, Luis Alberto Cantera-Cantera1,3, Jorge de Jesús Morales-Mercado1, Diego Alonso Flores-Hernández4
1 Departamento de Ingeniería en Control y Automatización, Escuela Superior de Ingeniería Mecánica y Eléctrica (ESIME), Unidad Zacatenco, Instituto Politécnico Nacional, Unidad Profesional Adolfo López Mateos. Av. Luis Enrique Erro S/N, Gustavo A. Madero, Zacatenco, Ciudad de México, México
2 Departamento de Control Automático, Centro de Investigación y de Estudios Avanzados (CINVESTAV) del Instituto Politécnico Nacional, Unidad Zacatenco, Av. Instituto Politécnico Nacional No. 2508, Col. San Pedro Zacatenco, Ciudad de México, México
3 Facultad de Ingeniería, Universidad Anáhuac México, Campus Norte, Huixquilucan, Estado de México, México
4 Sección de Estudios de Posgrado e Investigación, Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas (UPIITA), Instituto Politécnico Nacional, Av IPN 2580, La Laguna Ticoman, G. A. M., Ciudad de México, México
* Corresponding Author: César Ulises Solís-Cervantes
(This article belongs to the Special Issue: Computational Modeling, Simulation, and Algorithmic Methods for Dynamical Systems)

Computer Modeling in Engineering & Sciences https://doi.org/10.32604/cmes.2026.077726

Received 16 December 2025; Accepted 25 February 2026; Published online 09 April 2026

Abstract

Many linear-in-parameters models arising in identification and control can be expressed as single-layer artificial neural networks (ANNs) with linear activation, enabling online learning via first-order optimization. In practice, however, standard gradient descent often exhibits slow convergence, large intermediate weights, and stagnation when the regressor data are ill-conditioned or computations are performed under finite precision. This paper proposes Gradient Descent with Time-Decaying Regularization (GD-TDR), a training algorithm that augments the quadratic loss with a regularization term whose weight decays exponentially in time. The proposed schedule enforces uniform strong convexity during the early iterations, effectively mitigating the neural-paralysis-like stagnation associated with flat directions, while vanishing asymptotically so that the unregularized least-squares solution is recovered. A convergence theorem for GD-TDR is established, and a concise pseudocode implementation is provided. Numerical and embedded experiments on the online identification of a Chua-type chaotic oscillator demonstrate that GD-TDR converges faster than standard gradient descent and avoids stagnation, without introducing the steady-state bias characteristic of fixed quadratic regularization.
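The mechanism described above can be illustrated with a short sketch. The following Python snippet is only a minimal illustration, not the paper's pseudocode: the discrete schedule lam_k = lam0 * gamma**k, the mean-squared form of the loss, and the hyperparameter names (eta, lam0, gamma, n_iter) are assumptions chosen to match an exponentially decaying regularization weight.

    import numpy as np

    def gd_tdr(phi, y, eta=0.1, lam0=1.0, gamma=0.98, n_iter=2000):
        # Sketch of gradient descent with a time-decaying quadratic
        # regularizer (assumed schedule lam_k = lam0 * gamma**k).
        # Minimizes 0.5/m * ||phi @ w - y||^2 + 0.5 * lam_k * ||w||^2;
        # since lam_k -> 0, the unregularized least-squares solution
        # is recovered asymptotically.
        m, n = phi.shape
        w = np.zeros(n)
        for k in range(n_iter):
            lam_k = lam0 * gamma**k          # exponentially decaying weight
            grad = phi.T @ (phi @ w - y) / m + lam_k * w
            w -= eta * grad                  # plain first-order update
        return w

    # Hypothetical usage on an ill-conditioned regressor matrix, where
    # plain gradient descent tends to stagnate along the flat directions:
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(200, 4)) * np.array([1.0, 1.0, 1e-2, 1e-3])
    w_true = np.array([1.5, -0.7, 2.0, 0.3])
    w_hat = gd_tdr(phi, phi @ w_true)

Early in training, the term lam_k * w adds lam_k to every eigenvalue of the Hessian, which is the uniform strong convexity the abstract refers to; as gamma**k vanishes, the update reduces to unregularized gradient descent and the iterate is no longer biased toward zero.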

Keywords

Time-decaying regularization; gradient descent; single-layer linear neural network; online system identification; chaotic oscillator; embedded implementation