
Open Access

ARTICLE

Gradient Descent with Time-Decaying Regularization for Training Linear Neural Networks

Sergio Isai Palomino-Resendiz1,2, César Ulises Solís-Cervantes1,*, Luis Alberto Cantera-Cantera1,3, Jorge de Jesús Morales-Mercado1, Diego Alonso Flores-Hernández4
1 Departamento de Ingeniería en Control y Automatización, Escuela Superior de Ingeniería Mecánica y Eléctrica (ESIME), Unidad Zacatenco, Instituto Politécnico Nacional, Unidad Profesional Adolfo López Mateos. Av. Luis Enrique Erro S/N, Gustavo A. Madero, Zacatenco, Ciudad de México, México
2 Departamento de Control Automático, Centro de Investigación y de Estudios Avanzados (CINVESTAV) del Instituto Politécnico Nacional, Unidad Zacatenco, Av. Instituto Politécnico Nacional No. 2508, Col. San Pedro Zacatenco, Ciudad de México, México
3 Facultad de Ingeniería, Universidad Anáhuac México, Campus Norte, Huixquilucan, Estado de México, México
4 Sección de Estudios de Posgrado e Investigación, Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas (UPIITA), Instituto Politécnico Nacional, Av IPN 2580, La Laguna Ticoman, G. A. M., Ciudad de México, México
* Corresponding Author: César Ulises Solís-Cervantes
(This article belongs to the Special Issue: Computational Modeling, Simulation, and Algorithmic Methods for Dynamical Systems)

Computer Modeling in Engineering & Sciences https://doi.org/10.32604/cmes.2026.077726

Received 16 December 2025; Accepted 25 February 2026; Published online 09 April 2026

Abstract

Many linear-in-parameters models arising in identification and control can be expressed as single-layer artificial neural networks (ANNs) with linear activation, enabling online learning via first-order optimization. In practice, however, standard gradient descent often exhibits slow convergence, large intermediate weights, and stagnation when the regressor data are ill-conditioned or computations are performed under finite precision. This paper proposes Gradient Descent with Time-Decaying Regularization (GD-TDR), a training algorithm that augments the quadratic loss with a regularization term whose weight decays exponentially in time. The proposed schedule enforces uniform strong convexity during the early iterations, effectively mitigating the neural-paralysis-like stagnation associated with flat directions, while vanishing asymptotically so that the unregularized least-squares solution is recovered. A convergence theorem for GD-TDR is established, and a concise pseudocode implementation is provided. Numerical and embedded experiments on the online identification of a Chua-type chaotic oscillator demonstrate that GD-TDR converges faster than standard gradient descent and avoids stagnation, without introducing the steady-state bias characteristic of fixed quadratic regularization.
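The mechanism described above can be illustrated with a short sketch. The following Python snippet is only a minimal illustration, not the paper's pseudocode: the discrete schedule lam_k = lam0 * gamma**k, the mean-squared form of the loss, and the hyperparameter names (eta, lam0, gamma, n_iter) are assumptions chosen to match an exponentially decaying regularization weight.

    import numpy as np

    def gd_tdr(phi, y, eta=0.1, lam0=1.0, gamma=0.98, n_iter=2000):
        # Sketch of gradient descent with a time-decaying quadratic
        # regularizer (assumed schedule lam_k = lam0 * gamma**k).
        # Minimizes 0.5/m * ||phi @ w - y||^2 + 0.5 * lam_k * ||w||^2;
        # since lam_k -> 0, the unregularized least-squares solution
        # is recovered asymptotically.
        m, n = phi.shape
        w = np.zeros(n)
        for k in range(n_iter):
            lam_k = lam0 * gamma**k          # exponentially decaying weight
            grad = phi.T @ (phi @ w - y) / m + lam_k * w
            w -= eta * grad                  # plain first-order update
        return w

    # Hypothetical usage on an ill-conditioned regressor matrix, where
    # plain gradient descent tends to stagnate along the flat directions:
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(200, 4)) * np.array([1.0, 1.0, 1e-2, 1e-3])
    w_true = np.array([1.5, -0.7, 2.0, 0.3])
    w_hat = gd_tdr(phi, phi @ w_true)

Early in training, the term lam_k * w adds lam_k to every eigenvalue of the Hessian, which is the uniform strong convexity the abstract refers to; as gamma**k vanishes, the update reduces to unregularized gradient descent and the iterate is no longer biased toward zero.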

Keywords

Time-decaying regularization; gradient descent; single-layer linear neural network; online system identification; chaotic oscillator; embedded implementation