AI Model Compression Methods: A Distribution-Aware Residual Entropy Quantization

Nikita Sakovich; Dmitry Aksenov; Ekaterina Pleshakova; Sergey Gataullin

doi:10.32604/cmc.2026.079522

Open Access

ARTICLE

Nikita Sakovich¹, Dmitry Aksenov¹, Ekaterina Pleshakova¹,*, Sergey Gataullin¹,²

1 MIREA—Russian Technological University, Institute of Advanced Technologies and Industrial Programming, 78 Vernadsky Avenue, Moscow, Russia
2 Social Modeling Lab, Central Economics and Mathematics Institute, Russian Academy of Sciences, Nakhimovsky Pr., 47, Moscow, Russia

* Corresponding Author: Ekaterina Pleshakova.

Computers, Materials & Continua 2026, 88(2), 32 https://doi.org/10.32604/cmc.2026.079522

Received 22 January 2026; Accepted 02 April 2026; Issue published 15 June 2026

Abstract

We introduce DARE-Q (Distribution-Aware Residual Entropy Quantization), a post-training quantization method for neural network weights designed to reduce bit-width with minimal degradation of model quality. Unlike traditional approaches that optimize only the mean squared error of the weight approximation, DARE-Q also accounts for the entropy of the quantization residual, which gives control over the statistical properties of the resulting error. The method is based on channel-wise symmetric uniform quantization in which each channel's scale is selected by minimizing a combined loss function that includes L2 distortion and entropy regularization. DARE-Q is implemented as a compact DAREQuantLinear module that can be integrated into standard transformer pipelines without changing the inference logic or requiring specialized kernels. The experimental analysis was conducted on the language models facebook/opt-125m and facebook/opt-350m, which contain approximately 125 million and 350 million parameters, respectively. Model quality was assessed using the standard perplexity metric (PPL) computed on the wikitext-2-raw-v1 dataset. DARE-Q is completely data-free: it requires neither model retraining nor calibration data, which makes it a viable option in privacy-sensitive or confidential environments where access to the original training data is restricted, precisely the setting where calibration-based methods such as GPTQ and AWQ cannot be applied. The observed increase in PPL relative to data-dependent baselines reflects this fundamental trade-off rather than a shortcoming of the approach. By combining per-channel scale selection with the combined loss function, DARE-Q offers a flexible trade-off between approximation accuracy and the structure of the quantization error, making it a promising algorithmic basis for further improvement of model compression methods.
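The scale-selection idea summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' DAREQuantLinear implementation. The function names, the histogram-based entropy estimate, the weight `lam`, the bit-width, and the candidate grid are all assumptions made for illustration, and the exact form of the combined loss in the paper may differ.

```python
import numpy as np

def residual_entropy(residual, bins=32):
    """Shannon entropy (bits) of a histogram of the residual (assumed estimator)."""
    counts, _ = np.histogram(residual, bins=bins)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

def quantize_channel(w, scale, bits=4):
    """Symmetric uniform quantization of one channel, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

def select_scale(w, bits=4, lam=0.01, num_candidates=50):
    """Grid-search a channel scale minimizing MSE + lam * residual entropy.

    The additive form of the loss and the search grid are assumptions.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.abs(w).max())
    if max_abs == 0.0:
        return 1.0  # all-zero channel: any positive scale reproduces it exactly
    best_scale, best_loss = None, np.inf
    # Candidates shrink the max-abs scale; the last one is the plain max-abs scale.
    for frac in np.linspace(0.5, 1.0, num_candidates):
        scale = frac * max_abs / qmax
        residual = w - quantize_channel(w, scale, bits)
        loss = np.mean(residual ** 2) + lam * residual_entropy(residual)
        if loss < best_loss:
            best_scale, best_loss = scale, loss
    return best_scale

def quantize_weight(weight, bits=4, lam=0.01):
    """Quantize a 2-D weight matrix row by row (one scale per output channel)."""
    scales = np.array([select_scale(row, bits, lam) for row in weight])
    dequant = np.stack(
        [quantize_channel(row, s, bits) for row, s in zip(weight, scales)]
    )
    return dequant, scales
```

Calling `quantize_weight(W)` on a 2-D array returns the dequantized weights and one scale per row. With `lam=0` the search reduces to a plain per-channel MSE scale search, which makes the effect of the entropy term easy to isolate in experiments.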

Keywords

Artificial intelligence; large language models; mathematical optimization methods; model compression; quantization methods; information theory; high-performance computing

Cite This Article

APA Style

Sakovich, N., Aksenov, D., Pleshakova, E., & Gataullin, S. (2026). AI Model Compression Methods: A Distribution-Aware Residual Entropy Quantization. Computers, Materials & Continua, 88(2), 32. https://doi.org/10.32604/cmc.2026.079522

Vancouver Style

Sakovich N, Aksenov D, Pleshakova E, Gataullin S. AI Model Compression Methods: A Distribution-Aware Residual Entropy Quantization. Comput Mater Contin. 2026;88(2):32. https://doi.org/10.32604/cmc.2026.079522

IEEE Style

N. Sakovich, D. Aksenov, E. Pleshakova, and S. Gataullin, “AI Model Compression Methods: A Distribution-Aware Residual Entropy Quantization,” Comput. Mater. Contin., vol. 88, no. 2, pp. 32, 2026. https://doi.org/10.32604/cmc.2026.079522

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
