Lei Sun1,2, Jingwen Wang2,*, Peng Hu2, Xiuqing Mao1,2, Cuiyun Hu1,2, Zhihong Wang2
CMC-Computers, Materials & Continua, Vol.87, No.2, 2026, DOI:10.32604/cmc.2026.073657
12 March 2026
Abstract: Transformer models face significant computational challenges in private inference (PI). Existing optimization methods often rely on isolated techniques, neglecting joint structural and operational improvements. We propose IG-3D, a unified framework that integrates structured compression and operator approximation through accurate importance assessment. Our approach first evaluates attention head importance using Integrated Gradients (IG), offering greater stability and theoretical soundness than gradient-based methods. We then apply a three-dimensional optimization: (1) structurally pruning redundant attention heads; (2) replacing Softmax with adaptive polynomial approximation to avoid exponential computations; (3) implementing layer-wise GELU substitution to accommodate different layer characteristics.
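The three ingredients named in the abstract can be illustrated with a minimal NumPy sketch. Everything below is a toy stand-in under stated assumptions, not the paper's implementation: `f` is a hypothetical scalar score over per-head gate values (the real framework would differentiate the model loss with respect to attention-head masks), the Taylor-based `poly_softmax` is one illustrative choice of polynomial (the paper's adaptive scheme may pick coefficients differently), and `fit_poly_gelu` is a generic least-squares fit rather than the paper's layer-wise substitution rule.

```python
import math
import numpy as np

# --- 1. Integrated Gradients on a toy differentiable score -----------------
# Hypothetical stand-in for head-importance scoring: f is a scalar "loss"
# over per-head gate values g.
def f(g):
    return float(np.sum(g ** 2))

def grad_f(g):
    return 2.0 * g  # analytic gradient of the toy score

def integrated_gradients(x, baseline, steps=100):
    # IG_i = (x_i - b_i) * average of df/dx_i along the straight-line path
    # from baseline to x, approximated with a midpoint Riemann sum.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# --- 2. Polynomial softmax (avoids exp, the costly op in PI) ----------------
def poly_softmax(logits, degree=2):
    # Truncated Taylor series of exp after max-shifting (so inputs are <= 0),
    # clamped so attention weights stay non-negative, then renormalized.
    s = logits - logits.max(axis=-1, keepdims=True)
    p = sum(s ** k / math.factorial(k) for k in range(degree + 1))
    p = np.maximum(p, 0.0)
    return p / p.sum(axis=-1, keepdims=True)

# --- 3. Polynomial GELU substitute ------------------------------------------
def gelu(x):
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

def fit_poly_gelu(degree=4, lo=-4.0, hi=4.0):
    # Least-squares polynomial fit of GELU on a bounded range; a layer-wise
    # scheme would tune degree/range to each layer's activation statistics.
    xs = np.linspace(lo, hi, 2001)
    return np.polynomial.Polynomial.fit(xs, gelu(xs), degree)

if __name__ == "__main__":
    g = np.array([0.5, 1.0, -2.0])
    ig = integrated_gradients(g, np.zeros_like(g))
    # Completeness axiom: attributions sum to f(x) - f(baseline).
    print(math.isclose(float(ig.sum()), f(g), rel_tol=1e-6))
```

A useful sanity check for the softmax replacement is that, for logits with a small spread, the degree-2 polynomial already tracks the exact softmax closely while using only additions and multiplications, the operations that are cheap under the cryptographic protocols used in PI.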