Xuanhong Wang1, Hongyu Guo1, Jiazhen Li1, Mingchen Wang1, Xian Wang1, Yijun Zhang2,*
CMC-Computers, Materials & Continua, Vol.86, No.1, pp. 1-19, 2026, DOI:10.32604/cmc.2025.069482
- 10 November 2025
Abstract Over the past decade, large-scale pre-trained autoregressive and diffusion models rejuvenated the field of text-guided image generation. However, these models require enormous datasets and parameters, and their multi-step generation processes are often inefficient and difficult to control. To address these challenges, we propose CAFE-GAN, a CLIP-Projected GAN with Attention-Aware Generation and Multi-Scale Discrimination, which incorporates a pre-trained CLIP model along with several key architectural innovations. First, we embed a coordinate attention mechanism into the generator to capture long-range dependencies and enhance feature representation. Second, we introduce a trainable linear projection layer after the CLIP text… More >