Mo Hou1,2,3,#,*, Bin Xu4,#, Wen Shang1,2,3
CMC-Computers, Materials & Continua, Vol.86, No.2, pp. 1-19, 2026, DOI:10.32604/cmc.2025.071282
- 09 December 2025
Abstract Image captioning, a pivotal research area at the intersection of image understanding, artificial intelligence, and linguistics, aims to generate natural language descriptions for images. This paper proposes an efficient image captioning model named Mob-IMWTC, which integrates improved wavelet convolution (IMWTC) with an enhanced MobileNet V3 architecture. The enhanced MobileNet V3 integrates a transformer encoder as its encoding module and a transformer decoder as its decoding module. This innovative neural network significantly reduces the memory space required and model training time, while maintaining a high level of accuracy in generating image descriptions. IMWTC facilitates large receptive… More >