TY  - EJOU
AU  - Li, Long
AU  - Wu, Hengyang
AU  - Wang, Na

TI  - Zero-Shot Image Captioning Method Based on the Hamiltonian Monte Carlo
T2  - Journal on Artificial Intelligence

PY  - 2026
VL  - 8
IS  - 1
SN  - 2579-003X

AB  - Zero-shot learning, an emerging approach to image captioning, has attracted significant attention in recent years because it can accomplish tasks without category-specific training data. Existing zero-shot image captioning schemes largely rely on traditional language models, which exhibit low efficiency and suboptimal generation quality. To address this issue, this study proposes Hamiltonian Monte Carlo for Image Captioning (HMCIC). First, the method models image captioning as a probabilistic sampling problem in parameter space, integrating semantic matching and syntactic coherence into an energy function that guides generation toward high-quality captions. Second, it introduces momentum variables from Hamiltonian dynamics, enabling the sampler to move past local optima and explore the parameter space more smoothly and efficiently, which mitigates the “random walk” behavior common in traditional sampling. Finally, by iteratively optimizing the sampling trajectory, the generated descriptions achieve a better balance between semantic accuracy and linguistic fluency. This enables more efficient and accurate zero-shot image captioning without category-specific training. Experimental results on two public datasets show that, compared with other current zero-shot methods, the approach achieves nearly 1.5 times faster average generation speed while also improving word generation accuracy, indicating the effectiveness of the proposed method.
KW  - Zero-shot; Hamiltonian Monte Carlo; sampling algorithm; image captioning

DO  - 10.32604/jai.2026.077462