Home / Advanced Search

  • Title/Keywords

  • Author/Affliations

  • Journal

  • Article Type

  • Start Year

  • End Year

Update SearchingClear
  • Articles
  • Online
Search Results (1)
  • Open Access

    ARTICLE

    PCATNet: Position-Class Awareness Transformer for Image Captioning

    Ziwei Tang1, Yaohua Yi2,*, Changhui Yu2, Aiguo Yin3

    CMC-Computers, Materials & Continua, Vol.75, No.3, pp. 6007-6022, 2023, DOI:10.32604/cmc.2023.037861

    Abstract Existing image captioning models usually build the relation between visual information and words to generate captions, which lack spatial information and object classes. To address the issue, we propose a novel Position-Class Awareness Transformer (PCAT) network which can serve as a bridge between the visual features and captions by embedding spatial information and awareness of object classes. In our proposal, we construct our PCAT network by proposing a novel Grid Mapping Position Encoding (GMPE) method and refining the encoder-decoder framework. First, GMPE includes mapping the regions of objects to grids, calculating the relative distance among objects and quantization. Meanwhile, we… More >

Displaying 1-10 on page 1 of 1. Per Page