Search Results (2)
  • Open Access

    ARTICLE

    MVCE-Net: Multi-View Region Feature and Caption Enhancement Co-Attention Network for Visual Question Answering

    Feng Yan1, Wushouer Silamu2, Yanbing Li1,*

    CMC-Computers, Materials & Continua, Vol.76, No.1, pp. 65-80, 2023, DOI:10.32604/cmc.2023.038177

Abstract Visual question answering (VQA) requires a deep understanding of images and their corresponding textual questions in order to answer questions about images accurately. However, existing models tend to ignore the implicit knowledge in images and focus only on their visual information, which limits the depth at which image content is understood. Images contain more than just visual objects: some contain textual information about the scene, and slightly more complex images contain relationships between individual visual objects. Firstly, this paper proposes a model that uses image descriptions for feature enhancement. This model encodes images and their descriptions separately…
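The abstract describes a co-attention network in which region features and caption features attend to each other. As an illustrative sketch only (not MVCE-Net's actual implementation, and with hypothetical feature dimensions), a minimal dot-product co-attention between the two modalities can be written as:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(regions, captions):
    """Toy co-attention: each modality attends to the other via one affinity matrix.

    regions:  (m, d) image region features
    captions: (n, d) caption token features
    """
    # Scaled dot-product affinity between every region and every caption token.
    affinity = regions @ captions.T / np.sqrt(regions.shape[1])   # (m, n)
    # Each region aggregates caption tokens; each token aggregates regions.
    attended_regions = softmax(affinity, axis=1) @ captions        # (m, d)
    attended_captions = softmax(affinity.T, axis=1) @ regions      # (n, d)
    return attended_regions, attended_captions

rng = np.random.default_rng(0)
R = rng.standard_normal((36, 64))   # e.g. 36 detected regions, 64-dim (assumed)
C = rng.standard_normal((12, 64))   # e.g. 12 caption tokens, 64-dim (assumed)
ar, ac = co_attention(R, C)
print(ar.shape, ac.shape)  # (36, 64) (12, 64)
```

The attended outputs are caption-enhanced region features and region-grounded caption features; a real model would add learned projections and fuse both streams before the answer classifier.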

  • Open Access

    ARTICLE

    Fine-Grained Features for Image Captioning

    Mengyue Shao1, Jie Feng1,*, Jie Wu1, Haixiang Zhang1, Yayu Zheng2

    CMC-Computers, Materials & Continua, Vol.75, No.3, pp. 4697-4712, 2023, DOI:10.32604/cmc.2023.036564

Abstract Image captioning involves two different major modalities (image and sentence) and converts a given image into language that adheres to its visual semantics. Almost all methods first extract image features to reduce the difficulty of visual semantic embedding and then use a caption model to generate fluent sentences. Convolutional Neural Networks (CNNs) are often used to extract image features in image captioning, and the use of object detection networks to extract region features has achieved great success. However, the region features retrieved in this way are object-level and do not attend to fine-grained details because of the detection…
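The abstract contrasts object-level region features with fine-grained detail. As a hedged sketch (not this paper's method; the feature-map size, box coordinates, and `roi_pool` helper are all hypothetical), pooling a detected region into a single vector versus a finer grid illustrates the difference:

```python
import numpy as np

def roi_pool(fmap, box, out=1):
    """Average-pool a region of a CNN feature map into an out x out grid.

    out=1 yields one object-level vector; larger out keeps finer spatial detail.
    fmap: (H, W, d) feature map; box = (y0, y1, x0, x1) in feature-map coords.
    """
    y0, y1, x0, x1 = box
    region = fmap[y0:y1, x0:x1]                  # (h, w, d) cropped region
    h, w = region.shape[:2]
    ys = np.array_split(np.arange(h), out)       # split rows into `out` bins
    xs = np.array_split(np.arange(w), out)       # split cols into `out` bins
    cells = [region[np.ix_(yi, xi)].mean(axis=(0, 1)) for yi in ys for xi in xs]
    return np.stack(cells)                       # (out*out, d)

fmap = np.random.default_rng(1).standard_normal((14, 14, 32))
coarse = roi_pool(fmap, (2, 10, 3, 12), out=1)   # object-level: one vector
fine = roi_pool(fmap, (2, 10, 3, 12), out=2)     # finer 2x2 grid of vectors
print(coarse.shape, fine.shape)  # (1, 32) (4, 32)
```

The coarse pooling collapses everything inside the box into a single mean vector, which is exactly why object-level features lose fine-grained detail; the finer grid preserves sub-region structure at the cost of more features per object.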
