Search Results (11)
  • Open Access


    PCATNet: Position-Class Awareness Transformer for Image Captioning

    Ziwei Tang1, Yaohua Yi2,*, Changhui Yu2, Aiguo Yin3

    CMC-Computers, Materials & Continua, Vol.75, No.3, pp. 6007-6022, 2023, DOI:10.32604/cmc.2023.037861

    Abstract Existing image captioning models usually build the relation between visual information and words to generate captions, which lack spatial information and object classes. To address the issue, we propose a novel Position-Class Awareness Transformer (PCAT) network which can serve as a bridge between the visual features and captions by embedding spatial information and awareness of object classes. In our proposal, we construct our PCAT network by proposing a novel Grid Mapping Position Encoding (GMPE) method and refining the encoder-decoder framework. First, GMPE includes mapping the regions of objects to grids, calculating the relative distance among objects and quantization. Meanwhile, we…

  • Open Access


    Fine-Grained Features for Image Captioning

    Mengyue Shao1, Jie Feng1,*, Jie Wu1, Haixiang Zhang1, Yayu Zheng2

    CMC-Computers, Materials & Continua, Vol.75, No.3, pp. 4697-4712, 2023, DOI:10.32604/cmc.2023.036564

    Abstract Image captioning involves two different major modalities (image and sentence) that convert a given image into a language that adheres to visual semantics. Almost all methods first extract image features to reduce the difficulty of visual semantic embedding and then use the caption model to generate fluent sentences. The Convolutional Neural Network (CNN) is often used to extract image features in image captioning, and the use of object detection networks to extract region features has achieved great success. However, the region features retrieved by this method are object-level and do not pay attention to fine-grained details because of the detection…

  • Open Access


    Enhanced Image Captioning Using Features Concatenation and Efficient Pre-Trained Word Embedding

    Samar Elbedwehy1,3,*, T. Medhat2, Taher Hamza3, Mohammed F. Alrahmawy3

    Computer Systems Science and Engineering, Vol.46, No.3, pp. 3637-3652, 2023, DOI:10.32604/csse.2023.038376

    Abstract One of the issues in Computer Vision is the automatic development of descriptions for images, sometimes known as image captioning. Deep Learning techniques have made significant progress in this area. The typical architecture of image captioning systems consists mainly of an image feature extractor subsystem followed by a caption generation lingual subsystem. This paper aims to find optimized models for these two subsystems. For the image feature extraction subsystem, the research tested eight different concatenations of pairs of vision models to get among them the most expressive extracted feature vector of the image. For the caption generation lingual subsystem, this…

  • Open Access


    Red Deer Optimization with Artificial Intelligence Enabled Image Captioning System for Visually Impaired People

    Anwer Mustafa Hilal1,*, Fadwa Alrowais2, Fahd N. Al-Wesabi3, Radwa Marzouk4,5

    Computer Systems Science and Engineering, Vol.46, No.2, pp. 1929-1945, 2023, DOI:10.32604/csse.2023.035529

    Abstract The problem of producing a natural language description of an image for describing the visual content has gained more attention in natural language processing (NLP) and computer vision (CV). It can be driven by applications like image retrieval or indexing, virtual assistants, image understanding, and support of visually impaired people (VIP). Though the VIP uses other senses, touch and hearing, for recognizing objects and events, the quality of life of those persons is lower than the standard level. Automatic Image captioning generates captions that will be read loudly to the VIP, thereby realizing matters happening around them. This article introduces…

  • Open Access


    Natural Language Processing with Optimal Deep Learning-Enabled Intelligent Image Captioning System

    Radwa Marzouk1, Eatedal Alabdulkreem2, Mohamed K. Nour3, Mesfer Al Duhayyim4,*, Mahmoud Othman5, Abu Sarwar Zamani6, Ishfaq Yaseen6, Abdelwahed Motwakel6

    CMC-Computers, Materials & Continua, Vol.74, No.2, pp. 4435-4451, 2023, DOI:10.32604/cmc.2023.033091

    Abstract The recent developments in Multimedia Internet of Things (MIoT) devices, empowered with Natural Language Processing (NLP) model, seem to be a promising future of smart devices. It plays an important role in industrial models such as speech understanding, emotion detection, home automation, and so on. If an image needs to be captioned, then the objects in that image, its actions and connections, and any silent feature that remains under-projected or missing from the images should be identified. The aim of the image captioning process is to generate a caption for image. In next step, the image should be provided with…

  • Open Access


    Oppositional Harris Hawks Optimization with Deep Learning-Based Image Captioning

    V. R. Kavitha1, K. Nimala2, A. Beno3, K. C. Ramya4, Seifedine Kadry5, Byeong-Gwon Kang6, Yunyoung Nam7,*

    Computer Systems Science and Engineering, Vol.44, No.1, pp. 579-593, 2023, DOI:10.32604/csse.2023.024553

    Abstract Image Captioning is an emergent topic of research in the domain of artificial intelligence (AI). It utilizes an integration of Computer Vision (CV) and Natural Language Processing (NLP) for generating the image descriptions. It finds use in several application areas namely recommendation in editing applications, utilization in virtual assistance, etc. The development of NLP and deep learning (DL) models find useful to derive a bridge among the visual details and textual semantics. In this view, this paper introduces an Oppositional Harris Hawks Optimization with Deep Learning based Image Captioning (OHHO-DLIC) technique. The OHHO-DLIC technique involves the design of distinct levels…

  • Open Access


    Image Captioning Using Detectors and Swarm Based Learning Approach for Word Embedding Vectors

    B. Lalitha1,*, V. Gomathi2

    Computer Systems Science and Engineering, Vol.44, No.1, pp. 173-189, 2023, DOI:10.32604/csse.2023.024118

    Abstract IC (Image Captioning) is a crucial part of Visual Data Processing and aims at understanding for providing captions that verbalize an image’s important elements. However, in existing works, because of the complexity in images, neglecting major relation between the object in an image, poor quality image, labelling it remains a big problem for researchers. Hence, the main objective of this work attempts to overcome these challenges by proposing a novel framework for IC. So in this research work the main contribution deals with the framework consists of three phases that is image understanding, textual understanding and decoding. Initially, the image…

  • Open Access


    Efficient Image Captioning Based on Vision Transformer Models

    Samar Elbedwehy1,*, T. Medhat2, Taher Hamza3, Mohammed F. Alrahmawy3

    CMC-Computers, Materials & Continua, Vol.73, No.1, pp. 1483-1500, 2022, DOI:10.32604/cmc.2022.029313

    Abstract Image captioning is an emerging field in machine learning. It refers to the ability to automatically generate a syntactically and semantically meaningful sentence that describes the content of an image. Image captioning requires a complex machine learning process as it involves two sub models: a vision sub-model for extracting object features and a language sub-model that use the extracted features to generate meaningful captions. Attention-based vision transformers models have a great impact in vision field recently. In this paper, we studied the effect of using the vision transformers on the image captioning process by evaluating the use of four different…

  • Open Access


    Low Complexity Encoder with Multilabel Classification and Image Captioning Model

    Mahmoud Ragab1,2,3,*, Abdullah Addas4

    CMC-Computers, Materials & Continua, Vol.72, No.3, pp. 4323-4337, 2022, DOI:10.32604/cmc.2022.026602

    Abstract Due to the advanced development in the multimedia-on-demand traffic in different forms of audio, video, and images, has extremely moved on the vision of the Internet of Things (IoT) from scalar to Internet of Multimedia Things (IoMT). Since Unmanned Aerial Vehicles (UAVs) generates a massive quantity of the multimedia data, it becomes a part of IoMT, which are commonly employed in diverse application areas, especially for capturing remote sensing (RS) images. At the same time, the interpretation of the captured RS image also plays a crucial issue, which can be addressed by the multi-label classification and Computational Linguistics based image…

  • Open Access


    A Position-Aware Transformer for Image Captioning

    Zelin Deng1,*, Bo Zhou1, Pei He2, Jianfeng Huang3, Osama Alfarraj4, Amr Tolba4,5

    CMC-Computers, Materials & Continua, Vol.70, No.1, pp. 2065-2081, 2022, DOI:10.32604/cmc.2022.019328

    Abstract Image captioning aims to generate a corresponding description of an image. In recent years, neural encoder-decoder models have been the dominant approaches, in which the Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) are used to translate an image into a natural language description. Among these approaches, the visual attention mechanisms are widely used to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. However, most conventional visual attention mechanisms are based on high-level image features, ignoring the effects of other image features, and giving insufficient consideration to the relative positions between image features.…

Displaying 1-10 on page 1 of 11.