Search Results (11)
  • Open Access

    ARTICLE

    PCATNet: Position-Class Awareness Transformer for Image Captioning

    Ziwei Tang1, Yaohua Yi2,*, Changhui Yu2, Aiguo Yin3

    CMC-Computers, Materials & Continua, Vol.75, No.3, pp. 6007-6022, 2023, DOI:10.32604/cmc.2023.037861

    Abstract Existing image captioning models usually build the relation between visual information and words to generate captions, which lack spatial information and object classes. To address the issue, we propose a novel Position-Class Awareness Transformer (PCAT) network which can serve as a bridge between the visual features and captions by embedding spatial information and awareness of object classes. In our proposal, we construct our PCAT network by proposing a novel Grid Mapping Position Encoding (GMPE) method and refining the encoder-decoder framework. First, GMPE includes mapping the regions of objects to grids, calculating the relative distance among objects and quantization. Meanwhile, we…
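    The abstract's general recipe (map object regions to grid cells, compute pairwise relative distances, quantize them) can be sketched minimally. This is an illustrative sketch of the generic idea only, not the paper's actual GMPE; the function name, `grid_size`, and `num_buckets` are hypothetical choices not taken from the paper.

    ```python
    import math

    def grid_relative_distances(boxes, grid_size=8, num_buckets=16):
        """Map boxes (x1, y1, x2, y2, normalized to [0, 1]) to grid cells,
        then quantize pairwise center distances into discrete buckets."""
        # Assign each box's center to a cell on a grid_size x grid_size grid.
        centers = []
        for x1, y1, x2, y2 in boxes:
            cx = min(int((x1 + x2) / 2 * grid_size), grid_size - 1)
            cy = min(int((y1 + y2) / 2 * grid_size), grid_size - 1)
            centers.append((cx, cy))
        # Quantize pairwise Euclidean grid distances into bucket ids, which
        # could then index a learned relative-position embedding table.
        max_dist = math.hypot(grid_size - 1, grid_size - 1)
        ids = [[0] * len(boxes) for _ in boxes]
        for i, (xi, yi) in enumerate(centers):
            for j, (xj, yj) in enumerate(centers):
                d = math.hypot(xi - xj, yi - yj)
                ids[i][j] = min(int(d / max_dist * num_buckets), num_buckets - 1)
        return ids
    ```

    For two boxes at opposite corners of the image, the pair lands in a high-distance bucket while each box's distance to itself is bucket 0; a transformer could add the corresponding embedding to its attention logits.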

  • Open Access

    ARTICLE

    Fine-Grained Features for Image Captioning

    Mengyue Shao1, Jie Feng1,*, Jie Wu1, Haixiang Zhang1, Yayu Zheng2

    CMC-Computers, Materials & Continua, Vol.75, No.3, pp. 4697-4712, 2023, DOI:10.32604/cmc.2023.036564

    Abstract Image captioning involves two major modalities (image and sentence) and converts a given image into language that adheres to its visual semantics. Almost all methods first extract image features to reduce the difficulty of visual semantic embedding and then use a caption model to generate fluent sentences. Convolutional Neural Networks (CNNs) are often used to extract image features in image captioning, and the use of object detection networks to extract region features has achieved great success. However, the region features retrieved by this method are object-level and do not capture fine-grained details because of the detection…

  • Open Access

    ARTICLE

    Enhanced Image Captioning Using Features Concatenation and Efficient Pre-Trained Word Embedding

    Samar Elbedwehy1,3,*, T. Medhat2, Taher Hamza3, Mohammed F. Alrahmawy3

    Computer Systems Science and Engineering, Vol.46, No.3, pp. 3637-3652, 2023, DOI:10.32604/csse.2023.038376

    Abstract One of the issues in Computer Vision is the automatic generation of descriptions for images, sometimes known as image captioning. Deep Learning techniques have made significant progress in this area. The typical architecture of image captioning systems consists mainly of an image feature extractor subsystem followed by a caption-generating lingual subsystem. This paper aims to find optimized models for these two subsystems. For the image feature extraction subsystem, the research tested eight different concatenations of pairs of vision models to identify the one yielding the most expressive extracted feature vector of the image. For the caption generation lingual subsystem, this…

  • Open Access

    ARTICLE

    Red Deer Optimization with Artificial Intelligence Enabled Image Captioning System for Visually Impaired People

    Anwer Mustafa Hilal1,*, Fadwa Alrowais2, Fahd N. Al-Wesabi3, Radwa Marzouk4,5

    Computer Systems Science and Engineering, Vol.46, No.2, pp. 1929-1945, 2023, DOI:10.32604/csse.2023.035529

    Abstract The problem of producing a natural language description of an image to describe its visual content has gained more attention in natural language processing (NLP) and computer vision (CV). It is driven by applications like image retrieval or indexing, virtual assistants, image understanding, and support of visually impaired people (VIP). Though VIPs use other senses, such as touch and hearing, to recognize objects and events, their quality of life is below the standard level. Automatic image captioning generates captions that can be read aloud to VIPs, making them aware of what is happening around them. This article introduces…

  • Open Access

    ARTICLE

    Natural Language Processing with Optimal Deep Learning-Enabled Intelligent Image Captioning System

    Radwa Marzouk1, Eatedal Alabdulkreem2, Mohamed K. Nour3, Mesfer Al Duhayyim4,*, Mahmoud Othman5, Abu Sarwar Zamani6, Ishfaq Yaseen6, Abdelwahed Motwakel6

    CMC-Computers, Materials & Continua, Vol.74, No.2, pp. 4435-4451, 2023, DOI:10.32604/cmc.2023.033091

    Abstract The recent developments in Multimedia Internet of Things (MIoT) devices, empowered with Natural Language Processing (NLP) models, point to a promising future for smart devices. NLP plays an important role in industrial models such as speech understanding, emotion detection, home automation, and so on. If an image needs to be captioned, then the objects in that image, their actions and connections, and any salient feature that remains under-projected or missing from the image should be identified. The aim of the image captioning process is to generate a caption for an image. In the next step, the image should be provided with…

  • Open Access

    ARTICLE

    Oppositional Harris Hawks Optimization with Deep Learning-Based Image Captioning

    V. R. Kavitha1, K. Nimala2, A. Beno3, K. C. Ramya4, Seifedine Kadry5, Byeong-Gwon Kang6, Yunyoung Nam7,*

    Computer Systems Science and Engineering, Vol.44, No.1, pp. 579-593, 2023, DOI:10.32604/csse.2023.024553

    Abstract Image Captioning is an emergent research topic in the domain of artificial intelligence (AI). It integrates Computer Vision (CV) and Natural Language Processing (NLP) to generate image descriptions. It finds use in several application areas, such as recommendation in editing applications and virtual assistance. Developments in NLP and deep learning (DL) models are useful for building a bridge between visual details and textual semantics. In this view, this paper introduces an Oppositional Harris Hawks Optimization with Deep Learning based Image Captioning (OHHO-DLIC) technique. The OHHO-DLIC technique involves the design of distinct levels…

  • Open Access

    ARTICLE

    Image Captioning Using Detectors and Swarm Based Learning Approach for Word Embedding Vectors

    B. Lalitha1,*, V. Gomathi2

    Computer Systems Science and Engineering, Vol.44, No.1, pp. 173-189, 2023, DOI:10.32604/csse.2023.024118

    Abstract IC (Image Captioning) is a crucial part of visual data processing and aims to understand an image and provide captions that verbalize its important elements. However, existing works struggle with the complexity of images, neglected relations between objects in an image, poor image quality, and labelling, which remain big problems for researchers. Hence, the main objective of this work is to overcome these challenges by proposing a novel framework for IC. The main contribution is a framework consisting of three phases: image understanding, textual understanding, and decoding. Initially, the image…

  • Open Access

    ARTICLE

    Efficient Image Captioning Based on Vision Transformer Models

    Samar Elbedwehy1,*, T. Medhat2, Taher Hamza3, Mohammed F. Alrahmawy3

    CMC-Computers, Materials & Continua, Vol.73, No.1, pp. 1483-1500, 2022, DOI:10.32604/cmc.2022.029313

    Abstract Image captioning is an emerging field in machine learning. It refers to the ability to automatically generate a syntactically and semantically meaningful sentence that describes the content of an image. Image captioning requires a complex machine learning process, as it involves two sub-models: a vision sub-model for extracting object features and a language sub-model that uses the extracted features to generate meaningful captions. Attention-based vision transformer models have recently had a great impact on the vision field. In this paper, we studied the effect of using vision transformers in the image captioning process by evaluating the use of four different…

  • Open Access

    ARTICLE

    Low Complexity Encoder with Multilabel Classification and Image Captioning Model

    Mahmoud Ragab1,2,3,*, Abdullah Addas4

    CMC-Computers, Materials & Continua, Vol.72, No.3, pp. 4323-4337, 2022, DOI:10.32604/cmc.2022.026602

    Abstract The rapid development of multimedia-on-demand traffic, in different forms of audio, video, and images, has shifted the vision of the Internet of Things (IoT) from scalar data to the Internet of Multimedia Things (IoMT). Since Unmanned Aerial Vehicles (UAVs) generate a massive quantity of multimedia data, they have become a part of IoMT; they are commonly employed in diverse application areas, especially for capturing remote sensing (RS) images. At the same time, the interpretation of the captured RS images is also a crucial issue, which can be addressed by multi-label classification and Computational Linguistics based image…

  • Open Access

    ARTICLE

    A Position-Aware Transformer for Image Captioning

    Zelin Deng1,*, Bo Zhou1, Pei He2, Jianfeng Huang3, Osama Alfarraj4, Amr Tolba4,5

    CMC-Computers, Materials & Continua, Vol.70, No.1, pp. 2065-2081, 2022, DOI:10.32604/cmc.2022.019328

    Abstract Image captioning aims to generate a corresponding description of an image. In recent years, neural encoder-decoder models have been the dominant approaches, in which the Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) are used to translate an image into a natural language description. Among these approaches, the visual attention mechanisms are widely used to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. However, most conventional visual attention mechanisms are based on high-level image features, ignoring the effects of other image features, and giving insufficient consideration to the relative positions between image features…

Displaying results 1-10 of 11 (page 1).