Special Issue "Recent Advances in Deep Learning, Information Fusion, and Features Selection for Video Surveillance Application"

Submission Deadline: 15 April 2021 (closed)
Guest Editors
Dr. Seifedine Kadry, Beirut Arab University, Lebanon.
Dr. Shuihua Wang, University of Leicester, UK.
Dr. V. Rajinikanth, St Joseph’s College, India.
Dr. Muhammad Attique Khan, HITEC University Taxila, Pakistan.


In computer vision, human action recognition, gait recognition, and gesture recognition (HARGRGR) have been important research areas over the last decade. The most important application of HARGRGR is video surveillance. As imaging techniques improve and camera hardware advances, novel approaches to HAR continue to emerge. Camera networks now capture large volumes of video of human activities, and from these activities it may be possible to predict a person's future activities. For this purpose, computer vision researchers have proposed many automated systems based on machine learning algorithms. However, how can these systems handle a large number of videos? And how can they remove redundant or irrelevant information so that only the required activities are monitored? More recently, deep learning has achieved great success in machine learning, handling large amounts of data with higher accuracy than classical techniques. Deep learning can be particularly useful for HARGRGR because it requires a large amount of training data, which camera networks can supply.

Sometimes deep learning models are trained on complex imaging datasets, and because of this complexity the required accuracy cannot be achieved. In such cases, it is possible to fuse two or more deep neural networks (layer information, features, etc.). But how does the fusion process affect the system's computational time? This problem can be resolved by employing feature reduction techniques.
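
As a rough illustration of the fusion and reduction idea above, the following minimal sketch concatenates feature vectors from two networks (serial fusion) and then keeps only the highest-variance features. It is not taken from any paper in this issue: the feature matrices are random stand-ins, the function names fuse_serial and select_top_variance are hypothetical, and variance ranking is just one simple choice among many feature reduction techniques.

```python
import numpy as np


def fuse_serial(features_a, features_b):
    """Serial fusion: concatenate per-sample feature vectors from two networks."""
    if features_a.shape[0] != features_b.shape[0]:
        raise ValueError("both feature matrices must describe the same samples")
    return np.concatenate([features_a, features_b], axis=1)


def select_top_variance(features, k):
    """Keep the k feature columns with the highest variance across samples."""
    if not 0 < k <= features.shape[1]:
        raise ValueError("k must be between 1 and the number of features")
    variances = features.var(axis=0)
    # Indices of the k largest variances, restored to original column order.
    keep = np.sort(np.argsort(variances)[-k:])
    return features[:, keep], keep


rng = np.random.default_rng(0)
# Assumed stand-ins for deep features of 8 video clips from two networks.
feats_net_a = rng.normal(size=(8, 512))
feats_net_b = rng.normal(size=(8, 256))

fused = fuse_serial(feats_net_a, feats_net_b)        # 512 + 256 = 768 columns
reduced, kept = select_top_variance(fused, k=100)    # 100 columns remain
print(fused.shape, reduced.shape)
```

Fusion grows the feature vector from 512 and 256 dimensions to 768, which is where the extra computational cost comes from; the reduction step then cuts the classifier's input back to 100 dimensions. In practice the value of k and the selection criterion would need to be tuned against recognition accuracy.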

This special issue aims to gather the achievements of deep learning, information fusion, and feature selection in the fields of action recognition, gait recognition, and gesture recognition.

• Human action recognition using deep learning for large video datasets
• Human gait recognition using deep learning
• Human gesture recognition using deep learning
• Information fusion of deep learning models for action recognition, gait recognition, and gesture recognition
• Features fusion for action recognition, gait recognition, and gesture recognition
• Features selection and action recognition
• Gesture recognition and features selection
• Gait recognition and features selection
• Gait recognition in the real-time camera network using deep learning
• Learning a deep learning model using body parts for action recognition

Published Papers

  • Deep Learning-Based Approach for Arabic Visual Speech Recognition
  Abstract Lip-reading technologies are rapidly progressing following the breakthrough of deep learning. Lip-reading plays a vital role in many applications, such as human-machine communication practices and security applications. In this paper, we propose to develop an effective lip-reading recognition model for Arabic visual speech recognition by implementing deep learning algorithms. The Arabic visual dataset that has been collected contains 2400 records of Arabic digits and 960 records of Arabic phrases from 24 native speakers. The primary purpose is to provide a high-performance model in terms of enhancing the preprocessing phase. Firstly, we extract keyframes from our dataset. Secondly, we produce…
  • Dynamic Hand Gesture Recognition Using 3D-CNN and LSTM Networks
  Abstract Recognition of dynamic hand gestures in real-time is a difficult task because the system can never know when or from where the gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN is used for the extraction of spectral and spatial features which are then given to…
  • Smart Devices Based Multisensory Approach for Complex Human Activity Recognition
  Abstract Sensor-based Human Activity Recognition (HAR) has numerous applications in eHealth, sports, fitness assessments, ambient assisted living (AAL), human-computer interaction and many more. Human physical activity can be monitored by using wearable sensors or external devices. The usage of external devices has disadvantages in terms of cost, hardware installation, storage, computational time and dependence on lighting conditions. Therefore, most researchers use smart devices such as smartphones, smart bands and watches, which contain various sensors (accelerometer, gyroscope, GPS, etc.) and adequate processing capabilities. For the task of recognition, human activities can be broadly categorized as basic and complex…
  • Fast Intra Mode Selection in HEVC Using Statistical Model
  Abstract Compression algorithms like High Efficiency Video Coding (HEVC) facilitate fast and efficient handling of multimedia content. Such algorithms involve various computation modules that help to reduce the size of content but preserve the same subjective viewing quality. However, the brute-force behavior of HEVC is the biggest hurdle in the communication of multimedia content. Therefore, a novel method is presented here to accelerate the encoding process of HEVC by making early intra mode decisions for the block. Normally, HEVC applies 35 intra modes to every block of the frame and selects the best among them based on the RD-cost…
  • Optimized Convolutional Neural Network Models for Skin Lesion Classification
  Abstract Skin cancer is one of the most severe diseases, and medical imaging is among the main tools for cancer diagnosis. The images provide information on the evolutionary stage, size, and location of tumor lesions. This paper focuses on the classification of skin lesion images considering a framework of four experiments to analyze the classification performance of Convolutional Neural Networks (CNNs) in distinguishing different skin lesions. The CNNs are based on transfer learning, taking advantage of ImageNet weights. Accordingly, in each experiment, different workflow stages are tested, including data augmentation and fine-tuning optimization. Three CNN models based on DenseNet-201, Inception-ResNet-V2, and…
  • Weapons Detection for Security and Video Surveillance Using CNN and YOLO-V5s
  Abstract In recent years, the number of gun-related incidents has exceeded 250,000 per year, and over 85% of the existing 1 billion firearms are in civilian hands. Manual monitoring has not proven effective in detecting firearms, which is why an automated weapon detection system is needed. Various automated convolutional neural network (CNN) weapon detection systems have been proposed in the past to generate good results. However, these techniques have high computation overhead and are too slow to provide the real-time detection that is essential for a weapon detection system. These models have a high rate of false negatives because they often fail…
  • Anomaly Based Camera Prioritization in Large Scale Surveillance Networks
  Abstract Digital surveillance systems are ubiquitous and continuously generate massive amounts of data, and manual monitoring is required in order to recognise human activities in public areas. Intelligent surveillance systems that can automatically identify normal and abnormal activities are highly desirable, as these would allow for efficient monitoring by selecting only those camera feeds in which abnormal activities are occurring. This paper proposes an energy-efficient camera prioritisation framework that intelligently adjusts the priority of cameras in a vast surveillance network using feedback from the activity recognition system. The proposed system addresses the limitations of existing manual monitoring surveillance systems using a…
  • Recognition and Tracking of Objects in a Clustered Remote Scene Environment
  Abstract Object recognition and tracking are two of the most dynamic research sub-areas that belong to the field of Computer Vision. Computer vision is one of the most active research fields that lies at the intersection of deep learning and machine vision. This paper presents an efficient ensemble algorithm for the recognition and tracking of fixed shape moving objects while accommodating the shift and scale invariances that the object may encounter. The first part uses the Maximum Average Correlation Height (MACH) filter for object recognition and determines the bounding box coordinates. In case the correlation based MACH filter fails, the algorithms…
  • YOLOv2PD: An Efficient Pedestrian Detection Algorithm Using Improved YOLOv2 Model
  Abstract Real-time pedestrian detection is an important task for unmanned driving systems and video surveillance. Existing pedestrian detection methods often run at low speed and lose detection accuracy on smaller and densely distributed pedestrians. Therefore, the proposed YOLOv2 ("You Only Look Once Version 2")-based pedestrian detection algorithm (referred to as YOLOv2PD) would be more suitable for detecting smaller and densely distributed pedestrians in real-time complex road scenes. The proposed YOLOv2PD algorithm adopts a Multi-layer Feature Fusion (MLFF) strategy, which helps to improve the model's feature extraction ability. In addition, one…
  • Multi-Layered Deep Learning Features Fusion for Human Action Recognition
  Abstract Human Action Recognition (HAR) has been an active research topic in machine learning for the last few decades. Visual surveillance, robotics, and pedestrian detection are the main applications for action recognition. Computer vision researchers have introduced many HAR techniques, but they still face challenges such as redundant features and the cost of computing. In this article, we propose a new method that uses deep learning for HAR. In the proposed method, video frames are initially pre-processed using a global contrast approach and later used to train a deep learning model using domain transfer learning. The ResNet-50 pre-trained model is…
  • Real-Time Violent Action Recognition Using Key Frames Extraction and Deep Learning
  Abstract Violence recognition is crucial because of its applications in activities related to security and law enforcement. Existing semi-automated systems have issues such as tedious manual surveillance, which causes human errors and makes these systems less effective. Several approaches have been proposed using trajectory-based, non-object-centric, and deep-learning-based methods. Previous studies have shown that deep learning techniques attain higher accuracy and lower error rates than those of other methods. However, their performance must be improved. This study explores the state-of-the-art deep learning architecture of convolutional neural networks (CNNs) and Inception V4 to detect and recognize violence using video data. In the…
  • Safest Route Detection via Danger Index Calculation and K-Means Clustering
  Abstract The study aims to formulate a solution for identifying the safest route between any two input geographical locations. Using the New York City dataset, which provides location-tagged crime statistics, we implemented different clustering algorithms and analysed the results comparatively to discover the best-suited one. The results reveal that the K-Means algorithm best suits our needs and delivered the best results. Moreover, a comparative analysis has been performed among various clustering techniques to obtain the best results. We compared all the achieved results and, using the conclusions, developed a user-friendly application to provide…
  • Video Analytics Framework for Human Action Recognition
  Abstract Human action recognition (HAR) is an essential but challenging task for observing human movements. This problem encompasses the observations of variations in human movement and activity identification by machine learning algorithms. This article addresses the challenges in activity recognition by implementing and experimenting with an intelligent segmentation, feature reduction and selection framework. A novel approach has been introduced for the fusion of segmented frames, and multi-level features of interest are extracted. An entropy-skewness based feature reduction technique has been implemented and the reduced features are converted into a codebook by serial-based fusion. A custom-made genetic algorithm is implemented on…
  • Convolutional Bi-LSTM Based Human Gait Recognition Using Video Sequences
  Abstract Recognition of human gait is a difficult task, particularly for unobtrusive surveillance in a video and human identification from a large distance. Therefore, a method is proposed for the classification and recognition of different types of human gait. The proposed approach consists of two phases. In phase I, a new model named convolutional bidirectional long short-term memory (Conv-BiLSTM) is proposed to classify the video frames of human gait. In this model, features are derived through a convolutional neural network (CNN) named ResNet-18 and supplied as input to the LSTM model, which provides more distinguishable temporal information. In phase II,…
