Rapid progress in deep machine learning has become a key factor in addressing many of humanity's future challenges. Vision-based target detection and object classification in particular have improved with the development of deep learning algorithms. In autonomous driving, multi-sensor data fusion is a prerequisite preprocessing task that enables precise, well-engineered, and complete detection of objects, scenes, and events. The goal of the present study is to develop an in-vehicle information system that prevents, or at least mitigates, traffic problems related to parking detection and traffic congestion. We address these problems by (1) extracting regions of interest from the images, (2) detecting vehicles with instance segmentation, and (3) building a deep learning model on the key features obtained from the input parking images. We build a deep machine learning pipeline that collects real video feeds from vision sensors and predicts free parking spaces. Image augmentation was performed using edge detection, cropping, rotation, thresholding, resizing, and color augmentation to predict bounding-box regions. We propose a deep convolutional neural network, F-MTCNN, that is compiled, trained, validated, and tested on parking video frames captured by camera. Employed on the publicly available PKLot parking dataset, the optimized model achieved an accuracy of 97.6%, higher than previously reported methods. The article also presents mathematical and simulation results for smart parking space detection using state-of-the-art deep learning technologies; the results were verified using the Python, TensorFlow, and OpenCV frameworks.
The Internet has fundamentally changed our lives: the way we connect, the way we conduct business locally and globally, and the way we move and travel. The Internet of Things (IoT) is a multi-dimensional, expanding network of interconnected devices, networks, people, and valuable things that are provided with radio frequency identification (RFID) and the ability to publish data over a smart cloud without requiring human-to-human interaction. Artificial intelligence, the Internet of Things, and big data analytics are widely hyped today and will remain central to solving future transportation and other challenges. These ecosystem technologies are in a period of escalating growth in the military, government, and commercial spheres, and are creating leading jobs to monitor and leverage these advanced developments. According to the world health organization (WHO) [
The ecosystems of connected devices perform most data preprocessing tasks without human intervention, although humans may also interact with these devices, set them up, and give instructions to actuators. The data-linking, networking, and communication protocols used with these IoT-enabled devices mostly depend on open IoT APIs deployed on mobile devices by [
Recent results in self-driving vehicle systems show improved object identification. Autonomous vehicle modalities and related innovations are currently one of the most active application areas in the AI/ML research community, and they benefit greatly from advanced technologies such as image augmentation, virtual reality, augmented reality, semantic segmentation, and explainable AI techniques. However, with the advent of XR-AI, we may be one step closer to making machines accountable for, and able to reason about, their actions in the same manner that humans do [
The rest of the paper is organized as follows. In Section 2 we discuss the general concepts of vision-based technologies and look at different families of models, such as the support vector machine (SVM), filter-based object recognition and classification, and optimized neural network models. In Section 3 we discuss machine learning techniques that address a range of object detection problems, including deep convolutional neural networks (D-CNN), region-based convolutional neural networks (R-CNN), Inception variants, and Mask R-CNN. In Section 4 the techniques are compared in terms of their benefits and trade-offs using statistical metrics, and we show how connected vehicle systems can be improved with the proposed deep learning F-MTCNN model for autonomous parking-lot detection. Finally, in Section 5 we conclude by reviewing these methods and discussing whether they provide workable solutions in the subject area of intelligent transportation systems (ITS).
Video cameras are installed throughout the city on poles and lampposts and fixed to walls, always vertical to the ground. These mounted cameras enable surveillance as well as smart parking-lot monitoring along roadsides, reporting how many vehicles are present in the parking areas. Smart parking solutions with multi-sensor data fusion work much as the human senses of vision, hearing, and touch work together to help us navigate and understand local or global position. These sensor suites perform reasoning over multimodal data in AI-based ML systems, as described in the following
A comprehensive advanced driver-assistance system (ADAS) gathers data from multiple sensors and decides whether to plan and take control decisions based on a data-driven approach, which could yield new solutions to current challenges. Besides cameras, ADASs use sensors such as light detection and ranging (LiDAR), GPS, RADAR, and inertial measurement units (IMUs), and more recently video cameras for data collection in ego-vehicles. An ADAS can communicate with external wireless network devices, satellites, or global positioning systems to help the driver with alternative route planning and real-time information sharing, as proposed by Kubler et al. [
To detect the presence of vehicles in parking areas, 360° cameras covering a range of roughly 200 m or more of the surroundings are used to detect available parking spaces and potentially guide the vehicle to them autonomously. Embedded cameras also support multi-streaming, extending their functionality from outdoor and indoor parking solutions to surveillance and security. These cameras play an important role in such applications because they are inexpensive, easy to install, and easy to maintain. Closed-circuit television (CCTV) cameras make it possible to monitor open areas without other, more expensive sensors. In-vehicle smart cameras provide blind-spot detection, 3D object mapping, localization, and other proactive safety measures delivered autonomously through end-to-end connectivity.
LiDAR technology detects and ranges obstacles in the surrounding environment; the sensor can be fixed (e.g., mounted on a pole) or mounted on a moving vehicle, and it operates on the time-of-flight principle. LiDAR emits infrared light beams and is better at detecting smaller objects at close range (such as obstacles, bicyclists, and other nearby objects). The interpretation of LiDAR data generally involves perception and localization within the planned route area. Radio detection and ranging (RADAR) together with a front-mounted camera provides enough information to analyze the road ahead of the car, detecting road signs, traffic lights, and other objects much as a human eye perceives them, with a rear-mounted radar as well. Each vehicle carries an onboard GPS tracker to support vehicle tracking and location identification with services such as Google Maps and Apple Maps.
Given the strengths and limitations of the preceding sensor suites, researchers take multi-sensor data integration approaches in which synchronization, configuration, and calibration of key features have become essential for accurate, reliable autonomous driving performance. These sensors augment geographic information system (GIS) and global positioning system (GPS) sensing technologies to track the vehicle accurately and precisely.
Shivappa et al. described numerous multimodal data fusion techniques in an excellent survey. An example of late fusion, in which a camera and a lidar detect objects separately and their outputs are combined to produce an effective result, is illustrated in
Today, all leading automobile companies are developing algorithms for self-driving vehicles. Autonomous cars (driverless cars, drones, autonomous robots, and the like) are a reality and are attracting attention around the world. Autonomous robots can perform major human-like decision-making abilities, much like a conventional car driver. Autonomous cars are equipped with smart cameras, GPS, LiDAR, LADAR, and other advanced sensor technologies. The software powering Tesla Motors, General Motors (GM), and Waymo (formerly Google's self-driving vehicle project, known as Google Chauffeur) is well advanced, and Uber was recently allowed to test self-driving cars without a steering wheel and pedals on public roads, as described by Krompier [
Seo et al. [
The Chu et al. [
Liu et al. [
Luo et al. [
In this paper, the authors described a deep learning stacked network framework with high classification capability that encodes the key features of input data streams; the implementation requires graphical processing unit (GPU) devices. Pre-trained models extracted as IF-THEN rules over the network's input signal flow yield high classification capability. The resulting neural network model is designed as a deep belief neural network (DBNN) with effective computational speed [
Open-source programs developed by the ROVIS research group integrate vision-based perception applications that sense roadside objects using pre-trained machine learning models. The proposed system provides comprehensive development steps for object detection, mapping, and localization using a smart 360° camera setup, data stream preprocessing with noise filtering, labeling, semantic segmentation, and object recognition, along with 3D scene reconstruction [
An integrated self-diagnosing system (ISDS) for an autonomous agent is based on IoT gateways and model transfer-learning techniques. Connected vehicles detect traffic patterns, find available parking lots, and assist the driver through SMS alerts, variable message signs, or the dashboard display, as described by Frost et al. [
In this article, the researchers can monitor their working framework from anywhere on the planet through GPS-based connectivity integrated with other existing frameworks. A smart automation system utilizes large-scale computing hardware and software resources with remote communication, so that mobile apps can provide proactive accident warnings and early parking solutions that reduce overall traffic congestion and increase awareness in emergency detection [
In this study, the proposed model performs autonomous vehicle classification, parking space detection, and vehicle counting from sensory input data by bringing the sensor readings into a mutually synchronized framework. Precise calibration of key features with dimensionality reduction techniques is critical for optimum performance. The design serves as the prerequisite for data preprocessing, fusion of the data with the deep neural network, and the use of pretrained transfer-learning models.
Data or image augmentation is a technique for semantically generating more training data for image classification. Growing the training dataset in this way can remove the issue of model overfitting observed in the results. For a particular image class, new scene images can easily be created by rotating, cropping, resizing, or color-augmenting the original images. Overfitting can be further mitigated by injecting noise into the network weights, treated as a hyperparameter alongside image augmentation [
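The augmentation operations described above (flipping, rotating, cropping, and color shifts) can be sketched with a few lines of numpy. This is a minimal illustrative version, not the paper's actual pipeline; production code would more likely use OpenCV or tf.keras preprocessing layers.

```python
import numpy as np

def augment(image, rng):
    """Generate simple augmented variants of one input frame (H, W, C).
    numpy-only sketch: flip, 90-degree rotation, random crop, brightness shift."""
    variants = []
    variants.append(np.fliplr(image))    # horizontal flip
    variants.append(np.rot90(image))     # 90-degree rotation
    h, w = image.shape[:2]
    top, left = rng.integers(0, h // 4), rng.integers(0, w // 4)
    variants.append(image[top:top + 3 * h // 4, left:left + 3 * w // 4])  # random crop
    bright = np.clip(image.astype(np.int16) + 40, 0, 255).astype(np.uint8)
    variants.append(bright)              # brightness/color augmentation
    return variants

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 96, 3), dtype=np.uint8)  # stand-in frame
augmented = augment(frame, rng)
print(len(augmented))  # 4 variants per input frame
```

Applying such operations randomly to every frame multiplies the effective training set size without collecting new video.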
All these operations on the input images generate a much larger training dataset from the original input frames, which makes the final trained model more accurate and precise when the operations are applied randomly. The PKLot dataset aims to democratize access to thousands of parked-vehicle images and foster innovation in higher-level autonomy functions for everyone, everywhere.
The support vector machine (SVM) is a standard classifier that works optimally on linearly separable image datasets, but not on datasets that are not linearly separable. In that case it first transforms the features into a higher-dimensional space in which the margin between the two classes can be maximized. This problem is handled by the "kernel trick", a method that returns the dot product of the mapped points directly in the feature space, so that each data point is implicitly mapped into a higher-dimensional vector without computing the transformation explicitly [
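The kernel trick can be verified numerically. For a hypothetical 2-D input, the degree-2 polynomial kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z})^2$ equals the dot product of the explicit feature map $\phi(v) = (v_1^2, \sqrt{2}\,v_1 v_2, v_2^2)$, yet never constructs that 3-D space:

```python
import numpy as np

def phi(v):
    """Explicit degree-2 polynomial feature map for a 2-D vector."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

def poly_kernel(x, z):
    """Kernel trick: the feature-space dot product computed directly
    in the original 2-D space, without ever building phi."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
# Both values are (1*3 + 2*0.5)^2 = 16, up to floating-point rounding.
print(poly_kernel(x, z), phi(x) @ phi(z))
```

The same identity is what lets an SVM maximize the margin in a high-dimensional (even infinite-dimensional, for the RBF kernel) space at the cost of a kernel evaluation per pair of points.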
The support vector machine mathematical model can be described as follows. A separating line in two dimensions is
$$y = ax + b,$$
where $a$ is the slope of the line and $b$ is a constant. Rearranging the above equation, we get
$$ax - y + b = 0.$$
In vector notation this becomes
$$\mathbf{w} \cdot \mathbf{x} + b = 0,$$
where $\mathbf{w} = (a, -1)$ and $\mathbf{x} = (x, y)$. The magnitude of a vector such as $\mathbf{w} = (w_1, w_2)$ is given by the Euclidean norm; in Cartesian form,
$$\|\mathbf{w}\| = \sqrt{w_1^2 + w_2^2}.$$
As we know, the inner product of two vectors of unit length returns the cosine of the angle between them; in general,
$$\mathbf{w} \cdot \mathbf{x} = \|\mathbf{w}\|\,\|\mathbf{x}\|\cos\theta.$$
We know that $\mathbf{x} = (x_1, x_2)$ and $\mathbf{w} = (w_1, w_2)$ are two points in the $xy$-plane. From the above equation, putting in the value of $\cos\theta$, we get the component form
$$\mathbf{w} \cdot \mathbf{x} = w_1 x_1 + w_2 x_2,$$
which for $n$-dimensional vectors generalizes to $\mathbf{w} \cdot \mathbf{x} = \sum_{i=1}^{n} w_i x_i$. The classification (hypothesis) function takes only two possible values, $+1$ or $-1$:
$$h(\mathbf{x}_i) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 0, \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b < 0. \end{cases}$$
For training, the whole dataset $D$ consists of input-label pairs, such that
$$D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}, \quad y_i \in \{-1, +1\}.$$
The functional margin of the dataset is $F = \min_i y_i(\mathbf{w} \cdot \mathbf{x}_i + b)$. The optimal (maximum-margin) hyperplane is found by minimizing $\frac{1}{2}\|\mathbf{w}\|^2$ subject to $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$ for all $i$; this constrained weight optimization is evaluated by the Lagrange multiplier method:
$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right].$$
Expanding the last equation and setting the partial derivatives with respect to $\mathbf{w}$ and $b$ to zero gives
$$\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0.$$
Substituting these back into the Lagrangian function yields the dual problem
$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j),$$
subject to $\alpha_i \ge 0$ and $\sum_{i} \alpha_i y_i = 0$. Because of the constraints, these are inequalities; at the optimal point the weights are recovered from the multipliers, and the decision function becomes
$$f(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{x}) + b \right).$$
Training points with nonzero multipliers $\alpha_i$ are the support vectors: the points closest to the hyperplane, which fix the maximum margin width. A point above the hyperplane is classified as class $+1$ (vacant parking space found) and a point below it as $-1$ (no space available).
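The decision rule built from support vectors can be checked on a hand-worked toy problem (illustrative values, not the paper's data): two support vectors at $(1,1)$ labelled $+1$ and $(-1,-1)$ labelled $-1$ give $\alpha_1 = \alpha_2 = 0.25$ and $b = 0$ when the margin conditions are solved.

```python
import numpy as np

# Toy example: support vectors, labels, and multipliers solved by hand
# from the margin conditions y_i (w . x_i + b) = 1.
sv_x = np.array([[1.0, 1.0], [-1.0, -1.0]])
sv_y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
b = 0.0

def decide(x):
    """f(x) = sign(sum_i alpha_i * y_i * (x_i . x) + b)."""
    return int(np.sign(np.sum(alpha * sv_y * (sv_x @ x)) + b))

print(decide(np.array([2.0, 3.0])))    # 1  -> class +1 (vacant space found)
print(decide(np.array([-0.5, -2.0])))  # -1 -> class -1 (no space available)
```

Note that only the support vectors enter the sum; all other training points have zero multipliers and can be discarded after training.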
Because the current problem has only a small number of classes to recognize from the input video stream, we do not need an extensively large neural network architecture; instead we use a pretrained transfer-learning Inception model, as described by Raj [
The mathematical form of a residual connection, $y = \mathcal{F}(x) + x$, is illustrated in
The visual depiction of the ResNet block diagram is shown in
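A residual block can be sketched in a few lines of numpy: the residual branch $\mathcal{F}(x)$ is computed by the weight layers, and the identity shortcut adds the input back before the final activation. This is an illustrative toy with random stand-in weights, not the paper's trained network.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block(x, w1, w2):
    """One ResNet-style block: y = relu(F(x) + x), with the residual
    branch F(x) = W2 @ relu(W1 @ x). The identity shortcut lets the
    signal (and gradients) bypass the weight layers."""
    fx = w2 @ relu(w1 @ x)   # residual mapping F(x)
    return relu(fx + x)      # skip connection adds the input back

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)  # (8,) -- same shape as the input, as the skip path requires
```

With all weights zero the block reduces to `relu(x)`, i.e., the identity path alone, which is why deep stacks of such blocks remain trainable.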
The descendant features of the current image frame are represented locally by X, and the processed, computed features form the global features G(x). To check the localization of the vehicle with respect to the landmark location within the frame, a new posterior distribution is established on each motion and measurement update, which provides the calculation. The convergence factor helps establish the correct loop closure, which localizes the vehicle towards the correct prediction, as shown by Ravankar et al. [
In our proposed architecture, we apply transfer-learning techniques to the Inception v3 module, a model pretrained on the 1000 ImageNet classes [
In this study, we use Mask R-CNN together with the Inception deep learning module for vehicle counting and vacant parking space detection. First, a visual frame is captured from the input parking video feed. Once the frame is cropped, it is passed to the Inception network, which counts the total number of vehicles present in the parking space and detects occupancy. The Inception module is responsible for the binary prediction of whether a slot is occupied or empty. Once the Inception module returns its result, the color of the rectangle represents the occupancy status (green means empty and red means occupied), as shown in the video frames below. Beyond occupancy detection, the proposed system can detect, label, and instance-segment vehicles on the road or anywhere in the video stream. The dataset was split into 80% training data and 20% validation data. The detection rate of the classifier was about 97.6% on the input video feed of parked vehicles. On noisy images, 78% was achieved for positive-sample prediction and 94% for negative ones, as proposed by Fusek et al. [
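The overlay step described above can be sketched as a small pure-Python function: the per-slot occupancy flags (assumed here to come from the Inception module) are mapped to the rectangle colors used in the video frames and the vacant slots are counted. Slot coordinates and predictions below are illustrative values, not real detector output.

```python
# BGR tuples, the channel order OpenCV's drawing functions expect.
GREEN, RED = (0, 255, 0), (0, 0, 255)

def annotate_slots(slots, occupied_flags):
    """Map each slot's occupancy flag to an overlay color
    (green = empty, red = occupied) and count the vacant slots."""
    overlays = []
    free = 0
    for bbox, occupied in zip(slots, occupied_flags):
        overlays.append((RED if occupied else GREEN, bbox))
        if not occupied:
            free += 1
    return overlays, free

# Hypothetical slot bounding boxes (x1, y1, x2, y2) and predictions.
slots = [(10, 10, 60, 40), (70, 10, 120, 40), (130, 10, 180, 40)]
preds = [True, False, True]  # True = occupied
overlays, free = annotate_slots(slots, preds)
print(free)  # 1 vacant slot
```

In the full system each `(color, bbox)` pair would be drawn onto the frame with `cv2.rectangle` before display.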
The artificially produced dataset contains frames from dynamic video streams. All training and testing were carried out on an NVIDIA 1080 Ti GPU with 11 GB of memory and a 64-bit Intel Core i7 CPU. The computational cost was measured for the proposed model and for the sequential methods on the parking video dataset.
First, a base neural network architecture such as a CNN extracts dimensional features from an image or video feed, from low-level features such as lines, edges, regions of interest (ROIs), data segments, and circles, up to higher-level features such as vehicle parts, persons, and motorbikes. A few well-known base network models are LeNet, InceptionNet (a.k.a. GoogLeNet), ResNet, VGGNet, AlexNet, and MobileNet. Second, a pretrained perception and planning network is attached to the end of the base network and used to identify multi-class objects concurrently from a single frame or image with the help of the base network's high-level extracted features. After selecting the ROIs, it performs classification and regression on them: regression to bound the ROI precisely around the object, and classification to predict whether it is an object. The details of Mask R-CNN are given below.
Deep learning algorithms are widely used for classification, object detection, and instance segmentation tasks because of their high reported accuracy. Objects present in an image or sequence of images can be detected using the R-CNN, Fast R-CNN, or Faster R-CNN algorithms. Segmentation is more demanding than object detection and cannot be done with a detection technique alone: it requires pixel-level classification to draw a segmentation mask around each detected object, which makes it hard to classify individual objects and localize each with a bounding box. The strength of deep convolutional neural networks is that they learn to perform this task. Instance segmentation can be performed with Mask R-CNN, a deep learning approach whose backbone architecture is similar to Faster R-CNN. It performs region proposal using a region proposal network (RPN); each region is then classified as an object instance or background. Background regions are discarded, and the object-containing regions are passed to a classifier network that recognizes the particular object class. Finally, the detected regions are passed to a fully convolutional network that draws the segmentation mask around the objects.
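Region proposal filtering and the non-maximum suppression used inside detectors of the Mask R-CNN family are built on a single overlap measure, intersection over union (IoU). A minimal implementation for axis-aligned boxes (a sketch, not the detector's internal code):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).
    Returns 0.0 for disjoint boxes, 1.0 for identical boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area (0 if disjoint)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping by half: intersection 50, union 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```

During NMS, a proposal is suppressed when its IoU with a higher-scoring proposal exceeds a threshold (commonly around 0.5); during training, the same measure decides which proposals count as positives.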
After the multi-task Mask R-CNN stage, the extracted masked regions are passed to the fully multi-task convolutional network model (F-MTCNN). Before that, however, the bounding boxes are drawn using multi-parameter, dimensionality-reduced values produced by the RoIAlign method. F-MTCNN is a simple deep convolutional neural network of classification layers. It performs the segmentation task on the ROIs extracted from the selected parking frames, drawing the segmentation mask around the predicted vehicles present in the ROI areas, as shown in the images given below. We see in
| Layer name | Input size | Output size |
|---|---|---|
| Conv | | |
| MaxPool | | |
| Conv | | |
| MaxPool | | |
| Inception-3A | | |
| Inception-3B | | |
| MaxPool | | |
| Inception-4A | | |
| Inception-4B | | |
| Inception-4C | | |
| Inception-4D | | |
| Inception-4E | | |
| MaxPool | | |
| Inception-5A | | |
| Inception-5B | | |
| AvgPool | | |
| Dropout (0.5) | | |
| Dense-1 (fully connected) | | |
| Dense-2 (fully connected) | | |
| Dense-3 (softmax) | | |
In the first inception stage there are two modules, in which 256 and 480 filters are applied to the 28 × 28 feature maps, followed by a max-pooling layer. The second stage contains four inception modules with 512 and 528 filters on 14 × 14 feature maps, also followed by a max-pooling layer, while in the last stage 832 and 1024 filters are applied to 7 × 7 feature maps. After that, we built the linear network, which contains three fully connected layers. The first fully connected layer produces 1024 features; these are mapped to 512 features in the second fully connected layer, and in the last layer those 512 features are mapped to the outputs that give the probability of each class defined in the algorithm.
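The classifier head described above (1024 pooled features mapped to 512, then to class probabilities via softmax) can be sketched as a plain numpy forward pass. The weights here are random stand-ins, not trained values, and two classes (occupied / empty) are assumed for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def classifier_head(features, w1, w2, w3):
    """Three dense layers: 1024 -> 1024 -> 512 -> softmax over classes."""
    h1 = np.maximum(w1 @ features, 0.0)   # Dense-1: 1024 units, ReLU
    h2 = np.maximum(w2 @ h1, 0.0)         # Dense-2: 512 units, ReLU
    return softmax(w3 @ h2)               # Dense-3: class probabilities

rng = np.random.default_rng(42)
feats = rng.standard_normal(1024)              # pooled inception features
w1 = rng.standard_normal((1024, 1024)) * 0.01
w2 = rng.standard_normal((512, 1024)) * 0.01
w3 = rng.standard_normal((2, 512)) * 0.01
probs = classifier_head(feats, w1, w2, w3)
print(probs.shape, round(float(probs.sum()), 6))  # (2,) 1.0
```

The softmax output is a proper probability distribution over the defined classes, which is what the occupancy decision is read from.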
The goal of this article is to process a video stream of an open parking area at a higher frame rate with an accurate classification scheme on the frames received from the configured multi-sensor parking surveillance cameras. For the evaluation parameters, let TP denote true positives, FP false positives, and FN false negatives. The performance of the CNN baseline model is evaluated on the new large dataset of multi-sensor feeds. Our approach to vehicle parking segmentation and classification is evaluated using the following six statistical metrics.
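The standard confusion-matrix metrics built from the TP, FP, and FN counts named above (plus TN for accuracy) can be computed as follows; the counts used in the example are illustrative, not the paper's results.

```python
def metrics(tp, fp, fn, tn):
    """Standard evaluation metrics from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)          # a.k.a. sensitivity / detection rate
    f1        = 2 * precision * recall / (precision + recall)
    miss_rate = fn / (tp + fn)          # 1 - recall
    return accuracy, precision, recall, f1, miss_rate

# Illustrative counts, not the paper's confusion matrix:
acc, prec, rec, f1, miss = metrics(tp=90, fp=5, fn=10, tn=95)
print(round(acc, 3), round(rec, 2), round(miss, 2))  # 0.925 0.9 0.1
```

Note that accuracy and miss rate are the two quantities reported in the comparison table below; miss rate is simply 100% minus the accuracy when expressed per class.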
The model employed a dynamic learning rate with the Adam optimizer and dropout of neurons that would otherwise lead to overfitting of the training data. The initial learning rate was 0.001; after 100 epochs it was changed to 0.0001, and after 250 epochs to 0.00001, so that training finally converges optimally close to the target output. We trained the model for up to 4000 epochs to optimize the system until the desired accuracy was achieved.
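The piecewise learning-rate schedule described above can be written as a small function (a sketch of the schedule as stated; the exact boundary handling at epochs 100 and 250 is an assumption):

```python
def learning_rate(epoch):
    """Piecewise schedule: 1e-3 until epoch 100, 1e-4 until epoch 250,
    then 1e-5 for the remainder of training."""
    if epoch < 100:
        return 1e-3
    if epoch < 250:
        return 1e-4
    return 1e-5

print(learning_rate(0), learning_rate(150), learning_rate(3000))
```

In a Keras training loop, such a function can be attached to `model.fit` via the `tf.keras.callbacks.LearningRateScheduler` callback so that the Adam optimizer picks up the new rate at the start of each epoch.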
Performance also depends on the utilization of powerful GPUs, TPUs, and high-resolution graphics card systems. The experimental results show that powerful computational resources allow the distribution of processes and refinement of the proposed model, as described by Böhm et al. [
Efficient transfer-learning models should also be integrated with the proposed working model, in which key parameters are organized in an articulated data structure for multi-core GPUs to significantly increase computational performance on powerful devices.
A comparison of the experimental results with earlier related work is presented as follows.
| Literature | Training accuracy (%) | Training miss rate (%) | Validation accuracy (%) | Validation miss rate (%) |
|---|---|---|---|---|
| Fabian (2013) [ | 96.40 | 3.60 | 96.2 | 3.80 |
| Amato et al. (2018) [ | 96.36 | 3.64 | 96.1 | 3.90 |
| Proposed system model | 97.60 | 2.40 | 96.6 | 3.40 |
The above
Training was stopped at 4000 epochs because the loss value becomes constant beyond that point; the best accuracy achieved on the training and validation datasets is shown in
The
The training accuracy, as depicted in
The loss rate of the trained model should be minimized as much as possible. The loss curves for both the testing loss and the training loss are shown in
This article proposed an autonomous parking space detection system that uses visual input data to count empty and occupied vehicle parking spots. Deep learning algorithms are receiving increasing attention as connected traffic data grows, and new automated vehicle functionality is advancing at a rapid pace at virtually all major auto makers. The sheer number of sensors and the complexity of onboard diagnostic and decision-making systems are integrated with real traffic data analytics to disseminate information that addresses users' everyday needs. In this article we proposed a deep convolutional neural network model, F-MTCNN, for parking spot detection. The analysis showed that the proposed multi-model system performs relatively well, attaining an accuracy of 97.6%. Overall, the investigation showed how the Mask R-CNN and Inception CNN models behave on different video feeds to attain reasonable results with minimized losses. We are further developing multi-model key-feature extraction algorithms for higher training and testing accuracy. The possibilities for applying CNN technologies are endless, and it is exciting to think about giving our machines the "ability to see and talk" to help us make the world better. Our future work includes incorporating deeper knowledge about human behavior, mobility, and connected vehicular technologies into multi-model object classification and detection.