Hybrid Metaheuristics Based License Plate Character Recognition in Smart City

Abstract: Recent technological advancements have been used to improve the quality of living in smart cities. In particular, automated detection of vehicles can help reduce crime rates and improve public security. The automatic identification of vehicle license plate (LP) characters is therefore an essential step in recognizing vehicles in real-time scenarios, and it can be achieved by exploiting optimally tuned deep learning (DL) approaches. In this article, a novel hybrid metaheuristic optimization based deep learning model for automated license plate character recognition (HMODL-ALPCR) is presented for smart city environments. The HMODL-ALPCR technique aims to detect LPs and recognize the characters on them. For LP detection, a mask regional convolutional neural network (Mask-RCNN) model is applied with Inception with Residual Network (ResNet)-v2 as the baseline network. In addition, a hybrid sunflower optimization with butterfly optimization algorithm (HSFO-BOA) is utilized for the hyperparameter tuning of the Inception-ResNetv2 model. Finally, a Tesseract based character recognition model is applied to recognize the characters present in the LPs. The HMODL-ALPCR technique is evaluated on benchmark datasets, and the experimental outcomes point to improved efficacy of the HMODL-ALPCR technique over recent methods.


Introduction
Continual urbanization poses difficult problems for the living quality of urban residents and for sustainable development in smart cities [1]. The idea of smart cities is to make effective use of scarce resources and to enhance the quality of public services and citizens' lives [2]. With the growth of the Internet of Things (IoT), embedded devices such as mobile phone sensors, Radio Frequency Identification (RFID) tags, and actuators are built into the fabric of the urban environment and coupled together [3]. Several smart city applications have been developed and deployed, e.g., smart healthcare, intelligent transportation, public safety, and environment monitoring. License plate recognition (LPR) systems are of great advantage for parking, traffic, cruise control, and toll management applications [4]. In security management and monitoring of a region or place, an LPR system serves as a tracing aid for safety teams. In law and safety enforcement, LPR systems play an important part in safeguarding, border monitoring, and the detection of physical intrusion [5]. Different types of LPR systems have been introduced that utilize smart computation models to attain efficiency and accuracy.
Various recognition approaches have been described that apply many intermediate processing phases during Region of Interest (ROI) extraction. To cope with fraud situations such as plate replacement and alteration, LPR systems rely on intelligent methods for effectiveness [6]. The first phase of an LPR system is plate localization, i.e., locating the license plate (LP) in the input image. Algorithms such as thresholding or edge detection [7] are applied to the video sequence. The Gabor filter is considered a promising method for plate recognition on RGB images [8], whereas the former methods employ grey scale conversion to binary images. In addition, histograms generated from the vertical and horizontal projections of the input image can be used to recognize the ROI and identify the plate among multiple objects. The Hough transform is also employed for finding the edges that bound the number plate [9].
This paper develops an intelligent hybrid metaheuristic optimization based deep learning model for automated license plate character recognition (HMODL-ALPCR) for smart city environments. The HMODL-ALPCR technique applies the mask regional convolutional neural network (Mask-RCNN) model for LP detection, with Inception with Residual Network (ResNet)-v2 as the baseline network. In addition, a hybrid sunflower optimization with butterfly optimization algorithm (HSFO-BOA) is utilized for the hyperparameter tuning of the Inception-ResNetv2 model. Finally, a Tesseract based character recognition model is applied to recognize the characters present in the LPs. The experimental result analysis of the HMODL-ALPCR technique takes place against the benchmark dataset.

Literature Review
Deep learning (DL), a comparatively young learning model in the computational intelligence (CI) family, has its origin in Artificial Neural Networks (ANN). It enables computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction, and it is capable of discovering complex structures in natural data in their raw form without needing complex feature engineering and tuning [10]. In comparison with conventional machine learning (ML) models, DL methods can learn exceptionally complex functions through layers of nonlinear transformations that are trainable end to end. In [11], a cascaded DL method was proposed for constructing an effective automatic license plate (ALP) detection and recognition method for vehicles in northern Iraq. The LP in northern Iraq contains a country region, a plate number, and a city region. Initially, the presented technique uses pre-processing methods such as adaptive image contrast enhancement and Gaussian filtering to make the input image better suited for further processing. Next, a deep semantic segmentation network is utilized for determining the three LP regions of the input image. The segmentation is performed using a deep encoder-decoder network framework.
Chen [12] addressed car LP recognition through a YOLO darknet DL architecture. That work employs a YOLO network with seven convolution layers to identify a single class, and the recognition model uses a sliding-window method. The objective is to identify Taiwanese car LPs. Izidio et al. [13] introduced an approach for engineering systems that detect and recognize Brazilian LPs with CNNs and that are appropriate for embedded systems. The resulting system detects LPs in the captured image through the Tiny YOLOv3 framework and recognizes the characters with a second convolutional network trained on synthetic images and fine-tuned with real LP images. Pustokhina et al. [14] proposed an efficient DL-based vehicle license plate recognition (VLPR) method with optimal K-means (OKM) clustering and a CNN. The presented method works in three major phases: LP detection and localization, segmentation with the OKM clustering method, and LP number recognition with the CNN method. In the initial phase, LP detection and localization take place.

The Proposed Model
In this article, an automated HMODL-ALPCR technique is presented to detect LPs and recognize the characters on them for smart city environments. The HMODL-ALPCR technique involves Mask-RCNN for the detection of LPs, with Inception with ResNet-v2 as the baseline network. Moreover, the HSFO-BOA is utilized for the hyperparameter tuning of the Inception-ResNetv2 model. Lastly, a Tesseract based character recognition model is applied to recognize the characters present in the LPs.

Phase I: Mask RCNN Based LP Detection Process
The Mask R-CNN technique is an improvement on the Faster R-CNN detection technique that adds a fully convolutional network (FCN) branch for generating masks. During real-time target detection, the pixels of the target are classified accurately, and the contour of the target is then determined. An image is first input to the backbone network, which consists of Inception with ResNet v2 and a feature pyramid network (FPN) [15]. The structure of Mask RCNN is shown in Fig. 1. The backbone network extracts a shared feature map (FM) that integrates the coordinate information of the detection target with its shape and texture information. Afterward, the region proposal network (RPN) uses a sliding window to traverse this FM and generate many anchor boxes with a set of fixed scales and aspect ratios. The non-maximum suppression (NMS) technique is then utilized for selecting the anchor boxes with the highest scores [16]. In the RoIAlign layer of Mask R-CNN, the quantization operation in the feature aggregation procedure is replaced by bilinear interpolation, which avoids the misalignment issue and enhances the accuracy of detection and segmentation. During training, the Mask R-CNN technique defines a multitask loss function for every sampled RoI as

$$L = L_{cls} + L_{box} + L_{mask} \qquad (1)$$

where $L_{cls}$ denotes the classification error, $L_{box}$ the box regression error, and $L_{mask}$ the segmentation error. Following the Faster R-CNN formulation, $L_{cls}$ and $L_{box}$ in Mask R-CNN are determined as:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $p_i$ signifies the predicted probability that anchor $i$ contains a target, and $p_i^*$ is the label of the anchor sample: if the anchor sample is positive, $p_i^*$ is 1; otherwise it is 0. $t_i$ and $t_i^*$ are vectors consisting of the 4 translation and scaling parameters of the predicted box and of the ground-truth box, respectively. The weights $N_{cls}$, $N_{reg}$, and $\lambda$ keep the 2 losses balanced.
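The classification and regression losses are determined as:

$$L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]$$

$$L_{reg}(t_i, t_i^*) = \mathrm{smooth}(t_i - t_i^*)$$

where $\mathrm{smooth}(x)$ refers to the robust loss applied to each offset $x$ between the predicted and target box parameters, for example the translation of the adjusted box along the horizontal axis relative to the anchor. It can be written as:

$$\mathrm{smooth}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

$L_{mask}$ in Mask R-CNN is the average binary cross entropy function, which describes the loss of the semantic segmentation branch. In the mask branch, the input FM is converted to an output of size $k \times m \times m$, where $k$ and $m$ respectively control the dimension (number of mask channels) and the scale (spatial resolution) of the FMs. The relative entropy is obtained by applying the sigmoid function pixel by pixel to the output FM, and the average entropy error is $L_{mask}$.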
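The loss terms described above can be illustrated with a small sketch. It assumes the standard Faster R-CNN forms of the log loss and the smooth L1 loss, and a per-pixel binary cross entropy for the mask branch; the function names, the tuple layout of `anchors`, and the default `lam=1.0` are hypothetical choices made for illustration, not values taken from this work.

```python
import math

def smooth_l1(x):
    """Robust loss: quadratic for |x| < 1, linear otherwise."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def cls_loss(p, p_star, eps=1e-12):
    """Log loss for one anchor; p_star is 1 for a positive anchor, else 0."""
    q = p_star * p + (1 - p_star) * (1 - p)
    return -math.log(max(q, eps))  # eps guards against log(0)

def box_loss(t, t_star):
    """Smooth L1 summed over the 4 box parameters."""
    return sum(smooth_l1(a - b) for a, b in zip(t, t_star))

def detection_loss(anchors, n_cls, n_reg, lam=1.0):
    """Combined classification and box loss.

    anchors: list of (p, p_star, t, t_star) tuples (assumed layout).
    The box term only counts positive anchors because it is weighted by p_star.
    """
    l_cls = sum(cls_loss(p, ps) for p, ps, _, _ in anchors) / n_cls
    l_box = sum(ps * box_loss(t, ts) for _, ps, t, ts in anchors) / n_reg
    return l_cls + lam * l_box

def mask_loss(logits, targets):
    """Average binary cross entropy of sigmoid(logit) against 0/1 targets.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

Under these assumptions, the total loss for a sampled RoI would be `detection_loss(...) + mask_loss(...)`, with the mask term evaluated only on the m × m mask of the ground-truth class.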
In the Mask RCNN model, Inception with ResNetv2 is utilized as the baseline network. DL aims to learn as effectively as a human mind. When a child is taught about different animals, a mental image is formed of what a dog looks like and what a cat looks like, and in the future the child can identify these animals. DL works on a similar principle. Transfer learning (TL) is the next stage beyond DL. Training a neural network (NN) requires considerable time and many runs to capture accurate weights for the model at hand. This can be tedious work, and it is not simple for a newcomer to the field, which is where TL comes in. TL makes models trained by field experts available to the public, which removes the need to determine compatible weights from scratch and allows one to proceed directly to training the model on new input data. Inception ResNetV2 was introduced [17] by combining two of the most popular deep CNNs (DCNNs), Inception and ResNet, and applying batch normalization (BN) to the conventional layers before the summation. The residual modules are employed specifically to allow a larger number of Inception blocks and, consequently, a deeper model. As already mentioned, the most noticeable difficulty with very deep networks is the training phase. It can be managed by using residual connections. However, when a very large number of filters is used in the network, the residuals are scaled down to deal with the training difficulty. If the number of filters exceeds 1000, the residual variants exhibit instabilities and the network cannot be trained. Consequently, the residuals are scaled to stabilize network training.
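The residual scaling mentioned above can be sketched in a few lines. This is an illustration only: the function name, the placement of the ReLU after the sum, and the default `scale=0.2` are assumptions made for the example (small constant factors are typical for this kind of scaling), and the branch is a stand-in for an Inception block.

```python
def scaled_residual(x, branch, scale=0.2):
    """Add a scaled-down residual branch to the shortcut, then apply ReLU.

    x:      list of activations (the shortcut path)
    branch: callable standing in for the Inception block (hypothetical)
    scale:  factor that shrinks the residual before the summation
    """
    fx = branch(x)
    return [max(0.0, xi + scale * fi) for xi, fi in zip(x, fx)]


# A branch with large outputs is damped before being added to the shortcut.
out = scaled_residual([1.0, -1.0], lambda v: [10.0 * e for e in v], scale=0.1)
```

Scaling only the branch output leaves the shortcut path untouched, so the identity mapping is preserved while the contribution of each block is kept small.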
The sigmoid function is a mathematical function that maps any real value to the range between zero and one, with a curve shaped like the letter "S." The logistic function is another name for the sigmoid function. The sigmoid function is written as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

An important benefit of the sigmoid function is that its output lies between 2 points, 0 and 1. It is therefore especially effective in models where a probability must be predicted as the output, since the probability of an event lies only between zero and one.
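As a brief illustration, the sigmoid can be computed as below. The branch on the sign of the input is an implementation choice assumed here to avoid overflow in `math.exp` for large negative inputs; it does not change the function being computed.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) is in (0, 1)
    return z / (1.0 + z)
```

For example, `sigmoid(0.0)` returns 0.5, and the outputs for `x` and `-x` sum to 1, which reflects the symmetry of the S-shaped curve.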

Phase II: Design of HSFO-BOA Based Hyperparameter Tuning
In order to optimally adjust the hyperparameters involved in the Inception with ResNetv2 model, the HSFO-BOA is derived. The lifecycle of a sunflower is consistent: each day the flowers rise and follow the sun like the hands of a clock. Here, the inverse square law of radiation is another key nature-based principle of the optimization. The heat quantity $Q$ received by each plant is given as follows [18]:

$$Q_i = \frac{P}{4 \pi r_i^2}$$

where $P$ indicates the source power and $r_i$ represents the distance between the current best and plant $i$.
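The direction of each sunflower toward the sun (the current best plant $X^*$) follows the standard SFO formulation [18]:

$$s_i = \frac{X^* - X_i}{\lVert X^* - X_i \rVert}$$

The sunflower stride in the direction $s$ can be evaluated as follows:

$$d_i = \lambda \times P_i(\lVert X_i - X_{i-1} \rVert) \times \lVert X_i - X_{i-1} \rVert$$

Here, $\lambda$ is the constant value that determines an "inertial" displacement of the plant, and $P_i(\lVert X_i - X_{i-1} \rVert)$ indicates the probability of pollination. The maximum step is restricted as follows:

$$d_{max} = \frac{\lVert X_{max} - X_{min} \rVert}{2 \times N_{pop}}$$

In the equation, $X_{max}$ and $X_{min}$ denote the upper and lower limits, and $N_{pop}$ indicates the overall number of plants. The new plantation is then:

$$X_{i+1} = X_i + d_i \times s_i$$

The process starts with the generation of a population, which may be random or uniform. Evaluating each individual allows selecting which one will act as the sun. Next, each individual orients itself toward the sun and moves a random step in that direction. Although it is proposed to include the capacity to work with multiple suns in a future version, the present study is restricted to a single sun. The best plants pollinate around the sun.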
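The movement toward the sun can be sketched as follows. This is a simplified illustration, not the exact procedure of this work: the pollination probability is replaced by a uniform random number, the stride is based on the distance to the sun rather than to a neighbouring plant, and the names `heat`, `sfo_step`, and the default `lam=0.5` are hypothetical. Minimization of `fitness` is assumed.

```python
import math
import random

def heat(power, r):
    """Inverse square law: heat received at distance r from the source."""
    return power / (4.0 * math.pi * r * r)

def sfo_step(plants, fitness, lower, upper, lam=0.5, rng=None):
    """Move every plant one bounded step toward the best plant (the sun)."""
    rng = rng or random.Random(0)
    sun = min(plants, key=fitness)  # best individual acts as the sun
    span = math.sqrt(sum((u - l) ** 2 for l, u in zip(lower, upper)))
    d_max = span / (2.0 * len(plants))  # cap on the stride length
    moved = []
    for x in plants:
        diff = [s - xi for s, xi in zip(sun, x)]
        dist = math.sqrt(sum(d * d for d in diff))
        if dist == 0.0:  # the sun itself stays where it is
            moved.append(list(x))
            continue
        # Simplified stride: random fraction of the distance, capped by d_max.
        step = min(lam * rng.random() * dist, d_max)
        new = [xi + step * d / dist for xi, d in zip(x, diff)]
        moved.append([min(max(v, l), u) for v, l, u in zip(new, lower, upper)])
    return moved


def sphere(x):
    """Toy objective used only for this example."""
    return sum(v * v for v in x)

start = [[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]]
after = sfo_step(start, sphere, lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

With `lam` no larger than 1, a plant never overshoots the sun, so each call can only reduce (or keep) the distance of every plant to the current best.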
For improving the efficacy of the SFO algorithm, the HSFO-BOA is derived by integrating the BOA into it. The BOA imitates the natural behavior of butterflies when finding food sources and mating. This approach uses two distinct navigation patterns for searching the domain [19]. In the global search stage ($r_1 \le p$), butterflies move toward the optimal butterfly of the colony, whereas in the local search stage ($r_1 > p$), a butterfly performs an arbitrary search within the search space by moving relative to random butterflies in the colony. The mathematical expressions of both patterns are given in the following. When $r_1 \le p$, the global search process becomes

$$X_i^{t+1} = X_i^t + (r_2^2 \times g^* - X_i^t) \times \phi_i$$

When $r_1 > p$, the local search process becomes

$$X_i^{t+1} = X_i^t + (r_3^2 \times X_j^t - X_k^t) \times \phi_i$$

Here, $t$ and $t + 1$ indicate the present and updated states. The position of the optimal butterfly in the colony is denoted $g^*$, and $X_j^t$ and $X_k^t$ are the positions of two arbitrarily chosen butterflies; $r_1$, $r_2$, and $r_3$ indicate three random scalars uniformly chosen within [0,1]. $\phi_i$ represents the fragrance factor and is determined by the following equation:

$$\phi_i = c I^a$$

where $\phi_i$ indicates the fragrance magnitude for the $i$th butterfly, $c$ denotes a coefficient, and $I$ and $a$ denote the intensity of the stimulus and the varying degree of absorption. $I$ is related to the objective function, and for the $i$th butterfly it is taken as $f(X_i)$, where $f$ returns the objective function of the problem. The coefficients $a$ and $c$ are chosen within [0,1], and $p$ indicates the switch probability that determines the search behavior.
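The two BOA search patterns can be sketched as a single position update. The sketch assumes the commonly used BOA update form, in which the squared random scalar multiplies the best (or a random) butterfly position; the function names and argument layout are hypothetical, and a positive stimulus intensity is assumed so that the power in the fragrance is well defined. How the BOA moves are interleaved with the SFO steps in the hybrid is not specified here and is left out.

```python
import random

def fragrance(c, intensity, a):
    """Fragrance magnitude c * I**a (intensity assumed positive)."""
    return c * intensity ** a

def boa_move(x, best, x_j, x_k, phi, p, rng):
    """Return the updated position of one butterfly.

    x:        current position of the butterfly
    best:     position of the best butterfly in the colony (g*)
    x_j, x_k: positions of two randomly chosen butterflies
    phi:      fragrance of this butterfly
    p:        switch probability between global and local search
    """
    r1, r2, r3 = rng.random(), rng.random(), rng.random()
    if r1 <= p:
        # Global search: move relative to the best butterfly.
        return [xi + (r2 * r2 * g - xi) * phi for xi, g in zip(x, best)]
    # Local search: move relative to two random butterflies.
    return [xi + (r3 * r3 * a - b) * phi for xi, a, b in zip(x, x_j, x_k)]


rng = random.Random(1)
phi = fragrance(0.5, 4.0, 0.5)
moved = boa_move([2.0, -1.0], [0.0, 0.0], [1.0, 1.0], [3.0, 3.0], phi, 1.0, rng)
```

In a hyperparameter tuning setting, each position vector would encode one candidate set of hyperparameters and the objective would be a validation error, but that mapping is an assumption of this example.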

Phase III: Tesseract Based Character Recognition
First, adaptive thresholding is applied to convert the image into a binary version using Otsu's technique [20]. Page layout analysis is the next stage and is carried out to extract the text blocks in the region. Afterward, the baselines of all lines are identified and the text is separated into words using definite as well as fuzzy spaces. In the next phase, the character outlines are extracted from the words. Text recognition is then performed as a 2-pass process. In the first pass, word recognition is carried out using the static classifier. Each word that is recognized satisfactorily is passed to the adaptive classifier as training data. The second pass is run over the page using the newly trained adaptive classifier, in which the words that were not recognized well enough are re-examined.
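The binarization step can be illustrated with a plain implementation of Otsu's method on 8-bit grey levels. This is a minimal sketch of the thresholding idea only, written without any imaging library; it does not reproduce Tesseract's internal code, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0..255) that maximizes between-class variance.

    pixels: flat list of integer grey levels in 0..255.
    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_low, sum_low = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break  # all pixels are in the lower class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarize(pixels, t):
    """Map grey levels to 0 (<= t) or 1 (> t)."""
    return [1 if v > t else 0 for v in pixels]


plate = [10] * 50 + [200] * 50  # toy image: dark characters, bright background
t = otsu_threshold(plate)
binary = binarize(plate, t)
```

On this two-level toy input, any threshold between the two grey levels separates the classes equally well, and the sketch returns the lowest such level.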

Experimental Validation
The performance validation of the HMODL-ALPCR technique takes place using three benchmark datasets, namely FZU Cars, Stanford Cars, and HumAIn 2019. A few sample images are depicted in Fig. 2. Tab. 1 offers the LP detection outcome analysis of the HMODL-ALPCR technique under distinct epochs in terms of precision (prec_n), recall (reca_l), F-score (F_score), and mAP.
Fig. 4 examines the LP detection results of the HMODL-ALPCR technique under distinct epochs on the FZU Cars dataset. With 100 epochs, the HMODL-ALPCR technique offered prec_n, reca_l, F_score, and mAP of 99.05%, 99.07%, 98.91%, and 98.56% respectively. With 200 epochs, it attained prec_n, reca_l, F_score, and mAP of 99.05%, 99.54%, 98.71%, and 98.42% respectively. With 300 epochs, it provided prec_n, reca_l, F_score, and mAP of 99.05%, 99.42%, 98.73%, and 97.86% respectively. With 400 epochs, it exhibited prec_n, reca_l, F_score, and mAP of 99.00%, 99.40%, 98.76%, and 98.37% respectively.
For the second dataset, Stanford Cars, with 100 epochs the HMODL-ALPCR approach offered prec_n, reca_l, F_score, and mAP of 98.99%, 99.00%, 97.86%, and 96.95% respectively. With 200 epochs, it reached prec_n, reca_l, F_score, and mAP of 98.12%, 99.27%, 98.46%, and 97.81% respectively. With 300 epochs, it offered prec_n, reca_l, F_score, and mAP of 97.74%, 98.96%, 98.54%, and 96.32% respectively. With 400 epochs, it demonstrated prec_n, reca_l, F_score, and mAP of 98.49%, 99.11%, 98.48%, and 97.08% respectively.
For the third dataset, HumAIn 2019, with 100 epochs the HMODL-ALPCR technique obtained prec_n, reca_l, F_score, and mAP of 98.57%, 99.14%, 99.31%, and 97.74% respectively. With 200 epochs, it reached prec_n, reca_l, F_score, and mAP of 98.47%, 98.97%, 98.72%, and 98.32% respectively. With 300 epochs, it achieved prec_n, reca_l, F_score, and mAP of 99.34%, 99.13%, 98.63%, and 98.11% respectively. Finally, with 400 epochs, it exhibited prec_n, reca_l, F_score, and mAP of 98.94%, 99.25%, 98.82%, and 98.40% respectively.
In the comparison study on the FZU Cars dataset, the DL-ResNet101 and HT-SSA-CNN techniques attained slightly enhanced values of prec_n, reca_l, F_score, and mAP. Next, the DL-VLPNR and OKM-CNN techniques reached reasonable values of prec_n, reca_l, F_score, and mAP. However, the HMODL-ALPCR technique outperformed the other methods with the maximum prec_n, reca_l, F_score, and mAP of 99.04%, 99.36%, 98.80%, and 98.18% respectively.
Fig. 8 examines the comparison study of the HMODL-ALPCR approach on the Stanford Cars test dataset. The outcomes show that the CNN-VGG16 and DL-ResNet50 algorithms reached the lowest performance with the minimal values of prec_n, reca_l, F_score, and mAP. The DL-ResNet101 and HT-SSA-CNN techniques attained slightly enhanced values of prec_n, reca_l, F_score, and mAP. These were followed by the DL-VLPNR and OKM-CNN techniques, which reached reasonable values of prec_n, reca_l, F_score, and mAP. Finally, the HMODL-ALPCR system outperformed the other methods with the maximum prec_n, reca_l, F_score, and mAP of 98.46%, 99.09%, 98.21%, and 97.10% respectively.
Tab. 4 and Fig. 9 examine the comparison study of the HMODL-ALPCR approach on the HumAIn 2019 dataset. The outcomes show that the CNN-VGG16 and DL-ResNet50 systems obtained the lowest performance with reduced values of prec_n, reca_l, F_score, and mAP. The DL-ResNet101 and HT-SSA-CNN techniques attained somewhat enhanced values of prec_n, reca_l, F_score, and mAP. Afterward, the DL-VLPNR and OKM-CNN algorithms obtained reasonable values of prec_n, reca_l, F_score, and mAP. However, the HMODL-ALPCR system outperformed the other algorithms with the maximal prec_n, reca_l, F_score, and mAP of 98.81%, 99.09%, 98.89%, and 98.14% respectively.
After examining the above mentioned tables and figures, it is evident that the HMODL-ALPCR technique has outperformed the other techniques on all the datasets.

Conclusion
In this article, an automated HMODL-ALPCR technique has been presented to detect LPs and recognize the characters on them for smart city environments. The HMODL-ALPCR technique involves Mask-RCNN for the detection of LPs, with Inception with ResNet-v2 as the baseline network. Moreover, the HSFO-BOA is utilized for the hyperparameter tuning of the Inception-ResNetv2 model. Lastly, a Tesseract based character recognition model is applied to recognize the characters present in the LPs. The experimental result analysis of the HMODL-ALPCR technique takes place against the benchmark dataset, and the experimental outcomes point to improved efficacy of the HMODL-ALPCR technique over existing techniques. In future, the detection performance can be improved by the design of hybrid DL models for smart city environments.