Deep Reinforcement Learning Enabled Smart City Recycling Waste Object Classification

The Smart City concept revolves around gathering real-time data from citizens, personal vehicles, public transport, buildings, and other urban infrastructure such as the power grid and the waste disposal system. The insights obtained from these data can help municipal authorities manage assets and services effectively. At the same time, the massive increase in environmental pollution and degradation, which leads to ecological imbalance, is a hot research topic. Moreover, the progressive development of smart cities across the globe requires the design of intelligent waste management systems that properly categorize waste according to its biodegradability. Commonly encountered wastes include paper, paper boxes, food, and glass. For classifying waste objects, computer vision based solutions are a cost-effective way to separate waste from large dumps of garbage and trash. Owing to recent developments in deep learning (DL) and deep reinforcement learning (DRL), waste object classification has become possible through the detection and identification of waste objects. In this context, this paper designs an intelligent DRL based recycling waste object detection and classification (IDRL-RWODC) model for smart cities. The goal of the IDRL-RWODC technique is to detect and classify waste objects using DL and DRL techniques. The IDRL-RWODC technique encompasses a two-stage process, namely Mask Regional Convolutional Neural Network (Mask RCNN) based object detection and DRL based object classification. In addition, the DenseNet model is applied as the baseline model for the Mask RCNN, and a deep Q-learning network (DQLN) is employed as the classifier. Moreover, a dragonfly algorithm (DFA) based hyperparameter optimizer is derived to improve the efficiency of the DenseNet model. To verify the enhanced waste classification performance of the IDRL-RWODC technique, a series of simulations was carried out on a benchmark dataset, and the experimental results showed better performance than recent techniques, with a maximal accuracy of 0.993.

CMC, 2022, vol.71, no.3, p. 5700


Introduction
In smart cities, intelligent waste management is a new frontier for local authorities aiming to minimize municipal solid waste and improve community recycling rates. Since cities spend heavily on managing waste in public places, smart city waste management programs can yield improved performance. Rapid urbanization, global population growth, and industrialization have drawn increasing attention to environmental degradation. With the global population growing at an alarming rate, the environment has degraded severely, resulting in dire conditions. According to reports from 2019, India generates over 62 million tons (MT) of solid waste yearly [1]. Concern has therefore grown about the need to segregate waste into non-biodegradable and biodegradable categories. Generally, in the Indian context, waste consisting of plastic, paper, metal, rubber, textiles, glass, sanitary products, organics, infectious materials (clinical and hospital), electrical and electronic items, and hazardous substances (chemicals, paint, and spray) is broadly categorized as non-biodegradable (NBD) and biodegradable (BD) waste, with corresponding shares of 48% and 52% [2]. Furthermore, as per a recent Indian government report, the items most commonly disposed of in garbage are food, glass, paper, and paper boxes [3]. These items constitute over 99.5% of the overall garbage gathered, which clearly indicates that people dispose of wet and dry waste together.
Effective waste segregation supports the appropriate recycling and disposal of waste according to its biodegradability. Modern times therefore call for the development of a smart waste segregation scheme to address the above-mentioned causes of ecological damage. The segregation of waste is consequently receiving attention from academics and researchers around the world. Recycling systems can produce more effective results by keeping up with industrial development. In such systems, the separation of waste still depends on human factors [4]. However, developments in deep learning architectures and artificial intelligence (AI) technology may make these systems more productive than human sorting in the near future. Specifically, the decision-making of the human brain can be transferred quickly and successfully to machines through AI models. Given these developments, it appears inevitable that recycling systems based on DL frameworks will be employed for waste classification [5].
Conventional waste classification relies largely on manual selection, which is inefficient. Even though current waste classification driven by Machine Learning (ML) methods can operate effectively, the classification performance still needs to be enhanced. An analysis of current waste classification methods based on deep learning reveals two potential reasons why improving classification performance is difficult [6]. First, because DL methods have distinct architectures, they behave differently on different datasets. Second, only a small number of waste image datasets are available for model training, and large-scale databases such as ImageNet7 are lacking. In addition, since garbage comprises many kinds of items, the absence of an accurate and clear categorization results in lower classification performance. For example, plastic bags and plastic bottles are both made of plastic, yet they have distinct characteristics and shapes and are disposed of in different ways.
Deep Reinforcement Learning (DRL) is a subfield of ML that combines Reinforcement Learning (RL) and Deep Learning (DL). A variety of algorithms exist in the literature that build on the underlying concept of iterative learning [7], representing and refining data in order to predict better outcomes and use them to make improved decisions. DL models are largely trained by the scientific community on Graphics Processing Units (GPUs) to accelerate research and applications, bringing them to the point where they exceed the performance of most traditional ML algorithms in video analytics. DRL techniques have been extended to three dimensions to extract spatio-temporal features from video feeds, which helps distinguish objects from one another.
This paper designs an intelligent DRL based recycling waste object detection and classification (IDRL-RWODC) model. The IDRL-RWODC technique encompasses a two-stage process, namely Mask Regional Convolutional Neural Network (Mask RCNN) based object detection and DRL based object classification. Moreover, the DenseNet model is used as the baseline model for the Mask RCNN, and a deep Q-learning network (DQLN) is employed as the classifier. To verify the improved waste classification outcomes of the IDRL-RWODC technique, an extensive experimental analysis is carried out to investigate its efficacy in terms of different measures.

Literature Review
Ziouzios et al. [8] proposed a cloud based classification approach for automated machines in recycling factories using an ML algorithm. They trained an efficient MobileNet model capable of classifying five distinct kinds of waste. Inference can be performed in real time on a cloud server. Different techniques for improving classification performance, such as hyperparameter tuning and data augmentation, were used and described. Adedeji et al. [9] proposed a smart waste material classification algorithm built on a pretrained 50-layer residual network (ResNet-50) CNN, which serves as the feature extractor, and an SVM, which classifies the waste into distinct types or groups such as metal, glass, plastic, and paper.
Chu et al. [10] proposed an MHS DL approach for automatically sorting the waste disposed of by individuals in urban public areas. The system deploys a high-resolution camera to capture waste images and sensor nodes to detect other useful feature data. The MHS employs a CNN based method to extract image features and an MLP model to consolidate the image features and the other feature data in order to classify whether the waste is recyclable. Gan et al. [11] proposed an approach to MSW recycling and classification based on DL, which employs a CNN to build the classification algorithm and intelligent simultaneous interpretation of garbage, enhancing the speed and accuracy of garbage image detection. Narayan [12] proposed DeepWaste, an easy-to-use mobile app that uses a highly optimized DL model to provide users with instantaneous classification of waste into recycling, compost, and trash. The work examines various CNN architectures for detecting and classifying waste materials. Huang et al. [13] proposed a novel combinational classification algorithm based on three pretrained CNN models (NASNetLarge, VGG19, and DenseNet169), which were trained on the ImageNet database and attain high classification performance. In the presented method, the transfer learning (TL) models based on each pretrained network serve as candidate classifiers, and the optimum output of the three candidate classifiers is selected as the final classification result.
Zhang et al. [14] aimed to enhance the performance of waste sorting with a DL model, opening the way to smart waste classification on mobile phones or computer vision terminals. A self-monitoring module is included in the residual network; it integrates the relevant features of each channel map, compresses the spatial feature dimensions, and has a global receptive field. In Togaçar et al. [15], the dataset used for waste classification was reconstructed using an autoencoder (AE) network. Feature sets were then extracted from the two datasets using CNN architectures and combined. The ridge regression (RR) model applied to the combined feature set reduced the number of features and also revealed the effective ones. The SVM model was employed as the classifier in each experiment.
Wang et al. [16] explored the application of DL in the area of environmental protection, applying the VGG16 CNN model to the classification and identification of domestic garbage. The solution first employs the OpenCV library to locate and select the object to be recognized and pre-processes the image into a 224 × 224 pixel RGB image, the input accepted by the VGG16 network. Next, after data augmentation, a VGG16 CNN is constructed on the TensorFlow framework, with the ReLU activation function and an added BN layer to accelerate the model's convergence while preserving recognition performance. In Hasan et al. [17], an automated waste classification system using a DL approach is presented for classifying waste such as paper, metal, plastic, and non-recyclable waste. The classification is performed in real time by a CV model built on the AlexNet CNN architecture, so that waste is dropped into the proper chamber as soon as it is thrown into the dustbin.
Kumar et al. [18] investigated a new method of waste segregation for efficient disposal and recycling using a DL approach. The YOLOv3 model was used within the Darknet neural network framework to train on a self-made dataset. The network was trained for six object classes (glass, cardboard, paper, metal, organic, and plastic waste). Sousa et al. [19] proposed a hierarchical DL model for waste detection and classification in food trays. The proposed two-phase method retains the benefits of current object detectors (such as Faster RCNN) and allows the classification to be performed on high-resolution bounding boxes. In addition, they collected, annotated, and made available to the research community a novel dataset called Labeled Waste in the Wild.

The Proposed Model
In this study, a new IDRL-RWODC technique has been presented for recycling waste object detection and classification. The IDRL-RWODC technique derives a Mask RCNN with DenseNet model for the detection and masking of waste objects in the scene. In addition, the DRL based DQLN technique is employed to classify the detected objects into distinct class labels. The detailed working of these processes is given in the succeeding sections.

Problem Formulation
Let $\{X_i\}_{i=1}^{n}$ denote the training set, where $X_i$ represents the $i$th training sample and $n$ denotes the number of training samples. The labels of the training samples are given as $c = \{c_i\}$, $\forall c_i \in \{1, \ldots, C\}$, in which $C$ denotes the number of waste image classes. The set of labelled training samples is denoted as $J^L = \{(X_i^L, y_i^L)\}_{i=1}^{n_L}$, where $X_i^L$ indicates the $i$th labelled training sample, $y_i^L \in \{1, 2, \ldots, C\}$ denotes its label, and $n_L$ is the number of labelled training samples in $J^L$. Likewise, the unlabelled training set is denoted as $J^U = \{X_i^U\}_{i=1}^{n_U}$, where $X_i^U$ represents the $i$th unlabelled training sample and $n_U$ signifies the number of unlabelled training samples in $J^U$. For every input image $X_i$, the output of the final layer of the DRL model is denoted by $f(X_i)$, i.e., the predictive score of $X_i$ [20]. The output of the penultimate layer of the DRL model for $X_i$ is denoted by $x_i$, and $x_i$ is regarded as the object detected by the Mask RCNN model.

Mask RCNN Based Object Detection Model
In the first stage, the Mask RCNN model is used to detect waste objects in the image. Mask RCNN is a simple, flexible, and general framework for object recognition, detection, and instance segmentation that efficiently detects objects in an image while generating a high-quality segmentation mask for each instance. The region proposal network (RPN), the second component of Mask R-CNN, shares full-image convolutional features with the detection network, thereby enabling nearly cost-free region proposals. The RPN is used in Mask R-CNN in place of selective search, so the RPN shares the convolutional features of the full feature map with the detection network. It jointly predicts object bounds and objectness scores at every position, and it is itself a fully convolutional network (FCN).
Mask R-CNN accomplishes three tasks: target recognition, detection, and segmentation. The input images are first passed to the feature pyramid network (FPN). Five groups of feature maps of different sizes are then generated, and candidate regions are produced by the RPN [21]. The candidate regions are then combined with the feature maps, from which the model obtains the detection, classification, and mask of the target. To further increase the computation speed of the technique, it is adapted to the real-time requirements of a fast-driving anti-collision warning scheme. To this end, the FPN structure and the RPN parameter settings are improved. The improved technique carries out the recognition, detection, and segmentation of targets simultaneously. Fig. 1 illustrates the structure of Mask RCNN for waste object detection.
In the loss function of the Mask RCNN, $p_i$ refers to the predicted probability for the $i$th ROI in the classification loss $L_{cls}$, and $p_i^*$ is the ground truth, equal to one if the ROI is considered foreground and zero otherwise. $t_i$ denotes the vector of parameterized coordinates of the predicted bounding box (Eq. (3)), and $t_i^*$ represents the corresponding ground truth in the localization regression loss, where $r$ indicates the robust loss function used to estimate the regression error. For every ROI, the mask branch produces a $K m^2$-dimensional output that encodes $K$ binary masks of resolution $m \times m$, one for each of the $K$ classes.
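The equations that this paragraph refers to are not reproduced in the text. As a hedged reconstruction, assuming the model uses the standard multi-task loss of Faster R-CNN and Mask R-CNN, the loss can be sketched as follows. The normalizers $N_{cls}$ and $N_{box}$ and the balancing weight $\lambda$ are assumptions taken from that standard formulation, not from this paper, and the equations are deliberately left unnumbered.

```latex
\begin{align*}
L &= L_{cls} + L_{box} + L_{mask} \\
L_{cls} &= \frac{1}{N_{cls}} \sum_i -\left[ p_i^* \log p_i + (1 - p_i^*) \log (1 - p_i) \right] \\
L_{box} &= \frac{\lambda}{N_{box}} \sum_i p_i^* \, r\!\left(t_i - t_i^*\right),
\qquad
r(x) =
\begin{cases}
0.5\,x^2, & |x| < 1 \\
|x| - 0.5, & \text{otherwise}
\end{cases}
\end{align*}
```

In this formulation the factor $p_i^*$ switches the box regression term on only for foreground ROIs, and $L_{mask}$ is the average binary cross-entropy over the $m \times m$ mask of the ground-truth class, so the other $K - 1$ masks do not contribute to the loss.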
In this study, the DenseNet model is employed as the baseline model for the Mask RCNN. DenseNet [22] is a DL framework in which the layers are directly connected to one another, thus attaining efficient information flow among them. Every layer is connected to each subsequent layer of the network, and such networks are referred to as DenseNets. Assume an input image $x_0$ that is passed through the convolutional network. The network contains $N$ layers, and each layer executes a nonlinear transformation $F_n(\cdot)$. Layer $n$ receives the feature maps of all preceding convolutional layers. The feature maps of layers $0$ to $n-1$ are concatenated and represented as $[x_0, \ldots, x_{n-1}]$. Therefore, an $N$-layer network has $N(N+1)/2$ connections. The output of the $n$th layer is expressed as $x_n = F_n([x_0, \ldots, x_{n-1}])$, where $x_n$ is the output of the present $n$th layer, $[x_0, \ldots, x_{n-1}]$ is the concatenation of the feature maps obtained from layers $0$ to $n-1$, and $F_n(\cdot)$ is the composite function of BN and ReLU. Fig. 2 depicts the framework of DenseNet.
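The dense connectivity rule described above can be illustrated with a minimal NumPy sketch. It is not the DenseNet used in the paper: `composite` is a hypothetical stand-in for $F_n$ that uses random 1x1 channel mixing followed by ReLU and omits BN, and the names `growth_rate`, `dense_block`, and `num_connections` are introduced here only for illustration.

```python
import numpy as np

def composite(x, out_channels, rng):
    """Stand-in for F_n: random 1x1 channel mixing followed by ReLU (BN omitted)."""
    w = rng.standard_normal((out_channels, x.shape[0]))
    mixed = np.tensordot(w, x, axes=([1], [0]))  # shape (out_channels, H, W)
    return np.maximum(0.0, mixed)

def dense_block(x0, num_layers, growth_rate, seed=0):
    """Return [x_0, x_1, ..., x_N] where x_n = F_n([x_0, ..., x_{n-1}])."""
    rng = np.random.default_rng(seed)
    features = [x0]
    for _ in range(num_layers):
        stacked = np.concatenate(features, axis=0)  # channel-wise concatenation
        features.append(composite(stacked, growth_rate, rng))
    return features

def num_connections(num_layers):
    """Layer n receives n inputs, so the total is 1 + 2 + ... + N = N(N + 1)/2."""
    return num_layers * (num_layers + 1) // 2

# Toy input with 3 channels and an 8x8 spatial size (assumed values).
feats = dense_block(np.ones((3, 8, 8)), num_layers=4, growth_rate=12)
print(len(feats), num_connections(4))
```

Because every layer sees the concatenation of all earlier feature maps, the number of input channels grows linearly with depth, while each layer only adds `growth_rate` new channels.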

Figure 2: Structure of DenseNet
Each convolutional layer corresponds to a BN-ReLU-Conv sequence. After the convolution is applied to the image, ReLU is applied to the output feature map. This function introduces nonlinearity into CNN models. The ReLU function can be expressed as $f(x) = \max(0, x)$. Average pooling splits the input into pooling regions and calculates the average value of each region. Global average pooling (GAP) computes the average of each feature map, and the resulting vector is passed to the softmax layer. In this work, the DenseNet-169 model is utilized, which builds on the basic DenseNet framework; a DenseNet with $L$ layers has $L(L+1)/2$ direct connections.
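The three operations named in this paragraph (ReLU, global average pooling, and softmax) can be sketched in a few lines of NumPy. This is only an illustration of the definitions; the array shapes and function names are assumptions and do not reflect the actual DenseNet-169 configuration.

```python
import numpy as np

def relu(x):
    """ReLU keeps positive values and replaces negative values with zero."""
    return np.maximum(0.0, x)

def global_average_pool(feature_maps):
    """Average each (H, W) feature map: (C, H, W) -> (C,)."""
    return feature_maps.mean(axis=(1, 2))

def softmax(z):
    """Turn a score vector into probabilities; shifting by the max avoids overflow."""
    shifted = z - z.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

# Two toy 2x2 feature maps (assumed values) pooled into a 2-element vector.
maps = relu(np.arange(8, dtype=float).reshape(2, 2, 2) - 2.0)
probabilities = softmax(global_average_pool(maps))
print(probabilities)
```

GAP reduces every feature map to a single number, so the vector passed to the softmax layer has one entry per channel regardless of the spatial size of the input.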
To boost the object detection outcomes of the DenseNet model, hyperparameter optimization is performed using the dragonfly algorithm (DFA). The DFA was introduced by Mirjalili in 2016 [23]. It is a metaheuristic technique inspired by the static and dynamic behaviors of dragonflies in nature. Optimization has two important phases: exploration and exploitation. Dragonflies model these two phases through their dynamic and static swarming when searching for food or avoiding enemies.
Swarm intelligence (SI) appears in two dragonfly behaviors: feeding and migration. Feeding is modeled as a static swarm in the optimization, whereas migration is modeled as a dynamic swarm. Swarms exhibit three particular behaviors: separation, alignment, and cohesion. Separation means that individuals in the swarm avoid static collisions with their neighbors (Eq. (7)). Alignment refers to matching the velocity of an agent with that of adjacent individuals (Eq. (8)). Lastly, cohesion describes the tendency of individuals to move towards the centre of the swarm (Eq. (9)). Two further behaviors are added to these three fundamental behaviors in the DFA: moving towards food and avoiding the enemy. These behaviors are added because the primary drive of every swarm is survival. Thus, every individual moves towards the food source (Eq. (10)) while avoiding the enemy at the same time (Eq. (11)).
In Eqs. (7)-(11), $X$ denotes the current position of the individual, and $X_j$ refers to the current position of the $j$th neighboring individual. $N$ signifies the number of neighboring individuals [24], and $V_j$ denotes the velocity of the $j$th neighboring individual. $X^+$ and $X^-$ stand for the positions of the food and enemy sources, respectively. After the step vector is computed (Eq. (12)), the position vector is updated (Eq. (13)). The values $s$, $a$, and $c$ in Eq. (12) are the separation, alignment, and cohesion coefficients, respectively, and $f$, $e$, $w$, and $t$ denote the food factor, enemy factor, inertia coefficient, and iteration number, respectively. These coefficients and factors make it possible to perform both exploratory and exploitative behaviors during the optimization. In a dynamic swarm, dragonflies tend to align their flight. In a static swarm, alignment is very low, whereas the cohesion needed for attacking is very high. Accordingly, the alignment coefficient is at its maximum and the cohesion coefficient at its minimum during exploration; during exploitation, the alignment coefficient is at its minimum and the cohesion coefficient at its maximum.
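Eqs. (7)-(13) are not reproduced in the text, so the following is a minimal sketch of one DFA update, assuming the standard formulation of the dragonfly algorithm: separation $S = -\sum_j (X - X_j)$, alignment $A = \frac{1}{N}\sum_j V_j$, cohesion $C = \frac{1}{N}\sum_j X_j - X$, food attraction $F = X^+ - X$, enemy distraction $E = X^- + X$, step vector $\Delta X_{t+1} = sS + aA + cC + fF + eE + w\Delta X_t$, and position $X_{t+1} = X_t + \Delta X_{t+1}$. The function name `dfa_step` and the choice to treat every other individual as a neighbour are assumptions made for illustration. In hyperparameter tuning, each row of `X` could encode one candidate hyperparameter vector; the fitness evaluation that selects the food and enemy positions is not shown.

```python
import numpy as np

def dfa_step(X, dX, food, enemy, s, a, c, f, e, w):
    """Apply one dragonfly update to every individual.

    X, dX: float arrays of shape (n, d) holding positions and step vectors.
    Every other individual is treated as a neighbour (simplifying assumption),
    so at least two individuals are required.
    """
    n = X.shape[0]
    new_dX = np.empty_like(dX)
    for i in range(n):
        others = np.arange(n) != i
        S = -np.sum(X[i] - X[others], axis=0)   # separation term
        A = dX[others].mean(axis=0)             # alignment term
        C = X[others].mean(axis=0) - X[i]       # cohesion term
        F = food - X[i]                         # attraction to the food source
        E = enemy + X[i]                        # distraction from the enemy
        new_dX[i] = s * S + a * A + c * C + f * F + e * E + w * dX[i]
    return X + new_dX, new_dX

# Toy swarm of two individuals in two dimensions (assumed values).
X = np.array([[0.0, 0.0], [2.0, 0.0]])
dX = np.zeros_like(X)
food = np.array([1.0, 1.0])
enemy = np.array([5.0, 5.0])
X_next, dX_next = dfa_step(X, dX, food, enemy, s=0.1, a=0.1, c=0.7, f=1.0, e=0.0, w=0.9)
```

Raising the alignment coefficient and lowering the cohesion coefficient favours exploration, and the opposite setting favours exploitation, which matches the description in the paragraph above.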

DRL Based Waste Classification Model
Once the waste objects are detected in the image, the next step is the classification of the waste objects using the DQLN technique. DRL incorporates three components, namely state, action, and reward. The DRL agent aims to learn a mapping function from the state space to the action space. After taking an action, the DRL agent receives a reward. The goal of the DRL agent is to maximize the total reward. The classification policy of the DQLN model is a function that receives an instance and returns the probability of every label: $\pi(a \mid s) = P(a_t = a \mid s_t = s)$ (14). The aim of the classification agent is to identify the instances in the training data as correctly as possible. Since the classification agent obtains a positive reward when it correctly identifies an instance, it achieves its aim by maximizing the cumulative reward $g_t$. In reinforcement learning, the quality of a state-action combination is computed by the Q function, which can be written recursively based on the Bellman equation. The classification agent maximizes the cumulative reward by solving for the optimum $Q^*$ function [25], and the greedy policy under the optimum $Q^*$ function is the optimal classification policy $\pi^*$ for the DQLN.
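The expressions for the cumulative reward, the Q function, and its Bellman form are not reproduced in the text. As a hedged sketch, assuming the standard reinforcement learning definitions with a discount factor $\gamma \in [0, 1]$ (the symbol $\gamma$ is an assumption, since the text does not define it), they can be written as follows, with the equations left unnumbered.

```latex
\begin{align*}
g_t &= \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k} \\
Q^{\pi}(s, a) &= \mathbb{E}_{\pi}\!\left[ g_t \mid s_t = s,\; a_t = a \right] \\
Q^{\pi}(s, a) &= \mathbb{E}_{\pi}\!\left[ r_t + \gamma \, Q^{\pi}(s_{t+1}, a_{t+1}) \mid s_t = s,\; a_t = a \right] \\
\pi^{*}(a \mid s) &=
\begin{cases}
1, & a = \arg\max_{a'} Q^{*}(s, a') \\
0, & \text{otherwise}
\end{cases}
\end{align*}
```

The third line follows from the first two by splitting $g_t$ into the immediate reward $r_t$ and the discounted return from step $t+1$ onwards, and the last line states the greedy policy that selects the label with the highest optimal Q value.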
In a low-dimensional finite state space, the Q function can be recorded in a table. In a high-dimensional continuous state space, however, the Q function cannot be solved in this way, so the deep Q-learning technique is used, which fits the Q function with a deep neural network (DNN). In deep Q-learning, the interaction data $(s, a, r, s')$ obtained from Eq. (19) are saved in the experience replay memory $M$. The agent randomly samples a mini-batch of transitions $B$ from $M$ and carries out a gradient descent step on the deep Q network according to the loss function in Eq. (20), in which $y$ denotes the target estimate of the Q function. In the expression for $y$, $s'$ denotes the next state of $s$, $a'$ represents the action taken by the agent in state $s'$, and $t = 1$ if terminal = True, otherwise $t = 0$. The gradient step uses the derivative of the loss function (20) with respect to the network parameters $\theta$. The optimum $Q^*$ function is obtained by minimizing the loss function (20), and the greedy policy under the optimum $Q^*$ function attains the maximal cumulative reward. Therefore, the optimum classification policy $\pi^* : S \to A$ for the DQLN is obtained.
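The training loop described in this paragraph can be sketched as follows. The sketch assumes the squared-error loss and target commonly used in deep Q-learning, $y = r + (1 - t)\,\gamma \max_{a'} Q(s', a'; \theta)$ and $L(\theta) = (y - Q(s, a; \theta))^2$ averaged over the mini-batch, since Eqs. (19) and (20) are not reproduced in the text. The class `LinearQ` is a hypothetical linear stand-in for the deep Q network, and `GAMMA`, `lr`, and `batch_size` are assumed values, not settings from the paper.

```python
import random
from collections import deque

import numpy as np

GAMMA = 0.9  # assumed discount factor

class LinearQ:
    """Linear stand-in for the deep Q network: Q(s, .; theta) = theta @ s."""
    def __init__(self, state_dim, num_actions):
        self.theta = np.zeros((num_actions, state_dim))

    def q_values(self, s):
        return self.theta @ s

def td_target(q, r, s_next, terminal):
    """Target y = r + (1 - t) * gamma * max_a' Q(s', a'), with t = 1 if terminal."""
    t = 1.0 if terminal else 0.0
    return r + (1.0 - t) * GAMMA * np.max(q.q_values(s_next))

def train_step(q, batch, lr=0.1):
    """One gradient descent step on the mean squared error; returns the loss."""
    loss = 0.0
    grad = np.zeros_like(q.theta)
    for s, a, r, s_next, terminal in batch:
        err = td_target(q, r, s_next, terminal) - q.q_values(s)[a]
        loss += err ** 2
        grad[a] += -2.0 * err * s  # derivative of err^2 with the target held fixed
    q.theta -= lr * grad / len(batch)
    return loss / len(batch)

def sample_batch(memory, batch_size):
    """Randomly sample a mini-batch B of transitions from the replay memory M."""
    return random.sample(list(memory), k=min(len(memory), batch_size))

# Replay memory M holding (s, a, r, s', terminal) transitions (toy values).
memory = deque(maxlen=1000)
memory.append((np.array([1.0, 0.0]), 0, 1.0, np.array([0.0, 1.0]), True))
agent = LinearQ(state_dim=2, num_actions=2)
first_loss = train_step(agent, sample_batch(memory, batch_size=4))
```

In a classification setting, the state would be the feature vector of a detected object, the action the predicted waste class, and the reward positive for a correct label; repeated calls to `train_step` then drive the Q values towards the targets.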

Performance Validation
The proposed model is simulated using Python 3.6.5 on the benchmark Garbage classification dataset [26] from the Kaggle repository. The dataset includes six waste classes: 403 images of Cardboard, 137 images of Trash, 501 images of Glass, 482 images of Plastic, 504 images of Paper, and 410 images of Metal. Fig. 3 showcases some sample test images from the Kaggle dataset.
The MobileNetV2 and VGG-16 techniques achieved closely similar accuracy values of 0.880 and 0.884, respectively. The DenseNet121 and SSD (MobileNetV2-OID V4) methods reached higher accuracy values of 0.950 and 0.976, respectively. Finally, the IDRL-RWODC algorithm achieved the best performance, with a maximal accuracy of 0.993.

Conclusion
In this study, a new IDRL-RWODC technique has been presented for recycling waste object detection and classification for smart cities. The IDRL-RWODC technique derives a Mask RCNN with the DenseNet model for the detection and masking of waste objects in the scene. To boost the object detection outcomes of the DenseNet model, hyperparameter optimization is performed using the DFA. In addition, the DRL based DQLN technique is employed to classify the detected objects into distinct class labels. The IDRL-RWODC technique is able to recognize objects of varying scales and orientations. To verify the improved waste classification outcomes of the IDRL-RWODC technique, an extensive experimental analysis was carried out to investigate its efficacy in terms of different measures. The experimental results showed that the IDRL-RWODC algorithm performs better than current techniques. In future, the IDRL-RWODC technique can be realized as a mobile application for smartphones to aid the waste object classification process in real time.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.