Video Surveillance-Based Urban Flood Monitoring System Using a Convolutional Neural Network

Abstract: The prevalence of urban flooding worldwide is increasing rapidly with the rise in extreme weather events. Consequently, this research proposes an Automatic Flood Monitoring System (ARMS) based on a video surveillance camera. Initially, videos are collected from a surveillance camera and converted into video frames. Features are then extracted from the frames using a Histogram of Oriented Gradients (HoG). After feature extraction, the frames are enhanced using a median filter to remove unwanted noise from the image. The next step is water level classification using a Convolutional Neural Network (CNN), which classifies the water level in the images. The performance of the method is evaluated using various parameters. The accuracy of the proposed method is 11% higher than that of the k-Nearest Neighbors (KNN) classifier and 5% higher than that of the Artificial Neural Network (ANN) classifier, and the processing time is 7% less than that of the KNN classifier and 4% less than that of the ANN classifier.


Introduction
Natural catastrophes such as landslides, hurricanes and typhoons pose a significant risk to life and property worldwide [1,2]. Floods are the most common natural disasters, accounting for 41% of all natural hazards that have occurred worldwide during the past decade [3]. However, these estimates only account for "reported" large-scale flooding events, generally considered to be overflow events [4]. A flood severely disrupts human and social activities; floods are usually characterized by the presence of water in normally dry areas [5]. The value of flood forecasting cannot be overstated given the growing complexity of rising sea levels and the number of people living in flood-prone areas [6]. Major catastrophes, such as earthquakes, have negative effects, such as collateral damage and financial disruption, that cannot be prevented, but thorough preparation can minimize the calamitous consequences [7]. According to these findings, it is impossible to include details on watercourse conditions, forms of flooding, etc. [8]. Flooding results from vast amounts of water, more than can be handled by natural or man-made conveyance systems [9]. Therefore, it is necessary to implement a flood detection, warning and response system that can forecast more accurately and reliably [10].
In 2020, Zakaria et al. [11] analyzed a flood monitoring, prediction and rescue (FMPR) system centred on the abstractions of management. Gaia role model-based agent functions were defined, standard expression-based existence properties were specified, and predicate-based security properties were specified. In 2015, Kamilaris et al. [12] presented the test plan, installation and subsequent analysis of the SMS. At the request of users, water level notifications are sent via SMS. When the water level exceeds a user-defined threshold, the device sends timely updates and warnings via SMS to vulnerable populations and relevant agencies. In 2019, Senthilnath et al. [13] presented a flood monitoring analysis based on SMAP. The outcome shows that, based on SMAP results, the flood region can be mirrored, and H emission data are more adaptive to V polarization data. Muhadi et al. [14] developed a model of a real-time flood water level tracking device using Arduino Uno. The first test measured the total amount of time; the second test determined whether the device could use three LEDs as its early warning mechanism to alert people from afar. In 2015, Menon et al. [15] proposed that surface water change be identified using a combination of image features. Subsequently, to retrieve and map the described changes, ANN, support vector machine (SVM) and maximum likelihood (ML) classification techniques were used.

Proposed Methodology
This section describes video surveillance-based identification of the water level using CNN classifiers. Video sequences are collected from a video surveillance camera and converted into video frames. The HoG is then used to extract features from the video frames, and median filters are applied to the frames. After the enhancement process, CNN classifiers are used to identify the water level. The structure of the method is given in Fig. 1.

The median filter is a non-linear digital filtering method for removing noise from images and signals. This type of noise reduction is a common pre-processing step used to improve the reliability of subsequent image analysis. The filter operates by sliding a window over the image. The filtered image is created by taking the median of the values in the input window and placing it in the output image at the centre of that window.
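The sliding-window median operation can be illustrated with a minimal sketch. The function name `median_filter`, the 3 × 3 default window and the border policy (clamping coordinates to the image edge) are assumptions made for illustration; the paper does not specify them.

```python
from statistics import median


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2D list of grey levels.

    Border pixels are handled by clamping window coordinates to the image
    edge (an assumption; the paper does not state its border policy).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbourhood centred on (r, c).
            window = [
                image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            ]
            # The output pixel is the median of the window values.
            out[r][c] = median(window)
    return out


# A flat grey patch with two impulse-noise pixels (255 and 0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 0, 10],
]
filtered = median_filter(noisy)
```

In this toy patch both isolated outliers are replaced by the surrounding grey level, which is the behaviour that makes the median filter suitable for impulse noise.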

Histogram of Oriented Gradients (HoG)
The HoG descriptor is based on the accumulation of gradient directions over the pixels of a small spatial region known as a 'cell' and the subsequent construction of a 1D histogram. Let $L$ be an intensity (grey scale) function describing the image to be analyzed. The image is divided into cells of $M \times M$ pixels, and the orientation of the gradient in each pixel is computed as:

$$\theta_{x,y} = \tan^{-1} \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$$

Subsequently, the orientations $\theta_i^j$, $i = 1, \dots, M^2$, i.e., those belonging to the same cell $j$, are quantized and accumulated in an $N$-bin histogram.
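As a rough illustration of the per-cell step, the sketch below computes central-difference gradients for the interior pixels of one cell and accumulates their orientations into a histogram. The function name `cell_histogram`, the choice of 9 bins, unsigned orientations in [0, 180) degrees and unit votes (rather than magnitude-weighted votes) are all assumptions, not details taken from the paper.

```python
import math


def cell_histogram(cell, bins=9):
    """Orientation histogram for one cell given as a 2D list of grey levels.

    Uses central differences on interior pixels, unsigned orientations in
    [0, 180) degrees, and unit votes (all illustrative assumptions).
    """
    hist = [0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical difference
            if gx == 0 and gy == 0:
                continue  # flat region: no gradient direction to vote for
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // bin_width), bins - 1)] += 1
    return hist


# Intensity increases left to right: every gradient points along 0 degrees.
horizontal_ramp = [[0, 1, 2, 3] for _ in range(4)]
# Intensity increases top to bottom: every gradient points along 90 degrees.
vertical_ramp = [[y] * 4 for y in range(4)]

h_hist = cell_histogram(horizontal_ramp)
v_hist = cell_histogram(vertical_ramp)
```

A full descriptor would concatenate (and usually normalize) the histograms of all cells; that step is omitted here.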

Noise Reducing Performance of the Median Filter
For an image with zero-mean noise under a normal distribution, the noise variance of median filtering is approximately:

$$\sigma_{med}^2 = \frac{1}{4 n f^2(n)} \approx \frac{\sigma_i^2}{n + \frac{\pi}{2} - 1} \cdot \frac{\pi}{2}$$

where $\sigma_i^2$ represents the input noise power, $n$ represents the size of the median filtering mask, and $f(n)$ represents the function of the noise density [16]. The noise variance of the average filter is given below:

$$\sigma_0^2 = \frac{1}{n} \sigma_i^2$$

After removing the unwanted noise in the images, the water level is classified using a CNN. The structure of the CNN is given in Fig. 3.
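To give a feel for the magnitudes involved, the sketch below evaluates two commonly cited expressions for Gaussian input noise: an approximate median-filter output variance of (σ_i² / (n + π/2 − 1)) · (π/2) and an average-filter output variance of σ_i² / n. These closed forms and the function names are assumptions stated for illustration and should be checked against [16].

```python
import math


def median_filter_noise_variance(input_variance, n):
    """Approximate output noise variance of an n-sample median filter
    (assumed Gaussian input noise)."""
    return input_variance / (n + math.pi / 2 - 1) * (math.pi / 2)


def average_filter_noise_variance(input_variance, n):
    """Output noise variance of an n-sample average (mean) filter."""
    return input_variance / n


# Unit input noise power and a 3 x 3 mask (n = 9 samples).
med_var = median_filter_noise_variance(1.0, 9)
avg_var = average_filter_noise_variance(1.0, 9)
```

Under these assumptions the average filter suppresses Gaussian noise slightly more than the median filter of the same size; the median filter's advantage lies in impulse noise and edge preservation.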

Convolutional Neural Network
Convolutional layer: A convolutional operation is applied to the input, and the result is passed to the next layer. It converts all the pixels in its receptive field into a single value. Let $f_k$ be a filter with a kernel size of $n \times m$ applied to the input $x$ [17]. The number of input connections of each neuron is $n \times m$, and the resulting output of the layer is given below:

$$C(x_{u,v}) = \sum_{i=-n/2}^{n/2} \sum_{j=-m/2}^{m/2} f_k(i,j)\, x_{u-i,\,v-j}$$

To obtain a richer and more diverse representation of the input, several filters $f_k$ with $k \in M$ can be applied to the input. The filters $f_k$ are realized by sharing the weights of adjacent neurons [18].
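A single-filter convolution can be sketched as follows, assuming the usual double sum of kernel weights f(i, j) multiplied by input values x(u − i, v − j). The function name `convolve2d`, the odd kernel size and the "valid" output region (no padding) are illustrative assumptions.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a 2D list with an odd-sized kernel."""
    kh, kw = len(kernel), len(kernel[0])
    hh, hw = kh // 2, kw // 2
    rows, cols = len(image), len(image[0])
    out = []
    for u in range(hh, rows - hh):
        row = []
        for v in range(hw, cols - hw):
            total = 0
            for i in range(-hh, hh + 1):
                for j in range(-hw, hw + 1):
                    # f(i, j) * x(u - i, v - j): the kernel is flipped,
                    # which distinguishes convolution from correlation.
                    total += kernel[i + hh][j + hw] * image[u - i][v - j]
            row.append(total)
        out.append(row)
    return out


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
shift = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]  # f(0, 1) = 1 selects x(u, v - 1)

same = convolve2d(image, identity)
shifted = convolve2d(image, shift)
```

Many deep learning libraries actually implement cross-correlation (no kernel flip) under the name "convolution"; because the weights are learned, the two are interchangeable in practice.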
Max pooling: This is a sub-sampling scheme that selects the maximum values to reduce the size of the output by applying the maximum function to the input $x_i$. Let $n$ be the filter size; then, the output is calculated as follows:

$$M(x_i) = \max\{\, x_{i+k,\,i+l} : |k| \le n/2,\ |l| \le n/2,\ k, l \in \mathbb{Z} \,\}$$

Rectified linear unit: Using ReLU helps avoid rapid growth in the computation needed to run the neural network; as the CNN grows in size, the computational cost of adding more ReLUs increases only linearly. In practice, the ReLU activation is applied immediately after a convolution layer, and the result is then max-pooled. In multi-layer or deep neural networks, ReLU is a non-linear activation function. It can be represented as follows:

$$f(x) = \max(0, x)$$

where $x$ is an input value. ReLU's output is the larger of zero and the input value. A ReLU is a neural network cell that uses the following activation function to compute its output given $x$ [19]:

$$R(x) = \max(0, x)$$

Using ReLU cells is more powerful than using perceptron cells and provides more information than binary units.
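The two operations can be chained in a few lines. This is a minimal sketch: the function names, the 2 × 2 window and the non-overlapping (stride equal to window size) pooling are assumptions chosen for illustration rather than details of the network used in this work.

```python
def relu(x):
    """Rectified linear unit: the larger of zero and the input."""
    return max(0, x)


def max_pool2d(feature_map, size=2):
    """Non-overlapping size x size max pooling over a 2D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r + dr][c + dc]
                for dr in range(size)
                for dc in range(size)
            )
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


feature_map = [
    [-1, 2, 0, -3],
    [4, -5, 1, 1],
    [0, 0, -2, -2],
    [-7, 3, -1, 6],
]
# ReLU is applied element-wise after the convolution, then the result is pooled.
activated = [[relu(v) for v in row] for row in feature_map]
pooled = max_pool2d(activated)
```

Each 2 × 2 block of the activated map is reduced to its largest value, halving both spatial dimensions.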
Fully connected layer: The input to the fully connected layer is the output from the final pooling layer, which is flattened and then fed into the fully connected layer [20]. This results in a matrix as follows:

Output layer: The output layer in a CNN, as mentioned previously, is a fully connected layer, where the input from the other layers is flattened and transformed into the number of classes required by the network [21]. The output vector $x$ is:

Softmax layer: The softmax function turns a vector of $K$ real values into a vector of $K$ real values that sum to 1 [22]. For this reason, it is usual to append a softmax function as the final layer of the NN. For each component $1 \le j \le K$, the output is calculated as follows:

$$\mathrm{softmax}(x)_j = \frac{e^{x_j}}{\sum_{i=1}^{K} e^{x_i}}$$
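The softmax computation, each exponentiated component divided by the sum of all exponentiated components, can be sketched as below. Subtracting the maximum before exponentiation is a standard numerical-stability trick added here as an assumption; it does not change the result.

```python
import math


def softmax(values):
    """Turn a list of real scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three water-level classes.
probs = softmax([2.0, 1.0, 0.1])
```

The largest score receives the largest probability, and the ordering of the inputs is preserved in the outputs.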

Results and Discussion
In this section, photographs taken by surveillance cameras mounted along the water are analyzed. Sample images are shown in Fig. 4; one image under normal conditions and another under overflowing conditions were used to analyze the practicality of each picture [23]. The photographs had a resolution of 1270 × 620, and the ground truth images were segmented manually.
The flood image dataset includes diverse scenes from residential, suburban and geological settings, and it is useful for further flood monitoring analysis, as shown in Fig. 4. Fig. 5 shows the final water level analysis after applying the CNN to the frames from which unwanted noise has been removed. Performance measures such as precision, recall and F1-score are defined below, where $TP$, $FP$ and $FN$ denote true positives, false positives and false negatives:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

Fig. 6 presents the F1-score, recall and precision values, which are compared with those of two existing methods, the KNN and ANN algorithms. In Fig. 6a, the F1-scores of the existing algorithms are lower than that of our proposed method because our technique enhances the quality of the image, so the water level can be determined accurately. In Fig. 6b, the recall of the CNN method on the enhanced images is compared with that of KNN and ANN on the blurred images. In Fig. 6c, the precision of our proposed CNN method is higher for each image captured from the surveillance camera. Overall, our method was shown to have higher precision, recall and F1-score than the existing methods.

Here, the overall accuracy reached 98% on average with median filtering, while that of the existing KNN algorithm increased by 8% and that of ANN increased by 9%. In Fig. 7b, the processing time decreased for our proposed work. Finally, our proposed method increased the accuracy up to 98%; therefore, it has better accuracy. The CNN was found to be the most promising image processing technique for monitoring water level features from digital images, with evaluation results higher than 98%.

The accuracy metric is often used to evaluate the system's efficiency in an interpretable way. The test accuracy is commonly confused with the validation accuracy, which is the accuracy calculated on a dataset that is not used for training but is used to validate the model's generalization capacity.
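For reference, precision, recall and F1-score can be computed from true-positive, false-positive and false-negative counts using their standard definitions. The function name and the counts in the example are hypothetical and are not results from this study.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1-score from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical counts: 8 flooded frames found, 2 false alarms, 2 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

The guards against zero denominators return 0.0 when a class is never predicted or never present, which is one common convention among several.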
The loss can be calculated using training and validation data, and its meaning is determined by how well the model performs on these two sets. It is the sum of the errors made for each sample in the training or validation set. The loss value indicates how well or poorly a model performs after each iteration. The limitations of the existing models, which include a lack of reliability because hostile environments are not considered and too few parameters are taken into account, have been overcome in the proposed model. A confusion matrix is a table that shows how well a classification model (or "classifier") performs on a set of test data for which the true values are known. The percentages for the proposed and existing methods against the known true values are shown in the confusion matrix in Tab. 1.
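A confusion matrix and the accuracy derived from it can be built as in the following sketch. The class names, labels and function names are hypothetical placeholders; they do not reproduce the data in Tab. 1.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns index the predicted class."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


classes = ["normal", "flood"]  # hypothetical class names
y_true = ["normal", "normal", "flood", "flood", "flood"]
y_pred = ["normal", "flood", "flood", "flood", "normal"]

cm = confusion_matrix(y_true, y_pred, classes)
acc = accuracy(cm)
```

Off-diagonal entries show which classes are confused with each other, which a single accuracy figure cannot reveal.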

Conclusion
A novel technique for automatic flood detection and monitoring in video surveillance systems was presented in this paper. The Google dataset was used for the flood monitoring images. From the database, the water level can be identified using a classification algorithm. The features were extracted using the HoG method, and the unwanted noise was reduced using the median filtering technique. After that, the CNN classification algorithm was used to analyze the water level in the video frames. The output was compared with existing methods, such as ANN and KNN classifiers. The key advantage of this automatic detection process is that it provided the highest accuracy of 98% with negligible validation loss using the

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.