Implementation of K-Means Algorithm and Dynamic Routing Protocol in VANET

With the growth of Vehicular Ad-hoc Networks (VANETs), service delivery over these networks is gaining attention in intelligent transportation systems. However, the mobility of vehicular networks causes frequent route disconnections, especially during data delivery. In both developed and developing countries, a great deal of time is lost to traffic congestion. This has significant negative consequences, including driver stress from longer travel times, decreased productivity for personal and commercial vehicles, and increased emissions of hazardous, air-polluting gases that affect public health in densely populated areas. Clustering is one of the most powerful strategies for achieving a consistent topological structure. Two algorithms are presented in this work. First, a k-means clustering algorithm performs dynamic grouping that fits the dynamic topology of vehicular networks. The suggested clustering reduces overhead and the traffic management burden. Second, a dynamic routing protocol is proposed for inter- and intra-cluster routing, which increases the overall Packet Delivery Ratio and decreases the End-to-End latency. Relative to existing cluster-based approaches, the proposed protocol achieves improved Throughput, Packet Delivery Ratio, and End-to-End delay across scenarios with different numbers of vehicular nodes in the network.


Introduction
Vehicular Ad hoc Networks (VANETs) are an active research area in pervasive computing [1]. VANETs are one of the application areas of Intelligent Transportation Systems (ITS): they allow drivers to interact and coordinate with each other to prevent dangerous conditions before they happen, thus enhancing driver safety and comfort [2]. Inter-vehicle communication is required for route planning, traffic control, emergency-message transmission and safe driving [3]. VANET applications are generally time-critical. Their main requirement is the rapid propagation of data across the network area in question. Approximately 60% of road accidents can be avoided by the prompt distribution of emergency signals to local and remote vehicles. Broadcast message propagation is divided into two types: single-hop and multi-hop transmission, as shown in Fig. 1. In a dense, flat V2V network, a conventional multi-hop broadcast scheme results in packet drops, high communication cost and high packet delivery delay. A stable communication infrastructure for message propagation is needed to address these disadvantages of a flat V2V network. A cluster-based VANET forms a trusted backbone for the efficient transmission of messages that reaches all vehicles inside the network [4,5]. VANET applications such as safety and complex road-situation reporting need a highly stable network topology, so an enhanced clustering technique is required in such cases. Clustering is an essential way of grouping vehicles and efficiently coordinating wireless communications.
In VANETs, cluster-based routing helps reduce network dynamics [6]. Clustering mechanisms often differ from each other because their formation requirements differ, and these requirements depend on the functionality and the application domain. In every case, however, a node either serves as a cluster member (CM) or is selected as the cluster head (CH).
CMs are normal nodes, whereas CHs handle inter-cluster and intra-cluster transmission of information in VANETs [7], as indicated in Fig. 2. CHs are chosen based on their capabilities so as to optimize network efficiency, so CH selection is necessary for efficient communication. In this paper, we therefore propose a dynamic clustering protocol that uses k-means for cluster formation and CH selection, which increases the overall packet delivery ratio and reduces end-to-end latency in VANETs. K-means clustering divides the region into four segments, and each segment has several CHs.
The objective function is then determined using key vehicle parameters, such as position (X, Y), speed, direction and point of interest (POI). The resulting objective function value helps create more stable clusters and benefits the data transfer process by favouring more stable routes. With this clustering, each cluster has particular interests, such as parking data, accident alerts, and congestion information. When a CH receives a message, it checks whether the vehicles within its cluster are interested in the message. If they are, the CH transmits the data to its members. Otherwise, it transmits the data to the next CH. This decreases the distribution of irrelevant data in the network. The data is transmitted via the nearest neighbouring CHs along a path that is established from the vehicles' positions.
The main objective of this paper is to create a clustering protocol that increases the packet delivery ratio (PDR) and decreases the end-to-end (E2E) delay via K-means clustering and dynamic routing. First, a clustering algorithm is designed that divides the region into segments using K-means and then elects CHs by taking into account vehicle position, direction, speed, POI and destination. With this context-based clustering, each cluster has particular interests, such as parking data, accident alerts, and congestion information. Second, a dynamic routing protocol is defined for inter-cluster and intra-cluster transmission.
The rest of the article is arranged as follows. Section 1 gives the introduction. A literature review is given in Section 2. Section 3 discusses the clustering model. Section 4 presents the dynamic routing protocol. Finally, simulation analysis and conclusions are discussed in Sections 5 and 6, respectively.

Literature Review
Numerous clustering procedures and techniques for VANETs have been proposed in recent years. Several stable clustering schemes for VANETs were suggested in [8,9]. However, these works conclude that the quality of the cluster head cannot be sustained because of high vehicle growth and the constantly varying topology of vehicles.
In Bello-Salau et al. [10], a new routing-based algorithm is proposed to improve essential parameters such as path loss, transmit power, and received signal strength. The algorithm also improves the reliability of the network. Their findings suggest that it improves vehicle communication about road anomalies, alerting drivers so that they can navigate anomalous roads and road accidents are reduced.
In Song et al. [11], a cluster-based directional routing (DBR) protocol is suggested in which a node sends data to a nearby CH whose direction of movement matches the message's communication direction. The communication direction is determined from the coordinates of the node's location and the destination. In Jerbi et al. [12], the authors suggested an enhanced greedy traffic-aware routing protocol (GyTAR), an intersection-based geographic routing protocol. It uses the idea of clusters between adjacent intersections to transmit the data. The vehicle density and load-aware (VDLA) routing protocol for VANETs was suggested in Zhao et al. [13]; it chooses a sequence of junctions to build the path to the destination. The selection is based on the real-time vehicle density, the traffic density and the distance to the destination.
A cluster-based VANET connectivity maintenance algorithm called AODV-CV is suggested in Abuashour et al. [14], where the CH is chosen based on the velocity of all vehicles within the cluster [15] region. As the velocity increases, AODV-CV performs better than AODV in terms of throughput. In Louazani et al. [16], a new Cluster Based Routing (CBR) protocol named CBVANET was suggested. The focus of this model was the architecture of the clustering system for communication between VANET vehicles. By decreasing the cluster formation, election and shifting durations, this model reduced the delay in VANET. The vehicle with the lowest velocity was selected as CH. The AODV-CV performs better in generation time and shifting time.
In Malathi et al. [17], the authors presented a cluster-based routing protocol that takes into account a vehicle's target and perspective for CH selection and routing. However, the proposed work is based on dynamic clustering, which increases the clustering message overhead.
A new algorithm for forming a cluster architecture and electing CHs suitable for vehicular networks is suggested in Mohammed Nasr et al. [18]. It also presents a novel clustering-based routing approach that ensures efficient data transmission among the vehicles.
A new protocol for cluster-based lifetime routing (CBLTR) was suggested in Abuashour et al. [14]. This protocol aims to maximize route stability and average throughput, reduce the E2E delay and decrease clustering overhead messages.

Clustering in Vehicular Ad-hoc Network
K-means is one of the most effective data mining procedures [19,20] among clustering algorithms, mainly because of its simplicity, its scalability and the ease with which it can be adjusted to different scenarios and domains. K-means has some well-known shortcomings, however. In particular, the number of clusters, k, is required as an input, and choosing k is a recurring question in data clustering. Here, the number of clusters is given as input, and a modified k-means is then employed to split the vehicles into clusters.
For cluster formation, we apply k-means [21] so that clusters are created from three quantities: the x coordinate, the y coordinate, and the Euclidean distance. The core of the K-means algorithm is as follows. First, the road is split into sectors, and the number of clusters for each sector (segment) is k. Then k cluster centers are initialized. The distance from each vehicle to each cluster center is determined, and each vehicle is assigned to the closest cluster center. The cluster centers are updated based on the outcome of this partition. The method loops until the predefined number of iterations is reached, and the final clusters are obtained at the end of the last iteration. CH selection is weight-based and considers the following factors: location, direction, velocity, and POI. If a vehicle joins a cluster at any unit of time, it becomes a member of that cluster and transmits a CH-REQ to the corresponding CH. Each cluster has a threshold level (TL), and a new CH election begins when a CH [22,23] exceeds the TL.
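The assignment and update loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it clusters the vehicles of a single sector using only their (x, y) positions, draws the initial centers at random from the vehicles, and the function name `kmeans`, the iteration count and the sample coordinates are all hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D vehicle positions; returns (centers, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k vehicles picked at random.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each vehicle joins the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, labels


# Hypothetical positions: two groups of vehicles far apart on the map.
positions = [(10, 12), (14, 9), (11, 15), (480, 510), (495, 505), (505, 490)]
centers, labels = kmeans(positions, k=2)
```

With these sample positions the first three vehicles end up in one cluster and the last three in the other. The fixed iteration count mirrors the "predefined iterations" stopping rule in the text; a convergence check on the centers would be a common alternative.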
The handover procedure is started after CH selection. Each vehicle must be allocated to a single cluster based on its position for each unit of time. In weight-based clustering, vehicles are characterized by the following parameters to create clusters:
a) Location: The vehicle's location is a significant parameter and can be obtained using GPS. The GPS system supplies the OBU with the information that determines its current position.
b) Direction: A vehicle's direction is calculated from the difference between the last two positions obtained by the GPS system.
c) Velocity: The speed of the vehicle is measured by the OBU. Vehicle nodes in the same cluster should have the smallest possible difference in velocity.
d) Point-of-Interest list: Every vehicle has certain interests. Some vehicles are concerned with data about parking or nearby restaurants, and some are only concerned with information about incidents, congestion, etc. A vector is used to describe a vehicle's interests. Each vehicle "k" maintains an interest vector P. Vehicles in the same cluster should have the same POI.
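As a small illustration of parameter (b), the direction can be derived from two consecutive position fixes. The sketch below assumes the fixes have already been projected to planar (x, y) coordinates in metres; the function name `heading_degrees` and the convention of measuring the angle counter-clockwise from the +x axis are hypothetical choices, not taken from the paper.

```python
import math


def heading_degrees(prev_fix, curr_fix):
    """Direction of travel in degrees, counter-clockwise from the +x axis.

    Assumes both fixes are planar (x, y) coordinates, not raw latitude
    and longitude.
    """
    dx = curr_fix[0] - prev_fix[0]
    dy = curr_fix[1] - prev_fix[1]
    # atan2 handles all four quadrants; the modulo maps (-180, 180] to [0, 360).
    return math.degrees(math.atan2(dy, dx)) % 360.0


east = heading_degrees((0.0, 0.0), (5.0, 0.0))    # moving along +x
south = heading_degrees((0.0, 0.0), (0.0, -5.0))  # moving along -y
```

Two vehicles whose headings differ by a small angle are candidates for the same cluster under this parameter.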
The clusters are formed based on the above parameters, and each cluster has one CH. The CH keeps track of its vehicles' interests. When a CH receives data, it first checks whether the vehicles within the cluster are interested in it. If they are, the CH transmits the packet to its members. Otherwise, the message is routed to the next CH. This decreases the number of communication messages inside the cluster.
The cluster head is selected as follows. A weight-based CH selection algorithm is suggested for optimal CH selection. Each node calculates a weight according to specific parameters, and the node with the highest weight is selected as the CH. The total time required to complete the CH selection is T, which is split into four sub-intervals (T1 to T4). CH selection involves the following steps.
Each vehicle acquires its clustering factors (position, direction, velocity, and POI) from its on-board unit; the time required for this is T1.
Each vehicle then discovers the neighbouring vehicles whose POI is similar and transmits its clustering factors to them. When a node obtains the clustering factors from its neighbours, it maintains a neighbour list (N_List) with one entry per neighbour. The time taken for this is T2. Each entry comprises the neighbouring vehicle's ID, location, speed, destination, POI and POI compatibility. Two measures, cosine similarity and soft cosine similarity, are used to compute the Point-of-Interest compatibility (PC). For instance, the PC between vehicle "a" and vehicle "b", with "n" as the number of neighbouring nodes, is calculated using the following equations, where $a_i$ and $b_i$ are the components of the two interest vectors:
cosine similarity
$$\mathrm{cosine}(a, b) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} \quad (1)$$
soft cosine similarity
$$\mathrm{soft\_cosine}(a, b) = \frac{\sum_{i,j}^{n} s_{ij}\, a_i b_j}{\sqrt{\sum_{i,j}^{n} s_{ij}\, a_i a_j}\,\sqrt{\sum_{i,j}^{n} s_{ij}\, b_i b_j}} \quad (2)$$
Here $s_{ij} = \mathrm{similarity}(\mathrm{feature}_i, \mathrm{feature}_j)$.
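The two compatibility measures named above can be sketched as follows. This is an illustrative sketch that uses the standard definitions of cosine and soft cosine similarity; the interest categories, the vectors and the feature-similarity matrix `s` are hypothetical, and how the paper combines the two values into a single PC is not specified here.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity between two interest vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def soft_cosine_similarity(a, b, s):
    """Soft cosine: s[i][j] is the similarity between feature i and feature j."""

    def bilinear(u, v):
        return sum(
            s[i][j] * u[i] * v[j]
            for i in range(len(u))
            for j in range(len(v))
        )

    denom = math.sqrt(bilinear(a, a)) * math.sqrt(bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Hypothetical interest categories: [parking, restaurants, accidents].
vehicle_a = [1, 0, 0]
vehicle_b = [0, 1, 0]
# Assumed feature similarities: parking and restaurants are half related.
s = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

plain = cosine_similarity(vehicle_a, vehicle_b)
soft = soft_cosine_similarity(vehicle_a, vehicle_b, s)
```

Here the plain cosine similarity is 0 because the two vehicles share no category, while the soft variant credits the relatedness of the two categories. With an identity matrix for `s`, the soft cosine reduces to the plain cosine.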
Next, the mean Euclidean distance (AED) between vehicle "a" and each of its neighbours "b" is measured using the following equation:
$$\mathrm{AED}_a = \frac{1}{n}\sum_{b=1}^{n}\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2} \quad (3)$$
Here $(x_a, x_b)$ and $(y_a, y_b)$ are the position coordinates of nodes "a" and "b", and "n" represents the number of neighbours. The time taken for this process is T3. T4 is static and identical for all vehicle nodes.
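A worked example of the mean Euclidean distance, as a minimal sketch: the function name and the sample coordinates are hypothetical, and returning 0.0 for a vehicle without neighbours is an assumption made only to avoid a division by zero.

```python
import math


def mean_euclidean_distance(vehicle, neighbours):
    """Average straight-line distance from `vehicle` to each neighbour."""
    if not neighbours:
        return 0.0  # assumption: no neighbours means no distance to average
    return sum(math.dist(vehicle, nb) for nb in neighbours) / len(neighbours)


# Neighbours at distances 5 and 10 from the origin give a mean of 7.5.
aed = mean_euclidean_distance((0, 0), [(3, 4), (6, 8)])
```

A vehicle with a small AED sits close to the middle of its neighbourhood, which is why a low value favours it in the CH election.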
Each vehicle determines the waiting time "Tw" from the following quantities. Here, α signifies the number of times vehicle "k" was previously selected as a CH, $|N\_List(k)|$ is the number of neighbouring nodes of node "k", and R represents a random number between 0.1 and 0.2. The node waits for "Tw" and then determines the Weight Value (WV). If a CH request is received within "Tw", the vehicle does not compute the WV; it accepts the requesting vehicle as its CH.
Each node computes the WV after "Tw". The node sends out the CH advertisement message immediately after computing the WV. Because of R, each node has a different waiting time. The WV is determined such that, to increase the WV of a vehicle, $AED_k$ and $AV_k$ should be as low as possible.

Dynamic Routing Protocol
Once the network clustering structure has been created, a cluster member that needs to transfer packets to a designated destination [24,25] sends them to its CH. The CH then forwards the packets to the destination using the dynamic routing protocol. The routing protocol is divided into two sub-protocols.

Intra-sector Protocol
The suggested routing protocol disseminates data packets via the chosen CHs within the sector. To maintain routing data, each CH constructs a routing table that stores the IDs of its neighbouring CHs and their positions. When a CH receives data, it chooses from its routing table the candidate CHs situated near the destination, regardless of the location of the CH itself. It then sends the data to the CH nearest to the destination. If no neighbouring CH is near the destination node, the local CH uses a store-and-forward procedure as a recovery mechanism: it saves the data in a buffer and keeps it until another relay CH is found. Algorithm 1 describes the steps taken to propagate the data inside the sector. If a node receives data at any point during the simulation, it first checks its routing table and chooses the CH with the least distance to the destination. Lastly, if the routing table is empty, the current CH follows the store-and-forward method.
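The intra-sector forwarding rule can be sketched as below. This is a hedged illustration rather than a transcription of Algorithm 1: the `ClusterHead` class, its attribute names and the `flush` retry method are hypothetical, and the sketch buffers a packet only when the routing table is empty.

```python
import math
from collections import deque


class ClusterHead:
    """Minimal CH model: a routing table of neighbour CHs plus a packet buffer."""

    def __init__(self, ch_id, position):
        self.ch_id = ch_id
        self.position = position
        self.routing_table = {}  # neighbour CH id -> (x, y) position
        self.buffer = deque()    # packets stored until a relay CH appears

    def forward(self, packet, dest):
        """Return the neighbour CH closest to dest, or buffer the packet."""
        if not self.routing_table:
            self.buffer.append((packet, dest))  # store-and-forward recovery
            return None
        return min(
            self.routing_table,
            key=lambda cid: math.dist(self.routing_table[cid], dest),
        )

    def flush(self):
        """Retry buffered packets; return the (packet, next_hop) pairs sent."""
        sent = []
        for _ in range(len(self.buffer)):
            packet, dest = self.buffer.popleft()
            hop = self.forward(packet, dest)  # re-buffers if still no relay
            if hop is not None:
                sent.append((packet, hop))
        return sent


ch = ClusterHead("A", (0, 0))
first_try = ch.forward("pkt-1", (400, 0))   # no neighbours yet: buffered
ch.routing_table = {"B": (100, 0), "C": (300, 0)}
retried = ch.flush()                        # "C" is nearest to the destination
```

A fuller version would also age out routing-table entries as neighbouring CHs move, which the sketch leaves out.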

Inter-sector Protocol
The proposed protocol disseminates packets across sectors via the selected CHs as follows. If a source vehicle "s" wishes to transmit data to vehicle "d", "s" sends the message, including the target location Tloc $(x_{tl}, y_{tl})$, to its corresponding CH "k". After that, the direction of communication (DC) is computed. DC is aligned with the path of CH "c" if their cosine similarity (CSM) is greater than 0. The relation between DC and the velocity vector $V_c$ is calculated using the following equation:
$$\mathrm{CSM}(DC, V_c) = \frac{DC \cdot V_c}{\lVert DC \rVert\,\lVert V_c \rVert} \quad (6)$$
where DC is taken between the vehicle and the target position. Each CH "c" has a velocity vector $V_c$. Also, every CH "c" has a certain target $(x_{dest}, y_{dest})$. For selecting the forwarding node, the distance between the target position and the CH's target, denoted $DD_c$, is also considered.
To choose the next forwarding CH node, a CH uses the targets and directions of its neighbouring CHs. First, it determines the DC of the message. Then, for every neighbouring CH "c", it evaluates the CSM with that neighbour's velocity vector using Eq. (6). After evaluating CSM and $DD_c$ for every neighbouring CH "c", a CH "k" computes the routing metric (RM) for each neighbour CH "c".

The neighbouring CH in another sector with the highest RM is chosen as the next-hop CH. The next-hop CH then checks whether Tloc is located within its cluster. If it is, the message is forwarded to Tloc. Otherwise, the next-hop CH is selected again using RM, and the process repeats.
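The next-hop selection can be sketched as follows. The direction check uses the cosine similarity between the communication direction and each neighbouring CH's velocity vector, as described above. The routing metric, however, is an ASSUMPTION: the sketch uses `csm / (1 + dd)` purely as a placeholder that grows with alignment and shrinks with the distance between the neighbour's target and Tloc. All names and sample values are hypothetical.

```python
import math


def cosine_similarity_2d(u, v):
    """Cosine of the angle between two 2-D vectors (0.0 if either is zero)."""
    norm = math.hypot(*u) * math.hypot(*v)
    return (u[0] * v[0] + u[1] * v[1]) / norm if norm else 0.0


def pick_next_ch(current_pos, tloc, neighbours):
    """Return the id of the neighbour CH with the highest routing metric.

    `neighbours` maps a CH id to {"velocity": (vx, vy), "dest": (x, y)}.
    Neighbours moving away from the target (CSM <= 0) are skipped.
    """
    dc = (tloc[0] - current_pos[0], tloc[1] - current_pos[1])
    best_id, best_rm = None, 0.0
    for ch_id, info in neighbours.items():
        csm = cosine_similarity_2d(dc, info["velocity"])
        if csm <= 0:
            continue
        dd = math.dist(info["dest"], tloc)
        rm = csm / (1.0 + dd)  # ASSUMED metric, not the paper's equation
        if rm > best_rm:
            best_id, best_rm = ch_id, rm
    return best_id


neighbours = {
    "east": {"velocity": (10, 0), "dest": (900, 0)},
    "north_east": {"velocity": (10, 10), "dest": (900, 0)},
    "west": {"velocity": (-10, 0), "dest": (950, 0)},
}
next_hop = pick_next_ch((0, 0), (1000, 0), neighbours)
```

In this example the "west" CH is excluded because it moves away from the target, and "east" wins over "north_east" because it is better aligned with the communication direction.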

Operation of Proposed Routing Protocol
When a source "S" has a message to send, it sends the message to its CH. The CH first checks whether Tloc is located inside its cluster. If so, the message is sent to Tloc.
Otherwise, it checks whether Tloc is in the segment. i) If so, the intra-segment packet forwarding protocol is used, and the next CH node is selected from the forwarding table. ii) If not, the inter-segment protocol is used, and the next CH node is chosen in the other sector based on the RM. The next CH again checks whether Tloc is reachable within its cluster. If Tloc is not found, the next-hop CH is chosen again, and the procedure repeats until the data reaches its Tloc node.
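The decision made at each CH can be summarised in a short sketch. The function name, the returned labels and the use of plain sets for cluster and segment membership are hypothetical simplifications chosen for illustration.

```python
def routing_decision(tloc, cluster_members, segment_members):
    """Decide how a CH handles a packet addressed to node `tloc`."""
    if tloc in cluster_members:
        return "deliver"        # Tloc is inside this CH's cluster
    if tloc in segment_members:
        return "intra-segment"  # pick the next CH from the forwarding table
    return "inter-segment"      # pick the next CH in another sector using RM


cluster = {"v1", "v2"}
segment = {"v1", "v2", "v3", "v4"}
decisions = [routing_decision(t, cluster, segment) for t in ("v2", "v4", "v9")]
```

Each CH along the path applies the same check, so the packet moves from inter-segment to intra-segment forwarding and finally to delivery as it approaches Tloc.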

Performance Evaluation
Two types of simulators are used to assess the performance of the proposed protocol: a traffic simulator that replicates vehicle mobility and a network simulator that models the vehicular network. SUMO is the most widely used traffic simulator for VANETs.
A 1000 x 1000 area segment is used to test the proposed protocol, and the segment is then split into clusters. First, 100 vehicles are placed in the segment according to a uniform distribution, and a constant velocity is assigned to each vehicle. In the simulation results, the proposed protocol is compared to CBLTR and AODV-CV in terms of PDR, throughput, and E2E delay. Tab. 1 provides a detailed list of parameters for the simulation.
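For reference, the three metrics are commonly computed from a simulation trace as sketched below. These are the usual textbook definitions, given as an assumption about how such results are typically derived rather than as the authors' exact post-processing; the function names and sample numbers are hypothetical.

```python
def packet_delivery_ratio(packets_sent, packets_received):
    """Fraction of sent packets that reached their destination."""
    return packets_received / packets_sent if packets_sent else 0.0


def average_e2e_delay(send_times, recv_times):
    """Mean (receive - send) time over delivered packets, in seconds.

    Both arguments map a packet id to a timestamp; packets missing from
    recv_times were dropped and are ignored.
    """
    delays = [recv_times[p] - send_times[p] for p in recv_times if p in send_times]
    return sum(delays) / len(delays) if delays else 0.0


def throughput_kbps(bytes_received, duration_s):
    """Received payload in kilobits per second over the measurement window."""
    return (bytes_received * 8 / 1000) / duration_s if duration_s else 0.0


pdr = packet_delivery_ratio(200, 150)
delay = average_e2e_delay({1: 0.0, 2: 1.0, 3: 2.0}, {1: 0.5, 2: 1.25})
rate = throughput_kbps(125_000, 10)
```

Note that dropped packets lower the PDR but do not enter the delay average, so the two metrics should be read together.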
The throughput of the proposed protocol, CBLTR, and AODV-CV for 50, 60, 80, and 100 nodes is shown in Figs. 3-6, which show that throughput increases as the number of nodes increases. Fig. 7 shows the average throughput for varying numbers of vehicle nodes for the proposed, CBLTR and AODV-CV protocols. AODV-CV has the lowest throughput of all the protocols because it fails to manage the network dynamics as efficiently as CBLTR and the proposed protocol. The proposed protocol improves throughput compared to CBLTR and AODV-CV because of dynamic cluster creation using K-means and stable CH election using location, direction, velocity, and POI as the key parameters. For 50 vehicle nodes in the network, the throughput of the proposed protocol is 6.5% higher than that of CBLTR and 8.9% higher than that of AODV-CV.
In Fig. 8, the PDR is computed for the proposed, CBLTR and AODV-CV protocols over the various simulations. The PDR remains constant as the number of nodes increases because PDR is independent of the packet injection rate. The PDR of the proposed protocol is improved by 11% compared to CBLTR and by 16.5% compared to AODV-CV.
In Fig. 9, the E2E delay in packet delivery is computed for the proposed, CBLTR and AODV-CV protocols. The proposed protocol has less delay than CBLTR and AODV-CV. This is because the links among the nodes vary as the velocity of the vehicles varies. The E2E delay of the proposed protocol is reduced by 46% compared to CBLTR and by 76% compared to AODV-CV.

Conclusions
In this paper, a new clustering architecture is proposed that comprises two algorithms. First, a k-means clustering scheme is suggested, which incorporates regional clustering techniques to minimize overhead and the traffic management burden in VANETs. Second, to choose the next-hop node for inter-cluster routing, a dynamic routing protocol is presented that considers a node's destination, which increases the overall PDR and decreases the E2E latency. The simulation results show that the proposed protocol is more effective than the CBLTR and AODV-CV protocols. The comparative analysis indicates that the proposed protocol has up to 6.5% and 8.9% more throughput, up to 11% and 16.5% higher PDR, and up to 46% and 76% less E2E delay than CBLTR and AODV-CV, respectively, across the simulated network sizes.