Context and Machine Learning Based Trust Management Framework for Internet of Vehicles

: Trust is one of the core components of any ad hoc network security system. Trust management (TM) has always been a challenging issue in a vehicular network. One such developing network is the Internet of vehicles (IoV), which is expected to be an essential part of smart cities. IoV originated from the merger of Vehicular ad hoc networks (VANET) and the Internet of things (IoT). Security is one of the main barriers in the on-road IoV imple-mentation. Existing security standards are insufficient to meet the extremely dynamic and rapidly changing IoV requirements. Trust plays a vital role in ensuring security, especially during vehicle to vehicle communication. Vehicular networks, having a unique nature among other wireless ad hoc networks, require dedicated efforts to develop trust protocols. Current TM schemes are inflexible and static. Predefined scenarios and limited parameters are the basis for existing TM models that are not suitable for vehicle networks. The vehicular network requires agile and adaptive solutions to ensure security, especially when it comes to critical messages. The vehicle network’s wireless nature increases its attack surface and exposes the existing TM Furthermore, for future work, a high-level machine learning model is proposed.


Introduction
The coming era is of high-speed communications and the Internet of things (IoT). By 2025, IoT will be connecting up to 21 billion devices [1]. The Vehicular ad hoc network (VANET) is one of the research areas shifting towards IoT, such as the Internet of vehicles IoV [2][3][4][5][6]. Since the IoV is not yet standardized, a few studies have been carried out in this field. Security in wireless networks has always been a core challenge, especially ad hoc networks where each time distinct nodes are likely to involve in network formation [7,8]. Similarly, the biggest threat for IoV is a security breach [9,10], which can cause a catastrophic sequence of chain accidents, traffic congestions, and diverting traffic to a specific path. The vehicular network has unique properties, unlike other wireless networks. The vehicle network's dynamic topology increases the attacksurface [11,12], allowing malicious nodes to launch an attack through a security breach. Moreover, the vehicle networks have no geographical limits that make vehicle to vehicle communication very critical. Trust plays an essential role in ensuring security in wireless networks.
Several models have been proposed for trust evaluation in VANET/Intelligent transport system (ITS) over the years, and each model has adopted different techniques and methods. Due to the heterogeneous nature, many factors need to be considered while evaluating the trust, which is missing in existing static models. Every model has one thing in common; they have selected set parameters or scenarios that are not ideal for changing vehicle networks. In a small experimental setting, most models have attempted to solve such issues that work only if significant changes are not made. In conclusion, the vehicle network is a complex system for which the available models are unable to provide an adaptive solution. The proposed TM framework uses the contextaware cognitive approach to solve the problem. Our work is motivated to fill that gap where TM is flexible and static. The framework constructs a scenario-based context for every other critical message received. The framework is subsequently structured to adopt the appropriate trust module for each scenario.
The vehicular network TM schemes must be adaptive to the environment. Artificial intelligence (AI) is the core approach to developing context-awareness. AI techniques are the most effective for building context [13]. Cognitive inferencing is used for context-awareness in the presented TM framework. Besides context-awareness, machine learning is another suitable technology to be implemented on IoV gathered data. Taking full use of IoV big data is one of this study's goals, which requires machine learning solutions. Machine learning is the best approach to single out the malicious nodes from VANET [14,15]. Machine learning algorithms proved dynamic properties to assure security in VANETS [15,16]. The vehicular network has not taken full use of machine learning yet. Trust is one of the areas that can be managed using machine learning. Within the trust, uncertainty is the critical issue faced by the vehicular network during trust evaluation. The framework uses a statistical machine learning technique; Bayesian machine learning (BML) evaluates trust under uncertainty. BML provides multilevel adaptive features to match the dynamic nature of IoV [17,18].
An overview of the IoV architecture is presented in Fig. 1. The centralized data repository is managed through cloud services. Vehicle nodes are equipped with an On board unit (OBU) that simultaneously function as fog nodes with a local data repository called OBUfog. The local data repository is synchronized parodically with a centralized repository. The trust value of all nodes is stored and updated in a centralized database. The framework uses a local data repository for immediate data access, and it is useful in case of the unavailability of internet access or disconnection from the centralized system. The unavailability of a centralized system is a frequent issue with vehicular networks. The use of a decentralized approach is equally helpful to deal with critical issues in communication security. The rest of the paper is organized as Section 2 is the literature review, Section 3 explains the proposed framework. Section 4 discusses the trust evaluation process Section 5 is performance analysis. The last section concludes the research study.

Literature Review
The security of any information system has always been a critical concern. In a study [4] on the upcoming challenges of IoV, authors identify security as a critical challenge to overcome. IoV is the future of intelligent vehicular communication and smart cities [19]. Multidimensional security problems occur within IoV, making security one of the main challenges for ad hoc networks. Available models for trust evaluation and management in IoV do not provide the ultimate solution and thus require improvement [11,20]. Many trust models for vehicle networks have been proposed; we discuss some of the renowned models in this section. Most of the trust models available are based on VANETs or ITS. A few recent works can be found on IoV considering trust [21,22]. The existing models can be classified by trust measuring type and categorized into three: data-centric, entity-centric, and hybrid models.

Trust Models
Trust is determined by the received messages in data-centric models. In a data-centric model, neighboring nodes share their trust opinions on specific events and calculating the trust by majority estimate [23]. The key downside of such models is, they neglect information relevant to the node [11]. Entity-centric models are based on trust credit building. In these models, the interaction experience is an important characteristic. When measuring trust in real-time, inter-node interaction experience is taken into account [24,25]. In one of the entity-centric trust models, the level of trust is determined by fuzzy logic [25]. Another such model presented by [26] work on prior experience, Certificate authority (CA) and, neighbor opinion. The key issue with these models is that they presume that the malicious activity cannot be performed when the node is authenticated. The second drawback is the dependency on CA. Overall entity centric models are a very powerful tool for evaluating trust. However, these models miss the useful aspects of data centric.
Hybrid models incorporate properties of data and entity to determine trust. Most of these models evaluate the data trustworthiness and keep track of the node trust. A combination of role-based and experience is one such popular model [27]. Another hybrid model evaluates trust by neighbor opinion and similarity-trust [28]. In a research study, researchers used social-trust while evaluating trust [29]. Some researchers have used probability methods, such as the law of Bayes, the theory of evidence, and the theory of Markov [24,30]. For handling the uncertainty during TM, a couple of hybrid models have been proposed [11,31]. Even though the hybrid models are a combination of data and entity, their dynamic integration is still missing [11]. However, to the depth of study, there is no such trust model that uses all the available information during the event. All the models evaluate trust based on specific predefined parameters and scenarios.

Machine Learning in Vehicular Networks
Machine learning has notable coherence with IoV due to its ability to generate big data. Significant research work has been conducted so far on VANET using machine learning. A research study investigated different aspects of detecting misbehaving vehicles by machine learning and found machine learning an effective method for vehicular network security [14]. The VANET and machine learning have strong coherence, explored in a comparative study between machine learning and VANETS [15]. Researchers in their work used machine learning to detect DDOS attacks in vehicular networks, the study concluded with positive results [32]. Another research work on VANET security worked on false node position attacks and applied a machine-learning algorithm to solve the problem [33]. Most of the studies have shown promising results.

Context Awareness
The main aim of using context-awareness is to add flexibility by making maximum use of the available data. Authors in a review study elaborate on the potential need for an AI approach for the trust evaluation in vehicular networks; coherence between VANET security and AI solutions exists [20]. A context is a form of knowledge that relates to the problem-solving ability of humans [34]. AI techniques are suitable for context development [35].
The proposed TM framework relies on a hybrid approach using both data and entity properties. On the other hand, it is also worth mentioning that all the available models are for VANETS/ITS. In contrast, our model is a novel trust model for IoV that works on context adaption and machine learning. To the best of our knowledge, no such trust model is presented in the field yet. Our proposed trust evaluation framework is based on context-awareness.

Proposed TM Framework
The proposed IoV TM framework is intended to fill the gaps in trust evaluation. The framework works by establishing a context for an event using a cognitive approach. Fig. 2 shows the proposed TM framework. The framework has three key components: input parameters, context building, and trust evaluation. The framework proposed designs on the best practices of VANET, implemented by notable trust models [24,25,27,28,36]. The framework components are derived from the generic context-aware flow model [34,37,38]. Once a critical incident alert is received from nearby vehicles, next, it requires the sender's authentication. The is responsible for carrying out the authentication process using the public key infrastructure (PKI). Since it needs cryptography and exceeds the scope of our work, we have already considered existing solutions.

Figure 2:
Proposed TM framework with three main modules: input parameters, context building, and trust evaluation

Parameter Input Layer
After the authentication process, the second task is parameter management, as shown in Fig. 2. Information about an event is filtered out to obtain parameters from the received message token. For ease, the framework classifies all the possible parameters. The information obtained from the node is formalized in "cues," the cues are fed into the context layer for context building. Parameter prioritization is the main task since some parameters have a more valid source and weight than others.

Context Building Layer
This layer is responsible for building a context around the road event. Context building is the responsibility of the inference engine. The immediately available data is called a low-level context that is transformed into a high-level context. Context data is interconnected information set mostly containing uncertainty. The uncertainty must be handled before submitting it to the context inference process [34]. First, to create a context, the raw data is used, and it is a prerequisite for the development of the context [35]. The perimeter module is responsible for providing available information in the form of readable "cues" to the context module. Context offers complete information to evaluate trust in an event. The context management module is responsible for handling context information obtained from the previous layer.

Trust Evaluation Layer
The responsibility of this layer is to calculate the trust level. Since the proposed framework is based on the cognitive context approach, the inference engine and trust evaluation modules con-tinuously exchanged information. The trust evaluation layer consists of different modules. In the proposed TM framework, the trust evaluation process is adaptive. The most used approaches are experience-based, role-based, opinion-based, cluster-based, thread-based, as illustrated in Fig. 2. Different scenarios require different evaluation modules to evaluate the trust. The proposed trust framework matches the most suitable trust evaluation module with an event's scenario using context-awareness.

Trust Metric
It is essential to define trust metrics before moving towards the trust framework. All existing models have taken into consideration different paraments as metrics. Experience is one of the most used parameters by many models [25,27,28]. Different models also use parameters such as time, location, distance, and others. The goal of our work is to use the maximum available parameters to evaluate trust. The set of trust metrics is made adaptive for this purpose.

Trust Evaluation Process
The proposed TM framework evaluates trust based on a critical road event. Trust evaluation revolves around the road-event. The event is denoted by Event ID (Ev_ID). The reporting vehicle that has evident the event itself is Reporter-vehicle (Rp_veh). The vehicles that are not evident in the event still carry the message for hopping or beaconing they are denoted as Carrier-vehicle (Cr_veh). The road event is illustrated in Fig. 3. Local database and centralized data repositories are denoted as Local-database (Loc_DB) and Centralized-database (Cnt_DB). Tr denotes the trust level value. The trust level of a specific event is represented by Ev_ID_Tr , and the trust level value of any node is represented by Veh_ID_Tr . The unique ID identifies every vehicle in the network. The trust level value is updated after the completion of the event in both local and centralized databases.

Assumptions
The following assumptions are made to ensure the proper functioning of the proposed TM framework: • All the vehicles in the network are equipped with OBUFog.
• All vehicles have a uniform communication platform.
• A third-party CA manages the PKI.

Context Ontology
Ontology is one of the promising methods for context building, which is based on formal logic. With the help of ontology, the context information is built around a road-event. An ontology is an explicit, systematic definition of concepts in a discourse domain. A brief ontology of TM framework is shown in Fig. 4, which illustrates the hierarchical classes and instances of the concept. It is not obligatory to store all relationships with ontology; existing triplets can generate new facts. Together with a collection of individual class instances, an ontology forms a knowledge-base [12].

Trust Level Threshold
The Trust level (Tr ) is measured between 0 and 1, an untrustworthy report is denoted as 0, and the highest level of trust is represented by 1 [27,39]. All the values between 0 and 1 are considered as different levels of trust. Initially, all the vehicles are assigned with Tr 0 in Cnt_DB by CA. Unlike common nodes, Tr of special vehicles (ambulance, police, fire brigade) initialized with 0.5. Likewise, the trust of an event Ev_ID_Tr is evaluated and managed during the event. The centralized and local database also store and update event trust value for future verification and trust management. A reported event's trust is also measured between 0 and 1, where 1 represents the highest trust level and 0 for untrusted. The weight allocation Tab. 1 represents: initial trust, previous trust (experience), and trust reward.
The accumulated trust value is the combination of multiple trust values obtained from different modules. The Tr 1 is the mean trust value of all reporting vehicles obtained by Eq. (1). Reporting vehicles for an event, include general and special vehicles. Carrier vehicles have two types based on hopping and beaconing. Eq. (2) is used to obtain the mean trust Tr 2 from carrier vehicles involved in beaconing, whereas carrier vehicles involved in hopping participate in the thread module, discussed in the related section.
In certain situations where the special vehicle is involved, the mean trust value Tr 3 of all Spe_ can be obtained by Eq. (3). Since the TM framework is adaptive and context-based, the TM might use different trust modules each time. The accumulated trust value of an event Ev_-ID_Tr is obtained by Eq. (4). Ev_ID_Trlv = Trlv 1 + Trlv 2 + Trlv 3 Eq.
(3) serves as the fundamental method for evaluating the trust during the modules: experience, role-base, and opinion-based trust evaluation, each with related specifications. At this point, it is necessary to mention that Eq. (3) involves only those vehicles having a higher or equal experience level than 0.5. Those vehicles with less than 0.5 trust experiences are filtered out and used in other inferencing modules, depending on the context.

Cluster Module
The cluster-based module is used only for those vehicles that follow the same route daily, In these situations the cluster approach is the most appropriate. The urban traffic mostly follows a pattern by the same vehicle daily. Cluster lists are managed on Cnt_DB and Loc_Db. The reports are broadcasted to all the vehicles in the specific cluster. In combination with other modules, the cluster module is used to determine the trust level according to the context.

Thread Based/Hopping
This module is likely to be used where there are hopping in higher numbers than reports and opinions. It is one of the effective techniques used by some models. The basic concept of this module is to use multiple threads of hops. The trust level depends on the thread level (thr_lev). This module requires at least two Rp_vhs. The level of the thread increases with the intersection of two threads of a single report. The Cr_vh, when receives the report with the same thread, considers it as thr_lev 1. The level of a thread increases only if a new thread is found. An increment at each level consequently increases the trust level of the report by 0.2.

Uncertainty Using Statistical Bayesian Machine Learning
Mostly, uncertainty occurs in scenarios where the number of nodes is small. Uncertainty is one of the challenges of evaluating trust. Some TM models vaguely discuss uncertainty in their methodology, such as [30,39]. Using a simple probability is one of the approaches used to handle uncertainty, which is unrealistic. Some approaches include Dempster Shafer theory. The proposed TM framework uses the BML approach to deal with uncertainty. Though BML is one of the rarely used machine learning techniques, it is a constructive statistical method that matches the uncertainty in IoV. Using BML allows combining multiple prior evidence and matches the nature of the problem under discussion. The Bayes rule helps to measure the likelihood of a message being true or false. The experience property must be less than 0.5, as used in Eq. (5). The probability of a report being false can be obtained by Eq. (5). In some circumstances, Rp_vh with less experience is considered under uncertainty. Eq. (6) calculates the trustworthiness of a report. P(rep_flase |exp_ < 0.5) = P(exp_ 0.5|rep_flase).P(rep_flase) P(exp_ < 0.5) (5) P(rep_true |exp_ < 0.5) = 1 + P(rep_flase | exp_ < 0.5) = P(exp_ 0.5|rep_flase).P(rep_flase) P(exp_ < 0.5) The TM framework performs one step further to be more accurate and use additional available information to infer the trust level of a report. The framework uses BML with multiple evidence; this lets the system infer cognitively when more contextual information is available. Eq. (7) uses multiple evidence BML by adding the direction of the node in Eq. (6).
Alg. 1 shows the algorithm for a reported event, where a new event report is received. The event is matched with the previous event list, synchronized if the event is previously available; otherwise, a new event is created. Furthermore, the algorithm explains the management of the overall report.

Algorithm 1: Trust Evaluation Algorithm
Pseudocode of event trust evaluation algorithm 1.
Event Message received 3.
opt for experience module 20.
opt for role-based module 22.

Message Token
For the TM framework, a token is designed that contains all related information and the message itself. The token is divided into two parts in Tab. 2: the header contains all information 24. opt for Opinion module 25.
elseIf (vehicles share same path frequently) than 26.
opt for Cluster module 27.
opt for thread-based module 30.
broadcast Ev_ID is untrustworthy with Ev_ID_Trlv 42. End related to vehicle and message. The "message" portion only contains information related to roadevent. The token is associated and identified with "vehicle ID". The critical messages can be categorized into four categories: accident, traffic congestion, work in progress, and diversion, defined in the event description.

Simulation
The simulation is conducted using MATLAB R2020a and Protege−5.5.0 with the "HermiT 1.4.3 456" reasoner. Depending on the simulation model's scenario, the number of nodes in each random event varies from 2 to 50. The special nodes are set to a maximum of 10%, rest of the 90% nodes are general vehicles. The node trust is initialized by 0 experience, and the special nodes with trust experience by 0.5. A significant concern about simulations relates to outcome variation; outcomes at each attempt may vary due to the experiment's random design. There can be a considerable variation between the same simulation instances in the wireless topology and the network's design. General nodes include reporting and carrier vehicles. In the road network, road incidents occur at random in the experiment. Each scenario runs 100 iterations to exhaust the simulation.
The framework is tested using scenario-based simulation. The scenarios are divided into three categories to get maximum performance and exhaust the simulation. 1. Heavy traffic: These scenario patterns are often found during rush hours or congestion in urban traffic. These scenarios take up to 50 nodes into consideration. Numerous reports of an incident can be found in these scenarios, making it easy to determine any reported event's level of trust. Most of the trust models perform well in these scenarios. 2. Moderate traffic: 11 to 25 nodes, taken into consideration in these scenarios. These scenarios are typically found during low rush hours in urban areas or on highways. The amount of information available in these situations is also moderate. The proposed framework takes full advantage of the available data, unlike other models that work under limited information. 3. Low traffic: This depicts a traffic pattern found on highways, towns, and rural areas. 2 to 10 nodes are involved in this type of scenario. The lack of information makes these the most critical scenarios. Most of the trust models fail in these scenarios due to the little information available. These scenarios contain maximum uncertainty, which is a significant issue during the trust evaluation.

Performance Analysis and Discussion
The module distribution by generating random road events up to 50 nodes can be observed in Fig. 5a. The experience-based module is more likely to occur, and the reason might be the experience property of all the vehicles; this is a significant finding in understanding the type of vehicle involved in the trust evaluation process. The variation of the trust evaluation module can be observed in Fig. 5a. With the smaller number of nodes involved, the opinion module is the highest. In comparison, the moderate node involvement experience module is highly observed. Another notable finding is that the thread-based module requires a higher number of nodes, shows in Fig. 5a. Nonetheless, we believe that it is well justified that uncertainty is high with low node density, which reveals the relations between less information and uncertainty. Depicted in Fig. 5b, involving 10 nodes in the different events, show the "opinion" is with the highest occurrence followed by the experience, the reports with uncertainty are also high. Fig. 5c shows the increment in the number of nodes up to 25 involved in an event. Experience and opinion are almost equal, whereas uncertainty is decreasing. Finally, Fig. 5d shows that the uncertainty is almost inversely proportional to the number of nodes involved in the event. Fig. 6a shows the trust level evaluated under the experience module. One of the essential features of this framework is the confidence score. Even in the case of high trust, there might be a low confidence score. This feature makes the report more elaborative. Fig. 6b shows the thread module outcomes; a gradual trend during the trust-building phase can be observed. The trust is directly proportional to the thread level. Here it is significant to explain that the level of a thread is not the hop level, which can be observed in Fig. 6b. The results in Fig. 6c show the opinionbased trust evaluation; the confidence line in Fig. 6c supports the legitimacy of trust value. A significant aspect is noted here the confidence score has no coherence with the level of trust. The behavior of "confidence" as an independent variable helps the system make better decisions during trust evaluation. The involvement of special vehicles makes the event more legitimate. A relatively higher trust level can be observed in Fig. 6d, where a special vehicle is involved in trust evaluation. The more special vehicles involved, the higher the level of trust that can be achieved. Fig. 6e summarizes the findings by the uncertainty module. The number of discarded reports is low, and a gradual decrease is observed with an increase in the number of nodes. The use of big data will probably improve the module's precision over time due to the increased information availability.

Limitations
The following limitations must be considered for this TM framework: • The experiment is conducted using a limited number of nodes, and a large-scale experiment will further improve the accuracy of the results. • The simulation has shown promising results. An on-road experiment still needs to be conducted. • The framework does not take the human factor into account.
• The analysis of big data and its involvement in trust evaluation will improve accuracy.
A machine learning model is presented in future work that needs to be implemented for further investigation.

A High-level Machine Learning Inference Model as Future Research Direction
In a short time, the IoV can produce big data; every node generates multiple sensor data. Given this study's limited scope, a high-level machine-learning model as future work is illustrated by Fig. 7. The process begins with data inputs from centralized DB. The second phase is the preprocessing involving: data quality assessment, data cleaning, finding relations, normalization, and feature scaling to specify a range of variables. A mix of supervised and unsupervised learning is suitable to achieve maximum knowledge extraction [40]. The use of regression is ideal for continuous variable prediction. Linear support vector machine (LVSM) is the most suitable because of the trust threshold value [41]. An artificial neural network (ANN) is recommended to automate huge sensor data. Similarities in road events are the main reason to use ANN. Malicious node detection is a crucial issue in IoV; the clustering technique is more appropriate and efficient to solve this issue. For clustering, the Gaussian mixture model (GMM) is highly recommended [42]. The reason behind this is the two parameters description method [43]. The training and testing are the subsequent phases to be carried out for regression by the adapting 70:30 general rule. To achieve maximum accuracy of the model, precision, recall, and F-1 score are recommended in the validation phase.

Conclusion
Soon, IoV will play a significant role in smart cities and ITS. Many areas of IoV need to be standardized before implementing it as real-time road networks. Trust management is a primary component of vehicle network security, helping prevent a breach of security. In this article, a TM framework is presented for IoV to ensure secure on-road vehicle communication. The proposed trust framework is based on context-awareness, thus use maximum available information resources. The framework works well in critical scenarios where other models fail. The framework intelligently chooses between different trust evaluation modules, thereby allowing full use of available information to determine the received message's trustworthiness. The results indicate that the framework will contribute to trust management security in the vehicle network. The use of the Bayesian method of machine learning was effective when dealing with uncertainty. Our presented trust framework shows promising results. To the best of our knowledge, the work is novel, and no cognitive trust management scheme for IoV has been proposed. The proposed framework is equally beneficial for IoT security.