Machine Learning-Based Detection of DDoS Attacks in VANETs for Emergency Vehicle Communication

Bappa Muktar; Vincent Fono; Adama Nouboukpo

doi:10.32604/cmc.2025.067733

Open Access

ARTICLE


Bappa Muktar*, Vincent Fono, Adama Nouboukpo

Department of Computer Science, University of Quebec in Outaouais (UQO), 283 Boul. Alexandre-Taché, Gatineau, QC J8X 3X7, Canada

* Corresponding Author: Bappa Muktar. Email: email , email

(This article belongs to the Special Issue: Smart Roads, Smarter Cars, Safety and Security: Evolution of Vehicular Ad Hoc Networks)

Computers, Materials & Continua 2025, 85(3), 4705-4727. https://doi.org/10.32604/cmc.2025.067733

Received 11 May 2025; Accepted 04 September 2025; Issue published 23 October 2025

Abstract

Vehicular Ad Hoc Networks (VANETs) are central to Intelligent Transportation Systems (ITS), especially for real-time communication involving emergency vehicles. Yet Distributed Denial of Service (DDoS) attacks can disrupt safety-critical channels and undermine reliability. This paper presents a robust, scalable framework for detecting DDoS attacks in highway VANETs. We construct a new dataset with Network Simulator 3 (NS-3) and Simulation of Urban Mobility (SUMO), enriched with real mobility traces from Germany’s A81 highway (OpenStreetMap). Three traffic classes are modeled: DDoS, Voice over IP (VoIP), and Transmission Control Protocol (TCP)-based video streaming (VideoTCP). The pipeline includes normalization, feature selection with SHapley Additive exPlanations (SHAP), and class balancing via the Synthetic Minority Over-sampling Technique (SMOTE). Eleven classifiers are benchmarked using stratified 5-fold cross-validation, including eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), and an Artificial Neural Network (ANN). XGBoost, GB, CatBoost, and ANN achieve the highest performance (weighted F1-score = 97%). To assess robustness under non-ideal conditions, we introduce an adversarial evaluation that injects packet loss and traffic jitter (small random deformations of packet sizes and data rates); the top models retain strong performance, supporting real-time applicability. Collectively, these results indicate that the proposed highway-focused framework is accurate, resilient, and well suited for VANET security in emergency communications.

Keywords

VANET; DDoS attacks; emergency vehicles; machine learning; intrusion detection; NS-3; SUMO; traffic classification; supervised learning; artificial neural network

1 Introduction

Vehicular Ad Hoc Networks (VANETs) have emerged as a cornerstone of Intelligent Transportation Systems (ITS), enabling real-time communication between vehicles and infrastructure to improve traffic efficiency and road safety [1,2]. Unlike general wireless networks, VANETs operate in highly dynamic environments characterized by high node mobility, rapidly changing topologies, and stringent latency requirements. These features, combined with a decentralized architecture and frequent handovers, create distinctive challenges for communication reliability, scalability, and security. VANETs are particularly vital for emergency response units, which rely on uninterrupted connectivity to minimize response times and save lives. However, open communication channels, decentralized control, and dynamic topology also expose VANETs to a wide range of cybersecurity threats [3]. Among the most critical of these threats are Distributed Denial of Service (DDoS) attacks, which aim to overwhelm network resources and degrade the performance of safety-critical services. Such disruptions can have severe consequences, including delayed emergency interventions, increased traffic congestion, and potential loss of life [4,5].

In parallel, adjacent research in Wireless Sensor Networks (WSN) has explored multi-pronged defenses that blend detection, trust, and localization. For example, Kaur et al. propose a deep-learning and blockchain approach combined with the Distance Vector-Hop (DV-HOP) algorithm to mitigate DDoS while preserving accurate node localization, further refined via mayfly-based optimization [6]. Their simulations report improvements in localization error and misclassification rates, highlighting the value of combining learning-driven detection with decentralized trust mechanisms. While WSN and highway VANETs differ in link layer, mobility, and traffic models, this line of work underscores the importance of robustness under adversarial conditions and informs our focus on stress-testing detection models in mobile wireless environments.

Despite growing academic interest in intrusion detection systems for VANETs, many existing studies have notable limitations, such as exclusive reliance on synthetic datasets, lack of reproducibility, and an overemphasis on dense urban environments [3,7]. In particular, realistic highway scenarios, where uninterrupted communication for emergency vehicles is equally critical, remain significantly underexplored. Moreover, most prior research relies on a single machine learning classifier, which limits the robustness and generalization capacity of the proposed models.

To bridge these gaps, this paper proposes a comprehensive machine learning-based framework for detecting DDoS attacks in VANETs operating in highway environments.

The main contributions of this work are as follows:

• We design and simulate realistic VANET traffic using the Network Simulator 3 (NS-3) and Simulation of Urban Mobility (SUMO) simulators, incorporating real-world vehicle mobility traces from Germany’s A81 highway extracted via OpenStreetMap (OSM).

• We evaluate a wide range of supervised learning algorithms, including eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Adaptive Boosting (AdaBoost), Extremely Randomized Trees (Extra Trees), Random Forest (RF), Gradient Boosting (GB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), Decision Tree (DT), and Artificial Neural Network (ANN).

• We apply SHapley Additive exPlanations (SHAP) to assess feature importance, thereby enhancing the interpretability and reliability of the models. The proposed framework achieves strong predictive performance, with weighted F1-scores of up to 97% for the XGBoost, GB, CatBoost, and ANN classifiers.

The remainder of this paper is organized as follows: Section 2 presents a comprehensive literature review of machine learning-based intrusion detection in VANETs. Section 3 details the methodology, including dataset generation and preprocessing. Section 4 describes the classifiers used and the predictive modeling approach. Section 5 reports and discusses the experimental results. Finally, Section 6 concludes the paper and outlines future research directions.

2 Literature Review

Securing VANETs against DDoS attacks has emerged as a critical research area due to the potential disruptions in vital communication channels, especially those involving emergency vehicles. Recent advances have emphasized developing robust, accurate, real-time intrusion detection mechanisms utilizing machine learning (ML) and deep learning (DL) approaches.

Several researchers have investigated machine learning models tailored to the unique constraints of VANET environments. For instance, Setia et al. proposed a framework combining machine learning with fuzzification methods in cloud-based VANET systems, reporting an accuracy of 99.59% in proactively detecting DDoS threats [8]. Similarly, Polat et al. introduced a hybrid model blending a one-dimensional Convolutional Neural Network (1D-CNN) with decision trees for real-time detection in Software-Defined Vehicular Ad-Hoc Networks (SD-VANETs), attaining an accuracy close to 90% [4]. Extending this direction, Polat et al. presented a deep learning architecture using stacked sparse autoencoders with a softmax classifier, improving accuracy to approximately 96.9% in Software-Defined Networking (SDN)-based VANET scenarios [9].

Addressing not only attack detection but also network congestion, Gopi et al. developed a two-phase Intelligent Denial of Service (DoS) Attack Detection with Congestion Control (IDoS-CC) system. Their methodology combined Teaching and Learning-Based Optimization (TLBO) with a Gated Recurrent Unit (GRU) deep learning model, demonstrating substantial reductions in network congestion and improved detection accuracy [10]. Kadam and Sekhar proposed a hybrid classification approach (KSVM) integrating K-Nearest Neighbors (KNN) and SVM, reporting higher sensitivity (recall) and precision than traditional classifiers [11].

Achieving data realism and reproducibility remains a key challenge that is often overlooked in the literature. In response, Alkadiri and Ilyas generated a contemporary dataset using Objective Modular Network Testbed in C++ (OMNeT++) and SUMO simulations, balanced it with SMOTE, and classified it with the XGBoost algorithm, achieving an F1-score of approximately 99% [12]. Similarly, Rashid et al. adopted OMNeT++ and SUMO for a realistic VANET simulation, presenting a real-time adaptive framework with various ML classifiers that yielded accuracies of up to 99% [13]. Oluchi Anyanwu et al. further optimized detection by combining a Radial Basis Function SVM (RBF-SVM) with Grid Search Cross-Validation, reporting detection rates of 99.22% on realistic SDN-based VANET datasets [14].

Hybrid optimization and multi-stage detection systems have also been extensively explored. Marwah et al. combined a modified SVM, enhanced by Harris Hawks Optimization (HHO), with Whale-Dragonfly optimization for efficient routing and bandwidth allocation, improving throughput and reducing communication overhead under DDoS conditions [15]. Adhikary et al. developed a hybrid model chaining AnovaDot and RBFDot SVM kernels into a multi-stage detection mechanism, achieving better robustness and detection accuracy than single-kernel models [16]. Tariq proposed a comprehensive detection framework integrating Autoencoders, Long Short-Term Memory (LSTM), clustering methods, fog computing, and blockchain technology, offering a low-latency, scalable, and robust solution with a detection rate of approximately 94% [17].

Deep learning-based anomaly detection approaches have recently gained momentum due to their scalability and superior pattern recognition capabilities. Lekshmi et al. leveraged convolutional autoencoders coupled with LSTM networks and self-attention mechanisms, achieving an F1-score of 98.20% in detecting DDoS attacks on realistic VANET data [18]. Similarly, Haydari and Yilmaz introduced a semi-supervised, non-parametric intrusion detection system using roadside units (RSUs), capable of detecting novel attack patterns without prior knowledge, significantly enhancing real-time responsiveness and detection accuracy [19].

In addition, Gu et al. introduced a Deep Reinforcement Learning (DRL)-enhanced federated self-supervised learning framework for Integrated Sensing and Communication (ISAC)-enabled Vehicle Edge Computing. The framework dynamically allocates tasks between on-board vehicle resources and roadside units to minimize energy consumption and accelerate model convergence, thereby supporting security in highly dynamic VANET contexts [20].

While extensive progress has been made, gaps remain in terms of evaluating these methodologies in realistic highway scenarios. Most existing works predominantly target dense urban environments or lack reproducible real-world mobility data, limiting the generalizability of results. Additionally, comprehensive comparisons of various machine learning classifiers within a unified, realistic highway scenario remain scarce.

Our study aims to address these critical gaps by evaluating multiple prominent ML classifiers—including XGBoost, CatBoost, AdaBoost, Extremely Randomized Trees (Extra Trees), Random Forest (RF), GB, SVM, KNN, LR, DT, and ANN—in a realistic VANET highway scenario. Leveraging NS-3 and SUMO simulators enriched with real mobility data from the A81 highway in Germany, our approach not only ensures realism but also enables reproducibility. Furthermore, data balancing through SMOTE and rigorous performance evaluation metrics (accuracy, precision, recall, and F1-score) strengthen our methodological framework, providing a robust and comprehensive assessment of classifier effectiveness.

Table 1 below summarizes and positions our work compared to existing state-of-the-art approaches based on several critical criteria.


This comparison highlights the novelty and relevance of our work: it combines a realistic highway scenario, reproducible real-world mobility data, and a broad multi-classifier evaluation, thereby addressing the gaps identified above in current VANET cybersecurity research.

3 Methodology

This section outlines the methodological framework for developing a robust classification model for DDoS attacks in a VANET environment, simulating a realistic highway scenario.

3.1 Experimental Architecture

This subsection describes the architecture and methodology used to simulate a realistic highway-based VANET under coordinated DDoS attacks, including the scenario design, simulator integration, and incorporation of real mobility traces that support data realism and model applicability.

3.1.1 Scenario Description

Two simulation scenarios were designed to evaluate the performance and robustness of the proposed detection system.

Baseline DDoS Scenario.

The first scenario (cf. Table 2 and Fig. 1) simulates a VANET highway environment with 13 vehicles (V0 to V12) moving at a constant speed. Vehicles V0 to V2 act as legitimate nodes, while V3 to V12 act as malicious nodes. Vehicle V0, representing an emergency vehicle (e.g., an ambulance), sends Transmission Control Protocol (TCP) traffic that emulates a real-time video streaming application to vehicle V2. Simultaneously, vehicle V1 transmits Voice over IP (VoIP) traffic over the User Datagram Protocol (UDP) to the same destination. The malicious nodes (V3 to V12) launch a DDoS attack by flooding V2 with high-rate UDP flows, aiming to disrupt its communication. This scenario captures a critical security threat in highway VANETs, where a coordinated cyberattack jeopardizes the operational integrity of emergency communications.


Figure 1: Illustration of the VANET highway simulation setup used for both the baseline DDoS and adversarial-attack scenarios

Adversarial-Attack Scenario.

The second scenario (also cf. Table 2) builds upon the baseline DDoS setup but introduces adversarial conditions to test detection robustness in a more realistic, noisy environment. In addition to the 13-vehicle highway topology and identical traffic roles (V0 sending TCP video, V1 sending UDP VoIP, V3–V12 launching UDP-based DDoS), this scenario incorporates:

• A constant packet error rate (PER) channel model, implemented by a custom ConstantErrorRateModel with a configurable LossRate parameter (default: 1% PER), to simulate wireless impairments and random frame losses.

• Traffic deformation, in which legitimate and malicious flows receive randomized packet sizes and data-rate jitter, mimicking real-world variability and preventing the detector from relying on fixed packet-size or rate signatures.

• RNG seed control, ensuring reproducible adversarial noise injection across multiple runs.

These perturbations affect both legitimate and malicious flows, introducing overlap in traffic characteristics and making classification more challenging. Output metrics such as average Signal-to-Noise Ratio (SNR) and noise levels are collected alongside standard FlowMonitor statistics to capture the combined effects of network load and channel impairments.
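The perturbations themselves are injected inside NS-3. To make their combined effect concrete, the Python sketch below applies the same three ideas to a synthetic packet stream: independent loss with a constant probability (the PER), random deformation of packet sizes and inter-packet gaps, and a fixed RNG seed. It is an illustration under stated assumptions, not the simulation code: the helper name, its parameters, and the uniform ±10% jitter are hypothetical, since the jitter distribution is not specified here.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    send_time: float  # seconds since flow start
    size: int         # bytes


def perturb_stream(n_packets, size, rate_bps, loss_rate=0.01, jitter=0.1, seed=1):
    """Constant-rate packet stream with size/rate jitter and constant-PER loss.

    Hypothetical helper for illustration; it is not NS-3 code.
    """
    rng = random.Random(seed)        # dedicated seeded RNG -> reproducible noise
    base_gap = size * 8 / rate_bps   # nominal inter-packet gap (s) at the target rate
    t = 0.0
    delivered = []
    for _ in range(n_packets):
        # Deformation: scale size and gap by a random factor in [1 - jitter, 1 + jitter].
        pkt_size = max(1, round(size * (1 + rng.uniform(-jitter, jitter))))
        t += base_gap * (1 + rng.uniform(-jitter, jitter))
        # Constant PER: each packet is lost independently with probability loss_rate.
        if rng.random() >= loss_rate:
            delivered.append(Packet(t, pkt_size))
    return delivered


voip = perturb_stream(10_000, size=160, rate_bps=64_000)  # G.711-like VoIP flow
print(f"{len(voip)} of 10000 packets delivered at 1% PER")
```

Running the function twice with the same seed yields identical streams, which is the property the RNG seed control provides across simulation runs.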

3.1.2 NS-3 and SUMO Integration

The experiment uses the NS-3 [21] and SUMO [22] simulators to model communication protocols and vehicle dynamics, respectively. NS-3 handles the network stack, protocol behavior, and traffic generation, while SUMO provides precise vehicle mobility for realistic traffic scenarios.

3.1.3 Incorporation of Real Mobility Traces

To further enhance realism, real-world mobility traces from the A81 highway in Germany were integrated into the SUMO simulation and imported into NS-3 using the Ns2MobilityHelper class. This integration ensures that the generated dataset reflects authentic vehicular behavior and spatio-temporal patterns, increasing the applicability and reliability of intrusion detection models trained on it. Fig. 2 shows the A81 highway in OSM and its import into the SUMO environment.


Figure 2: Visualization of the A81 highway segment used in the simulation. (a) Map segment from OSM. (b) Simulation rendering in SUMO
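Ns2MobilityHelper consumes traces in the ns-2 movement format: initial coordinates are set with `$node_(i) set X_ <x>` lines (likewise `Y_` and `Z_`), and later motion is scheduled with `$ns_ at <time> "$node_(i) setdest <x> <y> <speed>"` events; SUMO's traceExporter.py tool, for example, can emit this format. Before loading an exported highway trace into NS-3, a small parser such as the hypothetical one below can help sanity-check node counts, coordinates, and speeds. The sample lines are illustrative and are not taken from the A81 trace.

```python
import re
from collections import defaultdict

# The two ns-2 movement statements read by Ns2MobilityHelper.
SET_RE = re.compile(r'^\$node_\((\d+)\) set ([XYZ])_ ([-\d.eE+]+)')
DEST_RE = re.compile(
    r'^\$ns_ at ([\d.eE+-]+) "\$node_\((\d+)\) setdest '
    r'([-\d.eE+]+) ([-\d.eE+]+) ([\d.eE+-]+)"'
)


def parse_ns2_trace(lines):
    """Return (initial positions, time-sorted setdest events) from ns-2 mobility lines."""
    initial = defaultdict(dict)  # node id -> {"X": x, "Y": y, "Z": z}
    events = []                  # (time, node id, dest x, dest y, speed)
    for line in lines:
        line = line.strip()
        if m := SET_RE.match(line):
            initial[int(m.group(1))][m.group(2)] = float(m.group(3))
        elif m := DEST_RE.match(line):
            x, y, speed = map(float, m.group(3, 4, 5))
            events.append((float(m.group(1)), int(m.group(2)), x, y, speed))
    events.sort()
    return dict(initial), events


sample = [
    "$node_(0) set X_ 150.0",
    "$node_(0) set Y_ 20.5",
    "$node_(0) set Z_ 0.0",
    '$ns_ at 1.0 "$node_(0) setdest 180.0 20.5 30.0"',
]
initial, events = parse_ns2_trace(sample)
print(initial[0])  # {'X': 150.0, 'Y': 20.5, 'Z': 0.0}
print(events)      # [(1.0, 0, 180.0, 20.5, 30.0)]
```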

3.1.4 Hardware and Software Environment

All experiments were conducted on a Windows 11 workstation equipped with an NVIDIA GeForce GTX 1650 GPU, 32 GB RAM, and a 1 TB SSD. The software stack is managed with Conda (channels defaults and conda-forge) and uses Python 3.10. Interactive development employed Jupyter 1.0.0 and notebook 7.0.8.

Core libraries:

• Numerical/data tooling: NumPy, pandas 2.0.2, SciPy, Polars, dask[complete], modin[all].

• Classical ML: scikit-learn 1.2.1, imbalanced-learn, joblib.

• Deep learning: TensorFlow 2.12.0, scikeras, keras-tuner.

• Gradient boosting: xgboost, lightgbm, catboost.

• Visualization: Matplotlib, Seaborn.

• Model interpretation: SHAP.

• I/O and export: fastparquet, openpyxl, nbconvert, tabulate.

• Web/API: Flask, Flask-RESTful, flask_cors, flask_smorest.

• Utilities and geospatial: requests, PyYAML, Pillow, mysql-connector-python, rootutils, geopandas, contextily, nodejs.

• Code quality: pylint, isort, autopep8.

To facilitate reproducibility, a detailed snapshot of the software environment is provided in Fig. 3.


Figure 3: Machine Learning environment snapshot for the experimental setup

Note: Only version-pinned packages from the environment file are shown explicitly (e.g., pandas 2.0.2, scikit-learn 1.2.1, TensorFlow 2.12.0, Jupyter 1.0.0, notebook 7.0.8). Other packages follow the latest compatible versions resolved by Conda/Pip at installation time.

3.2 Data Generation and Labeling

The simulated dataset used in this study comprises three classes of network traffic: DDoS, VoIP, and VideoTCP. Each traffic category was generated with an appropriate application model in the NS-3 simulation environment. VideoTCP traffic, emulating a real-time video streaming application, was produced with the BulkSendHelper application over a TCP connection to the target vehicle. VoIP traffic was generated with the OnOffHelper application at a constant data rate of 64 Kbps and a fixed packet size of 160 bytes, following the widely used G.711 standard for VoIP communications (160-byte packets at 64 Kbps correspond to one packet every 20 ms). DDoS traffic was generated with the same OnOffHelper application but at a much higher data rate of 1 Mbps per flow, modeling malicious traffic intended to saturate network resources.

To characterize the behavior and performance of each network flow, several relevant metrics were collected using the FlowMonitor module in NS-3. Key metrics extracted include the average throughput, measured in kilobits per second (Kbps), computed according to the following equation:

$$\text{Throughput} = \frac{8 \times \text{RxBytes}}{\text{FlowDuration} \times 10^{3}}$$

where RxBytes denotes the total number of bytes received and FlowDuration represents the effective duration of the flow in seconds. The mean delay was calculated using:

$$\text{MeanDelay} = \frac{\sum_{i=1}^{N_{rx}} \text{Delay}_i}{N_{rx}}$$

where $\text{Delay}_i$ is the delay of the $i$-th successfully received packet and $N_{rx}$ is the total number of received packets. Additionally, the number of lost packets (LostPackets) was computed as the difference between the numbers of transmitted ($N_{tx}$) and received ($N_{rx}$) packets:

$$\text{LostPackets} = N_{tx} - N_{rx}$$
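As a plain-Python illustration of the three equations (not the NS-3 FlowMonitor API), the sketch below derives the metrics from a hypothetical per-flow record. It assumes FlowDuration = TimeLastRx − TimeFirstTx, consistent with the timestamps that FlowDuration replaces during preprocessing, and uses made-up numbers for a VoIP-like flow.

```python
from dataclasses import dataclass, field


@dataclass
class FlowStats:
    """Hypothetical per-flow record mirroring the counters named in the equations."""
    rx_bytes: int                                  # RxBytes
    tx_packets: int                                # N_tx
    rx_delays: list = field(default_factory=list)  # Delay_i in s, one per received packet
    time_first_tx: float = 0.0                     # s
    time_last_rx: float = 0.0                      # s

    @property
    def rx_packets(self) -> int:                   # N_rx
        return len(self.rx_delays)


def flow_metrics(f: FlowStats) -> dict:
    # Assumption: FlowDuration = TimeLastRx - TimeFirstTx.
    duration = f.time_last_rx - f.time_first_tx
    throughput_kbps = 8 * f.rx_bytes / (duration * 1e3) if duration > 0 else 0.0
    mean_delay = sum(f.rx_delays) / f.rx_packets if f.rx_packets else 0.0
    lost_packets = f.tx_packets - f.rx_packets
    return {"ThroughputKbps": throughput_kbps, "MeanDelay": mean_delay,
            "LostPackets": lost_packets, "FlowDuration": duration}


# Illustrative VoIP-like flow (made-up numbers): 500 packets of 160 B sent, 495 received.
flow = FlowStats(rx_bytes=495 * 160, tx_packets=500, rx_delays=[0.002] * 495,
                 time_first_tx=1.0, time_last_rx=11.0)
m = flow_metrics(flow)
print(m["ThroughputKbps"], m["LostPackets"])  # 63.36 5
```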

Lastly, each network flow was labeled with its traffic class (DDoS, VoIP, or VideoTCP) based on its originating IP address and transport protocol. TCP-based flows were labeled VideoTCP, UDP-based flows originating from legitimate nodes (IP addresses ≤ 10.0.0.3) were labeled VoIP, and UDP flows initiated by malicious bot nodes were labeled DDoS. Because this labeling is rule-based and follows directly from the simulated traffic roles, it avoids manual annotation errors and yields a reliable ground truth for training intrusion detection models. Fig. 4 shows the first five rows of the dataset extracted from the NS-3 simulation.


Figure 4: Dataset sample
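Because the labeling rule is deterministic, it fits in a few lines. The sketch below is a hypothetical helper, not the authors' script; it assumes protocol names are given as strings and, as stated above, that every address above 10.0.0.3 belongs to a bot node.

```python
import ipaddress

# From the text: legitimate nodes (V0-V2) have addresses up to 10.0.0.3.
LEGIT_MAX = ipaddress.IPv4Address("10.0.0.3")


def label_flow(src_ip: str, protocol: str) -> str:
    """Rule-based flow labeling (sketch of the procedure described above)."""
    proto = protocol.upper()
    if proto == "TCP":
        return "VideoTCP"
    if proto == "UDP":
        # Compare numerically: as strings, "10.0.0.10" would sort before "10.0.0.3".
        return "VoIP" if ipaddress.IPv4Address(src_ip) <= LEGIT_MAX else "DDoS"
    raise ValueError(f"unexpected protocol: {protocol}")


print(label_flow("10.0.0.1", "TCP"), label_flow("10.0.0.2", "UDP"), label_flow("10.0.0.10", "UDP"))
# VideoTCP VoIP DDoS
```

Parsing addresses with `ipaddress` rather than comparing strings matters as soon as the simulation has ten or more nodes.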

3.3 Data Preprocessing

The preprocessing stage is a fundamental step in building an effective intrusion detection model. This process was structured into three main phases: data cleaning and normalization, creation of a derived SNR variable, and class rebalancing through oversampling techniques.

3.3.1 Cleaning and Normalization

The raw dataset initially consisted of 6882 network flows described by 19 features, including identifiers, traffic characteristics, performance metrics, and physical measurements such as average signal and noise power. Several cleaning operations were applied:

• Removal of non-informative or highly correlated features: Columns such as FlowID, Src, Dest, SrcPort, DestPort, and Samples were discarded due to their low predictive value. Similarly, the temporal features TimeFirstTx and TimeLastRx were removed in favor of the derived feature FlowDuration, and DelaySum was excluded in favor of MeanDelay.

• Categorical feature encoding: The categorical variables Protocol and TrafficLabel were converted to numerical codes using LabelEncoder, which assigns codes in alphabetical order of the class names; DDoS, VideoTCP, and VoIP were thus encoded as 0, 1, and 2, respectively.

• Duplicate removal: Approximately 7.5% of the data were identified as duplicates and subsequently removed to reduce model bias.

• Normalization: All numerical features were standardized with StandardScaler to zero mean and unit variance, an essential step for many machine learning algorithms, particularly distance- and gradient-based ones such as KNN, SVM, LR, and the ANN.
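The encoding, de-duplication, and scaling steps can be shown without library code. This pure-Python sketch (toy values, hypothetical helper names) reproduces the behavior the list relies on: LabelEncoder sorts the class names before indexing them, which is why DDoS, VideoTCP, and VoIP become 0, 1, and 2, and StandardScaler subtracts the mean and divides by the population standard deviation. In a cross-validated pipeline, fitting the scaling statistics on the training folds only avoids leaking information from the held-out fold.

```python
from statistics import mean, pstdev


def label_encode(labels):
    """Mimic LabelEncoder: sort the distinct classes, then index them from 0."""
    classes = sorted(set(labels))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[c] for c in labels], mapping


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    return list(dict.fromkeys(tuple(r) for r in rows))


def standardize(column):
    """Zero mean and unit variance using the population std, as StandardScaler does."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:  # StandardScaler maps a constant feature to 0
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


codes, mapping = label_encode(["VoIP", "DDoS", "VideoTCP", "DDoS"])
print(mapping)  # {'DDoS': 0, 'VideoTCP': 1, 'VoIP': 2}
print(codes)    # [2, 0, 1, 0]
```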

3.3.2 SNR Feature Engineering

From the fields AvgSignal_dBm and AvgNoise_dBm, a new variable representing the average SNR was derived as follows:

$$\overline{\text{SNR}} = \bar{S} - \bar{N}$$

where $\bar{S}$ and $\bar{N}$ denote the mean received signal and noise power, respectively, both in dBm, so that $\overline{\text{SNR}}$ is expressed in dB. However, SHAP analysis revealed that these features had negligible predictive value in the highway VANET scenario, so they were excluded from the final dataset used for training.
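The SNR is a subtraction because both powers are logarithmic: subtracting dBm values is equivalent to dividing the linear powers, so the result is in dB. The short check below, with made-up signal and noise levels and hypothetical function names, confirms the equivalence.

```python
import math


def snr_db(avg_signal_dbm: float, avg_noise_dbm: float) -> float:
    """SNR in dB from signal and noise powers both expressed in dBm."""
    # A difference of dBm values is a ratio of linear powers, i.e. a value in dB.
    return avg_signal_dbm - avg_noise_dbm


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


# Illustrative values (not from the dataset).
s, n = -70.0, -95.0
print(snr_db(s, n))                                             # 25.0
print(round(10 * math.log10(dbm_to_mw(s) / dbm_to_mw(n)), 9))   # 25.0
```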

3.3.3 Class Rebalancing Using SMOTE

Fig. 5 shows a significant class imbalance: 3489 DDoS flows, 1996 VoIP flows, and only 882 VideoTCP flows. To address this, we applied SMOTE [23] to the training data. SMOTE generates synthetic samples for the minority classes by interpolating between existing minority samples and their nearest neighbors, resulting in a balanced training set with 2617 flows per class.


Figure 5: Class distribution of traffic labels before applying SMOTE rebalancing

This rebalancing reduced bias toward the majority class during training and supported model generalization; Section 5.2.3 examines its effect on per-class performance.
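The core step of SMOTE is an interpolation: for a minority sample x and one of its k nearest same-class neighbors x_nn, it creates x + λ(x_nn − x), with λ drawn uniformly from [0, 1). The dependency-free sketch below implements only that step for a single class, with hypothetical names and toy data. In practice a library implementation such as `imblearn.over_sampling.SMOTE` (imbalanced-learn appears in the software stack above) would be used, and, as in the text, oversampling should touch the training data only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling for ONE minority class (illustrative sketch).

    minority: list of feature vectors (lists of floats) from the same class.
    Returns n_new synthetic vectors, each on the segment between a random
    minority sample and one of its k nearest minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbors of x within the class (Euclidean distance).
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbors)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Toy minority class in two standardized features (illustrative values).
video = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(video, n_new=6, k=3, seed=42)
print(len(new_rows))  # 6
```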

3.4 Feature Selection

Feature selection plays a pivotal role in the development of any predictive model, particularly in the context of VANETs, where the dataset may include redundant or highly correlated variables. To identify the most relevant attributes for classifying network traffic (DDoS, VoIP, and VideoTCP), we adopted an interpretability-based approach using SHAP values (see Fig. 6). This method quantifies the marginal contribution of each feature to the model’s output while accounting for complex interdependencies among features.
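The SHAP library approximates Shapley values efficiently, but the definition itself can be made concrete with a brute-force computation on a toy model. The sketch below is not the SHAP library: the function, the toy model, the input, and the baseline are hypothetical. It averages each feature's marginal contribution over all coalitions of the other features, and its attributions sum to f(x) − f(baseline) (the efficiency property).

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a baseline input.

    Features absent from a coalition are set to their baseline value (one of
    several possible conventions). The cost grows exponentially with the
    number of features, so this only illustrates the definition.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! of each coalition S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Additive effect of feature 0 plus an interaction between features 1 and 2.
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 6) for v in phi])  # [2.0, 0.5, 0.5]
```

The interaction term is split equally between features 1 and 2, which illustrates how SHAP shares credit among interdependent features.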


Figure 6: Feature importance based on SHAP values

As illustrated in Fig. 6, the SHAP analysis highlighted TxPackets, LostPackets, and Protocol as the most influential features in predicting the traffic class. Although these features exhibit some degree of correlation, they offer complementary insights into traffic intensity and anomalous behavior, such as packet losses resulting from DDoS attacks.

Nonetheless, TxPackets and LostPackets, despite their high SHAP scores and strong correlation with the target variable, were deliberately excluded from the final feature set to mitigate multicollinearity. By construction, LostPackets = TxPackets − RxPackets, so keeping all three counters would introduce an exact linear dependency. Moreover, these variables directly influence several other performance metrics (e.g., ThroughputKbps and MeanDelay), and including them could bias the model by over-representing certain aspects of the traffic.
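This dependency can be checked numerically. With toy counters (illustrative values, not from the dataset), the matrix with columns [TxPackets, RxPackets, LostPackets] has rank 2 rather than 3, and removing one of the three counters removes the exact dependency. The elimination routine is a hypothetical helper; `numpy.linalg.matrix_rank` (SVD-based) performs the same check in practice.

```python
def matrix_rank(rows, tol=1e-6):
    """Numerical rank via Gaussian elimination with partial pivoting (small inputs only)."""
    m = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(n_rows):
            if r != rank:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Toy per-flow counters (illustrative values, not from the dataset).
tx = [100, 200, 300, 400, 500]
rx = [99, 150, 290, 200, 480]
lost = [t - r for t, r in zip(tx, rx)]  # LostPackets = TxPackets - RxPackets

print(matrix_rank(list(zip(tx, rx, lost))))  # 2: one column is redundant
print(matrix_rank(list(zip(tx, rx))))        # 2: no exact dependency left
```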

The final selection includes the following features:

• Protocol: distinguishes TCP flows (VideoTCP) from UDP flows (VoIP and DDoS), leaving the remaining decision to separating legitimate VoIP traffic from DDoS traffic.

• ThroughputKbps: reflects traffic intensity and helps discriminate between high-volume flows such as those generated by VideoTCP and DDoS.

• MeanDelay: captures average packet latency, which is critical both for detecting attack-induced delays and for characterizing latency-sensitive services such as VoIP.

• RxPackets: although moderately ranked in SHAP importance, this feature complements flow-level analysis without the redundancy of TxPackets.

• FlowDuration: captures the temporal dynamics of each flow and effectively substitutes highly correlated variables such as TimeFirstTx and TimeLastRx.

This refined feature set was selected for its discriminative power and low redundancy, with the aim of improving the robustness and interpretability of the classification model, both of which are essential for reliable intrusion detection in VANET environments.

4 Modeling and Classification

This section presents the modeling approach to classify network traffic in a VANET scenario under DDoS conditions.

4.1 Tested Machine Learning Models

To assess the ability to classify network traffic in a VANET environment, several machine learning algorithms were tested, encompassing both traditional methods and more advanced ensemble and boosting techniques.

The traditional models evaluated include:

• RF: An ensemble method based on building multiple decision trees and averaging their predictions to improve generalization.

• Extra Trees: Similar to Random Forest, but introducing greater randomness in the selection of splitting thresholds to enhance diversity.

• DT: A simple hierarchical model based on attribute-based decision rules.

• LR: A linear model adapted for multiclass classification through the softmax activation function.

• SVM: A support vector machine with a tuned linear kernel that separates the network traffic classes.

• KNN: A non-parametric method that classifies each observation based on the majority vote among its k nearest neighbors.
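KNN's decision rule depends only on distances, which is one reason the features are standardized beforehand. A minimal sketch of the rule, with a hypothetical function name and toy points in two standardized features:

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to x and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
train_y = ["VoIP", "VoIP", "VoIP", "DDoS", "DDoS", "DDoS"]
print(knn_predict(train_X, train_y, [0.15, 0.1], k=3))  # VoIP
print(knn_predict(train_X, train_y, [3.0, 3.1], k=3))   # DDoS
```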

Advanced boosting and ensemble methods were also evaluated:

• XGBoost: A gradient boosting framework optimized for multiclass classification tasks using the multi:softmax objective function.

• CatBoost: A gradient boosting method designed to handle categorical variables efficiently and to remain robust under class imbalance.

• AdaBoost: An iterative ensemble technique that trains weak classifiers sequentially, reweighting misclassified samples at each round, and combines them through a weighted vote.

• GB: Builds models sequentially to correct errors made by prior models.

Finally, an ANN was designed and implemented using Keras. The architecture consists of:

• An input layer receiving 5 features (Protocol, ThroughputKbps, MeanDelay, RxPackets, FlowDuration).

• A first dense hidden layer with 32 neurons and a ReLU activation function.

• A second dense hidden layer with 16 neurons, also activated by ReLU.

• A Dropout layer with a rate of 30% applied after the second hidden layer to mitigate overfitting.

• A third dense hidden layer with 8 neurons and a ReLU activation function.

• An output dense layer with 3 neurons using the Softmax activation function to classify among three classes: DDoS, VoIP, and VideoTCP.

The ReLU activation function was selected for the hidden layers based on empirical testing during preliminary experiments, where it consistently yielded faster convergence. The Softmax activation in the output layer was chosen as it is the standard approach for multi-class classification tasks.
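Defining this network in Keras requires TensorFlow. To keep the illustration dependency-free, the sketch below only mirrors its inference computation in plain Python: 5 inputs, Dense layers of 32, 16, and 8 ReLU units, and a 3-unit softmax output. The weights are random (an arbitrary initialization, neither the Keras default nor the trained model) and the input vector is hypothetical. Dropout has no parameters and is inactive at inference, so the network has (5×32 + 32) + (32×16 + 16) + (16×8 + 8) + (8×3 + 3) = 883 trainable parameters.

```python
import math
import random

# Layer widths from the architecture above: 5 -> 32 -> 16 -> (Dropout) -> 8 -> 3.
LAYERS = [5, 32, 16, 8, 3]


def count_params(widths):
    """Weights plus biases of consecutive Dense layers (Dropout adds none)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def init_layers(widths, seed=0):
    """Random weights for illustration only (not the trained model)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(widths, widths[1:]):
        w = [[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Inference pass: ReLU on hidden layers, softmax on the output layer.

    Dropout is only active during training, so it is the identity here.
    """
    h = x
    for idx, (w, b) in enumerate(layers):
        z = [sum(wi * hi for wi, hi in zip(row, h)) + bi for row, bi in zip(w, b)]
        h = softmax(z) if idx == len(layers) - 1 else relu(z)
    return h


print(count_params(LAYERS))  # 883
probs = forward(init_layers(LAYERS), [0.2, -1.0, 0.5, 1.3, -0.4])
print(len(probs), round(sum(probs), 6))  # 3 1.0
```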

Fig. 7 illustrates the architecture of the designed ANN.


Figure 7: Architecture of the designed ANN

4.2 Training and Validation

To rigorously assess model generalization and mitigate potential overfitting, a stratified 5-fold cross-validation (CV) methodology was employed for all classifiers, including the ANN implemented with TensorFlow/Keras. Five folds provide a reasonable balance between computational efficiency and reliable estimation of generalization performance. Fewer folds (e.g., 3) train each model on a smaller share of the data and may yield overly pessimistic estimates, while more folds (e.g., 10) substantially increase computational cost without a significant gain in estimate quality, particularly given the moderate size of our dataset.

During ANN training, dropout and early stopping were retained to further reduce the risk of overfitting. The CV procedure not only provided statistical robustness to the performance estimates but also ensured that the models were consistently evaluated on unseen data in each fold. Mean training and validation accuracy/loss trends from the CV runs, together with the averaged confusion matrix (CM) of the best model, are presented in Section 5.2.2 to confirm stable convergence and the absence of over-specialization.
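A stratified split can be sketched in a few lines: shuffle the indices of each class and deal them round-robin across the k folds, so that every validation fold preserves the class proportions. The helper below is hypothetical (scikit-learn's `StratifiedKFold` is the standard implementation). In such a loop, SMOTE and the StandardScaler would be fitted on each training split only, so that no information from the validation fold leaks into training.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Return k (train_idx, val_idx) pairs whose class proportions match the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)  # deal indices round-robin
        offset += len(idxs)  # continue where the previous class stopped
    splits = []
    for f in range(k):
        val = sorted(folds[f])
        val_set = set(val)
        train = [i for i in range(len(labels)) if i not in val_set]
        splits.append((train, val))
    return splits


labels = ["DDoS"] * 50 + ["VoIP"] * 30 + ["VideoTCP"] * 20
for train_idx, val_idx in stratified_kfold(labels, k=5):
    counts = {c: sum(labels[i] == c for i in val_idx) for c in ("DDoS", "VoIP", "VideoTCP")}
    print(len(train_idx), counts)  # 80 {'DDoS': 10, 'VoIP': 6, 'VideoTCP': 4} for every fold
```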

4.3 Model Evaluation

The performance of each classification algorithm was assessed using standard evaluation metrics derived from the confusion matrix, namely Accuracy, Precision, Recall, and F1-score. These metrics quantify the models’ ability to correctly classify the network traffic into the three categories: DDoS, VoIP, and VideoTCP. The definitions and formulas are as follows:

• Accuracy (AC): Represents the ratio of correctly predicted instances over the total number of samples. It is computed using:

$$\text{Accuracy (AC)} = \frac{TP + TN}{TP + TN + FP + FN}$$

• Recall (R): Measures the proportion of true positives detected among all actual positive cases. The formula is:

$$\text{Recall (R)} = \frac{TP}{TP + FN}$$

• Precision (P): Indicates the ratio of correctly predicted positive observations to the total predicted positives:

$$\text{Precision (P)} = \frac{TP}{TP + FP}$$

• F1-score: Combines precision and recall into a single metric by calculating their harmonic mean:

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

To compute these metrics for each algorithm, the confusion matrices were extracted after testing on the evaluation set. Because the task has three classes, the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts are obtained per class in a one-vs-rest manner, and the weighted F1-score averages the per-class F1-scores weighted by the number of samples in each class. These values were used to assess how well each model distinguishes normal traffic (VoIP, VideoTCP) from malicious traffic (DDoS).
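The sketch below makes these computations explicit for a 3×3 confusion matrix: per-class one-vs-rest counts, multiclass accuracy (diagonal over total), and the support-weighted F1-score, the averaging scikit-learn applies with `f1_score(..., average="weighted")`. The helper name is hypothetical and the matrix is illustrative, not one of the paper's results.

```python
def classification_metrics(cm, classes):
    """Per-class (one-vs-rest) metrics, accuracy and weighted F1 from a confusion matrix.

    cm[i][j] counts samples whose true class is classes[i] and predicted class is classes[j].
    """
    n = len(classes)
    total = sum(sum(row) for row in cm)
    report, weighted_f1 = {}, 0.0
    for i, name in enumerate(classes):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # class i predicted as another class
        fp = sum(cm[r][i] for r in range(n)) - tp  # other classes predicted as class i
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp + fn                          # number of true samples of class i
        report[name] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
                        "precision": precision, "recall": recall, "f1": f1}
        weighted_f1 += f1 * support / total        # support-weighted average
    accuracy = sum(cm[i][i] for i in range(n)) / total
    return report, accuracy, weighted_f1


# Illustrative 3-class confusion matrix (not the paper's results).
classes = ["DDoS", "VoIP", "VideoTCP"]
cm = [[90, 10, 0],
      [5, 95, 0],
      [0, 0, 50]]
report, acc, wf1 = classification_metrics(cm, classes)
print(round(acc, 3), round(wf1, 3))  # 0.94 0.94
```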

5 Results and Discussion

This section outlines the performance outcomes of the machine learning models used in this study and provides a corresponding analysis and interpretation of these findings.

5.1 Results

The classification report summary (Table 3), together with the comparative analysis of F1-scores across various algorithms (Fig. 8), offers a thorough evaluation of the predictive capabilities of each model.


Figure 8: F1-score comparison across models

5.2 Results Analysis

This section analyzes the classification results obtained from various models, focusing on overall performance, robustness to class imbalance, and sources of misclassification. Key insights are drawn from evaluation metrics and confusion matrices to highlight model strengths and areas for improvement.

5.2.1 Performance Interpretation

The classification results in Table 3 and Fig. 8 show that boosted learners and the neural network deliver the strongest performance. In particular, XGBoost, GB, ANN, and CatBoost achieve the highest overall F1-scores of 0.97. Tree-based baselines also perform strongly: Extra Trees, RF, and a single DT reach 0.96. Mid-tier results are obtained by KNN and AdaBoost with overall F1-scores of 0.95. In contrast, linear models—LR and SVM—trail with overall F1-scores of 0.86, indicating limitations in capturing the non-linear interactions present in VANET traffic under DDoS, VoIP, and VideoTCP scenarios.

5.2.2 Overfitting & Generalization Analysis

Every classifier was evaluated with stratified 5-fold cross-validation. Table 4 lists the mean weighted F1-score and its standard deviation, together with the average training and inference times per fold. XGBoost obtains the best overall score (0.970 ± 0.006), closely followed by the ANN (0.969 ± 0.002) and CatBoost. All ensemble methods keep the standard deviation below 0.007, indicating that their performance is stable across data partitions. In contrast, the linear SVM and LR are more variable and clearly underperform the tree-based models.
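The stratified protocol described above can be sketched as follows. This is an illustration rather than the authors' implementation; in practice scikit-learn's `StratifiedKFold` performs this step. The indices of each class are shuffled and dealt round-robin across the k folds, so every fold keeps roughly the class proportions of the full dataset. Per-fold scores are then summarized as mean ± standard deviation. The helper names, the seed, and the use of the population standard deviation (ddof = 0) are assumptions.

```python
import random
import statistics


def stratified_folds(labels, k=5, seed=0):
    """Assign every sample index to one of k folds, preserving class proportions.

    Indices of each class are shuffled and dealt round-robin; a running offset
    spreads the remainder of each class over different folds.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    fold_of = [0] * len(labels)
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            fold_of[idx] = (offset + pos) % k
        offset += len(indices)
    return fold_of


def cv_summary(fold_scores):
    """Mean and population standard deviation (ddof=0) of the per-fold scores."""
    return statistics.mean(fold_scores), statistics.pstdev(fold_scores)


labels = ["DDoS"] * 10 + ["VoIP"] * 10 + ["VideoTCP"] * 5
folds = stratified_folds(labels, k=5)
for f in range(5):
    test_idx = [i for i, fold in enumerate(folds) if fold == f]
    train_idx = [i for i, fold in enumerate(folds) if fold != f]
    # Hypothetical next steps: oversample train_idx with SMOTE, fit the model,
    # and record the weighted F1-score on test_idx.
```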


Fig. 9 shows the fold-averaged learning curves of the ANN. The training and validation trajectories almost coincide and converge rapidly. Neither the loss curves nor the accuracy curves diverge, which suggests that dropout regularization (p = 0.3) and early stopping effectively prevent overfitting.


Figure 9: Fold-averaged learning curves of the ANN over 5-fold CV. Error bands (barely visible) represent one standard deviation
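A patience-based criterion can illustrate the early-stopping rule mentioned above. Keras's `EarlyStopping` callback implements this kind of rule through its `monitor`, `patience`, `min_delta` and `restore_best_weights` parameters. The paper does not report the monitored quantity or the patience value, so monitoring validation loss and the example patience are assumptions.

```python
def early_stopping(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a patience-based early-stopping rule.

    Training stops once the validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs; the weights from best_epoch are
    the ones that would be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    # Patience never exhausted: training ran for all epochs
    return best_epoch, len(val_losses) - 1


# Hypothetical validation-loss trace: improves for three epochs, then plateaus
print(early_stopping([1.0, 0.8, 0.6, 0.61, 0.62, 0.63], patience=3))
# prints (2, 5)
```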

Finally, Fig. 10 shows the confusion matrix of the best model (XGBoost), averaged over the five folds. It separates VideoTCP flows almost perfectly. The few residual errors are confined to a roughly symmetric confusion between DDoS and VoIP: 134 VoIP packets are flagged as DDoS and 102 DDoS packets as VoIP. This pattern is consistent with the misclassification analysis in Section 5.2.5.


Figure 10: Confusion matrix averaged over the five validation folds for the XGBoost model
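A fold-averaged confusion matrix like the one in Fig. 10 can be obtained by averaging the per-fold matrices element-wise. Ranking the off-diagonal cells then locates the dominant error pairs. This minimal sketch assumes that all folds share the same class order. Summing out-of-fold predictions into a single matrix is a common alternative. The fold matrices below are hypothetical.

```python
def average_confusion(matrices):
    """Element-wise mean of per-fold confusion matrices that share one class order."""
    n_folds = len(matrices)
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / n_folds for j in range(n)]
            for i in range(n)]


def top_confusions(cm, classes, limit=2):
    """Largest off-diagonal cells as (true_class, predicted_class, count), descending."""
    cells = [(classes[i], classes[j], cm[i][j])
             for i in range(len(cm)) for j in range(len(cm)) if i != j]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)[:limit]


classes = ["DDoS", "VoIP", "VideoTCP"]
# Two hypothetical folds (rows = true class, columns = predicted class)
fold_cms = [[[10, 2, 0], [1, 9, 0], [0, 0, 5]],
            [[12, 0, 0], [3, 7, 0], [0, 0, 5]]]
avg = average_confusion(fold_cms)
print(top_confusions(avg, classes))
# prints [('VoIP', 'DDoS', 2.0), ('DDoS', 'VoIP', 1.0)]
```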

5.2.3 Robustness to Class Imbalance

The dataset is inherently imbalanced across DDoS, VoIP, and VideoTCP. With SMOTE applied to the training set, the per-class results in Table 3 show that imbalance no longer dominates performance. VideoTCP is the easiest class: every model attains Precision = Recall = F1-score = 1.00. For DDoS, most models achieve F1-scores in the 0.95–0.97 range. AdaBoost shows a recall dip (0.88), and the linear baselines obtain substantially lower F1-scores (LR: 0.81; SVM: 0.80). VoIP remains the most challenging class for the linear models (F1-score: LR 0.79, SVM 0.78). The boosting methods and the ANN, by contrast, maintain high performance (F1-score typically 0.95–0.96). On VoIP, AdaBoost shows a precision–recall trade-off (precision 0.89, recall 0.98). Overall, these results suggest that SMOTE combined with non-linear learners (boosted trees and the ANN) effectively mitigates the initial class skew. For the linear baselines, the residual errors concentrate on VoIP/DDoS separability.
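For reference, the core of SMOTE (Chawla et al.) fits in a few lines. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbors. The sketch below is a simplified, dependency-free illustration, not the authors' pipeline. It uses a brute-force neighbor search and `k=5`, the value used in the original algorithm and the default in imbalanced-learn's `SMOTE`. As described in the text, oversampling is applied to the training data only, so no synthetic points leak into the evaluation data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic samples by interpolating between minority samples.

    Each synthetic point is x + u * (x_nn - x), where x is a randomly chosen
    minority sample, x_nn is one of its k nearest minority neighbors
    (brute-force Euclidean search), and u is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbors = sorted((p for j, p in enumerate(minority) if j != i),
                           key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        u = rng.random()
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Hypothetical 2-D minority-class feature vectors (e.g., scaled features)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(minority, n_synthetic=3, k=2))
```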

5.2.4 Robustness to Adversarial Traffic

To further assess generalization, we evaluated the five best classifiers (ANN, XGBoost, CatBoost, GB, and Extra Trees) on an unseen adversarial-attack dataset. This dataset is perturbed with packet loss and traffic jitter and contains 6777 flows: 4000 DDoS, 777 VideoTCP, and 2000 VoIP.

As reported in Table 5, overall (weighted) F1-scores remained high under distribution shift, ranging from 0.90 to 0.93. XGBoost and GB achieved the best overall robustness (F1-score = 0.93 each; accuracy = 0.93), CatBoost followed closely (F1-score = 0.91, accuracy = 0.92), while ANN and Extra Trees both reached F1-score = 0.90 (accuracy = 0.90).


Per-class results reveal two consistent patterns. First, all models recognized VideoTCP perfectly (precision = recall = F1-score = 1.00). Second, degradation concentrates in the VoIP class. VoIP recall spans 0.73–0.87 (Extra Trees: 0.73; CatBoost: 0.80; GB: 0.85; XGBoost: 0.86; ANN: 0.87), with precision between 0.81 and 0.93. In contrast, DDoS detection remains strong across models (recall 0.90–0.97). Extra Trees attains the highest DDoS recall (0.97), but at the cost of VoIP recall.

Overall, these results indicate that the top models remain robust on adversarial traffic. Despite targeted distributional shifts, their weighted F1-score and accuracy stay within 0.90–0.93, and most errors fall in the VoIP class.
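This section does not detail the exact perturbation procedure behind the adversarial dataset. As a purely hypothetical illustration, packet-loss and jitter perturbations could be applied to the flow features used here (Protocol, ThroughputKbps, MeanDelay, RxPackets, FlowDuration) in three steps: drop received packets independently at a fixed rate, scale throughput by the retained fraction, and add zero-mean Gaussian noise to the mean delay. The loss rate, the jitter scale, and the assumption that MeanDelay is expressed in seconds are all illustrative choices.

```python
import random


def perturb_flow(flow, loss_rate=0.05, jitter_std_s=0.002, rng=None):
    """Return a copy of a flow-feature dict with simulated packet loss and delay jitter.

    Hypothetical scheme: each received packet is dropped independently with
    probability loss_rate, ThroughputKbps is scaled by the retained fraction,
    and MeanDelay (assumed to be in seconds) receives zero-mean Gaussian noise,
    clipped at zero. Protocol and FlowDuration are left unchanged.
    """
    rng = rng or random.Random(0)
    out = dict(flow)
    rx = flow["RxPackets"]
    kept = sum(1 for _ in range(rx) if rng.random() >= loss_rate)
    out["RxPackets"] = kept
    out["ThroughputKbps"] = flow["ThroughputKbps"] * (kept / rx if rx else 0.0)
    out["MeanDelay"] = max(0.0, flow["MeanDelay"] + rng.gauss(0.0, jitter_std_s))
    return out


# Hypothetical VoIP-like flow record using the paper's feature names
flow = {"Protocol": 17, "ThroughputKbps": 64.0, "MeanDelay": 0.02,
        "RxPackets": 1000, "FlowDuration": 20.0}
print(perturb_flow(flow, loss_rate=0.1))
```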

5.2.5 Misclassification Analysis

Per-class results in Table 3 show that all models identify VideoTCP almost perfectly (precision/recall/F1-score ≈ 1.00 for XGBoost, CatBoost, GB, Extra Trees, RF, DT, and KNN, and even for LR and SVM). The residual errors arise mainly from confusion between DDoS and VoIP. For instance, AdaBoost and the ANN achieve strong overall performance (F1-score ≈ 0.95–0.97), yet they miss more DDoS flows (recall 0.88 and 0.91, respectively). The linear baselines show the largest VoIP confusion (precision/recall/F1-score: LR 0.83/0.75/0.79; SVM 0.82/0.74/0.78), which yields the lowest overall F1-score (≈ 0.86). The best performers (XGBoost, GB, CatBoost, and the ANN) substantially reduce this DDoS↔VoIP confusion, each attaining an overall F1-score of ≈ 0.97.

On the adversarial-attack dataset (Table 5), the same pattern persists and intensifies. The overall F1-score of the top models drops to 0.90–0.93, mainly because of a decline in VoIP recall (0.73–0.87). Meanwhile, VideoTCP remains perfect (1.00/1.00/1.00) and DDoS recall stays high (0.90–0.97). These observations suggest that the current feature set (Protocol, ThroughputKbps, MeanDelay, RxPackets, FlowDuration) does not fully capture the short-term rate dynamics that separate low-rate VoIP from high-rate UDP floods. Two directions could further reduce these errors: adding jitter or inter-arrival variance, burstiness, and packet-size statistics as features, or using sequence models that exploit temporal structure. Either could be combined with cost-sensitive training or threshold calibration toward VoIP.
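The additional features proposed above could be computed per flow from packet arrival timestamps and sizes, which the current flow-level feature set does not retain. The sketch below rests on several assumptions:

- The feature names are hypothetical.
- Jitter is approximated as the mean absolute change between consecutive inter-arrival times. This simplifies the RFC 3550 estimator, which uses transit-time differences with exponential smoothing.
- Burstiness uses the Goh–Barabási coefficient B = (σ − μ)/(σ + μ). B is −1 for perfectly periodic arrivals (for example, a codec emitting packets at a fixed interval), close to 0 for Poisson-like traffic, and tends toward 1 for highly bursty traffic.

```python
import statistics


def temporal_features(timestamps_s, sizes_bytes):
    """Candidate per-flow features from packet arrival times (s) and sizes (bytes).

    Requires at least two packets. Feature names are hypothetical.
    """
    iat = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]  # inter-arrival times
    mu = statistics.mean(iat)
    sigma = statistics.pstdev(iat)
    # Jitter proxy: mean absolute change between consecutive inter-arrival times
    jitter = (statistics.mean(abs(b - a) for a, b in zip(iat, iat[1:]))
              if len(iat) > 1 else 0.0)
    # Goh-Barabasi burstiness: -1 periodic, about 0 Poisson-like, towards 1 bursty
    burstiness = (sigma - mu) / (sigma + mu) if sigma + mu > 0 else 0.0
    return {
        "IatMean": mu,
        "IatVar": sigma ** 2,
        "IatJitter": jitter,
        "Burstiness": burstiness,
        "PktSizeMean": statistics.mean(sizes_bytes),
        "PktSizeStd": statistics.pstdev(sizes_bytes),
    }


# Hypothetical traces: a periodic VoIP-like flow and a bursty flood-like flow
periodic = temporal_features([0.0, 0.5, 1.0, 1.5], [200, 200, 200, 200])
bursty = temporal_features([0.0, 0.001, 0.002, 1.0, 1.001], [1400] * 5)
print(periodic["Burstiness"], round(bursty["Burstiness"], 2))
```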

6 Conclusion and Future Work

This paper presented a comprehensive evaluation of multiple machine learning techniques for detecting DDoS attacks in VANETs, specifically targeting emergency vehicle communication scenarios on highways. Leveraging a realistic simulation setup, which integrates the NS-3 network simulator with the SUMO mobility simulator and real-world vehicular mobility traces from Germany’s A81 highway, we generated a robust and reproducible dataset for rigorous evaluation.

The experimental results showed that several machine learning algorithms, notably XGBoost, GB, ANN, and CatBoost, achieved outstanding classification performance, with overall F1-scores of up to 0.97. Extra Trees, RF, and DT also performed strongly (F1-score 0.96), and KNN and AdaBoost followed closely at 0.95. LR and SVM recorded the lowest F1-scores (0.86). These findings suggest that the SMOTE-based balancing strategy was effective in addressing class imbalance. It enabled accurate detection of all traffic types, including the minority VoIP class.

The study offers significant scientific contributions, including the introduction of a reproducible and realistic methodology combining NS-3 and SUMO simulators with authentic mobility data, and a systematic comparison of widely recognized machine learning classifiers in the context of highway VANET scenarios. Furthermore, the detailed SHAP-based feature selection analysis provided valuable insights into the key predictors necessary for accurate intrusion detection.

Despite these contributions, the study has several limitations. Primarily, the results remain constrained by the synthetic nature of the dataset, albeit enhanced by real-world mobility patterns. Moreover, the simulations did not encompass the full complexity of real-world communication scenarios, including dynamic signal propagation, variable network topologies, and real-time adaptive behavior.

Future research should focus on extending the present approach through the following perspectives:

• Conducting experiments in real-world settings by utilizing actual connected vehicles and infrastructure, which would validate and potentially refine the proposed classification models.

• Investigating the feasibility and effectiveness of deploying these detection systems onboard vehicles, thus enabling practical intrusion detection solutions in real-time scenarios.

• Expanding the methodology to detect other prominent cybersecurity threats in VANETs, including spoofing, Sybil, and blackhole attacks, thereby broadening the scope and practical applicability of the developed intrusion detection framework.

Acknowledgement: None.

Funding Statement: The authors received no specific funding for this study.

Author Contributions: Conceptualization, Bappa Muktar; Methodology, Bappa Muktar and Vincent Fono; Software, Bappa Muktar; Investigation, Bappa Muktar; Writing—original draft, Bappa Muktar; Writing—review & editing, Vincent Fono and Adama Nouboukpo. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: The data that support the findings of this study are available from the Corresponding Author, B.M., upon reasonable request.

Ethics Approval: Not applicable.

Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.

Abbreviations

The following abbreviations are used in this manuscript:

1D-CNN	One-Dimensional Convolutional Neural Network
AdaBoost	Adaptive Boosting
ANN	Artificial Neural Network
CatBoost	Categorical Boosting
DL	Deep Learning
DDoS	Distributed Denial of Service
DT	Decision Tree
Extra Trees	Extremely Randomized Trees
GB	Gradient Boosting
GRU	Gated Recurrent Unit
IDoS-CC	Intelligent DoS Attack Detection with Congestion Control
KNN	K-Nearest Neighbors
LR	Logistic Regression
LSTM	Long Short-Term Memory
ML	Machine Learning
NS-3	Network Simulator 3
OMNeT++	Objective Modular Network Testbed in C++
OSM	OpenStreetMap
RF	Random Forest
RSU	Roadside Unit
SD-VANET	Software-Defined Vehicular Ad Hoc Network
SDN	Software-Defined Networking
SHAP	SHapley Additive exPlanations
SMOTE	Synthetic Minority Over-sampling Technique
SNR	Signal-to-Noise Ratio
SVM	Support Vector Machine
SUMO	Simulation of Urban MObility
TLBO	Teaching and Learning-Based Optimization
UDP	User Datagram Protocol
VANET	Vehicular Ad Hoc Network
VoIP	Voice over IP
XGBoost	eXtreme Gradient Boosting

References

1. Dutta A, Samaniego Campoverde LM, Tropea M, De Rango F. A comprehensive review of recent developments in VANET for traffic, safety & remote monitoring applications. J Netw Syst Manag. 2024;32(4):73. doi:10.1007/s10922-024-09853-5.

2. Pawar V, Zade N, Vora D, Khairnar V, Oliveira A, Kotecha K, et al. Intelligent transportation system with 5G vehicle-to-everything (V2X): architectures, vehicular use cases, emergency vehicles, current challenges, and future directions. IEEE Access. 2024;12(2):183937–60. doi:10.1109/ACCESS.2024.3506815.

3. Al-Mohtaseb A, Hanoon AQ, Samara G, Al Daoud E, Alidmat O, Batyha R, et al. A comprehensive review of VANET attacks: predictive models, vulnerability management, and defense selection. In: 25th International Arab Conference on Information Technology (ACIT); 2024 Dec 10–12; Zarqa, Jordan. Piscataway, NJ, USA: IEEE; 2024. p. 1–9.

4. Polat O, Oyucu S, Türkoğlu M, Polat H, Aksoz A, Yardımcı F. Hybrid AI-powered real-time DDoS detection and traffic monitoring for software-defined-based vehicular ad hoc networks: a new paradigm for securing intelligent transportation networks. Appl Sci. 2024;14(22):10501. doi:10.3390/app142210501.

5. Ababsa M, Ribouh S, Malki A, Khoukhi L. Deep multimodal learning for real-time DDoS attacks detection in internet of vehicles. arXiv:2501.15252. 2025.

6. Kaur B, Prashar D, Mrsic L, Almogren A, Rehman AU, Altameem A, et al. Enhancing the reliability and accuracy of wireless sensor networks using a deep learning and blockchain approach with DV-HOP algorithm for DDoS mitigation and node localization. EURASIP J Wirel Commun Netw. 2025;2025(1):46. doi:10.1186/s13638-025-02465-w.

7. Vamshi Krishna K, Ganesh Reddy K. Classification of distributed denial of service attacks in VANET: a survey. Wirel Pers Commun. 2023;132(2):933–64. doi:10.1007/s11277-023-10643-6.

8. Setia H, Chhabra A, Singh SK, Kumar S, Sharma S, Arya V, Gupta BB, Wu J. Securing the road ahead: machine learning-driven DDoS attack detection in VANET cloud environments. Cyber Secur Applicat. 2024;2(1):100037. doi:10.1016/j.csa.2024.100037.

9. Polat H, Turkoglu M, Polat O. Deep network approach with stacked sparse autoencoders in detection of DDoS attacks on SDN-based VANET. IET Commun. 2020;14(22):4089–100. doi:10.1049/iet-com.2020.0477.

10. Gopi R, Mathapati M, Prasad B, Ahmad S, Al-Wesabi NF, Abdullah Alohali M. Intelligent DoS attack detection with congestion control technique for VANETs. Comput Mater Contin. 2022;72(1):141–56. doi:10.32604/cmc.2022.023306.

11. Kadam N, Sekhar KR. Machine learning approach of hybrid KSVN algorithm to detect DDoS attack in VANET. Int J Adv Comput Sci Appl. 2021;12(7):82. doi:10.14569/IJACSA.2021.0120782.

12. Alkadiri N, Ilyas M. Machine learning-based architecture for DDoS detection in VANETs system. In: 2022 International Conference on Artificial Intelligence of Things (ICAIoT); 2022 Dec 29–30; Istanbul, Turkey. Piscataway, NJ, USA: IEEE; 2022. p. 1–7.

13. Rashid K, Saeed Y, Ali A, Jamil F, Alkanhel R, Muthanna A. An adaptive real-time malicious node detection framework using machine learning in vehicular ad-hoc networks (VANETs). Sensors. 2023;23(5):2594. doi:10.3390/s23052594.

14. Oluchi Anyanwu G, Nwakanma CI, Lee J-M, Kim D-S. Optimization of RBF-SVM kernel using grid search algorithm for DDoS attack detection in SDN-based VANET. IEEE Internet Things J. 2022;10(10):8477–90. doi:10.1109/JIOT.2022.3199712.

15. Marwah GPK, Jain A, Malik PK, Singh M, Tanwar S, Safirescu CO, et al. An improved machine learning model with hybrid technique in VANET for robust communication. Mathematics. 2022;10(21):4030. doi:10.3390/math10214030.

16. Adhikary K, Bhushan S, Kumar S, Dutta K. Hybrid algorithm to detect DDoS attacks in VANETs. Wirel Pers Commun. 2020;114(4):3613–34. doi:10.1007/s11277-020-07549-y.

17. Tariq U. Optimized feature selection for DDoS attack recognition and mitigation in SD-VANETs. World Elect Veh J. 2024;15(9):395. doi:10.3390/wevj15090395.

18. Lekshmi V, Pramila RSuji, Tibbie Pon Symon VA. Defense mechanisms for vehicular networks: deep learning approaches for detecting DDoS attacks. Int J Adv Comput Sci Appl. 2024;15(7):65. doi:10.14569/IJACSA.2024.0150765.

19. Haydari A, Yilmaz Y. RSU-based online intrusion detection and mitigation for VANET. Sensors. 2022;22(19):7612. doi:10.3390/s22197612.

20. Gu X, Wu Q, Fan P, Cheng N, Chen W, Letaief KB. DRL-based federated self-supervised learning for task offloading and resource allocation in ISAC-enabled vehicle edge computing. Digit Commun Netw. 2024. doi:10.1016/j.dcan.2024.12.009.

21. Riley GF, Henderson TR. The ns-3 network simulator. In: Wehrle K, Güneş M, Gross J, editors. Modeling and tools for network simulation. Berlin/Heidelberg, Germany: Springer; 2010. p. 15–34. doi:10.1007/978-3-642-12331-3_2.

22. Behrisch M, Bieker L, Erdmann J, Krajzewicz D. SUMO–simulation of urban mobility: an overview. In: Proceedings of the SIMUL 2011, The Third International Conference on Advances in System Simulation; 2011 Oct 23–28; Barcelona, Spain. Red Hook, NY, USA: ThinkMind; 2011.

23. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57. doi:10.1613/jair.953.

Cite This Article

APA Style

Muktar, B., Fono, V., Nouboukpo, A. (2025). Machine Learning-Based Detection of DDoS Attacks in VANETs for Emergency Vehicle Communication. Computers, Materials & Continua, 85(3), 4705–4727. https://doi.org/10.32604/cmc.2025.067733

Vancouver Style

Muktar B, Fono V, Nouboukpo A. Machine Learning-Based Detection of DDoS Attacks in VANETs for Emergency Vehicle Communication. Comput Mater Contin. 2025;85(3):4705–4727. https://doi.org/10.32604/cmc.2025.067733

IEEE Style

B. Muktar, V. Fono, and A. Nouboukpo, “Machine Learning-Based Detection of DDoS Attacks in VANETs for Emergency Vehicle Communication,” Comput. Mater. Contin., vol. 85, no. 3, pp. 4705–4727, 2025. https://doi.org/10.32604/cmc.2025.067733


Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
