Open Access
ARTICLE
Optimizing IoT-Driven Smart Cities with the Dynamic Leader Sibha Algorithm: A Novel Approach to Feature Selection and Hyperparameter Tuning
1 Information Sciences Department, College of Life Sciences, Kuwait University, Kuwait, Kuwait
2 Faculty of Artificial Intelligence, Delta University for Science and Technology, Mansoura, Egypt
3 College of Engineering, University of Bahrain, Sakhir, Bahrain
4 Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
5 Department of Communications and Electronics, Delta Higher Institute of Engineering and Technology, Mansoura, Egypt
6 Jadara Research Center, Jadara University, Irbid, Jordan
* Corresponding Author: Marwa M. Eid. Email:
(This article belongs to the Special Issue: Innovative Computational Models for Smart Cities)
Computer Modeling in Engineering & Sciences 2026, 147(1), 30 https://doi.org/10.32604/cmes.2026.079827
Received 29 January 2026; Accepted 16 March 2026; Issue published 27 April 2026
Abstract
The rapid growth of Internet of Things (IoT) technologies has transformed modern urban environments into complex smart cities, generating vast amounts of high-dimensional, heterogeneous data. Effectively analyzing this data is crucial for optimizing urban infrastructure, enhancing quality of life, and supporting sustainable development. However, smart city data presents significant challenges, including non-linear dependencies, noisy signals, and high dimensionality. To address these challenges, this study proposes the Dynamic Leader Sibha Algorithm (DLSA), a novel metaheuristic optimization technique inspired by the structured counting dynamics of the Sibha. The DLSA was applied to the Smart Cities Index dataset, leveraging copula functions to model complex, multivariate dependencies and enhance predictive accuracy. The baseline machine learning (ML) evaluation revealed that the ExtraTreesRegressor achieved the lowest mean squared error (MSE) of 0.007462409, highlighting its superior initial performance. Following feature selection using the binary Dynamic Leader Sibha Algorithm (bSiba), the average error was reduced to 0.373245769, significantly improving data quality and model efficiency. Subsequent ML evaluation after feature selection further reduced the MSE of the ExtraTreesRegressor to 0.00151927, reflecting the effectiveness of dimensionality reduction. Finally, hyperparameter optimization using the DLSA achieved a remarkable MSE of
Keywords
The concept of smart cities has emerged as a transformative approach to urban development, driven by rapid technological advancements and the increasing interconnectedness of modern societies [1]. At its core, a smart city leverages the power of the Internet of Things (IoT), big data, and advanced analytics to enhance urban living, optimize resource management, and improve the quality of life for residents. As global urban populations continue to grow, cities face mounting challenges, including traffic congestion, environmental degradation [2], resource scarcity, and infrastructure stress. These challenges call for innovative, data-driven interventions that can balance economic growth, environmental protection, and social well-being. In this regard, smart city technologies offer a promising path toward more efficient, robust, and citizen-centric urban environments [3].
The Smart Cities Index dataset used in this research constitutes an effort to quantify, benchmark, and compare the performance of cities along several crucial dimensions [4], including Smart Mobility, Smart Environment, Smart Government, Smart Economy, Smart People, and Smart Living. The dataset contains diverse data obtained from IoT devices, government reports, and public infrastructure systems [5], giving an overview of urban performance. Its indices capture the multi-dimensional nature of smart cities, covering transportation efficiency, environmental sustainability [6], governance transparency, economic vitality, educational quality, and overall quality of life. The SmartCity_Index attribute is an aggregate metric that integrates these disparate elements into one all-encompassing value, while the SmartCity_Index_relative_Edmonton attribute expressly measures the performance of Edmonton, Canada, against worldwide smart city leaders [7].
Recent research has increasingly emphasized the importance of privacy-aware and interpretable feature selection mechanisms in smart city environments. For instance, Farooq et al. [8] proposed an interpretable federated learning framework for cyber intrusion detection in smart cities, integrating privacy-preserving feature selection with explainable AI techniques such as SHAP and LIME. Their study highlights a critical challenge in modern IoT-driven infrastructures: while high-dimensional data improves detection capabilities, centralized feature selection may compromise privacy and scalability in decentralized environments. By enabling distributed nodes to independently select relevant features before aggregating them into a global model, their approach achieved high detection accuracy while preserving interpretability and data confidentiality. These findings further reinforce the necessity of scalable and relevance-driven feature optimization strategies for complex smart city systems.
This data is critical for assessing smart city success, supporting urban management, and improving citizens’ quality of life [9]. However, working with such complex datasets involves several serious challenges. Data heterogeneity is the first and foremost [10]. Smart city information is inherently diverse, comprising numeric, categorical, spatial, and temporal data. This heterogeneity arises from a wide array of data sources, such as IoT sensors, economic reports, environmental monitoring systems, and administrative databases. Each data type requires its own processing techniques, which makes integrating and analyzing these data streams challenging [11].
High dimensionality is another serious challenge. The Smart Cities Index dataset contains a large number of attributes that describe various aspects of urban life, from economic indicators to environmental ones [12–14]. This high dimensionality can overwhelm traditional analytical methods and may result in over-fitting, excessive model complexity, and poor generalization. Effective dimensionality reduction through feature selection is therefore a necessity for reliable predictive modeling.
Recent advances in intelligent transportation systems have increasingly emphasized the importance of hybrid artificial intelligence models for improving traffic congestion prediction in smart cities. In this context, Alanazi et al. [15] proposed a Graph Neural Network-Assisted Lion Swarm Optimization framework that integrates graph-based learning with swarm intelligence to capture both spatial road connectivity and complex traffic dependencies more effectively. Their model combines feature extraction, graph-based relationship modeling, and the exploration–exploitation capabilities of Lion Swarm Optimization to enhance prediction performance in dynamic urban environments. The reported results demonstrated high predictive accuracy and reduced deviation error, highlighting the potential of combining deep learning and metaheuristic optimization for real-time congestion forecasting and adaptive traffic management in intelligent urban mobility systems.
Problems with data quality compound the analysis. Real-world data is usually incomplete, noisy, or inconsistent, owing to the very nature of activities in urban environments. For example, IoT sensors can produce a massive stream of data, yet the collected data may be unreliable because of hardware failures, communication failures, or environmental disturbances. Handling missing or inconsistent data carefully is essential to avoid introducing bias or reducing model accuracy. In addition, the field of smart cities entails time-varying data streams that require flexible models able to adjust to shifting data distributions without drastic loss of performance [16].
Smart city indicators’ interdependency adds to the complexity. For instance, the economic vibrancy of a city could impact its governance systems and capacity to build sustainable infrastructure [17]. Similarly, an efficient public transportation system decreases traffic jams and affects the quality of air and life in general [18]. It is necessary to capture these non-linear, in many cases, hierarchical connections for precise modeling. Still, it requires smart analytical techniques, such as copulas, that can model complex dependence networks between several variables [19].
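The copula idea referenced above separates each variable's marginal distribution from the dependence structure between variables. The following is a minimal sketch of fitting a Gaussian copula parameter to two synthetic, non-linearly related stand-in indicators; it is an illustration only, not the paper's exact dependency-modeling procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for two dependent urban indicators
# (e.g., a mobility score and an environment score), non-linearly related.
mobility = rng.normal(size=200)
environment = np.tanh(mobility) + 0.3 * rng.normal(size=200)

def to_uniform(x):
    """Probability-integral transform via empirical ranks."""
    return stats.rankdata(x) / (len(x) + 1)   # strictly inside (0, 1)

# Map both marginals to uniforms, then to standard normal scores.
u = np.column_stack([to_uniform(mobility), to_uniform(environment)])
z = stats.norm.ppf(u)

# The Gaussian copula parameter is the correlation of the normal scores;
# it captures the dependence structure independently of the marginals.
rho = np.corrcoef(z, rowvar=False)[0, 1]
print(f"estimated copula correlation: {rho:.3f}")
```

Because the rank transform discards the marginals, the estimated parameter is insensitive to monotone rescaling of either indicator, which is precisely what makes copulas attractive for heterogeneous urban data.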
Following these challenges, the main aim of this study is to develop an all-inclusive machine learning framework that can provide sufficient analysis of the Smart Cities Index dataset [20]. The framework strives to solve problems of high dimensionality, data heterogeneity, and feature interdependencies using an array of advanced feature selection and hyperparameter tuning methods. More specifically, the study aims to preprocess the Smart Cities Index dataset so as to tackle missing data, normalize features, and prepare the data for model training. It then uses state-of-the-art metaheuristic algorithms for feature selection to choose the most relevant attributes for predictive modeling, simplifying the model and adding interpretability.
Moreover, the current study involves optimization of the hyperparameters of several machine learning models, including ExtraTreesRegressor, DecisionTreeRegressor, KNeighborsRegressor, XGBoost, CatBoost, and Gradient Boosting, to improve prediction accuracy. The regression performance of these models is assessed both before and after feature selection using a wide range of regression metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Bias Error (MBE), and the Coefficient of Determination (R²).
Moreover, the research intends to perform a comparative analysis to measure Edmonton’s position in the worldwide smart city setting by comparing it to the SmartCity_Index_relative_Edmonton attribute. Such analysis will benefit urban planners, policymakers, and technology developers in constructing more sustainable, effective and responsive cities. Through the solution of such objectives, the study hopes to contribute to the vast knowledge base on smart city analytics, bringing practical solutions to the complex problems facing modern centres of urbanization.
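The regression metrics listed above can be computed with standard tooling; the snippet below is a minimal illustration on made-up predictions (MBE is implemented directly, since scikit-learn does not provide it).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Toy predictions standing in for a smart-city index regressor's output.
y_true = np.array([0.62, 0.71, 0.55, 0.80, 0.68])
y_pred = np.array([0.60, 0.74, 0.53, 0.78, 0.70])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_pred)
mbe = float(np.mean(y_pred - y_true))  # sign reveals systematic over/under-prediction
r2 = r2_score(y_true, y_pred)
print(mse, rmse, mae, mbe, r2)
```

Unlike MAE, the MBE can be negative: a negative value indicates systematic under-prediction, which is why it complements the magnitude-only error measures.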
The key contributions of this work include:
• Integration of DLSA for Smart City Data Analysis: Expands the application of the Dynamic Leader Sibha Algorithm (DLSA) to smart city data, leveraging its hierarchical leader-follower framework for effective feature selection and hyperparameter optimization in complex, high-dimensional IoT datasets.
• Enhanced Dependency Modeling Using Copulas: Uses copula functions to capture the intricate, non-linear dependencies in the Smart Cities Index dataset, making the modeled interrelationships among urban indicators more accurate and generally increasing predictive accuracy.
• Comprehensive Feature Selection and Dimensionality Reduction: The use of the binary Dynamic Leader Sibha Algorithm (bSiba) as a feature-selection technique drastically reduces data dimensionality while maintaining important predictive features, yielding more efficient and interpretable machine learning models.
• Optimized Machine Learning Pipeline: Builds a fully optimized pipeline for machine learning that incorporates feature selection and hyperparameter tuning to significantly improve predictive accuracy and computational performance.
• Extensive Comparative Analysis: Conducts a thorough benchmarking study, comparing the proposed methodology with several state-of-the-art optimization algorithms, including Harris Hawks Optimization (HHO), Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), Biogeography-Based Optimization (BBO), Multiverse Optimization (MVO), Satin Bowerbird Optimizer (SBO), Firefly Algorithm (FA), Gravitational Search Algorithm (GSA), and Simulated Annealing Optimization (SAO), demonstrating the superior performance of the DLSA framework.
• Real-World Impact and Scalability: Demonstrates the scalability and practical applicability of the DLSA framework for smart city data analysis, providing valuable insights for urban planners, policymakers, and technology developers seeking to optimize IoT-driven urban systems.
• Foundation for Future Research: Establishes a robust foundation for future research in smart city optimization, encouraging the integration of advanced metaheuristic techniques for more accurate, efficient, and scalable urban data analysis.
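To make the wrapper-based feature-selection contribution concrete, the sketch below shows the kind of fitness function a binary metaheuristic such as bSiba typically optimizes: cross-validated error of a model trained on the masked features, blended with a subset-size penalty. The synthetic data, weighting scheme, and alpha value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(102, 8))                 # 102 cities, 8 candidate indicators
y = X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=102)  # only features 0 and 2 matter

def fitness(mask, alpha=0.99):
    """Wrapper fitness: weighted blend of cross-validated error and subset size.
    Lower is better; empty subsets are invalid."""
    if not mask.any():
        return np.inf
    model = ExtraTreesRegressor(n_estimators=50, random_state=0)
    mse = -cross_val_score(model, X[:, mask], y, cv=3,
                           scoring="neg_mean_squared_error").mean()
    return alpha * mse + (1 - alpha) * mask.mean()

informative = np.array([1, 0, 1, 0, 0, 0, 0, 0], dtype=bool)
all_features = np.ones(8, dtype=bool)
print(fitness(informative), fitness(all_features))
```

A binary optimizer then searches over such masks, flipping bits of candidate solutions and keeping those that lower this fitness; the size penalty gently biases the search toward compact feature subsets.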
Beyond methodological advancement, the practical implications of applying DLSA in smart city contexts are significant. Urban planners and policymakers increasingly rely on data-driven models to allocate resources, prioritize infrastructure investments, optimize energy consumption, and improve mobility systems. However, these decisions are often hindered by high-dimensional datasets, redundant indicators, and suboptimal model configurations that reduce reliability. By systematically selecting the most informative urban indicators and optimizing model parameters through DLSA, the proposed framework enhances predictive stability and interpretability, enabling more confident evidence-based planning. In practical terms, improved predictive accuracy translates into better forecasting of traffic congestion, more efficient energy distribution strategies, enhanced risk assessment for infrastructure planning, and more informed sustainability benchmarking. Therefore, the contribution of DLSA extends beyond theoretical optimization performance, offering a scalable decision-support mechanism that can assist policymakers in managing complex, interconnected urban systems.
The remainder of this paper is organized as follows. Section 2 presents a comprehensive literature review, providing context for the development of the Dynamic Leader Sibha Algorithm (DLSA) and its application to smart city data analysis. Section 3 details the Smart Cities Index dataset, including data preprocessing steps, exploratory data analysis, and the use of copula functions for dependency modeling. It also introduces the machine learning models and metaheuristic algorithms used for feature selection and hyperparameter optimization. Section 4 presents the empirical results, including baseline machine learning performance, feature selection outcomes, and post-optimization improvements. Section 5 discusses the findings in the context of smart city data analytics, highlighting the implications for urban planning and technology development. Finally, Section 6 concludes the paper with a summary of key contributions, practical implications, and potential directions for future research.
In the era of rapid urbanization, smart city solutions have emerged to optimize resource management, sustainability, and quality of life. To address problems across smart city domains, methodologies such as deep learning [21], optimization techniques [22], and feature selection algorithms [23] have been explored. In addition, security [24–26], traffic management [27], and energy optimization [28,29] are important research issues in smart cities, while smart mobility solutions [30], cyber-physical security [31], and renewable energy integration [32] are gaining popularity. This literature review highlights recent efforts in smart city research, summarizing contributions on IoT integration, optimization, and security.
Beyond summarizing prior studies, a critical synthesis is necessary to distinguish methodological trends and persistent limitations across the literature. In particular, many smart-city optimization pipelines rely on swarm-based or physics-inspired metaheuristics whose search dynamics are governed by leader attraction and stochastic coefficients. While these mechanisms are effective in many settings, they may exhibit premature convergence, stagnation in local optima, and sensitivity to parameterization when faced with high-dimensional, heterogeneous IoT feature spaces and complex inter-variable dependencies. Accordingly, the central research challenge is not only to apply metaheuristics to smart-city tasks, but to develop optimization mechanisms that can better regulate exploration–exploitation behavior and yield stable convergence under real-world data complexity.
There have been quite a few studies on particular aspects of smart cities, such as time series forecasting [21], sustainability indicators [23], energy efficiency [28,29] and security concerns [24–26].
Predictive analytics is essential across many smart city domains, which is why time series forecasting is imperative for smart cities. A review of deep learning-based time series forecasting methods for smart cities using IoT multivariate datasets was presented in [21]. Similarly, the study in [23] reveals the significance of feature selection in smart sustainable cities (SSC), using multi-objective evolutionary algorithms to increase predictive accuracy while reducing computational overhead. Another work [33] analyzed data mining methods to tackle the problem of very large datasets in urban settings.
Optimization techniques have been widely applied to smart cities. A comprehensive review was presented in [22] of optimization methodologies for IoT-based smart city applications using ant colony optimization, genetic algorithms, and particle swarm optimization. In the same period, reference [34] presented a stochastic optimization model for municipal solid waste management (MSWM), minimizing transportation costs while accounting for recycling revenue using metaheuristic algorithms. Furthermore, reference [35] examined fog computing-based resource allocation methods that increase efficiency in smart cities. However, despite the breadth of optimization applications in smart cities, much of the existing work remains focused on demonstrating applicability rather than critically examining optimizer limitations under high-dimensional learning settings. In several reported pipelines, exploration and exploitation are blended within a single update rule, and control parameters are often fixed or monotonically scheduled, which can reduce adaptability during different search stages. These patterns motivate the investigation of more structured update mechanisms that explicitly separate diversification, intensification, and population refinement in order to improve robustness and reduce stagnation effects in complex landscapes.
Intrusion detection and surveillance have been studied extensively, addressing a critical security challenge in smart cities. An improved metaheuristic and transfer learning-based intelligent crowd density classification model was proposed in [36]. Reference [28] developed a novel energy theft detection model utilizing a generative adversarial network and Namib beetle optimization for feature selection, attaining greater accuracy than previously developed methods. The main IDS models developed in [24–26] were introduced to improve intrusion detection performance and strengthen the cyber security of smart cities. In addition, reference [31] presented an ensemble model methodology for SCADA systems that allows real-time security monitoring.
Traffic congestion and energy management are central concerns in smart cities. A traffic management system based on deep reinforcement learning with graph attention networks, controlling traffic flow through real-time data processing, was introduced in [27]. Furthermore, reference [29] discussed smart grid extensions, incorporating distributed energy management systems with artificial intelligence to improve efficiency. Moreover, reference [32] considered renewable energy sources such as tidal energy for urban sustainability.
Fog computing has been adopted to meet the growing need for efficient data processing in smart cities. In [35], an improved resource allocation model based on the crow search algorithm was proposed for fog computing environments, scheduling tasks and improving security. In particular, nature-inspired solutions were introduced in [30] for handling large-scale data streams in smart mobility applications.
In addition, recent research has explored the integration of Digital Twin technology with AI-driven optimization to enhance smart city services. Ullah et al. [37] investigated how digital twin frameworks can be leveraged to improve urban excellence through real-time monitoring, adaptive resource management, and intelligent decision-making. Their study highlights the importance of combining IoT data streams with simulation-based models to support city planners in optimizing infrastructure performance, energy efficiency, and service delivery. By emphasizing adaptability and interconnected system modeling, their work reinforces the necessity of advanced optimization mechanisms capable of handling dynamic, multi-dimensional urban environments.
Table 1 summarizes representative contributions across forecasting, feature selection, optimization, and security analytics in smart-city contexts. A key observation from this synthesis is that many studies emphasize domain performance (e.g., traffic flow control, energy monitoring, or intrusion detection) while providing limited critical discussion of the optimizer-level mechanisms that govern convergence stability, search diversity, and scalability. As a result, the literature still lacks a clear methodological consensus on how to design population-based optimizers that remain effective when simultaneously addressing feature redundancy, hyperparameter sensitivity, and dependency-rich urban indicators.
Research Gap and Our Contribution
A further gap concerns critical positioning: although a wide range of swarm-based and physics-inspired algorithms has been employed in smart-city applications, the literature rarely articulates mechanistic criteria for what constitutes a substantive algorithmic advance beyond incremental variations. In this context, meaningful advancement should be argued through (i) clearly stated limitations in prevailing update mechanisms, (ii) explicit design principles that address these limitations, and (iii) empirical evidence under standardized evaluation budgets. This motivates the need to present optimizer contributions not only as performance gains, but also as structured changes to search dynamics that improve stability and robustness.
Although there have been enormous achievements in smart city optimization, several critical research gaps remain. Much work has been done on time series forecasting [21], feature selection methods [23], and resource management [28,29]. Many of these methods use traditional machine learning or optimization algorithms such as genetic algorithms and particle swarm optimization. However, such approaches commonly struggle with high-dimensional, heterogeneous IoT data and dependencies that are complex, non-linear, and constantly updated in real time. In addition, a great number of studies have focused on domain-specific applications, e.g., traffic management [27], energy efficiency [28,29], or cybersecurity [24–26], leaving the general challenges of multi-objective, cross-domain smart city optimization largely unaddressed.
In addition, although attempts have been made to employ state-of-the-art data processing methods, such as deep learning for time series analysis [21] and hybrid models for energy theft detection [28], few studies have successfully integrated feature selection and hyperparameter tuning in a unified scheme for analyzing smart city data. This is especially important given the crucial need to reduce data dimensionality, improve model interpretability, and increase computational efficiency in large-scale urban environments.
To fill these gaps, this paper extends the Dynamic Leader Sibha Algorithm (DLSA) to smart city optimization, presenting a complete, end-to-end pipeline for the analysis of IoT data. The proposed method uniquely integrates state-of-the-art feature selection, hyperparameter optimization, and complex dependency modeling through copulas, providing a strong replacement for standard metaheuristic methods. By integrating these components, the framework not only cuts computational redundancy but also considerably improves predictive accuracy, proving to be a highly scalable and flexible solution for real-time smart city applications. This work thus establishes a new benchmark for data-driven urban optimization, closing the gap between algorithmic theory and practical implementation.
From this perspective, the contribution of DLSA can be framed in terms of search-mechanism design rather than solely application-driven deployment. In contrast to common leader-driven swarms that typically use a single unified update rule with stochastic coefficients, DLSA is positioned as a structured optimizer whose search behavior is explicitly organized into distinct stages and regulated through dynamic leader refinement. This structured design targets the frequently reported issues of stagnation and premature convergence in high-dimensional spaces by maintaining search diversity while enabling intensified refinement around promising regions, thereby providing a clearer methodological rationale for why DLSA can improve stability and convergence behavior under complex smart-city learning settings.
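The staged search design described above can be illustrated schematically. The sketch below is not the actual DLSA, whose update rules are defined later in the paper; it is a toy population optimizer with the three separated phases the text argues for, namely diversification, leader-guided intensification, and leader refinement, applied to a sphere objective.

```python
import numpy as np

def staged_leader_search(obj, dim=5, pop=20, iters=100, seed=1):
    """Schematic staged optimizer with three separated phases per iteration:
    diversification, leader-guided intensification, and leader refinement."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, size=(pop, dim))
    leader = X[np.apply_along_axis(obj, 1, X).argmin()].copy()
    for t in range(iters):
        w = 1 - t / iters                              # annealed exploration weight
        # Phase 1: diversification around each agent.
        X = X + w * rng.normal(size=X.shape)
        # Phase 2: intensification, pulling agents toward the current leader.
        X = X + (1 - w) * rng.random((pop, 1)) * (leader - X)
        fit = np.apply_along_axis(obj, 1, X)
        # Phase 3: refine the leader with a small local perturbation.
        cand = leader + 0.1 * w * rng.normal(size=dim)
        if obj(cand) < obj(leader):
            leader = cand
        if fit.min() < obj(leader):
            leader = X[fit.argmin()].copy()
    return leader, obj(leader)

best, val = staged_leader_search(lambda x: float(np.sum(x ** 2)))
print(f"best objective: {val:.4f}")
```

Keeping the phases distinct means the exploration noise and the attraction toward the leader can be scheduled independently, which is the mechanism the text credits with reducing stagnation and premature convergence.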
The dataset used in this research is the Smart Cities Index dataset [38], which covers an extensive set of indicators on the performance of smart cities globally. In its current form, the dataset contains 102 rows, where each row corresponds to a unique city observation, enabling a compact yet diverse cross-sectional representation of smart city performance. This dataset is essential in assessing urban development’s technological, social, economic, and environmental aspects. It leverages data gathered from various Internet of Things (IoT) devices, administrative records, and public infrastructure systems to create a fine-grained, multi-dimensional view of smart city operations. The cities included in the dataset span 36 countries, reflecting broad geographic diversity and heterogeneity in socio-economic and infrastructural conditions.
The dataset is cross-sectional in nature and represents a single snapshot of city performance rather than a longitudinal time series; consequently, the analysis focuses on inter-city variation at the observed reference point rather than temporal trends.
The dataset contains a number of key attributes representing various aspects of urban performance:
• City—The name of the smart city, which is under analysis.
• Country—The country where the smart city is found.
• Smart_Mobility—An index constructed from evaluations of the city-wide public transportation system, information and communication technology (ICT) infrastructure, and overall accessibility.
• Smart_Environment—An index reflecting the city’s commitment to environmental sustainability through activities such as pollution monitoring and energy management.
• Smart_Government—An index that assesses the degree of open governance practices, open data initiatives, and citizen involvement in decision-making.
• Smart_Economy—An index measuring economic vigor, productivity, and support for entrepreneurship and innovation.
• Smart_People—An index measuring social and cultural diversity, education systems, and facilitating structures that add to the intellectual resources of the city.
• Smart_Living—An index that measures quality of life in terms of healthcare services, social security, housing quality, and overall livability.
• SmartCity_Index—An aggregate of all the smart city indices into a single metric expressing the overall smartness of a given city.
• SmartCity_Index_Relative_Edmonton—A standardized score comparing Edmonton, Canada, to other smart cities across the globe, providing a benchmark for Edmonton’s relative performance.
For modeling and optimization, the non-numeric identifiers (City and Country) were treated as descriptive metadata, while the effective predictive feature space was derived from the six principal smart city dimensions (Smart_Mobility, Smart_Environment, Smart_Government, Smart_Economy, Smart_People, and Smart_Living), with SmartCity_Index serving as an aggregated reference metric.
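Assembling the predictive feature space under this schema might look as follows; the two rows and their values are fabricated for illustration, following the column names described above.

```python
import pandas as pd

# Two fabricated rows following the schema described above.
df = pd.DataFrame({
    "City": ["Oslo", "Edmonton"],
    "Country": ["Norway", "Canada"],
    "Smart_Mobility": [0.78, 0.64],
    "Smart_Environment": [0.85, 0.70],
    "Smart_Government": [0.80, 0.72],
    "Smart_Economy": [0.74, 0.69],
    "Smart_People": [0.81, 0.73],
    "Smart_Living": [0.79, 0.71],
    "SmartCity_Index": [0.80, 0.70],
})

DIMENSIONS = ["Smart_Mobility", "Smart_Environment", "Smart_Government",
              "Smart_Economy", "Smart_People", "Smart_Living"]

X = df[DIMENSIONS]              # effective predictive feature space
y = df["SmartCity_Index"]       # aggregated reference target
meta = df[["City", "Country"]]  # descriptive metadata, excluded from modeling
```

Separating `meta` from `X` at this stage keeps the identifiers available for reporting while guaranteeing they never leak into model training.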
Of particular importance are the SmartCity_Index and SmartCity_Index_relative_Edmonton attributes, because they combine numerous aspects of urban functionality into one unified evaluation system for assessing how well a city fits the broad notion of a smart city. The dataset is developed through the use of globally recognized indices formalized for measuring the performance of smart city initiatives, covering a broad scope of urban metrics for effective urban planning and management. In addition, the dataset provides an appropriate benchmarking context by including cities from multiple continents and a wide range of development profiles, which strengthens the external validity of comparative performance analysis.
The variety of attributes in the dataset both opens opportunities and poses challenges for data analysis. On the one hand, it supplies an immense resource for understanding the complex interactions of urban systems. On the other hand, the data is highly dimensional and heterogeneous and therefore requires careful preprocessing and feature selection to obtain accurate and trustworthy model predictions. Preliminary descriptive inspection indicates that the component indicators exhibit moderate dispersion and occasional skewness across cities, which motivates careful preprocessing and supports the use of dependency-aware modeling to preserve inter-feature relationships during downstream optimization.
Machine learning models depend directly on effective preprocessing of smart city data before they can be applied; the subsequent models are only as good as the preprocessing results. The preprocessing pipeline for the Smart Cities Index dataset comprised several important steps, all aimed at solving particular issues of this complex, high-dimensional dataset. To prevent any potential data leakage, the dataset was first partitioned into training, validation, and testing subsets according to the predefined experimental protocol, and all preprocessing parameters were estimated exclusively from the training subset before being applied unchanged to the validation and test subsets.
At first, missing values in the data were investigated; these may occur because of sensor failure, data transmission problems, or incomplete administrative records. Missing data not only reduces the amount of data available for training machine learning models but can also introduce bias if not adequately addressed. In this study, missing data were imputed in an appropriate context, so the resulting dataset remained representative of the underlying urban phenomena. For numerical attributes, mean or median imputation was used; for categorical attributes, mode imputation or, depending on the pattern of missingness, more sophisticated k-nearest neighbors (KNN) imputation was applied. Importantly, the imputation statistics (e.g., mean, median, or mode values) and any KNN-based imputation fitting were computed using only the training subset and then applied to the validation and test subsets without recalibration.
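A minimal sketch of this leakage-free imputation, assuming scikit-learn's SimpleImputer with a median strategy: the statistic is fitted on the training split only and reused unchanged on held-out data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# A toy numeric feature with missing entries; the split already exists.
X_train = np.array([[1.0], [np.nan], [3.0], [4.0]])
X_test = np.array([[np.nan], [2.0]])

imputer = SimpleImputer(strategy="median")
imputer.fit(X_train)                    # statistic estimated from training data only
X_train_f = imputer.transform(X_train)
X_test_f = imputer.transform(X_test)    # the same training median is reused here
print(X_test_f.ravel())
```

Note that the missing test value is filled with the training median (3.0), not with any statistic derived from the test split itself.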
Next, data normalization and scaling were used to standardize the range of the numerical features. Because the dataset mixes absolute measures (such as economic indicators) with relative measures (such as SmartCity_Index_relative_Edmonton), it was essential to put all features on an equal footing; this step limits the influence of outliers and prevents features with wider numerical ranges from dominating the models. Two standard strategies were considered: z-score normalization, which gives each feature zero mean and unit standard deviation, and Min–Max scaling. In the experiments reported in this study, Min–Max scaling to the range [0, 1] was used as the primary normalization strategy, and the corresponding scaling parameters were estimated from the training subset and applied unchanged to the validation and test subsets. This reduces the effects of differing measurement scales and makes patterns easier to recognize during model training.
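The train-only scaling discipline can be sketched as follows; the feature values are invented for illustration:

```python
# Sketch: Min-Max scaling to [0, 1] with parameters estimated from the
# training split only (feature values are invented for illustration).
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])
X_test = np.array([[25.0, 500.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_s = scaler.fit_transform(X_train)  # min/max learned from training data
X_test_s = scaler.transform(X_test)        # same parameters reused, no refit
```

Note that test values outside the training min/max would fall outside [0, 1]; that is expected behavior and does not call for refitting.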
Detection and removal of outliers were also an important part of preprocessing. Outliers can significantly distort the results of machine learning algorithms, reducing model accuracy and increasing prediction error. Two statistical methods, the interquartile range (IQR) method and z-score analysis, were used in this study to detect and eliminate extreme values from the dataset, further improving its quality. Outlier thresholds (e.g., IQR bounds and z-score cutoffs) were determined using the training subset only, and the same thresholds were then applied to the validation and test subsets to avoid incorporating evaluation-set information into preprocessing decisions.
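The IQR variant of this step can be sketched on toy one-dimensional data, with the bounds computed on the training split and reused unchanged:

```python
# Sketch: IQR outlier bounds computed on the training split and applied
# unchanged to the test split (1-D toy data).
import numpy as np

train = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 50.0])
test = np.array([2.0, 60.0, 3.0])

q1, q3 = np.percentile(train, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # thresholds from train only

train_clean = train[(train >= lower) & (train <= upper)]
test_keep = (test >= lower) & (test <= upper)  # same bounds, no recomputation
```

The 1.5 × IQR multiplier is the conventional Tukey fence; stricter or looser fences can be used depending on how aggressive the filtering should be.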
Finally, feature encoding was carried out to transform categorical attributes into a numerical representation suitable for machine learning algorithms. Because the dataset contains categorical information such as city and country names, this step was critical for converting qualitative information into a form the models can use efficiently. One-hot encoding and label encoding were applied, depending on the nature of the categorical variables and the requirements of the machine learning algorithms used in the study. For consistency, the encoding vocabulary (e.g., one-hot categories and label mappings) was learned from the training subset and then applied to the validation and test subsets using the same mappings. To effectively analyze smart city data and optimize predictive modeling, a comprehensive workflow is required. Fig. 1 outlines the complete process, starting with the Smart Cities Dataset, which undergoes data preprocessing to handle missing values, normalization, feature scaling, outlier detection, and feature encoding. The processed data are then split into training (70%), validation (15%), and testing (15%) sets to ensure robust model evaluation. The core of the approach lies in the bSiba feature selection mechanism, which interacts with a wide array of baseline models, including ExtraTreesRegressor, DecisionTreeRegressor, KNeighborsRegressor, XGBoost, CatBoost, and Gradient Boosting, to identify the most relevant features. The performance evaluation component provides critical feedback for refining the models and improving overall predictive accuracy, closing the loop for continuous learning and optimization.

Figure 1: Smart cities data processing and modeling framework.
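The train-only encoding step described above can be sketched as follows; the city names are illustrative, and `handle_unknown="ignore"` shows one common way to cope with categories that appear only outside the training split:

```python
# Sketch: one-hot vocabulary learned from the training split only;
# unseen test categories map to all-zero rows (city names illustrative).
import numpy as np
from sklearn.preprocessing import OneHotEncoder

cities_train = np.array([["Oslo"], ["Zurich"], ["Oslo"], ["Singapore"]])
cities_test = np.array([["Zurich"], ["Lagos"]])  # "Lagos" unseen in training

enc = OneHotEncoder(handle_unknown="ignore")
X_train = enc.fit_transform(cities_train).toarray()  # vocabulary from train
X_test = enc.transform(cities_test).toarray()        # same mapping reused
```

Label encoding follows the same discipline: the mapping is built on the training subset and reused verbatim elsewhere.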
Taken together, these preprocessing steps converted the raw Smart Cities Index dataset into a high-quality dataset suitable for subsequent feature selection and machine learning analysis. This foundation was paramount for the later stages of model training and optimization, where the insights drawn from the data must be accurate and actionable.
Data analysis is integral to uncovering the fundamental structures and associations in the Smart Cities Index dataset. This phase entails exhaustive data exploration to discover significant trends, correlations, and structural features that can inform subsequent machine learning development and optimization. Given the dataset's dimensionality and complexity, successful data analysis must combine statistics, visualization, and domain-specific insights.
The comparative smart city performance of the selected cities is illustrated in Fig. 2, which presents six major dimensions: Smart Mobility, Smart Environment, Smart Government, Smart Economy, Smart People, and Smart Living. The Smart Mobility dimension reflects the relative development of transportation efficiency, accessibility, and intelligent mobility systems across the cities. The Smart Environment dimension highlights differences in environmental sustainability, ecological management, and resource-conscious urban practices. The Smart Government dimension captures the extent to which digital governance, administrative effectiveness, and public service innovation are embedded in urban management frameworks. The Smart Economy dimension represents variation in innovation capacity, entrepreneurship, and economic competitiveness among the cities. The Smart People dimension emphasizes the role of education, digital literacy, creativity, and human capital in shaping smart urban development. Finally, the Smart Living dimension reflects quality-of-life conditions, including safety, healthcare, housing, and general urban well-being. Taken together, these dimensions provide a multidimensional perspective on the strengths and weaknesses of each city in the broader context of smart city assessment.

Figure 2: Smart city index scores by dimension for the selected cities.
Fig. 3 presents the distribution of smart total scores across the evaluated observations, together with three descriptive statistical indicators: the mean, the median, and the range corresponding to one standard deviation around the mean. This visualization is useful for understanding the central tendency and dispersion of the data, as well as for identifying the overall shape of the score distribution. The histogram shows how the smart total scores are spread across different intervals, while the superimposed density curve provides a smoother representation of the distributional pattern. In addition, the vertical lines marking the mean and median enable a direct comparison between these two measures of central location, which is particularly important for assessing whether the distribution is approximately symmetric or slightly skewed. The shaded region representing one standard deviation around the mean summarizes the dispersion of the scores, indicating the interval within which a typical observation falls.

Figure 3: Distribution of smart total scores with mean, median, and standard deviation.
Understanding the relationship between the overall smart city index and the total smart scores is crucial for assessing the comprehensive performance of cities in their digital and sustainable development journey. Fig. 4 illustrates this relationship, providing insights into the correlation between these two critical metrics. The central scatter plot reveals a strong positive linear correlation, indicating that cities with higher smart index scores generally tend to have higher total scores. The marginal histograms further highlight the distribution of each variable, offering a clearer view of their individual spread.

Figure 4: Relationship between SmartCity_Index and Smart_Total.
In total, the data analysis phase allowed an in-depth understanding of the structure and dynamics of the Smart Cities Index dataset, which created a strong fundamental basis for the more advanced analyses performed through machine-learning methods. It was necessary to complete this stage to help identify the main drivers of smart city performance and to inform the feature selection and optimization steps at the heart of this work.
Precise modeling of intertwined, multivariate dependencies is integral to smart city data analysis because many interrelated factors condition overall performance. Traditional correlation measures such as the Pearson correlation coefficient frequently fail to describe the full spectrum of relationships present in high-dimensional datasets such as the Smart Cities Index. This is especially the case when dependencies are non-linear or asymmetric, or when the distributions of urban data are heavy-tailed, as is often typical of real-world data. To overcome the difficulties of modeling correlated multiple random variables, this study employs copula functions, a potent statistical tool for separating the marginal behavior of multiple random variables from their dependency structure.
Copulas offer a versatile structure for expressing the joint distribution of multivariate data, capturing sophisticated dependency structures that classical correlation-based methods may overlook. Formally, a copula is a multivariate cumulative distribution function (CDF) defined on the unit hypercube [0, 1]^d whose one-dimensional marginals are uniform on [0, 1].
By Sklar's theorem, when C is the copula function and the marginal distributions of the individual variables are F_1, …, F_d, the joint distribution can be written as F(x_1, …, x_d) = C(F_1(x_1), …, F_d(x_d)), so the marginal behavior and the dependency structure can be modeled separately.
This study considered two copula families: Gaussian copulas and t-copulas.
While classical correlation measures (e.g., Pearson or Spearman coefficients) quantify linear association, they are often insufficient for capturing the complex and asymmetric dependency structures that characterize smart city systems. Urban indicators such as mobility efficiency, energy consumption, environmental quality, and governance performance are rarely independent and frequently exhibit non-linear, threshold-based, or tail-dependent relationships. For example, traffic congestion may increase gradually with population density up to a certain threshold, after which small increases in demand may produce disproportionately large impacts. Similarly, environmental stress indicators may jointly escalate during extreme events, exhibiting strong upper-tail dependence that linear correlation cannot adequately describe. Copula functions enable explicit modeling of such joint behaviors by decoupling marginal distributions from dependency structures, thereby allowing flexible representation of non-linear co-movements and extreme-value interactions. In the context of smart city analytics, this capability enhances the realism of multivariate modeling and supports more accurate predictive relationships across interconnected urban subsystems.
Gaussian Copula
The Gaussian copula is one of the most widely used copulas because of its mathematical simplicity and its parameterization by a linear correlation matrix R. It is defined as

C_R(u_1, …, u_d) = Φ_R(Φ^{-1}(u_1), …, Φ^{-1}(u_d)),

where Φ_R is the joint CDF of a d-variate standard normal distribution with correlation matrix R and Φ^{-1} is the quantile function of the univariate standard normal distribution. That is, each uniform margin is mapped to the normal scale, and the dependence is imposed entirely through R. A well-known limitation of the Gaussian copula, however, is its lack of tail dependence, so it cannot represent the joint occurrence of extreme values.
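As an illustration of how a Gaussian copula couples uniform marginals, the following sketch draws correlated uniforms from a bivariate Gaussian copula; the correlation value of 0.8 is assumed purely for illustration:

```python
# Sketch: sampling from a bivariate Gaussian copula (correlation 0.8);
# the correlation value is an illustrative assumption.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
R = np.array([[1.0, 0.8], [0.8, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=R, size=5000)
u = norm.cdf(z)  # each column is Uniform(0, 1); dependence enters via R
```

Applying arbitrary marginal quantile functions to the columns of `u` then yields a joint sample with those marginals and Gaussian-copula dependence.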
t-Copula
To address this limitation, the t-copula was also evaluated. The t-copula, derived from the multivariate t-distribution, is defined as

C_{R,ν}(u_1, …, u_d) = t_{R,ν}(t_ν^{-1}(u_1), …, t_ν^{-1}(u_d)),

where t_{R,ν} is the joint CDF of a d-variate t-distribution with correlation matrix R and ν degrees of freedom, and t_ν^{-1} is the quantile function of the univariate t-distribution with ν degrees of freedom. The parameter ν controls tail behavior: smaller values of ν yield heavier tails and stronger tail dependence, allowing joint extremes to be modeled explicitly.
Copula Parameter Estimation
Estimating the parameters of these copulas involves two main steps: fitting the marginal distributions and estimating the dependency structure. The marginal distributions of the Smart Cities Index variables were first estimated with empirical distribution fitting techniques, such as kernel density estimation (KDE) and maximum likelihood estimation (MLE). Once the marginals were fixed, the correlation matrix R was estimated from Kendall's τ rank correlation, which is invariant under monotone transformations of the marginals and is therefore well suited to copula estimation.
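For elliptical copulas, an off-diagonal entry of R can be recovered from Kendall's τ via the standard relation ρ = sin(πτ/2); a minimal sketch on synthetic data:

```python
# Sketch: rank-based estimation of a copula correlation entry from
# Kendall's tau using rho = sin(pi * tau / 2) (synthetic data).
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = 0.9 * x + np.sqrt(1 - 0.81) * rng.normal(size=2000)  # true rho = 0.9

tau, _ = kendalltau(x, y)
rho_hat = np.sin(np.pi * tau / 2)  # robust to monotone marginal transforms
```

Because the estimate depends only on ranks, it is unaffected by how the marginals were fitted in the previous step.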
All calibrated copula models were assessed on their ability to fit the dependency structures in the data using the log-likelihood function and information measures such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). This evaluation was critical for choosing the best copula for subsequent regression and predictive modeling tasks.
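Model comparison by AIC can be sketched as follows; the log-likelihood and parameter-count values below are invented placeholders, not results from this study:

```python
# Sketch: AIC comparison of two fitted copulas; the log-likelihoods and
# parameter counts are invented placeholders, not study results.
def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

aic_gauss = aic(loglik=1250.0, n_params=1)  # correlation parameter only
aic_t = aic(loglik=1290.0, n_params=2)      # correlation + degrees of freedom
best = "t-copula" if aic_t < aic_gauss else "Gaussian"
```

A lower AIC indicates a better trade-off between fit and parsimony; BIC applies the same idea with a penalty of k·ln(n) instead of 2k.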
Integration of Copula Modeling within the Optimization Workflow
In the proposed framework, copula modeling is integrated into the analytical pipeline as a preprocessing enhancement step applied prior to model training and optimization. Specifically, the Gaussian copula transformation is performed uniformly across the dataset to strengthen the multivariate dependency structure among smart city indicators before feature selection and hyperparameter tuning are conducted.
This transformed dataset is then used consistently across all comparative metaheuristic algorithms, including the DLSA-based optimizer. As a result, the feature selection process and hyperparameter optimization operate on a dependency-regularized representation of the data, which stabilizes inter-feature relationships and produces a smoother optimization landscape. This integration improves convergence behavior and reduces performance variance across repeated runs without altering the evaluation protocol.
Data Leakage Considerations
The Gaussian copula-based transformation is unsupervised and operates solely on the joint distribution of input variables. It does not use target labels, model performance metrics, or validation/test feedback during its construction. Consequently, no label information is introduced into the preprocessing stage.
Following transformation, the dataset is partitioned into training, validation, and testing subsets according to the predefined experimental protocol. Model fitting, feature selection, and hyperparameter optimization are strictly performed using the training data, while validation and test sets remain isolated for performance assessment. Therefore, the copula-based preprocessing step does not introduce data leakage into the optimization process, and evaluation results reflect genuine generalization performance.
Impact of Copula Selection on Model Performance
The choice of copula strongly affects the accuracy and robustness of the resulting predictive models. In this study, the t-copula performed best, producing a better estimate of the complex and commonly asymmetric relations between smart city indicators in the presence of outliers and extreme values. Better dependency modeling increases the effectiveness of the machine learning algorithms used in subsequent analysis phases by providing a more realistic portrait of the interrelated dynamics within smart city systems. Accurately capturing the statistical characteristics of smart city components is essential for realistic data modeling and simulation. Fig. 5 presents the distribution comparisons for each smart city component, including Smart Mobility, Smart Environment, Smart Government, Smart Economy, Smart People, and Smart Living. Each subplot contrasts the original distribution (blue) with the copula-based synthetic distribution (orange), revealing the extent to which the copula model effectively captures the underlying data patterns. This comparison helps evaluate the quality of the synthetic data, ensuring it preserves critical features necessary for accurate downstream analysis.

Figure 5: Distribution of smart city components-original vs. copula.
Understanding the relationship between the overall smart city index and the total smart score is crucial for assessing the comprehensive performance of cities. Fig. 6 illustrates this relationship, presenting a strong positive linear correlation between the SmartCity_Index and Smart_Total. The central scatter plot captures the tight linear association, while the marginal histograms and density curves provide insights into the distribution of each variable, highlighting their spread and concentration. This combined visualization effectively communicates both the overall trend and the underlying distribution characteristics of these critical smart city metrics.

Figure 6: Relationship between SmartCity_Index and Smart_Total with marginal distributions.
Analyzing the relationship between the overall SmartCity_Index and individual smart city components is critical for understanding the influence of each dimension on overall city performance. Fig. 7 presents pairwise correlation plots for SmartCity_Index against Smart Mobility, Smart Environment, Smart Government, Smart Economy, Smart People, and Smart Living. Each subplot compares the original data (blue) with copula-based synthetic data (orange), highlighting the consistency of these relationships across real and generated datasets. This comparison helps validate the synthetic data generation process, ensuring it captures the essential patterns and dependencies needed for accurate predictive modeling.

Figure 7: Correlation between SmartCity_Index and smart city components (Original vs. Copula).
To conclude, the application of copulas in this study can be said to advance a potent paradigm for expressing the complex, non-linear associations that are an inseparable part of smart city data. Apart from increasing the interpretability of the resulting models, this approach vastly increases their predictive accuracy, making it an integral part of the overall analytical pipeline.
Machine learning (ML) is an integral part of the Smart Cities Index dataset analysis, as it provides the computational engine for detecting important patterns, making predictions, and identifying decisive factors influencing urban performance. Because the dataset is diverse, containing socio-economic, environmental, and governance indicators, a varied set of supervised learning algorithms was used to extract linear and non-linear relationships within the data. This section introduces the machine learning models chosen for this work, each with its own capabilities relevant to the analysis of high-dimensional and heterogeneous datasets.
Several considerations guided the choice of machine learning models: handling complex, structured data; interpretability; and computational efficiency. The following models were implemented:
ExtraTreesRegressor
The ExtraTreesRegressor, or Extremely Randomized Trees Regressor, is an ensemble learning technique that constructs multiple decision trees over the training data [39]. Unlike common decision-tree algorithms, which optimize each split using criteria such as information gain or Gini impurity, the ExtraTrees algorithm introduces further randomness by choosing split thresholds at random. This additional randomization reduces variance and helps avoid overfitting, especially in high-dimensional datasets such as the Smart Cities Index. The ExtraTreesRegressor also benefits from its capacity to automatically handle heterogeneous data types and to compute feature importance scores, making it both an effective predictive model and a feature selection tool.
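The following sketch fits an ExtraTreesRegressor on synthetic data (not the Smart Cities Index) in which only the first feature carries signal, and reads off the feature importance scores mentioned above:

```python
# Sketch: ExtraTreesRegressor on synthetic data; only feature 0 carries
# signal, so its importance score dominates.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=300)

model = ExtraTreesRegressor(n_estimators=100, random_state=42)
model.fit(X, y)
importances = model.feature_importances_  # normalized to sum to 1
```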
DecisionTreeRegressor
The DecisionTreeRegressor is a non-parametric tree-based technique that recursively partitions the input space of a dataset into homogeneous subsets according to the input features [40]. It builds a tree-shaped structure in which each internal node corresponds to a decision on a single feature, each branch corresponds to an outcome of that decision, and each leaf node corresponds to a predicted target value. Decision trees are highly interpretable and can represent complex, non-linear relations in the data. Nonetheless, they are prone to overfitting, particularly on high-dimensional data, without proper pruning or regularization. These limitations notwithstanding, decision trees form the basic building block of more complex ensemble techniques such as ExtraTrees and gradient boosting.
KNeighborsRegressor
The KNeighborsRegressor is a non-parametric alternative that predicts a target value from the k nearest training samples in feature space [41]. It assumes that similar inputs produce similar outputs, and is therefore best applied to datasets where inputs cluster locally or the target surface is smooth. The performance of this algorithm is highly sensitive to the choice of k (the number of neighbors), the distance metric used (e.g., Euclidean or Manhattan distance), and the neighbor weighting scheme. Although straightforward in design, the KNeighborsRegressor is computationally costly for large datasets, since distances to all training samples must be computed at prediction time.
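The local-averaging behavior can be seen on tiny toy data, where the prediction is simply the mean of the k nearest training targets:

```python
# Sketch: KNeighborsRegressor prediction as the average of the k nearest
# training targets (tiny toy data).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])

knn = KNeighborsRegressor(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)
pred = knn.predict([[1.1]])  # mean of the targets at x = 0, 1, and 2
```

The distant point at x = 10 is excluded from the neighborhood and therefore has no influence on the prediction.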
XGBoost
XGBoost (eXtreme Gradient Boosting) is an efficient, distributed gradient-boosting framework designed for high performance [42]. It builds an ensemble of decision trees sequentially, where each tree attempts to correct the errors of the previous trees. XGBoost includes advanced features such as L1 and L2 regularization, parallel tree construction, and tree pruning controlled by maximum depth or minimum child weight, making it one of the most potent algorithms for structured data. It is especially good at identifying complex non-linear interactions among features, and it is therefore a method of choice for competitive machine learning problems.
CatBoost
CatBoost (Categorical Boosting) is a gradient-boosting algorithm designed specifically to process categorical features efficiently [43]. In contrast to traditional gradient-boosting methods, which require extensive preprocessing to convert categorical variables into numerical form, CatBoost can use categorical data directly, which helps avoid overfitting while preserving categorical relationships. It uses a novel scheme for computing leaf values and an ordered boosting procedure that prevents prediction shift during training. CatBoost has demonstrated good results in many applications, especially those with a large number of categorical features, which are characteristic of urban and demographic data.
Gradient Boosting
Gradient Boosting is a high-performing ensemble learning method that constructs additive models by sequentially fitting new models to the residual errors of the previous ones [44]. At each stage, the new model is trained to correct the errors of the current ensemble, thereby increasing overall predictive accuracy. The method can achieve both low bias and low variance, producing highly accurate models. Nevertheless, gradient boosting can overfit if model complexity is not controlled through hyperparameter tuning and regularization. It is also computationally intensive and therefore unsuitable for real-time applications without appropriate optimization.
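The sequential error-correction idea is visible in the staged training error of scikit-learn's GradientBoostingRegressor, used here as a generic stand-in for the boosting family on synthetic data:

```python
# Sketch: staged training error of gradient boosting on a toy problem;
# each added tree reduces the remaining residual error.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)

gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=2, random_state=7)
gbr.fit(X, y)
train_errors = [np.mean((y - pred) ** 2)
                for pred in gbr.staged_predict(X)]  # one entry per tree
```

On held-out data the curve would eventually flatten or rise, which is why validation-driven tuning of `n_estimators` and `learning_rate` matters.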
Model Selection and Rationale
These machine learning models were selected because they provide complementary strengths. Tree-based models such as ExtraTreesRegressor and DecisionTreeRegressor are good at revealing non-linear relationships and identifying essential features, while distance-based models such as KNeighborsRegressor provide intuitive, instance-based learning. Gradient-boosting models such as XGBoost and CatBoost contribute strong ensemble power, combining learning efficiency with high predictive accuracy at a moderate computational cost. This range of algorithms guarantees a well-rounded analysis of the Smart Cities Index dataset, bringing solid insights into the complex, multi-dimensional character of urban systems.
Combined, these machine learning models form the fundamental analysis framework for this study, enabling a detailed analysis of the factors driving smart city performance. In the following sections, these models' performance is measured before and after feature selection, providing a robust assessment of their predictive capability in smart city analytics.
When analyzing high-dimensional datasets such as the Smart Cities Index, dimensionality reduction is critical, and this is where metaheuristic algorithms come into play. Properly selecting relevant features and fine-tuning model hyperparameters with these algorithms improves both predictive accuracy and computational efficiency. Unlike traditional optimization methods, which in many cases depend on gradient-based or exhaustive search, metaheuristic algorithms are inspired by natural phenomena and evolutionary rules. They provide versatile, robust platforms for investigating intricate multi-dimensional search spaces, making them attractive for the non-convex, multi-modal search domains found in machine learning. In practice, this relevance arises because smart city indicators are often correlated, noisy, and partially redundant, so selecting a compact subset of informative variables can improve both statistical generalization and computational tractability.
Metaheuristics can escape local optima, balance exploration and exploitation, and adapt dynamically to evolving fitness landscapes. Such properties make them well suited to finding near-optimal solutions in large, noisy, or poorly understood search spaces. The application of metaheuristic algorithms in this study served two major purposes: (i) wrapper-based feature selection, and (ii) hyperparameter optimization under model-validation-driven fitness evaluation. Formally, these methods approximate global search by maintaining a population of candidate solutions and iteratively applying stochastic or semi-deterministic update operators that trade off diversification (exploration) against intensification (exploitation).
Role in Feature Selection
Feature selection is an essential preprocessing activity in machine learning that aims to retain the most important features from a given set of attributes while eliminating irrelevant and redundant elements whose noise interferes with the learning process. This process is critical for reducing model complexity and increasing interpretability and generalization performance. High-dimensional datasets such as the Smart Cities Index tend to comprise hundreds of possible features, some of which provide the model with little or no predictive information. Including these unnecessary features can lead to overfitting, additional computational cost, and reduced model interpretability.
Metaheuristic algorithms are well suited to feature selection because they efficiently search large, discrete search spaces while avoiding local optima. These algorithms treat feature selection as a combinatorial optimization problem whose objective is to pick the subset of features that maximizes model accuracy while minimizing complexity. The task is cast as identifying the feature subset whose associated fitness function attains its minimum, where the fitness is generally defined in terms of prediction error, classification accuracy, or a combination of several performance metrics. In wrapper settings, the fitness function is typically computed via cross-validation or holdout validation, so the evaluation cost often dominates the total runtime.
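A wrapper-style fitness function for a binary feature mask might look as follows; the error/size weighting `alpha` and the base learner are illustrative choices, not the exact fitness used in this study:

```python
# Sketch: wrapper fitness for a binary feature mask, combining
# cross-validated MSE with a subset-size penalty (weighting is assumed).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)

def fitness(mask, alpha=0.99):
    if not mask.any():
        return np.inf  # empty subsets are invalid
    mse = -cross_val_score(DecisionTreeRegressor(random_state=0),
                           X[:, mask], y, cv=3,
                           scoring="neg_mean_squared_error").mean()
    return alpha * mse + (1 - alpha) * mask.sum() / mask.size

good = fitness(np.array([True, True, False, False, False, False]))
bad = fitness(np.array([False, False, False, False, True, True]))
```

A binary optimizer such as bSiba would search over such masks, minimizing this fitness.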
Unlike traditional feature selection approaches, which depend on exhaustive search, greedy algorithms, or statistical tests, metaheuristics use stochastic exploration of the feature space. Such stochasticity and advanced searching strategies enable them to perform a better global exploratory search (large-scale search across the feature space) and local exploitative search (precise improvement of promising solutions). This balance is essential to steering clear of the traps of local minima and making it possible for the final feature subset to make a genuinely optimal representation of the data.
The ability of metaheuristic algorithms to handle feature interactions also contributes to effective feature selection. In complex real-world datasets, including those used in smart city analytics, the predictive value of a particular feature typically depends on the presence or absence of other features. Metaheuristics are well positioned to capture these interactions, offering a more nuanced, context-aware approach to feature selection than classical filter-based methods.
In addition, applying metaheuristics to feature selection offers significant computational benefits. Such algorithms sample the search space intelligently and reduce the pool of candidate feature subsets that need further examination, cutting down the computational cost of feature selection considerably. This efficiency is especially critical for high-dimensional datasets, where the number of feature combinations is astronomically high.
Role in Hyperparameter Optimization
Apart from feature selection, metaheuristic search addresses hyperparameter optimization, the task of fine-tuning the parameters of machine learning models to maximize predictive accuracy and generalization. Hyperparameters are configuration settings fixed before training, unlike model parameters learned from the data. They include essential settings such as the learning rate, number of estimators, tree depth, and regularization coefficients, all of which strongly affect model performance.
Careful hyperparameter optimization is essential for realizing the predictive potential of machine learning models. Nevertheless, the process is usually complicated by the size and dimensionality of the search space, the intricate interplay between different hyperparameters, and the threat of overfitting. Conventional tuning procedures, such as grid or random search, can become computationally expensive and inefficient, especially for models with many hyperparameters or mixed continuous and discrete search spaces.
Metaheuristic algorithms address these difficulties with population-based search schemes that adapt their search behavior in real time according to feedback from the fitness landscape. Drawing on mechanisms from natural selection, swarm intelligence, or physical systems, they traverse the hyperparameter space efficiently, striking a balance between global exploration and local fine-tuning. This strategy greatly reduces the computational cost of hyperparameter optimization and increases the probability of locating near-optimal parameter settings. Moreover, because the hyperparameter objective surface is frequently non-smooth and non-convex, metaheuristics provide a robust alternative to gradient-based tuning in such settings.
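As an illustration of the population-based search loop described above, the following sketch tunes two hyperparameters of an ExtraTreesRegressor with a simple perturb-and-accept rule; this stand-in update rule is purely illustrative and is not the DLSA itself:

```python
# Sketch: a generic population-based tuner for two hyperparameters of an
# ExtraTreesRegressor; the perturb-and-accept update rule is a simple
# stand-in for a metaheuristic and is NOT the DLSA itself.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=150)

def score(n_estimators, max_depth):
    model = ExtraTreesRegressor(n_estimators=int(n_estimators),
                                max_depth=int(max_depth), random_state=0)
    return -cross_val_score(model, X, y, cv=3,
                            scoring="neg_mean_squared_error").mean()

pop = [(rng.integers(10, 60), rng.integers(2, 10)) for _ in range(4)]
best = min(pop, key=lambda p: score(*p))  # evaluate the initial population
for _ in range(3):                        # local refinement around the leader
    cand = (max(10, best[0] + rng.integers(-10, 11)),
            max(2, best[1] + rng.integers(-2, 3)))
    if score(*cand) < score(*best):
        best = cand
best_mse = score(*best)
```

A real metaheuristic replaces the perturbation step with its own population update laws while keeping the same cross-validated fitness evaluation.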
In addition, metaheuristics are naturally parallelizable, which makes them appropriate for modern distributed computing environments in which large-scale hyperparameter tuning can be conducted across multiple processors or computing nodes. This scalability makes them all the more attractive for real-world machine learning tasks, which frequently require rapid model tuning to remain competitive.
In this study, a set of models comprising tree-based algorithms, instance-based learners, and gradient boosting algorithms was subjected to hyperparameter optimization using metaheuristic algorithms. This combination of feature selection and hyperparameter optimization yielded a complete tuning methodology, ensuring that the final models were both efficient and precise.
Applying metaheuristic algorithms to feature selection and hyperparameter optimization is a vital element of this work, making it possible to assemble highly accurate, computationally efficient models for analyzing smart city data. The following sections present the results of this approach, demonstrating the large performance gains brought about by this integrated optimization effort.
3.7 Proposed Optimizer: Dynamic Leader Sibha Algorithm (DLSA)
The Dynamic Leader Sibha Algorithm (DLSA) is a new hierarchical metaheuristic optimization technique based on the structured counting dynamics of the "Sibha", a traditional tool used in Islamic practice for systematic counting. DLSA was presented as a powerful substitute for classical optimization algorithms [45], overcoming some of their most crucial problems in high-dimensional and multimodal search spaces, including premature convergence, poor scalability, and insufficient exploration. In contrast to traditional metaheuristics that find it challenging to balance exploration and exploitation, the DLSA uses a dynamic leader-follower setup with systematic control over the conflicting demands of thorough search coverage and efficient local refinement. To distinguish DLSA from an incremental leader-follower modification, it is essential to clarify that its update laws incorporate (i) an oscillation-regulated step control, (ii) an explicit historical-memory term, and (iii) a three-stage hierarchical operator pipeline (exploration, exploitation, elimination) that jointly shape its search dynamics.
Inspiration and Key Concepts
The inspiration for DLSA comes from the structured counting mechanism of the Sibha, in which the movement of beads is recurrent yet ordered, much like a systematic search in optimization. This ordered framework is a natural metaphor for hierarchical search methods in which agents coordinate their collective motion under dynamically selected leaders, avoiding both aimless random drift and entrapment around local optima.
The fundamental novelty of DLSA lies in its dynamic adjustment of agent search behavior according to each agent's relative fitness and spatial location within the population. This dynamic adjustment is essential for sustaining population diversity: it allows the algorithm to search a wide solution space while concentrating computational resources where solutions are most promising. The hierarchical structure of DLSA enables effective agent coordination, increasing both convergence speed and solution quality. In contrast to classical leader-driven swarms such as PSO and hierarchical encircling schemes such as GWO, DLSA introduces a sinusoidal control mechanism for its step parameter and a structured elimination operator, which together provide an oscillatory, non-monotone search pressure designed to reduce stagnation and premature convergence.
Mathematical Foundation of DLSA
The mathematical formulation of DLSA is organized into three major phases: exploration, exploitation, and elimination. Each phase has a specific function in the optimization procedure, maintaining a balanced search strategy that successfully navigates complex, high-dimensional landscapes. Notably, this three-phase operator decomposition is part of the claimed originality, because it explicitly separates diversification, intensification, and population "cleaning" into distinct update laws rather than relying on a single unified step equation.
Exploration Phase
The exploration phase is designed to promote global search, encouraging agents to probe diverse regions of the solution space. It is governed by a position update equation in which each agent moves relative to a dynamically selected leader; the leader's position is itself refined using the best solutions in the population; and the step-size control parameter K is regulated by a sinusoidal function of the iteration counter.
This formulation gives exploration a structured, hierarchical character, equipping the algorithm with a broad search radius for efficient coverage of high-dimensional landscapes. The sinusoidal definition of K provides a natural means of regulating search step sizes, improving the algorithm's ability to escape local optima and visit diverse regions of the search space. This oscillatory regulation is a theoretical distinction from common leader-follower metaheuristics, which typically employ purely stochastic coefficients (e.g., PSO) or monotone coefficient decay (e.g., GWO).
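Since the explicit expression for K is not reproduced here, the following minimal sketch contrasts a hypothetical sinusoidal schedule of the kind described with a GWO-style monotone decay; the function names and constants are illustrative assumptions, not the article's formulas.

```python
import math

def k_sinusoidal(t, T, k_max=2.0):
    # Hypothetical oscillatory schedule: the step scale rises and falls
    # with the iteration counter instead of decaying monotonically.
    return k_max * abs(math.sin(math.pi * t / T))

def k_monotone_decay(t, T, k_max=2.0):
    # GWO-style linear decay for comparison (coefficient shrinks from k_max to 0).
    return k_max * (1.0 - t / T)
```

The non-monotone profile periodically re-enlarges step sizes mid-run, which is the mechanism credited above with helping agents escape local optima.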
Exploitation Phase
Once promising regions have been identified, the DLSA shifts to local refinement, intensifying the search around high-fitness solutions. This step is important for improving solution accuracy and approaching the global optimum. The leader position is revised according to a dedicated exploitation update rule that draws agents toward the best solutions found so far.
This phase integrates both global and local search behaviors, allowing the algorithm to refine promising solutions while maintaining sufficient diversity to avoid local traps.
Elimination Phase
To further refine the search, the DLSA includes an elimination phase in which suboptimal solutions are pruned, concentrating computational resources on the most promising areas.
This hierarchical approach to elimination helps prevent the algorithm from becoming trapped in local optima, ensuring a more comprehensive search of the solution space. The elimination phase effectively “cleans” the population, removing poorly performing agents and reinforcing the focus on high-quality regions. Importantly, making elimination an explicit update stage (rather than an implicit byproduct of convergence pressure) is a conceptual differentiator relative to many leader–follower schemes.
Algorithm Pseudocode
To better convey the Dynamic Leader Sibha Algorithm (DLSA) and its structured leader-follower dynamics, complete pseudocode for DLSA is provided in Algorithm 1. The pseudocode describes the initialization, exploration, exploitation, and elimination stages, the hierarchical relationships among agents, and the adaptive search mechanisms. To ensure full reproducibility and consistency with the mathematical formulation, the pseudocode explicitly defines the auxiliary references used in the update equations.

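To make the three-stage pipeline concrete, the following Python sketch mirrors the structure of Algorithm 1 under stated assumptions: because the exact update equations are not reproduced here, the leader-follower steps, the sinusoidal schedule for K, and the elimination fraction are simplified stand-ins, and `dlsa_minimize` is a placeholder name.

```python
import numpy as np

def dlsa_minimize(f, bounds, n_agents=30, n_iter=200, elim_frac=0.2, seed=0):
    """Illustrative sketch of the three-stage DLSA loop (exploration,
    exploitation, elimination); the update rules are simplified
    stand-ins for the article's equations."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    d = lo.shape[0]
    X = rng.uniform(lo, hi, size=(n_agents, d))
    fit = np.apply_along_axis(f, 1, X)
    for t in range(n_iter):
        # Oscillation-regulated step control (sinusoidal, non-monotone).
        K = np.abs(np.sin(np.pi * t / n_iter)) + 0.1
        leader = X[np.argmin(fit)]
        # Exploration: each follower takes a stochastic, K-scaled step
        # toward the dynamically selected leader (greedy acceptance).
        cand = np.clip(X + K * rng.uniform(-1, 1, X.shape) * (leader - X), lo, hi)
        cand_fit = np.apply_along_axis(f, 1, cand)
        better = cand_fit < fit
        X[better], fit[better] = cand[better], cand_fit[better]
        # Exploitation: local refinement in a shrinking neighborhood of the leader.
        local = np.clip(leader + 0.1 * K * rng.normal(size=(n_agents, d)), lo, hi)
        local_fit = np.apply_along_axis(f, 1, local)
        better = local_fit < fit
        X[better], fit[better] = local[better], local_fit[better]
        # Elimination: prune the worst agents and reseed them randomly.
        n_elim = max(1, int(elim_frac * n_agents))
        worst = np.argsort(fit)[-n_elim:]
        X[worst] = rng.uniform(lo, hi, size=(n_elim, d))
        fit[worst] = np.apply_along_axis(f, 1, X[worst])
    best = np.argmin(fit)
    return X[best], fit[best]
```

On a smooth test function such as the sphere, this loop converges quickly because the greedy exploration and exploitation moves never discard an agent's best-known position, while elimination keeps injecting fresh diversity.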
Theoretical Differentiation and Computational Complexity
To rigorously justify DLSA's originality beyond incremental modification, it is useful to contrast its update structure with representative leader-follower metaheuristics. Classical PSO employs stochastic acceleration toward global/local best solutions through a velocity state, while GWO uses a fixed hierarchy (e.g., alpha, beta, and delta wolves guiding the remaining omega wolves).
Time complexity. Let N denote the population size, T the maximum number of iterations, d the problem dimensionality, and C_f the cost of one fitness evaluation. Each iteration updates N agents in O(Nd) time and evaluates their fitness in O(N·C_f), yielding an overall time complexity of O(T·N·(d + C_f)), on par with standard population-based metaheuristics such as PSO and GWO.
Space complexity. Storing the population requires O(Nd) memory; the leader records and the historical-memory term contribute at most an additional O(Nd), so the overall space complexity remains O(Nd).
Advantages of DLSA
The DLSA has many notable benefits when compared to traditional metaheuristic algorithms.
• Efficient Exploration and Exploitation: The hierarchical leader-follower model structures the search so that global exploration and local exploitation are balanced more effectively than in wholly stochastic processes.
• Reduced Premature Convergence: The dynamic leadership mechanism and regulated agent interactions reduce the probability of premature convergence and provide more credible optimization results.
• Scalability and Flexibility: DLSA is extremely scalable, suitable for all kinds of high-dimensional and multimodal optimization tasks, including those in smart city analytics.
• Robustness across Problem Domains: The algorithm delivers consistently effective performance across diverse benchmark problems, from engineering design to feature selection and machine learning hyperparameter tuning.
Beyond these general advantages, it is important to emphasize that DLSA differs structurally from many classical swarm-based optimizers. Unlike algorithms that rely on a single unified stochastic update equation, DLSA explicitly separates exploration, exploitation, and elimination into a structured three-stage search mechanism. This separation enables adaptive control of diversification and intensification dynamics rather than depending solely on random coefficients or monotonic parameter decay. Within the scope of this work, the DLSA was used for feature selection and hyperparameter optimization to provide a stable basis for developing accurate, computationally efficient machine learning models. This novel algorithm is a considerable contribution to the field of metaheuristics, offering a promising alternative to customary optimization methods. From a methodological standpoint, the novelty of DLSA lies not merely in empirical performance improvements, but in its hierarchical leader refinement strategy and oscillation-regulated search behavior. These design elements provide an alternative to the conventional leader-attraction models used in many swarm-based optimizers, offering a more structured balance between convergence speed and diversity preservation, particularly in the high-dimensional feature and hyperparameter optimization tasks common in smart city analytics.
To comprehensively evaluate the performance of the proposed Dynamic Leader Sibha Algorithm (DLSA) in the context of smart city data analysis, a diverse set of benchmark metaheuristic models was selected. These models were chosen for their well-established effectiveness in solving high-dimensional, multimodal optimization problems, as well as their complementary search strategies, which collectively provide a robust baseline for comparison. The selected benchmark models include both swarm-based algorithms and physics-inspired approaches, reflecting a broad spectrum of optimization paradigms.
Harris Hawks Optimization (HHO)
Harris Hawks Optimization (HHO) mimics how Harris hawks cooperate to capture prey [46]. The algorithm behaves like a group of hunting hawks, launching surprise attacks while searching for the best solution. The sudden, unexpected position changes generated by HHO help it break free of stagnant regions, making it suitable for challenging, high-dimensional problems. It switches among three strategies, surprise pounce, soft besiege, and hard besiege, to adapt how it searches the problem space.
The algorithm models the escape behavior of the prey through its escaping energy, selecting the attack strategy accordingly to accelerate convergence. With these varied search methods, HHO can achieve better results and stronger performance than many common metaheuristics.
Grey Wolf Optimizer (GWO)
The Grey Wolf Optimizer (GWO) simulates the leadership hierarchy and cooperative hunting of grey wolves [47]. Search agents are grouped into four roles ordered by rank: alpha leads, beta is second in command, delta ranks below beta, and omega follows, with every role contributing to the search. The algorithm models three major hunting behaviors: encircling, hunting, and attacking the prey, with the fittest individuals guiding the search toward the best solution.
Since GWO is designed using layers, it is helpful for solving continuous optimization challenges since these problems depend heavily on reaching a good balance between fully exploring and making the best possible choices. Even so, GWO may not be able to quickly leave local optima in highly multimodal problems, due to relying greatly on where the alpha agent has moved.
Whale Optimization Algorithm (WAO)
The idea behind WAO comes from how humpback whales encircle prey within a bubble net. Search agents spiral around the current best solution in whale-like trajectories [48], combining direct attack with global exploration. The WAO relies on two key search mechanisms, encircling prey and bubble-net foraging, which are combined to reach the best result.
Thanks to its simplicity and straightforward implementation, WAO is frequently chosen among optimization techniques. As with other metaheuristics, however, improper settings can lead to premature convergence, especially when the search space is rugged or deceptive.
Biogeography-Based Optimization (BBO)
BBO is an optimization algorithm that borrows ideas from biogeography, the science of how species spread across habitats worldwide [49]. It treats candidate solutions as habitats, some more suitable than others, and solutions migrate among these habitats according to their fitness. BBO uses two leading operators: migration, which shares features between habitats, and mutation, which introduces fresh variation to keep the population diverse.
BBO does this well by switching genetic information between fit and less fit solutions. Being able to migrate this way, the algorithm can avoid getting stuck at local solutions and keep a varied population, which suits its use in challenging, high-dimensional optimization challenges.
Multiverse Optimization (MVO)
Multiverse Optimization (MVO) is inspired by three concepts from cosmology, white holes, black holes, and wormholes, each marking a different phase of the search cycle [50]. The algorithm models how solutions pass among universes: universes with above-average fitness act as white holes that export high-quality solutions, while those below average act as black holes that absorb them. Wormholes let the search process skip across the space and jump to promising regions rapidly.
Because MVO can be modified for many optimization problems, it is an effective solution for working with high-dimensional and multimodal data. At times, it can react sensitively to parameter changes and must be carefully set up for top performance.
Satin Bowerbird Optimizer (SBO)
The idea for the Satin Bowerbird Optimizer (SBO) comes from satin bowerbirds building striking structures to attract females. Using random exploration and attraction to interesting areas allows this algorithm to balance finding new places and using the best ones [51]. SBO was designed to handle multi-modal problems, where different parts of the search area contain different local peaks.
However, limited diversity among population members in SBO may cause premature convergence; mechanisms that preserve exploration are therefore needed throughout the optimization process.
Firefly Algorithm (FA)
FA gets its ideas from the way fireflies create light to communicate with each other. The method follows the rule that fireflies are drawn to the brightest solutions, and their location shifts based on both their distance and the brightness they see [52]. Because of this attraction, the FA explores good areas of the search space well, and because of its random walk also looks at different areas.
Although effective on many tasks, FA can consume substantial computational resources on large-scale problems and may fail to find the global optimum when local optima are scattered throughout the landscape.
Gravitational Search Algorithm (GSA)
The Gravitational Search Algorithm is modeled on the law of gravity and the motion of interacting masses. Each solution is assigned a mass proportional to its fitness, and the gravitational attraction between masses drives the search: fitter, heavier solutions exert the strongest pull on the others [53].
While GSA is excellent at searching a wide range of solutions globally, it can become time-consuming to run, mainly because of the many fitness checks and distance measures required when many solutions are involved.
Simulated Annealing Optimization (SAO)
Simulated Annealing Optimization (SAO) is inspired by metallurgy, where annealing uses controlled heating and cooling to make a material more stable. It uses probabilistic acceptance to escape local optima, gradually reducing the chance of accepting worse solutions as the search proceeds. Because convergence is governed by the temperature schedule, SAO avoids premature results; however, it may converge slowly in large search spaces [54].
Selection Rationale for Benchmark Models
These benchmark models were chosen for their complementary strengths, ensuring that diverse optimization behaviors could be examined. By including algorithms with several distinct search approaches, the study thoroughly characterizes what DLSA is capable of, emphasizing its speed, adaptability to different data, and accuracy.
Importantly, the inclusion of these diverse benchmark models allows a mechanism-level comparison rather than a purely performance-based comparison. While many benchmark optimizers share leader-driven or attraction-based update rules, DLSA introduces a hierarchical refinement process combined with explicit elimination dynamics. This structural distinction enables clearer regulation of search pressure across iterations, which is particularly relevant for smart city datasets characterized by multicollinearity, heterogeneous scales, and multimodal fitness landscapes.

In addition to optimization diversity, the selection of underlying machine learning models for evaluation was guided by the structural characteristics of the Smart Cities Index dataset. Smart city indicators typically exhibit non-linear relationships, heterogeneous feature distributions, and potential multicollinearity across composite indices such as mobility, governance, economy, and environment. Therefore, ensemble-based tree models such as ExtraTreesRegressor and boosting-based frameworks (e.g., XGBoost and CatBoost) were included due to their strong capability to model complex non-linear interactions without strict parametric assumptions. ExtraTreesRegressor, in particular, was selected because its randomized splitting strategy and implicit feature sub-sampling improve robustness in high-dimensional tabular datasets, which are common in smart city analytics. Boosting-based models such as XGBoost and CatBoost were incorporated as high-performance baselines widely adopted in structured data prediction tasks, providing a strong empirical reference for evaluating optimization gains. Furthermore, simpler learners such as DecisionTreeRegressor and KNeighborsRegressor were included to ensure that performance improvements are not limited to a specific model family.
The inclusion of both ensemble and non-ensemble learners enables a comprehensive assessment of whether DLSA-driven feature selection and hyperparameter optimization yield consistent improvements across diverse predictive paradigms relevant to urban data analysis.
A set of well-known evaluation metrics was applied to assess the performance of the Dynamic Leader Sibha Algorithm (DLSA). These metrics demonstrate how well the models function and how they adapt to different kinds of smart city data. Regression metrics assess model accuracy, while statistical metrics evaluate the quality of feature selection.
Regression Performance Metrics
The primary goal of regression analysis is to minimize prediction errors and maximize the alignment between predicted and actual values. The following Table 2 summarizes the key regression metrics used in this study:

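The metrics in Table 2 follow standard definitions; a minimal sketch (with an assumed sign convention for MBE) is:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    # Standard definitions of the regression metrics listed in Table 2.
    e = y_true - y_pred
    mse = float(np.mean(e ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(e))),
        "MBE": float(np.mean(y_pred - y_true)),  # sign convention assumed
        "r": float(np.corrcoef(y_true, y_pred)[0, 1]),
        "R2": float(1 - np.sum(e ** 2) / np.sum((y_true - y_true.mean()) ** 2)),
    }
```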
Feature Selection Metrics
Besides measuring model accuracy, the effectiveness of feature selection was evaluated using a dedicated set of statistical metrics that quantify both the accuracy and the compactness of the selected subsets. A summary of the feature selection metrics is presented in the following Table 3.

Together, these evaluation metrics provide a comprehensive framework for assessing the effectiveness of the DLSA and its benchmark counterparts in both predictive accuracy and feature selection, ensuring a thorough comparison of their respective strengths and weaknesses.
This section presents the findings obtained by applying the proposed Dynamic Leader Sibha Algorithm (DLSA) and the benchmark algorithms to the Smart Cities Index dataset. The assessment is organized into four essential stages: the baseline machine learning performance, the feature selection results, the performance after feature selection, and the performance after hyperparameter optimization. Table 4 presents an overview of the algorithms' initial parameters.

Table 5 summarizes the hyperparameter search space explored for the ExtraTreesRegressor (ETR) model together with the general configuration adopted for the metaheuristic optimization framework. For the ETR model, critical structural and regularization parameters—including the number of estimators, maximum depth, splitting criteria, and feature selection strategy—were optimized within predefined bounded intervals to balance predictive accuracy and model complexity. The selected configuration corresponds to the best-performing solution identified during the optimization process. In parallel, the metaheuristic algorithm operated under controlled computational constraints, including population size, iteration budget, and bounded search limits. These ranges ensure fairness in comparison across optimization algorithms while maintaining computational feasibility and preventing excessive overfitting.

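The decoding of a bounded continuous search vector into concrete ExtraTreesRegressor hyperparameters can be sketched as follows; the parameter ranges below are illustrative placeholders, not the exact intervals of Table 5.

```python
from sklearn.ensemble import ExtraTreesRegressor

# Hypothetical bounded search space (the article's actual ranges are in Table 5).
SPACE = {
    "n_estimators": (50, 500),
    "max_depth": (3, 30),
    "min_samples_split": (2, 10),
}

def decode(vector):
    # Map a continuous optimizer position in [0, 1]^k to integer hyperparameters.
    params = {}
    for (name, (lo, hi)), v in zip(SPACE.items(), vector):
        params[name] = int(round(lo + v * (hi - lo)))
    return params

# Example: instantiate the model from a mid-range candidate solution.
model = ExtraTreesRegressor(**decode([0.5, 0.5, 0.0]), random_state=42)
```

A metaheuristic then only has to move agents inside the unit hypercube; `decode` handles the mapping to the learner's discrete configuration.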
It is important to clarify that Min-Max normalization was applied to both the input features and the target variable, scaling all values to the range [0, 1]. Consequently, the reported MSE values are computed in the normalized space. Since the maximum possible squared error under this scaling is 1, error magnitudes on the order of 10⁻³ or below represent very small relative deviations and should be interpreted within this normalized scale.
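A minimal sketch of this normalization, and of mapping a normalized MSE back to original units (the helper names are ours):

```python
import numpy as np

def minmax_scale(x):
    # Scale a 1-D array to [0, 1]; return the offset and span so that
    # normalized errors can be mapped back to original units.
    lo, span = x.min(), x.max() - x.min()
    return (x - lo) / span, lo, span

def mse_in_original_units(mse_normalized, span):
    # A squared error of m in [0, 1] space equals m * span**2 on the
    # original scale of the target variable.
    return mse_normalized * span ** 2
```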
To ensure an equitable and reproducible comparison among the benchmark optimizers (HHO, GWO, WAO, BBO, MVO, SBO, FA, GSA, and SAO) and the proposed DLSA, all algorithms were executed under a unified configuration and evaluation protocol. The algorithm-specific control parameters and shared population-based settings are reported in Table 4, while the common optimization environment (including search-space bounds, dimensionality, objective function definition, and stopping configuration) is summarized in Table 5. Consequently, all optimizers were allocated the same computational budget and were evaluated using the same learning model, the same data splitting/cross-validation strategy, and the same performance criteria. Therefore, the reported performance differences reflect genuine differences in optimizer search dynamics rather than unequal tuning or non-standardized experimental settings.
4.1 Baseline Machine Learning Performance
The first goal was to assess the behavior of the baseline machine learning methods before any feature selection or hyperparameter tuning, establishing a performance baseline against which the gains from the subsequent optimization strategies can be judged. The results of this baseline evaluation are presented in Table 6. The metrics used to assess model performance include MSE, RMSE, MAE, MBE, the correlation coefficient (r), and R², as summarized in Table 2.

From the results, it is evident that the ExtraTreesRegressor achieved the best overall performance, with the lowest MSE (0.007462409), RMSE (0.086385236), and MAE (0.042999376), as well as the highest correlation coefficient (r) and R² values, highlighting its superior initial performance.

Figure 8: Parallel coordinates plot of model performance metrics.
Evaluating the predictive performance of machine learning models across multiple error metrics is crucial for identifying the most effective algorithms. Fig. 9 presents a comparative analysis of six prominent regression models, including ExtraTreesRegressor, DecisionTreeRegressor, KNeighborsRegressor, XGBoost, CatBoost, and Gradient Boosting. The models are assessed using key performance metrics such as the correlation coefficient (r), R-squared (R²), MSE, RMSE, and MAE.

Figure 9: Model comparison-MSE, RMSE, MAE
To enhance and simplify the predictive models, a feature selection phase was carried out using several binary metaheuristic algorithms: the proposed bSiba algorithm alongside bHHO, bGWO, bWAO, bBBO, bMVO, bSBO, bFA, bGSA, and bSAO.
The primary goal of this stage was to identify the most valuable and meaningful attributes for predicting smart city performance, making the data easier to work with and boosting model efficiency. The results of the feature selection procedure are listed in Table 7, which reports the Average Error, Average Select Size, Average Fitness, Best Fitness, Worst Fitness, and Standard Deviation of Fitness.

As shown in Table 7, the bSiba algorithm demonstrated superior performance, achieving the lowest Average Error (0.373246) and the smallest Average Select Size (0.330603), indicating its effectiveness in identifying a compact and highly informative feature subset. Additionally, bSiba achieved the best overall fitness score, reflecting its ability to balance dimensionality reduction with predictive accuracy. In contrast, bWAO and bSBO exhibited higher average errors and larger selected feature sizes, indicating less efficient feature selection.
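A wrapper-style objective of the kind summarized in Table 7 can be sketched as follows; the sigmoid transfer function and the weighting factor alpha are common choices assumed here, not necessarily the article's exact formulation.

```python
import numpy as np

def binarize(position, rng):
    # Sigmoid transfer function (a common choice, assumed here): map a
    # continuous agent position to a 0/1 feature-selection mask.
    probs = 1.0 / (1.0 + np.exp(-position))
    return (rng.uniform(size=position.shape) < probs).astype(int)

def fs_fitness(error, mask, alpha=0.99):
    # Wrapper objective balancing prediction error against the fraction
    # of selected features; the weighting alpha is an assumption.
    ratio = mask.sum() / mask.size
    return alpha * error + (1.0 - alpha) * ratio
```

Under such an objective, a lower Average Error and a smaller Average Select Size both reduce fitness, which is consistent with bSiba's leading scores on those two columns.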
Evaluating the comprehensive performance of feature selection algorithms is essential for identifying the most effective methods in machine learning. Fig. 10 presents a stacked metrics comparison for a range of feature selection algorithms, including bSiba, bHHO, bGWO, bWAO, bBBO, bMVO, bSBO, bFA, bGSA, and bSAO. The stacked bars capture multiple performance metrics, including average error, average select size, average fitness, best fitness, worst fitness, and standard deviation fitness. This visualization provides a holistic view of each algorithm’s performance, effectively highlighting their strengths and weaknesses across diverse evaluation criteria, facilitating informed algorithm selection.

Figure 10: Stacked metrics comparison of feature selection algorithms.
Comparing multiple performance metrics across different feature selection algorithms provides a comprehensive view of their relative strengths and weaknesses. Fig. 11 presents a radar plot comparing ten feature selection algorithms, including bSiba, bHHO, bGWO, bWAO, bBBO, bMVO, bSBO, bFA, bGSA, and bSAO, across key metrics such as average fitness, average select size, average error, best fitness, worst fitness, and standard deviation fitness. This multi-dimensional visualization effectively captures the trade-offs involved in selecting the optimal algorithm, highlighting those that achieve a balanced performance across diverse criteria.

Figure 11: Radar plot of feature selection algorithm performance metrics.
Model interpretability plays a central role in understanding how individual predictors influence the output of complex machine learning models. In this study, SHapley Additive exPlanations (SHAP) are employed to quantify the contribution of each smart city dimension to the model’s predictions. SHAP values, grounded in cooperative game theory, provide a consistent and theoretically justified framework for attributing feature importance by distributing the prediction outcome among input variables according to their marginal contributions. This approach ensures local accuracy, consistency, and missingness properties, making it particularly suitable for interpreting nonlinear and ensemble-based models.
Fig. 12 presents the SHAP violin plot illustrating the distribution of feature impacts across all observations. The violin representation not only conveys the magnitude of the SHAP values but also reflects their density distribution, thereby capturing both central tendency and variability. Features are ordered according to their overall importance, with wider distributions indicating greater influence on the model output. The color gradient encodes the original feature values, allowing simultaneous interpretation of directionality and magnitude of effect. For instance, the Smart_Living dimension demonstrates the widest spread of SHAP values, indicating a substantial and variable impact on predictions, whereas Smart_Economy exhibits comparatively narrower dispersion, suggesting a more moderate contribution.

Figure 12: SHAP violin plot showing the distribution of feature impacts across smart city dimensions.
This visualization enables a comprehensive assessment of how high and low feature values affect the prediction outcome, offering insights into both positive and negative contributions. Such interpretability is essential in smart city analytics, where policy decisions benefit from transparent and explainable modeling frameworks.
While global feature importance provides an overall ranking of predictors, understanding how individual feature values influence model predictions requires a more granular, local perspective. To this end, SHAP dependence plots are employed to examine the relationship between each smart city dimension and its corresponding SHAP value, thereby revealing potential nonlinear effects, interaction patterns, and threshold behaviors. Dependence plots visualize how changes in a feature’s value translate into positive or negative contributions to the model output, while simultaneously accounting for interactions with other features through color encoding.
Fig. 13 illustrates the SHAP dependence plots for each smart city dimension, namely Smart_Mobility, Smart_Environment, Smart_Government, Smart_Economy, Smart_People, and Smart_Living. In each subplot, the horizontal axis represents the feature value, whereas the vertical axis denotes the corresponding SHAP value, indicating the magnitude and direction of influence on the prediction. The color gradient reflects the interacting feature, enabling the identification of synergistic or moderating effects among dimensions. Additionally, the highlighted red star marks the position of Edmonton, providing a city-specific interpretation within the broader distribution of observations.

Figure 13: SHAP dependence plots illustrating how individual smart city features affect model predictions, with Edmonton highlighted as a reference case.
These plots collectively demonstrate how increases or decreases in specific smart city indicators affect model predictions across different regimes. For example, monotonic trends suggest a consistently positive or negative relationship, whereas curved or clustered patterns indicate nonlinear effects and interaction-driven behavior. By situating Edmonton within each dependence plot, the analysis offers an interpretable comparison between local and global effects, supporting evidence-based insights for urban performance assessment and policy evaluation.
4.3 Regression Performance after Feature Selection
Following the feature selection phase, the machine learning models were re-evaluated to assess the impact of reduced feature sets on predictive performance. The results, presented in Table 8, demonstrate substantial improvements in model accuracy across all evaluated metrics.

Notably, the ExtraTreesRegressor continued to outperform the other models, achieving a significantly reduced MSE (0.00151927), RMSE (0.038977812), and MAE (0.009563331) after feature selection. The model also maintained a high R², confirming its robustness on the reduced feature set.
Conversely, the Gradient Boosting model, despite some improvement, remained the weakest performer, reflecting the inherent challenges of applying gradient boosting to high-dimensional, non-linear datasets without extensive tuning.
Evaluating the overall performance of machine learning models across multiple error and accuracy metrics is essential for selecting the most suitable algorithm for predictive tasks. Fig. 14 presents a radar chart comparing six widely used regression models. The chart captures a range of performance metrics, including mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean bias error (MBE), the correlation coefficient (r), and R-squared (R²).

Figure 14: Radar chart for model metrics.
Assessing the normality of error and performance metrics is a crucial step in validating the assumptions of many statistical models and machine learning algorithms. Fig. 15 presents the Q-Q (Quantile-Quantile) plots for a range of metrics, including mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean bias error (MBE), the correlation coefficient (r), and R-squared (R²), comparing their empirical quantiles against those of a theoretical normal distribution.

Figure 15: Q-Q plots for all metrics.
4.4 Regression Performance after Hyperparameter Optimization
The final phase of the analysis involved hyperparameter optimization, aimed at fine-tuning the selected models to maximize predictive accuracy. This phase utilized the bSiba algorithm in combination with the best-performing base model, ExtraTreesRegressor, as well as several other benchmark optimizers, including HHO, GWO, WAO, BBO, MVO, SBO, FA, GSA, and SAO.
The results of this optimization phase are presented in Table 9. The bSiba + ExtraTreesRegressor combination achieved the best overall performance, attaining a remarkably low MSE that surpassed all benchmark optimizer combinations.

Other optimizers, including HHO, GWO, and WAO, also demonstrated strong performance, though they were generally outperformed by the bSiba algorithm in terms of both accuracy and stability. This outcome underscores the critical role of advanced metaheuristic optimization in achieving state-of-the-art predictive performance in smart city analytics.
Overall, the empirical results indicate that the proposed bSiba algorithm, in combination with the ExtraTreesRegressor, consistently outperformed other optimization approaches across all evaluation phases. This superior performance can be attributed to the algorithm's ability to efficiently navigate high-dimensional feature spaces, identify the most relevant attributes, and fine-tune hyperparameters for maximum predictive accuracy. These findings validate the effectiveness of the DLSA framework for complex, real-world data analysis, providing a robust foundation for future smart city applications.

Understanding the relationship between key error metrics is essential for assessing the overall performance and stability of machine learning models. Fig. 16 presents a contour plot with scatter overlay, illustrating the joint distribution of mean absolute error (MAE) and root mean squared error (RMSE). The contour levels capture the density of data points, providing insights into the concentration of models with similar error characteristics. This type of visualization is particularly useful for identifying clusters of high-performing models and detecting potential outliers, facilitating more informed model selection and optimization.

Figure 16: Contour plot with scatter overlay: MAE vs. RMSE.
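A contour-with-scatter view of this kind can be reproduced with standard tooling. The sketch below uses synthetic MAE/RMSE values (the actual per-model errors are not reproduced here) and a Gaussian kernel density estimate for the contour levels; all data and file names are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Hypothetical per-model error metrics; RMSE >= MAE by definition.
mae = rng.uniform(0.02, 0.12, 200)
rmse = mae + rng.uniform(0.0, 0.05, 200)

# Estimate the joint density of (MAE, RMSE) for the contour levels.
kde = gaussian_kde(np.vstack([mae, rmse]))
xg, yg = np.meshgrid(np.linspace(mae.min(), mae.max(), 100),
                     np.linspace(rmse.min(), rmse.max(), 100))
density = kde(np.vstack([xg.ravel(), yg.ravel()])).reshape(xg.shape)

fig, ax = plt.subplots()
ax.contourf(xg, yg, density, levels=10, cmap="viridis", alpha=0.7)
ax.scatter(mae, rmse, s=8, c="white", edgecolors="k", linewidths=0.3)
ax.set_xlabel("MAE")
ax.set_ylabel("RMSE")
fig.savefig("mae_vs_rmse_contour.png", dpi=150)
```

Dense contour regions then correspond to clusters of models with similar error profiles, while isolated scatter points flag potential outliers.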
Evaluating the distribution and variability of key performance metrics across different models is essential for understanding their stability and reliability. Fig. 17 presents box plots with horizontal swarm overlays for a range of metrics, including mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean bias error (MBE), correlation coefficient (r), R-squared (

Figure 17: Box plot with horizontal swarm plot for metrics.
Cross-Validation and Generalization Analysis
The empirical evaluation was conducted using a statistically rigorous validation framework to ensure reliability and reproducibility of the reported findings. The dataset was partitioned using a 70%/15%/15% Train–Validation–Test split to prevent information leakage and enable unbiased assessment of generalization performance. Hyperparameter optimization was performed using 10-fold cross-validation repeated five times, thereby reducing variance caused by random data partitioning and improving stability of model selection. All metaheuristic algorithms were executed for 200 epochs per run, ensuring identical convergence opportunities under equal computational budgets.
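The validation protocol described above can be sketched as follows. The data, model size, and random seeds are placeholders; the structure (a two-stage 70%/15%/15% split plus 10-fold cross-validation repeated five times) mirrors the protocol, not the actual experiment.

```python
import numpy as np
from sklearn.model_selection import train_test_split, RepeatedKFold, cross_val_score
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))                       # stand-in for the smart-city features
y = X[:, 0] * 0.5 + rng.normal(scale=0.1, size=400)  # synthetic target

# 70%/15%/15% Train-Validation-Test split via a two-stage split.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=1)

# 10-fold CV repeated 5 times reduces variance from random partitioning.
cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=1)
scores = cross_val_score(ExtraTreesRegressor(n_estimators=50, random_state=1),
                         X_train, y_train, cv=cv, scoring="neg_mean_squared_error")
print(len(scores), scores.mean())
```

Model selection is then based on the mean of the 50 fold scores, while the held-out test split is touched only once, for the final unbiased estimate.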
Table 10 presents the comparative cross-validation and generalization performance of the Siba-based optimizer relative to several established metaheuristic algorithms when integrated with the ExtraTreesRegressor model. The evaluation includes training, validation, and testing

The comparative convergence behavior of all evaluated optimization algorithms is illustrated in Fig. 18. The figure presents the training and validation

Figure 18: Training and validation learning curves and performance comparison across optimization algorithms.
Statistical Significance Analysis
To verify that the observed performance differences among the evaluated metaheuristic algorithms are statistically significant, both parametric and non-parametric statistical tests were conducted. A one-way ANOVA test was applied to assess global differences across all algorithms, followed by Wilcoxon signed-rank tests to examine pairwise discrepancies relative to the Siba-based model.
The ANOVA results in Table 11 indicate a highly significant treatment effect (

The Wilcoxon signed-rank results in Table 12 confirm that the performance discrepancies between the Siba-based model and all competing optimizers are statistically significant (
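The two-stage testing procedure (a global ANOVA followed by pairwise Wilcoxon signed-rank tests against the Siba-based model) can be sketched with SciPy. The per-run MSE samples below are synthetic placeholders, not the reported results.

```python
import numpy as np
from scipy.stats import f_oneway, wilcoxon

rng = np.random.default_rng(7)
# Hypothetical per-run MSE values for three optimizers (30 runs each).
siba = rng.normal(0.0015, 0.0002, 30)
hho = rng.normal(0.0030, 0.0004, 30)
gwo = rng.normal(0.0028, 0.0004, 30)

# Global parametric test across all algorithms.
f_stat, p_anova = f_oneway(siba, hho, gwo)

# Non-parametric paired tests of each competitor against the Siba-based model.
_, p_hho = wilcoxon(siba, hho)
_, p_gwo = wilcoxon(siba, gwo)
print(p_anova, p_hho, p_gwo)
```

A significant ANOVA p-value justifies the pairwise follow-up; the Wilcoxon test is preferred for the pairwise stage because it does not assume normally distributed per-run errors.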

Fig. 19 illustrates the residual plot obtained from the ordinary one-way ANOVA conducted on the experimental dataset. Each point corresponds to the residual associated with an observation across the evaluated groups. The absence of clear trends, clustering, or curvature in the residual distribution suggests that the ANOVA model provides an adequate representation of the underlying data. Conversely, any visible patterns would imply the need for model refinement or the adoption of alternative statistical techniques.

Figure 19: Residual plot showing the distribution of ANOVA residuals across observations for model adequacy assessment.
A fundamental assumption underlying ordinary one-way Analysis of Variance (ANOVA) is homoscedasticity, which requires that the variances of the residuals are approximately equal across all compared groups. Violation of this assumption may lead to biased F-statistics and unreliable inferential conclusions. Consequently, prior to interpreting ANOVA outcomes, it is essential to visually and statistically assess whether the spread of residuals remains consistent across experimental conditions or model configurations.
Fig. 20 depicts the homoscedasticity plot associated with the ordinary one-way ANOVA performed on the experimental data. Each point represents the distribution of residuals for a given group, enabling direct visual inspection of variance patterns. A relatively uniform vertical dispersion of points across groups indicates that the homogeneity of variance assumption is reasonably satisfied. Conversely, systematic patterns or funnel-shaped distributions would suggest heteroscedasticity and the potential need for alternative statistical tests or variance-corrected ANOVA procedures.
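Beyond visual inspection, the homogeneity-of-variance assumption can be checked statistically, for example with Levene’s test. The residual groups below are synthetic; a small p-value would indicate heteroscedasticity and the need for variance-corrected procedures.

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(3)
# Hypothetical residuals for three groups with broadly similar spread.
groups = [rng.normal(0, 1.0, 40), rng.normal(0, 1.1, 40), rng.normal(0, 0.9, 40)]

stat, p = levene(*groups)  # H0: residual variances are equal across groups
print(stat, p)
```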

Figure 20: Homoscedasticity plot illustrating the residual variance distribution across groups for the ordinary one-way ANOVA.
In order to rigorously examine whether the observed differences among the hybrid optimization models are statistically significant, a one-way ANOVA test was conducted. The ANOVA framework is widely used in experimental studies to determine whether multiple independent groups differ significantly in terms of their mean performance values. Unlike simple pairwise comparisons, one-way ANOVA evaluates all competing methods simultaneously, thereby reducing the risk of Type I error and providing a global assessment of statistical differences across algorithms.
Fig. 21 illustrates the heat map representation of the ordinary one-way ANOVA results obtained from the experimental dataset. The color intensity encodes the magnitude and direction of the statistical differences among the compared hybrid models. Specifically, variations in color indicate the relative strength of performance differences across benchmark functions, while the scale bar quantifies the corresponding statistical deviations. Such visualization facilitates an intuitive interpretation of comparative dominance, sensitivity to problem characteristics, and robustness of each hybrid model.

Figure 21: Heat map visualization of ordinary one-way ANOVA results for the hybrid optimization models across benchmark functions.
Evaluating the effectiveness of different optimization algorithms requires a comprehensive statistical analysis that captures both central tendency and variability of performance across multiple independent runs. Due to the stochastic nature of metaheuristic algorithms, single-run results are insufficient to draw reliable conclusions. Therefore, boxplot representations are widely adopted in the literature, as they provide a compact yet informative visualization of the distribution of objective function values, including the median, interquartile range, and potential outliers. Such graphical tools enable a robust comparison of algorithmic stability, convergence behavior, and solution quality.
Fig. 22 presents the statistical comparison of the considered algorithms based on their performance over repeated experiments. The boxplots illustrate the dispersion characteristics and highlight differences in robustness and optimization capability among the competing methods. In particular, the median line indicates the typical solution quality achieved by each algorithm, while the spread reflects the consistency of convergence across independent trials. Algorithms with lower median values and smaller interquartile ranges demonstrate superior exploitation ability and greater stability.

Figure 22: Statistical comparison of optimization algorithms using boxplot representation over multiple independent runs.
Computational Efficiency and Resource Utilization
To evaluate computational efficiency, an empirical runtime analysis was conducted under identical hardware and software conditions. All optimizers were executed using the same population size, iteration budget, and evaluation protocol to ensure fairness. Table 13 reports the average execution time, standard deviation of runtime, memory usage, and CPU utilization across repeated runs.

As shown in Table 13, the proposed Siba-based optimizer achieves the lowest average execution time (12.47), the smallest runtime variability (Std = 0.38), and reduced memory and CPU consumption compared to competing metaheuristic optimizers. These results indicate that the proposed approach is computationally efficient in practice and scales favorably under identical experimental conditions.
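A measurement harness of this kind can be built from the standard library alone. The sketch below times a callable over repeated runs and records its peak traced memory; the workload and repeat count are illustrative assumptions, and real experiments would additionally pin hardware and sample CPU utilization (e.g., via an external monitor).

```python
import time
import tracemalloc


def measure(fn, repeats=5):
    """Time a callable over several repeats and record peak traced memory."""
    times = []
    tracemalloc.start()
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    mean = sum(times) / len(times)
    std = (sum((t - mean) ** 2 for t in times) / len(times)) ** 0.5
    return mean, std, peak / 1e6  # mean seconds, std seconds, peak MB


# Placeholder workload standing in for one optimizer run.
mean_t, std_t, peak_mb = measure(lambda: sum(i * i for i in range(100_000)))
print(mean_t, std_t, peak_mb)
```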
Efficient utilization of computational resources is a fundamental criterion for assessing the practicality and robustness of hybrid metaheuristic optimization algorithms. In particular, memory consumption and CPU utilization jointly reflect the scalability, computational overhead, and real-world deployability of such algorithms, especially when additional enhancement mechanisms such as ETR are incorporated. As shown in Fig. 23, a comprehensive comparative analysis is conducted to evaluate both memory usage and CPU consumption across a range of hybrid optimization methods integrated with ETR.

Figure 23: Comparative analysis of resource usage for hybrid optimization algorithms combined with ETR: (left) memory consumption in megabytes and (right) CPU utilization in percentage.
The left subfigure in Fig. 23 presents the memory usage comparison, offering insight into the relative memory overhead introduced by each hybrid approach. This analysis highlights how algorithmic design choices and hybridization strategies influence memory requirements, which is particularly critical for large-scale optimization tasks and resource-constrained computing environments. In contrast, the right subfigure illustrates CPU usage, providing a detailed perspective on the computational intensity and processing demands associated with each hybrid configuration. Together, these results enable a holistic evaluation of resource efficiency, supporting informed algorithm selection by balancing optimization performance against computational cost.
Understanding the relative performance characteristics of hybrid optimization algorithms requires not only direct metric comparison but also higher-level pattern analysis that reveals similarities, trade-offs, and grouping behavior among methods. To this end, Fig. 24 provides a comprehensive visual assessment of algorithmic performance using both clustering-based visualization and hierarchical grouping techniques, enabling a deeper interpretation of multi-metric evaluation results.

Figure 24: Algorithm performance analysis: (left) performance clustering heatmap; (right) hierarchical clustering dendrogram illustrating similarity relationships among hybrid algorithms integrated with ETR.
The left subfigure in Fig. 24 presents a performance clustering heatmap that summarizes normalized values of execution time, standard deviation, memory consumption, and CPU utilization across the evaluated hybrid algorithms combined with ETR. By mapping these metrics onto a color-coded matrix, the heatmap facilitates intuitive identification of performance trends, highlighting algorithms that consistently demonstrate lower computational cost or, conversely, higher resource demands. This visualization enables rapid comparative assessment and supports the identification of efficiency patterns across multiple performance dimensions simultaneously.
Complementarily, the right subfigure in Fig. 24 illustrates the hierarchical clustering of the same performance indicators. The dendrogram groups algorithms according to similarity in their overall computational profiles, thereby revealing structural relationships and performance proximity among hybrid approaches. This hierarchical representation assists in distinguishing closely related algorithmic behaviors from distinctly performing methods, offering additional analytical depth beyond direct metric comparison. Together, the two visualizations provide a unified and multi-perspective understanding of algorithm performance.
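The clustering underlying such a dendrogram can be sketched with SciPy: each algorithm is described by its normalized resource profile, and Ward linkage groups algorithms with similar computational behavior. The profile values below are hypothetical (only the Siba runtime figures echo Table 13), as is the choice of two clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical resource profile per algorithm: [time, std, memory, cpu].
profiles = {
    "Siba": [12.47, 0.38, 210.0, 41.0],
    "HHO": [15.10, 0.62, 240.0, 48.0],
    "GWO": [14.80, 0.55, 235.0, 47.0],
    "SAO": [21.30, 1.10, 310.0, 62.0],
}
names = list(profiles)
X = np.array([profiles[n] for n in names], dtype=float)

# Min-max normalize each metric column so no single unit dominates distances.
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

Z = linkage(X, method="ward")  # linkage matrix feeding the dendrogram
labels = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(names, labels)))
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would render the tree itself; here the flat cluster labels already separate the efficient methods from the resource-heavy one.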
Computational efficiency is a central performance indicator in the evaluation of metaheuristic optimization algorithms, particularly when hybrid enhancement mechanisms such as ETR are incorporated. Execution time directly reflects algorithmic complexity, convergence behavior, and implementation overhead, all of which influence the suitability of a method for large-scale or time-sensitive applications. In this context, Fig. 25 presents a comparative analysis of execution times across the considered hybrid metaheuristic models.

Figure 25: Execution time comparison of metaheuristic algorithms with standard deviation (mean
A comparative analysis of execution time across different metaheuristic algorithms is illustrated in Fig. 25. This figure presents the average execution time, along with standard deviation, for each algorithm when integrated with the ETR model, providing insight into their computational efficiency and consistency. By incorporating error bars, the figure not only highlights the mean execution time but also captures the variability in runtime, which is critical for evaluating algorithm stability under repeated executions.
The results demonstrate clear differences in computational cost among the considered algorithms. Some methods exhibit significantly lower execution times, indicating faster convergence and reduced computational overhead, while others require more processing time, reflecting increased algorithmic complexity or more extensive search processes. The progression of execution time across algorithms suggests a trade-off between computational expense and potential optimization performance, where more sophisticated algorithms may achieve improved results at the cost of longer runtimes.
Additionally, the standard deviation values provide valuable insight into the reliability of each algorithm’s execution time. Algorithms with smaller deviations indicate more consistent performance, whereas larger deviations suggest variability that may affect predictability in real-world deployments. This aspect is particularly important in time-sensitive smart city services, where both efficiency and reliability are essential.
Overall, the figure offers a comprehensive perspective on the computational characteristics of metaheuristic algorithms, supporting informed decision-making in selecting methods that balance execution speed, stability, and optimization effectiveness.
The reported execution time, memory usage, and CPU consumption across all benchmark optimizers provide indirect evidence of scalability behavior. Since all algorithms were evaluated under identical population sizes and iteration counts, the comparatively lower computational overhead of DLSA suggests stable performance as search-space dimensionality increases. These results indicate that DLSA maintains computational efficiency without disproportionate growth in resource consumption, supporting its scalability for structured smart city datasets.
Ablation Study
To isolate the contributions of the major components of the proposed pipeline, an ablation study was performed under the same dataset split and evaluation protocol used throughout this work. The ablation is designed to quantify how predictive performance changes when moving from (i) conventional regression baselines without copula modeling and without optimization to (ii) an optimization-driven pipeline where metaheuristic algorithms tune the ExtraTreesRegressor hyperparameters without applying feature selection. This staged analysis provides an interpretable decomposition of the performance gains and clarifies which parts of the pipeline are primarily responsible for the observed improvements.
Reference Models without Copula and without Optimization
Table 14 reports the results of several widely used regression models trained directly on the dataset without copula-based preprocessing and without metaheuristic optimization. These models provide a conservative reference point and enable quantifying the added value of optimization-based model tuning.

As shown in Table 14, the strongest baseline performance is obtained by the ExtraTreesRegressor, which achieves

Figure 26: Dendrogram for hierarchical clustering of regression models based on performance similarity.
Fig. 26 presents a dendrogram constructed using distance-based hierarchical clustering. The vertical axis represents the linkage distance, which quantifies the dissimilarity between models, while the horizontal axis lists the evaluated regression algorithms. Models that merge at lower distance values demonstrate higher similarity in their performance metrics, indicating closely related predictive behavior. Conversely, branches that join at larger distances reflect more distinct computational or predictive characteristics. This visualization provides an intuitive interpretation of inter-model relationships and supports a structured understanding of how ensemble methods and tree-based approaches compare in terms of overall performance behavior.
A comprehensive evaluation of regression model performance requires not only direct comparison of raw metric values but also a normalized assessment that highlights relative improvements across multiple criteria. In this regard, Fig. 27 presents an improvement ratio matrix computed relative to the best-performing model for each evaluation metric. This normalization strategy enables consistent cross-metric comparison by scaling performance values within the range

Figure 27: Improvement ratio matrix of regression models relative to the best-performing model per metric.
Fig. 27 illustrates the relative performance of the evaluated models across error-based metrics (MSE, RMSE, MAE, MBE), correlation-based indicators (
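The improvement-ratio normalization can be sketched as follows: each metric column is scaled relative to the best-performing model for that metric, so the best model always scores 1.0. The model names, metric values, and metric directions below are illustrative assumptions, not the reported results.

```python
import numpy as np

models = ["ExtraTrees", "RandomForest", "GradientBoosting"]
# Hypothetical metric values per model; columns = [MSE, MAE, R2].
vals = np.array([
    [0.0015, 0.025, 0.98],
    [0.0042, 0.041, 0.95],
    [0.0060, 0.050, 0.93],
])
lower_is_better = np.array([True, True, False])

ratios = np.empty_like(vals)
for j in range(vals.shape[1]):
    col = vals[:, j]
    if lower_is_better[j]:
        ratios[:, j] = col.min() / col   # best (lowest) error scores 1.0
    else:
        ratios[:, j] = col / col.max()   # best (highest) fit scores 1.0

print(np.round(ratios, 3))
```

Because every entry lies in (0, 1], the resulting matrix is directly comparable across metrics with different units and directions, which is exactly what the heatmap-style figure visualizes.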
Optimization without Feature Selection
Table 15 presents the results obtained when metaheuristic optimization is used to tune the ExtraTreesRegressor hyperparameters, while feature selection is intentionally disabled. This setting isolates the contribution of hyperparameter optimization and enables a direct comparison across optimizers under an identical objective function and computational budget.

The comparison in Table 15 demonstrates that hyperparameter optimization is a dominant driver of performance gains. Relative to the untuned ExtraTreesRegressor baseline (Table 14), the best optimized configuration (Siba + ExtraTreesRegressor) improves
Beyond the best-performing optimizer, all compared metaheuristics yield consistent improvements over the non-optimized baselines, with
Taken together, Ablation A and Ablation B provide a clear decomposition of pipeline effects. The baseline results establish that conventional models provide only moderate predictive fidelity, while the optimization-only setting demonstrates that a major proportion of the observed performance gain originates from hyperparameter tuning. The remaining pipeline components (copula-based dependency modeling and feature selection), when activated in the full proposed framework, provide additional refinement and stability as discussed in the main comparative results.

A reliable evaluation of predictive model performance requires not only the analysis of central tendency but also an assessment of variability across repeated experiments. Averaged metric values alone may obscure instability or sensitivity in model behavior; therefore, incorporating dispersion measures is essential for robust comparison. In this context, Fig. 28 provides a consolidated view of model performance by presenting mean values alongside their corresponding error bars for multiple evaluation metrics.

Figure 28: Comparison of model evaluation metrics using mean values and error bars.
Fig. 28 illustrates the mean performance of the evaluated models across error-based metrics (MSE, RMSE, MAE, and MBE) and goodness-of-fit and efficiency indicators (
Statistical validation of model evaluation metrics is essential to ensure the reliability of subsequent comparative and inferential analyses. In particular, assessing whether metric values follow an approximately normal distribution is a key prerequisite for the application of many parametric statistical methods and for the meaningful interpretation of central tendency and dispersion measures. In this regard, Fig. 29 provides a comprehensive normality assessment of all considered performance metrics using quantile–quantile (Q–Q) plots.

Figure 29: Q–Q plots for all evaluation metrics, comparing empirical quantiles with theoretical normal quantiles to assess distributional characteristics and normality assumptions.
Fig. 29 illustrates individual Q–Q plots for error-based metrics (MSE, RMSE, MAE, and MBE), correlation and goodness-of-fit indicators (
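The quantities behind a Q–Q plot can be computed directly with `scipy.stats.probplot`, which returns the theoretical normal quantiles, the ordered sample values, and the correlation of the fitted Q–Q line (values of `r` near 1 indicate approximate normality). The per-run RMSE sample below is synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
# Hypothetical RMSE values across repeated runs.
rmse_runs = rng.normal(loc=0.039, scale=0.003, size=60)

# probplot pairs ordered sample values with theoretical normal quantiles
# and fits a least-squares line (slope, intercept, correlation r).
(theoretical_q, ordered_vals), (slope, intercept, r) = stats.probplot(rmse_runs, dist="norm")
print(r)
```

When `r` drops noticeably below 1 or the tails of the plot bend away from the fitted line, non-parametric tests (such as the Wilcoxon procedure used above) are the safer inferential choice.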
This research shows that the Dynamic Leader Sibha Algorithm (DLSA) can substantially improve analytics in IoT-powered smart cities. By integrating robust techniques for feature selection and hyperparameter configuration, the proposed framework addressed the challenges of real urban datasets and delivered better predictions. This result is notable given the complex, interconnected variables that make up smart city data. Enhancing the models with copula functions supported a clearer understanding of urban performance. Importantly, these gains are not solely attributable to model choice; rather, they emerge from the coordinated interaction between (i) feature selection that reduces redundancy and noise, (ii) hyperparameter optimization that aligns model capacity with data complexity, and (iii) dependence-aware modeling that better reflects the joint behavior of urban variables.
A notable result of this study is the marked improvement in predictive accuracy achieved by DLSA-driven feature selection and optimization. Reaching these results required identifying the most useful features and fine-tuning a wide range of hyperparameters. The ExtraTreesRegressor already performed strongly at baseline, and the bSiba process then raised its accuracy further. This underscores that reducing data dimensionality and refining model settings are essential for analyzing smart city data. From a learning-theoretic perspective, the observed improvements are consistent with the bias–variance trade-off: removing weakly informative predictors can lower variance and improve generalization, while hyperparameter tuning can prevent underfitting or excessive complexity. The improvement-ratio perspective also suggests that the proposed pipeline yields more balanced performance across complementary criteria (error magnitude, goodness-of-fit, and efficiency-style indices), which is particularly valuable when smart-city decisions must remain reliable under changing operational conditions.
Furthermore, comparing the DLSA framework with Harris Hawks Optimization (HHO), Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WAO), Biogeography-Based Optimization (BBO), Multiverse Optimization (MVO), Satin Bowerbird Optimizer (SBO), Firefly Algorithm (FA), Gravitational Search Algorithm (GSA), and Simulated Annealing Optimization (SAO) showed that DLSA performs better. The fact that the DLSA-tuned models achieve a better MSE and larger
This result demonstrates that innovative algorithm design is essential for modern urban analytics. While traditional optimization methods may converge prematurely and often require careful tuning, the DLSA’s hierarchical leader-follower system keeps the search robust by dynamically balancing exploration and exploitation. Through its systematic coordination of agents, this approach reduces the risk of entrapment in local optima and accelerates convergence considerably, making it well suited to IoT applications in smart cities. In practical terms, the dynamic leader mechanism can be interpreted as an adaptive control policy over the search process: when the population begins to homogenize, leadership changes reintroduce diversity; when promising regions emerge, leader guidance intensifies exploitation. Such behavior is especially beneficial in high-dimensional feature/hyperparameter spaces, where premature convergence is a common failure mode for many metaheuristics.
A further consideration is computational cost. Resource-usage evidence (memory and CPU) and execution-time variability are not merely implementation details; they shape the feasibility of real-time or near-real-time smart-city services. The reported comparisons suggest that DLSA can deliver accuracy improvements without disproportionate overhead, but the presence of variability (e.g., standard deviations in runtime) highlights that stability and reproducibility should be evaluated alongside mean performance. Accordingly, practitioners should select deployment configurations by jointly considering predictive gains and computational constraints, particularly for edge or fog settings where memory and CPU budgets are limited.
From a statistical validation standpoint, distributional diagnostics (e.g., Q–Q plot behavior across metrics) also matter for interpretation. When metric distributions deviate from normality, relying solely on parametric summaries may understate uncertainty; therefore, complementing mean comparisons with robust statistics or non-parametric tests would further strengthen inferential confidence. These diagnostics are especially relevant in smart-city datasets, which often exhibit non-stationarity, heavy tails, and seasonal effects that can influence both optimization dynamics and model evaluation.
The adaptability and scalability of the DLSA also make it an attractive option for many future smart city applications, including predictive maintenance, energy management, traffic management, and resilience planning. Because it handles high-dimensional data well, the algorithm helps urban planners, policymakers, and technology developers make cities more sustainable and resilient. In these settings, the framework’s modularity is a practical advantage: the same DLSA-driven feature/hyperparameter optimization layer can be paired with different learners (tree ensembles, boosting, or even deep models) and different dependence structures (including alternative copula families) depending on domain requirements. Moreover, integrating streaming or incremental learning would make the approach better suited to continuously evolving IoT environments, where concept drift can otherwise erode model reliability over time.
Despite the encouraging results, several limitations deserve attention. First, performance may depend on dataset-specific characteristics (sensor density, missingness patterns, and spatial autocorrelation), so multi-city validation would improve generalizability. Second, the sensitivity of DLSA to its own control parameters should be studied through ablation analyses to determine which components drive the largest gains. Third, fairness and robustness considerations (e.g., performance across neighborhoods with unequal sensor coverage) are important for responsible smart-city decision support and should be explicitly evaluated in future work.
It is important to clarify the scope of scalability claims in relation to real-time IoT deployment. The Smart Cities Index dataset used in this study is static and evaluated in a batch-learning setting rather than in a live streaming environment. Therefore, while the reported execution time, memory usage, and CPU utilization demonstrate computational feasibility and controlled overhead, the framework was not experimentally validated under continuous data-stream conditions. In this context, scalability should be interpreted as structural and computational compatibility with real-time or near-real-time applications, rather than as direct empirical validation in a streaming IoT infrastructure. Future work may extend the proposed framework toward online, incremental, or sliding-window learning scenarios to explicitly evaluate performance under dynamic urban data flows.
To conclude, this research illustrates that coupling advanced metaheuristic optimization with IoT data analytics through the DLSA can significantly improve the planning and management of smart cities. With its efficient approach to feature selection and hyperparameter tuning, the DLSA helps set a new standard for intelligent city analytics, enabling smarter, data-driven urban areas. Overall, the evidence supports DLSA as a promising optimization backbone for urban predictive systems, particularly when the objective is to achieve robust, multi-metric performance under realistic computational constraints and complex variable dependencies.
This study shows that the DLSA has strong potential to improve how IoT data is analyzed in smart cities. Through advanced feature selection, hyperparameter optimization, and copula-based dependency modeling, the proposed framework achieved much better predictions than the baseline at lower computational cost. Empirical testing shows that the DLSA outperforms traditional optimization approaches across several evaluation metrics, making data-driven urban management and planning more robust. Its effectiveness demonstrates that the algorithm can handle challenging smart city data and extract usable information from IoT sources.
It is important to acknowledge that the empirical validation presented in this study is based on a single publicly available dataset (Smart Cities Index). While the results demonstrate strong performance improvements under controlled experimental conditions, broader generalizations to all IoT-driven smart city environments should be interpreted with caution. Differences in sensor density, feature distributions, data quality, and urban infrastructure characteristics may influence optimization behavior across datasets. Therefore, although the proposed framework exhibits promising scalability and robustness within the evaluated setting, comprehensive multi-dataset validation remains a necessary direction for future research to fully establish cross-domain generalizability.
Ongoing work aims to extend the DLSA framework to support multi-objective optimization, process real-time data, and handle simulations of large urban environments. Incorporating explainable AI approaches into the DLSA pipeline could also make its models more interpretable for urban planners and policymakers. As smart cities evolve, the DLSA can help them adapt readily to new information, a capability essential for their continued growth.
Acknowledgement: Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R754), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Funding Statement: The authors received no specific funding for this study.
Author Contributions: The authors confirm contribution to the paper as follows: Conceptualization, Safaa Zaman and Ebrahim A. Mattar; methodology, El-Sayed M. El-Kenawy; software, El-Sayed M. El-Kenawy; validation, El-Sayed M. El-Kenawy and Marwa M. Eid; formal analysis, Doaa Sami Khafaga; investigation, Marwa M. Eid; resources, Doaa Sami Khafaga; data curation, Safaa Zaman; writing—original draft preparation, Safaa Zaman and Marwa M. Eid; writing—review and editing, Ebrahim A. Mattar and Doaa Sami Khafaga; visualization, Marwa M. Eid and El-Sayed M. El-Kenawy; supervision, El-Sayed M. El-Kenawy; project administration, El-Sayed M. El-Kenawy. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: The data are openly available at https://www.kaggle.com/datasets/magdamonteiro/smart-cities-index-datasets.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Bibri SE, Krogstie J, Kaboli A, Alahi A. Smarter eco-cities and their leading-edge artificial intelligence of things solutions for environmental sustainability: a comprehensive systematic review. Environ Sci Ecotechnol. 2024;19:100330. doi:10.1016/j.ese.2024.100330.
2. Costa DG, Bittencourt JCN, Oliveira F, Peixoto JPJ, Jesus TC. Achieving sustainable smart cities through geospatial data-driven approaches. Sustainability. 2024;16(2):640. doi:10.3390/su16020640.
3. Jacques E, Júnior AN, Paris SRD, Francescatto MB, Siluk JCM. Smart cities and innovative urban management: perspectives of integrated technological solutions in urban environments. Heliyon. 2024;10(6):e27850. doi:10.1016/j.heliyon.2024.e27850.
4. Okonta ED, Vukovic V. Smart cities software applications for sustainability and resilience. Heliyon. 2024;10:e32654. doi:10.1016/j.heliyon.2024.e32654.
5. Veloso A, da Fonseca FP, Ramos RAR. Insights from smart city initiatives for urban sustainability and contemporary urbanism. Smart Cities. 2024;7:3188–209. doi:10.3390/smartcities7060124.
6. Vespasiano F, Gujrati T, Abbasi B, Bisegna F. Integrated smart city solutions: a multi-axis approach for sustainable development in Varanasi. Sustainability. 2025;17(7):3152. doi:10.3390/su17073152.
7. Javidroozi V, Carter C, Grace MJ, Shah H. Smart, sustainable, green cities: a state-of-the-art review. Sustainability. 2023;15(6):5353. doi:10.3390/su15065353.
8. Farooq MS, Saleem M, Khan MA, Khan MF, Siddiqui SY, Aslam MS, et al. Interpretable federated learning model for cyber intrusion detection in smart cities with privacy-preserving feature selection. Comput Mater Contin. 2025;85(3):5183–206. doi:10.32604/cmc.2025.069641.
9. Cesario E. Big data analytics and smart cities: applications, challenges, and opportunities. Front Big Data. 2023;6:1149402. doi:10.3389/fdata.2023.1149402.
10. de Castro Paes V, Pessoa CHM, Pagliusi RP, Barbosa CE, Argôlo M, Lima Y, et al. Analyzing the challenges for future smart and sustainable cities. Sustainability. 2023;15(10):7996. doi:10.3390/su15107996.
11. Abu-Rayash A, Dincer I. Development and application of an integrated smart city model. Heliyon. 2023;9(4):e14347. doi:10.1016/j.heliyon.2023.e14347.
12. Tijjani KS, Levent YS, Levent T. Smart cities in the global context: geographical analyses of regional differentiations. Systems. 2025;13(4):296. doi:10.3390/systems13040296.
13. Lin X, Prabowo A, Razzak I, Xue H, Amos M, Behrens S, et al. A gap in time: the challenge of processing heterogeneous IoT point data in buildings. arXiv:2405.14267. 2024.
14. Zhao YF, Xie J, Sun L. On the data quality and imbalance in machine learning-based design and manufacturing—a systematic review. Engineering. 2025;45(18):105–31. doi:10.1016/j.eng.2024.04.024.
15. Alanazi MD, Elsayed G, Alanazi TM, Sahbani A, Yousef A. Graph neural network-assisted lion swarm optimization for traffic congestion prediction in intelligent urban mobility systems. Comput Model Eng Sci. 2025;145(2):2277–309. doi:10.32604/cmes.2025.070726.
16. Ebrahimi S, Arik SO, Dong Y, Pfister T. LANISTR: multimodal learning from structured and unstructured data. arXiv:2305.16556. 2023.
17. Zhang Z, Ren S, Qian X, Duffield N. Towards invariant time series forecasting in smart cities. arXiv:2405.05430. 2024.
18. Al-Maliki S, Bouanani FE, Abdallah M, Qadir J, Al-Fuqaha A. Addressing data distribution shifts in online machine learning powered smart city applications using augmented test-time adaptation. IEEE Internet Things Mag. 2024;7(4):116–24. doi:10.1109/iotm.001.2300135.
19. Alrasheedi AF, Alnowibet KA, Saxena A, Sallam KM, Mohamed AW. Chaos embed marine predator (CMPA) algorithm for feature selection. Mathematics. 2022;10(9):1411. doi:10.3390/math10091411.
20. Cheng X. A comprehensive study of feature selection techniques in machine learning models. SSRN Electron J. 2024 [cited 2025 Feb 26]. Available from: https://ssrn.com/abstract=5154947.
21. Papastefanopoulos V, Linardatos P, Panagiotakopoulos T, Kotsiantis S. Multivariate time-series forecasting: a review of deep learning methods in internet of things applications to smart cities. Smart Cities. 2023;6(5):5.
22. Syed AS, Sierra-Sosa D, Kumar A, Elmaghraby A. Making cities smarter—optimization problems for the IoT enabled smart city development: a mapping of applications, objectives, constraints. Sensors. 2022;22(12):4380. doi:10.3390/s22124380.
23. Almutairi MS. Evolutionary multi-objective feature selection algorithms on multiple smart sustainable community indicator datasets. Sustainability. 2024;16(4):4.
24. Gadekallu TR, Kumar N, Baker T, Natarajan D, Boopathy P, Maddikunta PKR. Moth-Flame Optimization based ensemble classification for intrusion detection in intelligent transport system for smart cities. Microprocess Microsyst. 2023;103:104935. doi:10.1016/j.micpro.2023.104935.
25. Gill KS, Dhillon A. A hybrid machine learning framework for intrusion detection system in smart cities. Evol Syst. 2024;15(6):2005–19. doi:10.1007/s12530-024-09603-7.
26. Kayode Saheed Y, Harazeem Abdulganiyu O, Ait Tchakoucht T. A novel hybrid ensemble learning for anomaly detection in industrial sensor networks and SCADA systems for smart city infrastructures. J King Saud Univ-Comput Inf Sci. 2023;35(5):101532. doi:10.1016/j.jksuci.2023.03.010.
27. Dhanvijay MM, Patil SC. Energy efficient deep reinforcement learning approach to control the traffic flow in IoT networks for smart city. J Ambient Intell Humaniz Comput. 2024;15(12):3945–61. doi:10.1007/s12652-024-04869-w.
28. Chahardoli M, Osati Eraghi N, Nazari S. An energy consumption prediction approach in smart cities by CNN-LSTM network improved with game theory and Namib Beetle Optimization (NBO) algorithm. J Supercomput. 2025;81(2):403. doi:10.1007/s11227-024-06811-5.
29. Rojek I, Mikolajewski D, Dorozynski J, Dostatni E, Mrela A. An ML-based solution in the transformation towards a sustainable smart city. Appl Sci. 2024;14(18):8288. doi:10.3390/app14188288.
30. Qureshi KN, Ahmad A, Piccialli F, Casolla G, Jeon G. Nature-inspired algorithm-based secure data dissemination framework for smart city networks. Neural Comput Appl. 2021;33(17):10637–56. doi:10.1007/s00521-020-04900-z.
31. Ullah A, Anwar SM, Li J, Nadeem L, Mahmood T, Rehman A, et al. Smart cities: the role of Internet of Things and machine learning in realizing a data-centric smart environment. Complex Intell Syst. 2024;10(1):1607–37. doi:10.1007/s40747-023-01175-4.
32. Zhang H, Feng X. Reliability improvement and landscape planning for renewable energy integration in smart cities: a case study by digital twin. Sustain Energy Technol Assess. 2024;64:103714. doi:10.1016/j.seta.2024.103714.
33. Kousis A, Tjortjis C. Data mining algorithms for smart cities: a bibliometric analysis. Algorithms. 2021;14(8):242. doi:10.3390/a14080242.
34. Akbarpour N, Salehi-Amiri A, Hajiaghaei-Keshteli M, Oliva D. An innovative waste management system in a smart city under stochastic optimization using vehicle routing problem. Soft Comput. 2021;25(8):6707–27. doi:10.1007/s00500-021-05669-6.
35. Chen S, Hu X. Economic analysis of smart city infrastructure upgrades for sustainable development modeling in digital twin: hybrid fog technique to improve system reliability. Sustain Energy Technol Assess. 2024;67:103786. doi:10.1016/j.seta.2024.103786.
36. Ahmad S. Intelligent crowd density classification using improved metaheuristics with transfer learning model on smart cities. SN Comput Sci. 2024;5:1064. doi:10.1007/s42979-024-03435-7.
37. Ullah I, Noor A, Abbas M, Garg S, Choi BJ, Hassan MM, et al. Optimizing smart city services by utilizing appropriate characteristics of digital twin for urban excellence. Alexandria Eng J. 2025;122(2):399–410. doi:10.1016/j.aej.2025.02.085.
38. Monteiro M. Smart cities index datasets. 2019 [cited 2025 May 18]. Available from: https://www.kaggle.com/datasets/magdamonteiro/smart-cities-index-datasets.
39. Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006;63(1):3–42. doi:10.1007/s10994-006-6226-1.
40. Rokach L, Maimon O. Decision trees. In: Maimon O, Rokach L, editors. Data mining and knowledge discovery handbook. Boston, MA, USA: Springer; 2005. p. 165–92.
41. Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inform Theory. 1967;13(1):21–7. doi:10.1109/TIT.1967.1053964.
42. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016 Aug 13–17; San Francisco, CA, USA. p. 785–94. doi:10.1145/2939672.2939785.
43. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. arXiv:1706.09516. 2019. doi:10.48550/arXiv.1706.09516.
44. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statist. 2001;29(5):1189–232. doi:10.1214/aos/1013203451.
45. El-Kenawy ESM, Alhussan AA, Khafaga DS, Alharbi AH, Alzakari SA, Abdelhamid AA, et al. Dynamic leader sibha algorithm (DLSA): a novel hierarchical metaheuristic approach for solving engineering design problems. J Cybersecur Inform Manag. 2025;1(1):120–33. doi:10.54216/JCIM.160110.
46. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H. Harris hawks optimization: algorithm and applications. Future Gener Comput Syst. 2019;97:849–72. doi:10.1016/j.future.2019.02.028.
47. El-Kenawy ESM, Eid MM, Saber M, Ibrahim A. MbGWO-SFS: modified binary grey wolf optimizer based on stochastic fractal search for feature selection. IEEE Access. 2020;8:107635–49. doi:10.1109/access.2020.3001151.
48. Mirjalili S, Lewis A. The whale optimization algorithm. Adv Eng Softw. 2016;95:51–67. doi:10.1016/j.advengsoft.2016.01.008.
49. Simon D. Biogeography-based optimization. IEEE Trans Evolution Computat. 2008;12(6):702–13. doi:10.1109/TEVC.2008.919004.
50. Mirjalili S, Mirjalili SM, Hatamlou A. Multi-verse optimizer: a nature-inspired algorithm for global optimization. Neural Comput Appl. 2015;27(2):495–513. doi:10.1007/s00521-015-1870-7.
51. Samareh Moosavi SH, Khatibi Bardsiri V. Satin bowerbird optimizer: a new optimization algorithm to optimize ANFIS for software development effort estimation. Eng Appl Artif Intell. 2017;60:1–15. doi:10.1016/j.engappai.2017.01.006.
52. Johari N, Zain A, Mustaffa N, Udin A. Firefly algorithm for optimization problem. Appl Mech Mater. 2013;421:512–7. doi:10.4028/www.scientific.net/AMM.421.512.
53. Rashedi E, Nezamabadi-pour H, Saryazdi S. GSA: a gravitational search algorithm. Inform Sci. 2009;179(13):2232–48. doi:10.1016/j.ins.2009.03.004.
54. Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science. 1983;220(4598):671–80. doi:10.1126/science.220.4598.671.
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.