Open Access
REVIEW
Machine Learning for NTN-Assisted IoT: A Bibliometric-Assisted Survey of Optimization across Trajectory, Resource, Energy, and Security Aspects
1 Department of Communication Technology and Network, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, UPM Serdang, Selangor, Malaysia
2 Systems Modeling, Analysis, and Control Research Laboratory (MACS), University of Gabes, Avenue Omar Ibn El Khattab, Zrig Eddakhlani, Tunisia
3 Computer Science Department, Faculty of Science and Humanities, Imam Abdulrahman bin Faisal University, Jubail, Saudi Arabia
4 Department of Information Technology, Gulf Colleges, Hafar Al Batin, Saudi Arabia
5 Department of Electrical, Electronics & Systems Engineering, Faculty of Engineering & Built Environment, Universiti Kebangsaan Malaysia, UKM Bangi, Selangor, Malaysia
* Corresponding Author: Zurina Mohd Hanapi. Email:
(This article belongs to the Special Issue: Artificial Intelligence for 6G Wireless Networks)
Computer Modeling in Engineering & Sciences 2026, 147(2), 6 https://doi.org/10.32604/cmes.2026.077054
Received 01 December 2025; Accepted 04 February 2026; Issue published 27 May 2026
Abstract
Non-terrestrial networks (NTNs), including UAVs, HAPs, and satellite systems, are rapidly becoming key enablers of wide-area, resilient connectivity for large-scale IoT applications. As these platforms integrate with terrestrial networks to form space–air–ground architectures, optimization challenges related to trajectory, resource management, energy efficiency, and security become increasingly complex. Machine learning (ML) has emerged as a central tool for addressing these challenges by enabling adaptive, data-driven decision-making under uncertainty. This survey presents an optimization-centric review of ML-based NTN-assisted IoT systems focusing on aspect-specific datasets. Using a structured methodology involving dataset curation, keyword filtering, metadata analysis, and citation-based paper selection, we analyze representative and influential works across four core optimization themes: trajectory planning, resource allocation, energy utilization, and security. We develop a taxonomy that captures problem types, learning approaches, architectural configurations, and cross-layer constraints, and discuss insights, complemented by a focused review of top-cited contributions in each theme as well as discussions relating to their complexities and practicality. Our analysis reveals clear methodological trends, including the growing use of deep and multi-agent reinforcement learning, the emergence of distributed intelligence through federated learning, and the increasing interplay among mobility, computation, communication, resource allocation, energy optimization, and security. Finally, we highlight key lessons and future research opportunities related to scalable cooperative learning, energy-efficient operation, secure distributed intelligence, and multi-tier optimization across space–air–ground integrated networks, offering a roadmap toward resilient and intelligent 6G-era connectivity.
Non-terrestrial networks (NTNs) are emerging as indispensable components of next-generation connectivity. They can provide wide-area, on-demand, and resilient communication and thus can effectively complement terrestrial infrastructure, especially in remote, disaster-stricken, or mobility-dominated environments. As IoT systems continue to proliferate across smart cities, agriculture, environmental monitoring, transportation, and public safety, NTN platforms play a critical role in extending connectivity, improving situational awareness, and enabling mission-critical operations.
Fig. 1 illustrates a high-level framework of machine learning (ML)-enabled NTN-assisted Internet of Things (IoT) applications. It highlights the key NTN components (UAVs, High Altitude Platforms (HAPs), and satellite infrastructures) interfacing with edge computing platforms and ML models such as Deep Reinforcement Learning (DRL), Federated Learning (FL), and Convolutional Neural Networks (CNNs). Together, these elements form an intelligent ecosystem that supports IoT services across multiple verticals, including smart agriculture, environmental monitoring, disaster response, maritime IoT, and autonomous transportation. The figure underscores that intelligence arises not from the NTN entities alone but from the integration of adaptive ML algorithms with distributed computing and communication infrastructures.

Figure 1: High-level framework of machine learning-enabled NTN-assisted IoT systems. UAVs, HAPs, and satellites interact with terrestrial networks, edge computing, and ML models (e.g., DRL, FL, CNN) to support diverse IoT applications such as agriculture, disaster response, environmental sensing, maritime IoT, and autonomous transportation.
This growing convergence of NTNs and IoT introduces complex optimization challenges. Unlike fixed terrestrial systems, NTN-assisted IoT must contend with dynamic topologies, heterogeneous propagation conditions, intermittent connectivity, and stringent performance requirements such as latency, throughput, coverage, and Age of Information (AoI). UAVs, HAPs, and satellites exhibit mobility patterns and energy limitations that create intricate trade-offs between communication quality, mission duration, and operational efficiency. As a result, the design of intelligent and adaptive optimization frameworks becomes fundamental.
From an optimization perspective, NTN-assisted IoT systems differ fundamentally from conventional terrestrial networks. The inherent mobility of UAVs, the quasi-stationary nature of HAPs, the large-scale coverage of satellite links, and the energy and payload constraints of aerial platforms introduce highly interdependent decision variables across different layers. Consequently, objectives such as trajectory planning, resource allocation, energy utilization, and security recur in the literature and are commonly formulated as coupled rather than independent problems. These characteristics make NTN-assisted IoT a natural application domain for machine learning–driven optimization, where adaptive, data-driven models can learn complex system dynamics and enable real-time decision-making under uncertainty.
ML has emerged as a key enabler for addressing these challenges. Deep learning supports perception, prediction, and representation learning for mobility and sensing. Reinforcement learning (RL) and its deep variants (DRL) enable sequential decision-making under uncertainty, supporting real-time optimization of trajectories, resource allocation, routing, and task offloading. Multi-agent RL (MARL) extends these capabilities to cooperative scenarios involving UAV swarms or multi-tier NTN architectures. Federated learning and other paradigms, such as blockchain, address privacy and security concerns in UAV-enabled IoT systems [1] and satellite–terrestrial cooperative IoT networks [2]. Together, these methods contribute to increasingly autonomous and intelligent NTN operation. However, the role of ML-driven optimization is not uniform across NTN platforms, since UAVs, HAPs, and LEO satellites differ markedly in their mobility, endurance, coverage characteristics, and operational constraints.
1.1 Distinct Optimization Characteristics of UAV, HAP, and LEO Platforms and Representative Examples
Although NTNs are often discussed as a unified paradigm, UAVs, HAPs, and low Earth orbit (LEO) satellites exhibit fundamentally different physical and operational characteristics. These differences sometimes necessitate platform-specific ML considerations for trajectory design, resource allocation, energy, and security. The following subsections highlight representative studies that characterize the unique operational profiles of UAV, HAP, and satellite systems.
1.1.1 HAP-Assisted IoT: Endurance and Stability-Centric Optimization
Unlike high-mobility UAVs, HAPs operate at near-fixed altitudes with significantly longer endurance, shifting the optimization focus toward long-term service stability and energy harvesting. In this context, reference [3] investigates resource allocation and task offloading within a multi-access edge computing (MEC) framework, where a HAP coordinates with multiple IoT devices (IoTDs) to reduce energy consumption while preserving system stability and battery longevity.
1.1.2 LEO Satellite-Based IoT: Security, Dynamics, and Resource Control
The high orbital velocity and broadcast nature of LEO satellites introduce unique hurdles regarding intermittent connectivity and wide-area security vulnerabilities. Reference [4] addresses the vulnerability of mission-critical satellite IoT networks to cyber-attacks by proposing DLGAN, a security framework that utilizes Convolutional Neural Networks (CNNs) for real-time anomaly detection and Generative Adversarial Networks (GANs) to address skewed cybersecurity datasets.
Further addressing the spatio-temporal dynamics of 6G-enabled LEO systems, reference [5] adopts a cloud–edge–device architecture to support delay-sensitive applications via a spatio-temporal attention-based proximal policy optimization (STA-PPO) algorithm. This model captures task arrival patterns and satellite topology to reduce latency significantly. Similarly, reference [6] explores control-aware energy efficiency by transforming stability constraints into communication reliability requirements through a Lyapunov drift-plus-penalty framework. To mitigate interbeam interference inherent in large-scale satellite coverage, reference [7] provides a theoretical analysis of multibeam management, while reference [8] proposes a separated swarm optimization-based beam scheduling algorithm (SSO-BSA) to adaptively manage non-uniform traffic distributions.
1.1.3 Integrated and Multi-Tier NTN Architectures
The convergence of different tiers allows for complementary strengths, such as using UAVs to bridge the “last mile” between power-limited IoT devices and space-borne platforms. Reference [9] proposes the use of Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surfaces (STAR-RIS) mounted on UAVs to improve LEO-assisted communications through an Integrated Trajectory, Phase-shift, and Power allocation (ITPP) approach. Similarly, reference [10] explores UAV-assisted relay schemes, utilizing Taylor series approximation to solve joint trajectory and power allocation problems.
For dynamic environments like marine IoT (MIoT), reference [11] introduces a federated multi-agent learning framework (FL-MADDPG) to optimize multi-UAV trajectories and satellite resource allocation simultaneously. In a similar vein, reference [12] utilizes a Dueling Deep Q-Network (DQN) for task offloading decisions in LEO-assisted UAV networks, demonstrating superior energy efficiency compared to greedy baselines.
1.1.4 Advanced Coordination and Security Strategies
The optimization of data freshness and physical layer security must account for the long propagation delays and wide footprints of NTN platforms. Reference [13] utilizes AoI as a metric for information freshness in Integrated Satellite-Terrestrial Networks (ISTNs), applying a Lyapunov-based resource allocation scheme to balance energy efficiency with data timeliness. For autonomous operations, reference [14] presents a multisatellite negotiation framework based on cooperative game theory. In terms of security, reference [15] investigates satellite-assisted jamming to protect against eavesdropping, while references [16,17] explore edge collaboration and NOMA-based data gathering to ensure energy-neutral operations in remote maritime and terrestrial environments.
1.2 Problem Statement and Novelty
Despite the rapid growth of machine-learning-driven research on NTN-assisted IoT systems, the existing bibliometric analysis [18] and survey literature remain fragmented in scope and analytical perspective. Several surveys concentrate primarily on UAV-assisted IoT, which represents a major but not exclusive component of NTNs [19–21]. These works provide detailed insights into UAV-enabled architectures, learning techniques, and communication strategies, yet their scope does not fully capture the broader multi-layered NTN ecosystem. Other domain-focused surveys emphasize specific application scenarios rather than system-level generalization. For example, precision agriculture and plant disease monitoring are extensively studied in [22,23], where the focus lies on UAV sensing, edge intelligence, and data fusion techniques. Disaster management and pandemic response applications are reviewed in [24,25], highlighting the integration of ML with UAV, IoT, satellite, and cloud platforms. Smart city and 5G-enabled service optimization perspectives are explored in [26,27], emphasizing AI-driven policy design and intelligent infrastructure management. While these studies provide valuable application-level and domain-specific insights, they generally do not provide design principles across multiple optimization dimensions such as resource allocation, computation offloading, energy efficiency, learning adaptation, and cross-layer NTN integration.
In parallel, technology-centered surveys examine individual enabling components, most commonly mobile edge computing, sometimes within a narrow machine-learning scope such as deep reinforcement learning [28–30]. Surveys addressing design considerations tend to focus on a single NTN tier, most commonly UAVs, without extending the discussion to cross-tier architectures or related aspects such as trajectory prediction and coordination across the NTN architecture [31]. Other reviews, by contrast, focus on specific technologies, including satellite communications, blockchain, or reconfigurable intelligent surfaces, rather than on the optimization aspects these technologies support [29,30,32,33], while some are further constrained by particular NTN tiers or architectural assumptions [33–36]. As a result, the optimization objectives, constraints, and trade-offs underpinning these technologies are frequently examined in isolation, rather than through an aspect-based optimization perspective that spans system layers.
Security is also frequently treated as a protocol- or technology-layer concern, rather than as an optimization aspect that interacts with latency, energy consumption, and resource utilization. Although some prior work has examined design aspects in UAV-enabled MEC systems [37], these studies remain largely confined to MEC-centric formulations and do not generalize to IoT-oriented or multi-tier NTN architectures. Similarly, surveys on secure UAV–IoT systems [38,39] primarily categorize authentication, encryption, and intrusion detection mechanisms, but do not systematically integrate them with broader resource and mobility optimization frameworks. More comprehensive surveys address the Internet of Drones and smart-city-oriented UAV ecosystems [40], as well as RL and DRL applications in IoT and wireless networks [41–43]. These works review learning-based solutions for routing, caching, spectrum access, computation offloading, and network control under dynamic conditions. Multi-agent reinforcement learning (MARL) extensions are further explored for cooperative decision-making in next-generation networks [44], and DRL-based task offloading strategies have been studied in aerial access and MEC-enabled environments [45]. However, despite their breadth, these surveys typically examine learning, security, mobility, or resource management in isolation. They seldom comprehensively analyze these mechanisms across integrated and multi-tier NTN-assisted IoT systems.
Taken together, the breadth and diversity of the literature make it difficult to extract coherent trends and transferable design principles across four recurring optimization aspects: trajectory optimization, resource allocation, energy utilization and management, and security, within a single survey. This work addresses this gap by combining a cross-dimensional analytical focus with a systematically structured analysis of representative and influential works. Specifically, structured filtering criteria, keyword- and text-driven analysis, and citation-based selection are jointly employed to provide a broad understanding of this research domain, as well as identify representative and influential contributions. By synthesizing results based on these contributions and optimization themes, the survey highlights practical design principles that provide perspectives on recurring trade-offs, methodological patterns, and deployment-relevant considerations in ML-enabled NTN-assisted IoT systems.
In summary, while existing literature on UAV-, satellite-, and HAP-assisted IoT is expanding, current reviews remain segmented across major optimization or core aspects of NTN-assisted IoT, often treating trajectory optimization, resource allocation, energy management, and security as isolated optimization aspects rather than components of a holistic NTN landscape. Application-driven surveys typically prioritize specific use cases, such as smart agriculture or disaster response, while technology-focused reviews often concentrate on narrow enablers without situating these critical optimization pillars within a holistic NTN framework. Furthermore, security-oriented research frequently addresses authentication or intrusion detection in isolation. Consequently, the literature lacks a comprehensive analytical framework that synthesizes these multi-dimensional technical aspects alongside a data-driven landscape analysis, a gap that this survey aims to bridge.
To address these limitations, this survey adopts a bibliometric-driven methodology with a focus on optimization-related domains. Beginning with a filtered Scopus dataset, smaller datasets are constructed corresponding to four recurring problem classes: trajectory planning, resource allocation, energy utilization, and security. Each subset is analyzed using keyword and metadata statistics, followed by a citation-guided review of representative technical papers. The outcome is a problem-oriented taxonomy that discusses modeling approaches, learning techniques, and system-level trade-offs across these four aspects:
• Trajectory Planning, adaptive path planning for aerial platforms such as UAVs to meet network performance requirements including timeliness, latency, and energy efficiency while supporting IoT service provisioning;
• Resource Allocation, intelligent allocation of spectrum, bandwidth, and power across NTN entities and IoT devices to enhance throughput, fairness, and delay performance;
• Energy Utilization, energy efficiency maximization, energy harvesting, energy transfer, and energy consumption minimization to support sustainable NTN-assisted IoT operations;
• Security, ML-enhanced intrusion detection, authentication, cybersecurity, and privacy-preserving mechanisms in NTN-assisted IoT networks.
These themes are treated as optimization problem classes, not as bibliometric clusters, and are analyzed independently to extract design methodologies and solutions. The contributions of this survey are fourfold:
• We provide a structured methodology for identifying, categorizing, and reviewing ML-based NTN-assisted IoT research using curated datasets, keyword-based filtering, and citation-driven paper selection.
• We develop an optimization-aspect-driven taxonomy that organizes the literature according to recurring engineering optimization problems rather than bibliometric clusters. The taxonomy captures common problem types, ML techniques, system architectures, and constraints.
• We present a hybrid narrative and technical review of top-cited works, revealing representative models, optimization strategies, and methodological patterns.
• We derive lessons and outline future research directions involving cooperative learning, energy-aware NTN operation, secure distributed ML, and holistic optimization in space–air–ground integrated networks.
Accordingly, this paper complements cluster-level analyses by providing a targeted investigation of optimization aspects and representative solution strategies.
The rest of this paper is organized as follows. Section 2 reviews related surveys and highlights existing research gaps. Section 3 details our methodology for dataset construction and paper selection. Section 4 presents a high-level description of the problem space across optimization themes. Section 5 introduces the optimization-aspect-driven taxonomy. Section 6 reviews representative top-cited works. Section 7 presents recent research trends based on information obtained from Scopus, with the publication dates filtered from 2025 to 2026. Section 8 summarizes lessons learned, Section 9 outlines future directions, and Section 10 concludes this paper. Sections 4–6 are structured around the four primary aspects (trajectory design, resource allocation, energy efficiency, and security), each examined in progressively greater depth across these sections.
2 Related Work and Research Gap
Research on the integration of NTNs, ML, and the IoT has expanded rapidly in recent years. Existing surveys typically fall into three categories: domain-focused reviews, technology-centered surveys, and security and learning-based surveys. While each provides valuable contributions, they do not collectively offer a comprehensive understanding of NTN-assisted IoT across the core optimization objectives of trajectory, resource, energy, and security. This section summarizes key existing reviews and identifies the research gaps that motivate the present survey.
Several works examine NTNs within specific IoT application domains. In agriculture, UAV-based sensing and ML-driven analytics have been used to monitor crop health, improve agricultural productivity, and enhance automated inspection [22,23]. UAVs enable remote data acquisition in large farms or difficult terrains, providing farmers with real-time intelligence for decision-making.
In the context of disaster management, reviews emphasize UAV-assisted prediction, monitoring, routing, and emergency response. ML-enhanced UAV systems improve communications during crises, offering robust solutions for rapid assessment and situational awareness [24,25]. Similarly, urban and smart city applications increasingly rely on UAV-IoT integration for surveillance, traffic management, and public safety operations [26,27].
These domain-specific surveys highlight UAVs as the predominant NTN platform due to their mobility, cost-effectiveness, and ease of deployment. However, they do not generalize insights across different optimization challenges or extend the discussion to HAP or satellite settings.
2.2 Technology-Focused Surveys
A parallel stream of surveys examines enabling technologies. Mobile edge computing (MEC) has been extensively reviewed with attention to UAV-assisted task offloading, latency minimization, energy efficiency, and scheduling, all frequently supported by RL or DRL algorithms [29,30]. Trajectory optimization surveys categorize mathematical techniques, heuristic methods, and ML-driven approaches for dynamic UAV path planning, emphasizing factors such as connectivity and mission duration [27,32].
Other reviews focus on emerging technologies like reconfigurable intelligent surfaces (RIS), blockchain, and physical-layer security. For instance, RIS-assisted UAV systems have been surveyed with emphasis on secrecy performance, adaptable beamforming, and optimization under channel uncertainty [33]. Similarly, our previous work in [46] presents a large-scale thematic analysis of enabling technologies in ML-based NTN-assisted IoT, highlighting key technological trends and emerging application domains. These surveys provide depth, while the bibliometric review offers a comprehensive technology-based mapping; however, they do not provide an optimization-aspect-centric perspective.
2.3 Security and Learning-Based Surveys
Security-oriented surveys highlight ML for authentication, anomaly detection, intrusion prevention, and privacy-preserving distributed learning across IoT nodes, UAVs, and edge layers [40]. Reinforcement learning surveys extensively discuss RL-based solutions for routing, spectrum access, caching, and adaptive resource provisioning in UAV-assisted networks [41–43].
Recent analyses underscore the growing importance of MARL for decentralized UAV coordination, interference management, and joint mobility–communication optimization [44,45]. However, these reviews do not comprehensively consider security and its associated challenges, learning-based strategies, and system-level optimization of trajectory, resource allocation, and energy utilization under a comprehensive NTN-assisted IoT framework.
Despite the value provided by existing surveys, three major gaps persist:
• Fragmentation across domains: Current surveys tend to focus narrowly on specific applications (e.g., agriculture, smart cities) or on mapping and reviewing individual technologies (e.g., MEC, RIS, trajectory planning). They lack a focus on fundamental optimization targets despite the large volume of research in these areas.
• Absence of a multi-dimensional optimization perspective: No existing review systematically studies trajectory optimization, resource allocation, energy efficiency, and security together under the NTN-assisted IoT framework, even though these optimization targets are coupled in many practical problems.
• Absence of theme-stratified bibliometric mapping: The current literature lacks a bibliometric study stratified by trajectory optimization, resource allocation, energy utilization, and security, and does not provide insights into the design guidelines associated with influential studies in the NTN-assisted IoT landscape.
To address these limitations, this survey adopts a tiered-bibliometric methodology for each optimization target to provide several levels of insights including background, taxonomy, broad perspectives, lessons as well as design guidelines. Specifically, the contributions of this study are:
• A comprehensive bibliometric perspective bridging four optimization pillars—trajectory planning, resource allocation, energy utilization, and security—across UAV, HAP, and satellite NTN platforms.
• A systematic methodology incorporating dataset construction, keyword extraction, bibliometric clustering, and selection of top-cited works.
• A curated and thematically aligned review of representative and top-cited studies, revealing optimization architectures, algorithms, assumptions, and limitations.
• Insights and future research recommendations grounded in existing and emerging trends and considerations, as well as in NTN technologies such as distributed learning, RIS, blockchain, and space–air–ground integrated network (SAGIN) integration.
To systematically identify and analyze influential research on machine learning-driven optimization in NTN-assisted IoT, we adopt a structured, multi-stage methodology. The goal of this methodology is to ensure transparency, reproducibility, and consistency in selecting the most representative works across trajectory planning, resource allocation, energy utilization, and security. The process consists of five major phases: (1) search and data collection, (2) filtering and dataset construction, (3) keyword extraction and bibliometric analysis, (4) landscape mapping and taxonomy formation, and (5) selection of top-cited papers for detailed review. Each step is described in detail below (see Fig. 2). Note that some records satisfied multiple exclusion criteria; therefore, we report the net number of excluded records after applying all filters (Table 1).

Figure 2: Literature identification and filtering workflow: Scopus retrieval, exclusions after applying multiple criteria, keyword-based thematic dataset extraction using Scopus “Limit to” filters, and subsequent bibliometric/thematic analysis.

3.1 Search Strategy and Data Collection
We conduct a comprehensive search using the Scopus database, one of the largest scientific indexing platforms, which covers several fields of study including engineering, computing, and communication technologies. Search queries are formulated to capture publications at the intersection of NTNs, IoT, and machine learning, with additional refinements to include UAVs, satellites, edge intelligence, reinforcement learning, optimization, and related topics. The keywords are shown in Table 2. The search encompasses journal articles, conference papers, and open-access publications to ensure broad coverage.

The search string includes variations and combinations of the following terms:
• NTN platforms: UAV, drone, HAP, satellite, LEO, non-terrestrial.
• IoT and sensing: IoT, sensor, remote sensing, WSN.
• Machine learning paradigms: machine learning, deep learning, reinforcement learning, federated learning.
• Optimization themes: trajectory, resource allocation, energy, security, age of information.
The initial search results serve as the basis for constructing specialized datasets corresponding to each optimization theme.
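As an illustration of how the term groups above could be combined into a single boolean query, the following minimal sketch ANDs together one OR-group per category. The grouping, the quoting of phrases, and the use of the Scopus `TITLE-ABS-KEY` field restriction are assumptions made for illustration; the exact search string used in this survey is the one reported in Table 2 and may differ.

```python
# Term groups copied from the bullet list above. How they are combined
# is an assumption; see Table 2 for the query actually used.
TERM_GROUPS = {
    "ntn": ["UAV", "drone", "HAP", "satellite", "LEO", "non-terrestrial"],
    "iot": ["IoT", "sensor", "remote sensing", "WSN"],
    "ml": ["machine learning", "deep learning",
           "reinforcement learning", "federated learning"],
    "theme": ["trajectory", "resource allocation", "energy",
              "security", "age of information"],
}


def or_group(terms):
    """Join terms with OR, quoting each one so multi-word phrases stay together."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(groups):
    """AND the OR-groups together inside a title/abstract/keyword restriction."""
    body = " AND ".join(or_group(terms) for terms in groups.values())
    return f"TITLE-ABS-KEY({body})"


query = build_query(TERM_GROUPS)
print(query)
```

Dropping the `"theme"` group from the dictionary would yield the broader base query from which theme-specific subsets could then be carved out.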
3.2 Filtering and Dataset Construction
Following data collection, a filtering process was applied to choose papers from the datasets:
1. Citation-based filtering: The entries in each dataset were sorted by citation count, and the top-cited papers were then selected for study. Review and survey papers were excluded from this list so that only technical papers were reviewed.
2. Abstract inspection: Papers were manually inspected to ensure they fall within the studied optimization framework.
Each theme results in a tailored dataset that was used for deeper bibliometric and thematic analysis.
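The two filtering steps above can be sketched as a small selection routine. This is a minimal sketch under stated assumptions: the `Record` fields, the `in_scope` flag (standing in for the outcome of the manual abstract inspection), and the cut-off `k` are hypothetical and are not taken from the survey's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    citations: int
    doc_type: str   # e.g. "Article", "Conference Paper", "Review"
    in_scope: bool  # hypothetical flag: result of manual abstract inspection


def select_top_cited(records, k):
    """Exclude reviews, rank by citations, take the top k, keep in-scope papers."""
    technical = [r for r in records if r.doc_type.lower() != "review"]
    ranked = sorted(technical, key=lambda r: r.citations, reverse=True)
    return [r for r in ranked[:k] if r.in_scope]


# Toy records for illustration only; these are not real papers.
records = [
    Record("A", 120, "Article", True),
    Record("B", 300, "Review", True),
    Record("C", 95, "Conference Paper", False),
    Record("D", 80, "Article", True),
    Record("E", 10, "Article", True),
]
selected = select_top_cited(records, k=3)
print([r.title for r in selected])
```

Because the scope check is applied after the top-`k` cut, the routine may return fewer than `k` papers; applying it before the cut would instead always fill the list when enough in-scope papers exist.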
3.3 Keyword Extraction and Bibliometric Analysis
To identify dominant research concepts and thematic clusters, we employ VOSviewer [47], a widely used software tool for bibliometric visualization. The analysis is performed as follows:
• Keyword extraction: Author-provided keywords are extracted using the VOSviewer tool for each thematic dataset.
• Thresholding: Minimum occurrence thresholds are applied to focus on influential keywords: 4 for trajectory optimization, 4 for resource allocation, 5 for energy utilization, and 4 for security.
• Co-occurrence mapping: VOSviewer is used to generate network graphs showing how keywords co-occur across the dataset, revealing latent research clusters and frequent associations.
• Network map interpretation: The network maps are manually interpreted to identify recurring methods (e.g., DRL, MARL), architectural elements (e.g., UAV swarms, satellite relays), and optimization objectives (e.g., AoI minimization).
This phase provides an empirical foundation for mapping the optimization problem space and supporting taxonomy development.
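The thresholding and co-occurrence steps above can be illustrated with a short counting routine. This sketch only conveys the underlying idea and does not reproduce VOSviewer's internal normalization or clustering; the function name, the lowercase normalization, and the toy keyword lists are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists, min_occurrences):
    """Count keyword occurrences per paper, then count links between kept keywords."""
    papers = [{kw.strip().lower() for kw in kws} for kws in keyword_lists]
    # Each keyword is counted at most once per paper.
    occurrences = Counter(kw for kws in papers for kw in kws)
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}
    links = Counter()
    for kws in papers:
        # Sorting makes each unordered pair map to a single key.
        for a, b in combinations(sorted(kws & kept), 2):
            links[(a, b)] += 1
    return occurrences, links


# Toy author-keyword lists; a threshold of 2 is used here only because
# the example is tiny (the survey uses thresholds of 4 or 5).
papers = [
    ["UAV", "DRL", "trajectory"],
    ["UAV", "DRL", "AoI"],
    ["UAV", "trajectory", "MARL"],
    ["satellite", "DRL", "UAV"],
]
occurrences, links = cooccurrence(papers, min_occurrences=2)
print(links.most_common())
```

The link counts correspond to edge weights in a co-occurrence network map, while the occurrence counts correspond to node sizes.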
3.4 Landscape Mapping and Taxonomy Formation
Using the bibliometric insights, we construct a structured overview of the research space for each optimization theme. This includes:
• Identifying core problem types: e.g., continuous vs. discrete trajectory design, centralized vs. distributed resource allocation, energy harvesting vs. scheduling, federated vs. centralized learning.
• Cataloging solution methods: traditional optimization, heuristic methods, DRL, MARL, neural networks, federated learning, hybrid techniques.
• Identifying architectures: UAV-assisted IoT, satellite-augmented systems, HAP-based support, MEC-integrated networks, and SAGIN.
• Recognizing constraints and performance metrics: AoI, latency, fairness, throughput, mission time, energy constraints, and security-related metrics.
The resulting taxonomy serves as a conceptual guide for navigating the technical literature and provides the structural basis for the deep-dive thematic review.
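One possible way to make the four taxonomy dimensions above concrete is a small record type that can be indexed by any dimension. This is a hypothetical encoding for illustration, not the representation used to build the taxonomy in Section 5; the class name, field names, and sample entries are assumptions, and the entries are not drawn from specific papers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonomyEntry:
    theme: str          # trajectory, resource allocation, energy, or security
    problem_type: str   # e.g. continuous vs. discrete trajectory design
    method: str         # e.g. DRL, MARL, federated learning
    architecture: str   # e.g. UAV-assisted IoT, SAGIN
    metrics: tuple = () # e.g. ("AoI", "latency")


def index_by(entries, attribute):
    """Group taxonomy entries by the value of one of their fields."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[getattr(entry, attribute)].append(entry)
    return dict(grouped)


# Illustrative placeholder entries built from the categories listed above.
entries = [
    TaxonomyEntry("trajectory", "continuous trajectory design", "DRL",
                  "UAV-assisted IoT", ("AoI", "energy")),
    TaxonomyEntry("resource allocation", "distributed resource allocation",
                  "MARL", "SAGIN", ("throughput", "fairness")),
    TaxonomyEntry("energy", "energy harvesting", "DRL",
                  "HAP-based support", ("mission time",)),
]
by_method = index_by(entries, "method")
print({method: len(group) for method, group in by_method.items()})
```

Indexing the same entries by `"theme"` or `"architecture"` gives alternative views of the literature without restructuring the underlying records.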
3.5 Selection of Top-Cited Papers
To contextualize the bibliometric findings with concrete contributions, we identify the top-cited and most representative papers within each optimization theme. This selection relies on:
• Citation metrics: Citation counts from Scopus, ensuring that highly influential works are included.
• Relevance to optimization theme: Papers must meaningfully contribute to trajectory design, resource allocation, energy efficiency, or security.
Given a total paper count N, the top

3.6 Summary of the Methodology
Fig. 2 illustrates the overall methodological process, from data collection to thematic synthesis and top-cited review. The combination of bibliometric analysis, structured filtering, and targeted review ensures both breadth and depth in understanding the optimization landscape of ML-driven NTN-assisted IoT.
3.7 Synthesis and Principle Extraction
To manage the structural complexity of the bibliometric data, the authors employed Gemini 1.5 Pro as a supportive tool to assist in organizing and grouping extracted technical keywords into preliminary hierarchical structures, which informed the taxonomy presented in Section 5. In addition, ChatGPT (OpenAI, version GPT-5.2) was used to assist in consolidating curated literature notes into draft design principles. All AI-assisted classifications and summaries were rigorously reviewed, cross-checked, and validated by the authors against the original primary sources to ensure technical accuracy, integrity, and independent interpretation.
As with any survey-based study, the proposed methodology is subject to several limitations. These are discussed below to ensure transparency and proper contextualization of the findings.
Data source and search strategy. The study relies on the Scopus database, selected for its broad coverage of peer-reviewed engineering and computer science literature. While Scopus is comprehensive, the study implicitly assumes that papers indexed in other databases (e.g., IEEE Xplore or Web of Science) are also captured by Scopus owing to its large coverage. To widen coverage further, we used extensive keyword combinations.
Filtering. The initial search results and keyword-based filtering features in Scopus for trajectory, resource allocation, energy utilization, and security do not guarantee the absence of “noise” in the form of peripherally related papers. To address this, we manually evaluated the top-cited papers to verify their technical relevance before inclusion, ensuring the dataset reflects dominant research trends accurately.
Bibliometric and keyword analysis limitations. The bibliometric analysis depends on author-provided keywords, which are subject to terminological inconsistencies. To handle the potential inclusion of irrelevant keywords from the initial dataset, we applied a minimum keyword occurrence threshold in VOSviewer. This empirically chosen threshold filters out low-frequency “noisy” terms, reducing the likelihood of irrelevant keywords appearing in the clustering results. While network map interpretation involves domain-specific expertise and a degree of subjectivity, this thresholding process facilitates the identification of clear patterns, thereby enhancing overall interpretability.
Citation-based selection bias. Selecting top-cited papers inherently favors older publications with more time to accumulate citations. To mitigate this, the citation-based review is complemented by a landscape analysis of titles, abstracts, and metadata. This two-level approach ensures that recent, impactful methodological shifts and emerging topics are identified alongside established works. The survey aims to provide a representative synthesis of the field rather than an exhaustive systematic review of every published study.
Reproducibility and interpretive scope. The methodology is described in detail to provide all necessary context and support conceptual reproducibility. However, minor variations may arise due to updates in database indexing or the stochastic nature of visualization algorithms in VOSviewer. Finally, the trends identified are descriptive observations derived from metadata and citation patterns; they should be interpreted as structural insights into the optimization-aspect problem space rather than an attempt to authoritatively restrict the scope of the field to specific algorithms.
4 Overview on Optimization Aspects
Machine learning–enabled NTN-IoT addresses four fundamental aspects on which many other performance metrics depend, as shown in Fig. 3: trajectory optimization, resource allocation, energy utilization and management, and security. This section provides an integrated overview of the optimization-level problem space underlying machine learning–driven NTN-assisted IoT, giving background on these four dimensions based on the datasets obtained in Section 3.

Figure 3: A cross-layer interdependence matrix for ML-driven NTN-assisted IoT. The figure illustrates how machine learning underpins four key optimization aspects and how these aspects interrelate, from trajectory decisions and energy budgets to resource allocation and security, in a dynamic NTN environment.
Trajectory optimization emerges as one of the most dominant themes in NTN-assisted IoT literature, particularly for UAVs, and DRL is perhaps the most prominent method used to achieve this objective. Table 3 summarizes and compares representative trajectory-centric UAV studies that employ DRL-based methods across different system architectures, performance metrics, and operational configurations. This prominence aligns with the widespread use of UAVs in NTN-assisted IoT systems [27,48]. UAVs function as highly flexible aerial platforms capable of adaptive movement, controllable flight dynamics, and rapid deployment. Tasks such as data collection, IoT sensing, and aerial relaying naturally depend on mobility-aware optimization, and thus trajectory planning and path planning problems are commonly solved using reinforcement learning (RL).
Machine learning, especially DRL, is widely adopted to handle the continuous, partially observable, and combinatorial nature of UAV trajectory problems. Algorithms such as Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Q-learning are prominent in this domain. These approaches support both single-UAV and multi-UAV scenarios, including UAV swarms and cooperative navigation missions. Trajectory design is frequently required to achieve freshness-aware service provisioning in the presence of connectivity constraints and application-specific considerations.
Problem formulations involving trajectory optimization often intersect with energy consumption, computation offloading, resource allocation, and QoS optimization. This is consistent with findings in MEC-enabled UAV scenarios where trajectory shaping directly impacts latency, throughput, or task offloading performance [29]. Safety-aware and decentralized trajectory control is vital in trajectory optimization problems. Overall, trajectory optimization research presents a rich landscape characterized by mobility, learning-based adaptability, and multi-dimensional design constraints.
Resource allocation represents another major research focus, often integrated with scheduling, task management, and communication–computation coordination. Many IoT applications involve latency-critical, energy-limited, or bandwidth-constrained operations requiring careful balancing of communication and computation loads. Resource allocation includes task offloading, bandwidth allocation, and scheduling, often associated with MEC and edge intelligence applications.
Traditional optimization methods (e.g., convex optimization, Lyapunov optimization) play a foundational role in addressing mixed-integer or non-convex formulations. On the other hand, learning-based techniques, particularly Q-learning, Deep Q-Network (DQN), PPO, and hybrid heuristic–RL approaches, are increasingly used to manage dynamic network states, user mobility, and uncertain environments. Fairness is vital to ensuring equitable resource distribution in dense IoT networks, including those deploying NOMA, while power allocation, spectral efficiency, and interference management are major priorities in many resource allocation problems.
Overall, resource allocation research seeks to balance analytical optimization with learning-enabled adaptability, particularly in MEC-integrated UAV scenarios, multi-UAV coordination tasks, and delay-sensitive IoT applications.
Energy efficiency is a fundamental constraint across UAVs, IoT devices, and NTN platforms. Energy utilization involves the management and optimization of energy harvesting, propulsion energy, wireless power transfer, and energy efficiency. This reflects the wide variety of energy-related challenges, including UAV flight energy, sensor node battery limits, and sustainable long-duration missions.
Energy-aware trajectory planning is fundamental in UAV-enabled sensing and data collection systems [72]. Multi-objective formulations are also common, combining energy minimization with throughput maximization, AoI reduction, or latency constraints. Of particular interest are wireless power transfer (WPT) and radio frequency (RF) energy harvesting, particularly in WSN and IoT deployments where UAVs or mobile chargers provide replenishment to distributed nodes.
DRL also plays a role in energy optimization, both for UAV mobility control and for scheduling decisions in MEC-based environments. However, there is a significant reliance on classical energy models, highlighting the ongoing need for realistic physical models of propulsion energy and harvesting efficiency. The optimization problem space therefore presents energy as a vital cross-layer challenge associated with mobility, communication, and computation.
Security is characterized by a diverse set of challenges, including intrusion detection, authentication, privacy preservation, and blockchain-based trust. Similarly, the prevalence of IDS, cybersecurity, blockchain, and federated learning in security-related NTN-assisted IoT research indicates significant attention to both data protection and decentralized intelligence. These are common security challenges in UAV networks, drone swarms, and satellite–IoT integration.
Machine learning is often deployed to solve security-related problems, particularly deep learning (e.g., CNNs, RNNs, autoencoders) for anomaly detection and behavioral modeling. Reinforcement learning can be deployed for secure routing, adaptive defense, and resilient communication strategies. Notably, there is a growing interest in privacy-preserving learning (e.g., FL) as well as the integration of blockchain to support traceability, authentication, and tamper resistance.
Thus, security is not merely a constraint but a co-optimization dimension that may be inherently coupled with trajectory, energy, and resource allocation.
Across all four themes, there are several underlying patterns:
• Widespread use of learning-based optimization, especially DRL, for addressing mobility and environmental uncertainty.
• Prominence of UAV-assisted IoT as a major scenario for ML-driven NTN research.
• Increasing integration of MEC, WPT, NOMA, and blockchain into NTN-assisted IoT design frameworks.
• Interdependent design challenges involving trajectory dynamics, energy utilization, communication–computation resource allocation, and security constraints.
These trends provide a foundation for the technical review of top-cited papers in Section 6.
5 Generalized Taxonomy of Optimization Themes
This section presents a unified taxonomy derived from the curated datasets for trajectory optimization, resource allocation, energy utilization, and security. The taxonomy consolidates recurring problem types, solution methodologies, architectural patterns, and system constraints identified across the literature. Consistent with the methodology described in Section 3, the taxonomy is constructed from the Scopus-derived datasets.
The resulting taxonomy provides a structured view of how ML-driven NTN-assisted IoT research is organized, as shown in Fig. 4, while the detailed taxonomies for each optimization aspect are shown in Figs. 5–8. These taxonomies aim to provide a concise roadmap highlighting the key characteristics and design considerations associated with each dimension.

Figure 4: A simple optimization-aspect-driven taxonomy of NTN-assisted IoT systems, structured around four core optimization aspects: trajectory optimization, resource allocation, energy utilization, and security.

Figure 5: Taxonomy of trajectory optimization: problem types, methods, applications, and constraints.

Figure 6: Taxonomy of resource allocation.

Figure 7: Taxonomy of energy utilization.

Figure 8: Security threats, mechanisms, defenses, contexts, constraints, and applications.
Trajectory optimization is a pivotal component of NTN-assisted IoT, determining how aerial platforms navigate to serve ground nodes effectively. These problems manifest in two primary forms: continuous optimization within large state spaces, and discrete optimization, often modeled as variants of the Traveling Salesman Problem (TSP) for fixed waypoint navigation.
Given the complexity of aerial environments, these problems are typically multi-objective, requiring the simultaneous optimization of flight paths, AoI, throughput, and energy efficiency. To address environmental uncertainty, formulations frequently rely on Markov Decision Processes (MDP) or Partially Observable MDPs (POMDP).
While traditional methods such as convex optimization and heuristics remain in use, the field is shifting toward learning-based approaches. DRL is particularly effective, utilizing algorithms such as PPO, SAC, and Twin Delayed Deep Deterministic Policy Gradient for continuous control. In multi-UAV scenarios, coordination is achieved via Multi-Agent DRL (e.g., MADDPG), while Federated Learning provides a decentralized framework for privacy-sensitive missions. Furthermore, recent research integrates trajectory design with RIS to optimize phase shifts and signal propagation dynamically.
These techniques are applied extensively in MEC computation offloading, smart city monitoring, and search-and-rescue operations. Successful deployment, however, requires rigorous adherence to constraints including collision avoidance, propulsion energy limits, and safety margins in non-line-of-sight conditions.
Resource allocation constitutes a fundamental pillar of NTN-assisted IoT, requiring the simultaneous optimization of communication bandwidth, computational resources, and energy consumption. In MEC scenarios, this manifests as joint task offloading and scheduling problems, often formulated as Mixed-Integer Non-Linear Programming (MINLP). Because these optimization landscapes are frequently non-convex, traditional analytical approaches such as Lyapunov optimization may be supplemented by metaheuristics like Particle Swarm Optimization (PSO) and Genetic Algorithms.
To address the dynamic nature of aerial networks, there is a definitive shift toward learning-based methodologies. While value-based Reinforcement Learning (e.g., Q-learning, DQN) works well for discrete state spaces, complex continuous environments increasingly rely on actor-critic methods such as PPO, DDPG, and SAC. Furthermore, Multi-Agent DRL (MADRL) and Federated Learning are critical for enabling decentralized, privacy-preserving coordination in heterogeneous swarms [73,74].
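As a brief illustration of the value-based methods mentioned above, the following sketch applies the standard tabular Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ max Q(s′,·) − Q(s,a)]. The two-state, two-action toy problem, the reward values, and the hyperparameters are assumptions for illustration and are not drawn from any of the surveyed works.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Bootstrapped target: immediate reward plus discounted best next value.
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical toy problem: 2 states, 2 actions
# (e.g., action 0 = compute locally, action 1 = offload).
q = [[0.0, 0.0], [0.0, 0.0]]
first = q_update(q, state=0, action=1, reward=1.0, next_state=1)
second = q_update(q, state=0, action=1, reward=1.0, next_state=1)
```

A table of this kind grows with the product of the number of states and actions, which is one reason the text notes that continuous environments tend to rely on function approximation and actor-critic methods instead.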
These algorithms are deployed across diverse architectures, including SAGIN, NOMA-enhanced V2X systems, and RIS-aided environments. Recent innovations also include Digital Twin-based solutions for predictive resource management [75]. These technologies target high-stakes applications such as rail transit inspection [76] and the Internet of Vehicles (IoV), where systems must adapt robustly to high mobility, varying channel conditions, and strict latency constraints.
5.3 Energy Utilization and Efficiency
Energy efficiency is a foundational challenge in NTN-assisted IoT, essential for sustaining network operation in remote or off-grid environments. Strategies to address this fall into three categories: minimizing consumption through power allocation, harvesting ambient energy, and active wireless transfer (WPT or SWIPT). These scenarios are typically modeled as non-convex or Mixed-Integer Non-Linear Programming (MINLP) problems, necessitating joint optimization of trajectory, task offloading, and scheduling.
To solve these complex formulations, the literature utilizes a spectrum of methods ranging from traditional heuristics (e.g., greedy algorithms, PSO) to advanced learning-based schemes. DRL and Federated Learning are increasingly employed to manage the dynamic trade-offs between energy, latency, and AoI.
These optimization frameworks are highly architecture-dependent. In UAV-assisted MEC and SAGIN environments, aerial nodes often function as flying base stations or recharging hubs for ground nodes. Consequently, research frequently integrates technologies such as NOMA and RIS to maximize spectral and energy efficiency simultaneously. Key constraints include propulsion energy costs, battery limitations, and the nonlinearity of energy harvesting circuits, all of which must be managed alongside security and reliability requirements.
Security remains a paramount challenge in NTN-assisted IoT, particularly within UAV and satellite-based architectures where distributed nodes are vulnerable to malicious activities. The threat landscape encompasses Denial-of-Service (DoS) attacks, intrusions, anomalies, and unauthorized access attempts. To mitigate these risks, robust authentication and authorization protocols are needed alongside encryption schemes. Given the power limitations of IoT devices, solutions consider lightweight cryptographic mechanisms, elliptic curve cryptography, and physical-layer security (PLS). Furthermore, decentralized trust models utilizing blockchain and smart contracts, as well as privacy-preserving techniques like differential privacy, are increasingly critical for protecting sensitive data in distributed environments.
Artificial Intelligence serves as a cornerstone for modernizing these defense mechanisms. Methodologies range from traditional supervised and unsupervised learning for anomaly detection to deep learning models (CNN, RNN, LSTM) capable of identifying complex attack patterns. RL and Deep RL are employed to optimize dynamic security responses, while FL is gaining traction for its ability to facilitate collaborative intrusion detection without compromising raw data privacy.
These security frameworks are applicable across diverse NTN configurations, including single UAV deployments, swarms, and LEO satellite networks often integrated with 5G and 6G technologies. Beyond pure defense, these systems must balance security requirements with strict constraints on energy consumption, scalability, and reliability. Relevant application domains include military operations, mission-critical surveillance, and smart agriculture. Fig. 8 illustrates the resulting taxonomy derived from the analysis of the security dataset.
Across the four optimization themes, the taxonomy reveals several cross-cutting patterns:
• Reinforcement learning, particularly DRL and MARL, is widely used across all optimization categories. In many cases, RL frameworks such as actor–critic methods, multi-armed bandits, and offline pre-training are used to handle large or continuous state spaces, support online decision-making, and address uncertainties in dynamic environments.
• UAVs dominate the optimization literature, not only as data collectors but also as aerial base stations, wireless power sources, and edge computing nodes. Their mobility makes them well suited for maintaining data freshness (e.g., AoI), supporting computation offloading, and adapting to network dynamics. HAPs and satellites appear in several works and can provide global network awareness that assists UAV trajectory design, although they remain less explored.
• Cross-layer optimization is common; trajectory decisions affect energy consumption, latency, AoI, throughput, and resource availability. Trajectory optimization is often combined with objectives such as energy management, user association, scheduling, task offloading, and security. Because these dependencies often result in large or complex optimization spaces, hybrid approaches that combine heuristics, convex optimization, and deep learning are sometimes adopted.
• Emerging trends include decentralized intelligence, such as federated learning and distributed learning for intrusion detection, multi-UAV cooperation, and privacy-preserving analytics. These techniques help address scalability, data distribution, and privacy requirements in NTN-assisted IoT.
• MEC integration is prominent, with UAVs, satellites, and ground nodes jointly participating in task offloading and edge intelligence. Online learning and DRL-based frameworks are often used to reduce latency, predict mobility, and adapt to heterogeneous computing capabilities across network tiers.
• Secure distributed architectures are also gaining traction. Particularly, federated deep learning, blockchain-based solutions, and anomaly detection methods are increasingly used to handle threats. Common threats include data poisoning, adversarial attacks, eavesdropping, and intrusion attempts in UAV and IoT networks. These studies highlight the need for security mechanisms that are scalable, accurate, and efficient for large NTN-assisted IoT systems.
This taxonomy establishes the conceptual foundation for the deeper examination of representative and top-cited works in Section 6.
This section presents a structured review of the most influential and highly cited papers associated with the four optimization themes identified earlier: trajectory planning, resource allocation, energy utilization, and security. The goal is to summarize representative works and to reveal their methodological patterns, design assumptions, and optimization strategies. In contrast to the landscape-level overview provided in Section 4, this section focuses on specific contributions, discussing insights and providing further context to the broader taxonomy introduced in Section 5.
We begin with a keyword-driven understanding of how the literature clusters around trajectory-related optimization, followed by detailed bibliometric analyses and reviews of all four optimization domains.
Trajectory optimization is one of the most dominant themes in ML-driven NTN-assisted IoT research, reflecting the importance of UAV mobility for enhancing coverage, data collection, sensing efficiency, and communication quality. The dataset shows strong concentrations of keywords such as “trajectory planning,” “path planning,” “UAV,” “coverage,” “Age of Information,” and reinforcement learning variants, underscoring the multi-objective and learning-centric nature of this domain.
Fig. 9 depicts a keyword co-occurrence map generated from the curated trajectory dataset, where keyword variations (e.g., trajectory, trajectories, trajectory planning, trajectory optimization, vehicle trajectories, motion planning) were consolidated to form unified groups. Using VOSviewer with a minimum occurrence threshold of two, we obtained 105 significant keywords whose clustered relationships reflect major research directions. Table 4 summarizes prominent grouped keywords, their occurrence counts, and link strengths.
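The consolidation of keyword variations described above can be sketched as a simple synonym mapping applied before counting. The variant list is taken from the text; the canonical label "trajectory", the `SYNONYMS` table, and the sample records are assumptions for illustration, and the authors' actual grouping procedure may differ.

```python
from collections import Counter

# Hypothetical synonym table: each variant maps to one unified group label.
SYNONYMS = {
    "trajectories": "trajectory",
    "trajectory planning": "trajectory",
    "trajectory optimization": "trajectory",
    "vehicle trajectories": "trajectory",
    "motion planning": "trajectory",
}


def consolidate(keywords):
    """Return the set of unified keyword labels for one paper."""
    unified = set()
    for keyword in keywords:
        normalized = keyword.strip().lower()
        unified.add(SYNONYMS.get(normalized, normalized))
    return unified


# Illustrative records only, not entries from the curated dataset.
records = [
    ["Trajectory Planning", "UAV"],
    ["trajectories", "motion planning", "AoI"],
]
counts = Counter(k for record in records for k in consolidate(record))
```

Because each paper contributes a set of labels, a paper listing two variants of the same concept is counted once for the unified group.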

Figure 9: Keyword co-occurrence map for trajectory optimization using VOSviewer. Larger nodes indicate more frequent keywords, while link strength reflects co-occurrence intensity. Prominent clusters correspond to UAV mobility, RL-based trajectory planning, AoI-driven optimization, and multi-UAV coordination.

Several patterns emerge:
• Dominance of UAV-related terms: Trajectory optimization is strongly associated with UAVs due to their mobility, making them central to NTN-assisted IoT research.
• Reinforcement learning prevalence: RL and DRL feature prominently as many trajectory optimization problems involve high-dimensional or partially unknown environments.
• AoI and freshness concerns: Frequent co-occurrence of AoI indicates strong interest in freshness-aware trajectory design.
• Multi-agent learning trends: Keywords related to multi-agent DRL highlight the importance of decentralized and cooperative UAV strategies.
• Interconnected optimization tasks: The presence of keywords tied to data collection, MEC, security, resource allocation, bandwidth management, and fairness demonstrates the multifaceted nature of trajectory optimization-related problems.
To illustrate the core methodological patterns and challenges observed in the literature, we highlight several highly cited and representative anchor papers from the trajectory dataset. These papers exhibit distinct problem formulations and techniques but collectively provide some insights into the landscape of ML-based trajectory optimization in NTN–IoT systems. A summary of these works is provided in Table 5.
Liu et al. [77]—ML-Based Maritime Trajectory Reconstruction. This paper proposes a two-phase machine learning framework for denoising and reconstructing vessel trajectories from AIS data. First, noise points are filtered using density clustering to remove anomalies common in maritime sensing. Next, a bidirectional LSTM (BLSTM) model reconstructs missing or corrupted trajectory segments by learning temporal dependencies. This contribution is significant because mobility prediction and reconstruction are essential for reliable IoT-driven maritime monitoring, routing, and safety systems. The work highlights the usefulness of supervised sequence models in capturing real-world mobility behaviors.
Liu et al. [78,79]—Deep Learning for Dynamic Trajectory Forecasting. The proliferation of Maritime Internet of Things (MIoT) applications has necessitated advanced forecasting models to handle the vast volumes of data generated by satellite-terrestrial Automatic Identification System (AIS) stations. To enhance maritime safety and operational efficiency, Liu et al. [78] propose an AIS-driven vessel trajectory prediction framework based on Long Short-Term Memory (LSTM) networks. A key innovation in this work is the integration of vessel traffic conflict modeling—derived from the “social force” concept—directly into the LSTM architecture. By utilizing a reconstructed mixed loss function, the framework ensures robust performance across diverse navigation environments, providing reliable predictions for collision avoidance and anomaly detection.
Advancing beyond traditional recurrent architectures, Liu et al. [79] introduce the Spatio-Temporal Multigraph Convolutional Network (STMGCN) for trajectory prediction in MEC environments. This framework addresses the complex dependencies in maritime traffic by integrating three distinct graph structures based on social force, the time to the closest point of approach, and vessel size. These graphs are fused through a spatio-temporal multigraph convolutional layer, complemented by a self-attention temporal convolutional layer to maintain model efficiency. The resulting STMGCN framework offers superior interpretability and predictive accuracy, contributing to more sophisticated traffic management in MEC-enabled maritime IoT systems.
Zhu et al. [69]—DRL for UAV-Assisted WSN Data Collection. This work addresses energy-efficient and time-efficient UAV path planning for data collection from wireless sensor networks (WSNs). By modeling UAV movement and data collection as a sequential decision process, the authors employ deep reinforcement learning to jointly minimize mission duration and flight energy consumption. Empirical results show that the DRL policy significantly outperforms heuristic baselines, particularly under varying WSN topologies and traffic conditions. The paper illustrates the suitability of DRL for online trajectory control in unpredictable IoT environments.
Bayerlein et al. [72]—Multi-UAV Cooperative Data Collection. Here, the authors explore cooperative path planning for multiple UAVs serving as data collectors for sensor networks. A multi-agent framework is developed to coordinate UAV trajectories while reducing latency and avoiding redundancy. The contribution highlights how decentralized or partially observable scenarios benefit from multi-agent reinforcement learning, particularly when multiple UAVs must navigate shared airspace and communication constraints.
Wang et al. [56]—Joint Trajectory and MEC Offloading Optimization. This paper integrates UAV trajectory planning with bandwidth allocation and MEC computation scheduling. The joint optimization framework ensures timely offloading of IoT tasks, reducing latency and energy consumption on resource-limited devices. The trajectory emerges as a critical control variable, influencing channel quality, computation offloading feasibility, and overall communication efficiency.
Abedin et al. [80]—Age-of-Information-Aware Trajectory Planning. Focusing on AoI minimization, this work develops trajectory policies that maintain data freshness for delay-sensitive IoT applications. AoI is embedded directly into trajectory control decisions, leading to flight paths that dynamically adjust to information arrival times and sensor spatial distributions. This contribution demonstrates the rise of freshness-driven optimization in UAV-assisted IoT.
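To make the AoI notion concrete, the sketch below uses one common discrete-time convention: a node's age resets to 1 in the slot where the UAV collects its data and otherwise grows by 1 per slot. This convention, the `step_aoi` helper, and the three-node visiting schedule are assumptions for illustration and are not necessarily the model used in [80].

```python
def step_aoi(aoi, collected):
    """Advance per-node AoI by one time slot.

    `collected` is the set of node indices whose data the UAV gathers
    in this slot; their age resets to 1, all others age by 1.
    """
    return [1 if i in collected else age + 1 for i, age in enumerate(aoi)]


# Hypothetical schedule: visit node 0, then node 2, then no node.
aoi = [1, 1, 1]
history = []
for collected in [{0}, {2}, set()]:
    aoi = step_aoi(aoi, collected)
    history.append(list(aoi))

average_aoi = sum(aoi) / len(aoi)
```

Node 1 is never visited in this schedule, so its age grows in every slot; a freshness-aware trajectory policy would be rewarded for routing the UAV toward such nodes.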
6.1.1 Computational Complexity and Practicality
This subsection examines the computational complexity and deployment practicality of the studied trajectory-optimization approaches, with particular emphasis on training cost, scalability, and real-time feasibility, as explicitly reported in the reviewed studies.
Across the examined literature, two dominant methodological categories can be identified. The first comprises reinforcement-learning-based approaches that learn trajectory control policies under dynamic network states or observations. The second consists of supervised deep learning models applied to trajectory reconstruction or forecasting, most notably in maritime IoT scenarios.
For DRL-based trajectory optimization and control (e.g., [56,69,80]), the dominant computational burden is incurred during training. In these studies, trajectory optimization is typically formulated as a mixed-integer, non-convex, or constrained combinatorial problem, which is NP-hard. As the number of IoT nodes, clusters, or trajectory waypoints increases, the corresponding solution space grows exponentially, rendering direct optimization difficult. This complexity is therefore transferred to the learning phase, where training cost scales with the number of training episodes, the episode length (decision horizon), and the dimensionality of the adopted state and action representations.
In practice, reported state representations are high-dimensional and heterogeneous, and are often dominated by node-level indicators. These may include per-node coverage states, energy or service-status variables, and freshness-related metrics, while other dimensions correspond to UAV-specific variables such as position or pheromone state. This feature dimensionality imbalance becomes more pronounced as network size increases and can significantly hinder convergence. To mitigate this effect and stabilize learning, several studies introduce architectural mechanisms such as learned feature embeddings implemented using one or more linear projection layers, whereby low-dimensional UAV-related features are mapped into higher-dimensional latent representations proportional to the number of nodes. While these designs improve learning stability, they also increase training complexity.
Reward design further contributes to training difficulty. In data-collection and AoI-constrained scenarios, multiple studies report sparse or discontinuous reward signals, where violations of coverage, energy, or freshness constraints result in zero or heavily penalized rewards. During early training, agents therefore explore largely at random and receive limited feedback, which slows convergence and destabilizes critic learning as network scale grows. To address this issue, reward-shaping mechanisms are commonly employed to transform sparse rewards into denser learning signals, reducing convergence time at the expense of additional design complexity and computational overhead.
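One well-known way to densify a sparse reward is potential-based shaping, which adds γΦ(s′) − Φ(s) to the environment reward. The sketch below is an illustration under stated assumptions: the potential Φ (negative distance from the UAV to the nearest unserved node), the fixed node set, and the coordinates are hypothetical, and the reviewed studies may use different shaping schemes.

```python
import math


def potential(uav_xy, unserved_nodes):
    """Assumed potential: negative distance to the nearest unserved node."""
    if not unserved_nodes:
        return 0.0
    return -min(math.dist(uav_xy, node) for node in unserved_nodes)


def shaped_reward(sparse_reward, state, next_state, unserved_nodes, gamma=0.99):
    """Add the potential-based shaping term gamma * phi(s') - phi(s)."""
    return (
        sparse_reward
        + gamma * potential(next_state, unserved_nodes)
        - potential(state, unserved_nodes)
    )


# Hypothetical scenario: one unserved node 10 m east of the UAV,
# and a sparse environment reward of zero for both moves.
nodes = [(10.0, 0.0)]
toward = shaped_reward(0.0, (0.0, 0.0), (2.0, 0.0), nodes)
away = shaped_reward(0.0, (0.0, 0.0), (-2.0, 0.0), nodes)
```

Even though the environment reward is zero in both cases, moving toward the unserved node receives a positive signal and moving away a negative one, which gives the agent feedback before any node is actually served.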
As a consequence of these factors, training in DRL-based trajectory studies is consistently performed in a centralized manner, typically on ground servers or cloud platforms equipped with GPUs. Centralized training enables the use of large replay buffers, repeated neural network updates, and experience pooling, and is reported as necessary to handle long training horizons and high-dimensional state spaces. The learned actor or policy networks are deployed on UAVs only after training has been completed.
Once trained, online execution is generally reported to be lightweight. Trajectory decisions are obtained through a forward pass of the trained neural network, involving matrix multiplications and activation functions. This inference process is feasible on onboard UAV processors or edge nodes and is substantially less demanding than solving the original NP-hard optimization problem online. Nevertheless, policy execution is inherently reactive and requires continuous state updates, including UAV position, node service status, energy levels, or AoI indicators. Several studies note that if observation updates lag behind environmental dynamics, trajectory decisions may deviate from near-optimal behavior, potentially necessitating retraining or adaptation.
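To make the inference cost concrete, the following sketch runs a forward pass through a small fully connected actor and counts its multiply-accumulate operations. The layer sizes, the tanh-bounded two-dimensional action, and the random weights are illustrative assumptions rather than an architecture reported in the cited works.

```python
import numpy as np

def init_params(sizes, seed=0):
    """Random weights and zero biases for a fully connected network (stand-in for a trained actor)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def policy_forward(state, params):
    """ReLU hidden layers followed by a tanh output that bounds each action in [-1, 1]."""
    h = state
    for W, b in params[:-1]:
        h = np.maximum(0.0, W @ h + b)
    W_out, b_out = params[-1]
    return np.tanh(W_out @ h + b_out)

params = init_params([64, 128, 128, 2])   # hypothetical 64-dimensional state
state = np.zeros(64)
action = policy_forward(state, params)    # e.g. normalized heading and speed
macs = sum(W.size for W, _ in params)     # multiply-accumulates per decision
print(action.shape, macs)                 # (2,) 24832
```

A few tens of thousands of multiply-accumulates per decision is small compared with re-solving a combinatorial problem at every step, which is consistent with the lightweight execution reported above.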
In combinatorial trajectory planning under energy and wireless power transfer constraints (e.g., [69]), training complexity is further increased by coupled decisions such as cluster-head selection and visiting-order determination. These decisions resemble traveling-salesman-type problems and are computationally prohibitive for large networks. To improve deployment practicality, some works adopt sequence-learning architectures, such as pointer networks, which learn visitation sequences rather than fixed spatial coordinates. This design enables models trained on small-scale problem instances to generalize to larger networks without retraining, thereby reducing redeployment overhead when network size changes.
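The size-independence of pointer-style decoding can be sketched as follows. This is a simplified, untrained attention decoder with greedy selection and masking, not the Pointer Network with A* search of [69]; the projection matrices `W_q` and `W_k` and the embedding width are hypothetical.

```python
import numpy as np

def pointer_decode(node_xy, W_q, W_k, start=0):
    """Greedy pointer-style decoding: at each step, score all unvisited nodes and pick the best."""
    n = node_xy.shape[0]
    keys = node_xy @ W_k.T                 # (n, d) key per node
    visited = np.zeros(n, dtype=bool)
    order = [start]
    visited[start] = True
    for _ in range(n - 1):
        query = W_q @ node_xy[order[-1]]   # (d,) query from the current node
        scores = keys @ query              # (n,) attention scores
        scores[visited] = -np.inf          # mask nodes already in the tour
        nxt = int(np.argmax(scores))
        order.append(nxt)
        visited[nxt] = True
    return order

rng = np.random.default_rng(1)
d = 8
W_q = rng.normal(size=(d, 2))              # parameter shapes do not depend on n
W_k = rng.normal(size=(d, 2))

small = pointer_decode(rng.random((5, 2)), W_q, W_k)
large = pointer_decode(rng.random((20, 2)), W_q, W_k)
print(len(small), len(large))              # 5 20
```

Because the output is a sequence of indices into the input set, the same parameters apply to 5 or 20 nodes. Tour quality on larger instances still depends on training and is not demonstrated by this sketch.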
For multi-UAV trajectory coordination (e.g., [72]), scalability emerges as a central challenge. As the number of UAVs increases, joint state representations and inter-agent interactions introduce non-stationarity during training, since each agent’s policy evolves concurrently. Centralized training with pooled experience is commonly adopted to address coordination, but this substantially increases training time and computational cost, with reported implementations involving millions of interaction steps on GPU-equipped servers.
To maintain deployment feasibility, these studies emphasize decentralized execution. After centralized training, control policies are deployed individually on UAVs, which perform inference locally using compressed or partially observed state representations. Techniques such as global–local map decomposition and representation compression are used to reduce neural network size and floating-point operations, resulting in inference workloads that are compatible with modern embedded processors on small and energy-limited UAVs.
In contrast, supervised deep learning approaches for trajectory reconstruction and forecasting in maritime IoT contexts (e.g., [77–79]) incur their dominant computational cost during offline training. Training complexity is driven by dataset size, the number of training epochs, and architectural choices such as recurrent units, spatio-temporal convolutions, or graph-based representations that explicitly model vessel interactions. Several works further discuss GPU-based computing frameworks to accelerate trajectory-related computation, particularly when processing large-scale AIS datasets.
Once trained, inference in these supervised models is highly parallelizable and supports near-real-time operation on edge or cloud nodes, making them suitable for latency-sensitive maritime services such as collision avoidance and traffic monitoring. However, the reviewed studies consistently report sensitivity to dataset bias and distribution shift, especially in trajectory prediction applications. Changes in routes, traffic density, seasonal patterns, abnormal events, or sensor noise can degrade prediction accuracy unless models are periodically updated or retrained using new data.
Overall, across the reviewed trajectory optimization studies, inference cost is rarely identified as the primary deployment bottleneck. Instead, training stability under sparse or high-dimensional state representations, scalability with increasing network or agent count, retraining requirements under changing operational conditions, and data quality or generalization robustness emerge as the dominant factors influencing practical feasibility.
Across these top-cited papers in the trajectory category, several insights emerge:
• Sequence modeling (e.g., BLSTM networks) is effective for prediction and reconstruction in mobility-dominated IoT environments.
• DRL is widely adopted for online trajectory planning under uncertainty, enabling UAVs to adapt to dynamic IoT deployments and wireless conditions.
• Multi-agent RL supports cooperative trajectory optimization in multi-UAV scenarios, addressing scalability and decentralized control.
• Cross-layer coupling appears repeatedly, as trajectory decisions influence energy consumption, communication quality, MEC task offloading, and AoI.
• Application diversity is broad, ranging from WSN data collection to maritime IoT, MEC, and freshness-sensitive monitoring.
These anchor papers collectively map the core algorithmic landscape of ML-driven trajectory optimization and establish the foundation for subsequent cross-theme analysis.
Efficient resource allocation is a fundamental component for optimizing the performance of NTN-assisted IoT networks, particularly in UAV-enabled MEC environments. Fig. 10 illustrates the keyword co-occurrence landscape for this category, generated using VOSviewer with a minimum keyword occurrence threshold of three. Related terms such as resource management, resource allocation, and scheduling algorithms are consolidated under the main theme of resource allocation, yielding a densely connected network of seventy keywords. The results highlight strong associations between resource allocation and computation offloading, task scheduling, mobile edge computing, and UAV-assisted networking.

Figure 10: Keyword co-occurrence map for resource allocation. Clusters highlight relationships between MEC-enabled offloading, UAV-supported communication, dynamic resource management, channel and task allocation, and RL-based scheduling.
The bibliometric analysis reveals that resource allocation is frequently studied in conjunction with optimization objectives such as latency reduction, energy efficiency, QoS, AoI, and throughput maximization. Prominent solution approaches include Lyapunov optimization and a wide range of learning-based techniques, particularly DRL, MADRL, federated learning, and deep neural networks. Enabling technologies such as blockchain, network slicing, NOMA, RIS, and digital twins further indicate the growing complexity and heterogeneity of resource management problems in NTN-assisted IoT systems. The presence of satellite IoT and SAGIN-related keywords reflects the extension of resource allocation challenges beyond purely aerial platforms to multi-tier space–air–ground architectures.
The most prominent keywords include resource allocation, Internet of Things, deep reinforcement learning, mobile edge computing, edge computing, and unmanned aerial vehicles. Other highly connected terms such as task offloading, energy efficiency, federated learning, and multi-agent deep reinforcement learning demonstrate that resource allocation is intrinsically linked with computation, communication, mobility, and energy optimization in NTN-assisted IoT systems.
To provide insight into this area, we highlight several highly cited and representative anchor papers from the curated dataset. These works collectively demonstrate how joint optimization, edge intelligence, and ML-driven decision-making shape the resource allocation landscape. A summary of these works is provided in Table 6.
6.2.1 Learning-Based Resource Allocation and Offloading in UAV-Assisted MEC
Cheng et al. [81]—DRL-Based Resource Allocation and Task Scheduling in SAGIN. This work proposes a space–air–ground integrated network architecture in which UAVs act as edge computing servers while satellites provide cloud access. The authors formulate a joint task offloading and resource scheduling problem for UAV edge servers, where computing resources are allocated to virtual machines handling offloaded IoT tasks. A deep reinforcement learning framework optimizes offloading and scheduling decisions, achieving fast convergence and reduced system cost. The results demonstrate the suitability of DRL for handling the hierarchical and dynamic resource allocation challenges inherent in SAGIN-enabled IoT systems.
Seid et al. [84]—Multi-Agent DRL for UAV-Assisted IoT Edge Networks. This paper introduces a multi-agent deep reinforcement learning approach for joint task offloading and resource allocation in UAV-assisted IoT networks. Each UAV learns decentralized policies to manage computation and communication resources under dynamic traffic and channel conditions. The proposed framework reduces computation cost, improves QoS, and outperforms conventional optimization and heuristic methods, highlighting the scalability benefits of decentralized learning in multi-UAV resource management.
Seid et al. [83]—Collaborative DRL-Based Offloading and Resource Allocation for 5G EIoT. This paper proposes a collaborative computation offloading and resource allocation framework for 5G-enabled IoT systems using UAVs as aerial base stations. A model-free DRL approach enables UAVs to dynamically coordinate offloading decisions and allocate communication and computing resources under highly dynamic network conditions. The framework minimizes task execution delay and energy consumption, outperforming conventional DRL baselines and greedy strategies, and illustrates the benefits of collaboration in UAV-assisted 5G IoT scenarios.
Asheralieva and Niyato [87]—Hierarchical RL for Pricing and Resource Management in Blockchain-Enabled IoT. This paper studies joint pricing and resource management in an IoT system integrating blockchain-as-a-service (BaaS) and mobile edge computing, where UAVs and base stations execute blockchain-related tasks. The interaction among service providers and users is modeled as a stochastic Stackelberg game, and a hierarchical reinforcement learning framework combining deep Q-learning and Bayesian deep learning is proposed. The approach achieves stable convergence and efficient resource utilization, demonstrating how economic incentives and learning-based control can jointly shape resource allocation in UAV-assisted MEC systems.
Wan et al. [88]—Path Planning and Resource Management of UAV Base Stations in MEC-Enabled IoT. This work presents a three-layer MEC-based IoT architecture consisting of distributed sensors, UAV base stations with onboard edge servers, and a central cloud. Lyapunov optimization is employed to manage energy consumption and data offloading, while a deep reinforcement learning framework optimizes UAV path planning to improve service coverage. The results show that jointly optimizing mobility and resource allocation significantly enhances coverage stability and reduces service urgency, emphasizing the interdependence between trajectory control and resource management.
Nguyen et al. [89]—Joint Trajectory and Resource Optimization for UAV-Assisted IoT via DRL. This paper investigates a UAV-assisted IoT system that jointly optimizes flight trajectory, data collection, and resource utilization. A DRL algorithm learns optimal control policies that maximize throughput while minimizing resource consumption. Simulation results demonstrate improvements in sum-rate, trajectory efficiency, and overall network performance, reinforcing the strong coupling between trajectory design and resource allocation in UAV-enabled IoT systems.
6.2.2 Hybrid and Multi-Tier MEC Architectures
Jiang et al. [82]—Hybrid Deep-Learning-Based Online Offloading Framework (H2O). This work proposes a hybrid MEC architecture integrating ground stations, ground vehicles, and UAVs for IoT task offloading. The authors develop a hybrid deep-learning-based framework that jointly optimizes node positioning, user association, and computing resource allocation to minimize energy consumption. Large-scale path-loss fuzzy c-means clustering predicts UAV and vehicle positions, while particle swarm optimization and deep neural networks enable near-real-time offloading decisions. The framework demonstrates how hybrid learning–heuristic pipelines can balance solution optimality and computational efficiency in dynamic MEC systems.
Cui et al. [86]—Joint User Association, Offloading, and Resource Allocation in Satellite MEC IoT. This paper extends MEC-enabled IoT services to remote and sparsely populated regions through multi-satellite networks. IoT tasks can be offloaded either to satellites with limited computing capacity but low propagation delay, or to gateways with abundant resources but higher latency. The authors decompose the optimization problem into a convex resource allocation subproblem and a Markov decision process for joint user association and offloading, solved via DRL. The proposed approach minimizes system cost (latency and energy) while adapting to dynamic satellite and gateway resource availability.
6.2.3 Secure and Online Resource Allocation
Ding et al. [90]—Online Edge Learning Offloading for Secure UAV-Assisted MEC. This work presents an online edge learning offloading (OELO) scheme for secure computation in UAV-assisted MEC systems. A UAV server processes dynamically arriving tasks while adversarial UAV eavesdroppers and ground jammers model security threats. The framework jointly optimizes binary offloading decisions, time allocation, transmit power, and local computation using DRL combined with successive convex approximation and Dinkelbach’s method. The results demonstrate near-optimal secure computation efficiency, queue stability, and robustness under dynamic task arrivals.
6.2.4 Protocol-Level Resource Management
Hu et al. [91]—RL-Based Protocol Design and Resource Management in Cellular Internet of UAVs. This paper proposes a distributed sense-and-send protocol for the cellular Internet of UAVs, integrating multiple reinforcement learning techniques across protocol layers. Q-learning is used for trajectory adaptation, multi-armed bandit learning for user association, actor–critic methods for power control, and deep RL for subchannel allocation. The proposed protocol achieves faster convergence and higher average rewards than single-agent learning schemes, illustrating the effectiveness of multi-level RL for joint trajectory control and resource management.
6.2.5 Computational Complexity and Practicality
This subsection analyzes the computational complexity and deployment practicality of representative resource allocation approaches in NTN-assisted IoT systems, with emphasis on training overhead, scalability under network growth, and real-time feasibility, as reported in the reviewed studies.
Across the examined literature, resource allocation problems are commonly formulated as dynamic mixed-integer or continuous optimization problems involving task offloading decisions, computing and communication resource allocation, power control, and user association. Two dominant methodological paradigms emerge: (i) reinforcement-learning-based frameworks that learn online allocation policies under dynamic system states, and (ii) hybrid approaches that combine learning-based components with classical optimization or heuristic decomposition.
For single-agent DRL-based resource allocation and task offloading (e.g., [81,83]), the dominant computational cost is incurred during the offline training phase. In these studies, UAV mobility, dynamic virtual machine allocation, and time-varying channel conditions make the reward function and state transition probabilities difficult to model explicitly. As the number of users and tasks increases, the system state and action spaces grow rapidly, making conventional tabular methods or direct optimization increasingly impractical. Consequently, the optimization problem is converted into a policy function approximation learned by a deep neural network, shifting computational complexity to training.
Training complexity in these frameworks scales with the dimensionality of the state and action spaces, the length of training episodes, and the degree of environmental non-stationarity induced by UAV movement and workload variation. Several works report that standard policy-gradient methods, such as REINFORCE, suffer from high variance and slow convergence under large state–action spaces, motivating the use of actor–critic architectures. In such designs, a critic network guides the actor’s policy updates, reducing variance and enabling faster convergence, which is particularly beneficial for fast reconfiguration when user density or task arrival patterns change.
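The variance argument can be checked exactly on a toy problem. The sketch below computes the mean and variance of the score-function (REINFORCE) gradient for a two-action softmax bandit, with and without a baseline. A constant baseline stands in for the critic's value estimate; the rewards and probabilities are assumed values.

```python
import numpy as np

def grad_estimator_stats(probs, rewards, baseline):
    """Exact mean and variance of (R(a) - baseline) * grad log pi(a) for a softmax policy."""
    probs = np.asarray(probs, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    n = probs.size
    # Row a holds the gradient estimate obtained when action a is sampled;
    # for a softmax policy, grad log pi(a) with respect to the logits is e_a - probs.
    samples = np.array([(rewards[a] - baseline) * (np.eye(n)[a] - probs)
                        for a in range(n)])
    mean = probs @ samples
    var = probs @ (samples - mean) ** 2
    return mean, var

probs, rewards = [0.5, 0.5], [10.0, 11.0]
mean_plain, var_plain = grad_estimator_stats(probs, rewards, baseline=0.0)
mean_base, var_base = grad_estimator_stats(probs, rewards, baseline=10.5)
print(mean_plain, var_plain)   # same expected gradient, large variance
print(mean_base, var_base)     # same expected gradient, variance removed
```

Both estimators have the same expectation, so subtracting the baseline does not bias the update; it only removes variance, which is the role the critic plays in actor–critic designs.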
As a result of these factors, training is typically performed in a centralized manner on remote cloud servers or central controllers with high computational and storage capabilities. Once training is completed, the learned policy is deployed on UAV edge servers or cluster heads, where online inference reduces to a forward pass through the trained network. This inference process involves simple algebraic operations and matrix multiplications, enabling real-time offloading and resource allocation decisions under dynamic network conditions.
For multi-agent resource allocation scenarios involving multiple UAVs or edge nodes (e.g., [84]), scalability becomes a central concern. These studies adopt multi-agent DRL frameworks, such as MADDPG, to capture inter-agent coupling and cooperative behavior. While decentralized execution ensures that the online computational complexity of each agent remains independent of the total number of agents, training complexity increases significantly due to interaction-induced non-stationarity. During training, each agent’s environment evolves as other agents update their policies, requiring a centralized critic operating on the joint state–action space. As the number of agents grows, the critic input dimension increases accordingly, leading to higher training time and computational overhead.
To maintain deployment feasibility, these works emphasize centralized training with decentralized execution. After training, critic networks are decoupled and only lightweight actor networks are deployed on individual UAVs or edge devices. Each agent then performs inference using local observations, enabling scalable and distributed decision-making without continuous inter-agent communication during operation.
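A short sketch makes the asymmetry between training and execution explicit. It assumes a MADDPG-style centralized critic that concatenates every agent's observation and action; the observation and action widths are hypothetical.

```python
def ctde_input_dims(num_agents, obs_dim, act_dim):
    """Input widths under centralized training with decentralized execution."""
    actor_in = obs_dim                             # each actor sees only its local observation
    critic_in = num_agents * (obs_dim + act_dim)   # critic sees joint observations and actions
    return actor_in, critic_in

for n in (2, 4, 8):
    print(n, ctde_input_dims(n, obs_dim=20, act_dim=3))
# 2 (20, 46)
# 4 (20, 92)
# 8 (20, 184)
```

The actor input stays fixed as agents are added, while the critic input grows linearly. Only the critic is discarded at deployment, so this growth affects training cost but not onboard inference.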
In cooperative UAV-enabled MEC systems with global state awareness (e.g., [85]), resource allocation performance is further improved by modeling the system state of the entire network. However, this improvement comes at the cost of increased state representation complexity. High-dimensional, noisy, and redundant IoT observations necessitate state representation learning to derive compact latent representations. Several studies incorporate additional loss terms, such as encoding loss, reward prediction loss, stability loss, and diversity loss, to stabilize representation learning during training. While this improves convergence and overall utility, it substantially increases offline training complexity at centralized entities such as virtual system operators.
Hybrid learning–optimization frameworks (e.g., [82,86]) offer alternative trade-offs between complexity and practicality. By decomposing the original NP-hard mixed-integer problem into subproblems, such as positioning, clustering, user association, and continuous resource allocation, these approaches exploit classical optimization methods where tractable and learning-based models where adaptation is required. For example, convex subproblems can be solved optimally using Lagrangian methods under fixed associations, while DRL or DNN models handle discrete or high-level decisions. This decomposition significantly reduces online runtime compared to end-to-end DRL, enabling near-instantaneous inference once models are trained.
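As an example of a convex subproblem that admits a closed-form solution under a fixed association, consider minimizing the total execution delay, the sum of c_i / f_i, subject to the allocated frequencies f_i summing to a capacity F. The stationarity condition of the Lagrangian gives f_i proportional to the square root of c_i. This toy formulation and its numbers are assumptions, not the exact subproblem of [82] or [86].

```python
import math

def allocate_cpu(cycles, total_freq):
    """Minimize sum(c_i / f_i) subject to sum(f_i) = total_freq; KKT gives f_i ~ sqrt(c_i)."""
    roots = [math.sqrt(c) for c in cycles]
    scale = total_freq / sum(roots)
    return [scale * r for r in roots]

def total_delay(cycles, freqs):
    """Sum of per-task execution delays in seconds."""
    return sum(c / f for c, f in zip(cycles, freqs))

cycles = [1e9, 4e9, 9e9]   # hypothetical task sizes in CPU cycles
F = 6e9                    # hypothetical server capacity in cycles per second

opt = allocate_cpu(cycles, F)
equal = [F / len(cycles)] * len(cycles)
print(total_delay(cycles, opt), total_delay(cycles, equal))
```

Here the closed form allocates 1, 2, and 3 GHz and lowers the total delay from 7 s under an equal split to 6 s, at negligible runtime. The learning component then only has to handle the discrete association and offloading decisions.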
In dynamic settings, however, the practicality of hybrid approaches depends on the frequency of clustering, association, or topology updates. Changes in user population or network geometry may require re-execution of clustering or heuristic components, which can dominate overhead if updates occur frequently. To mitigate this effect, some designs limit neural network inputs to low-dimensional membership or association vectors, keeping inference complexity independent of the total number of users.
For secure and satellite-enabled MEC scenarios (e.g., [90]), additional constraints such as eavesdropping mitigation, queue stability, and secure computation efficiency further increase computational complexity. These problems are often characterized by fractional objective functions involving binary offloading decisions coupled with continuous resource variables. To handle this complexity, studies combine DRL for discrete decision-making with successive convex approximation (SCA) for resource allocation. While online learning enables adaptation to stochastic task arrivals and channel variations, the inclusion of secrecy constraints and iterative optimization increases computational overhead relative to models without security constraints.
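Fractional objectives of this kind are commonly handled with Dinkelbach's method, which [90] combines with SCA. The sketch below applies it to a deliberately simple scalar problem, maximizing rate per unit power for a single link. The channel gain, circuit power, and power limit are assumed values, and secrecy terms are omitted.

```python
import math

def dinkelbach_energy_efficiency(h, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Maximize log2(1 + h*p) / (p + p_circuit) over 0 <= p <= p_max (p_circuit > 0)."""
    lam = 0.0
    p = p_max
    for _ in range(max_iter):
        # Inner problem: maximize log2(1 + h*p) - lam * (p + p_circuit), concave in p.
        # Setting the derivative to zero gives p = 1 / (lam * ln 2) - 1 / h, then clip.
        if lam > 0:
            p = min(max(1.0 / (lam * math.log(2)) - 1.0 / h, 0.0), p_max)
        else:
            p = p_max
        rate = math.log2(1.0 + h * p)
        gap = rate - lam * (p + p_circuit)   # value of the parametric problem at lam
        lam = rate / (p + p_circuit)         # Dinkelbach update of the ratio
        if abs(gap) < tol:
            break
    return p, lam

p_opt, ee = dinkelbach_energy_efficiency(h=10.0, p_circuit=0.1, p_max=2.0)
print(round(p_opt, 4), round(ee, 4))
```

Each outer iteration requires solving an inner problem. In the secure MEC setting that inner problem is itself non-convex and is approximated iteratively, which is where the additional overhead noted above arises.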
Overall, across the reviewed resource allocation literature, inference complexity during deployment is generally low and compatible with real-time operation on UAV or edge platforms. Instead, practical feasibility is primarily constrained by offline training cost, scalability with increasing numbers of users or agents, convergence behavior under non-stationary environments, and the frequency with which models must be updated to reflect changes in workload, topology, or security conditions.
Across the top-cited resource allocation papers in NTN-assisted IoT, several consistent observations emerge:
• Joint computation offloading and resource allocation is a dominant design paradigm, as communication, computation, and mobility decisions are strongly interdependent in UAV-assisted MEC and SAGIN architectures. Learning-based frameworks are commonly used to coordinate these decisions under dynamic workloads and time-varying wireless conditions.
• Deep reinforcement learning is widely adopted for online resource management, enabling adaptive allocation of bandwidth, transmit power, and computing resources in non-stationary IoT environments. DRL-based approaches are particularly effective in scenarios where analytical optimization becomes intractable due to system dynamics and coupling across layers.
• Multi-agent deep reinforcement learning (MADRL) supports scalable and decentralized resource allocation in multi-UAV and multi-tier NTN deployments. By distributing decision-making across agents, MADRL mitigates centralized bottlenecks while accommodating partial observability and coordination among aerial platforms.
• Hybrid ML–optimization frameworks are frequently employed to balance solution quality and computational efficiency. Many studies combine learning-based prediction or policy optimization with classical solvers, heuristics, or convex optimization to enable near-real-time decision-making in large-scale MEC systems.
• Multi-tier and heterogeneous architectures, including hybrid ground–aerial MEC and satellite-enhanced IoT, introduce latency–capacity trade-offs that fundamentally shape resource allocation strategies. Joint optimization of user association, task offloading, and resource provisioning becomes essential in these settings.
• Security-aware and online learning approaches extend resource allocation beyond performance optimization by accounting for adversarial behavior, privacy constraints, and continuously arriving tasks. Online edge learning frameworks demonstrate the ability to maintain stability and efficiency under dynamic and potentially hostile operating conditions.
• Cross-layer coupling repeatedly appears across the literature, as resource allocation decisions frequently dictate UAV trajectories, energy consumption, and QoS, depending on the optimization objective. Effective resource management therefore requires integrated consideration of communication, computation, mobility, and security dimensions.
Energy consumption management and optimization are critical in NTN-assisted IoT applications, particularly those involving UAVs. Table 7 summarizes the most frequently cited studies in this category. The findings emphasize the pivotal role of AI, especially advanced learning techniques, in minimizing energy consumption while jointly optimizing other performance metrics such as latency, throughput, and data rates in UAV- and satellite-assisted communication and computation offloading systems.
The CSV file for the Energy Utilization category was loaded into VOSviewer for keyword co-occurrence analysis, with the results shown in Table 8 and Fig. 11. The default VOSviewer parameters were used with a minimum keyword occurrence threshold of three, yielding a total of 98 unfiltered, connected keywords. These keywords relate to technologies involved in architectures where energy efficiency, energy consumption, energy harvesting, or other energy considerations have been addressed within the NTN-assisted IoT framework, or to supporting technologies that contribute to energy efficiency. These include 5G, 6G, blockchain, cloud computing, cognitive radio, mobile edge computing, digital twin, fog computing, intelligent reflecting surfaces, and NOMA. Architectures supporting energy harvesting can also be observed, such as wireless power transfer (WPT), wireless powered communication networks (WPCN), and simultaneous wireless information and power transfer (SWIPT). Energy considerations are most often found in UAV-assisted architectures, although satellite-based energy-aware setups also appear.


Figure 11: Keyword co-occurrence map for energy utilization. Key aspects include energy-aware trajectory design, energy harvesting, wireless power transfer, and DRL-based energy-efficient control for UAV–IoT systems as well as the presence of joint energy optimization frameworks.
Themes emerging from the dataset also include performance requirements and metrics such as Age of Information, latency, and reliability. Other recurring research directions involve data collection, security, physical layer security, computation or task offloading, resource allocation, power control, coverage, scheduling, trajectory design, and optimization. Applications include smart farms, and some problems are addressed using multi-objective optimization. Optimization approaches incorporate game theory and AI-based techniques. For machine learning and AI, commonly studied methods include reinforcement learning, with observable variants such as Q-learning, multi-agent reinforcement learning, deep reinforcement learning, and multi-agent deep reinforcement learning. Federated learning also appears frequently within frameworks addressing energy management or utilization, alongside methods like attention mechanisms.
Networking paradigms appear prominently, with ‘Internet of Things (IoT)’ and ‘wireless sensor networks (WSNs)’ featuring strongly. This indicates that AI-driven, NTN-assisted technologies support energy efficiency not only in IoT but also in wireless sensor networks. Fig. 11 provides a broad thematic classification of several of the keywords related to energy utilization.
The analysis shows that a small set of keywords dominates in both frequency and connectivity, indicating their centrality in the literature on energy utilization in NTN-assisted IoT. In terms of occurrences, the most prominent include unmanned aerial vehicle (UAV) (62 occurrences, link strength 162), UAV (52, 152), Internet of Things (47, 129), resource allocation (35, 102), reinforcement learning (32, 85), and energy efficiency (30, 95). High link strength values are also observed for deep reinforcement learning (81, 200), energy harvesting (19, 57), wireless power transfer (20, 52), and trajectory planning (18, 53), reflecting their strong interconnections and presence within the AI-driven NTN-assisted IoT optimization problem space. Among performance metrics, Age of Information (16, 48) and its variants appear prominently, alongside latency and reliability, showing their importance in energy-aware NTN-IoT systems. This concentration of occurrences and strong co-occurrence patterns underscores the central role of UAV platforms, AI-driven optimization, IoT integration, and wireless energy transfer in advancing energy-efficient NTN-assisted IoT architectures.
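For readers unfamiliar with the two figures quoted for each keyword, the sketch below shows how occurrence counts and total link strength can be computed from per-paper keyword lists under full counting. It is an approximation for illustration, not VOSviewer's implementation, and the four toy records are invented.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs, min_occurrences=3):
    """Count keyword occurrences, pairwise co-occurrence links, and total link strength."""
    # Occurrences: number of documents listing the keyword.
    occ = Counter(kw for kws in docs for kw in set(kws))
    kept = {kw for kw, n in occ.items() if n >= min_occurrences}
    # Link strength of a pair: number of documents in which both keywords appear.
    links = Counter()
    for kws in docs:
        for a, b in combinations(sorted(set(kws) & kept), 2):
            links[(a, b)] += 1
    # Total link strength of a keyword: sum of its link strengths to all others.
    strength = Counter()
    for (a, b), w in links.items():
        strength[a] += w
        strength[b] += w
    return {kw: occ[kw] for kw in kept}, links, strength

docs = [  # toy keyword lists, one per paper
    ["uav", "iot", "drl"],
    ["uav", "iot"],
    ["uav", "drl", "energy harvesting"],
    ["iot", "drl"],
]
occurrences, links, strength = cooccurrence(docs, min_occurrences=3)
print(occurrences, dict(strength))
```

With a threshold of three, "energy harvesting" is dropped and each remaining keyword has three occurrences and a total link strength of four.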
To characterize representative trends in energy utilization for NTN-assisted IoT, we highlight several highly cited technical anchor papers from the curated dataset. These works collectively capture energy-aware offloading in emergency UAV edge networks, risk-sensitive scheduling under explicit energy constraints in SAGIN, multi-objective optimization in wireless-powered IoT, and energy-aware computation offloading in satellite–UAV 6G architectures. A summary of these works is provided in Table 7.
Seid et al. [83]—Energy-Aware Offloading and Resource Allocation in Emergency UAV-Assisted Edge Networks. This work considers emergency scenarios where terrestrial network infrastructure is unavailable and UAVs are deployed as aerial base stations and edge computation nodes. A multi-UAV aerial-to-ground (A2G) architecture is modeled, where collaborative model-free DRL is used to jointly optimize computation offloading and resource allocation with the objective of reducing execution delay and energy consumption. The framework enables adaptive decision-making under dynamic conditions and demonstrates performance gains over Asynchronous Advantage Actor-Critic (A3C), DQN, and greedy baselines, illustrating the importance of learning-driven energy-aware resource control in time-critical deployments.
Zhou et al. [92]—Risk-Sensitive DRL for Energy-Constrained Online Scheduling in SAGIN. This paper studies task scheduling in a space–air–ground integrated network by formulating an energy-constrained Markov decision process for UAV-enabled IoT data collection and offloading. A UAV collects computation tasks from IoT devices and must decide whether to process tasks locally, offload them to a base station, or transmit them to a remote satellite. To explicitly control energy constraint violations while minimizing delay, the authors develop a deep risk-sensitive reinforcement learning algorithm that balances processing delay with the risk of exceeding the UAV’s energy budget. The results show notable delay reduction while satisfying energy capacity constraints, highlighting a practical pathway for constraint-aware online learning in energy-limited NTN operation.
Zhu et al. [69]—Energy-Efficient UAV-Assisted Data Collection in Wireless Sensor Networks. This work studies a UAV-assisted wireless sensor network in which cluster heads collect data from member nodes, and a UAV is dispatched to gather data from the cluster heads. The objective is to minimize the total energy consumption over a complete data collection cycle, accounting for both UAV propulsion energy and communication-related energy expenditure. The resulting optimization problem, which jointly determines cluster head selection and UAV visitation order, is formulated as a constrained combinatorial problem and shown to be NP-hard. To address this challenge, the authors propose a deep reinforcement learning framework based on a Pointer Network integrated with A* search. Simulation results demonstrate that the learned policy generalizes well to networks with varying numbers of clusters without retraining and consistently outperforms baseline heuristics, highlighting the effectiveness of DRL for energy-aware UAV-assisted data collection.
Yu et al. [94]—Multi-Objective DDPG for Wireless-Powered UAV-Assisted IoT. This work addresses energy sustainability in UAV-assisted wireless-powered IoT networks using a fly–hover–communicate protocol. During hovering, the UAV operates in full-duplex mode to simultaneously collect data from a target device while charging other devices within its coverage range. The study incorporates a practical propulsion power model and a non-linear energy harvesting model, then formulates a multi-objective optimization problem to maximize sum data rate and harvested energy while minimizing UAV energy consumption. An extended multi-objective DDPG framework learns online path planning policies under tunable objective weights, enabling adaptive trade-offs between energy replenishment, communication performance, and UAV endurance.
Mao et al. [93]—Energy-Aware Computation Offloading in Satellite–UAV 6G IoT Networks. This study considers satellite- and UAV-assisted IoT communications for 6G, where IoT devices face limited energy and the network experiences high latency and significant signal loss. A SAGIN architecture is proposed that integrates satellites for cloud computing, UAVs for edge computing and wireless power delivery, and high-capacity links to support data-intensive communication. An LSTM-based model determines whether tasks should be processed locally, offloaded to UAVs, or sent to satellites by accounting for dynamic energy levels and varying network conditions. The results indicate that learning-assisted offloading decisions can improve task success and system efficiency under joint energy–latency constraints in multi-tier NTN deployments.
6.3.1 Computational Complexity and Practicality
This subsection examines the computational complexity and deployment practicality of representative energy-aware optimization frameworks in NTN-assisted IoT systems, with emphasis on training overhead, state-space dimensionality, and real-time feasibility under stochastic energy and traffic dynamics, as reported in the reviewed studies.
Across the examined energy utilization literature, computational complexity is primarily driven by the need to jointly model UAV mobility, communication conditions, and energy dynamics under uncertainty. Energy-aware task offloading, scheduling, and trajectory optimization problems are commonly formulated as Markov decision processes with continuous state and action spaces, where energy availability, task queues, and channel conditions evolve stochastically over time.
For DRL-based energy-aware offloading and scheduling frameworks (e.g., [83,92]), the dominant computational cost is incurred during the offline training phase. In these studies, centralized controllers must explore high-dimensional state spaces, encompassing UAV energy levels, stochastic task arrivals, and time-varying channel conditions, to optimize continuous control policies. Training complexity increases with the dimensionality of the energy-related state representation, the stochasticity of task arrivals, and the length of training episodes, requiring prolonged environment interaction to achieve policy convergence.
To address this complexity, actor–critic architectures such as DDPG are employed, because tabular reinforcement learning and conventional heuristics do not scale to large continuous state and action spaces. While training entails iterative gradient-based optimization and intensive computational overhead at centralized servers, the resulting policies are deployed to UAV cluster heads or edge nodes for decentralized execution. During deployment, inference involves only a single forward pass through the trained actor network, enabling real-time offloading and resource allocation decisions that are suitable for time-sensitive and emergency scenarios.
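To make the inference-cost argument concrete, the following minimal sketch runs one forward pass through a small actor network and counts its multiply-add operations. The layer sizes, random weights, and function names are illustrative assumptions and are not taken from the reviewed studies.

```python
import numpy as np

def actor_forward(state, weights, biases):
    """One forward pass of an MLP actor: ReLU hidden layers, tanh-bounded output."""
    h = np.asarray(state, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, W @ h + b)
    return np.tanh(weights[-1] @ h + biases[-1])

def multiply_adds(weights):
    """Multiply-add count of one pass: sum of (inputs x outputs) over all layers."""
    return sum(W.shape[0] * W.shape[1] for W in weights)

rng = np.random.default_rng(0)
sizes = [8, 64, 64, 2]  # assumed: 8 state features, 2 continuous actions
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

action = actor_forward(rng.normal(size=8), weights, biases)
print(action.shape, multiply_adds(weights))
```

Under these assumed sizes a decision costs a few thousand multiply-adds, which is why the deployment phase is usually described as lightweight compared with training.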
Energy-efficient data collection introduces additional complexity when the underlying formulation is combinatorial. In UAV-assisted wireless sensor network scenarios (e.g., [69]), the joint selection of cluster heads and UAV visitation order is explicitly formulated as an NP-hard problem, combining elements of facility location and traveling salesman problems. Exhaustive search becomes computationally prohibitive as network size grows. The proposed Pointer Network–A* approach replaces brute-force optimization with a learned heuristic that guides A* search, allowing the UAV to generate near-optimal trajectories via efficient inference and to generalize to unseen network sizes without retraining.
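The role of a heuristic in guiding A* over visitation orders can be sketched as follows. This is a simplified illustration, not the method of [69]: cluster head selection is omitted, path length stands in for energy, and a hand-written admissible heuristic (distance to the farthest unvisited node) takes the place of the learned Pointer Network guidance. All names are hypothetical.

```python
import heapq
import itertools
import math

def farthest_remaining(pos, current, remaining):
    """Stand-in heuristic: the UAV must at least reach the farthest unvisited node."""
    return max((math.dist(pos[current], pos[j]) for j in remaining), default=0.0)

def astar_visit_order(pos, start=0, heuristic=farthest_remaining):
    """A* over (current node, visited set) states; returns (path length, order)."""
    all_nodes = frozenset(range(len(pos)))
    tie = itertools.count()  # unique tie-breaker so the heap never compares sets
    h0 = heuristic(pos, start, all_nodes - {start})
    frontier = [(h0, next(tie), 0.0, start, frozenset({start}), (start,))]
    best_g = {}
    while frontier:
        _, _, g, cur, visited, order = heapq.heappop(frontier)
        if visited == all_nodes:
            return g, list(order)
        if best_g.get((cur, visited), math.inf) <= g:
            continue  # this state was already expanded at equal or lower cost
        best_g[(cur, visited)] = g
        for nxt in all_nodes - visited:
            g2 = g + math.dist(pos[cur], pos[nxt])
            v2 = visited | {nxt}
            f2 = g2 + heuristic(pos, nxt, all_nodes - v2)
            heapq.heappush(frontier, (f2, next(tie), g2, nxt, v2, order + (nxt,)))
    return math.inf, []

# Node 0 is the UAV start point; the others are cluster heads on a line.
positions = [(0, 0), (3, 0), (1, 0), (2, 0)]
length, order = astar_visit_order(positions)
print(length, order)
```

A better-informed heuristic prunes more of the search tree, which is the motivation for learning it; the search itself remains exponential in the worst case.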
Multi-objective energy optimization frameworks further increase training complexity by simultaneously accounting for conflicting objectives. In wireless-powered UAV-assisted IoT systems (e.g., [94]), policy learning must balance harvested energy, UAV propulsion consumption, and data rate performance. The conflicting nature of these objectives leads to a Pareto-type optimization problem, where no single solution optimizes all metrics simultaneously. Training therefore requires extended exploration phases and careful reward design. However, once convergence is achieved, the learned policies enable flexible adjustment of objective weights, allowing system priorities to be tuned without re-solving the optimization problem.
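One common way to expose tunable objective weights is a weighted-sum (scalarized) reward. The sketch below assumes that form and assumes all three quantities are normalized to [0, 1]; the exact reward used in [94] may differ, and the weight values are purely illustrative.

```python
def scalarized_reward(sum_rate, harvested_energy, propulsion_energy, weights):
    """Weighted-sum reward: reward rate and harvested energy, penalize propulsion."""
    w_rate, w_harvest, w_energy = weights
    return (w_rate * sum_rate
            + w_harvest * harvested_energy
            - w_energy * propulsion_energy)

# One hypothetical time step, with each quantity normalized to [0, 1].
step = dict(sum_rate=0.8, harvested_energy=0.2, propulsion_energy=0.5)

rate_first = scalarized_reward(**step, weights=(0.7, 0.1, 0.2))
endurance_first = scalarized_reward(**step, weights=(0.2, 0.1, 0.7))
print(rate_first, endurance_first)
```

The same step is rewarded under rate-oriented weights and penalized under endurance-oriented weights, which illustrates how the weight vector shifts the learned trade-off.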
In satellite–UAV IoT architectures (e.g., [93]), computational complexity is further compounded by heterogeneous computing tiers, long propagation delays, and time-varying energy harvesting processes. Prediction-based learning models, such as LSTM networks, are introduced to forecast near-future energy availability and traffic demand. By shifting part of the computational burden to offline training and replacing iterative optimization with neural inference, these models significantly reduce online processing requirements. However, their effectiveness depends directly on prediction accuracy, as errors in harvested energy estimation can lead to suboptimal offloading and scheduling decisions.
Overall, across the reviewed energy utilization studies, inference complexity during deployment is generally lightweight and compatible with real-time execution on UAVs or edge platforms. Practical feasibility is instead constrained primarily by offline training cost, the fidelity of energy and traffic modeling, convergence behavior under stochastic dynamics, and the generalization capability of learned policies when system conditions change.
Across the top-cited energy utilization papers in NTN-assisted IoT, several consistent observations emerge:
• Energy-aware offloading dominates energy utilization studies, where learning-based policies jointly balance computation delay and energy consumption under dynamic workload and channel conditions.
• Constraint-aware RL formulations (e.g., energy-constrained MDPs and risk-sensitive learning) provide a practical mechanism for preventing energy budget violations while maintaining low latency in SAGIN-enabled IoT.
• Wireless-powered IoT introduces multi-objective energy trade-offs, requiring simultaneous optimization of harvested energy, communication performance, and UAV endurance under realistic propulsion and non-linear energy harvesting models.
• Multi-tier SAGIN architectures reshape energy optimization, where the optimal offloading destination (local/UAV/satellite) depends on heterogeneous computing capacities, propagation delay, and dynamic energy states.
• Learning-driven adaptability is central because energy utilization is rarely optimized in isolation; it is typically coupled with latency, task success probability, and throughput objectives in operational NTN-assisted IoT settings.
• Generalization and scalability are practical strengths of learning-based energy control, with several works demonstrating policies that adapt to varying network sizes or task arrival patterns without retraining, supporting long-term deployment feasibility.
Taken together, the energy-focused anchor papers show that sustainable NTNs depend on joint mobility–communication–energy optimization, reinforcing the need for ML-driven adaptive strategies in power-constrained UAV and IoT systems.
Security constitutes a major optimization theme across NTN-assisted IoT, driven by the vulnerability of UAVs, WSNs, and distributed IoT infrastructures to cyberattacks, data tampering, unauthorized access, and adversarial interference. Fig. 12 shows a co-occurrence map for the security category. This map was generated using the CSV file for the security category, which includes entries such as Network Security (207), Security (65), Intrusion Detection (48), Cybersecurity (46), Security Systems (41), and Cyber Security (34). The default parameters of VOSviewer were applied with a minimum keyword occurrence threshold of four, resulting in a total of 63 unfiltered keywords (including variations, e.g., “unmanned aerial vehicle” and “unmanned aerial vehicles (UAV)”).

Figure 12: Keyword co-occurrence map for security. Dominant aspects include intrusion detection, anomaly detection, federated learning, blockchain-enabled security, authentication and access control, ML-driven security mechanisms and applications, and surveillance in NTN-assisted IoT networks involving UAVs and satellite-enabled communications.
The results highlight a range of interconnected technologies within the security paradigm. Core security concepts such as intrusion detection, intrusion detection systems, cybersecurity, and privacy are clearly present. Security-related technological mechanisms, such as cryptography, authentication, and access control, also emerge prominently in the map. Supporting technologies and frameworks, including blockchain and federated learning, are frequently associated with security research.
Security is often studied in the context of several NTN components, including UAVs, satellite communication, and the Internet of Drones. Some works also integrate enabling technologies such as energy harvesting. Application domains represented in the visualization include surveillance, edge computing, smart farming, smart agriculture, smart cities, and 6G. Other relevant technologies include digital twin, NOMA, 5G, and 6G.
From the visualization, it is evident that AI techniques, particularly federated learning, are highly prevalent in this field. Traditional machine learning approaches, including feature selection, are also widely applied to security challenges. The importance of data augmentation for improving machine learning performance is another key observation. Optimization strategies, such as Stackelberg game optimization, are commonly used. Finally, the map suggests that UAV-assisted Internet of Things (IoT) networks often incorporate security considerations, and similar considerations appear in NTN-assisted applications involving hardware platforms such as Arduino. Top-cited technical papers in this category are discussed in this section. A summary of articles in the security category, including their NTN component, applied ML algorithms, objectives, and results, is provided in Table 9. It shows that privacy preservation, defense against vulnerabilities, malware detection, and intrusion detection are some of the prominent applications of machine learning and neural networks towards achieving security in UAV-assisted networks.
To illustrate the core techniques and challenges in this theme, we highlight several highly cited anchor papers from the dataset.
6.4.1 Deep Learning and Blockchain-Enabled Security in UAV–IoT Systems
Kumar et al. [96]—Blockchain-Enabled Privacy and Anomaly Detection in Smart Agricultural UAVs. This work proposes a secure and privacy-preserving framework for smart agricultural UAV–IoT systems, addressing data poisoning and inference attacks in internet-connected sensing pipelines. The framework integrates a blockchain-based authentication layer (using smart contracts and an enhanced proof-of-work mechanism) with deep learning–based anomaly detection. A sparse autoencoder is used for privacy protection against inference attacks, while a stacked LSTM model performs anomaly detection on UAV-related network traffic. Evaluations on ToN-IoT and IoT-Botnet datasets report very high detection accuracy, demonstrating that multi-layer security designs can provide verifiability, traceability, and strong detection performance, though computational overhead remains a practical consideration.
6.4.2 Adversarial Robustness of Deep Learning-Enabled UAV Systems
Tian et al. [97]—Adversarial Threats and Defensive Strategies for DL-Based UAV Regression Models in CPS. This paper investigates adversarial vulnerabilities in deep learning–based UAV control for cyber-physical systems, with emphasis on regression models used in safety-critical decision loops. The authors propose both non-targeted and targeted attack strategies capable of perturbing camera inputs to disrupt navigation outputs and collision probability prediction. Defensive distillation and adversarial training are evaluated as mitigation strategies, revealing a practical trade-off between robustness and training cost. The study highlights that securing learning-enabled UAV autonomy requires explicit adversarial threat modeling and robust training frameworks.
6.4.3 Blockchain-Enabled Intrusion Detection for the Internet of Drones
Heidari et al. [98]—Blockchain-Integrated Neural Intrusion Detection for IoD. This work proposes a blockchain-integrated intrusion detection framework for the Internet of Drones, where dynamic network topology and mobility complicate reliable IDS deployment. The approach employs a radial basis function neural network (with transfer learning) supported by mobile edge computing to enable low-latency detection and model sharing, while blockchain provides decentralized integrity and tamper resistance. Evaluations across multiple benchmark IDS datasets demonstrate strong detection performance (e.g., accuracy and precision), supporting the feasibility of blockchain-assisted learning for security monitoring in drone-centric networks.
6.4.4 Security in Satellite-Enabled and NTN-Assisted IoT Networks
Han et al. [99]—Learning-Based Anti-Jamming for Satellite-Enabled IoT. Satellite-enabled Army IoT (SaIoT) has gained significant attention for its wide coverage and high-capacity transmission. However, its performance is increasingly threatened by AI-driven “smart jamming.” To address this, this paper investigates the energy consumption challenges caused by interference and proposes a distributed, dynamic anti-jamming scheme. The authors first model the adversarial interaction between jammers and SaIoT devices using a Hierarchical Anti-jamming Stackelberg Game (HASG). In this model, jammers act in a “leader” subgame while IoT devices respond in a “follower” subgame; the paper proves that a Stackelberg equilibrium exists within this framework. To further reduce energy consumption, an anti-jamming Coalition Formation Game (CFG) is introduced for the follower subgame, featuring a modified coalition preference order and a specific “coalition change principle” to optimize performance. By applying exact potential game theory, the authors demonstrate that this CFG converges to a stable structure, achieving performance levels comparable to centralized optimization despite being a distributed approach. Finally, reinforcement learning algorithms are utilized to derive suboptimal anti-jamming policies within dynamic, unknown environments. Simulation results confirm that this approach outperforms existing schemes in both efficiency and resilience.
6.4.5 Computational Complexity and Practicality
This subsection examines the computational complexity and deployment practicality of security-oriented learning frameworks in NTN-assisted IoT systems, with emphasis on training overhead, coordination cost, and real-time feasibility under adversarial and highly dynamic operating conditions, as reported in the reviewed studies.
Across the examined security literature, computational complexity is largely driven by the need to simultaneously address intelligent adversaries, network mobility, and strict latency and resource constraints. The reviewed works encompass deep-learning-based intrusion and anomaly detection, adversarial robustness mechanisms for UAV control systems, blockchain-enabled security architectures, and distributed anti-jamming strategies for satellite-enabled IoT networks.
For deep-learning-based intrusion detection and security analytics (e.g., [96,98]), the dominant computational cost is incurred during the training phase. These frameworks employ architectures such as stacked LSTMs, sparse autoencoders, and radial basis function neural networks, which involve high-dimensional matrix operations, recurrent gating mechanisms, or clustering-based initialization. Training complexity scales with dataset size, feature dimensionality, and network depth, and is further increased when sparsity penalties or auxiliary loss terms are introduced to improve feature selectivity and convergence stability.
Several blockchain-enabled security frameworks introduce additional computational and latency overhead through consensus and coordination mechanisms. Storing raw UAV or IoT data directly on-chain is reported to be impractical due to scalability and storage constraints; therefore, off-chain storage solutions such as the InterPlanetary File System (IPFS) are adopted, with only fixed-size cryptographic hashes recorded on the blockchain. While this design improves scalability, consensus operations, such as cryptographic nonce computation or voting-based validation, remain computationally expensive and introduce non-negligible delay, particularly for battery-powered UAVs or drone nodes participating in mining or verification.
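The two costs discussed here, a fixed-size on-chain hash and nonce computation, can be illustrated with Python's standard `hashlib`. This is a toy sketch: the payload, the block header string, and the difficulty of three leading hex zeros are assumptions chosen so the example runs quickly, and no real blockchain or IPFS client is involved.

```python
import hashlib

def anchor_record(payload: bytes) -> str:
    """Digest stored on-chain; its size is fixed regardless of payload size."""
    return hashlib.sha256(payload).hexdigest()

def find_nonce(block_header: str, difficulty: int = 3) -> int:
    """Toy proof-of-work: search for a nonce giving `difficulty` leading hex zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while not hashlib.sha256(f"{block_header}{nonce}".encode()).hexdigest().startswith(prefix):
        nonce += 1
    return nonce

small = anchor_record(b"telemetry sample")
large = anchor_record(b"x" * 100_000)   # stands in for a large off-chain file
nonce = find_nonce("block-42", difficulty=3)
print(len(small), len(large), nonce)
```

Each additional hex digit of difficulty multiplies the expected number of hash evaluations by 16, which is why nonce search is typically kept off battery-powered nodes.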
To mitigate these constraints, practical deployments adopt hierarchical architectures that separate compute-intensive tasks from edge operations. Training, blockchain maintenance, and consensus execution are offloaded to cloud, fog, or MEC infrastructure, while UAVs and IoT devices retain only truncated blockchain metadata and perform lightweight neural network inference. This asymmetric design ensures real-time responsiveness at the edge without compromising system-wide security guarantees.
Adversarial robustness studies for learning-enabled UAV systems (e.g., [97]) further demonstrate that improving resilience to targeted and non-targeted attacks substantially increases training complexity. Adversarial training involves multiple nested optimization stages, including standard model training, adversarial example generation using gradient-based or optimization-based attacks, and subsequent retraining with the crafted samples. This multi-stage process substantially increases training cost relative to non-defensive models. In contrast, defensive distillation is reported to preserve inference complexity, as it does not alter model scale and only modifies output smoothness during training.
The practicality of adversarial defenses is further constrained by the nature of UAV control tasks, which are formulated as regression problems rather than classification. Unlike classification settings, where attacks aim to cross discrete decision boundaries, effective attacks on control systems must induce sufficient deviation in continuous outputs. This requirement increases the cost of attack generation and limits the speed at which defensive models can be updated in rapidly changing environments.
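The notion of "sufficient deviation in continuous outputs" can be illustrated with a one-step sign-gradient perturbation on a linear regressor. The linear model is an assumption standing in for the deep regression models studied in [97], and the weights and inputs are hypothetical. For a linear model the output gradient with respect to the input is simply the weight vector, so the achievable deviation is the perturbation budget times the L1 norm of the weights.

```python
import numpy as np

def predict(w, b, x):
    """Scalar output of a linear regressor."""
    return float(w @ x + b)

def sign_gradient_attack(w, x, eps, direction=1.0):
    """One-step perturbation bounded by eps per input; d(output)/dx = w here."""
    return x + direction * eps * np.sign(w)

def min_eps_for_deviation(w, delta):
    """Smallest per-input budget that shifts the output by delta (linear case)."""
    return delta / float(np.sum(np.abs(w)))

w = np.array([0.5, -1.0, 0.25])   # hypothetical model weights
b = 0.1
x = np.array([0.2, 0.4, 0.6])     # hypothetical clean input
eps = 0.05

x_adv = sign_gradient_attack(w, x, eps)
deviation = predict(w, b, x_adv) - predict(w, b, x)
print(deviation, min_eps_for_deviation(w, 0.35))
```

Unlike a classifier, where any boundary crossing counts as success, the attacker here needs a budget proportional to the deviation required to affect control, as `min_eps_for_deviation` shows for the linear case.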
Learning-based anti-jamming approaches for satellite-enabled IoT systems (e.g., [99]) introduce a different source of complexity associated with distributed coordination and coalition dynamics. Centralized optimization is reported to be impractical due to the rapid growth of possible coalition formations as the number of devices increases, leading to a pronounced curse of dimensionality. To overcome this limitation, distributed reinforcement learning and coalition formation games are employed, enabling devices to adapt locally based on partial information.
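The growth of the coalition search space can be quantified under the assumption that a coalition structure is any partition of the device set into disjoint, non-empty groups. The number of such partitions of n devices is the Bell number B(n), computed below with the Bell triangle. This counting argument is a standard combinatorial fact added for illustration and is not taken from [99].

```python
def bell_number(n: int) -> int:
    """Number of ways to partition n devices into disjoint coalitions."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]          # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

for n in (3, 5, 10, 20):
    print(n, bell_number(n))
```

Ten devices already admit 115,975 coalition structures, so exhaustive centralized search quickly becomes impractical as the network grows.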
While distributed learning significantly improves scalability and avoids centralized bottlenecks, it introduces additional signaling and coordination overhead. Devices must exchange information related to coalition preferences, utility gains, and network state, and convergence time increases with both network size and the frequency of intelligent jamming strategy changes. Exact potential game formulations are used to ensure convergence to stable coalition structures, allowing distributed solutions to achieve performance close to centralized optimization at the expense of additional coordination cycles.
Overall, across the reviewed security-oriented studies, inference complexity during deployment is generally lightweight and compatible with real-time operation on UAVs or edge devices. Practical feasibility is instead constrained primarily by offline training overhead, blockchain consensus and coordination cost, adaptability to evolving attack strategies, and signaling complexity in distributed or coalition-based defenses. Designing security mechanisms that balance robustness, scalability, and update efficiency under adversarial NTN conditions therefore remains a central practical challenge.
Across the top-cited technical security papers in NTN-assisted IoT, several consistent observations emerge:
• Deep learning is widely used for anomaly and intrusion detection, commonly relying on sequence models (e.g., LSTM) and representation learners (e.g., autoencoders) to characterize IoT and UAV network traffic.
• Blockchain frequently appears as a trust layer to enhance data integrity, verifiability, and resistance to data poisoning or tampering in decentralized UAV–IoT security frameworks.
• Adversarial threats represent a practical risk for learning-enabled UAV autonomy, where small perturbations to sensor inputs can disrupt safety-critical regression and control outputs, motivating robustness-aware training and evaluation.
• Security mechanisms must operate under mobility and resource constraints, as dynamic topology, intermittent connectivity, and limited onboard computation strongly influence IDS design and deployment feasibility.
• Learning-based anti-jamming emerges as a critical security mechanism in satellite-enabled IoT networks, where reinforcement learning and game-theoretic models enable adaptive and energy-efficient defense against intelligent adversaries.
7 Thematic Analysis of Recent Optimization Trends (2025–2026)
To complement the citation-driven technical review, this section provides a brief recent-trends snapshot based on title and abstract mining of a Scopus export (export date: 04 January 2026). The motivation is to capture very recent developments that may not yet be reflected by citation counts, thereby broadening the review’s coverage.
A supplementary Scopus query was conducted focusing on journal articles published between 2025 and 2026 that address UAV-, HAP-, and satellite-assisted IoT systems. The query incorporated optimization-related terms spanning trajectory design, resource allocation, energy consumption, and security. To maintain consistency with the technical scope of this survey, only research articles (DOCTYPE = “ar”) were retained, and records associated with clearly unrelated application domains (e.g., steganography or domain-specific sensing keywords) were excluded. The resulting dataset comprises 47 papers and is used exclusively for identifying high-level research signals from titles and abstracts, rather than for detailed technical comparison or algorithmic benchmarking.1
Among the 47 records, one paper is explicitly labeled as a survey or review in its title and is excluded from the analysis presented here. The objective of this section is to identify emerging research directions that may be underrepresented in citation-based selection, using a metadata-driven approach. Accordingly, the observations reported below are grounded in patterns evident from titles and abstracts across the dataset (2025–2026 Scopus export, 47 records). The recent-trends dataset includes studies addressing optimization in UAV-, HAP-, and satellite-assisted IoT systems, covering aspects such as trajectory planning, resource allocation, energy management, and system security. Representative works in this dataset include [100–104]. Additional studies explore IoT connectivity architectures, UAV data collection strategies, localization and inspection, and secure communication mechanisms [105–109]. Other contributions focus on control-aware communication, digital-twin modeling, and learning-based trajectory optimization in aerial and spaceborne IoT systems [6,110–113]. Further works investigate satellite resource management, sensing-integrated communication, and task offloading optimization in multi-agent IoT environments [7,114–117]. Recent studies also address satellite IoT access protocols, UAV trajectory planning, and distributed computation across satellite networks [9,14,118–120]. Additional research explores grant-free access, UAV clustering, energy-efficient multi-UAV sensing, and RIS-assisted communication frameworks [121–125]. Other works investigate marine IoT systems, heterogeneous network optimization, and energy-efficient RIS-assisted UAV communication [17,126–129]. Further studies explore machine-learning-based power control, privacy-preserving distributed learning, and energy-aware UAV deployment strategies [130–134]. Recent works also examine satellite beam scheduling, RIS-assisted NOMA transmission, and traffic prediction [8,135,136]. Additional studies investigate UAV-enabled multimedia delivery and security mechanisms [137–140].
The literature from 2025 and 2026 reveals a gradual paradigm shift toward more comprehensive cross-layer optimization frameworks. As such, contemporary research emphasizes the interdependence of mobility, energy, and security.
7.1 Trajectory Planning: AoI-Centric and Robust Mobility Control
Current research on trajectory planning increasingly formulates mobility control as a multi-objective optimization problem. Rather than treating objectives in isolation, recent approaches jointly address timeliness, feasibility constraints, and service goals. For instance, AoI-centric formulations embed data freshness directly into the mobility objective, making timely data delivery an explicit target of trajectory optimization. To keep these complex control problems tractable in dynamic environments, researchers increasingly employ iterative learning-assisted algorithms, such as decomposing a mixed-integer nonlinear program into time-scheduling and path-planning subproblems solved via successive convex approximation and hierarchical asynchronous A3C modules.
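To show how data freshness enters a mobility objective, the sketch below evaluates the time-averaged AoI of a visiting schedule. It assumes a common discrete-time convention (the age of a device resets to 1 in the slot it is served and otherwise grows by 1) and one visit per slot; the schedules and names are hypothetical.

```python
def average_aoi(schedule, num_devices):
    """Time-averaged AoI over all devices for a per-slot visiting schedule."""
    ages = [1] * num_devices
    total = 0.0
    for visited in schedule:
        # Served device resets to age 1; every other device ages by one slot.
        ages = [1 if i == visited else age + 1 for i, age in enumerate(ages)]
        total += sum(ages) / num_devices
    return total / len(schedule)

alternating = average_aoi([0, 1, 0, 1], num_devices=2)
one_sided = average_aoi([0, 0, 0, 0], num_devices=2)
print(alternating, one_sided)
```

The alternating schedule keeps the average age at 1.5 slots, whereas serving only one device lets the other's age grow without bound, so an AoI-centric objective penalizes trajectories that neglect devices.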
Multi-objective DRL has enabled the explicit characterization of trade-offs between operational efficiency and service quality within a unified control policy. In addition to these goals, energy management continues to act as a primary constraint, particularly in data-intensive collection and offloading scenarios where battery station re-entry must be strategically planned within the mission trajectory. Finally, there is a growing trend toward integrating localization uncertainty into trajectory design, moving beyond idealized assumptions of perfect state information toward more robust joint estimation–control formulations.
7.2 Resource Allocation: Scalability and Beam Agility under Dynamics
Recent literature in resource allocation emphasizes the need for realistic access modeling, especially considering practical challenges such as intermittent visibility and bursty IoT traffic. On the infrastructure side, multibeam management and physical-layer scheduling remain primary mechanisms for aligning satellite resources with heterogeneous traffic demands. These strategies rely on beam- and channel-aware control mechanisms designed to minimize transmit power while satisfying the specific traffic demands of LEO-based IoT users.
With regard to channel access, the focus is primarily on scalability. Deep neural network-assisted random access protocols are being developed to mitigate collisions arising from sporadic, large-scale transmissions characteristic of massive IoT (mIoT) networks. Similarly, the integration of grant-free access with beam-hopping designs addresses the need for scalable connectivity during the brief contact windows of non-terrestrial platforms. In parallel, UAV-assisted edge computing continues to prioritize scheduling and delay-aware resource allocation, often utilizing Lyapunov optimization to manage task backlogs and energy consumption in the face of unpredictable task arrivals.
7.3 Energy Utilization: Sustainable and Cross-Layer Efficiency
Energy efficiency is observed to be a core component of studied cross-layer frameworks rather than a standalone minimization objective. In UAV-assisted MEC systems, energy-aware designs jointly optimize computation, communication, and mobility, especially under renewable power constraints such as solar-powered harvesting. A similar observation is evident in satellite transmission, where power usage is integrated with link-layer control, utilizing Lyapunov drift-plus-penalty frameworks to ensure control stability while minimizing long-term transmission energy.
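The drift-plus-penalty idea can be sketched for a single transmit queue. In each slot the controller picks the power level that minimizes V times the power minus the current backlog times the service rate, where V trades energy against queue length. The rate model log2(1 + gain × power), the power set, and all parameter values are assumptions for illustration only.

```python
import math

def dpp_power(queue, powers, gain, V):
    """Per-slot drift-plus-penalty choice: minimize V*p - Q*rate(p)."""
    return min(powers, key=lambda p: V * p - queue * math.log2(1.0 + gain * p))

def queue_update(queue, power, gain, arrival):
    """Backlog after serving at rate(p) and admitting new arrivals."""
    served = math.log2(1.0 + gain * power)
    return max(queue - served, 0.0) + arrival

powers = [0.0, 1.0, 2.0]            # hypothetical discrete power levels
idle_choice = dpp_power(0.0, powers, gain=1.0, V=1.0)
busy_choice = dpp_power(10.0, powers, gain=1.0, V=1.0)
next_queue = queue_update(10.0, busy_choice, gain=1.0, arrival=1.0)
print(idle_choice, busy_choice, next_queue)
```

With an empty queue the controller stays silent, and with a large backlog it transmits at full power; increasing V delays that switch and saves energy at the cost of a longer queue.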
The emergence of RIS-assisted UAV-enabled IoT further expands this design space by coupling trajectory planning with passive surface configuration and active power control. Moreover, multi-objective UAV–MEC formulations have begun to explicitly characterize tunable trade-offs between energy consumption and system performance, using adaptive learning frameworks to optimize long-term sustainability. Finally, energy awareness has extended into security-oriented designs, indicating that lightweight cryptographic mechanisms are now treated as co-design variables to extend the operational endurance of resource-constrained drones.
7.4 Security: Decentralized Trust and Secrecy-Oriented Design
Security strategies are evolving to align with aerial platform-specific dynamics, particularly addressing vulnerabilities associated with frequent handovers and broadcast exposure. To mitigate trust risks induced by high mobility, decentralized authentication frameworks have been tailored for satellite IoT to secure handover procedures. This protection focus extends to the network edge, where privacy-preserving learning techniques in UAV-enabled IoT mitigate potential privacy breaches during mobile data aggregation.
At the physical layer, secrecy-oriented optimization remains a critical defense against eavesdropping in wide-area coverage scenarios. Recent approaches utilize successive convex approximation (SCA) and hypograph theory to optimize UAV trajectories specifically for enhancing Physical Layer Security (PLS). Collectively, these developments indicate that mobility control now functions not only as a performance mechanism but also as an explicit security-enhancing degree of freedom in NTN–IoT systems.
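The link between UAV position and secrecy can be illustrated with the standard secrecy-rate expression, the positive part of log2(1 + SNR at the user) minus log2(1 + SNR at the eavesdropper). The sketch assumes a line-of-sight free-space channel where SNR falls with squared distance; the altitude, reference SNR, and node locations are hypothetical and do not come from the cited works.

```python
import math

def secrecy_rate(snr_user, snr_eve):
    """Secrecy rate in bit/s/Hz: positive part of the rate gap."""
    return max(0.0, math.log2(1.0 + snr_user) - math.log2(1.0 + snr_eve))

def snr_at(uav_xy, node_xy, altitude=100.0, ref_snr=1e6):
    """Assumed free-space model: SNR = ref_snr / squared UAV-to-node distance."""
    d2 = (uav_xy[0] - node_xy[0]) ** 2 + (uav_xy[1] - node_xy[1]) ** 2 + altitude ** 2
    return ref_snr / d2

user, eve = (0.0, 0.0), (200.0, 0.0)

def rate_at(uav_xy):
    return secrecy_rate(snr_at(uav_xy, user), snr_at(uav_xy, eve))

print(rate_at((0.0, 0.0)), rate_at((100.0, 0.0)))
```

Hovering above the legitimate user yields a positive secrecy rate, while the midpoint between user and eavesdropper yields zero, which is the sense in which trajectory acts as a security degree of freedom.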
This section synthesizes the key insights derived from the reviewed literature across the four optimization themes: trajectory planning, resource allocation, energy utilization, and security. We highlight overarching trends, methodological patterns, and critical observations that recur throughout ML-driven NTN-assisted IoT research.
A central lesson from the trajectory planning literature is that conventional approaches, typically static optimization, geometric heuristics, or problem-specific rule-based designs, are inadequate for highly dynamic NTN-assisted IoT environments. Many applications, especially those involving time-sensitive data collection, require minimizing freshness-related metrics such as the AoI. In such settings, static or purely model-based optimization often struggles with scalability, time-varying channels, heterogeneous QoS requirements, and unpredictable mobility patterns.
RL, particularly DRL-based methods, therefore emerges as a natural fit. Offline-trained DQN or DDQN models can offer strong performance in relatively stable environments where accurate transition models or representative datasets are available. However, online learning is often necessary to adapt to changing traffic loads, user distributions, and environmental conditions. Actor–critic algorithms such as DDPG, TD3, and SAC are especially well-suited to continuous state–action spaces and high-dimensional trajectory control.
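The difference between the DQN and DDQN targets mentioned above can be sketched in a few lines. In this minimal illustration, the Q-value lists stand in for the outputs of the online and target networks at the next state; the numerical values and the discount factor are hypothetical.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]

# Hypothetical next-state Q-values for three candidate UAV actions.
q_online = [1.0, 2.0, 0.5]
q_target = [3.0, 1.5, 0.2]
y_dqn = dqn_target(1.0, q_target, gamma=0.9)
y_ddqn = double_dqn_target(1.0, q_online, q_target, gamma=0.9)
```

Decoupling action selection from evaluation is what allows DDQN to reduce the overestimation bias of the plain DQN target; in this toy case the DDQN target is lower because the target network's largest value belongs to an action the online network does not prefer.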
A recurring insight is that trajectory optimization is rarely a single-objective problem. In many UAV-assisted IoT scenarios, the objective is to jointly optimize AoI, mission time, coverage, throughput, or reliability while remaining within strict energy and flight-time budgets. Joint AoI–energy formulations, in particular, remain a core challenge, as they require navigating large, correlated state spaces and carefully balancing short-term gains against long-term sustainability.
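As a minimal sketch of how a joint AoI–energy objective is often turned into a single RL reward, the snippet below uses a discrete-time AoI update (the age grows by one slot and resets to one after a delivered update) and a weighted-sum scalarization. The weights `w_aoi` and `w_energy` and the per-slot energy figure are illustrative assumptions; the surveyed works differ in how they model and weight these terms.

```python
def step_aoi(age, delivered):
    """Discrete-time AoI update: reset to 1 on a delivered update, otherwise age by one slot."""
    return 1 if delivered else age + 1

def scalarized_reward(ages, energy_j, w_aoi=1.0, w_energy=0.01):
    """Negative weighted sum of the mean AoI and the energy spent in the slot."""
    mean_aoi = sum(ages) / len(ages)
    return -(w_aoi * mean_aoi + w_energy * energy_j)

# Three hypothetical IoT devices; the UAV collects from device 0 in this slot.
ages = [4, 2, 3]
served = [True, False, False]
next_ages = [step_aoi(a, s) for a, s in zip(ages, served)]
reward = scalarized_reward(next_ages, energy_j=100.0)
```

The choice of weights directly shapes the learned behavior: a larger `w_energy` favors hovering or slow flight at the expense of data freshness, which is one reason reward design recurs as a sensitivity in the reviewed studies.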
DRL-based solutions demonstrate strong adaptability, but their performance hinges on accurate state representations, reward design, and training stability. Poorly shaped rewards or incomplete state modeling can lead to unstable policies, suboptimal trajectories, or inefficient exploration. Another lesson is the importance of realistic system modeling: simplified mobility, channel, or energy models may yield elegant formulations but can limit the practical relevance of learned policies when deployed in real NTN-assisted IoT environments.
In resource allocation, the dominant lesson is that jointly optimizing communication, computation, and scheduling in NTN-assisted IoT leads to intrinsically high-dimensional and sometimes complex decision spaces. Many problems are naturally expressed as mixed-integer nonlinear programming (MINLP) formulations, which are difficult to solve optimally at scale, particularly under fast dynamics and partial observability.
Consequently, learning-based methods, especially RL, DRL, and MARL, have become central for handling dynamic and uncertain resource allocation scenarios. DRL agents can learn to adapt bandwidth, power, user association, and offloading decisions to time-varying channel conditions, traffic patterns, and mobility. Hybrid approaches that combine ML-based prediction (e.g., of user distribution or demand) with convex optimization or heuristics often strike a useful balance between tractability and performance.
A key insight is that resource allocation cannot be treated in isolation. In UAV-assisted MEC, for example, trajectory decisions determine link qualities and coverage, which in turn affect feasible offloading strategies, latency, and energy consumption. Fairness considerations also emerge frequently, as naive throughput-maximizing strategies tend to starve users with poor channel conditions or disadvantaged locations. Many top-cited works therefore introduce fairness-aware or QoS-constrained formulations, leading to multi-objective or constrained RL settings.
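One common way to quantify the starvation effect described above is Jain's fairness index, J = (Σx)² / (n Σx²), which equals 1 when all users receive the same rate and 1/n when a single user receives everything. The sketch below is illustrative; the reviewed works use a variety of fairness measures, and the rate vectors here are hypothetical.

```python
def jain_index(rates):
    """Jain's fairness index of a non-negative allocation; 1.0 means perfectly fair."""
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 1.0  # all-zero allocation: treat as trivially fair
    return total * total / (len(rates) * squares)

# Hypothetical per-user rates under two allocation policies.
throughput_greedy = [4.0, 0.0, 0.0, 0.0]   # serves only the best channel
fairness_aware = [1.0, 1.0, 1.0, 1.0]      # same sum rate, spread evenly
j_greedy = jain_index(throughput_greedy)
j_fair = jain_index(fairness_aware)
```

An index of this kind can be added as a reward term or enforced as a constraint, which is how a single-objective allocation problem becomes the multi-objective or constrained RL setting noted above.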
Another important lesson is that distributed and multi-agent approaches are increasingly necessary. As networks scale and incorporate multiple UAVs, HAPs, and satellite nodes, centralized resource controllers become bottlenecks or single points of failure. MARL and decentralized decision-making frameworks help distribute control, but they also introduce new challenges in convergence, coordination overhead, and stability.
For energy utilization, the literature consistently shows that UAV flight energy dominates the overall consumption profile in NTN-assisted IoT, overshadowing communication energy in many scenarios. A key lesson is that realistic propulsion models, capturing hover power, acceleration, velocity–power trade-offs, and platform-specific dynamics, are indispensable for meaningful optimization. Many highly cited studies demonstrate that ignoring propulsion costs or adopting overly simplified models can lead to policies that look efficient in simulation but may be operationally challenging in real deployments.
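The velocity–power trade-off can be illustrated with a closed-form rotary-wing propulsion model that is commonly adopted in the UAV communication literature, consisting of blade-profile, induced, and parasite power terms. The sketch below is a simplified illustration for straight, level flight; the default parameter values are typical illustrative assumptions and are not drawn from any specific study reviewed here.

```python
import math

def rotary_wing_power(v, p0=79.86, pi=88.63, u_tip=120.0, v0=4.03,
                      d0=0.6, rho=1.225, s=0.05, area=0.503):
    """Propulsion power (W) at forward speed v (m/s) for an assumed rotary-wing UAV."""
    blade_profile = p0 * (1.0 + 3.0 * v ** 2 / u_tip ** 2)
    induced = pi * math.sqrt(
        math.sqrt(1.0 + v ** 4 / (4.0 * v0 ** 4)) - v ** 2 / (2.0 * v0 ** 2)
    )
    parasite = 0.5 * d0 * rho * s * area * v ** 3
    return blade_profile + induced + parasite

hover_power = rotary_wing_power(0.0)
cruise_power = rotary_wing_power(10.0)
fast_power = rotary_wing_power(30.0)
```

With these parameters, hovering costs more power than moderate-speed cruising, while high speeds are dominated by the cubic parasite term. A model that assumes constant or speed-independent power misses this U-shaped curve and can therefore favor hover-heavy trajectories that are inefficient in practice.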
DRL-based scheduling and trajectory control offer substantial improvements in energy efficiency, particularly under dynamic traffic and topology conditions. Multi-objective RL formulations allow UAVs and edge nodes to jointly balance energy, latency, AoI, throughput, and reliability constraints through carefully designed reward functions. However, as in trajectory planning, the quality of results depends strongly on environmental modeling accuracy, state representation, and reward shaping.
Another recurring insight is the importance of accurate energy harvesting (EH) and wireless power transfer (WPT) models. Non-linear EH behavior, environmental influences (e.g., solar irradiance, RF density), and hardware limitations can significantly affect achievable performance. Studies that explicitly incorporate these non-linearities tend to reveal more nuanced trade-offs between harvested energy, hovering duration, mission time, and charging schedules.
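The effect of non-linear EH behavior can be sketched with the sigmoidal (logistic) harvesting model that is frequently used to capture rectifier saturation, compared against a constant-efficiency linear model. The circuit parameters `max_w`, `a`, and `b` and the linear efficiency below are illustrative assumptions.

```python
import math

def nonlinear_eh(p_in_w, max_w=0.024, a=150.0, b=0.014):
    """Harvested power (W) under a logistic saturation model; zero input gives zero output."""
    logistic = max_w / (1.0 + math.exp(-a * (p_in_w - b)))
    omega = 1.0 / (1.0 + math.exp(a * b))
    return (logistic - max_w * omega) / (1.0 - omega)

def linear_eh(p_in_w, efficiency=0.5):
    """Constant-efficiency model that ignores circuit saturation."""
    return efficiency * p_in_w

# Received RF powers (W) at a hypothetical device as the UAV approaches.
received = [0.0, 0.01, 0.02, 0.1, 1.0]
harvested_nonlinear = [nonlinear_eh(p) for p in received]
harvested_linear = [linear_eh(p) for p in received]
```

Under the logistic model the harvested power saturates near `max_w`, so hovering closer or transmitting at higher power yields diminishing returns, whereas the linear model keeps rewarding it. This difference is one source of the more nuanced trade-offs between harvested energy, hovering duration, and charging schedules noted above.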
In multi-tier architectures such as SAGINs, UAVs may act simultaneously as data collectors and wireless energy transmitters. Optimal scheduling must then decide not only when and where UAVs collect data, but also when they recharge devices or relay tasks to MEC servers and satellites. RL-based solutions are particularly promising here, as they can adapt to evolving network conditions and energy states across tiers.
Security is increasingly recognized as a foundational requirement for reliable NTN-assisted IoT operation. A major lesson is that FL is emerging as a key paradigm for privacy-preserving security analytics. FL enables distributed training of intrusion detection and anomaly detection models across UAVs, satellites, and edge devices without exposing raw data, making it well-suited to privacy-sensitive and regulation-constrained environments.
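The aggregation step at the core of FL can be sketched with federated averaging (FedAvg), in which only model parameters and local sample counts leave each node. The following minimal illustration represents each local model as a flat list of weights; the client values are hypothetical, and practical deployments would add secure aggregation, compression, and handling of stragglers.

```python
def fed_avg(client_weights, client_sizes):
    """Sample-size-weighted average of client model parameters (FedAvg aggregation)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical UAV clients with locally trained intrusion-detection weights.
local_models = [[1.0, 2.0], [3.0, 4.0]]
local_sample_counts = [1, 3]
global_model = fed_avg(local_models, local_sample_counts)
```

Because raw traffic traces never leave the UAVs or edge devices, the aggregator sees only the weight vectors; the privacy benefit, however, still depends on protecting those updates against inference and poisoning.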
Blockchain technologies complement FL by providing tamper-resistant, decentralized trust mechanisms, particularly for authentication, identity management, and secure logging in Internet of Drones (IoD) and SAGIN scenarios. Hybrid architectures combining blockchain, MEC, and FL have begun to appear as promising solutions for low-latency, high-trust intrusion detection and access control.
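The tamper-resistance property that makes blockchain attractive for secure logging comes from hash chaining: each entry commits to the hash of its predecessor, so altering any record invalidates every later one. The sketch below illustrates only this chaining idea with hypothetical log payloads; it omits consensus, signatures, and distribution, which a real blockchain deployment would require.

```python
import hashlib
import json

GENESIS = "0" * 64

def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash and the payload, serialized deterministically."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_entry(chain, payload):
    """Append a log entry that commits to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": entry_hash(prev_hash, payload)})

def verify(chain):
    """Return True only if every entry links to its predecessor and its hash matches."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry_hash(entry["prev"], entry["payload"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

# Hypothetical authentication events logged by drones.
log = []
append_entry(log, {"drone": "uav-1", "event": "handover-auth"})
append_entry(log, {"drone": "uav-2", "event": "key-update"})
intact = verify(log)
```

Modifying the payload of the first entry after the fact makes `verify` return `False`, which is the accountability guarantee that IoD and SAGIN logging schemes build upon.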
A critical lesson concerns the vulnerability of ML models themselves. Deep neural architectures used for intrusion detection, traffic classification, or routing decisions are susceptible to adversarial attacks, data poisoning, and evasion strategies. While defense mechanisms such as adversarial training, defensive distillation, and anomaly-aware regularization can improve robustness, lightweight and computationally efficient defenses tailored to resource-constrained UAV and IoT devices are largely lacking.
Finally, security frameworks must explicitly address mobility-induced vulnerabilities. Frequent topology changes, intermittent links, handovers, and decentralized routing make secure communication more challenging than in static terrestrial networks. Authentication, key management, and intrusion detection must operate reliably despite dynamic connectivity and heterogeneous link qualities, motivating further work on mobility-aware and context-aware security models.
8.5 Intelligence Architectures
From the reviewed works, machine learning–assisted NTN-IoT systems can be categorized based on how intelligence and learning operate across the network. As illustrated in Fig. 13, three fundamental intelligence architectures largely characterize this domain: (i) centralized intelligence, where data and decision-making are concentrated at a central controller (e.g., a ground station); (ii) distributed intelligence, enabled by multi-agent deep reinforcement learning for local coordination among aerial or space nodes; and (iii) decentralized intelligence, where federated learning, optionally combined with blockchain, supports privacy-preserving and trust-aware model training across heterogeneous NTN-IoT entities.

Figure 13: Evolution of intelligence architectures in ML-enabled NTN-IoT systems, from centralized DRL to distributed multi-agent DRL and decentralized federated learning with optional blockchain support.
Across the reviewed NTN-assisted IoT systems, a consistent design principle is to align the optimization methodology with time-varying system dynamics and the degree of coupling among decisions. When task arrivals, wireless channels, or network topology exhibit non-stationary dynamics, static or single-stage optimization methods become inadequate. Adaptive sequential policies based on RL are therefore preferred for real-time offloading and scheduling [81,83,90,92]. A closely related design implication is that combinatorial complexity and nonconvexity should be deliberately shifted away from the execution phase and handled during offline learning or structured decomposition, allowing high-complexity subproblems such as visit-order selection or trajectory planning to be incorporated into the training phase while online operation is reduced to fast policy inference [56,69]. When partial decoupling between decision variables is possible, hybrid designs further improve practicality by separating discrete and continuous components, for example by fixing association or offloading indicators and solving the resulting continuous resource allocation via convex optimization or Lagrangian methods, so that hard constraints are enforced analytically while learning is reserved for the discrete decision space [86]. In latency-critical MEC scenarios, this logic naturally leads to modular architectures that combine clustering or heuristic preprocessing with lightweight neural decision modules, avoiding fully end-to-end DRL formulations whose retraining cost and inference latency may be incompatible with real-time operation [82].
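The hybrid decomposition described above can be sketched as follows: once the discrete association is fixed (here it is simply given, standing in for the output of a learned policy), the remaining continuous power allocation is solved analytically. This illustration assumes a sum-rate objective under a total power budget, for which the Lagrangian solution is classical water-filling; it is a generic example and not the specific formulation used in the cited studies.

```python
def water_filling(gains, total_power, iters=100):
    """Maximize sum(log2(1 + g_i * p_i)) s.t. sum(p_i) = total_power, p_i >= 0.

    The KKT conditions give p_i = max(0, mu - 1/g_i); the water level mu is
    found by bisection so that the power budget is met.
    """
    lo, hi = 0.0, total_power + max(1.0 / g for g in gains)
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - 1.0 / g) for g in gains)
        if used > total_power:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - 1.0 / g) for g in gains]

# Hypothetical normalized channel gains of the devices associated with one UAV.
associated_gains = [1.0, 0.5, 0.1]
powers = water_filling(associated_gains, total_power=2.0)
```

The power budget is satisfied by construction rather than through reward penalties, which is the practical appeal of reserving learning for the discrete decisions and enforcing hard constraints analytically.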
Beyond algorithmic frameworks, system-level considerations related to scalability, energy coupling, and adversarial conditions are fundamental in practical deployments. In multi-UAV or large-scale NTN settings, decentralized or multi-agent control is essential to avoid centralized bottlenecks; however, the resulting non-stationarity during learning increases training complexity, motivating centralized training with decentralized execution as a pragmatic compromise between coordination efficiency and scalability [72,84]. Energy-aware designs further demonstrate that optimizing a single performance metric is insufficient in practice: quantities such as throughput, harvested energy, propulsion power, and device energy consumption can be tightly coupled, requiring multi-objective formulations or explicit energy-risk modeling to prevent policies that maximize average reward while violating endurance or stability constraints [92,94]. Finally, in security-sensitive NTN–IoT environments, robustness and trust must be incorporated into the control and optimization framework from the outset. Adaptive game-theoretic and learning-based countermeasures against intelligent jamming introduce additional signaling and computational overhead, and blockchain-enabled protection mechanisms expose explicit security–latency trade-offs that must be carefully managed through techniques such as lightweight verification or off-chain processing to preserve deployability [96–99]. A summary of the design guidelines derived from these studies is provided in Fig. 14.

Figure 14: Representative design principles for ML-enabled NTN-assisted IoT systems, derived from influential studies across trajectory, resource allocation, energy, and security optimization aspects.
Across all themes, several overarching lessons emerge:
• ML-driven optimization, especially DRL and MARL, is central to handling the dynamic, high-dimensional, and uncertain nature of NTN-assisted IoT systems.
• Multi-objective formulations dominate the literature, reflecting the interdependence of trajectory, energy, communication, and security constraints rather than isolated single-metric optimization.
• UAV-centric research remains dominant, but broader and more systematic integration of HAPs and satellites is needed to achieve globally optimized and scalable NTN-assisted IoT ecosystems.
• Distributed intelligence (e.g., FL, MARL) is increasingly essential for large-scale, heterogeneous, and privacy-sensitive deployments, where centralized control is infeasible or undesirable.
• Robustness and reliability under adversarial or stochastic environments remain comparatively less explored than performance-centric optimization, highlighting the need for algorithms that jointly optimize efficiency, stability, and security.
These lessons collectively motivate the research opportunities outlined in Section 9.
Although significant progress has been made in machine learning–enabled NTN-assisted IoT, several important gaps and emerging opportunities remain. The lessons in Section 8 highlight clear trends toward multi-objective optimization, distributed learning, security robustness, and cross-layer design. This section outlines future research avenues that build on these insights while incorporating emerging directions in NTN research. Fig. 15 provides a high-level illustration of how some of the discussed aspects converge toward resilient, secure, and sustainable intelligence across integrated space–air–ground NTN-IoT architectures.

Figure 15: A roadmap toward resilient 6G-era NTN-IoT connectivity, detailing the transition from performance-centric optimization to sustainable, secure, and fully integrated space-air-ground architectures.
Trajectory planning remains a challenging problem due to environmental uncertainty, user mobility, and limited energy budgets. Several promising avenues for future research include:
• Distributed and federated learning for trajectory optimization. While many existing studies rely on centralized training, privacy-preserving learning strategies are expected to play a crucial role in multi-UAV-assisted IoT systems, particularly for trajectory optimization. Recent work on decentralized FL for NTN-assisted IoT [11,141] indicates that this direction has significant potential for future research.
• Joint trajectory design across SAGIN layers. In integrated space–air–ground networks, such as those involving LEO satellites in marine environments, connectivity challenges arise due to Earth’s curvature, which causes coverage gaps and hinders continuous communication for marine IoT devices [11]. Future research could investigate coordinated trajectory optimization involving satellites, high-altitude platforms, and UAVs, especially considering the interactions among different network tiers. For example, satellites can provide global situational awareness of terrestrial and maritime networks, which may assist UAV trajectory planning in both marine and terrestrial-based applications. This cross-tier interaction is conceptually illustrated in Fig. 15, where satellite-level situational awareness informs aerial mobility decisions.
• Obstacle- and environment-aware planning. Many trajectory models neglect realistic factors such as 3D urban blockages, wind dynamics, or air-traffic constraints. Incorporating physics-aware models and real-time environmental sensing into RL-based frameworks is necessary for safe deployment.
• Multi-UAV cooperation at scale. Existing MARL approaches often struggle with convergence, reward design, or communication overhead as the fleet size increases. Scalable MARL architectures, emergent-behavior learning, and attention-based multi-agent policies remain open problems.
• Cross-layer trajectory optimization. Future system designs will increasingly require joint optimization across multiple layers, given the broad range of technologies and performance considerations involved. These include mmWave and THz communications along with their beamforming strategies, integrated sensing and communication, reconfigurable intelligent surfaces, wireless power transfer, computation offloading, and physical-layer security. Moreover, key network objectives, such as AoI, energy consumption, user association, spectrum and bandwidth allocation, collision avoidance, and secrecy rate, must also be jointly incorporated into trajectory optimization frameworks.
Resource allocation challenges will intensify as NTNs support a growing number of devices, more heterogeneous mission profiles, and more stringent latency constraints.
• Context-aware allocation via predictive learning. Integrating mobility prediction, traffic forecasting, and user-behavior modeling into resource allocation policies remains an open opportunity. Sequence models such as LSTM networks and their variants can be leveraged to produce the predictions that drive such allocation decisions.
• Cross-domain resource optimization in SAGIN. Resource allocation must extend beyond UAV-only systems to incorporate satellites, HAPs, and terrestrial edge servers. Multi-tier resource sharing and cross-layer optimization represent a major challenge for 6G NTNs.
• Robust RL for adversarial environments. Current DRL-based allocation methods are vulnerable to jamming, spoofing, and data poisoning. Robust RL, risk-sensitive optimization, and secure model updates are essential for practical deployment.
• Ultra-low-latency and reliability constraints. URLLC applications require resource allocation strategies capable of meeting strict delay, jitter, and reliability targets while managing UAV mobility and device heterogeneity.
• Fairness-aware and priority-aware allocation. As devices become more heterogeneous and mission priorities diverge, fairness-aware or priority-aware allocation policies are required to prevent resource starvation in dense IoT networks.
Sustainable NTN-assisted IoT operation requires more realistic modeling, renewable energy integration, and robust scheduling.
• Sustainable NTN operation via energy harvesting (EH) and wireless power transfer (WPT). Solar-powered UAVs, satellite-assisted WPT, and multi-source EH systems will enable longer missions. Realistic non-linear EH models must be integrated into DRL-based policies to ensure reliable performance.
• Joint energy–communication–computation optimization. Future frameworks should leverage the collective experience of multiple agents to minimize energy wastage, particularly in networks that incorporate energy transfer mechanisms [142]. In addition, future work should examine energy optimization in relation to trajectory planning, computation offloading, and energy budgeting across multi-UAV networks, especially in the context of emerging technologies such as VTOL platforms and the Internet of Vehicles. Promising research directions include learning-based decision-making methods such as hierarchical RL and multi-armed bandit formulations; energy-supply and harvesting techniques such as cooperative charging, laser-charged UAVs, and SWIPT; and advanced communication and resource-management schemes, including bandwidth allocation, covert communication, and NOMA.
• Reliability- and risk-aware energy management. Energy-depleted nodes or UAVs can introduce catastrophic failure risks. Risk-aware or safe RL approaches should be explored to maintain mission feasibility under uncertain energy states.
• Cooperative and distributed energy routing. Multi-UAV systems will benefit from energy sharing, coordinated charging, and distributed energy-routing protocols between UAVs and mobile charging nodes.
• Energy-aware MEC and satellite-computing integration. As computation is increasingly offloaded to edge UAVs or LEO satellites, energy-aware task partitioning becomes a key design consideration in NTN-assisted IoT systems.
Security will remain a central challenge as NTNs become more pervasive, globally scalable, and data-intensive.
• Federated, decentralized, and privacy-preserving security analytics. Distributed malicious-traffic detection using FL [143], as well as learning mechanisms with strong data-modeling and feature-extraction capabilities, such as hypergraph neural networks, can be leveraged to detect malicious activity in NTN-assisted IoT systems. These approaches are particularly promising for complex environments involving multi-UAV cooperation and UAV swarms.
• Adversarial robustness for ML models. Many deep-learning-based IDS and authentication models are vulnerable to evasion and poisoning attacks. Lightweight adversarial defenses suitable for low-power devices remain relatively unexplored.
• Secure cooperative autonomy. In multi-UAV or SAGIN architectures, compromised nodes can mislead cooperative learning processes. Trust-aware MARL and secure consensus mechanisms are therefore essential to ensure reliable coordination and constitute important directions for further investigation.
• Blockchain-enhanced trust and compliance. Blockchain can strengthen accountability, identity management, and secure logging across mobile IoT and IoD-enabled NTN systems, but must be adapted to stringent network latency constraints.
• Integrated security and optimization. Future NTNs must co-optimize communication, mobility, and security. Secure trajectory planning, secure offloading, and secure resource allocation are fundamentally cross-layer problems. Security challenges increasingly span multiple NTN tiers in integrated space–air–ground architectures (Fig. 15) and must be addressed jointly with mobility and resource optimization.
Overall, the future of ML-driven NTN-assisted IoT research will depend on integrating intelligence across space–air–ground layers while maintaining efficiency, robustness, and trust. Progress will require new learning paradigms, scalable multi-agent coordination, resilience under adversarial conditions, and energy-sustainable network design. These directions offer a roadmap toward resilient, secure, and adaptive NTN-enabled IoT ecosystems as we move toward 6G and beyond.
NTNs have emerged as a critical pillar of next-generation connectivity, enabling large-scale, resilient, and intelligent IoT deployments across diverse environments. As UAVs, HAPs, and satellite systems become increasingly integrated with terrestrial infrastructures, ML plays a central role in addressing the complex optimization challenges inherent to these multi-tier ecosystems.
This paper presented a comprehensive, optimization-centric review of ML-driven NTN-assisted IoT, focusing on four key themes that consistently shape the optimization problem space: trajectory planning, resource allocation, energy utilization, and security. Using a structured methodology grounded in bibliometric analysis, thematic clustering, and targeted review of top-cited contributions, we synthesized existing knowledge and clarified how ML advances autonomous decision-making across dynamic and heterogeneous NTN-assisted IoT environments.
The research landscape analysis and taxonomy revealed strong methodological trends, particularly the dominance of deep reinforcement learning, the growing importance of multi-agent intelligence, and the emergence of distributed learning frameworks such as federated learning. The top-cited review further highlighted the centrality of multi-objective optimization, the interdependence of mobility, computation, and communication constraints, and the increasing relevance of security considerations as NTNs become more deeply embedded in critical IoT infrastructures.
Our lessons and future directions emphasize the need for more realistic modeling, broader integration of satellite and HAP components, robustness against adversarial conditions, and energy-aware autonomy. As NTNs evolve toward 6G and beyond, unified optimization frameworks that span trajectory design, resource management, energy sustainability, and security will be essential for achieving global-scale, trustworthy, and resilient connectivity.
Overall, this survey provides a structured roadmap for researchers and practitioners working on intelligent NTN-assisted IoT systems. By consolidating existing knowledge and identifying promising avenues for further exploration, we aim to support the development of the next generation of adaptive, secure, and energy-efficient non-terrestrial networks.
Acknowledgement: Oluwatosin Ahmed Amodu and Zurina Mohd Hanapi acknowledge the support of the Ministry of Higher Education Malaysia through the Fundamental Research Grant Scheme under Grant FRGS/1/2023/ICT11/UPM/02/2/5540649. During the preparation of this manuscript, the authors utilized ChatGPT (OpenAI, version GPT-5.2) to assist in the synthesis of technical literature notes related to computational complexity and the summarization of design guidelines. In addition, Gemini (Google, version 1.5 Pro) was employed as a supportive tool to assist in organizing technical keywords into a preliminary hierarchical taxonomy structure and to assist in grouping the 47 sources highlighted in Section 7. All AI-assisted outputs were carefully reviewed, verified against the original sources, and revised by the authors, who accept full responsibility for the accuracy, originality, and integrity of the final manuscript.
Funding Statement: Oluwatosin Ahmed Amodu and Zurina Mohd Hanapi acknowledge the support of the Ministry of Higher Education Malaysia through the Fundamental Research Grant Scheme under Grant FRGS/1/2023/ICT11/UPM/02/2/5540649.
Author Contributions: The authors confirm their contribution to the paper as follows: Study Conception and Design: Oluwatosin Ahmed Amodu, Zurina Mohd Hanapi; Data Collection: Oluwatosin Ahmed Amodu, Faten A. Saif, Huda Althumali, Chedia Jarray, Mohammed Sani Adam; Analysis and Interpretation of Results: Oluwatosin Ahmed Amodu; Draft Manuscript Preparation: Oluwatosin Ahmed Amodu, Faten A. Saif, Huda Althumali, Chedia Jarray, Mohammed Sani Adam; Review and Editing: Oluwatosin Ahmed Amodu, Raja Azlina Raja Mahmood, Nor Fadzilah Abdullah; Illustrations: Oluwatosin Ahmed Amodu, Mohammed Sani Adam; Supervision: Zurina Mohd Hanapi; Funding: Zurina Mohd Hanapi. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: The authors confirm that the data supporting the findings of this study are available within the article.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
1 The exact Scopus query used for the recent-trends snapshot was: KEY(uav OR HAP OR satellite AND IoT AND trajectory OR security OR “resource allocation” OR “energy consumption”) AND PUBYEAR > 2024 AND PUBYEAR < 2027 AND TITLE(UAV OR drone OR HAP OR satellite AND IoT OR “Internet of Things”) AND (LIMIT-TO(DOCTYPE,“ar”)) AND (EXCLUDE(EXACTKEYWORD,“Steganography”) OR EXCLUDE(EXACTKEYWORD,“%moisture”) OR EXCLUDE(EXACTKEYWORD,“% Reductions”)).
References
1. Tong Z, Wang J, Hou X, Chen J, Jiao Z, Liu J. Blockchain-based trustworthy and efficient hierarchical federated learning for UAV-enabled IoT networks. IEEE Internet Things J. 2024;11(21):34270–82. doi:10.1109/JIOT.2024.3370964. [Google Scholar] [CrossRef]
2. Pei X, Zhang Z, Zhang Y. Cost-efficient hierarchical federated edge learning for satellite-terrestrial internet of things. Mob Netw Appl. 2024;29(3):922–34. doi:10.1007/s11036-024-02352-6. [Google Scholar] [CrossRef]
3. Shi W, Li H, Yang Y, Zuo Y, Chen Y. Energy efficient task offloading and resource allocation for NOMA-enabled IoT in HAP-assisted MEC. Int J Web Inf Syst. 2025;21(4):494–518. doi:10.1108/IJWIS-02-2025-0050. [Google Scholar] [CrossRef]
4. Ahmad SZ, Qamar F, Alshehri H, Jeribi F, Tahir A, Siddiqui ST, et al. A GAN-based approach for enhancing security in satellite based IoT networks using MPI enabled HPC. PLoS One. 2025;20(9):e0331019. doi:10.1371/journal.pone.0331019. [Google Scholar] [PubMed] [CrossRef]
5. Zhao D, Ding R, Song B. Satellite-assisted 6G wide-area edge intelligence: dynamics-aware task offloading and resource allocation for remote IoT services. Sci China Inf Sci. 2025;68(2):122303. doi:10.1007/s11432-024-4258-x. [Google Scholar] [CrossRef]
6. Wang Q, Zhang H, Liang X. Control-aware energy-efficient transmission for satellite internet of things systems. IEEE Internet Things J. 2025;12(12):21577–92. doi:10.1109/JIOT.2025.3547921. [Google Scholar] [CrossRef]
7. Kim D, Jung H, Lee I-H, Niyato DT. Multibeam management and resource allocation for LEO satellite-assisted IoT networks. IEEE Internet Things J. 2025;12(12):19443–58. doi:10.1109/JIOT.2025.3542238. [Google Scholar] [CrossRef]
8. Li F, Wang J, Dong Y, Wang W, Ma X. Flexible beam scheduling and resource allocation strategies for satellite internet of things. Chin J Internet Things. 2025;9(1):27–40. doi:10.11959/j.issn.2096-3750.2025.00465. [Google Scholar] [CrossRef]
9. Lukito WD, Xiang W, Lai P, Cheng P, Liu C, Yu K, et al. Integrated star-RIS and UAV for satellite IoT communications: an energy-efficient approach. IEEE Internet Things J. 2025;12(9):11356–71. doi:10.1109/JIOT.2024.3472019. [Google Scholar] [CrossRef]
10. He Y, Wu J, Zhu L, Huang F, Wang B, Yang D, et al. A review of physical layer security in aerial-terrestrial integrated internet of things: emerging techniques, potential applications, and future trends. Drones. 2025;9(4):312. doi:10.3390/drones9040312. [Google Scholar] [CrossRef]
11. Tesfaw BA, Juang RT, Tarekegn GB, Kabore WN, Tsai M. Joint UAV 3-D trajectory and resource allocation for integrated LEO satellite and multi-UAV-enabled marine IoT networks: a federated multiagent deep reinforcement learning approach. IEEE Internet Things J. 2025;12(21):45076–93. doi:10.1109/JIOT.2025.3598332. [Google Scholar] [CrossRef]
12. Wang Q, Xia X, Chen T, Chen S, Wang Y, Li Z, et al. Energy-efficient resource allocation in LEO-assisted UAV architecture for internet of things. IEEE Internet Things J. 2025;12(8):9614–26. doi:10.1109/JIOT.2025.3542618. [Google Scholar] [CrossRef]
13. Wang T. Energy-efficient resource allocation for UAV-aided full-duplex OFDMA wireless powered IoT communication networks. J King Saud Univ—Comput Inf Sci. 2024;36(9):102225. doi:10.1016/j.jksuci.2024.102225. [Google Scholar] [CrossRef]
14. Fan H, Sun C, Long J, Li L, Huo Y, Wang S. Graph-driven resource allocation strategies in satellite IoT: a cooperative game-theoretic approach. IEEE Internet Things J. 2025;12(4):3463–81. doi:10.1109/JIOT.2024.3407123. [Google Scholar] [CrossRef]
15. Shi Y, Luo Q, Zhang S, Wang J, Liu J. Jamming scheduling and resource allocation for secure communication in massive LEO satellite-empowered IoT network. IEEE Internet Things J. 2026;13(5):9849–60. doi:10.1109/JIOT.2025.3646153. [Google Scholar] [CrossRef]
16. Xu H, Chen X, Huang X, Min G, Chen Y. Uncertainty-aware scheduling for effective data collection from environmental IoT devices through LEO satellites. Future Gener Comput Syst. 2025;166(5):107656. doi:10.1016/j.future.2024.107656. [Google Scholar] [CrossRef]
17. Qian L, Fan X, Li M, Wu Y. Energy-efficient data gathering and computing in LEO satellite-assisted marine IoT networks. IEEE Trans Cogn Commun Netw. 2026;12(4):1933–47. doi:10.1109/TCCN.2025.3602859. [Google Scholar] [CrossRef]
18. Bukar UA, Sayeed MS, Razak SFA, Yogarayan S, Amodu OA. An exploratory bibliometric analysis of the literature on the age of information-aware unmanned aerial vehicles aided communication. Informatica. 2023;47(7):91–114. doi:10.31449/inf.v47i7.4783. [Google Scholar] [CrossRef]
19. Amodu OA, Bukar UA, Mahmood RAR, Jarray C, Othman M. Age of information minimization in UAV-aided data collection for WSN and IoT applications: a systematic review. J Netw Comput Appl. 2023;216(2):103652. doi:10.1016/j.jnca.2023.103652. [Google Scholar] [CrossRef]
20. Amodu OA, Jarray C, Mahmood RAR, Althumali H, Bukar UA, Nordin R, et al. Deep reinforcement learning for AoI minimization in UAV-aided data collection for WSN and IoT: a survey. IEEE Access. 2024;12:108000–40. doi:10.1109/access.2024.3425497. [Google Scholar] [CrossRef]
21. Amodu OA, Althumali H, Hanapi ZM, Jarray C, Mahmood RAR, Adam MS, et al. A comprehensive survey of deep reinforcement learning in UAV-assisted IoT data collection. Veh Commun. 2025;55(2):100949. doi:10.1016/j.vehcom.2025.100949. [Google Scholar] [CrossRef]
22. Liu J, Xiang J, Jin Y, Liu R, Yan J, Wang L. Boost precision agriculture with unmanned aerial vehicle remote sensing and edge intelligence: a survey. Remote Sens. 2021;13(21):4387. doi:10.3390/rs13214387. [Google Scholar] [CrossRef]
23. Ouhami M, Hafiane A, Es-Saady Y, Hajji MEl, Canals R. Computer vision, IoT and data fusion for crop disease detection using machine learning: a survey and ongoing research. Remote Sens. 2021;13(13):2486. doi:10.3390/rs13132486. [Google Scholar] [CrossRef]
24. Chamola V, Hassija V, Gupta S, Goyal A, Guizani M, Sikdar B. Disaster and pandemic management using machine learning: a survey. IEEE Internet Things J. 2021;8(21):16047–71. doi:10.1109/JIOT.2020.3044966. [Google Scholar] [PubMed] [CrossRef]
25. Gohari A, Ahmad AB, Rahim RBA, Supa’at A, Razak SA, Gismalla MSM. Involvement of surveillance drones in smart cities: a systematic review. IEEE Access. 2022;10:56611–28. doi:10.1109/ACCESS.2022.3177904. [Google Scholar] [CrossRef]
26. Ullah Z, Al-Turjman F, Mostarda L, Gagliardi R. Applications of artificial intelligence and machine learning in smart cities. Comput Commun. 2020;154(2):313–23. doi:10.1016/j.comcom.2020.02.069. [Google Scholar] [CrossRef]
27. Qadir Z, Ullah F, Munawar HS, Al-Turjman F. Addressing disasters in smart cities through UAVs path planning and 5G communications: a systematic review. Comput Commun. 2021;168:114–35. doi:10.1016/j.comcom.2021.01.003. [Google Scholar] [CrossRef]
28. Amodu OA, Mahmood RAR, Althumali H, Jarray C, Adnan MH, Bukar UA, et al. A question-centric review on DRL-based optimization for UAV-assisted MEC sensor and IoT applications, challenges, and future directions. Veh Commun. 2025;53(6):100899. doi:10.1016/j.vehcom.2025.100899. [Google Scholar] [CrossRef]
29. Yazid Y, Ez-Zazi I, Guerrero-González A, Oualkadi AEl, Arioua M. UAV-enabled mobile edge-computing for IoT based on AI: a comprehensive review. Drones. 2021;5(4):148. doi:10.3390/drones5040148. [Google Scholar] [CrossRef]
30. Huda SA, Moh S. Survey on computation offloading in UAV-enabled mobile edge computing. J Netw Comput Appl. 2022;201:103341. doi:10.1016/j.jnca.2022.103341. [Google Scholar] [CrossRef]
31. Amodu OA, Nordin R, Jarray C, Bukar UA, Raja Mahmood RA, Othman M. A survey on the design aspects and opportunities in age-aware UAV-aided data collection for sensor networks and internet of things applications. Drones. 2023;7(4):260. [Google Scholar]
32. Ullah Z, Al-Turjman F, Moatasim U, Mostarda L, Gagliardi R. UAVs joint optimization problems and machine learning to improve the 5G and beyond communication. Comput Netw. 2020;182(14):107478. doi:10.1016/j.comnet.2020.107478. [Google Scholar] [CrossRef]
33. Pogaku AC, Do D-T, Lee BM, Nguyen ND. UAV-assisted RIS for future wireless communications: a survey on optimization and performance analysis. IEEE Access. 2022;10(4):16320–36. doi:10.1109/ACCESS.2022.3149054. [Google Scholar] [CrossRef]
34. Cheng N, Wu S, Wang X, Yin Z, Li C, Chen W, et al. AI for UAV-assisted IoT applications: a comprehensive review. IEEE Internet Things J. 2023;10(16):14438–61. doi:10.1109/jiot.2023.3268316. [Google Scholar] [CrossRef]
35. Saeedi IDI, Al-Qurabat AKM. A comprehensive review of computation offloading in UAV-assisted mobile edge computing for IoT applications. Phys Commun. 2025;72(21):102810. doi:10.1016/j.phycom.2025.102810. [Google Scholar] [CrossRef]
36. Satouf A, Hamidoğlu A, Gul OM, Kuusik A, Kadry SN, Elghirani A. A survey on task scheduling and optimization techniques for IoT-enabled UAV with edge/fog computing. Telecommun Syst. 2025;88(3):89. doi:10.1007/s11235-025-01320-z. [Google Scholar] [CrossRef]
37. Adnan MH, Zukarnain ZA, Amodu OA. Fundamental design aspects of UAV-enabled MEC systems: a review on models, challenges, and future opportunities. Comput Sci Rev. 2024;51:100615. [Google Scholar]
38. Sharma J, Mehra PS. Secure communication in IoT-based UAV networks: a systematic survey. Internet Things. 2023;23(10):100883. doi:10.1016/j.iot.2023.100883. [Google Scholar] [CrossRef]
39. Michailidis ET, Maliatsos K, Skoutas DN, Vouyioukas D, Skianis C. Secure UAV-aided mobile edge computing for IoT: a review. IEEE Access. 2022;10(4):86353–83. doi:10.1109/access.2022.3199408. [Google Scholar] [CrossRef]
40. Abualigah L, Diabat A, Sumari P, Gandomi AH. Applications, deployments, and integration of internet of drones (IoD): a review. IEEE Sens J. 2021;21(22):25532–46. doi:10.1109/JSEN.2021.3114266. [Google Scholar] [CrossRef]
41. Singh D. A review on deep learning models. Smart Innov Syst Technol. 2022;273:223–9. doi:10.1007/978-3-030-92905-3_29. [Google Scholar] [CrossRef]
42. Luong NC, Hoang DT, Gong S, Niyato D, Wang P, Liang Y-C, et al. Applications of deep reinforcement learning in communications and networking: a survey. IEEE Commun Surv Tut. 2019;21(4):3133–74. doi:10.1109/COMST.2019.2916583. [Google Scholar] [CrossRef]
43. Frikha MS, Gammar SM, Lahmadi A, Andrey L. Reinforcement and deep reinforcement learning for wireless internet of things: a survey. Comput Commun. 2021;178(7553):98–113. doi:10.1016/j.comcom.2021.07.014. [Google Scholar] [CrossRef]
44. Li T, Zhu K, Luong NC, Niyato D, Wu Q, Zhang Y, et al. Applications of multi-agent reinforcement learning in future internet: a comprehensive survey. IEEE Commun Surv Tut. 2022;24(2):1240–79. doi:10.1109/COMST.2022.3160697. [Google Scholar] [CrossRef]
45. Nguyen T-H, Park L. A survey on deep reinforcement learning-driven task offloading in aerial access networks. In: 13th International Conference on Information and Communication Technology Convergence (ICTC); 2022 Oct 19–21; Jeju Island, Republic of Korea. p. 822–7. doi:10.1109/ICTC55196.2022.9952687. [Google Scholar] [CrossRef]
46. Amodu OA, Hanapi ZM, Mahmood RAR, Jarray C, Saif FA, Althumali H, et al. A systematic mapping and review on machine learning for non-terrestrial networks assisted internet of things: enabling technologies. ICT Express. 2026;12(2):422–43. doi:10.1016/j.icte.2026.01.002. [Google Scholar] [CrossRef]
47. Van Eck N, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics. 2010;84(2):523–38. doi:10.1007/s11192-009-0146-3. [Google Scholar] [PubMed] [CrossRef]
48. Mozaffari M, Kasgari ATZ, Saad W, Bennis M, Debbah M. Beyond 5G with UAVs: foundations of a 3D wireless cellular network. IEEE Trans Wirel Commun. 2019;18(1):357–72. doi:10.1109/TWC.2018.2879940. [Google Scholar] [CrossRef]
49. Hu S, Yuan X, Ni W, Wang X, Jamalipour A. RIS-assisted jamming rejection and path planning for UAV-borne IoT platform: a new deep reinforcement learning framework. IEEE Internet Things J. 2023;10(22):20162–73. doi:10.1109/jiot.2023.3283502. [Google Scholar] [CrossRef]
50. Yi M, Wang X, Liu J, Zhang Y, Hou R. Multi-task transfer deep reinforcement learning for timely data collection in rechargeable-UAV-aided IoT networks. IEEE Internet Things J. 2023;10(23):20545–59. [Google Scholar]
51. Wei X, Zhang G, Han Z. Satellite-controlled UAV-assisted IoT information collection with deep reinforcement learning and device matching. In: 7th International Conference on Intelligent Computing and Signal Processing (ICSP); 2022 Apr 15–17; Xi’an, China. p. 1254–9. [Google Scholar]
52. Xu S, Zhang X, Li C, Wang D, Yang L. Deep reinforcement learning approach for joint trajectory design in multi-UAV IoT networks. IEEE Trans Veh Technol. 2022;71(3):3389–94. doi:10.1109/tvt.2022.3144277. [Google Scholar] [CrossRef]
53. Yi M, Wang X, Liu J, Zhang Y, Hou R. Deep reinforcement learning for energy-efficient fresh data collection in rechargeable UAV-assisted IoT networks. In: IEEE Wireless Communications and Networking Conference (WCNC); 2023 Mar 26–29; Glasgow, UK. p. 1–6. [Google Scholar]
54. Nguyen KK, Masaracchia A, Sharma V, Poor HV, Duong TQ. RIS-assisted UAV communications for IoT with wireless power transfer using deep reinforcement learning. IEEE J Sel Top Signal Process. 2022;16(5):1086–96. doi:10.1109/jstsp.2022.3172587. [Google Scholar] [CrossRef]
55. Zhang S, Liu W, Ansari N. Completion time minimization for data collection in a UAV-enabled IoT network: a deep reinforcement learning approach. IEEE Trans Veh Technol. 2023;72(11):14734–42. doi:10.1109/tvt.2023.3280848. [Google Scholar] [CrossRef]
56. Wang Y, Gao Z, Zhang J, Cao X, Zheng D, Gao Y, et al. Trajectory design for UAV-based internet of things data collection: a deep reinforcement learning approach. IEEE Internet Things J. 2022;9(5):3899–912. doi:10.1109/jiot.2021.3102185. [Google Scholar] [CrossRef]
57. Sun M, Xu X, Qin X, Zhang P. AoI-energy-aware UAV-assisted data collection for IoT networks: a deep reinforcement learning method. IEEE Internet Things J. 2021;8(24):17275–89. doi:10.1109/jiot.2021.3078701. [Google Scholar] [CrossRef]
58. Zhang J, Yu Y, Wang Z, Ao S, Tang J, Zhang X, et al. Trajectory planning of UAV in wireless powered IoT system based on deep reinforcement learning. In: 2020 IEEE/CIC International Conference on Communications in China (ICCC); 2020 Aug 9–11; Chongqing, China. p. 645–50. [Google Scholar]
59. Tong P, Liu J, Wang X, Bai B, Dai H. Deep reinforcement learning for efficient data collection in UAV-aided internet of things. In: 2020 IEEE International Conference on Communications Workshops (ICC Workshops); 2020 Jun 7–11; Dublin, Ireland. p. 1–6. [Google Scholar]
60. Khodaparast SS, Lu X, Wang P, Nguyen UT. Deep reinforcement learning based energy efficient multi-UAV data collection for IoT networks. IEEE Open J Veh Technol. 2021;2:249–60. doi:10.1109/ojvt.2021.3085421. [Google Scholar] [CrossRef]
61. Hu Y, Liu Y, Kaushik A, Masouros C, Thompson JS. Timely data collection for UAV-based IoT networks: a deep reinforcement learning approach. IEEE Sens J. 2023;23(11):12295–308. doi:10.1109/jsen.2023.3265935. [Google Scholar] [CrossRef]
62. Chen G, Zhai XB, Li C. Interference-aware trajectory design for fair data collection in UAV-assisted IoT networks by deep reinforcement learning. In: 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th International Conference on Data Science & Systems; 19th International Conference on Smart City; 7th International Conference on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys); 2021 Dec 20–22; Haikou, China. p. 345–52. doi:10.1109/hpcc-dss-smartcity-dependsys53884.2021.00070. [Google Scholar] [CrossRef]
63. Esrafilian O, Bayerlein H, Gesbert D. Model-aided deep reinforcement learning for sample-efficient UAV trajectory design in IoT networks. In: 2021 IEEE Global Communications Conference (GLOBECOM); 2021 Dec 7–11; Madrid, Spain. p. 1–6. [Google Scholar]
64. Yi M, Wang X, Liu J, Zhang Y, Bai B. Deep reinforcement learning for fresh data collection in UAV-assisted IoT networks. In: IEEE INFOCOM 2020—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS); 2020 Jul 6–9; Toronto, ON, Canada. p. 716–21. [Google Scholar]
65. Guo Z, Chen H, Li S. Deep reinforcement learning-based UAV path planning for energy-efficient multitier cooperative computing in wireless sensor networks. J Sens. 2023;2023:2804943. doi:10.1155/2023/2804943. [Google Scholar] [CrossRef]
66. Liu R, Qu Z, Huang G, Dong M, Wang T, Zhang S, et al. DRL-UTPS: DRL-based trajectory planning for unmanned aerial vehicles for data collection in dynamic IoT network. IEEE Trans Intell Veh. 2023;8(2):1204–18. [Google Scholar]
67. Yang J, Yang Y, Xu H, Hu J, Song T. Unmanned aerial vehicle trajectory design in wireless sensor networks: a deep reinforcement learning method. In: Second International Conference on Electronic Information Technology (EIT 2023); 2023 Mar 31–Apr 2; Wuhan, China; 2023. [Google Scholar]
68. Luo X, Chen C, Zeng C, Li C, Xu J, Gong S. Deep reinforcement learning for joint trajectory planning, transmission scheduling, and access control in UAV-assisted wireless sensor networks. Sensors. 2023;23(10):4691. doi:10.3390/s23104691. [Google Scholar] [PubMed] [CrossRef]
69. Zhu B, Bedeer E, Nguyen HH, Barton R, Henry J. UAV trajectory planning in wireless sensor networks for energy consumption minimization by deep reinforcement learning. IEEE Trans Veh Technol. 2021;70(9):9540–54. doi:10.1109/tvt.2021.3102161. [Google Scholar] [CrossRef]
70. Liang Z, Dai Y, Lyu L, Lin B. Adaptive data collection and offloading in multi-UAV-assisted maritime IoT systems: a deep reinforcement learning approach. Remote Sens. 2023;15(2):292. [Google Scholar]
71. Dai Y, Liang Z, Lyu L, Lin B. Deep reinforcement learning-based UAV data collection and offloading in NOMA-enabled marine IoT systems. Wirel Commun Mob Comput. 2022. doi:10.1155/2022/8805416. [Google Scholar] [CrossRef]
72. Bayerlein H, Theile M, Caccamo M, Gesbert D. Multi-UAV path planning for wireless data harvesting with deep reinforcement learning. IEEE Open J Commun Soc. 2021;2:1171–87. doi:10.1109/OJCOMS.2021.3081996. [Google Scholar] [CrossRef]
73. Gao C, Bian X, Hu B, Chen S, Wang H. Intelligent online offloading and resource allocation for HAP drones and satellite collaborative networks. Drones. 2024;8(6):245. doi:10.3390/drones8060245. [Google Scholar] [CrossRef]
74. YR SK, N CH. An efficient localization-based secure resource allocation using E-FSO with SS-DDNN-based CM-LSGEO techniques. Multimed Tools Appl. 2024;83(34):80543–64. doi:10.1007/s11042-024-18322-9. [Google Scholar] [CrossRef]
75. Hu B, Zhang W, Al-Rubaye S, Zhang H, Wang X, Huang S. Digital twin-empowered offloading optimisation and resource allocation for UAV-assisted IoT network systems. In: 2024 IEEE 100th Vehicular Technology Conference (VTC2024-Fall). Piscataway, NJ, USA: IEEE; 2024. p. 1–6. [Google Scholar]
76. Gan T, Dong S, Wang S, Li J. Distributed resource allocation in dispersed computing environment based on UAV track inspection in urban rail transit. Comput Mater Contin. 2024;80(1):643–60. doi:10.32604/cmc.2024.051408. [Google Scholar] [CrossRef]
77. Liu RW, Nie J, Garg S, Xiong Z, Zhang Y, Hossain MS. Data-driven trajectory quality improvement for promoting intelligent vessel traffic services in 6G-enabled maritime IoT systems. IEEE Internet Things J. 2021;8(7):5374–85. doi:10.1109/JIOT.2020.3028743. [Google Scholar] [CrossRef]
78. Liu RW, Liang M, Nie J, Lim WYB, Zhang Y, Guizani M. Deep learning-powered vessel trajectory prediction for improving smart traffic services in maritime internet of things. IEEE Trans Netw Sci Eng. 2022;9(5):3080–94. doi:10.1109/TNSE.2022.3140529. [Google Scholar] [CrossRef]
79. Liu RW, Liang M, Nie J, Yuan Y, Xiong Z, Yu H, et al. STMGCN: mobile edge computing-empowered vessel trajectory prediction using spatio-temporal multigraph convolutional network. IEEE Trans Ind Inform. 2022;18(11):7977–87. doi:10.1109/TII.2022.3165886. [Google Scholar] [CrossRef]
80. Abedin SF, Munir MS, Tran NH, Han Z, Hong CS. Data freshness and energy-efficient UAV navigation optimization: a deep reinforcement learning approach. IEEE Trans Intell Transp Syst. 2021;22(9):5994–6006. doi:10.1109/tits.2020.3039617. [Google Scholar] [CrossRef]
81. Cheng N, Lyu F, Quan W, Zhou C, He H, Shi W, et al. Space/aerial-assisted computing offloading for IoT applications: a learning-based approach. IEEE J Sel Areas Commun. 2019;37(5):1117–29. doi:10.1109/JSAC.2019.2906789. [Google Scholar] [CrossRef]
82. Jiang F, Wang K, Dong L, Pan C, Xu W, Yang K. Deep-learning-based joint resource scheduling algorithms for hybrid MEC networks. IEEE Internet Things J. 2019;7(7):6252–65. doi:10.1109/jiot.2019.2954503. [Google Scholar] [CrossRef]
83. Seid AM, Boateng GO, Anokye S, Kwantwi T, Sun G, Liu G. Collaborative computation offloading and resource allocation in multi-UAV-assisted IoT networks: a deep reinforcement learning approach. IEEE Internet Things J. 2021;8(15):12203–18. doi:10.1109/jiot.2021.3063188. [Google Scholar] [CrossRef]
84. Seid AM, Boateng GO, Mareri B, Sun G, Jiang W. Multi-agent DRL for task offloading and resource allocation in multi-UAV enabled IoT edge network. IEEE Trans Netw Serv Manag. 2021;18(4):4531–47. doi:10.1109/tnsm.2021.3096673. [Google Scholar] [CrossRef]
85. Liu Y, Xie S, Zhang Y. Cooperative offloading and resource management for UAV-enabled mobile edge computing in power IoT system. IEEE Trans Veh Technol. 2020;69(10):12229–39. doi:10.1109/tvt.2020.3016840. [Google Scholar] [CrossRef]
86. Cui G, Li X, Xu L, Wang W. Latency and energy optimization for MEC enhanced SAT-IoT networks. IEEE Access. 2020;8:55915–26. doi:10.1109/access.2020.2982356. [Google Scholar] [CrossRef]
87. Asheralieva A, Niyato D. Distributed dynamic resource management and pricing in the IoT systems with blockchain-as-a-service and UAV-enabled mobile edge computing. IEEE Internet Things J. 2020;7(3):1974–93. doi:10.1109/JIOT.2019.2961958. [Google Scholar] [CrossRef]
88. Wan S, Lu J, Fan P, Letaief KB. Toward big data processing in IoT: path planning and resource management of UAV base stations in mobile-edge computing system. IEEE Internet Things J. 2019;7(7):5995–6009. [Google Scholar]
89. Nguyen KK, Duong TQ, Do-Duy T, Claussen H, Hanzo L. 3D UAV trajectory and data collection optimisation via deep reinforcement learning. IEEE Trans Commun. 2022;70(4):2358–71. doi:10.1109/tcomm.2022.3148364. [Google Scholar] [CrossRef]
90. Ding Y, Feng Y, Lu W, Zheng S, Zhao N, Meng L, et al. Online edge learning offloading and resource management for UAV-assisted MEC secure communications. IEEE J Sel Top Signal Process. 2023;17(1):54–65. doi:10.1109/jstsp.2022.3222910. [Google Scholar] [CrossRef]
91. Hu J, Zhang H, Song L, Han Z, Poor HV. Reinforcement learning for a cellular internet of UAVs: protocol design, trajectory control, and resource management. IEEE Wirel Commun. 2020;27(1):116–23. doi:10.1109/mwc.001.1900262. [Google Scholar] [CrossRef]
92. Zhou C, Wu W, He H, Yang P, Lyu F, Cheng N, et al. Deep reinforcement learning for delay-oriented IoT task scheduling in SAGIN. IEEE Trans Wirel Commun. 2021;20(2):911–25. doi:10.1109/TWC.2020.3029143. [Google Scholar] [CrossRef]
93. Mao B, Tang F, Kawamoto Y, Kato N. Optimizing computation offloading in satellite-UAV-served 6G IoT: a deep learning approach. IEEE Netw. 2021;35(4):102–8. doi:10.1109/MNET.011.2100097. [Google Scholar] [CrossRef]
94. Yu Y, Tang J, Huang J, Zhang X, So DKC, Wong K-K. Multi-objective optimization for UAV-assisted wireless powered IoT networks based on extended DDPG algorithm. IEEE Trans Commun. 2021;69(9):6361–74. doi:10.1109/TCOMM.2021.3089476. [Google Scholar] [CrossRef]
95. Ferrag MA, Friha O, Maglaras L, Janicke H, Shu L. Federated deep learning for cyber security in the internet of things: concepts, applications, and experimental analysis. IEEE Access. 2021;9:138509–42. doi:10.1109/ACCESS.2021.3118642. [Google Scholar] [CrossRef]
96. Kumar R, Kumar P, Tripathi R, Gupta GP, Gadekallu TR, Srivastava G. SP2F: a secured privacy-preserving framework for smart agricultural unmanned aerial vehicles. Comput Netw. 2021;187(2):107819. doi:10.1016/j.comnet.2021.107819. [Google Scholar] [CrossRef]
97. Tian J, Wang B, Guo R, Wang Z, Cao K, Wang X. Adversarial attacks and defenses for deep-learning-based unmanned aerial vehicles. IEEE Internet Things J. 2022;9(22):22399–409. doi:10.1109/JIOT.2021.3111024. [Google Scholar] [CrossRef]
98. Heidari A, Jafari Navimipour N, Unal M. A secure intrusion detection platform using blockchain and radial basis function neural networks for internet of drones. IEEE Internet Things J. 2023;10(10):8445–54. doi:10.1109/JIOT.2023.3237661. [Google Scholar] [CrossRef]
99. Han C, Liu A, Wang H, Huo L, Liang X. Dynamic anti-jamming coalition for satellite-enabled army IoT: a distributed game approach. IEEE Internet Things J. 2020;7(11):10932–44. doi:10.1109/JIOT.2020.2991585. [Google Scholar] [CrossRef]
100. Lin X, Li Y, Bi S, Wang L. On throughput fairness for solar-powered IoT sensors in a UAV-assisted MEC system. IEEE Internet Things J. 2026;13(1):998–1017. doi:10.1109/JIOT.2025.3627589. [Google Scholar] [CrossRef]
101. Zhang C, Liu Y, Wang D, Yu C, Zhang H, Hu P, et al. Joint physical layer and power resource scheduling for GEO satellite internet of things based on channel conditions. IEEE Commun Lett. 2026;30:11–5. doi:10.1109/LCOMM.2025.3624374. [Google Scholar] [CrossRef]
102. Andreou AC, Mavromoustakis CX, Markakis EK, Bourdena A, Mastorakis G. UAV-assisted IoT network framework with hybrid deep reinforcement and federated learning. Sci Rep. 2025;15(1):37107. doi:10.1038/s41598-025-21014-5. [Google Scholar] [PubMed] [CrossRef]
103. Zhaxygulova D, Iavich M, Rakhmetullina S, Alipbayev K. Secure and energy-aware cryptographic framework for IoT-enabled UAV systems. Symmetry. 2025;17(11):1987. doi:10.3390/sym17111987. [Google Scholar] [CrossRef]
104. Woo T, Fu C, Wu Y. Machine learning assisted random access in LEO satellite-based internet of things. Wirel Pers Commun. 2025;145(1–2):177–209. doi:10.1007/s11277-025-11859-4. [Google Scholar] [CrossRef]
105. Giannetti G, Badii M, Lasagni G, Maddio S, Collodi G, Righini M, et al. Internet of things node with real-time LoRa GEO satellite connectivity for agrifood chain tracking in remote areas. Sensors. 2025;25(20):6469. doi:10.3390/s25206469. [Google Scholar] [PubMed] [CrossRef]
106. Chang T, Sheu J, Cuong NV. UAV trajectory planning for IoT data collection and offloading with energy constraints. IEEE Trans Green Commun Netw. 2025;10:573–84. doi:10.1109/TGCN.2025.3592263. [Google Scholar] [CrossRef]
107. Shen L, Nie J, Li M, Wang G, Zhang Q, He X. Trajectory optimization for UAV-aided IoT secure communication against multiple eavesdroppers. Fut Internet. 2025;17(5):225. doi:10.3390/fi17050225. [Google Scholar] [CrossRef]
108. Hou D, Yao Z, Jin B, Cai X, Huan X, Xu J, et al. Dynamic UAV inspection boosted by vehicle collaboration under harsh conditions in the IoT realm. Appl Sci. 2025;15(9):4671. doi:10.3390/app15094671. [Google Scholar] [CrossRef]
109. Hnaien H, Aboud A, Touati H, Snoussi H. Joint localization and data-based path planning for UAV-assisted IoT networks: a heuristic approach. SN Comput Sci. 2025;6(2):160. doi:10.1007/s42979-025-03721-y. [Google Scholar] [CrossRef]
110. Xu X, Wen H, Wang Y, Song H, Liu T, Chang S. Digital-twin-based satellite orbit prediction for internet of things systems. IEEE Internet Things J. 2025;12(6):6431–44. doi:10.1109/JIOT.2024.3424672. [Google Scholar] [CrossRef]
111. Huang Z, Chen H, Gu B, Gong S, Su Z, Guizani MM. A learning-based iterative algorithm for AoI-optimal trajectory planning in UAV-assisted IoT networks. IEEE Trans Wirel Commun. 2025;24(6):4598–613. doi:10.1109/TWC.2025.3543042. [Google Scholar] [CrossRef]
112. Xu J, Yao H, Zhang R, Mai T, Guizani MM. Low latency and accuracy-guaranteed DNN inference for drone-assisted IoT networks. IEEE Trans Cogn Commun Netw. 2025;11(6):4050–61. doi:10.1109/TCCN.2025.3542443. [Google Scholar] [CrossRef]
113. Al-Bakhrani AA, Li M, Obaidat MS, Amran GA. MOALF-UAV-MEC: adaptive multiobjective optimization for UAV-assisted mobile edge computing in dynamic IoT environments. IEEE Internet Things J. 2025;12(12):20736–56. doi:10.1109/JIOT.2025.3544624. [Google Scholar] [CrossRef]
114. Liu X, Wu J, Zhao C, Liu Z. Integrated sensing and communications for UAV assisted internet of things based on deep reinforcement learning. IEEE Trans Veh Technol. 2025;74(6):9604–16. doi:10.1109/TVT.2025.3539693. [Google Scholar] [CrossRef]
115. Qu L, Wang J, Assi C. Resource scheduling and delay optimization of IoT devices in drone-assisted multiaccess edge computing. IEEE Internet Things J. 2025;12(11):16998–7011. doi:10.1109/JIOT.2025.3535553. [Google Scholar] [CrossRef]
116. Pan J, Li Y, Chai R, Xia S, Zuo L. Multiobjective trajectory planning for UAV-assisted IoT networks based on DRL approach. IEEE Internet Things J. 2025;12(11):15840–52. doi:10.1109/JIOT.2025.3533584. [Google Scholar] [CrossRef]
117. Khan S, Durrani S, Thapa C, Camtepe S. Modified AKMA for decentralized authentication in LEO satellite-based IoT networks. IEEE Internet Things J. 2025;12(10):14720–32. doi:10.1109/JIOT.2025.3526635. [Google Scholar] [CrossRef]
118. Deng Z, Liao Y. Intelligent beam-hopping-based grant-free random access in secure IoT-oriented satellite networks. Sensors. 2025;25(1):199. doi:10.3390/s25010199. [Google Scholar] [PubMed] [CrossRef]
119. Chapnevis A, Bulut E. Time-efficient approximate trajectory planning for AoI-centered multi-UAV IoT networks. Internet Things. 2025;29(12):101461. doi:10.1016/j.iot.2024.101461. [Google Scholar] [CrossRef]
120. Qiao Y, Teng S, Luo J, Sun P, Li F, Tang F. On-orbit DNN distributed inference for remote sensing images in satellite internet of things. IEEE Internet Things J. 2025;12(5):5687–703. doi:10.1109/JIOT.2024.3488076. [Google Scholar] [CrossRef]
121. Thanh Le TTT, Hassan NU, Chen X, Alouini M-S, Han Z, Yuen C. A survey on random access protocols in direct-access LEO satellite-based IoT communication. IEEE Commun Surv Tut. 2025;27(1):426–62. doi:10.1109/COMST.2024.3385347. [Google Scholar] [CrossRef]
122. Zhong J, Jin X, Hu Y, Li Y, Gao R, Wang J, et al. UAV data collection for wide-area IoT with grant-free access. IEEE Trans Vehicular Technol. 2025. doi:10.1109/TVT.2025.3644897. [Google Scholar] [CrossRef]
123. Rajashekar A, Chouhan D. Efficient UAV clustering with stable backup nodes for mobility management in UAV based IoT network. Wirel Netw. 2026;32(1):331–48. doi:10.1007/s11276-025-04057-4. [Google Scholar] [CrossRef]
124. Gou H, Zhao S, Rao Y, Zhang G, Sha J, Lu Z, et al. Energy-efficient trajectory design and resource allocation for multi-UAV-enabled ISAC in IoT networks. IEEE Trans Consum Electron. 2025. doi:10.1109/TCE.2025.3631745. [Google Scholar] [CrossRef]
125. Abualhayja’A M, Centeno AE, Tran DH, Butt MM, Sehier P, Imran MA, et al. Efficient data harvesting in urban IoT networks: DRL for RIS-UAV communications. IEEE Trans Veh Technol. 2026;75(3):4248–60. doi:10.1109/TVT.2025.3608968. [Google Scholar] [CrossRef]
126. Shabani M, Faraji N, Baghani M. On SE and delay trade-off in UAV-enabled IoT networks underlying HetNets with optimal spectrum partitioning. IEEE Trans Veh Technol. 2026;75(2):2382–95. doi:10.1109/TVT.2025.3598288. [Google Scholar] [CrossRef]
127. Jiang Y, Zhai L, Wu T, Li B, Zou Y, Yan P. Energy-efficiency optimization for RIS-assisted UAV-enabled IoT networks. IEEE Internet Things J. 2025;12(20):42599–612. doi:10.1109/JIOT.2025.3594180. [Google Scholar] [CrossRef]
128. Pan H, Lin B, Liu Y, Liang S, Yuen C. Diffusion-model-enhanced multiobjective optimization for improving forest monitoring efficiency in UAV-enabled internet of things. IEEE Internet Things J. 2025;12(19):40980–96. doi:10.1109/JIOT.2025.3590507. [Google Scholar] [CrossRef]
129. Wang K, Zhao T, Yuan Y, Chu J, Chen Z, Dui H. Resilience evaluation and resource allocation in UAV-enabled IoT via multiswarm logistics support. IEEE Internet Things J. 2025;12(19):40793–808. doi:10.1109/JIOT.2025.3590565. [Google Scholar] [CrossRef]
130. Pokhrel SR, Aslam S, Aloqaily M. Harnessing autoencoder-based power allocation for direct-to-satellite IoT. IEEE Internet Things J. 2025;12(18):36834–41. doi:10.1109/JIOT.2025.3578093. [Google Scholar] [CrossRef]
131. Wei Y, He Y, Xiao Y, Leng S, Hu J, Yang K. UAV-enabled split learning with privacy preservation in internet of things. IEEE Internet Things J. 2025;12(18):38454–63. doi:10.1109/JIOT.2025.3586588. [Google Scholar] [CrossRef]
132. Tan W, Ding T, Liu L. Intelligent UAV deployment for energy-efficient IoT data collection. IEEE Internet Things J. 2025;12(17):34890–99. doi:10.1109/JIOT.2025.3586685. [Google Scholar] [CrossRef]
133. Zhao G, Wang J, Meng Z, Wang Z, Fu H, Jiang C. Energy-efficient path planning and task allocation for multi-drone-aided IoT cluster-based data collection. IEEE Trans Aerosp Electron Syst. 2025;61(5):14177–91. doi:10.1109/TAES.2025.3582920. [Google Scholar] [CrossRef]
134. Talukdar N, Raghav A, Hazra A, Barman DC, Mazumdar N. A deep deterministic policy gradient method for optimizing task completion time and energy efficiency in UAV-assisted IoT networks. IEEE Internet Things J. 2025;12(15):31907–17. doi:10.1109/JIOT.2025.3575714. [Google Scholar] [CrossRef]
135. Zhang X, Sun W, Zhang Z, Wang L, Gao A, Cheng N, et al. Reconfigurable-intelligence-surface-assisted opportunistic multiple access in UAV-IoT networks. IEEE Internet Things J. 2025;12(15):29626–41. doi:10.1109/JIOT.2025.3570263. [Google Scholar] [CrossRef]
136. Gong L, Chen Q, Yang L, Yin Z, Wang Y. Autonomous traffic prediction for LEO satellite-based IoT based on satellite spatiotemporal features mapping. IEEE Internet Things J. 2025;12(14):27021–32. doi:10.1109/JIOT.2025.3562631. [Google Scholar] [CrossRef]
137. Lam TC, Vo NS, Bui M, Thai CDT, Jung H, Phan CV. Service time-aware caching, power allocation, and 3D trajectory optimised multimedia content delivery in UAV-assisted IoT networks. IEEE Trans Veh Technol. 2025;74(4):6419–32. doi:10.1109/TVT.2024.3510621. [Google Scholar] [CrossRef]
138. Wang J, Guo S, Wang J, Bai L. Age-of-information-oriented security transmission scheme for UAV-aided IoT networks. IEEE Internet Things J. 2025;12(8):9570–82. doi:10.1109/JIOT.2025.3540488. [Google Scholar] [CrossRef]
139. Pandey GK, Gurjar DS, Yadav S, Krstić DS, Jiang Y. Secrecy analysis and optimization of UAV-assisted IoT networks with RF-EH and imperfect hardware. IEEE Internet Things J. 2025;12(7):8049–63. doi:10.1109/JIOT.2025.3540061. [Google Scholar] [CrossRef]
140. Li D, Wu S, Wang Y, Wu W, Zhang Q. Intelligent task scheduling in hybrid GEO-LEO satellite-assisted marine IoT network. IEEE Internet Things J. 2025;12(7):8353–67. doi:10.1109/JIOT.2024.3502791. [Google Scholar] [CrossRef]
141. Yang X, Liwang M, Fu L, Su Y, Hosseinalipour S, Wang X, et al. Adaptive UAV-assisted hierarchical federated learning: optimizing energy, latency, and resilience for dynamic smart IoT. IEEE Trans Serv Comput. 2025;18(6):3420–34. doi:10.1109/TSC.2025.3621606. [Google Scholar] [CrossRef]
142. Joshi N, Budhiraja I, Bansal A, Garg S, Choi BD, Hassan MM. Federated learning based energy efficient scheme for IoT devices: wireless power transfer using RIS-assisted underlaying solar powered UAVs. Alex Eng J. 2024;107(23):103–16. doi:10.1016/j.aej.2024.06.097. [Google Scholar] [CrossRef]
143. Uddin R, Kumar SA. SDN-based federated learning approach for satellite-IoT framework to enhance data security and privacy in space communication. IEEE J Radio Freq Identification. 2023;7:424–40. doi:10.1109/JRFID.2023.3279329. [Google Scholar] [CrossRef]
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

