Double Deep Q-Network Method for Energy Efficiency and Throughput in a UAV-Assisted Terrestrial Network
1 University Grenoble Alpes, CNRS, Grenoble INP, LIG, DRAKKAR Teams, 38000, Grenoble, France
2 Laboratoire d’informatique Médical, Université de Bejaia, Targa Ouzemour, Q22R+475, Algeria
3 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O.Box 84428, Riyadh, 11671, Saudi Arabia
4 Department of Research and Development, Centre for Space Research, School of Electronics and Electrical Engineering, Lovely Professional University, Phagwara, 144411, India
5 Department of Communication and Electronics, Delta Higher Institute of Engineering and Technology, Mansoura, Egypt
6 Electrical Engineering Department, College of Engineering, Taif University, P. O. BOX 11099, Taif, 21944, Saudi Arabia
* Corresponding Author: Reem Alkanhel. Email:
Computer Systems Science and Engineering 2023, 46(1), 73-92. https://doi.org/10.32604/csse.2023.034461
Received 18 July 2022; Accepted 22 September 2022; Issue published 20 January 2023
Abstract: Increasing the coverage and capacity of cellular networks by deploying additional base stations is one of the fundamental objectives of fifth-generation (5G) networks. However, it leads to performance degradation and heavy spectral consumption due to the massive densification of connected devices and simultaneous access demands. To meet these access conditions and improve Quality of Service, resource allocation (RA) should be carefully optimized. Traditionally, RA problems are nonconvex optimizations, which are solved using heuristic methods, such as genetic algorithms, particle swarm optimization, and simulated annealing. However, these approaches remain computationally expensive and unattractive for dense cellular networks. Therefore, artificial intelligence algorithms are used to improve traditional RA mechanisms. Deep learning is a promising tool for addressing resource management problems in wireless communication. In this study, we investigate a double deep Q-network-based RA framework that maximizes energy efficiency (EE) and total network throughput in unmanned aerial vehicle (UAV)-assisted terrestrial networks. Specifically, the system is studied under interference constraints, and the optimization problem is formulated as a mixed integer nonlinear program. Within this framework, we evaluate the effect of UAV height and the number of UAVs on EE and throughput. We then compare the proposed algorithm with several artificial intelligence methods on the basis of the experimental results. Simulation results indicate that the proposed approach can increase EE while sustaining considerable throughput.
In recent years, unmanned aerial vehicle (UAV)-assisted fifth-generation (5G) communication has provided an attractive way to connect users with different devices and improve network capacity. However, data traffic on cellular networks increases exponentially; thus, resource allocation (RA) is becoming increasingly critical. Industrial spectrum bands experience increased demand for channels, leading to spectrum scarcity. In the context of 5G, mmWave is considered a potential solution to meet this demand [2,3]. Moreover, other techniques, such as beamforming, multi-input multi-output (MIMO), and advanced power control, have been introduced as promising solutions in the design of future networks. Despite all these attempts to satisfy this demand, RA remains a priority to accommodate users in terms of Quality of Service (QoS). RA problems are often formulated as nonconvex problems requiring proper management [5,6]. Optimal solutions are obtained by implementing heuristic methods, such as genetic algorithms, particle swarm optimization, and simulated annealing [7,8]. However, such methods end up with quasioptimal solutions and converge relatively slowly. Therefore, alternative solutions and flexible algorithms that exploit recent developments in artificial intelligence are worth exploring. Recently, deep learning (DL) has emerged as an effective tool to increase flexibility and optimize RA in complex wireless communication networks. First, DL-based RA is flexible because the same deep neural network (DNN) can be implemented to achieve different design objectives by modifying the loss function. Second, the computation time required by DL to obtain RA results is lower than that of conventional algorithms. Finally, DL can receive complex high-dimensional information as input and allocate the optimal action for each input statistic in a particular condition. On the basis of the above analysis, DL can be chosen as an accurate method for RA.
As an emerging technology, DL has been used in several studies to improve RA for terrestrial networks. For instance, one study investigated a deep reinforcement learning (DRL)-based time division duplex configuration to allocate radio resources dynamically, online, and under high mobility. Lee et al. proposed deep power control based on a convolutional neural network to maximize spectral efficiency (SE) and energy efficiency (EE); their study compared the DL model with a conventional weighted minimum mean square error method. In the same context, another work performed max-min and max-prod power allocation in downlink massive MIMO. To maximize EE, a deep artificial neural network scheme was applied in a setting where interference and system propagation channels were considered. Deep Q-learning (DQL)-based RA has also attracted much attention in the recent literature. One study addressed the RA problem to enhance EE by formulating a combined optimization problem that considers both EE and QoS. More recently, a supervised DL approach in 5G multitier networks was adopted to solve the joint RA and remote-radio-head association; for this model, efficient subchannel and power allocation were used to generate training data. Regarding decentralized RA mechanisms, researchers developed a novel decentralized DL scheme for vehicle-to-vehicle communications, whose main objective was to determine the optimal sub-band and power level for transmission without requiring or waiting for global information. Other authors used DRL-based power control to investigate spectrum sharing in a cognitive radio system, in which the secondary user shares the common spectrum with the primary user. Instead of unsupervised learning, supervised learning has also been introduced to maximize the throughput of device-to-device communications under a maximum power constraint.
Other authors presented a comprehensive approach and considered DRL to maximize the total network throughput; however, that work did not include EE in the optimization. The majority of the learning algorithms introduced above do not incorporate constraints directly into the training cost functions. Recent literature has also focused on RA in UAV-assisted cellular networks based on artificial intelligence. One study proposed a multiagent reinforcement learning framework to study the dynamic RA of multiple UAVs, aiming to maximize long-term rewards, but it did not consider UAV height. Another used a deep Q-network to solve RA for UAV-assisted ultradense networks; to maximize system EE, a link selection strategy was proposed to allow users to select the optimal communication links, but the influence of SE on EE was not considered. In addition, the RA problem of UAV-assisted wireless-powered Internet of Things systems has been studied with the aim of allocating optimal energy resources for wireless power transfer. Another work thoroughly investigated the deep Q-network (DQN), invoking a difference-of-convex optimization method for multicooperative UAV-assisted wireless networks; it assumed a beamforming technique to serve users simultaneously in the same spectrum and maximize the sum user achievable rate, but it did not focus on EE. Yet another DQN-based work addressed the low utilization rate of resources by introducing a novel DQN-based method for this complex problem. Finally, RA for bandwidth, throughput, and power consumption has been analyzed in different scenarios for multi-UAV-assisted IoT networks, where the authors considered DRL to address the joint RA problem. Although that approach remained efficient, it did not consider the ground network, EE, or total throughput.
In the present study, we aim to optimize EE and total throughput in UAV-assisted terrestrial networks subject to constraints on transmission power and UAV height. Our main effort is to apply a double deep Q-network (DDQN), which obtains better rewards than DQN.
Existing research on RA in UAV-assisted 5G networks focuses on single-objective optimization and considers the DQN algorithm to generate data. Following the previous analysis, we investigate the RA problem in UAV-assisted cellular networks to maximize EE and total network throughput. Specifically, a DDQN is proposed to address intelligent RA. The main contributions of this study are listed below.
(1) We formulate EE and total throughput in an mmWave scenario while ensuring the minimum QoS requirements for all users according to the environment. The resulting optimization problem is a mixed integer nonlinear program. Multiple constraints, such as the path loss model, number of users, channel gains, beamforming, and signal-to-interference-plus-noise ratio (SINR) requirements, are used to describe the environment.
(2) We investigate a multiagent DDQN algorithm to optimize EE and total throughput. We assume that each user equipment (UE) behaves as an agent and makes optimization decisions based on environmental information.
(3) We compare the performance of the proposed algorithm with the QL and DQN approaches previously proposed for RA.
The remainder of this paper is organized as follows: An overview of DRL is presented in Section 2, and the system model is introduced in Section 3. Then, the DDQN algorithm is discussed in Section 4, followed by simulation and results in Section 5. Lastly, conclusions and perspectives are drawn in Section 6.
DRL is a prominent branch of machine learning and thus a class of artificial intelligence. It allows agents to identify the ideal behavior based on their own experience rather than depending on a supervisor. In this approach, a neural network is used as an agent that learns by interacting with the environment and solves the process by determining an optimal action. Compared with standard ML, namely supervised and unsupervised learning, DRL does not depend on data acquisition. Thus, sequential decision making occurs, and the next input is based on the decision of the learner or system. Moreover, in DRL, the Markov decision process (MDP) is formalized as a mathematical approach to modeling decision-making situations. The reinforcement learning process operates as follows: the agent begins in a specific state within its environment by obtaining an initial observation and takes an action at each time step. As illustrated in Fig. 1, DRL can be categorized into three classes of algorithms: value-based, policy gradient, and model-based methods. In value-based DRL, the agent uses the learned value function to evaluate state-action pairs and generate a policy; DQL is the most popular and efficient algorithm in this category. By contrast, a policy-based algorithm is intuitive, as it learns a policy directly. Learning a policy to act in an environment is sensible; thus, a policy function takes a state s as input and generates an action a.
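The observe-act-reward cycle described above can be sketched as a minimal loop. The two-state environment, its rewards, and the trivial policy below are illustrative placeholders, not the paper's network model:

```python
class ToyEnv:
    """Illustrative two-state MDP: action 1 taken in state 0 yields reward 1."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1.0 if (self.state == 0 and action == 1) else 0.0
        self.state = 1 - self.state          # deterministic transition
        return self.state, reward

def run_episode(env, policy, steps=4):
    """Agent observes a state, takes an action, and receives a reward."""
    total = 0.0
    state = env.state
    for _ in range(steps):
        action = policy(state)               # decision based on the observation
        state, reward = env.step(action)     # environment returns the next state
        total += reward
    return total

always_one = lambda s: 1                     # trivial placeholder policy
ret = run_episode(ToyEnv(), always_one)      # reward collected on even steps only
```

The same loop structure underlies the DDQN agents later in the paper; only the environment, policy, and reward change.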
As a popular branch of machine learning, Q-learning is based on the concept of the action value function for a policy. It uses the Bellman equation to learn and calculate the optimum values of the Q-function in an iterative way, which is expressed as

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_t + γ max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t)],
where α is the step-size parameter that defines the extent to which the new data contribute to the existing Q value, γ is the MDP discount factor, r_t is the numerical reward for the agent after the execution of the action, and s_{t+1} indicates that the environment changes to a new state, with transition probability P(s_{t+1} | s_t, a_t), as illustrated in Fig. 2. However, the Q-learning algorithm can be applied only to RA problems with low dimensionality in state and action, which is a fundamental scalability limitation. Moreover, it is only applicable when the state and action spaces are discrete (e.g., channel access).
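The tabular Bellman update above can be sketched directly; the states, actions, and reward below are illustrative placeholders:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Bellman update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)   # max over next actions
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)                     # all Q values start at zero
q_new = q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
```

Starting from zero, a reward of 1.0 moves Q(0, 1) by alpha toward the target, i.e., to 0.1 here; repeated visits converge toward the fixed point of the Bellman equation.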
As stated above, the Q-learning algorithm faces difficulties in obtaining the optimal policy when the action and state spaces become exceptionally large. This constraint is often encountered in the RA approaches of cellular networks. To solve this problem, the DQN algorithm, which combines the traditional Q-learning algorithm with a convolutional neural network, was proposed. The main difference from the Q-learning algorithm is the replacement of the table with a function approximator, the DNN, which attempts to approximate the Q values. Approximators are of two types: linear and nonlinear. With a nonlinear DNN, the new Q function is defined as Q(s, a; θ), where θ represents the weights of the neural network. At each time t, action a_t is taken in accordance with the ε-greedy policy, and the transition tuple (s_t, a_t, r_t, s_{t+1}) is stored in a replay memory denoted by D. During the training process, a minibatch is sampled randomly from experience D to optimize the mean squared error. In addition, a target Q-network is used to improve the stability of DQN; its parameters are regularly adjusted to follow those of the principal Q-network. On the basis of the Bellman equation, the optimal state-action function is given by
To train the DQN, the weights θ are updated iteratively to minimize the mean squared error of the Bellman equation. Mathematically, the loss function at each iteration is given by

L(θ) = E[(y − Q(s, a; θ))²], with target y = r + γ max_{a'} Q(s', a'; θ⁻),

where θ⁻ denotes the parameters of the target network.
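This minibatch loss can be sketched with a linear Q approximator standing in for the DNN; the weight matrices and the single-transition batch are illustrative placeholders:

```python
def dqn_loss(theta, theta_target, batch, gamma=0.9):
    """Mean squared Bellman error over a minibatch of (s, a, r, s_next).
    Q is approximated linearly here: Q(s, a) = theta[a] . s (dot product)."""
    def q(w, s):                             # Q values for every action
        return [sum(wi * si for wi, si in zip(row, s)) for row in w]
    errs = []
    for s, a, r, s_next in batch:
        y = r + gamma * max(q(theta_target, s_next))   # frozen target network
        errs.append((y - q(theta, s)[a]) ** 2)         # squared Bellman error
    return sum(errs) / len(errs)

theta = [[0.0, 0.0], [0.0, 0.0]]             # online net: 2 actions x 2 features
theta_target = [[0.0, 0.0], [0.0, 0.0]]      # target net, updated only periodically
batch = [([1.0, 1.0], 0, 1.0, [1.0, 1.0])]   # one sampled transition
loss = dqn_loss(theta, theta_target, batch)
```

With all-zero weights, the target is simply the reward, so the loss is (1 − 0)² = 1; gradient descent on this quantity is what drives the weights toward the Bellman fixed point.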
In the proposed model, we consider the downlink of a UAV-assisted cellular network comprising a set of small base stations (SBSs) and a set of UAVs. The UAVs are placed at a particular altitude, which is assumed to be constant for all UAVs. Each cell operates in an mm-wave band and serves N users distributed randomly in a dense area. We assume that a particular user is assigned to the single base station that provides the strongest signal. In this work, SBSs and UEs are assumed to be equipped with omnidirectional antennas, i.e., antennas with unit gain, and every UAV is equipped with a directional antenna. Moreover, each UE associated with a UAV is assigned orthogonal resource blocks (RBs; an RB consists of 12 subcarriers, with a total bandwidth of 180 kHz in the frequency domain and one 0.5 ms time slot in the time domain), whereas UEs associated with SBSs share the remaining RBs. The transmission powers allocated by the UAVs and SBSs are denoted separately (see Table 1). Furthermore, the link between a BS (a UAV or an SBS) and a user can be in one of two conditions, i.e., line-of-sight (LoS) or non-line-of-sight (NLoS). As illustrated in Fig. 3, interference powers from adjacent base stations are considered. Table 1 summarizes the notations used in this article.
The channel between a base station and a UE can be fixed or time varying. Fading is the fluctuation in received signal strength with respect to time, and it occurs due to several factors, including transmitter and receiver movement, the propagation environment, and atmospheric conditions. Following the literature, we model the channel so that it captures both small-scale and large-scale fading. At each time slot, the small-scale fading between UAVs, SBSs, and UEs is considered frequency selective, since the delay spread is greater than the symbol period. By contrast, the channel in each subcarrier is assumed to be flat fading, which means that the channel gain within a subcarrier can remain unchanged. All UEs periodically transmit their channel quality information to the associated BS. In addition, a channel gain is defined from each BS to user k on each subcarrier. A binary variable is introduced to define the association mode: it equals 1 if the UE is associated with the UAV/SBS through a LoS link and 0 otherwise. We apply the following assumption in the formulation. The mm-wave signal is affected by various factors, such as buildings in urban areas, which makes the link susceptible to blockage. Thus, the downlink achievable throughput (data rate) of user k on a subcarrier can be given by the following equation:
where b and c are constants that depend on the network environment, and the distance term is the Euclidean distance between the typical UE and the UAV (see Fig. 4).
where the blockage parameter defines the average size of obstacles, and d corresponds to the distance between the SBS and the UE.
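The two LoS models can be sketched as follows, assuming the widely used sigmoid-in-elevation-angle form for the air-to-ground link and an exponential distance decay for the terrestrial link; the numeric values of b, c, and the blockage parameter here are placeholders, not the paper's:

```python
import math

def p_los_uav(height_m, ground_dist_m, b=0.36, c=0.21):
    """Air-to-ground LoS probability as a sigmoid of the elevation angle;
    b and c are environment-dependent constants (placeholder values)."""
    theta = math.degrees(math.atan2(height_m, ground_dist_m))   # elevation angle
    return 1.0 / (1.0 + b * math.exp(-c * (theta - b)))

def p_los_sbs(dist_m, blockage=0.01):
    """Terrestrial LoS probability decaying with distance d; the blockage
    parameter reflects the average obstacle density (placeholder value)."""
    return math.exp(-blockage * dist_m)
```

Both functions behave as the text describes: raising the UAV increases the elevation angle and hence the LoS probability, while denser blockage or larger SBS-UE distance reduces it.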
Adding extra gain to the system remains necessary because of the propagation losses that occur at mm-wave frequencies. One of the main solutions proposed in several studies for future wireless networks is beamforming. The fundamental principle of beamforming is to steer the direction of a wavefront toward the UE. Accordingly, the UAVs and SBSs serve UEs through beamforming technology. In this manner, the SINR of a UE served by a UAV at time slot t can be written as
where the gain term represents the directional beamforming gain for the desired link, the noise term is additive white Gaussian noise, and the two interference terms represent the interference from the adjacent SBSs and UAVs, respectively, expressed as
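The structure of this SINR, desired beamformed power over cross-tier interference plus noise, can be sketched as below; all quantities are in linear units and the sample values are illustrative:

```python
def received_power(p_tx, beam_gain, channel_gain):
    """Desired-link power through a directional beamforming link."""
    return p_tx * beam_gain * channel_gain

def sinr(desired, interference_sbs, interference_uav, noise):
    """SINR: desired power over aggregate interference from adjacent SBSs
    and UAVs plus additive white Gaussian noise."""
    return desired / (interference_sbs + interference_uav + noise)

# 1 W transmit power, beamforming gain 10, channel gain 0.5,
# 1 W of interference from each tier, 0.5 W noise power
value = sinr(received_power(1.0, 10.0, 0.5), 1.0, 1.0, 0.5)
```

Doubling the beamforming gain doubles the SINR for fixed interference, which is why beamforming helps against mm-wave blockage in the results section.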
Without loss of generality, different propagation properties apply. For air-to-ground communication, the path loss of the LoS and NLoS links at time slot t depends on the additional path losses in the LoS and NLoS conditions and on the corresponding path loss exponents, as
Similarly, we define the SINR when the UE is associated with an SBS. In this case, we adopt the standard power-law path loss model with separate mean path losses for LoS and NLoS, respectively. Hence, the path loss model can be given as
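The two path loss models can be sketched as follows; the additional-loss values and exponents are illustrative placeholders (the paper's values are listed in its Table 2), assuming the common free-space-plus-excess-loss form for the air-to-ground link:

```python
import math

def pathloss_a2g_db(dist_km, freq_ghz, los, eta_los_db=1.0, eta_nlos_db=20.0):
    """Air-to-ground path loss in dB: free-space loss plus an additional
    LoS/NLoS excess loss (eta values are placeholders)."""
    fspl = 20 * math.log10(dist_km) + 20 * math.log10(freq_ghz) + 92.45
    return fspl + (eta_los_db if los else eta_nlos_db)

def pathloss_ground(dist_m, los, alpha_los=2.0, alpha_nlos=3.5):
    """Standard power-law model for SBS links with separate LoS/NLoS
    exponents (placeholder exponents); returns a linear attenuation."""
    return dist_m ** -(alpha_los if los else alpha_nlos)
```

As expected, the NLoS branch is always the lossier one at the same distance, which is what makes the LoS probabilities above decisive for the achievable rate.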
The SINR at the typical UE when it is connected to an SBS is given by (13), where the two interference terms are the interferences from the UAVs and SBSs, respectively.
SE and EE are key metrics for evaluating any wireless communication system. SE measures how efficiently a given channel bandwidth is used; in other words, it represents the transmission rate per unit of bandwidth and is measured in bits per second per hertz. The EE metric evaluates the total energy consumption of a network and is defined as the ratio of the total transferred bits to the total power consumption. EE and SE thus have a fundamental relationship. Let P_c be the power consumed in the circuit of the transmitter; then, EE can be given by
where the transmit power ranges between its minimum and maximum values, and the achievable rate of the transmitter can be computed as
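The rate-over-power definition of EE can be sketched directly, assuming the Shannon rate for the achievable throughput; the bandwidth and power values below are illustrative:

```python
import math

def achievable_rate(bandwidth_hz, sinr):
    """Shannon rate R = B * log2(1 + SINR), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + sinr)

def energy_efficiency(rate_bps, p_tx_w, p_circuit_w):
    """EE as delivered bits per joule: rate over transmit-plus-circuit power."""
    return rate_bps / (p_tx_w + p_circuit_w)

rate = achievable_rate(180e3, 3.0)        # one 180 kHz resource block, SINR = 3
ee = energy_efficiency(rate, 0.5, 0.5)    # 0.5 W transmit + 0.5 W circuit power
```

The denominator explains the trend seen later in Fig. 9: raising the transmit power grows the rate only logarithmically while the consumed power grows linearly, so EE eventually falls.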
Proper EE performance is of paramount importance in UAV-assisted terrestrial networks because it is directly related to the choice of objectives and constraints in the relevant optimization problems. In this work, we aim to optimize two specific objectives for RA, namely, the maximization of EE and of throughput. From the SE perspective, the EE maximization problem can be formulated as
The first constraint requires the transmit powers of the UAVs and SBSs to lie within the allowed interval; it specifies the upper limit of power transmission. The next constraint indicates that the UAV should be positioned between a minimum and a maximum height. At greater heights, the distance between the UAV and UE increases, resulting in considerable path loss; by contrast, when the UAV is located near the minimum height, NLoS conditions occur and may affect EE. Hence, this constraint must be studied. A further constraint guarantees that the EE of a UAV must be greater than that of an SBS. Another defines the maximum downlink achievable data rate, whereas the following one accounts for the data rate requirement. Two more constraints specify that the SINR of the UE should be higher than a certain threshold, where the SINR threshold differs per tier (UAV, SBS). Lastly, the final constraint ensures that the UE is connected to a single BS. Our second objective is to maximize the total network throughput, defined as the sum of the data rates delivered to all UEs. Mathematically, the maximization problem can be formulated as
The first constraint indicates the minimum required data rate for QoS, and the second means that the LoS probability of the SBS must be less than that of the UAV.
In this section, we present a DRL-based EE and throughput RA framework to address the network problems in (17) and (18). The task of the DRL agent is to learn an optimal policy from state to action, thus maximizing the utility function. We formulate the optimization problem as a fully observable Markov decision process described, as in the literature, by a tuple of states, actions, transition probabilities, and rewards. Based on the transition probability, the current network state transitions to a new state according to the action selected by the agent at each time slot. A DDQN is applied to achieve an optimal solution, and we assume that each UAV and SBS acts as an agent that continuously interacts with the environment to optimize the policy. First, the agent observes the state and decides to take an action in accordance with the optimal policy. Then, at each time step, the agent receives a reward conditioned on the action and moves to the next state. This procedure describes the DQN algorithm with a single agent. The major shortcoming of this algorithm lies in the entanglement of action selection and action evaluation, leading to overestimation of action values and unstable training. To solve this overestimation, van Hasselt et al. proposed the DDQN architecture, in which the max estimator is decomposed into action selection and action evaluation, as illustrated in Fig. 5. The fundamental concept of the algorithm is to change the target to

y_t = r_t + γ Q(s_{t+1}, argmax_{a'} Q(s_{t+1}, a'; θ_t); θ_t⁻).
At each time t, the weight parameters of the online network are used to select the greedy action, whereas the weight parameters of the target network estimate its value. For improved performance evaluation, the target network of the DDQN can use parameters from a previous iteration; therefore, the target network settings are periodically updated with copies of the online network.
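The decoupling of selection and evaluation can be sketched side by side with the plain DQN target; the Q-value vectors below are contrived placeholders chosen to make the target net's overestimation visible:

```python
def dqn_target(r, q_next_target, gamma=0.9):
    """Standard DQN target: the target network both selects and evaluates
    the next action, which can overestimate its value."""
    return r + gamma * max(q_next_target)

def ddqn_target(r, q_next_online, q_next_target, gamma=0.9):
    """DDQN target: the online network selects the action (argmax), and the
    target network evaluates it, decoupling selection from evaluation."""
    a_star = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return r + gamma * q_next_target[a_star]

q_online = [1.0, 0.2]      # online net prefers action 0
q_target = [0.5, 2.0]      # target net overestimates action 1
y_dqn = dqn_target(0.0, q_target)               # follows the inflated estimate
y_ddqn = ddqn_target(0.0, q_online, q_target)   # evaluates the online net's pick
```

When the target network overestimates an action the online network would not choose, DQN's target inherits that inflation while DDQN's does not, which is the stability gain exploited here.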
The state describes a specific configuration of the environment. At each time slot, the UAVs and SBSs act as agents and define the observation space. The observation of each agent includes the SINR measurements from the UAV and SBS to the UE, the UAV height, and the spectral efficiency. We define the global state as
where the global state aggregates the set of per-agent observations and can be expressed as
In our problem, each agent must choose an appropriate base station (i.e., UAV or SBS), a transmission power, a UAV height, and a LoS/NLoS link probability. At each time step, the action of the UAV/SBS can be expressed as
where the components are the selected BS; the power transmission requirement, which indicates how much power should be assigned to the UE; and the UAV elevation.
Reinforcement learning relies on the reward function, which guides the agent (UAV or SBS) toward an optimal policy. As mentioned above, we model this problem as a fully observable MDP to maximize EE and throughput. Therefore, the reward of agent j can be computed as
where the two terms correspond to the EE and throughput objectives, and the associated weights balance the two objectives. The pseudocode for DDQN is outlined in Algorithm 1.
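A weighted-sum reward of this shape can be sketched as below; the equal weights and the normalizing maxima are illustrative placeholders, not the paper's settings:

```python
def reward(ee, ee_max, throughput, tp_max, w1=0.5, w2=0.5):
    """Weighted sum of the two normalised objectives; the weights (assumed
    to sum to one) and the normalisers are illustrative placeholders."""
    return w1 * (ee / ee_max) + w2 * (throughput / tp_max)

# an agent halfway to both objective maxima earns a mid-scale reward
r = reward(ee=5.0, ee_max=10.0, throughput=50.0, tp_max=100.0)
```

Normalizing each objective by its maximum keeps the two terms on comparable scales, so the weights, rather than the objectives' raw units, control the EE/throughput trade-off.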
This section discusses the simulation and results for EE and throughput in the downlink UAV-assisted terrestrial network comprising eight SBSs with a radius of 500 m and five UAVs deployed randomly in the area. Each cell contains randomly distributed users and uses mm-wave bands. We assume that the maximum transmission power of the SBSs is fixed, and different values of the maximum UAV power are examined in the simulation. The path loss exponents of the LoS and NLoS links for the UAVs and SBSs, the power consumed in the transmitter circuit, and the additive white Gaussian noise power are set as in Table 2. In the DDQN algorithm, the DNN of each agent is a four-layer fully connected neural network with two hidden layers of 64 and 32 neurons. Other simulation and DDQN parameters are also listed in Table 2. The simulation is realized using MATLAB (R2017a) running on a Dell PC (2.8 GHz Intel Core i7-7600U, 16 GB). In our simulation, fixed values of the two objective weights are used.
In this subsection, we present the EE results obtained using DDQN. For performance validation, we compare our proposed algorithm with the DQN and QL architectures. Moreover, the effects of UE demand, the number of UAVs, and beamforming at maximum power are discussed. In the simulation evaluation, the parameter values in Table 2 are used unless otherwise specified. First, we evaluate the effect of UE demand on EE for the different algorithms in Fig. 6. A common observation in Fig. 6 is that increasing UE demand can lead to increased EE; however, beyond 60 Mbps, EE converges more slowly.
This result is obtained because when UE demand increases considerably, all algorithms (DDQN, DQN, and QL) aim to maximize network throughput, which requires high transmission power, thereby reducing EE. Another observation from Fig. 6 is that the DDQN algorithm outperforms DQN and QL. This outcome is achieved because the agent selects a more appropriate Q value to estimate the action, mainly due to the two separate estimators applied in DDQN. In other words, using a second estimator is effective for obtaining unbiased Q values. One solution to the EE problem when UE demand increases is to add base stations. The increase in the number of UAVs has a remarkable effect on EE, as illustrated in Fig. 7: as the number of UAVs increases, EE improves because the number of users covered by a UAV LoS link increases.
Moreover, Fig. 7 demonstrates that the DDQN algorithm outperforms DQN and QL by 13.3% in EE because traditional RL algorithms use a single actor network to train multiple agents; thus, conflicts between agents arise. Next, EE is plotted as a function of the number of UEs for different UAV heights, as shown in Fig. 8. An increase in the number of UEs results in EE degradation because of the increase in energy consumption. Fig. 8 also shows that UAV height affects EE: EE increases with height because a higher UAV places additional UEs in the LoS link condition, leading to an increase in the total number of bits transmitted.
As the number of UEs increases, the power assigned to each UE declines; therefore, the increase in height compensates for this shortcoming. Fig. 9 shows EE vs. the maximum power of the UAV with and without beamforming. A common observation in Fig. 9 is that EE decreases as the maximum transmission power of the UAV is extended, due to the increased energy consumption by users. In addition, when the power of the UAVs increases, the links between UAVs and UEs fall into NLoS condition, thus reducing EE. This analysis is conducted with and without beamforming. As illustrated in Fig. 9, applying beamforming improves EE for each algorithm (DDQN and DQN) because beamforming provides additional gains and can overcome mm-wave blockage constraints.
To validate the accuracy of our approach, we analyze the total throughput (second objective) according to the number of UAVs deployed, the UAV height, and beamforming. Considering the first scenario, Fig. 10 depicts the total throughput as a function of the number of UAVs. As the number of UAVs increases, the total throughput is enhanced, and DDQN outperforms DQN and QL. This effect is mainly due to the increase in LoS links. The same figure also shows that the total throughput reaches a saturation level beyond a particular number of UAVs, due to the rise in interference between UAVs.
Fig. 11 illustrates the variation of throughput vs. UAV height for the different AI algorithms. According to Fig. 11, throughput increases with altitude because at low altitude, the propagation condition is NLoS and interference between tiers is observed. By contrast, when the UAV height increases, the LoS condition occurs, resulting in reduced loss. Moreover, saturation is experienced from an altitude of 130 m because as UAV height increases further, the distance between the UAV and UE increases, leading to signal attenuation.
Fig. 12 shows the variation of throughput vs. the maximum UAV power. As expected, the total throughput increases with the maximum power. Fig. 12 also reveals that DDQN achieves a maximum throughput of 582.7 Mbps at the highest power setting, whereas DQN achieves a maximum throughput of 269.234 Mbps at the same power. Again, the proposed DDQN algorithm outperforms DQN. Finally, we plot the throughput as a function of the blockage parameter of the SBSs for a fixed UAV altitude, as shown in Fig. 13. When the blockage parameter increases, the total throughput of the network decreases; with increasing obstacle density, more UEs are served under NLoS conditions. In addition, Fig. 13 shows that the proposed DDQN scheme converges to highly satisfactory solutions compared with the other approaches because it handles interference well.
In this study, we proposed a DDQN scheme for RA optimization in UAV-assisted terrestrial networks. The problem was formulated as EE and throughput maximization. We first provided a general overview of deep reinforcement learning architectures and then presented the network architecture, in which the base stations use beamforming during transmission. The proposed EE and throughput objectives were assessed with respect to the number of UAVs, beamforming, the maximum UAV transmission power, and the blockage parameter. The accuracy of the obtained EE and throughput was demonstrated by comparison with the deep Q-network and Q-learning. Our results indicate that EE is affected by the number of UAVs deployed in the coverage area, as well as by the altitude constraint. Moreover, the use of beamforming is cost effective in improving EE. Our investigation also revealed other useful conclusions: for the throughput analysis, the blockage parameter has a dominant influence, and an optimal value can be selected. In terms of convergence, our DDQN consistently outperforms DQN and QL. In future work, other issues can be explored. For instance, UAV mobility can be considered, and an optimal mobility model can be selected to maximize throughput. Interference coordination may also be introduced between tiers.
Acknowledgement: The authors would like to acknowledge the financial support received from Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R323), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, and Taif University Researchers Supporting Project Number (TURSP-2020/34), Taif, Saudi Arabia.
Funding Statement: This work was supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R323), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, and Taif University Researchers Supporting Project Number (TURSP-2020/34), Taif, Saudi Arabia.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
- Z. Sylia, G. Cédric, O. M. Amine and K. Abdelkrim, “Resource allocation in a multi-carrier cell using scheduler algorithms,” in 4th Int. Conf. on Optimization and Applications (ICOA), Mohammedia, Morocco, pp. 1–5, 2018.
- T. O. Olwal, K. Djouani and A. M. Kurien, “A survey of resource management toward 5G radio access networks,” IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp. 1656–1686, 2016.
- T. S. Rappaport, “Millimeter wave mobile communications for 5G cellular: It will work!,” IEEE Access, vol. 1, pp. 335–349, 2013.
- M. A. Ouamri, M. E. Oteşteanu, A. Isar and M. Azni, “Coverage, handoff and cost optimization for 5G heterogeneous network,” Physical Communication, vol. 39, pp. 1–8, 2020.
- C. Sun, C. She, C. Yang, T. Q. S. Quek, Y. Li et al., “Optimizing resource allocation in the short blocklength regime for ultra-reliable and low-latency communications,” IEEE Transactions on Wireless Communications, vol. 18, no. 1, pp. 402–415, Jan. 2019.
- Y. Hu, M. Ozmen, M. C. Gursoy and A. Schmeink, “Optimal power allocation for QoS-constrained downlink multi-user networks in the finite blocklength regime,” IEEE Transactions on Wireless Communications, vol. 17, no. 9, pp. 5827–5840, Sept. 2018.
- L. Zhu, J. Zhang, Z. Xiao, X. Cao, D. O. Wu et al., “Joint Tx-Rx beamforming and power allocation for 5G millimeter-wave non-orthogonal multiple access networks,” IEEE Transactions on Communications, vol. 67, no. 7, pp. 5114–5125, July 2019.
- S. O. Oladejo and O. E. Falowo, “Latency-aware dynamic resource allocation scheme for multi-tier 5G network: A network slicing-multitenancy scenario,” IEEE Access, vol. 8, pp. 74834–74852, 2020.
- L. Lei, Y. Yuan, T. X. Vu, S. Chatzinotas and B. Ottersten, “Learning-based resource allocation: Efficient content delivery enabled by convolutional neural network,” in IEEE 20th Int. Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Cannes, France, pp. 1–5, 201
- K. I. Ahmed, H. Tabassum and E. Hossain, “Deep learning for radio resource allocation in multi-cell networks,” IEEE Network, vol. 33, no. 6, pp. 188–195, Dec. 2019.
- L. Liang, H. Ye, G. Yu and G. Y. Li, “Deep-learning-based wireless resource allocation with application to vehicular networks,” Proceedings of the IEEE, vol. 108, no. 2, pp. 341–356, Feb. 2020.
- F. Tang, Y. Zhou and N. Kato, “Deep reinforcement learning for dynamic uplink/downlink resource allocation in high mobility 5G HetNet,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 12, pp. 2773–2782, 2020.
- L. Sanguinetti, A. Zappone and M. Debbah, “Deep learning power allocation in massive MIMO,” in 2018 52nd Asilomar Conf. on Signals, Systems, and Computers, USA, pp. 1257–1261, 2018.
- A. Zappone, M. Debbah and Z. Altman, “Online energy-efficient power control in wireless networks by deep neural networks,” in IEEE 19th Int. Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Greece, pp. 1–5, 2018.
- H. Li, H. Gao, T. Lv and Y. Lu, “Deep Q-learning based dynamic resource allocation for self-powered ultra-dense networks, in IEEE Int. Conf. on Communications Workshops (ICC Workshops), USA, pp. 1–6, 2018.
- S. Ali, A. Haider, M. Rahman, M. Sohail and Y. B. Zikria, “Deep learning (DL) based joint resource allocation and RRH association in 5G-multi-tier networks,” IEEE Access, vol. 9, pp. 118357–118366, 2021.
- H. Ye, G. Y. Li and B. F. Juang, “Deep reinforcement learning based resource allocation for V2V communications,” IEEE Transactions on Vehicular Technology, vol. 68, no. 4, pp. 3163–3173, April 2019.
- D. Ron and J. -R. Lee, “DRL-based sum-rate maximization in D2D communication underlaid uplink cellular networks,” IEEE Transactions on Vehicular Technology, vol. 70, no. 10, pp. 11121–11126, Oct. 2021.
- R. Amiri, M. A. Almasi, J. G. Andrews and H. Mehrpouyan, “Reinforcement learning for self organization and power control of two-tier heterogeneous networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 8, pp. 3933–3947, Aug. 20
- J. Cui, Y. Liu and A. Nallanathan, “Multi-agent reinforcement learning-based resource allocation for UAV networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 2, pp. 729–743, Feb. 20
- X. Chen, X. Liu, Y. Chen, L. Jiao and G. Min, “Deep Q-Network based resource allocation for UAV-assisted Ultra-Dense Networks,” Computer Networks, vol. 196, pp. 1–10, 20
- B. Liu, H. Xu and X. Zhou, “Resource allocation in unmanned aerial vehicle (UAV)-assisted wireless-powered internet of things,” Sensors, vol. 19, no. 8, pp. 1908–1928, 2019.
- P. Luong, F. Gagnon, L. -N. Tran and F. Labeau, “Deep reinforcement learning-based resource allocation in cooperative UAV-assisted wireless networks,” IEEE Transactions on Wireless Communications, vol. 20, no. 11, pp. 7610–7625, Nov. 2021.
- W. Min, P. Chen, Z. Cao and Y. Chen, “Reinforcement learning-based UAVs resource allocation for integrated sensing and communication (ISAC) system,” Electronics, vol. 11, no. 3, pp. 441, 2022.
- Y. Y. Munaye, R. -T. Juang, H. -P. Lin and G. B. Tarekegn, “Resource allocation for multi-UAV assisted IoT networks: A deep reinforcement learning approach,” in Int. Conf. on Pervasive Artificial Intelligence (ICPAI), Taiwan, pp. 15–22, 2020.
- H. van Hasselt, A. Guez and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proc. AAAI Conf. Artif. Intell., Phoenix, Arizona, USA, pp. 2094–2100, Sep. 2016.
- H. A. Shah, L. Zhao and I. -M. Kim, “Joint network control and resource allocation for space-terrestrial integrated network through hierarchal deep actor-critic reinforcement learning,” IEEE Transactions on Vehicular Technology, vol. 70, no. 5, pp. 4943–4954, May 2021.
- M. M. Sande, M. C. Hlophe and B. T. Maharaj, “Access and radio resource management for IAB networks using deep reinforcement learning,” IEEE Access, vol. 9, pp. 114218–114234, 2021.
- M. Agiwal, A. Roy and N. Saxena, “Next generation 5G wireless networks: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp. 1617–1655, 2016.
- L. Liang, H. Ye, G. Yu and G. Y. Li, “Deep-learning-based wireless resource allocation with application to vehicular networks,” in Proc. of the IEEE, vol. 108, no. 2, pp. 341–356, Feb. 2020.
- A. Iqbal, M. -L. Tham and Y. C. Chang, “Double deep Q-network-based energy-efficient resource allocation in cloud radio access network,” IEEE Access, vol. 9, pp. 20440–20449, 2021.
- Y. Zhang, X. Wang and Y. Xu, “Energy-efficient resource allocation in uplink NOMA systems with deep reinforcement learning,” in 11th Int. Conf. on Wireless Communications and Signal Processing (WCSP), Xi’an, China, pp. 1–6, 2019.
- X. Lai, Q. Hu, W. Wang, L. Fei and Y. Huang, “Adaptive resource allocation method based on deep Q network for industrial internet of things,” IEEE Access, vol. 8, pp. 27426–27434, 2020.
- F. Hussain, S. A. Hassan, R. Hussain and E. Hossain, “Machine learning for resource management in cellular and IoT networks: Potentials current solutions, and open challenges,” IEEE Communications Surveys & Tutorials, vol. 22, no. 2, pp. 1251–1275, 2021.
- S. Yu, X. Chen, Z. Zhou, X. Gong and D. Wu, “When deep reinforcement learning meets federated learning: Intelligent multitimescale resource management for multiaccess edge computing in 5G ultradense network,” IEEE Internet of Things Journal, vol. 8, no. 4, pp. 2238–2251, 2021.
- J. Zhang, Y. Zeng and R. Zhang, “Multi-antenna UAV data harvesting: Joint trajectory and communication optimization,” Journal of Communications and Information Networks, vol. 5, no. 1, pp. 86–99, March 2020.
- A. Pratap, R. Misra and S. K. Das, “Maximizing fairness for resource allocation in heterogeneous 5G networks,” IEEE Transactions on Mobile Computing, vol. 20, no. 2, pp. 603–619, Feb. 2021.
- Y. Fu, X. Yang, P. Yang, A. K. Y. Wong, Z. Shi, et al.,, “Energy-efficient offloading and resource allocation for mobile edge computing enabled mission-critical internet-of-things systems,” EURASIP Journal on Wireless Communications and Networking, vol. 2021, no. 1, 2021.
- A. Al-Hourani, S. Kandeepan and S. Lardner, “Optimal LAP altitude for maximum coverage,” IEEE Wireless Commun. Lett., vol. 3, no. 6, pp. 569–572, Dec. 2014.
- M. A. Ouamri, M. E. Oteşteanu, G. Barb and C. Gueguen “Coverage analysis and efficient placement of drone-BSs in 5G networks,” Engineering Proceedings, vol. 14, no. 1, pp. 1–8, 2022.
- Md. A. Ouamri “Stochastic geometry modeling and analysis of downlink coverage and rate in small cell network,” Telecommun Syst, vol. 77, no. 4, pp. 767–779, 2021.
- D. Alkama, M. A. Ouamri, M. S. Alzaidi, R. N. Shaw, M. Azni et al., “Downlink performance analysis in MIMO UAV-cellular communication with LoS/NLoS propagation under 3D beamforming,” IEEE Access, vol. 10, pp. 6650–6659, 2022.