Cross-Layer Design for EH Systems with Finite Buffer Constraints

Energy harvesting (EH) technology in wireless communication is a promising approach to extend the lifetime of future wireless networks. This paper investigates a cross-layer optimal adaptation policy, based on a Semi-Markov Decision Process (SMDP), for a point-to-point EH wireless communication system with finite buffer constraints over a Rayleigh fading channel. Most adaptation strategies in the literature are based on channel-dependent adaptation alone. In EH wireless networks, however, the states of the energy capacitor and the data buffer must be considered alongside the channel when designing a dynamic modulation policy. Unlike the channel-dependent policy, which is a physical-layer optimization, the proposed cross-layer dynamic modulation policy is designed to meet the overflow requirements of the upper layer by maximizing the throughput while optimizing the transmission power and minimizing the number of dropped packets. Based on the states of the channel, data buffer, and energy capacitor, the scheduler selects an action corresponding to a modulation constellation, and the packets are modulated into symbols according to the selected modulation type for transmission over the Rayleigh fading channel. Simulations show that the proposed cross-layer policy significantly outperforms the physical-layer channel-dependent policy in terms of throughput.


Introduction
Despite all the above properties of green communication represented in EH wireless networks, certain difficulties should be investigated and perhaps a new design dimension should be added. The main challenge of EH technology is the time-varying energy harvesting [9] and the scarcity of energy amount [10], which lead to the conclusion that the communication performance guarantee is difficult to fulfill. Therefore, considerable efforts have been made to improve the performance of EH wireless communication [11,12]. It is highlighted that adjusting the randomness and low rate of energy arrivals is quite crucial to develop efficient transmission policies and schemes for EH wireless networks. Due to the time-varying energy arrivals in EH technology, the transmission power needs to be adjusted even if the wireless fading channel remains unchanged, which is an additional challenge and unique feature of EH wireless networks [13].
In addition, because data buffering introduces a further metric, the buffering delay in the queue must be considered, and resource allocation algorithms that do so are proposed in [14,15]. Moreover, different types of delay constraints, both delay-tolerant and non-delay-tolerant, need to be explored, and QoS guarantees on delay must be maintained when proposing resource allocation schemes. Non-delay-tolerant traffic corresponds to real-time applications, such as real-time streaming, online gaming, and intelligent and smart assisted systems [16], which impose a hard delay constraint. Examples of delay-tolerant applications are traditional Internet services such as file transfer, email exchange, and web browsing, which can generally tolerate some delay. However, a modern power-constrained wireless communication system is constrained by time-varying fading channels as well as random traffic arrival rates, which make it harder to ensure the required QoS for real-time applications. Further limitations arise for wireless nodes that use energy harvesting technology, referred to as Energy Harvesting Nodes (EHNs). Although EHNs are suitable for remote operation in monitored areas without human intervention, the random nature of energy harvesting introduces a new paradigm in resource allocation, including power allocation and scheduling. Therefore, a cross-layer dynamic modulation policy is needed to meet the overflow requirements of the upper layer by maximizing throughput while optimizing transmission power and minimizing packet loss.
In this paper, we investigate a cross-layer dynamic modulation policy for an energy harvesting (EH) communication system with finite buffer constraints. The policy dynamically adapts the transmission power and rate to the states of the channel, the data buffer, and the energy capacitor, so that the network throughput is maximized while both the energy consumption and the number of dropped packets are minimized. Because of the inherent variability of time-varying wireless fading channels and of the data and energy arrival rates, the transmission power and rate generally depend on the channel condition, the data buffer state, and the energy capacitor state.
In general, both the data buffer and the energy capacity are limited by finite memory in practice. Consequently, in addition to optimizing the channel adaptive strategy, the buffers in the system must also be considered. Moreover, statistical optimization techniques cannot lead to the determination of an exact scheduling strategy due to the overlap and consideration of several elements, such as varying channel gains, the randomness of data arrivals, and the randomness of energy arrivals. Moreover, since the packet scheduling formulation is inherently dynamic, the formulation is classified according to the criterion of stochastic dynamic programming, i.e., dynamic optimization. The Markov decision process (MDP) is one of the formulas that use the criterion of dynamic optimization, a mathematical framework that analyzes system dynamics in uncertain environments. Since the decisions made using the MDP approach follow time-based characteristics, the MDP approach is not suitable for the decision epochs that have random characteristics in terms of energy and data arrival, resulting in different durations of the decision epochs.
A wireless communication system with EH capability is therefore event-based in nature, and the semi-Markov decision process (SMDP) is the more appropriate approach to model such a system with finite buffer constraints over a wireless fading channel. In this paper, the proposed system model is formulated as an SMDP to increase the throughput of the network while allocating less energy and minimizing packet dropping. To the best of our knowledge, no recent work in the open literature has studied the throughput maximization and resource allocation problem of a point-to-point EH wireless communication system with finite buffer constraints over a Rayleigh fading channel as an infinite-horizon SMDP-based problem under data buffer constraints and fading channel uncertainty.
The main contributions of this paper are summarized as follows:
- Formulation of a novel framework for a point-to-point EH wireless communication system with finite buffer constraints on the source node over a fading channel, based on an SMDP approach, to maximize the network throughput by optimally allocating the harvested energy while maintaining minimum packet overflow.
- A dynamic programming technique based on SMDP that dynamically adapts to changes in the channel and/or buffer states, which optimally satisfies the BER requirement of the physical layer on the one hand and the overflow requirement of the data link layer on the other.

Related Work
In [17], the authors proposed a resource allocation framework for a point-to-point EH wireless communication system based on the SMDP approach that maximizes the network throughput by considering only channel adaptation. Since the transmission scheduling is only channel-based, the proposed scheme provided the benchmark for the maximum performance of the physical layer under the assumption that both the data buffer and the energy buffer are infinite and the data buffer is full with stored data to be transmitted. For practical wireless networks, the adaptation of packet transmission to channel conditions along with consideration of buffer state is critical. The goal of adaptation is to stabilize system performance by providing maximum throughput while reducing the drop probability and minimizing buffer delay. The design of a wireless communication system with EH capability has generated many research activities in the field of modern wireless technology. The throughput maximization problem for a point-to-point EH wireless communication system over a fading channel was considered, while the authors in [18] attempted the same system model by proposing a low-complexity and optimal transmission policy called recursive geometric water filling (RGWF). Two-hop wireless cooperative transmission with EH capable nodes have been well studied recently.
In [19], an optimal transmission policy for the two-hop wireless communication system with EH capability at the relay node was proposed. The throughput maximization problem for a two-hop wireless communication system with EH capability at the source node was studied in [20] and solved with a cumulative curve algorithm. In [17], the RGWF algorithm was used to maximize the throughput of the two-hop EH system. Moreover, in [21], the authors considered ultra-dense small cell networks with EH capability at the base stations, where the resource allocation problem is studied and the joint user allocation and optimal power allocation are modeled using mixed-integer programming. In [22], the authors addressed the problem of minimizing the outage probability of a mesh-topology network whose sources have EH capability.
On the other hand, numerous system models have been formulated based on the SMDP approach, such as mobile cloud computing networks, vehicular cloud computing networks, wireless networks, and cognitive vehicular networks. The authors in [23] showed how to manage the cloud resources, i.e., virtual machines, to support continuous cloud service across multiple cloud domains based on SMDP. In [24], the authors proposed a framework for shared multi-resource allocation for the same proposed system model in [23] using SMDP. The main objective of the proposed framework is to achieve an optimal multi-resource allocation decision by maximizing the total rewards while reducing the probability of service rejection and the time of service operation. In [25], the authors propose an optimized resource allocation scheme to optimize the long-term potential reward of the SMDP-based vehicular cloud computing system. The long-term expected reward of the system is derived by considering both the return and cost of the proposed system model and the changing characteristics of the resources. From the perspective of cognitive vehicle networks, the authors in [26] captured the dynamic property of vehicle user mobility and the change in availability in the cognitive band, where the shared resource allocation framework is formulated using the SMDP approach.
In [27], the authors considered a Narrowband Internet of Things (NB-IoT) edge computing system where Mobile Edge Computing (MEC) servers were deployed at NB-IoT enabled BSs. As a result, the IoT sensors can send their sensed data to the MEC servers over a single hop and make full use of their computing and storage capacities. In general, the ordinary MDP model incurs additional overhead because more system state information is needed to store previous system association actions. Also, scheduling and offloading decisions need to be made in every time slot. Therefore, the Continuous-Time Markov Decision Process (CTMDP) model was used to formulate the NB-IoT system in [27] to reduce both the total power consumption of the IoT sensors and the long-term average system delay. Similarly, in [28], the authors used a CTMDP-based scheme to formulate the vehicular cloud resource allocation problem for mobile video services. In particular, the authors investigated dynamic offloading, which they claimed has a great impact on expanding the number of shareable resources, in addition to reducing the cost of communication paths. Therefore, the goal of the model was to improve the use of the iterative algorithms imposed in the SMDP scheme. Also, the authors in [29] used an SMDP-based scheme to propose a service function allocation algorithm for mobile edge cloud networks.
The problem was defined by considering a system reward and cost. The value iteration algorithm was used to obtain the maximum reward and reduce the rate of rejected requests. Also, many efforts have been made to utilize the promising technology Software-Defined Network (SDN) in IoT applications. The authors in [30] used SMDP to formulate the radio resource allocation problem to maximize the expected average reward of the proposed SDN-based IoT networks. The optimal solution was obtained by a relative value iteration algorithm in SMDP, while simulation results showed that the proposed resource allocation scheme successfully improved the long-term average system rewards compared to other similar resource allocation schemes in the literature. Moreover, an optimal power allocation for wireless sensors powered by a dedicated radio frequency energy source was formulated using the SMDP scheme for both time division multiplexing and frequency division multiplexing [31]. Simulation results showed that the proposed scheme outperformed the heuristic greedy method in the literature.

System Model
We consider a point-to-point EH wireless communication system over fading channels with a single EH transmitter and a single receiver. The transmitter is equipped with a finite energy capacitor of size $K_{max}$ and a finite data buffer of size $D_{max}$, as shown in Fig. 1a. We assume that the point-to-point transmission is organized in radio frames, where each radio frame is divided into multiple time slots.
Let $\lambda_c$ denote the average packet arrival rate at the transmitter data buffer, where arrivals are assumed to follow a Poisson distribution, and let $\lambda_e$ denote the average EH arrival rate at the transmitter energy capacitor. The protocol data units (PDUs) at the higher layer are packets, where each packet consists of a group of information bits; packets accumulate in the finite-size transmitter data buffer. In contrast, the PDUs at the physical layer are blocks, where each block is made up of a group of symbols. According to the states of the channel, data buffer, and energy capacitor, the scheduler chooses a particular action $u \in \mathcal{U}$, which corresponds to the selected modulation constellation. Based on the chosen modulation type, packets are modulated into symbols for transmission over the Rayleigh fading channel. At the receiver, the received symbols are demodulated into a stream of bits, and the bit streams are reassembled into packets and stored in the receiver data buffer. As the last step, the received demodulated packets are delivered to the application layer through the network stack.
We assume that each discrete time slot is represented by a frame that contains $N_s$ channel uses (symbols), as shown in Fig. 1b. Depending on the scheduler's decision, the number of transmitted packets may vary from frame to frame. Let $w_n$ be the number of packets extracted from the data buffer for transmission and $R_n$ the adaptive modulation rate of that transmission in bits/symbol. The relationship between the number of packets transmitted and the modulation rate is expressed as
$$ w_n N_p = R_n N_s, $$
where $N_p$ is the packet size in bits.
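As an illustrative sketch of the frame relationship, the snippet below computes how many whole packets one frame carries for each modulation rate. It assumes that a frame holds $N_s$ symbols and that the relation has the form $w_n N_p = R_n N_s$; the function name and the example frame and packet sizes are hypothetical and chosen only so that $N_s/N_p = 1$, as in the numerical results.

```python
def packets_per_frame(rate_bits_per_symbol, symbols_per_frame, packet_size_bits):
    """Whole packets carried by one frame, assuming w * Np = R * Ns."""
    total_bits = rate_bits_per_symbol * symbols_per_frame
    return total_bits // packet_size_bits


# Modulation rates for no transmission, QPSK, 16QAM and 64QAM (bits/symbol).
RATES = [0, 2, 4, 6]

# Hypothetical frame and packet sizes with Ns / Np = 1.
packets = [packets_per_frame(r, 100, 100) for r in RATES]
```

With $N_s/N_p = 1$ the number of packets per frame equals the modulation rate, which matches the rate set $w = [0, 2, 4, 6]$ used later in the numerical results.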

Channel Modeling
We consider an ergodic, flat Rayleigh fading channel in the analyzed EH system. The probability density function (pdf) of the fading power gain of the Rayleigh channel follows an exponential distribution [32]:
$$ p(\gamma) = \frac{1}{\bar{\gamma}} \exp\left(-\frac{\gamma}{\bar{\gamma}}\right), \quad \gamma \ge 0, $$
where $\bar{\gamma}$ is the average power gain of the received channel.
The Rayleigh fading channel is modeled as a first-order Markov chain whose states are described as $\mathcal{C} = \{c_1, c_2, \ldots, c_C\}$, where $C$ is the number of non-overlapping channel states. The transition probability matrix among the states is $P = [P_{c_i, c_j}]$, $1 \le i, j \le C$, where $P_{c_i, c_j} = P(c_j \mid c_i)$ is the transition probability from state $c_i$ to state $c_j$. Let $\Gamma = \{\gamma_0, \gamma_1, \ldots, \gamma_C\}$ denote the set of received SNR thresholds in increasing order, where $\gamma_0 = 0$, $\gamma_i < \gamma_{i+1}$, and $\gamma_C = \infty$. The channel is said to be in state $c_i$ if $\gamma_{i-1} \le \gamma < \gamma_i$. In this paper, this $C$-state channel model is used for the proposed point-to-point EH transmission model, with possible channel states $c \in \{c_1, c_2, \ldots, c_C\}$.
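The partition of the SNR axis into channel states can be sketched as follows. Under the exponential pdf of the Rayleigh power gain, the stationary probability of state $c_i$ is $e^{-\gamma_{i-1}/\bar{\gamma}} - e^{-\gamma_i/\bar{\gamma}}$. The threshold values below are hypothetical and serve only as an illustration; the source does not list its thresholds.

```python
import math


def channel_state_probabilities(thresholds, mean_snr):
    """Stationary probability of each state for exponentially distributed SNR."""
    probs = []
    for low, high in zip(thresholds[:-1], thresholds[1:]):
        # math.exp(-inf) evaluates to 0.0, so the last interval needs no special case.
        probs.append(math.exp(-low / mean_snr) - math.exp(-high / mean_snr))
    return probs


def channel_state(snr, thresholds):
    """1-based index i such that thresholds[i-1] <= snr < thresholds[i]."""
    for i in range(1, len(thresholds)):
        if thresholds[i - 1] <= snr < thresholds[i]:
            return i
    raise ValueError("snr must be a finite non-negative number")


# Hypothetical thresholds for C = 4 states; gamma_0 = 0 and gamma_C = infinity.
THRESHOLDS = [0.0, 0.5, 1.0, 2.0, math.inf]
probs = channel_state_probabilities(THRESHOLDS, mean_snr=1.0)
```

Because the intervals are half-open, every finite SNR value falls into exactly one state, and the state probabilities sum to one.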

Energy and Battery Model
The transmitter is assumed to be equipped with a finite energy capacitor that can hold a maximum of $K$ energy units (EUs). Let $\mathcal{K} = \{k_0, k_1, \ldots, k_K\}$ denote the capacitor state space in terms of EU occupancy, where $k_j$ corresponds to $j \in \{0, 1, \ldots, K\}$ EUs in the capacitor. The number of EUs in the capacitor evolves dynamically with the current capacitor state, the energy consumed, and the newly harvested energy. The dynamics of the capacitor occupancy are given by
$$ k_{n+1} = \min\{k_n - o_n + g_n,\; K\}, $$
where $g \in \{0, 1, \ldots, G\}$ denotes the number of EUs harvested and $o \in \{0, 1, \ldots, O\}$ the number of EUs consumed for transmission in each time slot.
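A minimal sketch of the capacitor update is given below. It assumes that consumption is deducted before the harvested energy is added and that any energy beyond the capacity $K$ is lost; this ordering and the function name are assumptions made for illustration.

```python
def next_capacitor_level(level, harvested, consumed, capacity):
    """Energy units stored after one slot: min(level - consumed + harvested, capacity)."""
    if consumed > level:
        raise ValueError("cannot consume more energy units than are stored")
    return min(level - consumed + harvested, capacity)


# Example with K = 20: harvesting 5 EUs at level 18 after spending 2 saturates the capacitor.
saturated = next_capacitor_level(level=18, harvested=5, consumed=2, capacity=20)
normal = next_capacitor_level(level=3, harvested=1, consumed=2, capacity=20)
```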

Queue Dynamics with Finite Buffer Constraint
The transmitter uses its data buffer to store arriving packets. Let $\mathcal{D} = \{d_0, d_1, \ldots, d_D\}$ represent the data buffer state space in terms of buffer occupancy, where $d_i$, $i \in \{0, 1, \ldots, D\}$, denotes the range of stored packets in the buffer. The number of stored packets in the buffer at each decision epoch evolves with the current buffer state, the transmitted packets, and the new incoming traffic, and it can be expressed as
$$ d_{n+1} = \min\{d_n - w_n + f_n,\; D\}, $$
where $f \in \{0, 1, \ldots, F\}$ is the number of packets received into the data buffer and $w \in \{0, 2, \ldots, W\}$ is the number of packets extracted from the data buffer for transmission. The maximum number of packets transmitted over the wireless link is constrained by the number of packets physically present in the data buffer and by the instantaneous link capacity. The data buffer is assumed to be stable, and it is represented by the buffer overflow constraint: The equation implies that the data buffer size $d_D$ plays the main role in determining whether the buffer overflow constraint is strict or loose. In particular, a small data buffer size leads to a strict buffer overflow constraint, while a large data buffer size leads to a loose one. Since decisions in the MDP approach are made at fixed time instants, the MDP approach is not suitable when the decision epochs are triggered by random energy and data arrivals, which leads to decision epochs of different durations. A wireless communication system with EH capability is therefore inherently event-based, and the semi-Markov decision process (SMDP) is a more suitable approach to model such a system with finite buffer constraints over a wireless fading channel.
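The buffer dynamics and the resulting packet drops can be sketched as below. The sketch assumes that transmitted packets leave the buffer before new arrivals are admitted and that arrivals exceeding the capacity $D$ are dropped; the function name and this ordering are illustrative assumptions.

```python
def next_buffer_state(occupancy, arrivals, transmitted, capacity):
    """Return (new_occupancy, dropped_packets) for one decision epoch."""
    if transmitted > occupancy:
        raise ValueError("cannot transmit more packets than are stored")
    backlog = occupancy - transmitted + arrivals
    dropped = max(backlog - capacity, 0)   # packets lost to overflow
    return min(backlog, capacity), dropped


# Example with D = 20: a nearly full buffer overflows by one packet.
full_case = next_buffer_state(occupancy=19, arrivals=4, transmitted=2, capacity=20)
light_case = next_buffer_state(occupancy=5, arrivals=3, transmitted=2, capacity=20)
```

The second return value is the quantity that an overflow cost would penalize: it is zero whenever the backlog fits into the buffer.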

SMDP Formulation of the Cross-Layer Scheduling
As discussed earlier, it is necessary to establish an approach that is suitable to account for the variability in decision epoch duration due to the variation in energy arrival as well as the arrival of data packets on the transmit capacitor or data buffer. Therefore, the time between successive control decisions varies because the decision epoch duration depends on the current states of the system as well as the action selection of the epochs, which vary inherently. On the other hand, the weight of the decision epoch cost is determined by the time it takes the system to move from one state to another. Consequently, the problem considered above is constituted as an SMDP process satisfying the dynamic nature and the required dynamic programming. The objective of our work is to implement a cross-layer scheduler for a point-to-point EH wireless network that optimally adjusts the energy allocation and transmission rate based on the physical layer (channel state) and data link layer (energy capacitor and data buffer states) such that the network throughput is maximized and packet overflow is minimized. The proposed problem can be modeled based on a semi-Markov decision process that considers the following tuple {S, A s , W, T s , P}, corresponding to system states, actions, system reward, consumption time, and transition probabilities, as explained below.

System States
To solve the proposed dynamic programming problem, a composite system state space is constructed from the channel state space, the data buffer state space, and the energy capacitor state space. A composite state is written as $s = [d_i, c_l, k_j]$.
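As a small illustration of the composite state space, the sketch below enumerates all triples of buffer occupancy, channel state, and capacitor occupancy. The parameter values are borrowed from the numerical results ($D_{max} = 20$, $C = 4$, $K_{max} = 20$); the function name and the tuple encoding of a state are assumptions.

```python
from itertools import product


def build_state_space(buffer_size, num_channel_states, capacitor_size):
    """Enumerate composite states (d, c, k).

    d ranges over 0..buffer_size packets, c over 1..num_channel_states,
    and k over 0..capacitor_size energy units.
    """
    return list(product(range(buffer_size + 1),
                        range(1, num_channel_states + 1),
                        range(capacitor_size + 1)))


states = build_state_space(20, 4, 20)
index_of = {s: i for i, s in enumerate(states)}  # state -> row index in a transition matrix
```

With these values the state space has $21 \times 4 \times 21 = 1764$ composite states, which gives a sense of the size of the transition matrices involved.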

Set of Actions
An adaptive power allocation and modulation constellation scheme is used to define an action that dynamically adapts the transmission power and rate, with a two-to-one mapping from the energy allocation and the transmission rate on the one hand to the number of transmitted packets on the other. Depending on the instantaneous composite system state $s_n$, the controller chooses an action $u_n$, where $\mathcal{U} = \{u_1, \ldots, u_U\}$ denotes a finite action space. In general, a policy $\pi$ belonging to the policy space $\Pi$ can be written as $\pi = \{\mu_1, \mu_2, \ldots\}$, and the action $u_n = \mu_n(s_n)$ is taken at decision epoch $n$. Moreover, given the set of allocated EU levels $\mathcal{E} = \{e_0, e_1, \ldots, e_E\}$ and the set of available transmission rates $\mathcal{W} = \{w_0, w_1, \ldots, w_W\}$, two mapping functions $\varphi$ and $\psi$ can be identified, where $\varphi : \mathcal{U} \to \mathcal{E}$ maps an action to the number of allocated EUs and $\psi : \mathcal{U} \to \mathcal{W}$ maps an action to the selected transmission rate. Let $P_e(\gamma)$ be the instantaneous bit error rate (BER) at received SNR $\gamma$. The BER expression for M-QAM is given by [33]; where $v = \log_2(M)$ is the number of bits modulated into a $2^v$-QAM symbol and $P$ denotes the average transmitted signal power. The instantaneous received SNR for a constant transmit power is given by $\gamma = hP/\sigma^2$, where $h$ is the power gain of the channel and $\sigma^2$ is the variance of the channel noise. When the transmission power is $P_T$, the instantaneous received SNR at interval $n$ is $\gamma P_T / P$. Two adaptation policies are considered to examine the proposed cross-layer wireless communication system with EH constraints:
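To make the link between the BER target, the modulation order, and the transmit power concrete, the sketch below uses a widely cited exponential approximation for M-QAM, $P_e \approx 0.2 \exp\left(-1.5\,\gamma\,(P_T/P)/(2^v - 1)\right)$. This specific form is an assumption made for illustration; the exact expression taken from [33] is not reproduced in the text and may differ. The function names are hypothetical.

```python
import math


def mqam_ber(snr, bits_per_symbol):
    """Approximate BER of 2^v-QAM at the given received SNR (assumed form)."""
    m = 2 ** bits_per_symbol
    return 0.2 * math.exp(-1.5 * snr / (m - 1))


def required_power_ratio(snr, bits_per_symbol, target_ber):
    """Ratio P_T / P that makes the approximate BER equal to target_ber."""
    m = 2 ** bits_per_symbol
    return -(m - 1) * math.log(5.0 * target_ber) / (1.5 * snr)


# Power scaling needed for QPSK, 16QAM and 64QAM at SNR = 1 and a BER target of 1e-4.
ratios = {v: required_power_ratio(1.0, v, 1e-4) for v in (2, 4, 6)}
```

Inverting the approximation in this way shows why higher-order constellations cost more energy at a fixed BER: the required power grows in proportion to $2^v - 1$.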

Channel-Dependent Static Policy
The adaptive modulation rate is selected based on the channel condition only, while maintaining a fixed specified BER. However, this adaptation is not implementable in practice because it does not consider the finiteness of the data buffer and consequently the overflow requirement.

Dynamic Joint Finite-Buffer and Channel-Dependent Policy
The SMDP is set up to formulate the dynamic joint adaptation to both the finite buffer states and the channel state. Because the proposed policy considers both the buffer states and the channel state, the scheduler/controller determines, for each state, the optimum action that maximizes the long-run system reward. The proposed policy satisfies the system requirements by maximizing the system reward while ensuring minimum energy consumption and packet overflow. The combination of energy allocation and transmission rate is set by

Transition Matrix
The probability of a transition from a state $s = s_q$ to another state $s' = s_r$ under a particular action $u$ is given by the transition probability $P(s' \mid s, u)$. For each action $u = u_i$, the transition matrix can be formulated as the Kronecker product of the channel, energy capacitor, and data buffer transition matrices, which are mutually independent.
The system state transition probability from state $s = s_q = [d_i, c_l, k_j]$ to state $s' = s_r = [d_x, c_y, k_z]$ for action $u = u_i$ can be given by
$$ P(s' \mid s, u_i) = P(d_x \mid d_i, u_i)\, P(c_y \mid c_l)\, P(k_z \mid k_j, u_i). $$
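The Kronecker-product construction can be illustrated with small matrices. The sketch below uses hypothetical two-state transition matrices for the data buffer, the channel, and the energy capacitor under one fixed action, and orders the factors to match the state layout $[d, c, k]$; the matrix entries and helper names are assumptions.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def state_index(d, c, k, num_c, num_k):
    """Row/column index of composite state [d, c, k] (all indices zero-based)."""
    return (d * num_c + c) * num_k + k


# Hypothetical component matrices for one fixed action u (rows sum to one).
P_D = [[0.5, 0.5], [0.25, 0.75]]   # data buffer
P_C = [[0.75, 0.25], [0.5, 0.5]]   # channel
P_K = [[0.5, 0.5], [0.0, 1.0]]     # energy capacitor

P = kron(kron(P_D, P_C), P_K)      # 8 x 8 composite transition matrix
```

Each entry of `P` is the product of the three component probabilities, so for example the transition from $[d_0, c_1, k_0]$ to $[d_1, c_0, k_1]$ has probability $0.5 \times 0.5 \times 0.5$, and every row of `P` still sums to one.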

Reward Model
The action in each state is selected according to its associated cost: the controller chooses the action that results in the maximum reward. A cost function $Q(s_i, u_j)$ relates the state-action pair $(s_i, u_j)$ to the system reward. The system reward $r(s, a)$ (also called the associated cost) for each pair of system state and corresponding action is given by, where $n(s, a)$ denotes the instantaneous income and cost of the system when a specified action $a(s)$ is taken in a particular state $s$. We describe these objective functions as follows.

Adaptive Modulation Rate
It is the immediate system reward for the state-action pair $(s, a)$ and is described by the modulation constellation set $Q_E(s, a)$ = [no transmission, QPSK, 16QAM, 64QAM] = [0, 2, 4, 6] bits/symbol, which determines the number of packets that are taken from the data buffer for transmission.

Buffer Overflow Cost
When the buffer is full, the probability of dropping packets is high. The immediate overflow cost is the number of packets that are dropped from the buffer and it can be expressed as
The expected system cost $g(s, a)$, on the other hand, can be described as $g(s, a) = c(s, a)\,\tau(s, a)$, where $\tau(s, a)$ denotes the service time and $c(s, a)$ indicates the power consumption cost incurred by choosing a certain action $u_j$ in a certain channel state $c_i$, shown as; where the power cost $c(s, a) = P_T$ can be found using (6) by replacing the instantaneous received SNR $\gamma$ with the average received SNR $\bar{\gamma}$ in the given equation:

Sojourn Time
After an action is chosen, the expected sojourn time $\tau(s, a)$ is the expected time from the current event to the next event. Consequently, the mean event rate $\gamma(s, a)$ is the sum of the rates of all constituent processes that can move the system from one state to another after an action $a(s)$ is selected. Computation of $\gamma(s, a)$ and $\tau(s, a)$ where $R_{i,l}$ is the modulation rate adapted when $i$ EUs are used and the channel is in state $l$. When new EUs are harvested ($\tilde{e} \in \{G\}$) or new packets arrive at the transmitter's data buffer ($\tilde{e} \in \{F\}$), no action is taken and no processing service is in progress. Once the channel state changes ($\tilde{e} \in \{C_l\}$), the scheduler determines the system state and then takes an action accordingly. The expected instantaneous reward $r(s, a)$ over the time period $\tau(s, a)$ is determined from the discounted reward model in [34], as below: where $\alpha$ is a continuous-time discounting factor. Relying on the transition probabilities in (7) and the reward model in Eq. (13), we can formulate the maximal discounted long-term reward of the state $s$ based on the Bellman equation for the discounted reward model as follows: where $\lambda = \frac{\gamma(s, a)}{\alpha + \gamma(s, a)} < 1$.
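The quantities in this subsection can be illustrated numerically. The sketch below assumes that the competing events (packet arrivals, energy arrivals, channel changes, and an optional service completion) are independent with exponentially distributed inter-event times, so that the total event rate is the sum of the individual rates, the mean sojourn time is its reciprocal, and the discount factor is $\lambda = \gamma(s,a) / (\alpha + \gamma(s,a))$. The channel-change rate and the value of $\alpha$ are hypothetical.

```python
def total_event_rate(packet_rate, energy_rate, channel_rate, service_rate=0.0):
    """Sum of the rates of all competing events, gamma(s, a)."""
    return packet_rate + energy_rate + channel_rate + service_rate


def sojourn_time(event_rate):
    """Mean time until the next event, tau(s, a) = 1 / gamma(s, a)."""
    return 1.0 / event_rate


def discount_factor(event_rate, alpha):
    """Per-transition discount, lambda = gamma / (alpha + gamma)."""
    return event_rate / (alpha + event_rate)


# lambda_c = 3 and lambda_e = 2 as in the numerical results; the rest is hypothetical.
rate = total_event_rate(packet_rate=3.0, energy_rate=2.0, channel_rate=1.0)
tau = sojourn_time(rate)
lam = discount_factor(rate, alpha=0.1)
```

A larger event rate shortens the sojourn time and pushes the discount factor toward one, so frequent events are discounted less per transition.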

Adaptation Policy of the Cross-Layer Design
The policy of the cross-layer adaptation scheme takes into account the energy capacitor and data buffer occupancies as well as the channel state to target the overflow cost. For example, the transmitter requires different transmit powers in different channel states on time-varying channels. However, the sender could also transmit at a higher rate to avoid packet congestion when the data buffer is nearly full or when the average data arrival rate is high, and vice versa. In this section, we show how to optimally adjust the modulation rate for cross-layer EH networks using the SMDP approach. It is based on the iteration approach discussed in [35]. An optimal policy can be obtained as described in Algorithm 1.

Algorithm 1:
Adaptation policy of the cross-layer design based on SMDP approach
1. Set the long-term reward v(s) for each state s, and set iteration k = 0 and ε > 0.
2. Compute the corresponding reward for each state s using (14).
End
Initially, both v(s) and P opt (s) are set to zero for each state s. Then v(s) and P opt (s) are updated repeatedly until the value v(s) for each state s equals the associated value from the previous iteration, meaning that convergence has been achieved. The resulting P opt (s) over all states is the action-selection policy of the system, which attains the maximal discounted reward.
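The iteration described above can be sketched as a generic value iteration. The sketch assumes the standard discounted Bellman update $v(s) = \max_a \left[ r(s,a) + \lambda(s,a) \sum_{s'} P(s' \mid s, a)\, v(s') \right]$ and stops when the largest change in $v$ falls below $\varepsilon$; it is not the paper's exact Algorithm 1, whose later steps are not reproduced in the text. The two-state example and all its numbers are hypothetical.

```python
def value_iteration(states, actions, reward, transition, discount,
                    eps=1e-9, max_iter=10000):
    """Return (v, policy) for the discounted Bellman update stated above."""
    v = {s: 0.0 for s in states}          # v(s) starts at zero
    policy = {s: None for s in states}    # P_opt(s) before the first sweep
    for _ in range(max_iter):
        new_v = {}
        for s in states:
            best_q = float("-inf")
            for a in actions[s]:
                expected = sum(p * v[s2] for s2, p in transition[(s, a)].items())
                q = reward[(s, a)] + discount[(s, a)] * expected
                if q > best_q:
                    best_q, policy[s] = q, a
            new_v[s] = best_q
        delta = max(abs(new_v[s] - v[s]) for s in states)
        v = new_v
        if delta < eps:                   # values stopped changing: converged
            break
    return v, policy


# Toy two-state example (all numbers hypothetical).
STATES = ["a", "b"]
ACTIONS = {"a": ["x", "y"], "b": ["x"]}
REWARD = {("a", "x"): 1.0, ("a", "y"): 0.0, ("b", "x"): 3.0}
TRANSITION = {("a", "x"): {"a": 1.0}, ("a", "y"): {"b": 1.0}, ("b", "x"): {"b": 1.0}}
DISCOUNT = {key: 0.5 for key in REWARD}

values, best_policy = value_iteration(STATES, ACTIONS, REWARD, TRANSITION, DISCOUNT)
```

In the toy example, state `b` yields a reward of 3 forever, so its value converges to $3 / (1 - 0.5) = 6$, and in state `a` moving to `b` (value 3) is preferred over staying (value 2.5).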

Numerical Results
In this section, we show the performance of the two adaptation strategies. The parameter values are set as follows. The energy harvesting rate and the packet arrival rate are assumed to follow Poisson distributions with average rates $\lambda_e = 2$ and $\lambda_c = 3$, respectively. Moreover, we assume a finite energy capacitor of size $K_{max} = 20$, a finite data buffer of size $D_{max} = 20$, $N_s/N_p = 1$, and $C = 4$ channel states and $U = 4$ actions. An independent and identically distributed Rayleigh fading channel with mean value $m = 1$ is considered. The average transmission power is set to $P = 1$ mW, and the corresponding normalized average received signal-to-noise ratio (SNR) is $\bar{\gamma} = 1$. Finally, the average channel bit error rate (BER) and the modulation constellation set are assumed to be $P_e = 10^{-4}$ and $w = [0, 2, 4, 6]$ bits/symbol, respectively.

Figure 2:
Relationships between the total throughput and the overflow probability as the packet arrival rate changes, for the different schemes
The total throughput and blocking probability of the physical-layer static adaptation policy and the cross-layer dynamic adaptation policy are compared in Fig. 2. It can be seen that the throughput of our proposed cross-layer policy scheme achieves the same performance as the benchmark scheme. However, the benchmark scheme does not track the state of the energy capacitor in each time period, since it assumes that the energy available in each interval is infinite. In practice, the average transmitted power is bounded by the energy arrival rate $\lambda_e$, so the control action of the benchmark may not always be feasible. Therefore, although the benchmark scheme is characterized by its low computational complexity, it is not applicable in reality. Finally, the dashed curves reflect the actual average system throughput of the channel-dependent static adaptation strategy. The gap between the average throughput of the cross-layer dynamic strategy and that of the channel-dependent static strategy grows as the packet arrival rate grows. Fig. 2 also shows that the blocking probability of the channel-dependent static strategy increases with the packet arrival rate, while the blocking probability of the cross-layer dynamic strategy is minimal and approaches zero even as the data arrival rate increases. The reason is that in the channel-dependent policy, the scheduler selects modulation constellations based only on the channel state, without tracking the capacitor and buffer states. Consequently, that policy cannot guarantee the overflow requirements. In the cross-layer policy, on the other hand, the BER and packet overflow requirements are guaranteed even for a high data arrival rate.
Fig. 3 shows the trade-off curve between the maximum throughput and the maximum buffer size for the cross-layer dynamic policy and the physical-layer static policy. It can be seen that the total throughput increases with the finite buffer size for both policies. However, while the throughput growth rate is high for smaller buffer sizes, the growth rate slows down as the data buffer size increases. It is also seen that the proposed cross-layer strategy achieves the same overall optimal throughput performance as the benchmark method. Moreover, it can be seen from the figures that the proposed scheme outperforms the static approach and that the performance difference between them increases as the maximum data buffer size increases. It can be concluded that although the complexity of the cross-layer dynamic scheme is higher, it is still worth implementing because of its performance advantage over the static method, especially as the data buffer size increases.

Conclusion
Energy harvesting (EH) technology in wireless communications is a promising approach to extend the lifetime of future wireless networks. Unlike most adaptation strategies in the literature, which are based only on channel-dependent adaptation at the physical layer, this paper investigates a cross-layer optimal adaptation strategy for a point-to-point EH wireless communication system with finite buffer constraints over a Rayleigh fading channel based on a Semi-Markov Decision Process (SMDP). Channel-based transmission scheduling provides the benchmark for the maximum performance of the physical layer under the assumptions that the data buffer always has data to transmit and that the data buffer and the energy buffer are infinite. A practical cross-layer adaptation design, in contrast, must stabilize the system performance by providing the maximum throughput while reducing the drop probability and minimizing the buffer delay.
Therefore, the SMDP framework has been applied to determine the optimal policy of a cross-layer design for a single-hop EH network, comparing channel-dependent static adaptation with cross-layer dynamic adaptation. In cross-layer adaptation, throughput is maximized by tracking the states of the energy capacitor, data buffer, and channel to optimally control the transmit power and rate over the transmit time intervals. The numerical results show that the cross-layer adaptation policy outperforms the channel-dependent policy by guaranteeing the overflow rate and hence the network throughput in a network with green communication features and EH sources. Moreover, the proposed cross-layer scheme was shown to be implementable, unlike the benchmark scheme, while still providing the same throughput as the benchmark scheme for all packet arrival rates and maximum buffer sizes. As a suggestion for future work, an optimal transmission policy based on the SMDP formulation can be applied to cooperative wireless communication in which both the source and the relay have energy harvesting capability. Since the proposed model is based on a single-hop connection between the source and the destination, relays with EH capability can help relay the information signal when there is a direct connection between the sender and the receiver (cooperative communication), saving more energy and speeding up the data transmission. Both cooperative communication and relay selection protocols can be analyzed in terms of throughput, outage probability, and energy efficiency.