|Intelligent Automation & Soft Computing |
Genetic Algorithm Energy Optimization in 3D WSNs with Different Node Distributions
Department of Electrical Engineering, Al-Zaytoonah University of Jordan, Amman, 11733, Jordan
*Corresponding Author: Yousef Jaradat. Email: email@example.com
Received: 09 October 2021; Accepted: 24 November 2021
Abstract: Optimal node clustering in wireless sensor networks (WSNs) is a major issue in reducing energy consumption and extending network node life time and reliability measures. Many techniques for optimizing the node clustering process in WSN have been proposed in the literature. The metaheuristic algorithms are a subset of these techniques. Genetic algorithm (GA) is an evolutionary metaheuristic technique utilized to improve the network reliability and extending the network life time by optimizing the clustering process in the network. The GA dynamic clustering (GA-DC) algorithm is proposed in this paper to extend the network reliability and node life time of three dimensional (3D) WSN. The GA-DC algorithm made use of an improved fitness function that takes into account a variety of metrics such as energy expenditure per protocol round, clustering distance, and the number of long-distance wireless connections. There have been two types of simulation scenarios run. First, simulation results show that the GA-DC algorithm increases network life time by 80% and network throughput by 55% when compared to the well-known LEACH protocol. Second, simulation results show that the uniform node distribution outperforms the normal and exponential distributions in terms of network life time by 5.7% and 7%, network reliability by 4.2% and 76%, and data throughput by 10.85% and 19.54%, respectively
Keywords: Dynamic GA; selection; mutation; crossover; clustering; energy efficient
Rapid technological advances in the field of micro electro mechanical systems (MEMS) have led to the development of cheap and autonomous miniature sensor nodes . These sensor nodes have the capabilities of sensing and monitoring the environment, processing and aggregating data, and communicating data reports to each other or to a central location, usually referred to as the sink or the base station (BS). Wireless sensor network (WSN) is the interconnection of large number of these sensor nodes to serve an application-specific purpose . Sensor nodes have limited battery energy sources and these batteries are mostly not rechargeable. The communication function of a sensor node usually consume energy the most . If the energy source of a particular node gets drained, the node is considered dead and it becomes useless. WSNs have a wide variety of real-world applications. They are used in both civilian and military applications, including agriculture, industry, healthcare, surveillance, target tracking, and security management . Fig. 1 shows a typical sensor node block diagram .
Sensor nodes can send their collected data directly to the sink node , but this consumes more energy and leads to the node's premature death, introducing other problems into the network such as the coverage/hole problem . Clustering is utilized to balance energy consumption between sensor nodes. Clustering is based on grouping nearby nodes or nodes with similar characteristics or purpose. Nodes in a cluster can be classified into two categories: cluster head (CH) and member nodes (MN). MNs collect data reports from the environment and send them to the CHs. Moreover, CHs aggregate, compress and send the information to the BS node. The CH to BS communication can be done directly or through multiple hop techniques . CHs usually consume more energy compared to the MNs. Rotation of the CH role between nodes in the network is required to balance the overall network energy and extend the network life time . Many protocols have been developed for clusters creation, operation, and maintenance. These include low energy adaptive clustering hierarchy (LEACH), stable energy protocol (SEP), hybrid energy efficient distributed (HEED) and many more . The inability of most clustering protocols in WSNs to reduce overall network energy and balance energy consumption among sensor nodes is a known issue . In some cases, a significant reduction in overall network energy expenditure is achieved at the expense of unequal remaining energies among sensor nodes. In other cases, sensor node remaining energies are balanced at the expense of increasing the overall network consumed energy.
Nature-inspired or meta-heuristic algorithms have been applied to WSNs to achieve a balance between balancing energy consumption among sensor nodes and lowering overall energy consumption. Genetic algorithm (GA) is one of the most popular and few metaheuristics techniques used for solving the four fundamental problems of WSNs. These include: energy efficient clustering, localization of sensor nodes, data aggregation, and optimal coverage . In this paper, GA dynamic clustering is used to improve the lifetime and reliability of three-dimensional (3D) WSNs by optimizing the number of clusters in the network and balancing consumed energies among sensor nodes for three different randomly distributed sensor nodes. In typical WSN settings, two dimensional network (2D) and uniform node distribution are studied. In real-world applications, the environment is 3D, and sensor nodes are distributed based on terrain and deployment location. Sensor deployment methods can be deterministic or random. Any distribution can be used for random deployment. Three randomly distributed techniques are studied and compared in this work. Uniform, normal, and exponential distributions are our case studies.
This paper has the following contributions: First, GA dynamic clustering (GA-DC) algorithm is proposed to extend the life time of 3D WSN and network reliability. Second, three random deployment strategies of sensor nodes are investigated and evaluated for better network life time and reliability. Three, a new GA fitness function is used to optimize the number of clusters in the network, which essentially includes information about the type of connections MNs establish with CHs in a cluster and CHs establish with BSs in the network.
The rest of the paper is organized as follows. Section 2 overviews the related work. Section 3 describes GA operators’ details. In Section 4 GA fitness function is introduced a long with cluster formation and maintenance phases. Section 5 provides the simulation setups and configuration. Simulation results are explained in Section 6. Finally, we conclude our research in Section 7.
2 Related Work
A lot of researches have been conducted utilizing nature inspired algorithms for optimizing WSNs life time. GA has showed a great applicability and solutions to different challenges in WSNs. In , the authors have proposed GA based energy efficient clustering hierarchy (GAECH) protocol, the purpose of this protocol is to reduce the overall network energy and extends the life span of the network. Their results have shown great increase in network life time compared to other renowned clustering protocols especially when the BS is located outside the network area. The authors in  have proposed GAEEC, GA based energy efficient clustering, which divided the network into optimal number of clusters utilizing GA, then GA is utilized again to select the suitable node within a cluster as a CH based on their fitness function per protocol round. Their results have shown improvement over the state-of-art LEACH protocol in terms of stable period and network throughput. In , the authors have proposed genetic algorithm-based optimized clustering (GAOC) and multiple data sinks based GAOC (MS-GAOC) for CH optimization in heterogeneous WSN. GAOC is utilized to select the optimal number of CHs in the network by evaluating a cost function that depends on node density, remaining energy and distance to the BS. To overcome the hot-spot problem and saving communication energies to the BS, MS-GAOC is utilized. Their results have outperformed state-of-the-art clustering protocols for different performance metrics viz. network life span, throughput, stability period and remaining energy. The authors in  have proposed (FCM-GA), Fuzzy-C means genetic algorithm, protocol. The protocol has two stages. In the first stage, FCM is utilized to perform clustering based on three objectives including appropriate online energy distribution of each cluster, intra-cluster distances, and distances between CHs and the BS. FCM-GA results have shown great improvement over other protocols viz. direct transmission, MH-FEER, and SHMEER in terms of network life span and network throughput. In , the authors have proposed variants of PEGASIS protocol utilizing genetic algorithm for energy efficient clustering and routing in 3D WSN. GA has been used to build near-minimal length chain for data transmission. New approach for devising CHs based on the distance and remaining energy for better load balancing have been introduced. Their results have shown superiority over normal PEGASIS protocol in terms of first node-die (FND) metric. The authors in  have proposed optimized version of the LEACH protocol (O-LEACH). OLEACH uses GA for finding the optimal route. Their results have shown a reduction of 17.39% in energy consumption. In , the authors have studied the implication of different node deployments on the energy consumption in WSN utilizing k-means and k-medoids clustering algorithms. It has been shown that the uniform and beta distributions of node deployment with k-medoids have the least energy consumption compared to other distributions. The authors in  have proposed genetic algorithm based energy efficient clusters (GABEEC). GABEEC protocol is based on two phases: the setup phase and the steady state phase. In their proposed method, the clusters created during the setup phase are not changed throughout the network. As a result, there are static clusters with dynamically changed CHs in each round. The new CH is chosen utilizing the GA and based on the residual energy of the current CH and its member nodes. GABEEC protocol outperforms the LEACH protocol in terms of extending network life time and throughput. The downside of GABEEC is that it is used in small size network in which all of the node connections utilize the free space model for communication. No long range connections are established between member nodes and CHs and between CHs and the BS. In  the authors have proposed 3D-cluster, a new clustering routing protocol, which innovatively combines node scheduling with routing optimization, in order to optimize network energy consumption and network lifetime. The improved genetic algorithm determines the optimal node set by minimizing the number of working nodes while satisfying the full coverage of discrete point of interest (DPOI). 3D-cluster outperforms other protocols in terms of network lifetime, throughput, and energy consumption, according to simulation results. However, the proposed algorithm is centralized, whereas the network is distributed. Furthermore, the co-frequency interference of sensor nodes is not taken into account in this protocol.
3 Genetic Algorithm Overview
GA is an evolutionary, bio-inspired, meta-heuristic and stochastic optimization algorithm . GA is composed of three operators; selection (reproduction), crossover and mutation. The solution of GA is encoded in a string (binary or decimal) referred to as chromosome or individual. Elements of a chromosome are called genes. All chromosomes should have the same number of genes. A fitness (cost) function is used to evaluate the fitness value of a chromosome. A collection of chromosomes are called population. Fig. 2, Shows a general GA flow diagram .
GA is an iterative algorithm. It applies the three operators at each run to find new chromosome which may yield better solution. The GA operators are [20,21]:
■ Selection: At this stage, chromosomes with higher fitness values (parents) are selected to produce next generation's offspring. Roulette wheel, tournament, rank, and elitism are some examples of the selection operator.
■ Crossover: At this stage, the genetic material of the two selected parents are merged to produce offspring. Single point, double-point and uniform-point techniques are used individually or collectively in this phase.
■ Mutation: At this stage, single or multiple gene modifications are applied to the newly produced chromosomes in order to broaden the search space.
GA is terminated by checking two conditions: achieving a threshold optimal solution or reaching the maximum number of algorithm repetitions. The GA then suggests the best chromosome as a solution.
4 Genetic Algorithm Dynamic Clustering Algorithm
We propose genetic algorithm dynamic clustering (GA-DC) algorithm to improve network reliability, extend the network life time and reduce the overall energy expenditure per protocol round. GA-DC algorithm uses an enhanced fitness function (F) compared to other clustering algorithms. GA-DC fitness function is comprised of the following components.
4.1 Total Energy Expenditure per Protocol Round
In order to extend network life time, energy consumption per protocol round must be minimized. The chromosome with the minimal energy expenditure is chosen as a solution. Total energy expenditure is the sum of the energy consumed within a cluster (EC) and the energy consumed by the CH to transfer the aggregated data to the BS (EBS). For a cluster of n nodes, the energy consumption is given by:
For m clusters in the network, the energy consumption by all CHs is given by:
The total energy (ET) expenditure per protocol round is:
ETX(i, CH) represents the consumed transmitted energy of the ith node to its corresponding CH. (n − 1) ERX represents the energy consumed by the CH while receiving (n − 1) data reports from its member nodes. EDA represents the energy consumed by the CH for aggregating the (n − 1) reports into single data packet to be sent to the BS. ETX(j, BS) shows the transmitted energy consumed by the jth CH to the BS. This paper uses the simple node's radio energy model [22,23]. In this model ETX depends on the distance between the transmitter and the receiver to switch between the free-space and multiple-path amplification models and the number of bits (k) in a data report. ETX is given by:
d0 is the distance threshold to swap between the free-space and multiple-path models. d0 is given by .
4.2 Clustering Distance
The clustering distance (dC), is the sum of the distances from member nodes to their corresponding CHs and the distances from CHs to the BS per protocol round as shown in Fig. 3. dC is given by:
d(i, CH(j)) and d(CH(j), BS) represent the Euclidean distance from ith member node to its corresponding jth CH, and the distance from the jth CH to the BS respectively.
The total distance in the network (dT) is the sum of distances of all nodes (N) to the BS and is given by:
4.3 Number of CHs
The number of CHs (m) plays an important role in extending the network's life time. As previously stated, CHs consume more energy than other member nodes. Then, it is critical to reduce the number of CHs to a bare minimum.
4.4 Number of Multiple Path Propagation Connections
A node can connect to either a CH or to the BS. If the distance between a CH or the BS is greater than a threshold value (d0), a multiple-path amplification model is used. Moreover, multiple-path amplification model uses more energy for sending data reports compared to free-space model, see Eq. (4). Multiple-path connections (Cmp) is the sum of the multiple-path connections within a cluster (CmpBS) and the multiple-path connections to the BS (CmpBS), as shown below.
As the value of the Cmp increases more energy is consumed and the life time of network decreases. Then, it is critical to keep the number of multiple-path connections to a minimum.
The fitness function (F) of GA-DC protocol is a function of four independent variables and is given by:
ω1, ω2, ω3 and ω4 represent constant coefficients weights of the four fitness parameters. Their values are application-dependent.
GA-DC protocol has two working phases, cluster formation and data collection. The following explains the two phases.
■ Cluster formation: At this phase GA-DC algorithm runs in the BS since it knows the geographical location of all nodes. Binary encoding technique is utilized in the GA-DC algorithm. In this approach, CHs are encoded as binary ‘1’ and other member nodes as binary ‘0’. An example of binary encoding is shown in Fig. 4, in which sensor nodes S4 and S9 are considered CHs while others are considered member nodes connecting to one of these two CHs in a particular protocol run. GA-DC runs all the GA operators to a randomly generated initial population till the termination condition is satisfied. After that, the best fit chromosome solution is produced and that represents the new cluster configuration. Then, the BS will distribute the cluster settings to the network's nodes.
■ Data processing and transfer: At this phase data collected by MNs are transferred to the CHs. Then, CHs process, aggregate and compress the collected data into packets before being transferred to the BS. CHs organize data collection from their MNs through TDMA scheduling while sending their data reports utilizing CDMA technique.
Algorithm 1 shows the pseudo code for the GA-DC algorithm.
5 Performance Metrics and Experimental Setup
In this section a number of network metrics are utilized to evaluate the performance of the GA-DC algorithm for various network configurations. These metrics include:
■ Network life time (NLT): Represents the time duration from the beginning of the network to the death of the last node. NLT is usually measured by the number of protocol rounds till the network is down.
■ Energy metric: Represents a number of energy parameters to measure. It includes, total remaining energy per protocol round, average energy consumption per protocol round.
■ Stability period: Represents the time interval (number of rounds) till the death of the first node in the network, usually referred to as the network reliability measure. The time interval from the death of the first node (FND) to the death of the last node (LND) is referred to as the instability period.
■ Network throughput: Represents the accumulated data reports reaching the BS during the network life time.
Simulation and algorithm implementation are done using MATLAB. A number of simulation scenarios are implemented. The first scenario compares the performance of the GA-DC algorithm to the state-of-the-art LEACH protocol in a 3D network setting with the BS in the middle of the network and a uniform node distribution. The second scenario involves evaluating the performance of the GA-DC algorithm when different node distributions are used. In this paper, three common node distributions are used viz. uniform, normal and exponential. Most of the literature assumes that nodes are uniformly distributed in a 2D environment. The actual location of a node is specified by generating a uniform random number in the interval [0,1] and then multiplying it be the network dimensions. Uniform distribution does not require any location information to determine a node position. All network locations are equally likely to be chosen as a potential node position. The need to define distribution parameters is a major issue when distributing network nodes according to other non-uniform distributions. Normal node distribution requires the sample mean and standard deviation, ( , ), as the defining parameters to distribute the nodes accordingly. Exponential node distribution requires the sample rate parameter, ( ), as the defining parameter to distribute the nodes accordingly. The maximum likelihood estimation (MLE) technique is used to estimate the defining parameters of the normal and exponential distributions using the uniform distribution's node locations. Appendix (A) describes the mathematical foundation of the MLE technique for computing the defining parameters of the normal and exponential node distributions.
Three dimensional (3D) networks are more realistic than 2D networks. The third dimension is important in determining the CHs to connect with. It also important in calculating the energy expenditure budget during protocol round. It is shown that 3D networks life span is less than the life span of the 2D networks . Fig. 5 depicts an example of the three node distributions on a 3D network. The BS is shown in the middle of the network.
Tab. 1 shows the parameter settings used in the simulation.
6 Simulation Results
Two sets of simulation scenarios are investigated in this section. The first scenario compares and evaluates the performance of the GA-DC algorithm with that of the well-known LEACH algorithm for clustering and routing in a 3D network with uniformly distributed nodes. Figs. 6 and 7 show the distribution of the number of CHs during network operation. A number of observations are noticed regarding these two figures. First, the range of the number of CHs for the LEACH algorithm is [1 19], while the range in the GA-DC algorithm is [3 14]. The huge gap in the LEACH algorithm require the network to consume more energy for intra-cluster data communications especially when the number of CHs is one or two, and in inter-cluster data communications when the number of CHs is more than fourteen. The percentage of protocol rounds that uses one, two and fourteen and more CHs in LEACH based network is around 19%. This huge percentage contributes to minimizing the life span of the network compared to the GA-DC based network. The optimal number of CHs used in LEACH and GA-DC algorithms during the network operation are 8 and 4 respectively. They contribute to 10.5% and 28.5% of the number of CHs used during the network life time.
Fig. 8 depicts the total remaining energy for the two protocols. It clearly shows that the LEACH protocol performs better than the GA-DC protocol up to an equilibrium point (P0) in which the GA-DC algorithm outperforms the LEACH algorithm. P0 represents the point at which the remaining energy in the network for the two algorithms is the same. P0 occurs at round number 738 in which 11.5% of the energy is available. GA-DC algorithm utilizes the remaining energy efficiently to extend the network life time.
Fig. 9 shows the packet throughput for the two clustering algorithms. The GA-DC algorithm has much higher throughput than the LEACH algorithm because it extends the network's life time compared to the LEACH protocol. When compared to the LEACH algorithm, the GA-DC algorithm increases throughput by 55%.
Fig. 10 shows the network life time (NLT) of the two algorithms. NLT is defined as the sum of the stability and instability periods of the clustering algorithm. It is shown that the stability period of the LEACH protocol is larger than the stability period of the GA-DC algorithm. This indicates that LEACH has more clustering reliability than the GA-DC algorithm. On the other and GA-DC algorithm has a better NLT because of the larger instability period compared to the LEACH algorithm. This shows that the network can still send data reports about the environment regardless the death of most of the network nodes. It is a trade-off between reliability and NLT. The stability periods of the LEACH and GA-DC algorithms comprise 38.3% and 12.7% of the NLT, respectively.
The second scenario evaluates and compares the performance of GA-DC algorithm for three different node distributions. The goal of this scenario is to determine which node distribution is best in terms of energy consumption and extending the NLT. Fig. 11 shows the frequencies of the CHs per node distribution. It shows that the most used number of CHs in uniform and normal distributions is 4 which comprise 28.5% and 28% of the number of clusters formed respectively. The most used number of CHs in exponential distributions is 5 and it comprises 27.8% of the number of clusters formed. It is clearly shown that 4 is the best number of clusters used for extending the life time for both the uniform and normal distributions while 5 is best number of clusters used in exponential distribution.
Fig. 12 shows that the uniform and normal node distributions consumed nearly the same amount of energy during the network's lifetime, with the uniform distribution consuming slightly less energy than the normal distribution. Exponential distribution consumes energy the most since it has less total remaining energy during the network life time.
Fig. 13 depicts that the exponential distribution has the least stability region which indicates a low reliability of the clustering process compared to the stability regions of the uniform and normal distributions. All three distributions favor increasing the NLT over clustering process reliability by extending the instability region. The stability regions end at round 142, 240 and 250 for the exponential, normal and uniform distributions respectively. This shows that GA-DC algorithm over uniform distribution is more reliable than the other distributions and this is important because when the node population begins to decline, the number of CHs per round becomes unstable (lower than intended), and there is no guarantee that an optimal number of CHs will be generated per round. Furthermore, because there are fewer alive nodes, the field is sampled (sensed) over fewer nodes than intended.
Fig. 14 shows the cumulative packet throughput reached the CHs by the MNs. During this phase, the CHs aggregate and compress the packets received from their corresponding MNs before transmitting them to the BS. When compared to the exponential distribution, the uniform and normal node distributions have the most packets transmitted from the MNs to the CHs, with the uniform distribution having the most packets because it manages its energy more efficiently.
Fig. 15 shows that the uniform distribution outperforms the other two distributions in terms of the total number of packet reports reached the BS. This is the useful throughput that gives information about the environment. The uniform distribution's high throughput performance is due to its ability to manage energy efficiently through the optimal selection of CHs over the network's lifetime.
Tab. 2 summarizes the different network metrics for the three node distributions used in this work.
Tab. 3 shows the improvement ratio (γ) of utilizing the uniform distribution in GA-DC algorithm over the other two distributions. The improvement ratio (γ) is given by
where PIu refers to any performance indicator (NLT, FND and throughput) under the uniform distribution, PIo refers to any performance indicator under other distributions.
In this work, the GA-DC algorithm is proposed for 3D network setting. The GA-DC algorithm is based on binary GA for distinguishing CHs from MNs. When compared to the state-of-the-art LEACH protocol, the GA-DC algorithm increased network life time by 80% and data throughput by 55%. The GA-DC algorithm demonstrated that the uniform distribution of network nodes in a 3D setting outperformed the normal and exponential distributions in terms of network life time, reliability, and network throughput. When compared to the normal distribution, the uniform distribution improved the NLT by 5.7%, reliability measure by 4.2% and throughput by 10.85%. Furthermore, when compared to the exponential distribution, the uniform distribution improved NLT by 7%, reliability measure by 76% and throughput by 19.54%. The GA-DC algorithm employs a new fitness function that implements various parameters to compute the cost value of the proposed solution. The parameters include information about the proposed solution's energy consumption, inter and intra clustering distances, and short and long range wireless link connections.
Data Availability: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding Statement: This research was supported by Al-Zaytoonah University of Jordan fund. Project Number 15/12/2019–2020.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
Maximum Likelihood Estimation (MLE)
MLE technique computes the defining parameter(s) of a random distribution by maximizing the likelihood function, L, of the samples of interest. L(θ; x1,…, xn) of n samples is defined by:
where θ is the parameter(s) of the distribution of interest and fX(x1,…, xn; θ) is the joint probability function from which the samples are drawn. Assuming that the random samples are independent, identically distributed (IID) random variables, then L is simplified to:
To find the value of θ that maximize the L function, the derivative of the L function is computed and set to zero.
It is usually easier to work with ln[L(θ)] since the maxima of the L(θ) and ln[L(θ)] occur at the same θ value especially when working with distributions that utilize an exponential function in their distribution.
A. Estimating Normal Distribution Parameters
For n independent random samples x1,…, xn, the L function is given by:
Taking the natural logarithm of both sides of Eq. (13).
To find the estimators for μ and σ that maximize the L function.
Solving Eqs. (15) and (16) for μ and σ we get:
B. Estimating Exponential Distribution Parameter
For n independent random samples x1,…, xn, the L function is given by:
Taking the natural logarithm of both sides of Eq. (19).
To find the estimator for λ that maximize the L function we solve the following.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|