Industrial Centric Node Localization and Pollution Prediction Using Hybrid Swarm Techniques

Computer Systems Science & Engineering, Tech Science Press. Article. DOI:10.32604/csse.2022.021681
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract: Major fields such as military applications, medicine, weather forecasting, and environmental monitoring use wireless sensor networks for major computing processes. Sensors play a vital role in emerging technologies of the 20th century. Localizing sensors at the locations where they are needed is a serious problem. The environment is home to every living being in the world, and the growth of industries after the industrial revolution has increased pollution across it. Owing to recent uncontrolled growth and development, sensors are needed to measure pollution levels in industries and their surroundings. Choosing where to fit the sensors is an interesting and challenging task. Many meta-heuristic techniques have been introduced for node localization, and swarm intelligence algorithms have proven their efficiency in many studies on localization problems. In this article, we introduce an industrial-centric approach to solve the problem of node localization in the sensor network. First, our work selects industrial areas in the sensed location, using random forest regression to select the polluted area. Then, the elephant herding algorithm is used for sensor node localization. The two algorithms are combined to localize the sensor nodes. To check the performance of the proposed method, experiments are conducted with data from the KDD Cup 2018, which contains the names of 35 stations with concentrations of air pollutants such as PM, SO2, CO, NO2, and O3. These data are normalized and tested with the algorithms. The results are compared with swarm intelligence algorithms such as the elephant herding algorithm and particle swarm optimization, and with machine learning algorithms such as decision tree regression and multi-layer perceptron. The results indicate that our proposed algorithm suggests more meaningful locations for localizing the sensors in the topology. Our proposed method achieves a lower root mean square error, between 0.06 and 0.08, for Stations 1 to 5.

Introduction
The major challenges in our environment are air quality issues, water pollution, and radiation pollution. A healthy society is a main sustainable development goal among countries worldwide. At present, advanced environmental monitoring systems focus on developing technologies such as the internet of things (IoT), parallel computing, sensors, and distributed computing. Under these circumstances, this research article aims to achieve smart placement of sensors and to investigate smart environment pollution prediction, which involves monitoring air quality, water quality, and radiation pollution levels, as well as advanced monitoring of agricultural systems. Wireless sensor nodes are placed and scattered at all locations in the environment for various applications. Localizing, or predicting where to place, sensor nodes is important to cover the sensing area accurately and to reduce redundant sensing of the same environment using parallel computing. In our research problem, we introduce industrial-centric node localization using a decision tree with a swarm intelligence algorithm. Manufacturing industries emit more pollutants. Therefore, more nodes must be placed in such areas with the help of IoT, and fewer nodes when moving away from industries. With this architecture, the cost of sensors is reduced and data prediction accuracy increases.
As the technological and information field grows, wireless sensor networks (WSN) collect huge amounts of data in and around the environment for various applications such as military projects, meteorological applications, medical fields, and security surveillance. These broadly enabled applications drive advancement in WSN. At present, the IoT concept uses WSN for data collection and processing in real time [1]. A collection of sensor nodes with other inexpensive devices forms the WSN infrastructure. This network is used to monitor and detect environmental data for computing [2,3]. Collected data are sent to sink nodes, which are a destination for processing and storing the collected data, and are then sent to the network for various uses and user applications. The WSN has many advantages in parallel computing, deployment, communications, transferring data, and organizing data. Day by day, various challenges are experienced across WSNs. Implementing a WSN involves enormous challenges such as localizing nodes, area coverage, energy consumption, and transfer time in sensors. Among these challenges, node localization is a very important and basic problem to address.
Validation of sensor node localization is very important to collect data effectively. If sensors are not localized at the correct or needed locations, information is missed or data are misread during processing. Some nodes cannot be reached because of climatic factors and inaccessible environments, which creates communication and network structure problems when deploying nodes in visible and needed places for future applications. GPS can position sensor nodes in most places without constraints, but it leads to higher energy consumption and a high cost of transferring data. Node localization can be defined as placing the sensor nodes in exact and needed places. The triangulation technique is suggested in previous studies for node localization [4]; other methods used so far include arrival time [5], radio signals [6][7][8], and arrival angles [9].
The location of a GPS system can be precisely identified with the help of anchor nodes. GPS and non-GPS sensor nodes have their own localization methods based on range-based approaches [10]. These range-based algorithms identify the node distances with the help of angle measurements with unknown nodes. Here, the position of the unknown nodes is only known. This problem has also been addressed using a triangulation technique to identify the coordinates. For topological data, range-free algorithms are used to identify target nodes. Past research showed that these algorithms are not economically affordable to all users [11][12][13]. In indoor environments, which have walls, the humidity and temperature are totally different, and range-based localization cannot be used. In this case, nodes within buildings must be localized wirelessly.
The main goal in node localization is to place sensor nodes accurately at needed locations. In this article, WSN places sensors for pollution prediction. Previous research mainly focuses on cost-effective sensor node localization. Industries are commonly acknowledged to be the main sources of pollution in every location. Our contributions in this research paper are as follows.
1. Industrial areas are identified by pollution level using random forest regression techniques.
2. Once the location is identified, we use the elephant herding algorithm to decide the number of sensors and their locations in that industrial area.
3. Fringes of or places far away from industries do not need the number of sensors that we place near industrial areas.
4. This focus can reduce the cost of sensors by reducing extra or wasted sensor locations.
Identifying accurate locations for the sensor nodes is also an NP-hard problem. In this article, we only attempt an industrial-centric approach. This paper is organized into five chapters. Chapter 2 provides a literature survey. Chapter 3 gives details of the proposed methodology and its implementation strategy. Chapter 4 evaluates the results on the data sets. Chapter 5 concludes.

Literature Survey
The atmosphere contains different pollutants from industries, vehicles, and automobiles, such as PM2.5, PM10, ozone, sulfur dioxide, and nitrogen oxides. These pollutants are spread largely by manufacturing. Pollution causes irreparable damage to the environment and to health. Many studies have demonstrated the health effects of pollution, necessitating technological solutions to control and monitor the concentration of various gases in industries as well as in their surroundings. Various sensor solutions have been used for environmental quality assessment. At present, more sensor technologies are in use, but they are expensive. The lack of low-cost techniques and the poor quality of the data make processing very difficult.
Next, different swarm intelligence algorithms used in node localization problems are reviewed. Solving NP-hard problems is very important. This case is also reviewed in the following section. Localizing the WSN is considered a real-life problem as an optimization task. Domains that focus on optimization problems are metaheuristic and artificial intelligence algorithms. Practically, NP-hard problems are addressed using metaheuristic solutions.
Nature-inspired metaheuristic algorithms can be classified into evolutionary and swarm intelligence algorithms. The genetic algorithm (GA) is a commonly used evolutionary algorithm for various NP-hard problems and for WSN. GA has been used with a range-free distributed algorithm for node localization in 3D WSN [14]. GA-based localization techniques provide accuracy with unknown sensor nodes in WSN [15]. The first swarm intelligence method described in our review is particle swarm optimization (PSO) [16]. The search technique of PSO simulates schools of fish and flocks of birds. This technique is widely applied in WSN node localization. Problems in WSN localization are addressed by swarm-based topologies and variants of PSO to find an optimal solution [17]. Velocity PSO [18,19] improves the accuracy of localization in the WSN. Hybrid PSO in [20] improves localization performance.
The artificial bee colony (BC) optimization algorithm is used for NP-hard localization problems. In [21], BC localization is used to optimize the nodes in the network. The firefly algorithm is an effective optimization algorithm among swarm intelligence techniques. It uses the light properties of fireflies to address localization. Many hybrid versions of the firefly algorithm are used in WSN localization [22]. Firefly concepts are also used in the virtual node projection of anchor nodes to analyze target nodes for localization [23]. A novel swarm intelligence approach, the monarch butterfly optimization (MB) method, is widely used in solving NP-hard problems. It was first introduced in 2015 by Wang et al. [24]. The MB algorithm is highly effective in optimizing NP solutions. WSN localization that uses MB produced accurate results. MB also addressed the multi-stage localization of WSN nodes in the network [25]. Moth search optimization is another swarm intelligence technique for WSN node localization. It is based on phototaxis and Lévy flights, and it has proved its performance in solving benchmark problems [26]. Moth search is used in real-time problems such as machine movement, drone control, and node localization in WSN [27]. Hybridized technologies in WSN have also been tested and implemented [28]. The 5G cloud computing environment [29] and adversarial network generation [30] address next-generation demands for parallel computing. Low energy consumption [31] with image color recognition [32] are reconfigured for the network connectivity actions.

Proposed Hybrid RFR-EHO Methodology
Our goal is to place the nodes, which are similar to IoT devices, based on an industrial-centric environment. Placing more nodes in empty areas without any industries wastes sensor cost. In this proposed work, we use random forest regression to decide the number of nodes, based on parallel processing of the algorithm's decision on whether an area is industrial or non-industrial. Then, the swarm intelligence technique called elephant herding optimization (EHO) is used for node localization, placing the sensors so as to increase the accuracy of data prediction. A previous study [33] showed that random forest regression predicts air quality accurately with a low error rate and that, among the various swarm intelligence algorithms for WSN localization, EHO is efficient and robust.
Our proposed work is smart environment pollution prediction on industries based on random forest regression for decision-making on the type of area and elephant herding to localize the sensors in that area. The overall workflow of the proposed work is diagrammatically presented in Fig. 1.

Data Gathering
The data used for this prediction model are gathered from various heterogeneous nodes, similar to IoT devices, that are connected to the areas. The industrial pollutants are ozone (O3), nitrogen dioxide (NO2), carbon monoxide (CO), sulfur dioxide (SO2), and particulate matter (PM). These pollutant data are collected and pre-processed because the format of each pollutant may vary. Pre-processing filters out unnecessary information and makes the data ready for the next level of processing. In this stage, the missing values are handled and the data are normalized.
Step 1: For each row
Step 2: // Split the data based on the collected area: create a file called area, and add the concentrations that were collected in that area
[...]
Step 5: End
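The pre-processing stage described above can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the source does not say how missing values are handled or which normalization is applied, so mean imputation and min-max scaling are assumed here, and the function names and the `station` key are hypothetical.

```python
from collections import defaultdict


def fill_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_by_area(rows, area_key="station"):
    """Group records by the area (station) in which they were collected."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[area_key]].append(row)
    return dict(groups)


def preprocess(rows, pollutant_keys, area_key="station"):
    """Impute and normalize each pollutant column, then split the rows by area."""
    cleaned = [dict(row) for row in rows]  # work on copies, keep the input intact
    for key in pollutant_keys:
        column = min_max_normalize(fill_missing([r.get(key) for r in cleaned]))
        for record, value in zip(cleaned, column):
            record[key] = value
    return split_by_area(cleaned, area_key)


rows = [
    {"station": "A", "PM": 10.0, "CO": 1.0},
    {"station": "A", "PM": None, "CO": 3.0},
    {"station": "B", "PM": 30.0, "CO": 2.0},
]
by_area = preprocess(rows, ["PM", "CO"])
```

Normalizing over the whole data set before splitting keeps the stations comparable with each other; normalizing per station instead would hide differences in absolute pollution level.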

Prediction Analysis (Random Forest Regression)
Random forest is an ensemble algorithm built from decision trees. Each tree is generated from a sample of the training data. When generating a tree, the best split at each node is chosen from a random subset of the features rather than from all of them. The process of random forest regression is shown in Fig. 2. Owing to this random choice, the bias may increase and the variance may decrease. In this proposed work, on the basis of random forest regression, a polluted area that has a high air quality index (AQI) is considered an industrial area, and one with a low AQI value is considered a non-industrial area.
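The idea of bootstrap samples combined with random feature subsets can be illustrated with a deliberately small sketch. It is not the authors' implementation: each "tree" is reduced to a one-split regression stump, splits are scored by squared error, and all function names and parameter values are hypothetical.

```python
import random


def fit_stump(xs, ys, feature_ids):
    """Find the one-feature threshold split with the lowest squared error."""
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= threshold]
            right = [y for x, y in zip(xs, ys) if x[f] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((y - left_mean) ** 2 for y in left)
                   + sum((y - right_mean) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    if best is None:  # no valid split exists: always predict the sample mean
        mean = sum(ys) / len(ys)
        return (0, float("inf"), mean, mean)
    return best[1:]


def fit_forest(xs, ys, n_trees=25, n_features=2, seed=0):
    """Train stumps on bootstrap samples, each restricted to a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]          # bootstrap sample
        feats = rng.sample(range(len(xs[0])), n_features)   # random feature subset
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], feats))
    return forest


def predict(forest, x):
    """Average the predictions of all stumps (the regression vote)."""
    preds = [lm if x[f] <= t else rm for f, t, lm, rm in forest]
    return sum(preds) / len(preds)


# Toy data: three normalized pollutant features per record, AQI as the target.
xs = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4], [0.8, 0.9, 0.7], [0.9, 0.8, 0.6]]
ys = [50.0, 60.0, 250.0, 260.0]
forest = fit_forest(xs, ys)
low_area, high_area = predict(forest, xs[0]), predict(forest, xs[3])
```

Averaging many weak, decorrelated predictors is what reduces the variance; a full implementation would grow deeper trees instead of single splits.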
The AQI value is calculated using Eq. (1) from China's Environmental Protection Ministry. AQI is the maximum of the individual air quality indices (IAQI) over the air pollutants p.
Algorithm 2
Output: Prediction of the area using the AQI level
Step 1: For T trees
Step 2: Randomly select m features from S
Step 3: For each feature m at each tree node, calculate the information gain (IG) used to split the data set, with entropy $\mathrm{Entropy}(c) = -\sum_{i=1}^{k} p(c_i) \log_2 p(c_i)$
Step 4: Split the data set node using max(IG)
Step 5: Remove the selected features from S
Step 6: Input UD into T and calculate the final probability of the i-th AQI level
Step 7: Predict the area based on the AQI level
On the basis of the air quality index, the polluted area is categorized as not polluted, low, medium, or high. An area with an AQI level of more than 200 is considered an industrial area. For our proposed work, this classification of areas is needed for sensor placement, because heavily polluted areas need more sensors for accurate prediction.
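The entropy used in the splitting step and the AQI decision rule can be sketched as follows. The information gain is assumed here to take its usual form (parent entropy minus the size-weighted entropy of the two child nodes), which the source does not spell out; the function names are hypothetical. The threshold of 200 is the one stated in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy: -sum_i p(c_i) * log2 p(c_i) over the class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(labels, left, right):
    """Parent entropy minus the size-weighted entropy of the two children (assumed form)."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def aqi_from_iaqi(iaqi):
    """AQI is the maximum of the per-pollutant sub-indices (IAQI)."""
    return max(iaqi.values())


def is_industrial(aqi, threshold=200):
    """An area whose AQI exceeds the threshold is treated as industrial."""
    return aqi > threshold


labels = ["low", "low", "high", "high"]
gain = information_gain(labels, ["low", "low"], ["high", "high"])
aqi = aqi_from_iaqi({"PM2.5": 220, "SO2": 40, "NO2": 95})
```

A split that separates the classes perfectly, as in the usage lines, recovers the full parent entropy as gain, which is why Step 4 picks the feature with the maximum IG.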

EHO Algorithm for Node Localization
The localization procedure in WSN involves anchor nodes and unknown nodes. The process has two phases. The first is the ranging phase, in which the algorithm finds the distance between anchor nodes and unknown nodes. In the second phase, a sensor is positioned relative to the nodes using angle of arrival, round trip time, radio signal strength, time of arrival, and time difference of arrival. The localization problem for a network of M sensors is to locate the N unknown nodes using the locations of the M-N anchors and the transmission range. If a sensor lies within the transmission range of three or more anchors, it is considered localized. In this paper, the EHO algorithm is used for the localization problem. This algorithm has been reported to be an effective method for localization in WSN.
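The rule that a sensor counts as localized when three or more anchors are within transmission range can be written as a short check. This is a minimal sketch assuming 2D coordinates and Euclidean distance; the function names are hypothetical.

```python
import math


def anchors_in_range(node, anchors, tx_range):
    """Return the anchors whose Euclidean distance to the node is within range."""
    return [a for a in anchors if math.dist(node, a) <= tx_range]


def is_localizable(node, anchors, tx_range, min_anchors=3):
    """A node is localizable when at least min_anchors anchors are in range."""
    return len(anchors_in_range(node, anchors, tx_range)) >= min_anchors


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (100.0, 100.0)]
near = is_localizable((3.0, 3.0), anchors, tx_range=10.0)
far = is_localizable((90.0, 90.0), anchors, tx_range=10.0)
```

Three anchors are the minimum for an unambiguous position in the plane, since two distance circles generally intersect in two points.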
EHO has been used to solve global optimization problems [34,35]. It is a heuristic search based on the co-existence of elephants in a clan that is guided by a leader, the matriarch, which is the oldest female in the clan. The other members of the clan are females and calves, while the male elephants leave the clan and live separately. The male elephants can still communicate with the clan. This concept can be used for node localization in WSN. On the basis of these structural differences, EHO models two environments: elephants living under the guidance of the matriarch, and males living separately but still in communication with the clan. The first environment is modeled by the clan updating operator and the second by the separating operator. The general structure of the EHO algorithm is shown in Fig. 3. Initially, the population is divided into n clans. In EHO, each possible solution in clan ci is updated from its current position and the matriarch of ci using the updating operation. The population diversity is maintained using the separating operator.

Algorithm 3: EHO localization
Input: Predicted features of the pollutant area from Algorithm 2
Output: Best possible solution of the positions to place sensors
Step 1: Initialization: generate the population from the predicted features of Algorithm 2; divide the population into n clans; calculate the fitness of each individual of the population; initialize the generation counter as c = 1 and the maximum generation (MG)
Step 2: While c < MG do
Step 3: Arrange the solutions based on fitness value
Step 4: For all clans ci do
Step 5: For all solutions j in the clan ci do
Step 6: Compute the updating operation of the clan using Eq. (6): $x_{new,ci,j} = x_{ci,j} + \alpha \times (x_{best,ci} - x_{ci,j}) \times r$
Step 7: Select the better solution between $x_{ci,j}$ and $x_{new,ci,j}$
Step 8: Update the fittest value of $x_{best,ci}$ and generate the new population $x_{new,ci,j}$ using Eq. (7)
Step 9: Select the better solution between $x_{best,ci}$ and $x_{new,ci,j}$
End for
End for
Step 10: For all clans ci in the population do
Replace the worst solution using the separating operator as in Eq. (9)
End for
Step 11: Compute the population and fitness value
Step 12: End while

Figure 3: EHO localization structure (initial population, clan updating operator, separating operator, best current solution, stopping criteria, localization)

On the basis of this EHO, the multi-stage WSN localization for our proposed work is shown in Fig. 4. In multi-stage localization, unknown sensor nodes with three or more anchor neighbors can be localized. In single-stage localization, the unknown node that has more neighbors can be localized. The multi-stage localization of the industrial area results in more sensors placed at highly polluted positions for accurate prediction of air pollution levels.
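The EHO loop can be sketched for a single unknown node as follows. This is a hedged illustration, not the authors' code. The clan update follows Eq. (6). The matriarch update (beta times the clan center) and the separating operator (a fresh random position for the worst elephant) are assumed from the commonly published EHO formulation, since Eq. (7) and Eq. (9) are not reproduced in the text. The fitness function `localization_error`, the parameter values, and the test scenario are hypothetical.

```python
import math
import random


def localization_error(pos, anchors, distances):
    """Mean squared gap between estimated and measured anchor distances (assumed fitness)."""
    return sum((math.dist(pos, a) - d) ** 2
               for a, d in zip(anchors, distances)) / len(anchors)


def eho_minimize(fitness, bounds, dim=2, n_clans=4, clan_size=8,
                 max_gen=200, alpha=0.5, beta=0.1, seed=0):
    """Minimize fitness over a box [lo, hi]^dim with a basic EHO loop."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return min(hi, max(lo, v))

    def random_position():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    clans = [[random_position() for _ in range(clan_size)] for _ in range(n_clans)]
    best = min((p for clan in clans for p in clan), key=fitness)
    for _ in range(max_gen):
        for clan in clans:
            clan.sort(key=fitness)  # clan[0] is the matriarch (fittest member)
            matriarch = clan[0]
            center = [sum(p[d] for p in clan) / clan_size for d in range(dim)]
            for j, x in enumerate(clan):
                if j == 0:
                    # Assumed matriarch update: x_new = beta * x_center
                    cand = [clip(beta * c) for c in center]
                else:
                    # Eq. (6): x_new = x + alpha * (x_best - x) * r
                    cand = [clip(x[d] + alpha * (matriarch[d] - x[d]) * rng.random())
                            for d in range(dim)]
                if fitness(cand) < fitness(x):  # keep the better of old and new
                    clan[j] = cand
            # Assumed separating operator: the worst elephant leaves the clan
            # and is replaced by a new random position inside the bounds.
            worst = max(range(clan_size), key=lambda j: fitness(clan[j]))
            clan[worst] = random_position()
        generation_best = min((p for clan in clans for p in clan), key=fitness)
        if fitness(generation_best) < fitness(best):
            best = list(generation_best)
    return best


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (4.0, 6.0)
measured = [math.dist(true_position, a) for a in anchors]
estimate = eho_minimize(lambda p: localization_error(p, anchors, measured),
                        bounds=(0.0, 10.0))
```

The greedy acceptance steps correspond to Steps 7 and 9 of Algorithm 3, and tracking the best solution across generations ensures that the separating operator never discards the best estimate found so far.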

Data Management
Storing and managing the localized data is an important step for analysis in real-time processing. Given that our proposed work is for smart environment pollution prediction, the data must be stored and processed rapidly. Our processed data are stored in an HDFS system. The stored information is then communicated to the application through an interface. Information about the levels of the polluted areas is sent frequently to the end user, so that people can monitor pollution there and take precautions and decisions accordingly.
Our proposed work finds the air polluted areas based on the air pollutant index with a random forest regression tree, and categorizes the areas affected by pollution as industrial or non-industrial. This process helps to set the sensor count as needed according to air pollution using EHO localization. Such localization improves efficiency and prediction accuracy because more sensors are placed in the most affected areas, leading to improved prediction with a quick response time, so that authorities can take the necessary preventive actions. The aim of this work is to place more sensors in needed areas than in normal areas, creating a smart environment that predicts air pollution with fast response time and high accuracy.

Data Description
The experimental analysis of our proposed smart air pollution prediction environment has been carried out using data from the KDD Cup 2018, which contains the names of 35 stations with concentrations of air pollutants such as PM, SO2, CO, NO2, and O3. The raw data set has missing values and is not normalized. During the data-gathering phase of our proposed work, the raw data are preprocessed to obtain normalized data [36]. The normalized data set is shown in Tab. 2. The algorithms are implemented in Python. The preprocessing stage is done with pandas. The evaluation metrics are calculated using sklearn in Python.

Experimental Evaluation
To evaluate the proposed work with the simulation data set, the error between actual and predicted values is calculated using different measures: mean absolute error (MAE), root mean square error (RMSE), and accuracy. MAE is the average magnitude of the differences between the actual and predicted values, as in Eq. (7). RMSE is the square root of the average of the squared differences between the actual and predicted values, as in Eq. (8).
$\mathrm{MAE} = \frac{1}{n} \sum_{j=1}^{n} |y_j - y'_j|$ (7)
$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (y_j - y'_j)^2}$ (8)
where n is the number of observations, $y_j$ is the actual value, and $y'_j$ is the predicted value.
The data set is split by station and saved as separate files to identify the stations with high air pollution. Each station has six types of pollutant concentrations. Thus, the data set is split into 35 stations with 6 features and 8,886 records. For our evaluation, we consider five stations as core stations and use our proposed algorithm to predict air pollution in a smart way. The error values of the proposed method for these five stations are shown in Tab. 3 and illustrated in Fig. 5. The MAE, RMSE, and accuracy of the proposed smart prediction method for the five stations are computed and reported with minimum MAE and minimum error value. To analyze the performance of the proposed scheme, it is compared with existing air pollution prediction techniques such as decision tree regression (DTR) [33], hybridized elephant herding optimization (HEHO) [5], multi-layer perceptron (MLP) [33], and particle swarm optimization (PSO) [36].
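The MAE and RMSE measures used in this evaluation can be computed in a few lines. The source reports using sklearn for the metrics; the plain-Python version below is only a self-contained sketch with hypothetical function names and made-up sample values.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |y_j - y'_j|."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the average squared difference."""
    return math.sqrt(
        sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)
    )


actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

Because RMSE squares each difference before averaging, a single large error (the third value here) raises it more than it raises MAE.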
The performance of the proposed algorithm compared with the existing algorithms is shown in Tab. 4. The table values show that our proposed algorithm obtains a lower error value than the other existing algorithms. The proposed algorithm obtains an average of 11.916% of MAE and 0.0617 of RMSE, representing a high level of prediction accuracy. These results are also represented graphically. The proposed algorithm performs better in prediction and localization. This method results in a smart environment with a smart air pollution prediction system with quick response. The method also reduces the cost of sensors that would otherwise be unnecessarily placed in less polluted areas.

Conclusion
The present work focuses on localizing nodes and reducing sensor cost through accurate placement using a swarm intelligence algorithm. To achieve efficiency, we first implement the random forest regression technique to classify industrial areas. Then, the sensors in the classified areas are localized using the elephant herding technique. The work has two advantages: it decides the number of sensors to be placed, and it solves node localization. The accuracy of sensor node localization is achieved by working with target nodes. WSN is very important in day-to-day life, and sensor information is important to various information processing applications. Localizing the nodes and knowing how many sensors are to be placed in what type of location is very important for economical computation. This process also helps to avoid redundancy in sensing the same locations in the network. Our approach addresses these problems with a hybridized solution. The performance and error rates are computed for the proposed algorithm, with a mean error value of less than 1. Existing techniques such as PSO, EHO, DTR, and MLP are compared with the proposed RFR-EHO approach. The localization performance is markedly higher than that of the compared techniques. In our article, we focus only on industrial areas, but urban areas also have high pollution. The future scope of this article is to use an industrial- and urban-centric approach using deep learning with the swarm intelligence technique. Well-proven algorithms will be used in this approach.
Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.