With the rapid development and popularization of new-generation technologies such as cloud computing, big data, and artificial intelligence, the construction of smart grids has become more diversified. Fast and accurate reading and classification of residential electricity consumption gives a deeper view of how residents actually use power, which is essential for the normal operation of the power system and for energy management and planning. Based on the distributed architecture of cloud computing, this paper designs an improved random forest method for classifying residential electricity consumption. The method uses the out-of-bag error of the random forest together with the fruit fly (Drosophila) optimization algorithm to tune the internal parameters of the random forest, thereby improving its performance. The improved random forest model is trained with MapReduce on the cloud computing platform, and the trained model is then used to analyze the residential electricity consumption data set and divide all residents into 5 categories. The effectiveness and feasibility of the model are verified through experiments.
With the ongoing urbanization of China, the number of urban residential quarters is increasing. The total amount of electricity consumed by urban residents is also increasing, which brings new challenges to the stable operation of the power grid. Categorizing the electricity consumption of residential users can help the power supply side manage and distribute resources better and reduce power loss [
There have been many analyses and studies on residential electricity consumption. Song et al. [
Based on this, this paper designs a random forest method for classifying residential electricity consumption, built on the distributed architecture of cloud computing. The method first uses the fruit fly optimization algorithm to improve the random forest model, and then uses MapReduce to train the improved model on the cloud computing platform. The trained model is then used to analyze the residential electricity consumption data set, and all residents are divided into 5 categories. The effect of the model is verified through experiments.
Cloud computing is a concept first proposed by Google [
The overall framework of Hadoop is shown in
Random forest is an ensemble classification algorithm composed of several decision trees based on statistical learning theory [
Suppose a random forest
When constructing a random forest, an out-of-bag data set will be generated. The out-of-bag error of the out-of-bag data set can be used to measure the performance of the model. The random forest out-of-bag error formula is as follows:
The function
According to random forest
The larger the margin value, the more reliable the classification prediction.
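The out-of-bag error and the margin can be illustrated with a small sketch. It assumes the standard definitions (the error is the share of samples misclassified by the majority vote of the trees that did not train on them; the margin is the vote share of the true class minus the largest vote share of any other class), which may differ in notation from the formulas of this paper. The function names and the toy votes are hypothetical.

```python
from collections import Counter

def oob_votes(tree_predictions, in_bag):
    """For each sample, count the votes of the trees that did NOT train on it."""
    n_samples = len(in_bag[0])
    votes = [Counter() for _ in range(n_samples)]
    for preds, bag in zip(tree_predictions, in_bag):
        for i in range(n_samples):
            if not bag[i]:  # sample i is out-of-bag for this tree
                votes[i][preds[i]] += 1
    return votes

def oob_error(votes, labels):
    """Fraction of samples whose out-of-bag majority vote is wrong."""
    scored = wrong = 0
    for vote, label in zip(votes, labels):
        if not vote:  # sample was in every bag, so it has no OOB vote
            continue
        scored += 1
        if vote.most_common(1)[0][0] != label:
            wrong += 1
    return wrong / scored if scored else 0.0

def margin(vote, label):
    """Vote share of the true class minus the best share of any other class."""
    total = sum(vote.values())
    if total == 0:
        return 0.0
    other = max((c for k, c in vote.items() if k != label), default=0)
    return vote[label] / total - other / total

# Toy example (assumed data): 3 trees, 4 samples, 2 classes.
labels = [0, 1, 1, 0]
tree_predictions = [[0, 1, 0, 0], [0, 1, 0, 1], [0, 1, 1, 0]]
in_bag = [[True, False, False, False],
          [False, True, False, False],
          [False, False, True, False]]

votes = oob_votes(tree_predictions, in_bag)
error = oob_error(votes, labels)
margins = [margin(v, y) for v, y in zip(votes, labels)]
```

In this toy case the third sample is misclassified by both of its out-of-bag trees, so it has a negative margin, while unanimously correct samples reach the maximum margin of 1.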
Since the collected electricity consumption data contain problems such as missing values and irregularities, the data must be preprocessed. Preprocessing is divided into two steps: missing data completion and data standardization.
Use interpolation method to complete the data, set the original data set as
If continuous data is missing, for example,
Suppose the data set after missing-value completion is
The data set after preprocessing is
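The two preprocessing steps can be sketched as follows. This is only an illustration under assumptions: gaps are filled by linear interpolation between the nearest known readings (edge gaps copy the nearest value), and standardization is done with min-max scaling. The paper's exact interpolation and standardization formulas may differ, and the function names are hypothetical.

```python
def fill_missing(series):
    """Fill None gaps by linear interpolation; edge gaps copy the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            # Works for single gaps and for runs of consecutive gaps.
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled

def min_max_scale(series):
    """Rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]

# Hourly readings in kWh with a two-hour gap and a missing last value.
raw = [1.0, None, None, 4.0, None]
completed = fill_missing(raw)
scaled = min_max_scale(completed)
```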
This paper uses the out-of-bag error of the random forest to optimize its internal parameters and thereby improve the performance of the random forest model. Assume that the random forest model
We use the fruit fly (Drosophila) optimization algorithm to optimize the internal parameters of the random forest model [
Suppose the size of the fruit fly population is
where
Then calculate the rate of change of
According to the change rate, the optimal step weight
Calculate the adaptive optimization route of the fruit fly:
then:
Repeat several times until the maximum number of iterations
Get the drosophila
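The search loop can be sketched in simplified form. The sketch below is not the adaptive-step variant described above: it searches the parameter space directly, uses a fixed step decay in place of the change-rate-based step weight, and replaces the out-of-bag error of a real forest with a toy objective. The names `fruit_fly_search`, `toy_oob_error`, the bounds, and all default values are assumptions.

```python
import random

def fruit_fly_search(objective, start, bounds, pop_size=20, max_iter=50,
                     step_frac=0.2, decay=0.95, seed=0):
    """Simplified fruit fly search that minimizes `objective` within `bounds`."""
    rng = random.Random(seed)
    swarm = list(start)            # current swarm location
    best_val = objective(swarm)
    for _ in range(max_iter):
        # Each fly takes a random step away from the swarm location.
        flies = []
        for _ in range(pop_size):
            fly = []
            for x, (lo, hi) in zip(swarm, bounds):
                step = step_frac * (hi - lo)
                fly.append(min(max(x + rng.uniform(-step, step), lo), hi))
            flies.append(fly)
        # Evaluate every fly and move the swarm to the best one if it improves.
        scores = [objective(f) for f in flies]
        i = min(range(pop_size), key=scores.__getitem__)
        if scores[i] < best_val:
            best_val, swarm = scores[i], flies[i]
        step_frac *= decay         # shrink the search radius over time
    return swarm, best_val

def toy_oob_error(params):
    """Stand-in for the out-of-bag error as a function of two forest parameters."""
    n_trees, n_features = params
    return ((n_trees - 120) / 100) ** 2 + ((n_features - 6) / 10) ** 2

BOUNDS = [(10, 500), (1, 24)]      # assumed ranges: tree count, features per split
START = [10.0, 1.0]
best_params, best_error = fruit_fly_search(toy_oob_error, START, BOUNDS)
```

In practice the objective would train a forest with the candidate parameters (rounded to integers) and return its out-of-bag error, which is far more expensive than the toy function used here.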
On the Hadoop platform built above, the MapReduce module is used to build the random forest model. The steps are as follows:
(1) The random subspace sampling method is used to sample multiple sets of characteristic attributes.
(2) The data is distributed among the distributed computing nodes.
(3) Each Map node executes the program, reading and mapping the information on its characteristic attribute set.
(4) After all the Map work is completed, each Map node passes its information to the Reduce node. The Reduce node merges the information, performs iterative calculations on the data with the same key, and calculates the nodes of the first layer of the different decision trees.
(5) Steps 2 to 4 are executed iteratively until all decision trees are constructed.
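The Map and Reduce roles in these steps can be imitated in a single process. The sketch below does not use the Hadoop API. It assumes categorical attributes, a (feature index, value) key, and the Gini index as the split criterion, none of which the text above specifies; all names and data are hypothetical.

```python
from collections import Counter, defaultdict

def map_partition(rows, feature_subset):
    """Map step: emit ((feature, value), label) pairs for one data partition."""
    for features, label in rows:
        for f in feature_subset:
            yield (f, features[f]), label

def reduce_counts(pairs):
    """Reduce step: merge the label counts of all pairs that share a key."""
    counts = defaultdict(Counter)
    for key, label in pairs:
        counts[key][label] += 1
    return counts

def gini(label_counts):
    """Gini impurity of one group of labels."""
    total = sum(label_counts.values())
    return 1.0 - sum((c / total) ** 2 for c in label_counts.values())

def best_root_feature(counts):
    """Pick the feature whose value groups have the lowest weighted Gini."""
    by_feature = defaultdict(list)
    for (f, _value), label_counts in counts.items():
        by_feature[f].append(label_counts)

    def weighted_gini(groups):
        n = sum(sum(g.values()) for g in groups)
        return sum(sum(g.values()) / n * gini(g) for g in groups)

    return min(by_feature, key=lambda f: weighted_gini(by_feature[f]))

# Two partitions, as if held by two Map nodes (toy data).
partitions = [
    [([1, 0], "a"), ([1, 1], "a"), ([0, 0], "b")],
    [([0, 1], "b"), ([1, 0], "a"), ([0, 1], "b")],
]
feature_subset = [0, 1]            # attribute subset sampled for this tree

pairs = [p for part in partitions for p in map_partition(part, feature_subset)]
counts = reduce_counts(pairs)
root_feature = best_root_feature(counts)
```

Here feature 0 separates the two labels perfectly, so it is chosen for the root node. Deeper layers would repeat the same map and reduce pass on the rows routed to each child node.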
To verify the performance of the algorithm, this paper compares it with a random forest model running on a single server. Electricity consumption data were collected from 200 households in a community in Shanghai from August 2019 to August 2020, with each user's consumption recorded every hour. Smart electricity collection equipment is installed in the residential houses of the community, and the consumption data of high-power appliances in the home, such as washing machines, refrigerators, and air conditioners, are transmitted wirelessly (433 MHz) to the home smart gateway to complete the data collection task. According to the daily power consumption trend, users are divided into eight categories. Category 1 is users with high power consumption throughout the day, and category 2 is users whose consumption peaks from 6 am to 9 am. Category 3 is users whose consumption peaks from 9 am to 12 noon, Category 3 is users whose consumption peaks from 12 noon to 3 pm, and Category 4 is users whose consumption peaks from 3 pm to 5 pm. Category 5 is users whose consumption peaks from 5 pm to 7 pm, category 6 is users whose consumption peaks from 8 pm to 10 pm, category 7 is users whose consumption peaks from 10 pm to 1 am, and category 8 is users with low power consumption throughout the day. 70% of the collected electricity consumption data is used as the training set, and the remaining 30% as the test set.
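The 70/30 split can be sketched as follows. The text does not say whether the data are split by household or by individual record; this sketch assumes a random split by household. The function name and seed are hypothetical.

```python
import random

def split_households(household_ids, train_fraction=0.7, seed=42):
    """Shuffle household ids reproducibly and split them into train and test."""
    rng = random.Random(seed)
    shuffled = list(household_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

household_ids = list(range(200))   # 200 households, as in the experiment
train_ids, test_ids = split_households(household_ids)
```

Splitting by household keeps all readings of one user on the same side of the split, which prevents the same user from appearing in both the training and test sets.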
To verify the performance of the model, the classification accuracy of the model under different data sets was tested
In order to verify the superiority of the model, the
To help the power supply side manage and distribute resources better, it is necessary to classify the electricity consumption of residential users. Based on the distributed architecture of cloud computing, this paper uses an improved random forest model to design a residential electricity classification method. The method first uses the fruit fly optimization algorithm to improve the random forest model, and then uses MapReduce to train the improved model on the cloud computing platform. The trained model is then used to analyze the residential electricity consumption data set, and all residents are divided into 5 categories. The effect of the model is verified through experiments.
This work was supported by the I6000 migration to the cloud micro-application pilot construction project of the Information and Communication Branch of State Grid Anhui Electric Power Co., Ltd., Technical project (contract number: SGAHXT00XYXX2000121).