The data in Mobile Edge Computing (MEC) contains tremendous market value, and data sharing can maximize the usefulness of the data. However, certain data is quite sensitive, and sharing it directly may violate privacy. Vertical Federated Learning (VFL) is a secure distributed machine learning framework that completes joint model training by passing encrypted model parameters rather than raw data, so there is no data privacy leakage during the training process. Therefore, the VFL can build a bridge between data demander and owner to realize data sharing while protecting data privacy. Typically, the VFL requires a third party for key distribution and decryption of training results. In this article, we employ the consortium blockchain instead of the traditional third party and design a VFL architecture based on the consortium blockchain for data sharing in MEC. More specifically, we propose a V-Raft consensus algorithm based on Verifiable Random Functions (VRFs), which is a variant of the Raft. The V-Raft is able to elect leader quickly and stably to assist data demander and owner to complete data sharing by VFL. Moreover, we apply secret sharing to distribute the private key to avoid the situation where the training result cannot be decrypted if the leader crashes. Finally, we analyzed the performance of the V-Raft and carried out simulation experiments, and the results show that compared with Raft, the V-Raft has higher efficiency and better scalability.
In the Internet of Things (IoT), the massive amount of data collected by devices has to be analyzed, processed, and stored. Mobile Edge Computing (MEC) [
In the training process of the VFL, the third party plays an important role. It generates a public-private key pair, in which the public key is distributed to the participants for homomorphic encryption [
Consensus algorithm is the core of blockchain, which can ensure the consistency of distributed nodes. Raft [ We modify the Raft based on verifiable random functions and propose a new consensus algorithm, V-Raft. The V-Raft utilizes a method of sortition for leader election, which enhances the election speed and scalability. In addition, the V-Raft adds the committee state, which reduces node consensus time and replacement time. We introduce secret sharing to distribute the private key to avoid data sharing failure due to the leader crashes. The leader assists the data demander and owner in VFL and decomposes the private key into subkeys to send to the committee. If the leader crashes, the committee can collect the subkeys to recover the private key and ensure that the data sharing is completed smoothly. We design simulation experiments to compare Raft and V-Raft. The experimental results show that V-Raft has better efficiency and performance than Raft.
In recent years, with the increased awareness of privacy protection, a variety of privacy protection schemes have been proposed [
Blockchain was born in 2008 with the emergence of Bitcoin [
VFL [
That is, the two datasets in VFL have a large number of overlapping samples, and these same samples each have different features. In addition, usually only one side of the dataset has sample labels. In VFL, a typical assumption is that the training participants and a third-party collaborator are honest-but-curious, that is, they will abide by the agreement but will try to get additional information from the training process. Moreover, the third-party collaborator does not collude with participants. The VFL consists of two parts. Part 1 is encrypted sample alignment, which is to find common samples of training participants by Private Set Intersection (PSI). The PSI can calculate the intersection of samples held by both parties without revealing additional information. Part 2 is encrypted model training. Training participants combine features of the common samples to train the machine learning model.
Raft [
Verifiable Random Functions (VRFs) [
The VRFs constructed based on RSA is as follows:
Using the private key
The VFL architecture based on consortium blockchain is shown in
There are several problems with the current Raft algorithm: (1) Raft uses a voting mechanism for the leader election, which may fail due to network partitioning so that no node can get more than half of the votes. (2) In Raft, as the number of nodes in the consortium blockchain increases, the time of leader election, leader replacement and consensus increases. In response to the above problems, we propose an improved consensus algorithm based on the VRFs, called the V-Raft. Firstly, V-Raft uses a sortition algorithm based on VRFs, which can stably elect the leader and improve the election speed at the same time. And according to the characteristics of VRFs, the selected nodes are random and verifiable. In addition, we add the consensus committee node role, which can improve the consensus speed. If the leader crashes, select the qualified nodes in the committee to replace, which can improve the leader replacement speed.
V-Raft sets four node states: follower, candidate, committee, and leader. Some nodes of 5 are a typical setting in Raft, which allows the system to tolerate two crash nodes. In a practical scenario, the probability of three nodes crashing simultaneously in a cluster containing five nodes is very small. Therefore, we set the number of committee nodes to 5 in V-Raft. Initially, all nodes in the consortium blockchain are followers. When receiving the assistance request of VFL sent by the training participants, followers execute sortition algorithm and generate a hash. If the hash matches the condition, follower is converted to candidate. Then, sorting the hash of candidates, the five candidates with the smallest hash value are elected as committee and the smallest hash value is selected as leader. The node states and their transitions are shown in
Suppose that training participants A and B want to start VFL, and A and B send a request to the consortium blockchain. The request contains the public keys of A and B and a
When followers in the consortium blockchain receive the request, they use the sortition algorithm to select candidate. As shown in Algorithm 1, a follower uses its private key and the
After all candidates receive the message
When the
If the election fails due to network or other reasons, the candidates do not receive the heartbeat message from the leader within the specified time, then the leader election will be repeated and the qualified candidate will broadcast the message containing its own public key and hash again.
In Raft, time is divided into terms of arbitrary length, and each term elects a leader. Index is the log entry number of each node, and the Raft uses term number and log index to check the consistency. In V-Raft, the term of each elected leader is one VFL from start to finish, so the term of each federation learning is the same, and we only consider the index to maintain consistency. If the leader crashes, the committee cannot receive the heartbeat of leader, then the leader will be replaced. The replacement node that meets the conditions is elected from committee, and this condition is that the log index number of the node is the maximum. If the index is the same, the node with the smallest hash value is elected as the leader. The node replacement process is shown in Algorithm 3.
The consortium blockchain uses the V-Raft consensus algorithm to select a leader and committee to assist the training participants A and B to start joint model training. First, leader generates a public-private key pair, and the public key is sent to A and B for encryption and decryption of the intermediate parameters. In order to avoid the leader crashing during the training process and the gradient cannot be decrypted, the leader sends the private key to the committee node by secret sharing method. If the leader crashes, the replacement node can collect the subkeys to recover the private key to continue the model training. When the training is completed, the leader will chain up the training gradient and loss. The VFL based on consortium blockchain is shown in
The leader generates a public-private key pair and sends the public key to the training participants A and B as a homomorphic encryption key. Homomorphic encryption allows the calculation of the encrypted data, and the calculation process will not disclose the plaintext information. The calculation result is still encrypted, and the result obtained after decryption is the plaintext after processing. This paper uses the homomorphic encryption scheme [
The leader uses secret sharing [
If the leader crashes, the committee uses Algorithm 3 to select the replacement leader. The new leader collects subkeys from the committee and reconstructs the private key. We stipulate that three subkeys can recover the private key. And then, the new leader establishes a connection with the training participants and continues VFL.
In this article, we use linear regression model training based on homomorphic encryption as an example. Suppose that the features of the common samples of A and B are
Let
Let
Let
The training process is as follows:
A uses the public key as homomorphic encryption key to calculate According to Leader uses private key to decrypt encrypted gradient and sends When the training is completed, the leader packages each round of training logs to form a block and uploads it to the consortium blockchain, and other nodes synchronize the block.
Election safety means that at most one leader can be elected in a given term. In V-Raft, leader election is based on VRFs. The nodes in the consortium blockchain use their own private keys and the seed of the training participants as inputs of VRFs, and output hash value and proof value. The VRFs in this article is constructed based on the sha-256 algorithm. The sha-256 is a secure hash algorithm, which has strong collision resistance, that is, the probability that different inputs get the same output is extremely low. The private key of node is unique and confidential, and the seed of the training participants is variable. Therefore, the output hash value is unique. The V-Raft sorts the hash values to elect leader, so the elected leader is also unique. Similarly, when the leader crashes and a replacement node is selected from the committee, the committee will compare the index and the hash value, and since the hash value is unique, the new leader selected is also only one.
State machine safety: if a node has applied a log entry at a given index to its state machine, no other node will ever apply a different log entry for the same index. That is, the state machines of more than half of the committee nodes store the same log entries under the same term. The leader forwards the message sent by the training participant to the committee in the form of a log entry. When receiving the reply from more than 2 committee nodes (The number of committee nodes is 5), the leader submits the log entry to its own state machine, and this operation is included in the next heartbeat. When the committee nodes receive the heartbeat, they update their local state machines, so that more than half of the committee nodes reach a consensus to store the same log entries.
Liveness is defined as a transaction being completed in a limited time. In the leader election stage of the V-Raft, the probability and number of selected nodes are controlled by the sortition threshold
In V-Raft, the number of committee nodes is 5, so at least 5 candidates need to be selected for each sortition. The number of candidates is related to the sortition threshold
In addition, we tested the success rate when the
Number of nodes n | Sortition threshold |
Number of sortition | Number of selected node ˂ 5 | Success rate (%) |
---|---|---|---|---|
50 | 0.30 | 1000 | 0 | 100 |
100 | 0.15 | 1000 | 0 | 100 |
150 | 0.10 | 1000 | 1 | 99.9 |
200 | 0.075 | 1000 | 1 | 99.9 |
250 | 0.06 | 1000 | 0 | 100 |
In the experiment of this section, we compare the leader election latency of Raft, KRaft [
In Raft, When the leader receives the client request, it will forward the request to the follower. When the leader receives a reply from more than half of the followers, it indicates that the nodes in the consortium blockchain have reached a consensus on the request. Instead of a single leader forwarding the logs in Raft, in KRaft the leader and the candidate jointly forward the logs. While V-Raft adds committee state, the leader only needs to reach consensus with fewer and fixed committee members. The experimental results are shown in
In Raft and KRaft, leader election is restarted if the leader crashes. Due to the committee being elected in advance in V-Raft, the V-Raft will select a qualified node from the committee to replace it. The experimental results of the three algorithms are shown in
In this article, we propose a VFL architecture based on consortium blockchain for data sharing in MEC. Firstly, we improve the popular consensus algorithm in the consortium blockchain, Raft, and propose a new consensus algorithm based on VRFs, V-Raft. In addition, we compare the performance of the two algorithms through experiments. The experimental results show that compared with Raft, the V-Raft has higher efficiency and better scalability in leader election, leader replacement and log replication. Also, we apply secret sharing to ensure that even if the leader crashes, the new leader can still recover the private key and continue to complete the joint model training. Therefore, the consortium blockchain can act as a trusted third party to assist the data demander and owner to realize the data sharing by VFL. The circulation and sharing of data can effectively promote the development of MEC. This article is carried out under the setting of honest-but-curious nodes and future work will consider the case where there are byzantine nodes in the consortium blockchain.
This research is funded by the
The authors declare that they have no conflicts of interest to report regarding the present study.