Distributed Secure Storage Scheme Based on Sharding Blockchain

Distributed storage places data on multiple devices or servers to improve data security. However, with today's explosive growth of network data, traditional distributed storage schemes face severe challenges such as insufficient performance, data tampering, and data loss. Blockchain-based distributed storage has been proposed to improve the security and efficiency of traditional distributed storage. Building on this approach, this paper makes the following improvements. It first analyzes the problems faced by distributed storage and then proposes a new distributed storage scheme built on a sharding blockchain. The proposed scheme partitions the network and its nodes by means of blockchain sharding, which improves the efficiency of data verification between nodes. In addition, this paper uses polynomial commitment to construct a new verifiable secret share scheme called PolyVSS, which is one of the foundations of the improved distributed storage blockchain scheme. Compared with previous schemes, PolyVSS does not require a trusted third party and offers new features such as homomorphism and batch opening, which further improve the security of VSS. Experimental comparisons show that the proposed scheme significantly reduces storage and communication costs.


Introduction
Traditional centralized storage systems use centralized storage servers to store all data, which places high requirements on server performance, including reliability and security. At the same time, with the explosive growth of network data, centralized storage systems cannot satisfy the needs of large-scale applications. As a peer-to-peer storage method, distributed storage is gradually The combination of blockchain and distributed storage technology in the database provides a way to solve the above problem. The distributed storage system based on the blockchain can be used to securely store all kinds of data, and can be applied to fields such as smart grid, smart home, and Internet of Vehicles. As the underlying technology of Bitcoin, blockchain has received widespread attention due to its strong security characteristics [3]. Blockchain was originally used to construct cryptocurrency. Because the blockchain has the characteristics of anti-tampering, openness and transparency, it was subsequently regarded as one of the methods to construct a secure data storage scheme [4][5][6]. The blockchain itself is a distributed setting, but because of the Merkle tree structure used in data storage, it needs to pay more storage costs when dealing with large-scale applications.
Secret share combined with blockchain has applications such as electronic voting, consensus algorithms, and P2P storage schemes [7][8][9][10]. Such schemes usually require the participation of a Dealer, and there is no guarantee that the Dealer is trustworthy. This paper proposes an improved, Dealer-free verifiable secret share scheme based on polynomial commitment to replace the secret share scheme in the distributed storage blockchain.
The specific contributions of this paper are as follows: (1) This paper proposes a verifiable secret share scheme based on polynomial commitment (PolyVSS, for short). Compared with previous schemes, the new scheme does not require a trusted third party and has homomorphic characteristics. (2) We use PolyVSS to construct a distributed storage scheme based on blockchain. This scheme uses sharding technology to partition nodes and transactions. Experimental comparisons show that the proposed scheme can reduce storage and communication costs.
The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 gives the structure of a distributed storage blockchain based on PolyVSS. Section 4 introduces the proposed PolyVSS and analyzes its security. Section 5 analyzes the performance of the distributed storage blockchain, and Section 6 concludes the paper.

Verifiable Secret Share
Secret share is one of the important research directions of modern cryptography. The earliest secret share scheme was proposed by Shamir. In his scheme, a dealer divides a secret into n parts and distributes them to n members. Any t or more members (t ≤ n) can pool their shares to reconstruct the secret, while fewer than t shares reveal nothing about it.
Because the dealer is fully trusted in this setting, there is no guarantee that the dealer will behave honestly. To prevent malicious behavior by the dealer, verifiable secret share (VSS) was proposed [11]. VSS extends secret share with a share verification step: put simply, members verify the validity of the shares distributed by the dealer. An important feature of VSS is unconditional privacy, which prevents an unauthorized set of members from obtaining the shared information. In addition to VSS, several practical variants have been proposed, such as verifiable multi-secret share [12], non-interactive verifiable secret share, and publicly verifiable secret share.
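As an illustration of the threshold idea described above, the following is a minimal sketch of Shamir-style sharing over a prime field. The modulus `PRIME`, the function names, and the choice of evaluation points x = 1, ..., n are assumptions made for this example only; a polynomial of degree t − 1 is used so that any t shares suffice.

```python
import secrets

# Toy field modulus (assumption): the Mersenne prime 2^127 - 1.
PRIME = 2**127 - 1

def make_shares(secret, t, n, prime=PRIME):
    """Split `secret` into n shares so that any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret % prime] + [secrets.randbelow(prime) for _ in range(t - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % prime
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 from a list of (x, y) shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % prime
                den = (den * (xj - xm)) % prime
        secret = (secret + yj * num * pow(den, -1, prime)) % prime
    return secret
```

For example, `reconstruct(make_shares(1234, 3, 5)[:3])` recovers the secret from the first three of five shares; any other three shares work equally well.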
Harin et al. [13,14] gave the formal definition of (n, t, n) secret share. In this scheme, n share-holders participate in sharing a master secret together, and everyone can randomly select a sub-secret and use an algorithm to generate sub-shares. Then using the homomorphic feature, each shareholder can combine all the sub-shares into the master share. Finally, the master share can be restored to the master secret through the reconstruction algorithm.

Polynomial Commitment
The concept of commitment is at the core of almost all modern cryptographic protocol constructions. Making a commitment means that a participant in the protocol chooses a value from a certain (finite) set and commits to this choice so that it can no longer be changed. The participant does not have to reveal the choice, although he may choose to reveal it at some point in the future. Cryptographic commitments have been applied to blockchain. Zerocoin [15] uses Pedersen commitment to bind a serial number s to a Zerocoin z. The commitment C is computed as

$$C = g^{s} h^{z} \bmod p,$$

where $p$ is a large prime. Given the generators g and h, the user randomly selects s and z and computes the commitment C. It is difficult to compute s and z from C alone, even if one of them is revealed. In addition, Kate et al. [16] proposed the first efficient polynomial commitment, which was subsequently used to construct blockchain-based zero-knowledge proof protocols. Their scheme has the characteristics of a static accumulator. We now introduce the construction of polynomial commitment.

The polynomial commitment scheme is constructed from bilinear pairings. Let $\mathcal{G} = (e, G, G_T)$ denote the generated bilinear groups (see Definition 6). The algorithm can be divided into four phases:

1) Initialization phase: This step generates a public-private key pair $(pk, sk)$, where the public key is $pk = (\mathcal{G}, g, g^{\vartheta}, g^{\vartheta^2}, \dots, g^{\vartheta^n})$. The private key $sk = \vartheta$ is not used in the subsequent steps.

2) Commit phase: For a polynomial $F(x)$, compute the commitment $C = g^{F(\vartheta)} \in G$. Since the polynomial can be written as $F(x) = \sum_{j=0}^{\deg(F)} F_j x^j$, the commitment can also be written as

$$C = \prod_{j=0}^{\deg(F)} \left(g^{\vartheta^j}\right)^{F_j}.$$

3) Open phase: This step opens the commitment C by outputting the committed polynomial $F(x)$.

4) Verify phase: The verifier first checks the validity of the commitment:

$$C \stackrel{?}{=} \prod_{j=0}^{\deg(F)} \left(g^{\vartheta^j}\right)^{F_j}.$$

If the equation holds, the verification passes; otherwise, it fails. The prover then outputs a triple $\langle \alpha, F(\alpha), \omega_\alpha \rangle$, where $\omega_\alpha = g^{f_\alpha(\vartheta)}$ is the witness for the index $\alpha$ and $f_\alpha(x)$ satisfies

$$f_\alpha(x) = \frac{F(x) - F(\alpha)}{x - \alpha}.$$

Finally, the verifier checks the evaluation at the index $\alpha$:

$$e(C, g) \stackrel{?}{=} e\left(\omega_\alpha, g^{\vartheta} / g^{\alpha}\right) \cdot e(g, g)^{F(\alpha)}.$$

If the equation holds, the verification passes; otherwise, it fails.
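The polynomial commitment satisfies three properties against any probabilistic polynomial-time adversary $\mathcal{A}$: polynomial binding, evaluation binding, and computational hiding.

Polynomial Binding. The polynomial commitment is polynomial binding if, given $pk$, $\mathcal{A}$ can output a commitment $C$ together with two distinct polynomials $F(x) \neq F'(x)$ that both pass the commitment verification for $C$ only with negligible probability $\epsilon(\kappa)$.

Evaluation Binding. The polynomial commitment is evaluation binding if, given $pk$, $\mathcal{A}$ can output a commitment $C$ and two triples $\langle \alpha, F(\alpha), \omega_\alpha \rangle$ and $\langle \alpha, F'(\alpha), \omega'_\alpha \rangle$ with $F(\alpha) \neq F'(\alpha)$ that both pass the evaluation verification for $C$ only with negligible probability $\epsilon(\kappa)$.

Computational Hiding. Suppose $\mathcal{A}$ is given $pk$, $C$, and triples $\langle i_\upsilon, F(i_\upsilon), \omega_{i_\upsilon} \rangle$ for $1 \leq \upsilon \leq \deg(F)$, each of which passes the verify phase. Then $\mathcal{A}$ cannot determine $F(\hat{i})$ with non-negligible probability for any unqueried index $\hat{i}$.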
In addition, the polynomial commitment also satisfies strong correctness, the proof of which has been given in the paper [16].
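To make the role of the witness $\omega_\alpha = g^{f_\alpha(\vartheta)}$ more concrete, the sketch below checks the algebraic relation that the evaluation verification tests in the exponent, assuming the quotient form $f_\alpha(x) = (F(x) - F(\alpha))/(x - \alpha)$ used by Kate et al. [16]. It works directly with field elements instead of group elements and pairings, so it is not a commitment scheme and offers no security; the modulus `P` and all function names are assumptions for illustration.

```python
# Toy field modulus (assumption): the Mersenne prime 2^61 - 1.
P = 2**61 - 1

def evaluate(coeffs, x, p=P):
    """Evaluate a polynomial given as [F_0, F_1, ...] (lowest degree first)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def witness_poly(coeffs, alpha, p=P):
    """Coefficients of f_alpha(x) = (F(x) - F(alpha)) / (x - alpha).

    Uses synthetic division; the remainder F(alpha) is discarded.
    """
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + alpha * carry) % p
        quotient[i - 1] = carry
    return quotient

def check_opening(coeffs, alpha, theta, p=P):
    """Check F(theta) - F(alpha) == (theta - alpha) * f_alpha(theta) mod p.

    In the real scheme theta is hidden inside pk and this equality is
    tested with pairings; here theta is visible, purely for illustration.
    """
    lhs = (evaluate(coeffs, theta, p) - evaluate(coeffs, alpha, p)) % p
    rhs = ((theta - alpha) * evaluate(witness_poly(coeffs, alpha, p), theta, p)) % p
    return lhs == rhs
```

Because the identity holds for every polynomial and every pair of points, `check_opening` returns `True` for an honestly computed witness polynomial; the pairing equation in the verify phase tests the same relation without revealing $\vartheta$.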

System Model of Distributed Storage Scheme Based on Blockchain
Before introducing the system model of the distributed storage blockchain (DSB), we first introduce a few related notions. Let $B_t$ denote the t-th block, and let $H_t$ denote the hash value stored with the (t + 1)-th transaction, that is, the hash of the previous block. Two hash functions are used. The specific structure is shown in Fig. 1: the t-th block is hashed and stored together with the hash of the previous block. We first fix a partition $\chi$ of the $n$ nodes into $R = n/(r+1)$ subsets of size $r + 1$, where $n$ represents the total number of nodes. The specific stages are as follows.

1) Initial phase
For $l \in [1, n/(r+1)]$, the initialization algorithm randomly generates a key $key_l^{(t)}$.

2) Encryption phase
Let $\varphi$ denote an encryption algorithm. The block is encrypted with the key:

$$M_l^{(t)} = \varphi\left(B_t, key_l^{(t)}\right).$$

3) Storage phase
Distribute and store $M_l^{(t)}$ among the $r + 1$ nodes in the $l$-th subset of the partition $\chi$, and then use the secret share algorithm to store $key_l^{(t)}$ and $\psi_t$.
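The hash linking described at the start of this section, where each block is hashed together with the hash of the previous block, can be sketched as follows. SHA-256, the all-zero genesis value, and the function name `chain_hashes` are assumptions for this example; the paper does not specify which hash functions are used.

```python
import hashlib

def chain_hashes(blocks, genesis=b"\x00" * 32):
    """Return the list of chained hashes H_1, H_2, ... for a list of blocks.

    Each hash covers the previous hash concatenated with the current block,
    so changing any earlier block changes every later hash.
    """
    prev = genesis
    hashes = []
    for block in blocks:
        prev = hashlib.sha256(prev + block).digest()
        hashes.append(prev)
    return hashes
```

Tampering with an early block is detectable because all subsequent hashes no longer match the stored ones.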

The Structure of Distributed Storage Scheme Based on Blockchain
We construct our storage scheme on the blockchain, and this section introduces the related blockchain concepts [17][18][19]. The framework of our scheme consists of the following components:
1) Data management center (DMC): The data management center is responsible for sending data verification requests and distributing data to nodes in a designated shard.
2) Node: The node is responsible for the maintenance of the ledger and the verification of the data.
3) Shard: With the help of blockchain sharding technology [20,21], the nodes in our scheme are randomly divided into a specified number of shards, and the number of nodes in each shard is the same.
4) Blockchain database: The blockchain database is used to store data that has been verified by the nodes.
5) P2P network: P2P networks have advantages in building distributed applications [22][23][24]. Our scheme uses a distributed P2P network without a central node, in which connections between nodes are established randomly.
First, the DMC sends a request to the nodes. After receiving the request, each node runs the three-phase PolyVSS algorithm to distribute and store data. Fig. 2 shows the structure of a sharding-based blockchain storage system (assuming that all nodes are divided into three shards); the structure is the same regardless of the number of shards. Each dashed box in the figure represents a shard, and each shard has the same number of nodes. The nodes in different shards are independent and do not affect each other, but they can communicate when necessary. This limits collusion between malicious nodes in different shards and helps prevent double-spending attacks. To prevent all malicious nodes from being assigned to the same shard, we follow the technique of [20] so that node allocation is completely random.
More nodes per shard is not always better. Following the practical Byzantine fault-tolerant algorithm, we generally limit the number of nodes in each shard to no more than 100, because consensus among nodes becomes inefficient beyond that size. To accommodate more nodes, the number of shards can be increased instead. In our scheme, there are three shards, and we assume that each shard contains 50 nodes.
Let Nub be the total number of nodes and F the number of shards. Then $F = Nub/(r+1)$, where r + 1 is the size of each shard. The specific scheme is given in the next section.
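The equal-size random assignment of nodes to shards can be sketched as below. This is only an illustration: `assign_shards` is a hypothetical helper, and Python's `random.Random` stands in for the unbiased randomness source of [20], which a real deployment would need to make verifiable.

```python
import random

def assign_shards(node_ids, shard_size, seed=None):
    """Randomly split nodes into shards of exactly `shard_size` nodes each."""
    nodes = list(node_ids)
    if len(nodes) % shard_size != 0:
        raise ValueError("number of nodes must be a multiple of the shard size")
    rng = random.Random(seed)  # placeholder randomness source (assumption)
    rng.shuffle(nodes)
    return [nodes[i:i + shard_size] for i in range(0, len(nodes), shard_size)]
```

With 150 nodes and a shard size of 50, this yields the three shards of 50 nodes assumed in the text, matching F = Nub/(r + 1) with r + 1 = 50.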

The Proposed Scheme Based on Sharding Blockchain
Our scheme is based on a sharding blockchain and can process multiple data items in parallel, which theoretically improves the efficiency of data verification. Our scheme is divided into three phases: request phase, secret share phase and storage phase.

1) Request phase
When a piece of data needs to be added to the chain, the Data Management Center (DMC) sends a request to all nodes in a shard.

2) Data verification phase
Each node $N_i$ independently selects a sub-secret $S_i$, and the master secret can be expressed as

$$S = \sum_{i=1}^{n} S_i.$$

For each sub-secret $S_i$, $N_i$ randomly selects a polynomial $F_i(x)$ of degree t such that $F_i(0) = S_i$. $N_i$ uses the Commit algorithm to generate the commitment $C_i$ and broadcasts it throughout the P2P network.
For $j \in [1, n]$, $N_i$ calculates a witness $w_j$ and the sub-share

$$s_{ij} = F_i(x_j),$$

and then sends $\langle j, F_i(x_j), w_j \rangle$ to the other nodes $N_j$ in the network through a trusted channel.
After receiving $\langle j, F_i(x_j), w_j \rangle$, each node runs the evaluation verification algorithm of the polynomial commitment.
After the verification passes, all nodes accept the corresponding sub-shares and use Lagrange interpolation to recover the corresponding sub-secret.

3) Data storage stage
After PolyVSS is executed, the verified data is uploaded to the blockchain. The specific process is shown in Fig. 3.

The Proposed Verifiable Secret Sharing Scheme Based on Polynomial Commitment
In this section, we will first introduce the formal definition of VSS and some cryptographic assumptions. Then, the specific construction is given. We also conduct security and performance analysis of the scheme.

Preliminary
First of all, we give the formal definition of VSS scheme and several security features that it needs to satisfy.

Definition 2 Verifiable secret share (VSS). A VSS scheme is divided into two phases:
Share phase: At the beginning of the phase, the Shareholder holds an input s, and the corresponding share can be calculated using s.
Reconstruction phase: With any t shares, users can use Lagrangian interpolation formulas to reconstruct the secret value.
To facilitate the description of the application later, in the following text we will use node instead of Shareholder. Usually, a VSS scheme needs to satisfy two security features: Secrecy and Correctness. Below we give their definitions.

Definition 3
Secrecy. The adversary cannot compute the shared secret s during the share phase.

Definition 4
Correctness. The reconstructed value should be equal to the shared secret s or every honest node will reach a result and accuse the node of maliciousness by outputting ⊥.
Some VSS schemes use cryptographic commitments, such as the Pedersen commitment with its homomorphic property. A cryptographic commitment generally consists of two phases, commit and open, which respectively commit to and open a message. Polynomial commitment is also a kind of homomorphic commitment and can be constructed based on discrete logarithm and Pedersen commitment. It builds on these two traditional commitment algorithms and, borrowing from accumulators, adds a verification algorithm. Existing research on verifiable secret share schemes based on polynomial commitments mainly concerns constructions in the asynchronous and synchronous models [25][26][27].
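The homomorphic property of the Pedersen commitment mentioned above can be illustrated with a toy example. The parameters below (p = 23, subgroup order q = 11, generators g = 2 and h = 3) are assumptions chosen only so the arithmetic is easy to follow; they are far too small to be secure, and in a real setup the discrete logarithm of h with respect to g must be unknown to the committer.

```python
# Toy parameters (assumption): 2 and 3 both have order 11 in Z_23^*.
P, Q = 23, 11
G, H = 2, 3

def commit(s, z):
    """Pedersen-style commitment C = g^s * h^z mod p (toy parameters)."""
    return (pow(G, s % Q, P) * pow(H, z % Q, P)) % P

def add_commitments(c1, c2):
    """Multiplying commitments adds the committed values and the randomness."""
    return (c1 * c2) % P
```

For instance, `add_commitments(commit(3, 5), commit(4, 2))` equals `commit(7, 7)`: the product of two commitments is a commitment to the sum of the committed values.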
Here are a few cryptographic assumptions used for the security proof of our scheme.

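Definition 5
Discrete Logarithm Assumption (DLA). Given a generator g of a group $G^*$, where $G^* = G$, and $g^{\vartheta}$ for a random number $\vartheta \in Z_p^*$, the probability that any polynomial-time adversary computes $\vartheta$ from $g^{\vartheta}$ is negligible, $\epsilon(\kappa)$.

Definition 6
Bilinear Pairing. Let $G_1$, $G_2$ be additive cyclic groups of order p, and let $G_T$ be a multiplicative group of the same order. A bilinear mapping is a map $e: G_1 \times G_2 \rightarrow G_T$ such that $e(aP, bQ) = e(P, Q)^{ab}$ for all $P \in G_1$, $Q \in G_2$, and $a, b \in Z_p$.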

The Proposed Scheme Based on Polynomial Commitment
Our scheme is an improvement on the (n, t, n) verifiable secret sharing scheme [13]. In the (n, t, n) scheme, the first n denotes the number of sub-shares, t denotes the threshold (in the sense of threshold cryptography), and the last n denotes the number of participants. One advantage of such a scheme is that it does not rely on a third party, which in practice cannot be assumed to be completely trustworthy. Removing the trusted third party therefore improves security.
On the basis of the previous scheme, polynomial commitment is introduced. Our scheme is divided into two phases: the share phase and the reconstruction phase. At the beginning of the scheme, the node runs the initialization algorithm of the polynomial commitment: it randomly selects a generator g and a random number $\alpha \in Z_p^*$, and then generates a public key $pk = (\mathcal{G}, g, g^{\alpha}, \dots, g^{\alpha^t})$.

1) Share phase
Master secret generation algorithm: Each node $P_i$ independently chooses a sub-secret $S_i$. The master secret can be expressed as $S = \sum_{i=1}^{n} S_i$.
Share generation algorithm: For each sub-secret $S_i$, $P_i$ randomly selects a polynomial $F_i(x)$ of degree t such that $F_i(0) = S_i$. It then runs the commit algorithm of the polynomial commitment to generate a commitment $C_i = g^{F_i(\alpha)}$ and broadcasts it throughout the P2P network. For $j \in [1, n]$, $P_i$ calculates the sub-share $s_{ij} = F_i(x_j)$ and a witness $w_j$, and sends $\langle j, F_i(x_j), w_j \rangle$ to the other nodes $P_j$ in the network. The master share of $P_j$ can be expressed as $s_j = \sum_{i=1}^{n} s_{ij}$.
Verification algorithm: After receiving $\langle j, F_i(x_j), w_j \rangle$, each node runs the verify algorithm of the polynomial commitment. If the verification of a share from $P_i$ fails, the verifying node returns an accusation message against $P_i$. If more than t nodes accuse $P_i$, then $P_i$ is clearly faulty and is disqualified. Otherwise, $P_i$ broadcasts the corresponding share and $\langle i, F_i(x), w_i \rangle$ to the accusing party. If the revealed share fails verification again, $P_i$ is disqualified and the protocol ends; otherwise, each node accepts $s_{ij}$.

2) Reconstruction phase:
In the reconstruction phase, once t + 1 share holders have passed the verification algorithm, each $P_i$ interpolates the pairs $\langle i, F_i(x) \rangle$ to determine $S_i = F_i(0)$, and then calculates the master secret S.
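The dealer-free sharing and reconstruction steps can be sketched without the commitment layer as follows. This is a simplified illustration under stated assumptions: evaluation points are taken as $x_j = j$, the field modulus `PRIME` and all function names are hypothetical, and the commitments, witnesses, and accusation handling of PolyVSS are omitted. It reconstructs the master secret from summed master shares, which relies on the additive homomorphism of polynomial sharing.

```python
import secrets

PRIME = 2**127 - 1  # toy field modulus (assumption)

def eval_poly(coeffs, x):
    """Horner evaluation of [c_0, c_1, ...] at x over the toy field."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc

def deal_sub_shares(sub_secret, t, n):
    """Node i: random degree-t polynomial F_i with F_i(0) = S_i; return s_i1..s_in."""
    coeffs = [sub_secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t)]
    return [eval_poly(coeffs, j) for j in range(1, n + 1)]

def master_shares(sub_secrets, t):
    """Each node j sums the sub-shares it received: s_j = sum_i s_ij."""
    n = len(sub_secrets)
    rows = [deal_sub_shares(s, t, n) for s in sub_secrets]  # rows[i][j-1] = s_ij
    return [(j, sum(rows[i][j - 1] for i in range(n)) % PRIME)
            for j in range(1, n + 1)]

def interpolate_at_zero(points):
    """Lagrange interpolation at x = 0; needs at least t + 1 points."""
    total = 0
    for a, (xa, ya) in enumerate(points):
        num, den = 1, 1
        for b, (xb, _) in enumerate(points):
            if a != b:
                num = (num * -xb) % PRIME
                den = (den * (xa - xb)) % PRIME
        total = (total + ya * num * pow(den, -1, PRIME)) % PRIME
    return total
```

With five nodes holding sub-secrets 11, 22, 33, 44, 55 and t = 2, any three master shares interpolate to 165, the sum of the sub-secrets, without any single party ever holding the master secret during sharing.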

Security Analysis
First, we give the adversary model. We consider a network P = {P 1 , P 2 , . . . , P n } composed of n participants. Our adversary is t-bounded and adaptive: it can compromise and coordinate the actions of up to t of the n parties. It can corrupt any party at any point during the execution of the protocol, as long as the number of corrupted parties is bounded by t.

Theorem 1:
The proposed VSS scheme based on polynomial commitment satisfies correctness and secrecy.
Proof: We will prove that our scheme satisfies the correctness and secrecy features.
Correctness. Unlike other VSS schemes, our scheme has no dealer, so we do not need to consider whether the dealer is honest. Suppose that a node uses the polynomial F(x) to share a secret s and remains honest throughout the sharing phase, and let C be the commitment sent to each node. By the strong correctness of the polynomial commitment, all honest nodes obtain correct shares of s that are consistent with C. Now suppose a malicious node broadcasts a triple $\langle i, F_i(x), w_i \rangle$ whose value does not match the commitment. Since the polynomial commitment is computationally binding, this triple fails verification, so only honest nodes can reconstruct the secret.
Secrecy. The secrecy of our scheme follows from the hiding property of the polynomial commitment. Regardless of whether a node is malicious or honest, it is difficult for an adversary to obtain information about the secret. Suppose there is a t-bounded adversary $\mathcal{A}$ that can obtain t messages $\langle i, F_i(x), w_i \rangle$. Since the polynomial commitment is constructed from discrete logarithms, it is hiding. We first prove hiding.
Suppose there is an algorithm E that uses the adversary $\mathcal{A}$ to break the DLA. Let $(g, g^{u})$ be the instance of the discrete logarithm problem that E needs to solve. Algorithm E randomly chooses a number $\vartheta \in Z_p^*$ and sends the public key $pk = (\mathcal{G}, g, g^{\vartheta}, g^{\vartheta^2}, \dots, g^{\vartheta^n})$ to $\mathcal{A}$. E treats $\langle \tau, \varphi(\tau) \rangle$ as the evaluation of a polynomial $\varphi(x)$ at the index $\tau$. It then assumes $\varphi(0) = u$, the answer to the DL instance, and uses the pair $\langle 0, g^{u} \rangle$ together with the other selected pairs $\langle \tau, g^{\varphi(\tau)} \rangle$ to compute $g^{\varphi(\vartheta)}$ with n + 1 exponentiations. Finally, E calculates the witness for each $\langle \tau, \varphi(\tau) \rangle$:

$$\omega_\tau = \left( \frac{g^{\varphi(\vartheta)}}{g^{\varphi(\tau)}} \right)^{\frac{1}{\vartheta - \tau}},$$

and sends $pk$ and the witness tuples $\langle \tau, \varphi(\tau), \omega_\tau \rangle$ to $\mathcal{A}$. Once $\mathcal{A}$ returns the polynomial $\varphi(x)$, E returns the constant term $\varphi(0)$ as the solution of the DLA instance.
It is easy to see that the success probability of solving the DLA instance equals the success probability of $\mathcal{A}$, and that the running time of E exceeds that of $\mathcal{A}$ only by a small constant. Hence, revealing only t such messages does not allow reconstruction of the polynomial F(x) or of the corresponding secret.

PolyVSS Performance Analysis
This section compares the computational costs and functions of the six schemes in the four stages of parameter setting, reconstruction, verification, and recovery.
The polynomial commitment scheme given in Section 2.2 can only open and verify the evaluation at one index, so it is not suitable when multiple indexes need to be opened. A batch polynomial commitment was proposed to open and verify the evaluations at multiple indexes at once. It mainly modifies the verify phase. Let all the indexes $\tau$ to be opened form a set $W \subset Z_p$ with $|W| < t$. The algorithm outputs the triple $\langle W, r(x), \omega_W \rangle$, where $\omega_W = g^{f_W(\alpha)}$ is a single witness for all indexes in W, $r(x)$ is the remainder of $F(x) / \prod_{\tau \in W}(x - \tau)$, and $f_W(x) = (F(x) - r(x)) / \prod_{\tau \in W}(x - \tau)$. Finally, the verifier checks the following equation:

$$e(C, g) \stackrel{?}{=} e\left(g^{\prod_{\tau \in W}(\alpha - \tau)}, \omega_W\right) \cdot e\left(g, g^{r(\alpha)}\right).$$

With the aid of batch polynomial commitment, when n indexes need to be opened, the number of witnesses to compute is reduced from n to 1.
We compared the computational cost and functions of several VSS schemes [28][29][30][31][32]. The specific comparison is shown in Tab. 1, where n denotes the number of operations performed and t denotes the number of nodes. The function comparison is shown in Tab. 2.
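The algebra behind batch opening can be sketched as follows, assuming the usual construction in which r(x) is the remainder of F(x) divided by $\prod_{\tau \in W}(x - \tau)$ and the witness polynomial is the corresponding quotient. As with any field-only sketch, no group elements or pairings are involved, so this only demonstrates the identity that the batch verification tests in the exponent; the modulus `P` and all function names are assumptions.

```python
P = 2**61 - 1  # toy prime field modulus (assumption)

def poly_eval(coeffs, x):
    """Evaluate [c_0, c_1, ...] at x (lowest degree first)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def vanishing_poly(indexes):
    """Coefficients of prod_{tau in W} (x - tau), lowest degree first."""
    poly = [1]
    for tau in indexes:
        nxt = [0] * (len(poly) + 1)
        for i, c in enumerate(poly):
            nxt[i] = (nxt[i] - tau * c) % P
            nxt[i + 1] = (nxt[i + 1] + c) % P
        poly = nxt
    return poly

def poly_divmod(num, den):
    """Divide by a monic polynomial; return (quotient, remainder)."""
    num = [c % P for c in num]
    deg_d = len(den) - 1
    if len(num) - 1 < deg_d:
        return [0], num
    quot = [0] * (len(num) - deg_d)
    for k in range(len(num) - deg_d - 1, -1, -1):
        coef = num[k + deg_d]  # den is monic, so no inversion is needed
        quot[k] = coef
        for j, dj in enumerate(den):
            num[k + j] = (num[k + j] - coef * dj) % P
    return quot, num[:deg_d]

def batch_open(coeffs, indexes):
    """Return (f_W, r, Z): quotient, remainder, and vanishing polynomial."""
    z = vanishing_poly(indexes)
    f_w, r = poly_divmod(coeffs, z)
    return f_w, r, z
```

The remainder r(x) agrees with F(x) on every index in W, and F(x) − r(x) = Z(x)·f_W(x) holds at every point, which is why a single witness suffices for the whole set W.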

Security Analysis
Denial of service (DoS) attack is a method used to disrupt legitimate users' access to the target network or website resources [33][34][35]. Usually this is achieved by overloading a target with a large amount of traffic (usually a web server), or by sending malicious requests that cause the target resource to malfunction or completely collapse [36][37][38][39][40][41].
Blockchains also suffer from DoS attacks. In a traditional blockchain, when a node is attacked, it needs to visit other nodes (because each node stores the entire ledger) to recover local data. In our scheme, when a node in the network is attacked, the node can use the reconstruction algorithm of the PolyVSS scheme to recover the corresponding data by accessing r + 1 other nodes. Therefore, our scheme can effectively deal with single points of failure.

Theorem 2:
The proposed distributed storage scheme can reconstruct secret by accessing any r + 1 nodes.
Proof: A polynomial of degree at most t is uniquely determined by t + 1 evaluations. Since deg(F) ≤ t, the polynomial F(x) can be interpolated from the shares held by any r + 1 nodes whenever t ≤ r, and the secret is recovered as F(0).
Below we analyze the recovery communication cost. For convenience, we use DSB and LSS-DSB to refer to the schemes in [42][43][44][45]. The data of the corresponding schemes are given in Tab. 3. We use Stor to denote the storage cost and Com to denote the recovery communication cost.
The core of our scheme is the secret sharing scheme, which is also an important tool to achieve recovery. The Shamir secret share used in DSB is one of the most classic schemes. Local secret share builds on Shamir secret share and introduces two new concepts: global secrets and local secrets. Information treated as a global secret is more important than a local secret. Global secrets are maintained by all users, while local secrets are maintained by individuals. Unlike these two schemes, our scheme does not have a central party, such as the dealer in Shamir's scheme. In addition, participants in our scheme mutually verify the validity of shares, thereby improving security.

Tab. 3. Storage cost (Stor) and recovery communication cost (Com) of the compared schemes.

Scheme     | Stor                                | Com
Blockchain | $\log_2 \tau + \log_2 p$            | $\log_2 \tau + \log_2 p$
DSB        | $2\log_2 p + \log_2 \tau / (r+1)$   | $\log_2 \tau + 2(r+1)\log_2 p + \gamma$
LSS-DSB    | $\log_2 p + \log_2 \tau / r$        | $\log_2 \tau + r\log_2 p$
Our scheme | $\log_2 p + \log_2 \tau / (r+1)$    | $\log_2 \tau + (r+1)\log_2 p$

Blockchain. In a traditional blockchain, each node needs to store the entire ledger. When a single point of failure occurs, it is necessary to access all other nodes to restore all transaction data. Assuming $B_t \in F_\tau$ and $\psi_t \in F_p$, where $F_\tau$ and $F_p$ are two prime fields, the recovery communication cost is

$$Com_{Blockchain} \propto \log_2 \tau + \log_2 p,$$

where the symbol ∝ means proportional to. Once the sizes of the prime fields are determined, the storage cost of the blockchain is fixed.

DSB.
Nodes need to visit r + 1 other subsets of nodes to recover all data in DSB. Assuming $M_l \in F_p$ and $\psi_t \in F_p$, the recovery communication cost is

$$Com_{DSB} = \log_2 \tau + 2(r+1)\log_2 p + \gamma, \quad (16)$$

where $\gamma$ represents the additional cost of accessing other subsets and its value is fixed. The recovery communication cost therefore depends on r and grows as r increases.

LSS-DSB.
The node can recover the entire data by accessing r subsets locally. Compared with DSB, no additional recovery communication cost is required. The recovery communication cost is

$$Com_{LSS\text{-}DSB} = \log_2 \tau + r\log_2 p.$$

Our scheme.
In our scheme, the node also needs to access r + 1 other nodes to recover data. The recovery communication cost is

$$Com_{Ours} = \log_2 \tau + (r+1)\log_2 p.$$

Assuming $p = 2^{400}$ and $\tau = 2^{40}$, the recovery communication cost is shown in Fig. 4. Our scheme is superior to DSB in terms of communication cost and similar to LSS-DSB.
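The comparison can be reproduced numerically from the Com expressions listed in Tab. 3. The sketch below is only a calculator for those expressions: the function name is hypothetical, the defaults follow the stated assumptions p = 2^400 and τ = 2^40, and the default γ = 200 is borrowed from the parameter choice in the Storage Analysis section, which is an assumption about how Fig. 4 was produced.

```python
def recovery_comm_cost(r, log2_p=400, log2_tau=40, gamma=200):
    """Recovery communication cost (in bits) per scheme, per the Com row of Tab. 3."""
    return {
        "Blockchain": log2_tau + log2_p,
        "DSB": log2_tau + 2 * (r + 1) * log2_p + gamma,
        "LSS-DSB": log2_tau + r * log2_p,
        "Our scheme": log2_tau + (r + 1) * log2_p,
    }
```

For r = 4, for example, the expressions give 4240 bits for DSB, 1640 bits for LSS-DSB, and 2040 bits for our scheme, which matches the qualitative ordering stated above: well below DSB and close to LSS-DSB.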

Storage Analysis
In this section, we compare the storage cost of several schemes when storing a transaction. The data of the corresponding schemes are given in Tab. 3. A brief analysis of each scheme follows.
Blockchain. In traditional blockchains based on Bitcoin, nodes usually store the entire transaction ledger. The storage overhead for each node of the blockchain to store a transaction is

$$Stor_{Blockchain} = \log_2 \tau + \log_2 p.$$

DSB. Different from the traditional blockchain, DSB uses coding technology to reduce storage overhead, but the node needs to store a private key. The storage overhead for each node of DSB to store a transaction is

$$Stor_{DSB} = 2\log_2 p + \frac{\log_2 \tau}{r+1}.$$

LSS-DSB. Local secret share (LSS) divides secrets into one global secret and many local secrets. The most important information is treated as the global secret. The LSS-based DSB scheme can efficiently store private keys and hash values, which further reduces storage overhead. The storage overhead for each node of LSS-DSB to store a transaction is

$$Stor_{LSS\text{-}DSB} = \log_2 p + \frac{\log_2 \tau}{r}.$$

Our scheme. In our scheme, the node does not need to store additional private keys. The storage overhead of each node storing a transaction is

$$Stor_{Ours} = \log_2 p + \frac{\log_2 \tau}{r+1}.$$

Assuming $p = 2^{400}$, $\tau = 2^{40}$ and $\gamma = 200$, the comparison of storage overhead is shown in Fig. 5. From the figure, we can see that as the shard size increases, the storage cost of the blockchain stays constant, while the storage cost of our scheme decreases and tends to a constant. Among the compared schemes, our scheme has the lowest storage cost.
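The storage comparison can likewise be evaluated from the Stor expressions in Tab. 3. In this sketch the term written log 2 τ/(r + 1) in the table is read as (log2 τ)/(r + 1), which is an interpretation of the flattened notation; the function name is hypothetical and the defaults follow p = 2^400 and τ = 2^40.

```python
def storage_cost(r, log2_p=400, log2_tau=40):
    """Per-node storage cost (in bits) for one transaction, per the Stor row of Tab. 3."""
    return {
        "Blockchain": log2_tau + log2_p,
        "DSB": 2 * log2_p + log2_tau / (r + 1),
        "LSS-DSB": log2_p + log2_tau / r,
        "Our scheme": log2_p + log2_tau / (r + 1),
    }
```

Under this reading, the cost of our scheme approaches log2 p = 400 bits as r grows and stays below that of LSS-DSB for every r ≥ 1, since 1/(r + 1) < 1/r.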