Data security is an essential aspect of data communication and data storage. To provide high-level security against all kinds of unauthorized accesses, cryptographic algorithms have been applied to various fields such as medical and military applications. Advanced Encryption Standard (AES), a symmetric cryptographic algorithm, is acknowledged as the most secure algorithm for the cryptographic process globally. Several modifications have been made to the original architecture after it was proposed by two Belgian researchers, Joan Daemen and Vincent Rijment, at the third AES candidate Conference in 2000. The existing modifications aim to increase security and speed. This paper proposes an efficient pipelined architecture for the key expansion process, effectively reducing the propagation delay to generate the required subkeys. Along with the pipeline structure, the fork and join architecture is also used in the key expansion part of the AES architecture, and it significantly reduces the time taken to generate the required subkeys. The proposed architecture is simulated and implemented on Xilinx Virtex4 XC4VLX200 FPGA. The result indicates that the proposed architecture achieves an improvement of about 37% in throughput compared with the original architecture. Also, the proposed architecture can convert the given plain text to cipher with a minimal propagation delay.
Information is regarded as the most valuable asset today, and a large amount of information is processed and shared per second in this technological world. However, the information must be processed and shared securely to avoid piracy issues or alteration of the information by an unintended user. Cryptographic researches aim to prevent unauthorized access to any information. Also, unauthorized change and availability of the original information motivate the researchers to keep our information securely.
Researchers use cryptographic algorithms to keep information secure while the information is transmitted and received on an insecure channel. Sometimes, information must be secure where it is stored. Cryptographic algorithms perform encryption operations on the original plain text before transmitting and perform decryption operations after receiving the encrypted text from the insecure channel. A secret key is used for both encryption and decryption operations. In terms of the usage of secret keys, cryptographic algorithms can be classified as symmetric and asymmetric algorithms. Also, the algorithms can be classified as block and stream cipher algorithms based on the number of bits. Cryptographic algorithms rely on statistical and analytical techniques that depend on the key of the cryptographic process. Algorithms used for secure communication must withstand the cryptanalytic attacks that aim to find the secret key through the techniques and mathematical operations used to convert the plain text to ciphertext. In 2001, the National Institute of Standards and Technology (NIST) published a symmetric algorithm called Rijndael, and it has been taken as the AES after the Third AES Candidate Conference. AES is available as FIPS 197 in the federal register. Rijndael is designed by two Belgian researchers, Joan Daemen and Vincent Rijment. In the last two decades, researchers have proposed many cryptographic efficient algorithms. However, AES is still the most secure cryptographic algorithm, and it is cost-effective and easy to implement in both hardware and software. AES is a non-Feistel cipher that has the same rounds of operations in both encryption and decryption processes.
In terms of secrecy, the AES cryptographic algorithm is efficient. Also, it has flexibility in both hardware and software implementations. Besides, it executes faster than the other cryptographic algorithms. According to the number of rounds presented in the algorithms, the subkeys required also vary. The generation of these subkeys is a sequential process, and it takes a significant amount of time to generate all the subkeys. This paper proposes an efficient architecture that reduces the time taken to generate the subkeys, and a pipelined architecture is introduced to decrease the amount of propagation delay to generate the subkeys. The proposed architecture can also be implemented in hardware and software effectively. Especially, the algorithm’s security is not compromised since all the originally statistical and mathematical operations are preserved in the proposed architecture. The primary motivation of this design is to provide a high-speed and secure AES algorithm.
The rest of the paper is organized as follows. Section 3 reviews the existing AES algorithms and the mathematical operations involved in the encryption and decryption operations. Section 4 introduces the proposed modifications to the existing algorithm. Section 5 presents the results and discussion of the new design. Section 6 concludes this paper and gives further research work.
AES is a symmetric cryptographic algorithm in which a single secret key is used for both encryption and decryption operations. AES has three different versions, and each version operates at different bit levels of a secret key. Based on the number of bits in the secret key, AES can be classified into AES-128, AES-192, and AES-256 [
AES version | Rounds (Nr) | Key sizes | No of key (Nr+1) |
---|---|---|---|
AES-128 | 10 | 128 | 11 |
AES-192 | 12 | 192 | 13 |
AES-256 | 14 | 256 | 15 |
It should be noted that the data block size is 128 bits for all three versions of the AES algorithm. The number of rounds denotes the encryption operations for a single data block with different subkeys obtained from the key expansion process. The diffusion and confusion process for each round remains the same [
AES is a non-Feistel block cipher cryptographic algorithm that encrypts a 128-bit block of the plain original text into a 128-bit block of encrypted ciphertext based on a secret key. The length of key differs for different versions of the AES algorithm, as mentioned in
Substitution is a process in which a byte is replaced by another byte. The only non-linear process involved in the AES algorithm is substitution. Each byte in the state matrix is considered a polynomial in the Galois Field 28 and transformed into another byte by matrix multiplication and affine transformation. Substitution can also be performed directly through the Rijndael S-box. In decryption, inverse S-box is used in the substitution process. The substitution step introduces the actual confusion to the original data, which forms the vital step in preserving the original data from unauthorized accesses [
In this operation, the rows in the state array are shifted cyclically. The second, third, and fourth rows are respectively shifted left by one, two, and three times, whereas the first row remains unchanged. For decryption operation, rows are shifted to the right. Row shift is the first process of diffusion in the AES algorithm, and this process is invertible [
Like row shift operation, the mix column operation is performed at the column level [
where S’1, S’2, S’3, and S’4 are the output of the mix column operation; S1, S2, S3, and S4 are the output of the row shift operation and are taken as the input of the mix column operation.
Add round key is the most vital operation in the AES algorithm. Because all the previous three steps are invertible, without this operation, AES encryption becomes meaningless. The secret key works at the add round key operation. In this operation, the main key or several subkeys are first arranged as a state matrix with a size of 4 × 4, and matrix addition between keys and plain text is performed. As a result of the add round key operation, the ciphertext is obtained [
All the above four operations are repeated for n times, where n is the number of rounds in the AES version. For every iteration, different subkeys are used. Therefore, for n rounds, n subkeys must be derived. Derivation of several subkeys from a single main key is the key expansion process. The operations in the key expansion process or key schedule must be carefully implemented because the relation between subkeys should not be exploited easily. The key expansion process in the AES algorithm is a non-linear process that follows a mathematical procedure to derive different keys from a single key. Many researchers have developed different architectures for this process to improve the security of the keys obtained. As for the proposed architecture in this paper, the time consumption to generate subkeys from the primary key is reduced.
The key expansion process of AES-128 is discussed in this section. In this process, the main secret key of 128 bits is divided into four words, each with a length of 32 bits. For AES-128, ten subkeys must be derived from the main secret key, and these subkeys are used in the process of producing ciphertext from the plain text. The process of key expansion is illustrated in
Temporary word is the most important in the key expansion process. The only non-linear process involved in the generation of subkeys is the generation of the temporary word. The first word of every subkey is generated as a result of the EXOR operation between its corresponding temporary word and a word from the previous subkey. Three operations are performed in a word to create a temporary word, as shown in
The Rotword operation is the same as the row shift operation, but it is applied to only one row. The Subword operation is similar to the Subbyte operation. The value of Rcon differs for each round and it is listed in
Round | Rcon | Round | Rcon |
---|---|---|---|
1 | (01000000)16 | 6 | (20000000)16 |
2 | (02000000)16 | 7 | (40000000)16 |
3 | (04000000)16 | 8 | (80000000)16 |
4 | (08000000)16 | 9 | (1B000000)16 |
5 | (10000000)16 | 10 | (36000000)16 |
The studies of the efficient implementation of the AES algorithm are going on for the past two decades. Several architectures that are efficient in terms of speed, area, and throughput have been proposed. Zhang et al. [
In the conventional key expansion architecture, all the subkeys are generated sequentially. The interdependency between the subkeys is essential to making efficient cryptanalytic-prone subkeys. Therefore, the interdependency between the subkeys must be preserved. Meanwhile, in the conventional architecture, the words of Wi-1 and Wi-4 are required to create the current word of With. Because of this interdependency, all the required subkeys are generated sequentially. The sequential process is usually time-consuming since the last step is executed only after all the previous steps have been executed. In this case, there is a lot of time delay in generating all the required subkeys. Thus, this research work aims to decrease the time delay required for generating the required subkeys.
The proposed architecture makes two types of modifications to the conventional key expansion structure to reduce the time delay. Each word in the subsequent subkey always depends on a word of the previous subkey, which makes deep confusion necessary. Therefore, the interdependency between subkeys should be preserved to ensure that it is less possible to track the main key with the subkey generated.
In the proposed architecture, the generation of the temporary word differs from that of conventional architectures. In conventional architectures, the last word from the previous subkey is mathematically operated to form the temporary word. Due to this, pipeline architecture is difficult to be implemented. Instead of obtaining the temporary word from the last word of the previous subkey, it is possible to obtain the temporary word from the first word of the previous subkey. Based on this modification, the temporary word can be obtained earlier than that of the conventional architecture. Also, pipeline architecture can be implemented for the subkey generation process to speed up the process.
In the conventional architecture, W8 is dependent on W4 and T8, but T8 is generated from W7. Therefore, it is not possible to generate W8 until W7 is generated. Based on the newly proposed modification, the temporary word W8 can be generated once W4 is available. Since T8 is dependent on W4, it can be generated at the same time as W5, and there is no need to wait until the generation of W7 is completed. Likewise, W9 can also be generated before W7. This effectively speeds up the process of generating subkeys. Based on the circuit with the block structure shown in
In the new proposed architecture, the second modification is splitting the entire architecture into two blocks and executes them in parallel. As for AES-128, ten subkeys are used in ten iterations of the encryption process. Therefore, in the AES-128 architecture, the generation of the first five subkeys can be grouped into one block, and the generation of the remaining five subkeys can be grouped into another block. The first subkey is generated from the main key. The second, third, fourth, and fifth subkeys are generated from the previous subkeys as in the conventional architecture. In the standard AES architecture, the sixth subkey is generated from the previous subkey (the fifth subkey). As shown in
For AES-128, it is necessary to create ten temporary words that form an integral part in forming the first word in all subkeys. Therefore, before employing the fork and join model for the key expansion process, the temporary word for the generation of the first word of the sixth subkey must be carefully created so that it does not depend on the subkeys in the first block. To address this issue, t24 is designed by using the third word of the main secret key. In the proposed architecture, t24 is generated from W2, whereas in existing architecture t24 is generated from W23. Also, the sixth subkey is entirely dependent on the main key instead of the fifth subkey. All the other operations are the same as that of the standard AES temporary word generation process. AES-192 has 12 subkeys because the AES-192 architecture has 12 iterations in both encryption and decryption processes. In AES-192, the entire key expansion architecture can be split into two blocks with each block generating six subkeys. The seventh subkey and seventh temporary word can be derived from the main key instead of the sixth subkey as in the conventional architecture. Likewise, in the AES-256 architecture, each block generates seven subkeys, and the eighth subkey is derived from the main key instead of the seventh subkey. Also, the temporary word of the eighth subkey is derived from the main key.
The proposed key expansion architecture can replace the conventional key expansion architecture without changing the other parts of the AES algorithm. The proposed architecture with a modified key expansion structure for AES-128 is first simulated to verify whether all the subkeys are correctly generated. After verification, this proposed architecture is used in the key expansion process, and the entire AES-128 algorithm is simulated, synthesized, and implemented in Modelsim 10.4d and Xilinx Virtex 4XC4VLX200 FPGA, respectively. Meanwhile, the time consumption to generate the ciphertext of 128 bits is calculated.
Since the final output is different, it is impossible to decrypt the encrypted data of the standard AES algorithm with the proposed algorithm. Therefore, it is necessary to use the same algorithm for both encryption and decryption operations. That is, the receiver must be aware of the modification introduced to the AES algorithm used for encryption. Suppose that the receiver tries to decrypt the ciphertext with the standard AES algorithm and its secret key. In this case, because of the modified subkey values, it is not possible to decrypt correctly. Also, the security is enhanced because even if the intruders obtain the secret key but are not aware of the modifications introduced in the algorithm, the intruder cannot decipher the encrypted data. However, the modifications introduced to the AES algorithm have to be declared to the public since being an open-source algorithm is an important criterion to be considered as AES.
This proposed architecture is implemented in the AES algorithm replacing the conventional architecture for the key expansion process. Because of the multi-stage pipeline architecture, the proposed architecture has the potential to achieve better results than the conventional architecture. Literature [
After the proposed architecture is implemented in the Virtex-4 FPGA, the time cost of the algorithm to encrypt a text is calculated, and the result is compared with that of the existing architecture.
S. No | Method | Slices | Clock period (ns) | Throughput (Mbps) |
---|---|---|---|---|
01 | AES | 1209 | 6.06 | 21,200 |
02 | Pipeline architecture (4 stages) | 3425 | 2.74 | 46,523 |
03 | Pipeline architecture (6 stages) | 3425 | 2.33 | 54,725 |
04 | Sub-pipeline architecture (6 stages) | 1407 | 2.37 | 54,008 |
05 | This work | 3439 | 2.28 | 56,140 |
The throughput efficiency of the proposed architecture is compared to that of the other existing architectures, and the result is listed in
Parameter | AES | Pipeline |
Pipeline |
Sub pipeline |
---|---|---|---|---|
Throughput efficiency | 62.23 | 9.03 | 2.53 | 3.8 |
The proposed architecture can be expanded to AES-192 and AES-256, and similar improvements can be achieved. Security is the most crucial aspect of a cryptographic algorithm. The proposed architecture does not compromise the security of the algorithm because the modification of the existing architecture is made to the process of subkey generation. The basic AES operations of the algorithm remain unchanged. In the proposed architecture, the entire plain text will be processed by different numbers of iterations like that of the standard AES algorithm. The secret key and the length of the subkey used in the proposed architecture also remain the same. Also, in the subkey generation process, the interdependency between the subkey is preserved in each block. So, the security of the proposed algorithm is the same as that of the standard AES.
In this paper, a speed-efficient AES algorithm is proposed and implemented on FPGA. In the proposed architecture, the CFA-based pipeline architecture for key schedule operation is used. Also, the parallel execution of the subkey generation process is implemented in the proposed architecture. The proposed architecture aims to improve the execution speed of the algorithm without compromising the security and the area of the original AES algorithm. The structure is implemented on the Xilinx Vitex-4 FPGA. The analysis indicates that the proposed architecture achieves a higher speed than that of existing architectures with the same area. The critical path delay is reduced by 2.15% by this new pipelined structure. Because of the pipeline architecture and parallel block execution of the subkey generation process, this new architecture achieves an improvement of about 2.53% in throughput. However, the operating frequency can be further increased without affecting the area consumption. Also, some optimization techniques like the pipeline architecture in S-box can be exploited to extend this research work. Besides, the implementation of the basic AES operations can be improved to achieve a better area-efficient algorithm.
We would like to thank TopEdit (www.topeditsci.com) for the English language editing of this manuscript.