

# Investigation of Logic Level Techniques to Improve AES Throughput

<sup>[1]</sup>Megha S Hallikeri, <sup>[2]</sup>Raghuram Srinivasan
 <sup>[1]</sup> Student, <sup>[2]</sup> Associate Professor,
 Dept. of E&C, M. S. Ramaiah Institute of Technology, Bengaluru
 <sup>[1]</sup>meghahallikeri0@gmail.com,<sup>[2]</sup> raghuram@msrit.edu

*Abstract*— Substitution Box(S-Box) is an important integral part of modern cryptographic cipher techniques like Advanced Encryption Standard (AES). There exists a bulk literature devoted to the implementation of AES. Combinational logic of implementation attains high throughput in terms of parameters like speed, area, delay etc. This paper represents the working principle of AES, Novel algorithmic approaches for S-Box. We mainly concentrated on combinational logic implementation of S-Box in order to Improve Area. AES is programmed and executed on Xilinx 14.7 version, Spartan 3 Family, ModelSim Simulator. Cadence 180nm tool is used to verify the parameters. We examine the current work which exploits the mathematical properties of S-Box. Using this technique throughput is increased by 30%. We verified individual modules separately and waveforms are obtained.

Index Terms—Advanced Encryption Standard (AES), Substitution Box(S-Box)

### I. INTRODUCTION

Cryptography plays a vital role in the network security; security issues have become prominent as technology is developed. The National Institute of Standards and Technology(NIST) declared Rijndael Block cipher method as AES algorithm in 2001 [1]. AES performs based on symmetric key algorithm where in encryption and decryption both uses same key. The length of block used for processing is 128 bit, length of key can be varied to multiples of 32 i.e 128,192 or 256bits. From 128-bit key, algorithm generates 10 keys of 128-bit each, which are placed into 4X4 arrays. Simultaneously plain text is divided into 4X4 arrays each are of 128-bits. Each of 126-bit plain text block is processed in 10 rounds (number of rounds varies with key size). After 10<sup>th</sup> round new code is generated. Every individual byte is substituted in an S-Box and replaced by reciprocal on Galois Field (GF). AES algorithm undergoes four transformations namely AddRound key, substitution (S-Box), Mix columns, Shift rows [2].

S-Box is the main and costliest block in AES, several different methods for implementation have been proposed in literature. S-Box is a non-linear substitution step where in individual byte is replaced with another byte according to Lookup table [2]. We compared our results with different algorithms and standard lookup table where lookup table (LUT) results with\_736 cell numbers, area 2772 square meters.\_Combinational logic is prominent and emerging

technology where implementation of S-Box is achieved by operating on Galois Fields (GF), Computation of S-Box using GF results better than direct implementation or using LUT.

The sections of this paper are analyzed as follows. Section II briefs the basic understanding and functionality of transformations in GF while calculating S-Box, section III elaborate the proposed architecture of S-Box using combinational logic, Section IV provides the obtained results and compares with Base paper.

### II. AES WORKING PRINCIPLE



Fig.1 AES basic block diagram



### International Journal of Engineering Research in Electronic and Communication Engineering (IJERECE) Vol 3, Issue 5, May 2016



Fig.2 AES flowchart

As shown in above figure AES mainly perform on input which can be text, image or video. And encryption and decryption are done as shown Fig.1. Detailed procedure of AES is shown in Fig.2.

### a. Add Round Key

Add Round Key is the initial step, where the sub key is combined and performed with state. Sub key is obtained from main key using Rijndael's key. Size of key and state are kept same. Su bkey is combined with each byte of state with corresponding to bitwise Ex-or[1].

### b. Shift Rows

This is the module of AES operates on rows. It operates by shifting each row cyclically, the first row is left unchanged. For shifting certain rows offset is set.  $2^{nd}$ , 3rd rows are shifted by one left shift, two shifts. Shifting rows remains same for different block sizes 128 bits, 192 bits and 256 bits. Row m is shifted left circularly by m-1 bytes[2].

### c. Mix Column

Mix Column step is one of the complicated step in AES, four bytes of each column are combined using linear transformation. The function takes four bytes as input. During operation, each column is transformed using matrix. The matrix retains in all AES operation.

| 2                                     | <b>3</b> | 1 | 1                                                                      |
|---------------------------------------|----------|---|------------------------------------------------------------------------|
| 1                                     | <b>2</b> | 3 | $     \begin{array}{c}       1 \\       3 \\       2     \end{array} $ |
| 1                                     | 1        | 2 | 3                                                                      |
| $\begin{bmatrix} 1\\ 3 \end{bmatrix}$ | 1        | 1 | 2                                                                      |

We are mainly dealing with S-Box and detailed explanation is given in preceding sections.

### III. PRELIMINARIES FOR GF TRANSFORMATION

# a. Isomorphic Mapping and Inverse Isomorphic Mapping of GF:

In order to attain transformation of higher order Galois Fields into lower order, irreducible polynomials are used as given in equation (1)

$$\begin{array}{ccc} GF(2^4) & \longrightarrow & GF(2^8) ; I2(X) = x^2 + x + \lambda; \\ GF(2^2) & \longrightarrow & GF(2^4) ; I1(X) = x^2 + x + \phi; \\ GF(2) & \longrightarrow & GF(2^2) ; I0(X) = x^2 + x + 1; \end{array}$$

Where  $\phi = \{10\} \ \lambda = \{1100\}$ 

(1)

The implementation of S-Box involves 8-bit multiplicative

Inverse module followed by Affine Transform (AT). The inverse SubByte can be generated using Inverse AT. Where both SubByte and Inverse SubByte can be generated together. The bytes are represented in  $GF(2^8)$  are viewed in terms of polynomial forms. All the operations in  $GF(2^8)$  are obtained by taking Modulo-2 operations using corresponding fixed irreducible polynomial p(x).

$$p(x) = x^8 + x^4 + x^3 + x + 1$$
(2)

Inversion in  $GF(2^8)$  has many sublevels. Multiplicative inverse can be calculated by reducing complex  $GF(2^8)$  into lower order fields like  $GF(2^4)$  and  $GF(2^2)$ . Multiplicative inverse can be given as in equation (3).

$$(bx+c)^{-1} = b(b^{2}B+bcA+c^{2})^{-1}x+(c+bA)(b^{2}B+bcA+c^{2})^{-1}$$
 (3)



International Journal of Engineering Research in Electronic and Communication Engineering (IJERECE) Vol 3, Issue 5, May 2016



Fig.3 Multiplcative Inverse module in  $GF(2^4)$ 

| The legends of block diagram are as follows | The legends of | f block diagram | are as follows |
|---------------------------------------------|----------------|-----------------|----------------|
|---------------------------------------------|----------------|-----------------|----------------|

| δ             | Isomorphic mapping to composite fields                        |  |  |  |
|---------------|---------------------------------------------------------------|--|--|--|
|               | in GF                                                         |  |  |  |
| Squarer       | Squaring in GF(2 <sup>4</sup> )                               |  |  |  |
|               | Multiplication inversion                                      |  |  |  |
| $d^* \lambda$ | Multiplication with constant $\lambda$ in GF(2 <sup>4</sup> ) |  |  |  |
| $\oplus$      | Addition operation in GF(2 <sup>4</sup> )                     |  |  |  |
| X             | Multiplication operation in GF(2 <sup>4</sup> )               |  |  |  |
| δ-1           | Inverse isomorphic mapping to GF(2 <sup>8</sup> )             |  |  |  |

### Multiplicative Inverse (MI) in GF

Calculation for the multiplicative inverse in composite fields cannot be directly applied to byte which is based on GF(2<sup>8</sup>). That Byte or element has to be mapped to its Composite field representation by applying to an isomorphic function,  $\delta$ . Followed by performing the MI, the result of MI also have to be mapped back from its composite field to its equivalent in GF(2<sup>8</sup>) via the inverse isomorphic function,  $\delta^{-1}$ . Where  $\delta$  and  $\delta^{-1}$  are represented in an 8x8 matrix. Let q be the element in GF(2<sup>8</sup>), then the isomorphic mapping and its inverse isomorphic mapping can be written as  $\delta^*q$  and  $\delta^{-1} * q$ , which is a obtained by matrix multiplication as shown below, where q7 indicates most significant bit(MSB) and q0 least significant bit(LSB).

| $\delta =$ | 10100000<br>11011110<br>10101100<br>10101110<br>11000110<br>10011110<br>01010010 | $\delta^{-1} =$ | 11100010<br>11011110<br>01100010<br>01110110<br>0011110<br>10011110<br>00110000 | (4) |
|------------|----------------------------------------------------------------------------------|-----------------|---------------------------------------------------------------------------------|-----|
|            | 01000011                                                                         |                 | 01110101                                                                        |     |

### Affine Transformation

The Multiplicative Inverse output has been computed, and affine transformation is carried out on the resulted polynomial[3]. The affine transformation is a simple matrix multiplication and XOR with a constant column matrix. Affine transformation over a finite field i.e. GF  $(2^8)$  can be given as below equation,

$$\alpha = \begin{bmatrix} 11111000\\01111100\\00111110\\00011111\\10001111\\11000111\\11100011\\11100011\\111100011\\x_{0}\end{bmatrix} \begin{pmatrix} x_{7}\\x_{6}\\x_{5}\\x_{4}\\x_{3}\\x_{2}\\x_{1}\\1\\1\\0 \end{bmatrix} (5)$$

#### IV. PROPOSED WORK

Squarer

Equations for squaring a element in  $GF(2^4)$  is taken from [1], where in converting the  $GF(2^4)$  elements into  $GF(2^2)$  by using irreducible polynomial. Hence the Boolean expression can be given as equation (6)

$$d_{3} = b_{3};$$

$$d_{2} = b_{3} \oplus b_{1};$$

$$d_{1} = b_{2} \oplus b_{1}$$

$$d_{0} = b_{3} \oplus b_{1} \oplus b_{0}$$
(6)

# **<u>ERP</u>**

### Multiplication with constant ( $\lambda$ )

Modulo reduction can be performed and simplified by substituting  $x^2 = x + \varphi$  using the irreducible Polynomial. The equations for multiplication with constant are derived in [4]. The polynomial 'd' is result of squarer of polynomial 'b', is multiplied with the constant  $\lambda = \{1 \ 1 \ 0 \ 0\}$ . This operation can be further simplified and 'g' could be achieved as simple Boolean expressions, given in equation (7).

$$g_{3} = d_{2} \oplus d_{0}$$

$$g_{2} = d_{3} \oplus d_{2} \oplus d_{1} \oplus d_{0}$$

$$g_{1} = d_{3}$$

$$g_{0} = d_{2}$$
(7)

### *Multiplication in* $GF(2^4)$

Multiplication in  $GF(2^4)$  is the major block in Combinational logic architecture, which consumes more area and complicated to analyze. Multiplication of polynomial in GF is performed as given in fig 4. Which include three major blocks of multiplication. These multiplication blocks carried out in  $GF(2^2)$ , the equivalent design is given in fig 3. Equations derived for multiplication block in [1], the simplified equation for multiplication in  $GF(2^2)$  can be given as below equation(8)

$$z(1) = x(1)y(0) \oplus x(0)y(1) \oplus x(1)y(1)$$
  

$$z(0) = x(0)y(0) \oplus x(1)y(1)$$
(8)



### Fig.4 Multiplier In $Gf(2^4)$ Embedded With Multipliers In $Gf(2^2)$

According to literature this block is directly implemented as shown in Fig.3, and the above equations are used. Using Ex-or gates leads to increase in area and delay. To overcome this hurdle we tried to simplify the equation and proposed a new structural design as shown in Fig.5



Fig.5 Proposed Architecture for GF (2<sup>2</sup>) Multiplier

Where Ex-or gate is replaced by the normal and, or gates. Which has given overall optimized result. This proposed Multiplication design of  $GF(2^2)$  is used in Fig.4 The results for each block and module are plotted in next section.

### Multiplicative Inverse:

Multiplicative inverse is sub module of combinational logic, can be implemented directly using the standard table as given below. The equations are obtained from [1]

$$q^{-3} = q_3 \oplus q_2 \oplus q_1 \oplus q_3 q_2 \oplus q_3 q_0$$

$$q^{-2} = q_2 \oplus q_2 q_1 \oplus q_3 q_0 \oplus q_3 q_2 q_1 \oplus q_3 q_2 q_0$$

$$q^{-1} = q_3 \oplus q_1 \oplus q_2 \oplus q_2 q_0 \oplus q_3 q_1 q_0 \oplus q_3 q_2 q_1$$

$$q^{-0} = q_0 \oplus q_1 \oplus q_2 \oplus q_2 q_1 \oplus q_3 q_1 \oplus q_3 q_0 \oplus q_3 q_2 q_0 \oplus q_3 q_2 q_1$$
(9)

### V. RESULTS

We worked on many methods to achieve the standard table for S-Box, and calculated varies parameters for each methods. Combinational logic method is the efficient method which attained less area. We tried to optimize each sub module of whole architecture shown in fig.1. Could



### International Journal of Engineering Research in Electronic and Communication Engineering (IJERECE) Vol 3, Issue 5, May 2016

achieve efficient result in inverse delta module calculation, multiplication in  $GF(2^2)$  which in turn enhanced the result of multiplication in  $GF(2^4)$ . The result of multiplicative inverse module is performed on Affine Transform to obtain Substitution box, and performed on Inverse Affine Transform to get inverse substitution box.

| d4bf5d30e0b452 | 2aeb84111f11 | le2798e5 |   |
|----------------|--------------|----------|---|
| 046681e5e0cb19 | 99a48f8d37a2 | 2806264c | 1 |
| 046681e5       |              |          |   |
| e0cb199a       |              |          |   |
| 48f8d37a       |              |          |   |
| 2806264c       |              |          |   |
|                |              |          |   |

Fig.6 MixColumn output wavefroms



| տուփոու  | տիտոփոտ    | ռիռոռիռո      | ախտուիստ         | տիտուիտ      | ЛЛЛ |
|----------|------------|---------------|------------------|--------------|-----|
|          |            |               |                  |              |     |
|          |            |               |                  |              |     |
| 00       |            |               | <u>10</u>        |              |     |
|          |            |               |                  |              |     |
| 53       |            | 6b            | 32               | )ca          |     |
| 00       |            | 52            | )7c              | 74           | 1   |
| 0000000  | 01001011   |               | 00110100         | 00111001     |     |
| 0000000  | 00000101   |               | <u> 10100001</u> | 00010000     |     |
| 01100011 |            | 01101011      | 00110010         | (11001010    |     |
| 0000000  |            | 01010010      | 01111100         | 01110100     |     |
| 0000     |            | (1011         | 0100             | (1001        |     |
| 0000     |            | 0100          | )0011            |              |     |
| 0000     |            | 1001          | 0011             | <u>(1101</u> |     |
| 0000     |            | 0011          | 0101             |              | 1   |
| 0000     | نصعة معداد | (1111         | 0111             | (1010        |     |
| 0000     |            | 0010          | (1100            | 0110         |     |
| 0000     |            | 1001          | (1010            | 0111         |     |
| 0000     |            | (1110         | 0011             | 0011         |     |
| 0000     |            | 0010          | (1101            | (1001        |     |
| 0000     |            | <u>)</u> 1010 | (1100            | (1000        |     |
| 0000     |            | Yonho 1       | 10011            |              |     |

Fig.8 S-Box Output Waveforms In Xilinx

| _  |    |    |    |    |     |  |
|----|----|----|----|----|-----|--|
| 00 | 01 | 02 | 03 | 04 | 05  |  |
| 63 | 70 | 77 | 75 | f2 | ,6b |  |
|    |    |    |    |    |     |  |

Fig.9 S-Box Output Waveforms Of Lut



Fig.10 Synthesized Output Of S-Box In Cadence

| Architecture  | Number   | Area        | Timing(ps) | Total     |
|---------------|----------|-------------|------------|-----------|
|               | of Cells | $(\mu m^2)$ |            | Power(nW) |
| LUT           | 736      | 2772        | 3265       | 36220     |
| S-Box in      | 227      | 1255        | 6467       | 23694     |
| GF(16)        |          |             |            |           |
| Using         | 485      | 1821        | 2114       | 27486     |
| Multiplcative |          |             |            |           |
| Inverse       |          |             |            |           |
| Using         | 167      | 842         | 4375       | 38773     |
| inverse       |          |             |            |           |
| module        |          |             |            |           |
| [1]           | 178      |             |            | 61112     |
|               |          | 3988.35     |            |           |
| Our design    | 152      | 785         | 4068       | 35667     |

## Table 1: Comparison Of Different algorithms

We considered different algorithms in our work. Where lut method is popular, in which the standard calculated table of s-box is implemented directly. Which consumes large hardware and area. S-box in gf(16) is one of architecture we worked on, where gf(256) is converted to gf(16) and multiplicative inverse is found and processed to affine transform and s-box is obtained. By calculating all possible combination of multiplicative inverse and inverse module, results are tabulating and used in programming. Where in our proposed design implemented by using polynomials in gf( $2^8$ ),gf( $2^4$ ) and gf( $2^2$ ) as explained in above sections.

## VI. CONCLUSION

The Results of Our Design Is Verified And tabulated in table.2. Where cell number and area are reduced BY 579,  $1987\mu^2$ . And comparing with base paper [1] cell number and area is improved by 30%. the final results obtained are plotted in fig.7. Cadence 180nm technology is used to calculate the parameters. This work can be performed on asic[1].



From table.1 we conclude that combinational logic implementation of s-box gives more efficient and compact platform.

### REFERENCES

- [1] P.V.S.ShastIy, Anuja Agnihotri, Divya Kachhwaha, Jayasmita Singh, Dr.M.S.Sutaone "A Combinational Logic Implementation of S-box of AES" 978-1-61284-857-0/11/\$26.00 @2011 IEEE
- [2] Xinmiao Zhang, Keshab K. Parhi "Implementation Approaches for the Advanced EncryptionStandard Algorithm" 1531-636X/12/\$10.00©2002IEEE
- [3] J. Daeme, V.Rijmen " AES proposal: Rijndael. NIST AES Proposal" April 2003
- [4] Bahram Rashidi, Bahman Rashidi "Implementation of An Optimized and Pipelined Combinational Logic Rijndael S-Box on FPGA" I. J. Computer Network and Information Security, 2013, 1, 41-48 Published Online January 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijcnis.2013.01.05
- [5] Saleh Abdel-hafeez, Ahmed Sawalmeh, Sameer Bataineh "HIGH PERFORMANCE AES DESIGN USING PIPELINING STRUCTURE OVER GF((2<sup>4</sup>)<sup>2</sup>)" 2007 IEEE International Conference on Signal Processing and Communications (ICSPC 2007), 24-27 November 2007, Dubai, United Arab Emirates