This is a continuation of my previous post. I finished the encryption portion of my Python implementation of the AES-256 algorithm, and in this post I'll cover some of the details.

In my last post I left off after the key expansion portion of the algorithm. The next step is to carry out the encryption of the input data. First, the input data is split into a 4x4 matrix called the `state matrix`

. The AES encryption operations work on this matrix. The first 4 bytes of the input data become the first column of the matrix, the next 4 bytes the second column, and so on. I created a function to initialize a two dimensional list with the input data to represent the `state matrix`

, named `matrix`

.

```
#Generate the initial matrix from the input data
def gen_matrix(data):
matrix = [[0] * 4 for x in range(4)]
#Generate the 4x4 input matrix
row = 3
col = 3
for x in range(15,-1,-1):
matrix[3-row][3-col] = (data >> (8 * (x))) & 0xFF
if x % 4 == 0:
col -= 1
row = (row - 1) % 4
return matrix
```

Next, the input data is XOR'd with the first of the round keys (`round_key[0]`

). This operation is repeated throughout the encryption process, so I created a function to do it. (The function `dumpMatrix()`

is a utility for printing the state matrix to aid in debugging, it is not a part of the AES process.)

```
#Add round key operation
def add_round_key(key):
global matrix
key_matrix = gen_matrix(key)
for r in range(4):
for c in range(4):
matrix[r][c] = matrix[r][c] ^ key_matrix[r][c]
dumpMatrix("add round keys")
```

AES-256 uses 14 rounds of encryption. Each round is the same except for the final round which skips over the `mix columns`

step. Each round uses its respective round key. The loop below carries out the encryption rounds.

```
for round in range(len(round_keys)-1):
#Byte substitution
byte_sub()
#Shift rows
shift_rows()
# Final round doesn't include mix columns
if round < 13:
#Mix columns
mix_columns()
#Add round key
add_round_key(round_keys[round+1])
```

The first step in each round is the `substitute bytes`

operation. This operation makes use of the same substitution table, or S-box, as the key schedule. I covered the generation of the S-box in my previous post. Each byte in the matrix is substituted in-place. The S-box operation is non-linear and helps prevent linear analysis of the algorithm among other attacks.

```
#Perform the byte substitution layer
def byte_sub():
global matrix
for r in range(4):
for c in range(4):
t = matrix[r][c]
matrix[r][c] = sbox(matrix[r][c])
dumpMatrix("Byte sub.")
```

The next two steps, `shift rows`

and `mix columns`

provide *diffusion* for the AES algorithm, i.e. ensuring that small changes in the plaintext result in large differences in the ciphertext. The `shift rows`

step consists of leftward rotations of the last three rows of the state matrix. The second row is shifted one place, the third row is shifted two places, and the final row is shifted three places. I wrote a helper function to carry out the rotation of a single row, and a function called `shift_rows()`

to operate on the state matrix.

```
#Helper function for a leftward rotation of a matrix row
def l_rotate_row(rowNum, shiftCount):
for x in range(shiftCount):
global matrix
temp_byte = matrix[rowNum][0]
matrix[rowNum][0] = matrix[rowNum][1]
matrix[rowNum][1] = matrix[rowNum][2]
matrix[rowNum][2] = matrix[rowNum][3]
matrix[rowNum][3] = temp_byte
#Shift rows operation
def shift_rows():
global matrix
l_rotate_row(1, 1)
l_rotate_row(2, 2)
l_rotate_row(3, 3)
dumpMatrix("shift rows")
```

The `mix columns`

operation involves multiplying each column of the state matrix by a constant matrix defined by the AES specification. The multiplication is done in the AES finite field defined by the AES irreducible polynomial, `P = x^8 + x^4 +x^3 + x + 1`

. I covered finite field multiplication and addition in my previous post, but in short, the result of the multiplication is reduced modulo `P`

. Addition is carried out in the prime field `GF(2)`

, i.e. modulo 2, the equivalent of the XOR operation. To perform the `mix columns`

operation, I created the following function.

```
#Mix columns operation
def mix_columns():
global matrix
for c in range(4):
col = [
matrix[0][c],
matrix[1][c],
matrix[2][c],
matrix[3][c]
]
col = mmult(col)
matrix[0][c] = col[0]
matrix[1][c] = col[1]
matrix[2][c] = col[2]
matrix[3][c] = col[3]
dumpMatrix("mix columns")
```

This function calls `mmult`

to carry out the matrix multiplication. The multiplication operations in `mmult`

use a helper function for multiplication in GF(2^8). The constant values in the `mmult`

function corresponds to those defined in the constant matrix in the AES specification. In places that the constant is `1`

, the multiplication operation is skipped.

```
#Matrix multiplication done in GF(2^8)
def mmult(matb):
c = [
None,
None,
None,
None
]
c[0] = gf2mult(2, matb[0]) ^ gf2mult(3, matb[1]) ^ matb[2] ^ matb[3]
c[1] = matb[0] ^ gf2mult(2, matb[1]) ^ gf2mult(3, matb[2]) ^ matb[3]
c[2] = matb[0] ^ matb[1] ^ gf2mult(2, matb[2]) ^ gf2mult(3, matb[3])
c[3] = gf2mult(3, matb[0]) ^ matb[1] ^ matb[2] ^ gf2mult(2, matb[3])
return c
```

The function `gf2mult`

is an algorithmic way of multiplying a byte in GF(2^8) using the shift-and-add method, operating on bit vector representations of members of the field. I covered this in more detail in my previous post. Keep in mind, the XOR operation is equivalent to addition (and subtraction) in GF(2^8). The highest bit of the operand `x`

is checked and, if set, indicates that the result of the multiplication exceeds the degree of the irreducible polynomial `p`

and that it must be reduced. This is done by subtracting away the bit vector representation of `p`

via XOR.

```
#GF(2^8) multiplication using AES irreducible polynomial
def gf2mult(x, y):
ret = 0
for i in range(8):
if (y & 1) != 0:
ret = ret ^ x
b = (x & 0x80)
x = (x << 1) & 0xFF
if b:
x = x ^ 0x1B
y = (y >> 1) & 0xFF
return ret
```

The `mix columns`

operation is not included in the final round of encryption as a design choice by in the AES specification because it does not enhance the security of the algorithm.

The final step in the encryption round is XOR'ing the state matrix with the corresponding round key. This operation uses the same `add_round_key`

function described above. Lastly, the ciphertext is reconstructed from the state matrix.

```
#Reconstruct the 128-bit cipher text from the matrix
cipher = 0
for c in range(4):
for r in range(4):
cipher = cipher << 8
b = matrix[r][c]
cipher |= b
```

Running the script results in the following ciphertext (given the key and input data below):

```
Input key:
0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8
Input data:
0xe536638ecbcec0be6ce6a97e98da827b
Cipher text:
0x6034088a2dedde69013d073e8681d21c
```

As a quick validity check, I verified the output of my implementation against the python PyCrypto library's AES-256 ECB implementation.

```
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
def byteLen(x):
return (x.bit_length() + 7) // 8
#The input key
key = 0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8
#One block of input data
input_data = 0xe536638ecbcec0be6ce6a97e98da827b
print("Input key:\n{}\n".format(hex(key)))
print("Input data:\n{}\n".format(hex(input_data)))
key = key.to_bytes(byteLen(key),'big')
data = input_data.to_bytes(byteLen(input_data),'big')
cipher = AES.new(key,AES.MODE_ECB)
ciphertext = cipher.encrypt(data)
print("Cipher text:\n{}".format(hex(int.from_bytes(ciphertext,'big'))))
```

Running this script results in the expected output.

```
Input key:
0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8
Input data:
0xe536638ecbcec0be6ce6a97e98da827b
Cipher text:
0x6034088a2dedde69013d073e8681d21c
```

As I've mentioned in previous posts, I've found it very helpful to write code in Python because it is an easy to write/read/run/debug language and it gives me a good baseline to work from in Verilog. The next post will cover the encryption portion of the algorithm in Verilog.