AES in CDL Part 3 - Encryption in Python

This is a continuation of my previous post. I finished the encryption portion of my Python implementation of the AES-256 algorithm, and in this post I'll cover some of the details.

AES 256 Encryption in Python

In my last post I left off after the key expansion portion of the algorithm. The next step is to carry out the encryption of the input data. First, the input data is split into a 4x4 matrix called the state matrix. The AES encryption operations work on this matrix. The first 4 bytes of the input data become the first column of the matrix, the next 4 bytes the second column, and so on. I created a function to initialize a two dimensional list with the input data to represent the state matrix, named matrix.

#Generate the initial matrix from the input data
def gen_matrix(data):
    matrix = [[0] * 4 for x in range(4)]
    #Generate the 4x4 input matrix
    row = 3
    col = 3
    for x in range(15,-1,-1):
        matrix[3-row][3-col] =  (data >> (8 * (x))) & 0xFF 
        if x % 4 == 0:
            col -= 1
        row = (row - 1) % 4
    return matrix

Next, the input data is XOR'd with the first of the round keys (round_key[0]). This operation is repeated throughout the encryption process, so I created a function to do it. (The function dumpMatrix() is a utility for printing the state matrix to aid in debugging, it is not a part of the AES process.)

#Add round key operation
def add_round_key(key):
    global matrix
    key_matrix = gen_matrix(key)
    for r in range(4):
        for c in range(4):
            matrix[r][c] = matrix[r][c] ^ key_matrix[r][c]
    dumpMatrix("add round keys")

AES-256 uses 14 rounds of encryption. Each round is the same except for the final round which skips over the mix columns step. Each round uses its respective round key. The loop below carries out the encryption rounds.

for round in range(len(round_keys)-1):
    #Byte substitution
    byte_sub()
    
    #Shift rows
    shift_rows()

    # Final round doesn't include mix columns
    if round < 13:
        #Mix columns
        mix_columns()

    #Add round key
    add_round_key(round_keys[round+1])

The first step in each round is the substitute bytes operation. This operation makes use of the same substitution table, or S-box, as the key schedule. I covered the generation of the S-box in my previous post. Each byte in the matrix is substituted in-place. The S-box operation is non-linear and helps prevent linear analysis of the algorithm among other attacks.

#Perform the byte substitution layer
def byte_sub():
    global matrix
    for r in range(4):
        for c in range(4):
            t = matrix[r][c]
            matrix[r][c] = sbox(matrix[r][c])
    dumpMatrix("Byte sub.")

The next two steps, shift rows and mix columns provide diffusion for the AES algorithm, i.e. ensuring that small changes in the plaintext result in large differences in the ciphertext. The shift rows step consists of leftward rotations of the last three rows of the state matrix. The second row is shifted one place, the third row is shifted two places, and the final row is shifted three places. I wrote a helper function to carry out the rotation of a single row, and a function called shift_rows() to operate on the state matrix.

#Helper function for a leftward rotation of a matrix row
def l_rotate_row(rowNum, shiftCount):
    for x in range(shiftCount):
        global matrix
        temp_byte = matrix[rowNum][0]
        matrix[rowNum][0] = matrix[rowNum][1]
        matrix[rowNum][1] = matrix[rowNum][2]
        matrix[rowNum][2] = matrix[rowNum][3]
        matrix[rowNum][3] = temp_byte

#Shift rows operation
def shift_rows():
    global matrix
    l_rotate_row(1, 1)
    l_rotate_row(2, 2)
    l_rotate_row(3, 3)
    dumpMatrix("shift rows")

The mix columns operation involves multiplying each column of the state matrix by a constant matrix defined by the AES specification. The multiplication is done in the AES finite field defined by the AES irreducible polynomial, P = x^8 + x^4 +x^3 + x + 1. I covered finite field multiplication and addition in my previous post, but in short, the result of the multiplication is reduced modulo P. Addition is carried out in the prime field GF(2), i.e. modulo 2, the equivalent of the XOR operation. To perform the mix columns operation, I created the following function.

#Mix columns operation
def mix_columns():
    global matrix
    for c in range(4):
        col = [ 
                matrix[0][c],
                matrix[1][c],
                matrix[2][c],
                matrix[3][c]
        ]
        col = mmult(col)
        matrix[0][c] = col[0]
        matrix[1][c] = col[1]
        matrix[2][c] = col[2]
        matrix[3][c] = col[3]
    dumpMatrix("mix columns")

This function calls mmult to carry out the matrix multiplication. The multiplication operations in mmult use a helper function for multiplication in GF(2^8). The constant values in the mmult function corresponds to those defined in the constant matrix in the AES specification. In places that the constant is 1, the multiplication operation is skipped.

#Matrix multiplication done in GF(2^8)
def mmult(matb):
    c = [
            None,
            None,
            None,
            None
        ]
    c[0] = gf2mult(2, matb[0]) ^ gf2mult(3, matb[1]) ^ matb[2] ^ matb[3]
    
    c[1] = matb[0] ^ gf2mult(2, matb[1]) ^ gf2mult(3, matb[2]) ^ matb[3]
    
    c[2] = matb[0] ^ matb[1] ^ gf2mult(2, matb[2]) ^ gf2mult(3, matb[3])

    c[3] = gf2mult(3, matb[0]) ^ matb[1] ^ matb[2] ^ gf2mult(2, matb[3])
    
    return c

The function gf2mult is an algorithmic way of multiplying a byte in GF(2^8) using the shift-and-add method, operating on bit vector representations of members of the field. I covered this in more detail in my previous post. Keep in mind, the XOR operation is equivalent to addition (and subtraction) in GF(2^8). The highest bit of the operand x is checked and, if set, indicates that the result of the multiplication exceeds the degree of the irreducible polynomial p and that it must be reduced. This is done by subtracting away the bit vector representation of p via XOR.

#GF(2^8) multiplication using AES irreducible polynomial
def gf2mult(x, y):
    ret = 0
    for i in range(8):
        if (y & 1) != 0:
            ret = ret ^ x
        b = (x & 0x80)
        x = (x << 1) & 0xFF
        if b:
            x = x ^ 0x1B
        y = (y >> 1) & 0xFF
    return ret

The mix columns operation is not included in the final round of encryption as a design choice by in the AES specification because it does not enhance the security of the algorithm.

The final step in the encryption round is XOR'ing the state matrix with the corresponding round key. This operation uses the same add_round_key function described above. Lastly, the ciphertext is reconstructed from the state matrix.

    #Reconstruct the 128-bit cipher text from the matrix
    cipher = 0
    for c in range(4):
        for r in range(4):
            cipher = cipher << 8
            b = matrix[r][c]
            cipher |= b

Testing and Verification

Running the script results in the following ciphertext (given the key and input data below):

Input key:
0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8

Input data:
0xe536638ecbcec0be6ce6a97e98da827b

Cipher text:
0x6034088a2dedde69013d073e8681d21c

As a quick validity check, I verified the output of my implementation against the python PyCrypto library's AES-256 ECB implementation.

from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

def byteLen(x):
    return (x.bit_length() + 7) // 8

#The input key
key = 0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8

#One block of input data
input_data = 0xe536638ecbcec0be6ce6a97e98da827b

print("Input key:\n{}\n".format(hex(key)))
print("Input data:\n{}\n".format(hex(input_data)))

key = key.to_bytes(byteLen(key),'big')
data = input_data.to_bytes(byteLen(input_data),'big')
cipher = AES.new(key,AES.MODE_ECB)
ciphertext = cipher.encrypt(data)

print("Cipher text:\n{}".format(hex(int.from_bytes(ciphertext,'big'))))

Running this script results in the expected output.

Input key:
0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8

Input data:
0xe536638ecbcec0be6ce6a97e98da827b

Cipher text:
0x6034088a2dedde69013d073e8681d21c

Summary

As I've mentioned in previous posts, I've found it very helpful to write code in Python because it is an easy to write/read/run/debug language and it gives me a good baseline to work from in Verilog. The next post will cover the encryption portion of the algorithm in Verilog.