This is a continuation of my previous post, which covers the AES-256 algorithm itself in more detail and left off at the key expansion portion. This post focuses on the implementation: the first iteration of the encryption portion of my Verilog implementation of AES-256.

Encryption in Verilog

Architecture of the Module

The AES algorithm operates on a 4x4 state matrix that contains the input data. There are two initial operations: splitting the input data into the state matrix and adding the initial round key. Next, there are 14 rounds of encryption that operate on the state matrix. Each round consists of a byte substitution, a row rotation, a column mixing operation, and adding the corresponding round key (the final round excludes the mix column operation). Lastly, the state matrix is re-assembled into a 128-bit register. Each of these operations works on the same state matrix and depends on the output of the previous operation. Instead of duplicating the state matrix for every logical operation, I decided to store one state matrix and create a state machine that gives each step consent to operate on it. At the time of writing, I don't know if this is a "good" design or not. It is, admittedly, a non-parallel, software-like design choice. If/when I learn more about design patterns in HDL, I may revisit this. I am happy to have a working encryption scheme and I'm not opposed to refactoring/refining when possible. I will describe the current design in more detail as I go through the module.

Modifications to the Module

First, I modified the module declaration to include a reset signal, reset; a 128-bit data input, data_in; and a 128-bit data output register, data_out.

module AES256
    (
        input clock,             //Clock signal
        input enable,            //Enable signal
        input reset,             //Reset signal
        input [255:0] key,       //Input key
        input [127:0] data_in,   //Input data
        output reg [127:0] data_out, //Output data
        output wire valid        //Valid signal
    );

Next, I enumerated the various consent bits for each step of the process. I will describe how these are used below.

//Init key consent
localparam KEY_INIT = 12'h1;

//Key word consent
localparam KEY_WORDS = 12'h2;

//Key scheduling consent
localparam KEY_SCHED = 12'h4;

//Init matrix consent 
localparam MATRIX_INIT = 12'h8;

//First round key consent
localparam ADD_R0_KEY = 12'h10;

//Sub bytes consent
localparam SUB_BYTE = 12'h20;

//Shift row consent
localparam SHIFT_ROW = 12'h40;

//Mix col consent
localparam MIX_COL = 12'h80;

//Add key consent
localparam ADD_KEY = 12'h100;

//Output data consent
localparam OUT_CONS = 12'h200;

//Validity consent
localparam OP_DONE = 12'h400;

I then added a parameter to hold the number of rounds; in the case of AES256, this value is 14. Below that, I added a register to count the rounds and a consent register to hold the current state of the consent state machine.

//Round count
localparam NUM_ROUNDS = 8'hE;

//Round counter
reg [7:0] round_counter;

//Matrix consent register
reg[10:0] consent_reg;

Next, I added a multidimensional array to represent the state matrix described above.

//State matrix
reg [7:0] matrix[0:3][0:3];

The Consent State Machine

I added an always block to manage the state of the consent state machine. The state machine only gives consent to the other blocks if the operation isn't already finished. I make this check at the top of the block.

    if(consent_reg != OP_DONE)
    begin

If this check clears, I step through a case statement, moving from one consent to the next in the order that the operations occur. (I will go into the implementation of each step later in this post.)

  • KEY_INIT - Performs the initial split of the input key into words
  • KEY_WORDS - Calculates the remaining words that make up the round keys
  • KEY_SCHED - Concatenates the key words into the round keys and stores them for use
  • MATRIX_INIT - Initializes the state matrix with the input data bytes
  • ADD_R0_KEY - Adds the first round key to the state matrix

The variable round_counter is used to track the encryption rounds and acts as an index into the round key array. This value is incremented after the 0th round key is used in the ADD_R0_KEY state.

    case(consent_reg)
        0: consent_reg = KEY_INIT;
        KEY_INIT: consent_reg = KEY_WORDS;
        KEY_WORDS: consent_reg = KEY_SCHED;
        KEY_SCHED: consent_reg = MATRIX_INIT;
        MATRIX_INIT: consent_reg = ADD_R0_KEY;
        ADD_R0_KEY:
            begin
                round_counter += 1;
                consent_reg = SUB_BYTE;
            end
        SUB_BYTE: consent_reg = SHIFT_ROW;

The encryption rounds consist of the next four states:

  • SUB_BYTE - Performs the byte substitution on the state matrix
  • SHIFT_ROW - Performs the row rotations on the state matrix
  • MIX_COL - Performs the column mixing operation on the state matrix
  • ADD_KEY - Adds the current round key to the state matrix

In the SHIFT_ROW state, the transition depends on the encryption round. In the final round of encryption, the MIX_COL state is skipped.

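    SHIFT_ROW:
    begin
        //round_counter runs from 1 to NUM_ROUNDS during the rounds,
        //so only the final round (14) skips MIX_COL
        if(round_counter < NUM_ROUNDS)
            consent_reg = MIX_COL;
        else
            consent_reg = ADD_KEY;
    end
    MIX_COL: consent_reg = ADD_KEY;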

Likewise, in the ADD_KEY state, the transition returns to the byte substitution step, SUB_BYTE, unless the 14 encryption rounds are complete, in which case it proceeds to the OUT_CONS state. In the OUT_CONS state, the state matrix is re-assembled into the final 128-bit field. The OUT_CONS state then transitions to the OP_DONE state. Each ADD_KEY step indicates the completion of one round so the round_counter is incremented.

            ADD_KEY:
            begin
                if(round_counter < 14)
                    consent_reg = SUB_BYTE;
                else
                    begin
                    consent_reg = OUT_CONS;
                    end
                round_counter += 1;
            end
            OUT_CONS: consent_reg = OP_DONE;
        endcase
    end
end
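
The transition order can be sanity-checked with a small software model. The following is a minimal Python sketch of the transitions described above, not the Verilog itself. The function names next_state and run are hypothetical, and the sketch assumes that round_counter is 1 through 14 during the rounds and that only the final round (14) skips MIX_COL.

```python
# Consent bits, mirroring the localparams in the Verilog module
KEY_INIT, KEY_WORDS, KEY_SCHED, MATRIX_INIT = 0x1, 0x2, 0x4, 0x8
ADD_R0_KEY, SUB_BYTE, SHIFT_ROW, MIX_COL = 0x10, 0x20, 0x40, 0x80
ADD_KEY, OUT_CONS, OP_DONE = 0x100, 0x200, 0x400
NUM_ROUNDS = 14

# States whose successor does not depend on the round counter
LINEAR = {
    0: KEY_INIT, KEY_INIT: KEY_WORDS, KEY_WORDS: KEY_SCHED,
    KEY_SCHED: MATRIX_INIT, MATRIX_INIT: ADD_R0_KEY,
    SUB_BYTE: SHIFT_ROW, MIX_COL: ADD_KEY, OUT_CONS: OP_DONE,
}


def next_state(state, round_counter):
    """Return (next consent state, next round counter) for one clock edge."""
    if state == OP_DONE:
        return state, round_counter
    if state == ADD_R0_KEY:
        return SUB_BYTE, round_counter + 1
    if state == SHIFT_ROW:
        # Assumption: only the final round skips the column mixing
        nxt = MIX_COL if round_counter < NUM_ROUNDS else ADD_KEY
        return nxt, round_counter
    if state == ADD_KEY:
        nxt = SUB_BYTE if round_counter < NUM_ROUNDS else OUT_CONS
        return nxt, round_counter + 1
    return LINEAR[state], round_counter


def run():
    """Step from the reset state (0) to OP_DONE and return the states entered."""
    state, round_counter, trace = 0, 0, []
    while state != OP_DONE:
        state, round_counter = next_state(state, round_counter)
        trace.append(state)
    return trace


trace = run()
print(len(trace), trace.count(MIX_COL), trace.count(ADD_KEY))  # 62 13 14
```

Under these assumptions the model needs 62 transitions to reach OP_DONE, entering MIX_COL 13 times and ADD_KEY 14 times.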

The Key Schedule

I covered the key scheduling portion of the algorithm in my previous post. The main difference here is that the operations are broken up into discrete always blocks that wait for consent from the state machine. The underlying implementation of these steps is the same as in that post. For example, the block for the KEY_INIT state, which is responsible for breaking the key into words, checks for consent before operating on the key.

//Initial split of the key into words
always @(posedge clock)
begin: key_split_op
    if(consent_reg == KEY_INIT)
    begin
        //Split the input key into the first 8 words
        key_words[0] = key[255:224];
        key_words[1] = key[223:192];
        key_words[2] = key[191:160];
        key_words[3] = key[159:128];
        key_words[4] = key[127:96];
        key_words[5] = key[95:64];
        key_words[6] = key[63:32];
        key_words[7] = key[31:0];
    end
end

State Matrix Initialization

The MATRIX_INIT state is responsible for splitting the input data into a 4x4 byte matrix. I hardcoded the assignment of the data bytes to their respective positions in the state matrix.

//Initialize the state matrix with the input data
always @(posedge clock)
begin: init_matrix
    if(consent_reg == MATRIX_INIT)
    begin
        matrix[3][3] = data_in[7:0];
        matrix[2][3] = data_in[15:8];
        matrix[1][3] = data_in[23:16];
        //...
        matrix[0][0] = data_in[127:120];
    end
end
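
The byte layout can be illustrated with a minimal Python sketch. It assumes the column-by-column ordering implied by the assignments shown above and by the full list in the round key block (the most significant byte lands in matrix[0][0], the least significant in matrix[3][3]). The function name to_state_matrix is hypothetical.

```python
def to_state_matrix(data_in):
    """Split a 128-bit integer into a 4x4 byte matrix, filled column by column."""
    data_bytes = data_in.to_bytes(16, "big")  # data_bytes[0] is data_in[127:120]
    return [[data_bytes[4 * col + row] for col in range(4)] for row in range(4)]


# The input block used by the test bench in this post
matrix = to_state_matrix(0xe536638ecbcec0be6ce6a97e98da827b)
for row in matrix:
    print(" ".join(f"{b:02x}" for b in row))
# e5 cb 6c 98
# 36 ce e6 da
# 63 c0 a9 82
# 8e be 7e 7b
```

Each group of four consecutive input bytes becomes one column of the state matrix.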

Adding the First Round Key

The ADD_R0_KEY state adds the first round key to the state matrix. This state and the ADD_KEY state are both carried out by the same always block. The two states perform the same action but are identified separately because they have different exit transitions. In the ADD_R0_KEY state, the variable round_counter is 0, so the first (0th) round key is taken from the round_keys array. In a Galois field GF(2^n), addition and subtraction are both equivalent to XOR, so adding the round key reduces to a byte-by-byte XOR of the state matrix with the round key.

//Add the round key
always @(posedge clock)
begin: r0_key
    if(consent_reg == ADD_R0_KEY || consent_reg == ADD_KEY)
    begin
    matrix[3][3] = matrix[3][3] ^ round_keys[round_counter][7:0];
    matrix[2][3] = matrix[2][3] ^ round_keys[round_counter][15:8];
    matrix[1][3] = matrix[1][3] ^ round_keys[round_counter][23:16];
    matrix[0][3] = matrix[0][3] ^ round_keys[round_counter][31:24];
    matrix[3][2] = matrix[3][2] ^ round_keys[round_counter][39:32];
    matrix[2][2] = matrix[2][2] ^ round_keys[round_counter][47:40];
    matrix[1][2] = matrix[1][2] ^ round_keys[round_counter][55:48];
    matrix[0][2] = matrix[0][2] ^ round_keys[round_counter][63:56];
    matrix[3][1] = matrix[3][1] ^ round_keys[round_counter][71:64];
    matrix[2][1] = matrix[2][1] ^ round_keys[round_counter][79:72];
    matrix[1][1] = matrix[1][1] ^ round_keys[round_counter][87:80];
    matrix[0][1] = matrix[0][1] ^ round_keys[round_counter][95:88];
    matrix[3][0] = matrix[3][0] ^ round_keys[round_counter][103:96];
    matrix[2][0] = matrix[2][0] ^ round_keys[round_counter][111:104];
    matrix[1][0] = matrix[1][0] ^ round_keys[round_counter][119:112];
    matrix[0][0] = matrix[0][0] ^ round_keys[round_counter][127:120];
    end
end
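
A minimal Python sketch of the same operation, assuming the round key uses the same column-by-column byte layout as the state matrix. The function name add_round_key and the sample key value are hypothetical and are not taken from the module.

```python
def add_round_key(matrix, round_key):
    """XOR each state byte with the matching byte of a 128-bit round key."""
    key_bytes = round_key.to_bytes(16, "big")  # key_bytes[0] is round_key[127:120]
    return [[matrix[row][col] ^ key_bytes[4 * col + row] for col in range(4)]
            for row in range(4)]


state = [[0x00, 0x11, 0x22, 0x33] for _ in range(4)]
key = 0x000102030405060708090a0b0c0d0e0f  # hypothetical round key for illustration

mixed = add_round_key(state, key)
restored = add_round_key(mixed, key)  # XOR is its own inverse
print(restored == state)  # True
```

Because addition and subtraction coincide, applying the same round key twice restores the original state, which is also why decryption can reuse this step unchanged.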

Encryption Round Step 1. Byte Substitution Layer

The SUB_BYTE state performs a byte-by-byte substitution of the state matrix using the AES S-box. I added an always block to carry out this operation by iterating over each row and column, calling the sbox function on each byte. The sbox function and the S-box table are the same ones used for all other S-box operations in the algorithm. I covered the AES S-box in my previous post about the key scheduling portion of the algorithm.

//S-Box function
function [7:0] sbox;
    input [7:0] b;
    begin
        sbox = s_box[b[7:4]][b[3:0]];
    end
endfunction

//Sub bytes layer
always @(posedge clock)
begin: sub_bytes
    integer row;
    integer col;
    if (consent_reg == SUB_BYTE)
    begin
        for(row = 0; row < 4; row++)
        begin
            for(col=0; col<4; col++)
            begin
                matrix[row][col] = sbox(matrix[row][col]);
            end
        end
    end
end
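
The nibble-indexed lookup can be mirrored in Python. The sketch below rebuilds the table from the standard S-box definition (multiplicative inverse in GF(2^8) followed by the affine transformation) rather than copying the table from the module, and it assumes the table is stored as 16 rows of 16 bytes. The helper names gf_mul, rotl8 and build_s_box are hypothetical.

```python
def gf_mul(x, y):
    """Multiply two bytes in GF(2^8), reducing by the AES polynomial 0x11B."""
    result = 0
    for _ in range(8):
        if y & 1:
            result ^= x
        carry = x & 0x80
        x = (x << 1) & 0xFF
        if carry:
            x ^= 0x1B
        y >>= 1
    return result


def rotl8(value, count):
    """Rotate an 8-bit value left by count bits."""
    return ((value << count) | (value >> (8 - count))) & 0xFF


def build_s_box():
    """Build the S-box as 16 rows of 16 bytes."""
    flat = []
    for b in range(256):
        # 0 has no multiplicative inverse and is mapped to 0 by convention
        inv = 0 if b == 0 else next(x for x in range(1, 256) if gf_mul(b, x) == 1)
        flat.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return [flat[16 * i:16 * i + 16] for i in range(16)]


S_BOX = build_s_box()


def sbox(b):
    """Look up a byte: the high nibble selects the row, the low nibble the column."""
    return S_BOX[b >> 4][b & 0x0F]


print(hex(sbox(0x00)), hex(sbox(0x53)))  # 0x63 0xed
```

The brute-force inverse search is slow but easy to check by eye; a hardware design would keep the precomputed table, as the module does.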

Encryption Round Step 2. Shift Rows

The SHIFT_ROW state performs a row-by-row rotation of the state matrix. The first row remains as-is, the second row is rotated one byte to the left, the third row two bytes to the left, and the fourth row three bytes to the left. I created an always block that carries out the rotation in single-byte steps: for each of rows 1 through 3, the one-byte left rotation is repeated row times.

//shift rows layer
always @(posedge clock)
begin: shift_row
    integer row;
    integer iter;
    reg [7:0] temp_byte;
    if (consent_reg == SHIFT_ROW)
    begin
        for(row = 1; row < 4; row++)
        begin
            for(iter = 0; iter < row; iter++)
            begin
                temp_byte = matrix[row][0];
                matrix[row][0] = matrix[row][1];
                matrix[row][1] = matrix[row][2];
                matrix[row][2] = matrix[row][3];
                matrix[row][3] = temp_byte;
            end
        end
    end
end
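
For comparison, a minimal Python sketch of the same rotation, written once with slicing and once with repeated single-byte steps like the always block above. Both function names are hypothetical.

```python
def shift_rows(matrix):
    """Rotate row r of the state matrix left by r bytes (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(matrix)]


def shift_rows_stepwise(matrix):
    """Same result, built from repeated one-byte left rotations."""
    result = [list(row) for row in matrix]
    for r in range(1, 4):
        for _ in range(r):
            # Move the first byte of the row to the end
            result[r].append(result[r].pop(0))
    return result


state = [[4 * row + col for col in range(4)] for row in range(4)]
print(shift_rows(state))
# [[0, 1, 2, 3], [5, 6, 7, 4], [10, 11, 8, 9], [15, 12, 13, 14]]
```

The stepwise version performs 1 + 2 + 3 = 6 single-byte rotations in total, which matches the nested loops in the Verilog block.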

Encryption Round Step 3. Mix Columns

In the Mix Columns operation, each column of the state matrix is multiplied by a constant matrix defined in the AES specification. The MIX_COL state carries out this matrix multiplication. All of the multiplication and addition operations are performed in GF(2^8). Several elements of the constant matrix are 1, so the corresponding products are just the state bytes themselves and no multiplication is needed for them. The MIX_COL always block loops over each column performing the matrix multiplication.

//Mix columns layer
always @(posedge clock)
begin: mix_col
    integer col;
    reg [7:0] temp_col[0:3];

    if(consent_reg == MIX_COL)
    begin
        for(col = 0; col < 4; col++)
        begin
            temp_col[0] = gf2mult(2, matrix[0][col]) ^ gf2mult(3, matrix[1][col]) ^ matrix[2][col] ^ matrix[3][col];
            temp_col[1] = matrix[0][col] ^ gf2mult(2, matrix[1][col]) ^ gf2mult(3, matrix[2][col]) ^ matrix[3][col];
            temp_col[2] = matrix[0][col] ^ matrix[1][col] ^ gf2mult(2, matrix[2][col]) ^ gf2mult(3, matrix[3][col]);            
            temp_col[3] = gf2mult(3, matrix[0][col]) ^ matrix[1][col] ^ matrix[2][col] ^ gf2mult(2, matrix[3][col]);
            matrix[0][col] = temp_col[0];
            matrix[1][col] = temp_col[1];
            matrix[2][col] = temp_col[2];
            matrix[3][col] = temp_col[3];
        end
    end
end

The function gf2mult is a helper function that performs multiplication in GF(2^8). The function uses the shift-and-add technique for multiplication. Before each left shift, the highest bit of the x operand is checked. If this bit is set, the shift produces a polynomial of degree 8, the same degree as the irreducible polynomial, P, that defines the AES finite field, and, as a result, it must be reduced by subtracting away P. P is represented in hex as 0x11B. Because this multiplication is a single-byte operation and the shift has already discarded the ninth bit, P is truncated to the lower 8 bits, or 0x1B. I described this operation and some of the mathematical basis of it in more detail in my previous post.

//Multiplication in GF(2^8)
function [7:0] gf2mult;
    input reg [7:0] x;
    input reg [7:0] y;
    integer i;
    reg [7:0] b;
    begin
        gf2mult = 0;
        for(i = 0; i < 8; i++)
        begin
            if(y & 1'b1)
            begin
                gf2mult = gf2mult ^ x;
            end
            b = (x & 8'h80);
            x = (x << 1);
            if(b)
                x = x ^ 8'h1B;
            y = (y >> 1);
        end
    end
endfunction
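
The shift-and-add multiplication and the column mixing can be cross-checked with a minimal Python sketch. It follows the same steps as gf2mult and the MIX_COL block but is not a translation of the module. The names gf_mul and mix_column are hypothetical, and the sample column is a commonly used MixColumns test vector rather than data from this post.

```python
def gf_mul(x, y):
    """Multiply two bytes in GF(2^8) using shift-and-add with reduction by 0x1B."""
    result = 0
    for _ in range(8):
        if y & 1:
            result ^= x          # add the current multiple of x
        carry = x & 0x80         # will the shift overflow 8 bits?
        x = (x << 1) & 0xFF      # multiply x by 2, dropping the ninth bit
        if carry:
            x ^= 0x1B            # reduce by the truncated polynomial
        y >>= 1
    return result


def mix_column(col):
    """Multiply one state column by the constant AES MixColumns matrix."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(2, a0) ^ gf_mul(3, a1) ^ a2 ^ a3,
        a0 ^ gf_mul(2, a1) ^ gf_mul(3, a2) ^ a3,
        a0 ^ a1 ^ gf_mul(2, a2) ^ gf_mul(3, a3),
        gf_mul(3, a0) ^ a1 ^ a2 ^ gf_mul(2, a3),
    ]


print([hex(b) for b in mix_column([0xdb, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```

For example, 2 * 0xdb overflows on the shift (0x1b6), so the result is 0xb6 ^ 0x1b = 0xad, which is the reduction step described above.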

Encryption Round Step 4. Add the Round Key

The ADD_KEY state re-uses the same block as the ADD_R0_KEY state.

Producing the Output Data from the State Matrix

The OUT_CONS state is responsible for reassembling the 4x4 state matrix into the 128-bit output register. This is performed by the final always block in the encryption process.

//Construct the output data from the state matrix
always @(posedge clock)
begin : outdata
    integer c;
    integer r;
    reg [7:0] b;
    if(consent_reg == OUT_CONS)
    begin
        for(c = 0; c < 4; c++)
        begin
            for(r = 0; r < 4; r++)
            begin
                data_out = data_out << 8;
                b = matrix[r][c];
                data_out |= b;
            end
        end
    end
end
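
The shift-and-OR reassembly can be sketched in Python as follows. This assumes the same column-by-column byte order used when the matrix was initialized. The function name from_state_matrix is hypothetical, and the mask stands in for the fixed 128-bit width of data_out.

```python
MASK_128 = (1 << 128) - 1


def from_state_matrix(matrix, data_out=0):
    """Pack a 4x4 byte matrix into a 128-bit integer, column by column."""
    for col in range(4):
        for row in range(4):
            # Shift the previous bytes up and append the next byte at the bottom
            data_out = ((data_out << 8) & MASK_128) | matrix[row][col]
    return data_out


matrix = [[0xe5, 0xcb, 0x6c, 0x98],
          [0x36, 0xce, 0xe6, 0xda],
          [0x63, 0xc0, 0xa9, 0x82],
          [0x8e, 0xbe, 0x7e, 0x7b]]
print(hex(from_state_matrix(matrix)))
# 0xe536638ecbcec0be6ce6a97e98da827b
```

After 16 shifts of 8 bits, whatever the register held beforehand has been pushed out entirely, so the result does not depend on the previous contents of data_out.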

Validity Bit

When the encryption process is complete, the consent state machine transitions into the OP_DONE state. If the consent register is in this state, the valid output bit is set to indicate that the output data is valid.

//Set valid signal if the operation is complete
assign valid = (consent_reg == OP_DONE);

Reset

The state machine is reset if the key is changed, input data is changed, or the module receives a reset signal. The always block that performs the reset sets the consent register to the initial state and restarts the round counter.

//Reset if the input parameters change
always @(key, data_in, reset)
begin : reset_op
    consent_reg = 0;
    round_counter = 0;
end

Creating a Test Bench

I created a test bench to drive the AES module. First, I added registers and wires for the module inputs and outputs.

module test_bench;
    //Clock signal
    reg clock = 0;

    //Input key
    reg [255:0] key;

    //Input data
    reg [127:0] data_in;

    //Enable signal
    reg enable;

    //Reset signal
    reg reset;

    //Output data
    wire [127:0] data_out;

    //Valid signal
    wire valid;

    //Valid register
    reg valid_reg;

Next I declared an instance of the AES module, making the appropriate connections.

    //AES block under test
    AES256 uut (
        .clock(clock),
        .enable(enable),
        .reset(reset),
        .key(key),
        .data_in(data_in),
        .data_out(data_out),
        .valid(valid)
    );

I set the key, set the input data, and enabled the module.

    initial
    begin : test
        valid_reg = 0;
        //Set the key
        key = 256'h97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8;
        $display("Input key: \n %x \n", key);
        
        //Set the input data
        data_in = 128'he536638ecbcec0be6ce6a97e98da827b;
        $display("Input data: \n %x \n", data_in);
        
        //Set enabled
        enable = 1;

I triggered the clock in a loop for a maximum of 100 clock cycles. The loop exits early if the validity bit is set indicating that the encryption process has completed. A message is printed if the process times out. The resulting output data, number of clock cycles and the state of the validity bit are printed.

    $display("Starting encryption.");
    //Trigger the clock until the valid bit is set or it times out
    for(i = 0; (i < 100) && (valid_reg != 1); i++)
    begin
        #5 clock = 1;
        #5 clock = 0;
    end
    if(i == 100)
        $display("Timed out!");

    //Verify the valid bit is set
    $display("Clock cycles: %d", i);
    $display("Data out: %x", data_out);
    $display("Valid: %b", valid_reg);

Next, I exercised the key-change reset: I changed the key, triggered the clock, and verified that the validity bit was no longer set.

    //Change the key
    key = key + 1;
    $display("\nSetting a new key: \n %x \n", key);
    
    //Trigger the clock
    #5 clock = 1;
    #5 clock = 0;

    //Verify the validity bit is cleared
    $display("Valid after new key: %d", valid_reg);

After repeating the encryption and printing cycle described above, I tested the reset signal by setting the reset bit and verifying that the validity bit was no longer set.

    //Verify reset works
    $display("\nChecking reset.");
    reset = 1;
    #5 clock = 1;
    #5 clock = 0;
    reset = 0;
    #5 clock = 1;
    #5 clock = 0;
    $display("Valid after reset: %d", valid_reg);

As before, I repeated the encryption and output printing cycle. Lastly, I verified that a change in the input data results in a reset of the AES module before encrypting and printing the output a final time.

    //Verify reset on input data change
    $display("\nChecking reset on data change.");
    data_in = data_in + 1;
    $display("Setting new input data: %x", data_in);

    #5 clock = 1;
    #5 clock = 0;

    $display("Valid after data change: %d", valid_reg);

Executing the test bench results in the following output:

Input key: 
97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8 

Input data: 
e536638ecbcec0be6ce6a97e98da827b 

Starting encryption.
Clock cycles:          63
Data out: 6034088a2dedde69013d073e8681d21c
Valid: 1

Setting a new key: 
97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d9 

Valid after new key: 0
Starting encryption.
Clock cycles:          62
Data out: 1009ddfd3e0fff0ee3d1f63944fc14f4
Valid: 1

Checking reset.
Valid after reset: 0
Starting encryption.
Clock cycles:          62
Data out: 1009ddfd3e0fff0ee3d1f63944fc14f4
Valid: 1

Checking reset on data change.
Setting new input data: e536638ecbcec0be6ce6a97e98da827c
Valid after data change: 0
Starting encryption.
Clock cycles:          62
Data out: 549d22264b0dabc2a9b8f15d27310a80
Valid: 1

Verification

To verify the output of the test bench, I used my Python implementation of AES covered in my last post (itself verified against Python's PyCrypto library). Using the same initial key and input data, I repeated the same set of cryptographic operations.

#The input key
input_key = 0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8

#One block of input data
input_data = 0xe536638ecbcec0be6ce6a97e98da827b

print("First key")
crypt(input_key, input_data)

print("Key 2")
input_key = input_key + 1
crypt(input_key, input_data)

print("Data 2")
input_data = input_data + 1
crypt(input_key, input_data)

The output of this script matches the output of my test bench.


First key
Input key:
0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8

Input data:
0xe536638ecbcec0be6ce6a97e98da827b

Cipher text:
0x6034088a2dedde69013d073e8681d21c
Key 2
Input key:
0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d9

Input data:
0xe536638ecbcec0be6ce6a97e98da827b

Cipher text:
0x1009ddfd3e0fff0ee3d1f63944fc14f4
Data 2
Input key:
0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d9

Input data:
0xe536638ecbcec0be6ce6a97e98da827c

Cipher text:
0x549d22264b0dabc2a9b8f15d27310a80

Summary

While I'm certain there are optimizations, improvements, and potentially a complete refactor that would benefit my implementation, I'm happy with the result. In future posts, I will keep refining this design and working towards a useful, software-accessible AES peripheral.
