This is a continuation of my previous posts about AES. In this post, I cover my Verilog implementation of the AES-256 key schedule. My previous post goes more in-depth on the AES algorithm and some of the mathematical basis of the algorithm; this post is focused on the implementation aspect.

AES-256 Key Schedule in Verilog

The first iteration of my AES module generates the round keys required for the AES-256 algorithm. In future iterations, as I add more functionality, I intend to optimize and refine this implementation. Those changes will affect only how the design is built; I intend to adhere strictly to the cryptographic algorithm. For the time being, I generally stuck to mirroring the AES algorithm as the specification describes it, from a structural/architectural standpoint.

The module takes a clock signal, an enable signal, and a key as inputs: clock, enable, and key, respectively. The module outputs a validity signal, valid.

module AES256
    (
        input clock,        //Clock signal
        input enable,       //Enable signal
        input [255:0] key,  //Input key
        output wire valid   //Valid signal
    );

Next, I defined constants for the two states of the enable signal. Following those, I defined bit positions that correspond to stages of the AES process. For now, these are KWORDS_VALID, which indicates that the key has been split into 32-bit words, and KEYS_VALID, which indicates that the round key calculation is complete. I then defined a flag, OP_DONE, which is the value the validity bits take once every stage has finished.

//Enable constants
localparam DISABLE = 1'b0;
localparam ENABLE = 1'b1;

//Initial key words valid
localparam KWORDS_VALID = 0;

//Key scheduling valid bit
localparam KEYS_VALID = 1;

//Flag indicating all operations are valid
localparam OP_DONE = 2'b11;

Next, I added two multi-dimensional look-up tables for the AES S-box substitution and inverse substitution. I then added an array to hold the round constants. Registers to hold the 128-bit round keys and the 60 key words were then added.

//Substitution boxes
reg [7:0] s_box [0:15][0:15];
reg [7:0] inv_s_box[0:15][0:15];

//Round constants
reg [7:0] r_con [0:9];

//Round key storage
reg [127:0] round_keys[0:15];

//Key word storage
reg [31:0] key_words[0:59];
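The contents of the r_con array live in the included initialization file, so they are not shown here. As a rough software cross-check (a Python sketch, not part of the Verilog module; the names xtime and round_constants are my own for illustration), the ten values can be regenerated by repeatedly doubling 0x01 in GF(2^8), reducing by the AES polynomial 0x11B whenever the result overflows a byte.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def round_constants(count):
    """Return the first `count` round constants: successive powers of 2 from 0x01."""
    constants = []
    value = 0x01
    for _ in range(count):
        constants.append(value)
        value = xtime(value)
    return constants


# Prints: 01 02 04 08 10 20 40 80 1b 36
print(" ".join(f"{c:02x}" for c in round_constants(10)))
```

The AES-256 key schedule only consumes the first seven of these, since the G function is applied seven times while expanding to 60 words.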

The next register I added holds state information used to indicate when the steps of the algorithm are completed. When the value of this register is the same as the OP_DONE flag, the output of the algorithm is valid.

//Validity register
reg [1:0] op_valid;

The initial block of my module is a large list of initial values for the S-box and round constant arrays. I chose to include this in a separate file using the Verilog include directive. Next, I added the S-box function. Per the AES algorithm, the high nibble of the byte selects the row and the low nibble selects the column in the lookup table.

//S-Box function
function [7:0] sbox;
    input [7:0] b;
    begin
        sbox = s_box[b[7:4]][b[3:0]];
    end
endfunction

The AES key schedule operates on 32-bit words. The next function I added carries out a byte-wise leftward rotation of a 32-bit word, an operation required by the key generation algorithm. The most significant byte moves into the right-most byte of the return value, and the remaining three bytes are shifted 8 bits to the left.

//byte-wise left rotate of a word
function [31:0] l_rotate_word;
    input reg [31:0] word;
    begin
        //Lower three bytes move up one byte position
        l_rotate_word[31:8] = word[23:0];
        //Most significant byte wraps around to the bottom
        l_rotate_word[7:0] = word[31:24];
    end
endfunction
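For comparison against the simulation, the same rotation can be written in a couple of lines of Python. This is only a sketch for cross-checking; rot_word is a name I picked for illustration and does not appear in the module.

```python
def rot_word(word):
    """Rotate a 32-bit word left by one byte: b0 b1 b2 b3 -> b1 b2 b3 b0."""
    return ((word << 8) & 0xFFFFFFFF) | (word >> 24)


# The last word of the test key used later in this post, rotated once.
# Prints: c3c1d87f
print(f"{rot_word(0x7fc3c1d8):08x}")
```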

The next function, the G function, performs a byte-wise rotation followed by a byte-by-byte S-box substitution on the input word. A round-specific constant from the round constants table is then XORed into the most significant byte of the output.

//AES G function
function [31:0] g;
    input reg [31:0] word;
    input reg [7:0] rc;
    begin
        word = l_rotate_word(word);
        g[31:24] = sbox(word[31:24]) ^ rc;
        g[23:16] = sbox(word[23:16]);
        g[15:8]  = sbox(word[15:8]);
        g[7:0]   = sbox(word[7:0]);
    end
endfunction

Another function used in the key generation scheme, H, performs a simple byte-by-byte substitution on the input word.

//AES H function
function [31:0] h;
    input reg [31:0] word;
    begin
        h[31:24] = sbox(word[31:24]);
        h[23:16] = sbox(word[23:16]);
        h[15:8]  = sbox(word[15:8]);
        h[7:0]   = sbox(word[7:0]);
    end
endfunction
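To sanity-check G and H outside of simulation, here is a small Python model. It is a sketch under a few assumptions: rather than copying the table from the include file, it rebuilds the S-box from its definition (multiplicative inverse in GF(2^8) followed by the AES affine transform), and the helper names gf_mul, build_s_box, sub_byte, and sub_word are mine. The table is indexed by [high nibble][low nibble], the same way the sbox function indexes s_box.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_s_box():
    """Build the S-box as a 16x16 table indexed [high nibble][low nibble]."""
    table = [[0] * 16 for _ in range(16)]
    for value in range(256):
        # Multiplicative inverse by brute force; 0 has no inverse and maps to 0.
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        # Affine transform: inv XOR its left-rotations by 1..4 bits, XOR 0x63.
        out = 0x63
        for shift in range(5):
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        table[value >> 4][value & 0xF] = out
    return table


S_BOX = build_s_box()


def sub_byte(b):
    """Look up one byte: high nibble selects the row, low nibble the column."""
    return S_BOX[b >> 4][b & 0xF]


def sub_word(word):
    """Substitute each of the four bytes of a 32-bit word (the H function)."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sub_byte((word >> shift) & 0xFF) << shift
    return out


def g(word, rc):
    """Rotate left one byte, substitute every byte, XOR rc into the top byte."""
    rotated = ((word << 8) & 0xFFFFFFFF) | (word >> 24)
    return sub_word(rotated) ^ (rc << 24)


h = sub_word

# Last word of the test key with the first round constant.
# Prints: 2f7861d2
print(f"{g(0x7fc3c1d8, 0x01):08x}")
```

XORing that result with the first word of the test key, 97247d91, gives b85c1c43, which is the first word of the third round key in the simulation output further down.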

The first always block in the module is responsible for initializing the first 8 words of the key word array with the user-provided encryption key. This is a simple split of the 256-bit key into 32-bit words, from word0 on the left (MSB) to word7 on the right (LSB). This block only processes the key data when the module is enabled. When it finishes, it sets the KWORDS_VALID bit of the validity register.

//Initial split of the key into words
always @(posedge clock)
begin: key_split_op
    if(enable == ENABLE)
    begin
        //Split the input key into the first 8 words
        key_words[0] = key[255:224];
        key_words[1] = key[223:192];
        key_words[2] = key[191:160];
        key_words[3] = key[159:128];
        key_words[4] = key[127:96];
        key_words[5] = key[95:64];
        key_words[6] = key[63:32];
        key_words[7] = key[31:0];
        //Flag the initial key words as valid
        op_valid[KWORDS_VALID] = 1;
    end
end
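The split itself is just shifting and masking. A short Python sketch of the same word ordering, using the test key from later in this post (split_key is an illustrative name, not something in the module):

```python
def split_key(key):
    """Split a 256-bit integer key into eight 32-bit words, MSB word first."""
    return [(key >> (32 * (7 - i))) & 0xFFFFFFFF for i in range(8)]


key = 0x97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8
for index, word in enumerate(split_key(key)):
    print(f"word{index} = {word:08x}")
```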

The next block carries out the bulk of the key generation algorithm. First I declared some integers to hold loop and index information.

//Round key calculation
always @(posedge clock)
begin: key_sched_op
    integer x;
    integer rconIdx;
    integer keyIdx;
    rconIdx = 0;
    keyIdx = 0;

Next, the module checks whether the key_split_op block is done processing and the first 8 words of the key words array have been initialized with the input key.

if(op_valid[KWORDS_VALID])

The operation derives each new group of 8 key words from the previous group until all 60 key words have been calculated. Each new word is the XOR of the word 8 positions back with the immediately preceding word, and in two cases that preceding word is transformed first. Every 8th word (x % 8 == 0) passes it through the G function, using the next round constant from the round constant table; the lookup index, rconIdx, is incremented after each use. Every other 4th word (x % 8 == 4) passes it through the H function.

        for(x = 8; x < 60; x++)
        begin
            if(x % 8 == 0)
            begin
                //Every 8th word uses the G function
                //Round constants initialized in the initial block
                //with the first ten powers of 2 in GF(2^8)
                key_words[x] = g(key_words[x-1], r_con[rconIdx]) ^ key_words[x-8];
                rconIdx = rconIdx + 1;
            end
            else if(x % 4 == 0)
            begin
                //Every-other 4th word uses the H function
                key_words[x] = h(key_words[x-1]) ^ key_words[x-8];
            end
            else
                //Otherwise use a simple XOR
                key_words[x] = key_words[x-1] ^ key_words[x-8];
        end
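The branch structure of this loop is easy to get off by one, so here is a small Python sketch that only classifies each index, without doing any AES arithmetic. The function name step_kind is mine; the conditions mirror the if / else if chain above, where the x % 4 == 0 branch is only reached when x % 8 == 4.

```python
from collections import Counter


def step_kind(x):
    """Classify key word index x by the transform applied to the previous word."""
    if x % 8 == 0:
        return "g"    # rotate, substitute, XOR round constant
    if x % 4 == 0:
        return "h"    # substitute only; reached only when x % 8 == 4
    return "xor"      # previous word used unchanged


counts = Counter(step_kind(x) for x in range(8, 60))
g_indices = [x for x in range(8, 60) if step_kind(x) == "g"]
# rconIdx starts at 0 and increments once per G call.
rcon_indices = [x // 8 - 1 for x in g_indices]

print(dict(counts))
print(g_indices)
print(rcon_indices)
```

Over the 52 derived words this works out to 7 G calls, 6 H calls, and 39 plain XORs, so round constant indices 0 through 6 are the only ones used.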

Next, the 15 round keys are formed by concatenating groups of 4 words to create 128-bit blocks.

    //Every 4 words form a subkey
    for(x = 0; x < 61; x++)
    begin
        if(x != 0 && (x % 4) == 0)
        begin
            round_keys[keyIdx][127:96] = key_words[x-4];
            round_keys[keyIdx][95:64]  = key_words[x-3];
            round_keys[keyIdx][63:32]  = key_words[x-2];
            round_keys[keyIdx][31:0]   = key_words[x-1];
            keyIdx = keyIdx + 1;
        end
    end
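The grouping can be modeled in Python as well. This is a sketch that assumes the word list length is a multiple of four (60 for AES-256); pack_round_keys is an illustrative name.

```python
def pack_round_keys(words):
    """Concatenate each run of four 32-bit words into one 128-bit round key."""
    keys = []
    for i in range(len(words) // 4):
        w0, w1, w2, w3 = words[4 * i:4 * i + 4]
        keys.append((w0 << 96) | (w1 << 64) | (w2 << 32) | w3)
    return keys


# The first eight key words are the test key itself, so the first two
# round keys are simply its upper and lower halves.
first_words = [0x97247d91, 0xd32fa1f6, 0xbece5da9, 0xbfe61c1a,
               0x3b32edf2, 0x6fd6ec2a, 0x6187ba77, 0x7fc3c1d8]
for number, round_key in enumerate(pack_round_keys(first_words), start=1):
    print(f"{number} - {round_key:032x}")
```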

Finally, when the block is complete, the op_valid register is updated.

    //Flag the round keys as valid
    op_valid[KEYS_VALID] = 1;

The valid signal is assigned to the result of comparing the op_valid register to the OP_DONE flag. This signal will go high if all the blocks have completed processing.

//Set valid signal if the operation is complete
assign valid = (op_valid == OP_DONE);

Another always block invalidates the chain if any of the input parameters change. In this case, only the key is monitored.

//Reset if the input parameters change
always @(key)
begin : reset
    op_valid = 2'b0;
end

Testing and Simulation

I added a temporary block to the module to print the round keys once they are calculated. This block monitors the op_valid register and, when it equals OP_DONE, loops over the round key array and prints each key once.

//TODO: Remove this test block
reg testflag = 1;
always @(posedge clock)
begin : test
    integer x;
    x = 0;
    if(op_valid == OP_DONE)
    begin
        if(testflag)
        begin
            $display("Round keys:");
            for(x = 0; x < 15; x++)
            begin
                $display("%d - %x", x+1, round_keys[x]);
            end
            testflag = 0;
        end
    end
end

I created a test bench to drive the AES module. First, I declared the necessary input registers and output wires.

module test_bench;
    
    //Clock signal
    reg clock = 0;

    //Input key
    reg [255:0] key;

    //Enable signal
    reg enable;

    //Valid signal
    wire valid;

Next, I instantiated the AES block, hooking up the relevant ports.

//AES block under test
AES256 uut (
    .clock(clock),
    .enable(enable),
    .key(key),
    .valid(valid)
);

Lastly, I provide a key, enable the module, drive the clock several times, and display the result of the valid bit.

    //Loop variable
    integer i;
    
    initial
    begin : test
        
        //Set the key
        key = 256'h97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8;
        $display("Input key: \n %x \n", key);
        //Set enabled
        enable = 1;

        //Trigger the clock five times
        for(i = 0; i < 5; i++)
        begin
            #5 clock = 1;
            #5 clock = 0;
        end

        //Verify the valid bit is set
        #5 $display("Valid: %b", valid);
    end

endmodule

Running the test bench in simulation results in the following output.

Input key: 
97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8 

Round keys:
        1 - 97247d91d32fa1f6bece5da9bfe61c1a
        2 - 3b32edf26fd6ec2a6187ba777fc3c1d8
        3 - b85c1c436b73bdb5d5bde01c6a5bfc06
        4 - 390b5d9d56ddb1b7375a0bc04899ca18
        5 - 5428b1113f5b0ca4eae6ecb880bd10be
        6 - f4719733a2ac268495f62d44dd6fe75c
        7 - f8bcfbd0c7e7f7742d011bccadbc0b72
        8 - 6114bc73c3b89af7564eb7b38b2150ef
        9 - 0def24edca08d399e709c8554ab5c327
        10 - b7c192bf747908482237bffba916ef14
        11 - 5a30de3e90380da77731c5f23d8406d5
        12 - 909efdbce4e7f5f4c6d04a0f6fc6a51b
        13 - ce3671965e0e7c31293fb9c314bbbf16
        14 - 6a74f5fb8e93000f48434a002785ef1b
        15 - 19e9de5a47e7a26b6ed81ba87a63a4be
Valid: 1

This output matches the expected output from my Python script that I covered in my previous post.

Summary

I plan to continue implementing the AES module and eventually add wrappers around it for some common block cipher modes. In addition, I plan to keep refining and optimizing my implementation. Eventually I plan to implement a full-featured AES processor peripheral. I will continue to cover my progress in future posts.
