This is a continuation of my previous posts about AES. In this post, I cover my Verilog implementation of the AES-256 key schedule. My previous post goes more in-depth on the AES algorithm and some of the mathematical basis of the algorithm; this post is focused on the implementation aspect.
The first iteration of my AES module generates the round keys required for the AES-256 algorithm. In future iterations, as I add more functionality, I intend to optimize and refine this implementation (only from a development perspective; I intend to adhere strictly to the cryptographic algorithm). For the time being, I generally stuck to mirroring the structure of the AES algorithm as it is described by the specification.
The module takes a clock signal, an enable signal, and a key as inputs: clock, enable, and key, respectively. The module outputs a validity signal, valid.
module AES256
(
input clock, //Clock signal
input enable, //Enable signal
input [255:0] key, //Input key
output wire valid //Valid signal
);
Next, I defined constants for the two states of the enable signal. Following those, I defined bit positions in the validity register that correspond to stages of the AES process. For now, this includes KWORDS_VALID and KEYS_VALID, which indicate when the key has been split into 32-bit words and when the round key calculation is complete, respectively. I then defined a flag, OP_DONE, that indicates when all of the operations are finished.
//Enable constants
localparam DISABLE = 1'b0;
localparam ENABLE = 1'b1;
//Initial key words valid
localparam KWORDS_VALID = 0;
//Key scheduling valid bit
localparam KEYS_VALID = 1;
//Flag indicating all operations are valid
localparam OP_DONE = 2'b11;
Next, I added two two-dimensional look-up tables for the AES S-box substitution and inverse substitution, followed by an array to hold the round constants. I then added registers to hold the 128-bit round keys and the 60 key words (AES-256 uses 15 round keys of four 32-bit words each).
//Substitution boxes
reg [7:0] s_box [0:15][0:15];
reg [7:0] inv_s_box[0:15][0:15];
//Round constants
reg [7:0] r_con [0:9];
//Round key storage
reg [127:0] round_keys[0:15];
//Key word storage
reg [31:0] key_words[0:59];
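The contents of the round constant table can be cross-checked in software. The following is a minimal Python sketch, not the initialization file used by the module; the helper names xtime and round_constants are hypothetical. It assumes the standard AES definition of the round constants as successive powers of x in GF(2^8), reduced by the polynomial 0x11B.

```python
def xtime(b):
    """Multiply a byte by x in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def round_constants(count=10):
    """Return the first `count` round constants: 1, x, x^2, ... in GF(2^8)."""
    rcon = [0x01]
    while len(rcon) < count:
        rcon.append(xtime(rcon[-1]))
    return rcon


R_CON = round_constants()
print(" ".join(f"{rc:02x}" for rc in R_CON))
# 01 02 04 08 10 20 40 80 1b 36
```

The table holds ten entries, but the AES-256 key schedule only consumes the first seven, because the G function runs once per group of 8 words (at words 8, 16, ..., 56).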
The next register I added holds state information used to indicate when the steps of the algorithm are completed. When the value of this register is the same as the OP_DONE flag, the output of the algorithm is valid.
//Validity register
reg [1:0] op_valid;
The initial block of my module is a large list of initial values for the S-box and round constant arrays. I chose to keep this in a separate file and pull it in using the Verilog `include directive. Next, I added the S-box function. Per the AES algorithm, the upper nibble of the input byte selects the row of the lookup table and the lower nibble selects the column.
//S-Box function
function [7:0] sbox;
input [7:0] b;
begin
sbox = s_box[b[7:4]][b[3:0]];
end
endfunction
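To sanity-check the S-box values and the row/column indexing, the table can be regenerated in software rather than typed in by hand. The sketch below is a Python reference model under the standard AES definition (multiplicative inverse in GF(2^8) followed by the affine transform); it is not the initialization file used by the module, and the helper names gf_mul, gf_inv, and sbox_entry are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 == a^-1, since the nonzero elements form a group of order 255
        result = gf_mul(result, a)
    return result


def sbox_entry(b):
    """Field inverse followed by the AES affine transform."""
    inv = gf_inv(b)
    out = inv
    for n in range(1, 5):
        out ^= ((inv << n) | (inv >> (8 - n))) & 0xFF  # 8-bit left rotation by n
    return out ^ 0x63


# 16x16 table laid out like the Verilog array: s_box[row][column]
S_BOX = [[sbox_entry((row << 4) | col) for col in range(16)] for row in range(16)]


def sbox(b):
    """Upper nibble selects the row, lower nibble selects the column."""
    return S_BOX[b >> 4][b & 0xF]


print(f"{sbox(0x53):02x}")  # ed
```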
The AES key schedule operates on 32-bit words. The next function I added carries out a byte-wise leftward rotation of a 32-bit word, an operation required by the key generation algorithm. The most significant byte is moved into the right-most byte of the return value, and the remaining bytes are shifted 8 bits to the left.
//byte-wise left rotate of a word
function [31:0] l_rotate_word;
input reg [31:0] word;
begin
l_rotate_word[31:8] = word[23:0];
l_rotate_word[7:0] = word[31:24];
end
endfunction
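As a quick software cross-check of the rotation, here is a minimal Python equivalent. It is only a sketch of the same operation on a 32-bit integer, reusing the function name from the module for readability.

```python
def l_rotate_word(word):
    """Rotate a 32-bit word left by one byte: [b3 b2 b1 b0] -> [b2 b1 b0 b3]."""
    return ((word << 8) & 0xFFFFFFFF) | (word >> 24)


# Last word of the test key used later in this post
print(f"{l_rotate_word(0x7FC3C1D8):08x}")  # c3c1d87f
```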
The next function, the G function, performs a byte-wise rotation and substitution on the input word. A round-specific constant from the round constants table is then XORed into the most significant byte of the output.
//AES G function
function [31:0] g;
input reg [31:0] word;
input reg [7:0] rc;
begin
word = l_rotate_word(word);
g[31:24] = sbox(word[31:24]) ^ rc;
g[23:16] = sbox(word[23:16]);
g[15:8] = sbox(word[15:8]);
g[7:0] = sbox(word[7:0]);
end
endfunction
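A worked example makes the G function easier to follow. The Python sketch below applies it to word 7 of the test key used later in this post and XORs the result with word 0, which should yield word 8 of the key schedule. To keep it short, it hard-codes only the four standard S-box entries this one input needs (S_BOX_SUBSET is a hypothetical name), so it is an illustration for this specific word rather than a general implementation.

```python
# Only the four standard AES S-box entries needed for this example
S_BOX_SUBSET = {0xC3: 0x2E, 0xC1: 0x78, 0xD8: 0x61, 0x7F: 0xD2}


def g(word, rc):
    """Rotate left by one byte, substitute each byte, XOR rc into the top byte."""
    rotated = ((word << 8) & 0xFFFFFFFF) | (word >> 24)
    out = 0
    for shift in (24, 16, 8, 0):
        out |= S_BOX_SUBSET[(rotated >> shift) & 0xFF] << shift
    return out ^ (rc << 24)


w0 = 0x97247D91  # first word of the test key
w7 = 0x7FC3C1D8  # last word of the test key

w8 = g(w7, 0x01) ^ w0
print(f"{g(w7, 0x01):08x}")  # 2f7861d2
print(f"{w8:08x}")           # b85c1c43
```

The value b85c1c43 is the first word of round key 3 in the simulation output shown at the end of this post.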
Another function used in the key generation scheme, H, performs a simple byte-by-byte substitution on the input word.
//AES H function
function [31:0] h;
input reg [31:0] word;
begin
h[31:24] = sbox(word[31:24]);
h[23:16] = sbox(word[23:16]);
h[15:8] = sbox(word[15:8]);
h[7:0] = sbox(word[7:0]);
end
endfunction
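The H function can be illustrated the same way. This Python sketch applies it to word 11 of the key schedule for the test key used later in this post and XORs the result with word 4, which should give word 12. As an assumption for brevity, it hard-codes only the four standard S-box entries this input needs (S_BOX_SUBSET is a hypothetical name).

```python
# Only the four standard AES S-box entries needed for this example
S_BOX_SUBSET = {0x6A: 0x02, 0x5B: 0x39, 0xFC: 0xB0, 0x06: 0x6F}


def h(word):
    """Substitute each byte of the word; no rotation and no round constant."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= S_BOX_SUBSET[(word >> shift) & 0xFF] << shift
    return out


w4 = 0x3B32EDF2   # fifth word of the test key
w11 = 0x6A5BFC06  # last word of round key 3

w12 = h(w11) ^ w4
print(f"{w12:08x}")  # 390b5d9d
```

The value 390b5d9d is the first word of round key 4 in the simulation output shown at the end of this post.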
The first always block in the module is responsible for initializing the first 8 words of the key word array with the user-provided encryption key. This is a simple split of the 256-bit key into 32-bit words, from word0 on the left (MSB) to word7 on the right (LSB). This block only processes the key data when the module is enabled. When the block is finished, it sets the KWORDS_VALID bit in the validity register.
//Initial split of the key into words
always @(posedge clock)
begin: key_split_op
if(enable == ENABLE)
begin
//Split the input key into the first 8 words
key_words[0] = key[255:224];
key_words[1] = key[223:192];
key_words[2] = key[191:160];
key_words[3] = key[159:128];
key_words[4] = key[127:96];
key_words[5] = key[95:64];
key_words[6] = key[63:32];
key_words[7] = key[31:0];
//Flag the initial key words as valid
op_valid[KWORDS_VALID] = 1;
end
end
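The same split can be expressed in a few lines of Python, which is handy for checking the word order against the Verilog slices. This is only a sketch; split_key is a hypothetical helper, and the key is the test key used later in this post.

```python
KEY = 0x97247D91D32FA1F6BECE5DA9BFE61C1A3B32EDF26FD6EC2A6187BA777FC3C1D8


def split_key(key, nwords=8):
    """Split a 256-bit integer into 32-bit words, most significant word first."""
    return [(key >> (32 * (nwords - 1 - i))) & 0xFFFFFFFF for i in range(nwords)]


key_words = split_key(KEY)
print(f"{key_words[0]:08x}")  # 97247d91, i.e. key[255:224]
print(f"{key_words[7]:08x}")  # 7fc3c1d8, i.e. key[31:0]
```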
The next block carries out the bulk of the key generation algorithm. First I declared some integers to hold loop and index information.
//Round key calculation
always @(posedge clock)
begin: key_sched_op
integer x;
integer rconIdx;
integer keyIdx;
rconIdx = 0;
keyIdx = 0;
Next, the module checks whether the key_split_op block is done processing and the first 8 words of the key words array have been initialized with the input key.
if(op_valid[KWORDS_VALID])
The operation derives each group of 8 key words from the group before it, until all 60 key words have been calculated. Each new word is the XOR of the word 8 positions back with the word immediately before it, and that preceding word is sometimes transformed first. Every 8th word passes the preceding word through the G function. Each call to G uses the corresponding round constant from the round constant table; an index that tracks the round constant lookup, rconIdx, is incremented after each call. Words whose index is a multiple of 4 but not of 8 pass the preceding word through the H function.
for(x = 8; x < 60; x++)
begin
if(x % 8 == 0)
begin
//Every 8th word uses the G function
//Round constants are initialized in the initial block
//with the first ten powers of x in GF(2^8)
key_words[x] = g(key_words[x-1], r_con[rconIdx]) ^ key_words[x-8];
rconIdx = rconIdx + 1;
end
else if(x % 4 == 0)
begin
//Words at a multiple of 4 but not of 8 use the H function
key_words[x] = h(key_words[x-1]) ^ key_words[x-8];
end
else
//Otherwise use a simple XOR
key_words[x] = key_words[x-1] ^ key_words[x-8];
end
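The whole word-generation loop can be modeled in software to cross-check the hardware. The following Python sketch is a reference model written for this purpose, not the Python script from my previous post; all helper names are hypothetical. It assumes the standard AES definitions and derives the S-box algorithmically instead of loading a table.

```python
def xtime(b):
    """Multiply a byte by x in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def build_sbox():
    """Derive the 256-entry S-box: field inverse, then the affine transform."""
    box = [0x63]  # 0 has no inverse; it maps straight to the affine constant
    for value in range(1, 256):
        inv = next(c for c in range(1, 256) if gf_mul(value, c) == 1)
        s = inv
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        box.append(s ^ 0x63)
    return box


S_BOX = build_sbox()


def sub_word(word):
    """Apply the S-box to each byte of a 32-bit word (the H function)."""
    return int.from_bytes(bytes(S_BOX[b] for b in word.to_bytes(4, "big")), "big")


def rot_word(word):
    """Rotate a 32-bit word left by one byte."""
    return ((word << 8) & 0xFFFFFFFF) | (word >> 24)


def expand_key(key):
    """Return the 60 key words of the AES-256 key schedule."""
    words = [(key >> (32 * (7 - i))) & 0xFFFFFFFF for i in range(8)]
    rc = 0x01
    for x in range(8, 60):
        temp = words[x - 1]
        if x % 8 == 0:
            # G function: rotate, substitute, XOR the round constant into the top byte
            temp = sub_word(rot_word(temp)) ^ (rc << 24)
            rc = xtime(rc)
        elif x % 4 == 0:
            # H function: substitution only
            temp = sub_word(temp)
        words.append(temp ^ words[x - 8])
    return words


KEY = 0x97247D91D32FA1F6BECE5DA9BFE61C1A3B32EDF26FD6EC2A6187BA777FC3C1D8
words = expand_key(KEY)
print(" ".join(f"{w:08x}" for w in words[8:12]))
# b85c1c43 6b73bdb5 d5bde01c 6a5bfc06
```

Words 8 through 11 form round key 3 in the simulation output shown at the end of this post.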
Next, the 15 round keys are formed by concatenating groups of 4 words to create 128-bit blocks.
//Every 4 words form a subkey
for(x = 0; x < 61; x++)
begin
if(x != 0 && (x % 4) == 0)
begin
round_keys[keyIdx][127:96] = key_words[x-4];
round_keys[keyIdx][95:64] = key_words[x-3];
round_keys[keyIdx][63:32] = key_words[x-2];
round_keys[keyIdx][31:0] = key_words[x-1];
keyIdx = keyIdx + 1;
end
end
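The grouping step is easy to check in isolation. This Python sketch packs a list of 32-bit words into 128-bit round keys, four words at a time, with the first word of each group in the most significant position. pack_round_keys is a hypothetical helper, and the input here is just the first 8 words of the test key, which form round keys 1 and 2.

```python
def pack_round_keys(words):
    """Concatenate each group of four 32-bit words into one 128-bit round key."""
    keys = []
    for i in range(0, len(words), 4):
        w0, w1, w2, w3 = words[i:i + 4]
        keys.append((w0 << 96) | (w1 << 64) | (w2 << 32) | w3)
    return keys


first_words = [0x97247D91, 0xD32FA1F6, 0xBECE5DA9, 0xBFE61C1A,
               0x3B32EDF2, 0x6FD6EC2A, 0x6187BA77, 0x7FC3C1D8]

for number, round_key in enumerate(pack_round_keys(first_words), start=1):
    print(f"{number} - {round_key:032x}")
# 1 - 97247d91d32fa1f6bece5da9bfe61c1a
# 2 - 3b32edf26fd6ec2a6187ba777fc3c1d8
```

With all 60 key words as input, the same function would return the 15 round keys.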
Finally, when the block is complete, the op_valid register is updated.
//Flag the round keys as valid
op_valid[KEYS_VALID] = 1;
The valid signal is assigned the result of comparing the op_valid register to the OP_DONE flag. This signal goes high once all of the blocks have completed processing.
//Set valid signal if the operation is complete
assign valid = (op_valid == OP_DONE);
Another always block invalidates the chain if any of the input parameters change. In this case, only the key is monitored.
//Reset if the input parameters change
always @(key)
begin : reset
op_valid = 2'b0;
end
I added a temporary block to the module to print the round keys once they are calculated. This block monitors the op_valid register and, when it equals OP_DONE, loops over the round key array and prints each key.
//TODO: Remove this test block
reg testflag = 1;
always @(posedge clock)
begin : test
integer x;
x = 0;
if(op_valid == OP_DONE)
begin
if(testflag)
begin
$display("Round keys:");
for(x = 0; x < 15; x++)
begin
$display("%d - %x", x+1, round_keys[x]);
end
testflag = 0;
end
end
end
I created a test bench to drive the AES module. First, I declared the necessary input registers and output wires.
module test_bench;
//Clock signal
reg clock = 0;
//Input key
reg [255:0] key;
//Enable signal
reg enable;
//Valid signal
wire valid;
Next, I instantiated the AES block and hooked up the relevant ports.
//AES block under test
AES256 uut (
.clock(clock),
.enable(enable),
.key(key),
.valid(valid)
);
Lastly, I provided a key, enabled the module, drove the clock several times, and displayed the value of the valid bit.
//Loop variable
integer i;
initial
begin : test
//Set the key
key = 256'h97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8;
$display("Input key: \n %x \n", key);
//Set enabled
enable = 1;
//Trigger the clock five times
for(i = 0; i < 5; i++)
begin
#5 clock = 1;
#5 clock = 0;
end
//Verify the valid bit is set
#5 $display("Valid: %b", valid);
end
endmodule
Running the test bench in simulation results in the following output.
Input key:
97247d91d32fa1f6bece5da9bfe61c1a3b32edf26fd6ec2a6187ba777fc3c1d8
Round keys:
1 - 97247d91d32fa1f6bece5da9bfe61c1a
2 - 3b32edf26fd6ec2a6187ba777fc3c1d8
3 - b85c1c436b73bdb5d5bde01c6a5bfc06
4 - 390b5d9d56ddb1b7375a0bc04899ca18
5 - 5428b1113f5b0ca4eae6ecb880bd10be
6 - f4719733a2ac268495f62d44dd6fe75c
7 - f8bcfbd0c7e7f7742d011bccadbc0b72
8 - 6114bc73c3b89af7564eb7b38b2150ef
9 - 0def24edca08d399e709c8554ab5c327
10 - b7c192bf747908482237bffba916ef14
11 - 5a30de3e90380da77731c5f23d8406d5
12 - 909efdbce4e7f5f4c6d04a0f6fc6a51b
13 - ce3671965e0e7c31293fb9c314bbbf16
14 - 6a74f5fb8e93000f48434a002785ef1b
15 - 19e9de5a47e7a26b6ed81ba87a63a4be
Valid: 1
This output matches the expected output from my Python script that I covered in my previous post.
I plan to continue implementing the AES module and eventually implement wrappers around it for some common block cipher modes. In addition, I plan to keep refining and optimizing my implementation. Eventually I plan to implement a full-featured AES processor peripheral. I will continue to cover my progress in future posts.