This post is the first in a series about implementing the DES encryption algorithm in CDL. Despite it's obsolesence, the DES algorithm is relatively straightforward to implement in software and I think it will be a good way to dig into learning Verilog. This post covers the first part of the algorithm, generating the round keys from the user provided key. I used this excellent page as a reference for the DES algorithm and for checking my work.

Disclaimer: I am a complete Verilog novice. I'm certain there are bad design/bad practice/bugs in this implementation. Eventually I intend for someone knowledgeable to review and critique my work in an effort to improve.

The Key Schedule in Python

Since I still "think" in software, I decided to implement the DES algorithm in Python. My thought process here is that the exercise of translating the Python operations to Verilog will give me some intuition into the CDL paradigm while also learning syntax/workflow.

DES requires 16 round keys for each round of operations on a given data block. The round keys are derived from a 64-bit user-provided key. The first step is to select 56 bits (excluding every 8th) and permutate them per a table called `PC-1`. I represented `PC-1` as a list and created a routine to construct the 56-bit key, looking up the bit positions into the user provided key (all keys are represented as arrays of characters for ease of printing, this can be changed later on when needed).

``````pc1_table = [
57, 49, 41, 33, 25, 17, 9,
1, 58, 50, 42, 34, 26, 18,
10, 2, 59, 51, 43, 35, 27,
19, 11, 3, 60, 52, 44, 36,
63, 55, 47, 39, 31, 23, 15,
7, 62, 54, 46, 38, 30, 22,
14, 6, 61, 53, 45, 37, 29,
21, 13, 5, 28, 20, 12, 4
]

def pc1(key_arr):
ret = [None] * 56
for x in range(56):
ret[x] = key_arr[pc1_table[x] - 1]
return ret
``````

Calling this routine with the user provided key provides the following 56-bit key:

``````11110000 11001100 10101010 11110101 01010110 01100111 10001111

``````

This key is split into two halves of 28 bits each, `C0` and `D0`. Sixteen D/C pairs are generated from the initial `C0` and `D0`. At each step, key `N` is dervied from key `N-1` by left-rotation per a schedule. I created a left-rotation routine and iterated over the shift table, creating the `C` and `D` key pairs.

``````
shift_table = [
1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1
]

def left_rotate(bit_arr, shift, len):
i_arr = bit_arr.copy()
i_arr.extend(i_arr)
ret = i_arr[shift:shift+len]
return ret

#C0 and D0 28-bit halves of the key
c[0] = key_arr[0:28]
d[0] = key_arr[28:]

#Calculate c1-16 and d1-16
for x in range(1,17):
c[x] = left_rotate(c[x-1], shift_table[x-1], 28)
d[x] = left_rotate(d[x-1], shift_table[x-1], 28)
print("C[{}] = {}".format(x, "".join(c[x])))
print("D[{}] = {}".format(x, "".join(d[x])))
``````

This operation results in the following key pairs:

``````C[1] = 1110000110011001010101011111
D[1] = 1010101011001100111100011110
C[2] = 1100001100110010101010111111
D[2] = 0101010110011001111000111101
C[3] = 0000110011001010101011111111
D[3] = 0101011001100111100011110101
C[4] = 0011001100101010101111111100
D[4] = 0101100110011110001111010101
C[5] = 1100110010101010111111110000
D[5] = 0110011001111000111101010101
C[6] = 0011001010101011111111000011
D[6] = 1001100111100011110101010101
C[7] = 1100101010101111111100001100
D[7] = 0110011110001111010101010110
C[8] = 0010101010111111110000110011
D[8] = 1001111000111101010101011001
C[9] = 0101010101111111100001100110
D[9] = 0011110001111010101010110011
C[10] = 0101010111111110000110011001
D[10] = 1111000111101010101011001100
C[11] = 0101011111111000011001100101
D[11] = 1100011110101010101100110011
C[12] = 0101111111100001100110010101
D[12] = 0001111010101010110011001111
C[13] = 0111111110000110011001010101
D[13] = 0111101010101011001100111100
C[14] = 1111111000011001100101010101
D[14] = 1110101010101100110011110001
C[15] = 1111100001100110010101010111
D[15] = 1010101010110011001111000111
C[16] = 1111000011001100101010101111
D[16] = 0101010101100110011110001111

``````

The `C` and `D` parts are concatenated and permutated using a second permutation table, `PC-2`.

``````#Concatenate the pairs, then perform PC-2 to obtain round keys
for x in range(1,17):
cat = []
cat.extend(c[x])
cat.extend(d[x])
round_keys[x-1] = pc2(cat)
``````

This results in the set of sixteen round keys:

``````K01 = 000110110000001011101111111111000111000001110010
K02 = 011110011010111011011001110110111100100111100101
K03 = 010101011111110010001010010000101100111110011001
K04 = 011100101010110111010110110110110011010100011101
K05 = 011111001110110000000111111010110101001110101000
K06 = 011000111010010100111110010100000111101100101111
K07 = 111011001000010010110111111101100001100010111100
K08 = 111101111000101000111010110000010011101111111011
K09 = 111000001101101111101011111011011110011110000001
K10 = 101100011111001101000111101110100100011001001111
K11 = 001000010101111111010011110111101101001110000110
K12 = 011101010111000111110101100101000110011111101001
K13 = 100101111100010111010001111110101011101001000001
K14 = 010111110100001110110111111100101110011100111010
K15 = 101111111001000110001101001111010011111100001010
K16 = 110010110011110110001011000011100001011111110101

``````

Implementation in CDL

Moving from Python to Verilog, I followed mostly the same organization of operations. I created a module that I wile eventually expand to contain the entire DES algorithm. For the time being, I only added two inputs, a 64-bit key, `key` and a clock signal, `clock`. Eventually I intend to have inputs for the user provided key, a clock signal, an enable signal, 64 bits of plaintext input, and 64 bits of ciphertext output. To make this implementation somewhat useful, I am going to wrap three DES cores to create a 3DES block and connect it up as an AXI peripheral with software-accessible registers.

``````module DES(
input clock,
input [0:63] key
);
``````

I added 56-bit register to hold the key output from the `PC-1` permutation. I created a table to look up the `PC-1` bit positions similarly to my python implementation. I initialized this table in the `initial` block of the module.

``````reg  [0:55] pc1_key;

//Initialize PC1 table
pc1_table[0] = 8'D56;
pc1_table[1] = 8'D48;
pc1_table[2] = 8'D40;
//...
pc1_table[53] = 8'D19;
pc1_table[54] = 8'D11;
pc1_table[55] = 8'D3;
``````

I created an `always` block to assign the `pc1_key` register to the output of a permutation function I added for the `PC-1` permutation.

``````//PC1 permutation function
function [0:55] pc1;
input wire [0:63] key;
integer i;
begin
for(i = 0; i < 56; i=i+1)
begin
pc1[i] = key[pc1_table[i]];
end
end
endfunction

always @(posedge clock)
begin
pc1_key = pc1(key);
end

``````

I created `c0` and `d0` from the pc1_key. I created an array to hold the `c0-16` and `d0-16` components for each iteration of rotations to produce the sixteen key pairs for the round keys.

``````//This array stores the c0-c16 keys
reg [0:27] c [0:17];

//This array stores the d0-d16 keys
reg [0:27] d [0:17];

``````

I represented the shift table as an array as well, initialized in the `initial` section.

``````//Shift count table
reg [1:0] shift_table[0:15];
//...
shift_table[0] = 2'D1;
shift_table[1] = 2'D1;
//...
shift_table[14] = 2'D2;
shift_table[15] = 2'D1;
``````

DES uses left-rotations of 1 or 2 bits. I created a function to perform these rotations.

``````//Left rotate function
function [0:28] left_rotate;
input wire [0:27] bits;
input reg[1:0] shift;
begin
if ( shift == 1)
left_rotate = {bits[1:27], bits[0]};
else
left_rotate = {bits[2:27], bits[0:1]};
end
endfunction
``````

Each `c`/`d` pair is concatenated and run through a second permutation, `PC-2`, that selects 48 bits of the round key pairs and permutates them to produce the 16 round keys. As with `PC-1`, I represented `PC-2` as a permutation table, initialized in the `initial` block, and created a function to perform the permutation.

``````//PC2 Permutation table
reg [7:0] pc2_table[0:47];

//Initialize the PC2 table
pc2_table[0] = 8'D14;
pc2_table[1] = 8'D17;
//...
pc2_table[46] = 8'D29;
pc2_table[47] = 8'D32;

//PC2 permutation function
function [0:47] pc2;
input wire [0:56] key;
integer i;
begin
for(i = 0; i < 48; i=i+1)
begin
pc2[i] = key[pc2_table[i]];
end
end
endfunction
``````

I created an always block to carry out these operations. First, obtaining `c0` and `d0` from `PC-1`, iterating over the shift schedule, creating the rest of the `C`/`D` pairs, concatenating them, and, lastly, calling the `PC-2` permutation function to produce the sixteen round keys. I called the `\$display` system function to print the round keys in simulation.

``````//Permutate the input key into the 56-bit key with PC1
always @(posedge clock)
begin
pc1_key = pc1(key);
end

always @(posedge clock)
begin : key_calc
integer i;
c[0] = pc1_key[0:27];
d[0] = pc1_key[28:55];
for(i = 1; i < 17; i=i+1)
begin
c[i] = left_rotate(c[i-1], shift_table[i-1]);
d[i] = left_rotate(d[i-1], shift_table[i-1]);
round_keys[i-1] = pc2({c[i],d[i]});
\$display("K%02d = %06b %06b %06b %06b %06b %06b %06b %06b",
i,
round_keys[i-1][0:5],
round_keys[i-1][6:11],
round_keys[i-1][12:17],
round_keys[i-1][18:23],
round_keys[i-1][24:29],
round_keys[i-1][30:35],
round_keys[i-1][36:41],
round_keys[i-1][42:47]
);
end

end
``````

Creating a Test Bench and Simulation

To exercise my IP block, I created a test bench file as a simulation source in Vivado. First, I created registers for the input key and a clock signal. I instantiated a DES block and connected the `key` input to the `key` register and the `clock` input to the `clock` register. In the `initial` block, I provided the same 64-bit key I used as an input to my python implementation. Fifty nanoseconds later, I set the `clock` signal high to trigger the DES core. The `\$display` calls in the `always` block above let me debug the output.

```````timescale 1ns / 1ps

module test_bench;

reg [0:63] key = 0;
reg clock = 0;

//UUT
DES uut (
.key(key),
.clock(clock)
);

initial
begin
key =  64'B00010011_00110100_01010111_01111001_10011011_10111100_11011111_11110001;
#5 clock = 1;
#5 \$finish;
end

endmodule
``````

Simulating the model resulted in the expected 16 round keys.

``````Vivado Simulator 2020.1
Time resolution is 1 ps
K01 = 000110 110000 001011 101111 111111 000111 000001 110010
K02 = 011110 011010 111011 011001 110110 111100 100111 100101
K03 = 010101 011111 110010 001010 010000 101100 111110 011001
K04 = 011100 101010 110111 010110 110110 110011 010100 011101
K05 = 011111 001110 110000 000111 111010 110101 001110 101000
K06 = 011000 111010 010100 111110 010100 000111 101100 101111
K07 = 111011 001000 010010 110111 111101 100001 100010 111100
K08 = 111101 111000 101000 111010 110000 010011 101111 111011
K09 = 111000 001101 101111 101011 111011 011110 011110 000001
K10 = 101100 011111 001101 000111 101110 100100 011001 001111
K11 = 001000 010101 111111 010011 110111 101101 001110 000110
K12 = 011101 010111 000111 110101 100101 000110 011111 101001
K13 = 100101 111100 010111 010001 111110 101011 101001 000001
K14 = 010111 110100 001110 110111 111100 101110 011100 111010
K15 = 101111 111001 000110 001101 001111 010011 111100 001010
K16 = 110010 110011 110110 001011 000011 100001 011111 110101

``````

Summary

While I haven't had anyone review this design yet, I am satisfied with producing the expected output. I look forward to continuing with the rest of the DES algorithm and refining my implementation as my Verilog and CDL engineering skills improve.