In this post, I cover my first attempts at working with the AXI-4 protocol to move data between different parts of the system; specifically, between the DDR memory and the programmable logic on my Zynq7000 development board. This topic is a necessary, missing piece of my cryptography implementations. (Not to mention that I've become aware of numerous flaws and inefficiencies in my previous cryptography blocks that I will have to revisit in future posts.) I began with Xilinx's AXI-4 Master example code and modified it to provide a memory-like interface to my IP block. I acknowledge that this design (4-byte, single-burst-only) underutilizes the AXI protocol. However, it was a good starting point for me to get more experience with the AXI protocol and Xilinx's templates/verification IP.

The AXI-4 Protocol

I'm not going to cover the AXI protocol in detail because I'm far from an expert. I've utilized several web resources (references below) and Xilinx's auto-generated AXI-4 master implementation to create my memory-like interface. Briefly though, the AXI-4 protocol consists of separate address and data channels for both read and write operations, plus a write response channel. Two devices, master and slave, perform a handshake on each channel using ready and valid signals. An address is provided across the address channel, and the data is transferred in bursts of words. The AXI-4 protocol supports burst lengths from 1 to 256 data transfers (beats) for incrementing bursts.
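As a rough illustration of the handshake, the rule can be modeled in a few lines of C: a beat moves on a channel only in a clock cycle where VALID and READY are both high. This is just a sketch for reasoning about traces, not part of the design; all function names are hypothetical. It also shows how AXI4 encodes burst length in the AxLEN field (beats minus one), which is why a single-beat transfer uses a length field of 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A beat transfers only when the sender's VALID and the receiver's
 * READY are both high in the same clock cycle. */
bool axi_beat_transfers(bool valid, bool ready)
{
    return valid && ready;
}

/* Count completed beats in a recorded trace with one VALID/READY
 * sample per clock cycle. Hypothetical helper for illustration. */
size_t axi_count_beats(const bool *valid, const bool *ready, size_t cycles)
{
    size_t beats = 0;
    for (size_t i = 0; i < cycles; i++) {
        if (axi_beat_transfers(valid[i], ready[i])) {
            beats++;
        }
    }
    return beats;
}

/* AXI4 encodes burst length as (beats - 1) in the 8-bit AxLEN field.
 * The caller is expected to pass a value from 1 to 256. */
uint8_t axi_axlen_for_beats(unsigned beats)
{
    return (uint8_t)(beats - 1u);
}
```

Note that VALID being high while READY is low simply stalls the channel; nothing is transferred until both line up.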

Memory-like Interface for AXI-4

My design simplifies the AXI-4 transfer process into a single 32-bit word read or write. One side of the block is a full AXI-4 master interface. The other side is a block-RAM-like interface. The memory interface consists of an address register (AIF_ADDR), data output register (AIF_RDATA), data input register (AIF_WDATA), write enable bit (AIF_WEN), enable bit (AIF_EN) and done/error signals (AIF_DONE, AIF_ERR). To read a word, a client sets the write enable bit low, writes the desired address to the address register, and enables the module. The AXI-4 master initiates a transfer from the slave, populates the read data register, and asserts the AIF_DONE signal. To write a word, a client sets the write enable bit high, populates the write value, writes the target address to the address register, and enables the module. The AXI-4 master initiates the transfer to the slave and asserts the AIF_DONE signal when the write is complete.
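To make the read and write sequences concrete, here is a small behavioral model in C. It is a sketch under several assumptions: a plain array stands in for the AXI slave, the struct and function names are hypothetical, and the model reports an error for unaligned or out-of-range addresses, which is a choice made for the model rather than something the hardware is known to do. The error values 1 (write) and 2 (read) follow the codes used later in this post.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_WORDS 16u  /* hypothetical size of the backing store */

typedef struct {
    uint32_t addr;   /* AIF_ADDR: byte address of the word */
    uint32_t wdata;  /* AIF_WDATA: value written when wen is set */
    bool     wen;    /* AIF_WEN: true = write, false = read */
} aif_request;

typedef struct {
    uint32_t rdata;  /* AIF_RDATA: word returned by a read */
    bool     done;   /* AIF_DONE */
    uint32_t err;    /* AIF_ERR: 0 = OK, 1 = write error, 2 = read error */
} aif_result;

/* Perform one single-word transaction against the model memory. */
aif_result aif_transact(uint32_t *mem, aif_request req)
{
    aif_result res = { .rdata = 0, .done = true, .err = 0 };
    uint32_t index = req.addr / 4u;

    /* Model assumption: reject unaligned or out-of-range addresses. */
    if ((req.addr % 4u) != 0u || index >= MODEL_WORDS) {
        res.err = req.wen ? 1u : 2u;
        return res;
    }

    if (req.wen) {
        mem[index] = req.wdata;
    } else {
        res.rdata = mem[index];
    }
    return res;
}
```

A client of the real block follows the same pattern: set up the request fields, enable the module, then wait for done and check the error code before trusting the read data.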

Xilinx AXI-4 Boilerplate-based Implementation

I created my design using the Xilinx "AXI-4 Full Master" peripheral template. To begin, I created a new project and used the "create and package new IP" menu option to create a new AXI-4 peripheral.

First, I modified the configurable parameter for the AXI burst length, C_M00_AXI_BURST_LEN. My design will only support single-burst transfers, so I restricted this field to a value of 1 and hid it from the Vivado GUI customization menu.

Modifications to the Xilinx Template

In the top-level wrapper, I started by changing the default of the AXI burst length parameter to 1 (a single burst).

    // Parameters of Axi Master Bus Interface M00_AXI
    parameter integer C_M00_AXI_BURST_LEN   = 1,

Next, I added the ports as described above.

    input [31:0] AIF_ADDR,
    //Data out (read), set to value of M_AXI_RDATA
    output wire [31:0] AIF_RDATA,
    //Data in (write)
    input [31:0] AIF_WDATA,
    //Write-enable (R or W mode)
    input AIF_WEN,
    //Indicates operation is complete, replaces TXN_DONE
    output wire AIF_DONE,
    //Indicates an error occurred, replaces ERROR
    output wire [31:0] AIF_ERR,
    //Enable transactions, replaces INIT_AXI_TXN
    input AIF_EN,

I removed three signals from the Xilinx template that aren't needed: m00_axi_init_axi_txn, m00_axi_txn_done and m00_axi_error. I then connected the new AIF_* ports to the instance of the AXI module.

axi_iface_v1_0_M00_AXI_inst (
        .AIF_ADDR(AIF_ADDR),
        .AIF_RDATA(AIF_RDATA),
        .AIF_WDATA(AIF_WDATA),
        .AIF_WEN(AIF_WEN),
        .AIF_DONE(AIF_DONE),
        .AIF_ERR(AIF_ERR),
        .AIF_EN(AIF_EN),

In the underlying module, I made the same change to the burst length parameter, added the new ports, and removed the same unused signals. I changed the C_NO_BURSTS_REQ local parameter to reflect the single-only burst length.

    // Number of bursts is always 1
     localparam integer C_NO_BURSTS_REQ = 1;

The default template state machine, when enabled, performs a sequential write operation, reads the values back, and compares the read values to the expected values. I modified the state machine to perform a read or write (depending on the write enable signal) and then wait for the operation to complete before indicating either "done" or "error". I created a new state for this, CHECK_DONE.

    parameter [1:0] IDLE = 2'b00, // This state waits for a 0 to 1 transition
        // on AIF_EN, then changes state to INIT_WRITE or INIT_READ
        // depending on AIF_WEN
        INIT_WRITE   = 2'b01, // This state initializes write transaction,
        // once writes are done, the state machine 
        // changes state to CHECK_DONE 
        INIT_READ = 2'b10, // This state initializes read transaction
        // once reads are done, the state machine 
        // changes state to CHECK_DONE
        CHECK_DONE = 2'b11; // This state issues the status of the operation, replaces INIT_COMPARE
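The transitions between these four states can be summarized with a small next-state function in C. This is only a sketch of the control flow as described in this post (IDLE to INIT_WRITE or INIT_READ on the enable pulse, then CHECK_DONE once the transfer finishes, then back to IDLE); the function name and argument list are hypothetical and it leaves out the burst start signals that the template code drives.

```c
#include <stdbool.h>

typedef enum {
    IDLE       = 0,
    INIT_WRITE = 1,
    INIT_READ  = 2,
    CHECK_DONE = 3
} mst_state;

/* Compute the next state from the current state and the sampled inputs. */
mst_state mst_next_state(mst_state state, bool init_txn_pulse, bool aif_wen,
                         bool writes_done, bool reads_done)
{
    switch (state) {
    case IDLE:
        if (!init_txn_pulse) {
            return IDLE;
        }
        return aif_wen ? INIT_WRITE : INIT_READ;
    case INIT_WRITE:
        return writes_done ? CHECK_DONE : INIT_WRITE;
    case INIT_READ:
        return reads_done ? CHECK_DONE : INIT_READ;
    case CHECK_DONE:
    default:
        /* Status is reported in CHECK_DONE, then the machine idles. */
        return IDLE;
    }
}
```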

I replaced the single-bit error register with a 32-bit register, error_reg, to support error codes. I removed two registers, compare_done and read_mismatch from the template code. I modified the template's write address wire, M_AXI_AWADDR, and read address wire, M_AXI_ARADDR, to use the AIF_ADDR input signal.

    //The AXI address is the target base address plus the AIF_ADDR offset
    assign M_AXI_AWADDR = C_M_TARGET_SLAVE_BASE_ADDR + AIF_ADDR;
    ...
    assign M_AXI_ARADDR = C_M_TARGET_SLAVE_BASE_ADDR + AIF_ADDR;

I assigned the AIF_DONE signal to either the writes_done or reads_done register, depending on the mode.

assign AIF_DONE =  (AIF_WEN == 1) ? writes_done : reads_done;

The template includes an always block to initiate the AXI transaction. I modified it to utilize my AIF_EN signal.

    //Generate a pulse to initiate AXI transaction.
    always @(posedge M_AXI_ACLK)                                              
      begin                                                                        
        // Initiates AXI transaction delay    
        if (M_AXI_ARESETN == 0 )                                                   
          begin                                                                    
            init_txn_ff <= 1'b0;                                                   
            init_txn_ff2 <= 1'b0;                                                   
          end                                                                               
        else                                                                       
          begin  
            init_txn_ff <= AIF_EN;
            init_txn_ff2 <= init_txn_ff;                                                                 
          end                                                                      
      end     
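The two flip-flops form a rising-edge detector. Assuming the template derives init_txn_pulse as init_txn_ff && !init_txn_ff2 (that assignment is not shown in this post, so treat it as an assumption), the behavior can be sketched in C as below. The struct and function names are hypothetical; each call models one rising clock edge.

```c
#include <stdbool.h>

typedef struct {
    bool ff;   /* models init_txn_ff  */
    bool ff2;  /* models init_txn_ff2 */
} edge_detector;

/* Model one clock edge and return the pulse value visible afterwards. */
bool edge_detector_clock(edge_detector *d, bool aif_en)
{
    /* Non-blocking semantics: both registers load from the old values. */
    bool next_ff  = aif_en;
    bool next_ff2 = d->ff;

    d->ff  = next_ff;
    d->ff2 = next_ff2;

    /* High for exactly one cycle after AIF_EN goes from 0 to 1. */
    return d->ff && !d->ff2;
}
```

Holding AIF_EN high therefore starts only one transaction; it has to return low and rise again to start the next one.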

I removed the always blocks that generated the sequential write and read addresses required by the example logic. I also removed the always block that compared the read-back data to the expected values. Another always block assigned and incremented the AXI write data for the example logic. I modified this block to assign the externally provided write data, AIF_WDATA.

    /* Set the write data to the input write data.
       Non-blocking assignment, as is conventional in a clocked block. */
    always @(posedge M_AXI_ACLK)
    begin
        axi_wdata <= AIF_WDATA;
    end

I modified an error-checking always block to populate my error register in the event of a failed read or write, based on the AXI read/write response signals.

      always @(posedge M_AXI_ACLK)
      begin
        if (M_AXI_ARESETN == 0 || init_txn_pulse == 1'b1)
          begin
            // Clear the error code on reset or at the start of a transaction
            error_reg <= 32'd0;
          end
        else if (write_resp_error || read_resp_error)
          begin
                if(write_resp_error)
                    error_reg <= 32'd1; // write response error
                else
                    error_reg <= 32'd2; // read response error
          end
        else
          error_reg <= error_reg;
      end
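The update rule for the error register can be restated as a small C function, which makes the priorities easier to see: a reset or a new transaction clears the code, a write response error takes precedence over a read response error, and otherwise the previous value is held. This is a sketch that mirrors the block above; the constant and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define AIF_ERR_NONE  0u
#define AIF_ERR_WRITE 1u  /* write response error */
#define AIF_ERR_READ  2u  /* read response error */

/* Compute the value the error register takes on the next clock edge. */
uint32_t error_reg_next(uint32_t current, bool clear,
                        bool write_resp_error, bool read_resp_error)
{
    if (clear) {
        return AIF_ERR_NONE;   /* reset or start of a new transaction */
    }
    if (write_resp_error) {
        return AIF_ERR_WRITE;  /* write error has priority */
    }
    if (read_resp_error) {
        return AIF_ERR_READ;
    }
    return current;            /* hold the last reported code */
}
```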

Next, I modified the example state machine to perform read/write operations as described above. I added logic to the reset section of the block to clear the AIF_ERR register. I modified the IDLE state to transition to either the INIT_READ or INIT_WRITE state, depending on the selected mode.

      always @ ( posedge M_AXI_ACLK)                                                                            
      begin                                                                                                     
        if (M_AXI_ARESETN == 1'b0 )                                                                             
          begin                                                                                                 
            // reset condition                                                                                  
            // All the signals are assigned default values under reset condition                                
            mst_exec_state      <= IDLE;                                                                
            start_single_burst_write <= 1'b0;                                                                   
            start_single_burst_read  <= 1'b0;                                                                                                                                        
            AIF_ERR <= 1'b0;   
          end                                                                                                   
        else                                                                                                    
          begin                                                                                                 
                                                                                                                
            // state transition                                                                                 
            case (mst_exec_state)                                                                               
                                                                                                                
             IDLE:                                                                                     
                // This state waits for the enable pulse, then starts
                // a write or a read depending on AIF_WEN.
                if ( init_txn_pulse == 1'b1)                                                      
                  begin
                    if(AIF_WEN)                                                                            
                        mst_exec_state  <= INIT_WRITE;
                    else
                        mst_exec_state <= INIT_READ;                                                              
                    AIF_ERR <= 1'b0;
                  end                                                                                           
                else                                                                                            
                  begin                                                                                         
                    mst_exec_state  <= IDLE;                                                            
                  end 

The INIT_WRITE and INIT_READ states remained largely unchanged other than modifying the transition to enter my new CHECK_DONE state. In the CHECK_DONE state, the error output register, AIF_ERR, is set to the value of the error_reg register and the state machine transitions back to IDLE. The actual read/write logic is still performed by the Xilinx boilerplate code, as is the error checking. Lastly, I added an always block to assign the AXI read data, M_AXI_RDATA, to the output register, AIF_RDATA, when the read data is valid as indicated by the rvalid and rready signals.

    //This block is responsible for latching the read data into the RDATA register
    always @(posedge M_AXI_ACLK)
    begin
        if(M_AXI_RVALID && axi_rready) begin
            AIF_RDATA <= M_AXI_RDATA;
        end
    end  

Testing the Interface on the Zynq7000

To test this module, I began by creating a block design for the Zynq7000. I utilized Xilinx's Virtual Input Output (VIO) IP core. This IP allows for reading and writing to signals via JTAG in the Vivado hardware manager. I created a VIO block and added probes for each of the AIF_* signals on the AXI interface.

I set the target slave base address to 0. The AXI interface module's AIF_ADDR field is an offset from the target slave base address. Setting it to 0 allowed me to write absolute addresses, but I could have also set this to the base address of the actual target, in this case, the external DDR.
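The two addressing options are equivalent, since the module simply adds AIF_ADDR to the base address. A minimal C sketch of that arithmetic is below; the function names are hypothetical, and 0x00100000 is the DDR start address used later in this post. The alignment check is an extra convenience for the model, not something the module performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address presented on the AXI bus: base address plus AIF_ADDR. */
uint32_t axi_target_address(uint32_t base_addr, uint32_t aif_addr)
{
    return base_addr + aif_addr;
}

/* Single 32-bit word transfers are normally issued at 4-byte boundaries. */
bool is_word_aligned(uint32_t addr)
{
    return (addr & 0x3u) == 0u;
}
```

With a base of 0, AIF_ADDR carries the absolute address 0x00100004; with a base of 0x00100000, the same word is reached with AIF_ADDR set to 4.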

As seen in the block diagram below, the Zynq7000 has four AXI high performance slave ports that provide a path to the DDR memory controller. I enabled the HP0 port and connected it to the AXI interface master port.

The image below is the final block design for the system. The VIO probes are connected to each of the ports on the AXI interface block.

Software Setup

I created a minimal software application to bring the Zynq processing system up and interact with the DDR memory. The Technical Reference Manual (TRM) for the Zynq contains an address map indicating that the AXI HP slave ports, as well as the processing system, can access the DDR starting at address 0x00100000.

Additionally, the TRM shows that bit 0 of the AFI_RDCHAN_CTRL register must be set to enable 32-bit mode on the HP port.

The software application was created in the Vitis IDE using the hello world C template. It starts by creating a pointer to the DDR address range mentioned above and setting the requisite bit in the AFI_RDCHAN_CTRL register.

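    //Address is RAM from PS and PL perspective
    //volatile: the PL can change this memory without the compiler knowing
    volatile uint32_t * dataPtr = (volatile uint32_t*)0x00100000;

    //Set HP0 to 32-bit mode
    volatile uint32_t * afiRdChain = (volatile uint32_t*)0xF8008000;
    *afiRdChain = (*afiRdChain) | 1;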
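The register update above is a read-modify-write: only bit 0 changes and the other bits keep their current values. If more registers need the same treatment, the pattern could be wrapped in a small helper along these lines. This is a sketch, and reg_set_bits is a hypothetical name rather than a Xilinx API; it takes a volatile pointer so the compiler does not optimize away or reorder the register access.

```c
#include <stdint.h>

/* Set the bits in mask, leaving all other bits of the register unchanged. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    uint32_t value = *reg;  /* read the current contents */
    value |= mask;          /* modify only the requested bits */
    *reg = value;           /* write the result back */
}
```

On the target this would be called with the register address cast to a volatile pointer, for example reg_set_bits((volatile uint32_t*)0xF8008000, 1).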

First, the application writes some arbitrary sequential values to the DDR. After the writes, the cache memory is flushed. If this step isn't performed, the processor memory reads will come from cache, unaware of values written to the DDR by the PL and vice-versa. Xilinx provides an AXI port that supports cache coherency in a much more efficient manner, something I plan to look into more in the future.

    //Write some values
    *dataPtr = 0xAA;
    for(int i = 0; i < 10; i++)
    {
        *(dataPtr + i) = 0xAA + i;
    }

    //Flush the cache
    Xil_DCacheFlush();
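To see why the flush matters in both directions, it helps to model a single memory word with a write-back cache in front of it. The sketch below is a deliberately tiny model, not how the Cortex-A9 caches are implemented; all names are hypothetical, and it assumes that a flush writes back dirty data and then invalidates the cached copy.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t ddr;    /* value in external memory, what the PL sees */
    uint32_t line;   /* cached copy, what the CPU sees on a hit */
    bool     valid;  /* cached copy is present */
    bool     dirty;  /* cached copy is newer than DDR */
} cached_word;

void cpu_write(cached_word *w, uint32_t value)
{
    w->line = value;   /* write-back cache: DDR is not updated yet */
    w->valid = true;
    w->dirty = true;
}

uint32_t cpu_read(cached_word *w)
{
    if (!w->valid) {   /* miss: refill from DDR */
        w->line = w->ddr;
        w->valid = true;
        w->dirty = false;
    }
    return w->line;    /* hit: DDR is not consulted */
}

void pl_write(cached_word *w, uint32_t value) { w->ddr = value; }
uint32_t pl_read(const cached_word *w) { return w->ddr; }

/* Flush: write back dirty data, then invalidate the cached copy. */
void cache_flush(cached_word *w)
{
    if (w->valid && w->dirty) {
        w->ddr = w->line;
    }
    w->valid = false;
    w->dirty = false;
}
```

In this model a CPU write is invisible to the PL until the flush, and a PL write is invisible to the CPU until the stale cached copy is invalidated, which matches the two flush calls in the application.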

Next, the application enters an infinite loop that reads and prints the same memory locations, flushing the cache on each iteration. I ran the application on the hardware in debug mode, with a breakpoint set inside the loop so that execution stops on each iteration. The section below shows the serial terminal output after the first iteration of the loop.

    //Keep reading back the values to see changes from PL side
    while(1)
    {
        //Flush the cache to get changed values from DDR
        Xil_DCacheFlush();
        for(int i = 0; i < 10; i++)
        {
            printf("value %d: %lx\n", i, *(dataPtr+i));
        }
    }

value 0: aa
value 1: ab
value 2: ac
value 3: ad
value 4: ae
value 5: af
value 6: b0
value 7: b1
value 8: b2
value 9: b3


With the board running, the Vivado hardware manager view shows the VIO probes that were added to the design.

The VIO interface allowed me to manually manipulate the AXI interface. First, I pointed it to the DDR memory address via the AIF_ADDR field.

I enabled the interface by setting the AIF_EN bit and the value at the input address was read and populated into the AIF_RDATA register. The AIF_DONE signal indicated that the operation was completed and the AIF_ERR register did not indicate an error.

Next, I incremented the AIF_ADDR field to the next address and toggled the AIF_EN bit to initiate another read.

Next, I populated the AIF_WDATA field, set the write enable bit, AIF_WEN, and toggled the enable signal to initiate a write. As before, the module indicated "done" with no error.

Next, I read back the same address location by clearing the write-enable bit and re-enabling the module. The expected value was read back.

I returned to the Vitis IDE and let the print loop complete another iteration. The output reflects the value just written from the PL.

value 0: aa
value 1: 1234abcd
value 2: ac
value 3: ad
value 4: ae
value 5: af
value 6: b0
value 7: b1
value 8: b2
value 9: b3

Summary

This interface, and the research required to implement it, has given me an introductory understanding of the AXI-4 protocol. I intend to use this interface in a proof-of-concept design synthesized and run on the Zynq7000 development board that I will cover in a future post. I intend to refine this design and tailor it for use as the primary data path for my cryptography implementations (at least until a streaming solution is required).

References
