Cloud FPGA
EENG 428
ENAS 968
bit.ly/cloudfpga
Lecture: Advanced eXtensible Interface (AXI)
Prof. Jakub Szefer
Dept. of Electrical Engineering, Yale University
Share: EENG 428 / ENAS 968 – Cloud FPGA
bit.ly/cloudfpga © Jakub Szefer, Fall 2019
FPGA Shell Review
• Recall that Amazon’s Cloud FPGAs contain a Shell (SH), which provides a number of modules usable by the user’s Custom Logic (CL)
  • PCIe controller to communicate with the server
  • DRAM controller to use DRAM modules
  • AXI bus interfaces
  • QSFP interfaces
  • Virtual logic analyzer
  • …
[Figure: FPGA chip containing the PCIe, DRAM, and other controllers together with the User Logic]
Block diagram from [2]
Use of AXI in the Shell and Custom Logic
Most of the communication between Shell and the Custom Logic is done through AXI buses
• Different variants of AXI are used
  • AXI4 512-bit
  • AXI4-Lite 32-bit
  • AXI4-Stream 512-bit
• Most custom logic (CL) modules require use of AXI
  • Except if only virtual LEDs and DIP switches are used
• Modules developed with AXI can be used outside of Cloud FPGAs, in any design using AXI
Block diagram from [2]
Advanced eXtensible Interface (AXI)
AXI Background
• Advanced eXtensible Interface (AXI) is a communication interface that is
  • parallel: multiple bits are sent in parallel, typically 32 but can have other sizes
  • high-performance: contains features such as streaming, to move data more quickly than word by word
  • synchronous
  • high-frequency
  • multi-master and multi-slave: many devices can be on the same bus, but typically we are only concerned with one master (controller) and one slave (module doing computation)
• AXI targets on-chip communication in System-on-Chip (SoC) designs
• AXI is available royalty-free and its specification is freely available from ARM
• The version covered here is 4, which has three variants:
  • AXI4
  • AXI4-Lite
  • AXI4-Stream
• All variants are utilized by Cloud FPGAs in Amazon
• Most designs will need at least AXI4-Lite for basic communication with the server
AXI information from [3]
Basic AXI Connections between Master and Slave
In the most basic configuration, AXI is used to connect two modules:
• Master module initiates communication and data read/write requests
• Slave module responds to the requests

All communication is with respect to addresses; each address and its purpose is module specific:

Register address   Purpose
0x1000             Reg1
0x2000             Reg2
0x3000             Config
…                  …

Communication is achieved over ‘channels’, which contain many wires each
Each channel only has one direction

[Figure: Master module and Slave module connected by five channels]
• Write address channel
• Write data channel
• Write response channel
• Read address channel
• Read data channel

Data channels are for sending actual data, while others are to control the data sending process

Note: can have a protocol where ‘commands’ are sent on the data bus, so it’s not just pure data
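The idea that every access is addressed, and that each address has a module-specific purpose, can be sketched as a small Python model. This is only an illustration, not code from the lecture: the class name `RegisterSlave`, the 32-bit register width, the zero reset values, and the use of the `OKAY`/`SLVERR` response names are assumptions made for the sketch; only the three addresses come from the example table.

```python
class RegisterSlave:
    """Toy memory-mapped slave: each address selects one module-specific register."""

    # Addresses from the example table; names are only labels for the sketch.
    REGISTER_MAP = {0x1000: "Reg1", 0x2000: "Reg2", 0x3000: "Config"}

    def __init__(self):
        # Assumption: every register resets to 0.
        self.values = {addr: 0 for addr in self.REGISTER_MAP}

    def write(self, addr, data):
        """Model a write: address on AW, data on W, status returned on B."""
        if addr not in self.values:
            return "SLVERR"  # address is not mapped in this module
        self.values[addr] = data & 0xFFFFFFFF  # assumption: 32-bit registers
        return "OKAY"

    def read(self, addr):
        """Model a read: address on AR, data and status returned on R."""
        if addr not in self.values:
            return 0, "SLVERR"
        return self.values[addr], "OKAY"


slave = RegisterSlave()
print(slave.write(0x3000, 0xABCD))  # write the Config register
print(slave.read(0x3000))           # read it back
print(slave.read(0x4000))           # unmapped address
```

The master never names a register directly; it only supplies an address, and the slave decides what that address means.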
AXI Handshake
• Communication over each channel is done using a handshake protocol
• Handshake protocol ensures both the sender and receiver can control data transfer
  • Indicate when sender is ready
  • Let receiver control accepting data and acknowledge it got the data

1. Sender sets VALID signal when payload is placed on the bus
2. Receiver sets READY signal when it can accept the payload; the transfer completes on the clock edge where both VALID and READY are high

A “beat” is used to describe transfer of one payload
The payload can be address, control signal, or data

[Figure: Master module and Slave module connected by the write address, write data, write response, read address, and read data channels, with the handshake waveform]

Handshake protocol image from: https://commons.wikimedia.org/wiki/File:AMBA_AXI_Handshake.svg
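The VALID/READY handshake can be illustrated with a cycle-by-cycle sketch. The function name `simulate_handshake` and the representation of READY as a list of booleans (one per clock cycle) are assumptions for the example; the rule it models is that a beat is transferred only in a cycle where both VALID and READY are high.

```python
def simulate_handshake(payloads, ready_pattern):
    """Return (cycle, payload) for each beat transferred over one channel.

    payloads: items the sender wants to send, in order.
    ready_pattern: one boolean per clock cycle, the receiver's READY signal.
    """
    beats = []
    index = 0
    for cycle, ready in enumerate(ready_pattern):
        valid = index < len(payloads)  # sender holds VALID high while it has data
        if valid and ready:            # a beat happens only when both are high
            beats.append((cycle, payloads[index]))
            index += 1
    return beats


print(simulate_handshake(["A", "B", "C"], [False, True, True, False, True]))
# → [(1, 'A'), (2, 'B'), (4, 'C')]
```

In cycles 0 and 3 the receiver is not ready, so the sender keeps the same payload on the bus and the beat is delayed rather than lost.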
AXI Channels
• Write Address channel (AW)
  • Mainly provides the address at which data should be written
  • Can optionally (depending on AXI type) specify burst size, beats per burst, etc.
  • AWVALID (master to slave) and AWREADY (slave to master)
  • Sizes (widths) of the addresses can be design specific, 32 or 64 bit
• Write Data channel (W)
  • Actual data that is sent
  • Can optionally specify data id, beat identifier, etc.
  • WVALID (master to slave) and WREADY (slave to master)
  • Sizes (widths) of the data can be design specific, e.g., 32, 64, 512
• Write Response channel (B)
  • Mainly to specify burst status
  • BVALID (slave to master) and BREADY (master to slave)
• Read Address channel (AR)
  • Mainly provides the address from which to read data
  • Optional burst size, etc.
  • ARVALID (master to slave) and ARREADY (slave to master)
• Read Data channel (R)
  • Actual data sent back, plus optional data id, etc.
  • RVALID (slave to master) and RREADY (master to slave)
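The naming pattern of the handshake signals follows directly from which side sends on each channel. A minimal sketch, where the `Channel` tuple and the `handshake_signals` helper are hypothetical names introduced for illustration:

```python
from collections import namedtuple

# One entry per AXI channel: signal prefix, description, and who sends on it.
Channel = namedtuple("Channel", "prefix name sender receiver")

CHANNELS = [
    Channel("AW", "Write address", "master", "slave"),
    Channel("W", "Write data", "master", "slave"),
    Channel("B", "Write response", "slave", "master"),
    Channel("AR", "Read address", "master", "slave"),
    Channel("R", "Read data", "slave", "master"),
]


def handshake_signals(channel):
    """Return {signal name: driving side} for a channel's VALID/READY pair."""
    return {
        channel.prefix + "VALID": channel.sender,    # sender drives VALID
        channel.prefix + "READY": channel.receiver,  # receiver drives READY
    }


for ch in CHANNELS:
    print(ch.name, handshake_signals(ch))
```

Only the write response and read data channels flow from slave to master, so only BVALID and RVALID are driven by the slave.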
AXI4-Lite
AXI4-Lite is a subset of the AXI4 protocol, with only basic features
• No bursts, only send one piece of data (beat) at a time
• All data accesses use the full data bus width, which can be either 32 or 64 bits
• AXI4-Lite removes many of the AXI4 signals but follows the AXI4 specification for the rest
• AXI4-Lite transactions are fully compatible with AXI4 devices
  • AXI4-Lite masters can be used with AXI4 slaves
  • AXI4 masters can work with AXI4-Lite slaves, if none of the extra features are triggered
AXI4-Lite signals: [signal table not preserved in this text]
AXI4-Lite table and information from [3]
AXI4-Lite Read Example
Example of an AXI4-Lite read; an address must be specified for each data transfer
[Waveform: AR channel and R channel signals. Annotations, left to right: Ready to receive data; Data is valid, and received; Not ready to receive more data]
Modified waveform from Xilinx manuals
AXI4 (Regular)
Main advantage of AXI4 over AXI4-Lite is that it supports bursts
• Allows multiple data transfers per single request
• Save on addressing overhead, better bandwidth
• Three burst types are supported
  • FIXED
  • INCR
  • WRAP
• Burst addressing specifies where each read/write should go to
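How the three burst types determine the address of each beat can be sketched as follows. The function name `burst_addresses` and its parameters are assumptions for the example; the sketch assumes the start address is aligned to the beat size and does not check the protocol's restrictions on which lengths a WRAP burst may use.

```python
def burst_addresses(start, size_bytes, length, burst_type):
    """Return the address of each beat in a burst.

    start: start address, assumed aligned to size_bytes.
    size_bytes: bytes per beat (for example 4 for 32-bit beats).
    length: number of beats in the burst.
    burst_type: "FIXED", "INCR", or "WRAP".
    """
    if burst_type == "FIXED":
        # Every beat targets the same address (for example a FIFO port).
        return [start] * length
    if burst_type == "INCR":
        # Each beat targets the next consecutive location.
        return [start + i * size_bytes for i in range(length)]
    if burst_type == "WRAP":
        # Like INCR, but addresses wrap around inside an aligned window
        # whose size is the total number of bytes in the burst.
        total = size_bytes * length
        lower = (start // total) * total
        return [lower + (start - lower + i * size_bytes) % total
                for i in range(length)]
    raise ValueError("unknown burst type: " + str(burst_type))


for kind in ("FIXED", "INCR", "WRAP"):
    print(kind, [hex(a) for a in burst_addresses(0x8, 4, 4, kind)])
```

For a 4-beat burst of 4-byte transfers starting at 0x8, FIXED stays at 0x8, INCR runs 0x8, 0xc, 0x10, 0x14, and WRAP runs 0x8, 0xc, 0x0, 0x4 because it wraps at the 16-byte boundary.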
Burst image from: https://en.wikipedia.org/wiki/Advanced_eXtensible_Interface#/media/File:AXI_Bursts.svg
AXI4 Read Example
Example read:
Request 4 transfers (ARLEN + 1) of 4 bytes (32 bits) each from address 0x0
Respond with 4 transfers, some take longer as the receiver is not ready
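The arithmetic behind this request can be written out explicitly. The sketch below assumes the standard AXI encoding in which the size field holds the base-2 logarithm of the bytes per transfer (a detail from the AXI specification, not shown on the slide); the function name `burst_summary` is hypothetical.

```python
def burst_summary(arlen, arsize):
    """Return (transfers, bytes per transfer, total bytes) for a read burst."""
    transfers = arlen + 1               # the length field stores beats minus one
    bytes_per_transfer = 1 << arsize    # the size field stores log2(bytes)
    return transfers, bytes_per_transfer, transfers * bytes_per_transfer


print(burst_summary(arlen=3, arsize=2))
# → (4, 4, 16)
```

So the example on this slide, with ARLEN = 3 and 4-byte transfers, moves 16 bytes with a single address phase.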
Example image from: https://en.wikipedia.org/wiki/Advanced_eXtensible_Interface#/media/File:AXI_read_transaction.svg
AXI4 Write Example
Example write:
Request 4 transfers (AWLEN + 1) of 4 bytes (32 bits) each to address 0x0
Send 4 transfers on the write data channel, some take longer as the receiver is not ready
The slave then returns a single write response for the whole burst
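A write burst can be modeled in the same spirit. This is a simplified sketch that assumes an incrementing burst into a dictionary-based memory; the function name `write_burst`, the default 4-byte beat size, and the unconditional `OKAY` response are assumptions made for illustration.

```python
def write_burst(memory, awaddr, awlen, beats, size_bytes=4):
    """Apply an incrementing write burst to a dict-based memory.

    Returns the list of (address, data, wlast) beats and one response.
    """
    if len(beats) != awlen + 1:
        raise ValueError("burst must carry AWLEN + 1 data beats")
    log = []
    for i, data in enumerate(beats):
        addr = awaddr + i * size_bytes
        wlast = (i == awlen)        # WLAST marks the final data beat
        memory[addr] = data
        log.append((addr, data, wlast))
    return log, "OKAY"              # one response on B covers the whole burst


memory = {}
log, response = write_burst(memory, awaddr=0x0, awlen=3, beats=[10, 11, 12, 13])
print(log)
print(response)
```

Unlike a read, where every data beat travels from slave to master, here the four data beats travel from master to slave and only the single response comes back.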
Example image from: https://en.wikipedia.org/wiki/Advanced_eXtensible_Interface#/media/File:AXI_write_transaction.svg
AXI4 Stream
• AXI4 is for memory mapped interfaces and allows bursts of up to 256 data transfer cycles with just a single address phase
• AXI4-Lite is a light-weight, single transaction memory mapped interface. It has a small logic footprint and is a simple interface to work with both in design and usage
• AXI4-Stream removes the requirement for an address phase altogether and allows unlimited data burst size
  • AXI4-Stream interfaces and transfers do not have address phases and are therefore not considered to be memory-mapped.
  • The AXI4-Stream protocol is used for applications that typically focus on a data-centric and data-flow paradigm where the concept of an address is not present or not required. Each AXI4-Stream acts as a single unidirectional channel for a handshake data flow
AXI4 information from [1]
AXI4 Stream Signals
The stream protocol minimizes overhead by removing need for addressing
• Bus signals indicate when data is available, TVALID
• Receiver can optionally specify ready, TREADY
• Data is sent using TDATA
• Signal end of packet of data with TLAST
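The four stream signals listed above can be put together in a short sketch. The helper names `send_packet` and `receive_packets` are hypothetical, and TREADY is represented as a list of booleans with one entry per clock cycle; a beat is accepted only when TVALID and TREADY are both high, and TLAST closes the current packet.

```python
from itertools import chain


def send_packet(words):
    """Yield one dict of stream signals per beat; TLAST marks the final beat."""
    for i, word in enumerate(words):
        yield {"TVALID": True, "TDATA": word, "TLAST": i == len(words) - 1}


def receive_packets(beats, tready_pattern):
    """Group accepted beats into packets, one TREADY value per clock cycle."""
    packets, current = [], []
    beats = iter(beats)
    pending = next(beats, None)      # beat currently held on the bus
    for tready in tready_pattern:
        if pending is None:
            break                    # sender has nothing more to send
        if pending["TVALID"] and tready:
            current.append(pending["TDATA"])
            if pending["TLAST"]:     # end of packet
                packets.append(current)
                current = []
            pending = next(beats, None)
    return packets


stream = chain(send_packet([1, 2, 3]), send_packet([4, 5]))
print(receive_packets(stream, [True, False, True, True, True, True]))
# → [[1, 2, 3], [4, 5]]
```

No addresses appear anywhere; packet boundaries are recovered purely from TLAST.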
AXI4 Stream signals table from [1]
Left image from: https://fpgasite.blogspot.com/2017/07/xilinx-axi-stream-tutorial-part-1.html
Communicating with Cloud FPGA Servers via AXI
peek(), poke() and AXI4-Lite
• The software libraries (SDK) provide means to read and write data to the FPGA
• peek() and poke() are the most basic ways to read or write data
• They are triggered by software and result in an AXI read or write request showing up on the FPGA

Other communication ways:
• DMA, uses AXI4
• FPGA-to-FPGA, uses AXI4-Stream

[Figure: server stack of Apps running in Guest VMs on top of the Hypervisor (VMM) and Hardware, connected over PCIe to the FPGA; the software call arrives as an AXI read or write transaction]
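The relationship between the software calls and the AXI transactions can be mimicked with a mock. Nothing here is the actual SDK: the `MockFpga` class and the two-argument `peek`/`poke` helpers are hypothetical stand-ins, and the real library calls take additional arguments that are not modeled. The point is only that each call produces exactly one single-beat read or write on the FPGA side.

```python
class MockFpga:
    """Stand-in for the FPGA side: registers reached by 32-bit single-beat accesses."""

    def __init__(self):
        self.registers = {}
        self.transactions = []  # log of transactions the custom logic would see

    def axi_lite_write(self, addr, data):
        data &= 0xFFFFFFFF      # assumption: 32-bit data bus
        self.transactions.append(("write", addr, data))
        self.registers[addr] = data

    def axi_lite_read(self, addr):
        value = self.registers.get(addr, 0)  # assumption: unwritten registers read 0
        self.transactions.append(("read", addr, value))
        return value


def poke(fpga, offset, value):
    """Hypothetical poke: one software write becomes one AXI write."""
    fpga.axi_lite_write(offset, value)


def peek(fpga, offset):
    """Hypothetical peek: one software read becomes one AXI read."""
    return fpga.axi_lite_read(offset)


fpga = MockFpga()
poke(fpga, 0x500, 0xDEADBEEF)
print(hex(peek(fpga, 0x500)))
print(fpga.transactions)
```

Because every call carries its own address and a single word, this path suits control and status registers; bulk data is better served by the DMA path listed above.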
References
1. “AXI Reference Guide, UG761 (v14.3) November 15, 2012” Available at: https://www.xilinx.com/support/documentation/ip_documentation/axi_ref_guide/latest/ug761_axi_reference_guide.pdf
2. “AWS Shell Interface Specification, v1.4.5” Available at: https://github.com/aws/aws-fpga/blob/master/hdk/docs/AWS_Shell_Interface_Specification.md
3. “Advanced eXtensible Interface” Wikipedia, The Free Encyclopedia. Available at: https://en.wikipedia.org/wiki/Advanced_eXtensible_Interface