This directory contains the Vivado build flow for running Loom on an Alveo U250 via PCIe/XDMA. It uses DFX (Dynamic Function eXchange) so the FPGA infrastructure (PCIe, clocking, AXI bus) persists on the board across DUT changes, and only the compiled DUT is swapped.
Bitstream (.bit)
The binary file that configures the FPGA fabric. A full bitstream configures
the entire device. A partial bitstream configures only one region (the RP).
DFX — Dynamic Function eXchange
Xilinx/AMD's name for partial reconfiguration. A portion of the FPGA (the RP)
can be reconfigured at runtime while the rest (the static region) keeps
running. The PCIe link stays up and the AXI bus stays live; only the DUT logic
changes.
Static region
The part of the FPGA that never changes after the shell is programmed. Contains:
XDMA (PCIe), AXI-Lite demux, firewall, DFX decoupler, CDC, clock generator, and
reset synchroniser. Defined by everything in loom_shell except u_emu_top.
RP — Reconfigurable Partition
The region of the FPGA fabric reserved for swappable logic. In Loom the RP is
the u_emu_top cell, which on the U250 spans 122 of 128 clock regions across
all 4 SLRs (~95% of the device). The RP boundary is fixed at first-build time
and cannot move without rebuilding the static region.
RM — Reconfigurable Module
One specific implementation that fills the RP. Every compiled DUT produces one
RM. Multiple RMs can exist on disk; only one is loaded into the RP at a time.
Named via RM_NAME= on the make command line (default: rm).
Partial bitstream (*_partial.bit)
A bitstream that reconfigures only the RP. Smaller than a full bitstream,
though on the U250 the RP still spans 122 of 128 clock regions across all 4
SLRs (~95% of the device), so expect the saving to be modest. Loaded via JTAG
without disturbing the static region or the PCIe link. Every new DUT build
produces one of these.
Full bitstream (full.bit)
A bitstream that configures the entire device: static region + one initial RM.
Used only for the first-time flash programming. After that, only partial
bitstreams are needed for DUT swaps.
MCS (.mcs)
Intel HEX-like file format for SPI flash programming. Generated from full.bit
by write_cfgmem. Written to the on-board SPI flash by dfx-program-flash so
the shell loads automatically on every power-on.
SPI flash
The Micron MT25QU01G (1 Gb) NOR flash soldered on the U250 board. Stores the
full bitstream. On power-on the FPGA configuration engine reads from it and
configures the static region + initial RM automatically (~100 ms). Partial
bitstreams are never written to flash; they are loaded directly into FPGA
SRAM via JTAG and are lost on power cycle.
DCP — Design Checkpoint
Vivado's snapshot format. Stores netlist + constraints + placement + routing.
Three DCPs matter here:

| File | When produced | Contains |
|---|---|---|
| static_synth.dcp | dfx-static step 1 | Synthesised static netlist with RP as black box |
| static_routed.dcp | dfx-static step 3 | Golden checkpoint: fully placed & routed static region |
| ${RM_NAME}_synth.dcp | dfx-rm step 1 | Synthesised RM netlist (OOC) |
OOC — Out-of-Context synthesis
Synthesising a module in isolation, without its parent. Used for the RM so
Vivado doesn't need to re-synthesise the whole design for each new DUT.
Boundary ports are treated as top-level I/Os for timing purposes.
Pblock
A rectangular region of FPGA fabric assigned to the RP. Defined in
boards/u250/u250_dfx.xdc. Must enclose all LUTs, FFs, BRAMs, and DSPs of
u_emu_top. Fixed at first-build time — changing the pblock invalidates
static_routed.dcp and requires a full rebuild.
pr_verify
Vivado command that checks a partial bitstream is compatible with its full
bitstream. Run automatically at the end of dfx_rm.tcl. Fails if the static
region changed between builds.
Decoupler
The Xilinx DFX Decoupler IP (xlnx_decoupler) in the static region. When
asserted (decouple=1 at register 0x5_0000), it drives the RP-facing AXI
signals to a safe idle state so the bus doesn't hang during partial
reconfiguration. Always assert before loading a partial bitstream; deassert
after. loomx decouple / loomx couple do this from the host.
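The decouple, load, couple sequence described above can be sketched as a small shell function. This is an illustration under stated assumptions, not part of Loom: `swap_rm`, `LOOMX` and `MAKE` are hypothetical names introduced here (the two variables only allow a dry run), the bare `loomx decouple` / `loomx couple` invocations are taken as written above without any transport or work-directory flags your setup may need, and it assumes `make dfx-program-rm` does not already toggle the decoupler itself.

```bash
# Sketch: isolate the RP, load a partial bitstream, then reconnect it.
# LOOMX and MAKE are hypothetical overrides; they default to the real tools.
swap_rm() {
    local rm_name="${1:-rm}"   # same default label as RM_NAME

    # 1. Assert the decoupler so RP-facing AXI signals sit in a safe idle state.
    ${LOOMX:-loomx} decouple || return 1

    # 2. Load <rm_name>_partial.bit over JTAG.
    if ! ${MAKE:-make} dfx-program-rm RM_NAME="$rm_name"; then
        # Leave the RP decoupled: its contents are unknown after a failed load.
        echo "swap_rm: partial load failed, RP left decoupled" >&2
        return 1
    fi

    # 3. Deassert the decoupler so the new RM is visible on the bus.
    ${LOOMX:-loomx} couple
}
```

A dry run that only prints the commands: `LOOMX="echo loomx" MAKE="echo make" swap_rm my_dut`.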
Each Vivado invocation creates a timestamped run directory under work-u250/runs/
containing the full log, journal, and a snapshot of all outputs produced by that
run. work-u250/results/ holds the latest version of each file, with
bitstreams and reports as symlinks into runs/ and DCPs as real files (they are
inputs to downstream steps and must survive the start of the next run).

    work-u250/
    ├── results/
    │   ├── static_synth.dcp        real file: static synthesis checkpoint
    │   ├── static_routed.dcp       real file: *** GOLDEN *** keep this safe
    │   ├── <RM_NAME>_synth.dcp     real file: RM synthesis checkpoint
    │   ├── full.bit                → runs/<timestamp>_dfx_impl_<rm>/full.bit
    │   ├── <RM_NAME>_partial.bit   → runs/<timestamp>_dfx_rm_<rm>/<RM_NAME>_partial.bit
    │   └── *.rpt                   → runs/<timestamp>_<step>_<rm>/*.rpt
    └── runs/
        └── <timestamp>_<step>_<rm>/
            ├── vivado.log          Vivado stdout log
            ├── vivado.jou          Vivado journal (replay script)
            ├── metadata.txt        step, rm_name, timestamp, exit_code
            ├── *.bit               bitstream(s) produced by this run
            └── *.rpt               utilisation / timing / DRC reports
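Because the bitstreams in results/ are symlinks, it is easy to check which run produced the one you are about to load. A minimal sketch, assuming the layout shown above; `bitstream_sources` is a hypothetical helper name, not a Loom command.

```bash
# Sketch: print "<bitstream> -> <link target>" for every symlinked .bit
# in results/. The function name is hypothetical.
bitstream_sources() {
    local results="${1:-work-u250/results}"
    local f
    for f in "$results"/*.bit; do
        # Skips the unexpanded glob (no matches) and any non-symlink file.
        [ -L "$f" ] || continue
        printf '%s -> %s\n' "$(basename "$f")" "$(readlink "$f")"
    done
}
```

The target is printed exactly as stored in the link, so it may be relative or absolute depending on how the build created it.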
static_routed.dcp is the most important file. Treat it like a compiled
artifact for the board — back it up alongside full.bit and shell.mcs. If
you delete it, the next dfx-static must redo the full place & route.
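A backup of the three shell artifacts could look like the sketch below. It assumes shell.mcs sits in results/ next to the other two files, which this README does not state; adjust the path if yours differs. `backup_shell` is a hypothetical helper name.

```bash
# Sketch: copy the golden checkpoint, full bitstream and flash image
# to a backup directory. Assumes shell.mcs lives in results/ (not confirmed).
backup_shell() {
    local dest="$1"
    local results="${2:-work-u250/results}"
    local f
    if [ -z "$dest" ]; then
        echo "usage: backup_shell DEST_DIR [RESULTS_DIR]" >&2
        return 2
    fi
    mkdir -p "$dest" || return 1
    for f in static_routed.dcp full.bit shell.mcs; do
        # -e is false for a dangling symlink, so a missing run dir is caught.
        if [ ! -e "$results/$f" ]; then
            echo "backup_shell: missing $results/$f" >&2
            return 1
        fi
        # -L follows symlinks: full.bit is stored as a real file in the
        # backup, not as a link into runs/.
        cp -L "$results/$f" "$dest/$f" || return 1
    done
}
```

Example: `backup_shell ~/u250-shell-backup`.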
At the start of each run, stale *.bit and *.rpt symlinks are removed
from results/ so a failed build never leaves a previous run's bitstream
silently in place.
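Since a failed build leaves no bitstream symlink behind, a wrapper script can treat a missing or dangling `<RM_NAME>_partial.bit` as "do not program". A minimal sketch of such a guard; `partial_bit_ready` is a hypothetical helper name.

```bash
# Sketch: succeed (and print the path) only if the partial bitstream for
# the given RM exists and is non-empty. The function name is hypothetical.
partial_bit_ready() {
    local rm_name="${1:-rm}"
    local results="${2:-work-u250/results}"
    local bit="$results/${rm_name}_partial.bit"
    # -s follows symlinks, so a dangling link counts as "not ready".
    if [ -s "$bit" ]; then
        echo "$bit"
    else
        echo "no partial bitstream for '$rm_name' in $results (did dfx-rm fail?)" >&2
        return 1
    fi
}
```

Typical use: `partial_bit_ready my_dut && make dfx-program-rm RM_NAME=my_dut`.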
First-time setup (run once):

    # 1. Generate Xilinx IPs (downloads nothing, runs Vivado IP gen locally)
    make ip
    # 2. Build and lock the static shell (slow, run once)
    #    Produces: static_routed.dcp, full.bit, rm_partial.bit
    make dfx-static TRANSFORMED_V=path/to/transformed.v
    # 3. Burn full.bit to SPI flash; the shell now loads on every power-on
    make dfx-program-flash

Per-DUT workflow:

    # Compile new DUT through the Loom transform pipeline first, then:
    # 1. Build partial bitstream (OOC synth + partial P&R against locked static)
    make dfx-rm TRANSFORMED_V=path/to/new_transformed.v RM_NAME=my_dut
    # → work-u250/results/my_dut_partial.bit
    # 2. Load via JTAG; the static shell is completely untouched
    make dfx-program-rm RM_NAME=my_dut
    # 3. Run
    loomx -work path/to/build -t xdma

RM_NAME is just a label for output file naming. Omitting it defaults to rm.
A full rebuild (dfx-static) is only needed if you change the static region itself:

| Change | Need full rebuild? |
|---|---|
| New DUT (transformed.v) | No, use dfx-rm |
| loom_emu_top internals | No, use dfx-rm |
| loom_shell.sv (outside u_emu_top) | Yes, dfx-static |
| IP changes (XDMA, CDC, clk_gen) | Yes, make ip then dfx-static |
| Pblock changes (u250_dfx.xdc) | Yes, dfx-static |
| Timing / pin constraints | Yes, dfx-static |
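The table can be turned into a rough lookup from a changed file to the make target(s) it calls for. This is a sketch only: `rebuild_targets` is a hypothetical helper, the path patterns are guesses based on the file names mentioned in this README, and anything it does not recognise falls through to dfx-rm, which is an assumption rather than a rule from the build flow.

```bash
# Sketch: map a changed file to the make target(s) suggested by the table.
# Patterns and the dfx-rm fallback are assumptions, not part of the Makefile.
rebuild_targets() {
    case "$1" in
        ip/*|*/ip/*)
            # IP changes: regenerate IP, then rebuild the static shell.
            echo "ip dfx-static" ;;
        *u250_dfx.xdc|*u250_timing.xdc|*u250_pins.xdc|*loom_shell.sv)
            # Pblock, timing/pin constraints, or shell RTL: static rebuild.
            echo "dfx-static" ;;
        *)
            # New DUT or loom_emu_top internals: partial build only.
            echo "dfx-rm" ;;
    esac
}
```

For example, `rebuild_targets boards/u250/u250_pins.xdc` prints `dfx-static`. A change to loom_shell.sv inside u_emu_top cannot be told apart by file name alone, so review that case by hand.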
| Target | Description |
|---|---|
| make ip | Generate all Xilinx IPs (one-time) |
| make synth | Non-DFX flat synthesis (for development/debug) |
| make bitstream | Non-DFX full bitstream (for development/debug) |
| make dfx-static | DFX: build & lock static shell (slow, run once) |
| make dfx-rm | DFX: build partial bitstream for a DUT (fast) |
| make dfx-program-flash | Write full.bit to SPI flash (shell persistence) |
| make dfx-program-rm | Load $(RM_NAME)_partial.bit via JTAG (DUT swap) |
| make program | Non-DFX JTAG full-device program |
| make driver | Build XDMA kernel driver |
| make driver-load | Insert XDMA kernel module |
| make rescan | PCIe bus rescan after programming |
| make clean | Remove work-u250/ and IP .done stamps |

    fpga/
    ├── README.md                       This file
    ├── Makefile                        Build orchestration
    ├── boards/u250/
    │   ├── settings.tcl                Part / board-part strings
    │   ├── u250_pins.xdc               Pin assignments
    │   ├── u250_timing.xdc             Timing constraints
    │   ├── u250_implementation.xdc     Bitstream config (SPI x4, compression)
    │   └── u250_dfx.xdc                DFX pblock definition (RP = 122/128 clock regions across all 4 SLRs, ~95% of device)
    ├── ip/
    │   ├── xlnx_xdma/                  PCIe XDMA IP
    │   ├── xlnx_clk_gen/               Clocking Wizard IP
    │   ├── xlnx_cdc/                   AXI Clock Converter IP
    │   └── xlnx_decoupler/             DFX Decoupler IP
    └── scripts/
        ├── synth.tcl                   Non-DFX flat synthesis
        ├── synth_static.tcl            DFX: static synthesis (RP as black box)
        ├── synth_rm.tcl                DFX: RM OOC synthesis
        ├── impl.tcl                    Non-DFX implementation
        ├── dfx_impl.tcl                DFX: full P&R → produces static_routed.dcp
        ├── dfx_rm.tcl                  DFX: partial P&R using locked static
        └── program.tcl                 Unified programmer (MODE=jtag|jtag-partial|flash)