System-on-Chip (SoC) Design
ECE382M.20, Fall 2023
Board Tutorial
Notes:
• This is a tutorial related to the class project.
• Please use the discussion board on Ed for Q&A.
• Please check relevant web pages.
The goal of this tutorial is to:
• Give an introduction to the Xilinx tools and the Ultra96 board.
This tutorial includes the following:
• Use of the Ultra96 board to run and co-verify a multiply-and-accumulate (MAC) example.
• An application that reads inputs, calls the hardware function with these inputs (modifying a HAL implementation adapted from Lab 3), gets the output, and prints it.
We will use three Xilinx tools to prototype the application on the board:
• Vitis HLS is Xilinx’s high-level synthesis (HLS) tool that is used for C-to-RTL synthesis (you already used Vitis HLS in Lab 2).
• Vivado Design Suite will be used for RTL-to-gate synthesis and FPGA bitstream generation.
• Vivado’s IP Integrator lets you create complex system designs by instantiating and interconnecting IP cores.
Please refer to the following materials for additional information:
• Vitis HLS User Guide and Vitis HLS Tutorial already provided in Lab 2
• Vivado Design Suite User Guide (UG910)
• Xilinx Embedded Design Tutorials, in particular the Zynq UltraScale+ Embedded Design Tutorial
This tutorial will use an example that takes two integers as inputs, calculates their product, and gives the accumulated sum as the output. The multiplication and accumulation take place in the FPGA, which is called from the software.
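As a rough illustration of the computation described above, here is a minimal C sketch of a multiply-and-accumulate function with an AXI4-Lite register interface. It is not the hls_macc.c source from the repository: the function name macc_sketch, the argument names, the accum_clr flag, and the bundle name CTRL are assumptions made for this example. The pragmas use the Vitis HLS INTERFACE directive syntax and are ignored by an ordinary C compiler.

```c
#include <stdbool.h>

/* Hypothetical MAC kernel for illustration only (NOT the tutorial's hls_macc.c).
 * Each call multiplies a and b, adds the product to an internal accumulator,
 * and returns the running sum through *accum. If accum_clr is true, the
 * accumulator is reset to zero before the new product is added. */
void macc_sketch(int a, int b, int *accum, bool accum_clr)
{
/* Map the block-level control (start/done) and all arguments to one
 * AXI4-Lite slave interface. The bundle name "CTRL" is an assumption. */
#pragma HLS INTERFACE s_axilite port=return bundle=CTRL
#pragma HLS INTERFACE s_axilite port=a bundle=CTRL
#pragma HLS INTERFACE s_axilite port=b bundle=CTRL
#pragma HLS INTERFACE s_axilite port=accum bundle=CTRL
#pragma HLS INTERFACE s_axilite port=accum_clr bundle=CTRL

    /* A static variable keeps its value across calls, much like a
     * register in the synthesized hardware. */
    static int acc_reg = 0;

    if (accum_clr) {
        acc_reg = 0;
    }
    acc_reg += a * b;
    *accum = acc_reg;
}
```

In this sketch, calling macc_sketch(3, 4, &out, true) leaves 12 in out, and a following macc_sketch(2, 5, &out, false) leaves 22.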
Start by downloading the example from https://github.com/gerstl/Board_demo and go through the following steps:
(a) Hardware part
1. Go into the hardware source directory of the tutorial example (hls_macc/) and inspect the files:
• The example code, hls_macc.c/.h, is the C source code annotated with Xilinx-specific synthesis directives (pragmas), which are used to automatically infer a bus and register interface.
• The application code comes with a testbench (hls_macc_test.c).
2. Launch Vitis HLS and synthesize the example code. The steps are equivalent to what you did in Lab 2, following the Vitis HLS Tutorial and/or training materials:
a. Create a new project using the following information:
Project name: hls_macc (or whatever you want)
Location: On the LRC machines under /misc/scratch
Top Function: hls_macc
Design Files: hls_macc.c
TestBench Files: hls_macc_test.c
Solution Name: solution1 (or whatever you want)
Clock Period: 4
Part Selection: RTL tool: Auto, Specify: Parts->Select ‘xczu3eg-sbva484-1-e’
b. Run C Simulation
c. Run C Synthesis
d. Run C/RTL Cosimulation
3. Click ‘Solution’ -> ‘Export RTL’ -> ‘OK’. This will make the generated RTL code usable in Vivado Design Suite as a custom hardware IP. If this step is successful, you will see a zip file appear under ‘solution1’ -> ‘impl’ -> ‘ip’ in the explorer pane.
4. Exit Vitis HLS.
5. Check out the file in: /solution1/impl/misc/drivers/hls_macc_v1_0/src/xhls_macc_hw.h
This file includes the address mapping of the HLS component’s control signals, such as start/done, interrupt enables and any IO you defined with an s_axilite interface.
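For orientation, the block-level control registers that Vitis HLS generates for an s_axilite interface usually sit at the start of the address map. The sketch below assumes the commonly generated layout (control at 0x00, global interrupt enable at 0x04, IP interrupt enable at 0x08, IP interrupt status at 0x0c) and the usual bit positions in the control register. All macro and function names are made up for illustration, so treat xhls_macc_hw.h as the authoritative source for the actual offsets and names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed byte offsets of the block-level registers.
 * Verify every value against the generated xhls_macc_hw.h. */
#define SKETCH_REG_CTRL 0x00u /* start/done/idle/ready control bits */
#define SKETCH_REG_GIE  0x04u /* global interrupt enable */
#define SKETCH_REG_IER  0x08u /* per-source interrupt enable */
#define SKETCH_REG_ISR  0x0cu /* per-source interrupt status */

/* Assumed bit positions inside the control register. */
#define SKETCH_CTRL_AP_START (1u << 0)
#define SKETCH_CTRL_AP_DONE  (1u << 1)
#define SKETCH_CTRL_AP_IDLE  (1u << 2)
#define SKETCH_CTRL_AP_READY (1u << 3)

/* Decode helpers that operate on a value read from the control register. */
static inline bool sketch_is_done(uint32_t ctrl)
{
    return (ctrl & SKETCH_CTRL_AP_DONE) != 0;
}

static inline bool sketch_is_idle(uint32_t ctrl)
{
    return (ctrl & SKETCH_CTRL_AP_IDLE) != 0;
}

/* Convert a byte offset into an index for a uint32_t register array. */
static inline uint32_t sketch_word_index(uint32_t byte_offset)
{
    return byte_offset / 4u;
}
```

With these assumptions, a control register value of 0x4 decodes as idle but not done, and the interrupt status register is the fourth 32-bit word (index 3) of the register block.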
6. Launch Vivado Design Suite on an LRC machine:
% module load xilinx/2022
% vivado
7. Follow the steps below to create a new project, integrate the HLS IP into an overall system design and generate the FPGA bitstream:
a. Create a Vivado Project
• Click “Create Project” and click “Next”.
• Select a project name, e.g. “hls_macc_project”, and click “Next”.
• Select “RTL project” and mark “Do not specify sources at this time”. Click “Next”.
• Click on “Boards” and look for “Ultra96-V2 Single Board Computer” in the search box. If a download button is shown in the “Preview” or “Status” column, click on it. If this board does not show at all, restart Vivado and, before creating a new project, go to “Vivado Store” in the main menu and click “Ok” in the dialog box. Go to “Boards”->“Avnet”->“Evaluation Boards” and look for the board mentioned above. If it is not installed, right-click on it and click “Install”. The board should now be listed when creating the project. After selecting the board, click “Next”.
• You should now see a window like this. Click “Finish”. You may have noticed that the part number is slightly different from the one we selected in Vitis HLS above, but they are basically the same device:
b. Add HLS component to IP Catalog
• In the Project Navigation area, go to “Window” at the toolbar at the top and select “IP Catalog”. Right-click anywhere in the white space and click “Add Repository”. Browse to the location of the IP generated by the Vitis HLS project (/misc/scratch/<path_to_HLS_project>/solution1/impl/ip) and click “Select” to close the IP repository manager. You should now see an “hls_macc” IP under “User Repository”->“VITIS HLS IP”:
c. Create an IP Integrator block design of the system
• In the IP Integrator area of the Flow Navigator, click “Create Block Design” and type “Ultra96_Design” (or whatever you want) in the dialog box. The Block Design view opens in the main pane, with a new Diagram tab, containing a blank Block Design canvas:
• Press the “+” button to add an IP to the design.
• Type “zynq” into the search box.
• Select “Zynq UltraScale+ MPSoC”. An IP symbol for the Zynq UltraScale+ MPSoC processing system appears on the canvas.
• Click the “Run Block Automation” link under the title bar. Ensure that “zynq_ultra_ps_e_0” is selected and that “Apply Board Preset” is also checked. Click “Ok”. The Zynq processing system will show up on the canvas and look like this:
• If you double-click on the Zynq processing system, the IP configuration block should look like this:
• Click “Add IP” again and type “hls” in the search box. The “Hls_macc” IP should show up. Double-click to add it to the design.
• Select “Run Connection Automation”. Ensure that the following are selected and click “Ok”:
• Select “Run Connection Automation” again, ensure that the following are selected, and click “Ok”:
• In the Block Diagram’s toolbar, select “Validate Design” and ensure no errors or critical warnings occur. Two icons to the right of “Validate Design”, click on “Regenerate Layout”. Your block diagram should look like this:
• Go to the “Interrupt” port of the Hls_macc block. Left-click and drag to connect it to the “pl_ps_irq0[0:0]” port of the Zynq UltraScale+ MPSoC IP. Your system should look like this:
d. Implement the system
• Before proceeding with the system design, you must generate implementation sources and create an HDL wrapper as the top-level module for synthesis and implementation.
• Return to the Project Manager view by clicking on “Project Manager” in the Flow Navigator.
• In the Sources browser in the main workspace pane, a Block Diagram object called Ultra96_Design is at the top of the Design Sources tree view. Right-click this object and select “Generate Output Products”. In the pop-up window, ensure the following are selected and click “Ok”:
• Right-click the Ultra96_Design object again, select “Create HDL Wrapper”, ensure the following option is enabled and click “Ok” to exit the resulting dialog box:
• The top level of the Design Sources tree becomes the Ultra96_Design_wrapper.v file. The design is now ready to be synthesized and implemented, and to have an FPGA programming bitstream generated. Click “Generate Bitstream” to initiate the remainder of the flow. A warning that no implementation results are available will be shown. Click “Ok”. Finally, in the last pop-up window, click “Ok”, like so:
• This process will take some time. Once completed, a pop-up window will ask if you want to (optionally) open the implemented design. You can ignore this for now, so decline.
• From the Vivado menu, select “File”->“Export”->“Export Hardware” and click “Next”. Select “Include bitstream” and click “Next”. The hardware description will be exported to an .xsa file that you will need for the following steps of the tutorial. Select where you want to save this and click “Next”->“Finish”.
(b) Integrate hardware on the board
1. We first need to integrate the synthesized hardware design into the device tree blob (DTB) for the Linux kernel using PetaLinux. As already mentioned in Lab 3, by default, PetaLinux will create a temporary directory under /tmp that will fill up fast. For this lab, you must log into and use the yoshi machine, which has local disk space mounted under /homework that must be used for PetaLinux projects. Alternatively, you can work on your own machine using the Docker image from Lab 3 available here: https://hub.docker.com/r/gerstla/petalinux-systemc. Important: make sure to pull and work with the 2022.2 version/tag of the Docker image.
2. Log into yoshi, set up the environment, and create a new PetaLinux project under /homework using the board support package (BSP) for our Ultra96v2 board:
yoshi% module load xilinx/2022
yoshi% source /usr/local/packages/Xilinx_2022.2/petalinux/2022.2/settings.sh
yoshi% umask 022
yoshi% mkdir -p /homework/$USER
yoshi% cd /homework/$USER
yoshi% petalinux-create -t project -n Project -s /home/projects/gerstl/ece382m/u96v2_sbc_base_2022_2.bsp
yoshi% cd Project
3. Configure the new project by pointing to the hardware description (.xsa) file exported from Vivado, and then configure PetaLinux to boot from an SD card image:
yoshi% petalinux-config --get-hw-description=<vivado_dir>/<project_name>/Ultra96_Design_wrapper.xsa
Select Image Packaging Configuration->Root filesystem type->SD Card
Select Yocto Settings->Enable Buildtools Extended
Save the config and exit
4. Open up the following two files in the PetaLinux project:
yoshi% vim components/plnx_workspace/device-tree/device-tree/system-bsp.dtsi project-spec/meta-avnet/recipes-bsp/device-tree/files/u96v2-sbc/system-bsp.dtsi
Comment out the following lines from both files and save them:
/* ---
&axi_intc_0 {
      compatible = "xlnx,xps-intc-1.00.a";
      interrupt-parent = <&gic>;
      interrupts = <0 95 1>;
};
&amba_pl {
      zyxclmm_drm {
            compatible = "xlnx,zocl";
            status = "okay";
            interrupt-parent = <&axi_intc_0>;
            interrupts = <0 4>, <1 4>, <2 4>, <3 4>,
                         <4 4>, <5 4>, <6 4>, <7 4>,
                         <8 4>, <9 4>, <10 4>, <11 4>,
                         <12 4>, <13 4>, <14 4>, <15 4>,
                         <16 4>, <17 4>, <18 4>, <19 4>,
                         <20 4>, <21 4>, <22 4>, <23 4>,
                         <24 4>, <25 4>, <26 4>, <27 4>,
                         <28 4>, <29 4>, <30 4>, <31 4>;
      };
};
--- */
5. Now build the DTB:
yoshi% petalinux-build -c device-tree
6. The new DTB file is located under ./images/linux. You should see system.dtb there.
7. If you want to modify the device tree, you can use the dtc tool to decompile the system.dtb file into a system.dts source, edit it, and then compile it back into a binary DTB blob as described in Lab 3.
8. We will also need the bitstream file that was generated by Vivado. Look for the bitstream file in the Vivado project directory and remember its location:
yoshi% find <vivado_dir> -name "*.bit"
9. Next, we need to copy the new DTB and bitstream into the /boot partition of the board. The boot partition on the SD card of the board is different from the /boot directory that is part of the root filesystem that you see when logged into the board. As such, you either need to use an SD card reader to access the SD card from your laptop/PC, or, alternatively, you can simply mount the /boot partition directly on the board as the root user:
$ sudo mount /dev/mmcblk0p1 /boot
10. Now copy the updated DTB file and the FPGA bitstream into the /boot partition. First back up the current configuration on your SD card by making dated copies of the original DTB and bitstream files:
$ cd /boot
$ mv system.dtb system.dtb.`date +%m.%d.%y`
$ mv system.bit system.bit.`date +%m.%d.%y`
Then copy the new system.dtb and system.bit from the LRC machines to the /boot partition of the SD card:
$ sudo scp <user>@<server>.ece.utexas.edu:<vivado_bit_file> /boot/system.bit
$ sudo scp <user>@<server>.ece.utexas.edu:<petalinux_dir>/images/linux/system.dtb /boot/system.dtb
11. Reboot the board. This will program the FPGA with the bitstream on the SD card and load the new device tree into the Linux kernel.
(c) Software part
1. Go into the software application subdirectory (board_app/) of the tutorial example, which contains the source code for an application that initializes the hardware IP, feeds two operands into the hardware, waits for the result, and reads/prints the output. Note that Vitis HLS will automatically synthesize a set of registers into the hardware that allow the software to configure and control its operation. This includes interrupt enable registers that the above application code needs to first initialize in order for the hardware to generate interrupts. To find the register map automatically defined by Vitis HLS, open the xhls_macc_hw.h file found under ‘<your HLS project> -> <your solution> -> impl -> drivers -> hls_macc_top_v1_0 -> src’ in the Vitis HLS explorer window.
2. The application example includes an updated kernel module and device driver (fpga_drv.c) for the board. This driver was modified from Lab 3 to match the compatible name, properly acknowledge and clear interrupts, and access the correct memory-mapped addresses and interrupt status register in the synthesized hardware. Note that it is always a good idea to double-check the system memory and interrupt mapping and the compatible name, as the kernel and driver rely on this information to interface software and hardware. Once you have rebooted your board, check the information of your hardware module in the kernel’s device tree:
$ dtc -I fs /sys/firmware/devicetree/base
You should see an amba_pl { … }; section listing your hls_macc device information. An example output should be as follows:
hls_macc@a0000000 {
      xlnx,s-axi-hls-macc-periph-bus-addr-width = <0x6>;
      xlnx,s-axi-hls-macc-periph-bus-data-width = <0x20>;
      compatible = "xlnx,hls-macc-1.0";
      interrupt-parent = <0x4>;
      interrupts = <0x0 0x59 0x4>;
      reg = <0x0 0xa0000000 0x0 0x10000>;
};
};
In the above example, the accelerator base address is located at 0xa0000000, the interrupt ID is 0x59 (89), and the compatible name is "xlnx,hls-macc-1.0". Make sure this information matches what you have in your driver code, and change the driver code if it does not.
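To make the device tree values above easier to cross-check against the driver, the following sketch decodes the reg and interrupts properties by hand. It assumes two address cells and two size cells, which is what the four-cell reg property suggests, and the standard GIC binding in which an interrupt specifier of type 0 denotes a shared peripheral interrupt (SPI) whose GIC interrupt ID is the listed number plus 32. The helper names are hypothetical and are not taken from fpga_drv.c.

```c
#include <stdint.h>

/* Combine two 32-bit device tree cells (high word first) into one
 * 64-bit value, as used when #address-cells or #size-cells is 2. */
static inline uint64_t dt_cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Translate the SPI number from a GIC interrupt specifier
 * <0 number flags> into the GIC interrupt ID (SPIs start at ID 32). */
static inline uint32_t gic_spi_to_irq_id(uint32_t spi_number)
{
    return spi_number + 32u;
}
```

Applied to the example output, reg = <0x0 0xa0000000 0x0 0x10000> gives a base address of 0xa0000000 and a region size of 0x10000 bytes (64 KiB), and interrupts = <0x0 0x59 0x4> gives SPI 89, that is GIC interrupt ID 121, with flag value 4 meaning level-sensitive, active high.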
3. Compile the application example and driver code either on the LRC machines or directly on the board. If you compile on the board, make sure to add the -fno-stack-protector option to the KFLAGS macro in the provided Makefile, or the kernel module will not load. If you want to cross-compile the application on the LRC machines, you need to either use the 2018.3 version of the cross-compiler that is compatible with the setup on the board, as you have done for Lab 1, and add the -fno-stack-protector option in KFLAGS, or use the 2022.2 cross-compiler as you have done in Lab 3 and then enable the -static option in the CCFLAGS to statically compile in all libraries (which will, however, create a large application binary). In either case, after cross-compiling, copy the compiled example binary and kernel module to the board:
$ scp <user>@<server>.ece.utexas.edu:<path>/example .
$ scp <user>@<server>.ece.utexas.edu:<path>/fpga_drv.ko .
4. Insert the device driver on the board:
$ sudo insmod fpga_drv.ko
You can look at the kernel logs, including any messages generated by the driver, using:
$ dmesg
5. Finally, run the application example:
$ ./example <number1> <number2>
If everything works fine, you should see the following message in your terminal:
A is <number1>
B is <number2>
C += A*B is <number3>
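The write-operands, start, wait, and read-result sequence that the application performs can be sketched against a software stand-in for the accelerator. This is only a model under stated assumptions: the provided application goes through the fpga_drv kernel driver and uses interrupts, whereas this sketch polls a simulated register block. The register offsets for the operands and the result, the type fake_macc_t, and all function names are hypothetical; the real offsets are listed in xhls_macc_hw.h.

```c
#include <stdint.h>

/* Hypothetical byte offsets; the real ones are in xhls_macc_hw.h. */
#define FAKE_REG_CTRL  0x00u
#define FAKE_REG_A     0x10u
#define FAKE_REG_B     0x18u
#define FAKE_REG_ACCUM 0x20u

#define FAKE_CTRL_START (1u << 0)
#define FAKE_CTRL_DONE  (1u << 1)

/* Software stand-in for the memory-mapped accelerator registers. */
typedef struct {
    uint32_t regs[16];
} fake_macc_t;

static uint32_t fake_reg_read(const fake_macc_t *dev, uint32_t off)
{
    return dev->regs[off / 4u];
}

static void fake_reg_write(fake_macc_t *dev, uint32_t off, uint32_t val)
{
    dev->regs[off / 4u] = val;
    /* Model the hardware: setting the start bit runs one MAC step
     * and then reports completion through the done bit. */
    if (off == FAKE_REG_CTRL && (val & FAKE_CTRL_START) != 0) {
        dev->regs[FAKE_REG_ACCUM / 4u] +=
            dev->regs[FAKE_REG_A / 4u] * dev->regs[FAKE_REG_B / 4u];
        dev->regs[FAKE_REG_CTRL / 4u] = FAKE_CTRL_DONE;
    }
}

/* One accelerator call: write operands, start, wait for done, read result. */
static int32_t fake_macc_call(fake_macc_t *dev, int32_t a, int32_t b)
{
    fake_reg_write(dev, FAKE_REG_A, (uint32_t)a);
    fake_reg_write(dev, FAKE_REG_B, (uint32_t)b);
    fake_reg_write(dev, FAKE_REG_CTRL, FAKE_CTRL_START);
    while ((fake_reg_read(dev, FAKE_REG_CTRL) & FAKE_CTRL_DONE) == 0) {
        /* Polling here; an interrupt-driven driver would sleep instead. */
    }
    return (int32_t)fake_reg_read(dev, FAKE_REG_ACCUM);
}
```

Starting from a zeroed fake_macc_t, fake_macc_call(&dev, 3, 4) returns 12 and a second call fake_macc_call(&dev, 2, 5) returns 22, mirroring the accumulating "C += A*B" behavior shown in the expected output.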