1 Overview

The goals of this lab are to:

• Use Xilinx's Vitis high-level synthesis (HLS) tool to synthesize the GEMM accelerator and generate Verilog or VHDL code at the register transfer level (RTL).

• Validate the generated RTL code and compare the results with the reference C model.

• Explore various architectural alternatives.

2 Tutorial

Please refer to the following materials for a tutorial by Xilinx that you can follow at your own pace:

• Vitis High-Level Synthesis User Guide (UG1399)

• Vitis HLS tutorial

3 Creating Standalone, Synthesizable GEMM Code

Starting from your isolated GEMM function from Lab 1, we will now create a standalone GEMM function that can be fed into Vitis HLS for synthesis:

a) Take the single, isolated GEMM function that you created in Lab 1 and modify the code, if necessary, so that Vitis HLS can synthesize this top-level function into an RTL description. If you did not already do so in Lab 1, change its function prototype so that it receives int-type inputs and produces int-type outputs, i.e., the fixed-point conversion does not happen inside the GEMM function. Make sure the GEMM is a single, standalone C/C++ function that is free of side effects, i.e., all required inputs and outputs are passed as function parameters or as the return value.

b) Develop a C/C++ testbench to test your standalone, fixed-point GEMM function.

c) Compile and run the code to check if the GEMM is functioning correctly.
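
As a rough illustration of steps a) to c), the sketch below pairs a standalone GEMM with a small self-checking testbench routine. The names gemm and run_gemm_test, the dimension DIM = 32, the flat row-major layout, and the plain integer multiply-accumulate without any fixed-point rescaling are assumptions made for this example, not requirements of the lab; your Lab 1 function may look different. The inline reference computation merely stands in for golden input/output files.

```cpp
#include <cstdio>

// Hypothetical matrix dimension; the lab does not prescribe one.
constexpr int DIM = 32;

// Standalone GEMM: C = A * B on int (already fixed-point) data.
// All inputs and outputs are parameters; no globals, no I/O, no side effects.
void gemm(const int A[DIM * DIM], const int B[DIM * DIM], int C[DIM * DIM]) {
    for (int i = 0; i < DIM; ++i) {
        for (int j = 0; j < DIM; ++j) {
            int acc = 0;
            for (int k = 0; k < DIM; ++k) {
                acc += A[i * DIM + k] * B[k * DIM + j];
            }
            C[i * DIM + j] = acc;
        }
    }
}

// Testbench routine: returns the number of mismatching output elements.
int run_gemm_test() {
    static int A[DIM * DIM], B[DIM * DIM], C[DIM * DIM], ref[DIM * DIM];

    // Small deterministic input values (assumed; replace with golden inputs).
    for (int i = 0; i < DIM * DIM; ++i) {
        A[i] = i % 7 - 3;
        B[i] = (i * 3) % 5 - 2;
        ref[i] = 0;
    }

    // Separately written reference using a different loop order (i, k, j).
    for (int i = 0; i < DIM; ++i)
        for (int k = 0; k < DIM; ++k)
            for (int j = 0; j < DIM; ++j)
                ref[i * DIM + j] += A[i * DIM + k] * B[k * DIM + j];

    gemm(A, B, C);

    int errors = 0;
    for (int i = 0; i < DIM * DIM; ++i) {
        if (C[i] != ref[i]) ++errors;
    }
    std::printf("GEMM test: %d mismatches\n", errors);
    return errors;
}
```

In an actual testbench, main() would call such a routine and return its result: Vitis HLS treats a return value of 0 from main() as a pass and any non-zero value as a failure, in both C simulation and C/RTL co-simulation.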

Deliverables:

• A directory named part3 in the lab-2 repository in Github Classroom, that includes the following:

– A README file including how to compile, run and verify your design.

– A Makefile or script to compile and/or run your code, if necessary.

– All required C/C++ code (source file and testbench file).

– Any golden input/output test files. Please keep the total size of these files to less than 50MB.

• A write-up in your lab report (submitted on Canvas) documenting your isolated GEMM and testbench.

• Note: the TA should be able to run your main program and compare the results using the testbench you provide.

4 Synthesizing the GEMM Accelerator

We will now synthesize the standalone GEMM functionality down to a cycle-by-cycle RTL description:

a) On an ECE-LRC machine, set up the Xilinx environment and launch Vitis HLS:

% module load xilinx/2022
% vitis_hls

b) Create a new Vitis HLS project for your design with the following settings:

• Project name: hls_gemm (or whatever you want)

• Location: wherever you want, but it is recommended that you create your projects in your local scratch space under /misc/scratch/<your_username>/

• Top Function: the name of your top-level GEMM function

• Design Files: add the design files to be synthesized (.c or .cpp files only, no header files).

• TestBench Files: add your testbench files and any input data files used for testing. These files are not synthesized.

• Solution Name: solution1 (or whatever you want). A Vitis HLS project can contain multiple solutions, each with its own synthesis options, and you can compare the solutions within the Vitis HLS environment.

• Clock Period: There is no requirement for this lab. A good starting point is 10 ns, but the target clock should eventually be one of the parameters driving optimizations.

• Part Selection:

– Select: Parts -> Browse and select 'xczu3eg-sbva484-1-e'

c) In this lab, we will synthesize the core computational GEMM kernel of our accelerator. The kernel will operate out of local accelerator SRAM and will later be connected to SRAMs and external bus interfaces to integrate it with the rest of the system. By default, Vitis HLS synthesizes C function parameters that are specified as scalars or fixed-size arrays (e.g. int A[1024]) into ports of the ap_none (register) and ap_memory (SRAM) interface types, respectively. However, if your GEMM C function uses arguments of pointer type (int *A) or arrays of undefined size (e.g. int A[]), you will need to provide directives (either through pragmas in the code or via the Vitis HLS GUI) to synthesize them into the correct ap_memory port interfaces in the generated RTL. When doing so, set the depth parameter to the size of the corresponding array; otherwise the co-simulation will fail, e.g.:

#pragma HLS INTERFACE mode=ap_memory port=A depth=1024

d) Start with the default architecture, i.e., do not specify any further architectural constraints (synthesis directives) at this point. You will explore different architectural alternatives in the next part of this lab.

e) Click Project -> Run C simulation. Check the simulation log. A successful simulation will have a “*****CSIM finish*******” message at the end.

f) Select the top-level function in Project -> Project Settings -> Synthesis and run C synthesis. Discuss the results of the synthesis report that is automatically shown in Vitis HLS after synthesis.

g) Validate your RTL code by running C/RTL co-simulation. Check the digital waveform and confirm that the GEMM operates correctly.
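
For step c), the following sketch shows where interface directives of this kind could sit in the source when the arguments are declared as arrays of undefined size. The function name gemm_mem and the dimension DIM = 32 (so that each array holds 1024 elements, matching depth=1024) are assumptions for illustration only. The pragmas go inside the function body, directly after the opening brace; how a given tool version treats them should be confirmed against UG1399.

```cpp
// Hypothetical dimension: DIM * DIM = 1024 elements per matrix.
constexpr int DIM = 32;

// GEMM with arrays of undefined size as arguments. The pragmas request
// ap_memory (SRAM-style) ports; depth tells co-simulation how many
// elements each port accesses. A regular C++ compiler ignores them.
void gemm_mem(const int A[], const int B[], int C[]) {
#pragma HLS INTERFACE mode=ap_memory port=A depth=1024
#pragma HLS INTERFACE mode=ap_memory port=B depth=1024
#pragma HLS INTERFACE mode=ap_memory port=C depth=1024
    for (int i = 0; i < DIM; ++i) {
        for (int j = 0; j < DIM; ++j) {
            int acc = 0;
            for (int k = 0; k < DIM; ++k) {
                acc += A[i * DIM + k] * B[k * DIM + j];
            }
            C[i * DIM + j] = acc;
        }
    }
}
```

Declaring the arguments as fixed-size arrays instead (e.g. const int A[DIM * DIM]) would, per the default behavior described in step c), make the explicit interface pragmas unnecessary.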

Deliverables:

• A directory named part4 in the lab-2 repository in Github Classroom containing the following:

– A README file with instructions on how to compile and run your files in Vitis HLS.

– All required C or C++ code (source code and testbench files, as well as any header files).

• A write-up in the report, briefly explaining how your RTL GEMM works and how you validated the RTL design.

5 Optimizing the Design

Freely explore at least 3 different architectural alternatives using various features offered by Vitis HLS (e.g. unrolling, pipelining, memory optimization, etc.) to come up with an area-performance optimal design. Discuss your approaches to different solutions and compare them in terms of various design metrics, i.e. area, latency, throughput, and operating clock frequency.
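
One possible starting point for such an exploration is sketched below: the innermost loop is partially unrolled and pipelined, and the input arrays are partitioned so that the unrolled accesses can be served by separate memory banks. The function name gemm_opt, the 2-D array layout, DIM = 32, and the unroll/partition factor of 4 are all assumptions chosen for illustration. No synthesis results are implied; the actual effect on area, latency, and clock frequency has to be read from your own synthesis reports.

```cpp
// Hypothetical dimension; must be a multiple of the unroll factor (4).
constexpr int DIM = 32;

void gemm_opt(const int A[DIM][DIM], const int B[DIM][DIM], int C[DIM][DIM]) {
    // Split A along its columns and B along its rows into 4 cyclic banks,
    // so that 4 consecutive k indices fall into 4 different banks.
#pragma HLS ARRAY_PARTITION variable=A type=cyclic factor=4 dim=2
#pragma HLS ARRAY_PARTITION variable=B type=cyclic factor=4 dim=1
    for (int i = 0; i < DIM; ++i) {
        for (int j = 0; j < DIM; ++j) {
            int acc = 0;
            for (int k = 0; k < DIM; ++k) {
                // Request 4 multiply-accumulates per loop iteration and a
                // new iteration every clock cycle (initiation interval 1).
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=4
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}
```

Varying the factor, moving the PIPELINE pragma to an outer loop, or dropping the partitioning gives further design points. The same directives can alternatively be kept out of the source and applied per solution through the Vitis HLS GUI, as mentioned in Section 4.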

Deliverables:

• A directory named part5 in the lab-2 repository in Github Classroom. For each of your designs, create a subdirectory under part5 named design_<design_number> that includes the following files:

– A README file including how to run and what to compare.

– All required C or C++ code of the design.

– The directives.tcl file needed to synthesize and verify the design. The file you are required to submit is located under the Solution_# directory of your project; place each .tcl file in its corresponding design subdirectory. The TA should be able to synthesize and verify all your designs.

– Generated Verilog code of the design.

• A write-up in your lab report (submitted on Canvas):

– Explain the approaches you have used for each solution.

– Comparison of synthesis results.

– Discussion of the results.

Lab Report Submission

Submit the following deliverables:

• Via Canvas: a write-up in PDF format covering Parts 3, 4, and 5.

• Via Github Classroom: Parts 3, 4, and 5, as described previously, including:

– Source code, scripts, generated Verilog and README files.