System-on-Chip (SoC) Design
ECE382M.20, Fall 2021
QEMU/SystemC Tutorial
Notes:
• This is a tutorial related to the class project.
• Please use the discussion board on Piazza for Q&A.
• Please check the relevant web pages.
The goal of this tutorial is to give an introduction to the QEMU/SystemC
virtual platform simulation environment. This tutorial includes the following:
• A simulation model of the Zynq UltraScale+ development/prototyping platform.
• A SystemC model of a simple hardware module mapped to the Zynq's FPGA fabric.
• An application example running on the ARM that calls and interfaces with the
hardware.
We use the free QEMU simulation model provided by Xilinx for their Zynq
platform for virtual prototyping of our SoC. More information about Xilinx's
QEMU setup is available through their wiki page and support forums.
We will be using a special setup of QEMU provided by Xilinx
that integrates with a SystemC co-simulation environment. For QEMU/SystemC
co-simulation, a separate SystemC simulator instance is launched in parallel to
the QEMU simulation, where the two simulators communicate through special
co-simulation ports that model the available bus and other interfaces between
the Zynq’s programmable ARM subsystem and the FPGA fabric. On the SystemC
side, these interfaces are represented as standard TLM2.0 bus models and
sockets that other SystemC modules representing blocks in the programmable
logic can be connected and interfaced to in order to construct a complete SoC
virtual platform simulation. More information about the QEMU/SystemC
integration from Xilinx is available in their wiki page.
The following example demonstrates a QEMU/SystemC simulation of a Zynq
UltraScale+ platform that includes a simple hardware module implemented in the
FPGA fabric, where the application running on the ARM accesses the external
hardware through memory-mapped I/O or a Linux kernel module.
(a) Environment Setup
We will be using the QEMU simulator that comes with the
PetaLinux tools provided by Xilinx. We provide a pre-built image of a minimal
Linux installation to boot in QEMU running on the emulated ARM platform. To set
up the simulation environment, log into the ECE-LRC servers (see instructions)
and copy the pre-built image into a local working directory. You should work in
the scratch space on the LRC machines. The image takes up quite a bit of disk
space for the emulated SD card, and space in the LRC home directories is
limited. Note, however, that the scratch space on LRC is not backed up and is
wiped at the end of every semester.
% mkdir -p /misc/scratch/$USER/Lab2
% cd /misc/scratch/$USER/Lab2
% cp -a /home/projects/gerstl/ece382m/images .
At this point, you should be able to boot the pre-built image
using a provided script:
% /home/projects/gerstl/ece382m/qemu-boot
This will launch two QEMU instances simulating the ARM subsystem and the PMU
boot firmware/boot loader running on a MicroBlaze processor. The two
simulators communicate via a local ./tmp directory that is created if it
doesn’t already exist.
If successful, you should see Linux booting on the ARM subsystem in the
terminal. At the prompt, log in as 'root' (password 'root') and you will have
access to a bare-bones, minimal Linux installation (busybox) running on the
emulated ARM platform. Note that to end the simulation, you should always
cleanly shut down the simulated Linux and then terminate the QEMU simulation
using Ctrl-A X (Ctrl-C will not work).
Alternatively, if you want to work on your own machine, a Docker image with a
QEMU and SystemC
co-simulation installation for this lab is available from https://hub.docker.com/r/gerstla/qemu-systemc.
Follow the provided instructions to mount or copy /home/projects/gerstl/ece382m/images into
the container and launch the (co-)simulation.
(b) Custom Linux Image (optional; skip this step unless you want to customize
the image)
If you want to modify the image that is booted in QEMU, you
can create a custom kernel and root file system using Xilinx’s PetaLinux
tools. The PetaLinux build process requires a lot of temporary disk space that
is not available on all LRC machines. By default, PetaLinux will create a
temporary directory under /tmp, which will fill up fast and
crash the machine. In order to create and compile a PetaLinux project, you need
to log into and use only the yoshi machine. On Yoshi, there is
440GB of local disk space that is mounted under /homework and
must be used for temporary PetaLinux files. Also, PetaLinux projects must be
created in the scratch space (/misc/scratch) on yoshi.
First, log into yoshi and set up the PetaLinux environment:
yoshi% source /usr/local/packages/xilinx_2018/petalinux/2018.3/settings.sh
yoshi% umask 022
Then create a new PetaLinux project under /misc/scratch. For
simulation purposes, we are using a default board support package (BSP)
provided by Xilinx that emulates their basic ZCU102 UltraScale+ evaluation
board:
yoshi% mkdir -p /misc/scratch/$USER/Lab2
yoshi% cd /misc/scratch/$USER/Lab2
yoshi% petalinux-create -t project -n PetaLinux -s /home/projects/gerstl/ece382m/xilinx-zcu102-v2018.3-final.bsp --template zynqMP
yoshi% cd PetaLinux
Next, configure PetaLinux to use a temporary directory under /homework. In
addition, the default PetaLinux setup creates an image that boots into Linux
using a RAM disk as root filesystem, where any modifications are not permanent
and space is limited. Instead, we need to configure PetaLinux to boot from an
SD card image:
yoshi% petalinux-config
Select Yocto Settings->TMPDIR Location->/homework/<unique directory name>
Select Image Packaging Configuration->Root filesystem type->SD Card
To create a custom kernel, run the kernel configuration and compile the kernel:
yoshi% petalinux-config -c kernel
Select Kernel hacking->KGDB: kernel debugger
Select or modify any other kernel options as you see fit.
yoshi% petalinux-build -c kernel
You can also create your own customized Linux root filesystem. If you do, make
sure to include the necessary libraries (libstdc++) needed to run YOLO/Darknet:
yoshi% petalinux-config -c rootfs
Select Filesystem Package -> misc -> gcc-runtime -> libstdc++
Select Filesystem Package -> misc -> gdb -> gdbserver
Select apps -> peekpoke
Select other optional packages as desired…
yoshi% petalinux-build -c rootfs
You can then boot the QEMU simulation as described in part (a) and, inside the
simulator, copy the new kernel and extract the new root filesystem onto the
emulated SD card:
# scp <user>@yoshi.ece.utexas.edu:/misc/scratch/<user>/Lab2/PetaLinux/images/linux/Image /boot
# scp <user>@yoshi.ece.utexas.edu:/misc/scratch/<user>/Lab2/PetaLinux/images/linux/system.dtb /boot
# ssh <user>@yoshi.ece.utexas.edu cat /misc/scratch/<user>/Lab2/PetaLinux/images/linux/rootfs.tar.gz | tar xzvpf - -C /
As an alternative to working on yoshi, you can work on your own machine using
a PetaLinux Docker image available here: https://hub.docker.com/r/gerstla/petalinux-systemc.
Important: to create a kernel and root filesystem that are compatible with the
setup on the board, make sure to pull and work with the 2018.3 version/tag of
the Docker image. The image also includes a complete QEMU/SystemC co-simulation
setup, but that only works for the 2020.2 version of the image (the QEMU
included with the 2018.3 version of PetaLinux is outdated). For co-simulation
of 2018.3 kernels and filesystems, use the Docker image provided above under
(a), which includes a custom QEMU setup compiled from sources.
(c) SystemC Co-Simulation
As described by Xilinx for their QEMU/SystemC co-simulation setup, to make the
external SystemC interfaces accessible in the QEMU simulation, we need to
launch the QEMU simulation with extra co-simulation arguments:
% /home/projects/gerstl/ece382m/qemu-boot -c
The qemu-boot script accepts optional icount and sync quantum parameters (see
‘qemu-boot --help’) that are passed to QEMU and let you explore the tradeoff
between simulation speed and accuracy.
QEMU will now start but immediately hang, waiting for a connection from the
SystemC simulator. To set up the SystemC simulation, we need to include a
special co-simulation library provided by Xilinx. For this example and
tutorial, we will be using a small SystemC demo platform provided by Xilinx
that is built on top of this library. Open a new terminal, log into the same
LRC machine, and download and compile the Xilinx SystemC co-simulation demo:
% module load systemc/2.3.3
% git clone https://github.com/Xilinx/systemctlm-cosim-demo.git
% cd systemctlm-cosim-demo
% git submodule update --init libsystemctlm-soc
% vi Makefile
Change: SYSTEMC ?= /usr/local/systemc-2.3.2/
to:     SYSTEMC ?= /usr/local/packages/systemc-2.3.3/
% make
Then, launch the SystemC side of the co-simulation:
% ./zynqmp_demo unix:../tmp/qemu-rport-_amba@0_cosim@0 10000
The SystemC simulator will start and in turn wait to establish the connection
to the QEMU simulator. Once the connection is made, both simulators will
proceed, and the QEMU side should continue to boot Linux as normal (but
slower). Note that the last argument is the sync quantum, which should match
the value passed to QEMU.
This SystemC demo example includes a simple debug device (see debugdev.h/.cc)
attached to the main system bus (see the top-level zynqmp_demo.cc SystemC
module, which stitches everything together). The debug device is accessible at
the following bus addresses:
0xa0000000 – Read SystemC time (in seconds) / Write debug message and measure time
0xa0000004 – Write output ASCII character to terminal
0xa0000008 – Write to stop the simulation via an exit(1) call
0xa000000c – Read/Write status of interrupt line (high/low, i.e. 1/0)
0xa0000010 – Read SystemC clock() count
You can use the peek and poke commands from within the busybox Linux shell to
test reading/writing from/to these addresses.
(d) Application software
We demonstrate application development and HW/SW interfacing using a simple
software application running on the virtual platform that interfaces with the
debug hardware device modeled in SystemC. To compile and link applications for
the board, we need to use the ARM cross-compiler tool chain provided by Xilinx
together with their overall tool environment:
% module load xilinx/2018
% source /usr/local/packages/xilinx_2018/vivado_hl/SDK/2018.3/settings64.sh
Alternatively, you can also use the cross-compilers that are included with
PetaLinux (source /usr/local/packages/xilinx_2018/petalinux/2018.3/settings.sh
on LRC, or using the 2018.3 PetaLinux Docker image from
https://hub.docker.com/r/gerstla/petalinux-systemc).
Then unpack the application example (QEMU_SystemC_app.tar.gz) and
cross-compile the executable (example) for the ARM using the provided
Makefile:
% tar xzf /home/projects/gerstl/ece382m/QEMU_SystemC_app.tar.gz
% cd QEMU_SystemC_app/application
% make
Next, we need to copy the cross-compiled application into the simulated
platform. The virtual platform simulator includes emulated network access, so
the easiest way to do this is by copying the executable into the simulated
platform over the network. From within the simulation execute (you can use
either scp or wget):
# scp <user>@<server>.ece.utexas.edu:<path>/application/example .
Finally, run the application example from there:
# ./example [<val>]
The example (see example.c) demonstrates the use of memory-mapped I/O under
Linux to access the debug device registers and either read the SystemC time
or write to the debug port, similar to the peek and poke commands above. See
the included README.application for more details.
(e) Interrupt-driven application using a Linux kernel module
The application package also includes a version of the example code that uses
a kernel module (fpga_drv.c) to implement an interrupt-based device driver
for all accesses to the FPGA hardware. For the kernel module to work, we
first need to tell the Linux kernel about the existence of the external
hardware device. The application example provides a modified kernel device
tree in the ./boot directory that includes an additional ‘ece382m,fpga’
device for this example. You can inspect and, if needed, modify the device
tree and then compile it back into a device tree blob (DTB) for the kernel:
% cd ../boot
% dtc -I dtb -O dts -o system.dts system.dtb
% vi system.dts (search for ‘ece382m,fpga’)
% dtc -I dts -O dtb -o system.dtb system.dts
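For orientation, a node matching this description typically looks like the
following sketch. This is hypothetical: take the real compatible string,
register window, and interrupt cells from the system.dts you just decompiled.

```dts
/* Hypothetical sketch of the added node; check the decompiled system.dts
   for the actual values. GIC interrupt 121 corresponds to SPI 89 (121 - 32). */
fpga@a0000000 {
    compatible = "ece382m,fpga";
    reg = <0x0 0xa0000000 0x0 0x1000>;  /* debug device register window */
    interrupt-parent = <&gic>;
    interrupts = <0 89 4>;              /* SPI 89, level-triggered (assumed) */
};
```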
Finally, copy the pre-compiled device tree blob into the simulated platform
and then reboot the simulation:
# scp <user>@<server>.ece.utexas.edu:<path>/boot/system.dtb /boot
# reboot
Alternatively, if you compiled your own kernel in step (b), go into your
PetaLinux project directory and either copy the
‘project-spec/meta-user/recipes-bsp/device-tree/files/zynq-fpga.dtsi’
included with the application package into your project directory, or just
unpack the application package right into the PetaLinux project tree:
yoshi% cd PetaLinux
yoshi% tar xzf /home/projects/gerstl/ece382m/QEMU_SystemC_app.tar.gz --strip-components=1
Update the PetaLinux recipes to include the additional ‘fpga’ device:
yoshi% vi project-spec/meta-user/recipes-bsp/device-tree/files/system-user.dtsi
Add /include/ "zynq-fpga.dtsi"
yoshi% vi project-spec/meta-user/recipes-bsp/device-tree/device-tree.bbappend
Add SRC_URI_append += "file://zynq-fpga.dtsi"
yoshi% petalinux-build -c device-tree
Finally, copy the device tree blob into the simulated platform as described
in (b) and (re)start the QEMU+SystemC simulation of our (virtual) platform.
Then cross-compile the executable (example) and the kernel module
(fpga_drv.ko) of the interrupt-driven application for the ARM:
% cd ../application.irq
% make
Copy both files into the virtual platform using scp or wget as shown above.
From within the simulation, load the kernel module, check that the device
driver is properly installed and registered for GIC interrupt 121, and look
at the output reported by the driver under its /proc/fpga entry:
# insmod fpga_drv.ko
# lsmod
# cat /proc/interrupts
# cat /proc/fpga
Finally, run the application example:
# ./example [<val>]
In addition to the output from before, you should see messages from the
'fpga_drv' about I/O accesses, including handling of incoming interrupts for
synchronization with the hardware.
In the following, we describe various options that are available for
debugging both the SystemC side of the virtual platform as well as the
application running on the simulated ARM.
(a) SystemC debugging
By default, the SystemC/C++ model of the virtual platform is
already compiled with debug information (-g option) enabled. However, you may
want to also disable compiler optimizations (remove the -O2 switch in the
Makefile and recompile). The ‘zynqmp_demo’ executable can then be
executed in your favorite debugger, such as GDB:
% gdb zynqmp_demo
Or, if you prefer a graphical debugging environment, you can use the DDD
graphical GDB frontend:
% ddd zynqmp_demo
Finally, inside the debugger, launch the ‘zynqmp_demo’ program with the
correct command line arguments:
(gdb) run unix:../tmp/qemu-rport-_amba@0_cosim@0 10000
From there on, you can debug the virtual platform’s SystemC/C++ executable
using standard means, e.g., Ctrl-C will interrupt the running program and
bring you back to the (gdb) prompt.
(b) Application debugging
The Linux installation running on our platform includes a GDB server,
which can be used to attach an external GDB instance and remotely debug an
application running on top of it. In order to do that, first make sure that
your application is compiled with debug information (-g switch enabled). In
case of the given application example using the provided Makefiles:
% cd application[.irq]
% make debug
Then, (re)start the virtual platform and forward the debug port in the
platform to any available port on the local simulation host:
% cd ..
% /home/projects/gerstl/ece382m/qemu-boot -n hostfwd=tcp::<port>-:1234
Copy your application into the platform and start it there under the control
of the GDB server:
# gdbserver HOST:1234 example
Finally, open another shell on the host and launch an ARM cross-debugger
instance, pointing it to the same application binary (with compiled-in debug
information):
% aarch64-linux-gnu-gdb application[.irq]/example
Inside GDB, connect to the (forwarded) remote server and run the application
under the control of the cross-debugger:
(gdb) target remote localhost:<port>
(gdb) b main
(gdb) c
(c) Kernel debugging
If you want to venture into debugging the Linux kernel, including any
applications it launches, QEMU supports the capability to attach a remote
debugger to the simulator itself:
% /home/projects/gerstl/ece382m/qemu-boot -g tcp::<port> -S
Note that with the -S option, QEMU will not boot any code until a remote
debugger is attached. If you want to be able to use source information, make
sure whatever code you want to debug is compiled with -g and point the
cross-debugger to it, e.g.:
% aarch64-linux-gnu-gdb application[.irq]/example
Finally, connect the cross-debugger to QEMU and start execution:
(gdb) target remote localhost:<port>
(gdb) b main
(gdb) c
You can then use the virtual platform normally, while being able to control
the emulated code from the remote debugger (e.g., Ctrl-C will stop the
simulated ARM and bring it back under GDB control). Setting a breakpoint as
shown above will trigger when the application you pointed the debugger to is
launched inside the platform (# ./example) and it hits the given source line.