# ECE382M.20: System-on-Chip (SoC) Design

#### **Lecture 1 – Project Overview**

Andreas Gerstlauer
Electrical and Computer Engineering
The University of Texas at Austin

gerstl@ece.utexas.edu



#### **Lecture 1: Outline**

- Marketing requirements
  - · Market focus, product description
  - · Cost metrics, product features
- Product requirements
  - Deep learning
  - · Hardware acceleration
- Project description
  - Deep/Convolutional Neural Networks (DNNs/CNNs)
  - Object recognition
  - You Only Look Once (YOLO) CNN
  - · Hardware and software development tasks

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

2

#### **Market Focus**

- Visual object recognition
  - Computer vision for drones, selfdriving cars, home automation, ...
  - Camera-based automotive driver assistance systems (ADAS)



Source: Jonathan Hui

#### What problem are we trying to solve?

- · Standard camera for
  - Collision avoidance
  - Lane tracking/keeping
  - Traffic sign recognition
  - ..
- Detect, locate and classify objects in video stream
  - Bounding boxes
  - Types of objects



ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

3

## Competition

- MobilEye (an Intel company)
  - http://mobileye.com
  - Custom ASIC/SoC solution
- Movidius (an Intel company)
  - https://www.movidius.com/
  - Custom ASIC/SoC solution
- NVIDIA DRIVE
  - https://www.nvidia.com/en-us/self-driving-cars
  - ARM+GPU based solution
  - · Used by Tesla

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

4

## **Product Description**

- Visual object recognition SoC
  - Deliver hardware + software intellectual property (IP)
- Cost metrics
  - Real-time: frames per second (FPS), reaction time
  - Detection accuracy: mean average precision (mAP)
  - Power/thermal: W and operating temperature (°C)
  - Cost: \$ or die area (mm²)
- Product features
  - Supported image resolutions
  - · Supported detection classes
  - Flexibility: dynamic, over-the-air reprogramming/updating

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

5

## **Product Requirements**

- High detection accuracy → deep learning
  - Convolutional Neural Network (CNN)
  - · Trained on large image data set
  - · Very computationally intensive
- High frame rate, low power → hardware acceleration
  - Key/dominating computational kernels
  - Convolutions and matrix operations
  - General matrix-matrix multiplication (GEMM)
- Flexiblity → software support
  - · Standard embedded Linux environment
  - Software optimizations for performance and power

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

6

# **Objection Detection using Deep Learning**

· Classification vs. detection

Image classification, global feature



Object detection, classification + localization



- Convolutional neural networks (CNNs) widely used for image classification
- Sliding windows of different size/shape + CNN-based classification for brute-force, naïve object detection

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

7

# **Object Recognition (1)**

Region based (Fast/Faster/Mask R-CNN)









3. Compute CNN features

4. Classify regions

- Fast versions by sharing convolutional layers
- Common feature extraction for region proposal and classification



ECE382M.20: SoC Design, Lecture 1

# **Object Recognition (2)**

Single Shot Multibox Detector (SSD)



- Apply windows of fixed size and shape at multiple scales
- · Detect both bounding box and class within window
- Predict likelihood of different box/class combinations

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

9

# **Object Detection (3)**

- You Only Look Once (YOLO)
  - Single grid/scale, predict arbitrary bounding box (and class)



ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

10

# You Only Look Once (YOLO)



- · Default implementation on top of Darknet
  - General open-source CNN framework/library in C
  - Also available for other deep learning frameworks
    - PyTorch, Caffee2 [Facebook], TensorFlow [Google]

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

11















## **Project Description**

- HW/SW co-design of an embedded SoC
  - Low-power YOLO/Darknet implementation
  - ARM-based target platform
    - ARM Cortex-A9 processor, memory components, I/O devices
    - Custom hardware accelerators (GEMM)
    - Interconnected via standard system busses or memory/cache interfaces
  - Virtual and physical prototyping
    - SystemC TLM-based virtual platform model (QEMU ARM simulator)
    - ARM- and Xilinx FPGA-based prototyping board (Zynq-7000)
  - > Lab and project in teams
    - > 2-3 per team, 20 teams for 20 boards

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

19

## **Project Objectives and Activities**

- Project objective:
  - Implement the YOLO/Darknet code on a ARM based SoC while meeting the performance, area and power metrics.
- Project activities:
  - Profile the YOLO/Darknet software implementation to determine performance bottlenecks
  - Optimize the YOLO/Darknet software (fixed point operation)
  - Partition the software into components which will run on the ARM processor and on hardware accelerators
  - Synthesize accelerators into Verilog for gate level implementation
  - Co-simulate and prototype the HW/SW implementation
  - Estimate timing, area and power metrics and validate against product requirements

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

20

#### **Development Tasks**

- ARM software development
  - Compile and profile YOLO/Darknet on ARM board
  - Convert floating-point to fixed-point code and check mAP
  - Compile and profile fixed-point Yolo on ARM board
  - Optimize software on dual-core ARM platform
  - Develop hardware abstraction layer (HAL) and I/O handler
  - Develop interrupt handler & driver (Linux kernel module)
- Hardware development on FPGA
  - Hardware accelerators (synthesize fixed-point code)
  - Interface to ARM board and on-chip bus
  - Memory/cache interfaces (optional DRAM controller)
  - Interrupt logic, clocking & reset
  - Debug, diagnostics

ECE382M.20: SoC Design, Lecture 1

© 2023 A. Gerstlauer

21

