

# H/S Codesign: A CAD Perspective

Margarida F. Jacome

The University of Texas at Austin

Margarida Jacome - UT Austin - Spring 97





- Working hypothesis: the overall system can be modeled consistently and be partitioned (either manually or automatically) into hardware and software components.
  - hardware components
    - performance
    - implemented using existing hardware synthesis tools
  - software components
    - low cost, flexibility
    - generated automatically (software compilation)
  - interfaces and synchronization

#### H/S Codesign: Research Issues

- Models and Specification Languages
- Design Space Exploration, Estimation, Partitioning
- Co-simulation/Verification
- Software, Hardware, and Interface Synthesis
- Scheduling, Real-time Operating Systems




- → Vulcan (Stanford, De Micheli et al.)
- Polis (UC Berkeley, Sangiovanni-Vincentelli et al.)




- Leverages research in behavioral synthesis
  - scheduling techniques (relative scheduling)
  - automatic path to synthesis (Olympus)
- Automatic partitioning
- Deterministic constraint analysis



- → Modeling
- Constraint Analysis
- Software and Runtime Environment
- Target Architecture H/S Interface
- Partitioning


## **Example - Algorithmic Description**





- Hierarchical control/data-flow graph:
  - control flow primitives (*iteration* and *model call*) modelled through hierarchy
- Acyclic
  - models a partial order of tasks/operations
  - acyclic dependencies suffice ⇒ iteration is modeled outside the graph
- Polar
  - source and sink vertices model No-Operations






#### Nodes

- *no-op*: no operation
- *cond*: conditional fork
- *join*: conditional join
- *op-logic*: logical operations
- *op-arithmetic*: arithmetic operations
- *op-relational*: relational operations
- *op-io*: I/O operations
- *wait*: wait on a signal variable (synchronization)
- *link*: hierarchical operations
  - *call*: procedure call (invocation times = 1)
  - *loop*: iteration (invocation times ≥ 1)




System Model: $\Phi = \{G_1^*, G_2^*, ..., G_n^*\}$ where

 $G_i^*$ : process graph model $G_i$ and all the flow graphs hierarchically linked to $G_i$.

- Flow graph models can be common to more than one hierarchy $\Rightarrow$ shared models



- Implementation, *I(G)*, of a graph model *G*:
  - assignment of *delay* and *size* properties to *operations* in *G*
  - choice of a *runtime scheduler*, γ, that enables the execution of *source* operations in *G*



- Operation delay
- Graph Latency
- Rate of Execution (operations)






- $\delta(v_i) = \lambda(G_1) \cdot x$ can be
  - variable
  - unbounded (loop vertices with unbounded indices)



Link vertices: call and/or loop (point to other flow graphs in the hierarchy)


#### **EXAMPLE 7** Rate of Execution (operations)







For each invocation of a flow graph model, an operation is invoked zero, one, or many times depending upon its *position in the hierarchy* of the flow model



The execution times $t_k(v)$ of an operation $v$ are determined by two separate mechanisms:

- The runtime scheduler, $\gamma$
  - determines the invocation time of flow graphs
- The operation scheduler, $\Omega$

#### **Scheduling of Operations**

Given a graph model $G = (V, E)$, the selection of a *schedule* refers to the choice of a function $\Omega$ that determines the *start time of operations* such that

 $t_k(v_i) \geq \max_{j:(v_j, v_i) \in E} [t_k(v_j) + \delta(v_j)]$ 

is satisfied for each invocation $k > 0$ of operations $v_i$ and $v_j$.
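The precedence inequality above can be illustrated with a small sketch that computes the earliest start times satisfying it for a single invocation of an acyclic graph. This is only one possible choice of $\Omega$ (an as-soon-as-possible schedule), not the relative scheduling used in Vulcan; the function name `asap_schedule` and the example graph and delays are hypothetical.

```python
from collections import defaultdict, deque


def asap_schedule(delay, edges):
    """Earliest start times with t(v_i) >= max over edges (v_j, v_i) of t(v_j) + delay(v_j).

    delay: dict mapping each operation to a fixed, known delay (assumption: no ND operations).
    edges: list of (v_j, v_i) precedence pairs of an acyclic graph.
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    indegree = {v: 0 for v in delay}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indegree[v] += 1

    ready = deque(v for v in delay if indegree[v] == 0)
    start = {}
    while ready:
        v = ready.popleft()
        # All predecessors are already scheduled, so the bound is tight.
        start[v] = max((start[u] + delay[u] for u in preds[v]), default=0)
        for w in succs[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)

    if len(start) != len(delay):
        raise ValueError("graph is not acyclic")
    return start


# Hypothetical polar graph: source and sink are no-ops with zero delay.
delay = {"source": 0, "a": 2, "b": 3, "c": 1, "sink": 0}
edges = [("source", "a"), ("source", "b"), ("a", "c"), ("b", "c"), ("c", "sink")]
schedule = asap_schedule(delay, edges)
print(schedule)
```

Here `c` cannot start before cycle 3 because its slower predecessor `b` takes 3 cycles, and the start time of `sink` gives the latency of this graph for these fixed delays.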


#### Modified Relative Schedule






- Modeling
- → Constraint Analysis
- Software and Runtime Environment
- Target Architecture H/S Interface
- Partitioning






- Operation delay constraints
  - unary: bounds on the delay of an operation
  - binary: bounds on the delay between the starting times of two operations
- Execution rate constraints








#### **Data Rate Constraints**

- *Minimum data rate constraint*, $r_i$ (cycles<sup>-1</sup>) on operation $v_i$: lower bound on the execution rate of $v_i$

- *Maximum data rate constraint*, $R_i$ (cycles<sup>-1</sup>) on operation $v_i$: upper bound on the execution rate of $v_i$
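As a rough illustration of the two bounds, the sketch below checks a sequence of start times against $r_i$ and $R_i$. It assumes that the execution rate between two successive invocations is the reciprocal of the interval between their start times, which is not stated on these slides; the function names and the sample start times are hypothetical.

```python
from fractions import Fraction


def execution_rates(start_times):
    """Rate between successive invocations, assumed to be 1 / (t_k - t_{k-1}) in cycles^-1."""
    rates = []
    for previous, current in zip(start_times, start_times[1:]):
        if current <= previous:
            raise ValueError("start times must be strictly increasing")
        rates.append(Fraction(1, current - previous))
    return rates


def satisfies_rate_constraints(start_times, r_min=None, r_max=None):
    """True if every observed rate lies within [r_min, R_max]; a bound of None is ignored."""
    for rate in execution_rates(start_times):
        if r_min is not None and rate < r_min:
            return False
        if r_max is not None and rate > r_max:
            return False
    return True


# Hypothetical start times (in cycles) of one operation over four invocations.
starts = [0, 8, 16, 26]
print(execution_rates(starts))
print(satisfies_rate_constraints(starts, r_min=Fraction(1, 10), r_max=Fraction(1, 8)))
print(satisfies_rate_constraints(starts, r_min=Fraction(1, 9)))
```

The last call fails because the final interval of 10 cycles gives a rate of 1/10, which is below the requested minimum of 1/9.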





#### **Ex.:** Specification of Rate Constraints




#### **Timing Constraints and Scheduling**

- Given a scheduling function, a timing constraint is considered *satisfied* if
  - the operation starting times determined by the scheduling function satisfy the inequalities




#### Satisfiability - Delay Constraints

A minimum delay constraint is always satisfiable

 $\theta_{v_j}(v_i) \geq \max(l(v_j, v_i), l_{ij})$ 

A maximum delay constraint may not always be satisfiable

#### Modified Relative Schedule




#### Satisfiability - Delay Constraints

#### Feasibility

A constraint graph is considered *feasible* if it contains *no positive cycle* when the delay of ND (non-deterministic delay) operations is set to zero.

Necessary and sufficient condition for the *satisfiability* of constraints in the presence of *ND* operations:

Operation delay constraints are *satisfiable* if and only if

- the constraint graph is *feasible*, and
- there exist no cycles containing ND operations
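The two conditions can be sketched as a longest-path (Bellman-Ford style) test for positive cycles plus a reachability test for ND operations. The edge encoding is an assumption borrowed from common constraint-graph practice: a forward edge carries the delay of its tail operation (zero for ND operations) or a minimum delay bound, and a maximum delay bound $u$ from $v_i$ to $v_j$ becomes a backward edge $(v_j, v_i)$ with weight $-u$. All function names and the example graph are hypothetical.

```python
def has_positive_cycle(vertices, edges):
    """Longest-path relaxation; edges are (tail, head, weight) triples."""
    dist = {v: 0 for v in vertices}  # as if a virtual source reached every vertex at 0
    for _ in range(len(vertices)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances settled, so no positive cycle exists
    return True  # still relaxing after |V| passes


def lies_on_cycle(vertex, edges):
    """True if `vertex` can reach itself along the edges."""
    successors = {}
    for u, v, _ in edges:
        successors.setdefault(u, []).append(v)
    stack, seen = list(successors.get(vertex, [])), set()
    while stack:
        x = stack.pop()
        if x == vertex:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(successors.get(x, []))
    return False


def delay_constraints_satisfiable(vertices, edges, nd_ops):
    feasible = not has_positive_cycle(vertices, edges)
    return feasible and not any(lies_on_cycle(v, edges) for v in nd_ops)


vertices = ["a", "b", "c"]
# a (delay 2) -> b (delay 3) -> c, with a maximum delay of 6 cycles from a to c.
loose = [("a", "b", 2), ("b", "c", 3), ("c", "a", -6)]
# Same graph with the maximum delay tightened to 4 cycles.
tight = [("a", "b", 2), ("b", "c", 3), ("c", "a", -4)]
# b treated as an ND operation: its delay is set to zero on the outgoing edge.
with_nd = [("a", "b", 2), ("b", "c", 0), ("c", "a", -6)]

print(delay_constraints_satisfiable(vertices, loose, nd_ops=set()))
print(delay_constraints_satisfiable(vertices, tight, nd_ops=set()))
print(delay_constraints_satisfiable(vertices, with_nd, nd_ops={"b"}))
```

The third case is feasible in the positive-cycle sense, yet it is rejected because the ND operation `b` sits on the cycle created by the maximum delay constraint.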




#### Satisfiability - Rate Constraints

- Maximum rate constraints are always satisfiable
  - through an appropriate choice of overhead delay (γ) applicable to every execution of G






A minimum rate constraint $r_i$ on an operation $v_i \in V(G)$, where G contains no ND operations, is satisfiable if




#### Overhead Delay

- $\gamma_k(G)$: reinvocation delay for G
  - may be a fixed quantity: overhead due to a runtime scheduler
  - may be variable: in case of conditional invocation of G













- Specification
- Modeling
- Constraint Analysis
- → Software and Runtime Environment
- → Target Architecture H/S Interface
- Partitioning



#### **Target Architecture**



















### A Model for Software

- Software is constructed as a set of *concurrent program threads*
- A thread is defined as a linearization of operations that may or may not begin with an *ND operation*
- The latency of a thread ( $\lambda$ ) is defined as the sum of the delays of its operations, not including the ND operation
  - the delay of the ND operation is merged into the delay of the runtime scheduler
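The latency definition can be written down directly. In this sketch a thread is a list of `(name, delay, is_nd)` tuples, which is a hypothetical representation chosen for illustration; the operation names and delays are made up.

```python
def thread_latency(operations):
    """Sum of operation delays, skipping a leading ND operation if there is one.

    operations: list of (name, delay, is_nd) tuples in execution order.
    The delay of an ND operation is unknown and is stored as None.
    """
    body = operations[1:] if operations and operations[0][2] else operations
    if any(is_nd for _, _, is_nd in body):
        raise ValueError("an ND operation may only appear at the start of a thread")
    return sum(delay for _, delay, _ in body)


# Hypothetical thread that waits on an input (ND) and then runs three fixed-delay operations.
receive_thread = [
    ("wait_rx", None, True),
    ("load", 2, False),
    ("add", 1, False),
    ("store", 2, False),
]
print(thread_latency(receive_thread))
```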



#### Non-prioritized FIFO Scheduler

- A thread is enabled when its "id" is in the control FIFO
- Before detaching, a thread performs one or more enqueue operations to the FIFO, for its dependent threads
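A minimal sketch of this control FIFO is shown below, assuming each thread runs to completion once dequeued and then enqueues the ids of its dependents. The function name `run_fifo`, the thread ids, and the dependency table are hypothetical, and the sketch does not handle a thread that must wait for several control dependencies before running.

```python
from collections import deque


def run_fifo(thread_bodies, dependents, initially_enabled):
    """Non-prioritized, non-preemptive scheduling driven by a FIFO of thread ids."""
    fifo = deque(initially_enabled)
    trace = []
    while fifo:
        thread_id = fifo.popleft()       # a thread is enabled when its id is in the FIFO
        thread_bodies[thread_id]()       # run the thread to completion
        trace.append(thread_id)
        # Before detaching, the thread enqueues its dependent threads.
        fifo.extend(dependents.get(thread_id, []))
    return trace


log = []
thread_bodies = {
    "T1": lambda: log.append("read sample"),
    "T2": lambda: log.append("filter sample"),
    "T3": lambda: log.append("update status"),
}
dependents = {"T1": ["T2", "T3"]}

trace = run_fifo(thread_bodies, dependents, initially_enabled=["T1"])
print(trace)
print(log)
```

Because the queue is served strictly in arrival order, `T2` and `T3` run in the order in which `T1` enqueued them.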





**Thread with Multiple Control Dependencies** 



### **Software Size/Delay Estimation**






- Specification
- Modeling
- Constraint Analysis
- Software and Runtime Environment
- Target Architecture H/S Interface
- → Partitioning
- Co-simulation

#### **Problem Formulation**

For a given set of flow graph models and timing constraints, create two sets of flow graph models, one to be implemented in hardware and the other in software, such that the following hold:

- Timing constraints are satisfied
- Processor utilization, $P \le 1$
- Bus utilization, $B \leq \overline{B}$
- A cost function $f(S_H, S_S, B, P^{-1}, m)$ is minimized
  - $m$: cumulative size of variables transferred across the partition
  - weights: represent the desired tradeoff between hardware size, processor and bus utilization, and communication overhead


#### Software Model






Satisfiability of reaction rates of program threads

Sufficient condition for a program thread (for non-preemptive non-prioritized runtime scheduler)





Determined based on a worst-case scenario

- ensures that the worst-case scenario is handled

Timing constraints: min/max delay and execution rate

Performance constraints: processor and bus utilization, run-time scheduler (software)


#### "Greedy" Partition Algorithm

```
graph_partition(G) {
  V_H = V(G);                          /* initialization */
  V_S = {};
  for v in V(G) {                      /* all ND loops go to software */
    if v is an ND link operation
      V_S = V_S + {v};
  }
  create software threads (V_S);       /* serialization, etc. */
  compute reaction rates for each thread;   /* based on rate constraints */
  if not check_feasibility(V_H, V_S)   /* timing constraints, processor and bus utilization */
    exit;
  f_min = f(V_H, V_S);                 /* initialize cost function */
  repeat {
    for v in V_H and v is not ND       /* pick HW operation --> SW */
      f_min = move(v);                 /* move(v) calls check_feasibility */
  } until no further reduction in f_min;
  return (V_H, V_S);
}
```
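The pseudocode can be approximated by the runnable sketch below. Several parts are assumptions made only so the example executes: the operation table with hardware areas and software times, the cost function `10 * hardware area + software time`, the feasibility test (processor utilization at most 1 over a hypothetical period of 100 cycles), and the choice to remove ND operations from the hardware set once they are assigned to software. Thread creation, reaction rates, and bus utilization are left out.

```python
def greedy_partition(ops, is_nd, cost, feasible):
    """Start with everything in hardware, then move operations to software while cost drops."""
    hardware, software = set(ops), set()
    for v in ops:                        # all ND operations go to software
        if is_nd(v):
            software.add(v)
            hardware.discard(v)          # assumption: the two sets are kept disjoint
    if not feasible(hardware, software):
        return None
    f_min = cost(hardware, software)

    improved = True
    while improved:                      # repeat until no further reduction in f_min
        improved = False
        for v in sorted(hardware):       # sorted() snapshots the set for a stable order
            new_hw, new_sw = hardware - {v}, software | {v}
            if not feasible(new_hw, new_sw):
                continue                 # the move would violate a constraint
            f = cost(new_hw, new_sw)
            if f < f_min:
                hardware, software, f_min = new_hw, new_sw, f
                improved = True
    return hardware, software


# Hypothetical operations: (hardware area, software time in cycles, is ND).
table = {
    "nd_loop": (50, 0, True),
    "fir": (40, 30, False),
    "crc": (10, 60, False),
    "ctrl": (20, 20, False),
}
PERIOD = 100  # hypothetical number of cycles available per invocation


def sw_time(software):
    return sum(table[v][1] for v in software)


def cost(hardware, software):
    return 10 * sum(table[v][0] for v in hardware) + sw_time(software)


def feasible(hardware, software):
    return sw_time(software) / PERIOD <= 1.0   # processor utilization P <= 1


result = greedy_partition(table, lambda v: table[v][2], cost, feasible)
print(result)
```

In this run `crc` and `ctrl` move to software because each move lowers the cost, while `fir` stays in hardware because moving it would push the processor utilization above 1.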