EE382C Embedded Software Systems - Hierarchical Scheduling
Jose Luis Pino, Shuvra S. Bhattacharyya, and Edward A. Lee,
"A Hierarchical Multiprocessor Scheduling Framework for Synchronous
Dataflow Graphs", Technical Report, UCB/ERL M95/36, May 30, 1995.
- Page 1:
The core of this framework is a clustering algorithm that reduces the
number of nodes before expanding the SDF graph into a precedence
directed acyclic graph (DAG).
- Page 2:
Generating a standalone application from a dataflow graph description
requires two phases: scheduling and synthesis.
In the scheduling phase, the dataflow graph is partitioned for
parallel execution.
We splice send and receive nodes into the graph for interprocessor
communication.
- Page 3:
An SDF graph is consistent if it is not deadlocked and a repetitions
vector exists.
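
The repetitions-vector half of this definition can be checked mechanically.
The following is a minimal sketch, not the paper's algorithm: it assumes a
connected graph and assumes each arc is given as a tuple
(src, snk, produced, consumed), a representation chosen here for illustration.
It solves the balance equations q(src) * produced = q(snk) * consumed and
returns the smallest positive integer solution, or None if none exists.
It does not test for deadlock.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(nodes, arcs):
    """Return {node: q(node)} or None if the balance equations conflict.

    arcs is a list of (src, snk, produced, consumed) tuples (assumed format).
    """
    adj = {n: [] for n in nodes}
    for src, snk, produced, consumed in arcs:
        # q(snk) = q(src) * produced / consumed, and the reverse direction.
        adj[src].append((snk, Fraction(produced, consumed)))
        adj[snk].append((src, Fraction(consumed, produced)))

    rate = {}
    for start in nodes:
        if start in rate:
            continue
        rate[start] = Fraction(1)
        stack = [start]
        while stack:
            u = stack.pop()
            for v, ratio in adj[u]:
                expected = rate[u] * ratio
                if v not in rate:
                    rate[v] = expected
                    stack.append(v)
                elif rate[v] != expected:
                    return None  # inconsistent sample rates

    # Scale the rational solution to the smallest positive integers.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    q = {n: int(r * scale) for n, r in rate.items()}
    common = reduce(gcd, q.values())
    return {n: v // common for n, v in q.items()}


if __name__ == "__main__":
    print(repetitions_vector(["A", "B", "C"],
                             [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```

Exact rational arithmetic (Fraction) is used so that rate mismatches are
detected without floating-point error.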
- Page 5:
To schedule SDF graphs onto multiple processors, a precedence graph
is constructed from the original SDF graph.
In general, the SDF graph exposes some of the functional parallelism,
and in addition, it exposes the data parallelism available.
The precedence graph for the SDF graph of figure 1 is shown
in figure 2.
Formally, the precedence graph is constructed by first instantiating
q(x) nodes.
For each arc alpha in the SDF graph, an arc in the precedence graph
is instantiated from src(alpha)i to snk(alpha)j
for each ordered pair (i,j) that satisfies
1 <= i <= q(src(alpha)) and 1 <= j <= q(snk(alpha)),
and at least one of the following two conditions holds (see the top
of page 6).
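
The first part of this construction can be sketched as follows. This is an
illustration only: the dictionary q (node name to repetition count) and both
function names are assumptions made here. The second function lists only the
candidate ordered pairs (i,j); the two conditions from the top of page 6 that
select the actual arcs are not reproduced in these notes, so they are not
implemented.

```python
def instantiate_invocations(q):
    """Return the precedence-graph nodes (x, i) for 1 <= i <= q[x]."""
    return [(x, i) for x, count in q.items() for i in range(1, count + 1)]


def candidate_pairs(q, src, snk):
    """Return every ordered pair (i, j) with 1 <= i <= q[src], 1 <= j <= q[snk].

    These are candidates only; the page-6 conditions still have to be applied.
    """
    return [(i, j)
            for i in range(1, q[src] + 1)
            for j in range(1, q[snk] + 1)]


if __name__ == "__main__":
    q = {"A": 3, "B": 2}  # hypothetical repetitions vector
    print(instantiate_invocations(q))
    print(candidate_pairs(q, "A", "B"))
```

The number of precedence-graph nodes is the sum of the entries of q, which is
why large repetition counts inflate the DAG.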
- Page 6:
An SDF graph is deadlocked if and only if its precedence graph contains
a cycle.
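
Given a precedence graph, the deadlock test therefore reduces to cycle
detection. A minimal sketch, assuming the graph is given as a node list and a
list of (u, v) arcs (a representation chosen here), using Kahn's topological
sort rather than any method from the paper:

```python
from collections import deque


def has_cycle(nodes, arcs):
    """Return True if the directed graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in arcs:
        successors[u].append(v)
        indegree[v] += 1

    # Repeatedly remove nodes with no remaining predecessors.
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        u = ready.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # Any node left over lies on, or downstream of, a cycle.
    return removed != len(nodes)


if __name__ == "__main__":
    print(has_cycle(["A1", "B1", "C1"],
                    [("A1", "B1"), ("B1", "C1"), ("C1", "A1")]))
```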
Unfortunately, the expansion due to the repetition count of each SDF
node can lead to an exponential growth of nodes in the DAG.
- Page 7:
A five-node SDF representation of a compact-disc to digital audio tape
sample rate conversion system expands to a DAG that contains over 600 nodes.
Most uniprocessor SDF schedulers do not require a DAG to be generated
for scheduling purposes.
- Page 9:
Tarjan developed an efficient algorithm to find strongly connected
components in linear time with respect to the number of arcs and
nodes in the SDF system.
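
For reference, a compact sketch of Tarjan's algorithm is given below. The
graph representation (node list plus (u, v) arc list) is an assumption of
this example, and the recursive form is only suitable for small graphs
because of Python's recursion limit.

```python
def strongly_connected_components(nodes, arcs):
    """Return a list of sets, one per strongly connected component."""
    successors = {n: [] for n in nodes}
    for u, v in arcs:
        successors[u].append(v)

    index = {}        # discovery order of each visited node
    lowlink = {}      # smallest index reachable from the node's subtree
    stack = []
    on_stack = set()
    components = []
    counter = 0

    def visit(u):
        nonlocal counter
        index[u] = lowlink[u] = counter
        counter += 1
        stack.append(u)
        on_stack.add(u)
        for v in successors[u]:
            if v not in index:
                visit(v)
                lowlink[u] = min(lowlink[u], lowlink[v])
            elif v in on_stack:
                lowlink[u] = min(lowlink[u], index[v])
        if lowlink[u] == index[u]:
            # u is the root of a component: pop the component off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == u:
                    break
            components.append(component)

    for n in nodes:
        if n not in index:
            visit(n)
    return components


if __name__ == "__main__":
    print(strongly_connected_components(
        ["A", "B", "C", "D"],
        [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]))
```

Each node and each arc is examined once, which gives the linear running time
noted above.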
DAG multiprocessor schedulers that minimize the interprocessor communication
(IPC) costs typically have two distinct scheduling phases: (1) clustering,
and (2) processor assignment.
- Page 10:
The parallel time (PT) is defined as the length of the longest path in the
graph.
Figure 6 shows an initial labeled DAG and the result of one clustering step.
It is important to note that each resultant cluster is mapped onto a single
processor.
This observation motivates the modification of parallel time minimization
clustering heuristics for use on the SDF graph.
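
The parallel time defined on this page can be computed with a single pass in
topological order. The sketch below assumes that path length is the sum of
node execution costs and arc communication costs along the path; the cost
model, the dictionary formats, and the example numbers are assumptions of this
example and are not taken from the report. Setting an arc cost to 0 is used
here to model placing both endpoints in the same cluster.

```python
def parallel_time(node_cost, arc_cost):
    """Return the length of the longest path in a weighted DAG.

    node_cost: {node: execution cost}
    arc_cost:  {(u, v): communication cost}
    """
    successors = {n: [] for n in node_cost}
    indegree = {n: 0 for n in node_cost}
    for (u, v), cost in arc_cost.items():
        successors[u].append((v, cost))
        indegree[v] += 1

    # finish[n] is the longest path length ending at (and including) n.
    finish = dict(node_cost)
    ready = [n for n in node_cost if indegree[n] == 0]
    processed = 0
    while ready:
        u = ready.pop()
        processed += 1
        for v, cost in successors[u]:
            finish[v] = max(finish[v], finish[u] + cost + node_cost[v])
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    if processed != len(node_cost):
        raise ValueError("graph contains a cycle")
    return max(finish.values(), default=0)


if __name__ == "__main__":
    costs = {"A": 1, "B": 2, "C": 3}
    before = {("A", "B"): 4, ("A", "C"): 1, ("B", "C"): 0}
    after = dict(before)
    after[("A", "B")] = 0  # A and B placed in one cluster
    print(parallel_time(costs, before), parallel_time(costs, after))
```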
- Page 11: (Lemma 1)
Suppose that we divide the invocations of snk(alpha) into groups of Q
members, each denoted Ij, as shown by (4):
- No more than one invocation of src(alpha) has a precedence graph output arc
directed to a member of Ij.
- If src(alpha)y has an output (precedence graph) arc directed to
some member of Ij, then src(alpha)y has output arcs
directed to all members of Ij.
- Page 13: (Lemma 2) Same as Lemma 1 with the roles of the source and sink
nodes interchanged.
- Page 14: (Figure 8) Precedence graph illustrating Lemma 3.
ka = 1. Q can be either 1 or 3.
Q = 3 corresponds to the clustering operation shown.
Clearly, (14) is satisfied with j = 1.
- (a) cycle
A1 - B1-3 - C1 - A1
exists so there is deadlock.
da != k Q ka because
da = 2, which implies k = 2/3, which is not an integer.
- (b) consistent.
da = k Q ka.
da = 3 which implies k = 1
- (c) consistent.
da = k Q ka.
da = 0 which implies k = 0
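
The arithmetic in cases (a) through (c) can be reproduced with a small
helper. This is only a sketch of the integrality check da = k Q ka as it
appears in these notes; the symbols da, Q, and ka are used as above (their
definitions are in the report), and the function name is made up for
illustration.

```python
from fractions import Fraction


def delay_multiple(da, Q, ka):
    """Return the integer k with da == k * Q * ka, or None if k is not an integer."""
    k = Fraction(da, Q * ka)
    return int(k) if k.denominator == 1 else None


if __name__ == "__main__":
    for da in (2, 3, 0):  # cases (a), (b), (c) with Q = 3, ka = 1
        print(da, delay_multiple(da, Q=3, ka=1))
```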
- Page 17:
Unfortunately, SDF lacks the composition property.
That is, if we cluster two arbitrary SDF nodes, we may introduce deadlock
into the SDF graph.
Thus, the SDF composition theorem provides a sufficient condition that a
clustering operation does not introduce deadlock.
Currently, there is no known exact condition (both necessary and sufficient)
that can be evaluated in polynomial time with respect to the number of nodes
in the SDF graph.
This fact is explored on pages 18-21.
- Page 21:
The SDF composition theorem establishes four clustering criteria that
together produce a sufficient condition that a given clustering operation
involving two adjacent nodes does not produce deadlock.
- Precedence shift A: for the proposed cluster (x,y), test all arcs from
nodes outside of the cluster to x for which the nodes outside of the
cluster and x are in the same strongly connected component
- Precedence shift B: for the proposed cluster (x,y), test all arcs from
x to nodes outside of the cluster for which x and the nodes outside
of the cluster are in the same strongly connected component
- Hidden delay: applies for the proposed cluster (x,y) only if x and y
are in a strongly connected component
- Cycle introduction
Note that the conditions given in the SDF composition theorem may be
satisfied for the ordered pair (y,x), even though they are not satisfied
for (x,y).
Thus, in general, both orderings should be tried before ruling out a
clustering operation.
- Page 25: Four examples of clustering techniques.
- User specifies it
- Take resource constraints into account
- Group nodes into a well-ordered URC SDF subgraph where the nodes
do not have internal state
- Multiprocessor DAG scheduling
Updated 04/18/00.