EE382C Embedded Software Systems - Hierarchical Scheduling

Jose Luis Pino, Shuvra S. Bhattacharyya, and Edward A. Lee, "A Hierarchical Multiprocessor Scheduling Framework for Synchronous Dataflow Graphs", Technical Report, UCB/ERL M95/36, May 30, 1995.

Page 1: The core of this framework is a clustering algorithm that reduces the number of nodes before expanding the SDF graph into a precedence directed acyclic graph (DAG).
Page 2: Generating a standalone application from a dataflow graph description requires two phases: scheduling and synthesis. In the scheduling phase, the dataflow graph is partitioned for parallel execution. We splice send and receive nodes into the graph for interprocessor communication.
Page 3: An SDF graph is consistent if it is not deadlocked, and a repetitions vector exists.
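The repetitions vector half of this definition can be checked mechanically by solving the balance equations q(src) * produced = q(snk) * consumed on every arc. The following is a minimal sketch, not the report's algorithm; the function name and the (src, snk, produced, consumed) arc format are assumptions made for illustration. It tests only whether a repetitions vector exists and does not test for deadlock. It requires Python 3.9 or later for math.lcm.

```python
from collections import defaultdict
from fractions import Fraction
from math import lcm


def repetitions_vector(nodes, arcs):
    """Return the smallest positive integer repetitions vector, or None.

    nodes: list of node names (hypothetical format)
    arcs:  list of (src, snk, produced, consumed) tuples (hypothetical format)
    """
    # Each arc relates the two firing counts: q[snk] = q[src] * produced / consumed.
    adj = defaultdict(list)
    for src, snk, produced, consumed in arcs:
        adj[src].append((snk, Fraction(produced, consumed)))
        adj[snk].append((src, Fraction(consumed, produced)))

    q = {}
    for start in nodes:
        if start in q:
            continue
        # Fix one node of this connected component at 1 and propagate the ratios.
        q[start] = Fraction(1)
        component = [start]
        stack = [start]
        while stack:
            u = stack.pop()
            for v, ratio in adj[u]:
                value = q[u] * ratio
                if v not in q:
                    q[v] = value
                    component.append(v)
                    stack.append(v)
                elif q[v] != value:
                    return None  # balance equations contradict each other
        # Scale the component so every entry becomes an integer.
        scale = lcm(*(q[n].denominator for n in component))
        for n in component:
            q[n] *= scale
    return {n: int(q[n]) for n in nodes}


print(repetitions_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# {'A': 3, 'B': 2, 'C': 1}
```

Exact fractions are used instead of floating point so that the equality test on revisited nodes cannot fail because of rounding.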
Page 5: To schedule SDF graphs onto multiple processors, a precedence graph is constructed from the original SDF graph. In general, the SDF graph exposes some of the functional parallelism; the precedence graph additionally exposes the available data parallelism. The precedence graph for the SDF graph of figure 1 is shown in figure 2. Formally, the precedence graph is constructed by first instantiating q(x) nodes for each SDF node x. Then, for each arc alpha in the SDF graph, an arc in the precedence graph is instantiated from src(alpha)_i to snk(alpha)_j for each ordered pair (i,j) that satisfies 1 <= i <= q(src(alpha)) and 1 <= j <= q(snk(alpha)), and for which at least one of two conditions holds (see the top of page 6).
Page 6: An SDF graph is deadlocked if and only if its precedence graph contains a cycle. Unfortunately, the expansion due to the repetition count of each SDF node can lead to an exponential growth of nodes in the DAG.
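The expansion and the deadlock test can be sketched together. The sketch below uses a standard token-counting rule to decide which invocation pairs are connected; this rule is an assumption and is not necessarily identical to the two conditions given at the top of page 6 of the report. The function names and the (src, snk, produced, consumed, delay) arc format are also hypothetical. The repetitions vector q is supplied by the caller.

```python
from collections import defaultdict


def precedence_graph(q, arcs):
    """Expand an SDF graph into invocation nodes and precedence arcs.

    q:    dict mapping node name -> repetition count
    arcs: list of (src, snk, produced, consumed, delay) tuples
    """
    nodes = [(x, i) for x in q for i in range(1, q[x] + 1)]
    edges = set()
    for src, snk, produced, consumed, delay in arcs:
        for j in range(1, q[snk] + 1):
            # Invocation j of the sink reads tokens (j-1)*consumed+1 .. j*consumed.
            for t in range((j - 1) * consumed + 1, j * consumed + 1):
                if t > delay:  # tokens 1..delay are initial tokens, no producer
                    i = -(-(t - delay) // produced)  # ceil((t - delay) / produced)
                    edges.add(((src, i), (snk, j)))
    return nodes, edges


def has_cycle(nodes, edges):
    """Kahn's algorithm: a cycle exists iff some node is never freed."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    freed = 0
    while ready:
        u = ready.pop()
        freed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return freed < len(nodes)


# A produces 2 tokens per firing, B consumes 3, so q = {A: 3, B: 2}.
nodes, edges = precedence_graph({"A": 3, "B": 2}, [("A", "B", 2, 3, 0)])
print(len(nodes), sorted(edges))
```

The number of precedence graph nodes is the sum of the entries of q, which is why large repetition counts inflate the DAG.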
Page 7: A five-node SDF representation of a compact-disc to digital audio tape sample rate conversion system expands to a DAG that contains over 600 nodes. Most uniprocessor SDF schedulers do not require a DAG to be generated for scheduling purposes.
Page 9: Tarjan developed an efficient algorithm to find strongly connected components in linear time with respect to the number of arcs and nodes in the SDF system. DAG multiprocessor schedulers that minimize the interprocessor communication (IPC) costs typically have two distinct scheduling phases: (1) clustering, and (2) processor assignment.
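For reference, a compact recursive version of Tarjan's strongly connected components algorithm is sketched below. The adjacency-dictionary input format and the function name are assumptions for illustration; a recursive formulation is adequate for small graphs but can hit Python's recursion limit on very deep ones.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm.

    graph: dict mapping node -> list of successor nodes (hypothetical format)
    Returns a list of sets, one per strongly connected component.
    """
    index = {}        # discovery order of each node
    low = {}          # smallest discovery index reachable from the node
    stack = []        # nodes of the components still being built
    on_stack = set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the stack down to v.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


example = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print(strongly_connected_components(example))
```

Each node and each arc is examined once, which gives the linear running time mentioned above.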
Page 10: The parallel time (PT) is defined as the length of the longest path in the graph. Figure 6 shows an initial labeled DAG and the result of one clustering step. It is important to note that each resultant cluster is mapped onto a single processor. This observation motivates the modification of parallel time minimization clustering heuristics for use on the SDF graph.
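The effect of one clustering step on the parallel time can be sketched as a longest-path computation. The sketch assumes that nodes are labeled with execution times, that arcs are labeled with communication costs, and that an arc's cost drops to zero when both endpoints are in the same cluster; these labeling conventions and all names are assumptions, since the notes do not spell out the labels used in figure 6. It also ignores the serialization of nodes that share a processor.

```python
from collections import defaultdict


def parallel_time(exec_time, edges, cluster):
    """Length of the longest path through a clustered DAG.

    exec_time: dict node -> execution time
    edges:     list of (u, v, comm_cost) tuples
    cluster:   dict node -> cluster id
    """
    successors = defaultdict(list)
    indegree = {n: 0 for n in exec_time}
    for u, v, cost in edges:
        if cluster[u] == cluster[v]:
            cost = 0  # assumed: communication inside a cluster is free
        successors[u].append((v, cost))
        indegree[v] += 1

    # finish[n] is the length of the longest path ending at n, including n.
    finish = dict(exec_time)
    ready = [n for n in exec_time if indegree[n] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for v, cost in successors[u]:
            finish[v] = max(finish[v], finish[u] + cost + exec_time[v])
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if visited != len(exec_time):
        raise ValueError("graph is not acyclic")
    return max(finish.values())


times = {"A": 1, "B": 2, "C": 3}
arcs = [("A", "B", 4), ("A", "C", 1), ("B", "C", 1)]
print(parallel_time(times, arcs, {"A": 0, "B": 1, "C": 2}))  # 11
print(parallel_time(times, arcs, {"A": 0, "B": 0, "C": 2}))  # 7
```

In this small example, placing A and B in one cluster removes the cost-4 arc from the critical path and shortens the parallel time from 11 to 7.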
Page 11: (Lemma 1) Suppose that we divide the invocations of snk(alpha) into groups of Q members, each denoted I_j, as shown by (4):
1. No more than one invocation of src(alpha) has a precedence graph output arc directed to a member of I_j.
2. If src(alpha)_y has an output (precedence graph) arc directed to some member of I_j, then src(alpha)_y has output arcs directed to all members of I_j.
Page 13: (Lemma 2) Same as Lemma 1 with the roles of the source and sink nodes interchanged.
Page 14: (Figure 8) Precedence graph illustrating Lemma 3. k_a = 1. Q can be either 1 or 3. Q = 3 corresponds to the clustering operation shown. Clearly, (14) is satisfied with j = 1.
- (a) cycle A_1 - B_1-3 - C_1 - A_1 exists, so there is deadlock. d_a != k Q k_a because d_a = 2, which implies k = 2/3, which is not an integer
- (b) consistent. d_a = k Q k_a. d_a = 3, which implies k = 1
- (c) consistent. d_a = k Q k_a. d_a = 0, which implies k = 0
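The arithmetic in the three cases above is a test of whether d_a is an integer multiple of Q * k_a. A minimal sketch of that test follows; the function name is hypothetical, and the meaning of d_a, Q, and k_a is as defined in the report rather than in this code.

```python
from fractions import Fraction


def integer_multiple(d_a, Q, k_a):
    """Return the integer k with d_a == k * Q * k_a, or None if none exists."""
    k = Fraction(d_a, Q * k_a)
    return int(k) if k.denominator == 1 else None


# Figure 8 uses k_a = 1 and Q = 3.
for d_a in (2, 3, 0):
    print(d_a, integer_multiple(d_a, Q=3, k_a=1))
```

With these values the test fails for d_a = 2 and succeeds for d_a = 3 and d_a = 0, matching cases (a), (b), and (c).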
Page 17: Unfortunately, SDF lacks the composition property. That is, if we cluster two arbitrary SDF nodes, we may introduce deadlock into the SDF graph. Thus, the SDF composition theorem provides a sufficient condition that a clustering operation does not introduce deadlock. Currently, there is no known exact condition (both necessary and sufficient) that can be evaluated in polynomial time with respect to the number of nodes in the SDF graph. This fact is explored on pages 18-21.
Page 21: The SDF composition theorem establishes four clustering criteria that together produce a sufficient condition that a given clustering operation involving two adjacent nodes does not produce deadlock.
1. Precedence shift A: for the proposed cluster (x,y), test all arcs from nodes outside of the cluster to x for which the nodes outside of the cluster and x are in the same strongly connected component
2. Precedence shift B: for the proposed cluster (x,y), test all arcs from x to nodes outside of the cluster for which x and the nodes outside of the cluster are in the same strongly connected component
3. Hidden delay: applies for the proposed cluster (x,y) only if x and y are in a strongly connected component
4. Cycle introduction
Note that the conditions given in the SDF composition theorem may be satisfied for the ordered pair (y,x), even though they are not satisfied for (x,y). Thus, in general, both orderings should be tried before ruling out a clustering operation.
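The intuition behind the fourth criterion can be illustrated on a plain directed graph: merging x and y creates a new cycle when some path between them passes through a node outside the cluster. The sketch below is a simplified graph-only check that ignores rates and delays; it is an assumption made for illustration and is not the report's statement of the condition. The function name and adjacency-dictionary format are hypothetical.

```python
def introduces_cycle(graph, x, y):
    """True if merging x and y would close a cycle through an outside node.

    graph: dict mapping node -> list of successor nodes (hypothetical format)
    """

    def reaches_via_outside(a, b):
        # Start from a's successors that lie outside the proposed cluster.
        stack = [w for w in graph.get(a, ()) if w not in (a, b)]
        seen = set(stack)
        while stack:
            u = stack.pop()
            for w in graph.get(u, ()):
                if w == b:
                    return True  # the path a -> ... -> b leaves the cluster
                if w != a and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False

    return reaches_via_outside(x, y) or reaches_via_outside(y, x)


# x -> z -> y: merging x and y yields the cycle (x,y) -> z -> (x,y).
print(introduces_cycle({"x": ["y", "z"], "z": ["y"]}, "x", "y"))  # True
# x -> y -> z: merging x and y leaves the graph acyclic.
print(introduces_cycle({"x": ["y"], "y": ["z"]}, "x", "y"))  # False
```

This particular check is symmetric in x and y. The ordering sensitivity noted above concerns the theorem's conditions as a whole, which this sketch does not model.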
Page 25: Four examples of clustering techniques.
1. User specifies it
2. Take resource constraints into account
3. Group nodes into a well-ordered URC SDF subgraph where the nodes do not have internal state
4. Multiprocessor DAG scheduling

Updated 04/18/00.