EE382C Embedded Software Systems - Blocked Schedules
SDF Multiprocessor Scheduling
The unfolding factor of a periodic schedule S is the
greatest common divisor (GCD) of the actor invocation counts
in S, which equals the number of minimal periodic schedules
contained in S.
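For instance, a schedule that invokes A, B, and C 3, 6, and 12 times
has unfolding factor gcd(3, 6, 12) = 3.  A minimal Python sketch (the
invocation counts below are hypothetical, chosen only to illustrate):

  from functools import reduce
  from math import gcd

  # Invocation counts of each actor in one period of the schedule
  # (hypothetical values for illustration).
  invocations = {"A": 3, "B": 6, "C": 12}

  # The unfolding factor is the GCD of the invocation counts.
  unfolding_factor = reduce(gcd, invocations.values())
  print(unfolding_factor)   # 3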
Recall a previous SDF graph example:
A ------> B ------> C
  20  10    20  10
The repetition vector for [A B C] is [1 2 4].
Any schedule that invokes A, B, and C U, 2U, and 4U times,
respectively, for some positive integer U is also a periodic schedule.
U is the unfolding factor.
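As a sketch of where [1 2 4] comes from, the balance equations for the
chain above can be solved directly (the arc-list representation and the
propagation below are illustrative assumptions, and only handle a chain):

  from fractions import Fraction
  from math import lcm   # Python 3.9+

  # arc: (producer, prod_rate, consumer, cons_rate), rates from the example:
  # A produces 20, B consumes 10; B produces 20, C consumes 10.
  arcs = [("A", 20, "B", 10), ("B", 20, "C", 10)]

  q = {"A": Fraction(1)}           # fix A's rate, then propagate downstream
  for src, p, dst, c in arcs:
      q[dst] = q[src] * p / c      # balance: q[src]*p == q[dst]*c

  # Scale to the smallest integer solution.
  scale = lcm(*(f.denominator for f in q.values()))
  q = {a: int(f * scale) for a, f in q.items()}
  print(q)                         # {'A': 1, 'B': 2, 'C': 4}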
The code size is proportional to the unfolding factor.
The blocking factor is the unfolding factor of a blocked schedule.
A blocked schedule is the infinite repetition of a periodic
schedule in which each iteration of the schedule must complete before
the next iteration begins.
This is a convenient but inefficient representation for periodic schedules
on a multiprocessor system.
Consider a parallel implementation of the example SDF graph above,
assuming that A, B, and C each take one cycle to execute,
and that it takes one cycle to communicate results between processors.
             Blocked Schedule                  Overlapping Schedule

             Cycle                             Cycle
             1  2  3  4  5  6  7  8  9  10     1  2  3  4  5  6  7  8  9
             -  -  -  -  -  -  -  -  -  --     -  -  -  -  -  -  -  -  -
Processor 1  A  B  C  C     A  B  C  C         A  B  C  C  A  B  C  C
Processor 2        B  C  C        B  C  C            B  C  C     B  C  C

(Blank entries are idle cycles; Processor 2 starts late because one cycle
is spent communicating A's results.  The blocked schedule completes one
graph iteration every 5 cycles, while the overlapping schedule completes
one every 4 cycles in steady state.)
The overlapping schedule will always be at least as good as the blocked
schedule.
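Reading the cycle in which A starts off the two tables above, a quick
sketch of the resulting steady-state iteration periods (the start cycles
below are taken from the schedules as drawn):

  # Cycle in which actor A starts in iterations 1 and 2 of each schedule,
  # read off the tables above.
  starts = {"blocked": (1, 6), "overlapping": (1, 5)}

  for name, (first, second) in starts.items():
      period = second - first   # steady-state cycles per graph iteration
      print(f"{name}: {period} cycles/iteration, "
            f"throughput {1/period:.2f} iterations/cycle")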
Given an SDF graph, it is decidable whether or not a
rate-optimal schedule is attainable with a finite unfolding factor.
Rate-Optimal Schedules
The upper bound on the throughput of a homogeneous SDF graph is
known as the iteration period bound (IPB):
                      # of delays in the cycle
  IPB  =    min     ------------------------------------
          directed   sum of the computation times of the
           cycles    actors in the cycle
A schedule is rate optimal if it attains the IPB.
Consider the following example:
  --> A ----> B ---
  |  1 1    1 1   |
  |               |
  -D------ C <-----
        1     1
where D denotes a unit delay, i.e., one sample of delay on the arc.
Here, there is only one directed cycle.
Assuming that A, B, and C each take 2 time units to execute,
the IPB would be 1/6 samples/time unit.
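A sketch of this computation in Python; the graph representation (a dict
mapping arcs to delay counts) and the cycle enumeration below are
illustrative assumptions, not code from the course:

  def iteration_period_bound(arcs, exec_time):
      """IPB = min over directed cycles of (# delays) / (sum of exec times).

      arcs: dict mapping (src, dst) -> number of delays on that arc.
      exec_time: dict mapping actor -> computation time.
      """
      succ = {}
      for (u, v) in arcs:
          succ.setdefault(u, []).append(v)

      cycles = []

      def dfs(start, node, path, seen):
          # Enumerate each directed cycle once, rooted at its smallest actor.
          for nxt in succ.get(node, []):
              if nxt == start:
                  cycles.append(path)
              elif nxt not in seen and nxt > start:
                  seen.add(nxt)
                  dfs(start, nxt, path + [nxt], seen)
                  seen.remove(nxt)

      for s in sorted(succ):
          dfs(s, s, [s], {s})

      return min(
          sum(arcs[(c[i], c[(i + 1) % len(c)])] for i in range(len(c)))
          / sum(exec_time[a] for a in c)
          for c in cycles
      )

  # The homogeneous example above: one delay, on the C -> A arc.
  exec_time = {"A": 2, "B": 2, "C": 2}
  arcs = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 1}
  print(iteration_period_bound(arcs, exec_time))   # 1/6 ~ 0.1667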
Now, we can raise the iteration period bound, and therefore the
attainable throughput of the system, by improving the throughput of
the slowest directed cycle in the graph, i.e., the cycle that
determines the IPB.
We can improve the slowest cycle by retiming it: adding the same
integer number of delays to every arc of the cycle.
This will of course propagate delays to the arcs neighboring the cycle.
If we retime the previous graph by one delay,
  --> A --D-> B ---
  |  1 1    1 1   |
  |               |
  -DD----- C <--D--
        1     1
then the new IPB increases to 4/6 = 2/3 samples/time unit, since the
cycle now contains four delays while the total computation time remains
six time units.
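Applying the same sketch function from above to the retimed graph
confirms the new bound:

  # Retimed graph: one extra delay on every arc of the cycle, so the
  # C -> A arc now carries two delays and the others carry one each.
  retimed = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 2}
  print(iteration_period_bound(retimed, exec_time))   # 4/6 ~ 0.6667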
So, retiming increases throughput at the cost of increasing memory.
Retiming can be iterative: retime the slowest cycle to create a new graph,
retime the slowest cycle in the new graph, and so forth.
Retiming was proposed by C.E. Leiserson, F.M. Rose, and J.B. Saxe
in 1983 in their paper entitled "Optimizing Synchronous Circuitry
By Retiming" at the Third Caltech Conference on VLSI.
A similar definition of the IPB holds for general SDF, except that we
must also take the repetitions vector of the actors in the directed
cycle into account.
  --> A ----> B ---
  | 40 20  10 20  |
  |               |
  -40D---- C <-----
       10    10
where 40D denotes 40 unit delays, i.e., 40 initial samples on the arc.
The repetitions vector for [A B C] would be [1 2 4] as before.
Assuming that A, B, and C each take 2 time units to execute, the IPB would
be 40 / (1*2 + 2*2 + 4*2) = 40/14 ~ 2.857 samples/time unit.
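The same arithmetic in a short sketch, with each actor's execution time
weighted by its repetition count (all values taken from the example above):

  from fractions import Fraction

  delays = 40                      # initial samples on the C -> A arc
  q = {"A": 1, "B": 2, "C": 4}     # repetitions vector
  t = {"A": 2, "B": 2, "C": 2}     # execution times
  ipb = Fraction(delays, sum(q[a] * t[a] for a in q))
  print(ipb, float(ipb))           # 20/7 ~ 2.857 samples/time unit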
Updated 04/25/00.