EE382C Embedded Software Systems - Blocked Schedules
SDF Multiprocessor Scheduling
The unfolding factor of a periodic schedule S is the
greatest common divisor (GCD) of the actor invocation counts
in S, which equals the number of minimal periodic schedules
contained in S.
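For instance, a schedule that invokes A, B, and C 3, 6, and 12 times
has unfolding factor gcd(3, 6, 12) = 3.  A minimal Python sketch (the
invocation counts below are hypothetical, chosen only to illustrate):

  from functools import reduce
  from math import gcd

  # Invocation counts of each actor in one period of the schedule
  # (hypothetical values for illustration).
  invocations = {"A": 3, "B": 6, "C": 12}

  # The unfolding factor is the GCD of the invocation counts.
  unfolding_factor = reduce(gcd, invocations.values())
  print(unfolding_factor)   # 3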
Recall a previous SDF graph example:
A ------> B ------> C
  20  10    20  10
The repetition vector for [A B C] is [1 2 4].
Any schedule that invokes A, B, and C U, 2U, and 4U times,
respectively, for some positive integer U is also a periodic schedule.
U is the unfolding factor.
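As a sketch of where [1 2 4] comes from, the balance equations for the
chain above can be solved directly (the arc-list representation and the
propagation below are illustrative assumptions, and only handle a chain):

  from fractions import Fraction
  from math import lcm   # Python 3.9+

  # arc: (producer, prod_rate, consumer, cons_rate), rates from the example:
  # A produces 20, B consumes 10; B produces 20, C consumes 10.
  arcs = [("A", 20, "B", 10), ("B", 20, "C", 10)]

  q = {"A": Fraction(1)}           # fix A's rate, then propagate downstream
  for src, p, dst, c in arcs:
      q[dst] = q[src] * p / c      # balance: q[src]*p == q[dst]*c

  # Scale to the smallest integer solution.
  scale = lcm(*(f.denominator for f in q.values()))
  q = {a: int(f * scale) for a, f in q.items()}
  print(q)                         # {'A': 1, 'B': 2, 'C': 4}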
The code size is proportional to the unfolding factor.
The blocking factor is the unfolding factor of a blocked schedule.
A blocked schedule is the infinite repetition of a periodic
schedule in which each iteration of the schedule must complete before
the next iteration begins.
This is a convenient but inefficient representation for periodic schedules
on a multiprocessor system.
Consider a parallel implementation of the example SDF graph above,
assuming that A, B, and C each take one cycle to execute,
and that it takes one cycle to communicate results between processors.
             Blocked Schedule                  Overlapping Schedule

             Cycle                             Cycle
             1  2  3  4  5  6  7  8  9  10     1  2  3  4  5  6  7  8  9
             -  -  -  -  -  -  -  -  -  --     -  -  -  -  -  -  -  -  -
Processor 1  A  B  C  C     A  B  C  C         A  B  C  C  A  B  C  C
Processor 2        B  C  C        B  C  C            B  C  C     B  C  C

(Blank entries are idle cycles; Processor 2 starts late because one cycle
is spent communicating A's results.  The blocked schedule completes one
graph iteration every 5 cycles, while the overlapping schedule completes
one every 4 cycles in steady state.)
The overlapping schedule will always be at least as good as the blocked
schedule.
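Reading the cycle in which A starts off the two tables above, a quick
sketch of the resulting steady-state iteration periods (the start cycles
below are taken from the schedules as drawn):

  # Cycle in which actor A starts in iterations 1 and 2 of each schedule,
  # read off the tables above.
  starts = {"blocked": (1, 6), "overlapping": (1, 5)}

  for name, (first, second) in starts.items():
      period = second - first   # steady-state cycles per graph iteration
      print(f"{name}: {period} cycles/iteration, "
            f"throughput {1/period:.2f} iterations/cycle")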
Given an SDF graph, it is decidable whether or not a
rate-optimal schedule is attainable with a finite unfolding factor.
Rate-Optimal Schedules
The upper bound on the throughput of a homogeneous SDF graph is
known as the iteration period bound (IPB):
                      # of delays in the cycle
  IPB  =    min     ------------------------------------
          directed   sum of the computation times of the
           cycles    actors in the cycle
A schedule is rate optimal if it attains the IPB.
Consider the following example:
  --> A ----> B ---
  |  1 1    1 1   |
  |               |
  -D------ C <-----
        1     1
where D denotes a unit delay, i.e., one sample of delay on the arc.
Here, there is only one directed cycle.
Assuming that A, B, and C each take 2 time units to execute,
the IPB would be 1/6 samples/time unit.
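A sketch of this computation in Python; the graph representation (a dict
mapping arcs to delay counts) and the cycle enumeration below are
illustrative assumptions, not code from the course:

  def iteration_period_bound(arcs, exec_time):
      """IPB = min over directed cycles of (# delays) / (sum of exec times).

      arcs: dict mapping (src, dst) -> number of delays on that arc.
      exec_time: dict mapping actor -> computation time.
      """
      succ = {}
      for (u, v) in arcs:
          succ.setdefault(u, []).append(v)

      cycles = []

      def dfs(start, node, path, seen):
          # Enumerate each directed cycle once, rooted at its smallest actor.
          for nxt in succ.get(node, []):
              if nxt == start:
                  cycles.append(path)
              elif nxt not in seen and nxt > start:
                  seen.add(nxt)
                  dfs(start, nxt, path + [nxt], seen)
                  seen.remove(nxt)

      for s in sorted(succ):
          dfs(s, s, [s], {s})

      return min(
          sum(arcs[(c[i], c[(i + 1) % len(c)])] for i in range(len(c)))
          / sum(exec_time[a] for a in c)
          for c in cycles
      )

  # The homogeneous example above: one delay, on the C -> A arc.
  exec_time = {"A": 2, "B": 2, "C": 2}
  arcs = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 1}
  print(iteration_period_bound(arcs, exec_time))   # 1/6 ~ 0.1667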
Now, we can raise the iteration period bound, and therefore the
attainable throughput of the system, by improving the throughput of
the slowest directed cycle in the graph, i.e., the cycle that
determines the IPB.
We can improve the slowest cycle by retiming it: adding the same
integer number of delays to every arc of the cycle.
This will of course propagate delays to the arcs neighboring the cycle.
If we retime the previous graph by one delay,
  --> A --D-> B ---
  |  1 1    1 1   |
  |               |
  -DD----- C <--D--
        1     1
then the new IPB increases to 4/6 = 2/3 samples/time unit, since the
cycle now contains four delays while the total computation time remains
six time units.
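Applying the same sketch function from above to the retimed graph
confirms the new bound:

  # Retimed graph: one extra delay on every arc of the cycle, so the
  # C -> A arc now carries two delays and the others carry one each.
  retimed = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 2}
  print(iteration_period_bound(retimed, exec_time))   # 4/6 ~ 0.6667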
So, retiming increases throughput at the cost of increasing memory.
Retiming can be iterative: retime the slowest cycle to create a new graph,
retime the slowest cycle in the new graph, and so forth.
Retiming was proposed by C.E. Leiserson, F.M. Rose, and J.B. Saxe
in 1983 in their paper entitled "Optimizing Synchronous Circuitry
By Retiming" at the Third Caltech Conference on VLSI.
A similar definition of the IPB holds for general SDF, except that we
must also take the repetitions vector of the actors in the directed
cycle into account.
  --> A ----> B ---
  | 40 20  10 20  |
  |               |
  -40D---- C <-----
       10    10
where 40D denotes 40 unit delays, i.e., 40 initial samples on the arc.
The repetitions vector for [A B C] would be [1 2 4] as before.
Assuming that A, B, and C each take 2 time units to execute, the IPB would
be 40 / (1*2 + 2*2 + 4*2) = 40/14 ~ 2.857 samples/time unit.
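The same arithmetic in a short sketch, with each actor's execution time
weighted by its repetition count (all values taken from the example above):

  from fractions import Fraction

  delays = 40                      # initial samples on the C -> A arc
  q = {"A": 1, "B": 2, "C": 4}     # repetitions vector
  t = {"A": 2, "B": 2, "C": 2}     # execution times
  ipb = Fraction(delays, sum(q[a] * t[a] for a in q))
  print(ipb, float(ipb))           # 20/7 ~ 2.857 samples/time unit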
Updated 04/25/00.