Introduction to Synchronous Dataflow

Synchronous Dataflow (SDF) is a model first proposed by Edward A. Lee in 1986. In SDF, all computation and data communication are scheduled statically. That is, algorithms expressed as SDF graphs can always be converted into an implementation that is guaranteed to complete all tasks in finite time and to use finite memory. Thus, an SDF graph can be executed over and over again in a periodic fashion without requiring additional resources as it runs. This type of operation is well-suited to digital signal processing and communications systems, which often process an endless supply of data.

An SDF graph consists of nodes and arcs. Nodes represent operations, which are called actors. Arcs represent data values, called tokens, which are stored in first-in first-out (FIFO) queues. The word token is used because each data value can be of any data type (e.g. integer or real) or any data structure (e.g. matrix or image).

SDF graphs obey the following rules:

  1. An actor may execute (fire) only when enough tokens are available on all of its input arcs.
  2. When an actor fires, it consumes tokens from its input arcs.
  3. The number of tokens consumed and produced by an actor on each arc per firing is constant and known at compile time.

Because of the second rule, the data that an actor consumes is removed from the buffers on the input arcs and not restored: once an actor finishes executing, its input data is gone from the input queues (circular buffers). The consequence of the third rule is that an SDF graph may not contain data-dependent control flow, such as an if-then-else construct, or data-dependent iteration, such as a for loop whose bound depends on a data value. However, the actors themselves may contain these constructs, because the scheduling of an SDF graph is independent of what tasks the actors perform.
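To make these rules concrete, the following sketch in C (the names are illustrative, not from [1]) models each arc as a token counter and each actor as a pair of fixed token rates; it assumes single-input, single-output actors, which is all that the chain example below requires.

#include <stdbool.h>

/* A minimal sketch of the SDF firing rules; all names are hypothetical. */
typedef struct {
    int tokens;       /* tokens currently queued on this arc (FIFO) */
} Arc;

typedef struct {
    Arc *input;       /* single input arc (NULL for a source actor) */
    Arc *output;      /* single output arc (NULL for a sink actor)  */
    int consumes;     /* tokens consumed per firing: fixed, static  */
    int produces;     /* tokens produced per firing: fixed, static  */
} Actor;

/* Rule 1: an actor may fire only when enough input tokens are available. */
bool can_fire(const Actor *a) {
    return a->input == NULL || a->input->tokens >= a->consumes;
}

/* Rules 2 and 3: firing removes a fixed number of tokens from the input
   arc (they are not restored) and appends a fixed number to the output. */
void fire(Actor *a) {
    if (a->input)  a->input->tokens  -= a->consumes;
    if (a->output) a->output->tokens += a->produces;
}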

Example

This example is taken from Figure 1.5 of [1]. Consider the feedforward (acyclic) synchronous dataflow graph shown below:
A    ------>   B   ------>    C
  20        10   20        10
The notation means that when A executes, it produces 20 tokens. When B executes, it consumes 10 tokens and produces 20 tokens. When C executes, it consumes 10 tokens.

The first step in scheduling an SDF graph for execution is that we must figure out how many times to execute each actor so that all of the intermediate tokens that are produced get consumed. This process is known as load balancing. Load balancing is implemented by an algorithm that is linear in time and memory in the size of the SDF graph. The size of an SDF graph, as we will discover in more detail later, is

#nodes + #arcs * (1 + log2(delayPerArc) + log2(inputTokensPerArc) + log2(outputTokensPerArc))

where log2 gives the number of bits used to represent the integer argument.
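As a sketch of how this measure could be computed, the hypothetical function below sums the per-arc terms over arrays of delays and token rates, reading "log2 n" as the bit width of n (with zero taken to need no bits):

/* Illustrative only: one reading of the size measure above. */
static int bits(unsigned n) {          /* bit width of n; 0 for n == 0 */
    int b = 0;
    while (n) { b++; n >>= 1; }
    return b;
}

int sdf_graph_size(int num_nodes, int num_arcs,
                   const unsigned delay[],
                   const unsigned in_tokens[],
                   const unsigned out_tokens[]) {
    int size = num_nodes;
    for (int i = 0; i < num_arcs; i++)
        size += 1 + bits(delay[i]) + bits(in_tokens[i]) + bits(out_tokens[i]);
    return size;
}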

In the example SDF graph above, we must

  1. fire A once, producing 20 tokens,
  2. fire B twice, consuming those 20 tokens and producing 40 tokens, and
  3. fire C four times, consuming those 40 tokens,

to balance the number of tokens produced and consumed. However, load balancing does not tell us the order in which to schedule the firings. If there were no constraints on the order, then the number of possible schedules would be combinatorial in the total number of firings (seven in this case). If at least one valid schedule exists, then the SDF graph is called consistent.
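For a chain-shaped graph such as this one, these balance requirements can be solved arc by arc from left to right. The sketch below is hypothetical and simplified (it assumes each arc's consumption rate evenly divides the tokens produced, which holds here), but it reproduces the firing counts for the example:

#include <stdio.h>

/* Balance requirement for each arc: reps[src] * produced == reps[dst] * consumed.
   For a chain the equations can be solved left to right; a general solver
   must also clear fractions with an lcm, which this sketch omits. */
int main(void) {
    int produced[] = {20, 20};     /* arcs A->B and B->C */
    int consumed[] = {10, 10};
    int reps[3] = {1, 0, 0};       /* tentatively fire the source once */

    for (int i = 0; i < 2; i++)
        reps[i + 1] = reps[i] * produced[i] / consumed[i];

    printf("fire A %d time, B %d times, C %d times\n",
           reps[0], reps[1], reps[2]);   /* prints 1, 2, and 4 */
    return 0;
}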

The next step is to schedule the firings required by load balancing so as to resolve the data dependencies. Due to rate changes in the graph, the worst case for scheduling, for both list scheduling and looped scheduling, is a polynomial function of an exponential function of the size of the SDF graph.

Two variants on looped schedulers, the complementary algorithms called pairwise grouping of adjacent nodes (APGAN) [2] and recursive partitioning based on minimum cuts (RPMC) [2], avoid the exponential penalty. Instead, they are cubic in the size of the SDF graph. These scheduling algorithms are discussed in [1] and will be covered later in the class.
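As a rough illustration of list scheduling (this is not APGAN or RPMC, just a naive demand-driven scheduler written for this handout), the sketch below repeatedly fires the first actor in A, B, C order that is fireable and still owes firings. For this example it prints ABBCCCC, one of the many valid schedules:

#include <stdio.h>

int main(void) {
    int reps[3]     = {1, 2, 4};    /* firing counts from load balancing */
    int fired[3]    = {0, 0, 0};
    int tokens[2]   = {0, 0};       /* arcs A->B and B->C                */
    int produces[3] = {20, 20, 0};  /* per-firing output rates           */
    int consumes[3] = {0, 10, 10};  /* per-firing input rates            */
    const char *name = "ABC";
    int remaining = reps[0] + reps[1] + reps[2];

    while (remaining > 0) {
        for (int a = 0; a < 3; a++) {
            int fireable = (a == 0) || tokens[a - 1] >= consumes[a];
            if (fired[a] < reps[a] && fireable) {
                if (a > 0) tokens[a - 1] -= consumes[a];
                if (a < 2) tokens[a]     += produces[a];
                fired[a]++;
                remaining--;
                putchar(name[a]);
                break;              /* restart the scan from actor A */
            }
        }
    }
    putchar('\n');
    return 0;
}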

Possible schedules for the above SDF graph are ABCCBCC for the list scheduler and A (2 B(2 C)) for the looped scheduler. The generated code to execute the schedule A (2 B(2 C)) would be the following:

int i, j;

/* code block for A */
for (i = 0; i < 2; i++) {
  /* code block for B */
  for (j = 0; j < 2; j++) {
    /* code block for C */
  }
}
The schedule A (2 B(2 C)) is an example of a single-appearance schedule, since the invocation of each actor appears only once. When generating code that is "stitched" together, a single-appearance schedule requires the minimum amount of program memory, because the code for each actor appears only once.

The scheduling algorithms could actually return several different valid schedules, such as those shown below.

     Scheduler          Schedule        Buffer memory (tokens)
  1. List scheduler     ABCBCCC         50
  2. Looped scheduler   A (2 B(2 C))    40
  3. Looped scheduler   A (2 B)(4 C)    60
  4. Looped scheduler   A (2 BC)(2 C)   50
The smallest possible amount of buffer memory is 40 tokens, which is met by schedule #2; that schedule is optimal in terms of data memory usage. The list scheduler could also have produced a data-optimal schedule, ABCCBCC, which is just the expanded version of schedule #2. Because schedule #2 is a single-appearance schedule, we know that it is optimal in terms of program memory usage as well.
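These buffer figures are easy to check by replaying each schedule in fully expanded (flat) form and recording the peak token count on each arc. The sketch below is hypothetical code hard-wired to the example's rates; note that schedule #4 expands to the same flat sequence as schedule #1:

#include <stdio.h>

/* Replay a flat schedule for the example chain and return the sum of
   the peak token counts on the two arcs. */
int buffer_memory(const char *schedule) {
    int ab = 0, bc = 0;            /* tokens on arcs A->B and B->C */
    int max_ab = 0, max_bc = 0;
    for (const char *p = schedule; *p; p++) {
        switch (*p) {
            case 'A': ab += 20;           break;
            case 'B': ab -= 10; bc += 20; break;
            case 'C': bc -= 10;           break;
        }
        if (ab > max_ab) max_ab = ab;
        if (bc > max_bc) max_bc = bc;
    }
    return max_ab + max_bc;
}

int main(void) {
    printf("%d\n", buffer_memory("ABCBCCC"));  /* #1 (and #4 expanded): 50 */
    printf("%d\n", buffer_memory("ABCCBCC"));  /* #2 expanded: 40 */
    printf("%d\n", buffer_memory("ABBCCCC"));  /* #3 expanded: 60 */
    return 0;
}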

References

  1. Shuvra S. Bhattacharyya, Praveen K. Murthy, and Edward A. Lee, Software Synthesis from Dataflow Graphs, Kluwer Academic Publishers, Norwell, MA, ISBN 0-7923-9722-3, 1996.
  2. S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee, "APGAN and RPMC: Complementary Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations", Design Automation for Embedded Systems Journal, to appear.


Updated 02/08/00.