The order of computation in signal processing algorithms can be specified loosely in terms of data dependencies between kernels. Many orders of execution of the kernels are possible. Therefore, it is beneficial not to use a model of computation that forces the designer to pick one fixed order in which to perform the operations, as an imperative programming language such as C does. Rather, one should use a model of computation that captures the flexibility in a signal processing algorithm by specifying only the dataflow in the algorithm.

These dataflow models of computation come in many varieties. One family of dataflow models can always be scheduled statically, e.g., Synchronous Dataflow, Cyclo-static Dataflow, and Static Dataflow. Using the static schedule, code generators can synthesize software, hardware, or a combination of the two from the same specification.

A common practice in industry is to develop kernels and applications in a high-level language and cross-compile the application to an embedded processor. Compilers excel at optimizing local computation and data dependencies, and they perform fairly well on the small blocks of code that implement kernels. Compared to manual coding of kernels in assembly language, the overhead incurred by the best compilers is 0-20% in data size and 50-60% in program size. Compilers are not well-suited, however, to optimizing the global structure of programs. When generating code for embedded programmable processors, compilers also face the following problems:

- stack size requirements that are too large for the available memory
- no division operation in hardware
- difficulty in expressing fixed-point operations in the high-level language
- special data input and output architectures
- custom DSP operations

The key to the generation of efficient software is to model the global structure of an application using a static dataflow model in which kernels are connected together. Scheduling algorithms then determine an efficient ordering of the kernels, and a compiler generates the code for each kernel. This approach leverages the best of both types of tools.

Synchronous Dataflow (SDF) is a model first proposed by Edward A. Lee in 1986. In SDF, all computation and data communication is scheduled statically. That is, algorithms expressed as SDF graphs can always be converted into an implementation that is guaranteed to complete all tasks in finite time and to use finite memory. Thus, an SDF graph can be executed over and over again in a periodic fashion without requiring additional resources as it runs. This type of operation is well-suited to digital signal processing and communications systems, which often process an endless supply of data.

An SDF graph consists of nodes and arcs.
Nodes represent operations and are called *actors*.
Arcs represent data values, called *tokens*, which are stored in
first-in first-out (FIFO) queues.
The word token is used because each data value can represent any
data type (e.g., integer or real) or any data structure (e.g., matrix
or image).

SDF graphs obey the following rules:

- An actor is enabled for execution when enough tokens are available on all of its inputs.
- When an actor executes, it always produces and consumes the same fixed number of tokens.
- The flow of data through the graph may not depend on the values of the data.

```
A --20------10--> B --20------10--> C
```

The notation means that when A executes, it produces 20 tokens. When B executes, it consumes 10 tokens and produces 20 tokens. When C executes, it consumes 10 tokens.

The first step in scheduling an SDF graph for execution is to determine
how many times to execute each actor so that all of the intermediate
tokens that are produced are also consumed.
This process is known as *load balancing*.
Load balancing can be implemented by an algorithm that is linear in time
and memory in the size of the SDF graph (the number of vertices plus the
number of edges plus three times the base-two logarithm of the number
of edges).
For the graph above, we must

- Fire A 1 time
- Fire B 2 times
- Fire C 4 times


The next step is to schedule the firings required by load balancing. Several scheduling algorithms have been developed, including

- list scheduling, a quadratic-time algorithm
- looped scheduling, a cubic-time algorithm

Possible schedules for the SDF graph above are ABCCBCC from the list scheduler and A(2 B(2 C)) from the looped scheduler. The generated code to execute the schedule A(2 B(2 C)) would be the following:

```
code block for A
for (i = 0; i < 2; i++) {
    code block for B
    for (j = 0; j < 2; j++) {
        code block for C
    }
}
```

The schedule A(2 B(2 C)) is an example of a *single appearance schedule*, in which each actor appears exactly once, so each actor's code block is instantiated only once.

The scheduling algorithms can return several different valid schedules, such as those shown below, which differ in the total buffer memory they require.

| Scheduler | Schedule | Buffer memory (tokens) |
|-----------|----------|------------------------|
| List      | ABCBCCC  | 50 |
| Looped    | A(2 B(2 C)) | 40 |
| Looped    | A(2 B)(4 C) | 60 |
| Looped    | A(2 BC)(2 C) | 50 |

- Shuvra S. Bhattacharyya, Praveen K. Murthy, and Edward A. Lee, *Software Synthesis from Dataflow Graphs*, Kluwer Academic Press, Norwell, MA, ISBN 0-7923-9722-3, 1996.
- S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee, "APGAN and RPMC: Complementary Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations", *Design Automation for Embedded Systems Journal*, to appear.

Updated 07/31/99.