A null schedule is an empty schedule, i.e. m = 0. A subschedule is any contiguous sequence of actors in a schedule, e.g. (3AB)C is a subschedule of (2B(3AB)C)A, which is in turn a subschedule of A(2B(3AB)C)A(2B). For single appearance schedules, we define two new operators:
As we have already seen in previous lectures, the choice of schedule has a dramatic impact on the amount of buffer memory required on the arcs of an SDF graph. We will assume that each token on each arc takes one unit of memory. Several models for buffering exist. One model uses shared buffers; chain-structured graphs can share a single buffer. In the chain-structured graph in Figure 1, we can use the schedule (9A)(12B)(12C)(8D). This style of schedule, in which all of the firings of an actor complete before the next actor is considered, simplifies management of the shared global buffer because there is only one writer and one reader at a time. With shared buffers, this schedule requires max(9*4, 12*1, 12*2) = 36 tokens. If the buffers were instead independent, the total buffer size would be 9*4 + 12*1 + 12*2 = 72 tokens. When the SDF graph is not chain-structured, it becomes more difficult to allocate and manage shared buffers.
A --(4,3)--> B --(1,1)--> C --(2,3)--> D

Figure 1: A Chain-Structured SDF Graph. Each arc is labeled (produce rate, consume rate).
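The numbers above can be reproduced mechanically. The following Python sketch (illustrative, not part of the lecture) solves the balance equations for the chain in Figure 1 to obtain the repetitions vector, then derives the shared and independent buffer costs for the schedule (9A)(12B)(12C)(8D):

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Chain-structured SDF graph from Figure 1: A -> B -> C -> D.
# edges[i] = (tokens produced per firing of actor i,
#             tokens consumed per firing of actor i+1).
edges = [(4, 3), (1, 1), (2, 3)]

# Solve the balance equations q[i] * produce = q[i+1] * consume along
# the chain, then scale to the smallest positive integer solution.
q = [Fraction(1)]
for p, c in edges:
    q.append(q[-1] * p / c)
lcm = reduce(lambda a, b: a * b // gcd(a, b), (x.denominator for x in q))
q = [int(x * lcm) for x in q]
g = reduce(gcd, q)
q = [x // g for x in q]
print(q)  # [9, 12, 12, 8]

# Under (9A)(12B)(12C)(8D), edge i fills to q[i] * produce tokens
# before its consumer starts firing.
peaks = [qi * p for qi, (p, c) in zip(q, edges)]
print(max(peaks), sum(peaks))  # shared: 36, independent: 72
```

The same routine works for any chain-structured graph; only the rate list changes.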
In chain-structured graphs, the total buffer size for a shared memory model will always be less than or equal to the total buffer size of the independent memory model. For example, for the schedule A(50B(2C))(4D) of the graph in Figure 2, we would require 200 tokens using shared memory and 250 tokens using independent memory. In the shared memory model, the program memory required for buffer management increases in exchange for reducing the amount of data memory by reusing memory locations.
Buffer memory management may also be performed statically. If we use the schedule (A)(50B)(100C)(4D) to simplify shared memory management, then the shared memory model would require 5000 tokens, a 25-fold increase over the 200 tokens needed by the nested schedule. Note that the independent memory model would require 5150 tokens.
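The independent-buffer figures for both schedules can be checked by replaying the firings and recording each edge's peak occupancy. A minimal Python simulation (our own sketch; the edge names e1-e3 follow the A-B, B-C, C-D arcs of Figure 2):

```python
# Per-firing token flow for the graph of Figure 2:
# A -(50,1)-> B -(100,50)-> C -(1,25)-> D.
prod = {"A": ("e1", 50), "B": ("e2", 100), "C": ("e3", 1)}
cons = {"B": ("e1", 1), "C": ("e2", 50), "D": ("e3", 25)}

def peaks(firings):
    """Replay a firing sequence, returning each edge's peak token count."""
    tokens = {"e1": 0, "e2": 0, "e3": 0}
    peak = dict(tokens)
    for actor in firings:
        if actor in cons:
            e, c = cons[actor]
            assert tokens[e] >= c, "schedule is not admissible"
            tokens[e] -= c
        if actor in prod:
            e, p = prod[actor]
            tokens[e] += p
            peak[e] = max(peak[e], tokens[e])
    return peak

nested = ["A"] + ["B", "C", "C"] * 50 + ["D"] * 4   # A(50B(2C))(4D)
flat = ["A"] + ["B"] * 50 + ["C"] * 100 + ["D"] * 4  # (A)(50B)(100C)(4D)
print(peaks(nested))  # peaks sum to 250 (independent model)
print(peaks(flat))    # peaks sum to 5150 (independent model)
```

The independent totals, 250 and 5150 tokens, match the figures quoted in the text; for the flat schedule the largest single buffer (e2, at 5000 tokens) dominates.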
Memory management for the shared memory model becomes significantly more complex when delays are on edges or when the graph has feedback or multiple acyclic paths, but the complexity does not change for the independent memory model. The SDF scheduling algorithms discussed hereafter will use an independent buffer memory model. In the independent buffer memory model, each buffer is allocated in contiguous memory and implemented as a circular buffer.
A --(50,1)--> B --(100,50)--> C --(1,25)--> D

Figure 2: Another Chain-Structured SDF Graph. Each arc is labeled (produce rate, consume rate).
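As noted above, each independent buffer is allocated in contiguous memory and managed as a circular buffer. A minimal Python sketch of such a ring buffer (the class and method names are our own illustration, not from the lecture):

```python
# A fixed-capacity circular (ring) buffer: one contiguous array with a
# read index that wraps around, as used for an independent SDF edge buffer.
class CircularBuffer:
    def __init__(self, capacity):
        self.buf = [None] * capacity   # contiguous backing storage
        self.head = 0                  # read index
        self.count = 0                 # tokens currently stored

    def write(self, tokens):
        assert self.count + len(tokens) <= len(self.buf), "overflow"
        for t in tokens:
            self.buf[(self.head + self.count) % len(self.buf)] = t
            self.count += 1

    def read(self, n):
        assert n <= self.count, "underflow"
        out = [self.buf[(self.head + i) % len(self.buf)] for i in range(n)]
        self.head = (self.head + n) % len(self.buf)
        self.count -= n
        return out

# Edge A-B of Figure 1 needs capacity 9 * 4 = 36 under (9A)(12B)(12C)(8D).
fifo = CircularBuffer(36)
fifo.write([1, 2, 3, 4])   # one firing of A produces 4 tokens
print(fifo.read(3))        # one firing of B consumes 3 tokens -> [1, 2, 3]
```

Because the buffer size is known statically from the schedule, the modulo arithmetic can be specialized at compile time in a real implementation.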
When we attempt to find an SDF schedule with minimal buffer cost, we should not expect a polynomial-time algorithm. One reason is that an SDF schedule for a graph G = (V, E) can contain a number of actor firings equal to the sum of the elements of the repetitions vector, which can be exponential in the size of the description of the graph. In fact, the problem of computing the minimum buffer size for an arbitrary SDF graph is NP-complete, because the problem of minimizing buffer sizes for homogeneous SDF graphs (abbreviated HSDF-MIN-BUFFER) is already NP-complete.
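The blow-up in schedule length appears already with two actors: for A -(p)-> (c)- B, the repetitions vector is [c/g, p/g] with g = gcd(p, c), so the schedule length grows with the rate values themselves, i.e. exponentially in their bit-length. A short Python sketch (the function name is ours):

```python
from math import gcd

# Minimal periodic schedule length for a two-actor graph A -(p)-> (c)- B:
# the balance equation q_A * p = q_B * c gives q = [c/g, p/g], g = gcd(p, c).
def schedule_length(p, c):
    g = gcd(p, c)
    return c // g + p // g

print(schedule_length(4, 3))              # 7 firings
print(schedule_length(2**20, 2**20 + 1))  # over two million firings
```

Two coprime 21-bit rates thus already force a schedule with millions of firings, even though the graph itself has only two actors and one edge.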
Clustering nodes B and C in the homogeneous SDF graph in Figure 3 into a supernode introduces deadlock because there is no delay on the resulting feedback arc. The SDF Composition Theorem gives sufficient conditions for clustering SDF graphs without introducing deadlock.
A --(1,1)--> B --D--> C, with a feedback arc C --(1,1)--> A

Figure 3: An example of an SDF graph that deadlocks when clustering BC into a supernode. In the graph, D on the B-C arc represents a delay of one token; all rates are 1.
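The deadlock can be demonstrated with a tiny fireability simulation. In the Python sketch below (our own illustration, under the rates of Figure 3), the original graph completes one iteration via the schedule C, A, B, but after clustering {B, C} into a supernode BC, whose internal delay is absorbed, neither remaining actor can ever fire:

```python
# Fire each actor at most once (the repetitions vector of a homogeneous
# SDF cycle is all ones) and report the firing order achieved.
def run(edges, actors, delays):
    tokens = dict(delays)       # current tokens per edge, seeded by delays
    fired = []
    progress = True
    while progress and len(fired) < len(actors):
        progress = False
        for a in actors:
            ins = [e for e in edges if e[1] == a]
            if a not in fired and all(tokens[e] >= 1 for e in ins):
                for e in ins:
                    tokens[e] -= 1
                for e in edges:
                    if e[0] == a:
                        tokens[e] += 1
                fired.append(a)
                progress = True
    return fired

original = [("A", "B"), ("B", "C"), ("C", "A")]
print(run(original, ["A", "B", "C"],
          {("A", "B"): 0, ("B", "C"): 1, ("C", "A"): 0}))
# ['C', 'A', 'B']: the delay token lets C fire first

clustered = [("A", "BC"), ("BC", "A")]  # delay hidden inside the supernode
print(run(clustered, ["A", "BC"],
          {("A", "BC"): 0, ("BC", "A"): 0}))
# []: no actor can fire; the clustering introduced deadlock
```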
Clustering of SDF graphs is also key in generating efficient schedules for a single SDF graph that may be disconnected.
Blocking factors apply to connected SDF graphs but do not extend to disconnected graphs. We can generalize blocking factors to blocking vectors, in which we have one entry for each connected SDF graph. We can also apply this concept to subgraphs that are clustered hierarchically.
(a) The SDF graph: actors A, B, C, D, E with edges e1-e4, where e1 connects C and D; repetitions vector q = [2 2 4 4 1]' for A, B, C, D, E.
(b) The clustered graph: F = subgraph({A, B, C, D}), connected to E by edges e5 and e6; q = [2 1] for F, E, and q = [1 1 2 2] for A, B, C, D within the subgraph.

Figure 4: An example of clustering a subgraph in an SDF graph.
Using Figure 4 as an example, we will cluster the graph in Figure 4a, with V = { A, B, C, D, E } and E = { e1, e2, e3, e4 }, to create a new graph (V', E'). The repetitions vector is q = [2 2 4 4 1]'. We then cluster the subgraph {A, B, C, D} using a blocking factor of 1 to create a new node F. We form V' by starting with V, removing all of the vertices in the subgraph, and adding the new node, so V' = { E, F }. We form E' by starting with E, removing all edges incident to vertices in the subgraph, and adding the new edges, so E' = { e5, e6 }. The repetitions vector for the subgraph ( {A, B, C, D}, {e1, e2} ) is formed from the elements of the original repetitions vector for { A, B, C, D }, divided by their greatest common divisor: [2 2 4 4] becomes [1 1 2 2]. On one firing of F, two tokens are output on e5 and one token is output on e6. We update the original repetitions vector by replacing the four entries for A, B, C, and D with a single entry for F equal to their greatest common divisor, 2. Hence, we arrive at the system in Figure 4b.
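The repetitions-vector bookkeeping for this clustering step can be sketched in a few lines of Python (illustrative only; the dictionaries mirror Figure 4):

```python
from functools import reduce
from math import gcd

# Repetitions vector of the graph in Figure 4a.
q = {"A": 2, "B": 2, "C": 4, "D": 4, "E": 1}
cluster = ["A", "B", "C", "D"]

# The supernode F fires gcd(2, 2, 4, 4) = 2 times per iteration.
g = reduce(gcd, (q[v] for v in cluster))
# Subgraph repetitions: original entries divided by the gcd.
q_sub = {v: q[v] // g for v in cluster}
# Clustered graph: replace the four entries with a single entry for F.
q_new = {"F": g, "E": q["E"]}
print(q_sub)  # {'A': 1, 'B': 1, 'C': 2, 'D': 2}
print(q_new)  # {'F': 2, 'E': 1}
```

This reproduces both vectors in Figure 4b: q = [1 1 2 2] inside the subgraph and q = [2 1] for F and E.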