Many signal and image processing systems for high-throughput, high-performance applications require concurrent implementations in order to realize desired performance. These implementations should scale with increasing availability of computational elements, avoid deadlock, and produce consistent results (determinism). However, coarse-grained locks yield systems that do not scale well, and insufficient locking may cause non-determinate execution. In standard industry approaches, the programmer must try to resolve the tension among scalability, deadlock and determinism, which becomes increasingly difficult as software complexity grows.

The Kahn Process Network model provides the mathematically provable property of determinism of a program result regardless of the execution order of its processes, including concurrent execution. This model is also natural for describing streams of data samples in a signal processing system, where processes transform streams from one data type to another. However, a Kahn Process Network may require infinite memory to execute.

The first contribution is the dynamic distributed deadlock detection and resolution (D4R) algorithm, which permits execution of Process Networks in bounded memory if it is possible. It detects local deadlocks in a Process Network, determines whether the deadlock can be resolved and, if so, identifies the process that must take action to resolve the deadlock.

The second contribution is the Computational Process Network (CPN) model which is based on the formalisms of Kahn's PN model, but with enhancements that are designed to make it efficiently implementable. These enhancements include multi-token transactions to reduce execution overhead, multi-channel queues for multi-dimensional synchronous data, zero-copy semantics, and consumer and producer firing thresholds for queues. Firing thresholds enable memoryless computation of sliding window algorithms, which are common in signal processing systems. I show that the Computational Process Network model preserves the formal properties of Process Networks, while reducing the operations required to implement sliding window algorithms on continuous streams of data.

The third contribution is a high-throughput software framework that implements the Computational Process Network model using C++ and maps naturally onto distributed targets. This framework uses POSIX threads, and can exploit parallelism in both multi-core and distributed systems. We have developed case studies to demonstrate the performance and utility of the framework. One case study is a three-dimensional circular convolution sonar beamformer and replica correlator, which demonstrates the high throughput and scalability of a real-time signal processing algorithm using the CPN model and framework.

Mail comments about this page to