2000 IEEE Asilomar Conf. on Signals, Systems, and Computers

VLIW DSP vs. Superscalar Implementation of a Baseline H.263 Video Encoder

Serene Banerjee, Hamid R. Sheikh, Lizy K. John, Brian L. Evans, and Alan C. Bovik

Department of Electrical and Computer Engineering, Engineering Science Building, The University of Texas at Austin, Austin, TX 78712-1084 USA
serene@ece.utexas.edu - ljohn@ece.utexas.edu - bevans@ece.utexas.edu - bovik@ece.utexas.edu

Paper - Updated PowerPoint Talk - Software release

Abstract

A Very Long Instruction Word (VLIW) processor and a superscalar processor can execute multiple instructions simultaneously. A VLIW processor depends on the compiler and programmer to find the parallelism in the instructions, whereas a superscaler processor determines the parallelism at runtime. This paper compares TI TMS320C6700 VLIW digital signal processor (DSP) and SimpleScalar superscalar implementations of a baseline H.263 video encoder in C. With level two C compiler optimization, a one-way issue superscalar processor is 7.5 times faster than the VLIW DSP for the same processor clock speed. The superscalar speedup from one-way to four-way issue is 2.88:1, and from four-way to 256-way issue is 2.43:1. To reduce the execution time on the C6700, we write assembly routines for sum-of-absolute-difference, interpolation, and reconstruction, and place frequently used code and data into on-chip memory. We use TI's discrete cosine transform assembly routines. The hand optimized VLIW DSP implementation is 61x faster than the C version compiled with level two optimization. Most of the improvement was due to the efficient placement of data and programs in memory. The hand optimized VLIW implementation is 14% faster than a 256-way superscalar implementation without hand optimizations.

Last Updated 11/08/04.