2000 IEEE Asilomar
Conf. on Signals, Systems, and Computers
VLIW DSP vs. Superscalar Implementation of a Baseline
H.263 Video Encoder
Hamid R. Sheikh,
Lizy K. John,
Brian L. Evans, and
Alan C. Bovik
Department of Electrical and Computer Engineering,
Engineering Science Building,
The University of Texas at Austin,
Austin, TX 78712-1084 USA
A Very Long Instruction Word (VLIW) processor and a superscalar processor
can execute multiple instructions simultaneously.
A VLIW processor depends on the compiler and programmer to find the
parallelism in the instructions, whereas a superscalar processor determines
the parallelism at runtime.
This paper compares TI TMS320C6700 VLIW digital signal processor (DSP)
and SimpleScalar superscalar implementations of a baseline H.263 video
encoder in C.
With level two C compiler optimization, a one-way issue superscalar
processor is 7.5 times faster than the VLIW DSP for the same processor
clock speed.
The superscalar speedup from one-way to four-way issue is 2.88:1,
and from four-way to 256-way issue is 2.43:1.
To reduce the execution time on the C6700, we write assembly routines
for sum-of-absolute-difference, interpolation, and reconstruction,
and place frequently used code and data into on-chip memory.
We use TI's discrete cosine transform assembly routines.
The hand-optimized VLIW DSP implementation is 61 times faster than the
C version compiled with level two optimization.
Most of the improvement was due to the efficient placement of
data and programs in memory.
The hand-optimized VLIW implementation is 14% faster than a 256-way
superscalar implementation without hand optimizations.
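
For readers unfamiliar with the sum-of-absolute-difference kernel named
above, the following is a minimal reference sketch in portable C. It is
not the authors' code or their C6700 assembly routine; the function name
sad16x16, the 16x16 block size, and the stride parameter are assumptions
chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sketch only: sum of absolute differences between a
 * 16x16 block of the current frame and a candidate block of the
 * reference frame. `stride` is the number of bytes between the starts
 * of consecutive rows in both buffers (an assumption of this sketch). */
unsigned sad16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sad = 0;

    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            /* Widen to int before subtracting so the difference
             * can be negative, then take its magnitude. */
            sad += (unsigned)abs((int)cur[x] - (int)ref[x]);
        }
        /* Advance both pointers to the next row of the block. */
        cur += stride;
        ref += stride;
    }
    return sad;
}
```

A motion search would call a function like this once per candidate
displacement, which is why the inner loop is a natural target for
hand-written assembly on a VLIW DSP.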
Last Updated 11/08/04.