Proc. IEEE Int. Conf. on Computer Design, Sep. 2000, pp. 163-172.
Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW, and Superscalar Architectures
Deependra Talla, Lizy K. John, Viktor Lapinskii, and Brian L. Evans
Department of Electrical and Computer Engineering,
Engineering Science Building,
The University of Texas at Austin,
Austin, TX 78712-1084 USA
deepu@ece.utexas.edu
ljohn@ece.utexas.edu
lapinski@ece.utexas.edu
bevans@ece.utexas.edu
Abstract
This paper aims to provide a quantitative understanding
of the performance of DSP and multimedia applications
on very long instruction word (VLIW), single instruction
multiple data (SIMD), and superscalar processors. We
evaluate the performance of the VLIW paradigm using
Texas Instruments Inc.'s TMS320C62xx processor and
the SIMD paradigm using Intel's Pentium II processor
(with MMX) on a set of DSP and media benchmarks.
Tradeoffs in superscalar performance are evaluated with
a combination of measurements on the Pentium II and simulation
experiments on the SimpleScalar simulator. Our
benchmark suite includes kernels (filtering, autocorrelation,
and dot product) and applications (audio effects,
G.711 speech coding, and speech compression). Optimized
assembly libraries and compiler intrinsics were
used to create the SIMD and VLIW code. We used the
hardware performance counters on the Pentium II and the
stand-alone simulator for the C62xx to obtain the execution
cycle counts. In comparison to non-SIMD Pentium II
performance, the SIMD version exhibits a speedup ranging
from 1.0 to 5.5 while the speedup of the VLIW version
ranges from 0.63 to 9.0. The benchmarks are seen to
contain large amounts of available parallelism; however,
most of it is inter-iteration parallelism. Out-of-order execution
and branch prediction are observed to be extremely
important for exploiting such parallelism in media
applications.
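The three kernels named in the abstract (filtering, autocorrelation, and dot product) share a multiply-accumulate inner loop. The following is a minimal C sketch, assuming 16-bit integer samples; the function names dot_product, fir_filter, and autocorrelation are hypothetical and are not the paper's benchmark sources or the optimized libraries it used. In these loops, most of the independent work sits in different loop iterations (separate multiplies, separate output samples), which is one way to picture the inter-iteration parallelism the abstract describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit sample vectors. A 64-bit accumulator is used
   in this sketch to avoid overflow; fixed-point DSP code would typically
   use a 32-bit accumulator with scaling. Every multiply is independent of
   the others; only the running sum carries a dependence across iterations. */
int64_t dot_product(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* FIR filter producing nx - taps + 1 outputs (requires nx >= taps):
   y[i] = sum_{k=0}^{taps-1} h[k] * x[i + taps - 1 - k].
   Each output sample is computed independently of the others, so the
   outer-loop iterations can in principle run in parallel. */
void fir_filter(const int16_t *x, size_t nx,
                const int16_t *h, size_t taps, int64_t *y)
{
    for (size_t i = 0; i + taps <= nx; i++) {
        int64_t acc = 0;
        for (size_t k = 0; k < taps; k++)
            acc += (int32_t)h[k] * x[i + taps - 1 - k];
        y[i] = acc;
    }
}

/* Autocorrelation for lags 0..max_lag (requires max_lag < n):
   r[lag] = sum_{i=0}^{n-1-lag} x[i] * x[i + lag].
   Each lag is a dot product of the signal with a shifted copy of itself. */
void autocorrelation(const int16_t *x, size_t n, size_t max_lag, int64_t *r)
{
    for (size_t lag = 0; lag <= max_lag; lag++)
        r[lag] = dot_product(x, x + lag, n - lag);
}
```

For example, with x = {1, 2, 3, 4, 5} and h = {1, 1}, dot_product(x, x, 5) returns 55, fir_filter produces {3, 5, 7, 9}, and autocorrelation with max_lag = 2 produces {55, 40, 26}. The scalar form is kept deliberately plain; SIMD and VLIW versions would restructure the same loops (packed multiply-adds, unrolling, software pipelining) rather than change what they compute.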
Last Updated 11/24/00.