Proc. IEEE Int. Conf. on Computer Design, Sep., 2000, pp. 163-172.

Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW, and Superscalar Architectures

Deependra Talla, Lizy K. John, Viktor Lapinskii, and Brian L. Evans

Department of Electrical and Computer Engineering, Engineering Science Building, The University of Texas at Austin, Austin, TX 78712-1084 USA
deepu@ece.utexas.edu - ljohn@ece.utexas.edu - lapinski@ece.utexas.edu - bevans@ece.utexas.edu -

Abstract

This paper aims to provide a quantitative understanding of the performance of DSP and multimedia applications on very long instruction word (VLIW), single instruction multiple data (SIMD), and superscalar processors. We evaluate the performance of the VLIW paradigm using Texas Instruments Inc.'s TMS320C62xx processor and the SIMD paradigm using Intel's Pentium II processor (with MMX) on a set of DSP and media benchmarks. Tradeoffs in superscalar performance are evaluated with a combination of measurements on Pentium II and simulation experiments on the SimpleScalar simulator. Our benchmark suite includes kernels (filtering, autocorrelation, and dot product) and applications (audio effects, G.711 speech coding, and speech compression). Optimized assembly libraries and compiler intrinsics were used to create the SIMD and VLIW code. We used the hardware performance counters on the Pentium II and the stand-alone simulator for the C62xx to obtain the execution cycle counts. In comparison to non-SIMD Pentium II performance, the SIMD version exhibits a speedup ranging from 1.0 to 5.5 while the speedup of the VLIW version ranges from 0.63 to 9.0. The benchmarks are seen to contain large amounts of available parallelism, however, most of it is inter-iteration parallelism. Out-of-order execution and branch prediction are observed to be extremely important to exploit such parallelism in media applications.

A draft of the submitted paper is available in PDF format.


Last Updated 11/24/00.