Conventional DSPs (those before 1996) had a very simple architecture and a short pipeline (3-4 stages). You could program them by hand and still get efficient code. The first generation of fixed-point processors (in the 1980s) were too primitive to give good compiler support. Compilers were so bad that they sometimes generate code that either (1) was 40-50 times worse than a programmer could do by hand or (2) could not fit on the chip. The first generation fixed-point DSPs had very little on-chip memory (2-64 kwords of on-chip data memory and the same amount of program memory). Compile a simple C program and see how big the executable is.
Before 1990, C compilers for DSPs were in very low demand, so very little progress was made on improving their performance. By 1997, however, C compilers were actually pretty good for some fixed-point DSPs. We benchmarked the freely available Motorola 56000 C compiler. The compiler generated code that was about twice as slow and efficient as a very good programmer in assembly language, as shown in slide #10 in the talk
http://www.ece.utexas.edu/~bevans/talks/benchmarking97/sld001.htm
Before 1992, many embedded DSPs were used in products without an operating system. The assembly code was burned into ROM and loaded into the DSP on startup. By 1992, as systems became more complex, products such as cell phones and printers had operating systems on them. (A common real-time OS is VxWorks by Wind River.) It is easier to interface with the operating system from C than assembly language. So, a good C compiler is important in complex embedded systems.
There is a benchmarking methodology for C compilers for DSPs called DSPStone in 1995. Since then, the author (Dr. Vojin Zivojnovic) has moved to AXYS Design Automation, Inc., in Irvine, CA. He can be reached at vz@axys.de.
The long answer is an entire course by itself. A good reference is the "dragon" book:
http://www.ece.utexas.edu/~bevans/courses/ee382c/lectures/18_multiproc_sdf/intro_code_opt.html
DSP compilers have been poor because it is very difficult for compilers to recognize programming constructs that can be easily mapped into the special-purpose instructions on a DSP, such as modulo indexing (for circular buffers for FIR/IIR filters) and bit-reversed addressing (for FFTs) and combined multiply-accumulate operations.
The biggest problem is that C assumes a Von Neuman (sequential) architecture that works only on char, int, float, and double data. Fixed-point DSPs perform most of its work on fixed-point real data. And fixed-point DSPs have a Hardvard architecture. Oops!
Mail comments about this page to bevans@ece.utexas.edu.