TI TMS320C6x VLIW DSP Architecture
One instruction cycle every clock cycle
Deep pipeline
- 7-11 stages in C62x: fetch 4, decode 2, execute 1-5
- 7-16 stages in C67x: fetch 4, decode 2, execute 1-10
- If a branch is in the pipeline, interrupts are disabled (a branch has 5 delay slots)
- Avoid branches by using conditional execution
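As a rough illustration of avoiding branches, the sketch below writes the same clipping operation two ways in C. The function names `clip_branch` and `clip_select` are hypothetical and chosen for this example. The second form expresses each update as a guarded assignment, which is the shape a compiler can map onto conditionally executed instructions; whether a given compiler actually does so depends on the compiler and its options.

```c
#include <stdint.h>

/* Branching form: each early return is naturally a conditional branch. */
int32_t clip_branch(int32_t x, int32_t limit)
{
    if (x > limit) {
        return limit;
    }
    if (x < -limit) {
        return -limit;
    }
    return x;
}

/* Select form: every statement always executes, and a condition only
 * decides which value is kept. This is a source-level pattern only;
 * it does not guarantee predicated machine code. */
int32_t clip_select(int32_t x, int32_t limit)
{
    int32_t y = x;
    y = (x > limit) ? limit : y;    /* keep limit if x is too large  */
    y = (x < -limit) ? -limit : y;  /* keep -limit if x is too small */
    return y;
}
```

Both functions return the same result for any `x` and any non-negative `limit`; for example, `clip_select(40000, 32767)` yields 32767.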
No hardware protection against pipeline hazards
- Compiler and assembler must prevent pipeline hazards
C67x computes a single-precision floating-point multiply in 4 cycles
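Because the hardware does not stall for a pending result, the code itself has to leave enough cycles between an instruction and the first use of its result. The sketch below is a simplified model of that bookkeeping, not TI's scheduling algorithm. The helper names `delay_slots` and `nops_needed` are hypothetical, and the model assumes a result becomes usable exactly `latency_cycles` cycles after issue.

```c
/* Delay slots: cycles after issue during which the result is not yet
 * available. An instruction with a 1-cycle latency has none. */
int delay_slots(int latency_cycles)
{
    return latency_cycles - 1;
}

/* NOP cycles that must be inserted before a dependent instruction,
 * given how many cycles of independent work (filler_cycles) were
 * already scheduled between the producer and the consumer. */
int nops_needed(int latency_cycles, int filler_cycles)
{
    int gap = delay_slots(latency_cycles) - filler_cycles;
    return gap > 0 ? gap : 0;
}
```

Under this model, a 4-cycle multiply followed immediately by an instruction that uses its result needs `nops_needed(4, 0)`, that is 3 NOP cycles. If three cycles of independent work are scheduled in between, `nops_needed(4, 3)` is 0 and no cycles are wasted.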