Lookup Table Bit-Reversed Sorting
Store pre-computed bit-reversed indices in table
Goals for hand-coded implementation
- Minimize accesses to memory (equal to array length)
- Minimize execution time
Limitations on C6x architecture
- Five conditional registers: A0, A1, A2, B0, and B1
- Delay of 5 cycles for branch and 4 cycles for load/store
- No more than four reads per register per cycle
- One read of register file on another data path: maintain copy of loop counter and array pointer in each data path
Example: Assume transform of length 256
- Array indices fit into a byte (lookup table is 256 bytes)
- Data array is a 256-word array (16 bits per coefficient)