Better Lookup Table Bit-Reversed Sorting
Improve execution time by 53%
- For a 256-length data array, only 120 swaps occur (16 of the 256 indices are their own bit reversal; the other 240 form 120 pairs)
- Use two 120-element lookup tables: index and bit-reversed index
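One way the two tables could be generated offline is sketched below in C. The names `bit_reverse8` and `build_swap_tables` are hypothetical and do not come from the slide. The sketch assumes 8-bit indices (256-point data) and keeps only pairs where the index is smaller than its bit reversal, so each swap is listed once.

```c
#include <stdint.h>

/* Reverse the 8 bits of an index in the range 0..255. */
static uint8_t bit_reverse8(uint8_t x)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (x & 1u));
        x >>= 1;
    }
    return r;
}

/* Fill idx[] and rev[] with every pair (i, bitrev(i)) where i < bitrev(i).
 * Indices equal to their own reversal need no swap and are skipped.
 * Returns the number of pairs written (hypothetical helper, not from the slide). */
static int build_swap_tables(uint8_t idx[120], uint8_t rev[120])
{
    int n = 0;
    for (int i = 0; i < 256; i++) {
        uint8_t r = bit_reverse8((uint8_t)i);
        if (i < r) {
            idx[n] = (uint8_t)i;
            rev[n] = r;
            n++;
        }
    }
    return n;
}
```

Because every index fits in one byte, each table occupies 120 bytes, which matches the byte loads (`LDBU`) used in the assembly loop.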
; A5 and B5 120-byte index and bit-reversed index lut
        MVK   .S1  120,A2        ; loop counter
||      MV    .S2  A3,B3         ; A3/B3 point to array data
sort    .trip 120                ; tell assembler loop 120X
        LDBU  .D1  *A5++,A4      ; A4=index
||      LDBU  .D2  *B5++,B4      ; B4=bit-reversed index
        MV    .S1  B4,A7         ; swap indices to swap vals
||      MV    .S2  A4,B7
||      LDW   .D1  *A3[A4],A6
||      LDW   .D2  *B3[B4],B6
  [A2]  SUB   .S1  A2,1,A2       ; decrement loop counter
||[A2]  B     .S2  sort          ; if A2 != 0, then branch
||      STW   .D1  A6,*A3[A7]
||      STW   .D2  B6,*B3[B7]
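For readers less familiar with C6x assembly, the loop can be read as the C sketch below. The function name `bit_reverse_swap` and its parameters are hypothetical, and the sketch assumes 32-bit data words (the assembly uses `LDW`/`STW`). It mirrors the data movement only, not the parallel scheduling.

```c
#include <stdint.h>

/* Swap data[idx[k]] with data[rev[k]] for each of the n table entries.
 * Both values are loaded first, then stored to the opposite index,
 * as in the assembly loop. */
static void bit_reverse_swap(int32_t *data, const uint8_t *idx,
                             const uint8_t *rev, int n)
{
    for (int k = 0; k < n; k++) {
        int32_t a = data[idx[k]];   /* LDW *A3[A4],A6 */
        int32_t b = data[rev[k]];   /* LDW *B3[B4],B6 */
        data[rev[k]] = a;           /* STW A6,*A3[A7] with A7 = B4 */
        data[idx[k]] = b;           /* STW B6,*B3[B7] with B7 = A4 */
    }
}
```

With the full 256-point tables, `n` would be 120 and `data` would point to the 256-word array.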
Throughput of 1.4 cycles/coefficient (3 cycles per swap × 120 swaps / 256 coefficients ≈ 1.4)