Optimized Vector Dot Product on the C6x
Prologue
- Retime dot product to compute two terms per cycle
- Initialize pointers: A5 for a(n), B6 for x(n), A7 for y(n)
- Move the loop count (N divided by 2) into A2
Inner loop
- Put a(n) and a(n+1) in A0 and x(n) and x(n+1) in A1 (packed data)
- Compute the two products a(n) x(n) and a(n+1) x(n+1)
- Accumulate even (odd) indexed terms in A4 (B4)
- Decrement loop counter (A2)
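
The steps above can be sketched in C to show the retiming at a higher level. This is a minimal illustration, not the actual C6x assembly: the function name dotp2 and the variables sum_even, sum_odd, and count are hypothetical, the data is assumed to be 16-bit, N is assumed to be even, and the final addition of the two partial sums is an assumption about what happens after the loop. Register names appear only in comments to map back to the outline.

```c
#include <stdint.h>

/* Dot product retimed to handle two terms per loop iteration.
 * Assumes 16-bit samples and an even element count n. */
int32_t dotp2(const int16_t *a, const int16_t *x, int n)
{
    int32_t sum_even = 0;   /* plays the role of A4: even-indexed terms */
    int32_t sum_odd  = 0;   /* plays the role of B4: odd-indexed terms  */
    int count = n / 2;      /* plays the role of A2: loop counter N/2   */

    while (count > 0) {
        /* One packed load per array would supply both 16-bit values;
         * here they are simply read as two adjacent elements. */
        sum_even += (int32_t)a[0] * x[0];   /* a(n)   * x(n)   */
        sum_odd  += (int32_t)a[1] * x[1];   /* a(n+1) * x(n+1) */

        a += 2;             /* advance both pointers by two elements */
        x += 2;
        count--;            /* decrement loop counter */
    }

    /* Assumed epilogue: combine the two partial sums into the result. */
    return sum_even + sum_odd;
}
```

Keeping separate even and odd accumulators removes the dependency between the two multiply-accumulate operations in each iteration, which is what lets them proceed in parallel on the two sides of the datapath.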