FIR Filter Implementation on the C6x
MVK .S1 0x0001,AMR ; modulo block size 2^2
MVKH .S1 0x4000,AMR ; modulo addr register B6
MVK .S1 2,A2 ; A2 = 2 loop iterations (four-tap filter, two taps per iteration)
ZERO .L1 A4 ; initialize accumulators
ZERO .L2 B4
; initialize pointers A5, B6, and A7
fir LDW .D1 *A5++,A0 ; load a(n) and a(n+1)
LDW .D2 *B6++,B1 ; load x(n) and x(n+1)
MPY .M1X A0,B1,A3 ; A3 = a(n) * x(n)
MPYH .M2X A0,B1,B3 ; B3 = a(n+1) * x(n+1)
ADD .L1 A3,A4,A4 ; yeven(n) += A3
ADD .L2 B3,B4,B4 ; yodd(n) += B3
[A2] SUB .S1 A2,1,A2 ; decrement loop counter
[A2] B .S2 fir ; if A2 != 0, then branch
ADD .L1X A4,B4,A4 ; Y = Yodd + Yeven
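Throughput: two multiply-accumulates per cycle once the loop is software-pipelined, since each iteration performs one MPY and one MPYH on a pair of packed 16-bit operands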
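The arithmetic of the loop can be modeled in C to check the even/odd accumulator split. This is only a sketch under stated assumptions: the names `pack16`, `mpy_lo`, `mpy_hi`, and `fir_packed` are hypothetical, the low halfword of each 32-bit word is assumed to hold the even-indexed element, and circular addressing, delay slots, and parallel execution are not modeled.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word.
   Assumption: lo occupies bits 15..0, hi occupies bits 31..16. */
static uint32_t pack16(int16_t lo, int16_t hi) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Model of MPY: signed 16x16 product of the low halfwords. */
static int32_t mpy_lo(uint32_t a, uint32_t b) {
    return (int32_t)(int16_t)(a & 0xFFFFu) * (int16_t)(b & 0xFFFFu);
}

/* Model of MPYH: signed 16x16 product of the high halfwords. */
static int32_t mpy_hi(uint32_t a, uint32_t b) {
    return (int32_t)(int16_t)(a >> 16) * (int16_t)(b >> 16);
}

/* Dot product over nwords packed words (two taps per word).
   y_even plays the role of A4 and y_odd the role of B4. */
int32_t fir_packed(const uint32_t *coef, const uint32_t *samp, int nwords) {
    int32_t y_even = 0;
    int32_t y_odd = 0;
    for (int i = 0; i < nwords; i++) {
        y_even += mpy_lo(coef[i], samp[i]); /* a(n)   * x(n)   */
        y_odd  += mpy_hi(coef[i], samp[i]); /* a(n+1) * x(n+1) */
    }
    return y_even + y_odd; /* final ADD: Y = Yodd + Yeven */
}
```

For the four-tap case, `nwords` is 2, matching the loop counter loaded into A2. Keeping two independent accumulators is what lets the two multiplies and two adds proceed side by side on the A and B data paths.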