More Efficient Ordered Dithering on the C6x
        MVK    .S1  0x00ff,A8      ; white pixel #1
||      MVK    .S2  0x0001,AMR     ; modulo block size 2^2
        SHL    .S1  A8,8,A9        ; white pixel #2
||      MVKH   .S2  0x4000,AMR     ; modulo addr reg. B6
        SHL    .S1  A8,16,A10      ; white pixel #3
||      SHL    .S2  A8,24,B9       ; white pixel #4
; initialize
; A2 number of pixels divided by 4
; A6 pointer to pixels (will be overwritten)
; B6 pointer to thresholds
dith2:  LDW    .D1  *A6,A4         ; read 4 pixels (bytes)
        LDW    .D2  *B6++,B4       ; read 4 thresholds
        EXTU   .S1  A4,24,24,A12   ; extract pixel #2
        EXTU   .S2  B4,24,24,B12   ; extract threshold #2
        ZERO   .L1  A5             ; store output in A5
        CMPLTU .L2  A12,B12,B0     ; B0 = (A12 < B12)
Throughput of 1.25 pixels
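As a reading aid, the word-at-a-time scheme in the listing can be sketched in portable C. This is a minimal sketch, not the slide's implementation. It assumes that a pixel becomes white (0xff) when it is not below its threshold and black (0x00) otherwise, that the output overwrites the input word, and that the same four thresholds are reused for every word (which is what a 4-byte circular buffer on B6 would give). The function name dither_words and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dither n_words groups of four 8-bit pixels in place.
 * 'thresholds' packs one row of four 8-bit thresholds and is reused for
 * every group (assumption: mirrors the 4-byte circular buffer on B6). */
void dither_words(uint32_t *pixels, size_t n_words, uint32_t thresholds)
{
    assert(pixels != NULL || n_words == 0);

    for (size_t i = 0; i < n_words; i++) {
        uint32_t in  = pixels[i];   /* four pixels, as loaded by LDW */
        uint32_t out = 0;           /* output word starts all black  */

        for (unsigned k = 0; k < 4; k++) {
            unsigned shift = 8u * k;
            uint32_t p = (in >> shift) & 0xffu;         /* one pixel     */
            uint32_t t = (thresholds >> shift) & 0xffu; /* its threshold */

            /* Unsigned compare, as with CMPLTU: a pixel that is not
             * below its threshold gets the white mask for its byte. */
            if (!(p < t))
                out |= (uint32_t)0xffu << shift;
        }
        pixels[i] = out;
    }
}
```

The inner loop handles the four byte lanes one after another; the assembly version instead unrolls them and keeps the four white masks in registers (A8, A9, A10, B9) so that each lane reduces to an extract, a compare, and a conditional OR.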