Intel64 Multi-Precision Arithmetic
Eric Bainville - Dec 2006
Memory Zero
This function sets to 0 all words of a vector (Z,n).
The Core 2 Duo architecture can write one 128-bit word per clock cycle (the Athlon 64 could write two 64-bit words per cycle). Doing this requires using a 128-bit XMM register, as in the following code:
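shr N, 1            ; N = n/2: number of 128-bit stores (n must be even and nonzero)
pxor xmm0, xmm0     ; xmm0 = 0
align 16
.a:
movdqa [Z], xmm0    ; aligned 128-bit store: Z must be 16-byte aligned
lea Z, [Z + 16]     ; advance Z by 16 bytes without modifying the flags
dec N
jnz .a              ; loop until N reaches 0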
The lea and dec both execute in the same cycle as the movdqa, so the loop runs at 1 cycle/iteration. Each iteration clears two 64-bit words, which gives 0.50 cycle/word.
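For readers working from C rather than assembly, the same loop can be sketched with SSE2 intrinsics. This is only an illustration under assumptions: the function name mp_zero is hypothetical and does not appear in the original code, and the sketch keeps the assembly loop's requirements (Z aligned on 16 bytes, n even). The compiler decides the final instruction schedule, so the 0.50 cycle/word figure is not guaranteed here.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name not from the article): set the n 64-bit
   words of the vector (z,n) to 0 using 128-bit stores.
   Assumptions, as in the assembly loop: z is 16-byte aligned, n is even. */
static void mp_zero(uint64_t *z, size_t n)
{
    assert(((uintptr_t)z & 15) == 0);  /* aligned store would fault otherwise */
    assert((n & 1) == 0);              /* an odd n would leave the last word unset */

    const __m128i zero = _mm_setzero_si128();       /* pxor xmm0, xmm0 */
    for (size_t i = 0; i < n; i += 2) {
        /* One aligned 128-bit store clears two 64-bit words (movdqa). */
        _mm_store_si128((__m128i *)(z + i), zero);
    }
}
```

Unlike the assembly version, this sketch does nothing when n is 0, because the loop condition is tested before the first store.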




