# AMD64 Multiprecision Arithmetic

Eric Bainville - Dec 2006

## Division by a scalar

This function divides the integer represented by (Z,N) by a single word k. Let's start with a naive implementation, using the 64-bit div instruction:

```        ; Loop from msb to lsb
lea     Z, [Z + 8*N]
; rem is the input remainder, in 0..k-1
xor     REM, REM
align   16
.a
lea     Z, [Z - 8]
mov     RDX, REM
mov     RAX, [Z]
div     K
mov     [Z], RAX
mov     REM, RDX
dec     N
jnz     .a
; rem is the output remainder, in 0..k-1
```

The div instruction is very slow, and we get near 74 cycles/word! Efficient alternatives require using mul instead. The trick to do this is to compute once a pseudo inverse k'=264/k, and multiply by k' instead of divide by k. At each step, we multiply ai+264r (the current word combined with the incoming remainder r in 0..k-1) by k' to get a close (and always lower) guess of the effective quotient. Then we multiply it by k to get the next remainder, and make corrections until the remainder is less than k. Actually, we shift all values to maximize the number of exact bits obtained by a single product by the pseudo inverse. The exact quotient and remainder are then determined by additions iteratively.

```        ; Loop from msb to lsb
lea     Z, [Z + 8*N]
; rem is the input remainder, in 0..k-1
xor     REM, REM
; Build the pseudo-inverse of k: 2^(127-)/k
; First determine the shift in rcx to get the max number of bits in kinv
xor     RCX, RCX
mov     RAX, K
mov     RDX, 1
.kinv1:
inc     RCX
ror     RDX, 1
shl     RAX, 1
jnc     .kinv1
dec     RCX
; Here, rcx is a left shift moving the msb of k to bit 63

ror     MASK, cl ; rcx bits at msb

; Then divide 2^(64+cx) by k (rdx already ok)
xor     RAX, RAX
div     K
mov     KINV, RAX
; Here kinv has 64 bits
align   16
.a:
; Get 64 bits of quotient approx, multiplying
; most significant word of (rem*2^64+input)
and     RAX, [Z - 8]
or      RAX, REM
rol     RAX, cl
mul     KINV
rcl     RAX, 1
rcl     RDX, 1
mov     QUOT, RDX

; Multiply by k and subtract to get remainder
; Subtraction must be done on two words
mov     HREM, REM
mov     REM, [Z - 8]
mov     RAX, QUOT
mul     K
sub     REM, RAX
sbb     HREM, RDX
jz      .b  ; high word is 0, goto adjust on single word
; Adjust quotient and remainder on two words
.d:     inc     QUOT
sub     REM, K
sbb     HREM, 0
jnz     .d

; Adjust quotient and remainder on single word
.b:     cmp     REM, K
jb      .c ; rem in 0..k-1, OK
sub     REM, K
inc     QUOT
jmp     .b

; Store result
.c:     mov     [Z - 8], QUOT
lea     Z, [Z - 8]

dec	N
jnz	.a
```

This code runs at 17 cycles/word.