c - Conditional tests in primality by trial division -
my question conditional test in trial division. there seems debate on conditional test employ. let's @ code rosettacode.
int is_prime(unsigned int n) { unsigned int p; if (!(n & 1) || n < 2 ) return n == 2; /* comparing p*p <= n can overflow */ (p = 3; p <= n/p; p += 2) if (!(n % p)) return 0; return 1; }
wheel factorization or using predetermined list of primes not change essence of question.
there 3 cases can think of conditional test:
- p<=n/p
- p*p<=n
- int cut = sqrt(n); (p = 3; p <= cut; p += 2)
case 1: works n has division every iteration (edit: not need division it's still slower. i'm not sure why. see assembly output below). have found twice slow case 2 large values of n prime (on sandy bridge system).
case 2: signficantly faster case 1 has problem overflows large n , goes infinitive loop. max value can handle
(sqrt(n) + c)^2 = int_max //solve n = int_max -2*c*sqrt(int_max) + c^2 //int_max = 2^32 -> n = 2^32 - c*s^17 + c^2; in our case c = 2
for example uint64_t case 2 go infinite loop x =-1l-58 (x^64-59) prime.
case 3: no division or multiplication has done every iteration , not overflow case 2. it's faster case 2 well. question if sqrt(n) accurate enough.
can explain me why case 2 faster case 1? case 1 not use division though despite it's still lot slower.
here times prime 2^56-5;
case 1 9.0s case 2 4.6s case 3 4.5s
here code used test http://coliru.stacked-crooked.com/a/69497863a97d8953. added functions end of question.
here assembly output form gcc 4.8 -o3 case 1 , case 2. both have 1 division. case 2 has multiplication first guess case 2 slower it's twice fast on both gcc , msvc. don't know why.
case 1:
.l5: testl %edx, %edx je .l8 .l4: addl $2, %ecx xorl %edx, %edx movl %edi, %eax divl %ecx cmpl %ecx, %eax jae .l5
case 2:
.l20: xorl %edx, %edx movl %edi, %eax divl %ecx testl %edx, %edx je .l23 .l19: addl $2, %ecx movl %ecx, %eax imull %ecx, %eax cmpl %eax, %edi jae .l20
here functions i'm using test time:
int is_prime(uint64_t n) { uint64_t p; if (!(n & 1) || n < 2 ) return n == 2; /* comparing p*p <= n can overflow */ (p = 3; p <= n/p; p += 2) if (!(n % p)) return 0; return 1; } int is_prime2(uint64_t n) { uint64_t p; if (!(n & 1) || n < 2 ) return n == 2; /* comparing p*p <= n can overflow */ (p = 3; p*p <= n; p += 2) if (!(n % p)) return 0; return 1; } int is_prime3(uint64_t n) { uint64_t p; if (!(n & 1) || n < 2 ) return n == 2; /* comparing p*p <= n can overflow */ uint32_t cut = sqrt(n); (p = 3; p <= cut; p += 2) if (!(n % p)) return 0; return 1; }
added content after bounty.
aean discovered in case 1 saving quotient remainder fast (or faster) case 2. let's call case 4. following following code twice fast case 1.
int is_prime4(uint64_t n) { uint64_t p, q, r; if (!(n & 1) || n < 2 ) return n == 2; (p = 3, q=n/p, r=n%p; p <= q; p += 2, q = n/p, r=n%p) if (!r) return 0; return 1; }
i'm not sure why helps. in case there no need use case 2 anymore. case 3, versions of sqrt
function in hardware or software perfect squares right it's safe use in general. case 3 case work openmp.
upd: compiler optimization issue, obviously. while mingw used 1 div
instruction in loop body, both gcc on linux , msvc failed reuse quotient previous iteration.
i think best explicitly define quo
, rem
, calculate them in same basic instruction block, show compiler want both quotient , remainder.
int is_prime(uint64_t n) { uint64_t p = 3, quo, rem; if (!(n & 1) || n < 2) return n == 2; quo = n / p; (; p <= quo; p += 2){ quo = n / p; rem = n % p; if (!(rem)) return 0; } return 1; }
i tried code http://coliru.stacked-crooked.com/a/69497863a97d8953 on mingw-w64 compiler, case 1
faster case 2
.
so guess compiling targeted 32-bit architecture , used uint64_t
type. assembly shows doesn't use 64-bit register.
if got right, there reason.
on 32-bit architecture, 64-bit numbers represented in 2 32-bit registers, compiler concatenation works. it's simple 64-bit addition, subtraction , multiplication. modulo , division done small function call named ___umoddi3
, ___udivdi3
in gcc, aullrem
, aulldiv
in msvc.
so need 1 ___umoddi3
, 1 ___udivdi3
each iteration in case 1
, 1 ___udivdi3
, 1 concatenation of 64-bit multiplication in case 2
. that's why case 1
seems twice slower case 2
in test.
what in case 1
:
l5: addl $2, %esi adcl $0, %edi movl %esi, 8(%esp) movl %edi, 12(%esp) movl %ebx, (%esp) movl %ebp, 4(%esp) call ___udivdi3 // call div cmpl %edi, %edx ja l6 jae l21 l6: movl %esi, 8(%esp) movl %edi, 12(%esp) movl %ebx, (%esp) movl %ebp, 4(%esp) call ___umoddi3 // call modulo. orl %eax, %edx jne l5
what in case 2
:
l26: addl $2, %esi adcl $0, %edi movl %esi, %eax movl %edi, %ecx imull %esi, %ecx mull %esi addl %ecx, %ecx addl %ecx, %edx cmpl %edx, %ebx ja l27 jae l41 l27: movl %esi, 8(%esp) movl %edi, 12(%esp) movl %ebp, (%esp) movl %ebx, 4(%esp) call ___umoddi3 // 1 call modulo orl %eax, %edx jne l26
msvc failed reuse result of div
. optimization broken return
. try these code:
__declspec(noinline) int is_prime_a(unsigned int n) { unsigned int p; int ret = -1; if (!(n & 1) || n < 2) return n == 2; /* comparing p*p <= n can overflow */ p = 1; { p += 2; if (p >= n / p) ret = 1; /* let's return latter outside loop. */ if (!(n % p)) ret = 0; } while (ret < 0); return ret; } __declspec(noinline) int is_prime_b(unsigned int n) { unsigned int p; if (!(n & 1) || n < 2) return n == 2; /* comparing p*p <= n can overflow */ p = 1; { p += 2; if (p > n / p) return 1; /* common routine. */ if (!(n % p)) return 0; } while (1); }
the is_prime_b
twice slower is_prime_a
on msvc / icc windows.
Comments
Post a Comment