c - Conditional tests in primality by trial division -


my question conditional test in trial division. there seems debate on conditional test employ. let's @ code rosettacode.

int is_prime(unsigned int n) {     unsigned int p;     if (!(n & 1) || n < 2 ) return n == 2;      /* comparing p*p <= n can overflow */     (p = 3; p <= n/p; p += 2)         if (!(n % p)) return 0;     return 1; } 

wheel factorization or using predetermined list of primes not change essence of question.

there 3 cases can think of conditional test:

  1. p<=n/p
  2. p*p<=n
  3. int cut = sqrt(n); (p = 3; p <= cut; p += 2)

case 1: works n has division every iteration (edit: not need division it's still slower. i'm not sure why. see assembly output below). have found twice slow case 2 large values of n prime (on sandy bridge system).

case 2: signficantly faster case 1 has problem overflows large n , goes infinitive loop. max value can handle

(sqrt(n) + c)^2 = int_max  //solve n = int_max -2*c*sqrt(int_max) + c^2 //int_max = 2^32 -> n = 2^32 - c*s^17 + c^2; in our case c = 2 

for example uint64_t case 2 go infinite loop x =-1l-58 (x^64-59) prime.

case 3: no division or multiplication has done every iteration , not overflow case 2. it's faster case 2 well. question if sqrt(n) accurate enough.

can explain me why case 2 faster case 1? case 1 not use division though despite it's still lot slower.

here times prime 2^56-5;

case 1 9.0s case 2 4.6s case 3 4.5s 

here code used test http://coliru.stacked-crooked.com/a/69497863a97d8953. added functions end of question.

here assembly output form gcc 4.8 -o3 case 1 , case 2. both have 1 division. case 2 has multiplication first guess case 2 slower it's twice fast on both gcc , msvc. don't know why.

case 1:

.l5:   testl %edx, %edx   je  .l8 .l4:   addl  $2, %ecx   xorl  %edx, %edx   movl  %edi, %eax   divl  %ecx   cmpl  %ecx, %eax   jae .l5 

case 2:

.l20:   xorl  %edx, %edx   movl  %edi, %eax   divl  %ecx   testl %edx, %edx   je  .l23 .l19:   addl  $2, %ecx   movl  %ecx, %eax   imull %ecx, %eax   cmpl  %eax, %edi   jae .l20 

here functions i'm using test time:

int is_prime(uint64_t n) {     uint64_t p;     if (!(n & 1) || n < 2 ) return n == 2;      /* comparing p*p <= n can overflow */     (p = 3; p <= n/p; p += 2)         if (!(n % p)) return 0;     return 1; }  int is_prime2(uint64_t n) {     uint64_t p;     if (!(n & 1) || n < 2 ) return n == 2;      /* comparing p*p <= n can overflow */     (p = 3; p*p <= n; p += 2)         if (!(n % p)) return 0;     return 1; }  int is_prime3(uint64_t n) {     uint64_t p;     if (!(n & 1) || n < 2 ) return n == 2;      /* comparing p*p <= n can overflow */     uint32_t cut = sqrt(n);     (p = 3; p <= cut; p += 2)         if (!(n % p)) return 0;     return 1; } 

added content after bounty.

aean discovered in case 1 saving quotient remainder fast (or faster) case 2. let's call case 4. following following code twice fast case 1.

int is_prime4(uint64_t n) {     uint64_t p, q, r;     if (!(n & 1) || n < 2 ) return n == 2;      (p = 3, q=n/p, r=n%p; p <= q; p += 2, q = n/p, r=n%p)         if (!r) return 0;     return 1; } 

i'm not sure why helps. in case there no need use case 2 anymore. case 3, versions of sqrt function in hardware or software perfect squares right it's safe use in general. case 3 case work openmp.

upd: compiler optimization issue, obviously. while mingw used 1 div instruction in loop body, both gcc on linux , msvc failed reuse quotient previous iteration.

i think best explicitly define quo , rem , calculate them in same basic instruction block, show compiler want both quotient , remainder.

int is_prime(uint64_t n) {     uint64_t p = 3, quo, rem;     if (!(n & 1) || n < 2) return n == 2;      quo = n / p;     (; p <= quo; p += 2){         quo = n / p; rem = n % p;         if (!(rem)) return 0;     }     return 1; } 

i tried code http://coliru.stacked-crooked.com/a/69497863a97d8953 on mingw-w64 compiler, case 1 faster case 2.

enter image description here

so guess compiling targeted 32-bit architecture , used uint64_t type. assembly shows doesn't use 64-bit register.

if got right, there reason.

on 32-bit architecture, 64-bit numbers represented in 2 32-bit registers, compiler concatenation works. it's simple 64-bit addition, subtraction , multiplication. modulo , division done small function call named ___umoddi3 , ___udivdi3 in gcc, aullrem , aulldiv in msvc.

so need 1 ___umoddi3 , 1 ___udivdi3 each iteration in case 1, 1 ___udivdi3 , 1 concatenation of 64-bit multiplication in case 2. that's why case 1 seems twice slower case 2 in test.

what in case 1:

l5:     addl    $2, %esi     adcl    $0, %edi     movl    %esi, 8(%esp)     movl    %edi, 12(%esp)     movl    %ebx, (%esp)     movl    %ebp, 4(%esp)     call    ___udivdi3         // call div     cmpl    %edi, %edx     ja  l6     jae l21 l6:     movl    %esi, 8(%esp)     movl    %edi, 12(%esp)     movl    %ebx, (%esp)     movl    %ebp, 4(%esp)     call    ___umoddi3        // call modulo.     orl %eax, %edx     jne l5 

what in case 2:

l26:     addl    $2, %esi     adcl    $0, %edi     movl    %esi, %eax     movl    %edi, %ecx     imull   %esi, %ecx     mull    %esi     addl    %ecx, %ecx     addl    %ecx, %edx     cmpl    %edx, %ebx     ja  l27     jae l41 l27:     movl    %esi, 8(%esp)     movl    %edi, 12(%esp)     movl    %ebp, (%esp)     movl    %ebx, 4(%esp)     call    ___umoddi3         // 1 call modulo     orl %eax, %edx     jne l26 

msvc failed reuse result of div. optimization broken return. try these code:

__declspec(noinline) int is_prime_a(unsigned int n) {     unsigned int p;     int ret = -1;     if (!(n & 1) || n < 2) return n == 2;      /* comparing p*p <= n can overflow */     p = 1;     {         p += 2;         if (p >= n / p) ret = 1; /* let's return latter outside loop. */         if (!(n % p)) ret = 0;     } while (ret < 0);     return ret; }  __declspec(noinline) int is_prime_b(unsigned int n) {     unsigned int p;     if (!(n & 1) || n < 2) return n == 2;      /* comparing p*p <= n can overflow */     p = 1;     {         p += 2;         if (p > n / p) return 1; /* common routine. */         if (!(n % p)) return 0;     } while (1); } 

the is_prime_b twice slower is_prime_a on msvc / icc windows.


Comments

Popular posts from this blog

javascript - Count length of each class -

What design pattern is this code in Javascript? -

hadoop - Restrict secondarynamenode to be installed and run on any other node in the cluster -