## Other

Message boards : Generalized Cullen/Woodall prime search : How to speed up (some) GCW units by 10%+

serge

Joined: 21 Jun 12
Posts: 110
ID: 144858
Credit: 193,629,962
RAC: 5

Message 107873 - Posted: 13 May 2017 | 2:09:26 UTC

Nothing of what I am about to write is new. Or difficult.
And it has already been discussed multiple times.

Observe that some of the chosen bases for this particular sub-project happen to be squares.
25 = 5^2, 49 = 7^2, 121 = 11^2. Some people even know that this was a deliberate choice. I've been waiting for a workunit to arrive for one of these bases for a while, and now I have one.

The candidate is 754806*121^754806+1 which can also be easily regrouped as 754806*11^1509612+1.

Let's compare:

/home/serge/NumTheory/GCW> llr -d d.npg
Base prime factor(s) taken : 11
Starting N-1 prime test of 754806*121^754806+1
Using zero-padded AVX FFT length 720K, Pass1=320, Pass2=2304, a = 3
754806*121^754806+1, bit: 70000 / 5222413 [1.34%]. Time per bit: 5.654 ms.

/home/serge/NumTheory/GCW/2> llr -d d2.npg
Base prime factor(s) taken : 11
Starting N-1 prime test of 754806*11^1509612+1
Using zero-padded AVX FFT length 640K, Pass1=640, Pass2=1K, a = 3
754806*11^1509612+1, bit: 40000 / 5222416 [0.76%]. Time per bit: 4.698 ms.

That's 20% faster. (Results are similar for AVX2.)
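
A quick check of that figure, as a minimal sketch that uses only the two per-bit timings from the logs above (the variable names are illustrative):

```python
# Per-bit timings copied from the two LLR runs above (milliseconds).
t_base121 = 5.654  # 754806*121^754806+1, FFT length 720K
t_base11 = 4.698   # 754806*11^1509612+1, FFT length 640K

# Throughput gain: how much more work per unit time the b=11 form does.
throughput_gain = t_base121 / t_base11 - 1

# Wall-clock saving: how much shorter the whole test becomes.
time_saving = 1 - t_base11 / t_base121

print(f"throughput gain: {throughput_gain:.1%}")
print(f"time saving:     {time_saving:.1%}")
```

So "20% faster" is the throughput figure; the test itself finishes in roughly 17% less time, assuming the per-bit timings hold for the whole run.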

Why is the server sending this workunit as 754806*121^754806+1?
Isn't it trivial, server-side, to fetch the candidate from the database and, when b=x^2, send it to the client not as
100000000000000:P:1:b:1
n n

but as
100000000000000:P:1:x:1
n 2n

In this particular case:
100000000000000:P:1:11:1
754806 1509612

Too hard to implement?
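
A minimal sketch of that rewrite, assuming the colon-separated header carries the base in its fourth field and each candidate line is "k n", as in the example above. The function name is hypothetical and this is not PrimeGrid's actual server code:

```python
import math


def rewrite_square_base(header: str, candidate: str) -> tuple[str, str]:
    """Rewrite a (header, "k n") pair so a square base b = x^2 becomes x.

    Assumes the base is the fourth colon-separated header field, as in
    "100000000000000:P:1:121:1". Inputs are returned unchanged when the
    base is not a perfect square.
    """
    fields = header.split(":")
    b = int(fields[3])
    x = math.isqrt(b)
    if x * x != b:
        return header, candidate  # nothing to simplify

    k, n = (int(t) for t in candidate.split())
    fields[3] = str(x)
    # k * (x^2)^n == k * x^(2n): k is untouched, only the exponent doubles.
    return ":".join(fields), f"{k} {2 * n}"


print(rewrite_square_base("100000000000000:P:1:121:1", "754806 754806"))
# -> ('100000000000000:P:1:11:1', '754806 1509612')
```

The sketch takes only one square root, which is enough for b = 25, 49 and 121 because their roots are prime. A base such as 16 would need the step repeated.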

Michael Goetz
Volunteer moderator
Project scientist

Joined: 21 Jan 10
Posts: 11085
ID: 53948
Credit: 141,007,729
RAC: 127,012

Message 107874 - Posted: 13 May 2017 | 2:28:00 UTC

Thanks, Serge.

Do you know why LLR chooses different FFT sizes for the same number? I was not aware it would do that.
____________

My lucky number is 75898^524288+1

serge

Joined: 21 Jun 12
Posts: 110
ID: 144858
Credit: 193,629,962
RAC: 5

Message 107876 - Posted: 13 May 2017 | 4:32:55 UTC - in response to Message 107874.

For LLR it is not the number that matters, but the (k,b,n,c) form, and b is taken by it verbatim, as given.

I am not ready to go into a very deep explanation, but I will try to give an approximation of one (not meant to be taken as exact). With b=11, it is possible to form an array of length 640K where each element is a quasi-digit ("limb") in an unusual representation: some digits' weights are perhaps 11^6 and some digits' weights are 11^7, or something like that. (Off the top of my head, what I remember is that each limb on average is limited to keeping ~30 bits of information, or something like that.) Only powers of b can be used as limb weights.

In contrast, if the number is entered with b=121, the program can only work with limbs of, say, 121^3 and 121^2. (The larger the b, the fewer opportunities it has to pack.) For that reason it ponders the array of length 640K and thinks, "nah, some elements will be too large; gotta go for the next FFT size", and so it does.

Long story short, using the simplest possible (k,b,n,c) (with b as low as possible) will lend more possibilities for more dense FFT arrays. If not only b=121, but also k is divisible by 11, the FFT size may be even smaller if (k,121,n,c) is transformed into (k/11,11,2*n+1,c).

As an aside, yes, it would be nice if LLR did it all itself, but as the timing test (shown earlier) demonstrates, it doesn't. But here's where we can help LLR and do the transformation externally, server-side. The transformation logic is quite straightforward.
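
The k simplification mentioned above, (k,121,n,c) into (k/11,11,2*n+1,c), can be sketched as follows. The function names and the small candidate are made up for illustration; exact Python integers confirm that both forms denote the same number:

```python
def reduce_k(k: int, b: int, n: int) -> tuple[int, int]:
    """Move factors of b out of k into the exponent: k*b^n == (k/b)*b^(n+1).

    Assumes k >= 1 and b >= 2, otherwise the loop would not terminate.
    """
    while k % b == 0:
        k //= b
        n += 1
    return k, n


def value(k: int, b: int, n: int, c: int) -> int:
    """Evaluate k*b^n + c exactly."""
    return k * b**n + c


# Hypothetical small candidate with b = 121 and k divisible by 11.
k, n, c = 22, 10, 1
# First rewrite 121^n as 11^(2n), then pull the factor 11 out of k.
k2, n2 = reduce_k(k, 11, 2 * n)
print((k2, 11, n2, c))  # -> (2, 11, 21, 1), i.e. (k/11, 11, 2*n+1, c)
print(value(k, 121, n, c) == value(k2, 11, n2, c))  # -> True
```

Whether the reduced form actually gets a smaller FFT is up to LLR; the sketch only shows that the rewrite preserves the number being tested.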

Michael Goetz
Volunteer moderator
Project scientist

Joined: 21 Jan 10
Posts: 11085
ID: 53948
Credit: 141,007,729
RAC: 127,012

Message 107877 - Posted: 13 May 2017 | 4:53:35 UTC - in response to Message 107876.

Got it, thanks for the explanation. That makes sense.
____________

My lucky number is 75898^524288+1

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 1952
ID: 29980
Credit: 258,241,957
RAC: 75,164

Message 107879 - Posted: 13 May 2017 | 7:50:09 UTC

Maybe propose this optimisation as a feature request for a future LLR release, if that hasn't already been done?

JeppeSN

Joined: 5 Apr 14
Posts: 701
ID: 306875
Credit: 8,975,207
RAC: 8,495

Message 107882 - Posted: 13 May 2017 | 9:00:33 UTC - in response to Message 107873.

Observe that some of the chosen bases for this particular sub-project happen to be squares.
25 = 5^2, 49 = 7^2, 121 = 11^2. Some people even know that this was a deliberate choice.

25, 49, 121; that is all the prime squares in the range 13 ≤ b ≤ 121. I wonder if there is some easy reason why n*b^n + 1 is more often composite when b is a perfect square. Does sieving remove a larger fraction, so that the expected occurrence of primes is lower for these b values?

"Deliberate choice"? I thought these b values were chosen simply because they were the smallest b for which no known n with n>b-2 gives a prime n*b^n + 1?

/JeppeSN

Addition: I checked on Steven Harvey's page on GC, and for all prime square b among 121, 169, 289, 361, 529, 841, 961, 1369, 1681, 1849, 2209, 2809, 3481, 3721, 4489, 5041, 5329, 6241, 6889, 7921, 9409, the only time an n is known that satisfies n>b-2 is for b=5041 where:

8398*5041^8398 + 1 = 8398*71^16796 + 1

is a prime.

Michael Goetz
Volunteer moderator
Project scientist

Joined: 21 Jan 10
Posts: 11085
ID: 53948
Credit: 141,007,729
RAC: 127,012

Message 107885 - Posted: 13 May 2017 | 12:29:07 UTC

Serge, thanks for the optimization tip. It's appreciated!

By the way, we (and by "we", I mean "Jim") have recomputed many of the FFT sizes for the candidates. You don't always get a reduced FFT size when you use the square root of b, but you do for the vast majority of candidates. For the other bases, dividing k by b one or more times hasn't reduced the FFT size once yet.
____________

My lucky number is 75898^524288+1

serge

Joined: 21 Jun 12
Posts: 110
ID: 144858
Credit: 193,629,962
RAC: 5

Message 107890 - Posted: 13 May 2017 | 16:15:55 UTC - in response to Message 107885.

It seems that in your client-server setup the ideal place to put the reformatter would be the primegrid_llr_wrapper. It would keep the initial task parameters, reformat for (c)llr, get the result back from (c)llr, and report back to the server as initially requested. Then the database, the server and the accounting code would be unchanged.

primegrid_llr_wrapper for now can do only:
▪ square simplification,
▪ k simplification

Later, it can be extended to recognize b being any power. See here -

Curiously, these numbers may be hard to recognize when written in standard form (emphasis mine).

For example, they may be like
18740*3^168662-1
which could be written
168660*3^168660-1.

More difficult to spot are those like the following:

9750*7^29250-1 = 9750*7^(3*9750)-1 = 9750*343^9750-1
8511*2^374486-1 = (8511*2^2)*2^(11*8511*4)-1 = 34044*2048^34044-1.

This is in fact how the GCWs for 25, 49, 121 will end up showing in UTM lists. (And this is how GW for b=4 looks, indeed.)
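
Spotting such hidden forms can be automated. A minimal sketch, with a made-up function name: given k*b^n, it looks for every m and e such that k*b^n = m*(b^e)^m, which is the generalized Cullen/Woodall shape with base b^e. The sign of the ±1 term plays no role in the search:

```python
def hidden_gcw_forms(k: int, b: int, n: int) -> list[tuple[int, int]]:
    """Return all (m, e) with e >= 1 and k*b^n == m*(b^e)^m.

    Assumes k >= 1, b >= 2, n >= 1.
    """
    # Strip factors of b from k first, so that every valid m has the
    # form k*b^j with j >= 0.
    while k % b == 0:
        k //= b
        n += 1

    forms = []
    m, j = k, 0
    # Shift j factors of b from the exponent into the multiplier; the
    # remaining exponent n - j must be a positive multiple of m.
    while m <= n - j:
        if (n - j) % m == 0:
            forms.append((m, (n - j) // m))
        m *= b
        j += 1
    return forms


print(hidden_gcw_forms(18740, 3, 168662))  # -> [(168660, 1)]
print(hidden_gcw_forms(9750, 7, 29250))    # -> [(9750, 3)]
print(hidden_gcw_forms(8511, 2, 374486))   # -> [(34044, 11)]
```

The three results correspond to 168660*3^168660-1, 9750*343^9750-1 (343 = 7^3) and 34044*2048^34044-1 (2048 = 2^11), matching the examples quoted above.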

Michael Goetz
Volunteer moderator
Project scientist

Joined: 21 Jan 10
Posts: 11085
ID: 53948
Credit: 141,007,729
RAC: 127,012

Message 107891 - Posted: 13 May 2017 | 17:06:52 UTC - in response to Message 107890.

It seems that in your client-server setup the ideal place to put the reformatter would be the primegrid_llr_wrapper.

That's not the preferred place for the change, but we're still evaluating options.
____________

My lucky number is 75898^524288+1

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 278
ID: 16874
Credit: 25,359,431
RAC: 8,401

Message 107907 - Posted: 14 May 2017 | 16:11:42 UTC

I could've sworn that LLR itself does the normalizing of the bases (perhaps only for power of 2?). This feature needs to be in LLR itself, tbh.

1. Normalize b if it is a power.
2. Normalize k if b divides k.

I would guess that it is a trivial change in LLR (except for printing output -- where it is arguably important to use the unnormalized values).
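
Step 1 could look like this outside LLR. This is a minimal sketch with made-up function names, not LLR code; it uses integer arithmetic only, so it stays exact for any base size. Step 2 is the usual divide-while-divisible loop on k.

```python
def integer_root(b: int, e: int) -> int:
    """Largest r with r**e <= b, found by binary search on integers (b >= 1)."""
    lo, hi = 1, 1 << (b.bit_length() // e + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**e <= b:
            lo = mid
        else:
            hi = mid - 1
    return lo


def normalize_base(b: int, n: int) -> tuple[int, int]:
    """Rewrite b^n as r^(n*e), where r is the smallest root of b (b >= 2)."""
    # Trying the largest exponent first guarantees r is not itself a power.
    for e in range(b.bit_length(), 1, -1):
        r = integer_root(b, e)
        if r > 1 and r**e == b:
            return r, n * e
    return b, n  # b is not a perfect power


print(normalize_base(121, 754806))  # -> (11, 1509612)
print(normalize_base(1024, 10007))  # -> (2, 100070)
print(normalize_base(13, 5))        # -> (13, 5)
```

As noted, any output shown to the user would arguably still want the original (k,b,n,c), so a caller would keep both the original and the normalized values.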

serge

Joined: 21 Jun 12
Posts: 110
ID: 144858
Credit: 193,629,962
RAC: 5

Message 107909 - Posted: 14 May 2017 | 17:11:18 UTC - in response to Message 107907.

Maybe LLR's philosophy is "the client is always right!"
I.e.: If the input file calls for a test of a specific FFT or a "specific arrangement of bits", then that's what it will run (even if slower, because "this is the test that was ordered").

But it indeed doesn't follow this rule for powers of 2.

-bash-4.2$ llr -d -q"27*1024^10007+1"
Starting Proth prime test of 27*2^100070+1
Using all-complex FMA3 FFT length 10K, Pass1=128, Pass2=80, a = 11
27*2^100070+1 is not prime. Proth RES64: C14E6737D2E78E5E Time : 5.261 sec.

-bash-4.2$ llr -d -q"28*729^10007+1"
Base prime factor(s) taken : 3
Starting N-1 prime test of 28*729^10007+1
Using all-complex FMA3 FFT length 10K, Pass1=128, Pass2=80, a = 3
28*729^10007+1 is not prime. RES64: E83080E955E9B281. OLD64: B89182BC01BD1780 Time : 4.888 sec.

-bash-4.2$ llr -d -q"28*10000^10007+1"
Base factorized as : 2^4*5^4
Base prime factor(s) taken : 5
Starting N-1 prime test of 28*10000^10007+1
Using all-complex FMA3 FFT length 18K, Pass1=384, Pass2=48, a = 3
28*10000^10007+1 is not prime. RES64: 59CCA66A39ED54C4. OLD64: 0D65F33EADC7FE48 Time : 13.645 sec.

(and of course it is fully equipped to normalize the base, as a side effect of factoring the base for the purposes of the N-1 mechanics.)

PFGW does what it is ordered by the input file, too.

JeppeSN

Joined: 5 Apr 14
Posts: 701
ID: 306875
Credit: 8,975,207
RAC: 8,495

Message 107910 - Posted: 14 May 2017 | 17:53:52 UTC - in response to Message 107909.

And GeneFer seems to do different things, not normalizing or de-normalizing:

.\genefer_windows64.exe -q "6^8388608+1"
.\genefer_windows64.exe -q "36^4194304+1"
.\genefer_windows64.exe -q "1296^2097152+1"
.\genefer_windows64.exe -q "1679616^1048576+1"

Even though the first form (where b=6 is not a square) is "canonical" and the one you would expect to see on Top 5000, it is not clear which form would actually be fastest.

Testing 6^8388608+1... 21684224 steps to go (1849:28:44 remaining)

Testing 36^4194304+1... 21684224 steps to go (747:08:06 remaining)

Testing 1296^2097152+1... 21684224 steps to go (367:41:08 remaining)

Testing 1679616^1048576+1... 21684224 steps to go (160:41:45 remaining)
Estimated time remaining for 1679616^1048576+1 is 1716:50:53

(the last one switches to the x87 (80-bit) transform).

/JeppeSN

Michael Goetz
Volunteer moderator
Project scientist

Joined: 21 Jan 10
Posts: 11085
ID: 53948
Credit: 141,007,729
RAC: 127,012

Message 107912 - Posted: 14 May 2017 | 19:20:37 UTC

Genefer and LLR/PFGW are completely different. Genefer doesn't normalize anything (although I suppose it could.)

With regard to LLR doing some normalizations but not others, does anyone know if that's LLR's code or gwnum's code?
____________

My lucky number is 75898^524288+1

composite
Volunteer tester

Joined: 16 Feb 10
Posts: 572
ID: 55391
Credit: 447,344,855
RAC: 196,213

Message 107936 - Posted: 16 May 2017 | 5:09:50 UTC

If you let LLR do the normalization, it will be impossible to rerun serge's benchmark comparison. But once is enough to prove a point.

KEP

Joined: 10 Aug 05
Posts: 238
ID: 110
Credit: 1,918,174
RAC: 0

Message 107998 - Posted: 18 May 2017 | 15:39:39 UTC - in response to Message 107907.

I could've sworn that LLR itself does the normalizing of the bases (perhaps only for power of 2?). This feature needs to be in LLR itself, tbh.

This appears also not to be the case. Currently, I've seen reductions in overall testing times ranging from 9% to 33%, dependent on FFT length and whether I'm on my Sandy Bridge or Haswell. So it appears that LLR is also not normalizing bases that are powers of 2, but in fact still tests k*16^n+/-1 as a base 16 number and not as k*2^(n*4)+/-1, even though the screen shows that k*2^(n*4)+/-1 is being tested.

To sum up, at least on my system, there can be up to a 33% reduction in testing time per k*b^n+/-1 test by normalizing the test, if b is a power of a smaller base, to the smallest possible base.

Just my 2 cents, take care :)

Regards

KEP

Michael Goetz
Volunteer moderator
Project scientist

Joined: 21 Jan 10
Posts: 11085
ID: 53948
Credit: 141,007,729
RAC: 127,012

Message 109348 - Posted: 14 Aug 2017 | 13:29:21 UTC - in response to Message 107873.

The candidate is 754806*121^754806+1 which can also be easily regrouped as 754806*11^1509612+1.

Let's compare:
/home/serge/NumTheory/GCW> llr -d d.npg
Base prime factor(s) taken : 11
Starting N-1 prime test of 754806*121^754806+1
Using zero-padded AVX FFT length 720K, Pass1=320, Pass2=2304, a = 3
754806*121^754806+1, bit: 70000 / 5222413 [1.34%]. Time per bit: 5.654 ms.

/home/serge/NumTheory/GCW/2> llr -d d2.npg
Base prime factor(s) taken : 11
Starting N-1 prime test of 754806*11^1509612+1
Using zero-padded AVX FFT length 640K, Pass1=640, Pass2=1K, a = 3
754806*11^1509612+1, bit: 40000 / 5222416 [0.76%]. Time per bit: 4.698 ms.

That's 20% faster. (Results are similar for AVX2.)

Why is the server sending this workunit task as 754806*121^754806+1 ?
Isn't it trivial[?]

You would be surprised at how utterly non-trivial it turned out to be. But it is done. Thanks for pushing us along in the right direction.
____________

My lucky number is 75898^524288+1

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1697
ID: 352
Credit: 2,086,144,169
RAC: 847,040

Message 109353 - Posted: 14 Aug 2017 | 20:08:30 UTC

So, 3 of 14 bases will be 20% faster?
About 4% overall speed-up for GCW LLR?
____________
My stats
Badge score: 1*1 + 5*2 + 8*10 + 9*5 + 12*3 = 172

Michael Goetz
Volunteer moderator
Project scientist

Joined: 21 Jan 10
Posts: 11085
ID: 53948
Credit: 141,007,729
RAC: 127,012

Message 109355 - Posted: 14 Aug 2017 | 20:52:19 UTC - in response to Message 109353.

So, 3 of 14 bases will be 20% faster?
About 4% overall speed-up for GCW LLR?

Something like that, yes.
____________

My lucky number is 75898^524288+1

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 278
ID: 16874
Credit: 25,359,431
RAC: 8,401

Message 109381 - Posted: 16 Aug 2017 | 7:53:38 UTC - in response to Message 109348.

You would be surprised at how utterly non-trivial it turned out to be. But it is done. Thanks for pushing us along in the right direction.

Is the base the only thing normalized or do you normalize k as well (the latter is applicable for all the bases, not just the square ones)?

Michael Goetz
Volunteer moderator
Project scientist

Joined: 21 Jan 10
Posts: 11085
ID: 53948
Credit: 141,007,729
RAC: 127,012

Message 109382 - Posted: 16 Aug 2017 | 11:02:29 UTC - in response to Message 109381.

You would be surprised at how utterly non-trivial it turned out to be. But it is done. Thanks for pushing us along in the right direction.

Is the base the only thing normalized or do you normalize k as well (the latter is applicable for all the bases, not just the square ones)?

Just the base. In our tests there was no advantage to normalizing k.
____________

My lucky number is 75898^524288+1

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 278
ID: 16874
Credit: 25,359,431
RAC: 8,401

Message 109387 - Posted: 16 Aug 2017 | 14:39:55 UTC - in response to Message 109382.

Just the base. In our tests there was no advantage to normalizing k.

Hmmm... That was ... unexpected! Can you give me the set of (n,b) numbers used to test this? I am assuming that you used LLR's setup feature to get the FFTs?

JimB
Volunteer moderator
Project developer

Joined: 4 Aug 11
Posts: 761
ID: 107307
Credit: 598,042,098
RAC: 553,680

Message 109390 - Posted: 16 Aug 2017 | 23:46:46 UTC

Speaking as the person who made the code changes, we are in fact reducing k for all bases. I was supposed to remove that code, but chose to leave it in. I neglected to tell Mike about it until now. My real life is a bit busy at the moment, so sometimes I'm forgetting things like that.

while ($k % $b == 0) { $k /= $b; $n++; }

Michael Millerick
Volunteer tester

Joined: 4 Feb 09
Posts: 621
ID: 35074
Credit: 122,983,688
RAC: 183

Message 109403 - Posted: 18 Aug 2017 | 2:54:20 UTC

Maximizing the return the challenge will have. Excellent!
____________

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 278
ID: 16874
Credit: 25,359,431
RAC: 8,401

Message 109405 - Posted: 18 Aug 2017 | 4:35:18 UTC - in response to Message 109390.

Speaking as the person who made the code changes, we are in fact reducing k for all bases. I was supposed to remove that code, but chose to leave it in. I neglected to tell Mike about it until now. My real life is a bit busy at the moment, so sometimes I'm forgetting things like that.

while ($k % $b == 0) { $k /= $b; $n++; }

LOL! Well it doesn't hurt. But I replicated the result, and Mike's right -- there is no need to normalize the k, since apparently LLR (or perhaps gwnum library) is doing it. I can see that when k is a multiple of base, it chooses a lower FFT (compared to adjacent k's), even without explicit normalizing.
Sorry about that -- I should've done my homework before posting about it.

composite
Volunteer tester

Joined: 16 Feb 10
Posts: 572
ID: 55391
Credit: 447,344,855
RAC: 196,213

Message 109411 - Posted: 18 Aug 2017 | 23:22:01 UTC - in response to Message 109405.

Hmm, a process akin to normalization could be responsible for the WTF effect, which is using a timing sidechannel during sieving to "discover" small primes in the blocking factor. So far there is no other explanation for that weirdness.

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1697
ID: 352
Credit: 2,086,144,169
RAC: 847,040

Message 109548 - Posted: 24 Aug 2017 | 7:31:14 UTC

While the discussed and implemented trick with b=25,49,121 brings about a 4% speed-up, the recently found prime makes GCW yet another 7% faster on top of that. Nice!
____________
My stats
Badge score: 1*1 + 5*2 + 8*10 + 9*5 + 12*3 = 172
