
Message boards : Sieving : Reminder: GFN15 sieving hits optimal level at p=26000P

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 914
ID: 107307
Credit: 974,118,817
RAC: 338

Message 102596 - Posted: 22 Dec 2016 | 23:43:12 UTC

I don't remember if I ever mentioned this publicly, but my calculations put the optimal sieving level for GFN15 at p=26000P. As I write this, we're reserved through 25023P. Sieving for GFN16, 17, 21 and 22 will still be available. At some point in the future, I'll probably open up GFN18, 19 and 20 for further sieving.

While other n's give less credit per 1P sieved, they all give more or less the same credit per hour of sieving. Sieving higher n's is faster, so the credit per 1P is lower.

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 857
ID: 370496
Credit: 308,957,976
RAC: 67,503

Message 102597 - Posted: 22 Dec 2016 | 23:59:38 UTC - in response to Message 102596.

I don't remember if I ever mentioned this publicly, but my calculations put the optimal sieving level for GFN15 at p=26000P. As I write this, we're reserved through 25023P. Sieving for GFN16, 17, 21 and 22 will still be available. At some point in the future, I'll probably open up GFN18, 19 and 20 for further sieving.

While other n's give less credit per 1P sieved, they all give more or less the same credit per hour of sieving. Sieving higher n's is faster, so the credit per 1P is lower.

Could you elaborate on "optimal depth"? I really wonder what that means... well, I get the part where "optimal sieve is when you remove candidates faster by PRP testing than sieving", but that's kinda vague. See, different hardware is sure to have different depths, as one might run GFN faster / slower when normalizing per sieve speed. Speaking of sieving, what about n=15 and strong GPUs that take multiple instances of the program running (and therefore multiple CPU cores that could be used for PRP testing instead)? And is this optimal point considering the full b=400.000 range, or more of a "we only expect to go up until 150k in the near future" type of calculation? And perhaps even something else I'm forgetting right now...

I'm really curious as to how you / PG sees it.

stream
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 1 Mar 14
Posts: 686
ID: 301928
Credit: 472,276,418
RAC: 168,108

Message 102628 - Posted: 23 Dec 2016 | 14:45:32 UTC - in response to Message 102597.

Could you elaborate on "optimal depth"? I really wonder what that means... well, I get the part where "optimal sieve is when you remove candidates faster by PRP testing than sieving", but that's kinda vague. See, different hardware is sure to have different depths, as one might run GFN faster / slower when normalizing per sieve speed.

Correct. This applies to all kinds of sieves, not only GFN. Even simple LLR sieving may give a completely different optimal depth for SSE2 and AVX2 CPUs, because AVX2 gives a significant boost to LLR but not to sieving (common sieving programs are optimized for SSE2 only). So you should read the statement as something like "on the set of machines which we consider most common, or just have nearby, the average factor removal ratio is somewhere near 1.0". Some PCs may consider it oversieved, some undersieved.

Speaking of sieve, what about n=15 and strong GPUs that take multiple instances of the program running (and therefore, multiple CPU cores that could be used for PRP testing instead)? And is this optimal point considering the full b=400.000 range, or more of a "we only expect to go up until 150k in the near future" type of calculation? And perhaps even something else I'm forgetting right now...

I'm really curious as to how you / PG sees it.

I have a fairly complex Excel file which calculates the optimal depth for b=400,000 based on:
- Sieving speed for a given core (P/day);
- Genefer speed for GPU and CPU;
- Number of CPU cores required to feed the GPU;
- Number of factors removed per 100P;
and maybe something else which I don't remember right now. I think Jim has something similar for his setup. The idea is always the same: find how many useful candidates per day could be removed by sieving, and how many could be processed by Genefer using the GPU and a given number of CPU cores, then compare those numbers.

It's not possible to find an exact GFN optimal sieving depth which will suit everybody, because it depends greatly on which generation of GPU is used. E.g. on my 750 Ti, which is not so fast in Genefer but quite good for sieving, the current ratio is still about 1.6. On faster GPUs, it'll be closer to the optimal 1.0. Since GFN-15 uses a lot of CPU time, it also may shift depending on how many CPU cores you're ready to throw at it... so, it's all complicated.
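The spreadsheet logic described here can be sketched in a few lines. All the figures below are illustrative placeholders, not real PrimeGrid measurements:

```python
# Sketch of the sieve-vs-test comparison from the spreadsheet stream
# describes. All numbers below are illustrative placeholders.

def removal_ratio(sieve_p_per_day, removals_per_p, genefer_tests_per_day):
    """Candidates removed per day by sieving, divided by candidates the
    same hardware could primality-test per day. ~1.0 means optimal depth;
    above 1.0 the sieve is still the better use of the machine."""
    return (sieve_p_per_day * removals_per_p) / genefer_tests_per_day

# Hypothetical rig: sieves 1.0 P/day, removes 25 candidates per P,
# and could otherwise run 20 genefer tests per day.
print(f"{removal_ratio(1.0, 25, 20):.2f}")  # 1.25 -> keep sieving
```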

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 914
ID: 107307
Credit: 974,118,817
RAC: 338

Message 102631 - Posted: 23 Dec 2016 | 15:25:48 UTC

I really wish people would just take my word for it once in a while... (because it took me three hours to write this).

The optimal sieving calculation is really not all that complex. I have a first-generation Titan card and any timings I'm giving involve running on that card. I'm not really interested in what other cards can do because I have no intention of building some huge model to take everything into account.

On my card, it takes 100 seconds to test (with genefer) the largest GFN15 candidate below b=400M. By using that timing, rather than the 80 seconds it takes for current leading edge candidates, my optimal sieving level is, if anything, conservative. In other words the optimal level is higher using the longer testing time. I'm going to double those times to 200 and 160 seconds to account for the fact that each genefer test is run twice, whereas each sieving range is run once.

Sieving produces a list of factors and the candidate each factor refers to. How many factors are found doesn't matter for this calculation. What matters is how many candidates are removed from the sieve file. That's because a lot of candidates have already been removed by previous sieving. In fact at the moment in GFN15 only about 14.76% of the factors found result in removals from the sieve. Those numbers are available in the sieving statistics page. I'm specifically looking at the 160P 23931P-24091P range because larger ranges give more accurate numbers. Plus it straddles 24000P, which we'll get to below.

GFN15 and GFN16 sieving cover 1&lt;b&lt;2G. While we currently can't make use of anything over 400M, those extra factors are "free": the algorithm calculates them in any case, so we might as well collect them. The sieving stats table for GFN15 only reports data for b&lt;400M, as genefer currently cannot test beyond that. But that number of factors needs to be adjusted for where we are in genefer testing of GFN15 in BOINC, because factors for candidates we've already tested aren't worth anything. The leading edge of Genefer testing for n=15 is at about 42.2M, so we're really only interested in factors for b=42.2M-400M. That's 357.8M, or 89.45% of the full 400M range. So the 4447 factors resulting in removals become 4447*.8945=3978 usable factors. Divide that by the 160P range length and you get 24.86 removals per 1P of sieving on the 23931P-24091P range.

On my card, sieving 1P of GFN15 takes 76 minutes or 4560 seconds. On that 23931P-24091P range there were 24.86 candidates removed per 1P tested. But in 4560 seconds I can primality test 22.8 candidates at 200 seconds each (or 28.5 candidates at 160 seconds each). So sieving gives 24.86, testing gives 22.8/28.5 candidates. Sieving is superior at the moment. That 24.86 sieving number becomes 24.86*24000/26000 = 22.95 candidates at p=26000P. 22.8 and 22.95 are close enough for me to call this optimal sieving.
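The arithmetic above is simple enough to replay mechanically; this sketch just plugs in the numbers from the post:

```python
# Replaying JimB's single-instance numbers (all figures from the post).
removals_in_range = 4447      # sieve-file removals on the 160P range
usable_fraction = 0.8945      # share of b=42.2M-400M in the 400M range
range_p = 160                 # length of the 23931P-24091P range in P

removals_per_p = removals_in_range * usable_fraction / range_p   # ~24.86

sieve_seconds = 76 * 60                 # 4560 s to sieve 1P on the Titan
tests_at_200s = sieve_seconds / 200     # 22.8 double-checked tests
tests_at_160s = sieve_seconds / 160     # 28.5 leading-edge tests

# The removal rate scales roughly as 1/p, so project 24000P -> 26000P:
removals_at_26000P = removals_per_p * 24000 / 26000              # ~22.95

print(f"{removals_per_p:.2f} {tests_at_200s:.1f} {removals_at_26000P:.2f}")
```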

How about running two sieves simultaneously? On my card each range now takes a minute longer but we get double the candidates. Now I've got 49.72 removals in 4620 seconds. I can primality test 23.1 @ 200 seconds or 28.88 candidates @ 160 seconds. In this scenario, the optimal level is much higher. 49.72 / 23.1 * 24000P gives an optimal level of 51657P.

How about running three sieves simultaneously? Each range now takes 79 minutes on my card. In 4740 seconds we're removing 74.58 candidates. I can primality test 23.7 candidates on the GPU (@ 200 seconds) or 29.63 candidates @ 160 seconds. The optimal level here is 74.58 / 23.7 * 24000P = 75524P.
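Under the same assumptions, the two- and three-instance projections can be reproduced like this:

```python
# Projected optimal sieve depth when running several sieve instances on
# one GPU, using the per-instance figures from the post (removals scale
# linearly with instances; each extra instance slows a range slightly).
def optimal_depth_p(instances, minutes_per_p, removals_per_p=24.86,
                    test_seconds=200, current_depth_p=24000):
    sieve_seconds = minutes_per_p * 60
    removals = removals_per_p * instances      # candidates removed per 1P
    tests = sieve_seconds / test_seconds       # tests possible in that time
    return removals / tests * current_depth_p  # depth where the rates match

print(round(optimal_depth_p(2, 77)))  # 51657 - two instances, 77 min/P
print(round(optimal_depth_p(3, 79)))  # 75524 - three instances, 79 min/P
```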

So why do I want to close GFN15 sieving now? First of all, not every GPU can run three sieving processes simultaneously and I'm not sure how many people bother. Plus each process requires a full CPU core to feed the GPU.

Secondly, all the higher n's are much further from optimal. GFN16 is only 1/10 of the way to optimal. I don't have numbers handy for the rest, suffice it to say they've all got a long way to go. Sieving is more valuable (more candidates removed) the further from optimal it is. We have a limited number of users willing/able to do manual sieving and I'd like to use them as efficiently as I can. We also have a lot of users who like running GFN15 in BOINC. There are far more users testing with Genefer than doing sieving. Removing a few hundred GFN15 candidates doesn't really make that much of an impact vs the 9000+ GFN15 tests done per day recently.

I'd much rather eliminate candidates for less-popular higher n's that take much longer to test. The sieving credit rate is the same per hour, as close as we could make it. If GFN15 was the only GFN sieving I'd push it higher, but it's not worth it when there are 7 other n's that could use the attention.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 554
ID: 164101
Credit: 304,715,793
RAC: 0

Message 102634 - Posted: 23 Dec 2016 | 16:14:27 UTC - in response to Message 102631.

Secondly, all the higher n's are much further from optimal. GFN16 is only 1/10 of the way to optimal. I don't have numbers handy for the rest, suffice it to say they've all got a long way to go.

Do you have an estimate for n > 16?
This is an indirect question to know if a GFN sieve in BOINC is to be considered.

I am also wondering if a few steps of Pollard's P−1 could help. Now genefer is able to check primality (not only PRP) and to run Pépin's test (Selfridge-Hurwitz residues). I can add a P−1 test.

stream
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 1 Mar 14
Posts: 686
ID: 301928
Credit: 472,276,418
RAC: 168,108

Message 102646 - Posted: 23 Dec 2016 | 21:58:21 UTC - in response to Message 102634.

This is an indirect question to know if a GFN sieve in BOINC is to be considered.

Technically it is possible; I have such an app and have used it myself with a few remote systems. But it makes sense only for high GFN, maybe starting from GFN-18, where sieving uses no CPU and can fully utilize the GPU. Otherwise it's impossible to describe (in BOINC) a task which uses a lot of CPU and an undefined percentage of the GPU - the GPU would never be fully utilized.

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 857
ID: 370496
Credit: 308,957,976
RAC: 67,503

Message 102647 - Posted: 23 Dec 2016 | 22:36:40 UTC - in response to Message 102646.

Technically it is possible; I have such an app and have used it myself with a few remote systems. But it makes sense only for high GFN, maybe starting from GFN-18, where sieving uses no CPU and can fully utilize the GPU. Otherwise it's impossible to describe (in BOINC) a task which uses a lot of CPU and an undefined percentage of the GPU - the GPU would never be fully utilized.

Watch out for screen lag. We either go back to having a B field like the CUDA app days (which Michael once said was NOT going to happen ever again) or force people to deal with app_config and scare away many crunchers.

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 914
ID: 107307
Credit: 974,118,817
RAC: 338

Message 102654 - Posted: 24 Dec 2016 | 0:29:50 UTC

There's no pressing need to do GFN sieving in BOINC.

Michael Goetz
Honorary cruncher

Joined: 21 Jan 10
Posts: 13214
ID: 53948
Credit: 218,265,676
RAC: 10,041

Message 102657 - Posted: 24 Dec 2016 | 1:37:46 UTC - in response to Message 102647.

Technically it is possible; I have such an app and have used it myself with a few remote systems. But it makes sense only for high GFN, maybe starting from GFN-18, where sieving uses no CPU and can fully utilize the GPU. Otherwise it's impossible to describe (in BOINC) a task which uses a lot of CPU and an undefined percentage of the GPU - the GPU would never be fully utilized.

Watch out for screen lag. We either go back to having a B field like the CUDA app days (which Michael once said was NOT going to happen ever again) or force people to deal with app_config and scare away many crunchers.

I'm not sure what you're referring to, but I suspect the comment you may be thinking of was about GeneferCUDA rather than the sieving app (which is also CUDA). I haven't personally done any GFN sieving in a while. Don't we still use the same app we've always used? The one with the B (block) parameter for tuning?

I've also said that new apps should be written for OCL rather than CUDA, but that's because I'd rather have an app that runs on both red and green GPUs rather than only half of them. There's nothing inherently wrong with CUDA, except, of course, that it won't run on AMD GPUs.
____________
My lucky number is 75898524288+1
(I am NOT an administrator anymore, so please don't PM me with questions. I can't help.)

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 857
ID: 370496
Credit: 308,957,976
RAC: 67,503

Message 102660 - Posted: 24 Dec 2016 | 3:34:20 UTC - in response to Message 102657.

I'm not sure what you're referring to, but I suspect the comment you may be thinking of was about GeneferCUDA rather than the sieving app (which is also CUDA). I haven't personally done any GFN sieving in a while. Don't we still use the same app we've always used? The one with the B (block) parameter for tuning?

I've also said that new apps should be written for OCL rather than CUDA, but that's because I'd rather have an app that runs on both red and green GPUs rather than only half of them. There's nothing inherently wrong with CUDA, except, of course, that it won't run on AMD GPUs.

Remember the days when we crunched GFN with the CUDA app? Yeah, in those days, there was a B parameter that the user could select in the preferences page. When proposing that such option was made available for a potential GFN SV, you've said that it was a mistake to ever allow it to show up on the site, as it raised more problems with confused users than it solved, so such solution would never be implemented ever again.

Michael Goetz
Honorary cruncher

Joined: 21 Jan 10
Posts: 13214
ID: 53948
Credit: 218,265,676
RAC: 10,041

Message 102661 - Posted: 24 Dec 2016 | 5:03:02 UTC - in response to Message 102660.

I'm not sure what you're referring to, but I suspect the comment you may be thinking of was about GeneferCUDA rather than the sieving app (which is also CUDA). I haven't personally done any GFN sieving in a while. Don't we still use the same app we've always used? The one with the B (block) parameter for tuning?

I've also said that new apps should be written for OCL rather than CUDA, but that's because I'd rather have an app that runs on both red and green GPUs rather than only half of them. There's nothing inherently wrong with CUDA, except, of course, that it won't run on AMD GPUs.

Remember the days when we crunched GFN with the CUDA app? Yeah, in those days, there was a B parameter that the user could select in the preferences page. When proposing that such option was made available for a potential GFN SV, you've said that it was a mistake to ever allow it to show up on the site, as it raised more problems with confused users than it solved, so such solution would never be implemented ever again.

Do you have a link to that? I'd like to see precisely what I said. Perhaps I was just having a grumpy day.

Or maybe I'm just forgetting the ugly parts. :)

To be honest, I have no idea how Yves and Brian managed to tune the new BOINC GPU apps without having such a parameter. When we wrote GeneferCUDA, it was impossible to tune the block size for a reasonable balance between efficiency and screen lag for all GPU types -- and there were a whole lot fewer GPUs to worry about at that time. And of course people running the app on their daily driver would need better screen responsiveness, while people running the tasks on a dedicated cruncher (or a secondary GPU not connected to a monitor) would want maximum performance.

A "lag vs. speed" setting (which is what that was) was certainly useful, although it did indeed confuse some. I wouldn't be opposed to having such settings in other GPU apps. For manual sieving -- where the user is expected to be able to deal with more complex operating procedures -- I certainly would have no objection.

Either I'm not remembering something correctly, or I poorly explained what I was trying to say, or you're taking something out of context.
____________
My lucky number is 75898524288+1
(I am NOT an administrator anymore, so please don't PM me with questions. I can't help.)

Tyler
Volunteer tester

Joined: 4 Dec 12
Posts: 1062
ID: 183129
Credit: 1,213,385,934
RAC: 493,668

Message 102662 - Posted: 24 Dec 2016 | 5:16:49 UTC - in response to Message 102661.

Do you have a link to that? I'd like to see precisely what I said. Perhaps I was just having a grumpy day.

Or maybe I'm just forgetting the ugly parts. :)

Although we initially went that way with CUDA and block sizes, we replaced it with an internal self-optimization that would select the fastest block size. Offering a user selectable block size was a mistake I'm not looking to repeat.

____________

275*2^3585539+1 is prime!!! (1079358 digits)

Proud member of Aggie the Pew

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 857
ID: 370496
Credit: 308,957,976
RAC: 67,503

Message 102663 - Posted: 24 Dec 2016 | 5:26:20 UTC - in response to Message 102662.

Do you have a link to that? I'd like to see precisely what I said. Perhaps I was just having a grumpy day.

Or maybe I'm just forgetting the ugly parts. :)

Although we initially went that way with CUDA and block sizes, we replaced it with an internal self-optimization that would select the fastest block size. Offering a user selectable block size was a mistake I'm not looking to repeat.

^This.

Now, I get that it was referring to selecting the GFN transform, but the same idea still applies for a hypothetical GFN SV.

Michael Goetz
Honorary cruncher

Joined: 21 Jan 10
Posts: 13214
ID: 53948
Credit: 218,265,676
RAC: 10,041

Message 102664 - Posted: 24 Dec 2016 | 5:43:06 UTC - in response to Message 102663.

Do you have a link to that? I'd like to see precisely what I said. Perhaps I was just having a grumpy day.

Or maybe I'm just forgetting the ugly parts. :)

Although we initially went that way with CUDA and block sizes, we replaced it with an internal self-optimization that would select the fastest block size. Offering a user selectable block size was a mistake I'm not looking to repeat.

^This.

Now, I get that it was referring to selecting the GFN transform, but the same idea still applies for a hypothetical GFN SV.

Ugh. That's a confusing post. I answered a question about selecting transforms by discussing block sizes.

Rather than confirming or denying anything, I'll just say that software changes and hardware changes, and something like this might or might not be useful in the future. Things change. Including opinions.

If a GPU app had a command-line option to change the block size (as a trade-off between speed and lag), people could fairly easily use it via the app_config mechanism. It could also be configured via the preferences web page. (Which one would be easier to use is a matter of personal opinion.)
____________
My lucky number is 75898524288+1
(I am NOT an administrator anymore, so please don't PM me with questions. I can't help.)

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 554
ID: 164101
Credit: 304,715,793
RAC: 0

Message 102676 - Posted: 24 Dec 2016 | 13:29:25 UTC

My question was not "is it possible to run a GFN sieve 'efficiently' (i.e. CPU load ~0 and no screen lag)?" because from my point of view the answer is yes.
If the GPU tasks are too small, the CPU load is high. If they are too big, screen lag can happen. The duration of a GPU task should be > 100 µs and < 1 ms. In OpenCL, profiling commands can be used to measure this time accurately (with a GPU timer). Then we can set the block size such that the duration is ~ 500 µs during the initialisation of the program.
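That calibration idea can be sketched abstractly. The timing callback below stands in for an OpenCL event-profiling query; none of the names come from the actual sieve code:

```python
# Toy version of the block-size calibration described above: double the
# block until the measured kernel duration reaches the ~500 us target,
# keeping each launch well under the 1 ms screen-lag threshold.
# `measure_kernel_us(block)` stands in for timing one real kernel launch
# via OpenCL event profiling; here it is just a callback.
def tune_block_size(measure_kernel_us, start=1024, max_block=1 << 24,
                    target_us=500.0):
    block = start
    while block < max_block and measure_kernel_us(block) < target_us:
        block *= 2  # duration grows roughly linearly with block size
    return block

# Toy timing model: 0.01 us of work per candidate in the block.
print(tune_block_size(lambda b: 0.01 * b))  # 65536
```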

My question is: "Let n = 20. According to http://www.primegrid.com/server_status_tasks.php, genefer execution time on GPU is ~ 10 hours. How many candidates does the sieve remove in 10 hours?" Of course, if the reference GPU is a Titan, 10 hours becomes about 4 hours.

If the answer is 5 or 10 candidates, then a GFN sieve in BOINC is to be considered!
Then we can also study the algorithm. Today the sieve is based on http://fatphil.org/maths/GFN/maths.html (no?). Is it really more efficient than trial division if we want to sieve a small range first (800,000-2,000,000 for n=20) and when the divisors are large? Pollard's P−1 may be efficient too.

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 914
ID: 107307
Credit: 974,118,817
RAC: 338

Message 102695 - Posted: 25 Dec 2016 | 2:13:00 UTC

I don't have those answers at the moment and real life is going to keep me from calculating them for a few days. Meanwhile I've enabled sieving on GFN18, GFN19 and GFN20. I don't remember any more why they weren't turned on originally - possibly because genefer couldn't test very high on GPU back then and we were optimal for the more limited b-range then possible.

We don't have any plans to start GFN sieving on BOINC. While we're not at optimal levels we're far more heavily sieved than we ever thought possible before the availability of GPU-based sieving. Moving the sieves to BOINC has costs in both time and impact on the server. We don't feel those costs are worth it at this time.

[B^S]ST47

Joined: 12 Jul 05
Posts: 93
ID: 48
Credit: 181,401,682
RAC: 0

Message 102696 - Posted: 25 Dec 2016 | 2:47:41 UTC

With all these ranges to sieve - which GFN is best to sieve? Are any of them currently decreasing the BOINC workload? (Where is a GPU the most effective?)
____________

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 914
ID: 107307
Credit: 974,118,817
RAC: 338

Message 102697 - Posted: 25 Dec 2016 | 3:01:50 UTC

I'll have a good answer on that in a few days, some time between Christmas and New Year's Day. I have to either find all my timings from the last time I did this (which was probably before I obtained the Titan card) or redo them all.

Personally I'd sieve GFN17 at the moment as we're pulling candidates from two different places in the sieve file.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 554
ID: 164101
Credit: 304,715,793
RAC: 0

Message 102719 - Posted: 25 Dec 2016 | 18:28:42 UTC

Here is an estimate on a slow computer (i3-3217U + GeForce GT 740M).

It takes 39 h 20 min to test 900000^2^20+1 with genefer/ocl4.

The sieve/ocl finds about 6 factors/min around 40000P, i.e. approximately 14000 factors in those 39 h 20 min.

63.8% of candidates were already removed from the sieve, so 14000 * (1 - 0.638) ~ 5000 new candidates are removed from the sieve.

The range of the sieve is b in [2, 2·10^9], and 5000 / 2·10^9 = 1/400000.

The optimal sieving calculation depends on the goal. At n=20, the goal is at least one prime, and ten primes within a few years.
The expected number of primes (n=20) is 1 at b ~ 900,000, 2 at b ~ 2,000,000 and 10 at b ~ 11,000,000.
If the target is ten GFN20 primes, the range is b in [2, 12,000,000]. In this range, the sieve removes 30 candidates in the time genefer takes to remove one.
But of course, if we want to find a prime, genefer can do it and the sieve cannot... the dilemma of sieving.
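This estimate can be replayed directly. The figures are taken from the post; the exact totals come out slightly above the rounded values quoted:

```python
# Replaying Yves' GFN20 estimate on the GT 740M (figures from the post).
test_minutes = 39 * 60 + 20                 # one genefer test: 39 h 20 min
factors_found = 6 * test_minutes            # ~6 factors/min near 40000P
new_removals = factors_found * (1 - 0.638)  # 63.8% already removed

full_range_b = 2e9        # the sieve covers b in [2, 2*10^9]
ten_prime_range_b = 12e6  # b-range expected to contain ten GFN20 primes

# Removals landing in the ten-prime range, per genefer test's worth of time:
useful = new_removals * ten_prime_range_b / full_range_b

print(round(new_removals), round(useful))  # 5126 31 (i.e. ~5000 and ~30)
```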
