PrimeGrid

Message boards : Sierpinski/Riesel Base 5 Problem : Optimal CPU count?

Jordan Romaidis
Joined: 11 May 17
Posts: 244
ID: 880615
Credit: 678,297,128
RAC: 61,279
Message 121070 - Posted: 13 Oct 2018 | 21:17:45 UTC

Has anyone experimented with assigning 4+ cores to a single WU? Does SR5 have a thread limit beyond which extra threads won't help crunching? Trying to crank these out as fast as possible.

Rafael
Volunteer tester
Joined: 22 Oct 14
Posts: 888
ID: 370496
Credit: 346,354,112
RAC: 546,356
Message 121072 - Posted: 13 Oct 2018 | 21:37:18 UTC - in response to Message 121070.

Has anyone experimented with assigning 4+ cores to a single WU? Does SR5 have a thread limit beyond which extra threads won't help crunching? Trying to crank these out as fast as possible.

As with most things in life... it depends, mainly on your processor architecture, core count, RAM and what you do with your PC.

For instance, take my 4c Haswell. It has relatively low clocks (3.5 GHz) and dual-channel, dual-rank 2133 MHz RAM. I found that running 4C would normally be the best, but because it is also my daily driver, regular usage steals cycles and it ends up being SLOWER than running with 3C only, even though 4C is faster in a vacuum.

So the best advice is to test for yourself. Turn off Hyperthreading (if you can) and run a couple of benchmarks. Prime95 has an intuitive tool that helps you easily figure out performance numbers.

Monkeydee
Project donor
Volunteer tester
Joined: 8 Dec 13
Posts: 440
ID: 284516
Credit: 429,422,985
RAC: 667,707
Message 121074 - Posted: 14 Oct 2018 | 2:44:36 UTC

As Rafael says, test with various core counts and work unit counts.
And "fast" here can be described in two ways. One is throughput, or how many units you can do in a set amount of time. Two is fastest unit. They are not always the same. So it is best to test to strike the balance however you want it.
Also, there is no limit on how many cores you can throw at a single task. The more cores, the faster the task, but you might get more tasks done in the same time span by running more than one task at a time.
____________
My Primes
Badge Score: 2*1 + 4*2 + 6*4 + 7*10 + 9*1 + 10*2 = 133

mikey
Joined: 17 Mar 09
Posts: 1243
ID: 37043
Credit: 519,835,681
RAC: 129,910
Message 132049 - Posted: 14 Aug 2019 | 20:21:35 UTC - in response to Message 121070.
Last modified: 14 Aug 2019 | 20:22:38 UTC

My AMD 1920X was doing them using 4 cores per WU, 5 WUs at a time, in about 14 hours for each WU. I switched to 5 cores per WU, 4 WUs at a time, and the time is closer to 4 hours for each WU. I have 32 GB of DDR4 quad-channel RAM in the PC.
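
To put those two setups on a throughput footing rather than a per-task one, here is a rough Python sketch. It assumes the approximate times quoted above hold steady and that every task in a batch takes the same time; the function name is just for illustration.

```python
def tasks_per_day(simultaneous_tasks, hours_per_task):
    """Tasks finished in 24 hours if every slot is kept busy.

    Assumes each task takes the same wall-clock time, which is only
    roughly true in practice.
    """
    return simultaneous_tasks * 24.0 / hours_per_task


# Figures taken from the post: 5 tasks at ~14 h each vs 4 tasks at ~4 h each.
old_setup = tasks_per_day(simultaneous_tasks=5, hours_per_task=14)
new_setup = tasks_per_day(simultaneous_tasks=4, hours_per_task=4)
print(f"4 cores x 5 tasks: {old_setup:.1f} tasks/day")
print(f"5 cores x 4 tasks: {new_setup:.1f} tasks/day")
```

On those figures the second setup comes out ahead on both counts: roughly 24 tasks per day against about 8.6, and a shorter turnaround per task.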

mackerel
Project donor
Volunteer tester
Joined: 2 Oct 08
Posts: 2468
ID: 29980
Credit: 449,457,152
RAC: 300,933
Message 138938 - Posted: 14 Mar 2020 | 18:09:12 UTC - in response to Message 121072.
Last modified: 14 Mar 2020 | 18:15:46 UTC

So the best advice is to test for yourself. Turn off Hyperthreading (if you can) and run a couple of benchmarks. Prime95 has an intuitive tool that helps you easily figure out performance numbers.


A rough guide to using Prime95 (the steps below assume the Windows version):
1, download it, extract it, run it
2, click "just stress testing", then cancel the next window that appears since we're not actually stress testing
3, go to Options > Benchmark window
4, It should be on Throughput benchmark. For SR5, enter 768 under both minimum and maximum FFT size, as that is the current maximum size I've seen in use. For other projects, you'd need to find out one way or another what size they use instead.
5, check "Benchmark all-complex FFTs" - apparently this makes it work in a way more representative of LLR (as commonly used here) although it doesn't seem to make that much difference overall
6, Set "Number of CPU cores to benchmark" to the number of real cores you have. It should detect this by itself.
7, Uncheck "Benchmark hyperthreading" - unless you really want to try it, it doesn't usually help.
8, You may want to edit "Number of workers to benchmark". A worker is what we would call a task, so this is how many tasks to run at once as a comma separated list. It tries to pick some sensible combinations, but you might want to add more. I'd run all factors of the real core count, including non-prime ones, including 1. For example, if you have a 12 core CPU, I'd run "1, 2, 3, 4, 6". If you have a 8 core CPU, try "1, 2, 4, 8". For a 6 core CPU, try "1, 2, 3, 6"...
9, personally I set "Time to run each benchmark" to 5 seconds which is the minimum it allows. I'd do a couple of repeats in case something else was happening to change the results e.g. background tasks.
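
For step 8, the list of worker counts can be generated rather than worked out by hand. A minimal Python sketch, assuming you already know your physical core count (the function names are made up for this example). Note that it always includes the core count itself, i.e. one single-core task per core, which the 12-core example in step 8 leaves out.

```python
def worker_counts(cores):
    """Return every divisor of the physical core count, in ascending order."""
    return [n for n in range(1, cores + 1) if cores % n == 0]


def benchmark_field(cores):
    """Format the divisors as a comma separated list for the benchmark dialog."""
    return ", ".join(str(n) for n in worker_counts(cores))


for cores in (6, 8, 12):
    print(f"{cores} cores -> {benchmark_field(cores)}")
```

Using divisors means each worker gets the same whole number of cores, which is why they are the natural combinations to test.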

Look at the results for the highest throughput value. This is the combination that will get you the most overall throughput - tasks completed in a given time.

Example results for an i9-7920X with turbo disabled:
Timings for 768K all-complex FFT length (12 cores, 1 worker): 0.54 ms. Throughput: 1835.54 iter/sec.
Timings for 768K all-complex FFT length (12 cores, 2 workers): 0.55, 0.55 ms. Throughput: 3659.43 iter/sec.
Timings for 768K all-complex FFT length (12 cores, 3 workers): 0.73, 0.72, 0.71 ms. Throughput: 4158.77 iter/sec.
Timings for 768K all-complex FFT length (12 cores, 4 workers): 0.93, 0.95, 0.95, 0.91 ms. Throughput: 4280.10 iter/sec.
Timings for 768K all-complex FFT length (12 cores, 6 workers): 1.57, 1.58, 1.58, 1.51, 1.59, 1.58 ms. Throughput: 3825.98 iter/sec.
Timings for 768K all-complex FFT length (12 cores, 12 workers): 3.99, 4.01, 4.00, 4.08, 4.05, 4.09, 4.04, 4.02, 4.14, 3.99, 4.03, 4.06 ms. Throughput: 2968.99 iter/sec.

We see the highest throughput is obtained by running 4 workers (tasks) at once, implicitly each with 3 cores. But what about the relative speed of each task? This may interest those aiming to be "1st" more often who are willing to trade off some throughput for a shorter time. One way to judge it is to look at the timings shown before the throughput. Because this is the time taken to do each step of the calculation, lower is better. It is clear that as we assign more cores to fewer simultaneous tasks, each task gets faster, up to a point. 1 task with 12 cores is barely faster than 1 task with 6 cores, and you can run two of the latter at the same time, so putting all 12 cores on one task makes no sense. Of interest is 3 workers (of 4 cores), as its overall throughput is only slightly lower than 4 workers (of 3 cores), but it is somewhere over 20% faster per unit. That is the configuration I'm running. While 2 workers (of 6 cores) is faster per unit again, it takes a bigger hit to overall throughput.
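
The same trade-off can be made mechanical. The Python sketch below parses benchmark lines in the format shown above and picks the configuration with the lowest per-iteration time among those whose throughput is within some tolerance of the best. The 5% tolerance, the use of the mean of the per-worker timings, and the function names are assumptions for this example, not anything Prime95 or the project prescribes.

```python
import re

# Benchmark lines copied from the i9-7920X results above.
SAMPLE = """
Timings for 768K all-complex FFT length (12 cores, 1 worker): 0.54 ms. Throughput: 1835.54 iter/sec.
Timings for 768K all-complex FFT length (12 cores, 2 workers): 0.55, 0.55 ms. Throughput: 3659.43 iter/sec.
Timings for 768K all-complex FFT length (12 cores, 3 workers): 0.73, 0.72, 0.71 ms. Throughput: 4158.77 iter/sec.
Timings for 768K all-complex FFT length (12 cores, 4 workers): 0.93, 0.95, 0.95, 0.91 ms. Throughput: 4280.10 iter/sec.
Timings for 768K all-complex FFT length (12 cores, 6 workers): 1.57, 1.58, 1.58, 1.51, 1.59, 1.58 ms. Throughput: 3825.98 iter/sec.
"""

LINE_RE = re.compile(
    r"\((\d+) cores?, (\d+) workers?\): ([\d., ]+) ms\. Throughput: ([\d.]+) iter/sec"
)


def parse_results(text):
    """Turn benchmark lines into dicts, one per worker configuration."""
    rows = []
    for m in LINE_RE.finditer(text):
        cores, workers = int(m.group(1)), int(m.group(2))
        times = [float(t) for t in m.group(3).split(",")]
        rows.append({
            "workers": workers,
            "cores_per_worker": cores // workers,
            "ms_per_iter": sum(times) / len(times),  # mean over workers
            "throughput": float(m.group(4)),
        })
    return rows


def pick(rows, tolerance=0.05):
    """Fastest per-task config whose throughput is within `tolerance` of the best.

    The default tolerance is an arbitrary choice for illustration.
    """
    best = max(r["throughput"] for r in rows)
    close = [r for r in rows if r["throughput"] >= best * (1 - tolerance)]
    return min(close, key=lambda r: r["ms_per_iter"])


choice = pick(parse_results(SAMPLE))
print(choice["workers"], "workers of", choice["cores_per_worker"], "cores")
```

With a 5% tolerance this selects 3 workers of 4 cores for the data above. Setting `tolerance=0` returns the pure highest-throughput configuration instead, and a larger tolerance shifts the choice towards fewer, faster tasks.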


Example results for an i7-8086k with turbo disabled:
Timings for 768K all-complex FFT length (6 cores, 1 worker): 0.49 ms. Throughput: 2033.05 iter/sec.
Timings for 768K all-complex FFT length (6 cores, 2 workers): 0.96, 0.97 ms. Throughput: 2070.05 iter/sec.
Timings for 768K all-complex FFT length (6 cores, 3 workers): 2.22, 2.22, 2.23 ms. Throughput: 1349.15 iter/sec.
Timings for 768K all-complex FFT length (6 cores, 6 workers): 5.49, 5.47, 5.50, 5.49, 5.48, 5.48 ms. Throughput: 1094.29 iter/sec.

Here 2 workers has the highest throughput. 1 worker is only slightly lower but will turn around a task in about half the time, so that's what I'm using.


Example results for a Ryzen 7 3700X (stock operation):
Timings for 768K all-complex FFT length (8 cores, 1 worker): 0.43 ms. Throughput: 2319.20 iter/sec.
Timings for 768K all-complex FFT length (8 cores, 2 workers): 0.68, 0.68 ms. Throughput: 2940.09 iter/sec.
Timings for 768K all-complex FFT length (8 cores, 4 workers): 1.34, 1.34, 1.34, 1.33 ms. Throughput: 2987.78 iter/sec.
Timings for 768K all-complex FFT length (8 cores, 8 workers): 6.43, 6.42, 6.41, 6.43, 6.34, 6.30, 6.42, 6.49 ms. Throughput: 1249.39 iter/sec.

Here 4 workers (of 2 cores) has the highest throughput, with 2 workers (of 4 cores) only slightly behind. So I might as well run 2 tasks of 4 cores each, with about half the turnaround time.

The throughput numbers can be used as a way to compare the speeds of different systems.


My AMD 1920X was doing them using 4 cores per wu, 5 wu at a time in about 14 hours for each wu. I switched to 5 cores per wu, 4 wu at a time and the time is closer to 4 hours for each wu. I have 32gb of ddr4 quad channel ram in the pc.


This is a 12 core CPU, so those combinations of tasks/threads are not what I'd have considered. Intuitively, 4 tasks of 3 cores each would be a safe choice, as it keeps the data on each CCX. I hope that when you are running 4 tasks of 5 threads each, your OS is smart enough to keep each task on a single CCX. Another interesting combination to try might be 2 tasks of 6 cores each, but as this involves crossing a CCX, the memory bandwidth comes into play. Thankfully, at only 6 Zen 1 cores per die, there isn't too much demand on the RAM bandwidth. I'm not sure about running 1 task on all 12 cores, since this particular CPU has NUMA nodes.
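
A quick way to sanity-check a task/thread combination against the CCX layout is sketched below in Python. The layout used (4 CCXs of 3 cores, 2 hardware threads per core) is an assumption about the 1920X for this example, and the function name is hypothetical. It only counts how many tasks could fit without straddling a CCX; whether the OS actually places the threads that way is a separate matter that it does not check.

```python
def tasks_without_crossing(threads_per_task, ccx_count, cores_per_ccx,
                           threads_per_core=1):
    """How many tasks can run with every task confined to a single CCX.

    threads_per_core=1 counts physical cores only; use 2 to count the
    extra hardware threads as usable slots.
    """
    slots_per_ccx = cores_per_ccx * threads_per_core
    return ccx_count * (slots_per_ccx // threads_per_task)


# Assumed layout for a 12 core Zen 1 part: 4 CCXs of 3 cores each.
CCX_COUNT, CORES_PER_CCX = 4, 3

for threads in (3, 5, 6):
    physical = tasks_without_crossing(threads, CCX_COUNT, CORES_PER_CCX)
    with_smt = tasks_without_crossing(threads, CCX_COUNT, CORES_PER_CCX,
                                      threads_per_core=2)
    print(f"{threads} threads/task: {physical} tasks on physical cores, "
          f"{with_smt} counting hardware threads")
```

Under that assumed layout, 3 threads per task fits 4 tasks on physical cores alone, while 5 threads per task only stays inside a CCX if the second hardware thread of each core is used.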

Copyright © 2005 - 2021 Rytis Slatkevičius and PrimeGrid community.
Generated 24 Jan 2021 | 0:28:46 UTC