PrimeGrid
Message boards : General discussion : primegrid_llr not threading very well

Jan Engelhardt
Joined: 16 Jun 13
Posts: 1
ID: 234943
Credit: 210,975,411
RAC: 83,535
Message 141758 - Posted: 16 Jul 2020 | 19:15:59 UTC
Last modified: 16 Jul 2020 | 19:16:54 UTC

Given a machine of this kind:

boinc@localhost:~> lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       43 bits physical, 48 bits virtual
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           2
NUMA node(s):        2
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               49
Model name:          AMD EPYC 7742 64-Core Processor
Stepping:            0
...
NUMA node0 CPU(s):   0-63,128-191
NUMA node1 CPU(s):   64-127,192-255

running SOB (4x64T):
boinc@localhost:~/slots/3> cat stderr.txt
BOINC llr wrapper (version 8.04)
Using Jean Penne's llr (64 bit)
LLR2 Program - Version 0.9.4, using Gwnum Library Version 29.8
LLR command line: primegrid_llr -d -oDiskWriteTime=10 -oThreadsPerTest=64 llr.in
Using all-complex FMA3 FFT length 2880K, Pass1=768, Pass2=3840, clm=1, 64 threads, a = 3, L2 = 2015*1027

and I observe in top(1) a lackluster utilization of at most 41/64ths by each llr process on average (total system: 101/256ths).
Tasks: 2248 total,   5 running, 2243 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 23.5 sy, 28.1 ni, 48.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem : 515770.0+total, 189199.3+free, 4230.004 used, 322340.6+buff/cache
MiB Swap:    0.000 total,    0.000 free,    0.000 used. 508124.8+avail Mem

  PID USER  PR NI   VIRT    RES   SHR S %CPU  %MEM     TIME+ COMMAND
10503 boinc 39 19 749860 200352  3192 R 4173 0.038 123:36.99 primegrid_llr
10301 boinc 39 19 749864 213016 11768 S 3877 0.040  84:50.40 primegrid_llr
10373 boinc 39 19 749864 200432  3192 R 3492 0.038 115:59.75 primegrid_llr
10369 boinc 39 19 768060 218548  3192 R 2169 0.041 108:03.26 primegrid_llr

Is this a known issue? The picture looks similar when I forcibly reduce it to run 16-threaded, where it takes about 14/16ths:
top - 21:12:58 up 41 days,  5:18,  5 users,  load average: 14.33, 14.68, 14.88
  307 boinc 39 19 426664 282868 15536 S 1425 0.430 9424:35 primegrid_llr
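For reference, the fractions quoted above come from the %CPU column of top(1), where 100 corresponds to one fully busy logical CPU. Below is a minimal Python sketch of that conversion, using the four values from the 4x64T snapshot. The helper name `busy_fraction` is made up for illustration, and a single snapshot will not necessarily match a longer-term average.

```python
# %CPU values copied from the top(1) snapshot in the post (4 tasks x 64 threads).
CPU_PERCENT = [4173, 3877, 3492, 2169]
THREADS_PER_TASK = 64   # -oThreadsPerTest=64 from stderr.txt
LOGICAL_CPUS = 256      # "CPU(s): 256" from lscpu


def busy_fraction(cpu_percent: float, logical_cpus: int) -> float:
    """Fraction of `logical_cpus` kept busy, given a top(1) %CPU value."""
    return cpu_percent / 100.0 / logical_cpus


# Utilization of each llr process relative to the 64 threads it was given.
per_task = [busy_fraction(p, THREADS_PER_TASK) for p in CPU_PERCENT]
# Utilization of the whole machine by the four llr processes together.
whole_system = busy_fraction(sum(CPU_PERCENT), LOGICAL_CPUS)

for pct, frac in zip(CPU_PERCENT, per_task):
    busy = frac * THREADS_PER_TASK
    print(f"{pct:5d}% -> {busy:4.1f}/{THREADS_PER_TASK} threads busy")
print(f"llr total: {whole_system * LOGICAL_CPUS:.0f}/{LOGICAL_CPUS} logical CPUs")
```

The same function applies to the 16-threaded run: `busy_fraction(1425, 16)` gives the "about 14/16ths" figure.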

j.sheridan
Project donor
Volunteer tester
Joined: 21 Mar 11
Posts: 812
ID: 91622
Credit: 1,816,278,809
RAC: 5,139,982
Message 141760 - Posted: 16 Jul 2020 | 20:05:34 UTC - in response to Message 141758.

See this thread.

Also, the latest Prime95 has a benchmark function which will tell you the best number of cores to allocate to each task for a given FFT size on your processor.

stream
Volunteer moderator
Project administrator
Volunteer developer
Volunteer tester
Joined: 1 Mar 14
Posts: 888
ID: 301928
Credit: 505,121,297
RAC: 13,068
Message 141767 - Posted: 17 Jul 2020 | 8:19:56 UTC
Last modified: 17 Jul 2020 | 8:21:25 UTC

LLR (really, GWNUM) multithreading is not 100% efficient. More threads mean more time lost to synchronization between them.

On the other hand, too few threads per task means more independent tasks running at once, which start fighting for CPU cache and slow the calculation dramatically.

On a home 4-core CPU, running one task on all 4 cores is the best solution in most cases. In complex setups like yours, optimal throughput depends on the type of task (its FFT size) and the size of the CPU cache, and can be determined only by benchmarking under different scenarios.
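Benchmarking "under different scenarios" boils down to comparing throughput (tasks finished per day) across different ways of splitting the cores. Here is a rough Python sketch under stated assumptions: the 128 physical cores come from the lscpu output in the first post, the `Scenario` class and `best_scenario` helper are hypothetical names, and the run times in the usage note are placeholders to be replaced with your own measurements, not results from this thread.

```python
from dataclasses import dataclass

PHYSICAL_CORES = 128  # 2 sockets x 64 cores, per lscpu in the first post


@dataclass
class Scenario:
    threads_per_task: int
    hours_per_task: float  # measured wall-clock time of one task in this setup

    @property
    def concurrent_tasks(self) -> int:
        # Tasks that fit at one thread per physical core (SMT left unused).
        return PHYSICAL_CORES // self.threads_per_task

    @property
    def tasks_per_day(self) -> float:
        # Throughput: concurrent tasks times completions per task per day.
        return self.concurrent_tasks * 24.0 / self.hours_per_task


def best_scenario(scenarios):
    """Return the scenario with the highest measured throughput."""
    return max(scenarios, key=lambda s: s.tasks_per_day)
```

Usage, with invented timings purely to show the call: `best_scenario([Scenario(64, 30.0), Scenario(8, 12.0)])`. Each `hours_per_task` has to be measured while the machine is fully loaded with that many concurrent tasks, because the cache contention described above only shows up then.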

firedrakes
Joined: 7 Feb 09
Posts: 74
ID: 35268
Credit: 127,654,056
RAC: 59,758
Message 142181 - Posted: 30 Jul 2020 | 1:25:03 UTC

Agreed. At the moment I notice the Riesel tasks do very poorly multithreaded. They are also very unbalanced in how they use the threads. I can see the temperatures doing strange things on this specific WU.

Pavel Atnashev
Joined: 11 Aug 17
Posts: 54
ID: 914937
Credit: 2,658,201,070
RAC: 5,247,646
Message 142441 - Posted: 13 Aug 2020 | 17:29:09 UTC

EPYC 7742 has 16MB of L3 cache per CCX, so any test that requires more than 16MB will run suboptimally.
You can try to run LLR 8-threaded (two CCXs) and set affinity to bind them to specific CCXs.
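To make the cache argument concrete: the stderr output in the first post reports an FFT length of 2880K. Assuming 8 bytes (one double) per FFT element and ignoring GWNUM's auxiliary tables, that is about 22.5 MiB, which is more than one 16MB CCX holds but fits in two. The Python sketch below does that arithmetic and builds a CPU set for a pair of CCXs. The CPU numbering (four consecutive core IDs per CCX) is an assumption that should be checked against `lscpu -e` or `lstopo` on the actual machine, the function names are illustrative, and `os.sched_setaffinity` is Linux-only.

```python
import math
import os

L3_PER_CCX_MIB = 16  # from the post: 16MB of L3 per CCX on EPYC 7742
CORES_PER_CCX = 4    # Zen 2 CCX size


def fft_footprint_mib(fft_len_k, bytes_per_element=8):
    """Rough FFT data size in MiB. ASSUMES one double per element, no tables."""
    return fft_len_k * 1024 * bytes_per_element / 2**20


def ccxs_needed(fft_len_k):
    """Smallest number of CCXs whose combined L3 holds the FFT data."""
    return math.ceil(fft_footprint_mib(fft_len_k) / L3_PER_CCX_MIB)


def ccx_cpus(first_ccx, count):
    """CPU ids of `count` adjacent CCXs. ASSUMES consecutive core numbering."""
    start = first_ccx * CORES_PER_CCX
    return set(range(start, start + count * CORES_PER_CCX))


def pin(pid, cpus):
    """Bind every existing thread of process `pid` to `cpus` (Linux only)."""
    # sched_setaffinity acts on one thread id at a time, so walk all of them.
    for tid in os.listdir(f"/proc/{pid}/task"):
        os.sched_setaffinity(int(tid), cpus)


print(fft_footprint_mib(2880))             # FFT length 2880K from stderr.txt
print(ccxs_needed(2880))                   # CCXs needed to hold it in L3
print(sorted(ccx_cpus(first_ccx=0, count=2)))
```

Under those assumptions, the first 8-threaded task would be pinned with `pin(pid, ccx_cpus(0, 2))`, the next with `ccx_cpus(2, 2)`, and so on, so that no two tasks share a CCX.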

Copyright © 2005 - 2021 Rytis Slatkevičius and PrimeGrid community.
Generated 13 May 2021 | 21:59:25 UTC