Message boards :
Seventeen or Bust :
SoB 100% CPU or 50% CPU?
|
I have 8 of these tasks running plus one GPU task on my laptop, and I was curious whether I should let all the tasks run at once or limit the number of running tasks to 3-4. I have turbo boost set to 3.8 GHz when all 4 cores are active, but it jumps to 4.3 GHz if only 3 cores are pegged. The estimate is 8 days to complete the tasks with 6-7 tasks running and 9 days when all are consuming 100% of my cores.
I guess I will let them all run at once for the time being, then run just 1 task next and see how long it takes. | |
|
Dave
Joined: 13 Feb 12 Posts: 3253 ID: 130544 Credit: 2,433,727,965 RAC: 4,119,959
                           
|
8 tasks? Have you got HT on? I'd turn that off to speed things up. Other than that, just try 4 cores and then benchmark with just 3 cores to see if it's worth it or not. | |
|
|
Turning off hyperthreading may not be an option.
My Haswell laptop BIOS doesn't let me turn it off. :(
____________
My Lucky Number is 1893*2^1283297+1 | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,030,002,971 RAC: 20,692,518
                                                
|
Just set the BOINC client to run 50% of the CPUs and it will have almost the same effect.
| |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester
Joined: 1 Mar 14 Posts: 1051 ID: 301928 Credit: 563,881,725 RAC: 1,288
                         
|
In general, you MUST turn HT OFF for LLR tasks. In my experiments, HT does one of two things. On small projects it just heats up the CPU like hell without any change in overall performance (I got 95 °C on PPSE, and I thought I had a good tower fan! Each core ran exactly two times slower than without HT). On big workunits like SOB it slows things down, because more tasks are fighting for memory access, which becomes the bottleneck.
If you cannot turn it off, limit number of tasks to 3 or 4.
With typical "stock" DDR3-1600 CL11 (or slower) memory, running 3 or 4 cores on big projects like SOB usually gives the same (or very nearly the same) overall performance, because every task is waiting for memory access. Most often 4 tasks are still a bit faster (although the difference is almost unnoticeable), but in some setups (overclocked CPU with big cache and slow memory) running 4 tasks degrades performance.
In my opinion, the most "credit-effective" way is to run 3 "BIG LLR" + 1 Sieve task. Unfortunately, as far as I understand, it's not possible to do this completely automatically in BOINC: you have to change project preferences every time and abort unneeded tasks.
What is best for your computer? It's not so difficult to run a benchmark from the command line without waiting for a SOB workunit to complete. Before benchmarking, be sure that BOINC is stopped and you're not running any heavy tasks like a web browser with Flash and animation.
I'll assume you're using 32-bit Windows; for 64-bit Windows and Linux everything is the same, with minor changes in file names.
You'll need two files from the BOINC projects/www.primegrid.com directory:
1. llr.ini.6.07
2. primegrid_cllr_3.8.13_windows_intelx86.exe
- Copy them to an empty temporary directory, e.g. "test1".
- Rename "llr.ini.6.07" to "llr.ini".
- Optionally, to avoid long typing, rename primegrid_cllr_3.8.13_windows_intelx86.exe to llr.exe.
- Again to avoid long typing, create a batch file, e.g. "test.cmd", with the following contents. Note the quotes!
llr -d -q"24737*2^27334063+1"
24737*2^27334063+1 is just one of the big SOB workunits currently being tested by PG (I got it from the project status page).
The test environment is ready. Now make 3 more copies of these files (for a total of 4) in different folders, e.g. "test2", "test3" and "test4".
Now go to the first temporary folder and run "test". The LLR executable will run and periodically print its status:
24737*2^27334063+1, bit: 200 / 27334077 [0.00%]. Time per bit: 128.733 ms.
If the status does not appear in a reasonable time (10-30 seconds), stop the program with Ctrl-C, edit LLR.INI and decrease the "OutputIterations=..." parameter. The test must run for at least 20-30 seconds, until "Time per bit" stabilizes and the status line has been printed a few times (the first output is often incorrect due to initialization work and CPU boost).
Write down the "Time per bit" value.
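For anyone who would rather not copy the numbers by hand, the status line can be pulled apart with a short script. This is only a sketch: it assumes the line has exactly the format quoted above (other LLR versions may print it differently), and the helper names `parse_status` and `estimated_days` are made up for illustration.

```python
import re

# Matches the status line format quoted in the post; this format is an assumption
# taken from that single sample line.
STATUS_RE = re.compile(
    r"bit:\s*(\d+)\s*/\s*(\d+)\s*\[[^\]]*\]\.\s*Time per bit:\s*([\d.]+)\s*ms"
)

def parse_status(line):
    """Return (current_bit, total_bits, ms_per_bit), or None if the line does not match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))

def estimated_days(total_bits, ms_per_bit):
    """Rough run time of the whole test if the time per bit stayed constant."""
    return total_bits * ms_per_bit / 1000.0 / 86400.0

line = "24737*2^27334063+1, bit: 200 / 27334077 [0.00%]. Time per bit: 128.733 ms."
bit, total, ms = parse_status(line)
print(bit, total, ms)
print(round(estimated_days(total, ms), 1), "days if this rate held")
```

Keep in mind that the sample line is the very first status output, which the post says is often inaccurate, so any estimate should be taken from a later, stabilized line.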
Now, while the first task is still running, go to the second folder and run "test" there. When the numbers stabilize again, write down the new "Time per bit" for two-task mode. Usually the values for both tasks are very close to each other, but in some setups they can differ. In any case, calculate the average value.
Repeat this with 3 and 4 tasks running.
Now calculate the overall output of your system. Since we know the average time per bit in milliseconds (T), we can calculate the number of bits per second as 1000/T. Multiply this by the number of active tasks N and you get the overall performance for each number of cores: BPS = 1000 / T * N.
In our example,
1 core: BPS = 1000 / 128.733 * 1 = 7.77
2 cores: BPS = 1000 / ...... * 2 = .....
and so on for 3 and 4 cores.
Compare the overall output and you'll know how many cores is best for your PC in SOB.
This test applies to SOB only; for other projects you must use a typical workunit from that specific project.
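The BPS arithmetic is easy to script once the "Time per bit" values are written down. A minimal sketch follows; the function names are illustrative, and only the 1-task figure (128.733 ms) comes from this thread, so the other entries have to be filled in from your own measurements.

```python
def average(values):
    """Average "Time per bit" (ms) over all tasks running at the same time."""
    return sum(values) / len(values)

def overall_bps(avg_ms_per_bit, n_tasks):
    """Overall bits per second for n_tasks running in parallel: BPS = 1000 / T * N."""
    return 1000.0 / avg_ms_per_bit * n_tasks

# Key: number of simultaneous tasks. Value: the "Time per bit" of each of those tasks.
# Only the 1-task entry is from the post; add your own readings for 2, 3 and 4 tasks.
measurements = {1: [128.733]}

for n, times in sorted(measurements.items()):
    print(n, "task(s):", round(overall_bps(average(times), n), 2), "bits/s overall")
```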
| |
|
|
I can turn off HT, but since I use this as my work/gaming laptop I need HT on for VM work, and with HT on my general-use performance doesn't suffer at all even while running 8 SoB tasks. I might try it when I only have one SoB task running and see how much faster it can be. I have my TDP set to 72 watts in order to turbo that high. My temps sit around 85-90 °C (depending on whether I'm at the office or at home).
Thanks for your information, I will run the tests now and report back!
Results:
1 Task: 80
2 Tasks: 130
3 Tasks: 186
4 Tasks: 220
5 Tasks: 240
So I assume that once the number I calculated starts to slow its upward climb, it's safe to say it's best to stay one core below that point. So it looks like 4 cores is best, which is 50% with HT on; makes sense. I have DDR3-1866, and a quick winsat mem gives me 25 GB/s. While running 5 test tasks I get 11 GB/s. | |
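One way to look at where the curve flattens is to compute how much each extra task adds to the total, and how much each task gets on average. The sketch below uses the five figures posted above and assumes they are overall output values from the BPS formula; the function names are illustrative.

```python
# Overall output figures reported in the post above, keyed by number of tasks.
results = {1: 80, 2: 130, 3: 186, 4: 220, 5: 240}

def marginal_gains(results):
    """Increase in overall output from adding one more task, keyed by the new task count."""
    counts = sorted(results)
    return {n: results[n] - results[prev] for prev, n in zip(counts, counts[1:])}

def per_task(results):
    """Average output of a single task at each task count."""
    return {n: results[n] / n for n in sorted(results)}

print(marginal_gains(results))
print(per_task(results))
```

With these figures the total still rises at 5 tasks, but each added task contributes less after the third one, so where to stop is a judgment call rather than something the numbers decide on their own.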
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester
Joined: 1 Mar 14 Posts: 1051 ID: 301928 Credit: 563,881,725 RAC: 1,288
                         
|
One more note about the test procedure for those who want to repeat it. I found that LLR.EXE, when used in "-q" option mode, deletes the LLR.INI file after Ctrl-C, which makes it hard to use and edit this file for tweaking. So:
1. Do not copy LLR.INI at all; all you need is the LLR executable.
2. The batch file to run the test must be the following:
llr -d -q"24737*2^27334063+1" -oOutputIterations=7000
Instead of using LLR.INI, we pass the "OutputIterations=7000" parameter directly on the command line using the "-o" option. The value "7000" is probably too high for current SOB; 2000 or 1000 could be more reasonable.
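Setting up the test folders by hand gets tedious if you repeat the benchmark, so here is a rough sketch of how it could be scripted. It assumes you pass in the path to the LLR executable yourself; the function name `make_test_dirs` and the folder layout are illustrative, and the command line is the one given in this post.

```python
import shutil
from pathlib import Path

# Command line from the post above; the candidate and the OutputIterations value can be changed.
COMMAND = 'llr -d -q"24737*2^27334063+1" -oOutputIterations=7000\n'

def make_test_dirs(llr_exe, base_dir, count=4):
    """Create base_dir/test1..testN, each holding a copy of the executable
    (renamed to llr.exe) and a test.cmd batch file. No LLR.INI is copied."""
    created = []
    for i in range(1, count + 1):
        d = Path(base_dir) / f"test{i}"
        d.mkdir(parents=True, exist_ok=True)
        shutil.copy(llr_exe, d / "llr.exe")
        (d / "test.cmd").write_text(COMMAND)
        created.append(d)
    return created
```

Call it with the path to your copy of primegrid_cllr_3.8.13_windows_intelx86.exe and any empty working directory, then start "test" in each folder one after another as described earlier in the thread.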
| |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,211,781,066 RAC: 1,245,486
                        
|
Another thing you can tweak is CPU affinity. I see about a 10% reduction in WU run time by setting the CPU affinity of each WU to a distinct physical core. YMMV.
<tl;dr>You WILL want to select certain logical CPUs to avoid hyperthreading. Linux and Windows use different mapping schemes between physical cores and logical CPUs.</tl;dr>
In Linux, the command "lstopo" (for "list topology") draws a diagram showing the relationship between logical CPUs, physical cores, and layers of cache. What I've always seen is that one side of every physical core is mapped to a sequentially numbered range of logical CPUs. So to avoid hyperthreading WUs in Linux, use half of all the logical CPUs in sequential numerical order (it doesn't matter where you start; you'll wrap around and uniquely hit all the physical cores).
In Windows 7 and 8, based on experimental performance observations, each even-odd pair of logical CPU numbers corresponds to a physical core. To avoid hyperthreading WUs in Windows, use only the even numbered (or only the odd numbered) logical CPUs.
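The two selection rules above can be written down as a small sketch. It takes the mappings exactly as described in this post (even-numbered logical CPUs on Windows, half of the logical CPUs in sequence on Linux), which are observations and may not hold on every system, so check with "lstopo" or similar first. The function names are illustrative.

```python
def windows_mask(n_logical):
    """Bit mask selecting the even-numbered logical CPUs (0, 2, 4, ...),
    following the Windows mapping described above (an observation, not a guarantee)."""
    return sum(1 << cpu for cpu in range(0, n_logical, 2))

def linux_cpus(n_logical, start=0):
    """Half of the logical CPUs in sequential order, wrapping around,
    following the Linux mapping described above (an observation, not a guarantee)."""
    return [(start + i) % n_logical for i in range(n_logical // 2)]

print(hex(windows_mask(8)))
print(linux_cpus(8))
print(linux_cpus(8, start=6))
```

A hex mask like this is the form accepted by the Windows "start /affinity" option, and a CPU list is what Linux "taskset -c" takes, if you want to try setting affinity by hand.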
You can't set CPU affinity through configuration with stock versions of BOINC, but you can set it manually on running WUs. This only lasts as long as your patience. Or set the CPU affinity of the running BOINC client to the subset of logical CPUs you want WUs to use, and hope that the O/S is proficient at dispatching simultaneously running WUs onto distinct physical cores.
It's very easy with PRPNet to guarantee a distinct physical core is always used by each instance of prpclient, but this forum's scope isn't wide enough to contain it. | |
|
|
Set CPU to 50% as well as affinity as advised.
All 4 tasks are around 75% at 150 hours time spent so far. Almost done!
| |
|