Author |
Message |
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,382,143 RAC: 302,741
|
I'm currently running 4 SoB WUs on my Core2Quad Q6600. One of those WUs started running two days ago. It appeared to be running at a pace that would set it to complete in about 10 days.
Yesterday, I started running the other three SoB WUs. Those seem to be looking at a 12 day duration. In addition, the first WU has also slowed down.
During that first day, the other cores were not idle; they were running tasks from another project (SIMAP to be specific).
It appears that running multiple instances of this application causes some type of interference amongst the WUs.
____________
My lucky number is 75898^524288+1 |
|
|
John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
|
It appears that running multiple instances of this application causes some type of interference amongst the WUs.
There is no solution for this yet. See this post: LLR slowdown across multiple cores
____________
|
|
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 3,010,950,487 RAC: 689,746
|
Unfortunately, that's a known effect, but the reason is still unknown. See this thread.
Edit: OK, I was too slow this time. :)
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,382,143 RAC: 302,741
|
It appears that running multiple instances of this application causes some type of interference amongst the WUs.
There is no solution for this yet. See this post: LLR slowdown across multiple cores
I thought that might be the issue. Thanks.
____________
|
|
|
It appears that running multiple instances of this application causes some type of interference amongst the WUs.
Yes, it is generally believed that it does on multicore systems such as yours, but the exact reason is not yet known. However, you are lucky that it is only a 12 day duration.
On my Q6600, I was running PPS LLR. The PPS LLR extended units were finishing in about 7.5 minutes each (all 4 of 'em). Then yesterday, I let it download a few SoB LLR units. I was trying for two, but got 3. They started running within about 10 minutes of each other.
So now, I have 3 SoB LLR and one PPS LLR running. After the SoB LLR units started, the extended PPS LLR units went to about 9.5 minutes to finish, and the SoB units (all 3) are showing about 15 to 16 day durations, working from the %done and elapsed time. It's not just a matter of 4 copies of the same app running, either (PPS and SoB both use the same app: primegrid_llr_5.09_windows_intelx86.exe), because the PPS units didn't show much of a change in run time when running 1 through 4 processes (with no other BOINC jobs running). The working set of the SoB units is, however, considerably larger: 77.14MB vs 13.27MB for the PPS units.
System specs:
CPU: Intel Core2 Quad Q6600 (Kentsfield) @ 2.4 GHz
RAM: 3 GBytes DDR2 PC2-5300 (the MB has 4 GBytes installed, but the 32-bit OS only sees 3 of it, and yes I know I could use faster RAM... soon...)
OS: Microsoft Windows Vista Home Premium x86 Edition, Service Pack 2, (06.00.6002.00) (soon going to a 64bit Windoze)
MB: MSI MS-7366
chipset: nvidia GeForce 7150 Rev. A2
southbridge: nvidia nForce 630i
Graphics: nVidia GeForce 9600GT in PCIe slot.
IMO, it's quite likely to be caused by cache misses and bus congestion, but YMMV, SSFD. I'll run some more detailed tests as soon as these monster SoB units finish. I'm not sure if this app checkpoints properly, so I am afraid to stop these units. The last time I rebooted, I noticed that all 4 of the PPS LLR units that were running started over from 0%, and these SoB units use the same app. The elapsed time did not start over, but the progress bars went from reading in the 70s down to a first 'report' of 30-something percent around 3 minutes after the restart, and the units ran for almost twice as long as normal. I can tolerate that on 5-10 minute workunits, but not on workunits that are only around 5.1% to 5.2% done after over 18 hours.
EDIT: boy was I really slow.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,382,143 RAC: 302,741
|
My four SoB WUs have working sets of 52 MB each. I'm also running 32 bit Vista SP2, on a Q6600 (same as you), so it seems kind of odd that yours is so different.
Then again, I seem to recall the working sets being a different size, so perhaps it varies over time.
My WUs are checkpointing every 15 CPU minutes. I have my minimum checkpoint interval set to 300 seconds (5 minutes), but I don't think that's affecting this application's checkpointing, although I could be wrong. Anyway, it does checkpoint, but not every minute like some other projects do.
IIRC, when running PPS (LLR), it also did checkpoints every 15 minutes, which meant it never checkpointed on the Q6600 since it took less than 15 minutes to run. It did checkpoint at 15 minutes on slower computers.
I see you're running 6.10.18 on that machine. There's a relatively new feature available in the BOINC manager. If you select a task and hit the "Properties" button, a dialog box is displayed that shows you information about the selected task. One of the things it tells you is when the last checkpoint was.
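As a rough illustration of the checkpoint behaviour described in this post, here is a minimal Python sketch. It assumes checkpoints are written at fixed multiples of CPU time (every 15 minutes, as observed above). The function names are made up for illustration and are not part of LLR or BOINC, and the 7.5 minute run time is the PPS figure reported earlier in the thread.

```python
def checkpoints_written(run_minutes: float, interval_minutes: float = 15.0) -> int:
    """Number of checkpoints a task writes, assuming one per full interval of CPU time."""
    return int(run_minutes // interval_minutes)


def max_lost_minutes(run_minutes: float, interval_minutes: float = 15.0) -> float:
    """Worst-case CPU time lost to an interruption under the same assumption."""
    return min(run_minutes, interval_minutes)


# A PPS (LLR) task that finishes in about 7.5 minutes never reaches a checkpoint.
print(checkpoints_written(7.5))
# A multi-day SoB task risks at most one interval of work per interruption.
print(max_lost_minutes(10 * 24 * 60))
```

Under that assumption, a restart costs a long SoB task at most 15 minutes of CPU time, while a short PPS task always starts over.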
____________
|
|
|
This slowdown also occurs with Prime95. Exactly what the cause is, I am not sure. I find that running only two or three cores increases the speed by as much as 10%.
These applications cause the CPU temps to rise significantly, and it may be the built-in Intel protection mechanism kicking in. I suggest you download one of the core temp monitoring apps and see what the temps are doing.
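To put the 10% figure in perspective, here is a back-of-the-envelope Python sketch. It assumes the 10% applies to the speed of each running task, which the post does not state explicitly, and the function name is hypothetical.

```python
def relative_throughput(active_cores: int, per_task_speedup: float) -> float:
    """Total work per unit time, relative to one task running at the slowed 4-core speed."""
    return active_cores * (1.0 + per_task_speedup)


baseline = relative_throughput(4, 0.0)      # four tasks at the slowed-down speed
three_cores = relative_throughput(3, 0.10)  # three tasks, each 10% faster
two_cores = relative_throughput(2, 0.10)    # two tasks, each 10% faster

print(f"4 cores: {baseline:.1f}, 3 cores: {three_cores:.1f}, 2 cores: {two_cores:.1f}")
```

Under that assumption each task finishes sooner on fewer cores, but the machine as a whole completes less work, so a 10% per-task gain would not by itself justify idling a core.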
____________
Warped
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,382,143 RAC: 302,741
|
This slowdown also occurs with Prime95. Exactly what the cause is, I am not sure. I find that running only two or three cores increases the speed by as much as 10%.
These applications cause the CPU temps to rise significantly, and it may be the built-in Intel protection mechanism kicking in. I suggest you download one of the core temp monitoring apps and see what the temps are doing.
Not a bad guess, but I'm not seeing core temps any higher than with other applications. Clock speed is normal.
____________
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,382,143 RAC: 302,741
|
Perhaps this might help in figuring out what the problem is:
So I have these four SoB WUs running, and they slowed down.
You know what also slowed down by about 15%? The GPUGRID task running on my GPU. The slowdown we see in the LLR processes appears to also be affecting the CPU thread of the GPUGRID application.
The GPUGRID application hasn't changed in a while, and consistently runs at around 77% GPU utilization (according to GPU-Z). It's now running at about 60%, and fluctuating more than it normally does.
I suspended the four SoB tasks, one by one. As each was suspended, the GPU load went up. Even a single SoB task caused the GPU to slow down. That's really odd since it's just a single thread running on a quad core system with three other cores almost idle.
It should be noted that other applications don't cause this slowdown.
____________
|
|
|
I've got 4 running on my PII 940 OCed @ 3.5Ghz. From my general calculations, it appears that the WUs should run about 145-150 hours, or just over 6 days. That is far less than the advertised 10+ days. I assume slower computers are really going to have a hard time with these WUs. |
|
|
|
I've got 4 running on my PII 940 OCed @ 3.5Ghz.
You overclocked a Pentium II to 3.5 Ghz!!!
Very impressive ;)
|
|
|
|
I've got 4 running on my PII 940 OCed @ 3.5Ghz.
PII has 4 threads?
|
|
|
|
I think he is talking about a Phenom II 940 |
|
|
|
Yes, sorry, it would have been clearer if I had said AMD PII...
Update... 64.5 hours and 45% done. Finishing in around 143 total hours now. |
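The estimate in the update above is consistent with a simple linear extrapolation from elapsed time and percent done. A minimal Python sketch follows, assuming progress continues at the average rate so far (the function name is made up for illustration):

```python
def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation: total run time if progress continues at the average rate so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


total = projected_total_hours(64.5, 45.0)
print(f"projected total: {total:.0f} hours ({total / 24:.1f} days)")
```

With 64.5 hours elapsed at 45% done, this gives roughly 143 hours. The projection will drift if the tasks slow down or speed up as other work units start or finish.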
|
|
|
Finished 1 so far, at around 152 hours. Now if I could get a wingman to complete theirs...
|
|
|
Got the first batch of SoB results in from two computers...
Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz (OCed to 3.5GHz): ~120 hours or 5 days
http://www.primegrid.com/workunit.php?wuid=103764285
Intel(R) Xeon(R) CPU X5472 @ 3.00GHz: ~200 hours or 8.3 days
http://www.primegrid.com/workunit.php?wuid=103654824
Both computers running 4 cores of SoB (hyperthreading turned off for i7).
As others have noted, there is a HUGE advantage for a Core i7 here.
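One crude way to separate clock speed from everything else in these two results is to multiply run time by clock speed. The Python sketch below uses only the figures reported in this post. The "GHz-hours" metric and the function name are illustrative assumptions, and the comparison ignores memory, cache, and platform differences between the two machines.

```python
# Run times and clock speeds as reported in the post above.
results = {
    "Core i7 920 @ 3.5 GHz": {"hours": 120.0, "ghz": 3.5},
    "Xeon X5472 @ 3.0 GHz": {"hours": 200.0, "ghz": 3.0},
}


def ghz_hours(hours: float, ghz: float) -> float:
    """Clock-normalised cost of one work unit: run time multiplied by clock speed."""
    return hours * ghz


for name, r in results.items():
    print(f"{name}: {r['hours'] / 24:.1f} days, {ghz_hours(r['hours'], r['ghz']):.0f} GHz-hours")

i7 = results["Core i7 920 @ 3.5 GHz"]
xeon = results["Xeon X5472 @ 3.0 GHz"]
wall_clock_ratio = xeon["hours"] / i7["hours"]
per_clock_ratio = ghz_hours(**xeon) / ghz_hours(**i7)
print(f"wall-clock ratio: {wall_clock_ratio:.2f}, per-clock ratio: {per_clock_ratio:.2f}")
```

On these numbers the Xeon takes about 1.67 times as long in wall-clock terms and about 1.43 times as long per clock, so the overclock accounts for only part of the gap.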
____________
|
|
|
|
Crap, looks like my dual Xeon (single core) 3GHz machines are going to take 70+ days.
I'll let it run a few more days to see if other WUs finishing up are involved in the slowing. |
|
|