Message boards :
Seventeen or Bust :
Long estimates to completion time
Jay
Joined: 27 Feb 10 Posts: 136 ID: 56067 Credit: 65,749,514 RAC: 11,920
                    
|
Hello,
I usually run Seventeen or Bust tasks and lately some of the larger tasks are taking about 410+/- hours on my machine. For most of these the "To completion" time is pretty accurate.
3 of the last 4 tasks though are giving very odd, extra long completion time estimates, in the 2,000 hour time range.
I've checked one of these and over 11 hours my machine progressed 2.69% of the way through the task. Over 48 hours it progressed 11.81%. Based on those numbers I'd expect the total task to take about 408 hours, which is similar to the estimate I usually get with new tasks.
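The extrapolation above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming progress is roughly linear over the whole task; the function name is made up for illustration.

```python
def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Extrapolate total runtime from elapsed time and percent complete.

    Assumes the task progresses at a constant rate (an assumption, not a
    guarantee for every task).
    """
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_hours * 100.0 / percent_done


if __name__ == "__main__":
    # The two measurements quoted in the post: (hours elapsed, percent done).
    for elapsed, pct in [(11, 2.69), (48, 11.81)]:
        total = projected_total_hours(elapsed, pct)
        print(f"{elapsed} h at {pct}% done -> about {total:.0f} h total")
```

Both measurements land within a few hours of each other, which is why the 2,000 hour figure looks like an estimation problem rather than a real slowdown.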
Any ideas why the estimates indicate such huge times to completion, especially when progress appears to continue at the expected rate? Restarting the machine and manually forcing a benchmark while the machine was otherwise mostly idle have not helped.
Thanks,
Jay | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,339,677 RAC: 301,055
                               
|
I actually know exactly what the problem is. What I don't have, however, is a satisfactory solution.
First of all, nothing whatsoever has changed with the SoB work units. The problem is that you've been running the CUDA-based ppsieve tasks. The time estimates for those tasks are way, way too low. When one of them completes, your BOINC client says "Oh, I'm doing PrimeGrid tasks slower than the estimates say I should, so I'm going to raise the correction factor for PrimeGrid." What's important to note is that BOINC isn't saying "I'm doing ppsieve-CUDA too slow," it's saying "I'm doing PrimeGrid too slow."
This correction factor (the duration correction factor, or "DCF") applies to all of PrimeGrid, not to individual sub-projects. When raising the factor, BOINC applies 100% of the increase at once, to try to prevent deadline overruns immediately. When lowering it, however, it only moves part of the way, around 10%. So if you shut down your CUDA work, you would need to complete a number of CPU tasks before the estimates return to normal.
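To get a feel for how slowly the factor comes back down, here is a rough Python model of the behaviour described above. It is a simplified sketch, not BOINC's actual code: the update rule (jump straight up, move 10% of the way down per completed task), the function names, and the starting value of 5.0 are all assumptions chosen for illustration.

```python
def update_dcf(dcf: float, ratio: float, down_step: float = 0.1) -> float:
    """Return the new correction factor after one task completes.

    ratio is actual runtime divided by the uncorrected estimate.
    Simplified model: raise fully at once, lower only part of the way.
    """
    if ratio > dcf:
        return ratio                         # full increase, immediately
    return dcf + down_step * (ratio - dcf)   # only 10% of the way down


def tasks_until_recovered(dcf: float, ratio: float, tolerance: float = 0.10) -> int:
    """Count completed tasks until dcf is within `tolerance` of `ratio`."""
    count = 0
    while dcf > ratio * (1 + tolerance):
        dcf = update_dcf(dcf, ratio)
        count += 1
    return count


if __name__ == "__main__":
    dcf = update_dcf(1.0, 5.0)   # one badly underestimated task
    print("DCF after the bad task:", dcf)
    print("Accurate tasks needed to recover:", tasks_until_recovered(dcf, 1.0))
```

Under these assumed numbers a single underestimated task inflates the factor in one step, while getting back to within 10% of normal takes a few dozen accurately estimated tasks. With tasks that run for hundreds of hours each, waiting it out is not very practical.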
Alternatively, after shutting down the CUDA work, you could shut down BOINC and manually edit the client_state.xml file to set the DCF back to a reasonable value.
One of the nasty side effects of this problem is that those 2000 hour estimates are going to put newly downloaded tasks in high priority mode. When that happens, they'll preempt the almost-finished tasks. The result will be that the only time BOINC will finish a task is when it gets close to its deadline and it too is in high priority mode (this time, legitimately).
There are only a few things you can do to fix this.
1) Don't run the CUDA tasks.
2) Run CUDA and CPU tasks on different computers, or at least different virtual computers. You could create a virtual machine using VMware or other virtualization software. Then you can run the CUDA tasks on the real machine and the CPU tasks in the VM. That way two different copies of BOINC are running and each has its own DCF. The CUDA tasks have to run on the real machine because VMs (at least VMware) don't virtualize the CUDA drivers.
3) There might be a way to create an anonymous platform application for the CUDA work, where you could set the estimates manually to avoid this problem. This requires a lot of manual configuration, and you'll need to manually replace the .exe every time it gets upgraded.
It might be possible for this problem to be fixed by the project admins, but I'm not certain about that. But right now, this is a big problem.
The root cause of the problem is a design flaw in BOINC itself, specifically having a single DCF for the entire project rather than for individual applications. Good luck getting Berkeley to fix that, however. They seem less in touch with what projects and users really need these days than the aliens they've been searching for.
____________
My lucky number is 75898^524288+1 | |
|
Jay
Joined: 27 Feb 10 Posts: 136 ID: 56067 Credit: 65,749,514 RAC: 11,920
                    
|
Thanks for the information. I did try downloading a batch of the CUDA tasks just to see how my machine handled them. I then updated the drivers and tried a second batch. I wasn't real happy with how they were impacting other things so I was planning on not doing them again for awhile.
I will try setting the DCF back to a more reasonable value in the client_state.xml file. I have no idea what it used to be though. Since I went from an estimated completion time of about 400 hours to about 2000 hours does the DCF need to change by a factor of 5? Or is there some other relationship I can use to estimate a better DCF value?
They seem less in touch with what projects and users really need these days than the aliens they've been searching for.
I'm starting to wonder if they'll find an alien before someone finds another SOB prime. It's been awhile, although SOB has 11 hits while the aliens have none.
Thanks again for the help | |
|
Jay
Joined: 27 Feb 10 Posts: 136 ID: 56067 Credit: 65,749,514 RAC: 11,920
                    
|
I will try setting the DCF back to a more reasonable value in the client_state.xml file. I have no idea what it used to be though. Since I went from an estimated completion time of about 400 hours to about 2000 hours does the DCF need to change by a factor of 5? Or is there some other relationship I can use to estimate a better DCF value?
I did end up changing the DCF by a factor of 5. So far it looks like the completion time values are back to what I had expected. I'll see what happens when the next one completes and the DCF is updated by the BOINC client.
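For anyone wanting to do the same rescaling, the idea is that the displayed estimate scales with the DCF, so the new value is the old value times (expected hours / displayed hours). Below is a minimal Python sketch of that edit. The tag names (`<project>`, `<master_url>`, `<duration_correction_factor>`), the sample contents, and the function name are assumptions about the file layout, so check them against your own client_state.xml, and only edit the file while BOINC is shut down.

```python
import re

# Assumed tag names; verify against your own client_state.xml.
PROJECT_BLOCK = re.compile(r"<project>.*?</project>", re.DOTALL)
DCF_TAG = re.compile(
    r"(<duration_correction_factor>)([^<]+)(</duration_correction_factor>)"
)


def rescale_dcf(state_text: str, project_url: str,
                shown_hours: float, expected_hours: float) -> str:
    """Scale the DCF of the project whose block mentions project_url."""
    scale = expected_hours / shown_hours

    def fix_dcf(match: re.Match) -> str:
        new_value = float(match.group(2)) * scale
        return f"{match.group(1)}{new_value:.6f}{match.group(3)}"

    def fix_block(match: re.Match) -> str:
        block = match.group(0)
        if project_url not in block:
            return block              # leave other projects untouched
        return DCF_TAG.sub(fix_dcf, block)

    return PROJECT_BLOCK.sub(fix_block, state_text)


# Hypothetical excerpt, for illustration only.
SAMPLE = """<client_state>
<project>
    <master_url>http://www.primegrid.com/</master_url>
    <duration_correction_factor>5.000000</duration_correction_factor>
</project>
</client_state>"""

if __name__ == "__main__":
    # Estimates showed about 2000 hours where about 400 were expected.
    print(rescale_dcf(SAMPLE, "primegrid.com", 2000, 400))
```

The sketch works on a string rather than the real file on purpose: it is easier to inspect the result first and write it back by hand after making a backup copy.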
| |
|
|
I am running my first Seventeen or Bust task on this older Centrino Duo laptop. I had turned that task OFF in my preferences months ago due to never finishing such a long task. However, I recently decided to turn it back on. The current SOB task is 1348 hours with a deadline 32 days out, and is running in high priority mode. Based on this estimate, which I know is inaccurate, it would finish about 20 days late. I have decided to keep it running and would appreciate your comments on my questions below.
I won't get new tasks for Einstein and Milky Way. Will this help?
Can I configure both CPUs to run just the one task, in mid-progress? I have two processors. BOINC is currently set to use 100% of the processors, at 100% of the CPU time. Do I change one of these settings to 200%?
Will checking the "use GPU" box help?
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,339,677 RAC: 301,055
                               
|
I am running my first Seventeen or Bust task on this older Centrino Duo laptop. I had turned that task OFF in my preferences months ago due to never finishing such a long task. However, I recently decided to turn it back on. The current SOB task is 1348 hours with a deadline 32 days out, and is running in high priority mode. Based on this estimate, which I know is inaccurate, it would finish about 20 days late. I have decided to keep it running and would appreciate your comments on my questions below.
I won't get new tasks for Einstein and Milky Way. Will this help?
SoB only takes up one core, so you can run something else on the other core. Just make sure SoB has at least a 50% resource share so it runs 24/7 without being interrupted by the other projects. Then again, if you only run one core, your machine will run cooler, which may prevent it from slowing down if it overheats.
I'm assuming this is a laptop? You'll want to keep it running 24/7 and plugged in all the time, and set to stay in high power mode. Otherwise, this will take even longer. As it is, I suspect that even running full throttle without interruption you'll barely make the deadline. If you run on batteries, let the laptop get too hot (it will slow itself down), or don't run SoB 24/7, I don't think you'll even come close to the deadline.
Can I configure both CPUs to run just the one task?
No, not possible.
Will checking the "use GPU" box help?
No.
|
|
It is a laptop, and I will keep it running 24/7 on one core.
I will let the other tasks complete, and not get new tasks.
This should keep things cool, and not overheat.
Hopefully it will complete on time.
Thanks Mike for your kind reply.
~Charles
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,339,677 RAC: 301,055
                               
|
I would also suggest the following, in order to keep the laptop cool (both to protect the laptop, and to keep it from slowing down):
* Leave the laptop lid open (it helps radiate heat away from the keyboard surface)
* Elevate the corners of the laptop somewhat -- either the back only, or the entire laptop -- by about an inch. This helps with airflow at the cooling vents.
* Get a portable desk fan, and have it blowing on the laptop.
I don't usually crunch on laptops (IMHO they're just not built to handle the constant high-temperature operation this requires), but when I do, that's what I do to keep the temps down.
My son was complaining that he was experiencing noticeable slowdowns when playing some games for extended periods of time. His laptop has a Core2Duo chip, which is one of the coolest running CPUs around short of the netbook Atom processors. Even so, he was still having temperature problems with it. A little attention to keeping the laptop cooler solved his problems, though, and I strongly recommend that people who crunch on laptops do the same. That goes double if you're running LLR tasks, since they run hotter than the sieve tasks. (I'll spare you the very long explanation of why that is.)
|