Message boards :
Seventeen or Bust :
SoB tasks can really be a S.O.B. to accomplish
I'm finding this to be quite a challenging sub-project. It takes a serious amount of CPU time: even multithreaded across 8 cores, a single task takes me over 2 days.
I've gotten a few invalid tasks as well as a couple inconclusive tasks.
http://www.primegrid.com/results.php?userid=172547&offset=0&show_names=0&state=3&appid=13
http://www.primegrid.com/results.php?userid=172547&offset=0&show_names=0&state=6&appid=13
At between 1.2 and 1.3 MILLION seconds of CPU time per task, it can be frustrating. But... this makes me want to try harder, as it's more difficult to achieve.
So SoB can be a S.O.B. at times but I do feel it is worth the effort. So let the crunching continue!
This is a great incentive - "This project has a 50% long job credit bonus and a 10% conjecture credit bonus." | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14011 ID: 53948 Credit: 433,157,676 RAC: 1,017,804
|
If it were me, I would fix whatever is wrong with the computer. Might as well abort everything that's in progress since there's a good chance that they're already corrupted and further processing on them would just be wasted.
If the CPU or memory is overclocked, go back to base clocks and see if you stop getting bad results.
The next thing to try is removing the RAM and cleaning the contacts. Often, just re-seating the RAM can fix the problem, but if you're going to do that you might as well grab a pencil eraser and clean off the contacts while you have the memory out.
It is not normal to get even a single bad result, and you have 3 and probably 5, out of a total of 9.
____________
My lucky number is 75898^524288+1 | |
|
Monkeydee Volunteer tester
Joined: 8 Dec 13 Posts: 540 ID: 284516 Credit: 1,529,047,472 RAC: 770,698
|
I had an FX-8120 a few years back that took over a week per SoB unit, but there was no MT and I ran 7 at a time on it. The newer Intels, like your i5-6400, are significantly faster at SoB than the AMD FX line.
That said, you shouldn't be getting errors no matter how long the task takes. It might be a good idea to do some testing on that FX-8350 system to see what the trouble could be.
____________
My Primes
Badge Score: 4*2 + 6*2 + 7*4 + 8*9 + 11*3 + 12*1 = 165
| |
|
|
Hmm, thanks for the input. I will look into it. Last May and June, when those results occurred, I was running A LOT of different stuff at once, including different projects. I was also starting and stopping them often. Now I have mainly one task going instead of multiple ones, and I am running them straight through. I will see what happens. I also have some running on other machines. The memory is new and not overclocked at all.
The only bad results have been on these SoB tasks. There are also a GCW task and a Cullen task marked inconclusive, but another machine, not mine, had the same issue on those.
Everything else seems to be working well and without error issues aside from a few PPS sieve tasks on another machine which have been corrected and some GPU tasks on this machine which have also been corrected.
I will try returning to base clock speeds. | |
|
|
So far this is the only known trouble I've had. I will run mem86 or whatever that is to check my memory when I get a chance. For now I returned everything to base speeds and will restart the one task on this machine and see what happens now. | |
|
|
Windows memory diagnostic tool, which I just found out about, detected no errors.
Base CPU clock restored.
Current SoB task aborted, new started. Will see how the next few turn out. | |
|
Dave
Joined: 13 Feb 12 Posts: 3207 ID: 130544 Credit: 2,285,331,764 RAC: 787,864
|
A few years ago I had 4 SoBs going & after stopping & restarting several times over their 4-day period, all of a sudden upon restarting again they all went BOOM & blew up. They were 3 days in, so 12 days of processing were wasted. | |
|
|
lol, yep, that could be similar to my issue. It's hard to keep them running for 2-3 days on this main machine with other things going on. I will try. | |
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3233 ID: 50683 Credit: 151,443,349 RAC: 99,549
|
One more thing about AMD CPUs (this has happened to me several times).
For some unknown reason, an AMD CPU starts to freeze from time to time, then works perfectly for a few days.
Solution: remove the cooler, remove the CPU from the socket, put the CPU back in the socket, and put the cooler back on. I did that, and for the next two years there was no problem at all. So it looks like something shifts in the socket (maybe from fan vibration, or from the heating and cooling cycles when the computer is switched on and off).
I don't know if that is the case with you, but it can't hurt to try.
The Windows memory diagnostic will not find many errors at all. Use memtest86+: it is a very good tool and will find far more errors than the Windows diagnostic tool.
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! | |
|
|
Thanks for the tip about the CPU and memtest86+. I don't think it's a socket issue. And memtest86+ was my original choice. I will run that through a few complete passes and see what the results are. Thanks again.
I am saving for a major upgrade within the next year or two. Basically everything except the HDDs, Blu-ray drive, and the video card. Going to go Intel for this next build for sure. Just better number crunching.
| |
|
|
memtest86+ is running on the 8350 machine. The first pass won't be done before I go to work, so I'm going to let it run several passes as I mentioned. It takes about 100 min to check the 32 GB. Will see what the results are by this evening.
Thanks again for the help everyone. | |
|
|
7 hrs later, no memory errors detected. Back to crunching. | |
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3233 ID: 50683 Credit: 151,443,349 RAC: 99,549
|
So the memory is not faulty, but the problem still exists. And with tasks as long as SoB, you throw away a lot of time for nothing if a WU completes and is then marked as invalid.
Stop doing SoB, take the shortest tasks, and observe your computer. You can go back to SoB any time later.
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! | |
|
|
The CPU is back at stock clock speeds, so I still need to run more SoB tasks to see whether overclocking caused the issue. | |
|
|
As Michael Goetz said above, it seems to be this AMD 8350 machine. I don't know what is causing the problems, but it seems to just not be a good machine for SoB tasks. On the two other Intel machines, no issues. I checked the RAM, I returned the CPU to stock speeds, and still had issues with the main machine. It is my main machine and I do a lot of different things with it so no more SoB for now with that machine.
I just got my GOLD badge for SoB so I will be crunching other sub-projects with this AMD 8350.
Thank you everyone for your suggestions, ideas, and support.
| |
|
|
Penguin wrote:
memtest86+ running on the 8350 machine.
7 hrs later, no memory errors detected. Back to crunching.
At work, we had a PC on which some application programs kept crashing at random. Not very frequently, but enough to be perceived as a problem to be corrected. Memtest86+ took more than a day to hit a memory fault on this machine. We replaced the RAM and the problems were gone.
The difference between SoB-LLR and the other LLR subprojects at PrimeGrid is that SoB currently has the largest FFT sizes, AFAIK. (Somebody correct me if I'm wrong about this.)
FX-8350 (i.e. 4M/8C Piledriver) has got 2 MB L2 cache per module plus 8 MB L3 cache shared across all modules. (I can't find right now whether it's inclusive or exclusive L3.) This amount of cache is less than half of what a single SoB-LLR task needs for its "hot" data (see e.g. this post), but enough for most of the other LLR subprojects (if the bigger ones among them are configured for running only 1 task at a time on desktop processors such as yours). This means that SoB-LLR needs to read and write RAM a lot more frequently than the other LLR subprojects. Combined with long durations per task, this makes for a lot more memory accesses per task in comparison to other LLR subprojects.
Hence an unstable RAM module, or bad contact somewhere between CPU and RAM, could still be candidates for the errors which you saw, IMO. | |
|
|
I am having problems getting S.O.B. jobs to finish within the deadline. I currently run a stock (non-overclocked) i7-8550U with 12 GB of main memory, although not multi-threaded. I currently have a task that was due on Dec 26, 2018. It is 86.49% complete with 4 days to go showing (that figure does not come down as fast as it should, though...), and it shows as running for 28 days so far! Any hints?
____________
| |
|
Monkeydee Volunteer tester
Joined: 8 Dec 13 Posts: 540 ID: 284516 Credit: 1,529,047,472 RAC: 770,698
|
The original deadline may have been December 26th, but there are variable deadlines in place. So you can look on your Tasks page and see that the current deadline is January 11th.
The initial deadline is currently 35 days and it will extend out to 140 days as long as you continue to do work. Even if you pass the 140 days there is a possibility (not a guarantee) of still being able to return that task for full credit.
Are you running 24/7 or just some hours of the day?
Multithreading can definitely help reduce the run time by a significant amount. So it is recommended for larger tasks where meeting the initial deadline can be an issue.
You have a quad core, so in theory you can reduce the run time by up to 75% by running the one task on all four cores. However, this will generate more heat, and since that's a "U" CPU it's probably in a small device with little ventilation. So keeping an eye on the temperatures would be a good idea.
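As a rough illustration of where that "up to 75%" comes from, here is a small sketch of the arithmetic. It assumes run time scales with the number of cores times an efficiency factor; the function name and the efficiency parameter are made up for illustration, and real scaling depends on cache, RAM bandwidth and how hard the CPU throttles.

```python
def runtime_reduction(cores: int, efficiency: float = 1.0) -> float:
    """Return the fraction of single-core run time saved by using `cores` cores.

    `efficiency` is a hypothetical scaling factor: 1.0 means perfectly linear
    scaling, lower values model thermal throttling or memory contention.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    speedup = cores * efficiency
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    for cores in (1, 2, 4):
        saved = runtime_reduction(cores)
        print(f"{cores} core(s): run time reduced by {saved:.0%}")
```

With four cores and linear scaling this gives 75%. If throttling cut the efficiency to 0.8, the same four cores would save about 69% instead.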
____________
My Primes
Badge Score: 4*2 + 6*2 + 7*4 + 8*9 + 11*3 + 12*1 = 165
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14011 ID: 53948 Credit: 433,157,676 RAC: 1,017,804
|
The following two paragraphs apply to a "normal" usage pattern where you would be running 8 SoB tasks on your computer. I realize that doesn't describe your situation; I'll get to that afterwards.
You have a hyperthreaded CPU (4 cores/8 threads), and hyperthreading hurts overall throughput. If you run 8 LLR tasks (such as SoB) at once, they take MORE than twice as long as running just 4 tasks at once. If you haven't already done so, you should set your boinc preferences to "use only 50% of the processors" to effectively turn off hyperthreading.
If you also set up multi-threading, running one task consuming the 4 cores, you'll find that the task actually runs MORE than 4 times as fast. Between disabling hyperthreading and using multi-threading, you'll get tasks that run more than 8 times faster.
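The "use only 50% of the processors" setting is normally changed in BOINC Manager or in the web preferences. For anyone who prefers editing a file, here is a sketch that builds the equivalent local override. It assumes the client reads a global_prefs_override.xml from its data directory and that the element is called max_ncpus_pct; please check both against the BOINC documentation before relying on them. The helper name is made up.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(cpu_percent: float) -> str:
    """Build the text of a BOINC local preferences override.

    Assumption: the client honours <max_ncpus_pct> inside
    <global_preferences> in global_prefs_override.xml.
    """
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in (0, 100]")
    root = ET.Element("global_preferences")
    ET.SubElement(root, "max_ncpus_pct").text = f"{cpu_percent:.1f}"
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 50% of 8 logical CPUs leaves 4 for BOINC, one per physical core.
    print(build_prefs_override(50))
```

The script only prints the text. Saving it into the BOINC data directory and telling the client to re-read its local preferences is left to the reader.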
Now back to your specific situation. There's no way your computer should take that long to run an SoB task.
There has to be some reason that the SoB task is running so slowly, and it's not anything that's visible from our end. It's up to you to describe what else the computer is doing, or what else is running on the computer, if you want help from here. We simply don't have the visibility into what else is happening in your computer.
You have only one task running on there. I don't know if the rest of the computer is idle, or if you're running tasks from other projects. What I can say is that I have a much older and slower Haswell i3 laptop that can do an SoB in 5 to 6 days using 2 cores, so, at most, your computer should need less than 12 days to run an SoB on a single core.
Looking in the database, computers with the same CPU as yours have completed SoB tasks using between 8 and 13 days of CPU time. Your computer has been working on this task for 44 days. Something's clearly wrong.
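The comparison above can be written out as a few lines of arithmetic. This is only a sketch using the figures quoted in this post; the function names are made up, and multiplying wall time by cores is a rough approximation that ignores how well multi-threading actually scales.

```python
def core_days(wall_days: float, cores: int) -> float:
    """Rough total core time: wall-clock days multiplied by cores used."""
    return wall_days * cores


def slowdown(observed_days: float, expected_days: float) -> float:
    """How many times longer the observed run is than the expected one."""
    return observed_days / expected_days


if __name__ == "__main__":
    # Older i3 laptop: up to 6 days on 2 cores.
    print(f"single-core bound: about {core_days(6, 2):.0f} days")
    # 44 days so far, against 8 to 13 days for the same CPU in the database.
    print(f"slowdown vs slowest peer: {slowdown(44, 13):.1f}x")
    print(f"slowdown vs fastest peer: {slowdown(44, 8):.1f}x")
```

By these numbers the task is running roughly 3.4 to 5.5 times slower than the same CPU manages elsewhere, which is why something else on the machine is the likely cause.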
____________
My lucky number is 75898^524288+1 | |
|
|
I wonder whether multithreading makes that much of a difference on a KBL-R CPU which is power-limited down to 10/15/25(?) W.
KBL-S at 4.2 GHz on dual-channel RAM, with LLR's multithreading enabled, completes a SoB task in less than a day. Running SoB-LLR single-threaded but nothing else on this CPU at the same clock takes about 3 times as long.
Let's say you run the single-threaded SoB-LLR task and nothing else on your KBL-R at 1 GHz (with dual-channel RAM I presume). This should take less than 12 days. This coincides with what Michael saw in the database. | |
|
|
To answer the questions raised:
1) yes, that computer runs 24/7, power failures and Micro$oft Win 10 updates permitting.
2) that computer is also (currently) running 7 SETi@home CPU tasks, and a SAH Beta GPU task.
3) I don't know if I have dual-channel RAM with two sticks of RAM in this computer - it is under warranty, and I don't want to unscrew the back to find out... With 12 GB, I probably have an 8 GB and a 4 GB SODIMM stick, so I only have the benefit of dual channel on 8 GB...
Notice that I say "that computer", as I have 7 currently running (5 laptops), with at least another three waiting for me to fix them! (I just got back from a 'round the world trip on Dec 26, and am only now getting my biological clock back on my home time zone {American Pacific Standard Time... [there is an Australian Pacific time zone, too, as I understand it...]})
When I gave my figures, I had to use the figure in the "elapsed" column, as I don't know when this task actually started, not being home at the time (see last paragraph...) that the WU started. I would not be surprised if the discrepancy is because the WU started and was then laid aside for other higher-priority WU's, not necessarily from PrimeGrid, or because PrimeGrid ran out of allotted "Resource Share" time... (as I recall, I got 8 PG WU's at once, only one of which was this SOB task...) | |
|
|
So, do you run the SoB-LLR task together with SETI@home CPU tasks and/or with a S@H GPU task? All of these are memory intensive, and degrade each other's performance.
My suggestion:
1. Suspend the SoB task.
2. Finish all SETI tasks and don't fetch more.
3. Create a projects\www.primegrid.com\app_config.xml which specifies multithreading for llrSOB. See e.g. the top of the current Conjunction of Venus & Jupiter Challenge forum thread for an example file.
4. Shut down and restart the boinc-client to bring the app_config.xml into effect.
5. Resume the SoB task and run it exclusively. Double-check in Task Manager that the task is indeed using more than one core.
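For step 3, here is a sketch that generates such a file instead of typing it by hand. The element names follow BOINC's app_config.xml format as I understand it, and "-t" is assumed to be LLR's thread-count option; the max_concurrent line and the helper name are my own additions, so compare the output with the example in the challenge thread before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, threads: int) -> str:
    """Build app_config.xml text that runs one multithreaded task of an app.

    Assumptions: <cmdline>-t N</cmdline> sets the LLR thread count and
    <avg_ncpus> tells the BOINC scheduler how many CPUs the task occupies.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = "1"  # one task at a time
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "cmdline").text = f"-t {threads}"
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # llrSOB on the four physical cores of a 4C/8T CPU.
    print(build_app_config("llrSOB", 4))
```

The printed text would go into projects\www.primegrid.com\app_config.xml, after which steps 4 and 5 apply as written.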
Most of PrimeGrid's LLR based subprojects run very poorly if not multithreaded (or rather: if they run together with other tasks and have to fight them for cache and for RAM bandwidth). On another note, I was never satisfied with boinc-client's built-in task scheduling policy whenever I had a mix of singlethreaded and multithreaded applications. Therefore I never mix them anymore; when I want to run a multithreaded application, I run it exclusively and have all other BOINC projects disabled. | |
|
dukebg Volunteer tester
Joined: 21 Nov 17 Posts: 242 ID: 950482 Credit: 23,670,125 RAC: 0
|
I don't know if I have Dual-channel RAM with two sticks of RAM in this computer - it is under warranty, and I don't want to unscrew the back to find out...
You don't need to look at it physically to find out at all. Actually, how would you even tell by looking into the case if the stick is single-channel or dual-channel...
Instead, download and run CPU-Z and have a look at the Memory tab there. You can also look at the SPD tab to see details about which stick is in which slot. | |
|