Message boards : Seventeen or Bust : "current CPU time" reset
I just noticed that this task did something odd: "current CPU time" (and "checkpoint CPU time") have been reset. They now show ~1.5 days instead of 20+ days. stdoutdae.txt has this to say:
13-Mar-2015 06:53:12 [---] Suspending network activity - user request
13-Mar-2015 07:06:43 [PrimeGrid] Task llr_sob_222376598_3 exited with zero status but no 'finished' file
13-Mar-2015 07:06:43 [PrimeGrid] If this happens repeatedly you may need to reset the project.
13-Mar-2015 07:06:43 [---] [wfd] Request work fetch: application exited
13-Mar-2015 07:06:44 [PrimeGrid] Restarting task llr_sob_222376598_3 using llrSOB version 624
13-Mar-2015 07:33:37 [---] [wfd] Request work fetch: Backoff ended for ABC@home
and stderr.txt of WU (excerpt):
wrapper: running primegrid_llr -d
FFT length: 2400K
07:06:43 (25533): No heartbeat from core client for 30 sec - exiting
BOINC llr wrapper
Using Jean Penne's llr
Shall I be pessimistic and abort?
____________
I'm counting for science,
Points just make me sick. | |
|
stream (Volunteer moderator, Project administrator, Volunteer developer, Volunteer tester)
Joined: 1 Mar 14 Posts: 1033 ID: 301928 Credit: 543,608,970 RAC: 7,830
|
Assuming your hardware is OK, it's fine to continue, because LLR itself should restart from its own checkpoint. What does "Percent complete" say?
According to your logs, the LLR application "lost connection" with the BOINC core client for unknown reasons, exited cleanly, and was then restarted by the core client. It's not clear why this happened. Are you using any antivirus software? If so, exclude the BOINC data directory from scanning; the scanner can interfere by locking files and doing other bad things.
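To see whether this is a one-off or a recurring event, one option is to count how often the "exited with zero status but no 'finished' file" line shows up per task. The following is only a minimal sketch: the pattern is inferred from the log lines quoted in this thread, and the function name `count_unfinished_exits` is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the stdoutdae.txt lines quoted above (an assumption,
# not an official description of the BOINC log format).
PATTERN = re.compile(
    r"^(?P<stamp>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file"
)


def count_unfinished_exits(lines):
    """Return a Counter mapping task name to the number of matching exits."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            counts[match.group("task")] += 1
    return counts


# Sample input: lines copied from the log excerpt in the first post.
sample = [
    "13-Mar-2015 06:53:12 [---] Suspending network activity - user request",
    "13-Mar-2015 07:06:43 [PrimeGrid] Task llr_sob_222376598_3 exited with "
    "zero status but no 'finished' file",
    "13-Mar-2015 07:06:44 [PrimeGrid] Restarting task llr_sob_222376598_3 "
    "using llrSOB version 624",
]

for task, count in count_unfinished_exits(sample).items():
    print(task, count)
```

To run it against a real log, pass an open file instead of `sample`, for example `count_unfinished_exits(open("stdoutdae.txt"))`. A count that keeps growing for the same task would match the "If this happens repeatedly" warning in the log.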
| |
|
|
It shows a value consistent with 20+ days: 91.85%, estimated CPU time remaining: 2.19 days
____________
I'm counting for science,
Points just make me sick. | |
|
JimB (Honorary cruncher)
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,246,873 RAC: 201,706
|
I can make my Linux boxes do this repeatably by running a CPU-intensive program. In my case it's a multi-core PAR2 creator. BOINC will give the same "No heartbeat from core client for 30 sec" error and restart. I don't believe I've ever had a problem with the final result being screwed up, but I remember jobs aborting if that error happened enough times in a row. So if I want to run something like PAR2 creator now, I shut down BOINC first. | |
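The error message suggests a simple watchdog: the task expects a periodic heartbeat from the core client and exits if none arrives within 30 seconds. Presumably a CPU-hungry program can starve the client long enough to trip it. The sketch below only illustrates that idea; it is not BOINC's actual code, and the class name `HeartbeatWatchdog` and the injectable clock are assumptions made so the example can run without waiting.

```python
import time


class HeartbeatWatchdog:
    """Tracks the time since the last heartbeat (illustrative only)."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout      # seconds allowed between heartbeats
        self.clock = clock          # injectable so tests can fake time
        self.last_beat = clock()

    def beat(self):
        """Record that a heartbeat just arrived."""
        self.last_beat = self.clock()

    def expired(self):
        """True once more than `timeout` seconds passed without a beat."""
        return self.clock() - self.last_beat > self.timeout


# Demonstration with a fake clock instead of real waiting.
fake_now = [0.0]
dog = HeartbeatWatchdog(timeout=30.0, clock=lambda: fake_now[0])

fake_now[0] = 10.0
dog.beat()                  # heartbeat received at t = 10 s

fake_now[0] = 35.0
print(dog.expired())        # 25 s since the last beat: False

fake_now[0] = 41.0
print(dog.expired())        # 31 s since the last beat: True
```

In this model a delayed heartbeat looks exactly like a missing one, which would explain why the task exits and restarts from its checkpoint even though nothing is wrong with the computation itself.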
|
stream (Volunteer moderator, Project administrator, Volunteer developer, Volunteer tester)
Joined: 1 Mar 14 Posts: 1033 ID: 301928 Credit: 543,608,970 RAC: 7,830
|
It shows a value consistent with 20+ days: 91.85%, estimated CPU time remaining: 2.19 days
Yes, it was restarted from its checkpoint. There isn't much time left, so let it run. Let us know whether it validates: some projects require a minimum reported elapsed CPU time, and yours has been reset. I don't know whether that check is enabled for SoB, but I hope the PG team can help you if it turns out to be a problem. | |
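As a rough sanity check on those numbers: if the remaining-time estimate is assumed to scale linearly with progress (a simplification; the client's real estimate may be computed differently), then 91.85% done with 2.19 days remaining implies the elapsed time worked out below. The function name `implied_elapsed_days` is made up for this sketch.

```python
def implied_elapsed_days(fraction_done, remaining_days):
    """Elapsed time implied by a linear progress model (an assumption).

    If remaining = elapsed * (1 - f) / f, then elapsed = remaining * f / (1 - f).
    """
    if not 0.0 < fraction_done < 1.0:
        raise ValueError("fraction_done must be strictly between 0 and 1")
    return remaining_days * fraction_done / (1.0 - fraction_done)


elapsed = implied_elapsed_days(0.9185, 2.19)
print(f"{elapsed:.1f} days")  # about 24.7 days
```

That result fits the 20+ days of real work and not the ~1.5 days on the reset counter, which supports the view that only the displayed CPU time was lost and the computation itself resumed from the checkpoint.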
|
JimB (Honorary cruncher)
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,246,873 RAC: 201,706
|
Our credit system doesn't depend on CPU time expended. If the result is correct (and there's no reason it shouldn't be), the task will validate normally. | |
|
|
Well, that's not entirely so. Remember the 3-second error? Granted, that's not the case here, but... OK then, let's wait two days.
____________
I'm counting for science,
Points just make me sick. | |
|
|
I noticed a dramatic increase in 'No heartbeat ...' errors with BOINC version 7.4.36_windows_x86_64 on 3 different systems (2 Intel, 1 AMD). After I downgraded to 7.4.27_windows_x86_64, the problem disappeared. This doesn't apply to you, but it may help others who are using version 7.4.36_windows_x86_64 and having this problem too.
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
| |
|
|
It made it through. And CPU time has nothing to do with reality.
____________
I'm counting for science,
Points just make me sick. | |
|