PrimeGrid

Message boards : Generalized Cullen/Woodall prime search : GCW units not surviving client restart

River~~
Joined: 17 Mar 07
Posts: 338
ID: 6533
Credit: 11,333,866
RAC: 42,311
Message 122939 - Posted: 27 Nov 2018 | 16:02:18 UTC

I have a crop of GCW workunits that die at a client restart, usually after a reboot.

The problem is *nearly* replicable by using systemctl restart boinc-client.service. About one task in ten survives this test.

The symptoms are that, immediately on restart of the client, the workunit finishes, uploads, and goes into the wait for validation, which it inevitably fails in due course.

This is a loss of credit to me for the crunching done before the restart. I do not expect credit where work fails, but neither do I expect a restart to trigger the "end of task" sequence prematurely. It also means (as with any invalid work) that the original wingman gets mixed feelings: they are the first finisher, but the new wingman is not appointed until after the original wingman's task completes, so the new wingman has no chance of being first.

This seems specific to llrGCW:

I also tried this with AP, a variety of other LLR tasks, and two different GFN n values. All of these survive the restart without problem.

At present I have only seen this on my Qubes / Xen machine, but will not have time to check out this behaviour on my "real" machines for a few days, and will update this thread by the weekend after I have done those tests. My hunch, without testing, is that this will turn out to be another Qubes-specific issue.

I would be glad of any immediate comments. Has this kind of issue arisen before?

I did not see it before the server move, but I was not running llrGCW then, so I have no particular reason to suspect the migration.

R~~

GCW does not seem to have any huge memory requirements that might trigger this, but if there are significant differences from other LLR tasks please give me a heads up.


____________
My computers found:

9831*2^1441403+1 is a quadhectokilo prime, ie >400,000 digits ;)

252031090528237591 + 65521*149*23*19*17*13*11*7*5*3*2*n is prime for every n in { 0..20 } (an arithmetic progression of 21 primes)

River~~
Message 122940 - Posted: 27 Nov 2018 | 16:13:48 UTC
Last modified: 27 Nov 2018 | 16:43:33 UTC

Examples:

I restarted the client at about 1603 by the computer's clock while two llrGCW tasks were running, but without rebooting the virtual machine -- or the host ;).

This unit survived and carried on with about 43 mins elapsed. This WU has survived several restarts.

Whereas this one exited showing "success" as the outcome on the task's web page, but it cannot conceivably validate with a run time of only about 35 mins.

Here is the event log for this client start, showing the WU ending and being reported and showing the next one starting.

Tue 27 Nov 2018 16:03:15 GMT | | Starting BOINC client version 7.6.33 for x86_64-pc-linux-gnu
Tue 27 Nov 2018 16:03:15 GMT | | log flags: file_xfer, task, cpu_sched
Tue 27 Nov 2018 16:03:15 GMT | | Libraries: libcurl/7.52.1 OpenSSL/1.0.2l zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) libssh2/1.7.0 nghttp2/1.18.1 librtmp/2.3
Tue 27 Nov 2018 16:03:15 GMT | | Data directory: /var/lib/boinc-client
Tue 27 Nov 2018 16:03:15 GMT | | No usable GPUs found
Tue 27 Nov 2018 16:03:15 GMT | | Host name: G
Tue 27 Nov 2018 16:03:15 GMT | | Processor: 2 GenuineIntel Intel(R) Core(TM) m5-6Y54 CPU @ 1.10GHz [Family 6 Model 78 Stepping 3]
Tue 27 Nov 2018 16:03:15 GMT | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush acpi mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves
Tue 27 Nov 2018 16:03:15 GMT | | OS: Linux: 4.14.18-1.pvops.qubes.x86_64
Tue 27 Nov 2018 16:03:15 GMT | | Memory: 2.09 GB physical, 1024.00 MB virtual
Tue 27 Nov 2018 16:03:15 GMT | | Disk: 4.86 GB total, 1.65 GB free
Tue 27 Nov 2018 16:03:15 GMT | | Local time is UTC +0 hours
Tue 27 Nov 2018 16:03:15 GMT | | Config: GUI RPCs allowed from:
Tue 27 Nov 2018 16:03:15 GMT | PrimeGrid | URL http://www.primegrid.com/; Computer ID 941669; resource share 169
Tue 27 Nov 2018 16:03:15 GMT | PrimeGrid | General prefs: from PrimeGrid (last modified 10-Jul-2017 15:20:22)
Tue 27 Nov 2018 16:03:15 GMT | PrimeGrid | Computer location: Pluto
Tue 27 Nov 2018 16:03:15 GMT | PrimeGrid | General prefs: no separate prefs for Pluto; using your defaults
Tue 27 Nov 2018 16:03:15 GMT | | Reading preferences override file
Tue 27 Nov 2018 16:03:15 GMT | | Preferences:
Tue 27 Nov 2018 16:03:15 GMT | | max memory usage when active: 2119.44MB
Tue 27 Nov 2018 16:03:15 GMT | | max memory usage when idle: 2119.44MB
Tue 27 Nov 2018 16:03:15 GMT | | max disk usage: 1.78GB
Tue 27 Nov 2018 16:03:15 GMT | | suspend work if non-BOINC CPU load exceeds 65%
Tue 27 Nov 2018 16:03:15 GMT | | (to change preferences, visit a project web site or select Preferences in the Manager)
Tue 27 Nov 2018 16:03:15 GMT | | gui_rpc_auth.cfg is empty - no GUI RPC password protection
Tue 27 Nov 2018 16:03:16 GMT | PrimeGrid | [cpu_sched] Restarting task llrGCW_307836160_0 using llrGCW version 801 in slot 1
Tue 27 Nov 2018 16:03:18 GMT | PrimeGrid | [cpu_sched] Restarting task llrGCW_307835931_3 using llrGCW version 801 in slot 0
Tue 27 Nov 2018 16:03:21 GMT | PrimeGrid | Computation for task llrGCW_307835931_3 finished
Tue 27 Nov 2018 16:03:23 GMT | PrimeGrid | Started upload of llrGCW_307835931_3_r303567640_0
Tue 27 Nov 2018 16:03:24 GMT | PrimeGrid | Finished upload of llrGCW_307835931_3_r303567640_0
Tue 27 Nov 2018 16:03:24 GMT | PrimeGrid | Started download of llrGCW_307836266
Tue 27 Nov 2018 16:03:25 GMT | PrimeGrid | Finished download of llrGCW_307836266
Tue 27 Nov 2018 16:03:26 GMT | PrimeGrid | Starting task llrGCW_307836266_1
Tue 27 Nov 2018 16:03:26 GMT | PrimeGrid | [cpu_sched] Starting task llrGCW_307836266_1 using llrGCW version 801 in slot 0
Tue 27 Nov 2018 16:33:31 GMT | PrimeGrid | work fetch suspended by user
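
In case anyone wants to check their own event logs for the same pattern, here is a rough Python sketch, not anything official from BOINC or PrimeGrid. It looks for a task whose "Computation for task ... finished" entry arrives within a short window of its "[cpu_sched] Restarting task" entry. The function name premature_finishes and the 60 second default window are my own assumptions, and the timestamp parsing assumes English month names and the "GMT" suffix seen in the log above.

```python
import re
from datetime import datetime

# Timestamp, project column, then the message, as in the event log above.
# Assumption: entries always carry a "GMT" suffix and English month names.
STAMP = r"(\w{3} \d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) GMT \|[^|]*\| "
RESTART = re.compile(STAMP + r"\[cpu_sched\] Restarting task (\S+)")
FINISH = re.compile(STAMP + r"Computation for task (\S+) finished")


def _parse(stamp):
    # Drop the weekday ("Tue ") and parse the remainder.
    return datetime.strptime(stamp[4:], "%d %b %Y %H:%M:%S")


def premature_finishes(log_text, window_seconds=60):
    """Return (task, seconds) pairs for tasks that finished within
    window_seconds of being restarted by the client.

    window_seconds is a hypothetical threshold, not a BOINC setting.
    """
    restarted = {}
    for m in RESTART.finditer(log_text):
        restarted[m.group(2)] = _parse(m.group(1))

    flagged = []
    for m in FINISH.finditer(log_text):
        task = m.group(2)
        if task in restarted:
            delta = (_parse(m.group(1)) - restarted[task]).total_seconds()
            if 0 <= delta <= window_seconds:
                flagged.append((task, delta))
    return flagged
```

Fed the text of the log above, this should flag llrGCW_307835931_3, which finished 3 seconds after it was restarted, and leave llrGCW_307836160_0 alone. The regular expressions scan the whole text instead of going line by line, so the sketch should cope whether or not the log keeps its line breaks when pasted.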


Original Example:

When I initially spotted this behaviour, two WUs were running, each with many hours of crunching done, and both failed on a reboot of the host and virtual machine. They are this one and that one

River~~
Message 122968 - Posted: 28 Nov 2018 | 14:04:27 UTC
Last modified: 28 Nov 2018 | 14:07:49 UTC

This is NOT just Xen, NOT just Qubes

I have just replicated this behaviour on a "real" computer running Linux as a real OS.

That means the behaviour has been seen on two computers, running two different Debian derivatives, one running on "real metal" and one on a VM.


The Linux Mint machine has been used for some time on PG without displaying this behaviour on any other type of task, and the software is the same as always, other than regular updating from the OS repositories.

This is the first time the machine has ever downloaded a GCW task, and each of the first eight ended early (but with a "success" code) when the client was restarted. The OS was NOT rebooted on either occasion. The first four failed when the client was restarted after some 18 hours running, the second four when the client was restarted after only a very short run.

These tasks are listed here - if there are more than eight look for the oldest ones, reported between 1100 and 1200 UT on Nov 28.

Copyright © 2005 - 2018 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 0.53, 0.76, 0.97
Generated 17 Dec 2018 | 11:45:00 UTC