PrimeGrid

Message boards : Problems and Help : Computation error Genefer CUDA

Stojag (Project donor)
Message 74525 - Posted: 15 Mar 2014 | 14:42:22 UTC
Last modified: 15 Mar 2014 | 15:02:39 UTC

I ran a GeneferCUDA WU, and after some time (2 to 5 minutes) the GPU utilisation dropped to zero while the WU was only 0.5% done.
After some more time the task disappeared from the BOINC task list, and the following messages appeared in the message window:

15.03.2014 14:37:17 | PrimeGrid | Task genefer_1048576_380855_1 exited with zero status but no 'finished' file
15.03.2014 14:37:17 | PrimeGrid | If this happens repeatedly you may need to reset the project.


Those 2 messages were reported.

After running a second WU, the same behaviour occurred again and gave nearly the same messages:

15.03.2014 15:39:10 | PrimeGrid | Task genefer_1048576_380855_1 exited with zero status but no 'finished' file
15.03.2014 15:39:10 | PrimeGrid | If this happens repeatedly you may need to reset the project.
15.03.2014 15:39:13 | PrimeGrid | Computation for task genefer_1048576_380855_1 finished
15.03.2014 15:39:13 | PrimeGrid | Output file genefer_1048576_380855_1_0 for task genefer_1048576_380855_1 absent


Does anyone have a clue what the problem is?

CUDA WUs from other projects run without problems.

The task has just shown up in the error list:
http://www.primegrid.com/result.php?resultid=532419513

Edit:
It may have been an OC (overclocking) issue, as another task at the standard clock rate is running without problems at the moment.

Michael Goetz (Project donor)
Volunteer moderator
Project administrator
Message 74526 - Posted: 15 Mar 2014 | 15:26:50 UTC - in response to Message 74525.

I ran a GeneferCUDA WU, and after some time (2 to 5 minutes) the GPU utilisation dropped to zero while the WU was only 0.5% done.
After some more time the task disappeared from the BOINC task list, and messages appeared in the message window...

Does anyone have a clue what the problem is?

CUDA WUs from other projects run without problems.


GeneferCUDA and GeneferOCL are the "FMA3 LLR" of GPU apps. They push the GPU harder than other apps and are significantly more likely to expose hardware reliability problems. It's not just that they run the GPU harder: they also use parts of the GPU that most other apps don't use at all, specifically the double-precision floating-point hardware.

This is the heavily edited stderr output from your task:

genefercuda 3.1.2-9 (Windows 32-bit CUDA)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider

Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_9_2.12_windows_intelx86__cudaGFN.exe -boinc -q 381010^1048576+1 --device 0

Priority change succeeded.
Generalized Fermat Number Bench 2
SHIFT=5 381010^1048576+1 Time: 5.29 ms/mul. Err: 1.02e-001 5852036 digits
SHIFT=6 381010^1048576+1 Time: 2.25 ms/mul. Err: 4.53e-001 5852036 digits
SHIFT=7 381010^1048576+1 Time: 2.05 ms/mul. Err: 4.73e-001 5852036 digits
SHIFT=8 381010^1048576+1 Time: 2.09 ms/mul. Err: 1.05e-001 5852036 digits
SHIFT=9 381010^1048576+1 Time: 2.35 ms/mul. Err: 1.02e-001 5852036 digits
SHIFT=10 381010^1048576+1 Time: 3.84 ms/mul. Err: 4.84e-001 5852036 digits
Best SHIFT determined experimentally. Saving AUTOSHIFT|genefercuda|3.1.2-9|0|GeForce GTX 770|1110|381010|1048576=7 to genefer.cfg.
GPU=GeForce GTX 770
Global memory=2147483648 Shared memory/block=49152 Registers/block=65536 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=2147483647 65535 65535
CC=3.0
Clock=1110 MHz
# of MP=8
No project preference set; using AUTO-SHIFT=7

Starting initialization...
maxErr during b^N initialization = 0.0000 (23.876 seconds).
Testing 381010^1048576+1...
Estimated total run time for 381010^1048576+1 is 11:22:59

maxErr exceeded for 381010^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...


...

Starting initialization...
maxErr during b^N initialization = 0.0000 (24.284 seconds).
Testing 381010^1048576+1...
Estimated total run time for 381010^1048576+1 is 11:18:27

maxErr exceeded for 381010^1048576+1, 4503599627370496.0000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...


...

Resuming 381010^1048576+1 from a checkpoint (19398655 iterations left)
Estimated total run time for 381010^1048576+1 is 11:19:45

maxErr exceeded for 381010^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...


...

maxErr exceeded for 381010^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...


...

maxErr exceeded for 381010^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...


...

maxErr exceeded for 381010^1048576+1, 0.4868 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...


...

maxErr exceeded for 381010^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...


...

maxErr exceeded for 381010^1048576+1, 0.4941 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...


maxErr exceeded for 381010^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...


Calculation failed after 6 tries, please test your GPU or CPU carefully to ensure it can reliably compute GFN tests.
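A note on the maxErr lines above: apps like Genefer multiply huge numbers with double-precision floating-point FFTs and recover each exact integer result by rounding. The conventional round-off check, assumed here, records how far any output lands from its nearest integer; values approaching 0.5 mean rounding can no longer be trusted, which is why a limit such as 0.45 is enforced. The minimal sketch below illustrates the idea on plain integer multiplication with a hand-written radix-2 FFT; `fft_multiply` and its base-1000 digit split are illustrative choices, not Genefer's code (which squares modulo b^N+1). Genefer's own measure evidently differs in some cases, since the log also reports 4503599627370496 (that is, 2^52), a value this definition can never produce.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def to_digits(x, base):
    """Split a non-negative integer into little-endian digits in the given base."""
    digits = []
    while x:
        x, d = divmod(x, base)
        digits.append(d)
    return digits or [0]


def fft_multiply(x, y, base=1000):
    """Multiply x*y via a floating-point FFT; return (product, max_err)."""
    a, b = to_digits(x, base), to_digits(y, base)
    n = 1
    while n < len(a) + len(b):  # room for the full product, no wrap-around
        n *= 2
    fa = fft([complex(d) for d in a] + [0j] * (n - len(a)))
    fb = fft([complex(d) for d in b] + [0j] * (n - len(b)))
    conv = fft([p * q for p, q in zip(fa, fb)], invert=True)
    coeffs = [c.real / n for c in conv]  # inverse FFT needs the 1/n scaling
    # Round-off error: worst distance of any coefficient from its nearest integer.
    max_err = max(abs(c - round(c)) for c in coeffs)
    # Rounded coefficients are exact integers; summing them handles the carries.
    product = sum(round(c) * base**i for i, c in enumerate(coeffs))
    return product, max_err
```

For example, `fft_multiply(123456789, 987654321)` returns the exact product together with a max_err far below 0.45. On healthy hardware the error depends only on the sizes involved, whereas memory or arithmetic faults can push individual outputs well away from the integers.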


When Genefer encounters an error, it shuts down for 10 minutes. The pause accomplishes 3 things:

* It allows the GPU to cool off.
* Some errors are transient and environmental, such as the GPU being blocked by Windows Remote Desktop. The delay may allow the condition to clear so that Genefer can continue.
* If the problem is unfixable, the delay of an hour (6 times 10 minutes) prevents a flood of instantly failing tasks.

That's why you were noticing periods where the usage dropped to 0.
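The pause-and-retry behaviour described above can be sketched as a small loop. This is a minimal sketch assuming a hypothetical `run_segment` callback that resumes from the last checkpoint and reports `(max_err, done)`; the constants come from this thread (10-minute pauses, 6 tries, the 0.45 limit in the log), but the loop is not Genefer's source. It simply counts every failure; how Genefer itself counts tries (for example, whether progress resets the count) isn't stated here. Passing `sleep` as a parameter lets the waits be replaced in tests.

```python
import time

ERR_LIMIT = 0.45         # limit printed in the stderr log ("> 0.4500")
MAX_TRIES = 6            # "6 times 10 minutes" in the thread
PAUSE_SECONDS = 10 * 60  # 10-minute pause after each error


def run_with_retries(run_segment, max_tries=MAX_TRIES,
                     pause=PAUSE_SECONDS, sleep=time.sleep):
    """Hypothetical sketch of Genefer-style error handling, not its real code.

    run_segment() is a hypothetical callback that resumes the test from the
    last checkpoint, runs for a while, and returns (max_err, done).
    """
    failures = 0
    while True:
        max_err, done = run_segment()
        if max_err > ERR_LIMIT:
            failures += 1
            # Pause so the GPU can cool off or a transient condition can clear.
            sleep(pause)
            if failures >= max_tries:
                # Six failures, each followed by a 10-minute pause: one hour.
                return f"Calculation failed after {failures} tries"
            continue  # retry from the last checkpoint
        if done:
            return "completed"
```

Because every failure is followed by a pause, an unfixable problem costs about an hour per task instead of producing a rapid stream of failed tasks.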

In the best-case scenario, the user notices that something is wrong and comes here to ask for help. This, therefore, is the best-case scenario, and it should be relatively easy to solve the problem.

GeneferCUDA is extremely sensitive to overclocking, especially of the memory clock. It's also been shown to be sensitive to temperature. Even "factory overclocked" cards often fail, and it's often necessary to lower the clocks to Nvidia's stock clocks.

I've got a pair of factory overclocked GTX 580s. Although I *can* run with the shaders at the overclocked factory speed, I did need to lower the memory clock not just to the stock setting, but to slightly below stock (to 1900 MHz) to get stable operation on one of the two GPUs.

The good news is I'm almost certain we can get you working.

I have two suggestions for you:

1) With a GTX 770, I'd suggest selecting GeneferOCL rather than GeneferCUDA. On the 770 the OpenCL app should be faster than the CUDA app. Try that, and see if the problem goes away.

2) If you still experience the problem with the OpenCL app, try lowering the GPU memory clock a bit. That should solve the problem and won't affect the speed too much.
____________
My lucky number is 75898^524288+1

Michael Goetz (Project donor)
Volunteer moderator
Project administrator
Message 74527 - Posted: 15 Mar 2014 | 15:28:34 UTC - in response to Message 74525.


Edit:
It may have been an OC (overclocking) issue, as another task at the standard clock rate is running without problems at the moment.


Even if you're running stable on GeneferCUDA at stock clocks (and that's exactly what I would expect), you should still try GeneferOCL. On 600- and 700-series GPUs the OpenCL app is faster, sometimes a LOT faster.

Stojag (Project donor)
Message 74528 - Posted: 15 Mar 2014 | 15:36:23 UTC

Thanks for the detailed and fast answer.

As I experienced the 10-minute error pauses even at the factory-overclocked settings (although the program seems to recover from them somehow), I'll switch to the OpenCL apps and give them a try.

Michael Goetz (Project donor)
Volunteer moderator
Project administrator
Message 74529 - Posted: 15 Mar 2014 | 16:27:36 UTC - in response to Message 74528.

Thanks for the detailed and fast answer.

As I experienced the 10-minute error pauses even at the factory-overclocked settings (although the program seems to recover from them somehow), I'll switch to the OpenCL apps and give them a try.


A lot of people have problems running at factory overclocked settings and need to go all the way down to stock speeds. My advice, if you still have trouble, is to lower the memory clock first.

rroonnaalldd (Project donor)
Volunteer developer
Volunteer tester
Message 74561 - Posted: 16 Mar 2014 | 15:01:29 UTC - in response to Message 74529.

Changing Nvidia clock rates on Linux boxes is a big problem nowadays. With a current driver, changing the clock rates seems to be possible only on older GPUs (pre-GT/GTXxxx).
IIRC, driver version 260.x (the minimum required driver version for CUDA 3.2 applications) was the last one that let me change the clock rates on my old GT240 card. After I replaced the burnt-out GT240 with a GTS450eco, I lost the ability to change the clock rates. I tried "Coolbits" values 1 through 5 without success. With "Coolbits" set to 4 or 5, I could (and still can) change only the fan speed...

____________
Best wishes. Knowledge is power. by jjwhalen

Stojag (Project donor)
Message 74758 - Posted: 20 Mar 2014 | 10:23:20 UTC

A lot of people have problems running at factory overclocked settings and need to go all the way down to stock speeds. My advice, if you still have trouble, is to lower the memory clock first.


Thanks for the helpful advice.
After some tries and many failures, I succeeded in completing an OpenCL WU some days ago.
And now it has even been validated :)
That gives me the confidence to try a GFN-WR WU. If that succeeds too, I'll switch from sieving to GFN WUs.

The downside is that I have to run it at the lowest possible core and memory clock rates.

rroonnaalldd (Project donor)
Volunteer developer
Volunteer tester
Message 74760 - Posted: 20 Mar 2014 | 10:28:08 UTC - in response to Message 74759.

Try lowering only the memory clock rate as a first step. The second step would be to lower the core clock too.
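The two-step order suggested in this thread (memory clock first, then the core clock) can be written down as a simple search. This is a minimal sketch: `tune_clocks` and `is_stable` are hypothetical names, `is_stable` standing in for running a GFN task at the given clocks and checking for maxErr errors, and the step size and lower limits are assumptions the user would choose, not PrimeGrid values. Actually changing clocks depends on the tool and driver, and as noted above it may not be possible at all with some Linux drivers.

```python
def tune_clocks(core_mhz, mem_mhz, is_stable, step_mhz=50,
                min_core_mhz=0, min_mem_mhz=0):
    """Return the first (core, mem) pair that is_stable accepts, or None.

    is_stable(core_mhz, mem_mhz) is a hypothetical callback, e.g. "run a GFN
    task at these clocks and report whether it finished without maxErr errors".
    """
    # Step 1: lower only the memory clock.
    while True:
        if is_stable(core_mhz, mem_mhz):
            return core_mhz, mem_mhz
        if mem_mhz - step_mhz < min_mem_mhz:
            break  # memory limit reached; move on to the core clock
        mem_mhz -= step_mhz
    # Step 2: keep the lowest memory clock tried and lower the core clock too.
    while core_mhz - step_mhz >= min_core_mhz:
        core_mhz -= step_mhz
        if is_stable(core_mhz, mem_mhz):
            return core_mhz, mem_mhz
    return None
```

For example, with `is_stable = lambda core, mem: mem <= 1900`, `tune_clocks(1110, 2000, is_stable, min_mem_mhz=1800)` returns `(1110, 1900)`: the core stays at the 1110 MHz reported in the log, and only the memory clock is lowered.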


Copyright © 2005 - 2021 Rytis Slatkevičius and PrimeGrid community.
Generated 8 May 2021 | 3:31:35 UTC