Message boards :
Problems and Help :
Scheduler request failed: Couldn't connect to server
TimT
Joined: 2 Dec 11 Posts: 437 ID: 121414 Credit: 1,527,733,977 RAC: 641,266
One of my machines suddenly started behaving strangely yesterday, and I have not been able to figure out the issue. I'm hoping someone here might have some suggestions...
This is a long-running machine on the same network as my other computers. It will run several tasks, then start failing to upload results or download new tasks. It seems to sit in this state until I hit 'project update' in the BOINC Manager; then it connects, grabs a new set of workunits, and crunches away. After a round (or several) of workunits, it gets hung again.
Here's a relevant section of the boinc log showing some working connections leading to the failure:
3538 2/13/2020 5:59:59 PM [http_xfer] [ID#527] HTTP: wrote 34 bytes
3539 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#527] Info: Connection #682 to host www.primegrid.com left intact
3540 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#528] Received header from server: HTTP/1.1 200 OK
3541 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#528] Received header from server: Date: Thu, 13 Feb 2020 22:59:59 GMT
3542 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#528] Received header from server: Server: Apache/2.4.25 (Debian)
3543 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#528] Received header from server: Content-Length: 34
3544 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#528] Received header from server: Content-Type: text/plain;charset=UTF-8
3545 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#528] Received header from server:
3546 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#528] Received header from server: 1000000000:P:1:2:257
3547 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#528] Received header from server: 3589 1565458
3548 2/13/2020 5:59:59 PM [http_xfer] [ID#528] HTTP: wrote 34 bytes
3549 PrimeGrid 2/13/2020 5:59:59 PM [http] [ID#528] Info: Connection #683 to host www.primegrid.com left intact
3550 PrimeGrid 2/13/2020 6:00:00 PM [file_xfer] http op done; retval 0 (Success)
3551 PrimeGrid 2/13/2020 6:00:00 PM [file_xfer] http op done; retval 0 (Success)
3552 PrimeGrid 2/13/2020 6:00:00 PM [file_xfer] file transfer status 0 (Success)
3553 PrimeGrid 2/13/2020 6:00:00 PM Finished download of llrPPSE_332530205
3554 PrimeGrid 2/13/2020 6:00:00 PM [file_xfer] Throughput 103 bytes/sec
3555 PrimeGrid 2/13/2020 6:00:00 PM [file_xfer] file transfer status 0 (Success)
3556 PrimeGrid 2/13/2020 6:00:00 PM Finished download of llrPPSE_332530070
3557 PrimeGrid 2/13/2020 6:00:00 PM [file_xfer] Throughput 103 bytes/sec
3558 PrimeGrid 2/13/2020 6:00:00 PM Starting task llrPPSE_332530205_1
3559 PrimeGrid 2/13/2020 6:00:00 PM Starting task llrPPSE_332530070_0
3560 PrimeGrid 2/13/2020 6:03:10 PM Computation for task genefer16_31776850_1 finished
3561 PrimeGrid 2/13/2020 6:03:12 PM [fxd] starting upload, upload_offset 0
3562 PrimeGrid 2/13/2020 6:03:12 PM [http] HTTP_OP::libcurl_exec(): ca-bundle 'C:\Program Files\BOINC\ca-bundle.crt'
3563 PrimeGrid 2/13/2020 6:03:12 PM [http] HTTP_OP::libcurl_exec(): ca-bundle set
3564 PrimeGrid 2/13/2020 6:03:12 PM Started upload of genefer16_31776850_1_r1286889301_0
3565 PrimeGrid 2/13/2020 6:03:12 PM [file_xfer] URL: http://www.primegrid.com/cgi/file_upload_handler
3566 PrimeGrid 2/13/2020 6:03:13 PM [http] [ID#529] Info: Connection 682 seems to be dead!
3567 PrimeGrid 2/13/2020 6:03:13 PM [http] [ID#529] Info: Closing connection 682
3568 PrimeGrid 2/13/2020 6:03:13 PM [http] [ID#529] Info: Connection 683 seems to be dead!
3569 PrimeGrid 2/13/2020 6:03:13 PM [http] [ID#529] Info: Closing connection 683
3570 PrimeGrid 2/13/2020 6:03:13 PM [http] [ID#529] Info: Trying 185.193.25.77...
3571 PrimeGrid 2/13/2020 6:03:13 PM Sending scheduler request: To fetch work.
3572 PrimeGrid 2/13/2020 6:03:13 PM Requesting new tasks for NVIDIA GPU
3573 PrimeGrid 2/13/2020 6:03:13 PM [http] HTTP_OP::init_post(): http://www.primegrid.com/cgi/cgi
3574 PrimeGrid 2/13/2020 6:03:13 PM [http] HTTP_OP::libcurl_exec(): ca-bundle set
3575 PrimeGrid 2/13/2020 6:03:14 PM [http] [ID#1] Info: Found bundle for host www.primegrid.com: 0x42fa050 [serially]
3576 PrimeGrid 2/13/2020 6:03:14 PM [http] [ID#1] Info: Hostname www.primegrid.com was found in DNS cache
3577 PrimeGrid 2/13/2020 6:03:14 PM [http] [ID#1] Info: Trying 185.193.25.77...
3578 PrimeGrid 2/13/2020 6:03:18 PM Computation for task genefer16_31777023_0 finished
3579 PrimeGrid 2/13/2020 6:03:20 PM [fxd] starting upload, upload_offset 0
3580 PrimeGrid 2/13/2020 6:03:20 PM [http] HTTP_OP::libcurl_exec(): ca-bundle 'C:\Program Files\BOINC\ca-bundle.crt'
3581 PrimeGrid 2/13/2020 6:03:20 PM [http] HTTP_OP::libcurl_exec(): ca-bundle set
3582 PrimeGrid 2/13/2020 6:03:20 PM Started upload of genefer16_31777023_0_r2038060960_0
3583 PrimeGrid 2/13/2020 6:03:20 PM [file_xfer] URL: http://www.primegrid.com/cgi/file_upload_handler
3584 PrimeGrid 2/13/2020 6:03:21 PM [http] [ID#530] Info: Found bundle for host www.primegrid.com: 0x42fa050 [serially]
3585 PrimeGrid 2/13/2020 6:03:21 PM [http] [ID#530] Info: Hostname www.primegrid.com was found in DNS cache
3586 PrimeGrid 2/13/2020 6:03:21 PM [http] [ID#530] Info: Trying 185.193.25.77...
3587 PrimeGrid 2/13/2020 6:03:34 PM [http] [ID#529] Info: connect to 185.193.25.77 port 80 failed: Timed out
3588 PrimeGrid 2/13/2020 6:03:34 PM [http] [ID#529] Info: Failed to connect to www.primegrid.com port 80: Timed out
3589 PrimeGrid 2/13/2020 6:03:34 PM [http] [ID#529] Info: Closing connection 684
3590 PrimeGrid 2/13/2020 6:03:34 PM [http] HTTP error: Couldn't connect to server
3591 PrimeGrid 2/13/2020 6:03:34 PM [file_xfer] http op done; retval -107 (connect() failed)
3592 PrimeGrid 2/13/2020 6:03:34 PM [file_xfer] file transfer status -107 (connect() failed)
3593 PrimeGrid 2/13/2020 6:03:34 PM Temporarily failed upload of genefer16_31776850_1_r1286889301_0: connect() failed
3594 PrimeGrid 2/13/2020 6:03:34 PM Backing off 00:02:05 on upload of genefer16_31776850_1_r1286889301_0
3595 PrimeGrid 2/13/2020 6:03:35 PM [http] [ID#1] Info: connect to 185.193.25.77 port 80 failed: Timed out
3596 PrimeGrid 2/13/2020 6:03:35 PM [http] [ID#1] Info: Failed to connect to www.primegrid.com port 80: Timed out
3597 PrimeGrid 2/13/2020 6:03:35 PM [http] [ID#1] Info: Closing connection 685
3598 PrimeGrid 2/13/2020 6:03:35 PM [http] HTTP error: Couldn't connect to server
3599 2/13/2020 6:03:35 PM Project communication failed: attempting access to reference site
3600 2/13/2020 6:03:35 PM [http] HTTP_OP::init_get(): https://www.google.com/
3601 2/13/2020 6:03:35 PM [http] HTTP_OP::libcurl_exec(): ca-bundle set
3602 PrimeGrid 2/13/2020 6:03:35 PM Scheduler request failed: Couldn't connect to server
Meanwhile, while the machine is in its error state, I can access PrimeGrid, Google, etc. with a web browser with no problem.
I've tried a project reset, rebooting the machine a few times, and restarting BOINC, all with no luck.
Any ideas what to try next?
--Tim
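
A browser loading the site does not by itself show that the BOINC client's own connections get through; the log above fails at the TCP connect step (`connect to 185.193.25.77 port 80 failed: Timed out`). As a rough sketch, that same step can be tested outside BOINC with a few lines of Python. The function name `can_connect` and the 5-second timeout are illustrative choices, not part of BOINC.

```python
import socket


def can_connect(host, port=80, timeout=5.0):
    """Return True if a plain TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False
```

Running `can_connect("www.primegrid.com", 80)` on the affected machine while the client is stuck, and again on a machine that is working, may help show whether the block is specific to one host.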

mikey
Joined: 17 Mar 09 Posts: 1252 ID: 37043 Credit: 523,287,229 RAC: 141,057
I had a machine that refused to get ANY tasks at all. I went through the same steps you did, but what fixed it was removing PG and then re-adding it in BOINC; now it works just fine.

TimT
Joined: 2 Dec 11 Posts: 437 ID: 121414 Credit: 1,527,733,977 RAC: 641,266
Thanks for the suggestion, Mikey -- but that alone didn't do the trick in my case...
For the sake of documenting the issue:
-- I detached from PrimeGrid and reattached to the project. I quickly ran into the exact same issue.
-- I uninstalled BOINC, deleted the BOINC data directory, reinstalled, and reattached to PG. Again, I ran into the same issue -- but this time, while re-downloading BOINC, I suddenly lost access to the BOINC download site... hmm, I have a theory...
-- My ISP is Verizon Fios, and I use their phone app to control my kid's internet access. In the app, I can turn off 'network protection' completely. As soon as I did that, boom -- everything worked. I triple-checked to make sure this computer was not in any restricted group and that there were no firewall rules specific to the machine. The machine was in the 'default' access group, which has no specific restrictions. As a test, I created an 'open access' group that explicitly disables all restrictions and put all my crunching machines in it, then turned the protection service back on... I'm guessing that at some point my router decided that PrimeGrid was trying to hack just one of the many computers it talks to on my network.
I'll give it some time and update if that worked.
--Tim

Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2222 ID: 1178 Credit: 9,228,352,452 RAC: 3,198,058
Flushing DNS can be your friend. :)
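
For readers who want to try this, a small Python sketch of the usual flush commands follows. The helper name `flush_dns_command` is hypothetical, and the Linux entry assumes the machine uses systemd-resolved; other resolvers need different commands.

```python
import platform


def flush_dns_command(system=""):
    """Return the usual DNS-cache flush command for the given OS name."""
    # Fall back to the current OS when no name is passed in.
    system = system or platform.system()
    commands = {
        "Windows": ["ipconfig", "/flushdns"],
        "Linux": ["resolvectl", "flush-caches"],  # assumes systemd-resolved
    }
    if system not in commands:
        raise ValueError(f"no known flush command for {system!r}")
    return commands[system]
```

The returned list can be passed to `subprocess.run(..., check=True)`. On the Windows machine in this thread, that amounts to running `ipconfig /flushdns` from a command prompt.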

TimT
Joined: 2 Dec 11 Posts: 437 ID: 121414 Credit: 1,527,733,977 RAC: 641,266
Good point, and you are uncovering my failings in proper issue documentation -- I actually tried flushing DNS at some point, and it had no effect.
Also, the machine has been running properly since the router change, so I'm feeling more confident that the cause is some change Verizon made to the firewall in the router.