Gents,

It seems my issue did not raise any big concerns out there.

Today I noticed an old job that had been submitted about three weeks ago. The log file is crammed with three weeks' worth of ‘007 (162.000 …) 03/05….. Shadow exception! Etc’, right up until tonight, when I noticed it. I queried the negotiator a bit and checked the local log files, but found nothing new. Then, after a couple of minutes, the job started and is now running. Magic!

Any thoughts?

P

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of Peter Ellevseth

Gents,

I am having some issues with one of the machines in my cluster. I keep getting ‘Shadow exception’, e.g.

01/18/21 23:58:11 condor_read(fd=17 <127.0.0.1:21523>,,size=5,timeout=10,flags=0,non_blocking=0)
01/18/21 23:58:11 condor_read(): Socket closed abnormally when trying to read 5 bytes from <127.0.0.1:21523>, errno=104 Connection reset by peer
01/18/21 23:58:11 Stream::get(int) failed to read padding
01/18/21 23:58:11 CLOSE TCP <127.0.0.1:31043> fd=17
01/18/21 23:58:11 Starter pid 5973 exited with status 1

Now, the really strange part is that if I keep fiddling around with the STARTD machine (checking logs, running condor_status, etc.), the job just magically starts. I have no idea which actions make it start, but it does.

The STARTD machine is running a newer version of condor (8.8.10) than the rest of the cluster, which is running 8.6. Could that be an issue?

I added startd_debug = D_NETWORK, but didn’t really learn anything. Are there any other useful debug settings I should check out?

Peter