Hi,

Last week we saw some strange schedd behaviour. The server running the schedd was shut down abruptly (a cold shutdown).

***
[root@schedd-03 ~]# condor_version
$CondorVersion: 8.2.9 Aug 13 2015 BuildID: 335839 $
***

We restarted it about a minute later and found that all the jobs that had been running before the shutdown were in IDLE status, even though they were still running on the WN. In the schedd log I found these messages:

^^^
11/17/16 10:07:14 Marked job 1024870.0 as IDLE
11/17/16 10:07:14 Marked job 1024871.0 as IDLE
^^^

I wasn't able to reproduce the issue on a test schedd instance running this condor version:

***
[root@ui01 ~]# condor_version
$CondorVersion: 8.4.9 Sep 29 2016 BuildID: 382747 $
***

I tried the following actions:

- restarting condor
- shutting down the server
- killing -9 all the condor processes

In every case the startd gets a shadow connection back:

^^^
11/23/16 11:58:44 (pid:26843) Accepted request to reconnect from <90.147.168.55:26166>
11/23/16 11:58:44 (pid:26843) Ignoring old shadow <90.147.168.55:9618?addrs=90.147.168.55-9618&noUDP&sock=2400_c45b_1>
11/23/16 11:58:44 (pid:26843) Communicating with shadow <90.147.168.55:9618?addrs=90.147.168.55-9618&noUDP&sock=3913_cff8_1>
^^^

Why were the jobs marked as IDLE?

Thanks in advance,
Ale