Hello,

We are having problems with our dedicated scheduler. The schedd daemon dies and then restarts, causing all jobs to start over from the beginning (we can't use checkpointing). Here is the message from the MasterLog:

11/7 14:24:12 The SCHEDD (pid 17884) died due to signal 11
11/7 14:24:12 Sending obituary for "/condor/sbin/condor_schedd"
11/7 14:24:12 restarting /condor/sbin/condor_schedd in 10 seconds
11/7 14:24:22 Started DaemonCore process "/condor/sbin/condor_schedd", pid and pgroup = 32506
11/7 14:24:44 The SCHEDD (pid 32506) exited with status 4
11/7 14:24:44 Sending obituary for "/condor/sbin/condor_schedd"
11/7 14:24:44 restarting /condor/sbin/condor_schedd in 11 seconds
11/7 14:24:55 Started DaemonCore process "/condor/sbin/condor_schedd", pid and pgroup = 32533

What does signal 11 mean? What are the possible reasons for this to happen?

This is the mail sent by Condor:

Subject: [Condor] Problem

This is an automated email from the Condor system on machine "cluster00.itqb.unl.pt". Do not reply.

"/condor/sbin/condor_schedd" on "cluster00.itqb.unl.pt" died due to signal 11. Condor will automatically restart this process in 10 seconds.

*** Last 20 line(s) of file SchedLog:
11/7 14:23:20 (pid:17884) condor_write(): Socket closed when trying to write 504 bytes to unknown source, fd is 14, errno=107
11/7 14:23:20 (pid:17884) Buf::write(): condor_write() failed
11/7 14:23:20 (pid:17884) SECMAN: failed to end classad message
11/7 14:23:20 (pid:17884) ERROR: SECMAN:2007:Failed to end classad message
11/7 14:23:20 (pid:17884) condor_write(): Socket closed when trying to write 6 bytes to unknown source, fd is 14, errno=107
11/7 14:23:20 (pid:17884) Buf::write(): condor_write() failed
11/7 14:24:03 (pid:17884) (Can't send alive message to )
11/7 14:24:05 (pid:17884) Sent ad to central manager for ...
11/7 14:24:05 (pid:17884) Sent ad to 1 collectors for ...
11/7 14:24:05 (pid:17884) Sent ad to central manager for ...
11/7 14:24:05 (pid:17884) Sent ad to 1 collectors for ...
11/7 14:24:05 (pid:17884) Sent ad to central manager for ...
11/7 14:24:05 (pid:17884) Sent ad to central manager for ...
11/7 14:24:05 (pid:17884) Sent ad to 1 collectors for ...
11/7 14:24:07 (pid:17884) Inserting new attribute Scheduler into non-active cluster cid=335 acid=-1
11/7 14:24:07 (pid:17884) Inserting new attribute Scheduler into non-active cluster cid=336 acid=-1
11/7 14:24:07 (pid:17884) Inserting new attribute Scheduler into non-active cluster cid=302 acid=-1
11/7 14:24:08 (pid:17884) Inserting new attribute Scheduler into non-active cluster cid=333 acid=-1
11/7 14:24:08 (pid:17884) Inserting new attribute Scheduler into non-active cluster cid=334 acid=-1
*** End of file SchedLog

Thanks in advance,
Sara Campos