Subject: [Condor-users] Job Lease Duration / Jobs stop running
Hi,
I submitted 21 jobs to Condor, 8 of which have stopped running. They must all have been running at some point, because the image size of the 8 idle jobs is quite large. I submitted another job and it started running immediately, but the other 8 remain idle. My job log file shows:
006 (056.000.000) 05/02 22:59:45 Image size of job updated: 1163528
...
022 (056.000.000) 05/02 22:59:51 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to vm3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <128.226.128.45:38130>
...
022 (057.000.000) 05/02 23:00:04 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to vm2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <128.226.128.45:38130>
...
024 (057.000.000) 05/02 23:00:04 Job reconnection failed
    Job disconnected too long: JobLeaseDuration (1200 seconds) expired
    Can not reconnect to vm2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, rescheduling job
...
022 (055.000.000) 05/02 23:00:43 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to vm1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <128.226.128.45:38130>
...
024 (055.000.000) 05/02 23:00:43 Job reconnection failed
    Job disconnected too long: JobLeaseDuration (1200 seconds) expired
    Can not reconnect to vm1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, rescheduling job
...
024 (056.000.000) 05/02 23:19:51 Job reconnection failed
    Job disconnected too long: JobLeaseDuration (1200 seconds) expired
    Can not reconnect to vm3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, rescheduling job