I am running Condor 6.6.10 on Windows on a set of lab machines. I am
seeing problems with jobs never finishing once they are suspended due
to someone physically using the computer. I believe that I have the
execute machines set to suspend the jobs, but keep the job on the
machine so they can continue when the machine returns to unclaimed and
idle. However, the suspended jobs never seem to unsuspend and continue
working (even if they have to start from scratch). Instead, they get
evicted seconds after the job supposedly unsuspends. Is this how it
should work?

I've included a snippet from my condor_config file and the job log
file. Any help would be appreciated.
Jess Cannata
condor_config:
StartIdleTime = 15 * $(MINUTE)
ContinueIdleTime = 5 * $(MINUTE)
MaxSuspendTime = 300 * $(MINUTE)
MaxVacateTime = 10 * $(MINUTE)
WANT_SUSPEND = TRUE
WANT_VACATE = FALSE
START = $(UWCS_START)
SUSPEND = $(UWCS_SUSPEND)
CONTINUE = $(UWCS_CONTINUE)
PREEMPT = $(UWCS_PREEMPT)
KILL = $(UWCS_KILL)
PERIODIC_CHECKPOINT = $(UWCS_PERIODIC_CHECKPOINT)
PREEMPTION_REQUIREMENTS = $(UWCS_PREEMPTION_REQUIREMENTS)
PREEMPTION_RANK = $(UWCS_PREEMPTION_RANK)
NEGOTIATOR_PRE_JOB_RANK = $(UWCS_NEGOTIATOR_PRE_JOB_RANK)
NEGOTIATOR_POST_JOB_RANK = $(UWCS_NEGOTIATOR_POST_JOB_RANK)
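For reference, the timer macros above can be converted to plain seconds. This is only a small Python sketch: it assumes that $(MINUTE) expands to 60, which the snippet above does not show, and the dictionary name `timers` is made up for illustration.

```python
# Assumption: $(MINUTE) expands to 60 (seconds); this is not shown in the snippet.
MINUTE = 60

# Timer values copied from the condor_config snippet above.
timers = {
    "StartIdleTime": 15 * MINUTE,
    "ContinueIdleTime": 5 * MINUTE,
    "MaxSuspendTime": 300 * MINUTE,
    "MaxVacateTime": 10 * MINUTE,
}

for name, seconds in timers.items():
    # Show each timer in seconds and in whole minutes.
    print(f"{name} = {seconds} s ({seconds // MINUTE} min)")
```

Under that assumption, ContinueIdleTime is 300 seconds and MaxSuspendTime is 18000 seconds (5 hours).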
log file:
000 (619.000.000) 05/03 15:27:29 Job submitted from host: <141.161.x.156:17835>
...
001 (619.000.000) 05/04 09:46:54 Job executing on host: <141.161.x.246:1217>
...
010 (619.000.000) 05/04 09:53:08 Job was suspended.
    Number of processes actually suspended: 1
...
006 (619.000.000) 05/04 09:53:08 Image size of job updated: 986304
...
011 (619.000.000) 05/04 09:53:10 Job was unsuspended.
...
004 (619.000.000) 05/04 09:53:11 Job was evicted.
    (0) Job was not checkpointed.
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
    0 - Run Bytes Sent By Job
    4480328 - Run Bytes Received By Job
...
001 (619.000.000) 05/04 22:08:37 Job executing on host: <141.161.x.233:1223>
...
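To make the timing in the log easier to see, here is a rough Python sketch that pulls the event header lines out of the log above and measures the gaps between events. The function names `parse_events` and `seconds_between` are hypothetical helpers, not part of Condor, and because the log carries no year, an arbitrary placeholder year is used only so that timestamps can be subtracted.

```python
import re
from datetime import datetime

# Event header lines copied from the job log above (detail lines omitted).
LOG = """\
000 (619.000.000) 05/03 15:27:29 Job submitted from host: <141.161.x.156:17835>
001 (619.000.000) 05/04 09:46:54 Job executing on host: <141.161.x.246:1217>
010 (619.000.000) 05/04 09:53:08 Job was suspended.
006 (619.000.000) 05/04 09:53:08 Image size of job updated: 986304
011 (619.000.000) 05/04 09:53:10 Job was unsuspended.
004 (619.000.000) 05/04 09:53:11 Job was evicted.
001 (619.000.000) 05/04 22:08:37 Job executing on host: <141.161.x.233:1223>
"""

# Event code, job id, "MM/DD HH:MM:SS" timestamp, then the message text.
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+\.\d+\.\d+)\) (\d\d/\d\d \d\d:\d\d:\d\d) (.*)$"
)

# Assumption: placeholder year, needed only to compute time differences.
PLACEHOLDER_YEAR = 2000


def parse_events(text):
    """Return a list of (event_code, timestamp, message) tuples."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            code, _job, stamp, message = match.groups()
            when = datetime.strptime(
                f"{PLACEHOLDER_YEAR}/{stamp}", "%Y/%m/%d %H:%M:%S"
            )
            events.append((code, when, message))
    return events


def seconds_between(events, first_code, second_code):
    """Seconds from the first `first_code` event to the next `second_code`."""
    for i, (code, when, _) in enumerate(events):
        if code == first_code:
            for later_code, later_when, _ in events[i + 1:]:
                if later_code == second_code:
                    return (later_when - when).total_seconds()
    return None


events = parse_events(LOG)
print("executing -> suspended:  ", seconds_between(events, "001", "010"), "s")
print("suspended -> unsuspended:", seconds_between(events, "010", "011"), "s")
print("unsuspended -> evicted:  ", seconds_between(events, "011", "004"), "s")
```

On the log above, this reports 374 seconds of running before the suspend, 2 seconds between the suspend and the unsuspend, and 1 second between the unsuspend and the eviction.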