HTCondor 8.0.2; the pool is entirely Windows 7 x64. Being a Windows pool, there is no checkpointing, and we do not want eviction or preemption. Therefore in the global config file I have (copied from the manual):

# Disable preemption by machine activity.
PREEMPT = False
# Disable preemption by user priority.
PREEMPTION_REQUIREMENTS = False
# Disable preemption by machine RANK by ranking all jobs equally.
RANK = 0
# Since we are disabling claim preemption, we
# may as well optimize negotiation for this case:
NEGOTIATOR_CONSIDER_PREEMPTION = False
# Without preemption, it is advisable to limit the time during
# which the submit node may keep reusing the same slot for
# more jobs.
CLAIM_WORKLIFE = 3600
UPDATE_INTERVAL = 180
WANT_SUSPEND = TRUE
KILL = FALSE

However, jobs continue to be stopped on one machine and restarted (from scratch, since there is no checkpointing) on the same or another machine. From a job .log file:
000 (231.001.000) 08/29 08:06:11 Job submitted from host: <1.2.3.189:9685>
...
001 (231.001.000) 08/29 08:06:29 Job executing on host: <1.2.3.246:9651>
...
006 (231.001.000) 08/29 08:06:37 Image size of job updated: 2500
	1  -  MemoryUsage of job (MB)
	400  -  ResidentSetSize of job (KB)
...
001 (231.001.000) 08/29 08:27:30 Job executing on host: <1.2.3.102:9619>
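The host hops are easier to see by pulling the "executing" events out of the event log. A minimal sketch, assuming the standard userlog format shown above (job.log stands in for the real .log file, recreated here so the snippet is self-contained):

```shell
# Recreate a stand-in for the job's event log (hypothetical copy).
cat > job.log <<'EOF'
000 (231.001.000) 08/29 08:06:11 Job submitted from host: <1.2.3.189:9685>
001 (231.001.000) 08/29 08:06:29 Job executing on host: <1.2.3.246:9651>
001 (231.001.000) 08/29 08:27:30 Job executing on host: <1.2.3.102:9619>
EOF

# List the timestamp and host of every "executing" (001) event,
# one line per restart, so switches between machines stand out.
grep 'Job executing on host' job.log |
  sed 's/^001 ([^)]*) \(.*\) Job executing on host: <\([^:]*\):.*/\1 \2/'
# prints:
# 08/29 08:06:29 1.2.3.246
# 08/29 08:27:30 1.2.3.102
```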
The job started on host .246, ran 20 minutes, then started over on .102. So finally, my question: how can I examine the details of why HTCondor is doing this machine switching? I've poked around in various log files but don't see anything obvious. Or, what condor_status or condor_q commands would reveal the motive for the switching?
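One sanity check worth doing first: confirm the no-preemption settings are actually in effect on the execute nodes, since a local config file could override the global one. A sketch using condor_config_val (the host name below is just the execute node from the log above):

```shell
# Effective values in the local configuration
condor_config_val PREEMPT PREEMPTION_REQUIREMENTS CLAIM_WORKLIFE

# Effective values as seen by the startd on a remote execute node
condor_config_val -name 1.2.3.246 -startd PREEMPT WANT_SUSPEND KILL
```

If the remote values differ from the global config, the switching may be explained by a stale or overriding local config rather than negotiator policy.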
Thanks,
Ralph Finch
Calif. Dept. of Water Resources