Re: [HTCondor-users] How to see why jobs suspend/continue
- Date: Fri, 06 Sep 2013 16:56:55 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] How to see why jobs suspend/continue
On 9/6/2013 2:00 PM, Ralph Finch wrote:
> I misunderstand the implementation. I have in each machine's condor_config
> file:
> SUSPEND = $(MachineBusy)
> junk = debug($(MachineBusy))
[snip]
> How should I implement the debug(expression) statement?
I believe you want to do
SUSPEND = debug( $(MachineBusy) )
and then look in the StartLog file (not StarterLog*).
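As a minimal sketch of that change: a macro such as "junk" is, as far as I know, never evaluated by the condor_startd, so wrapping it in debug() logs nothing, whereas SUSPEND is evaluated as part of the slot policy. The STARTD_DEBUG line is an assumption on my part, based on the manual's description that debug() output is written at the D_FULLDEBUG level; drop it if your StartLog already shows the evaluations.

```
# condor_config (or the local config file) on the execute machine.

# Wrap the policy expression itself, so every evaluation is logged.
SUSPEND = debug( $(MachineBusy) )

# Assumption: debug() output is logged at D_FULLDEBUG, so the startd
# log level may need to be raised for the messages to appear.
STARTD_DEBUG = D_FULLDEBUG
```

The path of the StartLog file can be checked with "condor_config_val STARTD_LOG".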
> And is the load average calculation on windows machines unreliable? The
> reason I need it is, we all sometimes run long-running (hours) numerical
> models with no interactive use, so testing only keyboard and console is
> insufficient to prevent my HTC jobs from interfering with the machine
> owner's use.
Understood.
I just played around with it a bit. I think the load average calculation
is pretty good, but what looks wonky to me is how the load gets assigned
out to the different slots.
Assuming you started from the default condor_config, I think if you change
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
to instead be
NonCondorLoadAvg = (TotalLoadAvg - TotalCondorLoadAvg)
you should get results much closer to what you were hoping for.
(Of course, do not forget to run condor_reconfig, as usual, after changing
the config file.)
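For context, here is roughly how that one-line change sits among the macros that feed MachineBusy. This is a sketch that assumes the stock condor_config definitions from this era; the values and surrounding macros are from memory, so check them against your own file before copying anything.

```
# Thresholds (assumed stock defaults).
BackgroundLoad   = 0.3
HighLoad         = 0.5

# Changed line: compare whole-machine totals rather than per-slot values.
NonCondorLoadAvg = (TotalLoadAvg - TotalCondorLoadAvg)

# Unchanged macros that consume NonCondorLoadAvg (assumed stock defaults).
CPUIdle          = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
CPUBusy          = ($(NonCondorLoadAvg) >= $(HighLoad))
KeyboardBusy     = (KeyboardIdle < $(MINUTE))
MachineBusy      = ($(CPUBusy) || $(KeyboardBusy))
```

With the totals in place, CPUBusy should become true on every slot when the owner's long-running job pushes the machine-wide non-Condor load past HighLoad, even if nobody touches the keyboard.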
regards,
Todd