[HTCondor-users] Understanding Condor Policies on Jobs
- Date: Fri, 22 Mar 2013 01:18:12 -0700
- From: Andrey Kuznetsov <akuznet1@xxxxxxxx>
- Subject: [HTCondor-users] Understanding Condor Policies on Jobs
Hi,
I'm having problems figuring out how to set up a good policy on our machines.
The reason I'm changing the defaults is because some machines are also
used as desktop machines and I want to vacate 1 processor if the
console or keyboard are being used, indicating a user on the machine.
We are running Scientific Linux v6.2.
I've been reading the documentation, and slowly figuring out what is
what, but some things are unclear.
From the documentation, WANT_SUSPEND = A boolean expression that, when
True, tells Condor to evaluate the SUSPEND expression.
SUSPEND = A boolean expression that, when True, causes Condor to
suspend running a Condor job. The machine may still be claimed, but
the job makes no further progress, and Condor does not generate a load
on the machine.
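
To check my own reading of those two definitions, here is a minimal
Python sketch of the two-step evaluation as I understand it from the
quoted text. The function name job_gets_suspended is made up for
illustration; this is only my interpretation, not how Condor actually
implements it.

```python
def job_gets_suspended(want_suspend: bool, suspend: bool) -> bool:
    """Model the two-step check described in the quoted documentation.

    Hypothetical helper: it only mirrors my reading of the docs.
    """
    # Step 1: WANT_SUSPEND decides whether SUSPEND is looked at at all.
    if not want_suspend:
        return False
    # Step 2: only now does SUSPEND decide whether the job is suspended.
    return suspend


if __name__ == "__main__":
    # Print the full truth table for the two expressions.
    for want in (False, True):
        for susp in (False, True):
            print(want, susp, job_gets_suspended(want, susp))
```

If this reading is right, the job is suspended only when both
expressions are True.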
From the default config, UWCS_WANT_SUSPEND = ( $(SmallJob) ||
$(KeyboardNotBusy) || $(IsVanilla) ) && ( $(SUSPEND) )
So SUSPEND will be evaluated if the job is small, likely some kind of
error in the job, but I am having trouble understanding the rest.
1) Why is SUSPEND evaluated when there is no user at the keyboard
("KeyboardNotBusy")? Shouldn't it be the opposite? If the keyboard is
busy, then I want SUSPEND to be evaluated, because someone is using
the machine and I want the job suspended to free resources/processor
for the user.
2) Why is SUSPEND evaluated when the job is running in the VANILLA
universe? We submit jobs under the VANILLA universe and add our own
environment variables inside the jobs. I don't see why condor would
attempt to suspend a VANILLA universe job.
3) Why is SUSPEND part of WANT_SUSPEND? When WANT_SUSPEND is True,
SUSPEND is evaluated anyway, so this seems redundant.
Regarding, UWCS_CONTINUE = ( $(CPUIdle) && ($(ActivityTimer) > 10) &&
(KeyboardIdle > $(ContinueIdleTime)) )
ActivityTimer = Amount of time in seconds in the current activity.
4) What kind of activity is the timer tracking? CONTINUE is supposed
to reactivate a suspended job. That means that when the machine is
free from users and nothing is running on it, ActivityTimer is
somehow supposed to be non-zero, and thus > 10, so what is it
tracking? Is ActivityTimer tracking the time since the last user
click/interaction, so that if the user steps away for more than
10 seconds, the condor job will continue/resume?
5) What's the purpose of having both WANT_SUSPEND and SUSPEND? It
seems like they accomplish the same thing, except that the check is
run twice. Does WANT_SUSPEND have some other kind of use?
6) Why are some variables in the config written in the bash-like
$(NAME) form, and others not? Or is it a typo?
Take a look at where SUSPEND is evaluated:
UWCS_WANT_SUSPEND = ( $(SmallJob) || $(KeyboardNotBusy) ||
$(IsVanilla) ) && ( $(SUSPEND) )
UWCS_PREEMPT = ( ((Activity == "Suspended") && ($(ActivityTimer) >
$(MaxSuspendTime))) || (SUSPEND && (WANT_SUSPEND == False)) )
7) Are variables case sensitive? In condor_config_val, they are
printed in all capitals, but in the UWCS defaults they are often
written in mixed case, with the first letter of each word capitalized:
"$(ActivityTimer)" vs "ACTIVITYTIMER = (time() -
EnteredCurrentActivity)"
8) How do you differentiate between variables set/updated by condor
and variables that you define? For example, SUSPEND is defined in the
config by the user, but "KeyboardIdle" is not in the config.
9) What are =?= and =!=?
I am using:
SLOTS_CONNECTED_TO_CONSOLE = 1
SLOTS_CONNECTED_TO_KEYBOARD = 1
10) How does condor know which SlotID to reserve for the user when the
desktop is being used? Where is this set?
Here's what my SUSPEND looks like:
SUSPEND = ( ($(KeyboardBusy) || $(ConsoleBusy)) && ((SlotID <=
SLOTS_CONNECTED_TO_CONSOLE) || (SlotID <= SLOTS_CONNECTED_TO_CONSOLE))
&& $(ActivationTimer) > 90)
In other words, if the console or keyboard is being used, and the
SlotID is 1, meaning processor #1 out of a total of 4 processors
(cores) in my computer, and the job is mature (it has been running
for some time), then suspend the job.
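
To spell out the logic I am aiming for, here is a minimal Python sketch
of that SUSPEND condition. The function and argument names are
hypothetical and only mirror the macros in my config; I am assuming the
second slot check was meant to compare against the keyboard setting,
and the defaults of 1 and 90 are the values from my expression above.

```python
def my_suspend(keyboard_busy: bool, console_busy: bool, slot_id: int,
               activation_timer: float,
               slots_connected_to_console: int = 1,
               slots_connected_to_keyboard: int = 1,
               min_runtime: float = 90) -> bool:
    """Sketch of the intended SUSPEND logic (hypothetical names)."""
    # A user counts as present if either the keyboard or console is busy.
    user_present = keyboard_busy or console_busy
    # Only slots tied to the console/keyboard should react to the user.
    slot_is_desktop = (slot_id <= slots_connected_to_console
                       or slot_id <= slots_connected_to_keyboard)
    # "Mature" job: it has been running longer than min_runtime seconds.
    job_is_mature = activation_timer > min_runtime
    return user_present and slot_is_desktop and job_is_mature


if __name__ == "__main__":
    # User typing, slot 1, job has run for 2 minutes: should suspend.
    print(my_suspend(True, False, slot_id=1, activation_timer=120))
    # Same situation on slot 3: should keep running.
    print(my_suspend(True, False, slot_id=3, activation_timer=120))
```

With 4 slots, this is meant to suspend only the job on slot 1, while
jobs on slots 2 to 4 keep running.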
PREEMPT = ( ((Activity == "Suspended") && ($(ActivityTimer) >
$(MaxSuspendTime))) || (SUSPEND) )
WANT_SUSPEND = ( $(SmallJob) || $(KeyboardBusy) || $(ConsoleBusy) )
CONTINUE = ( $(CPUIdle) && ($(ActivityTimer) > 10) && (KeyboardIdle >
$(ContinueIdleTime)) )
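
For reference, this is how I currently read the CONTINUE expression,
written as a small Python sketch. It assumes ActivityTimer is simply
time() minus EnteredCurrentActivity, as in the default config, and
treats CPUIdle and ContinueIdleTime as plain inputs, since I am not
sure how they are derived. The function names are made up for
illustration.

```python
def activity_timer(now: float, entered_current_activity: float) -> float:
    """Seconds spent in the current activity (my reading of ActivityTimer)."""
    return now - entered_current_activity


def my_continue(cpu_idle: bool, now: float, entered_current_activity: float,
                keyboard_idle: float, continue_idle_time: float) -> bool:
    """Sketch of CONTINUE: resume only if all three conditions hold."""
    return (cpu_idle
            and activity_timer(now, entered_current_activity) > 10
            and keyboard_idle > continue_idle_time)


if __name__ == "__main__":
    # Example values are arbitrary: 60 s in the current activity,
    # keyboard idle for 400 s, idle threshold of 300 s.
    print(my_continue(True, now=1060, entered_current_activity=1000,
                      keyboard_idle=400, continue_idle_time=300))
```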
I welcome any suggestions to improve my attempts at forcing condor to
relinquish 1 processor when a user is utilizing the computer.
Thank you very much for taking a look.
--
Andrey Kuznetsov <akuznet1@xxxxxxxx>