Mailing List Archives
[Condor-users] Back to starting point... making my jobs to suspend for as long as required
- Date: Fri, 22 Jul 2005 10:13:15 +0100
- From: "Miguel Dilaj" <mdilaj@xxxxxxxxxxxxx>
- Subject: [Condor-users] Back to starting point... making my jobs to suspend for as long as required
Hi all,
I haven't yet found the right way to make Condor behave the way I need...
If I've read the 6.6.10 documentation correctly, with vanilla jobs on Win32 I
can (somehow!) submit a job that will run while the machine is free, be
suspended when there is user activity and resume later, all without being
preempted, interrupted (which would mean starting again from scratch) and/or
killed...
Is the sentence above right, for a start???
Also, if I've understood everything correctly, I should expect the Status
column of condor_q to show only R (meaning my job is either currently running
or suspended), and NOT to go back to I.
The Activity column of condor_status should show either Busy or Suspended,
and never go back to Idle. Is that correct??
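A deliberately simplified Python model of the behaviour expected above may make the question concrete. It is a sketch under stated assumptions, not Condor's real startd state machine (which also has Owner, Unclaimed, Matched and Preempting states): it tracks only the Activity of a claimed machine, treats SUSPEND, CONTINUE, PREEMPT and KILL as booleans already evaluated for the current moment, and lumps PREEMPT and KILL together as "job evicted". The function name `next_activity` and its parameters are hypothetical.

```python
# Hypothetical, simplified model of the expected behaviour of a claimed
# machine. NOT Condor's actual state machine: PREEMPT and KILL are both
# treated as "job evicted", and every expression is a pre-evaluated boolean.

def next_activity(activity, suspend, cont, preempt=False, kill=False, job_done=False):
    """Return the next Activity ("Busy", "Suspended" or "Idle")."""
    if job_done:
        return "Idle"          # job finished normally
    if preempt or kill:
        return "Idle"          # job evicted: the symptom being reported
    if activity == "Busy" and suspend:
        return "Suspended"     # user activity: pause the job
    if activity == "Suspended" and cont:
        return "Busy"          # machine free again: resume the job
    return activity            # otherwise nothing changes


# A day of intermittent user activity with PREEMPT = FALSE and KILL = FALSE.
events = [
    # (SUSPEND, CONTINUE)
    (True, False),   # user logs in
    (False, False),  # still busy at the keyboard
    (False, True),   # user leaves, job resumes
    (True, False),   # user comes back
]

activity = "Busy"
history = [activity]
for suspend, cont in events:
    activity = next_activity(activity, suspend, cont)
    history.append(activity)

print(history)
```

With eviction switched off, the model only ever toggles between Busy and Suspended; seeing Idle the next day therefore suggests that something outside SUSPEND/CONTINUE ended the job.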
What I see here is that I submit my jobs (I submit them on hold and then
release them manually as required), condor_q shows them going into Running,
and condor_status shows the machine going to Busy/Suspended.
However, when I check the next day, the machine has come back to Idle (with
user activity, of course).
I expected the machine to still be Suspended; since it is Idle instead, my
jobs have been killed.
These are the relevant lines of condor_config on the nodes, based on the
default installation (thank you for taking the time to look at them ;-):
CONDOR_HOST = {the IP address of my central manager}
RELEASE_DIR = C:\Condor
LOCAL_DIR = C:\Condor
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local
CONDOR_ADMIN = mdilaj@xxxxxxxxxxxx
MAIL = $(BIN)/condor_mail.exe
SMTP_SERVER = {the IP address of my SMTP server}
UID_DOMAIN = $(FULL_HOSTNAME)
FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
HOSTALLOW_ADMINISTRATOR = {the IP address of my central manager}
HOSTALLOW_OWNER = $(FULL_HOSTNAME), $(HOSTALLOW_ADMINISTRATOR)
HOSTALLOW_READ = {the IP address of my central manager}
HOSTALLOW_WRITE = {the IP address of my central manager}
HOSTALLOW_NEGOTIATOR = $(NEGOTIATOR_HOST)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(NEGOTIATOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)
USE_NFS = False
USE_AFS = False
MaxSuspendTime = 100 * $(HOUR)
WANT_SUSPEND = TRUE
WANT_VACATE = FALSE
VACATE = FALSE
START = $(UWCS_START)
SUSPEND = $(UWCS_SUSPEND)
CONTINUE = $(UWCS_CONTINUE)
PREEMPT = FALSE
KILL = FALSE
PERIODIC_CHECKPOINT = $(UWCS_PERIODIC_CHECKPOINT)
PREEMPTION_REQUIREMENTS = $(UWCS_PREEMPTION_REQUIREMENTS)
PREEMPTION_RANK = $(UWCS_PREEMPTION_RANK)
NEGOTIATOR_PRE_JOB_RANK = $(UWCS_NEGOTIATOR_PRE_JOB_RANK)
NEGOTIATOR_POST_JOB_RANK = $(UWCS_NEGOTIATOR_POST_JOB_RANK)
UWCS_WANT_SUSPEND = ( $(SmallJob) || $(KeyboardNotBusy) \
|| $(IsPVM) || $(IsVanilla) )
UWCS_WANT_VACATE = ( $(ActivationTimer) > 10 * $(MINUTE) \
|| $(IsPVM) || $(IsVanilla) )
UWCS_START = ( (KeyboardIdle > $(StartIdleTime)) \
&& ( $(CPUIdle) || \
(State != "Unclaimed" && State != "Owner")) )
UWCS_SUSPEND = ( $(KeyboardBusy) || \
( (CpuBusyTime > 2 * $(MINUTE)) \
&& $(ActivationTimer) > 90 ) )
UWCS_CONTINUE = ( $(CPUIdle) && ($(ActivityTimer) > 10) \
&& (KeyboardIdle > $(ContinueIdleTime)) )
UWCS_PREEMPT = ( ((Activity == "Suspended") && \
($(ActivityTimer) > $(MaxSuspendTime))) \
|| (SUSPEND && (WANT_SUSPEND == False)) )
UWCS_KILL = $(ActivityTimer) > $(MaxVacateTime)
UWCS_PERIODIC_CHECKPOINT = $(LastCkpt) > (3 * $(HOUR))
UWCS_NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED
UWCS_PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) && \
RemoteUserPrio > SubmittorPrio * 1.2
UWCS_PREEMPTION_RANK = (RemoteUserPrio * 1000000) - TARGET.ImageSize
NUM_CPUS = 1
All other active (i.e., uncommented) lines in the default config file are
unmodified; I hope none of them is breaking anything.
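condor_config values are built by textual $(NAME) substitution, so it is easy to check what the startd ends up seeing for the policy lines above. The following is a minimal Python sketch of that expansion. It assumes MINUTE = 60 and HOUR = (60 * $(MINUTE)) as in the stock config file (those two definitions are not among the quoted lines), and the `expand` helper is hypothetical and handles only plain $(NAME) references. It shows that PREEMPT and KILL expand to a bare FALSE. In the quoted lines MaxSuspendTime is referenced only from UWCS_PREEMPT, which nothing else uses, so that limit never comes into play. Even if it did, 100 hours (360000 s) is longer than 4 days (345600 s).

```python
import re

# Hypothetical helper mimicking condor_config's textual $(NAME) substitution.
# Only plain $(NAME) references are handled; names must exist in `macros`.
MACRO_REF = re.compile(r"\$\((\w+)\)")

def expand(name, macros, depth=0):
    """Recursively substitute $(NAME) references in the value of `name`."""
    if depth > 20:
        raise ValueError("macro nesting too deep (cycle?)")
    value = macros[name]
    return MACRO_REF.sub(lambda m: expand(m.group(1), macros, depth + 1), value)

# Values copied from the posted config; MINUTE and HOUR are ASSUMED to match
# the stock condor_config, since they are not among the quoted lines.
macros = {
    "MINUTE": "60",
    "HOUR": "(60 * $(MINUTE))",
    "MaxSuspendTime": "100 * $(HOUR)",
    "PREEMPT": "FALSE",
    "KILL": "FALSE",
}

print(expand("PREEMPT", macros))
print(expand("KILL", macros))

max_suspend_expr = expand("MaxSuspendTime", macros)
# eval is acceptable here only because the string is trusted arithmetic
# assembled from the literals above.
max_suspend_seconds = eval(max_suspend_expr)
four_days_seconds = 4 * 24 * 3600
print(max_suspend_expr, max_suspend_seconds, four_days_seconds)
```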
Any suggestions on how to set up a Condor grid that can be managed only from
my central manager, where jobs can be submitted only from the central manager,
and that keeps a job "alive" until it finishes some 4 days later (suspending
it when there's user activity and resuming it afterwards) ARE WELCOME ;-)
TIA!
Regards,
Miguel