Re: [Condor-users] MPI dedicated schedulers --> to condor admins
- Date: Thu, 1 Feb 2007 14:34:18 +0100
- From: Nicolas GUIOT <nicolas.guiot@xxxxxxx>
- Subject: Re: [Condor-users] MPI dedicated schedulers --> to condor admins
Hi Ana,
Thanks for your help, but after googling a bit, I found the solution (at least for one of my problems...).
It was mentioned on this list a long time ago:
https://lists.cs.wisc.edu/archive/condor-users/pre-2004-June/msg01195.shtml
In my condor_config.local.dedicated.resource, for "Option 3", I need to add:
RANK = 0
just before :
RANK = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) + $(RANK)
Although this looks like an old error, it is still present in the example files shipped with newer versions. Maybe the Condor admins could correct it in a future release? Or am I totally wrong?
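For reference, here is a minimal sketch of how the relevant lines might sit together in the dedicated resource's local config file. It assumes the problem is that $(RANK) on the right-hand side has no earlier definition to expand to, which is how the older mail linked above reads; this has not been checked against the Condor sources. The hostname is a placeholder, and RANK_FACTOR = 100 is simply the value from my own file.

```
## Sketch only: excerpt for "Option 3" on a dedicated resource.
## Replace the placeholder hostname with your dedicated scheduler's host.
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxx"
RANK_FACTOR = 100

## Assumption: RANK must already be defined, because the next line
## refers to its previous value through $(RANK).
RANK = 0
RANK = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) + $(RANK)
```

If the main condor_config already defines RANK before the local file is read, the extra "RANK = 0" line should be harmless, since it is immediately overridden by the next line.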
Nicolas
----------------
On Thu, 01 Feb 2007 12:22:03 +0100
Ana Silva <asilva@xxxxxxx> wrote:
> Hi Nicolas...
>
> I had this problem before, and I resolved it with the following
> configuration.
>
> You need to configure this:
>
> In condor_config.local on your central manager (dedicated scheduler),
> write the following:
>
> ######################################################################
> # DEDICATED SCHEDULER
> ######################################################################
>
> ######################################################################
> ######################################################################
> ## Settings you MUST customize!
> ######################################################################
> ######################################################################
>
> ## What is the name of the dedicated scheduler for this resource?
> ## You MUST fill in the correct full hostname where you're running
> ## the dedicated scheduler, and where users will submit their
> ## dedicated jobs. The "DedicatedScheduler@" part should not be
> ## changed, ONLY the hostname.
> DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxx"
> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
>
> ######################################################################
> ######################################################################
> ## Settings you should leave alone, but that must be defined
> ######################################################################
> ######################################################################
>
> ## Path to the special version of rsh that's required to spawn MPI
> ## jobs under Condor. WARNING: This is not a replacement for rsh,
> ## and does NOT work for interactive use. Do not use it directly!
> MPI_CONDOR_RSH_PATH = $(LIBEXEC)
>
> ## Path to OpenSSH server binary
> ## Condor uses this to establish a private SSH connection between execute
> ## machines. It is usually in /usr/sbin, but may be in /usr/local/sbin
> CONDOR_SSHD = /usr/sbin/sshd
>
> ## Path to OpenSSH keypair generator.
> ## Condor uses this to establish a private SSH connection between execute
> ## machines. It is usually in /usr/bin, but may be in /usr/local/bin
> CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
>
> ## This setting puts the DedicatedScheduler attribute, defined above,
> ## into your machine's classad. This way, the dedicated scheduler
> ## (and you) can identify which machines are configured as dedicated
> ## resources.
> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
>
>
> And on the execute nodes (dedicated resources), write this in
> condor_config.local:
>
> ######################################################################
> # DEDICATED RESOURCE
> ######################################################################
>
> ######################################################################
> ######################################################################
> ## Settings you MUST customize!
> ######################################################################
> ######################################################################
>
> ## What is the name of the dedicated scheduler for this resource?
> ## You MUST fill in the correct full hostname where you're running
> ## the dedicated scheduler, and where users will submit their
> ## dedicated jobs. The "DedicatedScheduler@" part should not be
> ## changed, ONLY the hostname.
> DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxx"
> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
>
> ######################################################################
> ######################################################################
> ## Settings you should leave alone, but that must be defined
> ######################################################################
> ######################################################################
>
> ## Path to the special version of rsh that's required to spawn MPI
> ## jobs under Condor. WARNING: This is not a replacement for rsh,
> ## and does NOT work for interactive use. Do not use it directly!
> MPI_CONDOR_RSH_PATH = $(LIBEXEC)
>
> ## Path to OpenSSH server binary
> ## Condor uses this to establish a private SSH connection between execute
> ## machines. It is usually in /usr/sbin, but may be in /usr/local/sbin
> CONDOR_SSHD = /usr/sbin/sshd
>
> ## Path to OpenSSH keypair generator.
> ## Condor uses this to establish a private SSH connection between execute
> ## machines. It is usually in /usr/bin, but may be in /usr/local/bin
> CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
>
> ## This setting puts the DedicatedScheduler attribute, defined above,
> ## into your machine's classad. This way, the dedicated scheduler
> ## (and you) can identify which machines are configured as dedicated
> ## resources.
> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
>
> ##--------------------------------------------------------------------
> ## 2) Always run jobs, but prefer dedicated ones
> ##--------------------------------------------------------------------
> START = True
> SUSPEND = False
> CONTINUE = True
> PREEMPT = False
> KILL = False
> WANT_SUSPEND = False
> WANT_VACATE = False
> RANK = Scheduler =?= $(DedicatedScheduler)
>
>
> Next, you must restart the "master" daemon on all nodes with this
> command: condor_restart -master
>
> One more thing: the daemon list on your execute nodes must be:
>
> DAEMON_LIST = MASTER, STARTD, *SCHEDD*
>
>
> I hope this helps...
>
> PS: sorry for my English
>
>
> Nicolas GUIOT wrote:
> > Hi all,
> >
> > I'm trying to set up Condor to submit MPI jobs. If I understood correctly, I first need to set up a dedicated scheduler.
> > I then checked the example "condor_config.local.dedicated.submit" file, but everything is commented out, so in effect I have "nothing" in this file (see attachment).
> >
> > I found this page : (http://www.openems.org/display/CONDOR/Ask+Mike ---> Optena), which says I should add something like :
> > DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxx"
> > STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
> >
> > in the local condor_config file.
> >
> > So which of these solutions is the right one? A mix of both? And why is the example file empty?
> >
> > For now, I haven't changed anything in the dedicated scheduler's config file; I only added to (and modified) the config file of one machine I want to use as a dedicated resource:
> >
> > #################################
> > # Start only as EXECUTE machine
> > DAEMON_LIST = MASTER, STARTD
> > ##### Changes so that we don't care about KeyboardIdle
> > START = ( $(CPUIdle) || (State != "Unclaimed" && \
> > State !="Owner") )
> > WANT_SUSPEND = ( $(SmallJob) || $(IsPVM) || $(IsVanilla) )
> >
> > SUSPEND = ( (CpuBusyTime > 2 * $(MINUTE)) \
> > && $(ActivationTimer) > 90 )
> >
> > CONTINUE = ( $(CPUIdle) && ($(ActivityTimer) > 10) )
> >
> > ## condor_config.local.dedicated.resource
> > DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxx"
> > ## 3) Always run dedicated jobs, but only allow non-dedicated jobs to
> > ## run on an opportunistic basis.
> > SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
> > PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
> > #RANK_FACTOR = 1000000
> > RANK_FACTOR = 100
> > RANK = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) + $(RANK)
> > START = (Scheduler =?= $(DedicatedScheduler)) || ($(START))
> >
> > MPI_CONDOR_RSH_PATH = $(LIBEXEC)
> > CONDOR_SSHD = /usr/sbin/sshd
> > CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
> > STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
> >
> > #################
> >
> > If I run "ps ax|grep condor" on this dedicated resource, I don't see any startd running (which I usually see on execute machines...):
> > $ ps ax|grep cond
> > 24677 ? Ss 0:00 /nfs/opt/condor_x86_64/sbin/condor_master
> > 24843 pts/0 S+ 0:00 grep cond
> >
> > And this "dedicated resource" has just disappeared from my "condor_status" list.
> >
> > Any idea how to solve that?
> > I'm using Condor 6.8.3.
> >
> > Thanks for your help
> >
> > Nicolas
----------------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE
Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
----------------------------------------------------