
Re: [HTCondor-users] Job Scheduling



Hello

Yes, 10.0.0.1 is the IP address of pucitServer.CentOSWorld.com.

Please correct me if I'm wrong. As I understand your answer, the configuration should go in /etc/condor/condor_config.local.

But the following blog post says that the configuration should go in /etc/condor/condor_config:

http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/

These are two different files. Which one should I edit so that jobs are scheduled from the master onto the worker machines?

Greetings....

> Date: Tue, 21 May 2013 13:18:36 +0100
> From: B.Candler@xxxxxxxxx
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] Job Scheduling
>
> On Tue, May 21, 2013 at 11:47:32AM +0000, Muak rules wrote:
> > Hello
> > I'm going to explain everything I have done.
> > I made my configuration changes in /etc/condor/condor_config.
> >
> > On the client machine I made the following settings:
> > CONDOR_HOST = pucitServer.CentOSWorld.com(name of a server machine)
> > ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)
> > COLLECTOR_HOST = 10.0.0.1 (IP Address of server)
> > DAEMON_LIST = master,startd
>
> You are using a mix of names and IP addresses. Is
> pucitServer.CentOSWorld.com the machine with IP address 10.0.0.1? Do you
> have
>
> 10.0.0.1 pucitServer.CentOSWorld.com
>
> in your /etc/hosts file?
>
> I can describe a simple config where one machine is the "master" (contains the
> job queue and is where you submit jobs) and others are "workers" (where the
> jobs actually execute).
>
> If pucitserver.centosworld.com is the 'master', then on a 'worker' machine I
> would make condor_config.local something like this:
>
> ---- 8< ----
> ## What machine is your central manager?
>
> CONDOR_HOST = pucitserver.centosworld.com
>
> ## Other global settings
>
> UID_DOMAIN = centosworld.com
> CONDOR_ADMIN = yourmail@xxxxxxxxxxxxxx
> MAIL = /usr/bin/mail
>
> ## Pool's short description
>
> COLLECTOR_NAME = My org condor pool
>
> ## When is this machine willing to start a job?
>
> #START = TRUE
> BackgroundLoad = 0.5
> START = $(CPUIdle) || (State != "Unclaimed" && State != "Owner")
>
> ## When to suspend a job?
>
> SUSPEND = FALSE
>
> ## When to nicely stop a job?
> ## (as opposed to killing it instantaneously)
>
> PREEMPT = FALSE
>
> ## When to instantaneously kill a preempting job
> ## (e.g. if a job is in the pre-empting stage for too long)
>
> KILL = FALSE
>
> ## This macro determines what daemons the condor_master will start and keep its watchful eyes on.
> ## The list is a comma or space separated list of subsystem names
>
> DAEMON_LIST = MASTER, STARTD
> ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST)
>
> ## Optional: dynamic slots
>
> SLOT_TYPE_1 = cpus=100%, ram=75%, swap=100%, disk=100%
> SLOT_TYPE_1_PARTITIONABLE = True
> NUM_SLOTS_TYPE_1 = 1
> ---- 8< ----
>
> And on the 'master' node I would use the same file but change the bit from
> DAEMON_LIST onwards like this:
>
> DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
> ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST), 10.0.0.*
> # Optional if you are using dagman
> DAGMAN_MAX_SUBMITS_PER_INTERVAL = 200
> DAGMAN_SUBMIT_DELAY = 0
>
> Run condor_restart everywhere. Then log in to the master node, check that
> "condor_status" shows the worker node(s), and then submit some jobs.
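>
> As a rough sketch of a first test job (the file names and the choice of
> /bin/hostname are only illustrative assumptions, and the file-transfer lines
> assume the machines do not share a filesystem), a minimal submit description
> file could look something like this:
>
> ---- 8< ----
> ## Run a trivial program so it is obvious which worker executed it
>
> universe = vanilla
> executable = /bin/hostname
>
> ## One output and error file per job; $(Process) is the job's index (0, 1, 2)
>
> output = hostname.$(Process).out
> error = hostname.$(Process).err
> log = hostname.log
>
> ## Let HTCondor copy files to and from the worker
>
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT
>
> ## Submit three copies of the job
>
> queue 3
> ---- 8< ----
>
> Saved as, say, hostname.sub, it would be submitted with "condor_submit
> hostname.sub" and watched with "condor_q". Each .out file should then contain
> the name of the machine that ran that job.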
>
> If you want to make the master node run jobs as well, then I believe it
> should just be a question of adding STARTD to DAEMON_LIST.
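>
> In other words, assuming the master config described above, the line on the
> master would become something like this (a sketch, not tested on your pool):
>
> ---- 8< ----
> ## Master node that also executes jobs: STARTD added to the list
>
> DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
> ---- 8< ----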
>
> Regards,
>
> Brian.