Re: [Condor-users] Flocking
- Date: Tue, 12 Jun 2007 15:43:58 +0200
- From: Urs Fitze <fitze@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] Flocking
On Tue, Jun 12, 2007 at 02:04:48PM +0100, Kewley, J (John) wrote:
> > On Tue, Jun 12, 2007 at 01:21:49PM +0100, Kewley, J (John) wrote:
> > > Re: Flocking.
> > > * Can all your submit nodes in your first pool "see" (i.e. no firewalls
> > >   in the way, and not behind a NAT) all execute nodes in your other pool?
> > Yes, I get the full answer when I do a
> > ---------------------------------------------
> > condor_status -pool <manager of second pool>
> > ---------------------------------------------
> > on a submitter of pool A.
>
> That queries the head node only, not all the execute nodes.
> Maybe try
> condor_status -direct -pool <pool B> -name <execute node in pool B>
>
> Even then, that may not be enough.
> Remember, some fixed ports and an ephemeral (high) port range need
> to be open in each direction, for TCP AND UDP.
> See (quick plug)
> http://epubs.cclrc.ac.uk/work-details?w=34452
>
> for more details if you do need to open firewalls.
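> As a rough sketch (illustrative values, not a recommendation), you can pin
> Condor's ephemeral range in the condor_config of every affected machine,
> so that a single firewall rule covers it:
> ------------------------------------------------
> # restrict the ephemeral ports Condor may use
> LOWPORT  = 9600
> HIGHPORT = 9700
> ------------------------------------------------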
>
> > > * -remote is for direct submission to another pool, not for flocking.
> > Hmm, I see, but does it make sense to
> > ----------------------------------------
> > condor_submit -pool <manager of pool B>
> > ----------------------------------------
> > or should a blank 'condor_submit <submit-file>' lead to flocking if
> > pool A is completely booked out?
>
> Not really. The idea is that with flocking enabled you submit jobs to your
> own pool as normal, and if the system decides it is too busy, it tries to
> find out whether a pool it can flock to can share some of the load. By using
> -remote you are bypassing this stage and forcing jobs onto the other pool
> (where, if two-way flocking were enabled, they might even flock back again!)
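> For reference, the submit side of flocking is just a FLOCK_TO list in the
> config of pool A's submit machine, with a placeholder for the real name:
> ----------------------------------------
> FLOCK_TO = <manager of pool B>
> ----------------------------------------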
>
> > > * Check your HOSTALLOW values in pool B
> > >
> > Ahh! Do you mean flocking could work if I include the submitters of
> > pool A in
> > ------------------------
> > HOSTALLOW_WRITE = ...
> > ------------------------
> > At least I already have
> > --------------------------------------------------------------
> > HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
> > HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
> > HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
> > HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)
> > --------------------------------------------------------------
> > as is the default and as also mentioned in the manual.
>
> That may be the case, but I am not certain.
> Which machine is that config on: A, B, all of A, or all of B?
I added the submitter of pool A to HOSTALLOW_WRITE on all of pool B,
namely in pool B's global config file. ==> Et voilà!
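The relevant line in pool B's global config now looks something like this
(with the placeholder standing in for our real submit host):
--------------------------------------------------------------
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), <submitter of pool A>
--------------------------------------------------------------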
I submitted the job on the pool-A submitter w/o any '-pool' or '-remote'
and it really flocked to pool B, though actually only to pool B's manager,
because that's the only execute node in pool B with open firewalls towards
pool A. Some other free nodes of pool B first 'matched' but then did not
execute; I guess missing/blocked network communication prevented execution
on them. But so far it works, and much more easily (i.e. no credentials
etc.) than I feared! Thanks a lot for your help!
Urs
>
> > > One test you could do is to name, say, the head node of the 2nd pool
> > > (assuming it can run jobs) in the REQUIREMENTS statement of a job on
> > > pool A. It then CANNOT run on pool A and, assuming all else is set up
> > > correctly, will run on pool B via flocking.
> > > If that works, name one of the workers in Pool B and try again.
> > > Don't use -remote for this.
> > >
> > > Cheers
> > >
> > > JK
> >
> > How do I define such a requirement? Something like
> > -------------------------------------------------
> > Requirements = TARGET.HOST == <manager of pool B>
> > ------------------------------------------------- ?
>
> Requirements = (Machine == "<manager of pool B>")
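> e.g. in a minimal submit file (the executable name is just an example):
> -------------------------------------------------
> universe     = vanilla
> executable   = my_job
> Requirements = (Machine == "<manager of pool B>")
> queue
> -------------------------------------------------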
>
> cheers
>
> JK