Re: [Condor-users] How To TroubleShoot Flocking
- Date: Wed, 5 Jul 2006 18:49:53 -0500
- From: "John Alberts" <Alberts@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] How To TroubleShoot Flocking
Well, I guess I wasn't waiting long enough for the job to flock. I
didn't realize I had to wait longer than a few minutes. So, now the job
is getting 'flocked' to the other pool, but I notice a strange problem.
The job fails to run on the remote pool, saying it failed to execute
...condor_exec.exe. This is a Linux machine submitting to another Linux
machine. I'm not sure why it is trying to use condor_exec.exe instead
of just condor_exec. Another strange thing I noticed is that if I submit
the same job to this remote pool from a machine local to that pool, it
works fine.
John Alberts
Technical Assistant for EMS
alberts@xxxxxxxxxxxxxxxxxx
219-989-2083
CLO 332
http://public.xdi.org/=john.alberts
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Wednesday, July 05, 2006 2:55 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] How To TroubleShoot Flocking
On Wed, Jul 05, 2006 at 02:40:38PM -0500, John Alberts wrote:
> Hi. I am trying to set up flocking between 2 condor pools. 1 pool I
> have complete control/access to, the other pool I can log in to using
> ssh and submit jobs. The administrator of the other pool is currently on
> vacation and said he has configured flocking to/from our pool. I'm
> trying to test it, and it seems like flocking isn't working.
>
> I was wondering how I can troubleshoot flocking to see what the culprit
> is. I already tried to submit a job whose requirements can only be
> fulfilled on the other pool. Condor_status -analyze <jobid> shows that
> all machines can't meet the requirements.
1. I think you mean 'condor_q -analyze'
2. I'm not sure that condor_q -analyze works with remote pools.
> I have also run condor_status -pool <otherpoolname> and it properly
> displays all available machines on the other pool. I'm not sure what
> to check next.
>
The next thing to check is to make sure that you're actually flocking
to the remote pool. When a schedd "flocks" to a remote pool, all it does
is send a ClassAd to the remote pool announcing that it has idle jobs.
You can check whether the remote pool knows that you have idle jobs with
condor_status -pool remote.pool.central.manager -submitters
The schedd will not flock to the remote pool right away - it will wait
until it has had a few negotiation cycles with the local pool before it
decides to "increase the flock level". This usually happens within
15 or 20 minutes of submitting a job that can't be satisfied in the local
pool.
-Erik
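The submitter check described above could be scripted roughly as follows. This is only a sketch under assumptions: the function names are made up for illustration, the central manager host and submitter name you pass in are placeholders, and it assumes that your condor_status supports the -format option and that submitter ads carry a Name attribute of the form user@domain.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Condor.

# Reads submitter names (one per line) on stdin and succeeds
# if the name given as $1 matches one of the lines exactly.
has_submitter() {
    grep -F -x -q -- "$1"
}

# Asks the collector of the pool given as $1 for its submitter ads
# and prints the Name attribute of each ad, one per line.
# Assumes condor_status supports -format.
list_remote_submitters() {
    condor_status -pool "$1" -submitters -format '%s\n' Name
}

# $1 = remote central manager, $2 = submitter name (e.g. user@domain).
# Reports whether the remote pool currently has a submitter ad for $2.
# A failed query is reported the same way as a missing ad.
check_flocking() {
    if list_remote_submitters "$1" | has_submitter "$2"; then
        echo "remote pool $1 has a submitter ad for $2"
    else
        echo "no submitter ad for $2 at $1 yet"
        return 1
    fi
}

# Example call (placeholders, needs a working Condor install):
#   check_flocking remote.pool.central.manager user@your.domain
```

Since the schedd may take 15 or 20 minutes to start flocking, a "no submitter ad" result shortly after submitting is not necessarily a sign of misconfiguration; rerunning the check later is reasonable.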
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR