Re: [Condor-users] Flocking
- Date: Tue, 12 Jun 2007 14:51:36 +0200
- From: Urs Fitze <fitze@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] Flocking
On Tue, Jun 12, 2007 at 01:21:49PM +0100, Kewley, J (John) wrote:
> Re: Flocking.
> * Can all your submit nodes in your first pool "see" (i.e. no firewalls in the way,
> and not behind a NAT) all execute nodes in your other pool?
Yes, I get the full answer when I do a
---------------------------------------------
condor_status -pool <manager of second pool>
---------------------------------------------
on a submitter of pool A.
> * -remote is for direct submission to another pool, not for flocking.
Hmm, I see, but does it make sense to
----------------------------------------
condor_submit -pool <manager of pool B>
----------------------------------------
or should a plain 'condor_submit <submit-file>' lead to flocking if
pool A is completely full?
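For illustration, what I would expect (just my assumption of how it
should behave) is that a plain submission like the following flocks to
pool B once pool A is full; the submit file and numbers are only
placeholders:
-----------------------------------------------
# flock_test.sub -- minimal test job (hypothetical)
universe                = vanilla
executable              = /bin/sleep
arguments               = 600
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log                     = flock_test.log
queue 50
-----------------------------------------------
followed by a plain 'condor_submit flock_test.sub' on the submitter of pool A.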
> * Check your HOSTALLOW values in pool B
>
Ahh! Do you mean flocking could work if I include the submitters of pool A into
------------------------
HOSTALLOW_WRITE = ...
------------------------
At least I already have
--------------------------------------------------------------
HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)
--------------------------------------------------------------
which is the default and is also mentioned in the manual.
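So if I understand the suggestion correctly, it would mean also adding
the pool A submitters to HOSTALLOW_WRITE on the pool B machines,
something along these lines (hostnames are just placeholders):
--------------------------------------------------------------
# hypothetical, in the pool B configuration
FLOCK_FROM      = submit01.poolA.example.org
HOSTALLOW_WRITE = *.poolB.example.org, $(FLOCK_FROM)
--------------------------------------------------------------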
> One test you could do is to name, say, the head node of the 2nd pool (assuming it
> can run jobs) in the REQUIREMENTS statement of a job on pool A. It then CANNOT
> run on pool A and, assuming all else is set up correctly, will run on pool B via flocking.
> If that works, name one of the workers in pool B and try again. Don't use -remote for this.
>
> Cheers
>
> JK
How do I define such a requirement? Something like
-------------------------------------------------
Requirements = TARGET.HOST == <manager of pool B>
------------------------------------------------- ?
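Or should it rather match the Machine attribute, e.g. (hostname is
just a placeholder):
-------------------------------------------------------------
Requirements = (TARGET.Machine == "node01.poolB.example.org")
-------------------------------------------------------------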
Thanks for the fast help!
Urs
>
> > -----Original Message-----
> > From: condor-users-bounces@xxxxxxxxxxx
> > [mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Urs Fitze
> > Sent: Tuesday, June 12, 2007 12:48 PM
> > To: condor-users@xxxxxxxxxxx
> > Subject: [Condor-users] Flocking
> >
> >
> > Hi,
> >
> > I'm trying to set up flocking between 2 pools having
> > different UID_DOMAIN and FILESYSTEM_DOMAIN.
> > I followed the (partially unclear) instructions from the
> > manual '5.2 Connecting Condor Pools with Flocking'
> > i.e. by setting
> > --------------------------------
> > FLOCK_TO = <manager of pool B>
> > --------------------------------
> > on the submitter of pool A and setting
> > --------------------------------------------------------------
> > FLOCK_FROM = <list of hosts containing submitter of pool A>.
> > --------------------------------------------------------------
> > After solving all firewall issues I submitted a job(-cluster)
> > on the submitter in pool A by:
> > --------------------------------------------------------------
> > condor_submit -remote <manager of B> -pool <manager of B> <name of submit-file>
> > --------------------------------------------------------------
> > (When omitting the '-remote ..' option the job would NEVER
> > flock to B, even if there were no resources in A. Why?)
> > This way I finally got some traces in the logs of the manager
> > of B, namely in '/scratch/condor/log/SchedLog':
> > --------------------------------------------------------------
> > 6/12 12:17:07 (pid:31692) authenticate_self_gss: acquiring
> > self credentials failed. Please check your Condor
> > configuration file if this is a server process. Or the user
> > environment variable if this is a user process.
> >
> > GSS Major Status: General failure
> > GSS Minor Status Error Chain:
> > globus_gsi_gssapi: Error with GSI credential
> > globus_gsi_gssapi: Error with gss credential handle
> > globus_credential: Valid credentials could not be found in
> > any of the possible locations specified by the credential
> > search order.
> > Valid credentials could not be found in any of the possible
> > locations specified by the credential search order.
> >
> > Attempt 1
> >
> > globus_credential: Error reading host credential
> > globus_sysconfig: Could not find a valid certificate file:
> > The host cert could not be found in:
> > 1) env. var. X509_USER_CERT
> > 2) /etc/grid-security/hostcert.pem
> > 3) $GLOBUS_LOCATION/etc/hostcert.pem
> > 4) $HOME/.globus/hostcert.pem
> >
> > The host key could not be found in:
> > 1) env. var. X509_USER_KEY
> > 2) /etc/grid-security/hostkey.pem
> > 3) $GLOBUS_LOCATION/etc/hostkey.pem
> > 4) $HOME/.globus/hostkey.pem
> >
> >
> >
> > Attempt 2
> >
> > globus_credential: Error reading proxy credential
> > globus_sysconfig: Could not find a valid proxy certificate
> > file location
> > globus_sysconfig: Error with key filename
> > globus_sysconfig: File does not exist: /tmp/x509up_u0 is not
> > a valid file
> >
> > Attempt 3
> >
> > globus_credential: Error reading user credential
> > globus_sysconfig: Error with certificate filename: The user
> > cert could not be found in:
> > 1) env. var. X509_USER_CERT
> > 2) $HOME/.globus/usercert.pem
> > 3) $HOME/.globus/usercred.p12
> >
> >
> >
> >
> > 6/12 12:17:07 (pid:31692) AUTHENTICATE: no available
> > authentication methods succeeded, failing!
> > 6/12 12:17:07 (pid:31692) SCHEDD: authentication failed:
> > AUTHENTICATE:1003:Failed to authenticate with any
> > method|AUTHENTICATE:1004:Failed to authenticate using
> > GSI|GSI:5003:Failed to authenticate. Globus is reporting
> > error (851968:133). There is probably a problem with your
> > credentials. (Did you run
> > grid-proxy-init?)|AUTHENTICATE:1004:Failed to authenticate
> > using KERBEROS|AUTHENTICATE:1004:Failed to authenticate using
> > FS|FS:1004:Unable to lstat(/tmp/FS_XXX5hDIkK)
> > --------------------------------------------------------------
> > What happened here? I wonder because the Flocking chapter of
> > the manual makes no mention of 'credentials', 'authentication',
> > etc.; only the reference to the file-transfer mechanism contains
> > some information pointing in this direction.
> > Btw., I got the above log for both vanilla and standard jobs, and had
> > -----------------------------------
> > should_transfer_files = YES
> > when_to_transfer_output = ON_EXIT
> > -----------------------------------
> > in the submit-file for the vanilla job.
> >
> > One possibly relevant thing is that in the (global)
> > config file for pool A there is the line
> > -----------------------------------
> > AUTHENTICATION_METHODS = FS_REMOTE
> > -----------------------------------
> > while there is no such thing for pool B.
> >
> > What else do I need to make flocking from A to B work?
> >
> > Thanks for any help
> >
> > Regards
> >
> > Urs Fitze
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to
> > condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/
> >