Re: [Condor-users] Flocking
- Date: Tue, 12 Jun 2007 13:21:49 +0100
- From: "Kewley, J \(John\)" <j.kewley@xxxxxxxx>
- Subject: Re: [Condor-users] Flocking
Re: Flocking.
* Can all your submit nodes in your first pool "see" (i.e. no firewalls in the way,
and not behind a NAT) all execute nodes in your other pool?
* -remote is for direct submission to another pool, not for flocking.
* Check your HOSTALLOW values in pool B
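For reference, a minimal sketch of what the pool B side typically needs so that the
pool A submit machine is allowed in (the hostname is a placeholder, and the exact
knobs should be checked against section 5.2 of the manual for your Condor version):
--------------------------------------------------------------
# In the config of pool B's central manager and execute nodes
# (hypothetical hostname -- use your real pool A submit host)
FLOCK_FROM = submit.poolA.example.org
HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD    = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR  = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD     = $(HOSTALLOW_READ), $(FLOCK_FROM)
--------------------------------------------------------------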
One test you could do is to name, say, the head node of the 2nd pool (assuming it
can run jobs) in the REQUIREMENTS statement of a job on pool A. It then CANNOT
run on pool A and, assuming all else is set up correctly, will run on pool B via flocking.
If that works, name one of the workers in Pool B and try again. Don't use -remote for this.
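To make that test concrete, a rough sketch of such a submit file on pool A
(the machine name is made up -- substitute the real head node of pool B),
submitted with a plain condor_submit, no -remote:
--------------------------------------------------------------
# test.sub -- hypothetical example for the test described above
universe     = vanilla
executable   = /bin/hostname
requirements = (Machine == "headnode.poolB.example.org")
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output = test.out
error  = test.err
log    = test.log
queue
--------------------------------------------------------------
If it sits idle, condor_q -analyze on the pool A submit node should give a hint
about which requirement or permission is blocking it.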
Cheers
JK
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Urs Fitze
> Sent: Tuesday, June 12, 2007 12:48 PM
> To: condor-users@xxxxxxxxxxx
> Subject: [Condor-users] Flocking
>
>
> Hi,
>
> I'm trying to set up flocking between 2 pools having
> different UID_DOMAIN and FILESYSTEM_DOMAIN.
> I followed the (partially unclear) instructions from the
> manual '5.2 Connecting Condor Pools with Flocking'
> i.e. by setting
> --------------------------------
> FLOCK_TO = <manager of pool B>
> --------------------------------
> on the submitter of pool A and setting
> --------------------------------------------------------------
> FLOCK_FROM = <list of hosts containing submitter of pool A>.
> --------------------------------------------------------------
> After solving all firewall issues I submitted a job(-cluster)
> on the submitter in pool A by:
> --------------------------------------------------------------
> condor_submit -remote <manager of B> -pool <manager of B> <name of submit-file>
> --------------------------------------------------------------
> (when omitting the '-remote ..' option the job would NEVER
> flock to B, even if there were no resources in A, why?)
> This way I finally got some traces in the logs of the manager
> of B, namely in '/scratch/condor/log/SchedLog':
> --------------------------------------------------------------
> 6/12 12:17:07 (pid:31692) authenticate_self_gss: acquiring
> self credentials failed. Please check your Condor
> configuration file if this is a server process. Or the user
> environment variable if this is a user process.
>
> GSS Major Status: General failure
> GSS Minor Status Error Chain:
> globus_gsi_gssapi: Error with GSI credential
> globus_gsi_gssapi: Error with gss credential handle
> globus_credential: Valid credentials could not be found in
> any of the possible locations specified by the credential
> search order.
> Valid credentials could not be found in any of the possible
> locations specified by the credential search order.
>
> Attempt 1
>
> globus_credential: Error reading host credential
> globus_sysconfig: Could not find a valid certificate file:
> The host cert could not be found in:
> 1) env. var. X509_USER_CERT
> 2) /etc/grid-security/hostcert.pem
> 3) $GLOBUS_LOCATION/etc/hostcert.pem
> 4) $HOME/.globus/hostcert.pem
>
> The host key could not be found in:
> 1) env. var. X509_USER_KEY
> 2) /etc/grid-security/hostkey.pem
> 3) $GLOBUS_LOCATION/etc/hostkey.pem
> 4) $HOME/.globus/hostkey.pem
>
>
>
> Attempt 2
>
> globus_credential: Error reading proxy credential
> globus_sysconfig: Could not find a valid proxy certificate
> file location
> globus_sysconfig: Error with key filename
> globus_sysconfig: File does not exist: /tmp/x509up_u0 is not
> a valid file
>
> Attempt 3
>
> globus_credential: Error reading user credential
> globus_sysconfig: Error with certificate filename: The user
> cert could not be found in:
> 1) env. var. X509_USER_CERT
> 2) $HOME/.globus/usercert.pem
> 3) $HOME/.globus/usercred.p12
>
>
>
>
> 6/12 12:17:07 (pid:31692) AUTHENTICATE: no available
> authentication methods succeeded, failing!
> 6/12 12:17:07 (pid:31692) SCHEDD: authentication failed:
> AUTHENTICATE:1003:Failed to authenticate with any
> method|AUTHENTICATE:1004:Failed to authenticate using
> GSI|GSI:5003:Failed to authenticate. Globus is reporting
> error (851968:133). There is probably a problem with your
> credentials. (Did you run
> grid-proxy-init?)|AUTHENTICATE:1004:Failed to authenticate
> using KERBEROS|AUTHENTICATE:1004:Failed to authenticate using
> FS|FS:1004:Unable to lstat(/tmp/FS_XXX5hDIkK)
> --------------------------------------------------------------
> What happened here? I wonder because the Flocking chapter
> in the manual makes no mention of 'credentials',
> 'authentication', etc.; only the reference to the
> file-transfer mechanism contains some information in this direction.
> Btw. I got the above log both for vanilla and standard jobs and had
> -----------------------------------
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT
> -----------------------------------
> in the submit-file for the vanilla job.
>
> One possibly noteworthy thing is that in the (global)
> config file for pool A there is the line
> -----------------------------------
> AUTHENTICATION_METHODS = FS_REMOTE
> -----------------------------------
> while there is no such thing for pool B.
>
> What else do I need to make flocking from A to B work?
>
> Thanks for any help
>
> Regards
>
> Urs Fitze
>