
Re: [Condor-users] condor_g error when globus-job-run works



Hello again,

So, as is always the way - as soon as you ask for help you find the answer yourself!
I replaced the /etc/grid-security directory with one from another 
machine and everything started working correctly! Unfortunately I'm not 
sure what the differences were, as I (stupidly) overwrote the old 
directory and all of its files, so I cannot compare them. I do know it 
is not to do with the grid-mapfile, which this error code often relates 
to, as there wasn't one before or after.
Sorry I can't be more specific about the solution, but thanks for reading!
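
For anyone who hits the same thing and still has both copies of the directory, a comparison along these lines might narrow down the cause. This is only a minimal sketch in Python: it assumes the old tree was saved somewhere first (the backup path /etc/grid-security.bak in the usage note is hypothetical), and it compares file contents only, not ownership or permissions, which may also matter for credential files.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to a SHA-256 digest of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            try:
                with open(full, "rb") as handle:
                    digests[rel] = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                # Unreadable file or dangling symlink: record a marker instead of failing.
                digests[rel] = "unreadable"
    return digests


def compare_trees(old_root, new_root):
    """Return (only_in_old, only_in_new, changed) as sorted lists of relative paths."""
    old, new = snapshot(old_root), snapshot(new_root)
    only_in_old = sorted(set(old) - set(new))
    only_in_new = sorted(set(new) - set(old))
    changed = sorted(path for path in set(old) & set(new) if old[path] != new[path])
    return only_in_old, only_in_new, changed
```

Calling compare_trees("/etc/grid-security.bak", "/etc/grid-security") would then list the files that exist on only one side and the files whose contents differ, which is the comparison I was no longer able to make.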

Rich

Rich Bruin wrote:
Hello All,

I'm trying to debug a problem we're having with a submit machine here but can't find any help via Google or the list archive, so hopefully someone here can help!
In short, I can run globus jobs directly (via globus-job-run etc.), but 
jobs submitted through condor quickly go on hold and report the 
following error message (from the job's log file):
Globus job submission failed!
Reason: 7 authentication failed: GSS Major Status: Authentication Failed GSS Minor Status Error Chain: init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization init_sec_context
In more detail, I am submitting via condor_g from a Debian sarge 
installation (2.4.27-2-386 kernel) running globus toolkit 4.0.1 and 
condor version 6.6.10, to any of a few remote resources running various 
versions of Linux and globus toolkit versions 2.4.3, 3.2.1 and 4.0.1. 
Running globus-job-run commands directly works fine (against both the 
fork and pbs jobmanagers), but any job run via condor_g fails with the 
above error message in the local logs.
The logs on the remote machine read as follows:

Notice: 5: Authenticated globus user: /C=UK/O=eScience/OU=Cambridge/L=UCS/CN=richard bruin
Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6
Notice: 5: Requested service: jobmanager-fork
Notice: 5: Authorized as local user: rbru03
Notice: 5: Authorized as local uid: 501
Notice: 5:           and local gid: 501
Notice: 0: executing /usr/local/globus/libexec/globus-job-manager
Notice: 0: GRID_SECURITY_CONTEXT_FD=9
Notice: 0: Child 10758 started
Notice: 6: globus-gatekeeper pid=10838 starting at Tue Mar  7 16:25:06 2006

Notice: 6: Got connection 128.232.232.27 at Tue Mar  7 16:25:06 2006

Failed reading length 0
GSS authentication failure
     globus_gss_assist token :3: read failure: Connection closed
Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003

Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003

Notice: 6: globus-gatekeeper pid=10839 starting at Tue Mar  7 16:25:06 2006

Notice: 6: Got connection 128.232.232.27 at Tue Mar  7 16:25:06 2006

Failed reading length 0
GSS authentication failure
     globus_gss_assist token :3: read failure: Connection closed
Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003

Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003
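
The excerpt above shows one connection that authenticates, followed by two gatekeeper processes whose connections fail GSS authentication. For a longer log, a small script can pick out the failing processes. The following is only a sketch in Python that assumes the line formats seen in this excerpt; the function name failed_sessions is made up for illustration and is not part of Globus or Condor.

```python
import re

# Matches lines such as:
# "Notice: 6: globus-gatekeeper pid=10838 starting at Tue Mar  7 16:25:06 2006"
START = re.compile(r"globus-gatekeeper pid=(\d+) starting at (.+)$")


def failed_sessions(log_text):
    """Return the gatekeeper sessions whose log lines include a GSS authentication failure."""
    sessions = []
    current = None
    for line in log_text.splitlines():
        match = START.search(line)
        if match:
            # A new gatekeeper process begins; later lines belong to it.
            current = {
                "pid": int(match.group(1)),
                "started": match.group(2).strip(),
                "gss_failed": False,
            }
            sessions.append(current)
        elif current is not None and "GSS authentication failure" in line:
            current["gss_failed"] = True
    return [session for session in sessions if session["gss_failed"]]
```

Run over the text of the log above, this should report the two processes with pids 10838 and 10839, both started at the same second from the same client address.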

Does anyone have any idea what is happening here? I have other machines with near enough identical installations and they work fine; it just seems to be this one client machine!
Any help you could provide would be much appreciated, thanks in advance,

Rich

-------------------------------
Richard Bruin
PhD Student
Department of Earth Sciences
University of Cambridge
eMinerals project www.eminerals.org
rbru03@xxxxxxxxxxxxx
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users