Mailing List Archives
Re: [Condor-users] Condor-G/Globus Problem
- Date: Wed, 19 Oct 2005 09:54:57 -0400 (EDT)
- From: "James E. Dobson" <James.E.Dobson@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Condor-G/Globus Problem
> Hmm. I'd have to look at the gridmanager (client side) and jobmanager
> (server side) log files to diagnose this. One possibility: does your CA
> use CRLs with short lifetimes (shorter than the runtime of your jobs)?
> We've seen problems where the CRL gets cached in memory and never
> refreshed as long as the gridmanager is running.
I'm still having problems. We are not updating CRLs from my CA, but I do have
a process that creates a new grid-mapfile. My guess is that the lookup happens
at a moment when the file isn't there to be found. Shouldn't Condor retry this
before holding the job?
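If the process that regenerates the grid-mapfile removes or truncates the old file before the new one is complete, there is a window in which a lookup can fail. A minimal sketch of one way to close that window, assuming the generator can be changed: write the new contents to a temporary file in the same directory and rename it into place. The function name, the entry layout, and the file mode are illustrative assumptions, not part of Condor or Globus.

```python
import os
import tempfile


def replace_gridmap_atomically(path, entries):
    """Write entries to path so readers see the old file or the new one, never a gap.

    entries is a list of (dn, local_user) pairs; this helper is hypothetical.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".grid-mapfile.")
    try:
        with os.fdopen(fd, "w") as tmp:
            for dn, local_user in entries:
                tmp.write('"%s" %s\n' % (dn, local_user))
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_path, 0o644)   # mkstemp creates the file as 0600
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern the file at the target path always exists and is always complete, so a lookup that lands during regeneration reads the previous version instead of failing.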
[jed@bellows-falls jed]$ condor_q | grep H | awk '{print $1}' | while read x; do condor_q -l $x | egrep "^HoldReason =|^GlobusResource" ; done
GlobusResource = "pbs-01.grid.dartmouth.edu/jobmanager-condor"
HoldReason = "Globus error 7: authentication with the remote server failed"
GlobusResource = "pbs-01.grid.dartmouth.edu/jobmanager-condor"
HoldReason = "Globus error 7: authentication with the remote server failed"
GlobusResource = "pbs-01.grid.dartmouth.edu/jobmanager-condor"
HoldReason = "Globus error 7: authentication with the remote server failed"
GlobusResource = "pbs-01.grid.dartmouth.edu/jobmanager-condor"
HoldReason = "Globus error 7: authentication with the remote server failed"
GlobusResource = "pbs-01.grid.dartmouth.edu/jobmanager-condor"
HoldReason = "Globus error 7: authentication with the remote server failed"
GlobusResource = "pbs-01.grid.dartmouth.edu/jobmanager-condor"
HoldReason = "Globus error 7: authentication with the remote server failed"
From the UserLog:
000 (2443.000.000) 10/18 14:03:55 Job submitted from host: <129.170.30.5:32793>
...
017 (2443.000.000) 10/18 14:04:15 Job submitted to Globus
RM-Contact: pbs-01.grid.dartmouth.edu/jobmanager-condor
JM-Contact: https://pbs-01.grid.dartmouth.edu:37397/14340/1129658645/
Can-Restart-JM: 1
...
001 (2443.000.000) 10/18 14:04:21 Job executing on host: pbs-01.grid.dartmouth.edu
...
012 (2443.000.000) 10/18 23:10:21 Job was held.
Globus error 7: authentication with the remote server failed
Code 2 Subcode 7
...
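The timestamps above show how long the job ran before it was held. A rough sketch of pulling that interval out of a user log, assuming the event header layout seen in this excerpt (three-digit code, job id in parentheses, MM/DD date, time). The log carries no year, so the `year` parameter is an assumption, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Event header: code, (cluster.proc.subproc), MM/DD HH:MM:SS, message text.
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d\d/\d\d \d\d:\d\d:\d\d) (.*)$"
)


def parse_events(log_text, year=2005):
    """Return (code, timestamp, message) for each event header line."""
    events = []
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            stamp = datetime.strptime(
                "%d/%s" % (year, m.group(5)), "%Y/%m/%d %H:%M:%S"
            )
            events.append((m.group(1), stamp, m.group(6)))
    return events


def time_until_hold(events):
    """Elapsed time from the first execute event (001) to the first hold event (012)."""
    started = next(t for code, t, _ in events if code == "001")
    held = next(t for code, t, _ in events if code == "012")
    return held - started


sample = """\
000 (2443.000.000) 10/18 14:03:55 Job submitted from host: <129.170.30.5:32793>
...
017 (2443.000.000) 10/18 14:04:15 Job submitted to Globus
...
001 (2443.000.000) 10/18 14:04:21 Job executing on host: pbs-01.grid.dartmouth.edu
...
012 (2443.000.000) 10/18 23:10:21 Job was held.
...
"""
print(time_until_hold(parse_events(sample)))  # prints 9:06:00
```

For job 2443 the gap is a little over nine hours, so authentication worked at submission and only failed much later.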
On the server side:
TIME: Tue Oct 18 23:10:20 2005
PID: 18511 -- Failure: globus_gss_assist_gridmap() failed authorization.
gridmap.c:globus_l_gss_assist_gridmap_lookup:1621:
Gridmap lookup failure: Could not map
/O=Dartmouth/CN=host/bellows-falls.grid.dartmouth.edu
Verbose error follows:
gridmap.c:globus_l_gss_assist_gridmap_lookup:1621:
Gridmap lookup failure: Could not map
/O=Dartmouth/CN=host/bellows-falls.grid.dartmouth.edu
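To check whether a given DN is present in a particular copy of the grid-mapfile, something like the following could be used. It is only a sketch and assumes the common layout of one entry per line: a double-quoted DN, whitespace, then a comma-separated list of local accounts. The function name and the `jed` mapping in the sample are hypothetical.

```python
import shlex


def load_gridmap(text):
    """Parse grid-mapfile text into {DN: [local accounts]}.

    Blank lines and lines starting with '#' are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex.split handles the quoted DN, which may contain spaces.
        parts = shlex.split(line)
        if len(parts) < 2:
            continue  # malformed entry, e.g. a DN with no account
        mapping[parts[0]] = parts[1].split(",")
    return mapping


# Hypothetical file contents; the local account name is an assumption.
sample = '"/O=Dartmouth/CN=host/bellows-falls.grid.dartmouth.edu" jed\n'
dn = "/O=Dartmouth/CN=host/bellows-falls.grid.dartmouth.edu"
print(dn in load_gridmap(sample))
```

Running this against snapshots of the file taken before and after regeneration would show whether the DN from the error message is ever missing from the generated output.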