There is a problem with file stage-in that appears in Condor-G
for Condor version 6.9.1, but did not appear in Condor 6.7.x.
Specifically, the following submit file
$ more host.condorg
universe = grid
grid_resource = gt4 FQDN:8443 Fork
Executable = /bin/hostname
InitialDir = /home/gabriel/SubmitCondor
Output = host.condorg.$(Cluster).out
Error = host.condorg.$(Cluster).err
Log = host.condorg.$(Cluster).log
log_xml = True
Notification = Never
Transfer_Executable = False
when_to_transfer_output = ON_EXIT_OR_EVICT
queue
works in Condor 6.7.x both when FQDN is the same host as the
central manager (i.e., Condor and Globus GRAM are co-located)
and when FQDN is a remote host (i.e., the job is submitted to a
remote Globus resource).
However, in Condor 6.9.1, it only works when FQDN is
the local host, i.e., Condor submits to the Globus GRAM
on the same machine as the Condor central manager.
If FQDN is a remote machine, the job is put on hold and
the following error occurs:
HoldReason = "Globus error: Staging error for RSL element fileStageIn."
In both cases, in the job ClassAd I see
x509userproxy = "/tmp/x509up_u501"
where
$ grid-proxy-info -type -timeleft
Proxy draft (pre-RFC) compliant impersonation proxy
571631
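
For anyone trying to reproduce this, here is a minimal sketch of how the two
attributes quoted above can be pulled out of a full job ad. It assumes the ad
is available as plain "Name = value" text, one attribute per line, as printed
by condor_q -long. The function name classad_attr is a hypothetical helper of
mine, not a Condor tool, and the sample text is copied from this message
rather than from a fresh run.

```shell
# Hypothetical helper (not part of Condor): print the value of one
# attribute from ClassAd text in "Name = value" form read on stdin.
classad_attr() {
    sed -n "s/^$1 = //p"
}

# Sample input copied from the job ad quoted in this message. On a live
# system the input would instead come from something like:
#   condor_q -long <cluster>.<proc>
ad='HoldReason = "Globus error: Staging error for RSL element fileStageIn."
x509userproxy = "/tmp/x509up_u501"'

# Print the hold reason and the proxy path recorded in the ad.
printf '%s\n' "$ad" | classad_attr HoldReason
printf '%s\n' "$ad" | classad_attr x509userproxy
```

The match is case-sensitive and keeps the surrounding quotes, so the values
come out exactly as they are stored in the ad.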
I think that before submitting the job, Condor-G delegates
the X509 credential, then inserts the EPR of the delegated
credential resource in the job RSL submitted to Globus.
Has anything in the credential handling changed between
versions 6.7.x and 6.9.1 of Condor?