[HTCondor-users] Executable Fails to Transfer
- Date: Thu, 17 Apr 2014 15:27:28 -0400
- From: Frederick Luehring <luehring@xxxxxxxxxxx>
- Subject: [HTCondor-users] Executable Fails to Transfer
Hi Everyone,
I have one condor submit host that runs a simple test job successfully and
another on which the same test job fails. The failing submit host is running
this version of condor because that's what comes with ROCKS:
$CondorVersion: 7.8.5 Oct 09 2012 BuildID: 68720 $
$CondorPlatform: x86_64_rhap_6.3 $
The job returns this set of messages from condor:
000 (096.000.000) 04/17 13:55:20 Job submitted from host:
<129.79.157.90:11015?sock=2507_a415_3>
...
018 (096.000.000) 04/17 14:00:30 Globus job submission failed!
Reason: 43 the job manager failed to stage the executable
...
009 (096.000.000) 04/17 14:00:30 Job was aborted by the user.
Globus error 43: the job manager failed to stage the executable
...
The working submit host is running a newer version of condor:
$CondorVersion: 8.1.1 Sep 11 2013 BuildID: 171174 $
$CondorPlatform: x86_64_RedHat6 $
The working job returns these messages from condor:
009 (096.000.000) 04/17 14:00:30 Job was aborted by the user.
Globus error 43: the job manager failed to stage the executable
...
000 (096.000.000) 04/17 15:11:11 Job submitted from host:
<129.79.157.89:11015?sock=2742_2cf7_4>
...
017 (096.000.000) 04/17 15:11:20 Job submitted to Globus
RM-Contact: gate04.aglt2.org/jobmanager-condor
JM-Contact: gate04.aglt2.org/jobmanager-condor
Can-Restart-JM: 1
...
027 (096.000.000) 04/17 15:11:20 Job submitted to grid resource
GridResource: gt5 gate04.aglt2.org/jobmanager-condor
GridJobId: gt5 gate04.aglt2.org/jobmanager-condor
https://gate04.aglt2.org:59832/16361969724494590991/6276480034496635811/
...
001 (096.000.000) 04/17 15:11:55 Job executing on host: gt5
gate04.aglt2.org/jobmanager-condor
...
005 (096.000.000) 04/17 15:12:10 Job terminated.
(1) Normal termination (return value 0)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
0 - Total Bytes Sent By Job
0 - Total Bytes Received By Job
...
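To compare the two logs side by side, the events can be split apart with a short script. This is only a minimal sketch, assuming the user-log layout seen in the excerpts above: a three-digit event code, a parenthesized job ID, a date and time, then the message, with events separated by lines of `...`. The name `parse_events` is hypothetical and not part of condor.

```python
import re

# Header line of an event, e.g.
# "018 (096.000.000) 04/17 14:00:30 Globus job submission failed!"
EVENT_HEADER = re.compile(
    r"^(?P<code>\d{3}) \((?P<job>\d+\.\d+\.\d+)\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<text>.*)$"
)


def parse_events(log_text):
    """Split a user log into events (hypothetical helper, layout assumed)."""
    events = []
    current = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "...":
            # A line of dots closes the current event.
            current = None
            continue
        match = EVENT_HEADER.match(stripped)
        if match:
            current = match.groupdict()
            current["details"] = []
            events.append(current)
        elif current is not None and stripped:
            # Any other non-empty line belongs to the event in progress.
            current["details"].append(stripped)
    return events


# Sample copied from the failing submit host's log above.
sample = """\
000 (096.000.000) 04/17 13:55:20 Job submitted from host:
<129.79.157.90:11015?sock=2507_a415_3>
...
018 (096.000.000) 04/17 14:00:30 Globus job submission failed!
Reason: 43 the job manager failed to stage the executable
...
009 (096.000.000) 04/17 14:00:30 Job was aborted by the user.
Globus error 43: the job manager failed to stage the executable
...
"""

# Print only the Globus submission failures (event code 018).
for event in parse_events(sample):
    if event["code"] == "018":
        print(event["time"], event["text"], "|", " ".join(event["details"]))
```

Running the same function over the log from each submit host makes it easier to see which event codes appear on one host but not the other.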
The jobs are submitted from the same NFS-mounted directory on both submit
hosts. The job commands are:
grid_resource=gt5 gate04.aglt2.org/jobmanager-condor
globusrsl=(jobtype=single)(queue=Tier3Test)
copy_to_spool = True
+Nonessential = True
universe=grid
notify_user=luehring@xxxxxxxxxxx
+MATCH_APF_QUEUE="ANALY_AGLT2_TIER3_TEST"
x509userproxy=$ENV(HOME)/x509_Proxy
executable=foo.sh
Dir=/s/luehring/panda_wrapper
output=$(Dir)/$(Cluster).$(Process).log
error=$(Dir)/$(Cluster).$(Process).log
log=$(Dir)/$(Cluster).log
stream_output=False
stream_error=False
notification=Error
transfer_executable = True
Should_Transfer_Files = Yes
queue 1
where foo.sh contains this trivial payload:
#!/bin/zsh
/bin/env
/bin/ls -l
/usr/bin/voms-proxy-info -all
Any advice would be appreciated.
Thanks greatly!
Fred
--
Fred Luehring Indiana U. HEP mailto:luehring@xxxxxxxxxxx +1 812 855 1025 IU
http://cern.ch/Fred.Luehring mailto:Fred.Luehring@xxxxxxx +41 22 767 1166 CERN
http://cern.ch/Fred.Luehring/Luehring_pub.asc +1 812 391 0225 GSM