
Re: [HTCondor-users] Executable Fails to Transfer



Hi Fred,

Judging from the failure message:

> 018 (096.000.000) 04/17 14:00:30 Globus job submission failed!
>    Reason: 43 the job manager failed to stage the executable

this appears to have nothing to do with the age of HTCondor (although, ahem, it is time to upgrade ;).  Rather, it looks like a firewall issue on that submit host.

Are you aware of the various considerations for combining a firewall with an HTCondor-G submit host?  Can you do a quick comparison of the firewall and port settings on the working and non-working hosts?
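For GRAM jobs, the executable is staged by the remote job manager connecting back to a GASS server on the submit host, so the submit host's firewall has to allow those inbound connections. One common approach is to pin HTCondor and Globus to a fixed port range and open that range in the firewall. A minimal sketch (the port numbers 20000-20100 are illustrative only, not your site's values):

```
# condor_config on the submit host -- restrict HTCondor daemons
# to a fixed port range (example values, pick ones for your site):
LOWPORT = 20000
HIGHPORT = 20100
```

and, in the environment that condor_master starts with, something like GLOBUS_TCP_PORT_RANGE=20000,20100 so the GASS server listens in that range too, with inbound TCP on that range from the gatekeeper allowed through the firewall. Running "condor_config_val LOWPORT HIGHPORT" on both hosts is a quick first check for a difference.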

Thanks,

Brian

On Apr 17, 2014, at 2:27 PM, Frederick Luehring <luehring@xxxxxxxxxxx> wrote:

> Hi Everyone,
> 
>   I have one condor submit host on which a simple test job succeeds, and
> another on which the same test job fails. The failing submit host is running
> this version of condor because that's what comes with ROCKS:
> 
> $CondorVersion: 7.8.5 Oct 09 2012 BuildID: 68720 $
> $CondorPlatform: x86_64_rhap_6.3 $
> 
> The job returns this set of messages from condor:
> 
> 000 (096.000.000) 04/17 13:55:20 Job submitted from host:
> <129.79.157.90:11015?sock=2507_a415_3>
> ...
> 018 (096.000.000) 04/17 14:00:30 Globus job submission failed!
>    Reason: 43 the job manager failed to stage the executable
> ...
> 009 (096.000.000) 04/17 14:00:30 Job was aborted by the user.
> 	Globus error 43: the job manager failed to stage the executable
> ...
> 
> The working submit host is running a newer version of condor:
> 
> $CondorVersion: 8.1.1 Sep 11 2013 BuildID: 171174 $
> $CondorPlatform: x86_64_RedHat6 $
> 
> The working job returns these messages from condor:
> 
> 000 (096.000.000) 04/17 15:11:11 Job submitted from host:
> <129.79.157.89:11015?sock=2742_2cf7_4>
> ...
> 017 (096.000.000) 04/17 15:11:20 Job submitted to Globus
>    RM-Contact: gate04.aglt2.org/jobmanager-condor
>    JM-Contact: gate04.aglt2.org/jobmanager-condor
>    Can-Restart-JM: 1
> ...
> 027 (096.000.000) 04/17 15:11:20 Job submitted to grid resource
>    GridResource: gt5 gate04.aglt2.org/jobmanager-condor
>    GridJobId: gt5 gate04.aglt2.org/jobmanager-condor
> https://gate04.aglt2.org:59832/16361969724494590991/6276480034496635811/
> ...
> 001 (096.000.000) 04/17 15:11:55 Job executing on host: gt5
> gate04.aglt2.org/jobmanager-condor
> ...
> 005 (096.000.000) 04/17 15:12:10 Job terminated.
> 	(1) Normal termination (return value 0)
> 		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
> 		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
> 		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
> 		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
> 	0  -  Run Bytes Sent By Job
> 	0  -  Run Bytes Received By Job
> 	0  -  Total Bytes Sent By Job
> 	0  -  Total Bytes Received By Job
> ...
> 
> The jobs are submitted from the same nfs mounted directory on both submit
> hosts. The job commands are:
> 
> grid_resource=gt5 gate04.aglt2.org/jobmanager-condor
> globusrsl=(jobtype=single)(queue=Tier3Test)
> copy_to_spool = True
> +Nonessential = True
> universe=grid
> notify_user=luehring@xxxxxxxxxxx
> +MATCH_APF_QUEUE="ANALY_AGLT2_TIER3_TEST"
> x509userproxy=$ENV(HOME)/x509_Proxy
> 
> executable=foo.sh
> 
> Dir=/s/luehring/panda_wrapper
> output=$(Dir)/$(Cluster).$(Process).log
> error=$(Dir)/$(Cluster).$(Process).log
> log=$(Dir)/$(Cluster).log
> 
> stream_output=False
> stream_error=False
> notification=Error
> transfer_executable = True
> Should_Transfer_Files   = Yes
> queue 1
> 
> where foo.sh contains this trivial payload:
> 
> #!/bin/zsh
> 
> /bin/env
> /bin/ls -l
> /usr/bin/voms-proxy-info -all
> 
> 
> Any advice would be appreciated.
> 
> Thanks greatly!
> 
> Fred
> 
> -- 
> Fred Luehring Indiana U. HEP mailto:luehring@xxxxxxxxxxx  +1 812 855 1025 IU
> http://cern.ch/Fred.Luehring mailto:Fred.Luehring@xxxxxxx +41 22 767 1166 CERN
> http://cern.ch/Fred.Luehring/Luehring_pub.asc             +1 812 391 0225 GSM
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/