Re: [HTCondor-users] Executable Fails to Transfer
- Date: Sun, 20 Apr 2014 21:36:10 -0500
- From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Executable Fails to Transfer
Hi Fred,
Judging from the failure message:
> 018 (096.000.000) 04/17 14:00:30 Globus job submission failed!
> Reason: 43 the job manager failed to stage the executable
it appears to have nothing to do with the age of HTCondor (although, ahem, time to upgrade ;). Instead, it looks like a firewall issue on that submit host.
Are you aware of the various considerations for combining a firewall with an HTCondor-G submit host? Can you do a quick comparison between the working and non-working hosts?
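For reference, the usual knobs on the submit-host side are the HTCondor daemon port range and the GRAM callback port range; both must match the firewall's inbound rules. A sketch with example values (the port numbers here are placeholders, not a recommendation):

```
# condor_config on the submit host: confine HTCondor daemons to a
# port range the firewall allows inbound (example values).
LOWPORT = 11000
HIGHPORT = 11100

# For GT5/GRAM staging, the submit side's callback ports are set via
# an environment variable before submitting, e.g. in the shell:
#   export GLOBUS_TCP_PORT_RANGE=20000,20100
# The same range must be open in the firewall for inbound TCP.
```

If the working host has such a range configured and opened while the failing host does not, that would explain why the job manager cannot stage the executable back from the submit machine.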
Thanks,
Brian
On Apr 17, 2014, at 2:27 PM, Frederick Luehring <luehring@xxxxxxxxxxx> wrote:
> Hi Everyone,
>
> I have a condor submit host which succeeds in running a simple test job and
> a condor submit host where the test job fails. The failing submit host is running
> this version of condor because that's what comes with ROCKS:
>
> $CondorVersion: 7.8.5 Oct 09 2012 BuildID: 68720 $
> $CondorPlatform: x86_64_rhap_6.3 $
>
> The job returns this set of messages from condor:
>
> 000 (096.000.000) 04/17 13:55:20 Job submitted from host:
> <129.79.157.90:11015?sock=2507_a415_3>
> ...
> 018 (096.000.000) 04/17 14:00:30 Globus job submission failed!
> Reason: 43 the job manager failed to stage the executable
> ...
> 009 (096.000.000) 04/17 14:00:30 Job was aborted by the user.
> Globus error 43: the job manager failed to stage the executable
> ...
>
> The working submit host is running a newer version of condor:
>
> $CondorVersion: 8.1.1 Sep 11 2013 BuildID: 171174 $
> $CondorPlatform: x86_64_RedHat6 $
>
> The working job returns these messages from condor:
>
> 000 (096.000.000) 04/17 15:11:11 Job submitted from host:
> <129.79.157.89:11015?sock=2742_2cf7_4>
> ...
> 017 (096.000.000) 04/17 15:11:20 Job submitted to Globus
> RM-Contact: gate04.aglt2.org/jobmanager-condor
> JM-Contact: gate04.aglt2.org/jobmanager-condor
> Can-Restart-JM: 1
> ...
> 027 (096.000.000) 04/17 15:11:20 Job submitted to grid resource
> GridResource: gt5 gate04.aglt2.org/jobmanager-condor
> GridJobId: gt5 gate04.aglt2.org/jobmanager-condor
> https://gate04.aglt2.org:59832/16361969724494590991/6276480034496635811/
> ...
> 001 (096.000.000) 04/17 15:11:55 Job executing on host: gt5
> gate04.aglt2.org/jobmanager-condor
> ...
> 005 (096.000.000) 04/17 15:12:10 Job terminated.
> (1) Normal termination (return value 0)
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
> 0 - Run Bytes Sent By Job
> 0 - Run Bytes Received By Job
> 0 - Total Bytes Sent By Job
> 0 - Total Bytes Received By Job
> ...
>
> The jobs are submitted from the same nfs mounted directory on both submit
> hosts. The job commands are:
>
> grid_resource=gt5 gate04.aglt2.org/jobmanager-condor
> globusrsl=(jobtype=single)(queue=Tier3Test)
> copy_to_spool = True
> +Nonessential = True
> universe=grid
> notify_user=luehring@xxxxxxxxxxx
> +MATCH_APF_QUEUE="ANALY_AGLT2_TIER3_TEST"
> x509userproxy=$ENV(HOME)/x509_Proxy
>
> executable=foo.sh
>
> Dir=/s/luehring/panda_wrapper
> output=$(Dir)/$(Cluster).$(Process).log
> error=$(Dir)/$(Cluster).$(Process).log
> log=$(Dir)/$(Cluster).log
>
> stream_output=False
> stream_error=False
> notification=Error
> transfer_executable = True
> Should_Transfer_Files = Yes
> queue 1
>
> where foo.sh contains this trivial payload:
>
> #!/bin/zsh
>
> /bin/env
> /bin/ls -l
> /usr/bin/voms-proxy-info -all
>
>
> Any advice would be appreciated.
>
> Thanks greatly!
>
> Fred
>
> --
> Fred Luehring Indiana U. HEP mailto:luehring@xxxxxxxxxxx +1 812 855 1025 IU
> http://cern.ch/Fred.Luehring mailto:Fred.Luehring@xxxxxxx +41 22 767 1166 CERN
> http://cern.ch/Fred.Luehring/Luehring_pub.asc +1 812 391 0225 GSM
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with
> the subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/