Mailing List Archives
Re: [Condor-users] condor problem : shadow unable to transmit output file
- Date: Wed, 6 Jun 2007 13:20:36 +0100 (BST)
- From: o c <send_junk_here_10101@xxxxxxxxxxx>
- Subject: Re: [Condor-users] condor problem : shadow unable to transmit output file
Hi,
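Take out the "transfer_output_files" line. That command expects
a list of file names, not a true/false value, so Condor looks for
a file literally called "true" in the job's scratch directory on
the worker side (dir_9027/true in your log). No such file exists,
so the transfer fails and the job is held.
In general, you do not need to tell Condor to transfer back the
output files or to list the files to be returned: with
should_transfer_files = YES, it brings back any files the job
creates or modifies in its scratch directory.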
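For example, a corrected version of your submit file could look
like the following. This is a sketch, not tested on your pool: it
drops transfer_output_files and otherwise keeps your settings,
except that it assumes the intended command is
transfer_executable (the spelling used in the condor_submit
manual). If you ever do need to name specific output files,
transfer_output_files takes a comma-separated list of real file
names; results.dat below is a hypothetical name.

```text
# Sketch of a corrected submit description file (untested on your pool).
# transfer_output_files is removed: with should_transfer_files = YES,
# Condor returns the files the job creates or modifies in its scratch directory.
executable              = /bin/hostname
universe                = vanilla
transfer_executable     = true
output                  = results.output.$(Process)
error                   = results.error.$(Process)
log                     = results.log.$(Process)
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
# Only if you need to restrict what comes back, list real file names.
# results.dat is a hypothetical file the job would have to create:
# transfer_output_files = results.dat
queue 5
```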
OC.
--- USTV_condor_Task_Force <ustv.condor.task.force@xxxxxxxxx> wrote:
> Hello, we are building a test grid in order to harness the
> processing power of all our lab computers, and we have run into
> a problem we are unable to solve.
>
> Our pool currently consists of:
> Name           OpSys  Arch   State  Activity  LoadAv  Mem  ActvtyTime  Distribution
> licinfo10.uni  LINUX  INTEL  Owner  Idle      0.000   502  0+00:10:02  Ubuntu Edgy Eft
> licinfo11.uni  LINUX  INTEL  Owner  Idle      0.000   502  0+00:10:01  Ubuntu Edgy Eft
> vm1@moua       LINUX  INTEL  Owner  Idle      0.060   504  0+00:08:24  RH FC 6
> vm2@moua       LINUX  INTEL  Owner  Idle      0.000   504  0+00:08:25
> vm1@nocte      LINUX  INTEL  Owner  Idle      0.270   506  0+00:10:09  Debian sid
> vm2@nocte      LINUX  INTEL  Owner  Idle      0.000   506  0+00:10:10
> vm1@nous       LINUX  INTEL  Owner  Idle      0.070   505  0+00:10:09  Ubuntu Feisty Fawn
> vm2@nous       LINUX  INTEL  Owner  Idle      0.000   505  0+00:10:10
>
> I tried a test submit file that I found on this mailing list:
>
> executable = /bin/hostname
> universe = vanilla
> TransferExecutable = true
> transfer_output_files= true
> output=results.output.$(Process)
> error=results.error.$(Process)
> log=results.log.$(Process)
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT_OR_EVICT
> queue 5
>
> Our problem is that all our jobs quickly go from Idle to Held,
> and every job log says:
>
>
> 000 (001.003.000) 06/05 14:07:02 Job submitted from host: <10.9.185.29:38947>
> ...
> 001 (001.003.000) 06/05 14:17:11 Job executing on host: <10.9.185.211:42641>
> ...
> 007 (001.003.000) 06/05 14:17:11 Shadow exception!
>     Error from starter on licinfo11.xxx: STARTER at 10.9.185.211 failed to send file(s) to <10.9.185.29:60059>: error reading from /condor/licinfo11/execute/dir_9027/true: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <10.9.185.211:53966>
>     0  -  Run Bytes Sent By Job
>     8572  -  Run Bytes Received By Job
> ...
> 012 (001.003.000) 06/05 14:17:11 Job was held.
>     Error from starter on licinfo11.xxx: STARTER at 10.9.185.211 failed to send file(s) to <10.9.185.29:60059>: error reading from /condor/licinfo11/execute/dir_9027/true: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <10.9.185.211:53966>
>     Code 13 Subcode 2
> ...
>
> In our configuration we have
>   LOCAL_DIR = /condor/$(HOSTNAME)
> where we previously had
>   #LOCAL_DIR = $(RELEASE_DIR)/hosts/$(HOSTNAME)
>
> We changed it so that each node's local dir is on its own disk,
> because we read on this mailing list that a LOCAL_DIR on a shared
> filesystem can cause problems if the machines' clocks are not
> correctly synchronised (our /home/condor is NFS-shared among all
> our nodes).
>
> Additional info: all our UIDs are shared among our hosts.
>
> Apparently Condor does not manage to create the directories in
> $(LOCAL_DIR)/execute (which I made world-writable with chmod) to
> send the files back.
>
> Hope somebody can help :)
>
> The USTV Condor Task Force
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to
> condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>