Re: [Condor-users] weird reason for held job
- Date: Wed, 15 Dec 2010 21:04:17 +0000
- From: Ian Cottam <Ian.Cottam@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] weird reason for held job
Does it start with "#!/bin/bash"?
Did you prepare it under Windows, so that Linux sees "#!/bin/bash\r" (with a trailing carriage return)?
-Ian
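A minimal sketch of how to check for the problem Ian describes and fix it, assuming bash and GNU sed (for in-place editing with -i). The function name fix_crlf is hypothetical. It dumps the first line byte by byte (a Windows-saved file shows "\r \n" at the end) and strips trailing carriage returns if any are found. Where the dos2unix tool is installed, it does the same conversion.

```shell
#!/bin/bash
# Hypothetical helper: report and remove Windows (CRLF) line endings in a script.
fix_crlf() {
    local f="$1"
    # Print the first line byte by byte; a CRLF file ends with "\r  \n".
    head -n 1 "$f" | od -c
    # $'\r$' is a literal carriage return just before end-of-line.
    if grep -q $'\r$' "$f"; then
        echo "CRLF line endings found in $f"
        # Strip the trailing carriage return from every line (GNU sed, in place).
        sed -i $'s/\r$//' "$f"
    fi
}
```

For example, running "fix_crlf hello_world.sh" before condor_submit would catch a shebang that Linux reads as "#!/bin/bash\r".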
On 15/12/2010 20:38, "Dennis Box" <dbox@xxxxxxxx> wrote:
>
>I can create a condor job which gets held almost immediately after
>submission:
>[dbox@gpsn01 ~]$ condor_q dbox
>
>
>-- Submitter: gpsn01.fnal.gov : <131.225.67.70:60205> : gpsn01.fnal.gov
> ID      OWNER  SUBMITTED    RUN_TIME   ST PRI SIZE CMD
>2070.0   dbox   12/15 13:58  0+00:00:00 H  0   0.0  hello_world.sh_201
>
>Looking deeper into why it is held:
>
>[dbox@gpsn01 ~]$ condor_q -l 2070.0 | grep HoldReason
>LastHoldReason = "Error from slot2@xxxxxxxxxxxxxxxx: Failed to execute '/grid/fermiapp/lbne/condor-exec/dbox/hello_world.sh_20101215_135745_1_wrap.sh' with arguments 360: No such file or directory"
>LastHoldReasonCode = 6
>LastHoldReasonSubCode = 2
>
>
>Here's the weird part: when I ssh to the machine where the error occurs
>and look at the file, it seems to be fine!
>[dbox@gpwn002 ~]$ ls -la /grid/fermiapp/lbne/condor-exec/dbox/hello_world.sh_20101215_135745_1_wrap.sh
>-rwxr-xr-x 1 dbox gpcf 800 Dec 15 13:57 /grid/fermiapp/lbne/condor-exec/dbox/hello_world.sh_20101215_135745_1_wrap.sh
>
>
>Any suggestions on how to proceed to debug this?
>
>Thanks
>Dennis
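The "weird part" in the quoted message is consistent with Ian's diagnosis: with a Windows-style shebang, the kernel looks for an interpreter literally named "/bin/bash\r". That interpreter does not exist, so execution fails with ENOENT ("No such file or directory") even though the script itself exists and is executable. LastHoldReasonSubCode = 2 appears to be that errno value (ENOENT is 2 on Linux). A minimal sketch that reproduces the symptom in a throwaway directory, assuming bash on Linux; the file name demo_wrap.sh is hypothetical and the argument 360 only mirrors the hold message:

```shell
#!/bin/bash
# Reproduce "No such file or directory" for a script that plainly exists.
dir=$(mktemp -d)
# Write a script whose lines end in CRLF, as a Windows editor would save it.
printf '#!/bin/bash\r\necho hello\r\n' > "$dir/demo_wrap.sh"
chmod +x "$dir/demo_wrap.sh"

# The file is present and executable, like the ls -la output in the thread...
ls -l "$dir/demo_wrap.sh"

# ...but the interpreter "/bin/bash\r" does not exist, so bash reports
# a "bad interpreter: No such file or directory" error and the run fails.
"$dir/demo_wrap.sh" 360 || echo "exec failed with status $?"
```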