[Condor-users] weird reason for held job
- Date: Wed, 15 Dec 2010 14:38:27 -0600
- From: Dennis Box <dbox@xxxxxxxx>
- Subject: [Condor-users] weird reason for held job
I can create a Condor job that gets held almost immediately after submission:
[dbox@gpsn01 ~]$ condor_q dbox
-- Submitter: gpsn01.fnal.gov : <131.225.67.70:60205> : gpsn01.fnal.gov
 ID      OWNER   SUBMITTED     RUN_TIME  ST PRI SIZE CMD
2070.0   dbox   12/15 13:58  0+00:00:00  H  0   0.0  hello_world.sh_201
Looking deeper into why it is held:
[dbox@gpsn01 ~]$ condor_q -l 2070.0 | grep HoldReason
LastHoldReason = "Error from slot2@xxxxxxxxxxxxxxxx: Failed to execute '/grid/fermiapp/lbne/condor-exec/dbox/hello_world.sh_20101215_135745_1_wrap.sh' with arguments 360: No such file or directory"
LastHoldReasonCode = 6
LastHoldReasonSubCode = 2
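For reference, a minimal Python sketch of how the two numeric codes above can be read. It assumes the HTCondor manual's hold-reason table is right that code 6 means FailedToCreateProcess and that, for this code, the subcode is the Unix errno returned by the failed exec. The helper name `describe_hold` is hypothetical.

```python
import errno
import os

# Assumption (from the HTCondor hold-reason table): HoldReasonCode 6 is
# "FailedToCreateProcess", and its HoldReasonSubCode is the errno that the
# starter got back when it tried to exec the job.
FAILED_TO_CREATE_PROCESS = 6


def describe_hold(code: int, subcode: int) -> str:
    """Turn a (HoldReasonCode, HoldReasonSubCode) pair into readable text."""
    if code == FAILED_TO_CREATE_PROCESS:
        # Map the errno number to its symbolic name and message.
        name = errno.errorcode.get(subcode, "unknown errno")
        return f"FailedToCreateProcess: {name} ({os.strerror(subcode)})"
    return f"hold code {code}, subcode {subcode}"


if __name__ == "__main__":
    print(describe_hold(6, 2))
```

On Linux, `describe_hold(6, 2)` maps the subcode to ENOENT, which is the same "No such file or directory" text already present in LastHoldReason.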
Here's the weird part: when I ssh to the machine where the error occurs and look at the file, it seems to be fine!
[dbox@gpwn002 ~]$ ls -la /grid/fermiapp/lbne/condor-exec/dbox/hello_world.sh_20101215_135745_1_wrap.sh
-rwxr-xr-x 1 dbox gpcf 800 Dec 15 13:57 /grid/fermiapp/lbne/condor-exec/dbox/hello_world.sh_20101215_135745_1_wrap.sh
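One case that may be worth ruling out, offered as an assumption rather than a diagnosis: execve() also fails with ENOENT when the script itself exists but the interpreter named on its #! line does not. That includes a shebang line ending in a DOS carriage return, so the kernel looks for an interpreter literally named with a trailing "\r". The hypothetical `check_exec_target` helper below is a minimal Python sketch that reads the first line of a file and flags those cases. It would need to run on the execute node (gpwn002 here), since that is where the exec failed, and it does not catch other causes.

```python
import os
import sys


def check_exec_target(path: str) -> list[str]:
    """Return problems that could make execve() on `path` fail with ENOENT."""
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    problems = []
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return problems  # not a #! script; nothing more to check here
    # A trailing CR makes the kernel look for e.g. "/bin/sh\r".
    if first_line.rstrip(b"\n").endswith(b"\r"):
        problems.append("shebang line ends in CR (DOS line endings)")
    parts = first_line[2:].split()
    if not parts:
        problems.append("empty shebang line")
        return problems
    interpreter = parts[0].decode(errors="replace")
    if not os.path.exists(interpreter):
        problems.append(f"interpreter {interpreter} not found")
    return problems


if __name__ == "__main__":
    # Usage: python3 check_exec.py /path/to/wrapper.sh [...]
    for target in sys.argv[1:]:
        issues = check_exec_target(target)
        print(f"{target}: " + ("; ".join(issues) if issues else "no obvious problem"))
```

An empty result only means these particular causes are unlikely; it does not show that the starter can see or execute the file.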
Any suggestions on how to debug this further?
Thanks
Dennis