On Wed, Aug 18, 2010 at 1:56 PM, Lee Mitchell
<mr.lee.mitchell@xxxxxxxxx> wrote:
Hello All, Does anyone have a suggestion for how to get past this issue?
When I submit from my negotiator host, jobs run fine as long as they
run on the negotiator host itself. But if I force a job to run on some
other machine (not the negotiator) via the job submission requirements, e.g.
( machine != "uskyarpds0310.air.ups.com" )
then the job runs for 2 seconds and goes into the Hold state.
condor_q -better says:
-- Submitter: uskyarpds0310.air.ups.com : <10.224.217.231:8452> :
uskyarpds0310.air.ups.com
---
2891.000: Request is held.
Hold reason: Error from slot1@xxxxxxxxxxxxxxxxxxxxxxxxx: STARTER at
10.224.176.128 failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
-------------
In the logs I see:
== Shadow Log on submit machine ==
08/18 17:47:26 Initializing a VANILLA shadow for job 2891.0
08/18 17:47:27 (2891.0) (18621): Request to run on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx <10.224.176.128:50433> was ACCEPTED
08/18 17:47:28 (2891.0) (18621): DoUpload: (Condor error code 12,
subcode 13) SHADOW at 10.224.217.231 failed to send file(s) to
<10.224.176.128:53124>; STARTER at 10.224.176.128 failed to write to
file /opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
08/18 17:47:28 (2891.0) (18621): Job 2891.0 going into Hold state
(code 12,13): Error from slot1@xxxxxxxxxxxxxxxxxxxxxxxxx: STARTER at
10.224.176.128 failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
08/18 17:47:28 (2891.0) (18621): **** condor_shadow (condor_SHADOW)
pid 18621 EXITING WITH STATUS 112
== StarterLog.slot1 on the remote execute node ==
08/18 17:47:27 get_file(): Failed to open file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe,
errno = 13: Permission denied.
08/18 17:47:28 get_file(): consumed 18446296 bytes of file transmission
08/18 17:47:28 DoDownload: consuming rest of transfer and failing
after encountering the following error: STARTER at 10.224.176.128
failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
08/18 17:47:28 WARNING: File
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe
can not be accessed by Quill file transfer tracking.
08/18 17:47:28 File transfer failed (status=0).
08/18 17:47:28 ERROR "Failed to transfer files" at line 1882 in file
jic_shadow.cpp
---------
Actually, the failure to write the file to the execute subdirectory
happens for all files transferred, not just the exe. I see the same
block of messages in the StarterLog.slot1 for every file that is
specified in my submit file's transfer_input_files value.
On the remote execute machine, the permissions for the directory
/opt/condor/app/installation/local.compute-node/execute/dir_10143/
were: (from ls -l )
drwxr-xr-x 2 nobody nobody 4096 Aug 18 17:47 dir_10143
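As a rough way to reason about the ls -l line above, the classic POSIX permission check can be sketched in Python. This is only an illustration under stated assumptions: the function name can_create_in_dir and the numeric uids and gids are hypothetical (they do not come from the logs), and the check ignores ACLs and SELinux. It suggests that with mode drwxr-xr-x and owner nobody, only nobody itself or root would be able to create files in dir_10143.

```python
import stat


def can_create_in_dir(mode, owner_uid, owner_gid, uid, gids):
    """Classic POSIX check: may a process with (uid, gids) create a file
    in a directory with these mode bits and ownership?
    Ignores ACLs, SELinux and capabilities other than uid 0."""
    if uid == 0:
        return True
    # Creating an entry needs both write and search (execute) permission,
    # taken from exactly one permission class: owner, group, or other.
    if uid == owner_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (mode & need) == need


DIR_MODE = 0o755   # drwxr-xr-x, from the ls -l output
NOBODY_UID = 99    # assumption: verify with `id nobody` on the execute node
NOBODY_GID = 99    # assumption
OTHER_UID = 500    # hypothetical non-root, non-nobody account
OTHER_GID = 500    # hypothetical

print(can_create_in_dir(DIR_MODE, NOBODY_UID, NOBODY_GID, NOBODY_UID, [NOBODY_GID]))  # True
print(can_create_in_dir(DIR_MODE, NOBODY_UID, NOBODY_GID, OTHER_UID, [OTHER_GID]))    # False
print(can_create_in_dir(DIR_MODE, NOBODY_UID, NOBODY_GID, 0, [0]))                    # True
```

If the starter were writing as some account other than nobody or root, errno 13 would be the expected result under this model; which account it actually uses at that point is not visible in the logs shown here.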
To start condor, I call condor_master as root, and condor has a umask of 0077.
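For reference, the effect of a umask of 0077 on newly created files and directories can be worked out with a small Python sketch. The helper name apply_umask is hypothetical and only illustrates the arithmetic (requested mode AND NOT umask); it assumes no default ACLs on the parent directory.

```python
def apply_umask(requested_mode, umask):
    """Mode a new file or directory ends up with when created under umask."""
    return requested_mode & ~umask & 0o7777


UMASK = 0o077

print(oct(apply_umask(0o777, UMASK)))  # 0o700, typical mkdir request
print(oct(apply_umask(0o666, UMASK)))  # 0o600, typical file-create request

# The observed directory mode 0755 keeps group/other bits that umask 0077
# would have cleared at creation time.
observed = 0o755
print(oct(observed & UMASK))  # 0o55
```

Since 0755 retains bits that a 0077 umask would clear, the directory mode was presumably set explicitly after creation or the directory was created under a different umask; that is an inference from the arithmetic, not something the logs confirm.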
The filesystem has the following properties (output of the mount command):
/dev/mapper/vg00-lv_condor_app on /opt/condor/app type ext3 (rw)
It is a local filesystem, not NFS.
All machines are identical: x86_64, running condor 7.4.2 on RHEL 5.5.
Any requests for further information or suggestions on how to track
down the problem would be greatly appreciated.
Thank You,
Lee