[Condor-users] Cannot execute Job on remote host, permission denied to write condor_exec.exe
- Date: Wed, 18 Aug 2010 14:56:21 -0400
- From: Lee Mitchell <mr.lee.mitchell@xxxxxxxxx>
- Subject: [Condor-users] Cannot execute Job on remote host, permission denied to write condor_exec.exe
Hello All, Does anyone have a suggestion for how to get past this issue?
I submit from my negotiator host, and jobs can run on the negotiator
host itself. But if I force a job to run on some other machine (not the
negotiator) through the requirements in the job submission, e.g.
( machine != "uskyarpds0310.air.ups.com" )
then the job runs for 2 seconds and goes into the Hold state.
condor_q -better says:
-- Submitter: uskyarpds0310.air.ups.com : <10.224.217.231:8452> :
uskyarpds0310.air.ups.com
---
2891.000: Request is held.
Hold reason: Error from slot1@xxxxxxxxxxxxxxxxxxxxxxxxx: STARTER at
10.224.176.128 failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
-------------
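For reference, errno 13 in the hold reason above is EACCES. The following is a minimal sketch, not part of Condor's tooling, that pulls the file path and errno out of such a hold reason. The function name parse_hold_reason is hypothetical, and the regular expression assumes the wrapped lines have been joined with single spaces.

```python
import errno
import re

# Hold reason copied from the condor_q output above, with the wrapped
# lines joined by single spaces (an assumption about the unwrapped form).
HOLD_REASON = (
    "Error from slot1@xxxxxxxxxxxxxxxxxxxxxxxxx: STARTER at 10.224.176.128 "
    "failed to write to file "
    "/opt/condor/app/installation/local.compute-node/execute/dir_10143/"
    "condor_exec.exe: (errno 13) Permission denied"
)


def parse_hold_reason(reason):
    """Extract the target path and errno from a 'failed to write' hold reason.

    Hypothetical helper for illustration; returns None if the text does
    not match the expected shape.
    """
    match = re.search(r"failed to write to file (\S+): \(errno (\d+)\)", reason)
    if match is None:
        return None
    code = int(match.group(2))
    return {
        "path": match.group(1),
        "errno": code,
        # Symbolic name for the numeric code, e.g. EACCES for 13.
        "name": errno.errorcode.get(code, "unknown"),
    }


if __name__ == "__main__":
    print(parse_hold_reason(HOLD_REASON))
```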
In the logs I see:
== Shadow Log on submit machine ==
08/18 17:47:26 Initializing a VANILLA shadow for job 2891.0
08/18 17:47:27 (2891.0) (18621): Request to run on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx <10.224.176.128:50433> was ACCEPTED
08/18 17:47:28 (2891.0) (18621): DoUpload: (Condor error code 12,
subcode 13) SHADOW at 10.224.217.231 failed to send file(s) to
<10.224.176.128:53124>; STARTER at 10.224.176.128 failed to write to
file /opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
08/18 17:47:28 (2891.0) (18621): Job 2891.0 going into Hold state
(code 12,13): Error from slot1@xxxxxxxxxxxxxxxxxxxxxxxxx: STARTER at
10.224.176.128 failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
08/18 17:47:28 (2891.0) (18621): **** condor_shadow (condor_SHADOW)
pid 18621 EXITING WITH STATUS 112
== StarterLog.slot1 on the remote execute node ==
08/18 17:47:27 get_file(): Failed to open file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe,
errno = 13: Permission denied.
08/18 17:47:28 get_file(): consumed 18446296 bytes of file transmission
08/18 17:47:28 DoDownload: consuming rest of transfer and failing
after encountering the following error: STARTER at 10.224.176.128
failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
08/18 17:47:28 WARNING: File
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe
can not be accessed by Quill file transfer tracking.
08/18 17:47:28 File transfer failed (status=0).
08/18 17:47:28 ERROR "Failed to transfer files" at line 1882 in file
jic_shadow.cpp
---------
Actually, the failure to write into the execute subdirectory happens
for every file transferred, not just the executable. I see the same
block of messages in StarterLog.slot1 for each file listed in my
submit file's transfer_input_files value.
On the remote execute machine, the permissions for the directory
/opt/condor/app/installation/local.compute-node/execute/dir_10143/
were: (from ls -l )
drwxr-xr-x 2 nobody nobody 4096 Aug 18 17:47 dir_10143
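Since "Permission denied" on open can also come from a parent directory that cannot be traversed, one way to look further is to list the mode and owner of every component of the path, not only the last one. This is a minimal sketch under that assumption, not a diagnosis. The function names describe_path_components and missing_other_execute are hypothetical, and the check on the "other" execute bit assumes the writing process is neither the owner nor in the group of the parent directories.

```python
import os
import stat


def describe_path_components(path):
    """Return (path, mode_string, uid, gid) for each prefix of path, root first."""
    current = os.path.abspath(path)
    prefixes = []
    while True:
        prefixes.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent

    rows = []
    for prefix in reversed(prefixes):
        try:
            st = os.stat(prefix)
        except OSError as exc:
            # Record the failure instead of stopping, so the caller sees
            # where the walk stopped being readable.
            rows.append((prefix, "<%s>" % exc.strerror, None, None))
            continue
        rows.append((prefix, stat.filemode(st.st_mode), st.st_uid, st.st_gid))
    return rows


def missing_other_execute(path):
    """List directories on the path that lack the 'other' execute (search) bit."""
    flagged = []
    for prefix, mode_string, _uid, _gid in describe_path_components(path):
        # Position 9 of the mode string is the 'other' execute bit.
        if mode_string.startswith("d") and mode_string[9] == "-":
            flagged.append(prefix)
    return flagged


if __name__ == "__main__":
    target = "/opt/condor/app/installation/local.compute-node/execute"
    for row in describe_path_components(target):
        print(row)
    print("no o+x:", missing_other_execute(target))
```

Run on the execute node, any directory reported by missing_other_execute would be worth a closer look, though whether it matters depends on which user the starter is writing as.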
To start condor, I call condor_master as root, and condor has a umask of 0077.
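To make the umask figure concrete: a umask of 0077 clears all group and other bits from whatever mode a process requests when it creates a file or directory. The sketch below only illustrates that arithmetic and checks it against a real mkdir in a temporary directory. It does not show what Condor itself requests, and the function names are hypothetical.

```python
import os
import stat
import tempfile


def mode_under_umask(requested_mode, umask):
    """Permission bits that result when requested_mode is filtered by umask."""
    return requested_mode & ~umask & 0o777


def mkdir_mode_with_umask(umask=0o077):
    """Create a directory requesting 0777 under the given umask; return its mode."""
    previous = os.umask(umask)
    try:
        base = tempfile.mkdtemp()
        target = os.path.join(base, "example")
        os.mkdir(target, 0o777)
        return stat.S_IMODE(os.stat(target).st_mode)
    finally:
        # Always restore the caller's umask.
        os.umask(previous)


if __name__ == "__main__":
    print(oct(mode_under_umask(0o777, 0o077)))  # directories: rwx------
    print(oct(mode_under_umask(0o666, 0o077)))  # regular files: rw-------
    print(oct(mkdir_mode_with_umask(0o077)))
```

A program can still set a different mode explicitly after creation (with chmod), so a directory's final permissions are not determined by the umask alone.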
The filesystem is mounted as follows (output of the mount command):
/dev/mapper/vg00-lv_condor_app on /opt/condor/app type ext3 (rw)
It is a local filesystem, not NFS.
All machines are configured the same: x86_64, running Condor 7.4.2 on RHEL 5.5.
Any requests for further information or suggestions on how to track
down the problem would be greatly appreciated.
Thank You,
Lee