Mailing List Archives
[Condor-users] permission problems
Hi,
I've set up Condor 6.8.5 on a couple of Debian Sarge machines as
non-root, but I can't get it to run any job.
First, the docs
(http://www.cs.wisc.edu/condor/manual/v6.8/3_6Security.html#SECTION004612100000000000000)
say that if I mark condor_submit as set-GID, it will create group-writable
files. Alas, it doesn't: the created files are not group-writable.
If I pre-create the files and make them group-writable, it still complains:
% condor_submit submit_file
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 33.
WARNING: File [....]/out.0 is not writable by condor.
WARNING: File [....]/err.0 is not writable by condor.
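For anyone wanting to double-check the same things on their own setup, here is a minimal Python sketch that reports the owner, group, and the two mode bits discussed above (group-write on the output files, set-GID on the binary). The helper name describe_permissions and the example paths are hypothetical; it only reads what os.stat reports and assumes nothing about how Condor itself checks writability.

```python
import os
import stat

def describe_permissions(path):
    """Return ownership and the permission bits of interest for `path`."""
    st = os.stat(path)
    return {
        "uid": st.st_uid,                                   # numeric owner
        "gid": st.st_gid,                                   # numeric group
        "mode": stat.filemode(st.st_mode),                  # e.g. "-rw-rw-r--"
        "group_writable": bool(st.st_mode & stat.S_IWGRP),  # g+w bit set?
        "setgid": bool(st.st_mode & stat.S_ISGID),          # set-GID bit set?
    }
```

It could be called as describe_permissions("out.0") for an output file, or on the full path of the condor_submit binary to confirm that the set-GID bit is really in place.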
In log.0, this is repeated a few times:
022 (035.000.000) 05/31 16:09:32 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to vm2@[...] <[...]:34068>
...
023 (035.000.000) 05/31 16:09:32 Job reconnected to vm2@[...]
startd address: <[...]:34068>
starter address: <[...]:43170>
before it gives up:
007 (035.000.000) 05/31 16:09:32 Shadow exception!
Error from starter on vm2@[...]: Repeated attempts to transfer
output failed for unknown reasons
0 - Run Bytes Sent By Job
75948 - Run Bytes Received By Job
In the ShadowLog, I see lines like these:
5/31 16:09:32 (35.0) (27919): Attempting to reconnect to starter <[...]:43170>
5/31 16:09:32 (35.0) (27919): Reconnect SUCCESS: connection re-established
5/31 16:09:32 (35.0) (27919): ReliSock::get_file_with_permissions(): Failed to chmod file '[...]/out.0': Operation not permitted (errno: 1)
5/31 16:09:32 (35.0) (27919): DoDownload: SHADOW at [...] failed to receive file [...]/out.0
5/31 16:09:32 (35.0) (27919): Can no longer talk to condor_starter <[...]:43170>
Of course, the directory containing out.0 is group-writable.
The file out.0 actually gets written to, repeatedly, but the job never
leaves the queue and the log files keep growing with repeated error
messages until I call condor_rm.
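One general POSIX point that may be relevant to the chmod failure in the ShadowLog: errno 1 is EPERM, and chmod() only succeeds when the caller's effective UID owns the file (or the caller is root). Group write permission on the file or its directory lets a process write the contents but not change the mode. The Python sketch below checks that condition for a given path. The helper name can_chmod is hypothetical, and whether this is what actually trips up the shadow here is an assumption, not something the logs confirm.

```python
import os

def can_chmod(path):
    """True if this process may chmod `path` under the usual POSIX rule:
    its effective UID owns the file, or it is running as root."""
    euid = os.geteuid()
    return euid == 0 or os.stat(path).st_uid == euid
```

Running can_chmod("out.0") as the account the Condor daemons run under would show whether a pre-created out.0, owned by a different user, falls into the "writable but not chmod-able" case.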
Does anyone have any idea what I'm doing wrong?
kurt.