Dear condor-users,
I installed condor-6.6.1 on our computers: one with glibc 2.3 (the master) and the others with glibc 2.2. Each machine has a condor user whose home directory is on the local disk at /var/home/condor. The release directory (the glibc 2.2 version) is exported over NFS to the clients and mounted at /var/home/condor/release.
Now here's the problem: jobs that are scheduled to run on the clients are sent back to the master, producing this entry in the job log:
....
007 (013.000.000) 03/30 20:07:47 Shadow exception!
Can no longer talk to condor_starter on execute machine (192.168.185.14)
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
...
On the execute machine, the StarterLog says:
3/30 20:07:47 couldn't create dir /var/home/condor/execute/dir_2064: Permission denied
(See below for the surrounding StarterLog context.)
/var/home/condor/execute is rwxrwxrwt (mode 1777), and creating a directory there by hand works. The condor daemons are started as root, as recommended.
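The manual check can be scripted so it is easy to repeat under different user IDs. Below is a minimal sketch in Python, assuming Python 3 is available on the execute node; the helper names (describe, component_modes, can_create_subdir) are hypothetical and not part of Condor. It prints the mode of every directory from / down to the execute directory, because mkdir() fails with EACCES ("Permission denied") when the caller lacks write permission on the parent or search permission on any directory along the path. It then tries to create and remove a probe subdirectory. Running it once as root and once as the condor user (e.g. via su) would show whether the result depends on which UID performs the mkdir.

```python
import os
import stat
import uuid


def describe(path):
    """Return an ls-style mode string for `path`, or the reason stat() failed."""
    try:
        return stat.filemode(os.stat(path).st_mode)
    except OSError as exc:
        return "stat failed: " + exc.strerror


def component_modes(path):
    """List (directory, mode) for every component from / down to `path`.

    mkdir() reports EACCES if the caller lacks write permission on the parent
    directory or search (x) permission on any directory along the path, so
    every level is reported, not just the last one.
    """
    path = os.path.abspath(path)
    chain = [path]
    while os.path.dirname(chain[-1]) != chain[-1]:  # stop once we reach "/"
        chain.append(os.path.dirname(chain[-1]))
    return [(p, describe(p)) for p in reversed(chain)]


def can_create_subdir(path):
    """Try to create and then remove a uniquely named probe directory in `path`."""
    probe = os.path.join(path, "probe_" + uuid.uuid4().hex)
    try:
        os.mkdir(probe)
    except OSError as exc:
        return False, exc.strerror
    os.rmdir(probe)
    return True, "created and removed " + probe


if __name__ == "__main__":
    execute_dir = "/var/home/condor/execute"  # execute directory named in the StarterLog
    for p, mode in component_modes(execute_dir):
        print(mode.ljust(12), p)
    print("euid", os.geteuid(), "mkdir:", can_create_subdir(execute_dir))
```

A "stat failed: Permission denied" line partway down the chain would point at the directory that blocks traversal for that user.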
The odd thing is that it _did_ work when I used one of the glibc 2.2 boxes as the master and had the release directory on a local disk. But I don't see why having the release directory on an NFS mount should prevent directories from being created on the local disk.
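You can confirm from the kernel's mount table whether each path really lives on the local disk or on the NFS mount. Here is a sketch that assumes a Linux node, where /proc/mounts lists one mount per line as "device mountpoint fstype options dump pass". fs_type_for is a hypothetical helper that returns the longest mount point containing the given path.

```python
import os


def fs_type_for(path, mounts_text):
    """Return (mount point, fs type) of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts. Escaped characters such as
    \\040 (space) in mount points are not decoded in this sketch.
    """
    path = os.path.abspath(path)
    best = ("", None)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Match the mount point itself or anything below it, on a path boundary,
        # so /var/home/condor/releasex is not mistaken for .../release.
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win, since a
            # later mount on the same directory hides the earlier one.
            if len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = f.read()
    for p in ("/var/home/condor/execute", "/var/home/condor/release"):
        print(p, "->", fs_type_for(p, mounts))
```

With the layout described above, the execute directory should resolve to a local filesystem type and the release directory to nfs.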
So, has anybody had a similar problem? Or can you give me a hint about what might be going wrong here?
Thanks in advance,
Michael
StarterLog:
[...]
3/30 20:07:47 ******************************************************
3/30 20:07:47 ** condor_starter (CONDOR_STARTER) STARTING UP
3/30 20:07:47 ** $CondorVersion: 6.6.1 Feb 5 2004 $
3/30 20:07:47 ** $CondorPlatform: I386-LINUX-RH72 $
3/30 20:07:47 ** PID = 2064
3/30 20:07:47 ******************************************************
3/30 20:07:47 Using config file: /var/home/condor/condor_config
3/30 20:07:47 Using local config files: /var/home/condor/condor_config.local
3/30 20:07:47 DaemonCore: Command Socket at <192.168.185.14:32941>
3/30 20:07:47 Done setting resource limits
3/30 20:07:47 Starter communicating with condor_shadow <192.168.185.96:36570>
3/30 20:07:47 Submitting machine is "bommel.bcc.local"
3/30 20:07:47 couldn't create dir /var/home/condor/execute/dir_2064: Permission denied
3/30 20:07:47 Failed to initialize JobInfoCommunicator, aborting
3/30 20:07:47 Unable to start job.
3/30 20:07:47 **** condor_starter (condor_STARTER) EXITING WITH STATUS 1
[...]