Re: [Condor-users] starter process exits
- Date: Wed, 19 Jan 2005 12:58:11 -0800
- From: Tim Robertson <timr@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] starter process exits
> The processes are started as root but run as the user condor. The
> execute nodes get their /home/condor served up by NFS and the dirs
> are auto-mounted. My global condor_config is in
> /opt/condor/etc/condor_config and there is a symlink from
> /home/condor/condor_config to this file.
It sounds like you may have a problem in your config setup -- you
realize that there are two config files, right?
The main config file is in /opt/condor/etc, and the other in
/opt/condor/local.X, where X is the name of your machine. You're
supposed to set the CONDOR_CONFIG environment variable to point to the
location of the main config file (the one in /opt/condor/etc), which,
in turn, has a macro pointing to the local config file. By default,
you need both for condor to start properly.
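As a rough illustration of that two-file layout, here is a minimal Python sketch that a hypothetical startup wrapper could use to see which main config would be picked up and where it points for the local one. The helper names find_main_config and local_config_macro are made up. The fallback paths are copied from the error message quoted in this thread, not from Condor's documentation. The macro name LOCAL_CONFIG_FILE is an assumption about how the main config refers to the local file.

```python
import os
from typing import Mapping, Optional

# Fallback locations copied from the error message in this thread.
# Assumption: this is NOT an authoritative list of Condor's search order.
FALLBACK_CONFIGS = [
    "/etc/condor/condor_config",
    "/local/condor/condor_config",
    os.path.expanduser("~condor/condor_config"),  # unchanged if user "condor" is unknown
]


def find_main_config(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the main config path that would most likely be used, or None."""
    env = os.environ if env is None else env
    path = env.get("CONDOR_CONFIG")
    if path:
        # In this sketch an explicit CONDOR_CONFIG is returned as-is,
        # without checking that the file exists.
        return path
    for candidate in FALLBACK_CONFIGS:
        if os.path.isfile(candidate):
            return candidate
    return None


def local_config_macro(main_config_text: str) -> Optional[str]:
    """Return the raw (unexpanded) value of LOCAL_CONFIG_FILE, if defined."""
    for line in main_config_text.splitlines():
        stripped = line.strip()
        # Skip comments and lines that are not "NAME = value" assignments.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name, _, value = stripped.partition("=")
        if name.strip() == "LOCAL_CONFIG_FILE":
            return value.strip()
    return None
```

The macro value comes back unexpanded, so something like `$(RELEASE_DIR)/local.$(HOSTNAME)/...` would still need Condor's own macro expansion before it names a real file.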
> I'm seeing some strange behavior both when I start up condor_master
> and when I submit jobs to the pool. In the case of condor_master, if
> I start this process without first doing an 'ls /home/condor', it
> dies with a complaint about not having CONDOR_CONFIG set, not being
> able to find /etc/condor/condor_config, or not being able to find
> /local/condor/condor_config. The complaint also mentions not finding
> ~/condor. When I trace condor_master with strace, however, it doesn't
> look like an open() attempt is ever made on ~/condor_config. Even
> though df shows /home/condor as already mounted, once I run
> 'ls /home/condor', condor_master succeeds in checking for and
> finding this directory. For some reason, condor does not seem to
> even attempt to open /home/condor/condor_config.
I don't know why condor would start properly once you've listed the
/home/condor directory. If your CONDOR_CONFIG variable is set
correctly, and your local config file is in the right place, condor
should start.
If you have everything correctly set up, perhaps it's a problem with
your NFS automounting configuration -- I've seen similar problems
before, but never on my own systems.
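Since listing the directory seems to make the difference, one workaround to try (a sketch, not a fix for the automounter itself) is a small pre-flight check run before condor_master. It lists /home/condor the way 'ls' does and then confirms that condor_config is visible. The function names are hypothetical. Whether a plain directory listing triggers the mount depends on how your automounter is configured.

```python
import os


def ensure_automounted(directory: str) -> bool:
    """List a directory the way `ls` does, then report whether it is usable.

    Reading the directory mirrors the 'ls /home/condor' workaround; whether
    that triggers a given automounter is an assumption about its setup.
    """
    try:
        os.listdir(directory)
    except OSError:
        # Missing directory, permission problem, or a mount that failed.
        return False
    return os.path.isdir(directory)


def config_visible(directory: str, name: str = "condor_config") -> bool:
    """Return True if <directory>/<name> exists after listing the directory."""
    if not ensure_automounted(directory):
        return False
    # os.path.exists follows symlinks, so a dangling link reports False.
    return os.path.exists(os.path.join(directory, name))
```

For example, a startup wrapper could call `config_visible("/home/condor")` and only start condor_master when it returns True. Because the check follows symlinks, a /home/condor/condor_config link whose target under /opt/condor/etc is missing also reports False.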
Best,
Tim