OK, the first problem is still there, but the second problem was
caused by a missing CONDOR_HOST line in the "master" condor_config
file. I am surprised that wasn't added by the condor_install script.
Once I added it (so that the default COLLECTOR_HOST = $(CONDOR_HOST)
resolves properly), condor_status works fine.
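A minimal sketch of checking a config file for that line follows. The
helper name has_condor_host is hypothetical (it is not a Condor tool),
and the pattern assumes the setting appears as an uncommented
NAME = value line; the fallback path is the one from the trace below.

```shell
# Hypothetical helper (not part of Condor): succeeds if the given
# config file contains an uncommented CONDOR_HOST assignment.
has_condor_host() {
    grep -Eqs '^[[:space:]]*CONDOR_HOST[[:space:]]*=' "$1"
}

# Use CONDOR_CONFIG if set, otherwise fall back to the path from the trace.
cfg=${CONDOR_CONFIG:-/se/app/shared/condor-7.0.1/etc/condor_config}
if has_condor_host "$cfg"; then
    echo "CONDOR_HOST is set in $cfg"
else
    echo "CONDOR_HOST is missing from (or unreadable in) $cfg"
fi
```

Where the Condor tools are on the PATH, condor_config_val CONDOR_HOST
and condor_config_val COLLECTOR_HOST report the values the daemons
actually resolve, which is the more direct check.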
I don't think I understand very well how the condor_install command
works. I expected that something like:
cd /se/app/shared/condor
./condor_install --type=execute --local-dir=/osg-local/condor
would set up all the necessary *local* condor directories for the
given host. I could then repeat that on several hosts, all of which
use the same "site" install of condor (with the "master" config file
in condor/etc/condor_config). Provided CONDOR_CONFIG pointed to the
master config, and the master config in turn pointed to a consistent
directory for the local config file, the local settings would
override the "site" settings in the "master" config file. In fact, I
discovered that condor_install only worked the first time I ran it,
and furthermore it did unexpected things, such as using the --type
setting to update the "master" config file rather than the local file.
Cheers,
Ian
Ian Stokes-Rees wrote:
On an execute node, I can run condor_master from the command line
with no problem, but my init script condor.boot generates an error.
Below is a trace.
# shows that CONDOR_CONFIG is set and points to a file which exists and is not empty
[root@mackenzie condor]# ls -Fla $CONDOR_CONFIG
-rw-r--r-- 1 root root 93644 Mar 20 2008 /se/app/shared/condor-7.0.1/etc/condor_config
# shows failed startup script
[root@mackenzie condor]# service condor start
Starting up Condor
Neither the environment variable CONDOR_CONFIG,
/etc/condor/, nor ~condor/ contain a condor_config source.
Either set CONDOR_CONFIG to point to a valid config source,
or put a "condor_config" file in /etc/condor or ~condor/
Exiting.
# shows that condor_master from the command line works
[root@mackenzie sbin]# ./condor_master
[root@nahanni sbin]# ps -ef | grep condor
condor 5990 1 0 17:38 ? 00:00:00 ./condor_master
condor 5991 5990 82 17:38 ? 00:00:02 condor_startd -f
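One thing worth ruling out, offered as an assumption rather than a
confirmed diagnosis: a script started through "service" may not
inherit the variables exported in the login shell, in which case
CONDOR_CONFIG would be unset inside condor.boot even though it is set
at the prompt. The sketch below only imitates that effect with env -i;
it does not run Condor or the init script.

```shell
# Path taken from the ls output above.
CONDOR_CONFIG=/se/app/shared/condor-7.0.1/etc/condor_config
export CONDOR_CONFIG

# A child of the current shell inherits the exported variable.
inherited=$(sh -c 'echo "${CONDOR_CONFIG:-unset}"')

# A child started with a scrubbed environment does not.
# env -i stands in for whatever the service wrapper does (assumption).
scrubbed=$(env -i PATH="$PATH" sh -c 'echo "${CONDOR_CONFIG:-unset}"')

echo "inherited: $inherited"
echo "scrubbed:  $scrubbed"
```

If that turns out to be the cause, exporting CONDOR_CONFIG inside
condor.boot itself, or placing a condor_config file in /etc/condor as
the error message suggests, would avoid depending on the caller's
environment.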
On the "head" node,
when I run condor_status I get an error that the collector cannot be
found, even though it is running.
[root@abitibi sbin]# condor_status
Error: Could not fetch ads --- can't find collector
[root@abitibi sbin]# ps -ef | grep condor
condor 28500 1 1 17:45 ? 00:00:00 ./condor_master
condor 28501 28500 0 17:45 ? 00:00:00 condor_collector -f
condor 28503 28500 1 17:45 ? 00:00:00 condor_negotiator -f
condor 28504 28500 1 17:45 ? 00:00:00 condor_schedd -f
condor 28505 28500 86 17:45 ? 00:00:01 condor_startd -f
root 28506 28504 1 17:45 ? 00:00:00 condor_procd -A /tmp/condor-lock.abitibi0.0513363986547155/procd_pipe.SCHEDD -S 60 -C 9422
Any hints as to what might be going wrong would be greatly
appreciated. It seems like very strange behavior.
--
Ian Stokes-Rees W: http://sbgrid.org
ijstokes@xxxxxxxxxxxxxxxxxxx T: +1 617 418-4168
SBGrid, Harvard Medical School F: +1 617 432-5600