OK, the first problem is still there, but the second problem was caused
by a missing CONDOR_HOST line in the "master" condor_config file.  I am
surprised that wasn't added by the condor_install script.  Once I added
it (so that the default COLLECTOR_HOST = $(CONDOR_HOST) resolves
properly), my condor_status works fine.
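
For reference, the relevant lines look roughly like the sketch below.
The hostname is a placeholder I am using for illustration, not the real
name of my head node:

```
# In the "master" condor_config (etc/condor_config).
# head-node.example.org is a placeholder for the central manager's hostname.
CONDOR_HOST = head-node.example.org

# Stock default: tools such as condor_status look for the collector here.
COLLECTOR_HOST = $(CONDOR_HOST)
```

Running "condor_config_val COLLECTOR_HOST" on a node should show what
that node actually resolves the setting to.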
 
I don't think I understand very well how the condor_install command
works.  My understanding was that something like:
 
cd /se/app/shared/condor 
./condor_install --type=execute --local-dir=/osg-local/condor 
 
would set up all the necessary *local* condor directories for the given
host.  I could then repeat that on several hosts, all of which use the
same "site" install of condor (with the "master" config file in
condor/etc/condor_config).  Provided CONDOR_CONFIG pointed to the master
config, which in turn pointed to a consistent directory for the local
config file, the local settings would then override the "site" settings
in the "master" config file.  In fact, I discovered that condor_install
only worked the first time I executed it, and furthermore it did
unexpected things, like using the --type setting to update the "master"
config file rather than the local file.
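
To make the layout I had in mind concrete, here is a rough sketch.  The
local file name (condor_config.local) and the DAEMON_LIST value are my
assumptions about what an execute-only node would want, not something
condor_install actually produced for me:

```
# "Master" (site-wide) config, shared by every host:
#   /se/app/shared/condor/etc/condor_config
LOCAL_DIR         = /osg-local/condor
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local

# Per-host local config (assumed name):
#   /osg-local/condor/condor_config.local
# Read after the master config, so a setting repeated here replaces the
# site-wide value.  An execute-only node runs just the master and startd.
DAEMON_LIST = MASTER, STARTD
```

With that arrangement the shared file would never need per-host edits,
which is why the --type change landing in the "master" file surprised
me.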
 
Cheers, 
 
Ian 
 
 
Ian Stokes-Rees wrote:
On an execute node, I can run
condor_master no problem from the command line, but my init script
condor.boot generates an error.  Below is a trace. 
   
# shows that CONDOR_CONFIG is set and points to a file which exists and
# is not empty
[root@mackenzie condor]# ls -Fla $CONDOR_CONFIG 
-rw-r--r-- 1 root root 93644 Mar 20  2008 /se/app/shared/condor-7.0.1/etc/condor_config
   
# shows failed startup script 
[root@mackenzie condor]# service condor start 
Starting up Condor 
   
Neither the environment variable CONDOR_CONFIG, 
/etc/condor/, nor ~condor/ contain a condor_config source. 
Either set CONDOR_CONFIG to point to a valid config source, 
or put a "condor_config" file in /etc/condor or ~condor/ 
Exiting. 
   
# shows that condor_master from the command line works 
[root@mackenzie sbin]# ./condor_master 
   
[root@nahanni sbin]# ps -ef | grep condor 
condor    5990     1  0 17:38 ?        00:00:00 ./condor_master 
condor    5991  5990 82 17:38 ?        00:00:02 condor_startd -f 
   
On the "head" node, when I run condor_status I get an error that the
collector cannot be found, even though it is running.
   
[root@abitibi sbin]# condor_status  
Error:  Could not fetch ads --- can't find collector 
   
[root@abitibi sbin]# ps -ef | grep condor 
condor   28500     1  1 17:45 ?        00:00:00 ./condor_master 
condor   28501 28500  0 17:45 ?        00:00:00 condor_collector -f 
condor   28503 28500  1 17:45 ?        00:00:00 condor_negotiator -f 
condor   28504 28500  1 17:45 ?        00:00:00 condor_schedd -f 
condor   28505 28500 86 17:45 ?        00:00:01 condor_startd -f 
root     28506 28504  1 17:45 ?        00:00:00 condor_procd -A /tmp/condor-lock.abitibi0.0513363986547155/procd_pipe.SCHEDD -S 60 -C 9422
   
Any hints as to what might be going wrong would be greatly
appreciated.  It seems like very strange behavior. 
  -- 
Ian Stokes-Rees                            W: http://sbgrid.org
ijstokes@xxxxxxxxxxxxxxxxxxx               T: +1 617 418-4168
SBGrid, Harvard Medical School             F: +1 617 432-5600
   
  
 
   
 
 