Hi,

Before trying to upgrade from Condor 6.6.10 to the Condor 6.7 series, I wanted
to test it with BirdBath. The pool has two machines: the central manager (which
is also the job submitting host) and the execution node. I followed the
configuration for BirdBath, got the Schedd and Collector WSDL files and put
them in the web directory, then configured the Schedd to use port 9600. The
jobs run forever and keep getting a shadow exception. The job
is a simple shell script. Here are the log messages:

Job Submitting Host
***********************

ShadowLog
*************
6/6 16:54:08 (10.0) (27048): JobLeaseDuration remaining: 930
6/6 16:54:08 (10.0) (27048): Scheduling another attempt to reconnect in 128 seconds
6/6 16:56:16 (10.0) (27048): Attempting to reconnect to starter <x.x.x.x:9607>
6/6 16:56:16 (10.0) (27048): getpeername failed so connect must have failed
6/6 16:56:46 (10.0) (27048): Connect failed for 30 seconds; returning FALSE
6/6 16:56:46 (10.0) (27048): Attempt to reconnect failed: Failed to connect to starter <x.x.x.x:9607>
6/6 16:56:46 (10.0) (27048): JobLeaseDuration remaining: 772
6/6 16:56:46 (10.0) (27048): Scheduling another attempt to reconnect in 256 seconds
6/6 17:01:02 (10.0) (27048): Attempting to reconnect to starter <x.x.x.x:9607>
6/6 17:01:02 (10.0) (27048): getpeername failed so connect must have failed
6/6 17:01:32 (10.0) (27048): Connect failed for 30 seconds; returning FALSE
6/6 17:01:32 (10.0) (27048): Attempt to reconnect failed: Failed to connect to starter <x.x.x.x:9607>
6/6 17:01:32 (10.0) (27048): JobLeaseDuration remaining: 486
6/6 17:01:32 (10.0) (27048): Scheduling another attempt to reconnect in 300 seconds

Execution Node
******************

StarterLog
************
6/6 16:49:38 ** Log last touched 6/6 16:19:38
6/6 16:49:38 ******************************************************
6/6 16:49:38 Using config file: /home/condor/condor_config
6/6 16:49:38 Using local config files: /home/condor/condor_config.local
6/6 16:49:38 DaemonCore: Command Socket at <x.x.x.x:9607>
6/6 16:49:38 Done setting resource limits
6/6 16:49:38 Communicating with shadow <x.x.x.x:9619>
6/6 16:49:38 Submitting machine is "machine.name.edu"
6/6 16:49:38 couldn't create dir /home/condor/execute/dir_25145: Permission denied
6/6 16:49:38 Failed to initialize JobInfoCommunicator, aborting
6/6 16:49:38 Unable to start job.
6/6 16:49:38 **** condor_starter (condor_STARTER) EXITING WITH STATUS 1

Both the central manager and the execution node services are started
as root, so I don’t know why it says permission denied. Could you please let me
know what is going on and how to make it run?

Thanks,
Senthil
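
Since the StarterLog reports "Permission denied" while creating a directory under /home/condor/execute, one thing that could be inspected on the execution node is the ownership and mode of that directory. The following is only a minimal sketch of such a check: the helper name describe_dir is made up for illustration, and it reports what the user running the script can do, which is not necessarily the user the starter runs the job as. It does not say what ownership or mode Condor expects.

```python
import os
import pwd
import stat


def describe_dir(path):
    """Report owner, permission bits, and writability of a directory.

    Hypothetical helper for illustration; "writable" reflects the user
    running this script, not necessarily the user condor_starter uses.
    """
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry on this machine; fall back to the number.
        owner = str(st.st_uid)
    return {
        "path": path,
        "owner": owner,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "is_dir": stat.S_ISDIR(st.st_mode),
        # Creating an entry in a directory needs both write and search permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Path taken from the StarterLog above.
    execute_dir = "/home/condor/execute"
    if os.path.isdir(execute_dir):
        print(describe_dir(execute_dir))
    else:
        print(execute_dir + " does not exist on this machine")
```

Running it as root and again as the account the jobs are expected to run under would show whether the two see the directory differently.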