Hi Steve,

Thanks for replying. I tried that, but it didn't quite work. Whether or not I delete the file, running CONDOR_MASTER starts Condor fine, but it still doesn't start automatically when I reboot. Is there anything else I'm missing?

Cheers,
Santanu

Steven Timm wrote:

Remove that lock file in /tmp that is mentioned in the error message below, and condor will start.

Steve

------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
timm@xxxxxxxx http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

On Sat, 10 Mar 2007, Santanu Das wrote:

Hi,

I'm still having the same problem: condor_master just doesn't start automatically after boot. Does anybody know anything about it? Thanks in advance for your help.

Cheers,
Santanu

Santanu Das wrote:

Hi all,

We have a ~150 CPU Condor cluster; most of the nodes are dual-core Xeons and a few are single-core Xeons. I recently upgraded to condor-6.8.4 and since then I have seen a problem, mostly on the dual-core nodes. I start Condor from "rc.local", and the problem is that Condor no longer starts automatically on boot, despite "condor_master" being in the rc.local file. If I run condor_master by hand from the console, Condor starts and everything works fine after that.

For some reason, I run Condor here as a different user (*NOT* as the default "condor" user), but I don't think that's the problem. CONDOR_IDS is correct in the local config file. There are no significant configuration differences among the nodes; all are almost identically configured (apart from the dual-core versus single-core difference).

I just see these in the MasterLog:

3/8 17:56:03 ******************************************************
3/8 17:56:03 ** condor_master (CONDOR_MASTER) STARTING UP
3/8 17:56:03 ** /opt/condor-6.8.4/sbin/condor_master
3/8 17:56:03 ** $CondorVersion: 6.8.4 Feb 1 2007 $
3/8 17:56:03 ** $CondorPlatform: I386-LINUX_RH9 $
3/8 17:56:03 ** PID = 3216
3/8 17:56:03 ** Log last touched 3/8 17:56:02
3/8 17:56:03 ******************************************************
3/8 17:56:03 Using config source: /opt/condor/etc/condor_config
3/8 17:56:03 Using local config sources:
3/8 17:56:03 /home/condorr/condor_config.local
3/8 17:56:03 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
3/8 17:56:03 ERROR "Can't get lock on "/tmp/condor-lock.farm0420.21308906360446/InstanceLock"" at line 978 in file master.C
3/8 18:08:57 Got SIGTERM. Performing graceful shutdown.
3/8 18:08:57 SafeMsg: sending small msg failed. errno: 22
3/8 18:08:57 Send_Signal: ERROR sending signal 15 to pid 3181
3/8 18:08:57 ERROR: failed to send SIGTERM to pid 3181
3/8 18:08:57 The STARTD (pid 3181) exited with status 0
3/8 18:08:57 All daemons are gone. Exiting.
3/8 18:08:57 **** condor_master (condor_MASTER) EXITING WITH STATUS 0
3/8 18:12:11 passwd_cache::cache_uid(): getpwnam("condor") failed: Success
3/8 18:12:11 passwd_cache::cache_uid(): getpwnam("condor") failed: Success

Any idea what might be the problem, or what I am missing?

Cheers,
Santanu

HEP, Cavendish Laboratory
Cambridge

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
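
A note on the "errno 11 (Resource temporarily unavailable)" line in the MasterLog: on Linux, errno 11 is EAGAIN, which is what a non-blocking lock request returns when someone else currently holds the lock. That may mean the InstanceLock is held by a live process and is not just a leftover file, though the log alone does not confirm this. The Python sketch below only illustrates that general behaviour. It assumes flock-style locking (Condor's FileLock may well use a different primitive), and the temporary directory, the file standing in for InstanceLock, and the try_lock helper are all hypothetical names made up for the demonstration.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and request an exclusive, non-blocking lock.

    Returns (fd, None) on success, or (None, errno) if the lock is refused.
    Hypothetical helper for illustration; not part of Condor.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        return None, exc.errno
    return fd, None


# Hypothetical stand-in for the InstanceLock file named in the log.
lock_dir = tempfile.mkdtemp(prefix="demo-lock.")
lock_path = os.path.join(lock_dir, "InstanceLock")

# The first holder gets the lock.
first_fd, first_err = try_lock(lock_path)

# A second, independent open of the same file is refused while the
# first descriptor still holds the lock.
second_fd, second_err = try_lock(lock_path)
print("second attempt:", errno.errorcode.get(second_err), os.strerror(second_err))

# Once the first holder lets go, the lock can be taken again.
os.close(first_fd)
third_fd, third_err = try_lock(lock_path)
print("after release:", third_err)

# Clean up the demonstration files.
os.close(third_fd)
os.remove(lock_path)
os.rmdir(lock_dir)
```

If the same thing is happening here, deleting the file in /tmp would not help while the holder is still running; checking for an already-running condor_master at boot time would be the next thing to look at.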