Mailing List Archives
Re: [Condor-users] hawkeye on dual-processor nodes
- Date: Mon, 11 Dec 2006 10:47:12 -0500
- From: Junjun Mao <jmao@xxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] hawkeye on dual-processor nodes
Hi all,
A serious problem just hit my cluster and shut Condor down entirely: the
ownership of the schedd was changed to a regular user!
How could this happen?
Here are the Condor-related processes left on the master node, which is
the submit machine.
[root@master1 y-61.1]# ps -ef | grep condor
pwang 26763 1 0 Nov18 ? 00:00:00 condor_shadow -f 886.0 <10.10.20.1:34661> -
pwang 26766 1 0 Nov18 ? 00:00:00 condor_shadow -f 886.2 <10.10.20.1:34661> -
pwang 26772 1 0 Nov18 ? 00:00:00 condor_shadow -f 886.1 <10.10.20.1:34661> -
pwang 29394 1 0 Nov18 ? 00:00:00 condor_shadow -f 886.4 <10.10.20.1:34661> -
condor 19319 1 0 Nov21 ? 00:34:54 /home2/condor/sbin/condor_master
condor 19320 19319 0 Nov21 ? 01:43:02 condor_collector -f
pwang 19393 19319 0 Dec09 ? 00:00:06 condor_schedd -f
condor 19401 19319 0 Dec09 ? 00:02:31 condor_negotiator -f
Restarting the Condor daemons still yields the wrong owner for the schedd.
I have to move job_queue.log to another location to get Condor to start
correctly.
Can someone tell me where to look for the cause of the problem?
Junjun