Mailing List Archives
Re: [Condor-users] Condor 6.9.2 hung schedd
- Date: Wed, 13 Jun 2007 17:41:08 +0200
- From: Steffen Grunewald <steffen.grunewald@xxxxxxxxxx>
- Subject: Re: [Condor-users] Condor 6.9.2 hung schedd
On Mon, Jun 11, 2007 at 09:51:03AM -0500, Dan Bradley wrote:
>
> It is normal for the schedd to temporarily show up as the user id of one
> of the users with jobs in the queue, because the schedd switches user
> ids in order to do some operations on the user's behalf.
>
> However, it is not normal for the schedd to get stuck in this state. To
> find out what is going on, I would suggest using 'gdb' to see the schedd
> stack when it is in this state. Example:
>
> $ gdb -p <pid of schedd>
> (gdb) where
> ...
> (gdb) quit
(gdb) where
#0 0x00002b46f6b2b69a in fcntl () from /lib/libc.so.6
#1 0x000000000058ee53 in flock ()
#2 0x0000000000665261 in lock_file ()
#3 0x000000000060d7e9 in FileLock::obtain ()
#4 0x00000000005c6685 in UserLog::writeEvent ()
#5 0x00000000004d23b8 in Scheduler::WriteReleaseToUserLog ()
#6 0x00000000004d6568 in Scheduler::actOnJobs ()
#7 0x0000000000572889 in DaemonCore::HandleReq ()
#8 0x000000000056f732 in DaemonCore::HandleReq ()
#9 0x000000000056f197 in DaemonCore::Driver ()
#10 0x000000000057c4d9 in main ()
Does this give you more information?
Now if only I could find out how to remove the bad guys from the queue (I
cannot do that while condor_schedd is hung, and as long as the bad guys are
in the queue, condor_schedd hangs again immediately)...
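In the trace, the schedd is blocked in fcntl() under FileLock::obtain() while UserLog::writeEvent() writes a release event. That is, it appears to be waiting for a lock on some job's user log (the file named by the `log` command in the submit description). The trace does not show who holds the lock, so treating "another process holds a lock on that log" as the cause is an assumption. Below is a minimal Python sketch for checking a suspect log without hanging yourself. It makes a non-blocking attempt at an exclusive fcntl-style lock and reports whether that attempt is refused. The function name `log_is_locked` and the script name are hypothetical and not part of Condor.

```python
import errno
import fcntl
import os
import sys


def log_is_locked(path: str) -> bool:
    """Return True if another process holds a conflicting fcntl lock on path.

    A blocking lock request (presumably what the schedd makes in the trace)
    waits until the holder releases the lock. This probe uses LOCK_NB, so it
    fails immediately instead of waiting.
    """
    fd = os.open(path, os.O_WRONLY)  # an exclusive fcntl lock needs write access
    try:
        # lockf() in CPython is implemented with fcntl() record locks,
        # the same lock family seen in frame #0 of the backtrace.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return True  # someone else holds a conflicting lock
        raise
    else:
        fcntl.lockf(fd, fcntl.LOCK_UN)  # release our probe lock right away
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical usage: python check_userlog_lock.py /path/to/job.log ...
    for log_path in sys.argv[1:]:
        state = "LOCKED" if log_is_locked(log_path) else "free"
        print(f"{log_path}: {state}")
```

fcntl locks are owned per process. A process that already holds a lock on the file gets no conflict from this probe, and closing the probe's descriptor would drop that process's own locks on the file. The probe is only meaningful when run from a separate process. If the logs live on NFS, the result (and whether even a non-blocking request returns promptly) depends on the NFS lock daemon, so a "LOCKED" answer is a hint, not proof.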
--
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html