Mailing List Archives
Re: [Condor-users] How did I get zombies?
- Date: Fri, 14 Oct 2005 14:21:24 -0700
- From: "Michael Yoder" <yoderm@xxxxxxxxxx>
- Subject: Re: [Condor-users] How did I get zombies?
> I have a cluster of 6.6.9 on W2k3. I have several jobs that were running
> and we removed (condor_rm), but after removal they stayed as an 'X' in the
> queue. An analysis of the queue said they were being removed. While in
> this state, the nodes they were on were stuck being claimed with idle
> status. After leaving it a week I did a condor_rm -forcex. Now that
> removed them from the queue, but the nodes are still claimed. Looking in
> the schedd log I have this
>
> Zombie process has not been cleaned up by reaper - pid 1300

This could be a Condor bug. It looks like the schedd is detecting the
situation - it knows there ought to be a zombie - but it isn't *doing*
anything about it. Interestingly, code to do something about it *used*
to be there, but is now commented out.

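If it helps to see which pids the schedd is complaining about, a small script along these lines could pull them out of the schedd log. This is only a sketch: the pattern is copied from the log line quoted above and may not match other Condor versions, the function name `find_zombie_pids` is made up for illustration, and the log path is site-specific, so it is taken from the command line.

```python
import re
import sys

# Pattern copied from the log line quoted above. The exact wording in other
# Condor versions is an assumption.
ZOMBIE_RE = re.compile(
    r"Zombie process has not been cleaned up by reaper - pid (\d+)"
)


def find_zombie_pids(log_lines):
    """Return the distinct pids named in zombie warnings, in first-seen order."""
    seen = []
    for line in log_lines:
        match = ZOMBIE_RE.search(line)
        if match:
            pid = int(match.group(1))
            if pid not in seen:
                seen.append(pid)
    return seen


if __name__ == "__main__" and len(sys.argv) > 1:
    # The schedd log location varies by site; pass the path as an argument.
    with open(sys.argv[1], errors="replace") as log:
        for pid in find_zombie_pids(log):
            print(pid)
```

Run it with the path to the schedd log as its only argument. It only reports what the schedd has logged and does not clean anything up.
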
> How can I get the nodes unclaimed? Later I'll try to figure out how I
> got into this problem.

Restart the schedd:

    condor_restart -name <machine> -schedd

Make sure you run it from a machine that has HOSTALLOW_ADMINISTRATOR
privileges.

Mike Yoder
Principal Member of Technical Staff
Ask Mike: http://docs.optena.com
Direct : +1.408.321.9000
Fax : +1.408.321.9030
Mobile : +1.408.497.7597
yoderm@xxxxxxxxxx
Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
http://www.optena.com