Re: [Condor-users] condor_status and condor_q disagree about state of vm's
- Date: Fri, 20 Apr 2007 14:26:00 -0500
- From: Daniel Forrest <forrest@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] condor_status and condor_q disagree about state of vm's
Bob,
> I've spent the last couple of days looking for an answer to this
> issue and searched the archives, but came up empty-handed. If this
> has been addressed before, please excuse the rehash.
>
> I've got a small pool of two SMP machines, both with dual dual-core
> Opteron processors. In the default configuration that's 8 vm's. I
> would expect that this would mean that I should never be able to
> have more than 8 jobs running in this pool at any given time, but
> I have been able to do just that.
>
> For (as yet) undetermined reasons, the schedd will not recognize
> that a startd is running a job on some vms. See below the (trimmed)
> results of a condor_status:
>
> Name          OpSys  Arch    State      Activity
>
> vm1@server-1  LINUX  X86_64  Unclaimed  Idle
> vm2@server-1  LINUX  X86_64  Unclaimed  Idle
> vm3@server-1  LINUX  X86_64  Claimed    Busy
> vm4@server-1  LINUX  X86_64  Unclaimed  Idle
> vm1@server-2  LINUX  X86_64  Unclaimed  Idle
> vm2@server-2  LINUX  X86_64  Unclaimed  Idle
> vm3@server-2  LINUX  X86_64  Claimed    Busy
> vm4@server-2  LINUX  X86_64  Claimed    Busy
>
> Now look at the (trimmed) results of a condor_q -running:
>
> ID    HOST(S)
> 68.0  vm4@server-1
> 69.0  vm4@server-2
> 70.0  vm3@server-1
> 71.0  vm3@server-2
>
> Notice that vm4 on server-1 is running a job, but shows up as
> Unclaimed/Idle. Does anyone have an explanation of why this might
> happen or what I can do to further debug the issue?
I have seen this type of behavior before. Check to be sure that there
is only one condor_startd process running on server-1. I have seen
cases where there are two condor_masters, each with a condor_startd,
and what you see in condor_status is the status of the condor_startd
that has most recently sent an update to your condor_collector.
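
One way to run that check is sketched below in Python. It assumes a
Linux host where `ps -eo pid,ppid,comm` is available; the helper names
find_daemons and list_processes are made up for this illustration and
are not part of Condor. It only counts processes by name, so treat it
as a starting point rather than a definitive diagnosis.

```python
import subprocess


def find_daemons(ps_output, name="condor_startd"):
    """Return (pid, ppid) pairs for processes whose command name equals `name`.

    `ps_output` is assumed to be text in the layout of `ps -eo pid,ppid,comm`.
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header row and anything that is not "pid ppid command".
        if len(fields) != 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        if fields[2].strip() == name:
            matches.append((int(fields[0]), int(fields[1])))
    return matches


def list_processes():
    """Capture the process table of the local machine (assumes Linux ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    table = list_processes()
    for daemon in ("condor_master", "condor_startd"):
        found = find_daemons(table, daemon)
        print(f"{daemon}: {len(found)} running")
        for pid, ppid in found:
            print(f"  pid={pid} parent={ppid}")
        if len(found) > 1:
            print(f"  more than one {daemon}; conflicting updates are possible")
```

Run it on server-1 itself. If it reports two condor_master processes,
the parent pid of each condor_startd shows which master started it.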
--
Daniel K. Forrest Laboratory for Molecular and
forrest@xxxxxxxxxxxxx Computational Genomics
(608) 262 - 9479 University of Wisconsin, Madison