Re: [Condor-users] condor_status and condor_q disagree about state of vm's
- Date: Fri, 20 Apr 2007 18:57:35 +0100
- From: "Kewley, J (John)" <j.kewley@xxxxxxxx>
- Subject: Re: [Condor-users] condor_status and condor_q disagree about state of vm's
Have you tried the -direct option to condor_status? It gets the
info from the node itself rather than from the central node.
BTW, do you have a startd on your central node too? If so,
be careful: there may be security implications to that.
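
If it helps with the debugging, here is a rough Python sketch that cross-checks the two listings automatically. It uses the trimmed listings quoted in the original message as sample input. In practice the text could come from saved output of condor_q -running and condor_status (or condor_status -direct server-1, assuming -direct is given the host name). The function names are made up for illustration, and the parsing assumes the trimmed column layout from the post, not the full default output.

```python
# Sample input copied from the trimmed listings in the original message.
STATUS = """\
Name OpSys Arch State Activity

vm1@server-1 LINUX X86_64 Unclaimed Idle
vm2@server-1 LINUX X86_64 Unclaimed Idle
vm3@server-1 LINUX X86_64 Claimed Busy
vm4@server-1 LINUX X86_64 Unclaimed Idle
vm1@server-2 LINUX X86_64 Unclaimed Idle
vm2@server-2 LINUX X86_64 Unclaimed Idle
vm3@server-2 LINUX X86_64 Claimed Busy
vm4@server-2 LINUX X86_64 Claimed Busy
"""

QUEUE = """\
ID HOST(S)
68.0 vm4@server-1
69.0 vm4@server-2
70.0 vm3@server-1
71.0 vm3@server-2
"""


def claimed_slots(status_text):
    """Return the set of vm names that the status listing shows as Claimed."""
    claimed = set()
    for line in status_text.splitlines():
        fields = line.split()
        # Assumed row layout: name, opsys, arch, state, activity.
        if len(fields) >= 5 and "@" in fields[0] and fields[3] == "Claimed":
            claimed.add(fields[0])
    return claimed


def running_hosts(queue_text):
    """Map each vm name in the queue listing to the job ID running on it."""
    hosts = {}
    for line in queue_text.splitlines():
        fields = line.split()
        # Assumed row layout: job ID first, host last; header row has no "@".
        if len(fields) >= 2 and "@" in fields[-1]:
            hosts[fields[-1]] = fields[0]
    return hosts


def find_mismatches(status_text, queue_text):
    """List (vm, job ID) pairs the queue reports running on a non-Claimed vm."""
    claimed = claimed_slots(status_text)
    running = running_hosts(queue_text)
    return sorted((vm, job) for vm, job in running.items() if vm not in claimed)


if __name__ == "__main__":
    for vm, job in find_mismatches(STATUS, QUEUE):
        print(f"job {job} is reported on {vm}, but {vm} is not Claimed")
```

Running the comparison once against the central node's view and once against the -direct view should show whether the stale state is in the startd itself or only in what the central node has been told.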
Cheers
JK
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Bob Kinney
> Sent: Friday, April 20, 2007 6:50 PM
> To: condor-users@xxxxxxxxxxx
> Subject: [Condor-users] condor_status and condor_q disagree about state of vm's
>
>
> Hi:
>
> I've spent the last couple of days looking for an answer to this issue
> and searched the archives, but came up empty-handed. If this has been
> addressed before, please excuse the rehash.
>
> I've got a small pool of two SMP machines, both with dual dual-core
> Opteron processors. In the default configuration that's 8 vm's. I
> would expect this to mean that I should never be able to have more
> than 8 jobs running in this pool at any given time, but I have been
> able to do just that.
>
> For (as yet) undetermined reasons, the schedd will not recognize that
> a startd is running on some vms. See below the (trimmed) results of
> a condor_status:
>
> Name          OpSys  Arch    State      Activity
>
> vm1@server-1  LINUX  X86_64  Unclaimed  Idle
> vm2@server-1  LINUX  X86_64  Unclaimed  Idle
> vm3@server-1  LINUX  X86_64  Claimed    Busy
> vm4@server-1  LINUX  X86_64  Unclaimed  Idle
> vm1@server-2  LINUX  X86_64  Unclaimed  Idle
> vm2@server-2  LINUX  X86_64  Unclaimed  Idle
> vm3@server-2  LINUX  X86_64  Claimed    Busy
> vm4@server-2  LINUX  X86_64  Claimed    Busy
>
> Now look at the (trimmed) results of a condor_q -running:
>
> ID HOST(S)
> 68.0 vm4@server-1
> 69.0 vm4@server-2
> 70.0 vm3@server-1
> 71.0 vm3@server-2
>
> Notice that vm4 on server-1 is running a job, but shows up as
> Unclaimed/Idle. Does anyone have an explanation of why this might
> happen or what I can do to further debug the issue?
>
> Some other information that might be relevant:
>
> * server-1 is the central manager for this pool and runs a schedd
> * jobs are remotely submitted from other hosts to the schedd on server-1
> * server-2 does not seem to have the same issue (i.e. condor_status
>   always reports the correct results)
> * if other jobs are submitted to run on server-1, the vm's that
>   report Claimed/Busy change (i.e. vm3 will be Idle, vm4 will be Busy)
>
> Thanks in advance for any assistance anyone can offer.
>
> Regards,
> Bob
>
> --
> Earl (Bob) Kinney
> UNIX Systems Administrator
> Harvard-MIT Data Center