OK, here is the problem in more detail:
I'm using this version:
[root@lxbrb1815 ~]# condor_version
$CondorVersion: 8.1.2 Oct 19 2013 BuildID: 189797 $
$CondorPlatform: x86_64_RedHat5 $
Here's a snippet from condor_status -master output:
[root@condormaster1 ~]# condor_status -master
Name
condormaster1
condormaster2
condorworker02
lxbrb1815.domain.tld
...
I have both physical nodes and VMs as startd nodes. The physical nodes have more than one core, and therefore more than one job slot, while the VMs have only a single core.
Here's a snippet from condor_status -startd | head:
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
condorworker02 LINUX X86_64 Claimed Busy 0.000 490 0+00:03:13
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.060 1991 0+00:11:51
slot2@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:13
...
As you can see, condorworker02 is a VM, while lxbrb1815.domain.tld is a physical node with many cores. That is the only difference: the config file is exactly the same in both cases, and so is the Condor version.
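For what it's worth, I can check locally on each node the configuration variables I would expect to influence the advertised name (NUM_CPUS and STARTD_NAME are standard knobs; I have not set either explicitly, so this is just a sanity check that both nodes resolve them the same way):

```shell
# Run locally on each startd node; both should report the same values
# if the configuration really is identical.
condor_config_val NUM_CPUS     # number of advertised cores/slots
condor_config_val STARTD_NAME  # would override the advertised startd name if set
```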
Now, my questions:
- Why do I see slotID@xxxxxxxxxxxxxxxxxxxxxx in the case of physical nodes but just the bare hostname in the case of VMs?
- Why can't I query the status of a VM, while the same query works for a physical node:
[root@condormaster1 ~]# condor_status -startd lxbrb1815
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.060 1991 0+00:11:51
slot2@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:13
slot3@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:14
slot4@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:15
slot5@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:16
slot6@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:17
slot7@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:18
slot8@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:11
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 8 0 0 8 0 0 0
Total 8 0 0 8 0 0 0
[root@condormaster1 ~]# condor_status -startd lxbrb1815.domain.tld
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.060 1991 0+00:11:51
slot2@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:13
slot3@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:14
slot4@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:15
slot5@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:16
slot6@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:17
slot7@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:18
slot8@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:11
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 8 0 0 8 0 0 0
Total 8 0 0 8 0 0 0
[root@condormaster1 ~]# condor_status -startd condorworker02
[root@condormaster1 ~]# condor_status -startd condorworker02.domain.tld
[root@condormaster1 ~]#
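In case the hostname-prefix lookup is the problem, I could also query the collector by the Machine attribute directly and dump the full ClassAds, to compare how the VM and the physical node advertise their Name (constraint syntax taken from the condor_status man page):

```shell
# -long dumps the complete startd ClassAd; comparing the Name and Machine
# attributes of both hosts should show how each startd registers itself.
condor_status -long -constraint 'Machine == "condorworker02.domain.tld"'
condor_status -long -constraint 'Machine == "lxbrb1815.domain.tld"'
```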
- Why can't I send a condor_off command to the VMs, while it works fine for physical nodes:
[root@condormaster1 ~]# condor_off -startd lxbrb1815
Sent "Kill-Daemon" command for "startd" to master lxbrb1815.domain.tld
[root@condormaster1 ~]# condor_off -startd condorworker02
Can't find address for master condorworker02.domain.tld
Perhaps you need to query another pool.
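Since condor_off apparently resolves the target through the master ad, I assume I could check whether the collector even has a master ad (with an address) for the VM:

```shell
# Dump the full master ClassAd for the VM, if one exists in the collector;
# a missing or differently-named ad would explain "Can't find address".
condor_status -master -long condorworker02.domain.tld
```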
Thanks,
Daniel