OK, here is the problem in more detail:
I'm using this version:
[root@lxbrb1815 ~]# condor_version
$CondorVersion: 8.1.2 Oct 19 2013 BuildID: 189797 $
$CondorPlatform: x86_64_RedHat5 $
Here's a snippet from condor_status -master output:
[root@condormaster1 ~]# condor_status -master
Name
condormaster1
condormaster2
condorworker02
lxbrb1815.domain.tld
...
I have both physical nodes and VMs as startd nodes. The physical nodes have more than one core, and therefore more than one job slot, while the VMs have only a single core.
Here's a snippet from condor_status -startd | head:
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
condorworker02 LINUX X86_64 Claimed Busy 0.000 490 0+00:03:13
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.060 1991 0+00:11:51
slot2@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:13
...
As you can see, condorworker02 is a VM, while lxbrb1815.domain.tld is a physical node with many cores. That is the only difference: the config file is exactly the same in both cases, and so is the Condor version.
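For what it's worth, I can check locally on each node the configuration variables I would expect to influence the advertised name (NUM_CPUS and STARTD_NAME are standard knobs; I have not set either explicitly, so this is just a sanity check that both nodes resolve them the same way):

```shell
# Run locally on each startd node; both should report the same values
# if the configuration really is identical.
condor_config_val NUM_CPUS     # number of advertised cores/slots
condor_config_val STARTD_NAME  # would override the advertised startd name if set
```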
Now, my questions:
- Why do I see slotID@xxxxxxxxxxxxxxxxxxxxxx in the case of physical nodes but just the bare hostname in the case of VMs?
- Why can't I query the status of a VM, while the same query works for a physical node:
[root@condormaster1 ~]# condor_status -startd lxbrb1815
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.060 1991 0+00:11:51
slot2@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:13
slot3@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:14
slot4@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:15
slot5@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:16
slot6@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:17
slot7@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:18
slot8@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:11
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 8 0 0 8 0 0 0
Total 8 0 0 8 0 0 0
[root@condormaster1 ~]# condor_status -startd lxbrb1815.domain.tld
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.060 1991 0+00:11:51
slot2@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:13
slot3@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:14
slot4@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:15
slot5@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:16
slot6@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:17
slot7@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:18
slot8@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:12:11
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 8 0 0 8 0 0 0
Total 8 0 0 8 0 0 0
[root@condormaster1 ~]# condor_status -startd condorworker02
[root@condormaster1 ~]# condor_status -startd condorworker02.domain.tld
[root@condormaster1 ~]#
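In case the hostname-prefix lookup is the problem, I could also query the collector by the Machine attribute directly and dump the full ClassAds, to compare how the VM and the physical node advertise their Name (constraint syntax taken from the condor_status man page):

```shell
# -long dumps the complete startd ClassAd; comparing the Name and Machine
# attributes of both hosts should show how each startd registers itself.
condor_status -long -constraint 'Machine == "condorworker02.domain.tld"'
condor_status -long -constraint 'Machine == "lxbrb1815.domain.tld"'
```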
- Why can't I send a condor_off command to the VMs, while it works fine for physical nodes:
[root@condormaster1 ~]# condor_off -startd lxbrb1815
Sent "Kill-Daemon" command for "startd" to master lxbrb1815.domain.tld
[root@condormaster1 ~]# condor_off -startd condorworker02
Can't find address for master condorworker02.domain.tld
Perhaps you need to query another pool.
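Since condor_off apparently resolves the target through the master ad, I assume I could check whether the collector even has a master ad (with an address) for the VM:

```shell
# Dump the full master ClassAd for the VM, if one exists in the collector;
# a missing or differently-named ad would explain "Can't find address".
condor_status -master -long condorworker02.domain.tld
```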
Thanks,
Daniel