I am testing a Condor configuration with partitionable slots and a defrag daemon. The slots on my worker nodes are configured like this:
Each worker node has 2 CPUs, 4 GB of RAM, and 40 GB of disk space. To these workers I submitted jobs with the following requirements:
slot1@server-1e433 LINUX X86_64 Unclaimed Idle 0.000 2778 0+02:49:38
slot1_1@server-1e4 LINUX X86_64 Claimed Busy 0.000 3072 0+00:00:38
slot1@server-38fcb LINUX X86_64 Unclaimed Idle 0.000 2778 0+02:49:44
slot1_1@server-38f LINUX X86_64 Claimed Busy 0.000 3072 0+00:01:47
slot1@server-4fcf0 LINUX X86_64 Unclaimed Idle 0.000 2778 0+02:49:43
slot1_1@server-4fc LINUX X86_64 Claimed Busy 0.000 3072 0+00:00:47
slot1@server-51c6b LINUX X86_64 Unclaimed Idle 0.010 2778 0+02:49:36
slot1_1@server-51c LINUX X86_64 Claimed Busy 0.000 3072 0+00:01:15
slot1@server-5f5ae LINUX X86_64 Unclaimed Idle 0.000 2778 0+02:54:48
slot1_1@server-5f5 LINUX X86_64 Claimed Busy 0.000 3072 0+00:06:26
...
Clearly these may be served by the existing slots, but it would be more efficient if the existing slots were removed and I instead ended up with two slots per machine. This does not seem to happen. Here is the log of my defrag daemon:
04/02/14 11:24:00 ******************************************************
04/02/14 11:24:00 ** condor_defrag (CONDOR_DEFRAG) STARTING UP
04/02/14 11:24:00 ** /usr/libexec/condor/condor_defrag
04/02/14 11:24:00 ** SubsystemInfo: name=DEFRAG type=DAEMON(12) class=DAEMON(1)
04/02/14 11:24:00 ** Configuration: subsystem:DEFRAG local:<NONE> class:DAEMON
04/02/14 11:24:00 ** $CondorVersion: 8.0.6 Feb 01 2014 BuildID: 225363 $
04/02/14 11:24:00 ** $CondorPlatform: x86_64_RedHat6 $
04/02/14 11:24:00 ** PID = 17334
04/02/14 11:24:00 ** Log last touched 4/2 11:21:43
04/02/14 11:24:00 ******************************************************
04/02/14 11:24:00 Using config source: /etc/condor/condor_config
04/02/14 11:24:00 Using local config sources:
04/02/14 11:24:00 /etc/condor/config.d/defrag
04/02/14 11:24:00 /etc/condor/config.d/partition
04/02/14 11:24:00 /etc/condor/config.d/ports
04/02/14 11:24:00 /etc/condor/config.d/scaling
04/02/14 11:24:00 /etc/condor/config.d/soap
04/02/14 11:24:00 /etc/condor/condor_config.local
04/02/14 11:24:00 Daemon Log is logging: D_ALWAYS D_ERROR
04/02/14 11:24:00 DaemonCore: command socket at <myip:40438>
04/02/14 11:24:00 DaemonCore: private command socket at <myip:40438>
04/02/14 11:24:00 State file /var/lock/condor/defrag_state does not yet exist.
04/02/14 11:24:00 Will evaluate defragmentation policy every DEFRAG_INTERVAL=300 seconds.
04/02/14 11:24:00 polling interval 300s, DEFRAG_DRAINING_MACHINES_PER_HOUR = 10.000000/hour = 0/interval + 10/hour + 0/day
04/02/14 11:24:00 There are currently 0 draining and 16 whole machines.
04/02/14 11:24:00 Average pool draining badput = 0.00%
04/02/14 11:24:00 Average pool draining unclaimed = 0.00%
04/02/14 11:24:00 Looking for 7 machines to drain.
04/02/14 11:24:00 Drained 0 machines (wanted to drain 7 machines).
04/02/14 11:29:00 There are currently 0 draining and 16 whole machines.
04/02/14 11:29:00 Average pool draining badput = 0.00%
04/02/14 11:29:00 Average pool draining unclaimed = 0.00%
04/02/14 11:29:00 Doing nothing, because number to drain in next 300s is calculated to be 0.
04/02/14 11:34:01 There are currently 0 draining and 16 whole machines.
04/02/14 11:34:01 Average pool draining badput = 0.00%
04/02/14 11:34:01 Average pool draining unclaimed = 0.00%
04/02/14 11:34:01 Doing nothing, because number to drain in next 300s is calculated to be 0.
04/02/14 11:39:01 There are currently 0 draining and 16 whole machines.
04/02/14 11:39:01 Average pool draining badput = 0.00%
04/02/14 11:39:01 Average pool draining unclaimed = 0.00%
04/02/14 11:39:01 Doing nothing, because number to drain in next 300s is calculated to be 0.
04/02/14 11:44:01 There are currently 0 draining and 16 whole machines.
04/02/14 11:44:01 Average pool draining badput = 0.00%
04/02/14 11:44:01 Average pool draining unclaimed = 0.00%
04/02/14 11:44:01 Doing nothing, because number to drain in next 300s is calculated to be 0.
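A note on the "polling interval" line in the log above: as far as I can tell, the daemon splits DEFRAG_DRAINING_MACHINES_PER_HOUR into a whole number of machines per polling interval, plus a per-hour and a per-day remainder. The Python sketch below is only my guess at that arithmetic, not code taken from condor_defrag. The helper name split_drain_rate and the rounding tolerances are my own assumptions. It does reproduce the "0/interval + 10/hour + 0/day" breakdown for 10 machines/hour at a 300 s interval.

```python
import math
from typing import Tuple


def split_drain_rate(machines_per_hour: float, interval_s: int) -> Tuple[int, int, int]:
    """Split an hourly drain rate into per-interval, per-hour and per-day parts.

    Hypothetical helper (not part of HTCondor); the rounding tolerances
    are assumptions chosen to absorb floating-point noise.
    """
    # Exact (fractional) number of machines to drain per polling interval.
    per_interval_exact = machines_per_hour * interval_s / 3600.0
    per_interval = math.floor(per_interval_exact + 1e-5)

    # Whatever does not fit in a whole per-interval count is carried per hour.
    leftover_per_hour = (per_interval_exact - per_interval) * 3600.0 / interval_s
    per_hour = math.floor(leftover_per_hour + 1e-5)

    # Any fraction still left over is carried per day, rounded to nearest.
    leftover_per_day = (leftover_per_hour - per_hour) * 24.0
    per_day = math.floor(leftover_per_day + 0.5)

    return per_interval, per_hour, per_day


if __name__ == "__main__":
    # Values from the log: 10 machines/hour, DEFRAG_INTERVAL=300 seconds.
    print(split_drain_rate(10.0, 300))
```

If that reading is right, a 300 s interval with 10 machines/hour never drains a whole machine "per interval", so any draining has to come out of the hourly budget.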
Am I doing something wrong here? Do the slots need to be Idle for the defrag to happen?