[HTCondor-users] Cluster utilization is low for mixed memory intensive jobs

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Hello,

I have a large number of single-core jobs to run, each with a different memory request. As preliminary work, I set up HTCondor on a single machine with a dynamic (partitionable) slot configuration. My goal is simple: keep the cluster fully utilized.

At first, all of the machine's memory appears to be allocated. After several rounds of jobs, however, the unclaimed slot1 holds more and more memory and only a few jobs run in parallel, while many jobs sit idle in the queue, as shown below. Each job requests between 500 MB and 1600 MB of memory, so I am fairly sure the cluster should be running more jobs.

$ condor_status
Name                 OpSys  Arch    State      Activity  LoadAv  Mem    ActvtyTime
slot1@DummyServer    LINUX  X86_64  Unclaimed  Idle      0.000   84510  0+00:28:17
slot1_1@DummyServer  LINUX  X86_64  Claimed    Busy      0.000    1000  0+00:00:01
slot1_2@DummyServer  LINUX  X86_64  Claimed    Busy      0.000    1000  0+00:00:01
slot1_3@DummyServer  LINUX  X86_64  Claimed    Busy      0.020    1000  0+00:00:04
slot1_4@DummyServer  LINUX  X86_64  Claimed    Busy      0.010    1000  0+00:00:03
slot1_5@DummyServer  LINUX  X86_64  Claimed    Busy      0.020    1000  0+00:00:05
slot1_6@DummyServer  LINUX  X86_64  Claimed    Busy      0.010    1000  0+00:00:03
slot1_7@DummyServer  LINUX  X86_64  Claimed    Busy      0.020    1000  0+00:00:02
slot1_8@DummyServer  LINUX  X86_64  Claimed    Busy      0.000    1000  0+00:00:18

              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill  Drain
X86_64/LINUX      9      0        8          1        0           0         0      0
       Total      9      0        8          1        0           0         0      0

$ condor_q
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
...

Total for query: 2138 jobs; 0 completed, 0 removed, 2130 idle, 8 running, 0 held, 0 suspended
Total for all users: 2138 jobs; 0 completed, 0 removed, 2130 idle, 8 running, 0 held, 0 suspended
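As a rough check on the figures above, the following Python sketch estimates how much of the machine's memory is sitting idle and how many more jobs would fit by memory alone. It assumes the machine's total memory equals the sum of the Mem column, and it ignores the CPU core count, which is not stated in this post and would also cap concurrency. The function name is illustrative only.

```python
def max_jobs_by_memory(free_mb: int, request_mb: int) -> int:
    """How many jobs of a given memory request fit into free_mb (memory only)."""
    if request_mb <= 0:
        raise ValueError("request_mb must be positive")
    return free_mb // request_mb


# Values read from the condor_status output above.
unclaimed_mb = 84510          # Mem of the unclaimed slot1
claimed_mb = 8 * 1000         # eight dynamic slots of 1000 MB each

# Assumption: total memory is the sum of all slots shown.
total_mb = unclaimed_mb + claimed_mb
idle_fraction = unclaimed_mb / total_mb

# Bounds for the largest and smallest memory requests mentioned in the post.
fit_largest = max_jobs_by_memory(unclaimed_mb, 1600)
fit_smallest = max_jobs_by_memory(unclaimed_mb, 500)

print(f"idle memory: {idle_fraction:.1%}")
print(f"additional jobs that fit by memory: {fit_largest} to {fit_smallest}")
```

Under these assumptions, roughly nine tenths of the memory is unclaimed while more than two thousand jobs are idle.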

I haven't done any fancy setup in condor_config.local. I've tried both:

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%
SLOT_TYPE_1_PARTITIONABLE = true

and

NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%
SLOT_TYPE_1_PARTITIONABLE = true
CLAIM_WORKLIFE = 0

Cluster utilization is similarly low in both setups. With CLAIM_WORKLIFE = 0, I expected that after each job completes, its claimed slot would be returned to the unclaimed slot1, so that when a new job shows up, a new slot is created with the memory that job requests. Again, my workload is mixed, and I don't think a fixed set of static slots can meet my needs.
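The expectation described above can be pictured with a toy model in Python. This is only a sketch of the intended bookkeeping (carve a dynamic slot out of the parent, return its resources when the job finishes); it is not HTCondor's implementation, and the class name, method names, and the CPU count in the example are hypothetical.

```python
class PartitionableSlot:
    """Toy model of a partitionable slot; NOT HTCondor's actual logic."""

    def __init__(self, cpus: int, memory_mb: int):
        self.cpus = cpus
        self.memory_mb = memory_mb
        self.dynamic = {}      # dynamic slot name -> (cpus, memory_mb)
        self._next_id = 1

    def carve(self, request_cpus: int, request_memory_mb: int):
        """Create a dynamic slot sized to the request, or return None if it does not fit."""
        if request_cpus > self.cpus or request_memory_mb > self.memory_mb:
            return None
        self.cpus -= request_cpus
        self.memory_mb -= request_memory_mb
        name = f"slot1_{self._next_id}"
        self._next_id += 1
        self.dynamic[name] = (request_cpus, request_memory_mb)
        return name

    def release(self, name: str) -> None:
        """Return a finished dynamic slot's resources to the parent slot."""
        cpus, memory_mb = self.dynamic.pop(name)
        self.cpus += cpus
        self.memory_mb += memory_mb


# Hypothetical machine: the CPU count is assumed, the memory is taken from the post.
parent = PartitionableSlot(cpus=64, memory_mb=92510)
first = parent.carve(1, 1600)
second = parent.carve(1, 500)
parent.release(first)          # resources go back to the unclaimed parent
print(parent.cpus, parent.memory_mb, sorted(parent.dynamic))
```

In this model every release makes the freed memory immediately available for a differently sized request, which is the behavior the mixed workload needs.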

Here is my sample submit file:

executable = xxx.sh
should_transfer_files = NO
request_cpus = 1
request_memory = 1600
log = xxx.log
output = xxx.txt
Queue
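Since the memory request differs from job to job, a submit description like the one above would be produced once per job. A minimal Python sketch of that templating follows; the function name, the file stem parameter, and the example request sizes are hypothetical, and only the submit commands shown in the sample are used.

```python
def render_submit(executable: str, request_memory_mb: int, stem: str) -> str:
    """Render a submit description with the same commands as the sample above."""
    lines = [
        f"executable = {executable}",
        "should_transfer_files = NO",
        "request_cpus = 1",
        f"request_memory = {request_memory_mb}",
        f"log = {stem}.log",
        f"output = {stem}.txt",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical mix of requests within the 500 MB to 1600 MB range from the post.
for index, memory_mb in enumerate([500, 1000, 1600]):
    print(render_submit("xxx.sh", memory_mb, f"xxx_{index}"))
```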

And here is my condor version:

$CondorVersion: 8.8.3 May 26 2019 BuildID: 470254 $

$CondorPlatform: x86_64_Ubuntu18 $

Any comments and suggestions are appreciated.

Best,

Shunxing
