I am hoping there is a way to assign memory to individual slots, rather than having the RAM divided evenly between all cores.
The processes we run require a GPU and can need as much as 30 GB of memory. Our current systems have only 1 GPU each but up to 8 cores. When I start Condor with the default config I see:
Name                OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime
slot1@xxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.080   4038  0+00:14:27
slot2@xxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.000   4038  0+00:14:48
slot3@xxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.000   4038  0+00:14:49
slot4@xxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.000   4038  0+00:14:50
slot5@xxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.000   4038  0+00:10:08
slot6@xxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.000   4038  0+00:10:09
slot7@xxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.000   4038  0+00:10:10
slot8@xxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.000   4038  0+00:14:46

              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
X86_64/LINUX      8      0        0          8        0           0         0
       Total      8      0        0          8        0           0         0
I would like to either:
- assign most, if not all, of the memory to a single slot, or
- remove the other slots from the available pool.
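From what I can tell from the manual, something along these lines might be what I am after. This is an untested sketch: the 90% figure is a guess on my part, and I am assuming that the SLOT_TYPE_1 and NUM_SLOTS_TYPE_1 settings behave the way I have read them, so that only the one slot is advertised.

```
## Untested sketch: define a single slot type that takes most of the RAM
## (90% is an assumed value, not something I have verified is enough)
SLOT_TYPE_1 = cpus=1, ram=90%
## Advertise only one slot of that type
NUM_SLOTS_TYPE_1 = 1
```

If that is the right idea, I would expect to see a single slot1 reporting roughly 29000 MB instead of eight slots at 4038 MB each.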
I have added the following config to condor_config.local:
## GPU Config stuff
SLOT1_HAS_GPU=TRUE
SLOT1_GPU_DEV=0
SLOT2_HAS_GPU=FALSE
SLOT3_HAS_GPU=FALSE
SLOT4_HAS_GPU=FALSE
SLOT5_HAS_GPU=FALSE
SLOT6_HAS_GPU=FALSE
SLOT7_HAS_GPU=FALSE
SLOT8_HAS_GPU=FALSE
STARTD_ATTRS=HAS_GPU,GPU_DEV
Eventually I want to take proper advantage of HTCondor and set up a pool of our machines, but I am having problems getting authorization/authentication working.
I am not a trained admin, so I would appreciate instructions or advice that are as explicit as possible.
Regards,
Hugh