Re: [HTCondor-users] configuring a GPU machine
- Date: Fri, 12 Jul 2013 12:01:58 +0200
- From: Tobias Beisel <tbeisel@xxxxxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] configuring a GPU machine
Hi everybody,
After finding another example on the users list, I tried the following:
SLOT_TYPE_1 = cpus=100%,auto
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1
MACHINE_RESOURCE_NAMES = GPUS
MACHINE_RESOURCE_GPUS = 4
Unfortunately this only shows one slot:
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.090 48295 0+00:00:04
Is there anybody who has a multi-GPU, multi-CPU system running with condor and could provide me with an example config file?
Best regards,
Tobias
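
Note on the partitionable configuration above: with SLOT_TYPE_1_PARTITIONABLE = TRUE and NUM_SLOTS_TYPE_1 = 1, a single line in condor_status is what an idle machine is expected to show. A partitionable slot is advertised once with all of its resources, and dynamic slots (slot1_1, slot1_2, ...) are only carved out of it when jobs are matched. The submit description below is a minimal, untested sketch of a job that would trigger this. The executable name gpu_job and the file names are hypothetical, and it assumes that request_gpus is matched against the custom GPUS resource defined through MACHINE_RESOURCE_GPUS.

```
# Hypothetical submit description; gpu_job is a placeholder executable.
universe       = vanilla
executable     = gpu_job
# Ask the partitionable slot for one core.
request_cpus   = 1
# Assumption: the custom resource from MACHINE_RESOURCE_GPUS is requested this way.
request_gpus   = 1
output         = gpu_job.out
error          = gpu_job.err
log            = gpu_job.log
queue
```

Once such a job is running, condor_status should list a dynamic slot next to slot1, and condor_status -long should show the resources left in the partitionable slot.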
> Hi Eddie,
>
> Thank you for your advice.
>
> Yes, I also tried both the static and the automatic configuration. For the latter I tried the output of the (not officially supported) condorgpu project. In both cases only CPU slots were shown.
>
> SLOT1_HAS_GPU=TRUE
> SLOT1_GPU_DEV=0
> ...
> SLOT4_HAS_GPU=TRUE
> SLOT4_GPU_DEV=3
> STARTD_ATTRS=HAS_GPU,GPU_DEV
>
> Output:
> slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.100 3018 0+00:00:04
> ...
> slot16@xxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 3018 0+00:00:23
>
>
> I also tried a configuration that I found on the users list that actually configured the same hardware combination (4 GPUs, 8 CPUs):
>
> NUM_CPUS = 8
>
> NUM_GPUS = 4
> HasGpus = TRUE
>
> START = (((SlotId < 5) && $(SLOT1_START)) || ((SlotId > 4) && $(SLOT2_START))) || FALSE
>
> SUSPEND = False
> CONTINUE = True
> PREEMPT = False
> KILL = False
> WANT_SUSPEND = False
> WANT_VACATE = False
>
> SLOT1_START = (TARGET.NeedGpu =?= TRUE)
> SLOT2_START = (TARGET.NeedGpu =?= FALSE)
>
> This again only shows the CPUs (8 in this case).
>
> slot1@xxxxxxxxxxxx LINUX X86_64 Owner Idle 0.080 6036 0+00:05:04
> ...
> slot8@xxxxxxxxxxxx LINUX X86_64 Owner Idle 0.000 6036 0+00:05:03
>
>
>
> Btw., the configuration mentioned in my previous mail shows the following status:
>
> slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.030 12073 0+00:00:04
> ...
> slot4@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 12073 0+00:00:07
>
>
> So currently I can define slots for either the GPUs or the CPUs, but not both at the same time, and not the combined layout I intended.
>
> Regards,
> Tobias
>
>
>> Hi Tobias,
>>
>> Did you see this in the recipes section on the wiki?
>>
>> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToManageGpus
>>
>> I am also a greenhorn, but I am about to head down this path (I have a couple of servers with GPUs that I would like to find a better way to advertise and utilize). Currently I am basically using the machine name to target the GPU machines, and there is no contention.
>>
>> Eddie
>>
>> -----Original Message-----
>> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Tobias Beisel
>> Sent: Tuesday, July 09, 2013 11:11 AM
>> To: htcondor-users@xxxxxxxxxxx
>> Subject: [HTCondor-users] configuring a GPU machine
>>
>> Hi,
>>
>> I am new to condor and have problems configuring my machine.
>>
>> I'm using HTCondor V8.0.0 on an Ubuntu 12.04 machine with 16 CPUs (8 cores with Hyperthreading) and 4 NVIDIA Tesla C2070 GPUs. I would like to configure condor to (1) offer each GPU combined with 1 CPU as a slot and (2) offer each group of 4 of the remaining 12 CPUs as a single slot.
>>
>> I managed to provide the slots for GPUs using the following configuration:
>>
>> MACHINE_RESOURCE_gpu = 4
>> MACHINE_RESOURCE_actuator = 20
>>
>> SLOT_TYPE_1 = gpu=1, cpu=1, actuator=1
>> NUM_SLOTS_TYPE_1 = 4
>>
>> condor_status shows these slots correctly.
>>
>> Unfortunately I cannot get the remaining CPUs to be configured as slots. The following does not show any slots:
>>
>> SLOT_TYPE_2 = cpu=1, actuator=1
>> NUM_SLOTS_TYPE_2 = 12
>>
>> or
>>
>> SLOT_TYPE_2 = cpu=4, actuator=1
>> NUM_SLOTS_TYPE_2 = 3
>>
>> I tried several other configurations I found in examples, but at best I could get one slot type to be shown.
>>
>> What would I need to change to make it work?
>>
>>
>> Assuming the above would work, I'd have two more questions on how to create job submission files:
>>
>> 1. As configured, the above-mentioned GPU slots show Arch 'X86_64', and so would the CPU slots. How can I then choose a different executable based on the provided architecture, as proposed in chapter 2.5.6 (heterogeneous submit), by using the $$(Arch) macro?
>> 2. Is it also possible to choose different arguments to the executable based on the provided 'Arch'? This would allow selecting the executed code within a single application binary, i.e., figuratively using a 'fat' binary.
>>
>>
>> Thank you for your help,
>> Tobias
>>
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
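
Note on the two submit-file questions in the original post: the following is a hedged sketch of the heterogeneous-submit pattern, not a tested answer. The names myprog and --variant are hypothetical. The $$(...) references are filled in from the ClassAd of the machine the job is matched to, and the sketch assumes this substitution is applied to arguments in the same way as to executable.

```
# Sketch only; myprog and the --variant flag are hypothetical.
universe     = vanilla
# Filled in from the matched machine's ClassAd, e.g. myprog.LINUX.X86_64
executable   = myprog.$$(OpSys).$$(Arch)
# The same substitution applied to the arguments, for the 'fat' binary case.
arguments    = --variant $$(Arch)
requirements = (OpSys == "LINUX") && (Arch == "X86_64" || Arch == "INTEL")
queue
```

Since the GPU and CPU slots on this machine both report X86_64, $$(Arch) alone cannot tell them apart. A custom machine attribute published through STARTD_ATTRS, such as the HAS_GPU attribute from the quoted configuration, could presumably be referenced in the same way, provided it is defined on every slot the job can match.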