Re: [HTCondor-users] configuring a GPU machine
- Date: Wed, 10 Jul 2013 21:01:24 +0200
- From: Tobias Beisel <tbeisel@xxxxxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] configuring a GPU machine
Hi Eddie,
Thank you for your advice.
Yes, I also tried both the static and the automatic configuration. For the latter I tried the output of the (not officially supported) condorgpu project. In both cases only CPU slots were shown.
SLOT1_HAS_GPU=TRUE
SLOT1_GPU_DEV=0
...
SLOT4_HAS_GPU=TRUE
SLOT4_GPU_DEV=3
STARTD_ATTRS=HAS_GPU,GPU_DEV
Output:
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.100 3018 0+00:00:04
...
slot16@xxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 3018 0+00:00:23
I also tried a configuration that I found on the users list that actually configured the same hardware combination (4 GPUs, 8 CPUs):
NUM_CPUS = 8
NUM_GPUS = 4
HasGpus = TRUE
START = (((SlotId < 5) && $(SLOT1_START)) || ((SlotId > 4) && $(SLOT2_START))) || FALSE
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
SLOT1_START = (TARGET.NeedGpu =?= TRUE)
SLOT2_START = (TARGET.NeedGpu =?= FALSE)
This again only shows the CPUs (8 in this case).
slot1@xxxxxxxxxxxx LINUX X86_64 Owner Idle 0.080 6036 0+00:05:04
...
slot8@xxxxxxxxxxxx LINUX X86_64 Owner Idle 0.000 6036 0+00:05:03
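To make explicit what the START expression in this configuration does, here is a minimal Python sketch of its logic. It is only a model, not HTCondor code: `meta_equals` and `start` are hypothetical helper names, and `None` stands in for an undefined job attribute. The one ClassAd fact it relies on is that `=?=` never evaluates to UNDEFINED, so an undefined `NeedGpu` compares as not identical to both TRUE and FALSE.

```python
from typing import Optional


def meta_equals(a: Optional[bool], b: Optional[bool]) -> bool:
    """Model of the ClassAd '=?=' operator for booleans.

    None stands in for UNDEFINED. The result is always a plain bool.
    """
    return a is b


def start(slot_id: int, need_gpu: Optional[bool]) -> bool:
    """Model of the START expression from the configuration above."""
    slot1_start = meta_equals(need_gpu, True)   # SLOT1_START
    slot2_start = meta_equals(need_gpu, False)  # SLOT2_START
    return (slot_id < 5 and slot1_start) or (slot_id > 4 and slot2_start) or False


if __name__ == "__main__":
    for slot_id in (1, 4, 5, 8):
        for need_gpu in (True, False, None):
            print(slot_id, need_gpu, start(slot_id, need_gpu))
```

In this model, slots 1 to 4 accept only jobs that set NeedGpu to TRUE, slots 5 and up accept only jobs that set it to FALSE, and a job that leaves NeedGpu undefined matches no slot. The expression only partitions the existing slots by SlotId; it does not create additional slots, which is consistent with 8 slots being shown when NUM_CPUS = 8.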
Btw., the configuration mentioned in my previous mail shows the following status:
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.030 12073 0+00:00:04
...
slot4@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 12073 0+00:00:07
So currently I can define slots either for the GPUs or for the CPUs, but not both at the same time, and therefore not the combined layout I intended.
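As a sanity check on the intended layout from my first mail (4 slots with 1 GPU and 1 CPU each, plus 3 slots with 4 CPUs each), the following Python sketch adds up what the slot types request and compares it with the machine totals. It is only arithmetic under stated assumptions: the dictionaries and the function names `total_requested` and `remaining` are hypothetical, and the sketch does not model how condor_startd assigns resources that a slot type leaves unspecified (memory, disk, or gpu for the second type).

```python
# Machine totals taken from the original message.
machine = {"cpu": 16, "gpu": 4, "actuator": 20}

# Intended slot types: (number of slots, resources per slot).
slot_types = [
    {"count": 4, "resources": {"cpu": 1, "gpu": 1, "actuator": 1}},
    {"count": 3, "resources": {"cpu": 4, "actuator": 1}},
]


def total_requested(slot_types):
    """Sum each resource over all slot types (count * amount per slot)."""
    totals = {}
    for slot_type in slot_types:
        for name, amount in slot_type["resources"].items():
            totals[name] = totals.get(name, 0) + slot_type["count"] * amount
    return totals


def remaining(machine, slot_types):
    """Return what is left of each machine resource after all slots are carved out."""
    requested = total_requested(slot_types)
    return {name: machine[name] - requested.get(name, 0) for name in machine}


if __name__ == "__main__":
    print(total_requested(slot_types))
    print(remaining(machine, slot_types))
```

On paper the layout is not oversubscribed: it uses exactly 16 CPUs and 4 GPUs and leaves 13 actuators unused, so the missing slots do not seem to be explained by the explicitly requested amounts alone.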
Regards,
Tobias
> Hi Tobias,
>
> Did you see this in the recipes section on the wiki?
>
> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToManageGpus
>
> I am also a greenhorn, but I am about to head down this path (I have a couple of servers with GPUs that I would like to find a better way to advertise and utilize). Currently I am basically using the machine name to target the GPU machines, and there is no contention.
>
> Eddie
>
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Tobias Beisel
> Sent: Tuesday, July 09, 2013 11:11 AM
> To: htcondor-users@xxxxxxxxxxx
> Subject: [HTCondor-users] configuring a GPU machine
>
> Hi,
>
> I am new to condor and have problems configuring my machine.
>
> I'm using HTCondor V8.0.0 on an Ubuntu 12.04 machine with 16 CPUs (8 cores with Hyper-Threading) and 4 NVIDIA Tesla C2070 GPUs. I would like to configure condor so that 1. each GPU, combined with 1 CPU, forms a slot, and 2. the remaining 12 CPUs are grouped into slots of 4 CPUs each.
>
> I managed to provide the slots for GPUs using the following configuration:
>
> MACHINE_RESOURCE_gpu = 4
> MACHINE_RESOURCE_actuator = 20
>
> SLOT_TYPE_1 = gpu=1, cpu=1, actuator=1
> NUM_SLOTS_TYPE_1 = 4
>
> condor_status shows these slots correctly.
>
> Unfortunately, I cannot get the remaining CPUs to be configured as slots. Neither of the following shows any additional slots:
>
> SLOT_TYPE_2 = cpu=1, actuator=1
> NUM_SLOTS_TYPE_2 = 12
>
> or
>
> SLOT_TYPE_2 = cpu=4, actuator=1
> NUM_SLOTS_TYPE_2 = 3
>
> I tried several other configurations that I found in examples, but at best I could get one slot type to be shown.
>
> What would I need to change to make it work?
>
>
> Assuming the above would work, I'd have two more questions on how to create job submission files:
>
> 1. As configured, the GPU slots mentioned above show Arch 'X86_64', and so would the CPU slots. How can I then choose a different executable for each kind of slot using the $$(Arch) macro, as proposed in chapter 2.5.6 (heterogeneous submit)?
> 2. Is it also possible to pass different arguments to the executable based on the advertised 'Arch'? This would allow selecting the code path within a single application binary, i.e., figuratively using a 'fat' binary.
>
>
> Thank you for your help,
> Tobias
>
>
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/