Re: [Condor-users] cron and specific slots
- Date: Tue, 2 Nov 2010 12:50:51 -0600
- From: "Burnett, Ben" <ben.burnett@xxxxxxxx>
- Subject: Re: [Condor-users] cron and specific slots
Nice. Thanks :)
Are your GPU jobs CPU/IO/etc. intensive? I ask because it's not entirely clear to me whether ours are on our grid, so I'm unsure whether tying an existing slot to a GPU wastes that slot (i.e. it could otherwise be doing other work), or whether I should continue with the idea of having GPU slots that are somehow independent of the machine slots. I suppose it depends on the nature of the job and how much of the work is offloaded to the GPU.
It would be interesting to see CUDA's job-to-GPU device matching functionality added to the Condor matchmaker (assuming it somehow does a better job of matching than Condor would if GPUs were native resources).
-B
On 2010-11-02, at 11:26 AM, Michael Di Domenico wrote:
> Sure, I can't easily post direct chunks of code, but here's the gist
> (we also use Nvidia only, so this will be biased). I also can't take
> credit: there's a SourceForge site (I forget the URL) that did the
> legwork, and then I worked with Cycle Computing to hammer out some details.
>
> I wrote a CUDA program that cycles through the CPU index and attempts
> to open an Nvidia device with the same index number (starting with
> zero).
>
> That program outputs these ClassAds:
>
> GPU_DETECTED = TRUE
> SLOT1_HAS_GPU = TRUE
> SLOT1_GPU_NAME = "QUADRO FX 580"
> SLOT1_GPU_CUDACAPABLE = TRUE
> SLOT1_GPU_MEM = 536150016
> SLOT1_GPU_PROCS = 4
> SLOT1_GPU_CORES = 32
> SLOT1_GPU_CLOCKRATE = 1.12
> SLOT2_HAS_GPU = FALSE
> SLOT3_HAS_GPU = FALSE
> SLOT4_HAS_GPU = FALSE
>
> The SLOT2..4 entries are there because I ran the program on a machine
> with four CPU cores but only one GPU. We also have machines with Tesla
> S1070s, where there would be one GPU assigned to each slot.
>
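A minimal sketch of the output side of such a detector, in Python rather than CUDA, assuming the device list has already been gathered some other way. The function name gpu_classads, the dictionary keys, and the hard-coded device are hypothetical; only the attribute names are taken from the listing above. It pairs device index i (counting from zero) with slot i+1 and marks the remaining slots as having no GPU.

```python
def gpu_classads(devices, num_slots):
    """Return ClassAd lines that pair GPU index i with slot i+1.

    devices: list of dicts with the (hypothetical) keys 'name' and 'mem'.
    num_slots: number of slots on the machine.
    """
    lines = ["GPU_DETECTED = %s" % ("TRUE" if devices else "FALSE")]
    for slot in range(1, num_slots + 1):
        index = slot - 1  # device indices start at zero, slot IDs at one
        if index < len(devices):
            dev = devices[index]
            lines.append("SLOT%d_HAS_GPU = TRUE" % slot)
            lines.append('SLOT%d_GPU_NAME = "%s"' % (slot, dev["name"]))
            lines.append("SLOT%d_GPU_MEM = %d" % (slot, dev["mem"]))
        else:
            # No device left for this slot.
            lines.append("SLOT%d_HAS_GPU = FALSE" % slot)
    return lines


if __name__ == "__main__":
    # Hypothetical detection result: one GPU on a four-slot machine.
    found = [{"name": "QUADRO FX 580", "mem": 536150016}]
    print("\n".join(gpu_classads(found, 4)))
```

A real detector would fill the device list from the CUDA runtime; this sketch only shows how the per-slot attribute names could be generated.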
> I then add the following to the configuration of the machine with the GPU:
>
> GPU_DETECTED = TRUE
>
> HAS_GPU = GPU_DETECTED && (((SLOT1_HAS_GPU == TRUE) && (SlotID == 1))
> || ... this repeats for each slot
>
> STARTD_ATTRS = $(STARTD_ATTRS), GPU_DETECTED, HAS_GPU
>
> STARTD_CRON_JOBLIST = UPDATEGPUINFO
> STARTD_CRON_UPDATEGPUINFO_EXECUTABLE = /path/to/program/gpudetect
> STARTD_CRON_UPDATEGPUINFO_PERIOD = 1d
> STARTD_CRON_UPDATEGPUINFO_MODE = Periodic
> STARTD_CRON_UPDATEGPUINFO_KILL = True
>
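To make the per-slot logic of HAS_GPU concrete, here is a small Python model of the expression, assuming the elided part repeats the same (SLOTn_HAS_GPU && SlotID == n) term for every slot. The function name has_gpu and the dictionary standing in for the machine ad are hypothetical, and treating a missing attribute as false is a simplification of how ClassAds handle undefined values.

```python
def has_gpu(ads, slot_id, num_slots):
    """Model of: GPU_DETECTED && ((SLOT1_HAS_GPU && SlotID == 1) || ...)."""
    if not ads.get("GPU_DETECTED", False):
        return False
    # Every slot sees all SLOTn_* attributes; only the term whose n
    # matches this slot's own SlotID can make the expression true.
    return any(
        ads.get("SLOT%d_HAS_GPU" % n, False) and slot_id == n
        for n in range(1, num_slots + 1)
    )


# The same attributes are advertised by all four slots.
machine_ads = {
    "GPU_DETECTED": True,
    "SLOT1_HAS_GPU": True,
    "SLOT2_HAS_GPU": False,
    "SLOT3_HAS_GPU": False,
    "SLOT4_HAS_GPU": False,
}

if __name__ == "__main__":
    for slot in range(1, 5):
        print("slot%d HAS_GPU = %s" % (slot, has_gpu(machine_ads, slot, 4)))
```

With these attributes only slot 1 evaluates HAS_GPU to true, so a job whose requirements reference HAS_GPU would match that slot alone even though every slot carries the same GPU information.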
> And then I use these in my submission script
>
> +REQUIRES_GPU = True
> requirements = HAS_GPU
>
> I can't say whether this is the best way to do all this, but it
> seems to work for me so far. I'm still testing.
>
>
> On Mon, Nov 1, 2010 at 2:54 PM, Burnett, Ben <ben.burnett@xxxxxxxx> wrote:
>> Mind sharing what you came up with? I'd be interested in seeing the details.
>>
>> -B
>>
>> On 2010-11-01, at 10:25 AM, Michael Di Domenico wrote:
>>
>>> Thanks, I managed (with help) to get the system up to the point where
>>> each slot advertises all the same GPU information (derived from a
>>> script), but uses a SLOT_ classad and requirements expression to
>>> determine whether a job should run or not.
>>>
>>> On Fri, Oct 29, 2010 at 8:14 PM, Burnett, Ben <ben.burnett@xxxxxxxx> wrote:
>>>> If the single slot pattern I mentioned before does not suit your needs, then you could do something like this:
>>>>
>>>> 1) create one "GPU" slot per GPU device;
>>>> 2) continue to populate all the slot ads with the GPU information;
>>>> 3) modify your application to take a GPU device number as a parameter, but pass it the slot number;
>>>> 4) use the cudaSetDevice() in your application to tell CUDA to only use that GPU.
>>>>
>>>> Just a thought.
>>>>
>>>> -B
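A minimal sketch of steps 3 and 4 above, assuming slot N is paired with CUDA device N-1 (slot IDs start at one, device indices at zero). Instead of calling cudaSetDevice() inside the application, this variant restricts the job through the CUDA_VISIBLE_DEVICES environment variable, which is a different technique from the one suggested. The helper names device_for_slot and env_for_slot are hypothetical.

```python
import os


def device_for_slot(slot_id):
    """Map a 1-based slot ID to a 0-based CUDA device index."""
    if slot_id < 1:
        raise ValueError("slot IDs start at 1, got %r" % (slot_id,))
    return slot_id - 1


def env_for_slot(slot_id, base_env=None):
    """Return a copy of the environment that exposes only this slot's GPU."""
    env = dict(os.environ if base_env is None else base_env)
    # With a single visible device, the application sees it as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(device_for_slot(slot_id))
    return env


if __name__ == "__main__":
    print(env_for_slot(2, {})["CUDA_VISIBLE_DEVICES"])
```

A wrapper script could pass the returned environment to the real job, for example via subprocess.call(cmd, env=env_for_slot(slot_id)), where cmd is whatever command the job runs.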
>>>>
>>>> On 2010-10-29, at 11:31 AM, Michael Di Domenico wrote:
>>>>
>>>>> I'm trying to update the ClassAds from cron, but I only want to add
>>>>> ClassAds from the cron job to a specific slot. Is there a mechanism or
>>>>> ClassAd notation that I'm missing that would allow me to do this?
>>>>>
>>>>> Currently, when my cron job runs, it outputs the ClassAds, but then
>>>>> those ClassAds are sent for all four slots in my server.
>>>>>
>>>>> I'm trying to register CUDA-capable devices in Condor's startd
>>>>> attrs; perhaps there's a better way?
>>>>> _______________________________________________
>>>>> Condor-users mailing list
>>>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>>>> subject: Unsubscribe
>>>>> You can also unsubscribe by visiting
>>>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>>>
>>>>> The archives can be found at:
>>>>> https://lists.cs.wisc.edu/archive/condor-users/
>>>>