
Re: [Condor-users] GPU and condor?



On Jan 7, 2010, at 9:38 AM, Miron Livny wrote:

To all GPUers out there,

We would be very interested in hearing from you what Condor can do to help you in managing GPU clusters. So far we did not find much we can offer in this space. Any guidance you can provide will be most welcomed.
Miron

Hi,

We've done work helping customers set up policies that enable GPU scheduling. Our approach has been to set attributes on GPU-specific jobs and slot types, and to require that the attribute be set before a job can match a GPU-specific slot. Condor handles the scheduling gracefully given this setup.
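To make the matching idea concrete, here is a toy Python model of it. This is only a sketch and not Condor's ClassAd evaluator: the attribute names NeedsGPU and HasGPU, the slot names, and the rule that GPU slots turn away non-GPU jobs are all assumptions chosen for illustration.

```python
# Toy model of attribute-based matching; NOT Condor's ClassAd evaluator.
# The attribute names (NeedsGPU, HasGPU) are hypothetical.

def job_accepts_slot(job, slot):
    """A GPU job requires a slot advertising HasGPU; other jobs accept any slot."""
    if job.get("NeedsGPU", False):
        return slot.get("HasGPU", False)
    return True

def slot_accepts_job(slot, job):
    """A GPU slot only runs jobs that set NeedsGPU; other slots accept any job."""
    if slot.get("HasGPU", False):
        return job.get("NeedsGPU", False)
    return True

def matches(job, slot):
    """Both sides must agree, as in two-way matchmaking."""
    return job_accepts_slot(job, slot) and slot_accepts_job(slot, job)

def eligible_slots(job, slots):
    """Names of the slots this job could be matched to."""
    return [s["Name"] for s in slots if matches(job, s)]

slots = [
    {"Name": "slot1@node01", "HasGPU": True},   # GPU slot type
    {"Name": "slot2@node01", "HasGPU": False},  # ordinary CPU slot
]
gpu_job = {"Cmd": "cuda_sim", "NeedsGPU": True}
cpu_job = {"Cmd": "render"}  # attribute unset, treated as False

print(eligible_slots(gpu_job, slots))  # ['slot1@node01']
print(eligible_slots(cpu_job, slots))  # ['slot2@node01']
```

In a real pool the same two conditions would live in the job's requirements and in the slot's START policy rather than in Python.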
Most of the work relates to policies. It would be great to get information about the presence of a GPU, its model, and its utilization, but we're not aware of any standard way to do this across GPU vendors and models. For a dedicated cluster, model-specific scripts can be created to advertise this information in the slot ads using Hawkeye/STARTD_CRON. Condor could help by offering concurrency limits for an individual host (e.g. this machine has a GPU_Limit=2 because it has only 2 GPUs), or by making dynamic slots more configurable.
Because of the difficulties with automatic detection and telemetry, using pre-created policies seems to work well.
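As a rough illustration of the kind of model-specific script meant above, here is a minimal Python sketch of a probe that a STARTD_CRON job could run. It relies on the startd merging "Attribute = value" lines printed on stdout into the slot ad. Everything else is an assumption: the attribute names HasGPU, GPUCount and GPUModel are made up, and detect_gpus() is a placeholder that reads a hypothetical GPU_MODELS environment variable, since real detection is vendor-specific.

```python
import os

def detect_gpus():
    """Placeholder for vendor-specific detection (an assumption, not a real API).

    Reads a hypothetical GPU_MODELS environment variable holding a
    comma-separated list of models, e.g. "Tesla C1060,Tesla C1060".
    """
    raw = os.environ.get("GPU_MODELS", "")
    return [m.strip() for m in raw.split(",") if m.strip()]

def classad_lines(models):
    """Format the detected GPUs as 'Attribute = value' lines for the slot ad."""
    lines = [
        "HasGPU = %s" % ("True" if models else "False"),
        "GPUCount = %d" % len(models),
    ]
    if models:
        # Report the first model only; drop double quotes so the
        # string literal stays well-formed.
        lines.append('GPUModel = "%s"' % models[0].replace('"', ""))
    return lines

if __name__ == "__main__":
    print("\n".join(classad_lines(detect_gpus())))
```

On a dedicated cluster where the hardware is known in advance, the placeholder could simply return a fixed list per host, which is close to the pre-created-policy approach.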
Cheers,

-Ian
--
===================================
Ian D. Alderman
office: 608.554.4605
main: 888.292.5320

Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com