Hi,

I'm looking into the appropriate defragmentation setup for our grid cluster, but it looks like the defrag daemon is solving a different problem than the one we have.

What we want: if there is a large supply of "multicore" (usually 8 cores, sometimes 4) jobs waiting and these are winning in the quota / priority dance, we want to have a lot of multicore slots, where a lot could be up to half the cluster.

The way we did this with Torque is to have nodes dynamically moved into a "multicore pool" that only accepted multicore jobs. When multicore jobs were not winning the quota / priority dance, or the supply of waiting multicore jobs had dried up, we moved nodes out of the multicore pool.

I've experimented with a START expression

    START = (TARGET.RequestCpus > 4)

and this basically has the effect of putting a job in the multicore pool. However, the mechanism for putting the nodes into and out of the pool, I'd have to build that, AFAICT.

Draining / defrag seems to be the native HTCondor mechanism to deal with this. However, this seems to solve a different problem - ensuring that there are some large slots for the occasional "starved" multicore job. Not for ensuring that half the farm is located in large slots. It seems even that this wouldn't be possible to do with defrag, as the "whole node" is per startd, so our 256-slot nodes would have one single defragmented slot - we'll never make it to half the farm.

Also, it seems difficult to organise the defrag; if there are no waiting multicore jobs, we don't want draining, but if there are a lot waiting and few multicore slots, we want aggressive draining; this control seems to be missing.

How do others approach this? Is there some key concept I've misunderstood or missed?

J "will mcfloat ride again" T
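P.S. To make the missing control concrete, here is a rough sketch of the sort of pool-sizing logic I'd otherwise have to build myself. It is only an illustration under stated assumptions: the function names, the `max_step` rate limit, and the idea of sizing the pool purely from the number of waiting multicore jobs are all hypothetical, and nothing here is an HTCondor API. The inputs (waiting job count, whether multicore is winning the priority dance) would have to come from somewhere else.

```python
import math


def target_pool_size(total_nodes, waiting_mc_jobs, mc_winning,
                     mc_slots_per_node, max_fraction=0.5):
    """Return how many nodes we would like in the multicore pool.

    Hypothetical helper: sizes the pool from waiting multicore jobs only,
    capped at max_fraction of the cluster (half the farm by default).
    """
    # No demand, or multicore is losing the quota / priority dance:
    # we want the pool to shrink away entirely.
    if not mc_winning or waiting_mc_jobs <= 0:
        return 0
    needed = math.ceil(waiting_mc_jobs / mc_slots_per_node)
    cap = int(total_nodes * max_fraction)
    return min(needed, cap)


def nodes_to_move(current_pool_size, target, max_step=2):
    """Signed number of nodes to move this cycle.

    Positive means move nodes into the pool, negative means move them out.
    max_step is an assumed rate limit so the pool changes gradually.
    """
    delta = target - current_pool_size
    return max(-max_step, min(max_step, delta))
```

For example, with 100 nodes of 256 cores each and 8-core jobs (32 multicore slots per node), `target_pool_size(100, 5000, True, 32)` is capped at 50 nodes, and `nodes_to_move(10, 50)` would move 2 nodes in this cycle. The "no waiting jobs means no draining, lots waiting means aggressive draining" behaviour would then come from tuning `max_step`.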