Hi Tom,

just replying to email #2 but referencing both. In short, I also like the dynamic approach of #1 better, but I fear it may fall short if, say, a few hundred jobs of the I/O-heavy class are waiting for resources and a 100-core machine comes back online after maintenance (or becomes available after another user removes her jobs). In that scenario the negotiator ranking would not matter; the node would still fill up and then hammer its local disk for hours to days.

On 9/10/20 7:07 PM, tpdownes@xxxxxxxxx wrote:
> A quicker-to-implement way might be to use a custom machine resource:
>
> https://htcondor.readthedocs.io/en/latest/admin-manual/policy-configuration.html?highlight=MACHINE_RESOURCE_NAMES#dividing-system-resources-in-multi-core-machines
>
> and then direct the user to explicitly consume that resource.

The IMHO clear disadvantage of this is that it requires a full startd restart to update the slot configuration, making it a pretty worrisome configuration change to roll out across a pool. On the other hand, one could predefine a number of virtual resources per machine and tell users to consume these. Besides cluttering the slot definitions with virtA, virtB, virtC, ..., users may by accident try to use the same virtual resource because they chose the same letter based on their first names ;-).

Right now, I think I like the simplicity of the latter approach more, even though it may (and according to Murphy, will) break down sooner or later. But I need to think more about it.
Cheers

Carsten

--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185