Hi again,

somewhat related to my earlier question about defrag: we currently face the problem that we have a large number of jobs, ranging from single-core to many cores, with runtimes from a few minutes to more than a week. Over time the pslots become so fragmented that even medium-sized job requests are starved for hours to days.

As most of our jobs have problems checkpointing (too-large memory footprints usually take too long to write to disk, ...), I would like to avoid full-scale preemption for now.

Looking through the FAQ and recipes, it seems I could (ab)use dynamic group quotas and just create a group per user with a guaranteed quota fraction of the pool, including overflow, so that only a small subset of jobs is evicted when a user wants to fill their quota. However, given that we have quite a range of possible slot weights, I'm not sure how Condor would attribute quotas to user groups [1].

Is this direction a possibility, or are there better methods to get users a foot in the door quickly?

Cheers

Carsten

[1] condor_status -af SlotWeight | sort | uniq -c | sort -g
      1 48
      1 64
      1 7
      3 12
      5 128
      5 15
      5 4
      6 1
     19 8
    238 32
    380 0
   2596 16

--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185
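PS: for concreteness, the direction I have in mind would look roughly like the sketch below in the negotiator configuration. The group names and the 2% fraction are invented placeholders, not a tested setup:

```
# Hedged sketch of per-user dynamic group quotas (placeholder names/fractions).
GROUP_NAMES = group_carsten, group_alice

# Guaranteed fraction of the pool per user group:
GROUP_QUOTA_DYNAMIC_group_carsten = 0.02
GROUP_QUOTA_DYNAMIC_group_alice = 0.02

# Allow groups to overflow beyond their quota when the pool has surplus:
GROUP_ACCEPT_SURPLUS = True

# Optionally let group jobs also negotiate against the unallocated remainder:
GROUP_AUTOREGROUP = True
```

The open question remains how the quota fraction is accounted against our very heterogeneous SlotWeight values.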
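PPS: assuming (and this is exactly the assumption I'd like confirmed) that group quotas are accounted in units of SlotWeight, the back-of-the-envelope arithmetic from the histogram in [1] would be:

```python
# Hedged sketch: translate a fractional group quota into SlotWeight units,
# ASSUMING quotas are accounted against total pool SlotWeight (unverified).
# Pairs are (count, SlotWeight) from the condor_status output in footnote [1].
histogram = [(1, 48), (1, 64), (1, 7), (3, 12), (5, 128), (5, 15),
             (5, 4), (6, 1), (19, 8), (238, 32), (380, 0), (2596, 16)]

# Total pool weight is the count-weighted sum over the histogram.
total_weight = sum(count * weight for count, weight in histogram)
print(total_weight)  # 50200

# A hypothetical 2% per-user dynamic quota in SlotWeight units:
quota_fraction = 0.02
print(quota_fraction * total_weight)  # 1004.0
```

So a 2% quota would correspond to roughly 1000 units of SlotWeight here, but whether the negotiator actually divides the pool that way with mixed weights is what I'm unsure about.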