I want to allocate 90% of the cores to jobs that run indefinitely, possibly for weeks. I would also like to allocate the remaining cores to short-lived jobs that run for 10 minutes at most; I can hold a job if it runs longer than 10 minutes. However, I am not sure how to enforce the ratio. A short job may also run on the long-running cores (the 90%). I believe I can do this with concurrency limits, but is there a way to force a concurrency limit, or at least default to a certain one? Or is there a better way to do this?
I would expect that you could force all jobs to specify a concurrency limit by using submit requirements, or add a default to all jobs by using job transforms (see https://htcondor.readthedocs.io/en/latest/admin-manual/policy-configuration.html#submit-requirements and https://htcondor.readthedocs.io/en/latest/admin-manual/policy-configuration.html#job-transforms, respectively). If you have enough jobs, and the jobs are all the same size, you could maintain the ratio you want simply by setting the concurrency limits in that ratio (for example, 576 "long" to 64 "short" on a 640-core pool, which is 90% to 10%).
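A minimal sketch of that setup, assuming a 640-core pool and a hypothetical job attribute LongJob that users set (+LongJob = True in the submit file) to opt into the long pool; the limit names "long" and "short" are also just illustrative. The limits go in the central manager's configuration and the transforms in the schedd's configuration. Because the transforms use SET, they overwrite whatever ConcurrencyLimits the job was submitted with, so every job ends up with exactly one of the two limits.

```
# Central manager (negotiator) config: 90% / 10% of an assumed 640 cores
LONG_LIMIT = 576
SHORT_LIMIT = 64

# Schedd config: assign every job to one of the two limits.
# LongJob is a hypothetical opt-in attribute; anything without it is short.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) MarkLong MarkShort

JOB_TRANSFORM_MarkLong @=end
   REQUIREMENTS LongJob =?= true
   SET ConcurrencyLimits "long"
@end

# ShortJob is a marker attribute that a periodic hold expression can test
JOB_TRANSFORM_MarkShort @=end
   REQUIREMENTS LongJob =!= true
   SET ConcurrencyLimits "short"
   SET ShortJob true
@end
```

If you would rather only supply a default and let users choose their own limit, DEFAULT ConcurrencyLimits "short" in place of SET only fills in the attribute when the job has not already defined it.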
If your pool's membership is generally fixed and it has enough machines, you could enforce the ratio for jobs requesting multiple cores by splitting up the machines: for every ten machines, nine run only "long" jobs and one runs only "short" jobs. Of course, this proportion won't be maintained if one of the machines stops working properly.
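A minimal sketch of that split as local configuration on the execute machines. It assumes short jobs carry a ShortJob = True attribute (for example, added by a job transform); that attribute name is illustrative. Setting START this way replaces any START policy the machines already have, so merge it into your existing expression if there is one, and run condor_reconfig on each machine afterwards.

```
# On each of the nine "long" execute machines: refuse jobs marked short
START = (TARGET.ShortJob =!= true)

# On the one "short" execute machine (use instead of the line above):
# accept only jobs marked short
START = (TARGET.ShortJob =?= true)
```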
If your jobs request multiple CPUs, you could probably use submit requirements and transforms to require that each job request as many "long" tokens (or "short" tokens, as appropriate) as it requests CPUs. (This could lead to single-CPU jobs dominating the mix of jobs; I don't know what can be done about that.)
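A sketch of the per-CPU variant, again assuming a hypothetical LongJob opt-in attribute and illustrative limit names. A concurrency limit can be written as name:count, so building the string from RequestCpus charges each job one token per requested CPU; LONG_LIMIT and SHORT_LIMIT then count CPUs rather than jobs. These schedd transforms overwrite any ConcurrencyLimits the job was submitted with.

```
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) LongTokens ShortTokens

# Long jobs: one "long" token per requested CPU, e.g. "long:8"
JOB_TRANSFORM_LongTokens @=end
   REQUIREMENTS LongJob =?= true
   EVALSET ConcurrencyLimits strcat("long:", RequestCpus)
@end

# Everything else: one "short" token per CPU, and mark the job as short
JOB_TRANSFORM_ShortTokens @=end
   REQUIREMENTS LongJob =!= true
   EVALSET ConcurrencyLimits strcat("short:", RequestCpus)
   SET ShortJob true
@end
```

EVALSET evaluates the expression against the job ad before storing it, so a job with request_cpus = 8 ends up with ConcurrencyLimits = "long:8" or "short:8".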
If you want to put short jobs on hold after ten minutes, it's probably easiest to do that with a system periodic hold expression:
https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.html#SYSTEM_PERIODIC_HOLD%20and%20SYSTEM_PERIODIC_HOLD_%3CName%3E

You can add an attribute (ShortJob = True) to short jobs in the submit transform, which makes the hold expression easier to write. Something like:
    SYSTEM_PERIODIC_HOLD_NAMES = $(SYSTEM_PERIODIC_HOLD_NAMES) SHORT_JOB_NOT_SHORT
    SYSTEM_PERIODIC_HOLD_SHORT_JOB_NOT_SHORT = (ShortJob =?= True) && (JobStatus == 2) && ((time() - EnteredCurrentStatus) > 600)
    SYSTEM_PERIODIC_HOLD_SHORT_JOB_NOT_SHORT_REASON = "Your short job ran for more than ten minutes. Try resubmitting it with LongJob = True"

You can then use EXTENDED_SUBMIT_COMMANDS (https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.html#EXTENDED_SUBMIT_COMMANDS) to let users write "LongJob" in their submit files without the leading "+".

-- ToddM