Thanks Todd,
After your input, we understand that it is not the matching rate that is hurting us, but the job start rate.
But first let me describe our workload:
Our typical use case is a burst submission of ~10K jobs every ~30 minutes.
Our job runtime histogram per 30-minute burst looks something like:
~0.8K more than 30 minutes
~3.5K 10-30 minutes
~2K 3-10 minutes
~2K less than 3 minutes
We are using the default CLAIM_WORKLIFE = 1200.
What we are seeing is that we have more matches than job starts.
I suspect that the reason matched jobs don't start is related to this StartLog message: "Partitionable slot can't be split to allocate a dynamic slot large enough for the claim".
I followed 5 matches made to a specific startd in one negotiation cycle, and I see that two jobs started and the other 3 got that message.
Do you think it's relevant?
Do you have any other explanation for why job starts don't surpass matches?
Additionally, what is the best way to check the effectiveness of the claim mechanism? We are currently monitoring it by comparing, across all schedulers, the JobsExited rate vs. the JobsExitedAndClaimClosing rate. Is that reasonable?
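As an illustration of the comparison described above, here is a minimal Python sketch. It assumes that JobsExitedAndClaimClosing counts exited jobs whose claim was no longer accepting new jobs, so the remainder approximates jobs that left a reusable claim behind. The function name and the sample numbers are hypothetical, not taken from our pool.

```python
def claim_reuse_fraction(jobs_exited: int, jobs_exited_and_claim_closing: int) -> float:
    """Approximate fraction of exited jobs whose claim stayed open for reuse.

    Assumption: both arguments are counter deltas over the same sampling
    interval, summed over all schedds.
    """
    if jobs_exited <= 0:
        raise ValueError("jobs_exited must be positive")
    if not 0 <= jobs_exited_and_claim_closing <= jobs_exited:
        raise ValueError("closing count must be between 0 and jobs_exited")
    return 1.0 - jobs_exited_and_claim_closing / jobs_exited


# Hypothetical per-schedd samples: (JobsExited, JobsExitedAndClaimClosing)
samples = [(1200, 300), (900, 450)]
exited = sum(e for e, _ in samples)
closing = sum(c for _, c in samples)
print(f"claim reuse fraction: {claim_reuse_fraction(exited, closing):.2%}")
```

A value close to 1 would suggest most jobs finish on claims that can take another job; a value close to 0 would suggest most claims are closing as jobs exit.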
Again many thanks for your help,
Zohar
On 1/14/2021 10:19 AM, Zohar Kol wrote:
Hello,
Setup description:
Condor version 8.8.9
1.5k physical machines
18.9k cores
~200TB RAM
~50k jobs in queues
4 scheduler machines (16 cores, 128GB RAM each)
Negotiator / Collector runs on a 4-core, 16GB RAM machine.
Condor configuration description:
Accounting groups quotas
Pslot preemption enabled
All partitionable slots are running within Docker universe
Issue description:
We are experiencing a slow job matching rate of ~15 per second when the cluster is ~50% idle.
Can anyone share their tips on how to improve this rate?
Thanks,
Zohar
Hi Zohar,
For most HTC workloads, a matching rate of 15 per second is reasonable because the matching rate is different from the rate at which jobs can start running. When the schedd is given a matching slot by the negotiator, the schedd will keep reusing that slot
for job after job after job without needing any additional matches from the negotiator. The only time a slot needs to be matched again is when a) the slot is preempted, b) the CLAIM_WORKLIFE time has expired (default of 20 minutes), or c) the schedd unclaims
the slot because it no longer has any jobs queued that match the slot. Besides the Manual, some details can be found in this recorded HTCondor Architecture workshop presentation at
https://indico.cern.ch/event/936993/contributions/4022092/
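To make the claim reuse idea concrete, here is a rough Python sketch of how many queued jobs a single claim could start before CLAIM_WORKLIFE stops it from accepting more. It is a simplified model, not HTCondor code: it assumes a new job may start only while the claim's age is below the worklife, that a running job is always allowed to finish, and that there is no overhead between jobs. The function name and the runtimes are hypothetical.

```python
def jobs_started_on_claim(runtimes_s, claim_worklife_s=1200):
    """Count jobs one claim starts before it stops accepting new jobs.

    Simplified model (assumption): a job starts only if the claim has been
    held for less than claim_worklife_s seconds; jobs run back to back.
    """
    elapsed = 0
    started = 0
    for runtime in runtimes_s:
        if elapsed >= claim_worklife_s:
            break  # claim is too old; the next job needs a new match
        started += 1
        elapsed += runtime
    return started


# Hypothetical queues of identical jobs
print(jobs_started_on_claim([180] * 20))   # 3-minute jobs
print(jobs_started_on_claim([1800] * 20))  # 30-minute jobs
```

Under this model, short jobs get many starts per match, while jobs longer than the worklife get one start per match, which is why the start rate can exceed the match rate.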
Thus I am wondering why match time is impacting you... some thoughts:
Are you constantly submitting many short jobs in small bursts? For example, say you are submitting a pile of 30 second jobs every 60 seconds. In this case, perhaps the schedd gets a bunch of matches and claims a bunch of slots, but ends up releasing these
slots before the next pile of jobs is submitted because the queue is empty. And then when the next pile of jobs is submitted 60 seconds later, the schedd has to wait for the negotiator to match slots again. If this is your scenario, you can tell the schedd
to "hold on" to a slot for X seconds even if there are no jobs remaining, in anticipation that more jobs will be submitted soon. To do this, use the "keep_claim_idle" setting in your job submit file - the man page for condor_submit discusses the "keep_claim_idle"
setting here:
https://htcondor.readthedocs.io/en/latest/man-pages/condor_submit.html.
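As a small illustration, the Python sketch below renders a submit description that sets keep_claim_idle. The helper name, the executable, and the values (120 seconds, 100 jobs) are placeholders chosen for the 30-second-job example above, not recommendations; check the man page for the exact semantics before using it.

```python
def render_submit_description(executable, arguments, keep_claim_idle_s, count):
    """Return submit-description text with a keep_claim_idle setting.

    Hypothetical helper: it only formats text and does not talk to HTCondor.
    """
    lines = [
        f"executable = {executable}",
        f"arguments = {arguments}",
        # Ask the schedd to keep the claim for this many seconds after
        # the queue runs out of matching jobs.
        f"keep_claim_idle = {keep_claim_idle_s}",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values: 100 jobs of "sleep 30", claim kept idle up to 120 s
print(render_submit_description("/bin/sleep", "30", 120, 100))
```

The printed text could be saved to a file and passed to condor_submit.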
What is the setting on your pool for CLAIM_WORKLIFE? The default is 1200 seconds (20 minutes). Did you change it to something much smaller?
Is there a lot of slot preemption occurring in your pool?
Hope the above helps
Todd