Hi,
I have 700 machines running startd, and 400 000 identical jobs
submitted to 10 schedds. I have 100 subcollectors on 2 machines, the
main collector on one of these machines and the negotiator on the
other.
All of the jobs are simple scripts:
#!/usr/bin/bash
exit 0
I assigned randomized priorities to the jobs I submitted, because
otherwise the negotiator would go through the schedds sequentially
(first it runs all the jobs from schedd1, then from schedd2, etc.).
I've also set:
USE_GLOBAL_JOB_PRIOS = true
I don't use job arrays or clusters, and I can't consider using them;
this is a constraint.
So, what I do is:
# Turn off dispatching
condor_config_val -neg -rset "NEGOTIATOR_SLOT_CONSTRAINT = False";
condor_reconfig -neg
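# Submit N jobs, one per condor_submit call; RANDOMPRIO ranges from 1 to 100
# (drawn here with bash's $RANDOM, as one possible way to generate it)
for i in $(seq 1 "$N")
do
RANDOMPRIO=$(( RANDOM % 100 + 1 ))
/usr/bin/condor_submit -verbose -append "priority = $RANDOMPRIO" submitfile
done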
# Turn back on dispatching with 400 000 queued jobs on 10 schedds
condor_config_val -neg -runset NEGOTIATOR_SLOT_CONSTRAINT; condor_reconfig -neg
With this procedure, I achieve a negotiation (dispatching) rate of
~10 jobs / sec. Submitting without priorities doesn't change this rate.
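For reference, a minimal sketch of one way such a rate could be measured. This is not necessarily how the figure above was obtained. It assumes that no new jobs are submitted during the measurement, so that every job leaving the idle state was dispatched, and it reads the TotalIdleJobs attribute from the schedd ads. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: estimate the dispatch rate from two samples of the pool-wide
# idle-job count. Assumes no submissions happen during the interval.

# Sum one integer per line from stdin; prints 0 for empty input.
sum_lines() {
    awk '{ s += $1 } END { print s + 0 }'
}

# Pool-wide idle-job count: TotalIdleJobs summed over every schedd ad.
count_idle() {
    condor_status -schedd -format '%d\n' TotalIdleJobs | sum_lines
}

# dispatch_rate IDLE_BEFORE IDLE_AFTER SECONDS -> jobs per second (2 decimals).
dispatch_rate() {
    awk -v b="$1" -v a="$2" -v s="$3" 'BEGIN { printf "%.2f\n", (b - a) / s }'
}

# measure_rate SECONDS: sample, wait, sample again, print the rate.
measure_rate() {
    local interval=$1 before after
    before=$(count_idle)
    sleep "$interval"
    after=$(count_idle)
    dispatch_rate "$before" "$after" "$interval"
}
```

Calling `measure_rate 600` would average over ten minutes. Because schedds only refresh their ads in the collector periodically, a short interval is likely to give a noisy or stale reading.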
[root@condormaster1 condor]# condor_version
$CondorVersion: 8.1.2 Oct 19 2013 BuildID: 189797 $
$CondorPlatform: x86_64_RedHat6 $
My questions:
- Has anybody measured a higher dispatch rate before?
- Is 10 jobs / sec considered a "normal" or "good enough" value
for HTCondor?
- Can I do anything, without touching the source, to increase the
negotiation performance?
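To put the 10 jobs / sec figure in perspective, here is a back-of-the-envelope calculation using only the numbers given in this post. The per-slot figure assumes one slot per machine, which the post does not state, and an even spread of jobs across slots.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope figures from the numbers in this post.
jobs=400000      # queued jobs
rate=10          # observed dispatch rate, jobs per second
machines=700     # machines running startd

# Time to dispatch the whole queue at the observed rate.
drain_seconds=$(( jobs / rate ))
drain_hours=$(awk -v s="$drain_seconds" 'BEGIN { printf "%.1f", s / 3600 }')

# Average gap between two job starts on the same slot,
# ASSUMING one slot per machine (not stated in the post).
seconds_per_slot=$(( machines / rate ))

echo "queue drains in ${drain_seconds} s (~${drain_hours} h)"
echo "each slot receives a new job every ${seconds_per_slot} s on average"
```

Under those assumptions the queue needs about 11 hours to dispatch, and since each job exits immediately, a slot would sit idle for most of the roughly 70 seconds between two job starts.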
Thanks,
Daniel
_______________________________________________
HTCondor-users mailing list
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/