Hi,
I was experimenting with HTCondor Concurrency Limits, and I have this line
NetworkBandwidth_LIMIT = 100
in my master config file.
In my job file, I have
concurrency_limits = NetworkBandwidth:25
All seems to work just fine. I now only get four jobs running concurrently on my test cluster, even if I queue 100.
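As a sanity check on the number four, here is a minimal Python sketch of the arithmetic I am assuming the limit follows: each job claims 25 units of a 100-unit limit. The variable names are mine for illustration and are not HTCondor settings.

```python
# Values copied from the config file and the job file above.
network_bandwidth_limit = 100  # NetworkBandwidth_LIMIT = 100
units_per_job = 25             # concurrency_limits = NetworkBandwidth:25

# Assumption: a job only starts if its full 25 units still fit under the limit,
# so the number of jobs that can run at once is the integer quotient.
max_concurrent_jobs = network_bandwidth_limit // units_per_job

print(max_concurrent_jobs)  # 4
```

This matches what I see on the test cluster, so the limit itself appears to be enforced as intended.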
However, I noticed that the running jobs all have staggered start times, as shown in condor_q:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
3975.0 admin 9/20 12:18 0+00:02:36 R 0 97.7 sample_load.a 1 0
3976.0 admin 9/20 12:18 0+00:02:16 R 0 97.7 sample_load.a 1 0
3977.0 admin 9/20 12:18 0+00:01:16 R 0 0.1 sample_load.a 1 0
3978.0 admin 9/20 12:18 0+00:00:16 R 0 0.1 sample_load.a 1 0
Compare this to what happens when I don't enforce any concurrency_limits:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
3979.0 admin 9/20 12:27 0+00:00:50 R 0 0.1 sample_load.a 1 0
3980.0 admin 9/20 12:27 0+00:00:50 R 0 0.1 sample_load.a 1 0
3981.0 admin 9/20 12:27 0+00:00:50 R 0 0.1 sample_load.a 1 0
3982.0 admin 9/20 12:28 0+00:00:50 R 0 0.1 sample_load.a 1 0
This tells me that when a concurrency limit is enabled, Condor matches one job at a time, and the matchmaking cycle takes something like 20 seconds to 1 minute. In our production cluster we need to push through 100k jobs in a day. Obviously, matching one job per minute is not very scalable, so I am wondering if I have done anything wrong here.
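To put rough numbers on the scalability concern, here is a small Python sketch. It assumes job starts would need to be spread evenly over the day, and it uses the 20 second and 1 minute gaps between start times seen in the condor_q output above. The variable names are only illustrative.

```python
# Target from our production requirement.
jobs_per_day = 100_000
minutes_per_day = 24 * 60

# Assumption: job starts are spread evenly across the day.
required_starts_per_minute = jobs_per_day / minutes_per_day

# Observed gaps between job starts with concurrency limits enabled.
slow_gap_seconds = 60  # one match per minute
fast_gap_seconds = 20  # one match every 20 seconds

slow_starts_per_day = minutes_per_day * 60 // slow_gap_seconds
fast_starts_per_day = minutes_per_day * 60 // fast_gap_seconds

print(f"required starts per minute: {required_starts_per_minute:.1f}")
print(f"starts per day at one per 60 s: {slow_starts_per_day}")
print(f"starts per day at one per 20 s: {fast_starts_per_day}")
print(f"days needed at one per 60 s: {jobs_per_day / slow_starts_per_day:.1f}")
```

Even at the faster observed rate, the number of starts per day falls well short of 100k.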
We are running condor 8.4.7 on Ubuntu 14.04.
Thanks in advance for any help.
Kind Regards
Jason