Hi All,
We have an HTCondor cluster of 200 machines. We need to push a large number of jobs (~50k) through the cluster on a daily basis. Currently all job submissions are done from a single machine. There may be thousands of jobs running concurrently at any given time.
It seems like having a single job submit machine is not the best choice. There are thousands of condor_shadow processes running on the submit machine at the same time and it's becoming a bottleneck. I had a recent incident where the condor_shadow processes running on the submit machine were consuming a high percentage of CPU.
Given there is one condor_shadow per running job on the submit machine, I would like to know if there is a way for condor to automatically distribute job submission throughout the cluster, e.g. use a random condor_schedd for every job?
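
To illustrate what I mean by picking a random schedd per job, here is a rough sketch of a wrapper I could write myself if nothing built-in exists. The host names are hypothetical placeholders, and it assumes each listed machine runs its own condor_schedd and that remote submission with condor_submit -remote is permitted from the submitting host. I have not tried this on our pool.

```python
import random

# Hypothetical schedd host names (placeholders, not our real machines).
# Each one is assumed to run its own condor_schedd.
SCHEDDS = [
    "submit1.example.com",
    "submit2.example.com",
    "submit3.example.com",
]


def build_submit_command(submit_file, schedds=SCHEDDS, rng=random):
    """Return a condor_submit command line aimed at a randomly chosen schedd.

    Uses the -remote option of condor_submit, which submits to the named
    condor_schedd and spools the job's input files to it.
    """
    schedd = rng.choice(schedds)
    return ["condor_submit", "-remote", schedd, submit_file]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_submit_command("job.sub")))
```

The shadows for a job would then run on whichever machine hosts the chosen schedd rather than all on one box. A random pick only evens out the load on average; I would prefer something that condor handles itself.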
Thanks
Jason