Hi all,
I have written a load balancer for multiple APs so that I can choose the least loaded AP and distribute load evenly.
I take Total Jobs and RecentDaemonCoreDutyCycle into consideration; I get this info from the condor_status -schedd command.
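A minimal sketch of this kind of selection, not the actual balancer code. It assumes each AP's schedd ad has already been fetched and reduced to a plain dict, that "Total Jobs" corresponds to an attribute named TotalJobAds, and that the weights and the MAX_JOBS cap are hypothetical values chosen only for illustration:

```python
from typing import Dict, List

# Hypothetical tuning values, not taken from the real balancer.
MAX_JOBS = 20000      # job count treated as "fully loaded"
JOB_WEIGHT = 0.5      # weight of the job-count term
DUTY_WEIGHT = 0.5     # weight of the duty-cycle term


def load_score(ad: Dict) -> float:
    """Combine job count and duty cycle into one score in [0, 1]."""
    # Assumed attribute name for "Total Jobs"; missing values count as 0.
    jobs = min(ad.get("TotalJobAds", 0) / MAX_JOBS, 1.0)
    # Clamp the duty cycle to [0, 1] in case of odd values.
    duty = min(max(ad.get("RecentDaemonCoreDutyCycle", 0.0), 0.0), 1.0)
    return JOB_WEIGHT * jobs + DUTY_WEIGHT * duty


def pick_least_loaded(ads: List[Dict]) -> str:
    """Return the Name of the AP with the lowest load score."""
    if not ads:
        raise ValueError("no AP ads to choose from")
    return min(ads, key=load_score)["Name"]


if __name__ == "__main__":
    sample = [
        {"Name": "ap1", "TotalJobAds": 1000, "RecentDaemonCoreDutyCycle": 0.9},
        {"Name": "ap2", "TotalJobAds": 5000, "RecentDaemonCoreDutyCycle": 0.2},
    ]
    print(pick_least_loaded(sample))
```

Note that a score like this only sees jobs that already appear in the schedd ad, which is exactly the limitation described in problem 1.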
I am facing two major problems:
1. I submit jobs with max_idle 300 and a maximum of 2000 jobs per cluster. The jobs still in the factory are not visible in condor_status, so I end up oversubscribing an AP, which leads to slowness. Is there any way to get the total number of jobs (including those still in the factory) without using condor_q? condor_q gets stuck a lot, and I am removing the dependency on it to reduce load.
2. If an AP has a high RecentDaemonCoreDutyCycle, I am not able to debug why this is happening.
Are there any other factors I should be considering for the load balancer?
Thanks and Regards,
Raman