Hi Lee,
HTCondor will fill a machine before moving on to the next one. However, if you want to spread your jobs across the entire cluster, you can easily change this behavior. Just read the following page on our wiki:
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToFillPoolBreadthFirst
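In short, the recipe is a negotiator-side rank expression that prefers emptier machines over fuller ones. Here's a minimal sketch of the idea, not necessarily the exact expression on that page (use the wiki's recommended one):

    # condor_config on the central manager: rank candidate slots so the
    # machine with the most unclaimed CPUs is matched first, which spreads
    # jobs breadth-first instead of filling one machine at a time.
    NEGOTIATOR_PRE_JOB_RANK = Cpus

Then run condor_reconfig on the central manager so the negotiator picks up the change.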
...Tim
I suspect I'm missing something fundamental, but it's the end of the workday and my brain is done.
I have a 6-host cluster. The hosts are mostly the same: they're all VMs running the same OS, configured identically (configuration management via Puppet*), and they all have the same NFS mount access to the data. The only real difference is how much RAM each host has.
Users are submitting jobs, and those jobs keep going to the two busiest nodes in the cluster instead of being spread around. I've just tested this myself and see the same behavior.
When I put requirements = (name of an idle host) in the submit file, the job goes to the idle host with no problems. However, if no hostname requirement is set, the jobs keep going to the same busy hosts. Oddly, the busiest hosts are the ones with the least available RAM overall.
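For reference, the pinning was just something like this in the submit file (the hostname here is made up):

    executable = myjob.sh
    # pin the job to one specific (idle) execute host
    requirements = (Machine == "omics2.example.org")
    queue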
I was pretty sure Condor should be doing a better job of balancing the load. What am I missing here?
; condor_status
Name OpSys Arch State Activity
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle
slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy
slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy
slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle
slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy
slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy
slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy
slot1_5@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy
slot1_6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy
slot1_7@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle
Machines Owner Claimed Unclaimed Matched Preempting Drain
X86_64/LINUX 15 0 9 6 0 0 0
Total 15 0 9 6 0 0 0
; ssh chrusm0 uptime ; ssh chrusm1 uptime ; ssh chrulg0 uptime ; ssh omics0 uptime ; ssh omics1 uptime ; ssh omics2 uptime
16:24:15 up 21 days, 40 min, 10 users, load average: 12.07, 13.56, 15.00
16:24:16 up 20 days, 23:34, 0 users, load average: 7.20, 7.20, 7.15
16:24:16 up 21 days, 40 min, 5 users, load average: 0.00, 0.02, 0.11
16:24:17 up 76 days, 4:58, 0 users, load average: 0.02, 1.53, 2.91
16:24:18 up 76 days, 4:57, 0 users, load average: 0.00, 0.40, 1.14
16:24:18 up 76 days, 4:55, 0 users, load average: 0.00, 0.01, 0.10
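For reference, the per-slot memory and load figures that Condor itself sees can be pulled with condor_status's autoformat option (output omitted here):

; condor_status -autoformat Machine State Memory LoadAvg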
thanks,
nomad
* - this is a different lab from the one I emailed about last week: different hosts and a different configuration management system.
--
Tim Theisen
Release Manager, HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736