Hi All,
I've set up an HTCondor cluster running 8.6.5.412177 in a VMware environment. Previously, my two execute nodes had 4 vCPUs each and our openmpi jobs ran just fine.
Last week, I brought the execute nodes down and doubled the vCPU count on each node (from 4 to 8). After that, all of our openmpi jobs would run on only one node in the cluster. Even if I requested all 16 available slots, every process would run on the same machine.
When I reverted to 4 vCPU on the execute machines, our openmpi jobs worked again as expected.
Has anyone seen anything like this before?
Thanks,
Jeremy