On 03/14/2018 08:18 PM, Alex Nitz wrote:
Note that the reason condor sets OMP_NUM_THREADS isn't primarily to ensure that CPU usage doesn't exceed the requested usage; the cgroup cpu shares option we set enforces that. And there's nothing condor can do to prevent a job from changing this environment variable after condor has spawned the job.

However, we see a large number of applications linked with OpenMP, often, in the case of third-party code, without the user knowing OpenMP is under the hood. The default for OpenMP is to spawn as many threads as cores detected, and this, combined with the CPU limiting, causes massive code slowdowns. The machine wasn't overloaded, and neighboring jobs on the same machine weren't affected, but when there are 32 threads competing for one core, we could see orders-of-magnitude performance impacts.

I think that trusting the user when OMP_NUM_THREADS is set explicitly in the condor submit file is an appropriate approach.

-greg
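To make the proposed policy concrete, here is a rough Python sketch of the decision, not condor's actual code. The function name `choose_omp_threads` and its arguments are hypothetical, and it assumes the default value is the job's requested core count, which the message above does not spell out.

```python
def choose_omp_threads(submit_env, request_cpus):
    """Return the OMP_NUM_THREADS value a job would start with.

    submit_env:   dict of environment variables the user set explicitly
                  in the submit file (hypothetical representation).
    request_cpus: number of cores the job requested.
    """
    explicit = submit_env.get("OMP_NUM_THREADS")
    if explicit is not None:
        # The user set it explicitly in the submit file: trust that value.
        return str(explicit)
    # Assumption: otherwise default to the requested core count, so an
    # OpenMP runtime does not start one thread per detected core.
    return str(request_cpus)


# A one-core job with no explicit setting starts with one thread;
# an explicit setting is passed through unchanged.
default_value = choose_omp_threads({}, 1)
explicit_value = choose_omp_threads({"OMP_NUM_THREADS": "8"}, 1)
```

Either way the cgroup limit still caps actual CPU usage; the variable only keeps the thread count from badly exceeding the cores the job can use.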