
Re: [HTCondor-users] Help needed understanding cpu core usage with cgroups



On 04/10/2015 09:19 AM, Roderick Johnstone wrote:
> Hi
>
> I have a condor job with 5 threads (4 of them CPU-bound) running with
> request_cpus = 2 in the submit file.
> When I have 2 foreground (Owner) jobs running at 100% CPU, the condor
> job only gets the equivalent of 1 CPU shared between its threads.
> I'm measuring this by looking at the aggregate nice CPU percentage in
> the output of top, which is 25% (the condor jobs are niced to 16 while
> the foreground jobs run at nice 0). This is confirmed by the CPU
> percentages of the condor job's threads adding up to approx. 100%,
> indicating that only one core is being used.
> From the wiki page above, I was expecting the condor job to get
> 2 CPUs rather than 1 under these circumstances. Did I misunderstand
> something here?

With cgroups enabled, HTCondor uses the cgroup "cpu shares" parameter to
limit CPU usage. HTCondor sets the cpu shares of each slot's cgroup to
100 * number_of_cores_assigned_to_the_slot. Because shares are relative
weights rather than hard limits, this works well if the only CPU-bound
activity on the machine comes from HTCondor jobs.

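To make the "relative weights" point concrete, here is a simplified model of how a fixed number of cores might be split among competing cgroups by their shares. It is only a sketch under stated assumptions: the flat weighted max-min split below is a stand-in for the kernel scheduler (it ignores nice values, cgroup hierarchy and scheduling details), and the function names, group names and core counts are hypothetical. Each group's "demand" is the number of CPU-bound threads it has, i.e. the most cores it can actually use.

```python
def slot_cpu_shares(request_cpus: int) -> int:
    """cpu shares for a slot's cgroup: 100 per assigned core (rule described above)."""
    return 100 * request_cpus


def fair_split(total_cores: float, groups: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Weighted max-min split of total_cores among competing groups.

    groups maps a (hypothetical) group name to (shares, demand), where demand is
    the number of CPU-bound threads in the group. Simplified model only.
    """
    alloc = {name: 0.0 for name in groups}
    active = {name for name, (_, demand) in groups.items() if demand > 0}
    remaining = float(total_cores)
    while active and remaining > 1e-12:
        total_weight = sum(groups[n][0] for n in active)
        # Groups whose proportional slice meets or exceeds their demand get capped at it.
        capped = {n for n in active
                  if remaining * groups[n][0] / total_weight >= groups[n][1]}
        if not capped:
            # Every remaining group can absorb its slice: hand out the rest by weight.
            for n in active:
                alloc[n] += remaining * groups[n][0] / total_weight
            remaining = 0.0
        else:
            # Give capped groups exactly their demand, then redistribute the surplus.
            for n in capped:
                alloc[n] = float(groups[n][1])
                remaining -= groups[n][1]
            active -= capped
    return alloc
```

For example, on a hypothetical 4-core machine with two slots of request_cpus = 2, each running four busy threads, `fair_split(4, {"slotA": (200, 4), "slotB": (200, 4)})` gives each slot 2.0 cores. If one of those slots had only one busy thread, the model hands the idle capacity to the other slot (1.0 and 3.0 cores), which reflects that shares only matter under contention and are not a cap.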
When you say "foreground (Owner)" jobs, are these processes running
under HTCondor or not? If not, and they aren't in any cgroup, then I
would expect the behavior you see: their CPU shares are effectively
unlimited, and the condor jobs just get the leftovers.

You could fix this by putting the foreground jobs into their own cgroup,
or by running them as proper condor jobs.

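As a rough illustration of the first option, the sketch below creates a cgroup under the cgroup-v1 cpu controller, sets its cpu.shares, and moves existing processes into it by writing their PIDs to cgroup.procs. Assumptions: a cgroup-v1 system (on cgroup v2 the equivalent knob is cpu.weight and this sketch does not apply as-is), the mount point of the cpu controller (often /sys/fs/cgroup/cpu, but distro-dependent), root privileges, and the group name, PIDs and shares value, which are all hypothetical.

```python
from pathlib import Path


def move_to_cgroup(cpu_root: str, group: str, pids: list[int], shares: int) -> Path:
    """Create a cgroup-v1 cpu cgroup (if missing), set cpu.shares, move pids into it.

    cpu_root is where the v1 cpu controller is mounted (assumption: varies by
    distro). On a real system this must run as root.
    """
    cg = Path(cpu_root) / group
    # On cgroupfs, mkdir creates the group and the kernel populates its control files.
    cg.mkdir(parents=True, exist_ok=True)
    (cg / "cpu.shares").write_text(f"{shares}\n")
    for pid in pids:
        # Writing a PID to cgroup.procs moves that whole process (all threads) into the group.
        with open(cg / "cgroup.procs", "a") as procs:
            procs.write(f"{pid}\n")
    return cg
```

A hypothetical call would be `move_to_cgroup("/sys/fs/cgroup/cpu", "foreground", [12345], 1024)`, where 1024 is the cgroup-v1 default for cpu.shares. Whether a given value actually balances these jobs against the condor slots also depends on where HTCondor's own cgroups sit in the hierarchy, so treat the number as a starting point to experiment with.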
> One point I'm not sure about is the first paragraph in Option 2.
> HTCondor is started as root (from init scripts; condor is installed
> from the condor repository rpm) but runs as the condor user. Does
> that count as "condor daemons being started as root"?

If condor is started from init, that counts as "started as root".
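If you want to check this on a running Linux machine, one option is to read the Uid: line of /proc/<pid>/status for the condor_master process (its PID could come from, e.g., `pidof condor_master`). That line lists the real, effective, saved and filesystem UIDs. This is a sketch assuming Linux /proc; the interpretation that a daemon started as root keeps UID 0 in its real or saved UID while its effective UID is switched to the condor user is an assumption about how HTCondor's privilege switching shows up, not something stated in this thread.

```python
def uids_from_status(status_text: str) -> tuple[int, int, int, int]:
    """Return (real, effective, saved, filesystem) UIDs from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            real, eff, saved, fs = (int(x) for x in line.split()[1:5])
            return real, eff, saved, fs
    raise ValueError("no Uid: line found")


def started_as_root(pid: int) -> bool:
    """True if the process still holds UID 0 as its real, effective or saved UID (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        real, eff, saved, _ = uids_from_status(f.read())
    # A saved or real UID of 0 lets a process switch its effective UID back to root.
    return 0 in (real, eff, saved)
```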


-Greg