HTCondor Project List Archives




Re: [Condor-devel] Thoughts on cgroups-enabling Condor



On 01/04/2011 06:37 PM, Brian Bockelman wrote:

On Jan 4, 2011, at 5:00 PM, Alain Roy wrote:

Hi Brian,

I have one small question about cgroups. From the documentation you pointed at, I see:

Any single subsystem (such as cpu) can be attached to at most one hierarchy.
As a consequence, the cpu subsystem can never be attached to two different hierarchies.

Does this mean that if Condor uses (for example) the cpuacct subsystem to do process accounting, it will be hard or impossible for other programs on the same system to also do process accounting? Is there currently any common usage of cgroups that would interfere with Condor using it?



Each process can be in at most one cgroup per subsystem at a time (note that cgroups can be hierarchical). So you can have membership in /foo and /foo/bar, but not in /foo and /bar, for a given subsystem. A single cgroup can have one or more subsystems associated with it; hence, you can be in /foo for cpuacct and /bar for memory.
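To make the one-cgroup-per-subsystem rule concrete, here is a minimal Python sketch that turns the text of /proc/<pid>/cgroup into a subsystem-to-path map. It assumes the cgroup v1 line layout `hierarchy-ID:subsystem-list:path`; the function name and the sample data are made up for illustration and are not part of Condor.

```python
def parse_proc_cgroup(text):
    """Map each subsystem name to the single cgroup path the process is in."""
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split into at most three fields, since the path may itself contain ':'.
        _hierarchy_id, subsystems, path = line.split(":", 2)
        for subsystem in subsystems.split(","):
            if subsystem:  # skip entries with an empty subsystem list
                membership[subsystem] = path
    return membership


# Hypothetical sample: cpuacct and memory are attached to separate hierarchies,
# so the same process can sit in /foo for one and /bar for the other.
sample = "3:cpuacct:/foo\n2:memory:/bar\n"
membership = parse_proc_cgroup(sample)
print(membership)
```

Because the result is a plain dictionary keyed by subsystem, each subsystem can only ever map to one path, which mirrors the constraint described above.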

So, if Condor wanted to have a single cgroup named "/condor_<jobid>", then we could possibly step on toes. I was thinking we could have Condor use a configurable prefix and place jobs in $PREFIX/condor_<jobid>. That way, I can put all my Condor-based processes in /jobs/condor_<jobid>, but keep my super-critical hadoop daemon processes in /daemons/hadoop.
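A rough Python sketch of the prefix idea follows. The helper names `job_cgroup` and `is_under` are hypothetical, as is the notion that the prefix arrives as a plain string; nothing here reflects an actual Condor configuration knob.

```python
import posixpath


def job_cgroup(prefix, job_id):
    """Build $PREFIX/condor_<jobid> from a (hypothetical) configured prefix."""
    name = "condor_" + str(job_id)
    # Normalize so that "jobs", "/jobs" and "/jobs/" all give the same result.
    return posixpath.join("/", prefix.strip("/"), name)


def is_under(prefix, path):
    """Return True if the cgroup path lies inside the configured prefix."""
    root = "/" + prefix.strip("/")
    return path == root or path.startswith(root.rstrip("/") + "/")


print(job_cgroup("/jobs", 42))
print(is_under("/jobs", "/daemons/hadoop"))
```

With a prefix of /jobs, anything outside that subtree (such as /daemons/hadoop) is left alone by construction.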

I have two reasonable cgroups use cases in mind that would conflict with having Condor use cgroups for process accounting:

1) Have a subset of a Condor process family in one hierarchy and a different subset in a separate hierarchy. Let's say I want to measure all the time spent in cmsRun executables, so I configure the cgred (cgroups rule daemon) service to put all processes named cmsRun in their own cgroup and all other processes somewhere else.
2) Have cgred configured so cgroups are changed based on ownership. For example, put all students in the /students group and all faculty in the /faculty group.

So, you can configure your system such that cgroups are fallible. This is why I propose to use them to strengthen the process accounting while keeping the existing mechanisms. The procd will consider membership in the $PREFIX/condor_<jobid> cgroup as one of many criteria for being in a process family (others are dedicated accounts, tracking GIDs, env var cookies, process parentage).
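The "one of many criteria" idea could look roughly like the Python sketch below. Every type and field name here is invented for illustration, and the matching logic is an assumption about how such a check might be combined; it is not how the procd is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    """Hypothetical snapshot of one process."""
    pid: int
    ppid: int
    uid: int
    gids: frozenset = frozenset()
    env: dict = field(default_factory=dict)
    cgroup: str = ""


@dataclass
class Family:
    """Hypothetical description of one job's process family."""
    root_pid: int
    cgroup: str = ""
    dedicated_uid: int = None
    tracking_gid: int = None
    cookie: tuple = None  # (environment variable name, expected value)
    known_pids: set = field(default_factory=set)


def matched_criteria(proc, fam):
    """List every criterion under which proc appears to belong to fam."""
    hits = []
    if fam.cgroup and proc.cgroup == fam.cgroup:
        hits.append("cgroup")
    if fam.dedicated_uid is not None and proc.uid == fam.dedicated_uid:
        hits.append("dedicated_account")
    if fam.tracking_gid is not None and fam.tracking_gid in proc.gids:
        hits.append("tracking_gid")
    if fam.cookie is not None and proc.env.get(fam.cookie[0]) == fam.cookie[1]:
        hits.append("env_cookie")
    if proc.pid == fam.root_pid or proc.ppid in (fam.known_pids | {fam.root_pid}):
        hits.append("parentage")
    return hits


def in_family(proc, fam):
    # Any single match is enough, so a process that cgred moved to another
    # cgroup can still be caught by one of the older mechanisms.
    return bool(matched_criteria(proc, fam))


fam = Family(root_pid=100, cgroup="/jobs/condor_42", tracking_gid=7001)
moved = Proc(pid=200, ppid=1, uid=1000, gids=frozenset({7001}), cgroup="/students")
print(matched_criteria(moved, fam))
```

In this sketch the process that was reclassified into /students is still matched through its tracking GID, which is the reason for keeping the existing mechanisms alongside cgroups.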

Are these common usages of cgroups?  For interactive nodes possibly, but I'd be somewhat surprised to see them on batch nodes.  Overall though, they're such a new feature that I bet any true conflicting usage is going to be from some use case I've completely overlooked.

Because that's how it always works.

Brian

Indeed true. The facilities for using cgroups are still evolving. There may be some opportunity here for us to explore recommended hierarchies that allow components like Condor and cgred to cooperate without integrating. Though when I last looked at libcgroup, it really seemed to want to own the full hierarchy. It would be worthwhile to look into how a libcgroup-managed system can have a portion of the hierarchy managed by Condor, at least for accounting and life-cycle purposes.

Best,


matt