HTCondor Project List Archives




[Condor-devel] Thoughts on cgroups-enabling Condor



Hi all,

During the break, I was able to think more about cgroups and Condor.  For those unfamiliar with cgroups, I think some of the best comprehensive background documentation is provided by Red Hat:
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/index.html

In short, cgroups are a kernel-level construct which provides the functionality of the condor_procd.  I have two goals:
1) Improved accuracy for process accounting for memory and CPU usage.
2) Improved accuracy for job killing.
Both (1) and (2) can be made essentially 100% accurate, with no need to worry about short-lived processes or clever fork'ers escaping the watchful eye of the procd.  This can all be done without dedicated accounts or GID tracking.  After examining the procd and starter code, I think these are also doable, short-term goals.
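
To illustrate goal (1), here is a rough Python sketch of reading aggregate usage straight from the kernel's per-cgroup counters instead of summing per-process samples. It assumes a cgroup v1 layout in which the cpuacct and memory controllers are mounted together, so both files sit in one directory; the function names and the returned dictionary are hypothetical, not anything in Condor.

```python
import os


def read_int(path):
    """Read a single integer counter from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def read_cgroup_usage(cgroup_dir):
    """Return CPU time and peak memory for every process in the cgroup.

    Assumption: cgroup_dir holds both cpuacct.* and memory.* files
    (controllers co-mounted). cpuacct.usage is reported in nanoseconds.
    """
    cpu_ns = read_int(os.path.join(cgroup_dir, "cpuacct.usage"))
    peak = read_int(os.path.join(cgroup_dir, "memory.max_usage_in_bytes"))
    return {"cpu_seconds": cpu_ns / 1e9, "peak_memory_bytes": peak}
```

Because the kernel charges usage to the cgroup as it happens, a process that starts and exits between two polls is still counted.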

However, I'd like to do this without replacing the procd or even disabling the current functionality (ideally, it would be kept as a fallback if cgroups fail).  My current thinking is to use libcgroup to assist in the cgroup creation and manipulation (adding a new dependency to the cmake build).  The starter (some combination of VanillaProc::StartJob and OsProc::StartJob) would be responsible for creating the cgroup and launching the parent process.  The procd would register the new process family as before, but get a new command for enabling tracking based upon a cgroup's name.  The process family will become associated with the most-specific cgroup of the root PID.  There will be a CGroupTracker somewhat equivalent to the current GroupTracker.
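
One way to picture "the most-specific cgroup of the root PID" is to parse /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controllers:path. The sketch below is in Python for brevity and reads "most-specific" as "deepest path", which is an assumption on my part; the helper names are hypothetical.

```python
def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into (controllers, path) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice so a ':' inside the path is preserved.
        _hier_id, controllers, path = line.split(":", 2)
        entries.append((controllers.split(",") if controllers else [], path))
    return entries


def most_specific_cgroup(text):
    """Return the deepest cgroup path listed, or None if there are none.

    Assumption: "most specific" means the path with the most components.
    """
    entries = parse_proc_cgroup(text)
    if not entries:
        return None

    def depth(entry):
        return len([part for part in entry[1].split("/") if part])

    return max(entries, key=depth)[1]
```

For a root PID, the input would come from reading "/proc/%d/cgroup" % pid.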

When the ProcFamily is associated with a cgroup, the aggregate_usage functions and spree functions will be replaced by their cgroups equivalents (falling back to the current implementations if the cgroups-enabled ones fail).  So, most of the control flow for process startup, monitoring, and shutdown will remain the same; this seems especially important as there's quite a bit of functionality in the procd.
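
A cgroup-based spree could look roughly like the Python sketch below: signal every PID listed in the cgroup's cgroup.procs file and repeat until the list is empty, since a process may fork between the read and the kill. The function names, the max_passes bound, and the injectable kill parameter are all hypothetical; a False return is where the caller would fall back to the existing procd implementation.

```python
import os
import signal


def read_pids(cgroup_dir):
    """Return the PIDs currently listed in the cgroup's cgroup.procs file."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(tok) for tok in f.read().split()]


def cgroup_spree(cgroup_dir, sig=signal.SIGKILL, kill=os.kill, max_passes=10):
    """Signal every process in the cgroup until none remain.

    Returns True once the cgroup is empty, False if processes are still
    present after max_passes (the caller could then use the old code path).
    """
    for _ in range(max_passes):
        pids = read_pids(cgroup_dir)
        if not pids:
            return True
        for pid in pids:
            try:
                kill(pid, sig)
            except ProcessLookupError:
                pass  # the process exited on its own between read and kill
    return not read_pids(cgroup_dir)
```

A real implementation would probably pause between passes or freeze the group first; that is left out here to keep the sketch short.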

If I can get these done, there are some more far-off imaginative goals:
1) cpuset'ing.  Limit the group of processes to a specific CPU.
2) Managing I/O bandwidth for the condor execute directory device.  Prevents a process from affecting others by hitting the disk hard.
3) Private namespaces.  Provide a private PID and/or filesystem namespace.  Jobs running under the same unix account wouldn't be able to kill each other's processes or write into each other's execute directory.  Would also allow a per-job /tmp.  Somewhat deep voodoo, but would allow sites like Purdue to run all jobs as unix user nobody without worrying about the security implications.
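
For items (1) and (2) in this list, the mechanics mostly come down to writing values into cgroup v1 control files. The Python sketch below builds and applies such settings; the helper names, the choice of memory node "0", and the use of st_dev to identify the execute directory's device are assumptions for illustration only. In particular, st_dev may name a partition or a virtual device, while the throttle rule may need the underlying whole disk.

```python
import os


def device_id(path):
    """Return "major:minor" for the device holding path (assumption: st_dev
    identifies a block device the blkio controller will accept)."""
    dev = os.stat(path).st_dev
    return "%d:%d" % (os.major(dev), os.minor(dev))


def build_limits(cpu_index, execute_dir, write_bps):
    """Map control-file names to the values that would be written."""
    return {
        # Pin the job's processes to a single CPU.
        "cpuset.cpus": str(cpu_index),
        # cpuset also needs a memory node before tasks can be attached;
        # node 0 is an assumption.
        "cpuset.mems": "0",
        # Cap write bandwidth to the execute directory's device.
        "blkio.throttle.write_bps_device":
            "%s %d" % (device_id(execute_dir), write_bps),
    }


def apply_limits(cgroup_dir, limits):
    """Write each value into its control file under cgroup_dir."""
    for name, value in limits.items():
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(value)
```

This assumes the cpuset and blkio files are reachable under one directory; with separately mounted controllers, each file would live under its own hierarchy.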

Thoughts?  I'd like to open a ticket to kick this off.

Brian
