Hi all,

During the break, I was able to think more about cgroups and Condor. For those unfamiliar with cgroups, I think some of the best comprehensive background documentation is provided by Red Hat:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/index.html

In short, cgroups are a kernel-level construct which provides the functionality of the condor_procd.

I have two goals:

1) Improved accuracy for process accounting for memory and CPU usage.
2) Improved accuracy for job killing.

Both (1) and (2) can be done to be basically 100% accurate: no need to worry about short-lived processes or clever forkers escaping the watchful eye of the procd. This can all be done without dedicated accounts or GID tracking.

After examining the procd and starter code, I think these are doable, short-term goals. However, I'd like to do this without replacing the procd or even disabling the current functionality (ideally, it should be kept as a fallback if cgroups fail).

My current thinking is to use libcgroup to assist in the cgroup creation and manipulation (adding a new dependency for cmake). The starter (some combination of VanillaProc::StartJob and OsProc::StartJob) would be responsible for creating the cgroup and launching the parent process. The procd would register the new process family as before, but get a new command for enabling tracking based upon a cgroup's name. The process family will become associated with the most-specific cgroup of the root PID. There will be a CGroupTracker somewhat equivalent to the current GroupTracker.

When the ProcFamily is associated with a cgroup, the aggregate_usage functions and spree functions will be replaced by their cgroups equivalents (falling back to the current implementations if the cgroups-enabled one fails). So, most of the control flow for process startup, monitoring, and shutdown will remain the same; this seems especially important as there's quite a bit of functionality in the procd.

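As a rough illustration of the accounting path described above (look up the root PID's cgroup, read usage from the cgroup, fall back to the existing implementation on failure), here is a small Python sketch. It is not Condor code: the function names are hypothetical, and it assumes cgroup v1 with one mount per controller under a common root (e.g. /cgroup/cpuacct and /cgroup/memory). The file names cpuacct.usage and memory.usage_in_bytes are the kernel's own cgroup v1 counters, and the input format is that of /proc/<pid>/cgroup.

```python
import os


def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    `text` is the content of /proc/<pid>/cgroup (cgroup v1), one line per
    hierarchy in the form "hierarchy-id:controller-list:path".
    """
    paths = {}
    for line in text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hierarchy_id, controllers, path = parts
        for controller in controllers.split(","):
            if controller:
                paths[controller] = path
    return paths


def controller_dir(mount_root, controller, cgroup_path):
    """Directory of a cgroup, ASSUMING one mount per controller under mount_root."""
    return os.path.join(mount_root, controller, cgroup_path.lstrip("/"))


def read_counter(directory, filename):
    """Read a single integer counter file from a cgroup directory."""
    with open(os.path.join(directory, filename)) as handle:
        return int(handle.read().strip())


def aggregate_usage(mount_root, paths, fallback):
    """Return (cpu_ns, memory_bytes) read from the cgroup.

    If the cgroup lookup or either read fails, return fallback() instead;
    fallback stands in for the existing per-process accounting.
    """
    try:
        cpu_dir = controller_dir(mount_root, "cpuacct", paths["cpuacct"])
        mem_dir = controller_dir(mount_root, "memory", paths["memory"])
        cpu_ns = read_counter(cpu_dir, "cpuacct.usage")
        memory_bytes = read_counter(mem_dir, "memory.usage_in_bytes")
    except (KeyError, OSError, ValueError):
        return fallback()
    return cpu_ns, memory_bytes
```

A caller would read /proc/<root pid>/cgroup, pass the text to parse_proc_cgroup, and hand the result to aggregate_usage together with a callable that wraps the current snapshot-based accounting. Job killing could follow the same pattern by walking the PIDs listed in the cgroup's tasks file, with the existing signalling code as the fallback.
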
If I can get these done, there are some more far-off, imaginative goals:

1) cpuset'ing. Limit the group of processes to a specific CPU.
2) Managing I/O bandwidth for the condor execute directory device. This prevents a process from affecting others by hitting the disk hard.
3) Private namespaces. Provide a private PID and/or filesystem namespace. Jobs running under the same Unix account wouldn't be able to kill each other's processes or write into each other's execute directory. This would also allow a per-job /tmp. Somewhat deep voodoo, but it would allow sites like Purdue to run all jobs as Unix user nobody without worrying about the security implications.

Thoughts? I'd like to open a ticket to kick this off.

Brian