HTCondor Project List Archives




Re: [Condor-devel] Building a super-procd



Hi Brian -

It seems the procd flaw that your proposal below would address is that on RHEL5 and older systems, the out-of-the-box default configuration of the procd does not always catch all child processes of a job. Is this correct?

On rhel6+ it is no longer an issue thanks to your cgroups contribution. :)

But even on RHEL5 and older, it is only an issue because many site admins don't configure the procd with a small range of unused gids, probably because it is yet one more thing to do at installation. Or the poor busy sysadmin never got around to reading page 611 of the Manual, and so doesn't even know he/she probably wants to do this.

To help convince ourselves that the below is the best approach to the problem, let's consider an alternative: instead of relying on the admin to add something in the config file, the procd could automatically select a small (size 64?) unused gid range via a simple self-contained function that scans through /etc/passwd and /etc/group to build a map of all used gids. A default gid range (or set of ranges) could be documented, and this function would check each default range for a collision and pick the first one that is free, or let the admin know if every default range already has gids in use.
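
To make the idea concrete, here is a rough sketch of such a function. It is written in Python only to keep it short (the procd itself is not Python), and the candidate ranges and function names are made up for illustration; no default ranges have actually been chosen or documented. Note that it only sees gids listed in the local files, so gids served from NIS or LDAP would not be detected.

```python
# Hypothetical candidate ranges of 64 gids each (inclusive bounds).
# These values are placeholders, not documented HTCondor defaults.
DEFAULT_RANGES = [(60000, 60063), (61000, 61063), (62000, 62063)]


def used_gids(passwd_lines, group_lines):
    """Collect every gid referenced in passwd-format and group-format lines."""
    gids = set()
    # passwd: name:passwd:uid:gid:gecos:home:shell -> gid is field 3
    # group:  name:passwd:gid:members              -> gid is field 2
    for lines, index in ((passwd_lines, 3), (group_lines, 2)):
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split(":")
            # Skip malformed or NIS-style entries whose gid is not numeric.
            if len(fields) > index and fields[index].isdigit():
                gids.add(int(fields[index]))
    return gids


def pick_gid_range(gids, candidates=DEFAULT_RANGES):
    """Return the first candidate range containing no used gid, else None."""
    for low, high in candidates:
        if not any(low <= gid <= high for gid in gids):
            return (low, high)
    return None  # caller should tell the admin to configure a range by hand


def pick_from_system(passwd_path="/etc/passwd", group_path="/etc/group"):
    """Scan the local account files and choose a free candidate range."""
    with open(passwd_path) as passwd, open(group_path) as group:
        return pick_gid_range(used_gids(passwd, group))
```

Calling `pick_from_system()` at procd startup would return a `(low, high)` pair, or `None` when every candidate range collides and the admin has to be told.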

Thoughts? Does this address the same flaw as below on older systems, but in a manner portable to any unix and perhaps via a much more self-contained/small change?

Thanks
Todd



On Jul 19, 2011 5:39 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:

Hi all,

The procd is a pretty flawed component out-of-the-box. I would like to invest some of my "night and weekend" time and implement the techniques described here:

http://osgtech.blogspot.com/2011/06/part-ii-keeping-mindful-eye-on-your.html

Basically, on any Linux 2.6 kernel (including the ones with Debian and RHEL5), the procd can subscribe to a feed from the kernel containing all processes spawned on the system. If a fork bomb occurs such that the procd can't keep up with the incoming feed, we are at least able to detect that this occurred and can inform the procd's clients.
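
For readers unfamiliar with the kernel feed, the subscription can be sketched roughly as follows. This is not the code from the blog post and not a proposed procd implementation; it is a minimal Python illustration of the netlink process-events connector. The numeric constants and struct layouts are transcribed by hand from the Linux headers (`linux/netlink.h`, `linux/connector.h`, `linux/cn_proc.h`) and should be checked against the headers on the target system. The function names are invented here. Listening needs root privileges, and the overflow case shows up as `ENOBUFS` on `recv`.

```python
import errno
import os
import socket
import struct

# Constants assumed from the Linux headers; verify on the target kernel.
NETLINK_CONNECTOR = 11
CN_IDX_PROC = 1
CN_VAL_PROC = 1
NLMSG_DONE = 3
PROC_CN_MCAST_LISTEN = 1
PROC_EVENT_FORK = 0x00000001

NLMSGHDR = struct.Struct("=IHHII")    # len, type, flags, seq, pid
CN_MSG = struct.Struct("=IIIIHH")     # idx, val, seq, ack, len, flags
EVENT_HDR = struct.Struct("=IIQ")     # what, cpu, timestamp_ns
FORK_EVENT = struct.Struct("=iiii")   # parent pid/tgid, child pid/tgid


def build_listen_request(pid):
    """Build the netlink message that asks the kernel to start the feed."""
    op = struct.pack("=I", PROC_CN_MCAST_LISTEN)
    cn = CN_MSG.pack(CN_IDX_PROC, CN_VAL_PROC, 0, 0, len(op), 0)
    total = NLMSGHDR.size + len(cn) + len(op)
    return NLMSGHDR.pack(total, NLMSG_DONE, 0, 0, pid) + cn + op


def parse_fork(datagram):
    """Return (parent_pid, child_pid) for a fork event, otherwise None."""
    offset = NLMSGHDR.size + CN_MSG.size
    if len(datagram) < offset + EVENT_HDR.size + FORK_EVENT.size:
        return None
    what, _cpu, _timestamp = EVENT_HDR.unpack_from(datagram, offset)
    if what != PROC_EVENT_FORK:
        return None
    parent_pid, _parent_tgid, child_pid, _child_tgid = FORK_EVENT.unpack_from(
        datagram, offset + EVENT_HDR.size
    )
    return parent_pid, child_pid


def watch_forks(on_fork, on_overflow):
    """Loop forever, reporting forks and any loss of events (needs root)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_CONNECTOR)
    sock.bind((os.getpid(), CN_IDX_PROC))
    sock.send(build_listen_request(os.getpid()))
    while True:
        try:
            data = sock.recv(8192)
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                # The kernel dropped events because we read too slowly,
                # e.g. during a fork bomb; the process tree is now suspect.
                on_overflow()
                continue
            raise
        event = parse_fork(data)
        if event is not None:
            on_fork(*event)
```

Something like `watch_forks(lambda parent, child: print(parent, child), lambda: print("events lost"))` run as root would print each fork on the machine; on overflow a real daemon would fall back to a full rescan of /proc and notify its clients.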

The implementation really isn't that hard - there's svn code in the blog post for using the connector API - but would require a refactoring of the existing procd to a non-DC asynchronous infrastructure.

I believe this would be a major improvement to the procd, usable out-of-the-box today. However, I'd like some assurance that if I did the work, someone with commit privileges would be willing to review and accept the code.

Thoughts?

Brian
