Re: [Condor-devel] Building a super-procd
- Date: Wed, 20 Jul 2011 09:23:58 -0500
- From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
- Subject: Re: [Condor-devel] Building a super-procd
On Jul 20, 2011, at 3:52 AM, David McBride wrote:
> On 19/07/11 23:38, Brian Bockelman wrote:
>> Hi all,
>>
>> The procd is a pretty flawed component out-of-the-box. I would like to invest some of my "night and weekend" time and implement the techniques described here:
>>
>> http://osgtech.blogspot.com/2011/06/part-ii-keeping-mindful-eye-on-your.html
>>
>> Basically, on any Linux 2.6 kernel (including the ones shipped with Debian and RHEL5), the procd can subscribe to a feed from the kernel containing all processes spawned on the system. If a fork bomb occurs such that the procd can't keep up with the incoming feed, we are at least able to detect that this occurred and can inform the procd's clients.
>
>
> Hi Brian,
>
> Have you seen the 'cgroups' facilities provided by modern kernels? They look attractive for the sort of thing that you have in mind -- not least because fork-bombing won't cause processes to escape notice, as a cgroup is an (inherited) property of a process.
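A minimal Python sketch of the inheritance property described above. It assumes a Linux kernel that exposes `/proc/<pid>/cgroup` (cgroups arrived in 2.6.24, so RHEL5's 2.6.18 kernel does not have the file), and the helper names are hypothetical, not part of Condor. The parent forks a child, and the two compare the cgroup membership each one reports.

```python
import os


def read_cgroup_membership(pid="self"):
    """Return the sorted lines of /proc/<pid>/cgroup, or None if the file is absent."""
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            return sorted(line.strip() for line in f if line.strip())
    except FileNotFoundError:
        # Kernels that predate cgroups (e.g. RHEL5's 2.6.18) have no such file.
        return None


def parent_and_child_cgroups():
    """Fork once and return (parent_membership, child_membership) as lists of lines."""
    parent = read_cgroup_membership() or []
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: read membership from inside the new process and send it back.
        try:
            os.close(r)
            payload = "\n".join(read_cgroup_membership() or []).encode()
            while payload:
                payload = payload[os.write(w, payload):]
        finally:
            os._exit(0)
    os.close(w)
    chunks = []
    while chunk := os.read(r, 4096):
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    text = b"".join(chunks).decode()
    child = text.split("\n") if text else []
    return parent, child
```

The child lands in its parent's cgroup without any bookkeeping by a watcher, which is why a fork bomb cannot outrun it.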
cgroup integration is already done; see my Condor Week presentation:
http://www.cs.wisc.edu/condor/CondorWeek2011/presentations/bockelman-user-isolation.pdf
However, RHEL5 still has legs for a long time to come, meaning a significant number of users (most notably, the one who employs me...) won't have cgroups available.
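On kernels without cgroups, the kernel feed from the original message is the netlink proc connector. The following is a rough Python sketch, not the procd's actual code. It assumes root on the host, uses struct layouts transcribed from `<linux/netlink.h>`, `<linux/connector.h>` and `<linux/cn_proc.h>`, and the names `build_listen_msg`, `parse_proc_event` and `watch` are hypothetical. The "can't keep up" case shows up as `recv()` failing with `ENOBUFS`, which netlink(7) documents as the sign that messages were dropped.

```python
import errno
import os
import socket
import struct

# Constants transcribed from <linux/netlink.h>, <linux/connector.h> and <linux/cn_proc.h>.
NETLINK_CONNECTOR = 11
NLMSG_DONE = 3
CN_IDX_PROC = 1
CN_VAL_PROC = 1
PROC_CN_MCAST_LISTEN = 1
PROC_EVENT_FORK = 0x00000001
PROC_EVENT_EXEC = 0x00000002
PROC_EVENT_EXIT = 0x80000000

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, type, flags, seq, pid (16 bytes)
CN_MSG = struct.Struct("=IIIIHH")   # id.idx, id.val, seq, ack, len, flags (20 bytes)
EVENT_HDR = struct.Struct("=IIQ")   # what, cpu, timestamp_ns (16 bytes)
FORK_DATA = struct.Struct("=iiii")  # parent_pid, parent_tgid, child_pid, child_tgid
PID_TGID = struct.Struct("=ii")     # process_pid, process_tgid (leading fields of exec/exit)


def build_listen_msg():
    """Netlink datagram asking the kernel to start multicasting process events."""
    op = struct.pack("=I", PROC_CN_MCAST_LISTEN)
    cn = CN_MSG.pack(CN_IDX_PROC, CN_VAL_PROC, 0, 0, len(op), 0)
    total = NLMSGHDR.size + len(cn) + len(op)
    return NLMSGHDR.pack(total, NLMSG_DONE, 0, 0, os.getpid()) + cn + op


def parse_proc_event(buf):
    """Decode one datagram into (kind, details), or None if it is not a proc event."""
    start = NLMSGHDR.size + CN_MSG.size
    if len(buf) < start + EVENT_HDR.size:
        return None
    if struct.unpack_from("=II", buf, NLMSGHDR.size) != (CN_IDX_PROC, CN_VAL_PROC):
        return None
    what, _cpu, _ts = EVENT_HDR.unpack_from(buf, start)
    data = start + EVENT_HDR.size
    if what == PROC_EVENT_FORK:
        ppid, _ptgid, cpid, ctgid = FORK_DATA.unpack_from(buf, data)
        return ("fork", {"parent": ppid, "child": cpid, "child_tgid": ctgid})
    if what in (PROC_EVENT_EXEC, PROC_EVENT_EXIT):
        pid, tgid = PID_TGID.unpack_from(buf, data)
        return ("exec" if what == PROC_EVENT_EXEC else "exit", {"pid": pid, "tgid": tgid})
    return ("other", {"what": what})


def watch(on_event, on_overflow, rcvbuf=1 << 20):
    """Subscribe and dispatch events forever; assumes it runs as root."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_CONNECTOR)
    try:
        # A larger receive buffer makes overflow less likely, but cannot rule it out.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        sock.bind((0, CN_IDX_PROC))              # (kernel-chosen port id, multicast group)
        sock.sendto(build_listen_msg(), (0, 0))  # destination port 0 is the kernel
        while True:
            try:
                buf = sock.recv(4096)
            except OSError as e:
                if e.errno != errno.ENOBUFS:
                    raise
                # Events were dropped: any process tree built so far may be incomplete.
                on_overflow()
                continue
            event = parse_proc_event(buf)
            if event is not None:
                on_event(*event)
    finally:
        sock.close()
```

A caller might pass `on_overflow` a callback that rescans `/proc` and warns its clients, which is one way to realize "detect and inform" from the original message.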
>
> (Systemd, a competitor to Upstart and the default init system in Fedora 15, is already using it for precisely this reason.)
>
There's Condor integration for systemd in Fedora rawhide, scheduled for Fedora 16. I posted the unit file to gittrac.
> Alternatively, it might be worth exploring using resource limits for process-counts to try to prevent the fork-bomb scenario from being viable.
>
rlimit for process counts isn't a viable solution unless you want to set it to zero. IIRC, there are a few articles on why this doesn't work, either on LWN or in the sandboxing-on-Linux presentation from the Google Chrome team.
Brian
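For comparison, the rlimit approach suggested in the thread would look roughly like the following Python sketch. It is illustrative, uses hypothetical helper names, and is not taken from the thread. A throwaway child lowers its soft `RLIMIT_NPROC` and counts how many forks the kernel allows. Per setrlimit(2), the limit counts every process (on Linux, thread) owned by the real user ID, and it is not enforced for root or for processes holding CAP_SYS_ADMIN or CAP_SYS_RESOURCE. So it bounds a user, not a single job.

```python
import errno
import os
import resource
import signal
import struct
import time
from collections import namedtuple

NprocTrial = namedtuple("NprocTrial", "soft_limit forked refused")
_RESULT = struct.Struct("=qii")


def _sleep_then_exit():
    # Grandchild: stay alive so it keeps counting against RLIMIT_NPROC.
    try:
        time.sleep(60)
    finally:
        os._exit(0)


def fork_under_nproc_limit(limit, attempts):
    """Lower RLIMIT_NPROC in a throwaway child, then count allowed vs refused forks."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        kids, refused, soft = [], 0, -1
        try:
            os.close(r)
            _, hard = resource.getrlimit(resource.RLIMIT_NPROC)
            if hard != resource.RLIM_INFINITY:
                limit = min(limit, hard)  # the soft limit may not exceed the hard limit
            resource.setrlimit(resource.RLIMIT_NPROC, (limit, hard))
            soft = resource.getrlimit(resource.RLIMIT_NPROC)[0]
            for _ in range(attempts):
                try:
                    kid = os.fork()
                except OSError as e:
                    if e.errno != errno.EAGAIN:
                        raise
                    refused += 1  # fork(2) fails with EAGAIN once RLIMIT_NPROC is hit
                    continue
                if kid == 0:
                    _sleep_then_exit()
                kids.append(kid)
        finally:
            try:
                for kid in kids:
                    os.kill(kid, signal.SIGKILL)
                    os.waitpid(kid, 0)
                os.write(w, _RESULT.pack(soft, len(kids), refused))
            finally:
                os._exit(0)
    os.close(w)
    data = b""
    while len(data) < _RESULT.size:
        chunk = os.read(r, _RESULT.size - len(data))
        if not chunk:
            break
        data += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return NprocTrial(*_RESULT.unpack(data))
```

Run as root, every attempt typically succeeds because the limit is ignored. Run as an ordinary user who already owns more than `limit` processes anywhere on the machine, every attempt is typically refused, because the count spans the whole UID rather than one job.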