Re: [HTCondor-devel] Future of PrivSep, interested in feedback/opinions


Date: Tue, 23 Apr 2013 12:29:42 -0500
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Future of PrivSep, interested in feedback/opinions

On Apr 23, 2013, at 12:10 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:

> 
> The list of things that do not work properly if you are running your execute nodes with PrivSep enabled keeps growing.  Off the top of my head, I boldly claim that condor_tail, job x509 proxy updates (gt #104), upcoming Lark work, and most of the job container functionality incl cpu affinity and cgroup containers (limits on memory, process tracking, pid namespace, file namespace) do not work.  Do folks agree?  Esp re the job container stuff, I am not positive, but I think it is likely borked w/ PrivSep.
> 
> We have three options: (1) make everything work with PrivSep, (2) document all the stuff that stops working if you enable PrivSep and let admins do the risk/benefit analysis themselves, or (3) get rid of PrivSep.
> 
> Option #1 : I have a feeling for what it would take to fix proxy updates (couple days) and condor_tail (few days), but no good feeling re the job container work.  At first blush it seems like a really big task (many weeks? a few months when all is said and done?).  Not sure it is worth months.
> 

Containers are somewhat difficult because the privileged process must call fork().  That means something must stick around; for default HTCondor, this is the condor_starter.  We'd have to develop some sort of shim process for the condor_root_switchboard.
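
To make the "something must stick around" point concrete, here is a minimal Python sketch of a long-lived parent that forks the job and waits to reap it. This is not HTCondor code: run_under_shim is a hypothetical name, and the privileged container setup a real shim would do between fork() and exec is only marked by a comment.

```python
import os

def run_under_shim(argv):
    """Fork, exec argv in the child, and stay alive until it exits.

    Hypothetical illustration only. Returns the child's exit code,
    or the negated signal number if a signal killed it.
    """
    pid = os.fork()
    if pid == 0:
        # Child: a real shim would set up namespaces and cgroups and
        # drop privileges here, before exec. This sketch does none of that.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed; never fall back into parent code
    # Parent: this is the process that has to stick around for the job.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

The parent cannot exit until waitpid() returns, which is the role the condor_starter plays by default and the role a switchboard shim would have to take over.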

I would also suggest that, if we take this route, we'd want to rewrite the PrivSep implementation to be part of DC and stop running HTCondor as root entirely (time for the rest of the world to eat the dogfood!).  This would prevent us from being in the same place in a few years.

> Option #2 : Seems like another example of 'punt to the user'.  I think most admins would opt for the job container stuff over priv sep.
> 

This is also "punt to the future": we may have more flexibility in 3 years.  We could leave the code paths in but not target them for new features.

> Option #3 : If we got rid of PrivSep, it would leave fewer code paths to support and test.  Plus, would anyone miss it?  Is anyone beyond UW-Madison even using PrivSep (just sent that question out to condor-users)?  My guess is until it is "on by default", very few places will ever use it, and again I think most folks would rather see the container stuff on by default over privsep. Long term, the container stuff may be available to non-root (in RHEL 8 or so), which makes the motivation for PrivSep in general less relevant - HTCondor could run as the same user as all the jobs, and containers would prevent jobs from tampering w/ the HTCondor daemons or other jobs.
> 

(Are any RedHat folks permitted to spill the beans about what kernel RHEL7 will be based on?  User namespaces in RHEL7 versus RHEL8 is quite a difference in terms of years...)

Of course, user namespaces allow HTCondor to isolate its jobs, but, to things outside the container, the jobs will appear to run as the HTCondor user.  This means a job would access an NFS server as the condor user, not under the submitting user's UID.
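
As a rough model of why that happens: each line of /proc/<pid>/uid_map has three fields (first UID inside the namespace, first UID outside, range length), and anything outside the namespace only ever sees the outside UID. The Python sketch below just does that arithmetic; map_to_host_uid, the UID values, and the one-UID-per-job mapping are assumptions for illustration, not how HTCondor is known to configure anything.

```python
def map_to_host_uid(inside_uid, uid_map):
    """Translate a UID seen inside a user namespace to the host UID.

    uid_map is a list of (inside_start, outside_start, length) tuples,
    the same three fields as a line of /proc/<pid>/uid_map.
    Returns None if the UID has no mapping (a simplification).
    """
    for inside_start, outside_start, length in uid_map:
        if inside_start <= inside_uid < inside_start + length:
            return outside_start + (inside_uid - inside_start)
    return None

CONDOR_UID = 64  # assumed host UID of the condor account (made up)

# Two jobs from different users, each in its own namespace, both
# mapped onto the single unprivileged condor account on the host.
alice_map = [(1001, CONDOR_UID, 1)]
bob_map = [(1002, CONDOR_UID, 1)]
```

Both map_to_host_uid(1001, alice_map) and map_to_host_uid(1002, bob_map) come out as CONDOR_UID, so a file server enforcing permissions by UID has no way to tell the two users apart.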

> Thoughts? Comments? Am I misunderstanding something?
> 

Don't forget that gWMS uses condor_root_switchboard to implement user separation.

When stacked up against the other "big picture" items I have in the back of my head (fixing-up preemption, improving data management, improving monitoring, scalability, async stageout), spending a good amount of effort on improving PrivSep ranks poorly.

Brian

