Actually... let me take it back...
treating glexec operation mode as a second class citizen IS a big problem.
The whole idea of an overlay pool is that it gets temporary ownership of the resources that it was entrusted with.
So, from the overlay pool operator's point of view, I do want all the bells and whistles that Condor provides.
Now, I agree glexec currently does not provide the means.
But this should not be an excuse to stop supporting most features in the overlay mode... we should fix glexec (or whatever comes after it) instead!
Igor
On 04/23/2013 11:24 AM, Todd Tannenbaum wrote:
I was thinking in the same direction as Brian below... In the mode where OSG is using HTCondor glidein + glexec, the HTCondor glidein is not responsible for protecting the interests of the resource owner - that is the job of the scheduler instance that launched the glidein. So I was thinking all mechanisms on behalf of the submitting user should work with glexec (e.g. condor_tail, proxy refresh), but it is not important or even sensible to expect mechanisms on behalf of the resource owner (e.g. bind mounts, OOM killer functionality) to work in that mode.
Todd
On 4/23/2013 1:07 PM, Brian Bockelman wrote:
On Apr 23, 2013, at 12:47 PM, Igor Sfiligoi <sfiligoi@xxxxxxxx> wrote:
On 04/23/2013 10:41 AM, Brian Bockelman wrote:
And the OSG VOs need glexec to work to the best of its abilities.
I.e. glideins need something along the lines of PrivSep, since running as root is not an option, but we still want privilege separation.
So, I think you should go for (1)...
and actually push it a little further and make sure everything works in "PrivSep"-like mode, which includes glexec integration.
Why not use (2)? Continue supporting existing functionality, but don't target new functionality.
I definitely don't want glexec integration to be a second class citizen;
whatever works in "regular condor" should work in "glexec condor".
Isn't glexec a second class citizen by definition?
glexec allows us to do one thing - execute a process as a separate, unprivileged user. There's a plethora of kernel functionality ("create a network device" or "set CPU affinity") in the container work that simply does not fall into this category.
What if we:
1) Drop PrivSep, keep condor_root_switchboard, keep glexec (dropping glexec is obviously not an option!)
2) Port over all features that are feasible to run in "glexec condor". Do not port features that cannot be done with glexec.
3) Re-evaluate in a few years when user-level containers are widely available.
- It's not clear to me what doing glexec within a user container gives the site! Like with VMs where the VO has root, "traceability" is wildly different in such a context.
Brian