Re: [Condor-devel] Avoiding redundant executables in the SPOOL
- Date: Thu, 1 May 2008 02:18:22 +0100 (BST)
- From: Bruce Beckles <mbb10@xxxxxxxxx>
- Subject: Re: [Condor-devel] Avoiding redundant executables in the SPOOL
On Wed, 30 Apr 2008, Alan De Smet wrote:
Question: what happens with when_to_transfer_output=ON_EXIT_OR_EVICT? I
don't know enough about what Condor does here to know whether
this would be a problem under this scheme. If the current
behaviour on eviction is to transfer the executable (along with
everything else) back to the SPOOL directory, and then send the
executable (copied from the execute host that evicted it) out
again to the new execute host, then you might be in trouble.
Reason: if that's the case, then jobs that modify their
executable will currently work fine with
when_to_transfer_output=ON_EXIT_OR_EVICT even if evicted
at some point before completion. Once they've modified their
executable, though, its hash value changes - under the new
scheme, is this going to be a problem for the schedd (or any of
the other daemons, e.g. the startd or starter)?
Other comments inline below - hope they are useful.
-- Bruce
<snip>
1. condor_submit: Calculate a hash of the executable.
Discussion:
- We calculate on the submit side because:
- It allows the schedd to potentially never look at the
file, saving network and schedd processor time for
remote submits.
- submit can be aware that it's submitting the same
binary 50 times in different clusters, the schedd
can't. This means we can do the hash once on the
client instead of 50 times on the server.
- Arguably there is a risk in calculating it on the client
side: the client can lie. Later steps will reveal that we
don't link between jobs of different Owners, so if a client
lies about the hash, they only hurt themselves.
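(For illustration only: the submit-side hashing in step 1 might look like the
sketch below. The function name, chunk size, and choice of SHA-256 are all my
own assumptions, not necessarily what condor_submit would actually do.)

```python
import hashlib

def executable_hash(path, algorithm="sha256"):
    """Hash an executable's contents in fixed-size chunks so that
    large binaries never need to fit in memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read 1 MiB at a time until EOF (read() returns b"").
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

Because the hash is computed once on the client, submitting the same binary in
50 clusters costs one hash on the submit machine rather than 50 on the schedd.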
Maybe you should think a little more about this? Consider the following
situations:
- I've installed a front-end that allows my users to submit jobs to my
compute resources. At the back end it uses Condor. Since my users
never have direct access to Condor, I submit all jobs as the same user
(just ensuring that the input and output for each job are in separate
subdirectories), because that way the Condor submit hosts don't need to
know about all my different users.
Now I MUST set copy_to_spool=false, or a hash collision means that
the wrong executable may be run (either maliciously or by accident).
You'd better make sure I know this in BIG, BRIGHT, GLOWING letters before
I upgrade my Condor installation to a version of Condor with this
feature.
- I use "pool accounts" for my users, i.e. there is a collection of
accounts that get shared out between users depending on who is using my
resources at the time. User B submits an executable with the same hash
as user A, but which is _not_ the same executable as that submitted by
user A (hash collision). User B is/was using pool account ACC01. User
B leaves (temporarily or permanently), but doesn't bother to make sure
they've removed all their existing jobs, and user A now gets assigned
pool account ACC01.
- Multiple submitters to the same schedd (e.g. via Condor-C, etc). If a
bad person can masquerade as me on ANY of those submitters they can
submit a bad executable that will be run instead of the one I actually
submitted.
You might say: "but in that case you've got a problem anyway". Yes, but
previously the bad person could submit jobs and possibly (depending on
the set-up) affect my jobs currently in the queue. Now, they can
DEFINITELY affect my jobs by causing them to run a bad executable.
That bad executable might steal the input data of my real jobs (which
might be valuable) and send it to the bad person.
<snip>
6. condor_schedd: Does the hash path find an existing file?
6a: No: (The path doesn't exist, or it's invalid.) Tell
condor_submit: "Send it along." Write the incoming file to
the ickpt. Is the hash path valid, just not currently existing?
6aa: No, the hash path is the special value "invalid":
Skip to step 7.
6ab: Yes: Hard link the ickpt to the hash path.
6b: Yes: Great, we'll reuse that. Tell condor_submit: "No
thanks." Hard link from the hash path to the ickpt
("$SPOOL/cluster7524.ickpt.subproc0").
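(A rough sketch of what step 6 amounts to on the schedd side. All names here
are illustrative, not Condor's actual internals; receive_file stands in for
whatever mechanism pulls the executable over from condor_submit.)

```python
import os

def spool_executable(hash_path, ickpt_path, receive_file):
    """Reuse an already-spooled executable via a hard link if the hash
    path exists; otherwise receive the file and publish it under the
    hash path for later jobs to share."""
    if os.path.exists(hash_path):
        # 6b: reuse - hard link from the hash path to this job's ickpt.
        os.link(hash_path, ickpt_path)
        return "reused"
    # 6a: not present - receive the file into the ickpt...
    receive_file(ickpt_path)
    if hash_path != "invalid":
        # 6ab: ...and hard link it to the hash path for future jobs.
        os.link(ickpt_path, hash_path)
    return "received"
```

The second and later submissions of the same executable then cost one link()
call instead of a full file transfer.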
Obviously you know that using hard links makes this filesystem-specific.
In particular, where you get filesystems masquerading as "normal"
UNIX-type filesystems, you may have the situation that the filesystem's
"support" for hard links is actually by making duplicate copies of the
file's _data_. In such situations you are no better off than you were
before. In fact, you are slightly worse off: you've done lots of work to
get here, you have a potential security hole, AND you still have multiple
copies of the executable. Since the filesystem "lies" that it does
support hard links, the schedd presumably doesn't know that this is the
situation.
Also, there are situations where not all users are allowed to
make hard links (presumably the schedd is running as root (if it can) when
it does all this?). I guess, from your previous e-mail, that here you
fall back to the old non-hashed behaviour?
One more thing: in a previous e-mail, you said that "this won't work in
Windows". Why not? NTFS has supported hard links since at least NTFS 3.0
(Windows 2000).
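(On the filesystem point above: the schedd could in principle probe whether
hard links are genuine rather than emulated by copying, since a copying
"implementation" would leave the link count at 1. A rough sketch, purely for
illustration:)

```python
import os

def hard_links_are_real(directory):
    """Probe whether a filesystem genuinely supports hard links by
    linking a scratch file and checking that its link count rises.
    A filesystem that silently duplicates data would report
    st_nlink == 1, and one that forbids links raises OSError."""
    probe = os.path.join(directory, "probe")
    link = os.path.join(directory, "probe.link")
    with open(probe, "w") as f:
        f.write("x")
    try:
        os.link(probe, link)
        return os.stat(probe).st_nlink == 2
    except OSError:
        return False  # hard links not permitted or not supported
    finally:
        for p in (probe, link):
            if os.path.exists(p):
                os.unlink(p)
```

A probe like this could decide once at startup whether to fall back to the
old non-hashed behaviour for a given SPOOL directory.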
8. condor_schedd: Time to remove the job from the queue. Check
the hard link count for the ickpt file. Is it 2? We're the last
user, unlink the hash path.
- If it's 3 or above, there are other users of this file.
Leave it be.
- If it's 1, this is a job that didn't have a hash.
9: condor_schedd: unlink the ickpt file, same as before.
...so if a user can create a hard link to the hash path then the ickpt
file will never be unlinked. What are the circumstances in which a user
can do this? In particular, I guess, if the only circumstances in which a
user can do this are where they also have the ability to do other equally
bad things, or to do even worse things, then you don't really need to
worry about it. :)
<snip>
Escaping algorithm:
The goals are:
- We need a character to separate fields in the CmdHash. This
character must be escaped.
- All resulting characters, including the field character, must
be valid as a file name on all operating systems Condor
supports.
Field separator: - (a comma might be better, but I don't know if
Windows and MacOS X like it)
Windows (at least from Windows 2000 onward) is perfectly happy with a
comma (,) in the filename. (Of course, as with spaces in filenames, users
need to surround the filename in double quotes.)
My MacOS 10.4.10 system was also happy with a comma in the filename.
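(One possible escaping scheme, assuming "-" as the separator and "_" as the
escape character; both choices and the function names are mine, not part of
the proposal. The point is just that each escaped field is unambiguous, so
joining fields with "-" can't collide with a "-" inside a field.)

```python
def escape_field(value, sep="-", esc="_"):
    """Escape the escape character first (doubled), then the separator
    (escape char + 'h'), so the encoding is reversible left-to-right."""
    return value.replace(esc, esc + esc).replace(sep, esc + "h")

def make_cmd_hash(fields, sep="-"):
    """Join escaped CmdHash fields with the separator."""
    return sep.join(escape_field(f, sep) for f in fields)
```

All output characters here are valid in filenames on Windows, MacOS X, and
UNIX, which was the stated goal.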
<snip>
General notes:
This is insecure in the face of CLAIMTOBE, ANONYMOUS, or any
other situation in which multiple users map to the same Owner.
Don't do that. Even without this behavior, that's very insecure
and users can mess with each other using condor_qedit, condor_rm
and more.
...except that, as I mention above, you can have situations where multiple
users map to the same Owner, but users can't use condor_qedit, etc. (e.g.
if I use some sort of portal or front-end for job submission, monitoring,
etc.). However, in such situations they are now vulnerable to hash
collision attacks on executables whilst previously they were fine.
--
Bruce Beckles,
e-Science Specialist,
University of Cambridge Computing Service.