[HTCondor-devel] Remote IO in vanilla universe


Date: Sun, 27 Jan 2013 20:46:34 -0600
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Subject: [HTCondor-devel] Remote IO in vanilla universe
Hi all,

I figured out a relatively simple way of providing remote IO in the vanilla universe and am looking for someone willing to give it a spin.  It's a surprisingly small amount of code; the heavy lifting is done by Chirp.  Mostly, the new code just glues together pre-existing components.

See the design document:


In short:  I created a FUSE filesystem that translates filesystem calls into Chirp IO (which performs the remote IO against the submit host).  I use the filesystem namespaces feature so that this filesystem is visible only to the job (and is automatically unmounted when the job ends).  This way, the job sees the filesystem of the submit host (either as / or as /condor/submitter, depending on the job's requested options).  The technique appears to work well, but I haven't tried pushing it too hard.
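
To make the two mount options concrete, here is a minimal sketch of how a path seen by the job would correspond to a path on the submit host. The function name `to_submit_path` and its `mount_point` parameter are hypothetical and are not taken from the actual code. In a real FUSE filesystem the kernel already hands each handler a path relative to the mount point, so this only illustrates the mapping from the job's point of view.

```python
import posixpath


def to_submit_path(job_path, mount_point="/condor/submitter"):
    """Map a path as seen by the job to the matching submit-host path.

    Returns None when the path lies outside the mounted tree, i.e. it
    would be served by the execute node's own filesystem instead.
    (Hypothetical helper, for illustration only.)
    """
    job_path = posixpath.normpath(job_path)
    mount_point = posixpath.normpath(mount_point)

    # Mounted as "/": every path the job sees is a submit-host path.
    if mount_point == "/":
        return job_path

    # The mount point itself corresponds to the submit host's root.
    if job_path == mount_point:
        return "/"

    # Anything below the mount point maps to the same path minus the prefix.
    prefix = mount_point + "/"
    if job_path.startswith(prefix):
        return "/" + job_path[len(prefix):]

    return None
```

With the default mount point, `/condor/submitter/home/user/data.txt` maps to `/home/user/data.txt` on the submit host, while `/tmp/x` is left to the local filesystem.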

I'm not quite sure where Chirp breaks, but I did notice that it has no error codes implemented (calls return either 0 or -1, with no errno).  Hence, any IO error is converted to EIO, which will likely be problematic for some applications.  Chirp also has no timeouts or error recovery; the filesystem will likely die if the shadow restarts.
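
The EIO conversion described above can be sketched as follows. This is an illustration, not the actual code: `to_fuse_result` is a hypothetical name, and the only assumptions are that the remote call reports failure as -1 and that the FUSE handler reports errors as a negative errno value.

```python
import errno


def to_fuse_result(chirp_rc):
    """Convert a 0 / -1 style return code into a FUSE-style result.

    With no errno available from the remote side, every failure has to
    be reported as the same generic error. (Hypothetical helper.)
    """
    if chirp_rc == -1:
        # "No such file", "permission denied", "disk full" and so on
        # are indistinguishable here, so they all become EIO.
        return -errno.EIO
    return chirp_rc
```

An application that checks for a specific error, for example testing for ENOENT before creating a file, would see EIO instead and may treat it as a hard failure.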

Enjoy!

Brian

