Brian -
You might check out the existing chirp_fuse module, which should
interoperate with both the Condor Chirp I/O proxy and the
standalone Chirp server. When used with the latter, you also get
proper errnos, timeouts, and transparent failure recovery.
http://www.cse.nd.edu/~ccl/software/manuals/man/chirp_fuse.html
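
To illustrate why errnos matter at the FUSE layer, here is a minimal
sketch, not code from chirp_fuse or from the HTCondor Chirp client. It
assumes the usual FUSE convention that a handler reports failure by
returning a negated errno. The function names and the (status, err)
inputs are hypothetical, made up for the example.

```python
import errno


def fuse_result_from_status(status):
    """Hypothetical helper: the backend reports only 0 or -1.

    With no errno to consult, every failure has to be reported
    to the caller as a generic I/O error.
    """
    if status == 0:
        return 0
    # No detail is available, so all failures collapse to EIO.
    return -errno.EIO


def fuse_result_from_errno(status, err):
    """Hypothetical helper: the backend also supplies an errno value.

    The specific error (ENOENT, EACCES, ...) can be passed through,
    so applications can tell the failure cases apart.
    """
    if status == 0:
        return 0
    return -err
```

With the first helper, a missing file and a permission failure look
identical to the application; with the second they stay distinct.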
Cheers,
Doug
On Sun, Jan 27, 2013 at 9:46 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
> Hi all,
>
> Figured out a relatively simple way of providing remote IO in the vanilla
> universe and am looking for someone willing to give it a spin. It's a
> surprisingly small amount of code - the heavy lifting is done by chirp.
> Mostly, the new code is just gluing pre-existing components.
>
> See the design document:
>
> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3465
>
> In short: I created a FUSE filesystem that translates filesystem calls to
> chirp IO (which performs remote IO against the submit host). I use the
> filesystem namespaces feature to make this filesystem only appear to the job
> (and be automatically unmounted at the job's end). This way, the job sees
> the filesystem of the submit host (either as / or as /condor/submitter,
> depending on the job's requested options). The technique appears to work
> well, but I haven't tried pushing it too hard.
>
> I'm not quite sure where Chirp breaks, but I did notice that it has no error
> codes implemented (either returns 0 or -1, no errno). Hence, any IO error
> is converted to EIO. That will likely be problematic for some applications.
> Chirp also has no timeouts or error recovery; the filesystem will likely die
> if the shadow restarts.
>
> Enjoy!
>
> Brian
>
> _______________________________________________
> HTCondor-devel mailing list
> HTCondor-devel@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel