On Jan 28, 2013, at 11:59 AM, Erik Paulson <epaulson@xxxxxxxxxxx> wrote:
> I would welcome it on-list, in fact.
>
> Not that this should be a deal-breaker, but is FUSE enabled by default
> in most modern Linux distributions?
>
I claim no deep knowledge of Linux distros, but my understanding is that the FUSE kernel module is enabled on most modern distros. I know for sure it is available on recent RHEL 5 and 6.
> I've always wanted to turn the std universe libraries into a straight C-based
> implementation that turned remote I/O into a Chirp call over a pipe to a
> process on the local machine, which would be responsible for actually
> carrying out the I/O. That way, all of the Condor I/O code wouldn't have to
> be loaded into the address space of the user job, nor would there be any
> nastiness with C++ exceptions and runtime libraries, which would make the
> build way simpler too.
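Erik's Chirp-over-a-pipe idea can be sketched in C. The job-side shim only formats a request line and writes it down a pipe to a helper process on the local machine, and that helper carries out the remote I/O. This is a minimal sketch. The `pread <fd> <length> <offset>` request syntax and the name `send_pread_request` are assumptions for illustration, so check the Chirp protocol definition for the real grammar.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical shim: format one request line and write it to the pipe
 * that leads to a local helper process, which performs the real remote
 * I/O. The "pread <fd> <length> <offset>\n" syntax is an ASSUMPTION made
 * for illustration. Returns the number of bytes written, or -errno. */
static int send_pread_request(int pipe_fd, int remote_fd,
                              size_t length, long offset)
{
    assert(pipe_fd >= 0);
    char line[128];
    int n = snprintf(line, sizeof line, "pread %d %zu %ld\n",
                     remote_fd, length, offset);
    if (n < 0 || (size_t)n >= sizeof line)
        return -EINVAL;                      /* request did not fit */

    size_t done = 0;
    while (done < (size_t)n) {               /* tolerate short writes */
        ssize_t w = write(pipe_fd, line + done, (size_t)n - done);
        if (w < 0) {
            if (errno == EINTR)
                continue;                    /* interrupted; try again */
            return -errno;
        }
        done += (size_t)w;
    }
    return n;
}
```

For example, `send_pread_request(pipe_fd, 3, 4096, 0)` writes the line `pread 3 4096 0` followed by a newline. The helper's reply would then be read back over a second pipe. This keeps no C++ runtime or Condor I/O code in the job's address space.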
>
> The checkpointing code could be written in C too, with the checkpoint dumped
> over the pipe to that same process and managed externally.
>
Well, I hope the checkpointing is someone else's problem (either CRIU or DMTCP)!
I would point out that the combination of a new checkpointing library and this remote I/O would allow for a "new standard U" that is basically a special configuration of the vanilla universe. That is probably the simplest way to eliminate the complexities of the standard universe.
Brian
> -Erik
>
> On Mon, Jan 28, 2013 at 10:21:14AM -0500, Matthew Farrellee wrote:
>> I doubt anyone will object to doing that on-list.
>>
>> Best,
>>
>>
>> matt
>>
>> On 01/28/2013 10:01 AM, Douglas Thain wrote:
>>> Brian -
>>>
>>> The idea all along has been to have a common protocol definition, so
>>> that the various implementations would interoperate:
>>> http://research.cs.wisc.edu/htcondor/chirp
>>>
>>> We last looked at this about 1.5 years ago -- and it worked -- but I
>>> don't believe there is any regular testing of the interaction between
>>> the cctools chirp and the condor chirp. Without that, things may
>>> drift apart over time.
>>>
>>> I will be happy to put forth some effort from my group to make this
>>> interaction work better. How about we start by identifying the known
>>> problems and any desired features? (Off list, probably.)
>>>
>>> Best -
>>> Doug
>>>
>>>
>>> On Mon, Jan 28, 2013 at 8:49 AM, Brian Bockelman <bbockelm@xxxxxxxxxxx>
>>> wrote:
>>>> Do you know the history behind the split implementation of the chirp
>>>> client? Why can't there just be a common library or codebase for the
>>>> client? I know I've seen jobs bedeviled by the "no timeouts" problem
>>>> when using the shipped CLI (not to mention the thread-safety issues!).
>>>>
>>>> The work described below is really just gluing together the two
>>>> interfaces. Most function implementations look like this:
>>>>
>>>> static int chirp_read(const char * path, char * buffer, size_t size,
>>>>                       off_t offset, struct fuse_file_info * fi) {
>>>>     GET_CLIENT(client);
>>>>     assert(path);
>>>>     return chirp_client_pread(client, fi->fh, buffer, size, offset);
>>>> }
>>>>
>>>> (GET_CLIENT is a macro to pull the client handle from the FUSE context
>>>> and lock a mutex). Hence, things are mostly at the mercy of the
>>>> underlying client.
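FUSE expects callbacks to return a negative errno on failure. On Linux a bare -1 would be read as -EPERM, so glue code like `chirp_read` generally needs to translate the client's result. This is a minimal sketch. `client_pread_fn`, `fuse_result_from_client` and `glue_read` are hypothetical names. Mapping every failure to -EIO simply mirrors the behaviour described later in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical client signature: like the chirp client described in this
 * thread, it returns a byte count >= 0 on success and -1 on failure,
 * without any finer-grained error code. */
typedef long (*client_pread_fn)(int fd, char *buf, size_t size, off_t offset);

/* Convert a client result to the FUSE convention: a non-negative byte
 * count on success, a negative errno on failure. With no error detail
 * available, every failure is reported as -EIO. */
static int fuse_result_from_client(long rc)
{
    return rc >= 0 ? (int)rc : -EIO;
}

/* Glue in the style of chirp_read: call the client, then translate. */
int glue_read(client_pread_fn client_pread, int fd, char *buf,
              size_t size, off_t offset)
{
    assert(client_pread != NULL && buf != NULL);
    return fuse_result_from_client(client_pread(fd, buf, size, offset));
}
```

If the client ever exposed real error codes, only `fuse_result_from_client` would need to change.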
>>>>
>>>> Brian
>>>>
>>>> PS - I see that chirp_fuse doesn't use FUSE's standard option parsing,
>>>> meaning it can't be made compatible with /etc/fstab. :(
>>>> However, that shouldn't be a roadblock to using it in this case.
>>>>
>>>> On Jan 28, 2013, at 7:29 AM, Douglas Thain <dthain@xxxxxx> wrote:
>>>>
>>>>> Brian -
>>>>>
>>>>> You might check out the existing chirp_fuse module, which should
>>>>> interoperate with both the Condor Chirp I/O proxy as well as the
>>>>> standalone Chirp server. When used with the latter, you also get
>>>>> proper errnos, timeouts, and transparent failure recovery.
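"Transparent failure recovery" usually means retrying an operation after re-establishing the connection, within a bounded budget. This is a minimal sketch of that general pattern. `op_fn`, `reconnect_fn` and `with_recovery` are hypothetical, as is the choice of which errnos count as transient. None of it is taken from the cctools source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks: op_fn performs one remote operation and returns
 * a value >= 0 or a negative errno; reconnect_fn re-establishes the
 * session and returns 0 on success. ctx carries the caller's state. */
typedef int (*op_fn)(void *ctx);
typedef int (*reconnect_fn)(void *ctx);

/* Errors that suggest the connection, not the request itself, failed. */
static int is_transient(int rc)
{
    return rc == -ECONNRESET || rc == -EPIPE || rc == -ETIMEDOUT;
}

/* Run op up to max_attempts times, reconnecting between attempts when
 * the failure looks transient. Success and permanent errors (for example
 * -ENOENT) are returned immediately; otherwise the last error is returned. */
int with_recovery(op_fn op, reconnect_fn reconnect, void *ctx,
                  int max_attempts)
{
    assert(op != NULL && reconnect != NULL && max_attempts > 0);
    int rc = -EIO;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        rc = op(ctx);
        if (rc >= 0 || !is_transient(rc))
            return rc;
        if (attempt == max_attempts || reconnect(ctx) != 0)
            break;  /* out of attempts, or the session could not be restored */
    }
    return rc;
}
```

Separating transient from permanent errors matters. Retrying a missing file would only add latency, but a dropped connection after a restart on the remote side is worth one more try.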
>>>>>
>>>>> http://www.cse.nd.edu/~ccl/software/manuals/man/chirp_fuse.html
>>>>>
>>>>> Cheers,
>>>>> Doug
>>>>>
>>>>>
>>>>> On Sun, Jan 27, 2013 at 9:46 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx>
>>>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Figured out a relatively simple way of providing remote IO in the
>>>>>> vanilla universe and am looking for someone willing to give it a spin.
>>>>>> It's a surprisingly small amount of code - the heavy lifting is done by
>>>>>> chirp. Mostly, the new code is just gluing pre-existing components.
>>>>>>
>>>>>> See the design document:
>>>>>>
>>>>>> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3465
>>>>>>
>>>>>> In short: I created a FUSE filesystem that translates filesystem calls
>>>>>> into chirp IO (which performs remote IO against the submit host). I use
>>>>>> the Linux mount-namespace feature to make this filesystem appear only to
>>>>>> the job (and be automatically unmounted at the job's end). This way, the
>>>>>> job sees the filesystem of the submit host (either as / or as
>>>>>> /condor/submitter, depending on the job's requested options). The
>>>>>> technique appears to work well, but I haven't tried pushing it too hard.
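The per-job visibility described above relies on Linux mount namespaces. This is a minimal sketch of the setup step using the real `unshare(2)` and `mount(2)` system calls. The function name is hypothetical, and the call requires CAP_SYS_ADMIN. It reflects an assumption about how such isolation is typically arranged, not the code attached to ticket 3465.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>

/* Hypothetical helper: move the calling process into a new mount
 * namespace and make every mount in it private. Mounts made afterwards
 * (e.g. a FUSE filesystem at /condor/submitter) are then invisible to the
 * rest of the host. Needs CAP_SYS_ADMIN. Returns 0 or -errno. */
int enter_private_mount_ns(void)
{
    if (unshare(CLONE_NEWNS) != 0)
        return -errno;
    /* Where / is a shared mount (the systemd default), new mounts would
     * otherwise propagate back to the parent namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
        return -errno;
    return 0;
}
```

The FUSE daemon and the job would then be started inside the new namespace. Once the last process in it exits, the namespace and its mounts go away, which gives the automatic unmount at job end.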
>>>>>>
>>>>>> I'm not quite sure where Chirp breaks, but I did notice that it has no
>>>>>> error codes implemented (it returns either 0 or -1, with no errno).
>>>>>> Hence, any IO error is converted to EIO. That will likely be
>>>>>> problematic for some applications. Chirp also has no timeouts or error
>>>>>> recovery; the filesystem will likely die if the shadow restarts.
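A client-side timeout mostly comes down to never blocking indefinitely on the connection. This is a minimal sketch using POSIX `poll(2)`. The function name and reporting expiry as -ETIMEDOUT are assumptions. A real client would also have to decide what to do with a connection abandoned mid-response.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Read up to len bytes from fd, waiting at most timeout_ms for data.
 * Returns the byte count (0 at end of file), -ETIMEDOUT if nothing
 * arrived in time, or another -errno on failure. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL && timeout_ms >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready < 0) {
            if (errno == EINTR)
                continue;   /* simplification: restarts the full timeout */
            return -errno;
        }
        if (ready == 0)
            return -ETIMEDOUT;
        break;              /* readable, hung up, or errored: let read() say */
    }
    ssize_t n = read(fd, buf, len);
    return n < 0 ? -errno : n;
}
```

Paired with a reconnect-and-retry loop, a timeout like this is one way a filesystem could survive a shadow restart instead of hanging or dying.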
>>>>>>
>>>>>> Enjoy!
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> _______________________________________________
>>>>>> HTCondor-devel mailing list
>>>>>> HTCondor-devel@xxxxxxxxxxx
>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel
>>>>