Re: [HTCondor-devel] Remote IO in vanilla universe


Date: Mon, 28 Jan 2013 13:43:55 -0600
From: Erik Paulson <epaulson@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Remote IO in vanilla universe
On Mon, Jan 28, 2013 at 01:11:14PM -0600, Brian Bockelman wrote:
> 
> On Jan 28, 2013, at 11:59 AM, Erik Paulson <epaulson@xxxxxxxxxxx> wrote:
> 
> Well, I hope the checkpointing is someone else's problem (either CRIU or DMTCP)!
> 
> I would point out that the combination of a new checkpointing library and this remote IO would allow for a "new standard U" that basically is a special configuration of the vanilla universe.  Probably the simplest way of eliminating the complexities in the standard u.
> 

Rudimentary checkpointing (dumping the memory image from inside the process
and restoring it later) is not particularly hard. Going further, as CRIU and
DMTCP do, is more involved. It'd be up to whoever has to support it, of course,
but I'd still encourage the HTCondor team to ship a basic checkpointing
library that works much as today's version does, as well as to support
outside libraries.
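To make "not particularly hard" a bit more concrete, here is a minimal sketch of in-process dump and restore. It is a toy under loud assumptions: it saves only one designated region (the hypothetical `struct app_state`), not the whole address space, registers, and stack that the standard universe's checkpointer or DMTCP/CRIU capture, and the names `checkpoint_write`/`checkpoint_restore` are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical program state: the only region this toy checkpointer saves.
 * A real memory-image checkpointer would also capture heap, stack, registers. */
struct app_state {
    long iteration;     /* how far the computation has progressed */
    double accumulator; /* partial result so far */
};

struct app_state state;

/* Dump `len` raw bytes of `region` to `path`; returns 0 on success.
 * Writes "<path>.tmp" first, then renames it over `path`. */
int checkpoint_write(const char *path, const void *region, size_t len)
{
    char tmp[4096];
    int r = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (r < 0 || (size_t)r >= sizeof tmp)
        return -1;
    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(region, 1, len, f);
    if (fclose(f) != 0 || n != len) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Restore `region` from `path`. Reads into a scratch buffer first so a
 * short or missing file leaves the live state untouched.
 * Returns 0 if restored, -1 otherwise. */
int checkpoint_restore(const char *path, void *region, size_t len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    void *buf = malloc(len);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }
    size_t n = fread(buf, 1, len, f);
    fclose(f);
    if (n == len)
        memcpy(region, buf, len);
    free(buf);
    return n == len ? 0 : -1;
}
```

A job would call checkpoint_restore() once at startup and, if it returns 0, resume from state.iteration instead of starting over, then call checkpoint_write() periodically from its main loop. Writing to a temporary file and renaming it means a crash mid-dump leaves the previous checkpoint intact, since on POSIX systems rename() replaces the destination atomically.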

You'd still want to make Chirp checkpoint-aware. As Todd points out,
there's state there, and whether you use a simple Dr. Dobb's-style memory
dumper or integration with DMTCP, you need to capture that state to fully
support standard-universe-like jobs.
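As a rough sketch of what capturing that state could mean: the remote-I/O shim keeps a table of open remote files and their offsets, records the offsets at checkpoint time, and replays the opens after a restart. Plain POSIX open()/lseek() on local files stand in for the real Chirp calls here; `struct remote_file`, `shim_open`, `shim_prepare_checkpoint`, and `shim_restore_after_restart` are hypothetical names, not part of any HTCondor or Chirp API.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_REMOTE_FILES 16

/* Hypothetical record kept by the remote-I/O shim for each open file. */
struct remote_file {
    int in_use;
    char path[256];
    int flags;    /* flags passed to the original open() */
    off_t offset; /* position captured at checkpoint time */
    int fd;       /* stand-in for the remote handle; invalid after a restart */
};

/* Lives in the job's memory, so a memory-image checkpoint saves it too. */
struct remote_file remote_table[MAX_REMOTE_FILES];

/* Open a file through the shim; returns a table index, or -1 on failure. */
int shim_open(const char *path, int flags)
{
    if (strlen(path) >= sizeof remote_table[0].path)
        return -1;
    for (int i = 0; i < MAX_REMOTE_FILES; i++) {
        if (remote_table[i].in_use)
            continue;
        int fd = open(path, flags, 0644);
        if (fd < 0)
            return -1;
        remote_table[i].in_use = 1;
        strcpy(remote_table[i].path, path);
        remote_table[i].flags = flags;
        remote_table[i].offset = 0;
        remote_table[i].fd = fd;
        return i;
    }
    return -1; /* table full */
}

/* Just before the image is dumped: record where each file is positioned. */
int shim_prepare_checkpoint(void)
{
    for (int i = 0; i < MAX_REMOTE_FILES; i++) {
        if (!remote_table[i].in_use)
            continue;
        off_t pos = lseek(remote_table[i].fd, 0, SEEK_CUR);
        if (pos < 0)
            return -1;
        remote_table[i].offset = pos;
    }
    return 0;
}

/* After restart: the old descriptors are gone, so reopen and seek back. */
int shim_restore_after_restart(void)
{
    for (int i = 0; i < MAX_REMOTE_FILES; i++) {
        if (!remote_table[i].in_use)
            continue;
        /* Never re-create or truncate a file the job already wrote to. */
        int flags = remote_table[i].flags & ~(O_CREAT | O_TRUNC | O_EXCL);
        int fd = open(remote_table[i].path, flags);
        if (fd < 0)
            return -1;
        if (lseek(fd, remote_table[i].offset, SEEK_SET) != remote_table[i].offset) {
            close(fd);
            return -1;
        }
        remote_table[i].fd = fd;
    }
    return 0;
}
```

The design choice worth noting is that O_CREAT, O_TRUNC, and O_EXCL are stripped on replay, so reopening after a restart never truncates or fails on a file the job had already created before the checkpoint.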

-Erik
