On 7/27/06, Alex Gontmakher <gsasha@xxxxxxxxxxxxxxxxx> wrote:
> You've said your job writes to stdout -- if that's the case, wouldn't it
> be trivial to just add "|bzip2 >outfile.bz2"? No intermediate storage,
> no NFS thrashing, no added code in an already complicated job control
> system... And I don't think this would even be platform-specific, as
> even Windows (IIRC) supports i/o redirection.

There are several problems with the solution you propose.

First, Condor does not allow including inline scripts or even just sequences of commands connected by a pipeline - you can write a script around your executable, but there are problems with that which I stated earlier (not to mention that it's somewhat ugly, as it would give up on many of Condor's capabilities).

Second, Condor does have special handling for a program's output and error files (and it actually does have gzip/gunzip functionality for input/output in some cases), so this is quite a natural extension for it.
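For reference, here is a minimal sketch of the "script around your executable" approach mentioned above, written in Python so that one script could in principle serve any platform that has an interpreter installed. The names run_compressed and wrapper.py are hypothetical, nothing here is Condor functionality, and it assumes the job writes its results to stdout and that gzip (deflate) is an acceptable format.

```python
import gzip
import shutil
import subprocess
import sys


def run_compressed(cmd, out_path):
    """Run cmd and stream its stdout into a gzip (deflate) file.

    Returns the exit code of the wrapped program. Nothing is buffered
    on disk uncompressed; data is copied from the pipe in chunks.
    """
    with gzip.open(out_path, "wb") as out:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        shutil.copyfileobj(proc.stdout, out)
        proc.stdout.close()
        return proc.wait()


# Hypothetical invocation: python wrapper.py OUTFILE.gz REAL_EXE [ARGS...]
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(run_compressed(sys.argv[2:], sys.argv[1]))
```

The wrapper passes the real program's exit code through, so the job's success or failure is still visible to whatever launched it. The real executable would still have to be shipped alongside the wrapper.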
What it should do in the standard universe on checkpoint is not so clear. Recommencing output to the stream might be at best problematic and at worst require new protocol-level functionality. Streaming output may also need changes. No idea about Globus, GAHP, GCB, etc.

What compression to use is also something best left in the hands of the app writer (who would know whether the significant additional CPU cost of bzip2 or 7z was worth it compared with a straight deflate). Providing something like deflate as a useful default does sound nice, but it would likely require considerable restrictions on use to prevent it causing a lot more recoding than initially expected.

Because of this, the ever-present "wrap it in a script" option is always likely to be the default response, since it allows so much more flexibility and power. Admittedly this comes at the cost of one additional line/entry in the submit script (to transfer the 'real' exe as well as the script) and, of course, the loss of the standard universe (but as we mentioned before, that has complications if you do this anyway). For the standard universe there is a reasonable likelihood that, if you can relink the app, you can probably also change it to output differently.

The only really big saving in complexity would come if the job is cross-platform, since wrapping in a script then means creating multiple scripts, each doing it differently. I admit this sounds nice, but I don't know how many people would use this functionality to make the additional code/maintenance complexity worth it (that's a question for the cs.wisc guys and gals :)

Matt
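To get a rough feel for the CPU-versus-size trade-off between deflate and bzip2 on your own output, something like the following sketch may help. It uses only Python's standard zlib and bz2 modules; the function name compare and the sample data are made up for illustration, and the numbers will vary with the data and the machine.

```python
import bz2
import time
import zlib


def compare(data):
    """Return {name: (compressed_size_in_bytes, seconds)} for each codec."""
    results = {}
    for name, compress in (("deflate", zlib.compress), ("bzip2", bz2.compress)):
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = (len(packed), elapsed)
    return results


# Made-up sample resembling repetitive text output from a job.
sample = b"step=1 energy=-0.12345 status=ok\n" * 20000
for name, (size, seconds) in compare(sample).items():
    print(f"{name}: {len(sample)} -> {size} bytes in {seconds:.4f}s")
```

Running it on a representative slice of real job output, rather than the synthetic sample, is what would actually tell the app writer whether the extra CPU time is worth it.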