Re: [HTCondor-users] using idle computers in computer labs for CFD jobs
- Date: Tue, 08 Mar 2016 14:00:30 -0600
- From: Todd L Miller <tlmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] using idle computers in computer labs for CFD jobs
> Now, this type of checkpoint is distinct from the standard universe's
> checkpoint, as it's managed internally by the application rather than
> the standard universe wrapper applied by condor_compile. For Fluent and
> similar applications which can't be relinked in this way, we need to
> figure out how to signal Fluent itself to checkpoint periodically.
We expect to be releasing a new developer version (8.5.3) of
HTCondor soon, which will contain some experimental features to help
simplify situations like this. It sounds like you'd still need to write a
wrapper script, but that may be easier than changing the configuration of
your execute nodes. At any rate, if you'd like to help test the new
features (or are just curious about what they'll probably be), please
contact me off-list.
> I think the alternative would be a wrapper script around the Fluent
> executable that recognizes the eviction signal from HTCondor and
> creates the exit-fluent flag file when such a signal is received.
IIRC, the 'KillSig' job attribute determines which signal is sent
on an eviction, so if you'd rather not trap SIGTERM, you can choose
something else.
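
A rough sketch of what such a wrapper might look like, in Python. This is
an illustration under assumptions, not a tested recipe: it assumes a POSIX
execute node, that the flag file is named exit-fluent and lives in the
job's working directory, and that Fluent notices the file and checkpoints
on its own. The names run_wrapped, flag_path and evict_signal are made up
for this example.

```python
import signal
import subprocess
import sys
from pathlib import Path

# Assumed flag file name (taken from the thread); where the application
# actually looks for it is an assumption.
FLAG_FILE = Path("exit-fluent")


def run_wrapped(cmd, flag_path=FLAG_FILE, evict_signal=signal.SIGTERM):
    """Run cmd as a child; on evict_signal, create the flag file.

    The child is not killed here. The idea is that the application sees
    the flag file, writes its own checkpoint, and exits by itself.
    Returns the child's exit code.
    """
    flag_path = Path(flag_path)

    def on_evict(signum, frame):
        # Keep the handler small: just drop the flag file.
        flag_path.touch()

    # Install the handler before starting the child so an early signal
    # is not lost. Custom handlers are not inherited across exec.
    previous = signal.signal(evict_signal, on_evict)
    try:
        child = subprocess.Popen(cmd)
        # wait() resumes after the handler has run (PEP 475).
        return child.wait()
    finally:
        signal.signal(evict_signal, previous)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: wrapper.py <application> [args...]
    sys.exit(run_wrapped(sys.argv[1:]))
```

If you'd rather trap something other than SIGTERM, pass a different
evict_signal (signal.SIGUSR1, say) and set the job's KillSig to match;
in a submit file the kill_sig command should do that.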
- ToddM