Re: [HTCondor-users] File last modification time or job last write() attribute?
- Date: Thu, 26 May 2016 15:19:59 -0400
- From: Michael V Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx>
From: Jose Caballero <jcaballero.hep@xxxxxxxxx>
Date: 05/26/2016 02:24 PM
> Is it not possible in your case to have the actual job to do it?
In the real world, timing out is not an option for some tasks, so
there are no timeouts in the code in that situation. You can kill it
off in the lab, of course, but it has to be done from outside the job.
> Something like forking a separate process that watches over that file,
> and sends a signal to the main process when it does not see
> progress...
> That does not require any extra HTCondor feature, right? Would
> something like that work?
You can't fork a daemon in a +PreCmd, since all those processes get
killed when the job starts, but you could do it in a user_job_wrapper.
That might be preferable to a hook in some ways, but having an extra
process hanging around doing nearly nothing rubs me the wrong way.
I like the way the update_job_info hook spawns automatically and has
minimal requirements and overhead. A wrapper-spawned daemon, though,
would eliminate potential issues if STARTER_UPDATE_INTERVAL were
set to an excessive value, since the daemon would have control over
its own interval, and you could check a job attribute to allow the
user to control the interval.
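As a rough sketch of the wrapper-spawned watcher idea, here is a small
polling loop in Python. Everything in it is an assumption made for
illustration: the function names, the parameters, and the injectable
clock are hypothetical and not part of HTCondor. How the watcher is
launched from a user_job_wrapper, and what it does once a stall is
detected, is left out.

```python
import os
import time


def seconds_idle(path, now=None):
    """Return the seconds elapsed since `path` was last modified."""
    if now is None:
        now = time.time()
    # Clamp at zero in case the file's mtime is slightly ahead of the clock.
    return max(0.0, now - os.path.getmtime(path))


def watch_for_stall(path, max_idle, poll_interval,
                    clock=time.time, sleep=time.sleep):
    """Poll `path` until it has gone `max_idle` seconds without a write.

    Returns the observed idle time. `poll_interval` is owned by the
    watcher itself, so it does not depend on any starter update setting.
    `clock` and `sleep` are injectable only to make the loop testable.
    """
    while True:
        idle = seconds_idle(path, now=clock())
        if idle >= max_idle:
            return idle
        sleep(poll_interval)
```

The poll interval here is an ordinary argument, so a wrapper could
read it from wherever the user is allowed to set it.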
I like how my hook is setting a job attribute, rather than trying to
signal the process itself, since that allows the submission to set
the policy on what to do in a given scenario, rather than the hook
author.
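To illustrate the "report an attribute, let the submission decide"
approach, here is a minimal Python sketch that only computes how long
ago a file was written and formats it as a name/value line. The
attribute name OutputIdleSeconds and the function name are
hypothetical, not taken from my actual hook, and the step that
delivers the value into the job ad is deliberately not shown.

```python
import os
import time

# Hypothetical attribute name, chosen only for this example.
ATTR_NAME = "OutputIdleSeconds"


def idle_attribute(path, now=None):
    """Return 'Name = value' with whole seconds since `path` was written."""
    if now is None:
        now = time.time()
    # Truncate to whole seconds and never report a negative idle time.
    idle = max(0, int(now - os.path.getmtime(path)))
    return f"{ATTR_NAME} = {idle}"
```

With a value like that in the job ad, the submit description could
carry the policy itself, for instance a periodic_hold or
periodic_remove expression that compares the attribute against
whatever threshold suits that particular job.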
-Michael Pelletier.