Subject: Re: [HTCondor-users] File last modification time or job last write() attribute?
From: Jose Caballero <jcaballero.hep@xxxxxxxxx> Date: 05/26/2016 02:24 PM
> Is it not possible in your case to have the actual job do it?
In the real world, timing out is not an option for
some tasks, so the code has no timeouts for those cases.
You can kill it off in the lab, of course, but it has to be done from
outside the job.
> Something like forking a separate process that watches over that file,
> and sends a signal to the main process when it does not see
> progress...
> That does not require any extra HTCondor feature, right? Would
> something like that work?
You can't fork a daemon from a +PreCmd, since all of its
processes are killed when the job starts, but you could do it in
a USER_JOB_WRAPPER.
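For illustration, a minimal sketch of the wrapper approach in Python. It
assumes the wrapper is invoked with the job's executable and arguments as
its own arguments, and instead of exec'ing the job it stays alive as the
job's parent, sending SIGTERM if the watched file's modification time stops
advancing. WATCHDOG_FILE, WATCHDOG_TIMEOUT and WATCHDOG_INTERVAL are
hypothetical environment variables made up for this example, not anything
HTCondor defines.

```python
#!/usr/bin/env python3
"""Sketch of a USER_JOB_WRAPPER that kills the job when a file stops changing.

WATCHDOG_FILE, WATCHDOG_TIMEOUT and WATCHDOG_INTERVAL are hypothetical
environment variables invented for this example.
"""
import os
import signal
import subprocess
import sys
import time


def is_stale(path, timeout, now=None):
    """True if `path` exists and was last modified more than `timeout` seconds ago."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # Not created yet: give the job the benefit of the doubt.
        return False
    return now - mtime > timeout


def run_with_watchdog(cmd, path, timeout, interval):
    """Run `cmd`, polling `path` every `interval` seconds; SIGTERM the job if it goes stale.

    Returns the job's return code (negative if it was killed by a signal).
    """
    job = subprocess.Popen(cmd)
    while job.poll() is None:
        if is_stale(path, timeout):
            job.send_signal(signal.SIGTERM)
            break
        time.sleep(interval)
    return job.wait()


# Assumed: the wrapper receives the job's executable and arguments as its own arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    rc = run_with_watchdog(
        sys.argv[1:],
        os.environ.get("WATCHDOG_FILE", "progress.log"),
        float(os.environ.get("WATCHDOG_TIMEOUT", "3600")),
        float(os.environ.get("WATCHDOG_INTERVAL", "60")),
    )
    # Map "killed by signal N" to the shell convention 128 + N.
    sys.exit(rc if rc >= 0 else 128 - rc)
```

A missing file is deliberately treated as "not stale", so a job that has
not created its output yet is left alone; whether that is the right default
depends on the job. The sketch also does not forward signals the wrapper
itself receives to the child, which a production wrapper would have to
think about.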
That might be preferable to a hook in some ways, but having an extra
process hanging around doing almost nothing rubs me the wrong way. I like
that the update_job_info hook is invoked automatically by the starter and
has minimal requirements and overhead. A wrapper-spawned daemon, though,
would avoid problems if STARTER_UPDATE_INTERVAL were set to an excessive
value, since the daemon would control its own polling interval, and it
could read a job attribute to let the user choose that interval.
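As a sketch of that job-attribute idea, a watchdog could read its interval
from the job ad that HTCondor writes into the job's scratch directory, whose
path is given by the _CONDOR_JOB_AD environment variable. The attribute
name WatchdogInterval (set with something like +WatchdogInterval = 300 in
the submit file), the 60-second default and the 10-second floor are
hypothetical choices for this example. The parser only handles simple
one-line "Name = value" entries and keeps values as raw strings.

```python
import os

DEFAULT_INTERVAL = 60.0   # hypothetical default, in seconds
MIN_INTERVAL = 10.0       # hypothetical floor so a user cannot request a busy loop


def parse_job_ad(text):
    """Parse simple one-line 'Name = value' ClassAd entries into a dict.

    Keys are lower-cased because ClassAd attribute names are case-insensitive;
    values are kept as raw, unevaluated strings.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            ad[name.strip().lower()] = value.strip()
    return ad


def load_job_ad():
    """Read the job ad named by _CONDOR_JOB_AD, or return {} if it is unavailable."""
    path = os.environ.get("_CONDOR_JOB_AD")
    if not path:
        return {}
    try:
        with open(path) as f:
            return parse_job_ad(f.read())
    except OSError:
        return {}


def watchdog_interval(ad):
    """Return the polling interval from the hypothetical WatchdogInterval attribute."""
    raw = ad.get("watchdoginterval")
    try:
        value = float(raw) if raw is not None else DEFAULT_INTERVAL
    except ValueError:
        # Non-numeric value (e.g. a quoted string): fall back to the default.
        value = DEFAULT_INTERVAL
    return max(value, MIN_INTERVAL)
```

Inside the wrapper, interval = watchdog_interval(load_job_ad()) would then
give the user control over the polling period while the floor keeps a bad
value from turning the watchdog into a busy loop.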
I like that my hook sets a job attribute rather than signaling the
process itself, since that lets whoever submits the job, rather than the
hook author, set the policy on what to do in a given scenario.
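A minimal sketch of such a hook in Python. It assumes the update_job_info
hook receives the job ad on standard input and that "Name = value" lines it
prints are merged back into the job ad; check the hook documentation for
your HTCondor version before relying on that. ProgressFile and
LastProgressTime are hypothetical attribute names, and the watched path is
assumed to be absolute or relative to the hook's working directory.

```python
#!/usr/bin/env python3
"""Sketch of an update_job_info hook that publishes a file's last modification time.

Assumptions (not verified against any particular HTCondor version):
  * the job ad arrives on stdin as 'Name = value' lines;
  * 'Name = value' lines printed on stdout are merged into the job ad;
  * ProgressFile / LastProgressTime are hypothetical attribute names.
"""
import os
import sys


def watched_path(ad_text):
    """Return the unquoted value of the ProgressFile attribute, or None."""
    for line in ad_text.splitlines():
        name, sep, value = line.partition("=")
        # ClassAd attribute names are case-insensitive.
        if sep and name.strip().lower() == "progressfile":
            return value.strip().strip('"') or None
    return None


def progress_update(ad_text):
    """Return a 'LastProgressTime = <epoch seconds>' line, or None if unavailable."""
    path = watched_path(ad_text)
    if path is None:
        return None
    try:
        mtime = int(os.stat(path).st_mtime)
    except OSError:
        # File not created yet or not readable: publish nothing this round.
        return None
    return f"LastProgressTime = {mtime}"


# Skip reading when stdin is an interactive terminal (i.e. run by hand).
if __name__ == "__main__" and not sys.stdin.isatty():
    update = progress_update(sys.stdin.read())
    if update is not None:
        print(update)
```

With the attribute in place, the policy stays on the submit side; for
instance, a periodic_hold expression comparing time() with LastProgressTime
could hold jobs whose file has not changed for an hour.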