Mailing List Archives
Re: [HTCondor-users] File last modification time or job last write() attribute?
- Date: Thu, 26 May 2016 14:23:40 -0400
- From: Jose Caballero <jcaballero.hep@xxxxxxxxx>
- Subject: Re: [HTCondor-users] File last modification time or job last write() attribute?
2016-05-26 14:06 GMT-04:00 Michael V Pelletier
<Michael.V.Pelletier@xxxxxxxxxxxx>:
> From: MIRON LIVNY <miron@xxxxxxxxxxx>
> Date: 05/26/2016 01:46 PM
>
>> You do not have an algorithm to decide when a job stopped making progress
>> based on its Output behavior after it consumed one hour of CPU time.
>>
>> What am I missing?
>
> Ah, I see what you're getting at now.
>
> Regardless of how much time the job has spent in slot, we can decide
> that it is hung and needs to be terminated if it has gone at least one
> hour (for example) without making any updates to a particular file.
>
> -Michael Pelletier.
Hi,
I usually don't follow threads in this forum very closely, but this
one caught my attention for a number of reasons.
Is it not possible in your case to have the job itself do it?
Something like forking a separate process that watches that file
and sends a signal to the main process when it sees no
progress...
That does not require any extra HTCondor feature, right? Would
something like that work?
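A minimal sketch of that in-job watchdog, in Python. It assumes a POSIX system and a job that records its progress by writing to a single file. The names watch_file and start_watchdog, the polling interval, and the use of SIGTERM are illustrative assumptions, not HTCondor features. The job forks a watcher child that polls the file's modification time and signals the parent once the file has gone stall_seconds without an update.

```python
import os
import signal
import time


def watch_file(path, target_pid, stall_seconds, poll_seconds=60.0,
               sig=signal.SIGTERM):
    """Signal target_pid if `path` goes stall_seconds without being modified.

    Returns True if the signal was sent, False if the target exited first.
    (Hypothetical helper for illustration.)
    """
    started = time.time()
    while True:
        try:
            os.kill(target_pid, 0)  # signal 0 only checks that the PID exists
        except ProcessLookupError:
            return False  # target is gone; nothing left to watch
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            mtime = started  # file not created yet: count from watcher start
        # Never count time from before the watcher started, so an old
        # pre-existing file does not trigger an immediate kill.
        last_progress = max(mtime, started)
        if time.time() - last_progress >= stall_seconds:
            os.kill(target_pid, sig)
            return True
        time.sleep(poll_seconds)


def start_watchdog(path, stall_seconds, poll_seconds=60.0):
    """Fork a watcher child that signals this (parent) process on a stall.

    Returns the watcher's PID in the parent; the child never returns.
    (Hypothetical helper for illustration.)
    """
    parent_pid = os.getpid()
    child_pid = os.fork()
    if child_pid == 0:
        # Watcher child: monitor the parent, then exit immediately without
        # running any of the parent's remaining code or cleanup handlers.
        watch_file(path, parent_pid, stall_seconds, poll_seconds)
        os._exit(0)
    return child_pid
```

The job would call something like start_watchdog("progress.log", stall_seconds=3600) once at startup, before its main loop. By default SIGTERM simply terminates the job; a job that wants to checkpoint or exit with a distinctive status could install its own handler instead. Liveness is checked by PID, so the usual caveat about PID reuse applies.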
Cheers,
Jose