
Re: [HTCondor-users] File last modification time or job last write() attribute?



Let me try again. 

My goal is to define the problem without using any HTCondor terminology or implementation details. 

Here is what I got so far. 

The person submitting the jobs knows the name of the file that records the progress of the job. 

You will consider terminating the job after one hour. 

You do not have an algorithm to decide when a job has stopped making progress, based on its output behavior, after it has consumed one hour of CPU time. 

What am I missing?

Miron


On May 26, 2016, at 19:56, Michael V Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx> wrote:

From: MIRON LIVNY <miron@xxxxxxxxxxx>
Date: 05/26/2016 02:30 AM

> Thank you student.

>
> OK. This is what I expected to hear. So, what is the expected run time of
> these applications and what is the expected frequency of progress reports?
> Also, do you know the file name where these reports are written?


The name of the progress file can be derived from aspects of the job
submission script. I've written a hook that works as expected, where you
invoke the hook and IO proxy, and set an attribute for the file
name, and it populates the CheckfileLastModifiedTime attribute with the
"mtime" timestamp of that file at 8 seconds and every five minutes.
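
As a rough illustration of the core step described above, the following
Python sketch reads a file's "mtime" and formats it as an attribute
assignment. It is not the hook from this message: the function name is
hypothetical, and how the value is delivered into the job ad is left out
entirely. Only the attribute name CheckfileLastModifiedTime comes from
the text.

```python
import os


def checkfile_mtime_ad_line(path, attr="CheckfileLastModifiedTime"):
    """Return 'attr = <mtime>' for the file at path, or None if it is missing.

    Hypothetical helper: the real hook's interface is not shown in the thread.
    """
    try:
        # st_mtime is seconds since the epoch; truncate to whole seconds.
        mtime = int(os.stat(path).st_mtime)
    except OSError:
        # The progress file may not exist yet early in the job's life.
        return None
    return f"{attr} = {mtime}"
```

Returning None for a missing file leaves it to the caller to decide whether
"no progress file yet" should count as stale or simply be skipped.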

I figured it would be best to keep this aspect of it very simple and
straightforward, and leave the policy-making to the submit description.

The five-minute interval of the update_job_info hook is good enough
resolution to update the value for my purposes, as we'd be looking to
take action only when the age started heading up to an hour or so
at this point.

A hold expression would need to be written carefully in order to account
for the possibility of a suspended job - you wouldn't want to trigger
a hold the moment a job is unsuspended just because it had been
suspended for a few hours. Maybe something referencing LastSuspensionTime?
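
One possible way to reason about the suspension case, sketched in Python
rather than as an actual hold expression: measure the age of the progress
file from whichever is later, the file's "mtime" or the time the job last
resumed. The function and the last_resume_time input are hypothetical;
how a resume time would be derived from the job ad (for example from
LastSuspensionTime) is left open here, and the one-hour threshold is taken
from the discussion above.

```python
def progress_is_stale(now, checkfile_mtime, last_resume_time=0, max_age=3600):
    """Return True if no progress has been seen for more than max_age seconds.

    All times are seconds since the epoch. last_resume_time is a
    hypothetical input: the moment the job last came out of suspension,
    or 0 if it was never suspended.
    """
    # Restart the staleness clock when the job resumes, so time spent
    # suspended is not counted against it.
    reference = max(checkfile_mtime, last_resume_time)
    return (now - reference) > max_age
```

With this rule, a job that was suspended for several hours gets a full
max_age window after resuming before it can be considered stale.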

        -Michael Pelletier.
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/