HTCondor Project List Archives


Re: [Condor-devel] proposal: PREEMPT_AND_HOLD

Date: Mon, 03 Mar 2008 14:15:10 -0600
From: Dan Bradley <dan@xxxxxxxxxxxx>
Subject: Re: [Condor-devel] proposal: PREEMPT_AND_HOLD



Erik Paulson wrote:

On Mon, Mar 03, 2008 at 01:17:14PM -0600, Dan Bradley wrote:
Whether it should _stay_ on hold is up to the user/schedd-controlled policy. The hold mechanism is just a well-defined state to put the job in after an exception, in my view. We already do this for other events, such as failure to exec the job, missing input file, missing output file.
Exceeding memory is not an exceptional event - trying it again somewhere
else could succeed, especially since our memory usage data is so out of touch with reality.

Exceeding the amount of memory that the user specified in their memory requirements is an exceptional event, in my opinion. We have no reason to believe that matchmaking will do a better job of putting the job on a bigger memory machine. It could just loop forever. Therefore, it is very likely a malformed job.
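
To make the "loop forever" concern concrete, here is a minimal sketch in Python. It is a toy model, not Condor code: Job, match and run_with_requeue are hypothetical names, and it assumes that matchmaking only sees the declared memory requirement and that the job is evicted whenever its real usage exceeds that declaration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    requested_mb: int  # memory the user declared in the job's requirements
    actual_mb: int     # memory the job really uses at run time


def match(job, machine_sizes_mb):
    """Toy matchmaker: pick the first machine that satisfies the declared
    requirement. It never sees the job's real usage."""
    for size in machine_sizes_mb:
        if size >= job.requested_mb:
            return size
    return None


def run_with_requeue(job, machine_sizes_mb, max_attempts=5):
    """Requeue the job after every eviction, with unchanged requirements."""
    for attempt in range(1, max_attempts + 1):
        if match(job, machine_sizes_mb) is None:
            return ("unmatched", attempt)
        # Assumed eviction rule: the job is thrown off as soon as it
        # exceeds the memory it declared, whatever the machine size.
        if job.actual_mb <= job.requested_mb:
            return ("completed", attempt)
    return ("still evicted", max_attempts)
```

Under these assumptions, a job that declares 1024 MB but uses 2048 MB is evicted on every attempt, because nothing in the requeue path changes what matchmaking is asked for. If eviction instead depended on the machine's real capacity, a retry elsewhere could succeed, which is the other side of the argument.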

The hold state already provides a way to feed back information about an exception and it provides policy controls, both globally in the schedd, and on a per-job basis. I can imagine an additional mechanism that is similar to this but which has slightly different behavior. Do you think this is really necessary?
I don't think the startd should _ever_ be able to put the job on hold.
I think the startd should return information about why a job is being
thrown off a machine - but even in the case like "failed to exec", it should
be the schedd that decides the fate of the job, with sane defaults.

I agree with this in principle, but in practice, if the sane defaults are in fact to put the job on hold and then use the hold policy to decide what to do, then it comes down to an implementation detail.
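
The two-stage flow discussed here (the startd reports why the job was thrown off, the schedd decides its fate, and the hold policy decides whether it stays on hold) could be sketched in Python as follows. This is an illustration only: the reason strings, DEFAULT_POLICY, schedd_decide and hold_policy_releases are all hypothetical and do not correspond to actual Condor identifiers or defaults.

```python
from enum import Enum


class Action(Enum):
    REQUEUE = "requeue"
    HOLD = "hold"


# Hypothetical "sane defaults" on the schedd side: the startd only
# reports a reason, and this table maps it to the job's fate.
DEFAULT_POLICY = {
    "exec_failed": Action.HOLD,
    "missing_input": Action.HOLD,
    "memory_exceeded": Action.HOLD,
    "preempted_by_owner": Action.REQUEUE,
}


def schedd_decide(reason, policy=DEFAULT_POLICY):
    """Decide what happens to an evicted job; unknown reasons are requeued."""
    return policy.get(reason, Action.REQUEUE)


def hold_policy_releases(reason, hold_count, max_retries=3):
    """Assumed per-job hold policy: retry memory evictions a few times,
    and leave every other held job for the user to inspect."""
    return reason == "memory_exceeded" and hold_count < max_retries
```

With defaults like these, whether the startd or the schedd nominally places the hold makes no difference to the outcome, since the job ends up on hold either way and the hold policy then decides what happens next.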


--Dan

Follow-Ups:
- Re: [Condor-devel] proposal: PREEMPT_AND_HOLD
  - From: Derek Wright

References:
- [Condor-devel] proposal: PREEMPT_AND_HOLD
  - From: Dan Bradley
- Re: [Condor-devel] proposal: PREEMPT_AND_HOLD
  - From: Erik Paulson
- Re: [Condor-devel] proposal: PREEMPT_AND_HOLD
  - From: Dan Bradley
- Re: [Condor-devel] proposal: PREEMPT_AND_HOLD
  - From: Erik Paulson
