Re: [HTCondor-users] Defrag shall not preemt jobs
- Date: Mon, 19 Oct 2020 15:40:27 -0500
- From: Mark Coatsworth <coatsworth@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Defrag shall not preemt jobs
Hi Andreas,
Can you try setting MAXJOBRETIREMENTTIME in your startd configuration to a
large value? With graceful draining, a running job is given its retirement
time first, and MaxVacateTime only starts counting after that, so your
partitionable slot should drain the way you want.
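
As a rough sketch of that suggestion, something like the following could go
into the startd configuration on the execute nodes. The one-week value is
only an illustrative assumption (pick something longer than your longest
expected job); DEFRAG_DRAINING_SCHEDULE is shown as already set in the
original message.

```
# Sketch only; the value below is an assumption, not a setting from this thread.
# Time in seconds a running job may keep running before a graceful drain
# is allowed to vacate it. Here: one week.
MAXJOBRETIREMENTTIME = 7 * 24 * 60 * 60

# Already in use at the site according to the original message (defrag daemon
# configuration): graceful draining honors the retirement time above.
DEFRAG_DRAINING_SCHEDULE = graceful
```

After editing, running condor_reconfig on the node should pick up the change,
and condor_config_val MAXJOBRETIREMENTTIME can be used to check the configured
value.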
Mark
On Mon, Oct 19, 2020 at 4:53 AM Andreas Haupt <andreas.haupt@xxxxxxx> wrote:
>
> Dear list,
>
> I just stumbled over an increased failure rate of ATLAS jobs at our
> site. ATLAS is running a mixture of single-core & multi-core jobs. To
> keep multi-core jobs from starving, condor_defrag runs.
>
> It looks like condor_defrag is evicting single-core jobs, giving them only
> MaxVacateTime to finish (DEFRAG_DRAINING_SCHEDULE = graceful):
>
> 10/18/20 19:19:53 slot1_2[33437.0]: max vacate time expired. Escalating to a fast shutdown of the job.
> 10/18/20 19:19:53 slot1_1[74229.0]: max vacate time expired. Escalating to a fast shutdown of the job.
>
> However, this is unwanted! It actually kills jobs here.
>
> There's probably a knob for it - but which one do I need to turn to
> just drain the (partitionable) slot until enough resources for the
> usual eight-core jobs are freed (without actively vacating running jobs
> from the chosen system)?
>
> Thanks,
> Andreas
> --
> | Andreas Haupt | E-Mail: andreas.haupt@xxxxxxx
> | DESY Zeuthen | WWW: http://www-zeuthen.desy.de/~ahaupt
> | Platanenallee 6 | Phone: +49/33762/7-7359
> | D-15738 Zeuthen | Fax: +49/33762/7-7216
--
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison