Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] How to handle hibernated machines failing to wakeup?

Date: Tue, 2 Dec 2025 14:52:04 -0600 (CST)
From: Todd L Miller <tlmiller@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] How to handle hibernated machines failing to wakeup?

> (a) let the matchmaker forget about the assignment of the job to that
>    machine (there must be a timeout somewhere?) and

I suspect there isn't a memory -- although I could be wrong -- and
that the problem, as you suggest below, is that the unwakeable machine(s)
sort to the same position every time, so if you have k jobs and k
unwakeable machines, you won't ever wake a machine.
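
The "same position every time" effect can be illustrated with a toy model. This is only a sketch: `Machine`, `simulate`, and the rank values are made up for illustration and do not reflect how the negotiator is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    rank: float      # hypothetical stand-in for the pre-job rank
    wakeable: bool   # False models a machine that never comes back


def simulate(machines, jobs, cycles):
    """Each cycle, try to wake the `jobs` top-ranked machines.

    Returns the set of machine names that actually woke up.
    """
    woken = set()
    for _ in range(cycles):
        # Ranks do not change between cycles, so the order is identical
        # every time and the same machines are picked again.
        ordered = sorted(machines, key=lambda m: m.rank, reverse=True)
        for m in ordered[:jobs]:
            if m.wakeable:
                woken.add(m.name)
    return woken


pool = [
    Machine("dead1", 30.0, False),
    Machine("dead2", 20.0, False),
    Machine("ok1", 10.0, True),
    Machine("ok2", 5.0, True),
]

# Two jobs, two unwakeable machines at the top: nothing ever wakes.
stuck = simulate(pool, jobs=2, cycles=100)
# A third job reaches past the unwakeable machines.
unstuck = simulate(pool, jobs=3, cycles=1)
```

In this model the only ways out are more jobs than unwakeable machines or a rank that changes after a failed wake-up, which is what (b) is after.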

> (b) modify the NEGOTIATOR_PRE_JOB_RANK (I suppose this is the right one)
>    to reorder Offline machines so this particular one gets ranked down
>    /excluded in the next cycle (as long as there are other machines...)
>
> (Could MachineLastMatchTime be used for (b)?


	Probably.

> How to balance it against LastHeardFrom which is already used to get even
> "wear"?

Assuming your wear-balancing is `+(k * (time() - LastHeardFrom))`,
where `k` is a scaling factor depending on what else is in
NEGOTIATOR_PRE_JOB_RANK, you probably want
`-(l * (time() - MachineLastMatchTime))`, where `l` is a (positive,
nonzero) constant less than `k`, so as not to overwhelm it.
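
To get a feel for how the two terms trade off, the expression can be evaluated outside of HTCondor first. The sketch below just adds the two terms as written; the function name `pre_job_rank`, the values of `k` and `l`, and the timestamps are all made up for illustration.

```python
def pre_job_rank(now, last_heard_from, last_match_time, k=1.0, l=0.25):
    """Evaluate the two rank terms from the text for one machine.

    `now` plays the role of time(); all arguments are in seconds.
    """
    wear = k * (now - last_heard_from)            # +(k * (time() - LastHeardFrom))
    match_term = -(l * (now - last_match_time))   # -(l * (time() - MachineLastMatchTime))
    return wear + match_term


now = 10_000

# Machine A: silent for 6000 s, last matched 1000 s ago.
rank_a = pre_job_rank(now, last_heard_from=4_000, last_match_time=9_000)
# Machine B: silent for 4000 s, last matched 5000 s ago.
rank_b = pre_job_rank(now, last_heard_from=6_000, last_match_time=5_000)

print(rank_a, rank_b)  # 5750.0 2750.0
```

With these numbers the more recently matched machine keeps the higher rank, because the second term subtracts less from it. It is worth running values from a real pool through the expression to confirm that the resulting order is the one wanted before putting it in the configuration.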

> What else comes to mind?)

The wake-up script could record the last (k) time(s) it tried to
wake up a given machine and set the unwakeable machine's START expression
to FALSE?
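
The bookkeeping half of that idea might look roughly like the following. It is a sketch under assumptions: `WakeLog`, its methods, and the machine names are hypothetical, and the step that actually sets START to FALSE for the machine is deliberately left out.

```python
from collections import defaultdict, deque


class WakeLog:
    """Remember the last `max_failures` wake-up attempts per machine."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        # One bounded history per machine; old attempts fall off the front.
        self._attempts = defaultdict(lambda: deque(maxlen=max_failures))

    def record(self, machine, timestamp, woke):
        """Store one attempt: when it happened and whether the machine woke."""
        self._attempts[machine].append((timestamp, woke))

    def is_unwakeable(self, machine):
        """True once the last `max_failures` attempts have all failed."""
        attempts = self._attempts.get(machine, ())
        return (len(attempts) == self.max_failures
                and not any(woke for _, woke in attempts))


log = WakeLog(max_failures=3)
for t in (100, 200, 300):
    log.record("node17", t, woke=False)
log.record("node18", 100, woke=False)
log.record("node18", 200, woke=True)

# Machines flagged here are the candidates for START = FALSE.
flagged = [m for m in ("node17", "node18") if log.is_unwakeable(m)]
```

Because the history is bounded, a single later success clears the flag on its own, so a machine that gets repaired does not need to be re-enabled by hand in the log.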


-- ToddM

Follow-Ups:
- Re: [HTCondor-users] How to handle hibernated machines failing to wakeup?
  - From: Steffen Grunewald

References:
- [HTCondor-users] How to handle hibernated machines failing to wakeup?
  - From: Steffen Grunewald
