[HTCondor-users] Need help debugging HIBERNATE
- Date: Mon, 17 Nov 2025 11:29:56 +0100
- From: Steffen Grunewald <steffen.grunewald@xxxxxxxxxx>
- Subject: [HTCondor-users] Need help debugging HIBERNATE
Good morning,
I'm in the middle of some tests to "hibernate" (which might include going
to full S5 state) and "wake up" HTCondor EPs, and need some suggestions to
fill in what I feel are gaps in the documentation.
To start with, I have the following in the config (I'm dropping ROOSTER
stuff and intermediate definitions for clarity):
# condor_config_val -dump -expand | grep -i Hiber
HIBERNATE = ifThenElse(( (State == "Unclaimed") && ( ((time() - EnteredCurrentState) > (30 * 60)) || ((time() - NumDynamicSlotsTime) > (30 * 60)) ) && ((time() - DaemonStartTime) > (6 * 3600)) ), "SHUTDOWN", "NONE")
HIBERNATE_CHECK_INTERVAL = (5 * 60)
HIBERNATION_OVERRIDE_WOL = True
LINUX_HIBERNATION_METHOD = "/sys"
In short, I'd like to leave a machine on and running for at least 6 hours
after it was powered up (which would set the DaemonStartTime), and also
for 30 minutes after becoming fully Unclaimed. This expression evaluates
to "SHUTDOWN" for a few machines, but I cannot see anything happening in
the STARTD log (nor on the central manager).
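
To make the intended policy easier to check by hand, here is a minimal
Python sketch of the same logic as the HIBERNATE expression above. The
function name and argument names are made up for illustration, all
timestamps are assumed to be defined epoch seconds, and ClassAd UNDEFINED
semantics are not modelled. Note that the two idle timers are joined with
||, so either one exceeding 30 minutes is enough.

```python
def hibernate_target(state, now, entered_current_state,
                     num_dynamic_slots_time, daemon_start_time):
    """Mirror the HIBERNATE expression with plain integers (epoch seconds).

    Hypothetical helper for reasoning about the policy; it is not part of
    HTCondor and assumes every attribute is defined.
    """
    idle_limit = 30 * 60       # 30 minutes, as in the config
    uptime_limit = 6 * 3600    # 6 hours, as in the config

    # Either idle timer being exceeded satisfies the condition (logical OR).
    idle_long_enough = (
        (now - entered_current_state) > idle_limit
        or (now - num_dynamic_slots_time) > idle_limit
    )
    up_long_enough = (now - daemon_start_time) > uptime_limit

    if state == "Unclaimed" and idle_long_enough and up_long_enough:
        return "SHUTDOWN"
    return "NONE"
```

Feeding it the values from a slot ad (State, EnteredCurrentState,
NumDynamicSlotsTime, DaemonStartTime) and the current time should give the
same answer as the condor_status one-liner, as long as none of the
attributes is undefined.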
I don't have pm-suspend installed but would like to use systemd's features,
and I seriously doubt that WOL would work with my hardware.
Running
# condor_status -f "%s:" Machine -af State 'ifThenElse(((State == "Unclaimed") && (((time()-EnteredCurrentState) > (30*60)) || ((time()-NumDynamicSlotsTime) > (30*60))) && ((time()-DaemonStartTime) > (6*3600))), "SHUTDOWN", "NONE")' | dshbak -c
indeed shows some nodes with "SHUTDOWN" - they've been unused for a couple
of days now.
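
As a rough alternative to eyeballing the dshbak output, the per-slot lines
could be grouped per host with a few lines of Python. This is only a
sketch: it assumes each input line looks like "host:State VALUE" (which is
what the -f/-af combination above should print), the function name is
invented, and treating a host as a candidate only when every listed slot
shows SHUTDOWN is a choice made for this example, not a statement about
how the startd combines slots.

```python
from collections import defaultdict


def hosts_all_shutdown(lines):
    """Return sorted hosts whose every slot line ends in SHUTDOWN.

    Assumed input format (one line per slot): "host:State VALUE".
    Lines that do not match this shape are skipped.
    """
    per_host = defaultdict(list)
    for line in lines:
        line = line.strip()
        if ":" not in line:
            continue
        host, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 2:
            continue
        # Last field is the evaluated expression; drop quotes if present.
        per_host[host].append(fields[-1].strip('"'))
    return sorted(
        host for host, values in per_host.items()
        if all(value == "SHUTDOWN" for value in values)
    )
```

It could be fed from the pipeline above in place of dshbak, for example
by passing sys.stdin as the lines argument.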
I'm wondering whether the periodic HIBERNATE check (every
HIBERNATE_CHECK_INTERVAL) is run at all, as I couldn't find any hint in
the STARTD logs (with the exception of
"HibernationManager: Hibernation is enabled" of course).
So question #1 is: What *_DEBUG setting do I need to see the check happening,
and its outcome, without getting flooded (which D_FULLDEBUG would do)?
And where exactly should I look, and for what, if not in the StartdLog?
(#2 will be added soon, I'm afraid.)
Thanks,
Steffen