Mailing List Archives
Re: [HTCondor-users] Failed to run hibernation plugin
- Date: Wed, 8 Nov 2023 22:12:36 +0100 (CET)
- From: "Beyer, Christoph" <christoph.beyer@xxxxxxx>
- Subject: Re: [HTCondor-users] Failed to run hibernation plugin
Hi,
The easiest thing is to replace the power plugin with something that is proven to work with your setup. In essence, we use this:
(begin)
#!/bin/bash
# "ad": print the hibernation attributes this plugin advertises.
if [[ "$1" == "ad" ]]
then
    echo "HibernationMethod = \"DESY-utils\""
    HibernationMethod="DESY-utils"
    echo "HibernationRawMask = 8"
    HibernationRawMask="8"
    echo "HibernationSupportedStates = \"S5\""
    HibernationSupportedStates="S5"
fi
# "set S5": power the machine off.
if [[ "$*" == "set S5" ]]
then
    sudo /sbin/poweroff
fi
(end)
Use HIBERNATION_PLUGIN = [ ... ] to point to your replacement script.
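Since the original question asks about the "systemctl hibernate" method, the same idea could be extended roughly as follows. This is only a sketch under assumptions, not the stock condor_power_state and not what we run: the function names (run, power_plugin), the DRY_RUN switch and the method string "systemctl-sketch" are made up for illustration. It assumes the plugin is called with either "ad" or "set <state>", as in the log lines quoted below, and the mask 28 is simply the value the stock plugin reports for "S3,S4,S5" in the original message.

```shell
#!/bin/bash
# Hypothetical replacement power plugin (illustrative sketch only).
# Set DRY_RUN=1 to print the command instead of running it.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "would run: $*"
    else
        sudo "$@"
    fi
}

power_plugin() {
    case "$*" in
        ad)
            # Attributes advertised to HTCondor; the mask is copied from the
            # stock plugin's output for S3,S4,S5 quoted in this thread.
            echo 'HibernationMethod = "systemctl-sketch"'
            echo 'HibernationRawMask = 28'
            echo 'HibernationSupportedStates = "S3,S4,S5"'
            ;;
        "set S3") run systemctl suspend ;;
        "set S4") run systemctl hibernate ;;
        "set S5") run systemctl poweroff ;;
        *)
            echo "usage: $0 ad | set S3|S4|S5" >&2
            return 1
            ;;
    esac
}

# Dispatch only when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    power_plugin "$@"
fi
```

Running it by hand with DRY_RUN=1 and the arguments "set S4" should print the systemctl command without touching the machine, which makes it easy to check the dispatch before pointing HIBERNATION_PLUGIN at it.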
best
christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
----- Original Message -----
From: "Justin Killebrew via HTCondor-users" <htcondor-users@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
CC: "Justin Killebrew" <jk@xxxxxxx>
Sent: Wednesday, 8 November 2023 15:24:14
Subject: [HTCondor-users] Failed to run hibernation plugin
Hello. I'm wrestling with power management on Ubuntu 22.04. The execution point StartLog shows this error:
11/08/23 08:30:44 ResMgr: This machine is about to enter hibernation
11/08/23 08:30:44 Failed to run hibernation plugin '/usr/libexec/condor/condor_power_state set S3': status = 63744
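The status value in that log line can be unpacked with a little shell arithmetic. This is a sketch that assumes the number is a raw wait(2)-style status (exit code in the high byte, terminating signal in the low seven bits); the function name decode_status is made up for illustration.

```shell
# Split a raw wait(2)-style status into exit code and signal number.
# Assumption: the value logged by the startd is such a raw status.
decode_status() {
    status=$1
    exit_code=$(( (status >> 8) & 255 ))
    signal=$(( status & 127 ))
    echo "exit code: $exit_code, signal: $signal"
}

decode_status 63744
```

Under that assumption, 63744 corresponds to the plugin exiting on its own with a non-zero exit code rather than being killed by a signal.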
Hibernation is supported:
/usr/libexec/condor$ sudo ./condor_power_state ad
HibernationMethod = "/sys"
HibernationRawMask = 28
HibernationSupportedStates = "S3,S4,S5"
I can suspend/hibernate from the command line using:
$ sudo systemctl hibernate
But condor_power_state fails for both S3 and S4:
/usr/libexec/condor$ sudo ./condor_power_state -d set s4
11/08/23 08:24:55 LinuxHibernator: Error writing 'disk' to '/sys/power/state': Input/output error
condor_power_state: failed to switch the machine's power state.
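One way to narrow down a failed write to /sys/power/state is to look at what the kernel itself advertises there. The sketch below only reads the standard sysfs files /sys/power/state and /sys/power/disk; the function name show_sleep_support and its optional path arguments are made up for illustration, and it does not change any power state.

```shell
# Print the sleep states and hibernation modes the kernel advertises.
# Optional arguments override the file paths (handy for testing).
show_sleep_support() {
    state_file=${1:-/sys/power/state}
    disk_file=${2:-/sys/power/disk}
    for f in "$state_file" "$disk_file"; do
        if [ -r "$f" ]; then
            echo "$f: $(cat "$f")"
        else
            echo "$f: not readable"
        fi
    done
}

show_sleep_support
```

If "disk" is missing from the first file, the kernel does not offer hibernation through this interface at all; if it is listed, the entry in brackets in the second file shows which hibernation mode is currently selected.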
Here's the relevant portion of the EP config:
# Power management
WOL_SUPPORTED = TRUE
HIBERNATE_CHECK_INTERVAL = 20
TimeToWait = 120
ShouldHibernate = ( (State == "Unclaimed") \
&& ($(StateTimer) > $(TimeToWait)) \
&& ($(WOL_SUPPORTED)))
HibernateState = "RAM"
HIBERNATE = ifThenElse( $(ShouldHibernate), $(HibernateState), "NONE" )
The central manager seems to be correct:
RoosterLog:
11/08/23 06:21:00 Will perform unhibernate checks every ROOSTER_INTERVAL=180 seconds.
And the relevant CM config:
# Rooster wakes nodes up
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, ROOSTER, SHARED_PORT
COLLECTOR_PERSISTENT_AD_LOG = /var/log/condor/PersistentAdLog
ABSENT_REQUIREMENTS = ( (HibernationLevel?:0) == 0 )
EXPIRE_INVALIDATED_ADS = True
CLASSAD_LIFETIME = 900
# 604800s is 7 days
ABSENT_EXPIRE_ADS_AFTER = 604800
OFFLINE_EXPIRE_ADS_AFTER = 604800
ROOSTER_INTERVAL = 180
ROOSTER_UNHIBERNATE = ( Offline && Unhibernate )
ROOSTER_UNHIBERNATE_RANK = buf_cpuindex_avg
How do I debug condor_power_state? Does condor_power_state support the "systemctl hibernate" method?
Thanks,
JK
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/