Re: [HTCondor-users] power management: ROOSTER_UNHIBERNATE not working
- Date: Wed, 15 Nov 2023 20:03:50 +0100 (CET)
- From: "Beyer, Christoph" <christoph.beyer@xxxxxxx>
- Subject: Re: [HTCondor-users] power management: ROOSTER_UNHIBERNATE not working
Hi Justin,
the ClassAd of the hibernated machine needs to contain Offline = true.
What does `condor_status <hibernated-host> -af Offline` say?
The second requirement for unhibernating is that there needs to be a match for the machine:
MachineLastMatchTime =!= UNDEFINED
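As a rough sketch of how the two conditions fit together, and assuming the defaults described in the HTCondor manual apply to your version (please verify with condor_config_val rather than taking my word for it):

```
# Assumed defaults, not copied from your setup. Check them with:
#   condor_config_val ROOSTER_UNHIBERNATE   (on the machine running condor_rooster)
#   condor_config_val UNHIBERNATE           (on the execute machine)

# condor_rooster only tries to wake machines whose ad satisfies this expression
ROOSTER_UNHIBERNATE = Offline && Unhibernate

# Evaluated for the offline machine ad; true once the machine has been matched
UNHIBERNATE = MachineLastMatchTime =!= UNDEFINED
```

Note that a local setting of ROOSTER_UNHIBERNATE replaces the first expression entirely, so the match condition is only applied if it is part of whatever expression you configure.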
Best
christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
----- Original Message -----
From: "Justin Killebrew via HTCondor-users" <htcondor-users@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
CC: "Justin Killebrew" <jk@xxxxxxx>
Sent: Wednesday, 15 November 2023 16:58:54
Subject: [HTCondor-users] power management: ROOSTER_UNHIBERNATE not working
Hello.
My test machine, bench7, is hibernating as configured but when I submit jobs that should match it, the rooster doesn't try to unhibernate.
Relevant excerpts:
RoosterLog:
11/15/23 06:09:46 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
11/15/23 06:09:46 Will perform unhibernate checks every ROOSTER_INTERVAL=180 seconds.
11/15/23 09:42:12 Cock-a-doodle-doo! (Time to look for machines to wake up.)
11/15/23 09:42:12 Trying to query collector <127.0.1.1:9618?alias=bench12.timehole.org>
11/15/23 09:42:12 Got 0 startd ads matching ROOSTER_UNHIBERNATE=Offline
PersistentAdLog:
103 <slot1@xxxxxxxxxxxxxxxxxxx> Offline true
bench7 config:
# Power management
HIBERNATE_CHECK_INTERVAL = 300
# (2 * $(HOUR))
TimeToWait = 300
ShouldHibernate = ( (State == "Unclaimed") \
&& ($(StateTimer) > $(TimeToWait)) \
&& (KeyboardIdle > $(TimeToWait)))
# this param is passed to the script so use the string "S5"
HibernateState = "S5"
#
HIBERNATE = ifThenElse( $(ShouldHibernate), $(HibernateState), "NONE" )
# point to my hibernation script
use HIBERNATION_PLUGIN = "/home/justin/jkcode/scripts/JKSuspend.sh"
CM config:
COLLECTOR_PERSISTENT_AD_LOG = /var/log/condor/PersistentAdLog
ABSENT_REQUIREMENTS = ( (HibernationLevel?:0) == 0 )
EXPIRE_INVALIDATED_ADS = True
CLASSAD_LIFETIME = 900
# 604800s is 7 days
ABSENT_EXPIRE_ADS_AFTER = 604800
OFFLINE_EXPIRE_ADS_AFTER = 604800
ROOSTER_INTERVAL = 180
ROOSTER_DEBUG = D_FULLDEBUG
ROOSTER_UNHIBERNATE = Offline
Is this the problem:
11/15/23 09:42:12 Got 0 startd ads matching ROOSTER_UNHIBERNATE=Offline
How do I troubleshoot and fix this?
Thanks,
JK