Re: [Condor-users] offline compute nodes and Rooster




On 10/17/10 10:48 AM, Paul Haldane wrote:
2. Hacked together a script using condor_advertise to publish ads for offline machines.  This works and, with a sensible setting for ROOSTER_UNHIBERNATE, leads to hibernating machines being woken up by Rooster to service jobs.  The remaining problem was that the ads disappeared after about 20 minutes.  A bit more poking around took me back to Ian's message to the list (https://lists.cs.wisc.edu/archive/condor-users/2010-January/msg00148.shtml).  Adding ClassAdLifetime to the published ad seems to have done the trick (at least the test machine has stayed visible for over 25 minutes).
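
For readers following along, the approach described in the quoted message might look roughly like the sketch below. It is not Paul's actual script: the function names, the machine name, and the lifetime value are made up for illustration, and a real offline ad would need many more machine attributes before jobs could match against it. Only Offline and ClassAdLifetime come from this thread.

```python
import subprocess
import tempfile


def build_offline_ad(machine_name, lifetime_seconds):
    """Return ClassAd text for a machine that should appear as offline.

    Hypothetical helper: only Offline and ClassAdLifetime come from the
    thread. A usable ad needs further machine attributes (omitted here).
    """
    lines = [
        'MyType = "Machine"',
        f'Name = "{machine_name}"',
        "Offline = true",
        # Without an explicit lifetime the ad reportedly expired after
        # roughly 20 minutes.
        f"ClassAdLifetime = {int(lifetime_seconds)}",
    ]
    return "\n".join(lines) + "\n"


def advertise(ad_text):
    """Write the ad to a temporary file and publish it with condor_advertise.

    Assumes condor_advertise is on PATH and the caller is allowed to
    advertise startd ads to the collector.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".ad", delete=False) as f:
        f.write(ad_text)
        path = f.name
    subprocess.run(["condor_advertise", "UPDATE_STARTD_AD", path], check=True)
```

Calling advertise(build_offline_ad("node1.example.org", 86400)) would then publish one such ad; the one-day lifetime is an arbitrary choice.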
I've just looked at the implementation of OFFLINE_EXPIRE_ADS_AFTER.  
Strangely, it only has any effect if the ad is advertised via the 
command UPDATE_STARTD_AD_WITH_ACK and Offline is not set to true in the 
ad that is sent to the collector.  The collector then sets Offline=true 
and overrides a bunch of other stuff too, including ClassAdLifetime.  In 
all other cases, ClassAdLifetime is just preserved as is in the ad.
This certainly doesn't match the documented behavior.  I'm looking into 
what should be done about it.
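
To make the behavior described above concrete, here is a toy model in Python. It is not the collector source: the function name is invented, and it assumes the overridden ClassAdLifetime is set to the OFFLINE_EXPIRE_ADS_AFTER value, which seems likely but is not stated above. The other attributes the collector overrides are left out.

```python
def collector_store_ad(command, ad, offline_expire_ads_after):
    """Toy model of how the collector treats an incoming startd ad.

    `ad` is a plain dict standing in for a ClassAd. Returns the ad as the
    collector would store it, according to the description in this message.
    """
    stored = dict(ad)
    if command == "UPDATE_STARTD_AD_WITH_ACK" and stored.get("Offline") is not True:
        # The only path on which OFFLINE_EXPIRE_ADS_AFTER has any effect:
        # the collector marks the ad offline and overrides the lifetime
        # (assumed here to become the configured value).
        stored["Offline"] = True
        stored["ClassAdLifetime"] = offline_expire_ads_after
    # In every other case ClassAdLifetime is preserved exactly as sent.
    return stored


# An ad published via condor_advertise with Offline already set keeps its
# own lifetime, so the configuration knob is ignored.
sent = {"Name": "node1", "Offline": True, "ClassAdLifetime": 1500}
kept = collector_store_ad("UPDATE_STARTD_AD", sent, 2592000)
```

In this model, an ad that arrives with Offline already set to true never picks up the configured expiry, which fits the hand-published ads needing an explicit ClassAdLifetime.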
--Dan