
Re: [HTCondor-users] Error with condor_power



On Wed, 2026-03-11 at 11:40 +0100, Beyer, Christoph wrote:
> Hi,
> 
> the most likely problem here is that the machine sends a last ClassAd update as it powers down, overwriting the previous offline state. That is a known issue, but to my knowledge it has not been fixed yet (?)
> 
> Try setting the shutdown script on the worker to: 
> 
> [root@batch1064 ~]# grep -i kill /etc/systemd/system/condor.service.d/01-condor-basic-overwrites.conf
> # send sigkill instead of sigterm
> KillSignal=SIGKILL
> 
> (SIGKILL instead of SIGTERM)
> 
> (this is on RH-like systems; you will need to find the equivalent script on Ubuntu-like systems ...)
> 
> Not pretty, but it will preserve the offline state ...
> 
> Best
> christoph
>  

Hi Christoph,
that worked, I have set KillSignal=SIGKILL:

Matched 66.0 sel@xxxxxxxx <10.10.0.47:9618?addrs=10.10.0.47-9618&alias=t450.sel&noUDP&sock=schedd_997_7003> preempting none <10.10.0.49:9618?addrs=10.10.0.49-9618&alias=master03.sel&noUDP&sock=startd_1031_c23b> slot2@xxxxxxxxxxxx (offline)
Successfully matched with slot2@xxxxxxxxxxxx (offline)
Job 66.0 (delivered=1) matched to offline machine slot2@xxxxxxxxxxxxx
Got 1 startd ads matching ROOSTER_UNHIBERNATE=Offline && Unhibernate
Sending wakeup call to slot2@xxxxxxxxxxxxx
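
For anyone finding this thread later, here is a minimal sketch of what the complete drop-in might look like. The file path is the one from Christoph's grep output. The [Service] section header is an assumption on my part: the grep only showed the two KillSignal lines, but systemd expects KillSignal= under [Service]. The path may also differ on other distributions or if the unit is not named condor.service.

```ini
# /etc/systemd/system/condor.service.d/01-condor-basic-overwrites.conf
# Section header assumed; the quoted grep output did not show it.
[Service]
# send sigkill instead of sigterm, so the daemons get no chance to send
# a final ClassAd update that would overwrite the offline ad
KillSignal=SIGKILL
```

After creating or changing the file, systemd has to re-read its unit files (systemctl daemon-reload) before the override takes effect. As noted above, this is a workaround rather than a clean fix, since the daemons are killed without a graceful shutdown.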


Thank you everyone, this is a big step forward for me.