
[HTCondor-users] Windows Claim Deactivation



Hello!


With HTCondor v24.0.6, I am seeing unexpected behavior when attempting to deactivate a claim for a statically provisioned Windows Server 2019 execution point with ENABLE_STARTD_DAEMON_AD set to False. I have many platform-agnostic executables that specify kill_sig=SIGINT as part of their submission. After migrating to newer versions, removing claims on Windows execution points stopped working, even though the docs state that Windows does not consider kill_sig. Here is some example logging I see:


==> StarterLog.slot1 <==

(pid:1084) Got SIGTERM. Performing graceful shutdown.

(pid:1084) ShutdownGraceful all jobs.

(pid:1084) Send_Signal: ERROR Attempt to send signal 2 to pid 6064, but pid 6064 has no command socket # This is the job's PID

(pid:1084) Send (softkill) signal failed, retrying...


==> StartLog <==

slot1: State change: received VACATE_CLAIM command

slot1: Changing activity: Busy -> Retiring

slot1: State change: claim retirement ended/expired

slot1: Changing state and activity: Claimed/Retiring -> Preempting/Vacating


==> StarterLog.slot1 <==

(pid:1084) Send_Signal: ERROR Attempt to send signal 2 to pid 6064, but pid 6064 has no command socket

(pid:1084) Send (softkill) signal failed twice, hardkill will fire after timeout
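
For context, the affected jobs are submitted with a description of roughly the following shape. This is only a minimal sketch: the executable name is a hypothetical placeholder, and the real submit files contain more than this. The only line that matters for this report is kill_sig.

```
# Minimal sketch of a submit description; my_job.py is a placeholder name,
# not one of the actual executables.
executable = my_job.py

# Ask for SIGINT (signal 2) on soft kill; this matches the
# "send signal 2" attempts in the StarterLog excerpts above.
kill_sig   = SIGINT

queue
```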


I believe this could be related to PR #665, but I am not sure whether it is a misconfiguration. Any help would be greatly appreciated!


Let me know if any other logging would be helpful in diagnosing this.


Thanks,

T. Rock