
Re: [HTCondor-users] how to restart JobRouter ?



Hmm.  That seems like a bug in the condor_restart tool.  It doesn't know about the JOB_ROUTER apparently.

-tj



From: Stefano Belforte <stefano.belforte@xxxxxxx>
Sent: Monday, March 23, 2026 5:41 AM
To: John M Knoeller <johnkn@xxxxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: stefano.belforte@xxxxxxx <stefano.belforte@xxxxxxx>
Subject: Re: [HTCondor-users] how to restart JobRouter ?

Hi John, about this:

On 20/03/2026 21:50, John M Knoeller wrote:


I would be curious to know what was in the MasterLog after you ran "condor_restart -daemon JOB_ROUTER", although it would be more helpful if you first added 

MASTER_DEBUG = $(MASTER_DEBUG) D_CAT D_COMMAND:1

It is as simple as this: nothing is logged by the master. It looks like the command
fails early and makes no attempt to talk to anybody:

[root@vocms0137 condor]# condor_restart -daemon JOB_ROUTER
Can't find address for local JOB_ROUTER
Perhaps you need to query another pool.
[root@vocms0137 condor]


By contrast, condor_off JOB_ROUTER (which works) produced this in the MasterLog:


03/23/26 10:37:42 (D_COMMAND) Calling Handler <SharedPortEndpoint::HandleListenerAccept> (0)
03/23/26 10:37:42 (D_COMMAND) Return from Handler <SharedPortEndpoint::HandleListenerAccept> 0.000080s
03/23/26 10:37:42 (D_COMMAND) Calling Handler <DaemonCommandProtocol::WaitForSocketData> (1)
03/23/26 10:37:42 (D_COMMAND) Return from Handler <DaemonCommandProtocol::WaitForSocketData> 0.000458s
03/23/26 10:37:42 (D_COMMAND) Calling Handler <DaemonCommandProtocol::WaitForSocketData> (1)
03/23/26 10:37:42 (D_COMMAND) Return from Handler <DaemonCommandProtocol::WaitForSocketData> 0.005092s
03/23/26 10:37:42 (D_COMMAND) Calling Handler <DaemonCommandProtocol::WaitForSocketData> (1)
03/23/26 10:37:42 (D_COMMAND) Return from Handler <DaemonCommandProtocol::WaitForSocketData> 0.000568s
03/23/26 10:37:42 (D_COMMAND) Calling Handler <DaemonCommandProtocol::WaitForSocketData> (1)
03/23/26 10:37:42 (D_COMMAND) Calling HandleReq <admin_command_handler> (0) for command 467 (DAEMON_OFF) from condor@cms <[2001:1458:d00:61::100:435]:23589>
03/23/26 10:37:42 (D_ALWAYS) Handling DAEMON_OFF command for JOB_ROUTER
03/23/26 10:37:42 (D_ALWAYS) Sent SIGTERM to JOB_ROUTER (pid 1630814)
03/23/26 10:37:42 (D_COMMAND) Return from HandleReq <admin_command_handler> (handler: 0.000453s, sec: 0.007s, payload: 0.000s)
03/23/26 10:37:42 (D_COMMAND) Return from Handler <DaemonCommandProtocol::WaitForSocketData> 0.001237s
03/23/26 10:37:43 (D_COMMAND) DaemonCore: pid 1630814 exited with status 0, invoking reaper 1 <Daemons::DefaultReaper()>
03/23/26 10:37:43 (D_ALWAYS) The JOB_ROUTER (pid 1630814) exited with status 0
03/23/26 10:37:43 (D_COMMAND) DaemonCore: return from reaper for pid 1630814
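Since condor_off already reaches the master, one possible workaround until condor_restart learns about JOB_ROUTER is to do the restart in two steps. This is only a sketch under assumptions: the helper name restart_job_router and the default pause are made up for illustration, and it assumes that condor_on accepts the same -daemon option as condor_off and that JOB_ROUTER is listed in DAEMON_LIST, so the master is willing to start it again.

```shell
#!/bin/sh
# Hypothetical helper: emulate "condor_restart -daemon JOB_ROUTER" by asking
# the local condor_master to stop the daemon and then start it again.
# Assumptions: JOB_ROUTER is in DAEMON_LIST, and condor_on accepts "-daemon"
# the same way condor_off does. Run it as root or as the condor user.
restart_job_router() {
    # Seconds to give the master time to reap the old process (a guess).
    wait_s="${1:-5}"
    # Stop only the JOB_ROUTER; bail out if the master rejects the request.
    condor_off -daemon JOB_ROUTER || return 1
    sleep "$wait_s"
    # Ask the master to start JOB_ROUTER again; the exit status is condor_on's.
    condor_on -daemon JOB_ROUTER
}
```

For example, `restart_job_router 10` waits ten seconds between the two requests. If the assumptions hold, the MasterLog should then show a matching entry for the start request after the DAEMON_OFF lines above.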