
Re: [HTCondor-users] how to restart JobRouter ?



To get the PID of the JOB_ROUTER, you can use condor_who:

   condor_who -daemons -quick

will query the condor_master for a ClassAd describing all of the daemons that it is managing. This ClassAd has the PID, address, and status of each daemon.
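If you only want the JOB_ROUTER entries out of that ad, something like the following might do. It is only a sketch: the helper name daemon_lines is made up here, and because the exact attribute names in the ad are not spelled out above, it just matches case-insensitively on the daemon name.

```shell
# Keep only the lines of the master's daemon ad that mention one daemon.
# Reads the ad on stdin, so it can also be tried on saved output.
# daemon_lines is a hypothetical helper name, not an HTCondor tool.
daemon_lines() {
    grep -i -- "${1:-job_router}"
}

# Usage:
#   condor_who -daemons -quick | daemon_lines job_router
```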

Did you try 

condor_off -daemon JOB_ROUTER ?


I would be curious to know what was in the MasterLog after you ran "condor_restart -daemon JOB_ROUTER", although it would be more helpful if you first added 

MASTER_DEBUG = $(MASTER_DEBUG) D_CAT D_COMMAND:1

to the config and reconfigured the condor_master so that the MasterLog would be sure to record the command arriving. 
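A rough sketch of that step, assuming the extra line goes into a config file whose path you supply (the path in the usage comment is only an example and may not match your install) and that MASTER_LOG is defined in your configuration. The RUN variable is a hypothetical dry-run hook, not an HTCondor feature.

```shell
# Append the debug setting to a config file, then ask the master to re-read its config.
# Set RUN=echo to print the condor command instead of running it (hypothetical hook).
enable_master_debug() {
    conf_file="$1"    # config file to append to; the caller chooses the path
    printf '%s\n' 'MASTER_DEBUG = $(MASTER_DEBUG) D_CAT D_COMMAND:1' >> "$conf_file" || return 1
    ${RUN:-} condor_reconfig -daemon master
}

# Then retry the restart and look at the end of the MasterLog:
#   enable_master_debug /etc/condor/config.d/99-master-debug.conf   # assumed path
#   condor_restart -daemon JOB_ROUTER
#   tail -n 50 "$(condor_config_val MASTER_LOG)"
```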

If you run "condor_off -daemon JOB_ROUTER" and it succeeds, "condor_who -daemons -quick" should show the status of the JOB_ROUTER as Held.   If condor_off works, I would expect condor_on to work as well.
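Put together, the off / check / on sequence could look roughly like this. It is a sketch, not a tested procedure: cycle_daemon and the RUN dry-run hook are names invented for illustration, and it assumes the commands are run on the AP with sufficient privileges.

```shell
# Stop one daemon via the master, show the master's view of its daemons, then start it again.
# Set RUN=echo to print the commands instead of running them (hypothetical hook).
cycle_daemon() {
    daemon="${1:-JOB_ROUTER}"
    ${RUN:-} condor_off -daemon "$daemon" || return 1
    ${RUN:-} condor_who -daemons -quick
    ${RUN:-} condor_on -daemon "$daemon"
}

# Usage:
#   cycle_daemon JOB_ROUTER
```

Stopping at the first failure is deliberate: if condor_off is refused, the MasterLog is the place to look before trying condor_on.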

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Stefano Belforte via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Friday, March 20, 2026 11:59 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Stefano Belforte <stefano.belforte@xxxxxxx>
Subject: Re: [HTCondor-users] how to restart JobRouter ?
 
Apparently this works:
1) ps auxw|grep -i router
2) note the PID of condor_job_router
3) kill -SIGTERM PID

The condor_master then restarts it, whereas a condor_restart, as feared,
made the AP go into a 15min "breath" with all running jobs lost.

Let me know if I should be doing otherwise

Stefano

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/