Hi Gagan,
It is possible to set up two host machines to work together so that if the schedd on one host falls over, the other host starts up the schedd and reconnects with all the corresponding running jobs. This is done with the High Availability Schedd. Otherwise, there is currently no built-in mechanism for an AP to pick up the work of another one if that system has fallen over.
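As a rough sketch of what the High Availability Schedd configuration can look like on both hosts, assuming a shared file system visible to both machines (the /share/spool path and the had-schedd name are placeholders, not values from this setup, and the knobs should be checked against the manual for your HTCondor version):

```
# Sketch only: apply the same settings on both candidate hosts.
# SCHEDD is managed through the HA list, so it should not also
# appear in DAEMON_LIST on these machines.
MASTER_HA_LIST = SCHEDD

# Job queue on shared storage (placeholder path) so that either
# host can pick up the same queue.
SPOOL = /share/spool

# Lock file location; only the host holding the lock runs the schedd.
HA_LOCK_URL = file:/share/spool
VALID_SPOOL_FILES = $(VALID_SPOOL_FILES) SCHEDD.lock

# Same schedd name on both hosts (placeholder), so running jobs
# see one consistent schedd regardless of which host is active.
SCHEDD_NAME = had-schedd@
```

With a setup along these lines, only one of the two hosts runs the schedd at any given time, and the other takes over once the lock is released or goes stale.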
-Cole Bollig
From: gagan tiwari <gagan.tiwari@xxxxxxxxxxxxxxxxxx>
Sent: Monday, August 14, 2023 11:28 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Cole Bollig <cabollig@xxxxxxxx>
Subject: Auto failover between these two schedulers

Hi Todd / Cole,
Thanks for pointing that out.
condor_q -global -all did the trick. I am able to get job details from the remote schedd now.
Now, please let me know how to set up failover between these two schedulers. If one of the submit nodes goes down, all jobs submitted through it should fail over to the other submit node.
Thanks,
Gagan
On Mon, Aug 14, 2023 at 7:07 PM Cole Bollig via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote: