[HTCondor-users] CondorCE with Condor HA setup broke
- Date: Mon, 13 Dec 2021 17:04:49 +0100
- From: Thomas Hartmann <thomas.hartmann@xxxxxxx>
- Subject: [HTCondor-users] CondorCE with Condor HA setup broke
Hi all,
Today we moved our Condor LRMS to HA, and I stumbled over a problem: the
CondorCEs had trouble with the two heads. Interestingly, I had not run
into the issue on my test cluster, as trace jobs to the test CEs reached
their LRMS Condor.
On the production cluster I also did not notice the issue at first, as
trace jobs to the production CondorCEs went through to Condor and
started to run. However, real user jobs failed to get passed through [1].
For the moment I have pinned the CEs' LRMS Condor configs to a single
non-HA CONDOR_HOST, which works with the CondorCE config [2,3].
But I am now looking for the proper setup to attach the CondorCEs to the
HA-aware schedulers, and wondering why the trace jobs went through while
real jobs failed. The trace jobs should also have gone through the CE to
reach the cluster, shouldn't they?
Cheers,
Thomas
[1] SchedLog @ grid-htcondorce1.desy.de
12/13/21 16:21:07 Can't find address for startd grid-htcondorce1.desy.de
12/13/21 16:21:07 Can't find address for negotiator
12/13/21 16:21:07 Failed to send RESCHEDULE to unknown daemon:
12/13/21 16:21:07 Job 977401.0 released from hold: Data files spooled
[2] CE sched conf
JOB_ROUTER_SCHEDD2_SPOOL=/var/lib/condor/spool
JOB_ROUTER_SCHEDD2_NAME=$(FULL_HOSTNAME)
JOB_ROUTER_SCHEDD2_POOL=condor01.desy.de:9618
[3]
# CENTRAL_MANAGER1 = condor01.desy.de
# CENTRAL_MANAGER2 = grid-htc-master02.desy.de
#CONDOR_HOST = condor01.desy.de,grid-htc-master02.desy.de
CONDOR_HOST = condor01.desy.de