Re: [Condor-users] Struggling with Condor Flocking
- Date: Tue, 21 Jun 2005 09:21:23 -0700
- From: Rajesh Rajamani <raj@xxxxxxxxxx>
- Subject: Re: [Condor-users] Struggling with Condor Flocking
Sai,
I think FLOCK_FROM on cm1.mygrid.com must be
FLOCK_FROM = *.mygrid.com
and NOT
FLOCK_FROM = *.hclgrid.com
Please also check that the FLOCK_TO variable is set in the Condor config
files of the schedd machine (the machine the jobs are submitted from).
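As a minimal sketch of the two settings, assuming jobs are submitted on cm.mygrid.com and should flock to the pool managed by cm1.mygrid.com (the host names are taken from the configuration quoted further down; adjust them to the real layout):

```
# condor_config.local on cm.mygrid.com (the submitting schedd)
# List the central manager(s) of the pool(s) to flock to.
FLOCK_TO = cm1.mygrid.com

# condor_config.local on cm1.mygrid.com (the pool being flocked to)
# List the submit machines that are allowed to flock in.
FLOCK_FROM = *.mygrid.com
```

The wildcard is the value suggested above. Naming the submit host explicitly (cm.mygrid.com) would be a stricter alternative if only that one machine needs to flock.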
Also, please make sure that the HOSTALLOW* entries in the config files
are set appropriately. For details, consult
http://www.cs.wisc.edu/condor/manual/v6.7/5_2Connecting_Condor.html
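For orientation, the entries involved usually look something like the sketch below. It assumes the macro names used in the default condor_config shipped with Condor of this era; please verify them against your own config files and the manual page above before relying on them.

```
# On cm1.mygrid.com (the pool accepting flocked jobs):
# let the flocking submit machines talk to the collector and startds.
HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD    = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR  = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD     = $(HOSTALLOW_READ), $(FLOCK_FROM)

# On cm.mygrid.com (the submitting schedd):
# let the remote pool's negotiator contact the local schedd.
FLOCK_NEGOTIATOR_HOSTS      = $(FLOCK_TO)
FLOCK_COLLECTOR_HOSTS       = $(FLOCK_TO)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(NEGOTIATOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
```

If these lines are already present in the global condor_config, then correcting FLOCK_FROM and FLOCK_TO may be all that is needed, since the HOSTALLOW entries are derived from them.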
Let me know if that helps.
--
Rajesh Rajamani
Senior Member of Technical Staff
Direct : +1.408.321.9000
Fax : +1.408.904.5992
Mobile : +1.408.321.9030
raj@xxxxxxxxxx
Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
www.optena.com
Aln Sai Srinivas - CTD, Chennai wrote:
Hi
I'm trying to set up flocking between Condor pools. I'm getting an error in
the CollectorLog: "DC_AUTHENTICATE: attempt to open invalid session
cm.mygrid:27485:1119363742:5, failing", where cm is the host name of the
central manager of the pool that flocks to the other pool.
Flocking never happens.
Here is the scenario:
I'm using Red Hat Linux and Condor 6.6.9.
There are two central managers, cm.mygrid.com and cm1.mygrid.com, which
represent two Condor pools.
condor_config is on a shared file system.
Here is the configuration:
======================================================================
$LOCAL_DIR/condor_config.local for cm.mygrid.com
=======================================================================
COLLECTOR = $(SBIN)/condor_collector
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
COLLECTOR_NAME = Collector at alpha
CONTINUE = True
FILESYSTEM_DOMAIN = cm.mygrid.com
FLOCK_FROM = *.hclgrid.com
FLOCK_TO = cm1.mygrid.com
PREEMPT = FALSE
SUSPEND = FALSE
LOCK = /tmp/condor-lock.$(HOSTNAME)0.885447545050742
UID_DOMAIN = cm.mygrid.com
NEGOTIATOR = $(SBIN)/condor_negotiator
VACATE = FALSE
CONDOR_ADMIN = root@xxxxxxxxxxxxx
START = TRUE
MAIL = /bin/mail
CONDOR_IDS = 504.504
RELEASE_DIR = /usr/local/condor
CONDOR_HOST = cm.mygrid.com
LOCAL_DIR = /usr/local/condor/local.$(HOSTNAME)
======================================================================
$LOCAL_DIR/condor_config.local for cm1.mygrid.com
=======================================================================
COLLECTOR = $(SBIN)/condor_collector
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
COLLECTOR_NAME = Collector at microgrid
CONTINUE = True
FILESYSTEM_DOMAIN = cm1.mygrid.com
FLOCK_FROM = *.hclgrid.com
FLOCK_TO = Not defined
PREEMPT = FALSE
SUSPEND = FALSE
LOCK = /tmp/condor-lock.$(HOSTNAME)0.505710791637288
UID_DOMAIN = cm1.mygrid.com
NEGOTIATOR = $(SBIN)/condor_negotiator
VACATE = FALSE
CONDOR_ADMIN = root@xxxxxxxxxxxxxx
START = TRUE
MAIL = /bin/mail
CONDOR_IDS = 504.504
RELEASE_DIR = /usr/local/condor
CONDOR_HOST = cm1.mygrid.com
LOCAL_DIR = /usr/local/condor/local.$(HOSTNAME)
======================================================================
CollectorLog at cm1.mygrid.com
=======================================================================
6/21 20:03:42 WARNING: No master ad for < cm.mygrid.com >
6/21 20:03:42 ScheddAd : Inserting ** "< cm.mygrid.com , 10.100.207.10 >"
6/21 20:03:42 stats: Inserting new hashent for 'Schedd':'cm.mygrid.com':'10.100.207.10'
6/21 20:03:42 SubmittorAd : Inserting ** "< condor@xxxxxxxxxxxxx , 10.100.207.10 >"
6/21 20:03:42 stats: Inserting new hashent for 'Submittor':'condor@xxxxxxxxxxxxx':'10.100.207.10'
6/21 20:04:09 Got QUERY_STARTD_ADS
6/21 20:04:09 (Sent 1 ads in response to query)
6/21 20:06:33 (Sent 5 ads in response to query)
6/21 20:06:33 Got QUERY_STARTD_PVT_ADS
6/21 20:06:33 (Sent 1 ads in response to query)
======================================================================
CollectorLog at cm.mygrid.com
=======================================================================
6/21 20:02:21 DC_AUTHENTICATE: attempt to open invalid session alpha:27485:1119363742:5, failing.
6/21 20:03:21 SubmittorAd : Inserting ** "< condor@xxxxxxxxxxxxx , 10.100.207.10 >"
6/21 20:03:21 stats: Inserting new hashent for 'Submittor':'condor@xxxxxxxxxxxxx':'10.100.207.10'
6/21 20:03:21 (Sent 4 ads in response to query)
6/21 20:03:21 Got QUERY_STARTD_PVT_ADS
6/21 20:03:21 (Sent 1 ads in response to query)
6/21 20:03:41 (Sent 4 ads in response to query)
6/21 20:03:41 Got QUERY_STARTD_PVT_ADS
6/21 20:03:41 (Sent 1 ads in response to query)
6/21 20:03:49 DC_AUTHENTICATE: attempt to open invalid session alpha:27485:1119363829:6, failing.
======================================================================
ScheddLog at cm.mygrid.com
=======================================================================
6/21 20:03:20 DaemonCore: Command received via UDP from host <10.100.207.10:32990>
6/21 20:03:20 DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
6/21 20:03:21 Sent ad to central manager for condor@xxxxxxxxxxxxx
6/21 20:03:21 Called reschedule_negotiator()
6/21 20:03:21 DaemonCore: Command received via TCP from host <10.100.207.10:41493>
6/21 20:03:21 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
6/21 20:03:21 Negotiating for owner: condor@xxxxxxxxxxxxx
6/21 20:03:21 Checking consistency running and runnable jobs
6/21 20:03:21 Tables are consistent
6/21 20:03:21 Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
6/21 20:03:21 DaemonCore: Command received via UDP from host <10.100.207.10:32991>
6/21 20:03:21 DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
6/21 20:03:21 Called reschedule_negotiator()
6/21 20:03:23 Started shadow for job 558.0 on "<10.100.207.10:41475>", (shadow pid = 27559)
6/21 20:03:25 Sent ad to central manager for condor@xxxxxxxxxxxxx
6/21 20:03:42 Activity on stashed negotiator socket
6/21 20:03:42 Negotiating for owner: condor@xxxxxxxxxxxxx
6/21 20:03:42 Checking consistency running and runnable jobs
6/21 20:03:42 Tables are consistent
6/21 20:03:42 Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
6/21 20:03:42 Increasing flock level for condor@xxxxxxxxxxxxx to 1.
6/21 20:03:42 Sent ad to central manager for condor@xxxxxxxxxxxxx
6/21 20:04:34 Shadow pid 27559 for job 558.0 exited with status 100
6/21 20:04:34 Started shadow for job 559.0 on "<10.100.207.10:41475>", (shadow pid = 27569)
Could you please help me find what I'm missing?
Regards
Sai