
[HTCondor-users] Condor Problem with communication between SchedLog and SharedLog Process



Hello,

To prepare the migration from CentOS 7 to a RHEL 9-like distribution, I set up a test pool with 3 servers. It is the same setup that I already have on CentOS 7, where it runs well.


My configuration on AL9 is:

 A master scheduler: clrarcce03.in2p3.fr (134.158.121.105)
Name:   clrarcce03.in2p3.fr
Address: 134.158.121.105
Name:   clrarcce03.in2p3.fr
Address: 2001:660:5104:134:134:158:121:105


condor      5394       1  0 16:02 ?        00:00:00 /usr/sbin/condor_master -f
root        5438    5394  0 16:02 ?        00:00:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 990
condor      5439    5394  0 16:02 ?        00:00:00 condor_shared_port
condor      5440    5394  0 16:02 ?        00:00:00 condor_schedd


 
 A manager: clrhtcmgtb.in2p3.fr
Name:   clrhtcmgtb.in2p3.fr
Address: 134.158.121.108
Name:   clrhtcmgtb.in2p3.fr
Address: 2001:660:5104:134:134:158:121:108



[root@clrhtcmgtb condor]# ps -ef | grep condor
condor      3033       1  0 16:16 ?        00:00:00 /usr/sbin/condor_master -f
root        3082    3033  0 16:16 ?        00:00:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 991
condor      3083    3033  0 16:16 ?        00:00:00 condor_shared_port
condor      3084    3033  0 16:16 ?        00:00:00 condor_collector
condor      3091    3033  0 16:16 ?        00:00:00 condor_negotiator
condor      3092    3033  0 16:16 ?        00:00:00 condor_schedd

 A compute node: clrwn001
Name:   clrwn001.in2p3.fr
Address: 134.158.123.1
Name:   clrwn001.in2p3.fr
Address: 2001:660:5104:134:134:158:123:1


I see the following messages in the scheduler log on the node clrarcce03:

07/05/24 17:05:38 (pid:5440) Match record (slot1@xxxxxxxxxxxxxxxxx <134.158.123.1:9618?addrs=134.158.123.1-9618+[2001-660-5104-134-134-158-123-1]-9618&alias=clrwn001.in2p3.fr&noUDP&sock=startd_2920_74a2> for atlas001, 8.0) deleted
07/05/24 17:06:38 (pid:5440) Activity on stashed negotiator socket: <134.158.121.108:6099>
07/05/24 17:06:38 (pid:5440) Using negotiation protocol: NEGOTIATE
07/05/24 17:06:38 (pid:5440) Negotiating for owner: atlas001@xxxxxxxxxxxxxxxxxxxxxx
07/05/24 17:06:38 (pid:5440) Finished sending rrls to negotiator
07/05/24 17:06:38 (pid:5440) Finished sending RRL for atlas001
07/05/24 17:06:38 (pid:5440) Activity on stashed negotiator socket: <134.158.121.108:6099>
07/05/24 17:06:38 (pid:5440) Using negotiation protocol: NEGOTIATE
07/05/24 17:06:38 (pid:5440) Negotiating for owner: atlas001@xxxxxxxxxxxxxxxxxxxxxx
07/05/24 17:06:38 (pid:5440) SECMAN: removing lingering non-negotiated security session <134.158.123.1:9618?addrs=134.158.123.1-9618+[2001-660-5104-134-134-158-123-1]-9618&alias=clrwn001.in2p3.fr&noUDP&sock=startd_2920_74a2>#1720191447#1 because it conflicts with new request
07/05/24 17:06:38 (pid:5440) Negotiation ended: 1 jobs matched
07/05/24 17:06:38 (pid:5440) Finished negotiating for atlas001 in local pool: 1 matched, 0 rejected
07/05/24 17:06:38 (pid:5440) attempt to connect to <134.158.123.1:9618> failed: No route to host (connect errno = 113).
07/05/24 17:06:38 (pid:5440) Failed to send REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxx <134.158.123.1:9618?addrs=134.158.123.1-9618+[2001-660-5104-134-134-158-123-1]-9618&alias=clrwn001.in2p3.fr&noUDP&sock=startd_2920_74a2> for atlas001: SECMAN:2003:TCP connection to startd slot1@xxxxxxxxxxxxxxxxx <134.158.123.1:9618?addrs=134.158.123.1-9618+[2001-660-5104-134-134-158-123-1]-9618&alias=clrwn001.in2p3.fr&noUDP&sock=startd_2920_74a2> for atlas001 failed.
07/05/24 17:06:38 (pid:5440) Match record (slot1@xxxxxxxxxxxxxxxxx <134.158.123.1:9618?addrs=134.158.123.1-9618+[2001-660-5104-134-134-158-123-1]-9618&alias=clrwn001.in2p3.fr&noUDP&sock=startd_2920_74a2> for atlas001, 8.0) deleted
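
For anyone decoding the startd address in these lines, here is a minimal Python sketch that splits such an address string into its primary endpoint and the entries of its `addrs` parameter. The function name `parse_sinful` is hypothetical, and the decoding rule (entries separated by `+`, with `-` standing in for `:`) is an assumption inferred from the log lines and the DNS records above, not taken from the HTCondor documentation.

```python
import re


def parse_sinful(sinful):
    """Split '<host:port?key=val&...>' into (primary, addrs, params).

    Assumption: each entry of 'addrs' is 'host-port', entries are joined
    with '+', and IPv6 hosts are bracketed with ':' written as '-'.
    """
    m = re.fullmatch(r"<([^?>]+)(?:\?([^>]*))?>", sinful.strip())
    if m is None:
        raise ValueError(f"not a bracketed address string: {sinful!r}")
    primary = m.group(1)

    # Split the query part by hand: '+' must be kept literally here.
    params = {}
    for item in (m.group(2) or "").split("&"):
        if item:
            key, _, value = item.partition("=")
            params[key] = value

    addrs = []
    for encoded in params.get("addrs", "").split("+"):
        if not encoded:
            continue
        host, _, port = encoded.rpartition("-")
        if host.startswith("[") and host.endswith("]"):
            host = host[1:-1].replace("-", ":")  # restore IPv6 colons
        addrs.append((host, int(port)))
    return primary, addrs, params
```

Applied to the address in the log, this yields the IPv4 endpoint 134.158.123.1 on port 9618, the IPv6 endpoint 2001:660:5104:134:134:158:123:1 on port 9618, and the socket name startd_2920_74a2. All of these point at port 9618 on clrwn001.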




It seems that my test job matches the node clrwn001, but this match is immediately removed. I don't have any firewall enabled.
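
To see whether the failure can be reproduced outside HTCondor, a small Python sketch can attempt the same TCP connection from clrarcce03 to the startd port. The helper name `check_tcp` is hypothetical; the address and port are the ones from the log above. On Linux, errno 113 corresponds to EHOSTUNREACH.

```python
import errno
import socket


def check_tcp(host, port, timeout=5.0):
    """Attempt a TCP connect and return (ok, errno_code, description)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None, "connected"
    except socket.timeout:
        # No answer at all within the timeout.
        return False, errno.ETIMEDOUT, "timed out"
    except OSError as exc:
        # Covers refused connections, unreachable hosts, and similar errors.
        return False, exc.errno, exc.strerror or str(exc)
```

Calling `check_tcp("134.158.123.1", 9618)` on clrarcce03 would be expected to return the same errno 113 if the problem lies at the network level (routing or packet filtering between the two subnets) rather than in HTCondor itself.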

Any ideas are welcome.

Best Regards

Jean-Claude


------------------------------------------------------------------------
Jean-Claude Chevaleyre < Jean-Claude.Chevaleyre(at)clermont.in2p3.fr > 
Laboratoire de Physique Clermont
Campus Universitaire des Cézeaux
4 Avenue Blaise Pascal
TSA 60026
CS 60026
63178 Aubière Cedex

Tel : 04 73 40 73 60

-------------------------------------------------------------------------