Hi all,

I just updated a CentOS 6 central manager to 6.7 and Condor 8.4. After a reboot I have:

> 10/07/15 16:31:38 ******************************************************
> 10/07/15 16:31:38 Using config source: /etc/condor/condor_config
> 10/07/15 16:31:38 Using local config sources:
> 10/07/15 16:31:38 /etc/condor/config.d/10_pool.conf
> 10/07/15 16:31:38 /etc/condor/config.d/20_host.conf
> 10/07/15 16:31:38 /etc/condor/config.d/30_cron.conf
> 10/07/15 16:31:38 /etc/condor/config.d/99_nfy_glidein2.conf
> 10/07/15 16:31:38 /dev/null
> 10/07/15 16:31:38 config Macros = 113, Sorted = 113, StringBytes = 3711, TablesBytes = 4156
> 10/07/15 16:31:38 CLASSAD_CACHING is ENABLED
> 10/07/15 16:31:38 Daemon Log is logging: D_ALWAYS D_ERROR
> 10/07/15 16:31:38 SharedPortEndpoint: waiting for connections to named socket 19015_99d8_5
> 10/07/15 16:31:38 DaemonCore: command socket at <144.92.167.251:9619?addrs=144.92.167.251-9619&noUDP&sock=19015_99d8_5>
> 10/07/15 16:31:38 DaemonCore: private command socket at <144.92.167.251:9619?addrs=144.92.167.251-9619&noUDP&sock=19015_99d8_5>
> 10/07/15 16:31:38 my_popenv failed
> 10/07/15 16:31:38 Failed to execute /usr/sbin/condor_starter.std, ignoring
> 10/07/15 16:31:38 VM-gahp server reported an internal error

... nothing unusual until

> 10/07/15 16:31:38 slot12: Changing activity: Benchmarking -> Idle
> 10/07/15 16:31:41 attempt to connect to <144.92.167.251:9618> failed: Connection refused (connect errno = 111).
> 10/07/15 16:31:41 ERROR: SECMAN:2003:TCP connection to collector exocet.bmrb.wisc.edu failed.
> 10/07/15 16:31:41 Failed to start non-blocking update to <144.92.167.251:9618>.
> 10/07/15 16:31:42 attempt to connect to <144.92.167.251:9618> failed: Connection refused (connect errno = 111).
> 10/07/15 16:31:42 ERROR: SECMAN:2003:TCP connection to collector exocet.bmrb.wisc.edu failed.
> 10/07/15 16:31:42 Failed to start non-blocking update to <144.92.167.251:9618>.
and my condor pool is dead:

> $ condor_status
> Error: communication error
> CEDAR:6001:Failed to connect to <144.92.167.251:9618>

Thankfully,

> # rpm -e --nodeps condor-external-libs
> # rpm -e --nodeps condor-classads
> # rpm -e --nodeps condor-procd
> # yum downgrade condor

fixed it; my pool is back to normal.

What's special about this box is OSG flocking and shared port:

> USE_SHARED_PORT = TRUE
> SHARED_PORT_ARGS = -p 9619
> SEC_WRITE_AUTHENTICATION_METHODS = FS, PASSWORD, CLAIMTOBE, GSI, SSL
> SEC_DEFAULT_AUTHENTICATION_METHODS = FS, PASSWORD, GSI, SSL
> SEC_CLIENT_AUTHENTICATION_METHODS = FS, PASSWORD, GSI, SSL, CLAIMTOBE
> SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = True
> SEC_DEFAULT_NEGOTIATION =
> AUTH_SSL_CLIENT_CADIR = /etc/grid-security/sslcerts
> AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem
> AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem
> AUTH_SSL_SERVER_CADIR = /etc/grid-security/sslcerts
> AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem
> AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem
> FLOCK_INCREMENT=10
> SCHEDD_MAX_FILE_DESCRIPTORS = 102400
> DAGMAN_MAX_JOBS_SUBMITTED=

So which of those did you guys break in 8.4.0?

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
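For anyone reproducing this: the log shows the startd and `condor_status` being refused on 9618, while the daemons themselves report command sockets behind the shared port on 9619 (`SHARED_PORT_ARGS = -p 9619`). A minimal Python sketch to confirm which of those two ports on the central manager actually accepts TCP connections. The function name `port_accepts` is hypothetical, and it only tests plain TCP reachability; it does not tell you whether the listener is the collector or the shared port daemon.

```python
import socket


def port_accepts(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper: it checks reachability only and says nothing
    about which daemon (collector or shared port) owns the socket.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ECONNREFUSED (errno 111, as in the log), timeouts,
        # and unreachable hosts.
        return False
```

Running `port_accepts("144.92.167.251", 9618)` and `port_accepts("144.92.167.251", 9619)` from a worker node would show whether anything is listening on the collector's configured port at all, or only on the shared port.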