$CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $
$CondorPlatform: x86_64_RedHat7 $
OWNER   BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
jfisher CMD: ngspice 4/9  15:17      _      _   2700   2700 294.0-2699
2700 jobs; 0 completed, 0 removed, 2700 idle, 0 running, 0 held, 0 suspended
Can anyone help? I recently updated my Condor version and I'm now having trouble getting it to work. Caveat: I updated other OS (CentOS) packages at the same time.
I have 48 slots, all reported as Unclaimed and Idle. This is just a rerun of something that ran OK a few months ago, so I'm a bit lost.
My CollectorLog looks like this:
04/09/18 16:46:07 Got QUERY_STARTD_PVT_ADS
04/09/18 16:46:07 Number of Active Workers 0
04/09/18 16:46:07 (Sending 48 ads in response to query)
04/09/18 16:46:07 Query info: matched=48; skipped=0; query_time=0.001078; send_time=0.005897; type=MachinePrivate; requirements={true}; peer=<192.168.1.206:27405>; projection={}
04/09/18 16:46:07 Number of Active Workers 0
04/09/18 16:46:07 (Sending 52 ads in response to query)
04/09/18 16:46:07 Query info: matched=52; skipped=9; query_time=0.001561; send_time=0.018082; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<192.168.1.206:28613>; projection={}
04/09/18 16:46:07 DaemonCore: Can't receive command request from 192.168.1.206 (perhaps a timeout?)
192.168.1.206 is the master machine and it's the machine I was using to start the jobs. (It's also the machine I'm writing this email on, so it's definitely available)
MasterLog looks like this:
04/09/18 16:14:26 ******************************************************
04/09/18 16:14:26 ** condor_master (CONDOR_MASTER) STARTING UP
04/09/18 16:14:26 ** /usr/sbin/condor_master
04/09/18 16:14:26 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
04/09/18 16:14:26 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
04/09/18 16:14:26 ** $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $
04/09/18 16:14:26 ** $CondorPlatform: x86_64_RedHat7 $
04/09/18 16:14:26 ** PID = 1437
04/09/18 16:14:26 ** Log last touched 4/9 15:53:29
04/09/18 16:14:26 ******************************************************
04/09/18 16:14:26 Using config source: /etc/condor/condor_config
04/09/18 16:14:26 Using local config sources:
04/09/18 16:14:26 /etc/condor/config.d/00master.config
04/09/18 16:14:26 /etc/condor/condor_config.local
04/09/18 16:14:26 config Macros = 86, Sorted = 86, StringBytes = 2342, TablesBytes = 3144
04/09/18 16:14:26 CLASSAD_CACHING is OFF
04/09/18 16:14:26 Daemon Log is logging: D_ALWAYS D_ERROR
04/09/18 16:14:30 SharedPortEndpoint: waiting for connections to named socket 1437_3daf
04/09/18 16:14:30 SharedPortEndpoint: failed to open /var/lock/condor/shared_port_ad: No such file or directory
04/09/18 16:14:30 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
04/09/18 16:14:30 DaemonCore: private command socket at <192.168.1.206:0?sock=1437_3daf>
04/09/18 16:14:30 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1520895141)
04/09/18 16:14:31 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 1926
04/09/18 16:14:31 Waiting for /var/lock/condor/shared_port_ad to appear.
04/09/18 16:14:32 Found /var/lock/condor/shared_port_ad.
04/09/18 16:14:33 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 1979
04/09/18 16:14:33 Waiting for /var/log/condor/.collector_address to appear.
04/09/18 16:14:34 Waiting for /var/log/condor/.collector_address to appear.
04/09/18 16:14:35 Found /var/log/condor/.collector_address.
04/09/18 16:14:36 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 1987
04/09/18 16:14:37 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 1990
I can't restart Condor:
sudo condor_restart
ERROR
SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using no authentication method, which may imply host-based security. Our address was '192.168.1.206', and server's address was '192.168.1.206'. Check your ALLOW settings and IP protocols.
Can't send Restart command to local master
I don't know what an ALLOW setting is, and Google hasn't pointed me to anything useful so far, unless it's this from condor_config:
ALLOW_NEGOTIATOR = 192.168.1.206
ALLOW_NEGOTIATOR_SCHEDD = 192.168.1.206
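To check whether the config contains any other ALLOW_* lines besides these two, something like the following could list them. This is only a minimal sketch: the function name find_allow_settings is hypothetical, it assumes plain "NAME = value" lines, and it only looks at the text it is given (it does not follow config.d files or any other include mechanism).

```python
import re

# Matches assignments such as "ALLOW_NEGOTIATOR = 192.168.1.206".
ALLOW_LINE = re.compile(r"^\s*(ALLOW_[A-Z_]+)\s*=\s*(.*?)\s*$")

def find_allow_settings(config_text):
    """Return {name: value} for every ALLOW_* assignment in config_text.

    Hypothetical helper: assumes one assignment per line and ignores
    comment lines; a later assignment overwrites an earlier one.
    """
    settings = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comments
        m = ALLOW_LINE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings

# The two lines quoted from condor_config above.
sample = """\
# excerpt from condor_config
ALLOW_NEGOTIATOR = 192.168.1.206
ALLOW_NEGOTIATOR_SCHEDD = 192.168.1.206
"""

for name, value in find_allow_settings(sample).items():
    print(name, "=", value)
```

In practice the text would come from reading /etc/condor/condor_config and the local config files named in the MasterLog.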
SharedPortLog looks like this:
04/09/18 16:39:31 About to update statistics in shared_port daemon ad file at /var/lock/condor/shared_port_ad :
ForkedChildrenPeak = 0
RequestsBlocked = 0
ForkedChildrenCurrent = 0
RequestsSucceeded = 94
RequestsPendingPeak = 3
RequestsPendingCurrent = 0
RequestsFailed = 0
SharedPortCommandSinfuls = "<192.168.1.206:9618>,<[::1]:9618>"
MyAddress = "<192.168.1.206:9618?addrs=192.168.1.206-9618+[--1]-9618&noUDP>"
MatchLog looks like this:
04/09/18 16:43:07 Rejected 295.0 group_ANALOG.jfisher@xxxxxxxxxxxxxx <192.168.1.206:9618?addrs=192.168.1.206-9618+[--1]-9618&noUDP&sock=1437_3daf_4>: no match found
04/09/18 16:44:07 Rejected 295.0 group_ANALOG.jfisher@xxxxxxxxxxxxxx <192.168.1.206:9618?addrs=192.168.1.206-9618+[--1]-9618&noUDP&sock=1437_3daf_4>: no match found
SchedLog looks like this:
04/09/18 16:44:07 (pid:1990) Activity on stashed negotiator socket: <192.168.1.206:26537>
04/09/18 16:44:07 (pid:1990) Using negotiation protocol: NEGOTIATE
04/09/18 16:44:07 (pid:1990) Negotiating for owner: group_ANALOG.jfisher@xxxxxxxxxxxxxx
04/09/18 16:44:07 (pid:1990) Finished negotiating for group_ANALOG.jfisher in local pool: 0 matched, 1 rejected
NegotiatorLog looks like this:
04/09/18 16:45:07 ---------- Started Negotiation Cycle ----------
04/09/18 16:45:07 Phase 1: Obtaining ads from collector ...
04/09/18 16:45:07 Getting startd private ads ...
04/09/18 16:45:07 Getting Scheduler, Submitter and Machine ads ...
04/09/18 16:45:07 Sorting 52 ads ...
04/09/18 16:45:07 Got ads: 52 public and 48 private
04/09/18 16:45:07 Public ads include 1 submitter, 48 startd
04/09/18 16:45:07 Phase 2: Performing accounting ...
04/09/18 16:45:07 Phase 3: Sorting submitter ads by priority ...
04/09/18 16:45:07 Phase 4.1: Negotiating with schedds ...
04/09/18 16:45:07 Negotiating with group_ANALOG.jfisher@xxxxxxxxxxxxxx at <192.168.1.206:9618?addrs=192.168.1.206-9618+[--1]-9618&noUDP&sock=1437_3daf_4>
04/09/18 16:45:07 0 seconds so far for this submitter
04/09/18 16:45:07 0 seconds so far for this schedd
04/09/18 16:45:07 Got NO_MORE_JOBS; schedd has no more requests
04/09/18 16:45:07 Request 00295.00000: autocluster 1 (request count 1 of 2700)
04/09/18 16:45:07 Rejected 295.0 group_ANALOG.jfisher@xxxxxxxxxxxxxx <192.168.1.206:9618?addrs=192.168.1.206-9618+[--1]-9618&noUDP&sock=1437_3daf_4>: no match found
04/09/18 16:45:07 Got NO_MORE_JOBS; schedd has no more requests
04/09/18 16:45:07 negotiateWithGroup resources used scheddAds length 0
04/09/18 16:45:07 ---------- Finished Negotiation Cycle ----------
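Since the same "Rejected ... no match found" line repeats every negotiation cycle, a small script can tally rejections per job and reason. This is a minimal sketch: the name tally_rejections is hypothetical, and the pattern is based only on the log lines pasted above, so other rejection formats may not match.

```python
import re
from collections import Counter

# Based on lines like:
# "04/09/18 16:43:07 Rejected 295.0 <submitter> <address>: no match found"
REJECTED = re.compile(r"Rejected (\d+\.\d+) (\S+) <[^>]*>: (.+)$")

def tally_rejections(log_text):
    """Count rejections per (job id, reason) in MatchLog/NegotiatorLog text."""
    counts = Counter()
    for line in log_text.splitlines():
        m = REJECTED.search(line)
        if m:
            job_id, _submitter, reason = m.groups()
            counts[(job_id, reason.strip())] += 1
    return counts

# Two MatchLog lines from above plus one unrelated line.
sample_log = """\
04/09/18 16:43:07 Rejected 295.0 group_ANALOG.jfisher@xxxxxxxxxxxxxx <192.168.1.206:9618?addrs=192.168.1.206-9618+[--1]-9618&noUDP&sock=1437_3daf_4>: no match found
04/09/18 16:44:07 Rejected 295.0 group_ANALOG.jfisher@xxxxxxxxxxxxxx <192.168.1.206:9618?addrs=192.168.1.206-9618+[--1]-9618&noUDP&sock=1437_3daf_4>: no match found
04/09/18 16:45:07 Got NO_MORE_JOBS; schedd has no more requests
"""

for (job_id, reason), n in tally_rejections(sample_log).items():
    print(f"job {job_id}: {reason} x{n}")
```

This only summarises what the logs already say; it does not explain why no slot matched.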