Hi,

We have a problem with our Condor installation.

We installed Condor on two machines using full-install, without the shared file system option. One machine was chosen as the central manager, and the other is a client. The Condor version is 6.6.8, and the platform is Scientific Linux CERN 3. (Scientific Linux CERN 3 is a Linux distribution built within the framework of Scientific Linux, which in turn is rebuilt from the freely available Red Hat Enterprise Linux 3 product sources under the terms and conditions of that product's EULA.)

After installation we started the daemons. The central manager runs all five required processes (condor_master, condor_collector, condor_negotiator, condor_startd, condor_schedd), and the client runs all three (condor_master, condor_startd, condor_schedd). condor_status showed both machines as active.

However, when we tried to run some of the test (example) jobs, we hit a problem: jobs submitted on the central manager were executed ONLY on the central manager, and jobs submitted on the client machine were NOT executed at all!

In NegotiatorLog (on the central manager) we saw:

4/15 12:34:18 ---------- Started Negotiation Cycle ----------
4/15 12:34:18 Phase 1: Obtaining ads from collector ...
4/15 12:34:18 Getting all public ads ...
4/15 12:34:18 Sorting 10 ads ...
4/15 12:34:18 Getting startd private ads ...
4/15 12:34:18 Got ads: 10 public and 4 private
4/15 12:34:18 Public ads include 2 submitter, 4 startd
4/15 12:34:18 Phase 2: Performing accounting ...
4/15 12:34:18 Phase 3: Sorting submitter ads by priority ...
4/15 12:34:18 Phase 4.1: Negotiating with schedds ...
4/15 12:34:18 Negotiating with condor@xxxxxxxxxxxxxxxxxxxxx at <147.91.83.228:32770>
4/15 12:34:20 getpeername failed so connect must have failed
4/15 12:34:49 Connect failed for 30 seconds; returning FALSE
4/15 12:34:49 Failed to connect to <147.91.83.228:32770>
4/15 12:34:49 Error: Ignoring schedd for this cycle

On the client machine, the condor_shadow, condor_starter and condor_exec processes were not active at all, and in StartLog we saw:

4/15 12:28:20 Swap space: 522104
4/15 12:28:20 70405876 kbytes available for "/home/condor/execute"
4/15 12:28:20 Looking up RESERVED_DISK parameter
4/15 12:28:20 Reserving 5120 kbytes for file system
4/15 12:28:20 Disk space: 70400756
4/15 12:28:20 Error on stat(/dev/:0,0xbfffe500), errno = 2(No such file or directory)
4/15 12:28:24 Attempting to send update via UDP to collector <147.91.83.254:9618>
4/15 12:28:24 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/15 12:28:24 vm1: Sent update to 1 collector(s)
4/15 12:28:25 Attempting to send update via UDP to collector <147.91.83.254:9618>
4/15 12:28:25 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/15 12:28:25 vm2: Sent update to 1 collector(s)

The command condor_q -analyze says, for all jobs:

0 ...
4 match, match, but reject the job for unknown reasons
0 ...

What is the problem? The complete client log files are attached (ClientLog.zip).

Thanks in advance,
Dusan Radevic
Attachment:
ClientLog.zip
Description: Zip archive