Hi,
I set up a Condor pool with two machines.
The Condor version is 7.6.7 and the OS is Fedora 14.
When I use Condor to run a workflow, it fails with the errors shown below.
StarterLog
05/15/12 22:17:20 Output file: /home/condor/localcondor/execute/dir_8051/_condor_stdout
05/15/12 22:17:20 Error file: /home/condor/localcondor/execute/dir_8051/_condor_stderr
05/15/12 22:17:20 About to exec /home/condor/localcondor/execute/dir_8051/condor_exec.exe
05/15/12 22:17:20 Create_Process succeeded, pid=8053
05/15/12 22:17:20 Process exited, pid=8053, status=1
05/15/12 22:17:20 ReliSock::put_file_with_permissions(): Failed to stat file '/home/condor/localcondor/execute/dir_8051/diff.000004.000008.fits': No such file or directory (errno: 2, si_error: 1)
05/15/12 22:17:20 DoUpload: (Condor error code 13, subcode 2) STARTER at 192.168.1.105 failed to send file(s) to <192.168.1.105:38037>: error reading from /home/condor/localcondor/execute/dir_8051/diff.000004.000008.fits: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <192.168.1.105:55967>
05/15/12 22:17:20 JICShadow::notifyJobTermination(): Sending mock terminate event.
05/15/12 22:17:20 JIC::transferOutput() failed, waiting for job lease to expire or for a reconnect attempt
05/15/12 22:17:20 Returning from CStarter::JobReaper()
05/15/12 22:17:20 Got SIGQUIT. Performing fast shutdown.
05/15/12 22:17:20 ShutdownFast all jobs.
05/15/12 22:17:20 condor_read() failed: recv() returned -1, errno = 104 Connection reset by peer, reading 5 bytes from <192.168.1.105:36233>.
05/15/12 22:17:20 IO: Failed to read packet header
05/15/12 22:17:20 condor_write(): Socket closed when trying to write 97 bytes to <192.168.1.105:36233>, fd is 6
05/15/12 22:17:20 Buf::write(): condor_write() failed
05/15/12 22:17:20 Failed to send job exit status to shadow
05/15/12 22:17:20 JobExit() failed, waiting for job lease to expire or for a reconnect attempt
05/15/12 22:17:40 Got SIGTERM. Performing graceful shutdown.
05/15/12 22:17:40 ShutdownGraceful all jobs.
05/15/12 22:17:40 condor_write(): Socket closed when trying to write 97 bytes to <192.168.1.105:36233>, fd is 6
05/15/12 22:17:40 Buf::write(): condor_write() failed
05/15/12 22:17:40 Failed to send job exit status to shadow
05/15/12 22:17:40 JobExit() failed, waiting for job lease to expire or for a reconnect attempt
05/15/12 22:17:40 **** condor_starter (condor_STARTER) pid 8051 EXITING WITH STATUS 0
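The two StarterLog lines that seem to matter most are "Process exited, pid=8053, status=1" and "Failed to stat file ...diff.000004.000008.fits": the job process exited with a non-zero status, and the starter then could not find that output file to send back. As a minimal sketch (assuming the StarterLog line format shown above; the function name summarize_starter_log is hypothetical), these two kinds of lines can be pulled out of a longer log with a short Python script:

```python
import re

# Matches StarterLog lines such as:
#   05/15/12 22:17:20 Process exited, pid=8053, status=1
EXIT_RE = re.compile(r"Process exited, pid=(\d+), status=(\d+)")

# Matches the file-transfer failure for an output file the starter could not find, e.g.
#   ... Failed to stat file '/path/to/file.fits': No such file or directory
STAT_RE = re.compile(r"Failed to stat file '([^']+)'")


def summarize_starter_log(lines):
    """Return (exit status keyed by pid, list of output files that could not be stat'ed)."""
    exits = {}
    missing = []
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            exits[int(m.group(1))] = int(m.group(2))
        m = STAT_RE.search(line)
        if m:
            missing.append(m.group(1))
    return exits, missing


if __name__ == "__main__":
    # Hypothetical usage: in practice, read the lines from the StarterLog file instead.
    sample = [
        "05/15/12 22:17:20 Process exited, pid=8053, status=1",
        "05/15/12 22:17:20 ReliSock::put_file_with_permissions(): Failed to stat file "
        "'/home/condor/localcondor/execute/dir_8051/diff.000004.000008.fits': "
        "No such file or directory (errno: 2, si_error: 1)",
    ]
    exits, missing = summarize_starter_log(sample)
    print(exits)    # {8053: 1}
    print(missing)
```

A non-zero exit status followed by a missing output file usually means the job itself failed before writing its output, so the job's own _condor_stderr may be a good place to look next.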
MatchLog
<192.168.1.105:55934> preempting none <192.168.1.106:45394> xuwei.shanda.com
05/15/12 22:15:59 Matched 114.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.105:49373> yang.shanda.com
05/15/12 22:15:59 Rejected 115.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:15:59 Rejected 108.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:16:19 Rejected 118.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:16:19 Rejected 108.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:17:19 Matched 115.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.106:45394> xuwei.shanda.com
05/15/12 22:17:19 Matched 116.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.105:49373> yang.shanda.com
05/15/12 22:17:19 Rejected 118.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:17:19 Rejected 108.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
NegotiatorLog
05/15/12 22:17:19 ---------- Started Negotiation Cycle ----------
05/15/12 22:17:19 Phase 1: Obtaining ads from collector ...
05/15/12 22:17:19 Getting all public ads ...
05/15/12 22:17:19 Sorting 7 ads ...
05/15/12 22:17:19 Getting startd private ads ...
05/15/12 22:17:19 Got ads: 7 public and 2 private
05/15/12 22:17:19 Public ads include 1 submitter, 2 startd
05/15/12 22:17:19 Phase 2: Performing accounting ...
05/15/12 22:17:19 Phase 3: Sorting submitter ads by priority ...
05/15/12 22:17:19 Phase 4.1: Negotiating with schedds ...
05/15/12 22:17:19 Negotiating with condor@xxxxxxxxxx at <192.168.1.105:55934>
05/15/12 22:17:19 0 seconds so far
05/15/12 22:17:19 Request 00115.00000:
05/15/12 22:17:19 Matched 115.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.106:45394> xuwei.shanda.com
05/15/12 22:17:19 Successfully matched with xuwei.shanda.com
05/15/12 22:17:19 Request 00116.00000:
05/15/12 22:17:19 Matched 116.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.105:49373> yang.shanda.com
05/15/12 22:17:19 Successfully matched with yang.shanda.com
05/15/12 22:17:19 Request 00118.00000:
05/15/12 22:17:19 Rejected 118.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:17:19 Request 00108.00000:
05/15/12 22:17:19 Rejected 108.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:17:19 Got NO_MORE_JOBS; done negotiating
05/15/12 22:17:19 negotiateWithGroup resources used scheddAds length 1
05/15/12 22:17:19 ---------- Finished Negotiation Cycle ----------
05/15/12 22:18:17 Got SIGTERM. Performing graceful shutdown.
05/15/12 22:18:17 **** condor_negotiator (condor_NEGOTIATOR) pid 7249 EXITING WITH STATUS 0
I'm new to Condor.
I would appreciate any answers or advice.
Thank you for your help.
Yang