
[Condor-users] condor_read(): Socket closed when trying to read 5 bytes in StartLog



Hi,

   When I submitted a job, it matched all the machines in the pool.
The negotiator sent a match for one particular machine, X. But the
StartLog on machine X shows the following, and the job stays Idle.

1) Why does Condor not find another suitable machine in the pool,
given that the job does not start on machine X?

2) Why does it keep trying the same machine X for that job?

3) What kind of error is "condor_read(): Socket closed"?

------------- StartLog -------------

10/17 13:09:57 Remote global job ID is
scorpio.pesgrid.wipro.com#1224227353#161.0
10/17 13:09:57 JobLeaseDuration not defined: using 1800 (alive_interval
[300] * max_missed [6]
10/17 13:09:57 About to Create_Process "condor_starter -f
scorpio.pesgrid.wipro.com"
10/17 13:09:57 Create_Process: using fast clone() to create child
process.
10/17 13:09:57 Got RemoteUser (idealgrid@xxxxxxxxxxxxxxxxx) from request
classad
10/17 13:09:57 Got universe "VM" (13) from request classad
10/17 13:09:57 State change: claim-activation protocol successful
10/17 13:09:57 Changing activity: Idle -> Busy
10/17 13:09:57 condor_read(): Socket closed when trying to read 5 bytes
from <127.0.0.1:43202>
10/17 13:09:57 IO: EOF reading packet header
10/17 13:09:57 Closing job ClassAd update socket from starter.
10/17 13:09:57 DaemonCore: No more children processes to reap.
10/17 13:09:57 Starter pid 31556 exited with status 1
10/17 13:09:57 State change: starter exited
10/17 13:09:57 Changing activity: Busy -> Idle
10/17 13:09:57 Got activate_claim request from shadow
(<10.201.42.242:9603>)
10/17 13:09:57 Read request ad and starter from shadow.
10/17 13:09:57 Swap space: 1052124
10/17 13:09:57 28786748 kbytes available for "/vm/local.grid7/execute"
10/17 13:09:57 Looking up RESERVED_DISK parameter
10/17 13:09:57 Reserving 5120 kbytes for file system
10/17 13:09:57 Total execute space: 28781628
10/17 13:09:57 Remote job ID is 161.0
10/17 13:09:57 Remote global job ID is
scorpio.pesgrid.wipro.com#1224227353#161.0
10/17 13:09:57 JobLeaseDuration not defined: using 1800 (alive_interval
[300] * max_missed [6]
10/17 13:09:57 About to Create_Process "condor_starter -f
scorpio.pesgrid.wipro.com"
10/17 13:09:57 Create_Process: using fast clone() to create child
process.
10/17 13:09:57 Got RemoteUser (idealgrid@xxxxxxxxxxxxxxxxx) from request
classad
10/17 13:09:57 Got universe "VM" (13) from request classad
10/17 13:09:57 State change: claim-activation protocol successful
10/17 13:09:57 Changing activity: Idle -> Busy
10/17 13:09:57 condor_read(): Socket closed when trying to read 5 bytes
from <127.0.0.1:34016>
10/17 13:09:57 IO: EOF reading packet header
10/17 13:09:57 Closing job ClassAd update socket from starter.
10/17 13:09:57 DaemonCore: No more children processes to reap.
10/17 13:09:57 Starter pid 31557 exited with status 1
10/17 13:09:57 State change: starter exited
10/17 13:09:57 Changing activity: Busy -> Idle
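
To see how often the starter dies on machine X, the exit lines can be pulled out of the StartLog with a short script. This is only a rough sketch: the function name starter_exits is hypothetical, and the pattern is copied from the "Starter pid ... exited with status ..." lines in the log above, not from any documented log format, so it may need adjusting for other Condor versions.

```python
import re

# Pattern taken from the "Starter pid ... exited with status ..." lines
# in the StartLog excerpt above (an assumption, not a documented format).
STARTER_EXIT = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) Starter pid (\d+) exited with status (\d+)",
    re.MULTILINE,
)


def starter_exits(log_text):
    """Return a list of (timestamp, pid, status) for each starter exit line."""
    return [
        (timestamp, int(pid), int(status))
        for timestamp, pid, status in STARTER_EXIT.findall(log_text)
    ]


def failed_starter_exits(log_text):
    """Keep only the exits whose status is non-zero."""
    return [entry for entry in starter_exits(log_text) if entry[2] != 0]
```

Passing the contents of the StartLog to failed_starter_exits() would list each failed starter with its timestamp and pid, which makes it easier to tell whether the same claim is being re-activated in a tight loop, as it appears to be here.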


by
Johnson

