Our jobs are submitted with stream_output = true.
When the scheduler server gets busy, jobs seem to die and get placed back into the queue, I think because the machine can't keep up with the file transfer and I/O. Is there a way to figure this out?
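The only idea I've had so far is to turn up the daemon logging on both sides of the shadow/starter link, something like this in the condor_config (not sure these are the most useful flags to enable):

  # more verbose logging on the submit side (schedd + shadow)
  SCHEDD_DEBUG = D_FULLDEBUG
  SHADOW_DEBUG = D_FULLDEBUG
  # and on the execute side (startd + starter)
  STARTD_DEBUG = D_FULLDEBUG
  STARTER_DEBUG = D_FULLDEBUG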
In the SchedLog I see the following for a particular job:
...cur_host=1, status=2
...cur_host=1, status=2
...cur_host=1, status=2
...cur_host=1, status=2
Shadow pid 23323 for job 145.3 exited with status 107
Deleting Shadow rec for PID 23323, job (145.3)
Marked job as IDLE
In the ShadowLog, at around the same time, I see this:
condor_read(): socket closed when trying to read 5 bytes from startd
slot1@xxxxxxx
IO: EOF reading packet header
Can no longer talk to condor_starter
FileLock::obtain(1) ... now WRITE
FileLock::obtain(2) ... now UNLOCKED
Trying to reconnect...
Trying to reconnect disconnected job
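Since the shadow does try to reconnect, I also wondered whether lengthening the job lease would give it a bigger window before the claim is given up; something like this in the submit file (the value is just an example, I haven't tested it):

  # give the shadow a longer window to reconnect to the running job
  job_lease_duration = 3600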
Any thoughts or ideas on why the daemons would be behaving like this? Are there any tuning parameters I can use to get better performance?
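For context, these are the schedd knobs I've been looking at so far, though I'm not sure which of them (if any) are the right ones to tune for this; the values below are just placeholders:

  # cap the number of shadows / running jobs this schedd will manage
  MAX_JOBS_RUNNING = 200
  # pace how quickly new shadows are spawned
  JOB_START_COUNT = 5
  JOB_START_DELAY = 2
  # limit simultaneous file transfers handled by the schedd
  MAX_CONCURRENT_UPLOADS = 10
  MAX_CONCURRENT_DOWNLOADS = 10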
--
Get your facts first, then you can distort them as you please.