Re: [Condor-users] Problem with MPI universe job



The device on the head node is a RAID 5 array attached via Ultra160
SCSI. When the job finishes, Condor has to copy about 100 files, each
many MB in size, back to the head node, and this happens for 8 nodes at
about the same time. So it is plausible that the device cannot keep up
with the whole load and that some connections are timing out. Is it
possible to extend this time-out period? Say, to 5 minutes instead of
30 seconds or so?
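
To put rough numbers on the load, here is a small back-of-the-envelope
sketch in Python. The 8 nodes, ~100 files, 65536-byte write and 30-second
timeout come from this thread; the 10 MB file size and the 20 MB/s
sustained write rate for the array are purely hypothetical placeholders,
and the function names are made up for illustration. It also assumes the
timeout covers the whole 65536-byte write, which may not be how Condor
applies it.

```python
# Rough estimate of the copy-back load when several nodes finish at once.
# Values marked "assumed" are illustrative, not measurements from this cluster.

CHUNK_BYTES = 65536   # size of the write reported in the log line
TIMEOUT_S = 30        # timeout observed in the log line


def total_transfer_seconds(nodes, files_per_node, file_mb, array_mb_per_s):
    """Time to drain all output if the array sustains array_mb_per_s in total."""
    total_mb = nodes * files_per_node * file_mb
    return total_mb / array_mb_per_s


def min_rate_to_avoid_timeout(chunk_bytes=CHUNK_BYTES, timeout_s=TIMEOUT_S):
    """Slowest per-connection rate (bytes/s) that still completes one chunk in time."""
    return chunk_bytes / timeout_s


if __name__ == "__main__":
    # 8 nodes and ~100 files per node are from the message above;
    # 10 MB per file and 20 MB/s array throughput are assumed.
    secs = total_transfer_seconds(nodes=8, files_per_node=100,
                                  file_mb=10, array_mb_per_s=20)
    print(f"estimated total copy-back time: {secs:.0f} s")
    print(f"minimum per-connection rate: {min_rate_to_avoid_timeout():.0f} bytes/s")
```

Under these assumptions the whole copy-back would take several minutes,
while a single connection only needs to move a couple of KB per second
to finish one 65536-byte write in 30 seconds. If that reading is right,
a time-out suggests a connection was stalled almost completely rather
than merely slowed down.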

Thanks,
Pasquale

On 4/4/07, Dan Bradley <dan@xxxxxxxxxxxx> wrote:


Pasquale Tricarico wrote:
> 4/4 02:38:55 condor_write(): timed out writing 65536 bytes to <10.7.7.250:34338>
>

It is timing out after 30 seconds while trying to copy back 65536 bytes
of an output file.  Are your output files being written to a very slow
device?  Or do you have a lot of jobs all writing to this same device at
the same time?

--Dan
