[Condor-users] Jobs stuck in condor_starter
- Date: Tue, 19 Aug 2008 13:55:47 +0200
- From: Steffen Grunewald <steffen.grunewald@xxxxxxxxxx>
- Subject: [Condor-users] Jobs stuck in condor_starter
I've found a job cluster that won't run. Jobs are matched to a slot, and
output and error files are created, but condor_starter never transfers
control to the real Executable (which is a Perl script).
In the slot's StarterLog, these messages appear every hour:
8/19 13:40:48 ERROR "Assertion ERROR on (result)" at line 384 in file NTsenders.C
8/19 13:40:48 condor_write(): Socket closed when trying to write 168 bytes to <10.100.200.93:60802>, fd is 5
8/19 13:40:48 Buf::write(): condor_write() failed
8/19 13:40:48 ERROR "Assertion ERROR on (result)" at line 875 in file NTsenders.C
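To check how often the assertion recurs and at which source lines, the log can be scanned with a small script. The following is only a minimal sketch, assuming every assertion entry follows exactly the format of the lines quoted above; the function name summarize_assertions and the regular expression are illustrative and not part of Condor.

```python
import re
from collections import Counter

# Assumed format, taken from the quoted StarterLog lines:
# 8/19 13:40:48 ERROR "Assertion ERROR on (result)" at line 384 in file NTsenders.C
ASSERT_RE = re.compile(
    r'^(?P<date>\d{1,2}/\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'ERROR "Assertion ERROR on \((?P<expr>[^)]*)\)" '
    r'at line (?P<line>\d+) in file (?P<file>\S+)$'
)


def summarize_assertions(lines):
    """Count assertion failures per (file, line) and collect their timestamps.

    Hypothetical helper: lines that do not match the assumed format are skipped.
    """
    counts = Counter()
    stamps = []
    for raw in lines:
        m = ASSERT_RE.match(raw.strip())
        if m:
            counts[(m["file"], int(m["line"]))] += 1
            stamps.append((m["date"], m["time"]))
    return counts, stamps


# The four log lines quoted in this message, used as sample input.
sample = [
    '8/19 13:40:48 ERROR "Assertion ERROR on (result)" at line 384 in file NTsenders.C',
    '8/19 13:40:48 condor_write(): Socket closed when trying to write 168 bytes '
    'to <10.100.200.93:60802>, fd is 5',
    '8/19 13:40:48 Buf::write(): condor_write() failed',
    '8/19 13:40:48 ERROR "Assertion ERROR on (result)" at line 875 in file NTsenders.C',
]

counts, stamps = summarize_assertions(sample)
for (fname, lineno), n in sorted(counts.items()):
    print(f"{fname}:{lineno} -> {n} occurrence(s)")
```

Run against a whole StarterLog (for example by passing an open file object instead of the sample list), the collected timestamps should show whether the failures really occur at one-hour intervals.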
A side effect seems to be that there are more jobs in the R state than slots
available (809 free slots, 814 R jobs).
How should I interpret the assert() error?
Condor version 7.0.4
Regards,
Steffen
--
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html