Subject: [Condor-users] Execution machines stop the job…
Hi,
We're using Condor to run long-running jobs on 15 Macintosh G5 machines. After a few hours, all the execution machines stop the job, and a communication error occurs between the condor_starter and the condor_master (a Macintosh Xserve):
Cluster01 crashdump: Unable to determine CPSProcessSerNum pid: 11913 name: condor_starter
And in the Shadow log we see:
ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.23)" at line 63 in file NTreceivers.C
The problem occurs with both Condor 6.6.6 and Condor 6.6.7.
Thank you for your help.
Damien
Damien AUTRET:
Unité INSERM 601
Département de Recherche en ImmunoCancérologie
Equipe 6 Biophysique-Cancérologie
9 Quai Moncousu
44093 Nantes Cedex
Tél: 02.40.41.28.21
Fax: 02.40.35.66.97
Sec: 02.40.08.47.47