I've made a ticket in our bug-tracking system:
I expect to have a fix in place for our next release.
- Jaime
On Oct 19, 2016, at 3:56 AM, Antonio Dorta <adorta@xxxxxx> wrote:
Hi!
Thanks for replying!
I wouldn't say it happens so rarely in our case. I've checked about one year of logs from our ~200-machine pool and found 9038 lines like: "Starter pid XXXXX died on signal 11 (signal 11 (Segmentation fault))".
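For anyone who wants to run the same check on their own pool, here is a minimal sketch of how such lines could be counted. It assumes only the message format quoted above; the function name `count_starter_crashes` and the script name are hypothetical, and the log files to scan are whichever ones you pass on the command line.

```python
import re
import sys
from collections import Counter

# Pattern taken from the log line quoted above; assumes the pid is numeric in real logs.
CRASH_RE = re.compile(r"Starter pid (\d+) died on signal (\d+)")


def count_starter_crashes(lines):
    """Return a Counter mapping signal number -> number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_crashes.py <logfile> [<logfile> ...]
    total = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            total += count_starter_crashes(fh)
    for sig, n in sorted(total.items()):
        print(f"signal {sig}: {n} starter crash(es)")
```

Grouping by signal number makes it easy to see whether all the crashes are segmentation faults (signal 11) or whether other signals are involved too.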
Thanks once again for your help.
Best regards,
Quoting Jaime Frey <jfrey@xxxxxxxxxxx>:
On Oct 18, 2016, at 3:51 AM, Antonio Dorta <adorta@xxxxxx> wrote:
Yeah, you're right, there is quite a lot of information in StarterLog.slotX...
It seems it was trying to communicate with another machine and then it died...
This looks like some sloppy cleanup code when the starter exits in certain error conditions. It lost its network connection to the shadow daemon on the submit machine during file transfer. It was then told to exit, which precipitated jumbled cleanup code that included trying to tell the shadow it was exiting (over a dead connection). I haven't traced the exact cause of the crash, but the code involved has several problems that need to be fixed.
The startd should recover from the crash without any assistance. If these crashes are only happening rarely, then I wouldn't worry about them. We will work on a fix, though.
Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project
--
Antonio Dorta
Servicios Informáticos Específicos (SIE)
Investigación y Enseñanza
Instituto de Astrofísica de Canarias (IAC)
C/ Vía Láctea, s/n. 38205 - La Laguna, Santa Cruz de Tenerife
Office: 1124. Phone: 922 60 5278. email: adorta@xxxxxx
Supercomputing at IAC: http://www.iac.es/sieinvens/SINFIN/Main/supercomputing.php