Dear all,

To add more detail on the HTCondor problem: the jobs are being killed with signal 9. The failure message is:

"LRMS error: (-1) ExitReason: died on signal 9 (Killed)."

Attached are the ShadowLog and StarterLog contents for one such job.

Regards,
Mihai

> Dear all,
>
> I have a cluster dedicated to the ATLAS experiment. It is an ARC-CE
> configured with HTCondor+Docker.
> It is configured to run both single-core and multi-core jobs.
> For the past couple of days, most of the single-core jobs have failed
> with this error message:
>
> The worker was cancelled while the job was starting : Condor HoldReason:
> Unspecified gridmanager error ; Worker canceled by harvester due to held
> too long or not found
>
> Does anyone have any idea?
>
> Thanks in advance,
> Mihai

Dr. Mihai Ciubancan
IT Department
National Institute of Physics and Nuclear Engineering "Horia Hulubei"
Str. Reactorului no. 30, P.O. BOX MG-6
077125, Magurele - Bucharest, Romania
http://www.ifin.ro
Work: +40214042360
Mobile: +40761345687
Fax: +40214042395
Attachment: condor-ShadowLog-job (binary data)
Attachment: StarterLog.slot3_1-job (binary data)