Hi Christoph.
Are you saying that there is nothing in the StartLog or StarterLog* files on bird664.desy.de
for these failures?
If there is nothing in those files, perhaps there is something
in the SharedPortLog?
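
To make that check quicker, here is a minimal Python sketch that scans the execute node's StartLog, StarterLog* and SharedPortLog for lines containing a given string, for example the job id 28709962.0 or a timestamp prefix such as "12/22/21 22:57". It assumes the logs live in /var/log/condor on bird664.desy.de, the same directory as the SchedLog paths quoted below. Adjust LOG_DIR if that node's LOG setting differs. find_mentions is a hypothetical helper, not part of HTCondor. If the execute-side daemons do not mention the job id, the timestamp from the SchedLog may be the more useful search string.

```python
import glob
import os
import sys

# Assumption: the execute node writes its daemon logs here, like the submit host.
LOG_DIR = "/var/log/condor"

# The log files named in the reply; StarterLog* also matches per-slot starter logs.
LOG_PATTERNS = ["StartLog", "StarterLog*", "SharedPortLog"]


def find_mentions(needle, log_dir=LOG_DIR, patterns=LOG_PATTERNS):
    """Return (path, line_number, line) for every log line that contains needle."""
    hits = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
            # errors="replace" keeps the scan going if a log has odd bytes in it.
            with open(path, errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python find_mentions.py 28709962.0
    for path, lineno, line in find_mentions(sys.argv[1]):
        print(f"{path}:{lineno}: {line}")
```
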
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Beyer, Christoph <christoph.beyer@xxxxxxx>
Sent: Thursday, December 23, 2021 4:48 AM
To: htcondor-users <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Shadow pid <> for job <> exited with status 108

Hi,
I see a lot of jobs starting up to a hundred shadows before they finally run successfully. IMHO the worker refuses to start the job, maybe because conditions that were previously considered fulfilled are no longer met (?). The job leaves no trace at all on the worker node, so it must be something that happens very early, once the claim on the worker node is activated?

/var/log/condor/SchedLog:12/22/21 22:57:38 (pid:3577) Starting add_shadow_birthdate(28709962.0)
/var/log/condor/SchedLog:12/22/21 22:57:38 (pid:3577) Started shadow for job 28709962.0 on slot2@xxxxxxxxxxxxxxx <131.169.163.103:33302?addrs=131.169.163.103-33302&alias=bird664.desy.de> for BIRD_cms.lite.uid, (shadow pid = 1596023)
/var/log/condor/SchedLog:12/22/21 22:57:38 (pid:3577) Shadow pid 1596023 for job 28709962.0 exited with status 108
/var/log/condor/SchedLog:12/22/21 22:57:38 (pid:3577) Match record (slot2@xxxxxxxxxxxxxxx <131.169.163.103:33302?addrs=131.169.163.103-33302&alias=bird664.desy.de> for BIRD_cms.lite.uid, 28709962.0) deleted
/var/log/condor/SchedLog:12/22/21 22:57:38 (pid:3577) match (slot2@xxxxxxxxxxxxxxxxx <131.169.160.194:33133?addrs=131.169.160.194-33133&alias=batch1188.desy.de> for BIRD_cms.lite.uid) switching to job 28709962.0
/var/log/condor/SchedLog:12/22/21 22:57:38 (pid:3577) Starting add_shadow_birthdate(28709962.0)
/var/log/condor/SchedLog:12/22/21 22:57:38 (pid:3577) Started shadow for job 28709962.0 on slot2@xxxxxxxxxxxxxxxxx <131.169.160.194:33133?addrs=131.169.160.194-33133&alias=batch1188.desy.de> for BIRD_cms.lite.uid, (shadow pid = 1596024)
/var/log/condor/SchedLog:12/22/21 22:57:38 (pid:3577) Shadow pid 1596024 for job 28709962.0 exited with status 108
/var/log/condor/SchedLog:12/22/21 22:57:38 (pid:3577) Match record (slot2@xxxxxxxxxxxxxxxxx <131.169.160.194:33133?addrs=131.169.160.194-33133&alias=batch1188.desy.de> for BIRD_cms.lite.uid, 28709962.0) deleted

I would love to get this down to a more reasonable number, as it is irritating and clogs the log files ... Any hints?
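
To see whether these failures concentrate on particular jobs or worker nodes, a minimal Python sketch like the following could tally the status-108 shadow exits in the SchedLog. It assumes the "Started shadow ... (shadow pid = N)" and "Shadow pid N for job J exited with status S" messages look exactly like the lines above, and it names each worker by the alias= field of its address, falling back to the whole address when there is no alias. count_shadow_exits and host_of are hypothetical helpers, not part of HTCondor, and the sketch makes no claim about what status 108 means.

```python
import re
import sys
from collections import Counter

# Patterns for the two SchedLog messages quoted above (assumed stable format).
STARTED_RE = re.compile(
    r"Started shadow for job (?P<job>\d+\.\d+) on \S+ <(?P<addr>[^>]*)>.*"
    r"\(shadow pid = (?P<pid>\d+)\)"
)
EXITED_RE = re.compile(
    r"Shadow pid (?P<pid>\d+) for job (?P<job>\d+\.\d+) exited with status (?P<status>\d+)"
)


def host_of(addr):
    """Prefer the alias= field of the address, else return the address itself."""
    m = re.search(r"alias=([^&>]+)", addr)
    return m.group(1) if m else addr


def count_shadow_exits(lines, status=108):
    """Count shadow exits with the given status per job and per execute host."""
    pid_to_host = {}
    per_job = Counter()
    per_host = Counter()
    for line in lines:
        m = STARTED_RE.search(line)
        if m:
            # Remember which worker this shadow pid was sent to.
            pid_to_host[m.group("pid")] = host_of(m.group("addr"))
            continue
        m = EXITED_RE.search(line)
        if m:
            host = pid_to_host.pop(m.group("pid"), "unknown")
            if int(m.group("status")) == status:
                per_job[m.group("job")] += 1
                per_host[host] += 1
    return per_job, per_host


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_108.py /var/log/condor/SchedLog
    with open(sys.argv[1], errors="replace") as fh:
        jobs, hosts = count_shadow_exits(fh)
    print("Top jobs:", jobs.most_common(10))
    print("Top hosts:", hosts.most_common(10))
```

If the per-host counts are spread evenly across the pool, the cause is more likely on the submit side; if a few workers dominate, their StartLog is the place to look first.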
Best
Christoph

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx