Re: [HTCondor-users] condor_schedd fails after some time


My quick inspection of the code didn't turn up any obvious ways to trigger the double-entry problem.

This is happening while the condor_schedd is attempting to reconnect to running parallel jobs after a restart. Are you seeing this happen more than once?
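For readers following along, here is a rough Python sketch of what a "double-entry" failure of this kind can look like. It is not the HTCondor code: the AllocationTable class, its insert() return values (0 on success, -1 on a duplicate key), and the reconnect() helper are all hypothetical stand-ins chosen to mirror the asserted expression in the quoted log, allocations->insert( cluster, alloc ) == 0. Whether two procs of the same cluster are really what produces the second insert in the schedd is an assumption here, not something confirmed in this thread.

```python
class AllocationTable:
    """Hypothetical stand-in for a table keyed by cluster id that
    refuses a second entry under the same key."""

    def __init__(self):
        self._table = {}

    def insert(self, cluster, alloc):
        # Assumed convention: 0 on success, -1 if the key already exists.
        if cluster in self._table:
            return -1
        self._table[cluster] = alloc
        return 0


def reconnect(job_ids, allocations):
    """Create one allocation per reconnected job id ("cluster.proc").

    Sketch only: it inserts once per job id, so two procs of the same
    cluster collide on the cluster key and trip the assertion.
    """
    for job_id in job_ids:
        cluster, proc = (int(part) for part in job_id.split("."))
        alloc = {"cluster": cluster, "first_proc": proc}
        rc = allocations.insert(cluster, alloc)
        assert rc == 0, f"duplicate allocation for cluster {cluster} (job {job_id})"
```

Calling reconnect(["6.0", "6.53"], AllocationTable()) succeeds for 6.0 and then raises AssertionError for 6.53, because both job ids share cluster 6.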

- Jaime

On Oct 17, 2021, at 2:18 PM, Dmitry A. Golubkov via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:

Dear all,

I have a problem with my cluster: condor_schedd fails after some time with the following error in the log:

2021-10-17T13:52:30.814107888Z condor_schedd[12521]: DedicatedScheduler creating Allocations for reconnected job (6.0)

2021-10-17T13:52:30.896151617Z condor_schedd[12521]: DedicatedScheduler creating Allocations for reconnected job (6.53)

2021-10-17T13:52:30.896566762Z condor_schedd[12521]: ERROR "Assertion ERROR on (allocations->insert( cluster, alloc ) == 0)" at line 2929 in file /var/lib/condor/execute/slot1/dir_26614/userdir/.tmpdakAr8/condor-8.9.11/src/condor_schedd.V6/dedicated_scheduler.cpp

2021-10-17T13:52:30.898919572Z condor_schedd[12521]: Cron: Killing all jobs

2021-10-17T13:52:30.898943994Z condor_schedd[12521]: CronJobList: Deleting all jobs

2021-10-17T13:52:30.975443327Z condor_schedd[12521]: Cron: Killing all jobs

2021-10-17T13:52:30.975483659Z condor_schedd[12521]: CronJobList: Deleting all jobs

2021-10-17T13:52:30.975494422Z condor_master[1048]: DefaultReaper unexpectedly called on pid 12521, status 1024.

2021-10-17T13:52:30.975498252Z condor_master[1048]: The SCHEDD (pid 12521) exited with status 4
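One way to check whether the same cluster is reconnected more than once before the assertion fires is to group the "reconnected job" lines by cluster id. The snippet below is a minimal sketch that assumes the log line format shown above; the function name procs_by_cluster and the regular expression are illustrative and not part of HTCondor.

```python
import re
from collections import defaultdict

# Matches the "(cluster.proc)" job id in the DedicatedScheduler lines above.
RECONNECT_RE = re.compile(r"reconnected job \((\d+)\.(\d+)\)")


def procs_by_cluster(log_lines):
    """Map each cluster id to the list of procs reported as reconnected."""
    seen = defaultdict(list)
    for line in log_lines:
        match = RECONNECT_RE.search(line)
        if match:
            seen[int(match.group(1))].append(int(match.group(2)))
    return dict(seen)


sample = [
    "condor_schedd[12521]: DedicatedScheduler creating Allocations for reconnected job (6.0)",
    "condor_schedd[12521]: DedicatedScheduler creating Allocations for reconnected job (6.53)",
]
print(procs_by_cluster(sample))  # {6: [0, 53]}
```

A cluster that maps to more than one proc is a candidate for the duplicate insert reported by the assertion.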

Any ideas about what is causing the problem?

Dmitry A. Golubkov
DATADVANCE
Mob. +7 910 4400124
dmitry.golubkov@xxxxxxxxxxxxxx

This message may contain confidential information
constituting a trade secret of DATADVANCE. Any distribution,
use or copying of the information contained in this
message is ineligible except under the internal
regulations of DATADVANCE and may entail liability in
accordance with the current legislation of the Russian
Federation. If you have received this message by mistake
please immediately inform me of it. Thank you!

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
