Hi,

Sometimes the scheduler (condor_schedd) no longer starts. This has happened on different machines and with different versions of HTCondor (we have used 8.0 and 8.6). In the SchedLog I get an error about a corrupt log record. I can fix it by deleting job-queue.log in the spool folder. I have now tested with HTCondor 8.8.0 and got the same error.

02/19/19 09:09:24 (pid:6712) ******************************************************
02/19/19 09:09:24 (pid:6712) ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
02/19/19 09:09:24 (pid:6712) ** C:\condor\bin\condor_schedd.exe
02/19/19 09:09:24 (pid:6712) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
02/19/19 09:09:24 (pid:6712) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
02/19/19 09:09:24 (pid:6712) ** $CondorVersion: 8.8.0 Jan 03 2019 BuildID: 457757 $
02/19/19 09:09:24 (pid:6712) ** $CondorPlatform: x86_64_Windows10 $
02/19/19 09:09:24 (pid:6712) ** PID = 6712
02/19/19 09:09:24 (pid:6712) ** Log last touched 2/19 08:35:06
02/19/19 09:09:24 (pid:6712) ******************************************************
02/19/19 09:09:24 (pid:6712) Using config source: C:\condor\condor_config
02/19/19 09:09:24 (pid:6712) Using local config sources:
02/19/19 09:09:24 (pid:6712)    C:\condor/condor_config.local
02/19/19 09:09:24 (pid:6712) config Macros = 187, Sorted = 187, StringBytes = 5237, TablesBytes = 6780
02/19/19 09:09:24 (pid:6712) CLASSAD_CACHING is ENABLED
02/19/19 09:09:24 (pid:6712) Daemon Log is logging: D_ALWAYS D_ERROR
02/19/19 09:09:24 (pid:6712) DaemonCore: non-shared command socket at <192.168.0.27:1134>
02/19/19 09:09:24 (pid:6712) Daemoncore: Listening at <0.0.0.0:1134> on TCP (ReliSock) and UDP (SafeSock).
02/19/19 09:09:24 (pid:6712) DaemonCore: command socket at <192.168.56.1:9618?addrs=192.168.56.1-9618&noUDP&sock=6712_dfde>
02/19/19 09:09:24 (pid:6712) DaemonCore: private command socket at <192.168.56.1:9618?addrs=192.168.56.1-9618&noUDP&sock=6712_dfde>
02/19/19 09:09:24 (pid:6712) History file rotation is enabled.
02/19/19 09:09:24 (pid:6712) Maximum history file size is: 20971520 bytes
02/19/19 09:09:24 (pid:6712) Number of rotated history files is: 2
02/19/19 09:09:24 (pid:6712) NOTE: QUEUE_ALL_USERS_TRUSTED=TRUE - all queue access checks disabled!
02/19/19 09:09:24 (pid:6712) WARNING: Encountered corrupt log record 211 (byte offset 8374)
02/19/19 09:09:24 (pid:6712) 999
02/19/19 09:09:24 (pid:6712) Lines following corrupt log record 211 (up to 3):
02/19/19 09:09:24 (pid:6712) 103 7.0 MachineAttrSlotWeight0 2
02/19/19 09:09:24 (pid:6712) 103 7.0 StartdPrincipal "execute-side@matchsession/192.168.0.27"
02/19/19 09:09:24 (pid:6712) 103 7.0 ShadowBday 1550250864
02/19/19 09:09:24 (pid:6712) ERROR "Error: corrupt log record 211 (byte offset 8374) occurred inside closed transaction, recovery failed" at line 1114 in file C:\condor\execute\dir_6124\sources\src\condor_utils\classad_log.cpp
02/19/19 09:09:24 (pid:6712) Cron: Killing all jobs
02/19/19 09:09:24 (pid:6712) CronJobList: Deleting all jobs
02/19/19 09:09:24 (pid:6712) Cron: Killing all jobs
02/19/19 09:09:24 (pid:6712) CronJobList: Deleting all jobs

What could be the reason for this? Is there any way to avoid this error?

Best regards,
Werner