On 10/20/2020 9:47 AM, Stefano Dal Pra wrote:
Hello, condor 8.8.9 speaking
I recently noticed that some completed jobs seem to be missing from condor_history,
and they also leave no history file under PER_JOB_HISTORY_DIR.
Jobs do not enter the history file(s) when they complete; they enter the history file(s) when they leave the schedd's job queue.
If you can see the job with condor_q, you will not see it with condor_history. And vice versa.
By default, jobs are removed from the schedd whenever they enter the completed state (JobStatus==4) or removed state (JobStatus==3).
However, this can be customized via the "leave_in_queue" statement in the job submit file. See the condor_submit man page for details.
Looks like at your site something is setting leave_in_queue as follows, which means the job will stay in the schedd for 10 days in completed state,
and then after 10 days it will be written into the history file(s):
 LeaveJobInQueue = JobStatus == 4 && (CompletionDate =?= undefined || CompletionDate == 0 || ((time() - CompletionDate) < 864000))
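To make the timing concrete, here is a rough Python sketch of how that expression plays out for the job quoted below. It is only an approximation: the function name and the "three days later" check time are made up for illustration, None stands in for an undefined CompletionDate, and ClassAd three-valued logic is not modeled.

```python
from datetime import datetime, timedelta, timezone

RETENTION_SECONDS = 864000  # 10 days, taken from the LeaveJobInQueue expression


def leave_job_in_queue(job_status, completion_date, now):
    """Approximate the LeaveJobInQueue expression (hypothetical helper).

    completion_date and now are Unix timestamps; completion_date may be
    None to stand in for an undefined attribute.
    """
    if job_status != 4:  # only completed jobs are held in the queue
        return False
    if completion_date is None or completion_date == 0:
        return True
    return (now - completion_date) < RETENTION_SECONDS


# CompletionDate reported by condor_q for job 9865068.0
tz = timezone(timedelta(hours=2))
completion = int(datetime(2020, 10, 17, 9, 12, 43, tzinfo=tz).timestamp())

# Assumed check time: three days after completion (illustrative only)
three_days_later = completion + 3 * 86400
still_in_queue = leave_job_in_queue(4, completion, three_days_later)

# First moment at which the expression stops being true
eligible = datetime.fromtimestamp(completion + RETENTION_SECONDS, tz)

print(still_in_queue)        # True
print(eligible.isoformat())  # 2020-10-27T09:12:43+02:00
```

So under this reading, the job should only become eligible to leave the queue, and therefore show up in condor_history, from 2020-10-27 onward.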
Hope the above helps,
Todd
One example: this job apparently completed with no errors after running for ~26K seconds:
[root@sn-01 ~]# condor_q -name sn-01 9865068.0 -af:jln LastJobStatus JobStatus AcctGroup LastRemoteHost CpusProvisioned CumulativeRemoteUserCpu RemoteWallClockTime ExitBySignal ExitCode ExitStatus 'abstime(JobStartDate)' 'abstime(JobCurrentStartTransferOutputDate)' NumJobStarts NumJobCompletions ResidentSetSize_RAW 'abstime(x509UserProxyExpiration)' 'abstime(CompletionDate)'
ID = 9865068.0
 LastJobStatus = 2
 JobStatus = 4
 AcctGroup = virgo
 LastRemoteHost = slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 CpusProvisioned = 2
 CumulativeRemoteUserCpu = 9302.0
 RemoteWallClockTime = 26025.0
 ExitBySignal = false
 ExitCode = 0
 ExitStatus = 0
 abstime(JobStartDate) = absTime("2020-10-17T01:58:58+02:00")
 abstime(JobCurrentStartTransferOutputDate) = absTime("2020-10-17T09:12:42+02:00")
 NumJobStarts = 1
 NumJobCompletions = 1
 ResidentSetSize_RAW = 4461780
 abstime(x509UserProxyExpiration) = absTime("2020-10-17T12:11:11+02:00")
 abstime(CompletionDate) = absTime("2020-10-17T09:12:43+02:00")
However:
[root@sn-01 ~]# condor_history -lim 1 -name sn-01 9865068.0
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED CMD
Finally,
I would expect a per-job history file to exist under $(PER_JOB_HISTORY_DIR).
Several files are there, but none for this job (or for others like it).
Any idea?
Thanks
Stefano
-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx>
University of Wisconsin-Madison, Center for High Throughput Computing
Department of Computer Sciences, HTCondor Technical Lead
1210 W. Dayton St. Rm #4257, Madison, WI 53706-1685
Phone: (608) 263-7132