Re: [HTCondor-users] Completed job with no history file
- Date: Tue, 20 Oct 2020 15:14:33 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Completed job with no history file
On 10/20/2020 9:47 AM, Stefano Dal Pra wrote:
Hello,
HTCondor 8.8.9 here.
I recently noticed that some completed jobs seem to be missing
from condor_history, and that they also leave no per-job history
file under PER_JOB_HISTORY_DIR.
Jobs do not enter the history file(s) when they complete;
they enter the history file(s) when they leave the schedd database.
If you can see the job with condor_q, you will not see it with
condor_history, and vice versa.
By default, jobs are removed from the schedd as soon as they enter the
completed state (JobStatus==4) or the removed state (JobStatus==3).
However, this can be customized via the "leave_in_queue"
statement in the job submit file. See the condor_submit man page
for details.
It looks like something at your site is setting leave_in_queue as
follows, which means the job will stay in the schedd in the
completed state for 10 days,
and only after those 10 days will it be written to the history file(s):
LeaveJobInQueue = JobStatus == 4 && (CompletionDate =?= undefined || CompletionDate == 0 || ((time() - CompletionDate) < 864000))
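As a rough illustration only (this is not HTCondor code), the logic of the expression above can be sketched in Python. The function name, the use of None for an undefined attribute, and the choice of "now" are assumptions made for the example; the CompletionDate is the one reported by condor_q for job 9865068.0 in this thread.

```python
from datetime import datetime, timedelta, timezone

COMPLETED = 4               # JobStatus value for a completed job
RETENTION_SECONDS = 864000  # 10 days * 86400 seconds, as in the expression


def leave_job_in_queue(job_status, completion_date, now):
    """Mirror the LeaveJobInQueue expression (hypothetical helper).

    completion_date and now are Unix timestamps; None stands in for
    an undefined CompletionDate.
    """
    if job_status != COMPLETED:
        return False
    if completion_date is None or completion_date == 0:
        return True
    return (now - completion_date) < RETENTION_SECONDS


# CompletionDate from the condor_q output: 2020-10-17T09:12:43+02:00
tz = timezone(timedelta(hours=2))
completion = int(datetime(2020, 10, 17, 9, 12, 43, tzinfo=tz).timestamp())

# Assumed "now" for the example: 20 Oct 2020 15:14:33 -0500
reply_tz = timezone(timedelta(hours=-5))
now = int(datetime(2020, 10, 20, 15, 14, 33, tzinfo=reply_tz).timestamp())

still_in_queue = leave_job_in_queue(COMPLETED, completion, now)

# Earliest moment the expression turns false, so the job may leave the queue
eligible_for_history = datetime.fromtimestamp(completion + RETENTION_SECONDS, tz)

print("still in queue:", still_in_queue)
print("eligible for history from:", eligible_for_history.isoformat())
```

Under these assumptions the job is still held in the queue three days after completion, which is consistent with condor_q showing it while condor_history returns nothing.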
Hope the above helps,
Todd
One example: this job apparently completed with no errors after
running for about 26,000 seconds:
[root@sn-01 ~]# condor_q -name sn-01 9865068.0 -af:jln \
    LastJobStatus JobStatus AcctGroup LastRemoteHost CpusProvisioned \
    CumulativeRemoteUserCpu RemoteWallClockTime ExitBySignal ExitCode \
    ExitStatus 'abstime(JobStartDate)' \
    'abstime(JobCurrentStartTransferOutputDate)' NumJobStarts \
    NumJobCompletions ResidentSetSize_RAW \
    'abstime(x509UserProxyExpiration)' 'abstime(CompletionDate)'
ID = 9865068.0
LastJobStatus = 2
JobStatus = 4
AcctGroup = virgo
LastRemoteHost = slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpusProvisioned = 2
CumulativeRemoteUserCpu = 9302.0
RemoteWallClockTime = 26025.0
ExitBySignal = false
ExitCode = 0
ExitStatus = 0
abstime(JobStartDate) = absTime("2020-10-17T01:58:58+02:00")
abstime(JobCurrentStartTransferOutputDate) = absTime("2020-10-17T09:12:42+02:00")
NumJobStarts = 1
NumJobCompletions = 1
ResidentSetSize_RAW = 4461780
abstime(x509UserProxyExpiration) = absTime("2020-10-17T12:11:11+02:00")
abstime(CompletionDate) = absTime("2020-10-17T09:12:43+02:00")
However:
[root@sn-01 ~]# condor_history -lim 1 -name sn-01 9865068.0
ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
Finally,
I would expect a per-job history file for this job to exist under
$(PER_JOB_HISTORY_DIR).
Several files are there, but none for this job (nor for others like it).
Any ideas?
Thanks
Stefano
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>   University of Wisconsin-Madison
Center for High Throughput Computing     Department of Computer Sciences
HTCondor Technical Lead                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                    Madison, WI 53706-1685