Re: [Condor-users] understanding condor history file
- Date: Mon, 25 Oct 2010 09:45:38 -0500
- From: Daniel Forrest <dan.forrest@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] understanding condor history file
Santanu,
> Has any one got any answer for me please? I really appreciate some help.
>
> Cheers,
> Santanu
>
>
> On 22/10/10 16:35, Santanu Das wrote:
> >Hi there,
> >
> >Recently we started seeing some mismatch in our accounting data and
> >when I looked in to the history file, I found the number of fields are
> >duplicated. Can any one please explain the meaning of those values
> >please? My concerns are especially with RemoteWallClockTime,
> >CompletionDate and JobStatus but I'd like to know rest of the things
> >as well.
This is my understanding of the duplicate entries.
When a job is added to the job queue (i.e. in the job_queue.log file)
the parameters shared by all jobs in the cluster are identified by
entries of the form:
    103 0<cluster>.-1 <parameter> <value>
(I'm not sure why there is a leading zero, but there is.)
Parameters unique to each process within the cluster are then added:
    103 <cluster>.<process> <parameter> <value>
Parameters that are deleted are marked like this:
    104 <cluster>.<process> <parameter>
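As a rough illustration of the entry forms above, here is a minimal
Python sketch that replays "103" (set) and "104" (delete) lines into
per-job dictionaries. It assumes only the layout described in this
message; any other record types in a real job_queue.log are simply
skipped. The function names, the sample values, and the "Scratch"
parameter are made up for the example.

```python
def replay_job_queue(lines):
    """Replay 103 (set) and 104 (delete) entries into {job_id: {parameter: value}}."""
    jobs = {}
    for line in lines:
        # Fields: opcode, job id, parameter, then the rest of the line as the value.
        fields = line.strip().split(None, 3)
        if len(fields) < 3:
            continue  # not one of the two entry forms handled here
        opcode, job_id, parameter = fields[:3]
        if opcode == "103" and len(fields) == 4:
            jobs.setdefault(job_id, {})[parameter] = fields[3]
        elif opcode == "104":
            jobs.get(job_id, {}).pop(parameter, None)
    return jobs


def effective_parameters(jobs, cluster, process):
    """Overlay process-specific values on the cluster-wide ("0<cluster>.-1") ones."""
    merged = dict(jobs.get(f"0{cluster}.-1", {}))
    merged.update(jobs.get(f"{cluster}.{process}", {}))
    return merged


# Invented sample entries, for illustration only.
sample_log = [
    "103 012.-1 CommittedTime 0",
    "103 012.-1 JobStatus 1",
    "103 12.0 CommittedTime 3600",
    "103 12.0 Scratch 1",   # hypothetical parameter, set ...
    "104 12.0 Scratch",     # ... and then deleted again
]
jobs = replay_job_queue(sample_log)
print(effective_parameters(jobs, 12, 0))
```

The process-specific "CommittedTime" replaces the cluster-wide one in
the merged result, and the deleted parameter does not appear at all.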
What you see in the history file is first the set of cluster-wide
parameters (i.e. the "0<cluster>.-1" values), followed by the final
set of process-specific parameters (i.e. the "<cluster>.<process>"
values that have not been deleted).
For example, every job is queued with the cluster-wide value
"CommittedTime = 0". When a job finishes, it gets its own
process-specific value for "CommittedTime".
So when you see a duplicate parameter in the history file, ignore the
earlier value: the later one is the process-specific value, which
overrides the cluster-wide one.
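The "last value wins" rule for a single history entry can be sketched
in Python as follows. This assumes each line has the "name = value"
form seen in the "CommittedTime = 0" example; the function name and
the sample values are invented for illustration.

```python
def final_values(entry_lines):
    """Map each parameter name to the last value seen for it in one history entry."""
    values = {}
    for line in entry_lines:
        name, sep, value = line.partition("=")
        if sep:  # skip anything that is not "name = value"
            values[name.strip()] = value.strip()
    return values


# Invented entry: cluster-wide values first, process-specific values after.
entry = [
    "CommittedTime = 0",
    "JobStatus = 5",
    "CommittedTime = 3600",
    "JobStatus = 4",
]
print(final_values(entry))

# Reordering the lines first changes which duplicate comes last,
# so the stale cluster-wide JobStatus would be picked instead.
print(final_values(sorted(entry)))
```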
For the same reason, you don't want to simply "sort" a history file
entry: order is important! Use something like this instead:
    gawk '{ H[$1]=$0 } END { N=asort(H); for (I=1; I<=N; ++I) print H[I] }'
To apply this to the entire history file you would use something like:
    gawk '{ H[$1]=$0 } /^\*\*\*/ { N=asort(H); for (I=1; I<=N; ++I) print H[I]; delete H }'
Note that this moves each "***" marker from the end of its entry to
the beginning, but that actually looks nicer when checking the file
by hand.
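For anyone who would rather not use gawk, here is a rough Python
sketch of the same idea for a whole history file. It keys each line
on its first whitespace-separated field (like H[$1]=$0), and when it
reaches a "***" line it prints that marker followed by the surviving
lines in sorted order. The function name is made up, and the sort
order may differ slightly from asort() depending on locale.

```python
def dedupe_history(lines):
    """Yield history lines with one line per parameter, the last occurrence winning."""
    latest = {}  # first field of the line -> full line
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("***"):
            yield line  # the marker now comes first, as with the gawk version
            yield from sorted(latest.values())
            latest.clear()
        else:
            latest[line.split()[0]] = line
    # Lines after the final "***" marker are dropped, as in the gawk command.


# Invented two-entry sample, for illustration only.
sample = [
    "CommittedTime = 0\n",
    "JobStatus = 5\n",
    "CommittedTime = 3600\n",
    "JobStatus = 4\n",
    "*** first entry\n",
    "CommittedTime = 0\n",
    "*** second entry\n",
]
for out in dedupe_history(sample):
    print(out)
```

To run it over a real file, pass an open file object in place of the
sample list.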
I hope this helps.
--
Dan