Re: [HTCondor-users] question about accounting groups
- Date: Thu, 29 Apr 2021 14:16:45 +0200
- From: Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] question about accounting groups
Hi Jeff,
On 29/04/21 13:47, Jeff Templon wrote:
Hi Greg, all
On 28 Apr 2021, at 16:09, Greg Thain via HTCondor-users wrote:
Jeff:
The accounting information is stored in the "AccountantNew.log" file,
which is maintained by the condor_negotiator. This file is written
to in a transaction-log style, where all changes to the state are
appended to the file, and periodically the file is rewritten with the
current state.
-greg
Thanks Greg. It seems like HTCondor has a different idea about what
"accounting" is than I do, and that's what's throwing me off track.
What I see in the AccountantNew.log file is lots of stuff about the
current state of slots, and lots of stuff about the current and
historical state of aggregate usage for each user. Is this a correct
assessment?
The thing I was looking for when I say "accounting" is something like
this:
2020-10-22 16:47:22 some-job-unique-id user=templon group=pdp
cput=07:22:01 wall=07:24:44 ncores=2 physmem=2700 vmem=4321 exstat=0
[ ... ]
This is something similar to the above:
[root@ce06-htc 2021-4]# condor_q -jobads history.2411281.0 -af:j Owner
AcctGroup 'Interval(RemoteSysCpu+RemoteUserCpu)'
'Interval(RemoteWallClockTime)' 'OriginalCpus ?: (CpusProvisioned ?:
RequestCpus)' 'ResidentSetSize_RAW' 'ImageSize_RAW' exitstatus
2411281.0 atlasprd011 atlas 22:24:46 3:16:05 8 6863744 30293384 0
For this to work, PER_JOB_HISTORY_DIR must be set on each schedd to an
existing directory. In that directory you will find one file per
finished job (such as history.2411281.0).
Alternatively, on each schedd you run condor_history, almost the same way:
[root@ce06-htc 2021-4]# condor_history -lim 3 -af:j Owner AcctGroup
'Interval(RemoteSysCpu+RemoteUserCpu)' 'Interval(RemoteWallClockTime)'
'OriginalCpus ?: (CpusProvisioned ?: RequestCpus)' 'ResidentSetSize_RAW'
'ImageSize_RAW' exitstatus
2477965.0 pilatlas030 atlas 13:16 1:07:58 1 812544 4117724 0
2468261.0 belleprd belle 12:39:23 13:08:26 1 1586904 4231892 0
2475954.0 pillhcb031 lhcb 8:41:22 8:43:30 1 1461276 5068740 0
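Lines in this format can be summed per user to get the "wall*ncores"
figure asked about. What follows is only a minimal sketch, assuming the
nine columns are exactly those printed above (job id, Owner, AcctGroup,
CPU time, wall time, cores, RSS, image size, exit status) and that
intervals are printed as [D+]HH:MM:SS with leading zero fields omitted.
The function names are made up for illustration.

```python
from collections import defaultdict


def interval_to_seconds(text):
    """Convert '[D+]HH:MM:SS', 'H:MM:SS' or 'MM:SS' to seconds."""
    days = 0
    if "+" in text:
        day_part, text = text.split("+", 1)
        days = int(day_part)
    seconds = 0
    for field in text.split(":"):
        seconds = seconds * 60 + int(field)
    return days * 86400 + seconds


def core_hours_by_owner(lines):
    """Sum wall time * cores (in hours) per Owner.

    Assumes the nine-column layout shown in the output above.
    """
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or unexpected lines
        owner, wall, ncores = fields[1], fields[4], fields[5]
        try:
            totals[owner] += interval_to_seconds(wall) * int(ncores) / 3600.0
        except ValueError:
            continue  # e.g. an attribute printed as "undefined"
    return dict(totals)


# The three sample lines from the condor_history output above.
sample = [
    "2477965.0 pilatlas030 atlas 13:16 1:07:58 1 812544 4117724 0",
    "2468261.0 belleprd belle 12:39:23 13:08:26 1 1586904 4231892 0",
    "2475954.0 pillhcb031 lhcb 8:41:22 8:43:30 1 1461276 5068740 0",
]

for owner, hours in sorted(core_hours_by_owner(sample).items()):
    print(f"{owner} {hours:.2f} core-hours")
```

To restrict the sum to the past month, the selection would have to be
done before this step, for example by choosing which history files or
which condor_history records are fed in.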
Two points:
1) condor_history only "remembers" jobs for as long as it can: once
space is freed, that "memory" is gone.
Since we want to keep the history files for later checks if needed, we
configured PER_JOB_HISTORY_DIR and took care to
store old history files somewhere other than the local disk.
2) We have local jobs submitted by users with: condor_submit [...]
-spool myjob.sub
In that case, finished jobs remain in the schedd queue for 864000
seconds (10 days) after they are done (this lets the user retrieve the
output sandbox), and they are not seen by condor_history (you can see
them with condor_q).
The history file for those jobs is created 10 days after they are done.
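Regarding point 1, moving old per-job history files off the local disk
could look roughly like the sketch below. This is not the script we
actually run, just an illustration: the function name, the 30-day
threshold and both example paths are hypothetical, and it assumes the
files are named history.<cluster>.<proc> as in the example above.

```python
import shutil
import time
from pathlib import Path


def archive_old_history(history_dir, archive_dir, min_age_days=30, now=None):
    """Move per-job history files older than min_age_days to archive_dir.

    Returns the names of the files that were moved. Both directories and
    the age threshold are illustrative assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(history_dir).glob("history.*")):
        # Only regular files whose last modification is before the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved


# Example call (hypothetical paths, adjust to your PER_JOB_HISTORY_DIR):
# archive_old_history("/var/lib/condor/history", "/mnt/archive/condor-history")
```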
Stefano
One line for each job that has run on the system. So if I want to
know how much "templon" has run over the past month, I can select all
the records for the past month and add the wall*ncores. What do
HTCondor folk call this (not accounting I guess) and where is it
stored and how is it accessed?
Thanks,
JT